Processing

audio_processing

This module defines the AudioTools class for performing media operations.

Attributes:

Name Type Description
LOGGER Logger

Module's logger object.

AudioTools

Perform operations on media using system installed FFMPEG.

Attributes:

Name Type Description
log_dir Path

Application's log directory

ffmpeg str

System FFmpeg Path.

ffprobe str

System FFprobe Path.

extract_audio(input_path, output_path)

Extract a WAV file from video container. If input file is already an audio file, return the unchanged input file path.

Parameters:

Name Type Description Default
input_path str

Path to the file.

required
output_path str

Output path for the extracted WAV file.

required

Returns:

Name Type Description
Path str

The output path of the converted file.
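The branching described above can be sketched as follows. The extension list and ffmpeg flags here are illustrative assumptions, not the class's actual implementation:

```python
from pathlib import Path

# Hypothetical extension list; the real class may detect audio differently.
AUDIO_EXTS = {".wav", ".mp3", ".flac", ".m4a", ".ogg"}

def build_extract_command(input_path: str, output_path: str):
    """Return the input path unchanged for audio files, otherwise an
    ffmpeg command that extracts a WAV track (a sketch, not the real code)."""
    src = Path(input_path)
    if src.suffix.lower() in AUDIO_EXTS:
        return str(src)  # already audio: nothing to do
    # -vn drops the video stream; pcm_s16le writes standard WAV audio.
    return ["ffmpeg", "-y", "-i", str(src), "-vn",
            "-acodec", "pcm_s16le", output_path]
```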

filter_audio(input_path, output_wav, highpass=300, lowpass=3400)

Extracts audio from video or uses an existing audio file, applies a band-pass (highpass→lowpass) and loudness normalization, then writes out a 16 kHz mono WAV ready for Whisper.

Parameters:

Name Type Description Default
input_path str

Path to video (any container) or audio file.

required
output_wav str

Path where the cleaned WAV will be saved.

required
highpass int

Cut everything below this frequency (Hz).

300
lowpass int

Cut everything above this frequency (Hz).

3400

Returns:

Name Type Description
str str

The output_wav path, for chaining into Whisper.
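A hedged sketch of the ffmpeg filter chain the description implies; the actual flags used by AudioTools may differ:

```python
def build_filter_args(input_path, output_wav, highpass=300, lowpass=3400):
    """Band-pass plus loudness normalization, resampled to 16 kHz mono.
    A sketch of the implied ffmpeg invocation, not the library's real code."""
    af = f"highpass=f={highpass},lowpass=f={lowpass},loudnorm"
    return [
        "ffmpeg", "-y", "-i", input_path,
        "-af", af,        # filter chain: band-pass, then normalize loudness
        "-ar", "16000",   # 16 kHz sample rate for Whisper
        "-ac", "1",       # mono
        output_wav,
    ]
```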

run_command(command, capture_output=True, check=False, cwd=None, hide_and_log=False)

Wrapper for subprocess.run to handle errors and results.

Parameters:

Name Type Description Default
command list

The CMD list of the command to be executed.

required
capture_output bool

Whether to redirect stdout and stderr to subprocess.PIPE. Defaults to True.

True
check bool

Whether to raise an exception on subprocess error. Defaults to False.

False
cwd str

The directory in which the command is run. Defaults to None

None
hide_and_log bool

If True, redirect stdout and stderr to subprocess.DEVNULL and subprocess.PIPE respectively.

False

Returns:

Type Description
Optional[CompletedProcess]

The result of subprocess.run, or None on failure.

to_mp4(input_path, output_path=None, resolution='1280x720', target_bitrate='2500k', use_nvenc=False)

Convert any video to MP4 (H.264 + AAC).

Parameters:

Name Type Description Default
input_path str

Source file (any container/codec FFmpeg supports).

required
output_path str

Destination .mp4 path (defaults to the input file's stem).

None
resolution str

Target canvas WxH. Aspect is preserved.

'1280x720'
target_bitrate str

Video bitrate (e.g. '2500k').

'2500k'
use_nvenc bool

True → try NVIDIA NVENC; False → libx264 CPU.

False

Returns:

Name Type Description
Path Union[Path, None]

Path of the MP4, or None on failure.
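The conversion might be assembled along these lines; the scale/pad filter and codec flags are illustrative assumptions, not the method's actual command:

```python
from pathlib import Path

def build_mp4_command(input_path, output_path=None, resolution="1280x720",
                      target_bitrate="2500k", use_nvenc=False):
    """Sketch of an H.264 + AAC conversion command (assumed flags)."""
    src = Path(input_path)
    out = output_path or str(src.with_suffix(".mp4"))
    w, h = resolution.split("x")
    vcodec = "h264_nvenc" if use_nvenc else "libx264"
    # scale + pad keeps the aspect ratio inside the target canvas.
    vf = (f"scale={w}:{h}:force_original_aspect_ratio=decrease,"
          f"pad={w}:{h}:(ow-iw)/2:(oh-ih)/2")
    return ["ffmpeg", "-y", "-i", str(src), "-vf", vf,
            "-c:v", vcodec, "-b:v", target_bitrate, "-c:a", "aac", out]
```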

to_wav(input_path, output_path)

Convert file to .wav format.

Parameters:

Name Type Description Default
input_path str

Path to the file.

required
output_path str

Output path.

required

Returns:

Name Type Description
Path Path

The output path of the converted file.

to_webm(input_path, output_path=None, resolution='1280x720', target_bitrate='2500k', use_nvenc=False)

Convert any video to WebM (VP9 + Opus).

Parameters:

Name Type Description Default
input_path str

Source file (any container/codec FFmpeg supports).

required
output_path str

Destination path (defaults to the input file's stem).

None
resolution str

Target canvas WxH. Aspect is preserved.

'1280x720'
target_bitrate str

Video bitrate (e.g. '2500k').

'2500k'
use_nvenc bool

True → try NVIDIA NVENC; False → libvpx-vp9 CPU.

False

Returns:

Name Type Description
Path Union[Path, None]

Path of the WebM, or None on failure.

gpt_wrapper

This module defines the GptModel class for sending requests to OpenAI's API.

Attributes:

Name Type Description
LOGGER Logger

Module's logger object

GptModel

Send requests to OpenAI's API and manage sessions.

Parameters:

Name Type Description Default
version str

Which GPT model version to use.

required
system_msg str

The System Message to use for the model.

'default'
from_dotenv bool

Whether to get the OpenAI API key from a .env file.

True
ApiKey str

Option to pass the OpenAI API key directly.

None
max_context int

The session's context limit in tokens.

100000

Attributes:

Name Type Description
info dict

Dictionary containing all object information.

load_from_json(info) staticmethod

Load an object from a serialized JSON string

Parameters:

Name Type Description Default
info str

JSON string.

required

Returns:

Name Type Description
GptModel GptModel

The loaded GptModel object

new_session()

Clear the current session.

request(prompt)

Make a new session request.

Parameters:

Name Type Description Default
prompt str

The prompt for the request.

required

Returns:

Name Type Description
dict Dict

Dictionary containing the "prompt" used and the formatted "response".
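The documented return shape can be illustrated with a stubbed session; reply_fn stands in for the real OpenAI call, and this is not GptModel's actual code:

```python
def make_request(history, prompt, reply_fn):
    """Append the prompt to the session, get a reply, and return the
    documented dict shape (a sketch of the pattern, not the real method)."""
    history.append({"role": "user", "content": prompt})
    response = reply_fn(history)  # stand-in for the API round trip
    history.append({"role": "assistant", "content": response})
    return {"prompt": prompt, "response": response}
```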

serialize()

Serialize object information into JSON string

Returns:

Name Type Description
str str

JSON string of serialized object.
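Together, serialize and load_from_json suggest the usual JSON round trip. A toy model of the pattern (TinyModel is a stand-in, not the real class):

```python
import json

class TinyModel:
    """Toy stand-in showing the serialize / load_from_json round trip."""
    def __init__(self, version, max_context=100000):
        self.info = {"version": version, "max_context": max_context}

    def serialize(self) -> str:
        return json.dumps(self.info)

    @staticmethod
    def load_from_json(info: str) -> "TinyModel":
        data = json.loads(info)
        return TinyModel(data["version"], data["max_context"])
```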

stream_request(prompt)

Send a streaming chat request; yield each content chunk as it arrives.

Parameters:

Name Type Description Default
prompt str

Prompt to use.

required

Yields:

Name Type Description
str Any

The string chunks from the request response.
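Consuming the generator might look like this; chunks stands in for the OpenAI streaming iterator:

```python
def stream_chunks(chunks):
    """Yield each non-empty content piece as it arrives — the shape a
    caller of stream_request would consume (chunks is a stand-in)."""
    for piece in chunks:
        if piece:  # streaming APIs can emit empty deltas; skip them
            yield piece
```

A caller would typically join the pieces incrementally, e.g. `"".join(stream_chunks(...))` for the full reply.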

Processor

This module defines the Processor class for managing Modal interaction and conditional imports for the CPU version.

Attributes:

Name Type Description
LOGGER Logger

Module's Logger object.

Processor

Manage conditional imports, expose pre-configured instances of various classes from the processing module, and manage Modal interaction.

Parameters:

Name Type Description Default
gpt_version str

Which GPT version to use for OpenAI integration.

'gpt-4.1-mini'
dotenv_path Union[str, Path, None]

Optional path to look for the .env file.

None
whisper_kwargs dict

Optional additional keyword arguments for FWhisperWrapper

{}
OPENAI_API_KEY str

Optionally pass OpenAI API key directly

None
MODAL_TOKEN_ID str

Optionally pass Modal token id directly

None
MODAL_TOKEN_SECRET str

Optionally pass Modal token secret directly

None

Attributes:

Name Type Description
sentence_breakdown_service SentenceBreakdownService

Instance of SentenceBreakdownService

fwhisper FWhisperWrapper

Instance of FWhisperWrapper

audio_tools AudioTools

Instance of AudioTools

modal_convert_to_mp4(video_fp, outpath, to_mp4_kwargs={}) async

Call Modal function to convert a video to MP4 format.

Parameters:

Name Type Description Default
video_fp Union[str, Path]

Path to the video to convert.

required
outpath Union[str, Path]

Where to save the received video stream.

required
to_mp4_kwargs dict

Additional arguments for AudioTools.to_mp4.

{}

Returns:

Name Type Description
Path Path

The path to converted video from outpath.

modal_transcribe_to_srt(media_fp, transcribe_kwargs={}, fix_with_chat_gpt=True) async

Call Modal function to transcribe video to SRT

Parameters:

Name Type Description Default
media_fp Union[str, Path]

Path to the video to transcribe.

required
transcribe_kwargs dict

Additional arguments for FWhisperWrapper.transcribe

{}
fix_with_chat_gpt bool

If True, request ChatGPT to fix the transcription. Defaults to True.

True

Returns:

Name Type Description
str Union[str, None]

Formatted SRT transcription string.

modal_transcribe_to_str(audio_fp, transcribe_kwargs={}) async

Call Modal function to transcribe audio to string.

Parameters:

Name Type Description Default
audio_fp Union[str, Path]

Path to the audio to transcribe.

required
transcribe_kwargs dict

Additional arguments for FWhisperWrapper.transcribe

{}

Returns:

Name Type Description
str Union[str, None]

String transcription.
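The async calling convention these methods share can be sketched with a stubbed remote function; fake_remote is hypothetical, and the real methods go through Modal:

```python
import asyncio

async def transcribe_remote(audio_fp, remote_fn):
    """Await a remote transcription call and return its string result.
    remote_fn stands in for the Modal-backed function."""
    return await remote_fn(audio_fp)

async def fake_remote(path):
    # Hypothetical stub simulating the remote worker's reply.
    return f"transcript of {path}"

result = asyncio.run(transcribe_remote("audio.wav", fake_remote))
```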

text_processing

This module defines the SentenceBreakdownService class for analyzing Japanese sentences using fugashi and kotobase.

Attributes:

Name Type Description
LOGGER Logger

Module's Logger object.

GptExplainService

Wrapper for the GptModel class with a default system message and utility functions.

Parameters:

Name Type Description Default
gpt_model_kwargs dict

Additional keyword arguments for GptModel

{}
version str

OpenAI GPT version to use.

'gpt-4.1-mini'

Attributes:

Name Type Description
SYSTEM_MSG str

Default system message used.

explain(sentence, focus)

Request an explanation from GPT using default system message and prompt.

Parameters:

Name Type Description Default
sentence str

The full Japanese sentence.

required
focus str

The target word to explain in context.

required

Returns:

Name Type Description
str str

GPT-generated explanation with structure, particles, and nuance.

explain_custom(sentence, focus, sysMsg, prompt, version)

Request an explanation from GPT using custom system message and prompt.

Parameters:

Name Type Description Default
sentence str

The full Japanese sentence.

required
focus str

The target word to explain in context.

required
sysMsg str

GPT's system message

required
prompt str

GPT's string prompt containing formatters {0} = sentence and {1} = focus

required
version str

GPT model version to use

required

Returns:

Name Type Description
str Optional[str]

GPT-generated response
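The positional formatters documented for prompt can be exercised directly. A sketch of how such a prompt might be rendered (render_prompt is illustrative, not the service's code):

```python
def render_prompt(prompt, sentence, focus=None):
    """Fill the documented positional formatters: {0} = sentence, {1} = focus."""
    if focus is None:
        return prompt.format(sentence)
    return prompt.format(sentence, focus)
```

For example, `render_prompt("Explain {1} as used in: {0}", "猫が好きです", "猫")` substitutes the sentence and focus word into their slots.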

explain_sentence(sentence)

Request a GPT explanation for a full sentence without requiring a focus word.

Parameters:

Name Type Description Default
sentence str

A potentially long or informal Japanese sentence.

required

Returns:

Name Type Description
str str

A full breakdown explanation from GPT, including structure and nuance.

explain_sentence_custom(sentence, sysMsg, prompt, version)

Request an explanation from GPT using custom system message and prompt without any focus words.

Parameters:

Name Type Description Default
sentence str

The full Japanese sentence.

required
sysMsg str

ChatGPT's system message

required
prompt str

GPT's string prompt containing formatters {0} = sentence

required
version str

GPT model version to use

required

Returns:

Name Type Description
str Optional[str]

GPT-generated response

SentenceBreakdownService

Provides utilities for analyzing Japanese sentences.

Parameters:

Name Type Description Default
gpt_version str

OpenAI GPT version to use.

'gpt-4.1-mini'
gpt_kwargs dict

Additional keyword arguments for GptModel

{}

explain(sentence, focus=None)

Perform a complete Japanese sentence breakdown.

Parameters:

Name Type Description Default
sentence str

The full Japanese sentence to analyze.

required
focus str

The key word to generate deeper explanation for.

None

Returns:

Name Type Description
BreakdownResponse BreakdownResponse

Includes tokens, word info, and GPT breakdown

explain_custom(sentence, sysMsg, prompt, version, focus=None)

Perform a complete sentence breakdown using a custom system message and prompt.

Parameters:

Name Type Description Default
sentence str

The full Japanese sentence to analyze.

required
focus str

The key word to generate deeper explanation for.

None
sysMsg str

ChatGPT's system message

required
prompt str

GPT's string prompt containing formatters {0} = sentence and {1} = focus

required
version str

GPT model version to use

required

Returns:

Name Type Description
BreakdownResponse BreakdownResponse

Includes tokens, word info, and GPT breakdown

single_word_lookup(word)

Perform a kotobase search for a specific word and return results formatted into a pydantic model

Parameters:

Name Type Description Default
word str

The word to lookup

required

Returns:

Name Type Description
DictLookup DictLookup

Pydantic model containing kotobase info

tokenize(sentence)

Tokenize a Japanese sentence using fugashi and extract token information.

Parameters:

Name Type Description Default
sentence str

Sentence to tokenize

required

Returns:

Name Type Description
list List[Dict]

List of dictionaries containing token information.

wildcard_lookup(pattern)

Perform a kotobase search for a wildcard pattern and return results formatted into a pydantic model

Parameters:

Name Type Description Default
pattern str

The wildcard pattern

required

Returns:

Name Type Description
DictWildcardLookup DictWildcardLookup

Pydantic model containing kotobase info

word_lookup(sentence)

Tokenize every word in a Japanese sentence, extract information and lookup every token with kotobase.

Parameters:

Name Type Description Default
sentence str

Japanese sentence to tokenize.

required

Returns:

Name Type Description
list List[Dict]

List of dictionaries containing token information.

whisper_wrapper

This module defines the FWhisperWrapper class for running the FasterWhisper model.

Attributes:

Name Type Description
LOGGER Logger

Module's logger object.

DEFAULT_SYS_MSG str

Default system message to use for OpenAI post-processing.

FWhisperWrapper

Wrapper for FasterWhisper's WhisperModel including post-processing logic.

Parameters:

Name Type Description Default
model_name str

Whisper model name

'large-v3'
lang str

Whisper model language.

'ja'
compute_type str

Which compute type to use.

'float16'
device str

Which device to use.

'cuda'
gpt_sys_msg str

Custom GPT system message when using OpenAI post-processing.

None
gpt_version str

Which GPT version to use for OpenAI post-processing.

'gpt-4.1'

gpt_fix_srt(source, gpt_model_kwargs={})

Post-process transcription with OpenAI API.

Parameters:

Name Type Description Default
source str

Unaltered Transcription

required
gpt_model_kwargs dict

Additional keyword arguments to pass to GptModel.

{}

Returns:

Name Type Description
str str

The formatted response from the OpenAI model.

transcribe(audio_path, language='ja', generator_only=False, add_kwargs={})

Transcribe audio with FasterWhisper.

Parameters:

Name Type Description Default
audio_path str

Path to the file to be transcribed.

required
language str

Which language to use.

'ja'
generator_only bool

If True, skip running the transcription and return the segment generator instead.

False
add_kwargs dict

Additional keyword arguments for the FasterWhisper model.

{}

Returns:

Name Type Description
dict Union[Dict, None]

The segment objects returned by FasterWhisper (each contains .start, .end, .text, and optionally .words).

transcribe_to_srt(audio_path, output_path, fix_with_chat_gpt=True, string_result=False, gpt_model_kwargs={}, transcribe_kwargs={})

Transcribe audio and save as an SRT file with sentence-level cues.

Parameters:

Name Type Description Default
audio_path str

Path to the file to be transcribed.

required
output_path str

Path to save the .srt file.

required
fix_with_chat_gpt bool

If True, post-process with OpenAI API.

True
string_result bool

If True, don't save any files and return an SRT string instead.

False
gpt_model_kwargs dict

Additional keyword arguments for GptModel

{}
transcribe_kwargs dict

Additional keyword arguments for the transcribe function.

{}

Returns:

Name Type Description
str Union[str, None]

Either the file output path or the SRT string if string_result is True
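Sentence-level cues imply standard SRT formatting. A sketch of the timestamp and cue layout, where segments are assumed to be (start, end, text) triples:

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as the SRT HH:MM:SS,mmm timestamp."""
    ms = int(round(seconds * 1000))
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def segments_to_srt(segments) -> str:
    """Join (start, end, text) tuples into an SRT document string."""
    cues = []
    for i, (start, end, text) in enumerate(segments, 1):
        cues.append(f"{i}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n")
    return "\n".join(cues)
```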

transcribe_to_str(audio_path, transcribe_kwargs={})

Transcribe to a single raw string by joining segments.

Parameters:

Name Type Description Default
audio_path str

Path to the audio file.

required
transcribe_kwargs dict

Additional keyword arguments for the transcribe function.

{}

Returns:

Name Type Description
str str

String of the joined segments.