# Processing

## audio_processing

This module defines the AudioTools class for performing media operations.

Attributes:

Name | Type | Description |
---|---|---|
LOGGER | Logger | Logger object of the module. |
### AudioTools

Perform operations on media using the system-installed FFmpeg.

Attributes:

Name | Type | Description |
---|---|---|
log_dir | Path | The application's log directory. |
ffmpeg | str | System FFmpeg path. |
ffprobe | str | System FFprobe path. |
#### extract_audio(input_path, output_path)

Extract a WAV file from a video container. If the input file is already an audio file, return the unchanged input file path.

Parameters:

Name | Type | Description | Default |
---|---|---|---|
input_path | str | Path to the input file. | required |
output_path | str | Output path for the WAV file. | required |

Returns:

Name | Type | Description |
---|---|---|
Path | str | The output path of the converted file. |
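The early-return behaviour can be sketched as follows. The extension set and the exact FFmpeg flags are illustrative assumptions, not the library's actual implementation:

```python
from pathlib import Path

# Assumed set of extensions treated as "already audio"; the real check inside
# extract_audio may differ (e.g. an ffprobe stream inspection).
AUDIO_EXTS = {".wav", ".mp3", ".flac", ".m4a", ".ogg"}

def build_extract_cmd(input_path: str, output_path: str):
    """Return the FFmpeg argv for extracting a WAV track, or None when the
    input is already audio (mirroring extract_audio's early return)."""
    if Path(input_path).suffix.lower() in AUDIO_EXTS:
        return None  # caller keeps the original path
    return [
        "ffmpeg", "-y",          # overwrite output without prompting
        "-i", input_path,
        "-vn",                   # drop the video stream
        "-acodec", "pcm_s16le",  # 16-bit PCM, the standard WAV codec
        output_path,
    ]
```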
#### filter_audio(input_path, output_wav, highpass=300, lowpass=3400)

Extracts audio from video or uses an existing audio file, applies a band-pass filter (highpass→lowpass) and loudness normalization, then writes out a 16 kHz mono WAV ready for Whisper.

Parameters:

Name | Type | Description | Default |
---|---|---|---|
input_path | str | Path to a video (any container) or audio file. | required |
output_wav | str | Path where the cleaned WAV will be saved. | required |
highpass | int | Cut everything below this frequency (Hz). | 300 |
lowpass | int | Cut everything above this frequency (Hz). | 3400 |

Returns:

Name | Type | Description |
---|---|---|
str | str | The output_wav path, for chaining into Whisper. |
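A minimal sketch of the FFmpeg invocation this implies, using the real `highpass`, `lowpass`, and `loudnorm` filters; whether filter_audio builds exactly this command line is an assumption:

```python
def build_filter_cmd(input_path: str, output_wav: str,
                     highpass: int = 300, lowpass: int = 3400) -> list:
    """Compose the FFmpeg invocation: band-pass, loudness normalization,
    then resample to 16 kHz mono for Whisper."""
    audio_filter = f"highpass=f={highpass},lowpass=f={lowpass},loudnorm"
    return [
        "ffmpeg", "-y", "-i", input_path,
        "-af", audio_filter,   # filter graph applied to the audio stream
        "-ar", "16000",        # 16 kHz sample rate
        "-ac", "1",            # mono
        output_wav,
    ]
```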
#### run_command(command, capture_output=True, check=False, cwd=None, hide_and_log=False)

Wrapper for subprocess.run to handle errors and results.

Parameters:

Name | Type | Description | Default |
---|---|---|---|
command | list | The command to execute, as a list of arguments. | required |
capture_output | bool | Whether to capture stdout and stderr. | True |
check | bool | Whether to raise an exception on subprocess error. | False |
cwd | str | The directory in which the command is run. | None |
hide_and_log | bool | If True, redirect stdout and stderr to subprocess.DEVNULL and subprocess.PIPE respectively. | False |

Returns:

Type | Description |
---|---|
Optional[subprocess.CompletedProcess] | The result of subprocess.run, or None. |
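A minimal sketch of such a wrapper, assuming the documented parameters; the library's error handling may differ in detail:

```python
import logging
import subprocess
from typing import Optional

LOGGER = logging.getLogger(__name__)

def run_command(command, capture_output=True, check=False, cwd=None,
                hide_and_log=False) -> Optional[subprocess.CompletedProcess]:
    """Run the command, log failures, and return None instead of raising
    when check is False."""
    stdout = stderr = None
    if hide_and_log:
        # per the docs: stdout is discarded, stderr is kept for logging
        stdout, stderr = subprocess.DEVNULL, subprocess.PIPE
        capture_output = False  # conflicts with explicit stdout/stderr
    try:
        result = subprocess.run(
            command, capture_output=capture_output, check=check,
            cwd=cwd, stdout=stdout, stderr=stderr, text=True,
        )
    except (subprocess.CalledProcessError, FileNotFoundError) as exc:
        LOGGER.error("Command failed: %s", exc)
        if check:
            raise
        return None
    return result
```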
#### to_mp4(input_path, output_path=None, resolution='1280x720', target_bitrate='2500k', use_nvenc=False)

Convert any video to MP4 (H.264 + AAC).

Parameters:

Name | Type | Description | Default |
---|---|---|---|
input_path | str | Source file (any container/codec FFmpeg supports). | required |
output_path | str | Destination .mp4 (defaults to the same stem). | None |
resolution | str | Target canvas WxH. Aspect ratio is preserved. | '1280x720' |
target_bitrate | str | Video bitrate (e.g. '2500k'). | '2500k' |
use_nvenc | bool | True → try NVIDIA NVENC; False → libx264 on CPU. | False |

Returns:

Name | Type | Description |
---|---|---|
Path | Union[Path, None] | Path of the MP4, or None on failure. |
#### to_wav(input_path, output_path)

#### to_webm(input_path, output_path=None, resolution='1280x720', target_bitrate='2500k', use_nvenc=False)

Convert any video to WebM (VP9 + Opus).

Parameters:

Name | Type | Description | Default |
---|---|---|---|
input_path | str | Source file (any container/codec FFmpeg supports). | required |
output_path | str | Destination .webm (defaults to the same stem). | None |
resolution | str | Target canvas WxH. Aspect ratio is preserved. | '1280x720' |
target_bitrate | str | Video bitrate (e.g. '2500k'). | '2500k' |
use_nvenc | bool | True → try NVIDIA NVENC; False → libvpx-vp9 on CPU. | False |

Returns:

Name | Type | Description |
---|---|---|
Path | Union[Path, None] | Path of the WebM, or None on failure. |
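The two converters differ mainly in codec choice. This sketch builds the kind of argv they imply; the encoder names are real FFmpeg encoders, but the exact flags AudioTools passes are assumptions:

```python
def build_convert_cmd(input_path, output_path, fmt="mp4",
                      resolution="1280x720", target_bitrate="2500k",
                      use_nvenc=False):
    """Assemble an FFmpeg argv for MP4 (H.264 + AAC) or WebM (VP9 + Opus)."""
    w, h = resolution.split("x")
    if fmt == "mp4":
        vcodec = "h264_nvenc" if use_nvenc else "libx264"
        acodec = "aac"
    else:  # webm
        vcodec = "libvpx-vp9"
        acodec = "libopus"
    # decrease-only scaling fits the video inside the target canvas
    # without distorting the aspect ratio
    scale = f"scale={w}:{h}:force_original_aspect_ratio=decrease"
    return [
        "ffmpeg", "-y", "-i", input_path,
        "-vf", scale,
        "-c:v", vcodec, "-b:v", target_bitrate,
        "-c:a", acodec,
        output_path,
    ]
```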
## gpt_wrapper

This module defines the GptModel class for sending requests to OpenAI's API.

Attributes:

Name | Type | Description |
---|---|---|
LOGGER | Logger | Module's logger object. |
### GptModel

Send requests to OpenAI's API and manage sessions.

Parameters:

Name | Type | Description | Default |
---|---|---|---|
version | str | Which GPT model version to use. | required |
system_msg | str | The system message to use for the model. | 'default' |
from_dotenv | bool | Whether to get the OpenAI API key from a .env file. | True |
ApiKey | str | Option to pass the OpenAI API key directly. | None |
max_context | int | The session's context limit in tokens. | 100000 |

Attributes:

Name | Type | Description |
---|---|---|
info | dict | Dictionary containing all object information. |
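Enforcing a token-based context limit like max_context typically means dropping the oldest turns. This is a sketch of that idea only; the token counting heuristic and trimming strategy are assumptions, not GptModel's actual logic:

```python
def trim_history(messages, max_context,
                 count_tokens=lambda m: len(m["content"]) // 4):
    """Drop the oldest non-system messages until the running total fits
    max_context. The 4-characters-per-token estimate is a rough stand-in
    for a real tokenizer."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    while rest and sum(count_tokens(m) for m in system + rest) > max_context:
        rest.pop(0)  # oldest turn goes first
    return system + rest
```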
## Processor

This module defines the Processor class for managing Modal interaction and conditional imports for the CPU version.

Attributes:

Name | Type | Description |
---|---|---|
LOGGER | Logger | Module's Logger object. |
### Processor

Manage conditional imports, expose pre-configured instances of various classes from the processing module, and manage Modal interaction.

Parameters:

Name | Type | Description | Default |
---|---|---|---|
gpt_version | str | Which GPT version to use for OpenAI integration. | 'gpt-4.1-mini' |
dotenv_path | Union[str, Path, None] | Optional path to look for the .env file. | None |
whisper_kwargs | dict | Optional additional keyword arguments for FWhisperWrapper. | {} |
OPENAI_API_KEY | str | Optionally pass the OpenAI API key directly. | None |
MODAL_TOKEN_ID | str | Optionally pass the Modal token id directly. | None |
MODAL_TOKEN_SECRET | str | Optionally pass the Modal token secret directly. | None |

Attributes:

Name | Type | Description |
---|---|---|
sentence_breakdown_service | SentenceBreakdownService | Instance of SentenceBreakdownService. |
fwhisper | FWhisperWrapper | Instance of FWhisperWrapper. |
audio_tools | AudioTools | Instance of AudioTools. |
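The conditional-import pattern for a CPU fallback usually looks like the sketch below. Whether Processor keys its imports on torch specifically is an assumption:

```python
import importlib.util

def pick_device() -> str:
    """Use the CUDA code path only when torch is installed and reports a
    GPU; otherwise fall back to the CPU code path."""
    if importlib.util.find_spec("torch") is not None:
        import torch  # imported only when actually present
        if torch.cuda.is_available():
            return "cuda"
    return "cpu"
```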
#### modal_convert_to_mp4(video_fp, outpath, to_mp4_kwargs={}) (async)

Call a Modal function to convert a video to MP4 format.

Parameters:

Name | Type | Description | Default |
---|---|---|---|
video_fp | Union[str, Path] | Path to the video to convert. | required |
outpath | Union[str, Path] | Where to save the received video stream. | required |
to_mp4_kwargs | dict | Additional arguments for to_mp4. | {} |

Returns:

Name | Type | Description |
---|---|---|
Path | Path | The path to the converted video. |
#### modal_transcribe_to_srt(media_fp, transcribe_kwargs={}, fix_with_chat_gpt=True) (async)

Call a Modal function to transcribe a video to SRT.

Parameters:

Name | Type | Description | Default |
---|---|---|---|
media_fp | Union[str, Path] | Path to the video to transcribe. | required |
transcribe_kwargs | dict | Additional arguments for transcribe. | {} |
fix_with_chat_gpt | bool | If True, post-process with the OpenAI API. | True |

Returns:

Name | Type | Description |
---|---|---|
str | Union[str, None] | Formatted SRT transcription string. |
#### modal_transcribe_to_str(audio_fp, transcribe_kwargs={}) (async)

Call a Modal function to transcribe audio to a string.

Parameters:

Name | Type | Description | Default |
---|---|---|---|
audio_fp | Union[str, Path] | Path to the audio to transcribe. | required |
transcribe_kwargs | dict | Additional arguments for transcribe. | {} |

Returns:

Name | Type | Description |
---|---|---|
str | Union[str, None] | String transcription. |
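Because these three methods are coroutines, synchronous code has to drive them with an event loop. A minimal sketch, using a hypothetical stand-in coroutine in place of the real Modal-backed method:

```python
import asyncio

# Hypothetical stand-in for one of the async wrappers above; the real
# method dispatches work to Modal's cloud runtime.
async def modal_transcribe_to_str(audio_fp, transcribe_kwargs={}):
    await asyncio.sleep(0)  # placeholder for the awaited remote call
    return f"transcript of {audio_fp}"

def transcribe_sync(audio_fp):
    # asyncio.run creates an event loop, drives the coroutine to
    # completion, and tears the loop down again
    return asyncio.run(modal_transcribe_to_str(audio_fp))
```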
## text_processing

This module defines the SentenceBreakdownService class for analysing Japanese sentences using fugashi and kotobase.

Attributes:

Name | Type | Description |
---|---|---|
LOGGER | Logger | Module's Logger object. |
### GptExplainService

Wrapper for the GptModel class with a default system message and utility functions.

Parameters:

Name | Type | Description | Default |
---|---|---|---|
gpt_model_kwargs | dict | Additional keyword arguments for GptModel. | {} |
version | str | OpenAI GPT version to use. | 'gpt-4.1-mini' |

Attributes:

Name | Type | Description |
---|---|---|
SYSTEM_MSG | str | Default system message used. |
#### explain(sentence, focus)

Request an explanation from GPT using the default system message and prompt.

Parameters:

Name | Type | Description | Default |
---|---|---|---|
sentence | str | The full Japanese sentence. | required |
focus | str | The target word to explain in context. | required |

Returns:

Name | Type | Description |
---|---|---|
str | str | GPT-generated explanation with structure, particles, and nuance. |
#### explain_custom(sentence, focus, sysMsg, prompt, version)

Request an explanation from GPT using a custom system message and prompt.

Parameters:

Name | Type | Description | Default |
---|---|---|---|
sentence | str | The full Japanese sentence. | required |
focus | str | The target word to explain in context. | required |
sysMsg | str | GPT's system message. | required |
prompt | str | GPT's string prompt containing formatters. | required |
version | str | GPT model version to use. | required |

Returns:

Name | Type | Description |
---|---|---|
str | Optional[str] | GPT-generated response. |
#### explain_sentence(sentence)

#### explain_sentence_custom(sentence, sysMsg, prompt, version)

Request an explanation from GPT using a custom system message and prompt, without any focus words.

Parameters:

Name | Type | Description | Default |
---|---|---|---|
sentence | str | The full Japanese sentence. | required |
sysMsg | str | ChatGPT's system message. | required |
prompt | str | GPT's string prompt containing formatters. | required |
version | str | GPT model version to use. | required |

Returns:

Name | Type | Description |
---|---|---|
str | Optional[str] | GPT-generated response. |
### SentenceBreakdownService

Provides utilities for analyzing Japanese sentences.

Parameters:

Name | Type | Description | Default |
---|---|---|---|
gpt_version | str | OpenAI GPT version to use. | 'gpt-4.1-mini' |
gpt_kwargs | dict | Additional keyword arguments for GptModel. | {} |
#### explain(sentence, focus=None)

Perform a complete Japanese sentence breakdown.

Parameters:

Name | Type | Description | Default |
---|---|---|---|
sentence | str | The full Japanese sentence to analyze. | required |
focus | str | The key word to generate a deeper explanation for. | None |

Returns:

Name | Type | Description |
---|---|---|
BreakdownResponse | BreakdownResponse | Includes tokens, word info, and GPT breakdown. |
#### explain_custom(sentence, sysMsg, prompt, version, focus=None)

Perform a complete sentence breakdown using a custom system message and prompt.

Parameters:

Name | Type | Description | Default |
---|---|---|---|
sentence | str | The full Japanese sentence to analyze. | required |
sysMsg | str | ChatGPT's system message. | required |
prompt | str | GPT's string prompt containing formatters. | required |
version | str | GPT model version to use. | required |
focus | str | The key word to generate a deeper explanation for. | None |

Returns:

Name | Type | Description |
---|---|---|
BreakdownResponse | BreakdownResponse | Includes tokens, word info, and GPT breakdown. |
#### single_word_lookup(word)

Perform a kotobase search for a specific word and return the results formatted into a pydantic model.

Parameters:

Name | Type | Description | Default |
---|---|---|---|
word | str | The word to look up. | required |

Returns:

Name | Type | Description |
---|---|---|
DictLookup | DictLookup | Pydantic model containing kotobase info. |
#### tokenize(sentence)

#### wildcard_lookup(pattern)

Perform a kotobase search for a wildcard pattern and return the results formatted into a pydantic model.

Parameters:

Name | Type | Description | Default |
---|---|---|---|
pattern | str | The wildcard pattern. | required |

Returns:

Name | Type | Description |
---|---|---|
DictWildcardLookup | DictWildcardLookup | Pydantic model containing kotobase info. |
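For intuition, wildcard matching over a word list can be sketched with glob-style patterns. Whether kotobase uses exactly this syntax is an assumption for illustration only:

```python
import fnmatch

def wildcard_match(pattern, words):
    """Filter a word list with glob-style wildcards:
    '*' matches any run of characters, '?' exactly one character."""
    return [w for w in words if fnmatch.fnmatchcase(w, pattern)]
```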
#### word_lookup(sentence)
## whisper_wrapper

This module defines the FWhisperWrapper class for running the FasterWhisper model.

Attributes:

Name | Type | Description |
---|---|---|
LOGGER | Logger | Module's Logging object. |
DEFAULT_SYS_MSG | str | Default system message to use for OpenAI post-processing. |
### FWhisperWrapper

Wrapper for FasterWhisper's WhisperModel, including post-processing logic.

Parameters:

Name | Type | Description | Default |
---|---|---|---|
model_name | str | Whisper model name. | 'large-v3' |
lang | str | Whisper model language. | 'ja' |
compute_type | str | Which compute type to use. | 'float16' |
device | str | Which device to use. | 'cuda' |
gpt_sys_msg | str | Custom GPT system message when using OpenAI post-processing. | None |
gpt_version | str | Which GPT version to use for OpenAI post-processing. | 'gpt-4.1' |
#### gpt_fix_srt(source, gpt_model_kwargs={})

#### transcribe(audio_path, language='ja', generator_only=False, add_kargs={})

Transcribe audio with FasterWhisper.

Parameters:

Name | Type | Description | Default |
---|---|---|---|
audio_path | str | Path to the file to be transcribed. | required |
language | str | Which language to use. | 'ja' |
generator_only | bool | If True, don't run transcription and return the generator instead. | False |
add_kargs | dict | Additional keyword arguments for the FasterWhisper model. | {} |

Returns:

Name | Type | Description |
---|---|---|
dict | Union[Dict, None] | The segment objects returned by FasterWhisper (contain .start, .end, .text and optionally .words). |
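FasterWhisper yields segments lazily, which is why generator_only=True is useful: consuming the generator is what actually performs the transcription. A sketch with a stand-in segment type (the real objects come from faster-whisper, not this class):

```python
from typing import NamedTuple

class Segment(NamedTuple):
    """Minimal stand-in for FasterWhisper's segment objects
    (.start, .end, .text)."""
    start: float
    end: float
    text: str

def collect_text(segments) -> str:
    # iterating the (possibly lazy) segment stream drives the decoding
    return "".join(seg.text for seg in segments)
```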
#### transcribe_to_srt(audio_path, output_path, fix_with_chat_gpt=True, string_result=False, gpt_model_kwargs={}, transcribe_kwargs={})

Transcribe audio and save it as an SRT file with sentence-level cues.

Parameters:

Name | Type | Description | Default |
---|---|---|---|
audio_path | str | Path to the file to be transcribed. | required |
output_path | str | Path to save the .srt file. | required |
fix_with_chat_gpt | bool | If True, post-process with the OpenAI API. | True |
string_result | bool | If True, don't save any files and return an SRT string instead. | False |
gpt_model_kwargs | dict | Additional keyword arguments for GptModel. | {} |
transcribe_kwargs | dict | Additional keyword arguments for the transcribe function. | {} |

Returns:

Name | Type | Description |
---|---|---|
str | Union[str, None] | Either the file output path, or the SRT string if string_result is True. |
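The SRT output format itself can be sketched independently of Whisper. This shows the standard cue layout; the sentence-level regrouping transcribe_to_srt performs is not reproduced here:

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as the SRT timestamp HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02}:{m:02}:{s:02},{ms:03}"

def segments_to_srt(segments) -> str:
    """Render (start, end, text) triples as a numbered SRT cue list."""
    cues = []
    for i, (start, end, text) in enumerate(segments, 1):
        cues.append(
            f"{i}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n"
            f"{text.strip()}\n"
        )
    return "\n".join(cues)
```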