Whisper

`whisper` ¶

Defines stateless helpers for local Whisper transcription with faster-whisper

Usage

The heavy WhisperModel is owned by the Processor and is loaded once during the first transcription request
This module only exposes pure functions that operate on a model handle or its output
Post-processing concerns (LLM SRT-fixing, file writing) live in the Processor and use other stateless helper from the processing module

Local Imports

faster-whisper is an optional dependency (whisper-local extra), so it's imported lazily inside load_model
Deployments that offload to Modal don't need it installed

`DEFAULT_MODEL_OPTS = {'model_size_or_path': 'large-v3', 'device': 'cuda', 'compute_type': 'float16'}` `module-attribute` ¶

Default keyword-arguments for faster_whisper.WhisperModel, loads the large-v3 model on cuda, with float16 as the compute-type

`DEFAULT_TRANSCRIBE_OPTS = {'language': 'ja', 'beam_size': 5, 'word_timestamps': False, 'vad_filter': False, 'no_speech_threshold': 0.3, 'log_prob_threshold': -1.0, 'condition_on_previous_text': False, 'compression_ratio_threshold': 2.0}` `module-attribute` ¶

Default keyword-arguments for faster_whisper.WhisperModel.transcribe, tuned for long-form Japanese media

`load_model(w_model_args=None)` ¶

Loads a faster_whisper.WhisperModel object

Model Download

For the the local transcription backend, the first load pulls the weights from the Hugging Face Hub (Docker Image doesn't have the model baked in like the one modal runs when using the modal backend)
huggingface_hub resumes a partial download, so transient network failures are retried with exponential backoff and each attempt continues rather than restarting
Permanent failures are raised immediately

Parameters:

Name	Type	Description	Default
`w_model_args`	`dict \| None`	Additional arguments for `WhisperModel`. Overrides the ones set in `DEFAULT_MODEL_OPTS`	`None`

Returns:

Type	Description
`WhisperModel`	A loaded `WhisperModel` object

Raises:

Type	Description
`WhisperUnavailableError`	If `faster-whisper` is not installed or the model fails to load

`to_srt(segments)` ¶

Composes sentence-level SRT content from transcription segments

Parameters:

Name	Type	Description	Default
`segments`	`list`	Segment objects from `transcribe`	required

Returns:

Type	Description
`str`	The composed `.srt` content as a string

`to_string(segments)` ¶

Joins transcription segments into a single string

Parameters:

Name	Type	Description	Default
`segments`	`list`	Segment objects from `transcribe`	required

Returns:

Type	Description
`str`	The segment texts joined with the Japanese full stop

`transcribe(model, audio_path, *, w_transcribe_args=None)` ¶

Transcribe an audio file into a list of segments

Parameters:

Name	Type	Description	Default
`model`	`WhisperModel`	A loaded `WhisperModel` object	required
`audio_path`	`str \| PathLike[str]`	Path to the audio/video file	required
`w_transcribe_args`	`dict \| None`	Additional arguments for `WhisperModel.transcribe`. Overrides the ones set in `DEFAULT_TRANSCRIBE_OPTS`	`None`

Returns:

Type	Description
`tuple[list[Segment], TranscriptionInfo]`	Tuple containg the list of segment objects (each with `.start`, `.end`, `.text`) and the transcription info as returned by `model.transcribe`

Raises:

Type	Description
`TranscriptionError`	If the file is missing or transcription fails

Whisper

whisper ¶

DEFAULT_MODEL_OPTS = {'model_size_or_path': 'large-v3', 'device': 'cuda', 'compute_type': 'float16'} module-attribute ¶

DEFAULT_TRANSCRIBE_OPTS = {'language': 'ja', 'beam_size': 5, 'word_timestamps': False, 'vad_filter': False, 'no_speech_threshold': 0.3, 'log_prob_threshold': -1.0, 'condition_on_previous_text': False, 'compression_ratio_threshold': 2.0} module-attribute ¶

load_model(w_model_args=None) ¶

to_srt(segments) ¶

to_string(segments) ¶

transcribe(model, audio_path, *, w_transcribe_args=None) ¶

`whisper` ¶

`DEFAULT_MODEL_OPTS = {'model_size_or_path': 'large-v3', 'device': 'cuda', 'compute_type': 'float16'}` `module-attribute` ¶

`DEFAULT_TRANSCRIBE_OPTS = {'language': 'ja', 'beam_size': 5, 'word_timestamps': False, 'vad_filter': False, 'no_speech_threshold': 0.3, 'log_prob_threshold': -1.0, 'condition_on_previous_text': False, 'compression_ratio_threshold': 2.0}` `module-attribute` ¶

`load_model(w_model_args=None)` ¶

`to_srt(segments)` ¶

`to_string(segments)` ¶

`transcribe(model, audio_path, *, w_transcribe_args=None)` ¶