Whisper
whisper
Defines stateless helpers for local Whisper transcription with faster-whisper
Usage
- The heavy `WhisperModel` is owned by the `Processor` and is loaded once during the first transcription request
- This module only exposes pure functions that operate on a model handle or its output
- Post-processing concerns (LLM SRT-fixing, file writing) live in the `Processor` and use other stateless helpers from the `processing` module
Local Imports
- `faster-whisper` is an optional dependency (`whisper-local` extra), so it is imported lazily inside `load_model`
- Deployments that offload to Modal don't need it installed
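The lazy-import pattern described above can be sketched roughly as follows. This is an illustration, not the module's actual code: `load_whisper_class` is a hypothetical helper name, the `WhisperUnavailableError` class here is a local stand-in, and the assumption that it is raised when `faster-whisper` is missing is inferred from the name rather than stated in the source.

```python
class WhisperUnavailableError(RuntimeError):
    """Local stand-in for the module's error type (assumed, not the real class)."""


def load_whisper_class():
    """Import faster_whisper only when a local model is actually requested."""
    try:
        # Deferred import: merely importing this module never needs faster-whisper.
        from faster_whisper import WhisperModel
    except ImportError as exc:
        # Turn the missing optional dependency into a domain-specific error.
        raise WhisperUnavailableError(
            "faster-whisper is not installed; install the 'whisper-local' extra"
        ) from exc
    return WhisperModel
```

With this shape, a deployment that only uses the Modal backend never triggers the import and so never needs the package.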
DEFAULT_MODEL_OPTS = {'model_size_or_path': 'large-v3', 'device': 'cuda', 'compute_type': 'float16'}
module-attribute
Default keyword arguments for `faster_whisper.WhisperModel`: loads the
`large-v3` model on `cuda` with `float16` as the compute type
DEFAULT_TRANSCRIBE_OPTS = {'language': 'ja', 'beam_size': 5, 'word_timestamps': False, 'vad_filter': False, 'no_speech_threshold': 0.3, 'log_prob_threshold': -1.0, 'condition_on_previous_text': False, 'compression_ratio_threshold': 2.0}
module-attribute
Default keyword arguments for `faster_whisper.WhisperModel.transcribe`, tuned
for long-form Japanese media
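How caller-supplied options are combined with these defaults is not spelled out here. One common approach, shown as a sketch only, is to layer the overrides on top of a copy of the defaults; `merge_opts` is a hypothetical helper, not part of the module.

```python
DEFAULT_TRANSCRIBE_OPTS = {
    "language": "ja",
    "beam_size": 5,
    "word_timestamps": False,
    "vad_filter": False,
    "no_speech_threshold": 0.3,
    "log_prob_threshold": -1.0,
    "condition_on_previous_text": False,
    "compression_ratio_threshold": 2.0,
}


def merge_opts(defaults, overrides=None):
    """Return a new dict in which caller overrides win over the defaults."""
    # Building a fresh dict keeps the module-level defaults unmodified.
    return {**defaults, **(overrides or {})}


# Example: switch the language and enable VAD, keep everything else.
opts = merge_opts(DEFAULT_TRANSCRIBE_OPTS, {"language": "en", "vad_filter": True})
```

Returning a new dict rather than updating in place matters for a stateless module, since the defaults are shared by every call.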
load_model(w_model_args=None)
Loads a faster_whisper.WhisperModel object
Model Download
- For the `local` transcription backend, the first load pulls the weights from the Hugging Face Hub (the Docker image doesn't have the model baked in, unlike the one `modal` runs when using the `modal` backend)
- `huggingface_hub` resumes a partial download, so transient network failures are retried with exponential backoff and each attempt continues rather than restarting
- Permanent failures are raised immediately
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `w_model_args` | `dict \| None` | Additional arguments for | `None` |

Returns:
| Type | Description |
|---|---|
| `WhisperModel` | A loaded |

Raises:
| Type | Description |
|---|---|
| `WhisperUnavailableError` | If |
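The retry behaviour described under Model Download can be sketched as below. This is a generic illustration under stated assumptions: `TransientError`, `PermanentError`, `retry_with_backoff`, and the attempt and delay values are all hypothetical, and the real module may classify failures and choose delays differently.

```python
import time


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (e.g. a dropped connection)."""


class PermanentError(Exception):
    """Hypothetical marker for failures that retrying cannot fix."""


def retry_with_backoff(fetch, *, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fetch() until it succeeds, doubling the wait after each transient failure."""
    for attempt in range(max_attempts):
        try:
            return fetch()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last transient failure
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
        # PermanentError is deliberately not caught, so it propagates immediately.
```

Because a resumed download continues from the bytes already on disk, each retry in a scheme like this only has to fetch the remainder.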
to_srt(segments)
to_string(segments)
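Neither helper is documented here, so the following is only a sketch of what a segments-to-SRT conversion typically looks like, not the module's implementation. It assumes segment objects expose `start`, `end` (seconds) and `text`, as faster-whisper segments do; `Seg`, `srt_timestamp`, and `segments_to_srt` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Seg:
    """Minimal stand-in for a transcription segment."""
    start: float
    end: float
    text: str


def srt_timestamp(seconds):
    """Format a time in seconds as the SRT timestamp HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments):
    """Render segments as numbered SRT cues separated by blank lines."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}\n"
            f"{seg.text.strip()}\n"
        )
    return "\n".join(blocks)
```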
transcribe(model, audio_path, *, w_transcribe_args=None)
Transcribe an audio file into a list of segments
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `model` | `WhisperModel` | A loaded | required |
| `audio_path` | `str \| PathLike[str]` | Path to the audio/video file | required |
| `w_transcribe_args` | `dict \| None` | Additional arguments for | `None` |
Returns:
| Type | Description |
|---|---|
| `tuple[list[Segment], TranscriptionInfo]` | Tuple containing the list of segment objects (each with |
Raises:
| Type | Description |
|---|---|
| `TranscriptionError` | If the file is missing or transcription fails |
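The contract above (missing file or failed transcription raises `TranscriptionError`; the result is a list of segments plus an info object) could be met by a wrapper along these lines. It is a sketch, not the module's code: `transcribe_sketch` is a hypothetical name and `TranscriptionError` is a local stand-in. It relies on the fact that faster-whisper's `WhisperModel.transcribe` returns a lazy generator of segments, so the work only happens when the generator is consumed.

```python
import os


class TranscriptionError(RuntimeError):
    """Local stand-in for the module's error type (assumed, not the real class)."""


def transcribe_sketch(model, audio_path, *, w_transcribe_args=None):
    """Run model.transcribe and materialise its lazy segment generator."""
    if not os.path.isfile(audio_path):
        raise TranscriptionError(f"audio file not found: {audio_path}")
    try:
        segments, info = model.transcribe(
            os.fspath(audio_path), **(w_transcribe_args or {})
        )
        # Consuming the generator inside the try block means decoding errors
        # are caught here rather than surfacing later in the caller.
        return list(segments), info
    except Exception as exc:
        raise TranscriptionError(f"transcription failed: {exc}") from exc
```

Materialising the list inside the function keeps it a pure "file in, segments out" helper, at the cost of holding every segment in memory at once.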