Whisper

whisper

Defines stateless helpers for local Whisper transcription with faster-whisper

Usage
  • The heavy WhisperModel is owned by the Processor and is loaded once during the first transcription request

  • This module only exposes pure functions that operate on a model handle or its output

  • Post-processing concerns (LLM SRT-fixing, file writing) live in the Processor and use other stateless helpers from the processing module

Local Imports
  • faster-whisper is an optional dependency (whisper-local extra), so it's imported lazily inside load_model

  • Deployments that offload to Modal don't need it installed
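The lazy-import pattern described above can be sketched as follows. This is a minimal illustration, not the module's actual code: `lazy_import_attr` is a hypothetical helper, and `WhisperUnavailableError` is redefined here as a stand-in because its real definition is not shown on this page.

```python
import importlib


class WhisperUnavailableError(RuntimeError):
    """Stand-in for the module's error type (real definition not shown)."""


def lazy_import_attr(module_name, attr_name):
    """Import a module only at call time and return one attribute from it.

    Hypothetical helper: a missing optional dependency surfaces as a
    domain-specific error instead of an ImportError at module import time.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError as exc:
        raise WhisperUnavailableError(
            f"{module_name} is not installed; install the whisper-local extra"
        ) from exc
    return getattr(module, attr_name)
```

Under this sketch, a local deployment would call something like `lazy_import_attr("faster_whisper", "WhisperModel")` inside load_model, while a deployment that offloads to Modal never triggers the import.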

DEFAULT_MODEL_OPTS = {'model_size_or_path': 'large-v3', 'device': 'cuda', 'compute_type': 'float16'} module-attribute

Default keyword arguments for faster_whisper.WhisperModel: loads the large-v3 model on CUDA with float16 as the compute type

DEFAULT_TRANSCRIBE_OPTS = {'language': 'ja', 'beam_size': 5, 'word_timestamps': False, 'vad_filter': False, 'no_speech_threshold': 0.3, 'log_prob_threshold': -1.0, 'condition_on_previous_text': False, 'compression_ratio_threshold': 2.0} module-attribute

Default keyword arguments for faster_whisper.WhisperModel.transcribe, tuned for long-form Japanese media
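Both default dictionaries are documented as being overridden by caller-supplied arguments. One plausible way to apply such overrides is sketched below; `merge_opts` is a hypothetical name, and the module may merge its options differently.

```python
# Copied from the documented module attribute.
DEFAULT_MODEL_OPTS = {
    "model_size_or_path": "large-v3",
    "device": "cuda",
    "compute_type": "float16",
}


def merge_opts(defaults, overrides=None):
    """Return a new dict in which caller overrides win over the defaults.

    Hypothetical helper: building a fresh dict leaves the module-level
    defaults untouched between calls.
    """
    return {**defaults, **(overrides or {})}


# Example: run on CPU with int8 while keeping the default model size.
cpu_opts = merge_opts(DEFAULT_MODEL_OPTS, {"device": "cpu", "compute_type": "int8"})
```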

load_model(w_model_args=None)

Loads a faster_whisper.WhisperModel object

Model Download
  • For the local transcription backend, the first load pulls the weights from the Hugging Face Hub (unlike the image Modal runs for the modal backend, the Docker image doesn't have the model baked in)

  • Transient network failures are retried with exponential backoff; because huggingface_hub resumes partial downloads, each attempt continues where the previous one stopped rather than restarting

  • Permanent failures are raised immediately
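The retry behaviour described above can be sketched as a small wrapper. This is an illustration under stated assumptions: `retry_with_backoff`, its parameters, and the choice of which exception types count as transient are all hypothetical, since the actual retry code and its classification of failures are not shown here.

```python
import time


def retry_with_backoff(action, *, attempts=4, base_delay=1.0,
                       transient=(ConnectionError, TimeoutError),
                       sleep=time.sleep):
    """Call action(), retrying transient failures with exponential backoff.

    Assumption: only the exception types in `transient` are retried; any
    other exception propagates on the first attempt (a permanent failure).
    """
    for attempt in range(attempts):
        try:
            return action()
        except transient:
            if attempt == attempts - 1:
                raise  # retries exhausted, surface the last error
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** attempt)
```

In this sketch `action` would be the call that triggers the weight download. Each retry simply invokes it again, relying on the hub client to continue the partial download.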

Parameters:

Name Type Description Default
w_model_args dict | None

Additional arguments for WhisperModel. Overrides the ones set in DEFAULT_MODEL_OPTS

None

Returns:

Type Description
WhisperModel

A loaded WhisperModel object

Raises:

Type Description
WhisperUnavailableError

If faster-whisper is not installed or the model fails to load

to_srt(segments)

Composes sentence-level SRT content from transcription segments

Parameters:

Name Type Description Default
segments list

Segment objects from transcribe

required

Returns:

Type Description
str

The composed .srt content as a string
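As a rough illustration of what composing SRT content from segments involves, the sketch below numbers each segment and formats its start and end times as `HH:MM:SS,mmm`. The names `Seg`, `srt_timestamp`, and `to_srt_sketch` are hypothetical, and the real to_srt may group or clean segments differently; only the .start, .end, and .text attributes are taken from this page.

```python
from dataclasses import dataclass


@dataclass
class Seg:
    """Minimal stand-in for a segment object (.start, .end, .text)."""
    start: float
    end: float
    text: str


def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt_sketch(segments):
    """Build SRT text: index line, time range line, text line, blank line."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        time_range = f"{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}"
        blocks.append(f"{index}\n{time_range}\n{seg.text.strip()}\n")
    return "\n".join(blocks)


demo_srt = to_srt_sketch([Seg(0.0, 1.5, " こんにちは"), Seg(1.5, 3.0, "世界")])
```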

to_string(segments)

Joins transcription segments into a single string

Parameters:

Name Type Description Default
segments list

Segment objects from transcribe

required

Returns:

Type Description
str

The segment texts joined with the Japanese full stop (。)
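A minimal sketch of the joining step, assuming each segment exposes a .text attribute. `to_string_sketch` is a hypothetical name, and stripping surrounding whitespace from each text is an assumption rather than documented behaviour.

```python
from types import SimpleNamespace


def to_string_sketch(segments):
    """Join segment texts with the Japanese full stop as the separator."""
    return "。".join(seg.text.strip() for seg in segments)


demo_text = to_string_sketch([
    SimpleNamespace(text=" こんにちは"),
    SimpleNamespace(text="世界"),
])
```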

transcribe(model, audio_path, *, w_transcribe_args=None)

Transcribes an audio file into a list of segments

Parameters:

Name Type Description Default
model WhisperModel

A loaded WhisperModel object

required
audio_path str | PathLike[str]

Path to the audio/video file

required
w_transcribe_args dict | None

Additional arguments for WhisperModel.transcribe. Overrides the ones set in DEFAULT_TRANSCRIBE_OPTS

None

Returns:

Type Description
tuple[list[Segment], TranscriptionInfo]

Tuple containing the list of segment objects (each with .start, .end, .text) and the transcription info as returned by model.transcribe

Raises:

Type Description
TranscriptionError

If the file is missing or transcription fails
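The documented contract (merge options, check the file, return a list plus the info object, wrap failures) can be sketched as below. faster-whisper's `WhisperModel.transcribe` returns a lazy iterator of segments together with an info object, so the iterator has to be consumed to obtain a list. Everything else here is an assumption for illustration: `transcribe_sketch`, `SKETCH_DEFAULTS`, and `StubModel` are hypothetical, and `TranscriptionError` is a stand-in for the module's own error type.

```python
import os
import tempfile
from types import SimpleNamespace


class TranscriptionError(RuntimeError):
    """Stand-in for the module's error type (real definition not shown)."""


# Shortened stand-in for DEFAULT_TRANSCRIBE_OPTS.
SKETCH_DEFAULTS = {"language": "ja", "beam_size": 5}


def transcribe_sketch(model, audio_path, *, w_transcribe_args=None):
    """Run model.transcribe and return (list_of_segments, info)."""
    path = os.fspath(audio_path)
    if not os.path.isfile(path):
        raise TranscriptionError(f"audio file not found: {path}")
    opts = {**SKETCH_DEFAULTS, **(w_transcribe_args or {})}
    try:
        segments, info = model.transcribe(path, **opts)
        # Consuming the iterator is what actually runs the transcription.
        return list(segments), info
    except Exception as exc:
        raise TranscriptionError(f"transcription failed: {exc}") from exc


class StubModel:
    """Mimics only the (segment_iterator, info) return shape."""

    def transcribe(self, audio, **opts):
        self.seen_opts = opts
        texts = ["一", "二"]
        segs = (SimpleNamespace(start=float(i), end=float(i + 1), text=t)
                for i, t in enumerate(texts))
        return segs, SimpleNamespace(language=opts["language"])


# Usage with the stub: an empty temporary file stands in for real audio.
with tempfile.NamedTemporaryFile(suffix=".wav") as tmp:
    stub = StubModel()
    segments, info = transcribe_sketch(
        stub, tmp.name, w_transcribe_args={"beam_size": 1}
    )
```

With a real model, the first argument would be the handle returned by load_model, and the resulting segment list could be passed on to to_srt or to_string.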