Transcription

transcription

Defines Modal GPU jobs for Whisper transcription

Transcription-Only
  • Jobs return raw transcription

  • LLM post-processing (SRT fixing) is applied afterwards by the Processor through the provider-agnostic LLM layer, so the same path works for both local and Modal transcription

transcribe_job(vol_fp, vol_id, output_format='srt', *, w_model_args=None, w_transcribe_args=None)

Transcribe media on a Modal GPU and return the raw transcription, as SRT content by default

output_format
  • When output_format="srt", sentence-level SRT content is composed from transcription segments, returning a string ready to be saved as a .srt file

  • When output_format="joined", transcription segment texts are joined with the Japanese full stop (。) into a single string without any timing information
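The two output formats can be sketched as follows. This is a minimal illustration, not the module's actual implementation: the Segment class and the compose and format_timestamp helpers are hypothetical names, the grouping of segments into sentences is not shown, and whether the real joined output ends with a trailing full stop is not stated here.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical stand-in for one transcription segment."""
    start: float  # seconds
    end: float    # seconds
    text: str


def format_timestamp(seconds: float) -> str:
    """Render seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms_total = round(seconds * 1000)
    hours, rem = divmod(ms_total, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def compose(segments: list[Segment], output_format: str = "srt") -> str:
    """Build either SRT content or a single joined string from segments."""
    if output_format == "joined":
        # No timing information: only the texts, separated by the full stop.
        return "。".join(s.text.strip() for s in segments)
    if output_format == "srt":
        blocks = []
        for index, s in enumerate(segments, start=1):
            timing = f"{format_timestamp(s.start)} --> {format_timestamp(s.end)}"
            blocks.append(f"{index}\n{timing}\n{s.text.strip()}\n")
        # SRT cues are separated by a blank line.
        return "\n".join(blocks)
    raise ValueError(f"unsupported output_format: {output_format!r}")
```

With this sketch, the "srt" result can be written directly to a .srt file, while the "joined" result is plain text suited to callers that do not need timings.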

File Transfer
  • The input media is read out of the per-job ephemeral volume into a container-local temp directory before being handed to Whisper

  • That directory is removed once the job finishes
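The transfer pattern described above might look roughly like the sketch below. It assumes the volume contents can be read as an iterable of byte chunks; the read_chunks and work callables and the with_local_copy helper are hypothetical and stand in for the volume reader and the Whisper call, which are not shown in this reference.

```python
import tempfile
from pathlib import Path
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def with_local_copy(
    vol_fp: str,
    read_chunks: Callable[[str], Iterable[bytes]],  # hypothetical volume reader
    work: Callable[[Path], T],                      # e.g. the transcription step
) -> T:
    """Copy a volume file into a temp directory, run work on it, then clean up."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        local_path = Path(tmp_dir) / Path(vol_fp).name
        with open(local_path, "wb") as fh:
            for chunk in read_chunks(vol_fp):
                fh.write(chunk)
        # The directory and its contents are removed when the with block exits,
        # whether work returns normally or raises.
        return work(local_path)
```

Using a context manager keeps cleanup tied to the job's lifetime, so a failed transcription does not leave media behind in the container.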

Parameters:

  • vol_fp (str, required): Path of the input media inside the per-job ephemeral volume

  • vol_id (str, required): ID of the per-job ephemeral volume

  • output_format (Literal['srt', 'joined'], default 'srt'): 'srt' for sentence-level SRT content, 'joined' for a single joined string

  • w_model_args (dict | None, default None): Additional arguments for WhisperModel. Overrides the ones set in mirumoji.server.processing.whisper.DEFAULT_MODEL_OPTS

  • w_transcribe_args (dict | None, default None): Additional arguments for WhisperModel.transcribe. Overrides the ones set in mirumoji.server.processing.whisper.DEFAULT_TRANSCRIBE_OPTS
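One plausible reading of the override behaviour for w_model_args and w_transcribe_args is a shallow merge in which caller-supplied keys win over the defaults. The sketch below illustrates that reading only: the default values are placeholders rather than the real contents of DEFAULT_MODEL_OPTS and DEFAULT_TRANSCRIBE_OPTS, and merge_opts is a hypothetical helper.

```python
from typing import Any, Optional

# Placeholder defaults for illustration; the real values live in
# mirumoji.server.processing.whisper and are not reproduced here.
DEFAULT_MODEL_OPTS: dict[str, Any] = {"device": "cuda", "compute_type": "float16"}
DEFAULT_TRANSCRIBE_OPTS: dict[str, Any] = {"language": "ja", "beam_size": 5}


def merge_opts(defaults: dict[str, Any], overrides: Optional[dict[str, Any]]) -> dict[str, Any]:
    """Return a new dict where keys in overrides replace those in defaults."""
    # None means "no overrides"; the defaults themselves are never mutated.
    return {**defaults, **(overrides or {})}


model_opts = merge_opts(DEFAULT_MODEL_OPTS, {"compute_type": "int8"})
transcribe_opts = merge_opts(DEFAULT_TRANSCRIBE_OPTS, None)
```

Under this assumption, passing None leaves the defaults untouched, and passing a partial dict changes only the keys it names.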

Returns:

  • str: The raw transcription, as SRT content or as a single joined string depending on output_format

Raises:

  • ModalVolumeError: If the input can't be read from the volume

  • WhisperUnavailableError: If the model can't be loaded

  • TranscriptionError: If transcription fails
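A caller may want to tell the three failure modes apart, since each points at a different stage of the job. The sketch below is illustrative only: the exception classes are local stand-ins with the same names (their real import path is not given in this reference), and run_job is a hypothetical wrapper around any callable that behaves like transcribe_job.

```python
from typing import Any, Callable, Optional


# Local stand-ins so the sketch is self-contained; the real classes
# would be imported from the project instead.
class ModalVolumeError(Exception):
    pass


class WhisperUnavailableError(Exception):
    pass


class TranscriptionError(Exception):
    pass


def run_job(job: Callable[..., str], *args: Any, **kwargs: Any) -> tuple[Optional[str], Optional[str]]:
    """Run a transcription job and return (result, error_stage)."""
    try:
        return job(*args, **kwargs), None
    except ModalVolumeError:
        return None, "input"          # the media could not be read from the volume
    except WhisperUnavailableError:
        return None, "model"          # the model could not be loaded
    except TranscriptionError:
        return None, "transcription"  # the transcription itself failed
```

Returning the failing stage lets the caller decide, for example, whether re-uploading the input or retrying the job is the sensible next step.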