Transcription

`transcription`

Defines Modal GPU jobs for Whisper transcription

Transcription-Only

Jobs return raw transcription
LLM post-processing (SRT-Fixing) is applied by the Processor afterwards through the provider-agnostic LLM layer, so the same path works for both local and Modal transcription

Transcribe media on a Modal GPU and return the raw transcription, as SRT content by default

output_format

When `output_format="srt"`, sentence-level SRT content is composed from the transcription segments and returned as a string ready to be saved as a `.srt` file
When `output_format="joined"`, the transcription segment texts are joined with the Japanese full stop (`。`) into a single string without any timing information
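
The two output formats can be illustrated with a minimal sketch. It assumes a hypothetical `Segment` type with `start`, `end` and `text` fields and a hypothetical `compose` helper, neither of which is taken from the module. It also treats every segment as one SRT cue, so any sentence-level regrouping the real job performs is not shown.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical stand-in for one transcription segment."""
    start: float  # seconds from the start of the media
    end: float    # seconds from the start of the media
    text: str


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def compose(segments: list[Segment], output_format: str = "srt") -> str:
    """Build either SRT content or a single joined string (illustrative only)."""
    if output_format == "joined":
        # Segment texts joined with the Japanese full stop, timing dropped
        return "。".join(s.text.strip() for s in segments)
    if output_format != "srt":
        raise ValueError(f"Unsupported output_format: {output_format!r}")
    blocks = []
    for index, s in enumerate(segments, start=1):
        # One cue: index line, time range line, text line
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(s.start)} --> {srt_timestamp(s.end)}\n"
            f"{s.text.strip()}\n"
        )
    # Cues are separated by a blank line
    return "\n".join(blocks)


segments = [Segment(0.0, 1.5, "こんにちは"), Segment(1.5, 3.25, "元気ですか")]
srt_text = compose(segments, "srt")
joined_text = compose(segments, "joined")
```

Here `srt_text` can be written to a `.srt` file as is, while `joined_text` carries no timing and is only suitable as plain text.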

File Transfer

The input media is read out of the per-job ephemeral volume into a container-local temp directory before being handed to Whisper
That directory is removed once the job finishes
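
The copy-then-clean-up pattern described above might look roughly like the following sketch. How the bytes are read from the volume is left out on purpose: `chunks` is assumed to be any iterable of bytes, and `with_local_copy` and `job` are hypothetical names used only for illustration.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable, Iterable


def with_local_copy(
    chunks: Iterable[bytes],
    filename: str,
    job: Callable[[Path], str],
) -> str:
    """Write streamed bytes to a temp directory, run `job` on the local file,
    and remove the directory afterwards, whether or not `job` succeeds."""
    tmp_dir = Path(tempfile.mkdtemp(prefix="transcribe-"))
    try:
        # Keep only the file name so the copy always lands inside tmp_dir
        local_fp = tmp_dir / Path(filename).name
        with local_fp.open("wb") as fh:
            for chunk in chunks:
                fh.write(chunk)
        return job(local_fp)
    finally:
        # Clean up the container-local copy once the job has finished
        shutil.rmtree(tmp_dir, ignore_errors=True)
```

Putting the cleanup in a `finally` block means the temp directory is removed even when the job raises.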

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `vol_fp` | `str` | Path of the input media inside the per-job ephemeral volume | required |
| `vol_id` | `str` | ID of the per-job ephemeral volume | required |
| `output_format` | `Literal['srt', 'joined']` | `srt` for sentence-level SRT content, `joined` for a single joined string. Defaults to `srt` | `'srt'` |
| `w_model_args` | `dict \| None` | Additional arguments for `WhisperModel`. Overrides the ones set in `mirumoji.server.processing.whisper.DEFAULT_MODEL_OPTS` | `None` |
| `w_transcribe_args` | `dict \| None` | Additional arguments for `WhisperModel.transcribe`. Overrides the ones set in `mirumoji.server.processing.whisper.DEFAULT_TRANSCRIBE_OPTS` | `None` |
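
One plausible reading of "overrides" for `w_model_args` and `w_transcribe_args` is a shallow dictionary merge in which caller-supplied keys win. The sketch below assumes that behaviour. The default values shown are placeholders rather than the real contents of `DEFAULT_MODEL_OPTS` or `DEFAULT_TRANSCRIBE_OPTS`, and `merge_opts` is a hypothetical helper.

```python
from typing import Any, Optional

# Placeholder defaults for illustration only, not the module's real values
DEFAULT_MODEL_OPTS: dict[str, Any] = {"device": "cuda", "compute_type": "float16"}
DEFAULT_TRANSCRIBE_OPTS: dict[str, Any] = {"language": "ja", "beam_size": 5}


def merge_opts(
    defaults: dict[str, Any],
    overrides: Optional[dict[str, Any]],
) -> dict[str, Any]:
    """Return a new dict where keys in `overrides` replace those in `defaults`.

    `None` means "no overrides", matching the parameters' default value.
    """
    return {**defaults, **(overrides or {})}


model_opts = merge_opts(DEFAULT_MODEL_OPTS, None)
transcribe_opts = merge_opts(DEFAULT_TRANSCRIBE_OPTS, {"beam_size": 1})
```

Because the merge builds a new dictionary, the module-level defaults stay untouched between jobs.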

Returns:

| Type | Description |
| --- | --- |
| `str` | The raw transcription: SRT content when `output_format` is `srt`, a single joined string when it is `joined` |

Raises:

| Type | Description |
| --- | --- |
| `ModalVolumeError` | If the input can't be read from the volume |
| `WhisperUnavailableError` | If the model can't be loaded |
| `TranscriptionError` | If transcription fails |