Transcription
transcription
Defines Modal GPU jobs for Whisper transcription
Transcription-Only
- Jobs return raw transcription
- LLM post-processing (SRT-Fixing) is applied by the `Processor` afterwards through the provider-agnostic LLM layer, so the same path works for both local and Modal transcription
transcribe_job(vol_fp, vol_id, output_format='srt', *, w_model_args=None, w_transcribe_args=None)
Transcribe media on a Modal GPU and return raw SRT content
output_format
- When `output_format="srt"`, sentence-level SRT content is composed from transcription segments, returning a string ready to be saved as a `.srt` file
- When `output_format="joined"`, transcription segment texts are joined with the Japanese full stop into a single string without any timing information
File Transfer
- The input media is read out of the per-job ephemeral volume into a container-local temp directory before being handed to `Whisper`
- That directory is removed once the job finishes
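The copy-then-clean-up pattern described above might look roughly like the following sketch. The `read_chunks` callable is a hypothetical stand-in for whatever reads bytes from the volume, and `work` stands in for the Whisper call; neither name comes from this module.

```python
import tempfile
from collections.abc import Callable, Iterable
from pathlib import Path, PurePosixPath


def with_local_copy(
    read_chunks: Callable[[str], Iterable[bytes]],
    vol_fp: str,
    work: Callable[[Path], str],
) -> str:
    """Copy vol_fp into a temp directory, run work on the copy, then clean up."""
    # TemporaryDirectory removes the directory on exit, even if work() raises.
    with tempfile.TemporaryDirectory() as tmp_dir:
        local_fp = Path(tmp_dir) / PurePosixPath(vol_fp).name
        with local_fp.open("wb") as fh:
            for chunk in read_chunks(vol_fp):
                fh.write(chunk)
        return work(local_fp)


def fake_read_chunks(path: str) -> Iterable[bytes]:
    """Stand-in volume reader that yields the file content in chunks."""
    yield b"fake "
    yield b"media bytes"


seen: dict[str, Path] = {}


def fake_transcribe(local_fp: Path) -> str:
    """Stand-in for transcription; records the path it was given."""
    seen["path"] = local_fp
    return f"{local_fp.name}: {local_fp.stat().st_size} bytes"


result = with_local_copy(fake_read_chunks, "uploads/clip.mp4", fake_transcribe)
```

Using a context manager ties the directory's lifetime to the job, so a failed transcription does not leave media behind in the container.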
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `vol_fp` | `str` | Path of the input media inside the per-job ephemeral volume | required |
| `vol_id` | `str` | ID of the per-job ephemeral volume | required |
| `output_format` | `Literal['srt', 'joined']` | | `'srt'` |
| `w_model_args` | `dict \| None` | Additional arguments for | `None` |
| `w_transcribe_args` | `dict \| None` | Additional arguments for | `None` |
Returns:
| Type | Description |
|---|---|
| `str` | The raw transcription as |
Raises:
| Type | Description |
|---|---|
| `ModalVolumeError` | If the input can't be read from the volume |
| `WhisperUnavailableError` | If the model can't be loaded |
| `TranscriptionError` | If transcription fails |