Processor
processor
Defines the Processor class, the server's stateful transcription/conversion
orchestrator
Role
- Routes transcription/conversion to either the `local` backend or Modal, based on `config.transcribe_backend()`
- Stateless concerns (tokenization, dictionary lookups, LLM calls) are handled directly by their own modules and are intentionally not routed through here
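The routing rule above can be sketched as a small dispatch table. This is only an illustration under stated assumptions: `route`, `run_local`, `run_modal`, and `HANDLERS` are hypothetical names, and only the `local`/`modal` split comes from the text.

```python
import asyncio
from typing import Awaitable, Callable


# Hypothetical stand-ins for the two backends; the real ones do the actual work.
async def run_local(path: str) -> str:
    return f"local:{path}"


async def run_modal(path: str) -> str:
    return f"modal:{path}"


HANDLERS: dict[str, Callable[[str], Awaitable[str]]] = {
    "local": run_local,
    "modal": run_modal,
}


async def route(backend: str, path: str) -> str:
    """Dispatch one request to the handler for the resolved backend."""
    try:
        handler = HANDLERS[backend]
    except KeyError:
        raise ValueError(f"unknown backend: {backend!r}") from None
    return await handler(path)
```

Keeping the choice in one place means callers never branch on the backend themselves.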
K = TypeVar('K')
module-attribute
The id that a caller assigns to one input file in a batch
Batch Operations
- Each input is passed in under this id
- Its result is yielded back under the same id
- Results arrive as containers finish (in no fixed order), so this id is how the caller tells which input a result belongs to
- The batch job handlers pass each file's child-job id
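The role of the id can be sketched with plain `asyncio`. This is a minimal illustration, not the server's code: `run_keyed` and its `work` argument are hypothetical, and the point is only that each result carries its caller-assigned id because completion order is not input order.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable, TypeVar

K = TypeVar("K")
R = TypeVar("R")


async def run_keyed(
    inputs: dict[K, str],
    work: Callable[[str], Awaitable[R]],
) -> AsyncIterator[tuple[K, R]]:
    """Yield (id, result) pairs in completion order, not input order."""

    async def tagged(key: K, value: str) -> tuple[K, R]:
        # Carry the caller's id alongside the result.
        return key, await work(value)

    tasks = [asyncio.ensure_future(tagged(k, v)) for k, v in inputs.items()]
    for finished in asyncio.as_completed(tasks):
        yield await finished
```

Because `K` is a type variable, the caller can use any id type it likes, such as an integer row id or a string job id.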
Processor
Stateful orchestrator for transcription and video conversion
Detects the transcription backend on construction and lazily builds the
required backend dependencies (local Whisper model or the Modal
runtime), caching them for reuse
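The lazy build-and-cache behaviour can be sketched as follows. `LazyDependency` and its `factory` argument are hypothetical names used for illustration; the actual class may organise this differently.

```python
from typing import Any, Callable, Optional


class LazyDependency:
    """Builds a dependency on first use and reuses it afterwards."""

    def __init__(self, factory: Callable[..., Any]) -> None:
        self._factory = factory
        self._instance: Optional[Any] = None
        self.build_count = 0  # exposed only to make the caching observable

    def get(self, **overrides: Any) -> Any:
        # Overrides only matter on the first call; later calls hit the cache.
        if self._instance is None:
            self._instance = self._factory(**overrides)
            self.build_count += 1
        return self._instance
```

A consequence of this pattern is that arguments passed after the first build are silently ignored, since the cached instance is returned as-is.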
Attributes:
| Name | Type | Description |
|---|---|---|
| `backend` | `str` | Resolved transcription backend ( |
convert_batch(sources, out_paths, *, to_mp4_kwargs=None)
async
Converts many videos to MP4 on Modal in parallel
Modal-Only
- This is the `modal` backend's batch fan-out path and assumes the backend resolved to `modal`
- The `local` backend converts its batches one file at a time instead of calling this
Fan-Out
- All inputs share one ephemeral volume and one app run, and each file is `spawn`-ed to its own GPU container
- Each converted MP4 is streamed back out to its local destination as the file finishes
Result Keys
- `sources` and `out_paths` map each input's id to its source video and its destination, and each result is yielded under that same id
- Because results arrive out of order, the id is how the caller knows which input a result belongs to
- In practice it is the file's child-job id
Per-File Isolation
- One file failing yields its exception instead of raising, so sibling files still complete
- Domain exceptions are preserved; other failures are wrapped in `ModalError`
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `sources` | `dict[K, Path]` | Each input's local source video, mapped from the id the caller assigned to that input | required |
| `out_paths` | `dict[K, Path]` | Each input's local MP4 destination, under the same ids as | required |
| `to_mp4_kwargs` | `dict \| None` | Argument overrides for | `None` |
Yields:
| Type | Description |
|---|---|
| `AsyncIterator[tuple[K, Path \| MirumojiServerError]]` | |
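A caller-side sketch of the per-file isolation contract: each yielded value is either a `Path` or an exception, so the consumer sorts results by type instead of wrapping the loop in `try`/`except`. `fake_convert_batch` is a hypothetical stand-in for the real method (which needs Modal), and its failure rule is invented purely for the example.

```python
import asyncio
from pathlib import Path
from typing import AsyncIterator, Hashable, Union


async def fake_convert_batch(
    sources: dict[Hashable, Path],
    out_paths: dict[Hashable, Path],
) -> AsyncIterator[tuple[Hashable, Union[Path, Exception]]]:
    """Hypothetical stand-in: yields a Path on success, an exception on failure."""
    for key, src in sources.items():
        if src.suffix != ".mkv":  # invented failure rule, for illustration only
            yield key, ValueError(f"unsupported source: {src.name}")
        else:
            yield key, out_paths[key]


async def collect(sources, out_paths):
    done, failed = {}, {}
    async for key, result in fake_convert_batch(sources, out_paths):
        # A failure arrives as a value, so the loop keeps going for siblings.
        if isinstance(result, Exception):
            failed[key] = result
        else:
            done[key] = result
    return done, failed
```

In a job handler, `done` and `failed` would typically be written onto the matching child jobs by id rather than collected into dictionaries.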
convert_to_mp4(input_path, output_path, to_mp4_kwargs=None)
async
Converts a video to MP4 using FFMPEG
Backend Differences
- Both backends probe `NVENC` where the conversion actually runs and take the on-device GPU pipeline only when an encoder is present, falling back to CPU `libx264` otherwise
- For `local` the probe runs on the host
- For `modal` it runs inside the GPU container so that the data center compute GPUs (A100/H100/B200), which have no `NVENC`, convert on CPU
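One simple way to approximate such a probe is to check FFmpeg's encoder listing. This sketch is a simplification and not necessarily what the server does: `has_nvenc_encoder` and `pick_video_codec` are hypothetical names, and a listed `h264_nvenc` only shows that FFmpeg was built with NVENC support, so a stricter probe would also attempt a short test encode on the device.

```python
import shutil
import subprocess


def has_nvenc_encoder(encoders_output: str) -> bool:
    """Return True if an `ffmpeg -encoders` listing includes h264_nvenc."""
    return any(
        len(parts) >= 2 and parts[1] == "h264_nvenc"
        for parts in (line.split() for line in encoders_output.splitlines())
    )


def pick_video_codec() -> str:
    """Choose h264_nvenc when ffmpeg lists it, else fall back to libx264."""
    ffmpeg = shutil.which("ffmpeg")
    if ffmpeg is None:
        return "libx264"
    listing = subprocess.run(
        [ffmpeg, "-hide_banner", "-encoders"],
        capture_output=True,
        text=True,
        check=False,
    ).stdout
    return "h264_nvenc" if has_nvenc_encoder(listing) else "libx264"
```

Running the probe in the same environment as the conversion matters because the host and the container can have different FFmpeg builds and different hardware.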
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `input_path` | `str \| PathLike[str]` | Absolute path to the source video | required |
| `output_path` | `str \| PathLike[str]` | Absolute destination path for the MP4 | required |
| `to_mp4_kwargs` | `dict \| None` | Argument overrides for | `None` |
Returns:
| Type | Description |
|---|---|
| `Path` | The path to the converted MP4 |
Raises:
| Type | Description |
|---|---|
| `MissingFFmpegError` | If the FFMPEG executable can't be located (local backend) |
| `MissingFFprobeError` | If the FFPROBE executable can't be located (local backend) |
| `FFmpegError` | If an FFMPEG command fails (local or modal backend) |
| `ValueError` | If the source isn't a valid file or the resolution is malformed (local or modal backend) |
| `InvalidMediaPathError` | If the source path is outside the media directory (Modal backend) |
| `ModalError` | If the Modal conversion job fails |
transcribe(media_path, output_format='srt', *, w_model_args=None, w_transcribe_args=None)
async
Transcribes media using either a local `WhisperModel` or the Modal
app's `transcribe_job` function, depending on backend configuration
`output_format`
- When `output_format="srt"`, sentence-level `SRT` content is composed from transcription segments, returning a string ready to be saved as a `.srt` file
- When `output_format="joined"`, transcription segment texts are joined with the Japanese full stop into a single string without any timing information

`w_model_args`
- When running the `local` backend, the model-loading overrides apply only to the first load, since the model used for subsequent calls is a cached one
- When running the `modal` backend, each job runs in an isolated container which must rebuild the `WhisperModel` object, so these overrides apply for every call
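The two output formats can be sketched as below. This is an illustration under assumptions: `Segment`, `to_srt`, and `to_joined` are hypothetical names, the sketch emits one cue per segment rather than regrouping segments into sentences, and whether the real `joined` output ends with a trailing full stop is not stated in the text.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float  # seconds
    end: float    # seconds
    text: str


def srt_timestamp(seconds: float) -> str:
    """Format seconds as the SRT timestamp HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: list[Segment]) -> str:
    """Compose numbered SRT cues, one per segment, separated by blank lines."""
    blocks = [
        f"{i}\n{srt_timestamp(s.start)} --> {srt_timestamp(s.end)}\n{s.text}\n"
        for i, s in enumerate(segments, start=1)
    ]
    return "\n".join(blocks)


def to_joined(segments: list[Segment]) -> str:
    # Join with the Japanese full stop; no timing information is kept.
    return "。".join(s.text for s in segments)
```

The `joined` form suits downstream text processing, while the `srt` form keeps the timing needed for subtitle playback.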
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `media_path` | `str \| PathLike[str]` | Absolute path to the media within the media directory | required |
| `output_format` | `Literal['srt', 'joined']` | | `'srt'` |
| `w_model_args` | `dict \| None` | Additional arguments for | `None` |
| `w_transcribe_args` | `dict \| None` | Additional arguments for | `None` |
Returns:
| Type | Description |
|---|---|
| `str` | The raw transcription in the requested format |
Raises:
| Type | Description |
|---|---|
| `WhisperUnavailableError` | If no transcription backend is configured, or the local model fails to load |
| `TranscriptionError` | If transcription fails (raised locally, or propagated unchanged from the Modal job when preserved) |
| `InvalidMediaPathError` | If the media path is outside the media directory (Modal backend) |
| `ModalError` | If the Modal job fails for any other reason |
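The split between preserved domain errors and the `ModalError` catch-all can be sketched like this. The classes below are local stand-ins that only mirror the names on this page, the inheritance from `MirumojiServerError` is an assumption, and `normalize_remote_failure` is a hypothetical helper.

```python
class MirumojiServerError(Exception):
    """Stand-in base class for the server's domain errors (assumed hierarchy)."""


class TranscriptionError(MirumojiServerError):
    """Stand-in for a domain error that should cross the boundary unchanged."""


class ModalError(MirumojiServerError):
    """Stand-in for the wrapper used for any other remote failure."""


def normalize_remote_failure(exc: Exception) -> MirumojiServerError:
    """Keep domain errors unchanged; wrap anything else in ModalError."""
    if isinstance(exc, MirumojiServerError):
        return exc
    wrapped = ModalError(f"Modal job failed: {exc}")
    wrapped.__cause__ = exc  # keep the original exception for debugging
    return wrapped
```

With this shape, callers can catch the specific domain error they know how to handle and treat `ModalError` as the generic infrastructure failure.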
transcribe_batch(sources, output_format='srt', *, w_model_args=None, w_transcribe_args=None)
async
Transcribes many already-prepared audio files on Modal in parallel
Modal-Only
- This is the `modal` backend's batch fan-out path and assumes the backend resolved to `modal`
- The `local` backend has a single GPU, so its batches run the per-file transcription sequentially instead of calling this
Fan-Out
- All inputs share one ephemeral volume and one app run, then each file is `spawn`-ed to its own container, bounded by the account's concurrent-GPU limit
- Results are yielded as each container finishes (in no fixed order), so the caller can update per-file state live
Result Keys
- `sources` maps each input's id to its audio file, and each result is yielded under that same id
- Because results arrive out of order, the id is how the caller knows which input a result belongs to
- In practice it is the file's child-job id, so the handler writes the result straight onto the matching child row
Per-File Isolation
- One file failing does not fail the batch
- A failed file yields its exception instead of raising, so sibling files still complete
- Domain exceptions are preserved across the Modal boundary; other failures are wrapped in `ModalError`
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `sources` | `dict[K, Path]` | Each input's local prepared audio file, mapped from the id the caller assigned to that input | required |
| `output_format` | `Literal['srt', 'joined']` | | `'srt'` |
| `w_model_args` | `dict \| None` | Additional arguments for | `None` |
| `w_transcribe_args` | `dict \| None` | Additional arguments for | `None` |
Yields:
| Type | Description |
|---|---|
| `AsyncIterator[tuple[K, str \| MirumojiServerError]]` | |
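A local analogue of a bounded fan-out, using an `asyncio.Semaphore` in place of the account's concurrent-GPU limit. This is only a sketch: `fan_out`, `job`, and `limit` are hypothetical, and on Modal the bound is enforced by the platform rather than by client-side code like this.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable, Hashable, Union


async def fan_out(
    sources: dict[Hashable, str],
    job: Callable[[str], Awaitable[str]],
    limit: int,
) -> AsyncIterator[tuple[Hashable, Union[str, Exception]]]:
    """Run one job per input, at most `limit` at a time, yielding as each finishes."""
    gate = asyncio.Semaphore(limit)

    async def run_one(key, value):
        async with gate:
            try:
                return key, await job(value)
            except Exception as exc:  # isolate: report the failure as a value
                return key, exc

    tasks = [asyncio.ensure_future(run_one(k, v)) for k, v in sources.items()]
    for finished in asyncio.as_completed(tasks):
        yield await finished
```

Because failures are returned rather than raised inside `run_one`, one bad input never cancels the remaining tasks.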