Processor

`processor`

Defines the Processor class, the server's stateful transcription/conversion orchestrator

Role

Routes transcription/conversion to either the local backend or Modal, based on config.transcribe_backend()
Stateless concerns (tokenization, dictionary lookups, LLM calls) are handled directly by their own modules and are intentionally not routed through here
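
The routing described above can be pictured as a small dispatch table keyed by the resolved backend. This is only a sketch: `_transcribe_local`, `_transcribe_modal`, and `route_transcription` are hypothetical names, and `RuntimeError` stands in for whatever the real class raises when no backend is configured.

```python
import asyncio
from typing import Awaitable, Callable


# Hypothetical stand-ins: the real handlers would load Whisper or call Modal.
async def _transcribe_local(media_path: str) -> str:
    return f"local:{media_path}"


async def _transcribe_modal(media_path: str) -> str:
    return f"modal:{media_path}"


# One entry per backend value that can do work ("none" has no handler).
_ROUTES: dict[str, Callable[[str], Awaitable[str]]] = {
    "local": _transcribe_local,
    "modal": _transcribe_modal,
}


async def route_transcription(backend: str, media_path: str) -> str:
    """Dispatch one request to the handler for the resolved backend."""
    handler = _ROUTES.get(backend)
    if handler is None:
        # Placeholder error type; an assumption made for this sketch only.
        raise RuntimeError(f"no transcription backend configured: {backend!r}")
    return await handler(media_path)
```

Keeping the routing decision in one place is what lets the stateless helpers stay outside the orchestrator.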

`K = TypeVar('K')` `module-attribute`

The id that a caller assigns to one input file in a batch

Batch Operations

Each input is passed in under this id
Its result is yielded back under the same id
Results arrive as containers finish (in no fixed order), so this id is how the caller tells which input a result belongs to
The batch job handlers pass each file's child-job id

`Processor`

Stateful orchestrator for transcription and video conversion

Detects the transcription backend on construction and lazily builds the required backend dependencies (the local Whisper model or the Modal runtime), caching them for reuse

Attributes:

Name	Type	Description
`backend`	`str`	Resolved transcription backend (`local` \| `modal` \| `none`)

`convert_batch(sources, out_paths, *, to_mp4_kwargs=None)` `async`

Converts many videos to MP4 on Modal in parallel

Modal-Only

This is the modal backend's batch fan-out path and assumes the backend resolved to modal
The local backend converts its batches one file at a time instead of calling this

Fan-Out

All inputs share one ephemeral volume and one app run, and each file is spawn-ed to its own GPU container
Each converted MP4 is streamed back out to its local destination as the file finishes

Result Keys

sources and out_paths map each input's id to its source video and its destination, and each result is yielded under that same id
Because results arrive out of order, the id is how the caller knows which input a result belongs to
In practice it is the file's child-job id

Per-File Isolation

One file failing yields its exception instead of raising, so sibling files still complete
Domain exceptions are preserved, while other failures are wrapped in ModalError

Parameters:

Name	Type	Description	Default
`sources`	`dict[K, Path]`	Each input's local source video, mapped from the id the caller assigned to that input	required
`out_paths`	`dict[K, Path]`	Each input's local MP4 destination, under the same ids as `sources`	required
`to_mp4_kwargs`	`dict \| None`	Argument overrides for `audio.to_mp4`	`None`

Yields:

Type	Description
`AsyncIterator[tuple[K, Path \| MirumojiServerError]]`	`(id, result)` for each input as its container finishes, where `id` is the one the input was passed in under and `result` is the converted MP4's local path or that input's failure
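
On the caller's side, a stream of `(id, result)` pairs like the one described above is typically drained by checking whether each result is an exception. The sketch below uses a hypothetical `fake_convert_batch` generator in place of the real method and plain `Exception` in place of `MirumojiServerError`, so it runs without Modal.

```python
import asyncio
from pathlib import Path
from typing import AsyncIterator, Union


async def fake_convert_batch(
    sources: dict[str, Path], out_paths: dict[str, Path]
) -> AsyncIterator[tuple[str, Union[Path, Exception]]]:
    """Hypothetical stand-in: yields a path on success, an exception on failure."""
    for key, src in sources.items():
        if src.suffix == ".bad":
            # A failure is yielded, not raised, so the loop keeps going.
            yield key, ValueError(f"not a valid video: {src}")
        else:
            yield key, out_paths[key]


async def drain(
    sources: dict[str, Path], out_paths: dict[str, Path]
) -> tuple[dict[str, Path], dict[str, Exception]]:
    """Split the stream into successes and failures, both keyed by input id."""
    done: dict[str, Path] = {}
    failed: dict[str, Exception] = {}
    async for key, result in fake_convert_batch(sources, out_paths):
        if isinstance(result, Exception):
            failed[key] = result
        else:
            done[key] = result
    return done, failed
```

Because failures arrive as values, a single `async for` loop can record every file's outcome without a `try` block around the iteration.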

`convert_to_mp4(input_path, output_path, to_mp4_kwargs=None)` `async`

Converts a video to MP4 using FFmpeg

Backend Differences

Both backends probe NVENC where the conversion actually runs and take the on-device GPU pipeline only when an encoder is present, falling back to CPU libx264 otherwise
For local the probe runs on the host
For modal it runs inside the GPU container so that the data center compute GPUs (A100 / H100 / B200), which have no NVENC, convert on CPU
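
One common way to probe for a usable encoder is to attempt a tiny trial encode on the machine that will do the work and fall back when it fails. The sketch below assumes that approach; it is not taken from the module itself. `PROBE_CMD`, `_run`, and `pick_video_encoder` are hypothetical names, and the runner is injectable so the decision logic can be exercised without FFmpeg installed.

```python
import subprocess
from typing import Callable, Sequence

# Encode one synthetic black frame with NVENC and discard the output.
PROBE_CMD = [
    "ffmpeg", "-hide_banner", "-loglevel", "error",
    "-f", "lavfi", "-i", "color=c=black:s=256x256:d=0.1",
    "-frames:v", "1", "-c:v", "h264_nvenc", "-f", "null", "-",
]


def _run(cmd: Sequence[str]) -> int:
    """Run the probe and return its exit code (non-zero if it cannot run)."""
    try:
        return subprocess.run(list(cmd), capture_output=True, timeout=30).returncode
    except (OSError, subprocess.TimeoutExpired):
        # Missing executable or a hung probe both count as "no NVENC".
        return 1


def pick_video_encoder(run: Callable[[Sequence[str]], int] = _run) -> str:
    """Return the NVENC encoder if the trial encode succeeds, else CPU libx264."""
    return "h264_nvenc" if run(PROBE_CMD) == 0 else "libx264"
```

Running the probe where the conversion executes matters because an FFmpeg build can include the NVENC encoder even when the attached GPU has no encoding hardware.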

Parameters:

Name	Type	Description	Default
`input_path`	`str \| PathLike[str]`	Absolute path to the source video	required
`output_path`	`str \| PathLike[str]`	Absolute destination path for the MP4	required
`to_mp4_kwargs`	`dict \| None`	Argument overrides for `audio.to_mp4` (resolution, target_bitrate, preset)	`None`

Returns:

Type	Description
`Path`	The path to the converted MP4

Raises:

Type	Description
`MissingFFmpegError`	If the FFmpeg executable can't be located (local backend)
`MissingFFprobeError`	If the ffprobe executable can't be located (local backend)
`FFmpegError`	If an FFmpeg command fails (local or modal backend)
`ValueError`	If the source isn't a valid file or the resolution is malformed (local or modal backend)
`InvalidMediaPathError`	If the source path is outside the media directory (modal backend)
`ModalError`	If the Modal conversion job fails

`transcribe(media_path, output_format='srt', *, w_model_args=None, w_transcribe_args=None)` `async`

Transcribes media using either a local WhisperModel or the Modal app's transcribe_job function, depending on the backend configuration

output_format

When output_format="srt", sentence-level SRT content is composed from transcription segments, returning a string ready to be saved as a .srt file
When output_format="joined", transcription segment texts are joined with the Japanese full stop into a single string without any timing information

w_model_args

When running the local backend, the model-loading overrides apply only to the first load, since the model used for subsequent calls is a cached one
When running the modal backend, each job runs in an isolated container which must rebuild the WhisperModel object, so these overrides apply for every call
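
The local-backend behavior follows directly from lazy caching: once a model object exists, later overrides have nothing to act on. A minimal sketch, assuming a hypothetical `FakeModel` in place of `WhisperModel` and made-up default values:

```python
from typing import Any, Optional

# Illustrative values only; the real defaults live in DEFAULT_MODEL_OPTS.
DEFAULT_MODEL_OPTS: dict[str, Any] = {"model_size": "small", "device": "cpu"}


class FakeModel:
    """Hypothetical stand-in for WhisperModel that just records its options."""

    def __init__(self, **opts: Any) -> None:
        self.opts = opts


class LocalModelCache:
    """Builds the model on first use and returns the same object afterwards."""

    def __init__(self) -> None:
        self._model: Optional[FakeModel] = None

    def get(self, w_model_args: Optional[dict[str, Any]] = None) -> FakeModel:
        if self._model is None:
            # Overrides are merged over the defaults only on this first load.
            self._model = FakeModel(**{**DEFAULT_MODEL_OPTS, **(w_model_args or {})})
        # On later calls w_model_args is ignored: the cached model wins.
        return self._model
```

With this shape, changing model options on a long-lived local process would require discarding the cached model first, whereas a fresh container per job rebuilds it every time.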

Parameters:

Name	Type	Description	Default
`media_path`	`str \| PathLike[str]`	Absolute path to the media within the media directory	required
`output_format`	`Literal['srt', 'joined']`	`srt` for sentence-level SRT content, `joined` for a single joined string	`'srt'`
`w_model_args`	`dict \| None`	Additional arguments for `WhisperModel`. Overrides the ones set in `mirumoji.server.processing.whisper.DEFAULT_MODEL_OPTS`	`None`
`w_transcribe_args`	`dict \| None`	Additional arguments for `WhisperModel.transcribe`. Overrides the ones set in `mirumoji.server.processing.whisper.DEFAULT_TRANSCRIBE_OPTS`	`None`

Returns:

Type	Description
`str`	The raw transcription in the requested format

Raises:

Type	Description
`WhisperUnavailableError`	If no transcription backend is configured, or the local model fails to load
`TranscriptionError`	If transcription fails (raised locally, or propagated unchanged from the Modal job when preserved)
`InvalidMediaPathError`	If the media path is outside the media directory (modal backend)
`ModalError`	If the Modal job fails for any other reason

`transcribe_batch(sources, output_format='srt', *, w_model_args=None, w_transcribe_args=None)` `async`

Transcribes many already-prepared audio files on Modal in parallel

Modal-Only

This is the modal backend's batch fan-out path and assumes the backend resolved to modal
The local backend has a single GPU, so its batches run the per-file transcription sequentially instead of calling this

Fan-Out

All inputs share one ephemeral volume and one app run; each file is then spawn-ed to its own container, with parallelism bounded by the account's concurrent-GPU limit
Results are yielded as each container finishes (in no fixed order), so the caller can update per-file state live

Result Keys

sources maps each input's id to its audio file, and each result is yielded under that same id
Because results arrive out of order, the id is how the caller knows which input a result belongs to
In practice it is the file's child-job id, so the handler writes the result straight onto the matching child row

Per-File Isolation

One file failing does not fail the batch
A failed file yields its exception instead of raising, so sibling files still complete
Domain exceptions are preserved across the Modal boundary, while other failures are wrapped in ModalError
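
The isolation rule can be sketched on the producing side with plain `asyncio`: each job is wrapped so that it returns its exception instead of raising it, and results are yielded in completion order. `DomainError` and `WrappedError` are hypothetical stand-ins for `MirumojiServerError` and `ModalError` (the subclass relationship is an assumption), and `fan_out` is not the module's real implementation.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable, TypeVar, Union

K = TypeVar("K")


class DomainError(Exception):
    """Stand-in for the server's domain exception base class."""


class WrappedError(DomainError):
    """Stand-in for the wrapper used for unexpected failures."""


async def _guarded(
    key: K, job: Callable[[str], Awaitable[str]], source: str
) -> tuple[K, Union[str, DomainError]]:
    try:
        return key, await job(source)
    except DomainError as exc:
        # Domain exceptions pass through unchanged.
        return key, exc
    except Exception as exc:
        # Anything else is wrapped, keeping the original as the cause.
        wrapped = WrappedError(f"{source}: {exc!r}")
        wrapped.__cause__ = exc
        return key, wrapped


async def fan_out(
    sources: dict[K, str], job: Callable[[str], Awaitable[str]]
) -> AsyncIterator[tuple[K, Union[str, DomainError]]]:
    """Start every job at once and yield (id, result) pairs as each one finishes."""
    tasks = [asyncio.create_task(_guarded(k, job, s)) for k, s in sources.items()]
    for finished in asyncio.as_completed(tasks):
        yield await finished
```

Since `_guarded` never raises, one failing input cannot cancel or hide the results of its siblings.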

Parameters:

Name	Type	Description	Default
`sources`	`dict[K, Path]`	Each input's local prepared audio file, mapped from the id the caller assigned to that input	required
`output_format`	`Literal['srt', 'joined']`	`srt` for sentence-level SRT content, `joined` for a single joined string	`'srt'`
`w_model_args`	`dict \| None`	Additional arguments for `WhisperModel`	`None`
`w_transcribe_args`	`dict \| None`	Additional arguments for `WhisperModel.transcribe`	`None`

Yields:

Type	Description
`AsyncIterator[tuple[K, str \| MirumojiServerError]]`	`(id, result)` for each input as its container finishes, where `id` is the one the input was passed in under and `result` is the raw transcription or that input's failure
