
Processor

processor

Defines the Processor class, the server's stateful transcription/conversion orchestrator

Role
  • Routes transcription/conversion to either the local backend or Modal, based on config.transcribe_backend()

  • Stateless concerns (tokenization, dictionary lookups, LLM calls) are handled directly by their own modules and are intentionally not routed through here

K = TypeVar('K') module-attribute

The id that a caller assigns to one input file in a batch

Batch Operations
  • Each input is passed in under this id

  • Its result is yielded back under the same id

  • Results arrive as containers finish (in no fixed order), so this id is how the caller tells which input a result belongs to

  • The batch job handlers pass each file's child-job id

Processor

Stateful orchestrator for transcription and video conversion

Detects the transcription backend on construction and lazily builds the backend dependencies it requires (the local Whisper model or the Modal runtime), caching them for reuse
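The lazy-build-and-cache behaviour can be sketched with a small async helper. This is an illustration, not Processor's actual implementation. `LazyResource` and `load_model` are hypothetical names, and the `asyncio.Lock` that makes concurrent first callers share a single build is an assumption about how such a cache might be guarded.

```python
import asyncio
from typing import Awaitable, Callable, Generic, TypeVar

T = TypeVar("T")


class LazyResource(Generic[T]):
    """Hypothetical helper: build a dependency on first use, then reuse the cached value."""

    def __init__(self, factory: Callable[[], Awaitable[T]]) -> None:
        self._factory = factory
        self._value: T | None = None
        self._lock = asyncio.Lock()

    async def get(self) -> T:
        if self._value is None:
            async with self._lock:
                # Re-check under the lock so concurrent first callers build it only once
                if self._value is None:
                    self._value = await self._factory()
        return self._value


load_calls = 0


async def load_model() -> str:
    """Stands in for an expensive load such as a Whisper model or a Modal runtime handle."""
    global load_calls
    load_calls += 1
    await asyncio.sleep(0)
    return "whisper-model"


async def main() -> list[str]:
    model = LazyResource(load_model)
    return list(await asyncio.gather(*(model.get() for _ in range(5))))


results = asyncio.run(main())
```

Five concurrent callers all receive the same cached object, and the factory runs once.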

Attributes:

  • backend (str): Resolved transcription backend (local | modal | none)

convert_batch(sources, out_paths, *, to_mp4_kwargs=None) async

Converts many videos to MP4 on Modal in parallel

Modal-Only
  • This is the modal backend's batch fan-out path and assumes the backend resolved to modal

  • The local backend converts its batches one file at a time instead of calling this

Fan-Out
  • All inputs share one ephemeral volume and one app run, and each file is spawn-ed to its own GPU container

  • Each converted MP4 is streamed back out to its local destination as the file finishes

Result Keys
  • sources and out_paths map each input's id to its source video and its destination, and each result is yielded under that same id

  • Because results arrive out of order, the id is how the caller knows which input a result belongs to

  • In practice it is the file's child-job id

Per-File Isolation
  • One file failing yields its exception instead of raising, so sibling files still complete

  • Domain exceptions are preserved; other failures are wrapped in ModalError

Parameters:

  • sources (dict[K, Path], required): Each input's local source video, mapped from the id the caller assigned to that input

  • out_paths (dict[K, Path], required): Each input's local MP4 destination, under the same ids as sources

  • to_mp4_kwargs (dict | None, default None): Argument overrides for audio.to_mp4

Yields:

  • AsyncIterator[tuple[K, Path | MirumojiServerError]]: (id, result) for each input as its container finishes, where id is the one the input was passed in under and result is the converted MP4's local path or that input's failure

convert_to_mp4(input_path, output_path, to_mp4_kwargs=None) async

Converts a video to MP4 using FFmpeg

Backend Differences
  • Both backends probe NVENC where the conversion actually runs and take the on-device GPU pipeline only when an encoder is present, falling back to CPU libx264 otherwise

  • For the local backend, the probe runs on the host

  • For the modal backend, it runs inside the GPU container, so data center compute GPUs that have no NVENC (A100 / H100 / B200) correctly fall back to CPU conversion
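One way to probe NVENC where the conversion runs is to attempt a one-frame test encode. A listing from `ffmpeg -encoders` shows what the build was compiled with, not what the GPU supports. This is a sketch under that assumption, not the module's actual probe. `nvenc_available` and `video_codec_args` are hypothetical names, and only the encoder choice is shown, not the rest of the GPU pipeline.

```python
import subprocess


def nvenc_available(ffmpeg: str = "ffmpeg", timeout: float = 30.0) -> bool:
    """Return True only if a tiny test encode with h264_nvenc succeeds on this machine.

    Listing encoders is not enough: an FFmpeg build can include h264_nvenc even on a
    GPU without NVENC hardware, so the probe actually encodes one frame.
    """
    cmd = [
        ffmpeg, "-hide_banner", "-loglevel", "error",
        "-f", "lavfi", "-i", "color=c=black:s=256x256:d=0.1",  # synthetic input frame
        "-frames:v", "1",
        "-c:v", "h264_nvenc",
        "-f", "null", "-",  # discard the output
    ]
    try:
        result = subprocess.run(cmd, capture_output=True, timeout=timeout)
    except (FileNotFoundError, subprocess.TimeoutExpired):
        return False  # treat a missing binary or a hung probe as "no NVENC"
    return result.returncode == 0


def video_codec_args(use_nvenc: bool) -> list[str]:
    """FFmpeg video-encoder arguments for the chosen path."""
    return ["-c:v", "h264_nvenc"] if use_nvenc else ["-c:v", "libx264"]
```

Calling `video_codec_args(nvenc_available())` on the host (local) or inside the container (modal) gives each environment its own answer. In this sketch a missing binary simply reports "no NVENC"; the documented behaviour of raising MissingFFmpegError would be handled separately.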

Parameters:

  • input_path (str | PathLike[str], required): Absolute path to the source video

  • output_path (str | PathLike[str], required): Absolute destination path for the MP4

  • to_mp4_kwargs (dict | None, default None): Argument overrides for audio.to_mp4 (resolution, target_bitrate, preset)

Returns:

  • Path: The path to the converted MP4

Raises:

  • MissingFFmpegError: If the FFmpeg executable can't be located (local backend)

  • MissingFFprobeError: If the ffprobe executable can't be located (local backend)

  • FFmpegError: If an FFmpeg command fails (local or modal backend)

  • ValueError: If the source isn't a valid file or the resolution is malformed (local or modal backend)

  • InvalidMediaPathError: If the source path is outside the media directory (modal backend)

  • ModalError: If the Modal conversion job fails

transcribe(media_path, output_format='srt', *, w_model_args=None, w_transcribe_args=None) async

Transcribes media using either a local WhisperModel or the Modal app's transcribe_job function, depending on the backend configuration

output_format
  • When output_format="srt", sentence-level SRT content is composed from transcription segments, returning a string ready to be saved as a .srt file

  • When output_format="joined", transcription segment texts are joined with the Japanese full stop into a single string without any timing information
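The two output formats can be sketched from a list of timed segments. This is a simplification that emits one SRT cue per segment; the sentence-level merging the real composer performs is not shown. `Segment` is a hypothetical dataclass mirroring the `start`/`end`/`text` fields of a Whisper segment, and the function names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float  # seconds
    end: float    # seconds
    text: str


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(segments: list[Segment]) -> str:
    """One numbered cue per segment, separated by blank lines."""
    blocks = [
        f"{i}\n{srt_timestamp(s.start)} --> {srt_timestamp(s.end)}\n{s.text}\n"
        for i, s in enumerate(segments, start=1)
    ]
    return "\n".join(blocks)


def to_joined(segments: list[Segment]) -> str:
    """Segment texts joined with the Japanese full stop, with no timing information."""
    return "。".join(s.text for s in segments)


segs = [Segment(0.0, 1.2, "こんにちは"), Segment(1.2, 2.5, "元気です")]
```

Here `to_joined(segs)` gives `こんにちは。元気です`, while `to_srt(segs)` keeps both cues with their timings.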

w_model_args
  • When running the local backend, the model-loading overrides apply only to the first load, since subsequent calls reuse the cached model

  • When running the modal backend, each job runs in an isolated container which must rebuild the WhisperModel object, so these overrides apply for every call

Parameters:

  • media_path (str | PathLike[str], required): Absolute path to the media within the media directory

  • output_format (Literal['srt', 'joined'], default 'srt'): srt for sentence-level SRT content, joined for a single joined string

  • w_model_args (dict | None, default None): Additional arguments for WhisperModel. Overrides the ones set in mirumoji.server.processing.whisper.DEFAULT_MODEL_OPTS

  • w_transcribe_args (dict | None, default None): Additional arguments for WhisperModel.transcribe. Overrides the ones set in mirumoji.server.processing.whisper.DEFAULT_TRANSCRIBE_OPTS

Returns:

  • str: The raw transcription in the requested format

Raises:

  • WhisperUnavailableError: If no transcription backend is configured, or the local model fails to load

  • TranscriptionError: If transcription fails (raised locally, or propagated unchanged from the Modal job when preserved)

  • InvalidMediaPathError: If the media path is outside the media directory (modal backend)

  • ModalError: If the Modal job fails for any other reason

transcribe_batch(sources, output_format='srt', *, w_model_args=None, w_transcribe_args=None) async

Transcribes many already-prepared audio files on Modal in parallel

Modal-Only
  • This is the modal backend's batch fan-out path and assumes the backend resolved to modal

  • The local backend has a single GPU, so its batches run the per-file transcription sequentially instead of calling this

Fan-Out
  • All inputs share one ephemeral volume and one app run, then each file is spawn-ed to its own container, bounded by the account's concurrent-GPU limit

  • Results are yielded as each container finishes (in no fixed order), so the caller can update per-file state live
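The concurrency bound and the live, out-of-order delivery can be modelled locally with an `asyncio.Semaphore`. This is a sketch only: on Modal the concurrent-GPU limit is enforced by the platform, not by client code. `bounded_as_completed` and `fake_transcribe` are hypothetical names, and the `active`/`peak` counters exist only to show that the limit holds.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable, TypeVar

K = TypeVar("K")
V = TypeVar("V")


async def bounded_as_completed(
    inputs: dict[K, V],
    work: Callable[[V], Awaitable[str]],
    limit: int,
) -> AsyncIterator[tuple[K, str]]:
    """Run work on every input with at most `limit` in flight; yield as each finishes."""
    semaphore = asyncio.Semaphore(limit)

    async def run_one(key: K, value: V) -> tuple[K, str]:
        async with semaphore:  # at most `limit` jobs hold a "GPU" at once
            return key, await work(value)

    tasks = [asyncio.create_task(run_one(k, v)) for k, v in inputs.items()]
    try:
        for finished in asyncio.as_completed(tasks):
            yield await finished
    finally:
        for task in tasks:
            task.cancel()  # no-op for tasks that already finished


active = 0
peak = 0


async def fake_transcribe(path: str) -> str:
    global active, peak
    active += 1
    peak = max(peak, active)
    await asyncio.sleep(0.01)
    active -= 1
    return f"transcript of {path}"


async def main() -> dict[str, str]:
    sources = {f"job-{i}": f"audio{i}.wav" for i in range(6)}
    return {k: v async for k, v in bounded_as_completed(sources, fake_transcribe, limit=2)}


results = asyncio.run(main())
```

All six inputs complete, but never more than two run at the same time.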

Result Keys
  • sources maps each input's id to its audio file, and each result is yielded under that same id

  • Because results arrive out of order, the id is how the caller knows which input a result belongs to

  • In practice it is the file's child-job id, so the handler writes the result straight onto the matching child row

Per-File Isolation
  • One file failing does not fail the batch

  • A failed file yields its exception instead of raising, so sibling files still complete

  • Domain exceptions are preserved across the Modal boundary; other failures are wrapped in ModalError

Parameters:

  • sources (dict[K, Path], required): Each input's local prepared audio file, mapped from the id the caller assigned to that input

  • output_format (Literal['srt', 'joined'], default 'srt'): srt for sentence-level SRT content, joined for a single joined string

  • w_model_args (dict | None, default None): Additional arguments for WhisperModel

  • w_transcribe_args (dict | None, default None): Additional arguments for WhisperModel.transcribe

Yields:

  • AsyncIterator[tuple[K, str | MirumojiServerError]]: (id, result) for each input as its container finishes, where id is the one the input was passed in under and result is the raw transcription or that input's failure