Audio

audio

Defines functions that run FFMPEG in a subprocess to perform various media operations

extract_audio(ffmpeg_path, input_path, output_path)

Extracts a WAV audio file from a video container using FFMPEG

If input file is already an audio file, returns the unchanged input file path

File Extensions

This function checks the extension of input_path to decide whether the file is already audio. The following extensions are treated as audio files:

  • .wav
  • .mp3
  • .m4a
  • .flac
  • .aac

If the file at input_path has any of these extensions, its path will be returned unchanged
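The extension check described above can be sketched as follows. This is a minimal illustration, not the module's actual code: the names AUDIO_EXTENSIONS and is_audio_file are hypothetical, and case-insensitive matching is an assumption the documentation does not confirm.

```python
from pathlib import Path

# Extensions the documentation lists as "already audio".
AUDIO_EXTENSIONS = {".wav", ".mp3", ".m4a", ".flac", ".aac"}


def is_audio_file(path):
    """Return True when the path's extension is in the audio list (hypothetical helper)."""
    # Lower-casing is an assumption; the docs don't say whether matching is case-sensitive.
    return Path(path).suffix.lower() in AUDIO_EXTENSIONS
```

Under this reading, extract_audio would return input_path untouched whenever such a check passes, and only invoke FFMPEG otherwise.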

Parameters:

Name Type Description Default
ffmpeg_path str

Path to the system's FFMPEG executable

required
input_path str | PathLike[str]

Path to the video file

required
output_path str | PathLike[str]

Output path

required

Raises:

Type Description
FFmpegError

If the FFMPEG command returned a non-zero exit code

ValueError

If input_path doesn't exist or is not a file

Returns:

Type Description
Path

The output path of the converted file, or input_path if it is already an audio file

filter_audio(ffmpeg_path, input_path, output_path, highpass=300, lowpass=3400)

Applies a band-pass filter and loudness normalization to a media file, then extracts the result as a 16 kHz mono WAV audio file using FFMPEG

Parameters:

Name Type Description Default
ffmpeg_path str

Path to the system's FFMPEG executable

required
input_path str

Path to the audio / video file to use

required
output_path str

Path in which to save the resulting WAV file

required
highpass int

Cut everything below this frequency (Hz)

300
lowpass int

Cut everything above this frequency (Hz)

3400

Raises:

Type Description
FFmpegError

If the FFMPEG command returned a non-zero exit code

ValueError

If input_path doesn't exist or is not a file

Returns:

Type Description
Path

The path to the extracted 16 kHz mono WAV audio file
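For orientation, a command of the kind filter_audio describes could be assembled as below. This is a sketch under assumptions: build_filter_command is a hypothetical name, and the exact filter chain the module passes to FFMPEG is not documented here. The flags and the highpass, lowpass, and loudnorm filters themselves are standard FFMPEG features.

```python
def build_filter_command(ffmpeg_path, input_path, output_path, highpass=300, lowpass=3400):
    """Build an FFMPEG argument list for a band-passed, normalized 16 kHz mono WAV.

    Hypothetical helper; the real module may order or parameterize filters differently.
    """
    # highpass + lowpass together form the band-pass; loudnorm normalizes loudness.
    audio_filter = f"highpass=f={highpass},lowpass=f={lowpass},loudnorm"
    return [
        ffmpeg_path,
        "-y",                  # overwrite the output file if it exists
        "-i", str(input_path),
        "-vn",                 # drop any video stream
        "-af", audio_filter,
        "-ar", "16000",        # 16 kHz sample rate
        "-ac", "1",            # mono
        str(output_path),
    ]
```

The default 300 to 3400 Hz band matches the traditional telephone voice band, which is a common choice when the audio is headed for speech processing.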

get_ffmpeg_path()

Uses shutil.which to find the absolute path to the system's FFMPEG and FFPROBE executables, raising an exception when either one of them can't be found

Raises:

Type Description
MissingFFmpegError

If the FFMPEG executable couldn't be located

MissingFFprobeError

If the FFPROBE executable couldn't be located

Returns:

Type Description
dict[str, str]

Dictionary mapping the keys ffmpeg and ffprobe to their respective paths
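A lookup of this shape could be written as shown below. This is a sketch, assuming the exception classes are plain exception subclasses; the stand-in definitions and the find_ffmpeg_paths name are hypothetical, and the injectable which parameter exists only to make the example testable without FFMPEG installed.

```python
import shutil


class MissingFFmpegError(RuntimeError):
    """Stand-in for the module's exception of the same name."""


class MissingFFprobeError(RuntimeError):
    """Stand-in for the module's exception of the same name."""


def find_ffmpeg_paths(which=shutil.which):
    """Locate both executables, raising as soon as one is missing (hypothetical helper)."""
    ffmpeg = which("ffmpeg")
    if ffmpeg is None:
        raise MissingFFmpegError("ffmpeg was not found on PATH")
    ffprobe = which("ffprobe")
    if ffprobe is None:
        raise MissingFFprobeError("ffprobe was not found on PATH")
    return {"ffmpeg": ffmpeg, "ffprobe": ffprobe}
```

shutil.which returns None when nothing matches, which is why the sketch compares against None rather than catching an exception.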

run_command(command, check=True, cwd=None)

Wraps subprocess.run in order to log errors and results

Parameters:

Name Type Description Default
command list[str]

The command to execute, as a list of the program followed by its arguments

required
check bool

When True, raises an exception on subprocess error. Defaults to True

True
cwd str | PathLike[str] | None

Directory from which the command will be executed. Defaults to None (the current working directory of the running Python process)

None

Raises:

Type Description
CalledProcessError

When check=True and the process returns a non-zero exit code

Returns:

Type Description
CompletedProcess[str]

A subprocess.CompletedProcess object
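A wrapper along these lines might look like the sketch below. It assumes output is captured as text (consistent with the CompletedProcess[str] return type) and that logging uses the standard logging module; the run_logged name and the log messages are illustrative, not taken from the module.

```python
import logging
import subprocess
import sys

logger = logging.getLogger(__name__)


def run_logged(command, check=True, cwd=None):
    """Run a command, logging its outcome (hypothetical stand-in for run_command)."""
    try:
        result = subprocess.run(
            command, check=check, cwd=cwd, capture_output=True, text=True
        )
    except subprocess.CalledProcessError as exc:
        # Only reached when check=True and the exit code is non-zero.
        logger.error("Command %s failed with exit code %d: %s",
                     command, exc.returncode, exc.stderr)
        raise
    logger.debug("Command %s exited with code %d", command, result.returncode)
    return result


# Usage: run the current interpreter so the example doesn't depend on FFMPEG.
result = run_logged([sys.executable, "-c", "print('ok')"])
print(result.stdout.strip())
```

With check=False the non-zero exit code is left for the caller to inspect on the returned object instead of being raised.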

to_mp4(ffmpeg_path, input_path, output_path=None, resolution='1280x720', target_bitrate='2500k', preset=DEFAULT_CONVERSION_PRESET, use_gpu=False)

Converts any video supported by FFMPEG to an MP4 file using H.264 + AAC encoding

Hardware Acceleration
  • When use_gpu is True, the whole pipeline stays on the GPU

  • NVDEC decodes, scale_cuda scales, and NVENC (h264_nvenc) encodes, so no frame ever crosses the PCIe bus to system memory

  • Callers gate use_gpu on config.nvenc_available, since the data center compute GPUs (A100, H100, B200) have NVDEC but no NVENC

  • The GPU command still falls back to CPU libx264 if it fails for any other reason
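The fallback behaviour in the list above amounts to "try the GPU command, and on any failure run the CPU command instead". A minimal sketch of that control flow, assuming the two commands are already built as argument lists (encode_with_fallback and its run parameter are hypothetical):

```python
import logging
import subprocess

logger = logging.getLogger(__name__)


def _run_checked(command):
    # Raises subprocess.CalledProcessError on a non-zero exit code.
    subprocess.run(command, check=True)


def encode_with_fallback(gpu_command, cpu_command, use_gpu, run=_run_checked):
    """Return "gpu" or "cpu" depending on which command produced the output."""
    if use_gpu:
        try:
            run(gpu_command)
            return "gpu"
        except Exception as exc:  # the docs say "any reason", so catch broadly
            logger.warning("GPU encode failed (%s); retrying on CPU", exc)
    # Either use_gpu was False or the GPU attempt failed.
    run(cpu_command)
    return "cpu"
```

A failure of the CPU command is deliberately not caught, so the caller still sees the error when both paths fail.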

Geometry

The video is scaled to fit within resolution while preserving aspect ratio. It is not padded to a fixed canvas, so the output keeps the source's aspect ratio (the player letterboxes as needed)
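To make "fit within resolution" concrete, the output size can be computed as in the sketch below. Several points are assumptions rather than documented behaviour: the fit_within name, the refusal to upscale sources smaller than the box, and the rounding down to even dimensions (which 4:2:0 H.264 encoding generally needs). FFMPEG's scale filter offers force_original_aspect_ratio=decrease for a similar effect; whether the module uses it is not stated.

```python
def fit_within(src_w, src_h, resolution="1280x720"):
    """Largest size that fits inside WxH while keeping the source aspect ratio."""
    max_w, max_h = (int(part) for part in resolution.lower().split("x"))
    # Integer cross-multiplication avoids floating-point rounding surprises.
    if src_w * max_h >= src_h * max_w:
        # Source is relatively wider than the box: width is the limiting side.
        out_w = min(src_w, max_w)  # min() means small sources are not upscaled (assumption)
        out_h = src_h * out_w // src_w
    else:
        # Source is relatively taller than the box: height is the limiting side.
        out_h = min(src_h, max_h)
        out_w = src_w * out_h // src_h
    # Round down to even numbers, with a floor of 2 pixels.
    return max(2, out_w - out_w % 2), max(2, out_h - out_h % 2)
```

A 1920x1080 source maps to 1280x720, while a portrait 1080x1920 source maps to 404x720 and is left for the player to letterbox.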

Rate Control

preset sets the encoder speed and quality (-crf / -cq), while target_bitrate is applied as a ceiling (-maxrate + -bufsize), so the file targets a quality level but never exceeds the bitrate cap
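The quality-plus-ceiling scheme can be illustrated with the flag builder below. The flags are real FFMPEG options, but everything else is assumed: the quality numbers per preset, the buffer size of twice the cap, and the rate_control_args name are placeholders, and the encoder speed setting is left out for brevity.

```python
# Hypothetical quality values; the module's real preset table is not documented here.
QUALITY_BY_PRESET = {"performance": 28, "balanced": 23, "quality": 20}


def rate_control_args(preset, target_bitrate="2500k", use_gpu=False):
    """Return quality and bitrate-cap flags for libx264 (-crf) or h264_nvenc (-cq)."""
    if preset not in QUALITY_BY_PRESET:
        raise ValueError(f"unknown preset: {preset!r}")
    quality_flag = "-cq" if use_gpu else "-crf"
    # Assumes a bitrate written as digits plus a one-letter unit, e.g. "2500k".
    amount, unit = int(target_bitrate[:-1]), target_bitrate[-1]
    bufsize = f"{amount * 2}{unit}"  # 2x the cap is an arbitrary illustrative choice
    return [
        quality_flag, str(QUALITY_BY_PRESET[preset]),
        "-maxrate", target_bitrate,
        "-bufsize", bufsize,
    ]
```

The encoder aims for the quality level and only the -maxrate / -bufsize pair constrains it, so simple scenes can come out well under the cap.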

Parameters:

Name Type Description Default
ffmpeg_path str

Path to the system's FFMPEG executable

required
input_path str | PathLike[str]

Source file (any container/codec supported by FFMPEG)

required
output_path str | PathLike[str] | None

Path in which to save the MP4 file. When set to None, the file is saved to input_path with a .mp4 suffix

None
resolution str

Maximum output size as WxH. Aspect ratio is preserved. Defaults to 1280x720

'1280x720'
target_bitrate str

Video bitrate ceiling. Defaults to 2500k

'2500k'
preset str

One of performance, balanced, quality. Defaults to balanced

DEFAULT_CONVERSION_PRESET
use_gpu bool

When True, runs the fully on-device NVIDIA pipeline, falling back to CPU libx264 if it fails for any reason

False

Raises:

Type Description
FFmpegError

If any of the FFMPEG commands returned a non-zero exit code

ValueError

If input_path doesn't exist or is not a file, if an invalid resolution is provided, or if preset is unknown

Returns:

Type Description
Path

Path to the resulting MP4
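The output_path=None default can be read as "same location, extension swapped". A sketch of that reading, with default_output_path as a hypothetical helper; whether the module replaces the existing extension or appends to it is an assumption here.

```python
from pathlib import Path


def default_output_path(input_path, output_path=None, suffix=".mp4"):
    """Resolve the destination path, deriving it from input_path when none is given."""
    if output_path is not None:
        return Path(output_path)
    # with_suffix replaces the last extension: "clip.mkv" becomes "clip.mp4".
    return Path(input_path).with_suffix(suffix)
```

Under this reading, an input that already ends in .mp4 would map onto itself; how the module handles that case is not stated.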

to_wav(ffmpeg_path, input_path, output_path)

Converts a file to .wav format using FFMPEG

Parameters:

Name Type Description Default
ffmpeg_path str

Path to the system's FFMPEG executable

required
input_path str | PathLike[str]

Path to the file to convert

required
output_path str | PathLike[str]

Output file path

required

Raises:

Type Description
FFmpegError

If the FFMPEG command returned a non-zero exit code

ValueError

If input_path doesn't exist or is not a file

Returns:

Type Description
Path

The output path of the converted file

to_webm(ffmpeg_path, input_path, output_path=None, resolution='1280x720', target_bitrate='2500k')

Converts any video supported by FFMPEG to a WebM file using VP9 + Opus encoding

CPU-Only
  • VP9 has no NVENC encoder, so the encode is always CPU libvpx-vp9

  • Decoding on NVDEC then copying the frames back to system memory for the CPU encode buys nothing on a short clip, so there is no GPU path

Geometry

The video is scaled to fit within resolution while preserving aspect ratio (no fixed-canvas padding), so the output keeps the source's aspect ratio

Rate Control

This is used for saved clips, which are short and become Anki card media, so it encodes at high quality (-cpu-used 1, -crf 28) with target_bitrate applied as a constrained-quality ceiling (-b:v)
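Put together, the settings named above correspond to an FFMPEG invocation roughly like the one built below. This is a sketch: build_webm_command is a hypothetical name, the scaling filter is omitted, and the module's actual argument order and any extra flags are unknown. The individual options (libvpx-vp9, -cpu-used, -crf, -b:v, libopus) are standard FFMPEG ones.

```python
def build_webm_command(ffmpeg_path, input_path, output_path, target_bitrate="2500k"):
    """Build a CPU-only VP9 + Opus encode command (illustrative, scaling omitted)."""
    return [
        ffmpeg_path,
        "-y",                     # overwrite the output file if it exists
        "-i", str(input_path),
        "-c:v", "libvpx-vp9",     # VP9 video on the CPU
        "-cpu-used", "1",         # slower, higher-quality encoder setting
        "-crf", "28",             # quality target
        "-b:v", target_bitrate,   # with -crf, acts as the constrained-quality ceiling
        "-c:a", "libopus",        # Opus audio
        str(output_path),
    ]
```

In libvpx-vp9, combining -crf with a non-zero -b:v selects constrained-quality mode, which is why the bitrate here behaves as an upper bound and not a target.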

Parameters:

Name Type Description Default
ffmpeg_path str

Path to the system's FFMPEG executable

required
input_path str | PathLike[str]

Source file (any container/codec supported by FFMPEG)

required
output_path str | PathLike[str] | None

Path in which to save the WebM file. When set to None, the file is saved to input_path with a .webm suffix

None
resolution str

Maximum output size as WxH. Aspect ratio is preserved. Defaults to 1280x720

'1280x720'
target_bitrate str

Video bitrate ceiling. Defaults to 2500k

'2500k'

Raises:

Type Description
FFmpegError

If the FFMPEG command returned a non-zero exit code

ValueError

If input_path doesn't exist or is not a file, or if an invalid resolution is provided

Returns:

Type Description
Path

Path to the resulting WebM