dict

This module defines the dict_router of the Mirumoji API.

Attributes:

Name         Type       Description
LOGGER       Logger     Module's logging object
dict_router  APIRouter  The FastAPI router object

analyze(sentence=Query(...), mode=Query(BundleMode.grammar)) async

Tokenizes a sentence and enriches every stitched word with dictionary data.

Slower than /dict/tokenize because it performs one dictionary lookup per word. Intended for on-demand analysis rather than bulk rendering.

Parameters:

Name      Type        Description                                  Default
sentence  str         The Japanese sentence to analyze             Query(...)
mode      BundleMode  How aggressively to group tokens into words  Query(BundleMode.grammar)

Returns:

Type                        Description
list[EnrichedJapaneseWord]  A list of EnrichedJapaneseWord models, one per stitched word

Raises:

Type           Description
FugashiError   If tokenization fails
KotobaseError  If a dictionary lookup fails
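As a rough client-side illustration, the helpers below call the analyze endpoint over plain HTTP using only the standard library. They assume the route is served at /dict/analyze (the path this page uses when comparing endpoints), that the server listens on a hypothetical http://localhost:8000, and that BundleMode.grammar is sent as the string "grammar"; none of these details are confirmed here.

```python
import json
import urllib.request
from urllib.parse import urlencode

# Hypothetical base URL; adjust to wherever the Mirumoji API is served.
BASE_URL = "http://localhost:8000"


def analyze_url(sentence: str, mode: str = "grammar", base_url: str = BASE_URL) -> str:
    """Build the GET URL for /dict/analyze.

    Assumes `mode` is serialized as the BundleMode value string (e.g. "grammar").
    urlencode percent-encodes the Japanese text as UTF-8.
    """
    query = urlencode({"sentence": sentence, "mode": mode})
    return f"{base_url}/dict/analyze?{query}"


def analyze(sentence: str, mode: str = "grammar") -> list[dict]:
    """Fetch the analysis and parse it as JSON: one object per stitched word."""
    with urllib.request.urlopen(analyze_url(sentence, mode)) as resp:
        return json.load(resp)
```

Because each stitched word triggers a dictionary lookup on the server, a client would typically call this for a single sentence the user asks about, not for every line it renders.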

query(word=Query(...), wildcard=Query(False)) async

Looks up dictionary data for a single word or a wildcard pattern.

Parameters:

Name      Type  Description                                                          Default
word      str   Word or wildcard pattern to look up                                  Query(...)
wildcard  bool  When True, treat word as a wildcard pattern matching multiple words  Query(False)

Returns:

Type          Description
KotobaseData  The KotobaseData for the query

Raises:

Type           Description
KotobaseError  If the lookup fails
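A minimal client sketch for the query endpoint, assuming it is served at /dict/query on a hypothetical http://localhost:8000 and that the KotobaseData response arrives as a JSON object. The wildcard flag is sent as lowercase "true"/"false", which FastAPI accepts for bool query parameters; the wildcard pattern syntax itself is not documented on this page, so the sketch passes the pattern through unchanged.

```python
import json
import urllib.request
from urllib.parse import urlencode

BASE_URL = "http://localhost:8000"  # hypothetical server address


def query_url(word: str, wildcard: bool = False, base_url: str = BASE_URL) -> str:
    """Build the GET URL for /dict/query.

    The boolean is rendered as "true"/"false" so it parses as a bool query param.
    """
    params = {"word": word, "wildcard": "true" if wildcard else "false"}
    return f"{base_url}/dict/query?{urlencode(params)}"


def query(word: str, wildcard: bool = False) -> dict:
    """Return the KotobaseData payload parsed from JSON (assumed to be an object)."""
    with urllib.request.urlopen(query_url(word, wildcard)) as resp:
        return json.load(resp)
```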

tokenize(sentence=Query(...), mode=Query(BundleMode.grammar)) async

Tokenizes a sentence into useful, stitched words without performing any dictionary lookups.

This is the fast path for rendering clickable text. To fetch dictionary data for a word, call /dict/analyze or /dict/query.

Parameters:

Name      Type        Description                                  Default
sentence  str         The Japanese sentence to tokenize            Query(...)
mode      BundleMode  How aggressively to group tokens into words  Query(BundleMode.grammar)

Returns:

Type                Description
list[JapaneseWord]  A list of JapaneseWord models, one per stitched word

Raises:

Type          Description
FugashiError  If tokenization fails
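The split between a fast tokenize call and on-demand dictionary lookups can be sketched on the client as "render first, look up on click". This is an illustrative pattern, not Mirumoji's client code: the base URL is hypothetical, `fetch_json` is a hypothetical injected function (URL to parsed JSON) so the sketch does not depend on an HTTP library, and which JapaneseWord field holds the clicked text is not documented here, so the caller passes that text in directly.

```python
from typing import Callable
from urllib.parse import urlencode

BASE_URL = "http://localhost:8000"  # hypothetical server address


def tokenize_url(sentence: str, mode: str = "grammar") -> str:
    # Fast path: a single request returns stitched words with no dictionary data.
    return f"{BASE_URL}/dict/tokenize?{urlencode({'sentence': sentence, 'mode': mode})}"


class LazyLookup:
    """Fetch dictionary data only when a word is clicked, at most once per word."""

    def __init__(self, fetch_json: Callable[[str], dict]) -> None:
        self._fetch_json = fetch_json  # hypothetical: URL -> parsed JSON
        self._cache: dict[str, dict] = {}

    def on_click(self, word_text: str) -> dict:
        # Slow path, deferred until needed; wildcard is left at its default (False).
        if word_text not in self._cache:
            url = f"{BASE_URL}/dict/query?{urlencode({'word': word_text})}"
            self._cache[word_text] = self._fetch_json(url)
        return self._cache[word_text]
```

Caching per word keeps repeated clicks on the same word from issuing repeated lookups, which matters because the lookup is the expensive part.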

tokenize_batch(req) async

Tokenizes many sentences in one request, returning stitched words without dictionary data.

Lets a client tokenize a whole subtitle file up front in a single call, so playback never has to tokenize cue by cue.

Parameters:

Name  Type                  Description                Default
req   TokenizeBatchRequest  The sentences to tokenize  required

Returns:

Type                      Description
list[list[JapaneseWord]]  One JapaneseWord list per input sentence, in the same order

Raises:

Type          Description
FugashiError  If tokenization fails
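A sketch of the subtitle workflow described above, with several loudly labeled assumptions: the route path /dict/tokenize_batch and the POST method are guesses (this page names the handler but not its route), the TokenizeBatchRequest body is assumed to be a JSON object with a hypothetical `sentences` list field, and the base URL is hypothetical. The pairing step relies only on the documented guarantee that results come back one list per sentence, in input order.

```python
import json
from urllib.request import Request

BASE_URL = "http://localhost:8000"  # hypothetical server address
BATCH_PATH = "/dict/tokenize_batch"  # hypothetical route for tokenize_batch


def build_batch_request(cues: list[str]) -> Request:
    # Hypothetical body shape: TokenizeBatchRequest assumed to expose `sentences`.
    body = json.dumps({"sentences": cues}, ensure_ascii=False).encode("utf-8")
    return Request(
        BASE_URL + BATCH_PATH,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def index_by_cue(cues: list[str], results: list[list[dict]]) -> dict[int, list[dict]]:
    """Map each cue index to its word list, relying on results preserving input order."""
    if len(results) != len(cues):
        raise ValueError("expected one word list per input sentence")
    return dict(enumerate(results))
```

A client would send the request once with `urllib.request.urlopen`, parse the body with `json.load`, and during playback read `words_by_cue[i]` for the active cue instead of tokenizing it.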