Jpdict

`jpdict` ¶

`BundleMode` ¶

Bases: str, Enum

How aggressively the stitch function groups UniDic short-unit tokens into words

The modes trade granularity for a learner's needs, from whole dictionary words down to raw morphemes

words

The coarsest grouping
Merges compound nouns and the whole inflected tail of a predicate, including the connecting て, bound auxiliary verbs, and politeness, into single dictionary words (e.g. 図書館, 食べてみたかった)
Best for looking words up

grammar (default)

The learning view
Keeps compound nouns and a predicate's inflectional auxiliaries together, but breaks off the pieces a learner parses separately, like the connecting て, bound auxiliary verbs (みる/いる/出す), and the politeness stems (ます/です) (e.g. 食べ | て | みたかった, 読み | ました)

morphemes

The finest grouping
No stitching at all, one word per UniDic short unit (e.g. 図書 | 館, 読み | まし | た)
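
As a rough illustration of the three modes, the sketch below models `BundleMode` as a `str`-based `Enum` and pairs each mode with the grouping of 読みました implied by the descriptions above. The member names, their string values, and the `EXAMPLE_SEGMENTATION` table are assumptions made for illustration, not the library's actual definitions.

```python
from enum import Enum


class BundleMode(str, Enum):
    """Hypothetical stand-in; member names and values are assumed."""

    WORDS = "words"
    GRAMMAR = "grammar"
    MORPHEMES = "morphemes"


# Grouping of 読みました implied by each mode's description in this
# reference (illustrative data, not output from the real stitcher).
EXAMPLE_SEGMENTATION = {
    BundleMode.WORDS: ["読みました"],
    BundleMode.GRAMMAR: ["読み", "ました"],
    BundleMode.MORPHEMES: ["読み", "まし", "た"],
}

# Because the enum subclasses str, a plain string (for example a query
# parameter) converts directly into a member.
mode = BundleMode("grammar")
print(mode.value, EXAMPLE_SEGMENTATION[mode])
```

Whatever the mode, the pieces should always join back into the original text; only the boundaries move.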

`EnrichedJapaneseWord` ¶

Bases: BaseModel

A JapaneseWord paired with its dictionary data

Returned by the enrichment endpoint (one kotobase lookup per stitched word), as opposed to the fast tokenize endpoint which returns bare JapaneseWord models

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `word` | `JapaneseWord` | The stitched word | required |
| `kotobase_data` | `KotobaseData` | The dictionary data for the word's lemma | required |
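
The "one lookup per stitched word" behaviour can be pictured with the minimal sketch below. It uses plain dicts in place of the Pydantic models, and both `enrich` and `fake_lookup` are hypothetical names introduced here; the real endpoint and the kotobase call it makes are not shown in this reference.

```python
from typing import Callable


def enrich(words: list[dict], lookup: Callable[[str], dict]) -> list[dict]:
    """Pair each stitched word with the data returned for its lemma."""
    return [{"word": w, "kotobase_data": lookup(w["lemma"])} for w in words]


calls: list[str] = []


def fake_lookup(lemma: str) -> dict:
    """Stand-in for a dictionary lookup; records which lemmas were queried."""
    calls.append(lemma)
    return {"query": lemma, "meanings": []}


words = [
    {"surface": "読みました", "lemma": "読む"},
    {"surface": "図書館", "lemma": "図書館"},
]
enriched = enrich(words, fake_lookup)
```

The lookup is keyed on the lemma rather than the surface, so an inflected form such as 読みました is resolved through 読む.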

`JMEntry` ¶

Bases: BaseModel

Represents a single word entry in the Japanese-Multilingual Dictionary

rank

Kotobase calculates the rank attribute based on JMDict's <pri> tags. The following are the possible values and their meanings

0 → High-frequency words found across standard textbooks (ichi1) and newspapers (news1)
1-48 → The specific 500-word corpus interval the word belongs to (e.g., tier 5 means the word is within the top 2001-2500 most common words)
99 → Low-priority or niche words containing auxiliary tags
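
To make the tier arithmetic concrete, here is a small sketch that turns a `rank` value into a readable description. The helper name `describe_rank` and the choice to raise on any other value are assumptions for illustration; only the three value ranges come from the description above.

```python
def describe_rank(rank: int) -> str:
    """Turn a rank value into a human-readable description."""
    if rank == 0:
        return "high-frequency (ichi1 / news1)"
    if 1 <= rank <= 48:
        # Each tier covers 500 words: tier 1 is 1-500, tier 5 is 2001-2500.
        low = (rank - 1) * 500 + 1
        high = rank * 500
        return f"top {low}-{high} most common words"
    if rank == 99:
        return "low-priority or niche"
    # Assumption: values outside the documented ranges are treated as errors.
    raise ValueError(f"unexpected rank: {rank}")


print(describe_rank(5))
```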

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `rank` | `int` | A categorized numerical value mapping the word's real-world popularity | required |
| `kana` | `list[str] \| None` | List of kana readings | required |
| `kanji` | `list[str] \| None` | List of kanji written forms | required |
| `senses` | `list[WordSense] \| None` | List of `JMWordSense` models | required |

`JMNEntry` ¶

Bases: BaseModel

Represents a single name entry in the Japanese-Multilingual Dictionary

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `kana` | `list[str] \| None` | List of kana readings | required |
| `kanji` | `list[str] \| None` | List of kanji written forms | required |
| `translation_type` | `str \| None` | Type of name | required |
| `gloss` | `list[str] \| None` | List of translation strings | required |

`JMWordSense` ¶

Bases: BaseModel

Represents a single sense (a distinct meaning, translation, or nuance) of a word in the Japanese-Multilingual Dictionary

order

Represents the sequential arrangement of word meanings based on lexicographical hierarchy
Senses progress logically from primary, literal definitions to secondary, figurative, or technical nuances
This order is editorially curated and does not reflect mathematical usage frequency

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `order` | `int` | Editorial priority order | required |
| `pos` | `str` | Grammatical classifications like verb (v5u), noun (n), or adjective (adj-no) that apply to this specific meaning | required |
| `gloss` | `str` | The English equivalent of the word | required |
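
Since `order` encodes editorial priority, a consumer that wants the primary meaning can sort on it. The sketch below uses a plain dataclass as a stand-in for the Pydantic model; the `Sense` class, the `primary_gloss` helper, and the sample glosses are illustrative assumptions, not data returned by kotobase.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Sense:
    """Plain stand-in for JMWordSense (the real model is a Pydantic BaseModel)."""

    order: int
    pos: str
    gloss: str


def primary_gloss(senses: list[Sense]) -> Optional[str]:
    """Return the gloss of the sense with the lowest order, if any."""
    if not senses:
        return None
    return min(senses, key=lambda s: s.order).gloss


# Illustrative values only: a figurative sense listed before the literal one.
senses = [
    Sense(order=2, pos="adj-i", gloss="naive; overly optimistic"),
    Sense(order=1, pos="adj-i", gloss="sweet-tasting"),
]
print(primary_gloss(senses))
```

Sorting on `order` recovers the editorial sequence, which, as noted above, is not a frequency ranking.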

`JapaneseWord` ¶

Bases: BaseModel

Represents a single useful word stitched from one or more UniDic short-unit tokens

Stitching

UniDic segments at the short-unit level, which is often too granular to be useful (e.g. 図書館 -> 図書 + 館, or a verb split from its auxiliaries)
A JapaneseWord re-bundles those short units into the word a learner actually wants to click
The original short-unit Token models are kept in tokens so no morphological detail is lost

Example: the word built from 読み + まし + た

surface = "読みました" (the pieces joined as written)
reading = "ヨミマシタ" (their katakana readings joined)
lemma = "読む" (the dictionary form, for look-ups)
pos = "動詞" (verb -- the head piece's part of speech)
tokens = [読み, まし, た] (the three original short units)
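
The example above can be reproduced with a minimal sketch. It uses dataclasses as stand-ins for the Pydantic models, treats the first token as the head, and always takes the lemma from the head's `orth_base`; those choices, along with the names `MiniToken`, `MiniWord`, and `bundle`, are assumptions for illustration. The noun-compound case, where the lemma is the combined surface, is not handled here.

```python
from dataclasses import dataclass


@dataclass
class MiniToken:
    """Reduced stand-in for Token with only the fields used below."""

    surface: str
    reading: str
    pos: str
    orth_base: str


@dataclass
class MiniWord:
    """Reduced stand-in for JapaneseWord."""

    surface: str
    reading: str
    lemma: str
    pos: str
    tokens: list[MiniToken]


def bundle(tokens: list[MiniToken]) -> MiniWord:
    """Join short units into one word, treating the first token as the head."""
    head = tokens[0]
    return MiniWord(
        surface="".join(t.surface for t in tokens),
        reading="".join(t.reading for t in tokens),
        lemma=head.orth_base,  # assumption: inflected word, so use the head's base form
        pos=head.pos,
        tokens=list(tokens),  # keep the short units so no detail is lost
    )


word = bundle([
    MiniToken("読み", "ヨミ", "動詞", "読む"),
    MiniToken("まし", "マシ", "助動詞", "ます"),
    MiniToken("た", "タ", "助動詞", "た"),
])
print(word.surface, word.reading, word.lemma, word.pos)
```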

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `surface` | `str` | The bundle's combined surface form | required |
| `reading` | `str` | The combined katakana reading of the component tokens | required |
| `lemma` | `str` | Dictionary-lookup form (the head token's UniDic `orthBase` for inflected words, or the combined surface for noun compounds) | required |
| `pos` | `str` | The head token's top-level part of speech | required |
| `tokens` | `list[Token]` | The component short-unit tokens, in order | required |

`KanjiInfo` ¶

Bases: BaseModel

Represents a single Kanji entry in KANJIDIC2

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `literal` | `str` | Kanji literal | required |
| `grade` | `int \| None` | Optional Japanese grade in which Kanji is learned | required |
| `stroke_count` | `int \| None` | Number of strokes in handwriting | required |
| `meanings` | `list[str] \| None` | List of known meanings | required |
| `onyomi` | `list[str] \| None` | List of `on` readings | required |
| `kunyomi` | `list[str] \| None` | List of `kun` readings | required |
| `jlpt_kanjidic` | `int \| None` | Optional JLPT level present in `KANJIDIC2` | required |
| `jlpt_tanos` | `int \| None` | Optional JLPT level in `Tanos` list | required |

`KotobaseData` ¶

Bases: BaseModel

Represents all information extracted from kotobase for a single query (either a single Japanese word, or a wildcard pattern matching multiple words)

meanings

Exposes the gloss attributes (English equivalent of the word) of all JMWordSense models contained inside the first Japanese-Multilingual Dictionary entry for the query
If the query has only JMNEntry entries, the first entry's gloss attribute is used
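
The selection rule for `meanings` can be sketched as follows, using dataclasses in place of the Pydantic models. The names `SenseStub`, `EntryStub`, `NameEntryStub`, and `collect_meanings` are hypothetical, and returning an empty list when the query has no entries at all is an assumption, since that case is not described here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SenseStub:
    gloss: str


@dataclass
class EntryStub:
    senses: Optional[list[SenseStub]]


@dataclass
class NameEntryStub:
    gloss: Optional[list[str]]


def collect_meanings(
    jmentries: list[EntryStub], jmnentries: list[NameEntryStub]
) -> list[str]:
    """Glosses of the first word entry, else the first name entry's gloss."""
    if jmentries:
        return [s.gloss for s in (jmentries[0].senses or [])]
    if jmnentries:
        return list(jmnentries[0].gloss or [])
    return []  # assumption: no entries of either kind means no meanings


# Illustrative data only.
word_entries = [EntryStub([SenseStub("to read"), SenseStub("to count")])]
name_entries = [NameEntryStub(["Tanaka"])]
print(collect_meanings(word_entries, name_entries))
print(collect_meanings([], name_entries))
```

Only the first entry of each kind contributes, so later homographs do not appear in `meanings` even though they remain available in `jmentries`.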

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `query` | `str` | Query literal (either a single Japanese word or a wildcard pattern) | required |
| `jmentries` | `list[JMEntry]` | All Japanese-Multilingual Dictionary entries for the query | required |
| `jmnentries` | `list[JMNEntry]` | All Japanese-Multilingual Dictionary name entries for the query | required |
| `kanji` | `list[KanjiInfo]` | `KANJIDIC2` entries for all Kanji present in the query | required |
| `meanings` | `list[str]` | All English equivalents contained in the first `JMEntry`, or `JMNEntry` | required |
| `jlpt` | `str` | JLPT vocabulary level for the word extracted from the `Tanos` list. Defaults to `Unknown` when it's a wildcard query or the word is not in the list | required |
| `examples` | `list[str]` | List of example sentences containing the single word or any words matched by the wildcard query | required |

`Token` ¶

Bases: BaseModel

Represents morphological data extracted for a single Japanese token

Maps the core token features and the detailed UniDic morphological data produced by Fugashi, converting internal dictionary placeholders (such as asterisks) into clean Python values.

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| `surface` | `str` | The raw string exactly as it appears in the text. |
| `lemma` | `str` | The dictionary base form (語彙素) of the word. |
| `reading` | `str` | The standard reading of the token in Katakana. |
| `pos` | `str` | The broad, top-level part of speech (品詞). |
| `pos2` | `str` | Sub-category level 2 part of speech. |
| `pos3` | `str` | Sub-category level 3 part of speech. |
| `pos4` | `str` | Sub-category level 4 part of speech. |
| `c_type` | `str` | Conjugation type (活用型) if applicable. |
| `c_form` | `str` | Conjugation form (活用形) if applicable. |
| `l_form` | `str` | Lemma reading in Katakana. |
| `orth` | `str` | Orthographic surface representation. |
| `pron` | `str` | Actual pronunciation including long vowels. |
| `orth_base` | `str` | Base form using current orthography. |
| `pron_base` | `str` | Pronunciation of the base form. |
| `goshu` | `str` | Word origin type (語種) e.g., Native, Sino-Japanese. |
| `i_type` | `str` | Word-initial transformation type. |
| `i_form` | `str` | Word-initial transformation form. |
| `f_type` | `str` | Word-final transformation type. |
| `f_form` | `str` | Word-final transformation form. |

`clear_asterisks(data)` `classmethod` ¶

Cleans incoming dictionary fields by converting UniDic's "*" sentinel and any missing (None) feature to an empty string, since unknown / out-of-vocabulary tokens leave some features unset

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `data` | `dict` | Raw dictionary data containing morphological fields | required |

Returns:

| Name | Type | Description |
| --- | --- | --- |
| `dict` | `dict[str, Any]` | The modified dictionary with "*"/`None` values replaced by "" |
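
The cleaning step amounts to a single pass over the raw fields. The sketch below is a standalone function rather than the model's class method, and it returns a new dict instead of modifying the input; both are simplifying assumptions, and how the real method is wired into the model is not shown in this reference.

```python
from typing import Any


def clear_asterisks(data: dict[str, Any]) -> dict[str, Any]:
    """Replace UniDic's "*" sentinel and missing (None) values with ""."""
    return {
        key: "" if value is None or value == "*" else value
        for key, value in data.items()
    }


# An out-of-vocabulary token may leave some features unset.
raw = {"surface": "ググる", "c_type": "*", "pron": None, "pos": "動詞"}
print(clear_asterisks(raw))
```

Normalising both cases to an empty string lets every field keep the plain `str` type listed in the attribute table.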
