Jpdict
jpdict
BundleMode
How aggressively the stitch function groups UniDic short-unit tokens into words

The modes trade granularity for a learner's needs, from whole dictionary words down to raw morphemes

`words`
- The coarsest grouping
- Merges compound nouns and the whole inflected tail of a predicate, including the connecting て, bound auxiliary verbs, and politeness, into single dictionary words (e.g. 図書館, 食べてみたかった)
- Best for looking words up

`grammar` (default)
- The learning view
- Keeps compound nouns and a predicate's inflectional auxiliaries together, but breaks off the pieces a learner parses separately, like the connecting て, bound auxiliary verbs (みる/いる/出す), and the politeness stems (ます/です) (e.g. 食べ | て | みたかった, 読み | ました)

`morphemes`
- The finest grouping
- No stitching at all, one word per UniDic short unit (e.g. 図書 | 館, 読み | まし | た)
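
As a rough illustration of the three modes, the sketch below models them as a string-valued enum and records the segmentations quoted above. The member names, the string values, `DEFAULT_MODE`, and the `split_example` helper are assumptions made for this example; only the mode names and the example strings come from this page.

```python
from enum import Enum


class BundleMode(str, Enum):
    """Hypothetical stand-in for jpdict's BundleMode (values are assumed)."""

    WORDS = "words"          # coarsest: whole dictionary words
    GRAMMAR = "grammar"      # the learning view, documented as the default
    MORPHEMES = "morphemes"  # finest: one word per UniDic short unit


# Assumed name for the documented default.
DEFAULT_MODE = BundleMode.GRAMMAR

# Segmentations copied from the descriptions above, " | " marks a word boundary.
DOCUMENTED_EXAMPLES = {
    BundleMode.WORDS: ["図書館", "食べてみたかった"],
    BundleMode.GRAMMAR: ["食べ | て | みたかった", "読み | ました"],
    BundleMode.MORPHEMES: ["図書 | 館", "読み | まし | た"],
}


def split_example(example: str) -> list[str]:
    """Split a documented example into its individual words."""
    return [piece.strip() for piece in example.split("|")]


for mode, examples in DOCUMENTED_EXAMPLES.items():
    for example in examples:
        print(mode.value, split_example(example))
```

The same predicate yields one word under `words` and three under `grammar`, which is the granularity trade-off the modes are meant to expose.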
EnrichedJapaneseWord
Bases: BaseModel
A JapaneseWord paired with its dictionary data
Returned by the enrichment endpoint (one kotobase lookup per stitched
word), as opposed to the fast tokenize endpoint which returns bare
JapaneseWord models
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `word` | `JapaneseWord` | The stitched word | required |
| `kotobase_data` | `KotobaseData` | The dictionary data for the word's lemma | required |
JMEntry
Bases: BaseModel
Represents a single word entry in the Japanese-Multilingual Dictionary
rank
Kotobase calculates the `rank` attribute from JMDict's `<pri>` tags. The possible values and their meanings are:
- `0` → High-frequency words found across standard textbooks (`ichi1`) and newspapers (`news1`)
- `1`-`48` → The specific 500-word corpus interval the word belongs to (e.g., tier 5 means the word is within the top 2001-2500 most common words)
- `99` → Low-priority or niche words containing auxiliary tags
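
The tier arithmetic can be made concrete with a small helper. This is only a sketch: `rank_interval` and `describe_rank` are hypothetical names, not part of jpdict or Kotobase, and the code covers just the three documented value ranges.

```python
from typing import Optional


def rank_interval(rank: int) -> Optional[tuple[int, int]]:
    """Return the (first, last) frequency positions covered by a tier.

    Only tiers 1-48 map to a 500-word interval, other values return None.
    """
    if 1 <= rank <= 48:
        return ((rank - 1) * 500 + 1, rank * 500)
    return None


def describe_rank(rank: int) -> str:
    """Turn a rank value into a short human-readable label."""
    if rank == 0:
        return "high-frequency (ichi1 / news1)"
    interval = rank_interval(rank)
    if interval is not None:
        first, last = interval
        return f"top {first}-{last} most common words"
    if rank == 99:
        return "low-priority or niche"
    return "unknown rank value"


print(describe_rank(5))  # top 2001-2500 most common words
```

Lower values mean more common words, so sorting entries by `rank` in ascending order puts the most frequent ones first.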
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `rank` | `int` | A categorized numerical value mapping the word's real-world popularity | required |
| `kana` | `list[str] \| None` | List of kana readings | required |
| `kanji` | `list[str] \| None` | List of kanji forms | required |
| `senses` | `list[WordSense] \| None` | List of | required |
JMNEntry
Bases: BaseModel
Represents a single name entry in the Japanese Multi-Lingual
Dictionary
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `kana` | `list[str] \| None` | List of kana readings | required |
| `kanji` | `list[str] \| None` | List of kanji forms | required |
| `translation_type` | `str \| None` | Type of name | required |
| `gloss` | `list[str] \| None` | List of translation strings | required |
JMWordSense
Bases: BaseModel
Represents a single sense (a distinct meaning, translation, or nuance) of a word in the Japanese-Multilingual Dictionary

order
- Represents the sequential arrangement of word meanings based on lexicographical hierarchy
- Senses progress logically from primary, literal definitions to secondary, figurative, or technical nuances
- This order is editorially curated and does not reflect measured usage frequency
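
A minimal sketch of how a caller might use `order` to show the primary sense first. The dataclass is a stand-in for the real model, `primary_first` is a hypothetical helper, the glosses are illustrative sample data, and the assumption that a lower `order` means an earlier sense is not confirmed by this page.

```python
from dataclasses import dataclass


@dataclass
class JMWordSense:
    """Stand-in for the documented model, with the same three fields."""

    order: int
    pos: str
    gloss: str


def primary_first(senses: list[JMWordSense]) -> list[JMWordSense]:
    """Sort senses by editorial order (assumes lower values come first)."""
    return sorted(senses, key=lambda sense: sense.order)


# Illustrative sample data, deliberately out of order.
senses = [
    JMWordSense(order=2, pos="v5m", gloss="to recite"),
    JMWordSense(order=1, pos="v5m", gloss="to read"),
]

for sense in primary_first(senses):
    print(sense.order, sense.pos, sense.gloss)
```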
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `order` | `int` | Editorial priority order | required |
| `pos` | `str` | Grammatical classifications like verb (`v5u`), noun (`n`), or adjective (`adj-no`) that apply to this specific meaning | required |
| `gloss` | `str` | The English equivalent of the word | required |
JapaneseWord
Bases: BaseModel
Represents a single useful word stitched from one or more UniDic short-unit tokens
Stitching
- UniDic segments at the short-unit level, which is often too granular to be useful (e.g. 図書館 -> 図書 + 館, or a verb split from its auxiliaries)
- A `JapaneseWord` re-bundles those short units into the word a learner actually wants to click
- The original short-unit `Token` models are kept in `tokens` so no morphological detail is lost

Example: the word built from 読み + まし + た
- surface = "読みました" (the pieces joined as written)
- reading = "ヨミマシタ" (their katakana readings joined)
- lemma = "読む" (the dictionary form, for look-ups)
- pos = "動詞" (verb -- the head piece's part of speech)
- tokens = [読み, まし, た] (the three original short units)
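
The field values listed above can be reproduced with a small sketch. It uses plain dataclasses instead of the real Pydantic models, keeps only four of the `Token` fields, and treats the first token as the head. The `bundle` helper and the first-token-is-head rule are assumptions for illustration, not the library's actual stitch function.

```python
from dataclasses import dataclass


@dataclass
class Token:
    """Reduced stand-in for the documented Token model."""

    surface: str
    reading: str
    lemma: str
    pos: str


@dataclass
class JapaneseWord:
    """Stand-in for the documented JapaneseWord model."""

    surface: str
    reading: str
    lemma: str
    pos: str
    tokens: list[Token]


def bundle(tokens: list[Token]) -> JapaneseWord:
    """Join short units into one word (assumes the first token is the head)."""
    head = tokens[0]
    return JapaneseWord(
        surface="".join(t.surface for t in tokens),  # pieces joined as written
        reading="".join(t.reading for t in tokens),  # katakana readings joined
        lemma=head.lemma,                            # dictionary form of the head
        pos=head.pos,                                # head's part of speech
        tokens=list(tokens),                         # original short units kept
    )


word = bundle([
    Token("読み", "ヨミ", "読む", "動詞"),
    Token("まし", "マシ", "ます", "助動詞"),
    Token("た", "タ", "た", "助動詞"),
])
print(word.surface, word.reading, word.lemma, word.pos)
```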
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `surface` | `str` | The bundle's combined surface form | required |
| `reading` | `str` | The combined katakana reading of the component tokens | required |
| `lemma` | `str` | Dictionary-lookup form (the head token's UniDic's | required |
| `pos` | `str` | The head token's top-level part of speech | required |
| `tokens` | `list[Token]` | The component short-unit tokens, in order | required |
KanjiInfo
Bases: BaseModel
Represents a single Kanji entry in KANJIDIC2
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `literal` | `str` | Kanji literal | required |
| `grade` | `int \| None` | Optional Japanese grade in which Kanji is learned | required |
| `stroke_count` | `int \| None` | Number of strokes in handwriting | required |
| `meanings` | `list[str] \| None` | List of known meanings | required |
| `onyomi` | `list[str] \| None` | List of | required |
| `kunyomi` | `list[str] \| None` | List of | required |
| `jlpt_kanjidic` | `int \| None` | Optional JLPT level present in | required |
| `jlpt_tanos` | `int \| None` | Optional JLPT level in | required |
KotobaseData
Bases: BaseModel
Represents all information extracted from kotobase for a single
query (either a single Japanese word, or a wildcard pattern matching
multiple words)
meanings
- Exposes the `gloss` attributes (English equivalent of the word) of all `JMWordSense` models contained inside the first Japanese-Multilingual Dictionary entry for the query
- If the query has only `JMNEntry` entries, the first entry's `gloss` attribute is used
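
The fallback rule for `meanings` can be sketched as follows. The dataclasses are reduced stand-ins for the real models, and `derive_meanings` is a hypothetical helper that mirrors the two bullets above rather than Kotobase's actual implementation. The sample glosses are illustrative.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class JMWordSense:
    order: int
    pos: str
    gloss: str


@dataclass
class JMEntry:
    senses: Optional[list[JMWordSense]] = None


@dataclass
class JMNEntry:
    gloss: Optional[list[str]] = None


def derive_meanings(
    jmentries: list[JMEntry], jmnentries: list[JMNEntry]
) -> list[str]:
    """Collect glosses from the first word entry, else the first name entry."""
    if jmentries:
        return [sense.gloss for sense in (jmentries[0].senses or [])]
    if jmnentries:
        return list(jmnentries[0].gloss or [])
    return []


word_entries = [
    JMEntry(senses=[
        JMWordSense(1, "v5m", "to read"),
        JMWordSense(2, "v5m", "to recite"),
    ])
]
print(derive_meanings(word_entries, []))
print(derive_meanings([], [JMNEntry(gloss=["Tanaka"])]))
```

Only the first entry is consulted in either branch, so later entries never contribute to `meanings` in this sketch.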
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `query` | `str` | Query literal (either a single Japanese word or a wildcard pattern) | required |
| `jmentries` | `list[JMEntry]` | All Japanese-Multilingual Dictionary entries for the query | required |
| `jmnentries` | `list[JMNEntry]` | All | required |
| `kanji` | `list[KanjiInfo]` | | required |
| `meanings` | `list[str]` | All English equivalents contained in the first | required |
| `jlpt` | `str` | JLPT vocabulary level for the word extracted from the | required |
| `examples` | `list[str]` | List of example sentences containing the single word or any words matched by the wildcard query | required |
Token
Bases: BaseModel
Represents morphological data extracted for a single Japanese token
Maps all core token features and deep UniDic morphological data produced by Fugashi. Converts internal dictionary symbols (like asterisks) into clean Pythonic types.
Attributes:
| Name | Type | Description |
|---|---|---|
| `surface` | `str` | The raw string exactly as it appears in the text. |
| `lemma` | `str` | The dictionary base form (語彙素) of the word. |
| `reading` | `str` | The standard reading of the token in Katakana. |
| `pos` | `str` | The broad, top-level part of speech (品詞). |
| `pos2` | `str` | Sub-category level 2 part of speech. |
| `pos3` | `str` | Sub-category level 3 part of speech. |
| `pos4` | `str` | Sub-category level 4 part of speech. |
| `c_type` | `str` | Conjugation type (活用型) if applicable. |
| `c_form` | `str` | Conjugation form (活用形) if applicable. |
| `l_form` | `str` | Lemma reading in Katakana. |
| `orth` | `str` | Orthographic surface representation. |
| `pron` | `str` | Actual pronunciation including long vowels. |
| `orth_base` | `str` | Base form using current orthography. |
| `pron_base` | `str` | Pronunciation of the base form. |
| `goshu` | `str` | Word origin type (語種) e.g., Native, Sino-Japanese. |
| `i_type` | `str` | Word-initial transformation type. |
| `i_form` | `str` | Word-initial transformation form. |
| `f_type` | `str` | Word-final transformation type. |
| `f_form` | `str` | Word-final transformation form. |
clear_asterisks(data)
classmethod
Cleans incoming dictionary fields by converting UniDic's "*" sentinel
and any missing (None) feature to an empty string, since unknown /
out-of-vocabulary tokens leave some features unset
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `data` | `dict` | Raw dictionary data containing morphological fields | required |
Returns:
| Name | Type | Description |
|---|---|---|
| `dict` | `dict[str, Any]` | The modified dictionary with "*"/ |
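
The cleaning step described for `clear_asterisks` can be sketched as a plain function. On the real model it is a classmethod, presumably wired in as a validator that runs before field parsing, but that wiring is an assumption and is left out here. This version returns a new dictionary rather than modifying the input, which may differ from the actual method.

```python
from typing import Any


def clear_asterisks(data: dict[str, Any]) -> dict[str, Any]:
    """Replace UniDic's "*" sentinel and missing (None) features with ""."""
    return {
        key: "" if value is None or value == "*" else value
        for key, value in data.items()
    }


# An out-of-vocabulary style token: some features unset, some marked "*".
raw = {"surface": "猫", "c_type": "*", "c_form": "*", "pron": None}
print(clear_asterisks(raw))
```

Normalizing both cases to an empty string lets every field keep the plain `str` type shown in the attributes table.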