Models
models
Defines the SQLAlchemy 2.0 typed ORM schema for the kotobase database
Schema
- Preserves every field from the upstream sources
- Includes the JMdict part of speech, register, field, dialect, sense information and priority tags, plus furigana segmentation from jmdict_furigana
- Includes the JMnedict kanji, kana, and translation information, plus furigana segmentation from jmnedict_furigana
- Includes the full KanjiDic2 reading, meaning and reference set, radical decompositions from KRADFILE, stroke orders from KanjiVG, and pronunciation audio from kanjialive (separate database, attached when present)
- Includes Japanese example sentence / translation pairs and audio provenance from Tatoeba
- Includes grammar, kanji, and vocabulary information extracted from the Tanos JLPT Lists
Data Format
- List shaped, read-only values such as tag code lists, cross references and furigana segments are stored in JSON columns using the SQLite `json1` extension
- Anything that is searched or joined is normalized into its own table
- Full text search is provided by FTS5 virtual tables that the Build Pipeline creates at build time, so they are not declared here
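The split between JSON columns and normalized columns can be sketched with the standard library alone. This is an illustration, not the kotobase models: the table name `kanji_form`, its columns and the sample row are assumptions, and the list is decoded in Python rather than with `json1` functions to keep the example portable.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: the searched column (text) is a plain indexed column,
# while the read-only priority list is stored as a JSON document.
conn.execute(
    "CREATE TABLE kanji_form ("
    " id INTEGER PRIMARY KEY,"
    " entry_id INTEGER NOT NULL,"
    " text TEXT NOT NULL,"
    " priority TEXT NOT NULL DEFAULT '[]')"
)
conn.execute("CREATE INDEX ix_kanji_form_text ON kanji_form (text)")
# Hand-written sample row for illustration only.
conn.execute(
    "INSERT INTO kanji_form (entry_id, text, priority) VALUES (?, ?, ?)",
    (1, "食べる", json.dumps(["ichi1", "news1"])),
)

# The search goes through the normalized column; the list is decoded afterwards.
row = conn.execute(
    "SELECT entry_id, priority FROM kanji_form WHERE text = ?", ("食べる",)
).fetchone()
entry_id, priority = row[0], json.loads(row[1])
```

Keeping list values in one JSON cell avoids a join for data that is only ever read back whole, which is the trade-off the rules above describe.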
Versioning
- The schema is versioned by `db_meta['schema_version']`
- Bump `SCHEMA_VERSION` whenever the table layout changes so that a stale database can be detected
SCHEMA_VERSION = 1
module-attribute
Layout version stored in db_meta and checked by the read layer
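A stale database check along these lines might look like the sketch below. It assumes `db_meta` is a key/value table with text columns named `key` and `value`; the real column names and the helper `is_stale` are hypothetical.

```python
import sqlite3

SCHEMA_VERSION = 1  # mirrors the module attribute documented above


def is_stale(conn: sqlite3.Connection, expected: int = SCHEMA_VERSION) -> bool:
    """Return True when the stored schema version is missing or differs."""
    # Assumption: db_meta holds one row per key, with the value stored as text.
    row = conn.execute(
        "SELECT value FROM db_meta WHERE key = 'schema_version'"
    ).fetchone()
    return row is None or int(row[0]) != expected


# Build a throwaway database that matches the assumed layout.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE db_meta (key TEXT PRIMARY KEY, value TEXT)")
conn.execute("INSERT INTO db_meta VALUES ('schema_version', '1')")
```

A read layer could call such a check once when opening the file and refuse to continue when it returns True.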
Audio
Bases: Base
A pronunciation audio clip together with its provenance
Optional Audio Pack
- The `data` column is filled only in the optional audio pack
- In the core database, a row may instead carry a `url` that points at a remote clip, for example a Tatoeba recording
Attributes:
| Name | Type | Description |
|---|---|---|
| `id` | `int` | Primary key row identifier |
| `kind` | `str` | What the clip pronounces, such as |
| `key` | `str` | Lookup key for the clip, such as a kanji, a word or a sentence identifier |
| `reading` | `str \| None` | The reading the clip pronounces when relevant |
| `fmt` | `str \| None` | Audio container or codec such as |
| `sample_rate` | `int \| None` | Sample rate of the clip in hertz |
| `data` | `bytes \| None` | Raw audio bytes when bundled in the audio pack |
| `url` | `str \| None` | Remote location of the clip when it is not bundled |
| `source` | `str` | Name of the upstream source, such as |
| `license` | `str \| None` | License identifier for the clip |
| `attribution` | `str \| None` | Required attribution text or link |
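Because a clip may be bundled (`data`) or remote (`url`), a caller has to decide which one to use. The sketch below shows one way to do that with a plain dataclass standing in for the ORM row; `AudioRow` and `clip_source` are hypothetical names, and preferring bundled bytes is an assumption rather than documented behavior.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AudioRow:
    """Hypothetical plain stand-in for a row of the Audio model."""

    key: str
    data: Optional[bytes] = None
    url: Optional[str] = None


def clip_source(row: AudioRow) -> Optional[str]:
    """Report where the clip can be read from, preferring bundled bytes."""
    if row.data is not None:
        return "bundled"
    if row.url is not None:
        return "remote"
    return None


# Hand-written sample rows; the URL is a placeholder.
bundled = AudioRow(key="日", data=b"\x00\x01")
remote = AudioRow(key="12345", url="https://example.com/clip.mp3")
```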
Base
Bases: DeclarativeBase
Declarative base class shared by every kotobase ORM model
DbMeta
Furigana
Bases: Base
Furigana segmentation for a spelling and reading pair
Attributes:
| Name | Type | Description |
|---|---|---|
| `id` | `int` | Primary key row identifier |
| `text` | `str` | The written spelling, usually containing kanji |
| `reading` | `str` | The full kana reading of the spelling |
| `segments` | `list[dict]` | The JmdictFurigana segmentation, a list of |
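As an illustration of how a `segments` value could be consumed, the sketch below renders it as HTML ruby markup. It assumes each segment is a dict with a `ruby` key and an optional `rt` key, as in the upstream JmdictFurigana JSON release; whether kotobase stores exactly that shape is an assumption, and `to_ruby_html` is a hypothetical helper.

```python
from html import escape


def to_ruby_html(segments: list) -> str:
    """Render furigana segments as HTML, annotating only segments with a reading."""
    parts = []
    for seg in segments:
        base = escape(seg["ruby"])
        rt = seg.get("rt")  # absent for plain kana segments
        if rt:
            parts.append(f"<ruby>{base}<rt>{escape(rt)}</rt></ruby>")
        else:
            parts.append(base)
    return "".join(parts)


# Hand-written sample segmentation for illustration.
segments = [{"ruby": "食", "rt": "た"}, {"ruby": "べる"}]
markup = to_ruby_html(segments)
```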
JMDictEntry
Bases: Base
A single JMdict dictionary entry
Represents one ent_seq record from JMdict, which is the root of a Japanese to English entry. Its surface forms, readings and senses are attached through relationships
Attributes:
| Name | Type | Description |
|---|---|---|
| `id` | `int` | Primary key, the JMdict |
| `is_common` | `bool` | True when any form of the entry carries a priority marker that classifies it as common |
| `freq_rank` | `int \| None` | Frequency rank where a lower value is more frequent, or None when the entry has no priority information |
| `kanji` | `list[JMDictKanji]` | Ordered kanji surface forms of the entry |
| `kana` | `list[JMDictKana]` | Ordered kana readings of the entry |
| `senses` | `list[JMDictSense]` | Ordered senses, each holding its glosses |
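Since `freq_rank` may be None, any ordering of search results has to place unranked entries explicitly. The sketch below shows one possible sort key: common entries first, ranked before unranked, then lower rank first. The ordering policy, `EntryRow` and `rank_key` are assumptions for illustration, not kotobase behavior.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EntryRow:
    """Hypothetical plain stand-in for the scalar columns of JMDictEntry."""

    id: int
    is_common: bool
    freq_rank: Optional[int]


def rank_key(entry: EntryRow) -> tuple:
    # False sorts before True, so common and ranked entries come first.
    return (
        not entry.is_common,
        entry.freq_rank is None,
        entry.freq_rank or 0,
        entry.id,
    )


# Hand-written sample rows; ids and ranks are made up.
entries = [
    EntryRow(3, False, None),
    EntryRow(1, True, 120),
    EntryRow(2, True, 15),
    EntryRow(4, False, 900),
]
ordered_ids = [e.id for e in sorted(entries, key=rank_key)]
```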
JMDictGloss
Bases: Base
A single gloss (translation) belonging to a JMdict sense
Attributes:
| Name | Type | Description |
|---|---|---|
| `id` | `int` | Primary key row identifier |
| `sense_id` | `int` | Foreign key to |
| `position` | `int` | Zero based order of the gloss within the sense |
| `lang` | `str` | ISO 639 language code of the gloss, defaults to |
| `text` | `str` | The translated text |
| `gender` | `str \| None` | Grammatical gender of the gloss when given |
| `gtype` | `str \| None` | Gloss type such as |
| `sense` | `JMDictSense` | The owning sense |
JMDictKana
Bases: Base
A kana (reading) form of a JMdict entry
Attributes:
| Name | Type | Description |
|---|---|---|
| `id` | `int` | Primary key row identifier |
| `entry_id` | `int` | Foreign key to JMDictEntry |
| `position` | `int` | Zero based order of the reading within the entry |
| `text` | `str` | The kana reading |
| `is_common` | `bool` | True when this reading carries a common priority marker |
| `no_kanji` | `bool` | True when the reading is not a reading of any kanji form |
| `restrictions` | `list[str]` | Kanji forms this reading is restricted to, empty when the reading applies to all kanji forms |
| `info` | `list[str]` | Reading information tag codes such as |
| `priority` | `list[str]` | Priority code list such as |
| `entry` | `JMDictEntry` | The owning entry |
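The `no_kanji` and `restrictions` columns together decide which readings belong to a given kanji form. A minimal sketch of that rule, using plain dicts in place of ORM rows (`readings_for` and the sample rows are hypothetical):

```python
def readings_for(kanji_text: str, kana_rows: list) -> list:
    """Return the readings that apply to one kanji form of an entry."""
    result = []
    for row in kana_rows:
        if row["no_kanji"]:
            # The reading is not a reading of any kanji form.
            continue
        # An empty restriction list means the reading applies to every form.
        if not row["restrictions"] or kanji_text in row["restrictions"]:
            result.append(row["text"])
    return result


# Hand-written sample rows for illustration, not copied from JMdict.
kana_rows = [
    {"text": "あした", "no_kanji": False, "restrictions": []},
    {"text": "みょうにち", "no_kanji": False, "restrictions": ["明日"]},
    {"text": "アシタ", "no_kanji": True, "restrictions": []},
]
```

With these rows, `readings_for("明日", kana_rows)` keeps the first two readings, while any other form keeps only the unrestricted one.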
JMDictKanji
Bases: Base
A kanji (written) surface form of a JMdict entry
Attributes:
| Name | Type | Description |
|---|---|---|
| `id` | `int` | Primary key row identifier |
| `entry_id` | `int` | Foreign key to JMDictEntry |
| `position` | `int` | Zero based order of the form within the entry |
| `text` | `str` | The kanji spelling |
| `is_common` | `bool` | True when this form carries a common priority marker |
| `info` | `list[str]` | Spelling information tag codes such as |
| `priority` | `list[str]` | Priority code list such as |
| `entry` | `JMDictEntry` | The owning entry |
JMDictSense
Bases: Base
A sense (one meaning) of a JMdict entry
A sense groups one or more glosses that share the same part of speech and usage information
Misc Tags
The `misc` list carries register and slang markers such as `sl` (slang), `net-sl` (internet slang), `col` (colloquial) and `vulg` (vulgar)
Attributes:
| Name | Type | Description |
|---|---|---|
| `id` | `int` | Primary key row identifier |
| `entry_id` | `int` | Foreign key to |
| `position` | `int` | Zero based order of the sense within the entry |
| `pos` | `list[str]` | Part of speech tag codes such as |
| `field` | `list[str]` | Field of application tag codes such as |
| `misc` | `list[str]` | Miscellaneous register tag codes, see the note above |
| `dialect` | `list[str]` | Dialect tag codes such as |
| `info` | `list[str]` | Free text sense information notes |
| `xref` | `list[str]` | Cross reference targets to related entries |
| `antonym` | `list[str]` | Antonym references for the sense |
| `applies_to_kanji` | `list[str]` | Kanji forms the sense is restricted to, empty when it applies to all kanji forms |
| `applies_to_kana` | `list[str]` | Kana forms the sense is restricted to, empty when it applies to all kana forms |
| `lsource` | `list[dict]` | Source language records, each holding the language, text, type and a wasei flag for loanwords |
| `entry` | `JMDictEntry` | The owning entry |
| `glosses` | `list[JMDictGloss]` | Ordered glosses belonging to the sense |
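The register markers listed in the Misc Tags note can be picked out of a `misc` list with a small lookup. This sketch hard-codes only the four codes named above; `REGISTER_TAGS` and `register_labels` are hypothetical names, and a real lookup would more likely go through the tag table.

```python
# The four register codes named in the Misc Tags note, with their labels.
REGISTER_TAGS = {
    "sl": "slang",
    "net-sl": "internet slang",
    "col": "colloquial",
    "vulg": "vulgar",
}


def register_labels(misc: list) -> list:
    """Return labels for the register markers in a misc list, in input order."""
    return [REGISTER_TAGS[code] for code in misc if code in REGISTER_TAGS]


# Codes outside the four listed markers are ignored by this helper.
labels = register_labels(["uk", "net-sl", "vulg"])
```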
JMnedictEntry
Bases: Base
A JMnedict proper name entry
Attributes:
| Name | Type | Description |
|---|---|---|
| `id` | `int` | Primary key, the JMnedict sequence number |
| `kanji` | `list[JMnedictKanji]` | Kanji forms of the name |
| `kana` | `list[JMnedictKana]` | Kana forms of the name |
| `translations` | `list[JMnedictTranslation]` | Ordered translation blocks |
JMnedictGloss
Bases: Base
A single translated name belonging to a JMnedict translation block
Attributes:
| Name | Type | Description |
|---|---|---|
| `id` | `int` | Primary key row identifier |
| `translation_id` | `int` | Foreign key to |
| `position` | `int` | Zero based order of the gloss within the block |
| `lang` | `str` | ISO 639 language code of the gloss, defaults to |
| `text` | `str` | The translated name text |
| `translation` | `JMnedictTranslation` | The owning translation block |
JMnedictKana
Bases: Base
A kana form of a JMnedict entry
Attributes:
| Name | Type | Description |
|---|---|---|
| `id` | `int` | Primary key row identifier |
| `entry_id` | `int` | Foreign key to |
| `position` | `int` | Zero based order of the form within the entry |
| `text` | `str` | The kana reading of the name |
| `entry` | `JMnedictEntry` | The owning entry |
JMnedictKanji
Bases: Base
A kanji form of a JMnedict entry
Attributes:
| Name | Type | Description |
|---|---|---|
| `id` | `int` | Primary key row identifier |
| `entry_id` | `int` | Foreign key to |
| `position` | `int` | Zero based order of the form within the entry |
| `text` | `str` | The kanji spelling of the name |
| `entry` | `JMnedictEntry` | The owning entry |
JMnedictTranslation
Bases: Base
A translation block of a JMnedict entry
Each block records the kind of name and holds one or more translated glosses
Attributes:
| Name | Type | Description |
|---|---|---|
| `id` | `int` | Primary key row identifier |
| `entry_id` | `int` | Foreign key to JMnedictEntry |
| `position` | `int` | Zero based order of the block within the entry |
| `name_type` | `list[str]` | Name type tag codes such as |
| `xref` | `list[str]` | Cross reference targets to related entries |
| `entry` | `JMnedictEntry` | The owning entry |
| `glosses` | `list[JMnedictGloss]` | Ordered translated names in this block |
JlptGrammar
Bases: Base
A Tanos JLPT grammar point
Note
The `formation` and `examples` columns are kept for forward compatibility. The current Tanos data does not populate them, so they are normally empty
Attributes:
| Name | Type | Description |
|---|---|---|
| `id` | `int` | Primary key row identifier |
| `level` | `int` | JLPT level from 1 (hardest) to 5 (easiest) |
| `grammar` | `str` | The grammar point itself |
| `formation` | `str \| None` | How the grammar point is formed when known |
| `examples` | `list[str]` | Example sentences for the grammar point |
JlptKanji
Bases: Base
A Tanos JLPT kanji item
Attributes:
| Name | Type | Description |
|---|---|---|
| `id` | `int` | Primary key row identifier |
| `level` | `int` | JLPT level from 1 (hardest) to 5 (easiest) |
| `kanji` | `str` | The kanji character |
| `on_yomi` | `str \| None` | On readings, space separated |
| `kun_yomi` | `str \| None` | Kun readings, space separated |
| `meaning` | `str \| None` | The English meaning, with senses comma separated |
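Because the readings and meaning are stored as delimited strings rather than lists, a consumer has to split them. A minimal sketch, with hypothetical helper names and hand-written sample values:

```python
from typing import Optional


def split_readings(value: Optional[str]) -> list:
    """Split a space separated on_yomi or kun_yomi string into readings."""
    return value.split() if value else []


def split_meanings(value: Optional[str]) -> list:
    """Split a comma separated meaning string into trimmed senses."""
    if not value:
        return []
    return [part.strip() for part in value.split(",") if part.strip()]


# Hand-written sample values for illustration.
on_readings = split_readings("ニチ ジツ")
senses = split_meanings("day, sun")
```

Both helpers return an empty list for None, which matches the nullable column types above.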
JlptVocab
Bases: Base
A Tanos JLPT vocabulary item
Attributes:
| Name | Type | Description |
|---|---|---|
| `id` | `int` | Primary key row identifier |
| `level` | `int` | JLPT level from 1 (hardest) to 5 (easiest) |
| `word` | `str \| None` | The headword, written with kanji when one exists and falling back to the kana reading otherwise |
| `reading` | `str \| None` | The kana reading of the headword |
| `meaning` | `str \| None` | The English meaning, with senses comma separated |
Kanji
Bases: Base
A KanjiDic2 character and its scalar attributes
The repeating attributes of a character such as readings, meanings and references live in dedicated child tables that are reachable through the relationships below
Attributes:
| Name | Type | Description |
|---|---|---|
| `literal` | `str` | Primary key, the kanji character itself |
| `grade` | `int \| None` | School grade in which the kanji is taught |
| `stroke_count` | `int \| None` | Accepted stroke count |
| `freq` | `int \| None` | Newspaper frequency rank where a lower value is more frequent |
| `jlpt_old` | `int \| None` | Pre 2010 four level JLPT class from KanjiDic2 |
| `rad_classical` | `int \| None` | Classical (Kangxi) radical number |
| `rad_nelson` | `int \| None` | Nelson radical number when it differs |
| `stroke_miscounts` | `list[int]` | Alternative miscount stroke values |
| `readings` | `list[KanjiReading]` | On, kun and foreign readings |
| `meanings` | `list[KanjiMeaning]` | Meanings keyed by language |
| `nanori` | `list[KanjiNanori]` | Name only readings |
| `dic_refs` | `list[KanjiDicRef]` | External dictionary references |
| `query_codes` | `list[KanjiQueryCode]` | Lookup codes such as SKIP |
| `variants` | `list[KanjiVariant]` | Variant form references |
| `codepoints` | `list[KanjiCodepoint]` | Character encoding codepoints |
| `strokes` | `KanjiStrokes \| None` | KanjiVG stroke order data when present |
KanjiCodepoint
KanjiDicRef
Bases: Base
An external dictionary reference for a kanji
Attributes:
| Name | Type | Description |
|---|---|---|
| `id` | `int` | Primary key row identifier |
| `literal` | `str` | Foreign key to |
| `type` | `str` | Reference type such as |
| `value` | `str` | The reference value within that dictionary |
| `extra` | `dict \| None` | Extra metadata, for example volume and page for Morohashi references |
| `kanji` | `Kanji` | The owning kanji |
KanjiMeaning
Bases: Base
A meaning of a kanji in a given language
Attributes:
| Name | Type | Description |
|---|---|---|
| `id` | `int` | Primary key row identifier |
| `literal` | `str` | Foreign key to |
| `lang` | `str` | ISO 639 language code of the meaning, defaults to |
| `value` | `str` | The meaning text |
| `position` | `int` | Zero based order of the meaning for its language |
| `kanji` | `Kanji` | The owning kanji |
KanjiNanori
KanjiQueryCode
Bases: Base
A lookup query code for a kanji
Attributes:
| Name | Type | Description |
|---|---|---|
| `id` | `int` | Primary key row identifier |
| `literal` | `str` | Foreign key to |
| `type` | `str` | Code type such as |
| `value` | `str` | The code value |
| `skip_misclass` | `str \| None` | SKIP misclassification kind when present |
| `kanji` | `Kanji` | The owning kanji |
KanjiRadical
Bases: Base
A kanji to radical decomposition edge, taken from KRADFILE
Each row records that a kanji contains a given radical component. The pair of kanji and radical is unique
Attributes:
| Name | Type | Description |
|---|---|---|
| `id` | `int` | Primary key row identifier |
| `literal` | `str` | The kanji that contains the radical |
| `radical` | `str` | The radical component contained in the kanji |
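The edge layout lends itself to radical lookup: finding every kanji that contains all of a set of components. The sketch below uses the standard library `sqlite3` module with an assumed table name `kanji_radical` and hand-written sample rows; it relies on the unique (kanji, radical) pair so that a row count equals the number of matched radicals.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table mirroring the documented columns and uniqueness rule.
conn.execute(
    "CREATE TABLE kanji_radical ("
    " id INTEGER PRIMARY KEY,"
    " literal TEXT NOT NULL,"
    " radical TEXT NOT NULL,"
    " UNIQUE (literal, radical))"
)
# Hand-written sample edges for illustration.
conn.executemany(
    "INSERT INTO kanji_radical (literal, radical) VALUES (?, ?)",
    [("明", "日"), ("明", "月"), ("時", "日"), ("時", "寸"), ("時", "土")],
)


def kanji_with_all(conn: sqlite3.Connection, radicals: list) -> list:
    """Return kanji that contain every one of the given radicals."""
    wanted = sorted(set(radicals))  # duplicates would break the count check
    marks = ",".join("?" for _ in wanted)
    sql = (
        f"SELECT literal FROM kanji_radical WHERE radical IN ({marks}) "
        "GROUP BY literal HAVING COUNT(*) = ? ORDER BY literal"
    )
    return [row[0] for row in conn.execute(sql, (*wanted, len(wanted)))]
```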
KanjiReading
Bases: Base
A reading of a kanji
Attributes:
| Name | Type | Description |
|---|---|---|
| `id` | `int` | Primary key row identifier |
| `literal` | `str` | Foreign key to |
| `type` | `str` | Reading type such as |
| `value` | `str` | The reading text |
| `position` | `int` | Zero based order of the reading for its type |
| `kanji` | `Kanji` | The owning kanji |
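Since `position` is ordered per reading type, readings are naturally presented grouped by type. A minimal sketch over plain tuples follows; the type strings `ja_on` and `ja_kun` follow the KanjiDic2 `r_type` values and are an assumption about what kotobase stores, and `group_readings` is a hypothetical helper.

```python
from collections import defaultdict


def group_readings(rows: list) -> dict:
    """Group (type, value, position) tuples by type, ordered by position."""
    grouped = defaultdict(list)
    for rtype, value, _position in sorted(rows, key=lambda r: (r[0], r[2])):
        grouped[rtype].append(value)
    return dict(grouped)


# Hand-written sample rows, deliberately out of order.
rows = [
    ("ja_kun", "か", 1),
    ("ja_on", "ジツ", 1),
    ("ja_on", "ニチ", 0),
    ("ja_kun", "ひ", 0),
]
by_type = group_readings(rows)
```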
KanjiStrokes
Bases: Base
KanjiVG stroke order data for a kanji
Licensing
Provenance and licensing for KanjiVG are recorded once in `db_meta` rather than on every row to keep the table small
Attributes:
| Name | Type | Description |
|---|---|---|
| `literal` | `str` | Primary key and foreign key to |
| `stroke_count` | `int \| None` | Number of strokes in the diagram |
| `svg` | `str` | Serialized KanjiVG |
| `kanji` | `Kanji` | The owning kanji |
KanjiVariant
Radical
Sentence
Bases: Base
A Tatoeba sentence in a single language
A row is either a Japanese sentence or an English translation of one. The `lang` column distinguishes them
Attributes:
| Name | Type | Description |
|---|---|---|
| `id` | `int` | Primary key, the Tatoeba sentence identifier |
| `lang` | `str` | ISO 639 language code of the sentence |
| `text` | `str` | The sentence text |
SentenceLink
Tag
Bases: Base
An entity tag code and its human readable description
Populated from the JMdict and JMnedict `<!ENTITY>` definitions so that codes such as `sl` (slang) or `ksb` (Kansai dialect) can be expanded to text
Attributes:
| Name | Type | Description |
|---|---|---|
| `code` | `str` | Primary key part, the tag code as it appears in the source |
| `category` | `str` | Primary key part, the tag family such as |
| `description` | `str` | Human readable description of the tag |
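Expanding the codes stored in the JSON list columns then amounts to a lookup on the composite key. The sketch below uses `sqlite3` with an assumed table name `tag`; the category values `misc` and `dialect` are guesses for illustration, and `expand` is a hypothetical helper that leaves unknown codes unchanged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table mirroring the documented composite primary key.
conn.execute(
    "CREATE TABLE tag ("
    " code TEXT NOT NULL,"
    " category TEXT NOT NULL,"
    " description TEXT NOT NULL,"
    " PRIMARY KEY (code, category))"
)
# Sample rows; the category names are assumptions.
conn.executemany(
    "INSERT INTO tag VALUES (?, ?, ?)",
    [("sl", "misc", "slang"), ("ksb", "dialect", "Kansai dialect")],
)


def expand(conn: sqlite3.Connection, codes: list, category: str) -> list:
    """Replace each known code with its description, keeping unknown codes."""
    expanded = []
    for code in codes:
        row = conn.execute(
            "SELECT description FROM tag WHERE code = ? AND category = ?",
            (code, category),
        ).fetchone()
        expanded.append(row[0] if row else code)
    return expanded
```

Keying on both code and category lets the same code carry different descriptions in different tag families.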