Config
config
Defines a registry for upstream sources and file-system constants for
kotobase's Build Pipeline
Data Storage + Acquisition
- This module is the single source of truth for where raw data comes from and where build artifacts are written
- Every upstream source is described by a Source record in SOURCES
- Large, regenerable artifacts like raw downloads, the compiled database, and the optional audio pack live in a per-user cache directory (platformdirs) rather than inside the package
- The cache location can be overridden with the KOTOBASE_CACHE_DIR environment variable
- The only data shipped inside the wheel is the small, already-processed Tanos JLPT JSON files under kotobase.data
AUDIO_ASSET
module-attribute
Defines the file name of the zstandard compressed audio database released as
an asset in RELEASE_REPO
AUDIO_DB_FILENAME
module-attribute
Defines the file name of the optional audio database within the cache directory
DB_ASSET
module-attribute
Defines the file name of the zstandard compressed core database released
as an asset in RELEASE_REPO
DB_FILENAME
module-attribute
Defines the file name of the core database within the cache directory
DEFAULT_CACHE_DIR
module-attribute
The default directory in which the databases are stored when ENV_CACHE_DIR is not set
ENV_CACHE_DIR
module-attribute
Defines the name of the environment variable that overrides the default cache directory in which the databases are stored
HOST_STORAGE
module-attribute
Filesystem locations used to store the kotobase databases
All user-writable paths derive from platformdirs and are keyed by the app's
name
JLPT_KINDS
module-attribute
The three kinds of processed Tanos JLPT lists shipped with the package in
JSON format
JLPT_LEVELS
module-attribute
The five JLPT levels, where 1 is hardest and 5 is easiest
RELEASE_REPO
module-attribute
Defines the GitHub repository which hosts the pre-built database assets
SOURCES
module-attribute
SOURCES = {
    "jmdict": Source(
        key="jmdict",
        license="CC-BY-SA-4.0",
        homepage="https://www.edrdg.org/edrdg/licence.html",
        url="http://ftp.edrdg.org/pub/Nihongo/JMdict_e.gz",
    ),
    "jmnedict": Source(
        key="jmnedict",
        license="CC-BY-SA-4.0",
        homepage="https://www.edrdg.org/edrdg/licence.html",
        url="http://ftp.edrdg.org/pub/Nihongo/JMnedict.xml.gz",
    ),
    "kanjidic2": Source(
        key="kanjidic2",
        license="CC-BY-SA-4.0",
        homepage="https://www.edrdg.org/edrdg/licence.html",
        url="http://www.edrdg.org/kanjidic/kanjidic2.xml.gz",
    ),
    "kradzip": Source(
        key="kradzip",
        license="CC-BY-SA-4.0",
        homepage="https://www.edrdg.org/edrdg/licence.html",
        url="http://ftp.edrdg.org/pub/Nihongo/kradzip.zip",
    ),
    "kanjivg": Source(
        key="kanjivg",
        license="CC-BY-SA-3.0",
        homepage="https://kanjivg.tagaini.net/",
        github_repo="KanjiVG/kanjivg",
        asset_pattern="kanjivg-\\d+\\.xml\\.gz",
    ),
    "jmdict_furigana": Source(
        key="jmdict_furigana",
        license="CC-BY-SA-4.0",
        homepage="https://github.com/Doublevil/JmdictFurigana",
        github_repo="Doublevil/JmdictFurigana",
        asset="JmdictFurigana.json.tar.gz",
    ),
    "jmnedict_furigana": Source(
        key="jmnedict_furigana",
        license="CC-BY-SA-4.0",
        homepage="https://github.com/Doublevil/JmdictFurigana",
        github_repo="Doublevil/JmdictFurigana",
        asset="JmnedictFurigana.json.tar.gz",
        optional=True,
    ),
    "tatoeba_jpn": Source(
        key="tatoeba_jpn",
        license="CC-BY-2.0-FR",
        homepage="https://tatoeba.org/eng/terms_of_use",
        url="https://downloads.tatoeba.org/exports/per_language/jpn/jpn_sentences.tsv.bz2",
    ),
    "tatoeba_eng": Source(
        key="tatoeba_eng",
        license="CC-BY-2.0-FR",
        homepage="https://tatoeba.org/eng/terms_of_use",
        url="https://downloads.tatoeba.org/exports/per_language/eng/eng_sentences.tsv.bz2",
    ),
    "tatoeba_links": Source(
        key="tatoeba_links",
        license="CC-BY-2.0-FR",
        homepage="https://tatoeba.org/eng/terms_of_use",
        url="https://downloads.tatoeba.org/exports/links.tar.bz2",
    ),
    "tatoeba_audio": Source(
        key="tatoeba_audio",
        license="CC-BY-2.0-FR",
        homepage="https://tatoeba.org/eng/terms_of_use",
        url="https://downloads.tatoeba.org/exports/sentences_with_audio.tar.bz2",
        optional=True,
    ),
    "kanjialive": Source(
        key="kanjialive",
        license="CC-BY-4.0",
        homepage="https://kanjialive.com/",
        url="https://media.kanjialive.com/examples_audio/audio-mp3.zip",
        optional=True,
    ),
    "kanjialive_data": Source(
        key="kanjialive_data",
        license="CC-BY-4.0",
        homepage="https://kanjialive.com/",
        url="https://raw.githubusercontent.com/kanjialive/kanji-data-media/master/language-data/ka_data.csv",
        optional=True,
    ),
}
A dictionary mapping each upstream source's name to its Source definition
Includes all upstream sources used to build the kotobase database
USER_AGENT
module-attribute
The User-Agent used for all HTTP requests made by the kotobase package
Source
dataclass
Represents a single upstream data source
Acquisition
- A source is fetched either from a direct url or, for projects that publish GitHub releases, from the latest release of github_repo
- In the release case, the asset is selected by the exact asset name, or by the asset_pattern regular expression
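The selection rule above can be sketched as a small standalone function. This is only an illustration, not kotobase's own code: `select_asset` is a hypothetical helper, the release file names are invented, and requiring the pattern to match the whole file name (`fullmatch`) is an assumption.

```python
import re
from typing import Iterable, Optional


def select_asset(
    asset_names: Iterable[str],
    asset: Optional[str] = None,
    asset_pattern: Optional[str] = None,
) -> Optional[str]:
    """Pick a release asset by exact name, else by regular expression."""
    names = list(asset_names)
    if asset is not None:
        # Exact match on the published file name.
        return asset if asset in names else None
    if asset_pattern is not None:
        # Assumption: the pattern has to match the entire file name.
        regex = re.compile(asset_pattern)
        for name in names:
            if regex.fullmatch(name):
                return name
    return None


# Invented asset names standing in for a GitHub release listing.
release = ["kanjivg-20240807.xml.gz", "kanjivg-20240807-all.zip"]
print(select_asset(release, asset_pattern=r"kanjivg-\d+\.xml\.gz"))
```

A pattern is useful for sources such as kanjivg, whose asset name carries a date, while a fixed name like JmdictFurigana.json.tar.gz can be matched exactly.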
Attributes:
| Name | Type | Description |
|---|---|---|
| key | str | Stable identifier used as the registry key and cache file stem |
| license | str | SPDX style license identifier for the source data |
| homepage | str | Canonical homepage or repository for attribution |
| url | str \| None | Direct download URL when the source is a plain file |
| github_repo | str \| None | GitHub |
| asset | str \| None | Exact release asset filename to download |
| asset_pattern | str \| None | Regular expression matching the release asset filename when the name carries a version or date |
| optional | bool | True when the source may be skipped without failing a core build, for example the audio sources |
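As a rough picture of the record described by this table, the sketch below declares a dataclass with the same field names and types. The defaults (`None`, `False`) and the `frozen=True` flag are assumptions inferred from how the SOURCES entries omit fields, and `acquisition_method` is a hypothetical helper, not part of the documented API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)  # frozen is an assumption, not confirmed by the docs
class Source:
    key: str
    license: str
    homepage: str
    url: Optional[str] = None            # assumed default
    github_repo: Optional[str] = None    # assumed default
    asset: Optional[str] = None          # assumed default
    asset_pattern: Optional[str] = None  # assumed default
    optional: bool = False               # assumed default


def acquisition_method(source: Source) -> str:
    """Hypothetical helper: report how a source would be fetched."""
    if source.url is not None:
        return "direct-url"
    if source.github_repo is not None:
        return "github-release"
    raise ValueError(f"{source.key}: neither url nor github_repo is set")


# Two entries copied from the SOURCES registry.
kanjivg = Source(
    key="kanjivg",
    license="CC-BY-SA-3.0",
    homepage="https://kanjivg.tagaini.net/",
    github_repo="KanjiVG/kanjivg",
    asset_pattern=r"kanjivg-\d+\.xml\.gz",
)
kanjialive = Source(
    key="kanjialive",
    license="CC-BY-4.0",
    homepage="https://kanjialive.com/",
    url="https://media.kanjialive.com/examples_audio/audio-mp3.zip",
    optional=True,
)

print(acquisition_method(kanjivg))     # github-release
print(acquisition_method(kanjialive))  # direct-url
print([s.key for s in (kanjivg, kanjialive) if not s.optional])  # ['kanjivg']
```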
audio_db_path
Resolve the path of the optional audio pack database
Returns:
| Type | Description |
|---|---|
| Path | The path of the audio |
cache_dir
Resolve the kotobase cache directory
The directory is taken from the KOTOBASE_CACHE_DIR environment variable
when set, otherwise it falls back to the per-user cache location reported
by platformdirs
Returns:
| Type | Description |
|---|---|
| Path | The resolved cache directory, which may not exist yet |
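The lookup order described for cache_dir can be sketched with the standard library alone. `resolve_cache_dir` is a hypothetical stand-in: it takes the fallback path as an argument where the real module would use the per-user location reported by platformdirs. Treating an empty variable as unset is an assumption.

```python
import os
from pathlib import Path
from typing import Mapping

ENV_CACHE_DIR = "KOTOBASE_CACHE_DIR"


def resolve_cache_dir(fallback: Path, env: Mapping[str, str] = os.environ) -> Path:
    """Return the environment override when set, otherwise the fallback."""
    override = env.get(ENV_CACHE_DIR)
    if override:  # assumption: an empty value counts as "not set"
        return Path(override)
    return fallback


# Placeholder path; in practice platformdirs would supply this value.
default = Path("kotobase-default-cache")

print(resolve_cache_dir(default, env={}))
print(resolve_cache_dir(default, env={ENV_CACHE_DIR: "/data/kotobase"}))
```

Neither branch creates the directory, which matches the note that the returned path may not exist yet.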
db_path
Resolves the file path of the compiled core database
Returns:
| Type | Description |
|---|---|
| Path | The path of the core |
jlpt_file
Resolves the path to a single processed Tanos JLPT list JSON file
shipped in the package
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| kind | str | One of the values in | required |
| level | int | A JLPT level from 1 to 5 | required |
Returns:
| Type | Description |
|---|---|
| Path | The absolute path to the file within the installed package |
Raises:
| Type | Description |
|---|---|
| ValueError | If |
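A validate-then-resolve function of this shape might look like the sketch below. Almost everything here is assumed for illustration: the three kind names, the file naming scheme, the `data_dir` parameter, and the exact conditions that raise ValueError are not stated in this reference.

```python
from pathlib import Path

# Assumed values; the reference only says there are three kinds and five levels.
JLPT_KINDS = ("vocab", "kanji", "grammar")
JLPT_LEVELS = (1, 2, 3, 4, 5)


def jlpt_file_sketch(kind: str, level: int, data_dir: Path) -> Path:
    """Hypothetical lookup of a processed JLPT list under data_dir."""
    if kind not in JLPT_KINDS:
        raise ValueError(f"unknown JLPT list kind: {kind!r}")
    if level not in JLPT_LEVELS:
        raise ValueError(f"JLPT level must be 1 to 5, got {level!r}")
    # Hypothetical naming scheme, for example kanji_n5.json.
    return data_dir / f"{kind}_n{level}.json"


print(jlpt_file_sketch("kanji", 5, Path("data")))
```

Validating both arguments before building the path keeps a bad kind or level from silently producing a path to a file that does not exist.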