Kotobase
Kotobase is a Python package for the Japanese language that provides simple programmatic access to several data sources through a pre-built database, which a GitHub Action updates weekly.
Data Sources
Kotobase builds its database from the following sources:
- JMDict: Japanese-Multilingual Dictionary.
- JMnedict: A dictionary of Japanese proper names.
- KanjiDic2: A comprehensive kanji dictionary.
- Tatoeba: A large database of example sentences.
- JLPT Lists: Curated lists of grammar, vocabulary, and kanji, grouped by Japanese Language Proficiency Test level, made available on Jonathan Weller's website.
Licenses
The licenses for these data sources, along with the NOTICE file, are available in docs/licenses in this repository.
Features
- Comprehensive Lookups → Search for words (by kanji, kana, or romaji), kanji, and proper names.
- Organized Data → Get detailed information, including readings, senses, parts of speech, kanji stroke counts, meanings, and JLPT levels, returned as Python data objects.
- Example Sentences → Find example sentences from Tatoeba that contain the search query.
- Wildcard Search → Use * or % as wildcards in searches.
- Command-Line Interface → A user-friendly CLI for quick lookups from the terminal.
- Self-Contained → All data is stored in a local SQLite database, so lookups are fast and work offline.
- Easy Database Management → Includes commands to automatically download the latest pre-built database from the public Drive, or to download the source files and build the database locally.
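Because the data lives in SQLite, wildcard search fits naturally onto SQLite's LIKE operator, where % matches any run of characters. The following is a minimal sketch under stated assumptions, not Kotobase's actual code or API. It treats * as an alias for %, escapes _ so that it matches literally, and queries a hypothetical entries table with a headword column, which need not match Kotobase's real schema.

```python
import sqlite3


def to_like_pattern(query: str) -> str:
    """Turn a query that uses '*' or '%' as wildcards into a SQLite LIKE pattern.

    Assumption: '_' should match literally, so it is escaped with a backslash
    (as is the backslash itself); '*' is rewritten to '%'.
    """
    escaped = query.replace("\\", "\\\\").replace("_", "\\_")
    return escaped.replace("*", "%")


def wildcard_search(conn: sqlite3.Connection, query: str) -> list[str]:
    """Return headwords matching the wildcard query, in codepoint order."""
    # Hypothetical schema: a table 'entries' with a TEXT column 'headword'.
    rows = conn.execute(
        "SELECT headword FROM entries WHERE headword LIKE ? ESCAPE '\\' ORDER BY headword",
        (to_like_pattern(query),),
    )
    return [headword for (headword,) in rows]


if __name__ == "__main__":
    # Small in-memory stand-in for the real database.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE entries (headword TEXT)")
    conn.executemany(
        "INSERT INTO entries (headword) VALUES (?)",
        [("食べる",), ("食べ物",), ("飲む",)],
    )
    print(wildcard_search(conn, "食べ*"))  # ['食べる', '食べ物']
    print(wildcard_search(conn, "%む"))    # ['飲む']
```

Passing the pattern as a bound parameter, rather than formatting it into the SQL string, keeps user input from altering the query itself.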
Installation
- Install the package
pip install kotobase
This will install the kotobase package and its dependencies, and it will also make the kotobase command-line tool available in your shell.
- Pull the database from Drive, or build it locally, by running one of the commands below in the environment where you installed kotobase:
# Pull from Drive
kotobase pull-db
# Build locally
kotobase build
The database will be downloaded or built inside the package at kotobase/src/db/kotobase.db and will then be available for use.
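Once either command has finished, you can confirm the database is in place by opening it read-only with Python's built-in sqlite3 module and listing its tables. This is a minimal sketch and is not part of Kotobase. The list_tables helper is hypothetical, and you supply the database path yourself, because the exact on-disk location depends on where pip installed the package.

```python
import sqlite3
from contextlib import closing
from pathlib import Path


def list_tables(db_path: Path) -> list[str]:
    """Open an SQLite file read-only and return its table names, sorted.

    Hypothetical helper, not part of Kotobase. Raises FileNotFoundError if the
    file does not exist yet (e.g. before `kotobase pull-db` or `kotobase build`).
    """
    db_path = Path(db_path)
    if not db_path.is_file():
        raise FileNotFoundError(
            f"No database at {db_path}; run `kotobase pull-db` or `kotobase build` first."
        )
    # mode=ro opens the file read-only, so this check cannot create or modify it.
    uri = db_path.resolve().as_uri() + "?mode=ro"
    # closing() guarantees the connection is closed; sqlite3's own context
    # manager only commits or rolls back the transaction.
    with closing(sqlite3.connect(uri, uri=True)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]
```

For example, list_tables(Path("path/to/kotobase.db")) returns the table names. If the file is missing, it raises FileNotFoundError instead of silently creating an empty database, which is what a plain sqlite3.connect call would do.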