Kotobase

A comprehensive, openly-licensed Japanese language database.

Kotobase aggregates several openly-licensed Japanese language data sources into one SQLite database and exposes simple programmatic and command-line access to it.

Install

pip install kotobase

Get The Database

Database
  • The compiled database is a prerequisite; it is not bundled with the package because of its size

  • The core database contains all sources except for the Kanji Alive audio clips and is a ~400MB SQLite file

  • The optional audio database adds ~150MB to that size

  • There are two ways to get both of them

Both databases are rebuilt weekly from updated sources by a GitHub Action and attached as assets to the latest Kotobase GitHub Release.

kotobase db pull  # (1)!

kotobase db pull --no-audio # (2)!

  1. Download The Core + Audio Databases
  2. Download Only The Core Database

You can also download the most up-to-date sources and build both databases yourself via the CLI in about 2-3 minutes.

kotobase db build  # (1)!

kotobase db build --no-audio  # (2)!

  1. Download All Sources & Build The Core + Audio Databases
  2. Download All Sources & Build Only The Core Database
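Whichever route you take, the result is an ordinary SQLite file, so it can be sanity-checked with Python's standard library alone. The following is a minimal sketch, not part of Kotobase: list_tables is a hypothetical helper, and you need to supply the location of the database file yourself, since where the CLI stores it is not stated here. The table names it returns depend on Kotobase's schema.

```python
import sqlite3
from pathlib import Path


def list_tables(db_path):
    """Return the sorted names of all tables in the SQLite file at db_path."""
    path = Path(db_path).resolve()
    if not path.is_file():
        raise FileNotFoundError(f"No database file at {path}")
    # Open read-only (mode=ro) so inspecting the file can never modify it.
    conn = sqlite3.connect(path.as_uri() + "?mode=ro", uri=True)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]
```

Calling list_tables("path/to/kotobase.db") (a placeholder path) on a pulled or locally built database should give a quick confirmation that the download or build produced a readable file.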

Use It

kotobase lookup all 日本語  # (1)!

kotobase lookup kanji   # (2)!

  1. Comprehensive Lookup Across Every Source
  2. A Single Kanji Profile

from kotobase import Kotobase

kb = Kotobase()

result = kb("日本語")  # (1)!
print(result.to_json())

kanji = kb.kanji("語")
print(kanji.meanings, kanji.onyomi, kanji.kunyomi)

  1. Alias For kb.lookup("日本語")

Features

  • Comprehensive Lookups: One lookup all Query Aggregates Data From All Sources

  • Organized Data: Every Source Is Fully Extracted Into A Normalized SQLite Schema & Exposed As Typed, Serializable DTOs

  • Example Sentences: Search Tatoeba Example Sentences + Their English Translations By Text

  • Wildcard Search: Match Written / Reading Forms With * & % Wildcard Patterns

  • CLI: A Typer + Rich CLI With Readable, Panelled Output & --json For Scripting

  • Self-Contained: A Single SQLite File (~400MB) + Optional Audio Pack (~150MB) With No Server / Network Access Needed At Query Time

  • Easy Database Management: Pull Pre-Built Databases From GitHub Releases Or Build Them Locally + Manage The Cache From The CLI
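Wildcard matching of the kind listed above maps naturally onto SQL LIKE. The sketch below is not Kotobase's implementation; it only illustrates one way a * pattern could be translated into a LIKE pattern. The words table, its written column, and the to_like and search helpers are all made up for the example and run against a throwaway in-memory database.

```python
import sqlite3


def to_like(pattern):
    """Translate a user pattern into a LIKE pattern: * becomes %, % is kept as-is."""
    # Escape backslash and underscore so they match literally instead of
    # acting as the escape character or the single-character wildcard.
    escaped = pattern.replace("\\", "\\\\").replace("_", "\\_")
    return escaped.replace("*", "%")


def search(conn, pattern):
    """Return the written forms matching the wildcard pattern, sorted."""
    rows = conn.execute(
        "SELECT written FROM words WHERE written LIKE ? ESCAPE '\\' ORDER BY written",
        (to_like(pattern),),
    )
    return [written for (written,) in rows]


# Hypothetical sample data standing in for a real dictionary table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE words (written TEXT)")
conn.executemany(
    "INSERT INTO words VALUES (?)",
    [("日本",), ("日本語",), ("英語",), ("語学",)],
)

print(search(conn, "日本*"))  # prefix match: ['日本', '日本語']
```

Passing the pattern as a bound parameter rather than formatting it into the SQL string keeps user input from being interpreted as SQL.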

More Information