cihai - Python library for CJK (Chinese, Japanese, Korean) data


This project is under active development. Follow our progress and check back for updates!


API / Library (this repository)

$ pip install --user cihai

from cihai.core import Cihai

c = Cihai()

if not c.unihan.is_bootstrapped:  # download and install Unihan to db
    c.unihan.bootstrap()

query = c.unihan.lookup_char('好')
glyph = query.first()
print("lookup for 好: %s" % glyph.kDefinition)
# lookup for 好: good, excellent, fine; well

query = c.unihan.reverse_char('good')
print('matches for "good": %s ' % ', '.join([glph.char for glph in query]))
# matches for "good": 㑘, 㑤, 㓛, 㘬, 㙉, 㚃, 㚒, 㚥, 㛦, 㜴, 㜺, 㝖, 㤛, 㦝, ...

See API documentation and /examples.
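
The lookup above can be wrapped in a small helper. This is only a sketch: `lookup_definition` is a hypothetical name, not part of cihai, and it assumes that `first()` returns `None` when a character has no match (as SQLAlchemy-style queries usually do).

```python
def lookup_definition(cihai_obj, char):
    """Return the kDefinition for ``char``, or None when nothing matches.

    ``cihai_obj`` is expected to be a bootstrapped ``Cihai()`` instance,
    created as in the snippet above. Hypothetical helper, not cihai API.
    """
    glyph = cihai_obj.unihan.lookup_char(char).first()
    if glyph is None:  # assumption: first() yields None for no match
        return None
    return glyph.kDefinition
```

With the `c` object from the snippet above, `lookup_definition(c, '好')` would return the same definition string that the `print` call shows.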

CLI (cihai-cli)

$ pip install --user 'cihai[cli]'
# character lookup
$ cihai info 好
char: 好
kCantonese: hou2 hou3
kDefinition: good, excellent, fine; well
kHangul: 호
kJapaneseOn: KOU
kKorean: HO
kMandarin: hǎo
kTang: '*xɑ̀u *xɑ̌u'
kTotalStrokes: '6'
kVietnamese: háo
ucn: U+597D

# reverse lookup
$ cihai reverse library
char: 圕
kCangjie: WLGA
kCantonese: syu1
kCihaiT: '308.302'
kDefinition: library
kMandarin: tú
kTotalStrokes: '13'
ucn: U+5715
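
The `ucn` field in the output above is the character's Unicode code point written as `U+` followed by hexadecimal digits. A minimal sketch of converting between that notation and the character itself, using only the Python standard library (the function names are illustrative and not part of cihai):

```python
def ucn_to_char(ucn):
    """Convert a UCN string such as 'U+597D' to the character it names."""
    if not ucn.upper().startswith("U+"):
        raise ValueError("expected a string like 'U+597D', got %r" % ucn)
    return chr(int(ucn[2:], 16))


def char_to_ucn(char):
    """Convert a single character to UCN notation, e.g. '好' -> 'U+597D'."""
    return "U+%04X" % ord(char)


print(ucn_to_char("U+597D"))  # 好
print(char_to_ucn("圕"))  # U+5715
```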


All datasets that cihai uses have stand-alone tools to export their data. No library required.


Developing

poetry is required for development.

git clone

cd cihai

poetry install -E "docs test coverage lint format"

Makefile commands prefixed with watch_ will watch files and rerun.


Tests

poetry run py.test

Helpers: make test

Rerun tests on file change: make watch_test (requires entr(1))


Documentation

Default preview server: http://localhost:8035

Run cd docs/ and then make html to build the docs. Run make serve to start the HTTP server.

Helpers: make build_docs, make serve_docs

Rebuild docs on file change: make watch_docs (requires entr(1))

Rebuild docs and run the server from one terminal: make dev_docs (requires the above, and a make(1) with -j support, e.g. GNU Make)

Formatting / Linting

The project uses black and isort (one after the other) and runs flake8 via CI. See the configuration in pyproject.toml and setup.cfg:

make black isort: run black first, then isort to handle import nuances

make flake8; to watch (requires entr(1)): make watch_flake8


Releasing

As of 0.10, poetry handles virtualenv creation, package requirements, versioning, building, and publishing. As a result, the repository has no requirements files.

Update __version__ in and pyproject.toml:

git commit -m 'build(cihai): Tag v0.1.1'
git tag v0.1.1
git push
git push --tags
poetry build
poetry publish
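
Because the version is updated by hand before tagging, a quick consistency check can catch a mismatch between the tag and pyproject.toml. The sketch below is not part of the project: the function names are hypothetical, and it assumes the version is declared as a `version = "..."` line under the `[tool.poetry]` table.

```python
import re


def poetry_version(pyproject_text):
    """Return the version declared under [tool.poetry], or None if absent."""
    in_poetry = False
    for line in pyproject_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("["):
            # Track whether we are inside the [tool.poetry] table
            in_poetry = stripped == "[tool.poetry]"
            continue
        if in_poetry:
            match = re.match(r'version\s*=\s*["\']([^"\']+)["\']', stripped)
            if match:
                return match.group(1)
    return None


def tag_matches(pyproject_text, tag):
    """True when a tag such as 'v0.1.1' matches the pyproject version."""
    version = tag[1:] if tag.startswith("v") else tag
    return poetry_version(pyproject_text) == version
```

For example, passing the contents of pyproject.toml and the tag `v0.1.1` to `tag_matches` returns `True` only when the file declares version `0.1.1`.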