cihai¶
Python library for CJK (Chinese, Japanese, Korean) data.
This project is under active development. Follow our progress and check back for updates!
Quickstart¶
API / Library (this repository)¶
$ pip install --user cihai
from cihai.core import Cihai
c = Cihai()
if not c.unihan.is_bootstrapped:  # download and install Unihan to db
    c.unihan.bootstrap()
query = c.unihan.lookup_char('好')
glyph = query.first()
print("lookup for 好: %s" % glyph.kDefinition)
# lookup for 好: good, excellent, fine; well
query = c.unihan.reverse_char('good')
print('matches for "good": %s' % ", ".join(glyph.char for glyph in query))
# matches for "good": 㑘, 㑤, 㓛, 㘬, 㙉, 㚃, 㚒, 㚥, 㛦, 㜴, 㜺, 㝖, 㤛, 㦝, ...
CLI (cihai-cli)¶
$ pip install --user cihai-cli
Character lookup:
$ cihai info 好
char: 好
kCantonese: hou2 hou3
kDefinition: good, excellent, fine; well
kHangul: 호
kJapaneseOn: KOU
kKorean: HO
kMandarin: hǎo
kTang: "*xɑ̀u *xɑ̌u"
kTotalStrokes: "6"
kVietnamese: háo
ucn: U+597D
Reverse lookup:
$ cihai reverse library
char: 圕
kCangjie: WLGA
kCantonese: syu1
kCihaiT: '308.302'
kDefinition: library
kMandarin: tú
kTotalStrokes: '13'
ucn: U+5715
--------
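The `ucn` field in the output above is the character's Unicode code point in `U+XXXX` notation. A minimal sketch of converting between that notation and the character itself, using only the Python standard library. The helper names `ucn_to_char` and `char_to_ucn` are illustrative and are not part of cihai's API:

```python
def ucn_to_char(ucn: str) -> str:
    """Convert 'U+597D' notation to the character it names."""
    if not ucn.upper().startswith("U+"):
        raise ValueError(f"expected U+XXXX notation, got {ucn!r}")
    # Everything after the 'U+' prefix is a hexadecimal code point.
    return chr(int(ucn[2:], 16))


def char_to_ucn(char: str) -> str:
    """Convert a single character to 'U+XXXX' notation (at least four hex digits)."""
    if len(char) != 1:
        raise ValueError("expected exactly one character")
    return f"U+{ord(char):04X}"


print(ucn_to_char("U+597D"))  # 好
print(char_to_ucn("好"))  # U+597D
```

This can be handy for cross-checking a `ucn` value from `cihai info` against the character you typed.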
UNIHAN data¶
All datasets that cihai uses have stand-alone tools to export their data. No library required.
unihan-etl - UNIHAN data exports for CSV, YAML, and JSON.
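As a sketch of consuming such an export without cihai installed, the snippet below indexes JSON records by character using only the standard library. It assumes the export is a JSON array of objects keyed by `char`, `ucn`, and UNIHAN field names. That layout and the inline sample (built from the values shown in the CLI output above) are assumptions for illustration, so check the actual file produced by unihan-etl before relying on it:

```python
import json

# Hypothetical sample standing in for an exported file; the record layout is assumed.
SAMPLE_EXPORT = """
[
  {"char": "好", "ucn": "U+597D", "kDefinition": "good, excellent, fine; well"},
  {"char": "圕", "ucn": "U+5715", "kDefinition": "library"}
]
"""


def index_by_char(records):
    """Map each record's character to the record itself."""
    return {record["char"]: record for record in records}


def reverse_lookup(records, term):
    """Return characters whose kDefinition contains the search term."""
    return [r["char"] for r in records if term in r.get("kDefinition", "")]


records = json.loads(SAMPLE_EXPORT)
by_char = index_by_char(records)

print(by_char["好"]["kDefinition"])  # good, excellent, fine; well
print(reverse_lookup(records, "library"))  # ['圕']
```

To read a real export, replace `json.loads(SAMPLE_EXPORT)` with `json.load()` on the opened file.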
Developing¶
$ git clone https://github.com/cihai/cihai.git
$ cd cihai/
Bootstrap your environment and learn more about contributing. We use the same conventions and tools across all cihai projects: pytest, sphinx, mypy, ruff, tmuxp, and file watcher helpers (e.g. entr(1)).
Python versions¶
0.19.0: Last Python 3.7 release
Quick links¶
Datasets: a full list of current and future data sets
Python API
Python support: >= 3.8, PyPy
Source: https://github.com/cihai/cihai
Changelog: https://cihai.git-pull.com/history.html
Test coverage: https://codecov.io/gh/cihai/cihai
OpenHub: https://www.openhub.net/p/cihai
License: MIT