Internal Design decisions¶
Convenient relational, cohesive output of datasets¶
Whether you are Cihai, python, creating a CJK tool in another programming language, or simple looking to use a dataset, cihai will provide tools to turn raw datasets into a familiar table / relational friendly format.
Certain datasets may also offer to download the latest datasets from a server as well.
Cihai is a tool to convert CJK datasets to a common, relational format and provide a convient Python API for studying deeply.
Cihai’s core functionality relies on a
an entry on a master table of chinese characters / string which are
assigned an integer ID for use as a
ForeignKey. This way, datasets
added to Cihai can achieve fast lookups and JOIN’s via integers instead of
If the word has multiple characters, Cihai will create the character ID for you and cite it for you and all future entries added to your dataset will reference it.
This makes installing datasets in Cihai expensive (requiring a lookup of a unicode string in a central table) before inserting into database, with the benefit of potentially returning a huge cross-section of available CJK data in one swift stroke.