Handling CJK Variants
cihai builds upon UNIHAN to handle variants: “thousands of years worth of writing have produced thousands of pairs which can be used more-or-less interchangeably.” For more information, see “Unification Rules” on page 679 of The Unicode Standard.
cihai will be able to pull remote CJK datasets.
In addition, the handling of variants will create new ways to discover and interpret CJK characters while using these datasets.
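To make the idea of variant pairs concrete, here is a minimal sketch that reads rows in the tab-separated layout used by UNIHAN's Unihan_Variants.txt (code point, field name, value) and builds a lookup table. It uses only the Python standard library and is not cihai's API: the names SAMPLE, ucn_to_char and parse_variants are hypothetical, and the two sample rows are hand-copied for illustration rather than loaded from a real dataset.

```python
# Illustrative sketch only; this is NOT cihai's API.
# SAMPLE mimics the tab-separated layout of UNIHAN's Unihan_Variants.txt.
SAMPLE = (
    "# sample rows in Unihan_Variants.txt layout\n"
    "U+6C49\tkTraditionalVariant\tU+6F22\n"
    "U+6F22\tkSimplifiedVariant\tU+6C49\n"
)


def ucn_to_char(ucn):
    """Convert 'U+6C49' notation to the character it names."""
    return chr(int(ucn[2:], 16))


def parse_variants(text):
    """Return {character: {field: [variant characters]}}."""
    variants = {}
    for line in text.splitlines():
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        codepoint, field, value = line.split("\t")
        # A value may list several targets separated by spaces, and a
        # target may carry a source suffix after "<", which is dropped here.
        targets = [ucn_to_char(v.split("<")[0]) for v in value.split()]
        variants.setdefault(ucn_to_char(codepoint), {})[field] = targets
    return variants


variants = parse_variants(SAMPLE)
print(variants["汉"]["kTraditionalVariant"])  # ['漢']
print(variants["漢"]["kSimplifiedVariant"])   # ['汉']
```

A dictionary keyed by character keeps lookups cheap in both directions, since each row of the file describes only one direction of a pair.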
Python API and CLI application
Cihai can be used as a Python API as well as a command line application via
Asian encoding swiss army knife
Under the hood, functions such as those in cihai.conversion are tested across Python implementations and handle a growing assortment of Asian encodings.
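As a rough illustration of what converting between Asian encodings involves, the sketch below shows one character in several representations using only the codecs that ship with Python. It does not call cihai.conversion, whose function names and signatures are not shown here. The helper describe and the ENCODINGS tuple are hypothetical names chosen for this example.

```python
# Illustrative sketch using only Python's built-in codecs.
# It does NOT use cihai.conversion; describe() is a hypothetical helper.
ENCODINGS = ("utf-8", "gb2312", "big5", "euc_jp")


def describe(char):
    """Return the code point notation and hex bytes of char per encoding."""
    result = {"ucn": "U+%04X" % ord(char)}
    for name in ENCODINGS:
        try:
            result[name] = char.encode(name).hex()
        except UnicodeEncodeError:
            # The character has no mapping in this legacy encoding.
            result[name] = None
    return result


info = describe("中")
print(info["ucn"])     # U+4E2D
print(info["utf-8"])   # e4b8ad
print(info["gb2312"])  # d6d0

# Decoding the bytes with the same codec returns the original character.
assert bytes.fromhex(info["gb2312"]).decode("gb2312") == "中"
```

Returning None for an unmapped character, rather than raising, makes it easy to see at a glance which legacy encodings cover a given character.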