Features

  • Handling CJK Variants

    cihai builds upon UNIHAN: “thousands of years worth of writing have produced thousands of pairs which can be used more-or-less interchangeably.” For more information, see “Unification Rules” on page 679 of The Unicode Standard (PDF).

  • Extensible

    cihai will be able to pull remote CJK datasets.

    In addition, the handling of variants will create new ways to discover and interpret CJK characters while using these datasets.

  • Python API and CLI application

    cihai can be used as a Python API as well as a command-line application via $ cihai.

  • Asian encoding Swiss Army knife

    Functions under the hood, such as those in cihai.conversion, are tested across Python implementations to handle a growing assortment of Asian encodings.
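To give a feel for the kind of encoding work mentioned in the last item, here is a minimal sketch that uses only the Python standard library. It does not call cihai.conversion; the helper names `ucn_to_char`, `char_to_ucn`, and `to_legacy_bytes` are hypothetical and chosen only for illustration. It converts between Unicode code point notation (such as U+4E2D) and characters, then encodes a character into legacy CJK encodings.

```python
def ucn_to_char(ucn):
    """Turn a code point written as 'U+XXXX' into the character it names.

    Hypothetical helper for illustration; not part of cihai.
    """
    if not ucn.upper().startswith("U+"):
        raise ValueError("expected a value like 'U+4E2D', got %r" % (ucn,))
    return chr(int(ucn[2:], 16))


def char_to_ucn(char):
    """Turn a single character into 'U+XXXX' notation (hypothetical helper)."""
    return "U+{:04X}".format(ord(char))


def to_legacy_bytes(char, encoding):
    """Encode a character with one of Python's built-in codecs,
    for example 'gb2312' or 'big5' (hypothetical helper)."""
    return char.encode(encoding)


if __name__ == "__main__":
    char = ucn_to_char("U+4E2D")
    print(char, char_to_ucn(char))
    for codec in ("gb2312", "big5", "utf-8"):
        print(codec, to_legacy_bytes(char, codec).hex())
```

The sketch leans on Python's built-in codecs for the byte-level work. A library such as cihai would presumably wrap conversions like these behind a tested, consistent interface, but the exact functions it exposes should be checked against its own API documentation.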