Cihai core functionality.

class cihai.core.Cihai(config={})

Central application object.


Inspired by the early pypa/warehouse application object [1].

Configuration templates

The config dict parameter supports a basic template system for replacing XDG Base Directory variables, tildes and environment variables. This is done by passing the config dict through cihai.conf.expand_config() during initialization.


Invocation from Python

Cihai must be bootstrapped with data from the UNIHAN [2] database.

is_bootstrapped can check if the system has the database installed. It checks against the application’s configuration settings.

To bootstrap the cihai environment programmatically, create the Cihai object and pass its metadata to bootstrap_unihan:

from cihai.core import Cihai
from cihai.bootstrap import bootstrap_unihan

c = Cihai()
if not c.is_bootstrapped:  # download and install Unihan to db
    bootstrap_unihan(c.metadata)
    c.reflect_db()         # automap new table created during bootstrap

query = c.lookup_char('好')
glyph = query.first()

query = c.reverse_char('good')
print(', '.join([glyph_.char for glyph_ in query]))


[1] PyPA Warehouse on GitHub. https://github.com/pypa/warehouse. Accessed sometime in 2013.
[2] UNICODE HAN DATABASE (UNIHAN) documentation. https://www.unicode.org/reports/tr38/. Accessed March 31st, 2018.
classmethod from_file(config_path=None, *args, **kwargs)

Create a Cihai instance from a JSON or YAML config.

Parameters:config_path (str, optional) – path to custom config file
Returns:application object
Return type:Cihai
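
As a minimal sketch of preparing a file for from_file, the snippet below writes a JSON config to a temporary location. The file name and the chosen values are assumptions for illustration; the keys mirror default_config as shown in this reference. The from_file call itself is left as a comment because it requires cihai to be installed.

```python
import json
import os
import tempfile

# Hypothetical custom config; keys mirror default_config from this reference.
custom_config = {
    "debug": True,
    "database": {"url": "sqlite:///{user_data_dir}/cihai.db"},
}

# Write the config to a temporary JSON file (the file name is arbitrary).
config_path = os.path.join(tempfile.mkdtemp(), "cihai.json")
with open(config_path, "w") as f:
    json.dump(custom_config, f, indent=2)

# With cihai installed, the file could then be loaded with:
#     from cihai.core import Cihai
#     c = Cihai.from_file(config_path)

# Read the file back to confirm it round-trips.
with open(config_path) as f:
    loaded = json.load(f)
```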

lookup_char(char)

Return character information from datasets.

Parameters:char (str) – character / string to lookup
Returns:list of matches
Return type:sqlalchemy.orm.query.Query

reflect_db()

No-op to reflect db info.

This is available as a method so the database can be reflected outside initialization (such as when bootstrapping UNIHAN during CLI usage).


reverse_char(hints)

Return a SQLAlchemy query of objects matching the hints.

Parameters:hints (list of str) – strings to lookup
Returns:reverse matches
Return type:sqlalchemy.orm.query.Query
base = None

sqlalchemy.ext.automap.AutomapBase instance.

config = None

configuration dictionary.

default_config = {u'database': {u'url': u'sqlite:///{user_data_dir}/cihai.db'}, u'debug': False, u'dirs': {u'cache': u'{user_cache_dir}', u'data': u'{user_data_dir}', u'log': u'{user_log_dir}'}}

dict of default config, can be monkey-patched during tests

engine = None

sqlalchemy.engine.Engine instance.


is_bootstrapped

Return True if UNIHAN and the database are set up.

Returns:True if Unihan application fixture data installed.
Return type:bool
metadata = None

sqlalchemy.schema.MetaData instance.

session = None

sqlalchemy.orm.session.Session instance.


cihai.bootstrap.bootstrap_unihan(metadata, options={})

Download, extract and import unihan to database.

cihai.bootstrap.create_unihan_table(columns, metadata)

Create table and return sqlalchemy.Table.


Returns:Newly created table with columns and index.
Return type:sqlalchemy.Table



Return True if cihai is correctly bootstrapped.



cihai.conf.expand_config(d)

Expand configuration XDG variables.

Parameters:d (dict) – config information


Environment variables are expanded via os.path.expandvars(). So ${PWD} would be replaced by the current working directory of the shell, and ${USER} by the user running the app.

XDG variables are expanded via str.format(). These do not have a dollar sign. They are:

  • {user_cache_dir}
  • {user_config_dir}
  • {user_data_dir}
  • {user_log_dir}
  • {site_config_dir}
  • {site_data_dir}
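
The two expansion passes described above can be sketched as follows. This is an illustration under stated assumptions, not cihai's implementation: the helper names expand_value and expand_tree, the directory values and the CIHAI_DEMO variable are all hypothetical (cihai obtains the real directories from appdirs). Environment variables are expanded first so that the braces in ${VAR} do not collide with str.format() placeholders.

```python
import os


def expand_value(value, xdg_dirs):
    """Expand environment variables, tildes and XDG placeholders in a string."""
    value = os.path.expandvars(value)   # ${USER}, ${PWD}, ...
    value = os.path.expanduser(value)   # leading ~
    return value.format(**xdg_dirs)     # {user_data_dir} etc., no dollar sign


def expand_tree(d, xdg_dirs):
    """Return a copy of a nested config dict with every string value expanded."""
    out = {}
    for key, value in d.items():
        if isinstance(value, dict):
            out[key] = expand_tree(value, xdg_dirs)
        elif isinstance(value, str):
            out[key] = expand_value(value, xdg_dirs)
        else:
            out[key] = value
    return out


# Hypothetical directory values, for illustration only.
xdg_dirs = {
    "user_data_dir": "/home/me/.local/share/cihai",
    "user_cache_dir": "/home/me/.cache/cihai",
}
os.environ["CIHAI_DEMO"] = "demo"  # hypothetical environment variable

config = {
    "database": {"url": "sqlite:///{user_data_dir}/cihai.db"},
    "debug": False,
    "dirs": {"cache": "{user_cache_dir}/${CIHAI_DEMO}"},
}
expanded = expand_tree(config, xdg_dirs)
```

Note that an undefined ${VAR} is left untouched by os.path.expandvars() and would then break the str.format() pass in this sketch.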
cihai.conf.DEFAULT_CONFIG = {u'database': {u'url': u'sqlite:///{user_data_dir}/cihai.db'}, u'debug': False, u'dirs': {u'cache': u'{user_cache_dir}', u'data': u'{user_data_dir}', u'log': u'{user_log_dir}'}}

Default configuration

cihai.conf.dirs = <appdirs.AppDirs object>

XDG App directory locations



Convert EUC-CN (GB2312) hex to a Python unicode string.

Parameters:hexstr (bytes) –
Returns:Python unicode e.g. u'\u4e00' / ‘一’.
Return type:unicode


>>> u'一'.encode('gb2312')
b'\xd2\xbb'
>>> bytes.fromhex('d2bb').decode('gb2312')
'一'

Note: the Python 2 idioms .decode('hex') and .decode('string_escape') won't work with Python 3.x; bytes.fromhex() replaces them, as above.

Convert EUC hex (e.g. “d2bb”) to UTF8 hex (e.g. “e4 b8 80”).
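
A minimal Python 3 sketch of this conversion, using only the standard codecs. The function name euc_hex_to_utf8_hex is hypothetical and this is not necessarily how the library implements it.

```python
def euc_hex_to_utf8_hex(euc_hex):
    """Decode EUC-CN hex to a character, then re-encode it as UTF-8 hex."""
    char = bytes.fromhex(euc_hex).decode("gb2312")
    return char.encode("utf-8").hex()


utf8_hex = euc_hex_to_utf8_hex("d2bb")  # "e4b880", the UTF-8 bytes of '一'
```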


Convert GB2312-1980 hex (internal representation) to EUC-CN hex (the “external encoding”)
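
The relationship between the two forms is that EUC-CN sets the high bit of each byte, i.e. adds 0x80 to both bytes of the GB2312-1980 code. A sketch, with the hypothetical helper name gb2312_hex_to_euc_hex and assuming a four-digit hex string as input:

```python
def gb2312_hex_to_euc_hex(gb_hex):
    """Add 0x80 to each of the two bytes of a GB2312-1980 code."""
    hi, lo = int(gb_hex[:2], 16), int(gb_hex[2:], 16)
    return "%02x%02x" % (hi + 0x80, lo + 0x80)


euc_hex = gb2312_hex_to_euc_hex("523b")  # "d2bb"
```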


Convert GB kuten / quwei form (94 zones * 94 points) to GB2312-1980 / ISO-2022-CN hex (internal representation)
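
In this scheme the zone and point (each 1 to 94) are offset by 0x20 to give the two bytes of the GB2312-1980 code. A sketch, assuming the kuten is passed as a four-digit decimal string (two digits of zone, two of point); the helper name kuten_to_gb2312_hex is hypothetical.

```python
def kuten_to_gb2312_hex(kuten):
    """Offset zone and point by 0x20 to get the GB2312-1980 hex code."""
    zone, point = int(kuten[:2]), int(kuten[2:])
    return "%02x%02x" % (zone + 0x20, point + 0x20)


gb_hex = kuten_to_gb2312_hex("5027")  # zone 50, point 27 -> "523b"
```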

cihai.conversion.python_to_euc(uni_char, as_bytes=False)

Return EUC character from a Python Unicode character.

Converts a one character Python unicode string (e.g. u'\u4e00') to the corresponding EUC hex ('d2bb').

cihai.conversion.python_to_ucn(uni_char, as_bytes=False)

Return UCN character from Python Unicode character.

Converts a one character Python unicode string (e.g. u'\u4e00') to the corresponding Unicode UCN ('U+4E00').
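
The core of this conversion is the character's code point formatted as upper-case hex. A sketch with a hypothetical helper name, ignoring the as_bytes option:

```python
def char_to_ucn(char):
    """Format a single character's code point as a UCN such as 'U+4E00'."""
    return "U+%04X" % ord(char)


ucn = char_to_ucn("一")  # "U+4E00"
```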


Convert a Unicode Universal Character Number (e.g. “U+4E00” or “4E00”) to Python unicode (u'\u4e00').


Convert a string containing Unicode UCN (e.g. “U+4E00”) to native Python Unicode (u'\u4e00').
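
Both directions of UCN decoding can be sketched with chr() and a regular expression. The names below are hypothetical, and the pattern is an assumption: it accepts 4 to 6 hex digits after "U+", so a UCN immediately followed by further hex-like letters would be misread.

```python
import re

# Assumed pattern: "U+" followed by 4 to 6 hex digits.
UCN_PATTERN = re.compile(r"U\+([0-9A-Fa-f]{4,6})")


def ucn_to_char(ucn):
    """Convert 'U+4E00' or '4E00' to the corresponding character."""
    hex_part = ucn[2:] if ucn.upper().startswith("U+") else ucn
    return chr(int(hex_part, 16))


def ucnstring_to_text(text):
    """Replace every UCN found in a string with its character."""
    return UCN_PATTERN.sub(lambda m: chr(int(m.group(1), 16)), text)


text = ucnstring_to_text("U+4E00 is one, U+597D is good")
```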


Return ucnstring as Unicode.


When using cihai via Python, you can catch Cihai-specific exceptions via these. All Cihai-specific exceptions are catchable via CihaiException, since it is the base exception.

Exceptions raised from the Cihai library.

exception cihai.exc.CihaiException

Bases: exceptions.Exception

Base Cihai Exception class.
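
The catch-all pattern can be illustrated with stand-in classes. Everything here is an assumption for demonstration: the local CihaiException only mimics cihai.exc.CihaiException, and DemoLookupError is a hypothetical subclass that is not part of cihai.

```python
class CihaiException(Exception):
    """Stand-in for cihai.exc.CihaiException, the base class."""


class DemoLookupError(CihaiException):
    """Hypothetical subclass, for illustration only; not part of cihai."""


def risky():
    raise DemoLookupError("no data for this character")


try:
    risky()
except CihaiException as e:  # the base class catches every subclass
    caught = "%s: %s" % (type(e).__name__, e)
```

With the real library, the except clause would name cihai.exc.CihaiException instead of the stand-in.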


Utility and helper methods for cihai.

cihai.util.merge_dict(base, additional)

Combine two dictionary-like objects.


Code from https://github.com/pypa/warehouse Copyright 2013 Donald Stufft

Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
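
To illustrate what combining two dictionary-like objects can look like in practice, here is a sketch of a recursive merge in which values from the second mapping override the first. It is written from scratch under that assumption and is not the Warehouse or cihai code; the name merge_nested is hypothetical.

```python
from collections.abc import Mapping


def merge_nested(base, additional):
    """Return a new dict where `additional` overrides `base`.

    Nested mappings present on both sides are merged recursively.
    """
    merged = dict(base)
    for key, value in additional.items():
        if isinstance(value, Mapping) and isinstance(merged.get(key), Mapping):
            merged[key] = merge_nested(merged[key], value)
        else:
            merged[key] = value
    return merged


defaults = {"debug": False, "dirs": {"cache": "{user_cache_dir}", "log": "{user_log_dir}"}}
overrides = {"debug": True, "dirs": {"cache": "/tmp/cihai-cache"}}
merged = merge_nested(defaults, overrides)
```

This sketch copies at each level, so the inputs are left unmodified; a library implementation may instead update the base mapping in place.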


Return True if the Python interpreter supports wide characters.

Returns:True if python supports wide character sets
Return type:bool
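
One common way to perform such a check is to compare sys.maxunicode against the Basic Multilingual Plane limit. This is a sketch of that technique, not necessarily the library's code; the function name mirrors the description only.

```python
import sys


def supports_wide():
    """Return True if code points above U+FFFF fit in a single character."""
    return sys.maxunicode > 0xFFFF


wide = supports_wide()
```

On Python 3.3 and later, sys.maxunicode is always 0x10FFFF, so the check only distinguishes narrow and wide builds of older interpreters.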