API

Cihai core functionality.

class cihai.core.Cihai(config={})

Cihai application object.

Inspired by the early pypa/warehouse application object.

Invocation from Python:

Note: To be used properly, Cihai must first be bootstrapped with the UNIHAN database. Use is_bootstrapped to check whether the database is installed for the app’s configuration settings.

To bootstrap the cihai environment programmatically, create the Cihai object and pass its metadata:

from cihai.core import Cihai
from cihai.bootstrap import bootstrap_unihan

c = Cihai()
if not c.is_bootstrapped:  # download and install Unihan to db
    bootstrap_unihan(c.metadata)
    c.reflect_db()         # automap new table created during bootstrap

query = c.lookup_char('好')
glyph = query.first()
print(glyph.kDefinition)

query = c.reverse_char('good')
print(', '.join([glyph_.char for glyph_ in query]))

Configuration templates:

The config dict parameter supports a basic template system for replacing XDG Base Directory variables, tildes and environment variables. This is done by passing the option dict through cihai.conf.expand_config() during initialization.

classmethod from_file(config_path=None, *args, **kwargs)

Create a Cihai instance from a JSON or YAML config.

Parameters: config_path (str) – path to custom config file
Return type: Cihai
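
As a rough illustration of loading "a JSON or YAML config", the sketch below picks a parser from the file extension. It is not the library's implementation: load_config_file is a hypothetical helper, dispatching on the extension is an assumption, and the YAML branch assumes the third-party PyYAML package is installed. It is written for Python 3.

```python
import json
import os
import tempfile


def load_config_file(config_path):
    """Hypothetical helper: parse a JSON or YAML config file into a dict."""
    ext = os.path.splitext(config_path)[1].lower()
    if ext == ".json":
        with open(config_path, encoding="utf-8") as f:
            return json.load(f)
    if ext in (".yaml", ".yml"):
        import yaml  # third-party PyYAML, imported only when needed (assumption)

        with open(config_path, encoding="utf-8") as f:
            return yaml.safe_load(f)
    raise ValueError("unsupported config format: %r" % ext)


# Usage: write a small JSON config to a temporary directory and load it back.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "cihai.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"debug": True}, f)
    loaded = load_config_file(path)

print(loaded)
```

The resulting dict is the kind of value that would be passed on as the config parameter.
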
lookup_char(char)

Return character information from datasets.

Parameters: char (str) – character / string to look up
Return type: sqlalchemy.orm.query.Query
Returns: list of matches
reflect_db()

No-op to reflect db info.

This is available as a method so the database can be reflected outside initialization (such as when bootstrapping Unihan during CLI usage).

reverse_char(hints)

Return a SQLAlchemy query of matching results.

Parameters: hints (list of str) – list of matches
Return type: sqlalchemy.orm.query.Query
Returns: list of matching results
base = None

sqlalchemy.ext.automap.AutomapBase instance.

config = None

configuration dictionary.

default_config = {u'debug': False, u'dirs': {u'cache': u'{user_cache_dir}', u'data': u'{user_data_dir}', u'log': u'{user_log_dir}'}, u'database': {u'url': u'sqlite:///{user_data_dir}/cihai.db'}}

dict of default config, can be monkey-patched during tests

engine = None

sqlalchemy.engine.Engine instance.

is_bootstrapped

Return True if UNIHAN and the database are set up.

metadata = None

sqlalchemy.schema.MetaData instance.

session = None

sqlalchemy.orm.session.Session instance.

Configuration

cihai.conf.expand_config(d)

Expand configuration XDG variables.

Environment variables are expanded via os.path.expandvars(). So ${PWD} would be replaced by the current PWD in the shell, and ${USER} by the user running the app.

XDG variables are expanded via str.format(). These do not have a dollar sign. They are:

  • {user_cache_dir}
  • {user_config_dir}
  • {user_data_dir}
  • {user_log_dir}
  • {site_config_dir}
  • {site_data_dir}
Parameters: d (dict) – dictionary of config info
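
The two expansion stages described above can be sketched with the standard library alone. This is a minimal illustration, not the body of cihai.conf.expand_config(): expand_config_sketch and EXAMPLE_DIRS are hypothetical names, the directory values are made up rather than read from appdirs, and CIHAI_DEMO_HOME is an environment variable set only for the demo. It is written for Python 3.

```python
import os

# Hypothetical stand-ins for the directories an AppDirs object would report.
EXAMPLE_DIRS = {
    "user_cache_dir": "/home/me/.cache/cihai",
    "user_config_dir": "/home/me/.config/cihai",
    "user_data_dir": "/home/me/.local/share/cihai",
    "user_log_dir": "/home/me/.cache/cihai/log",
    "site_config_dir": "/etc/xdg/cihai",
    "site_data_dir": "/usr/local/share/cihai",
}


def expand_config_sketch(d, dirs=EXAMPLE_DIRS):
    """Expand every string value of a nested dict in place and return it."""
    for key, value in d.items():
        if isinstance(value, dict):
            expand_config_sketch(value, dirs)
        elif isinstance(value, str):
            # Stage 1: tildes and ${VAR}-style environment variables.
            value = os.path.expandvars(os.path.expanduser(value))
            # Stage 2: {name}-style XDG variables, no dollar sign.
            d[key] = value.format(**dirs)
    return d


os.environ["CIHAI_DEMO_HOME"] = "/srv/demo"  # demo-only variable
expanded = expand_config_sketch({
    "debug": False,
    "dirs": {"log": "${CIHAI_DEMO_HOME}/logs"},
    "database": {"url": "sqlite:///{user_data_dir}/cihai.db"},
})
print(expanded["database"]["url"])
print(expanded["dirs"]["log"])
```

One caveat of this ordering: os.path.expandvars() leaves an undefined ${VAR} untouched, and the leftover braces would then make str.format() raise a KeyError.
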
cihai.conf.DEFAULT_CONFIG = {u'debug': False, u'dirs': {u'cache': u'{user_cache_dir}', u'data': u'{user_data_dir}', u'log': u'{user_log_dir}'}, u'database': {u'url': u'sqlite:///{user_data_dir}/cihai.db'}}

Default configuration

cihai.conf.dirs = <appdirs.AppDirs object>

XDG App directory locations

Conversion

cihai.conversion.euc_to_unicode(hexstr)

Convert EUC-CN (GB2312) hex to a Python unicode string.

Parameters: hexstr – bytestring
Returns: Python unicode, e.g. u'\u4e00' / ‘一’.
Return type: unicode

Examples (Python 2):

>>> u'一'.encode('gb2312').decode('utf-8')
u'һ'
>>> (b'\\x' + b'd2' + b'\\x' + b'bb').replace('\\x', '') \
... .decode('hex').decode('utf-8')
u'һ'

# On Python 3, bytes.replace() needs bytes arguments and 'hex' is not a
# text encoding, so this line only works on Python 2.
gb_enc = gb_enc.replace('\\x', '').decode('hex')

gb_enc.decode('string_escape')  # Won't work with Python 3.x.

cihai.conversion.euc_to_utf8(euchex)

Convert EUC hex (e.g. “d2bb”) to UTF8 hex (e.g. “e4 b8 80”)
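
The "d2bb" to "e4 b8 80" example can be reproduced with the standard codecs. The sketch below is a Python 3 illustration under that assumption, not the library's code; euc_to_utf8_sketch is a hypothetical name, and the space-separated output simply mirrors the example above.

```python
def euc_to_utf8_sketch(euchex):
    """Decode EUC-CN hex to a character, then re-encode it as UTF-8 hex."""
    char = bytes.fromhex(euchex).decode("gb2312")
    return " ".join("%02x" % byte for byte in char.encode("utf-8"))


print(euc_to_utf8_sketch("d2bb"))  # e4 b8 80
```
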

cihai.conversion.gb2312_to_euc(gb2312hex)

Convert GB2312-1980 hex (internal representation) to EUC-CN hex (the “external encoding”)
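
EUC-CN is obtained from the GB2312 internal representation by setting the high bit of each byte, that is, adding 0x80 to both bytes. A minimal Python 3 sketch of that arithmetic follows; gb2312_to_euc_sketch is a hypothetical name and the four-hex-digit input format is an assumption.

```python
def gb2312_to_euc_sketch(gb2312hex):
    """Add 0x80 to each of the two GB2312 bytes to get the EUC-CN bytes."""
    hi, lo = bytes.fromhex(gb2312hex)
    return "%02x%02x" % (hi + 0x80, lo + 0x80)


print(gb2312_to_euc_sketch("523b"))  # d2bb
```
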

cihai.conversion.kuten_to_gb2312(kuten)

Convert GB kuten / quwei form (94 zones * 94 points) to GB2312-1980 / ISO-2022-CN hex (internal representation)
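
A kuten / quwei code names a zone and a point, each from 1 to 94, and adding 0x20 to each gives the GB2312 internal bytes. The Python 3 sketch below assumes the input is a four-digit string such as "5027" (zone 50, point 27); that format and the name kuten_to_gb2312_sketch are assumptions made for illustration.

```python
def kuten_to_gb2312_sketch(kuten):
    """Convert a four-digit zone/point string to GB2312 internal hex."""
    zone, point = int(kuten[:2]), int(kuten[2:])
    if not (1 <= zone <= 94 and 1 <= point <= 94):
        raise ValueError("zone and point must both be in 1..94")
    return "%02x%02x" % (zone + 0x20, point + 0x20)


print(kuten_to_gb2312_sketch("5027"))  # 523b
```
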

cihai.conversion.python_to_euc(uni_char, as_bytes=False)

Return EUC character from a Python Unicode character.

Converts a one-character Python unicode string (e.g. u'\u4e00') to the corresponding EUC hex ('d2bb').
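
On Python 3 the same mapping can be sketched with the built-in gb2312 codec. This is an illustration only: python_to_euc_sketch is a hypothetical name, and treating as_bytes as a switch between hex text and its ASCII bytes is an assumption about the parameter, not documented behavior.

```python
def python_to_euc_sketch(uni_char, as_bytes=False):
    """Encode one character as EUC-CN and return the bytes as hex."""
    euc = uni_char.encode("gb2312").hex()
    # Assumption: as_bytes returns the same hex digits as ASCII bytes.
    return euc.encode("ascii") if as_bytes else euc


print(python_to_euc_sketch("\u4e00"))  # d2bb
```
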

cihai.conversion.python_to_ucn(uni_char, as_bytes=False)

Return UCN character from Python Unicode character.

Converts a one-character Python unicode string (e.g. u'\u4e00') to the corresponding Unicode UCN ('U+4E00').
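
A UCN is the code point written as "U+" followed by at least four uppercase hex digits. The Python 3 sketch below shows that formatting; python_to_ucn_sketch is a hypothetical name, and the as_bytes parameter is left out for brevity.

```python
def python_to_ucn_sketch(uni_char):
    """Format a single character's code point as a UCN string."""
    return "U+%04X" % ord(uni_char)


print(python_to_ucn_sketch("\u4e00"))  # U+4E00
```
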

cihai.conversion.ucn_to_unicode(ucn)

Convert a Unicode Universal Character Number (e.g. “U+4E00” or “4E00”) to Python unicode (u'\u4e00').
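
The reverse direction amounts to parsing the hex digits, with or without the "U+" prefix. A minimal Python 3 sketch, where ucn_to_unicode_sketch is a hypothetical name:

```python
def ucn_to_unicode_sketch(ucn):
    """Turn 'U+4E00' or '4E00' into the corresponding character."""
    hex_digits = ucn[2:] if ucn.upper().startswith("U+") else ucn
    return chr(int(hex_digits, 16))


print(ucn_to_unicode_sketch("U+4E00") == ucn_to_unicode_sketch("4E00"))  # True
```
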

cihai.conversion.ucnstring_to_python(ucn_string)

Return string with Unicode UCN (e.g. “U+4E00”) converted to native Python Unicode (u'\u4e00').
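
For UCN tokens embedded in a longer string, a regular-expression substitution is one way to do the replacement. The Python 3 sketch below assumes tokens of four to six hex digits after "U+"; the pattern and the name ucnstring_to_python_sketch are illustrative assumptions, not the library's implementation.

```python
import re

# Assumption: a UCN token is "U+" followed by four to six hex digits.
UCN_PATTERN = re.compile(r"U\+([0-9A-Fa-f]{4,6})")


def ucnstring_to_python_sketch(ucn_string):
    """Replace every UCN token in the string with its character."""
    return UCN_PATTERN.sub(lambda m: chr(int(m.group(1), 16)), ucn_string)


print(ucnstring_to_python_sketch("U+4E00 and U+597D"))  # 一 and 好
```

Because the pattern is greedy, hex-looking text directly after a token (for example "U+4E00ab") would be absorbed into it, so real input may need stricter delimiting.
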

cihai.conversion.ucnstring_to_unicode(ucn_string)

Return ucnstring as Unicode.

Exceptions

When using cihai via Python, you can catch Cihai-specific exceptions via these. All Cihai-specific exceptions can be caught via CihaiException, since it is the base exception.

Exceptions for Cihai.

cihai.exc

exception cihai.exc.CihaiException

Bases: exceptions.Exception

Base Cihai Exception class.

Utilities

Utility and helper methods for cihai.

cihai.util

cihai.util.supports_wide()

Return whether the Python interpreter supports wide characters.

Returns: True if Python supports wide character sets.
Return type: bool
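
A common way to make this check is to compare sys.maxunicode against the top of the Basic Multilingual Plane. The sketch below assumes that approach; supports_wide_sketch is a hypothetical name, and whether cihai.util.supports_wide() does exactly this is not stated here.

```python
import sys


def supports_wide_sketch():
    """Return True if code points above U+FFFF fit in a single character."""
    return sys.maxunicode > 0xFFFF


print(supports_wide_sketch())
```

On Python 3.3 and later the result is always True; narrow builds only existed for older interpreters.
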