API
Cihai core functionality.
- exception cihai.core.CihaiConfigError
Bases: CihaiException
- class cihai.core.Cihai(config=None, unihan=True)
Bases: object
Central application object.
By default, this automatically adds the UNIHAN dataset.
Notes
Inspired by the early pypa/warehouse application object [1].
Configuration templates
The config dict parameter supports a basic template system for replacing XDG Base Directory variables, tildes, and environment variables. This is done by passing the option dict through cihai.config.expand_config() during initialization.
Examples
To use cihai programmatically, invoke and install the UNIHAN [2] dataset:
    #!/usr/bin/env python
    import typing as t

    from cihai.core import Cihai


    def run(unihan_options: t.Optional[t.Dict[str, object]] = None) -> None:
        if unihan_options is None:
            unihan_options = {}

        c = Cihai()

        if not c.unihan.is_bootstrapped:  # download and install Unihan to db
            c.unihan.bootstrap(unihan_options)

        query = c.unihan.lookup_char("好")
        glyph = query.first()
        assert glyph is not None
        print("lookup for 好: %s" % glyph.kDefinition)

        query = c.unihan.reverse_char("good")
        print('matches for "good": %s ' % ", ".join([glph.char for glph in query]))


    if __name__ == "__main__":
        run()
Above, is_bootstrapped checks whether the system has the database installed.
References
- Parameters:
config (dict, optional) –
unihan (boolean, optional) – Bootstrap the core UNIHAN dataset (recommended)
- default_config: UntypedDict = {'database': {'url': 'sqlite:///{user_data_dir}/cihai.db'}, 'datasets': {}, 'debug': False, 'dirs': {'cache': PosixPath('/home/runner/.cache/cihai'), 'data': PosixPath('/home/runner/.local/share/cihai'), 'log': PosixPath('/home/runner/.cache/cihai/log')}, 'plugins': {}}
dict of default config; can be monkey-patched during tests.
- config: ConfigDict
Configuration
- cihai.config.expand_config(d, dirs=<appdirs.AppDirs object>)
Expand XDG variables, environment variables, and tildes in the configuration.
- Return type:
- Parameters:
d (dict) – config information
dirs (appdirs.AppDirs) – XDG application mapping
Notes
Environment variables are expanded via os.path.expandvars(). So ${PWD} would be replaced by the current PWD in the shell, and ${USER} by the user running the app.
XDG variables are expanded via str.format(). These do not have a dollar sign. They are:
{user_cache_dir}
{user_config_dir}
{user_data_dir}
{user_log_dir}
{site_config_dir}
{site_data_dir}
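The expansion rules above can be sketched with the standard library alone. This is a minimal illustration, not cihai's implementation: the helper names expand_value and expand_tree are hypothetical, the order (environment variables, then ~, then XDG placeholders) is an assumption, and the sketch returns a new dict rather than modifying d.

```python
import os
from typing import Any, Dict


def expand_value(value: str, xdg_dirs: Dict[str, str]) -> str:
    """Expand ${VAR}, a leading ~, and {xdg_placeholder} in one string.

    Hypothetical helper illustrating the documented rules; not cihai's code.
    """
    value = os.path.expandvars(value)  # ${PWD}, ${USER}, ... from the environment
    value = os.path.expanduser(value)  # leading ~ -> current user's home directory
    return value.format(**xdg_dirs)    # {user_data_dir}, {user_cache_dir}, ...


def expand_tree(config: Dict[str, Any], xdg_dirs: Dict[str, str]) -> Dict[str, Any]:
    """Return a copy of a nested config dict with every string value expanded."""
    expanded: Dict[str, Any] = {}
    for key, value in config.items():
        if isinstance(value, dict):
            expanded[key] = expand_tree(value, xdg_dirs)  # recurse into sub-dicts
        elif isinstance(value, str):
            expanded[key] = expand_value(value, xdg_dirs)
        else:
            expanded[key] = value  # booleans, numbers, etc. pass through unchanged
    return expanded


xdg = {"user_data_dir": "/home/alice/.local/share/cihai"}  # example values only
config = {"database": {"url": "sqlite:///{user_data_dir}/cihai.db"}, "debug": False}
print(expand_tree(config, xdg))
# {'database': {'url': 'sqlite:////home/alice/.local/share/cihai/cihai.db'}, 'debug': False}
```

One caveat of this particular ordering: os.path.expandvars leaves an unset ${VAR} in place, and str.format would then read its braces as a placeholder and raise KeyError.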
See also
Database
Cihai core functionality.
- class cihai.db.Database(config)
Bases: object
Cihai SQLAlchemy instance.
- base: AutomapBase
sqlalchemy.ext.automap.AutomapBase instance.
- engine: Engine
sqlalchemy.engine.Engine instance.
- metadata: MetaData
sqlalchemy.schema.MetaData instance.
- session: Session
sqlalchemy.orm.session.Session instance.
Extending
Cihai Plugin System
Status: Experimental, API can change
As a pilot, the UNIHAN library and a plugin for it were implemented in #131 [1].
You can bring any data layout / backend you like to cihai.
For convenience, you can use cihai’s configuration namespace and SQLAlchemy settings.
You can also create plugins which extend another. So if Unihan doesn’t have a lookup for variant glyphs, this can be added.
- class cihai.extend.ConfigMixin
Bases: object
This piggybacks on cihai’s global config state, as well as your datasets.
Cihai will automatically manage the user’s config, as well as your datasets, neatly in XDG.
- Raises:
CihaiDatasetConfigException (CihaiDatasetException) – functions inside, and what you write relating to dataset config, should raise this.
Notes
config.cihai links directly back to Cihai's configuration dictionary (todo: make this a non-mutable property).
config (dict) – your local user’s config
check() (function, optional) – this is run on start; it can raise DatasetConfigException
default_config – your dataset’s default configuration
get_default_config – override function in case you’d like custom configs (for instance, if you want a platform to use a different db driver, or to do version checks). Internal functions use get_default_config().
- class cihai.extend.SQLAlchemyMixin
Bases: object
Your dataset can use any backend you’d like. We provide one that automatically piggybacks on cihai’s zero-config XDG / SQLAlchemy configuration, so it’s preconfigured for the user.
In addition, this mixin gives you access to the SQLAlchemy data of any other of the user’s datasets that use this mixin. So if you want a dataset that utilizes UNIHAN, you can access it easily.
When you have access, you are expected to keep your tables / databases namespaced so they don’t clobber one another.
This mixin provides the following instance-level properties to your methods:
- sql: Database
- engine: Engine
sqlalchemy.engine.Engine instance.
- metadata: MetaData
sqlalchemy.schema.MetaData instance.
- session: Session
sqlalchemy.orm.session.Session instance.
- base: AutomapBase
sqlalchemy.ext.automap.AutomapBase instance.
- class cihai.extend.Dataset
Bases: object
Cihai dataset, e.g. UNIHAN.
See also
cihai.data.unihan.dataset.Unihan – reference implementation
- class cihai.extend.DatasetPlugin
Bases: object
Extend the functionality of datasets with custom methods, actions, etc.
See also
cihai.data.unihan.dataset.UnihanVariants – reference implementation
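To make the dataset / plugin split concrete, here is a toy, stdlib-only sketch of the pattern this section describes: a plugin holds a reference to the dataset it extends and adds a variant lookup the dataset itself lacks. ToyDataset, ToyDatasetPlugin, ToyVariants and the add_plugin hook are all hypothetical stand-ins, not cihai's classes, and the in-memory rows replace the SQL storage a real dataset would use.

```python
from typing import Optional


class ToyDataset:
    """Hypothetical stand-in for a dataset such as UNIHAN (not cihai's Dataset)."""

    def __init__(self) -> None:
        # Tiny in-memory table keyed by character (toy data only).
        self._rows = {
            "汉": {"kDefinition": "Chinese people", "kTraditionalVariant": "U+6F22"},
            "好": {"kDefinition": "good, excellent"},
        }

    def lookup_char(self, char: str) -> dict:
        return self._rows.get(char, {})

    def add_plugin(self, plugin_cls: type, namespace: str) -> None:
        # Hypothetical hook: mount the plugin under its own attribute so it
        # stays namespaced and cannot clobber the dataset's own methods.
        if hasattr(self, namespace):
            raise ValueError(f"namespace {namespace!r} already in use")
        setattr(self, namespace, plugin_cls(self))


class ToyDatasetPlugin:
    """Hypothetical base: a plugin keeps a reference to the dataset it extends."""

    def __init__(self, dataset: ToyDataset) -> None:
        self.dataset = dataset


class ToyVariants(ToyDatasetPlugin):
    """Adds a traditional-variant lookup the base dataset does not provide."""

    def traditional(self, char: str) -> Optional[str]:
        ucn = self.dataset.lookup_char(char).get("kTraditionalVariant")
        # Values are stored as "U+XXXX"; turn the code point into a character.
        return chr(int(ucn[2:], 16)) if ucn else None


ds = ToyDataset()
ds.add_plugin(ToyVariants, namespace="variants")
print(ds.variants.traditional("汉"))  # 漢
```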
Constants
- cihai.constants.app_dirs = <appdirs.AppDirs object>
XDG App directory locations
- cihai.constants.DEFAULT_CONFIG: UntypedDict = {'database': {'url': 'sqlite:///{user_data_dir}/cihai.db'}, 'datasets': {}, 'debug': False, 'dirs': {'cache': PosixPath('/home/runner/.cache/cihai'), 'data': PosixPath('/home/runner/.local/share/cihai'), 'log': PosixPath('/home/runner/.cache/cihai/log')}, 'plugins': {}}
Default configuration
- cihai.constants.UNIHAN_CONFIG: UntypedDict = {'datasets': {'unihan': 'cihai.data.unihan.dataset.Unihan'}}
User will be prompted to automatically configure their installation for UNIHAN.
UNIHAN Dataset
Bootstrapping
- cihai.data.unihan.bootstrap.is_bootstrapped(metadata)
Return True if cihai is correctly bootstrapped.
- Return type:
bool
- cihai.data.unihan.bootstrap.create_unihan_table(columns, metadata)
Create table and return sqlalchemy.Table.
- Parameters:
columns (list) – columns for table, e.g. ['kDefinition', 'kCantonese']
metadata (sqlalchemy.schema.MetaData) – Instance of sqlalchemy metadata
- Returns:
Newly created table with columns and index.
- Return type:
sqlalchemy.Table
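The bootstrapping check can be illustrated without SQLAlchemy. The sketch below swaps in the standard library's sqlite3 module: it creates a table with one TEXT column per field plus an index, then treats the data as bootstrapped when the table exists and has every expected column. The char/ucn key columns, the table name Unihan, and the column-based check are assumptions about what is_bootstrapped verifies, not a transcription of cihai's code; both helper names are hypothetical.

```python
import sqlite3
from typing import List


def create_unihan_like_table(conn: sqlite3.Connection, name: str, columns: List[str]) -> None:
    """Create a table with char/ucn key columns plus one TEXT column per field."""
    # Identifiers are interpolated directly, so only pass trusted names.
    column_sql = ", ".join(f'"{col}" TEXT' for col in ["char", "ucn", *columns])
    conn.execute(f'CREATE TABLE "{name}" ({column_sql})')
    # Index the lookup key ("newly created table with columns and index").
    conn.execute(f'CREATE INDEX "ix_{name}_char" ON "{name}" ("char")')


def table_is_bootstrapped(conn: sqlite3.Connection, name: str, columns: List[str]) -> bool:
    """True if the table exists and contains every expected column."""
    rows = conn.execute(f'PRAGMA table_info("{name}")').fetchall()
    present = {row[1] for row in rows}  # row[1] is the column name
    return bool(rows) and set(columns) <= present


conn = sqlite3.connect(":memory:")
print(table_is_bootstrapped(conn, "Unihan", ["kDefinition"]))  # False
create_unihan_like_table(conn, "Unihan", ["kDefinition", "kCantonese"])
print(table_is_bootstrapped(conn, "Unihan", ["kDefinition"]))  # True
```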
- class cihai.data.unihan.dataset.Unihan
Bases: Dataset, SQLAlchemyMixin
- lookup_char(char)
Return character information from datasets.
- Parameters:
char (str) – character / string to lookup
- Returns:
list of matches
- Return type:
- reverse_char(hints)
Return a QuerySet of SQLAlchemy result objects.
- Parameters:
- Returns:
reverse matches
- Return type:
- with_fields(fields)
Return a list of characters with information for certain fields.
- property is_bootstrapped: bool
Return True if UNIHAN and the database are set up.
- Returns:
True if Unihan application fixture data is installed.
- Return type:
bool
- sql: Database
- engine: Engine
sqlalchemy.engine.Engine instance.
- metadata: MetaData
sqlalchemy.schema.MetaData instance.
- session: Session
sqlalchemy.orm.session.Session instance.
- base: AutomapBase
sqlalchemy.ext.automap.AutomapBase instance.
- cihai.data.unihan.constants.UNIHAN_FILES = ['Unihan_DictionaryLikeData.txt', 'Unihan_IRGSources.txt', 'Unihan_NumericValues.txt', 'Unihan_RadicalStrokeCounts.txt', 'Unihan_Readings.txt', 'Unihan_Variants.txt']
Mapping of files from unihan-etl (UNIHAN database)
- cihai.data.unihan.constants.UNIHAN_FIELDS: List[str] = ['kAccountingNumeric', 'kCangjie', 'kCantonese', 'kCheungBauer', 'kCihaiT', 'kCompatibilityVariant', 'kDefinition', 'kFenn', 'kFourCornerCode', 'kFrequency', 'kGradeLevel', 'kHDZRadBreak', 'kHKGlyph', 'kHangul', 'kHanyuPinlu', 'kHanyuPinyin', 'kJapaneseKun', 'kJapaneseOn', 'kKorean', 'kMandarin', 'kOtherNumeric', 'kPhonetic', 'kPrimaryNumeric', 'kRSAdobe_Japan1_6', 'kRSJapanese', 'kRSKanWa', 'kRSKangXi', 'kRSKorean', 'kRSUnicode', 'kSemanticVariant', 'kSimplifiedVariant', 'kSpecializedSemanticVariant', 'kTang', 'kTotalStrokes', 'kTraditionalVariant', 'kVietnamese', 'kXHC1983', 'kZVariant']
Mapping of field names from unihan-etl (UNIHAN database)
- cihai.data.unihan.constants.UNIHAN_ETL_DEFAULT_OPTIONS = {'expand': False, 'fields': ['kAccountingNumeric', 'kCangjie', 'kCantonese', 'kCheungBauer', 'kCihaiT', 'kCompatibilityVariant', 'kDefinition', 'kFenn', 'kFourCornerCode', 'kFrequency', 'kGradeLevel', 'kHDZRadBreak', 'kHKGlyph', 'kHangul', 'kHanyuPinlu', 'kHanyuPinyin', 'kJapaneseKun', 'kJapaneseOn', 'kKorean', 'kMandarin', 'kOtherNumeric', 'kPhonetic', 'kPrimaryNumeric', 'kRSAdobe_Japan1_6', 'kRSJapanese', 'kRSKanWa', 'kRSKangXi', 'kRSKorean', 'kRSUnicode', 'kSemanticVariant', 'kSimplifiedVariant', 'kSpecializedSemanticVariant', 'kTang', 'kTotalStrokes', 'kTraditionalVariant', 'kVietnamese', 'kXHC1983', 'kZVariant'], 'format': 'python', 'input_files': ['Unihan_DictionaryLikeData.txt', 'Unihan_IRGSources.txt', 'Unihan_NumericValues.txt', 'Unihan_RadicalStrokeCounts.txt', 'Unihan_Readings.txt', 'Unihan_Variants.txt']}
Default settings passed to unihan-etl
Variants plugin
- class cihai.data.unihan.dataset.UnihanVariants
Bases: DatasetPlugin, SQLAlchemyMixin
- sql: Database
- engine: Engine
sqlalchemy.engine.Engine instance.
- metadata: MetaData
sqlalchemy.schema.MetaData instance.
- session: Session
sqlalchemy.orm.session.Session instance.
- base: AutomapBase
sqlalchemy.ext.automap.AutomapBase instance.
Conversion
- cihai.conversion.euc_to_unicode(hexstr)
Convert EUC-CN (GB2312) hex to a Python unicode string.
- Parameters:
hexstr (bytes) –
- Returns:
Python unicode, e.g. u'\u4e00' / ‘一’.
- Return type:
unicode
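Examples
>>> u'\u4e00'.encode('gb2312').decode('utf-8')
'\u04bb'
Python 2 only (bytes.decode() on Python 3 accepts text encodings only, so .decode('hex') fails):
>>> (b'\\x' + b'd2' + b'\\x' + b'bb').replace('\\x', '') \
...     .decode('hex').decode('utf-8')
u'\u04bb'
Note: on Python 3, bytes.replace() needs bytes arguments and the 'string_escape' codec no longer exists, so this fails too:
>>> gb_enc = gb_enc.replace('\\x', '').decode('hex')
>>> gb_enc.decode('string_escape')  # Won't work with Python 3.x.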
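On Python 3, bytes.fromhex() takes over from the Python 2 str.decode('hex') idiom in the examples above. A minimal sketch of the conversion, assuming the input is a hex string such as "d2bb" (optionally written with literal \x separators); euc_hex_to_char is a hypothetical name, not cihai's euc_to_unicode, whose documented parameter type is bytes.

```python
def euc_hex_to_char(hexstr: str) -> str:
    """Decode EUC-CN (GB2312) hex such as "d2bb" to a character."""
    cleaned = hexstr.replace("\\x", "")  # drop literal \x separators if present
    return bytes.fromhex(cleaned).decode("gb2312")


print(euc_hex_to_char("d2bb"))  # 一
```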
- cihai.conversion.euc_to_utf8(euchex)
Convert EUC hex (e.g. “d2bb”) to UTF8 hex (e.g. “e4 b8 80”).
- Return type:
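The EUC to UTF-8 conversion amounts to decoding the GB2312 bytes and re-encoding the resulting character. A minimal sketch, assuming a hex-string input and space-separated output as in the “e4 b8 80” example; euc_hex_to_utf8_hex is a hypothetical name.

```python
def euc_hex_to_utf8_hex(euchex: str) -> str:
    """Re-encode EUC-CN hex ("d2bb") as space-separated UTF-8 hex ("e4 b8 80")."""
    char = bytes.fromhex(euchex).decode("gb2312")  # EUC-CN bytes -> character
    return " ".join(f"{byte:02x}" for byte in char.encode("utf-8"))


print(euc_hex_to_utf8_hex("d2bb"))  # e4 b8 80
```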
- cihai.conversion.gb2312_to_euc(gb2312hex)
Convert GB2312-1980 hex (internal representation) to EUC-CN hex (the “external encoding”)
- Return type:
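EUC-CN is GB2312 with the high bit of each byte set, so the conversion is a per-byte OR with 0x80. A sketch assuming hex-string input and output (gb2312_hex_to_euc_hex is a hypothetical name); "523b" is the GB2312 internal code of 一, whose EUC-CN form is "d2bb".

```python
def gb2312_hex_to_euc_hex(gb2312hex: str) -> str:
    """Map GB2312 internal hex ("523b") to EUC-CN hex ("d2bb")."""
    raw = bytes.fromhex(gb2312hex)
    # GB2312 bytes lie in 0x21..0x7E; setting the high bit moves them to 0xA1..0xFE.
    return bytes(byte | 0x80 for byte in raw).hex()


print(gb2312_hex_to_euc_hex("523b"))  # d2bb
```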
- cihai.conversion.kuten_to_gb2312(kuten)
Convert GB kuten / quwei form (94 zones * 94 points) to GB2312-1980 / ISO-2022-CN hex (internal representation)
- Return type:
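Each kuten / quwei zone and point (1 to 94) is offset by 0x20 to land in the 0x21 to 0x7E byte range. A sketch assuming a four-digit string input, zone then point, such as "5027" (the quwei code of 一); kuten_to_gb2312_hex is a hypothetical name.

```python
def kuten_to_gb2312_hex(kuten: str) -> str:
    """Map a four-digit quwei code such as "5027" to GB2312 hex "523b"."""
    if len(kuten) != 4 or not kuten.isdigit():
        raise ValueError(f"expected four digits, got {kuten!r}")
    zone, point = int(kuten[:2]), int(kuten[2:])
    if not (1 <= zone <= 94 and 1 <= point <= 94):
        raise ValueError(f"zone/point out of range: {kuten!r}")
    # Offset each 1..94 index by 0x20 into the 0x21..0x7E byte range.
    return f"{zone + 0x20:02x}{point + 0x20:02x}"


print(kuten_to_gb2312_hex("5027"))  # 523b
```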
- cihai.conversion.python_to_euc(uni_char, as_bytes=False)
Return EUC character from a Python Unicode character.
Converts a one-character Python unicode string (e.g. u'\u4e00') to the corresponding EUC hex (‘d2bb’).
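For a single character, encoding to GB2312 already yields the EUC-CN bytes, so their hex form is the EUC code. A minimal sketch; char_to_euc_hex is a hypothetical name and does not model the as_bytes flag, whose exact behavior is not described here.

```python
def char_to_euc_hex(uni_char: str) -> str:
    """Return the EUC-CN hex for one character, e.g. "一" -> "d2bb"."""
    if len(uni_char) != 1:
        raise ValueError("expected exactly one character")
    # Raises UnicodeEncodeError for characters outside GB2312.
    return uni_char.encode("gb2312").hex()


print(char_to_euc_hex("\u4e00"))  # d2bb
```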
- cihai.conversion.python_to_ucn(uni_char, as_bytes=False)
Return UCN character from Python Unicode character.
Converts a one-character Python unicode string (e.g. u'\u4e00') to the corresponding Unicode UCN (‘U+4E00’).
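A UCN is the code point written as U+ followed by at least four uppercase hex digits. A minimal sketch; char_to_ucn is a hypothetical name and the as_bytes flag is not modeled.

```python
def char_to_ucn(uni_char: str) -> str:
    """Return the UCN for one character, e.g. "一" -> "U+4E00"."""
    if len(uni_char) != 1:
        raise ValueError("expected exactly one character")
    return f"U+{ord(uni_char):04X}"  # pad to at least four hex digits


print(char_to_ucn("\u4e00"))  # U+4E00
```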
- cihai.conversion.ucn_to_unicode(ucn)
Convert a Unicode Universal Character Number (e.g. “U+4E00” or “4E00”) to Python unicode (u'\u4e00').
- Return type:
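The reverse direction strips an optional U+ prefix and converts the remaining hex to a code point. A minimal sketch accepting both forms mentioned above; ucn_to_char is a hypothetical name.

```python
def ucn_to_char(ucn: str) -> str:
    """Convert "U+4E00" or "4E00" to the character it names."""
    hexpart = ucn.strip()
    if hexpart.upper().startswith("U+"):
        hexpart = hexpart[2:]  # drop the optional "U+" prefix
    return chr(int(hexpart, 16))


print(ucn_to_char("U+4E00"))  # 一
```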
Exceptions
When using cihai via Python, you can catch Cihai-specific exceptions via these. All Cihai-specific exceptions are catchable via CihaiException, since it’s the base exception.
Exceptions raised from the Cihai library.
- exception cihai.exc.CihaiException
Bases: Exception
Base Cihai Exception class.
- add_note()
Exception.add_note(note) – add a note to the exception
- args
- with_traceback()
Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.
- exception cihai.exc.ImportStringError(import_name, exception)
Bases: ImportError, CihaiException
Provides information about a failed import_string() attempt.
Notes
This is from werkzeug.utils d36aaf1 on August 20 2022, LICENSE BSD. https://github.com/pallets/werkzeug
Changes:
- Deferred loading of import_string from cihai.utils
- Format with black
- add_note()
Exception.add_note(note) – add a note to the exception
- args
- msg
exception message
- name
module name
- path
module path
- with_traceback()
Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.
- exception: BaseException
Wrapped exception.
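Because ImportStringError derives from both ImportError and CihaiException, either handler catches it. The sketch below uses locally redefined stand-ins with the same bases as listed above so it runs without cihai installed; with cihai available one would import the real classes from cihai.exc instead. The stand-in's message is simplified, and the fail helper is hypothetical.

```python
class CihaiException(Exception):
    """Stand-in for cihai.exc.CihaiException, the library's base exception."""


class ImportStringError(ImportError, CihaiException):
    """Stand-in with the same bases as cihai.exc.ImportStringError (simplified message)."""

    def __init__(self, import_name: str, exception: BaseException) -> None:
        self.import_name = import_name
        self.exception = exception  # the wrapped exception
        super().__init__(f"import_string() failed for {import_name!r}: {exception}")


def fail() -> None:
    """Hypothetical helper that raises the stand-in error."""
    raise ImportStringError("no.such.module", ModuleNotFoundError("no.such"))


# Thanks to the dual bases, each handler below catches the same error.
for handler in (ImportError, CihaiException):
    try:
        fail()
    except handler as exc:
        print(f"{type(exc).__name__} caught via {handler.__name__}")
```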
Utilities
Utility and helper methods for cihai.
- cihai.utils.supports_wide()
Return True if the Python interpreter supports wide characters.
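One common way to implement such a check is to compare sys.maxunicode against the end of the Basic Multilingual Plane; this sketch is an assumption about the approach, not cihai's source. Since Python 3.3 (PEP 393) every build reports 0x10FFFF, so on modern interpreters the check always succeeds.

```python
import sys


def supports_wide() -> bool:
    """True if the interpreter can represent code points above U+FFFF."""
    # Old "narrow" (UCS-2) builds capped sys.maxunicode at 0xFFFF.
    return sys.maxunicode > 0xFFFF


print(supports_wide(), len("\U00020000"))  # True 1 on Python 3.3+
```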
- cihai.utils.import_string(import_name, silent=False)
Imports an object based on a string.
This is useful if you want to use import paths as endpoints or something similar. An import path can be specified either in dotted notation (xml.sax.saxutils.escape) or with a colon as object delimiter (xml.sax.saxutils:escape).
If silent is True the return value will be None if the import fails.
- Parameters:
import_name (string) – the dotted name for the object to import.
silent (bool) – if set to True import errors are ignored and None is returned instead.
- Return type:
imported object
- Raises:
cihai.exc.ImportStringError (ImportError, cihai.exc.CihaiException) –
Notes
This is from werkzeug.utils d36aaf1 on May 23, 2022, LICENSE BSD. https://github.com/pallets/werkzeug
Changes:
- Exception raised is cihai.exc.ImportStringError
- Format with black
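The dotted and colon forms described above can be resolved with importlib. This simplified sketch (import_string_sketch is a hypothetical name) first tries the whole path as a module, then falls back to importing the parent module and reading the last component as an attribute. Unlike cihai's function, it raises a plain ImportError rather than cihai.exc.ImportStringError.

```python
import importlib
from typing import Any, Optional


def import_string_sketch(import_name: str, silent: bool = False) -> Optional[Any]:
    """Resolve "pkg.mod.attr" or "pkg.mod:attr" to an object (simplified)."""
    import_name = import_name.replace(":", ".")  # treat the colon like a dot
    try:
        try:
            # The whole path may itself be a module, e.g. "os.path".
            return importlib.import_module(import_name)
        except ImportError:
            if "." not in import_name:
                raise
            module_name, _, attr = import_name.rpartition(".")
            module = importlib.import_module(module_name)
            try:
                return getattr(module, attr)
            except AttributeError as exc:
                raise ImportError(str(exc)) from exc
    except ImportError:
        if silent:
            return None  # mirror the documented silent=True behavior
        raise


print(import_string_sketch("xml.sax.saxutils:escape")("<a>"))  # &lt;a&gt;
```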