UNIHAN - cihai.data.unihan
Bootstrapping
Fetch + extract + transform + load UNIHAN dataset to Cihai.
- cihai.data.unihan.bootstrap.bootstrap_unihan(engine, metadata, options=None)
UNIHAN bootstrap script (download from web, import to database).
- cihai.data.unihan.bootstrap.is_bootstrapped(metadata)
Return True if cihai is correctly bootstrapped.
- Return type:
bool
- cihai.data.unihan.bootstrap.create_unihan_table(columns, metadata)
Create table and return sqlalchemy.sql.schema.Table.
- Parameters:
columns (list) – columns for table, e.g. ['kDefinition', 'kCantonese']
metadata (sqlalchemy.schema.MetaData) – Instance of sqlalchemy metadata
- Returns:
Newly created table with columns and index.
- Return type:
sqlalchemy.sql.schema.Table
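As a rough illustration of what the table-creation and bootstrap-check helpers above do, the sketch below builds a table with one column per UNIHAN field and then checks that every expected column is present. It uses the standard-library sqlite3 module instead of SQLAlchemy to stay dependency-free. The table name Unihan, the char key column, and both function names are assumptions made for illustration, not cihai's implementation.

```python
import sqlite3


def create_table_sketch(conn, columns, table="Unihan"):
    """Create a table keyed by character with one text column per field."""
    # Hypothetical layout: a "char" primary key plus one TEXT column per field.
    defs = ['"char" TEXT PRIMARY KEY'] + [f'"{name}" TEXT' for name in columns]
    conn.execute(f'CREATE TABLE "{table}" ({", ".join(defs)})')


def is_bootstrapped_sketch(conn, columns, table="Unihan"):
    """Return True if the table exists and contains every expected column."""
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    present = {row[1] for row in rows}  # row[1] holds the column name
    return bool(present) and set(columns) <= present


conn = sqlite3.connect(":memory:")
fields = ["kDefinition", "kCantonese"]
before = is_bootstrapped_sketch(conn, fields)  # no table yet
create_table_sketch(conn, fields)
after = is_bootstrapped_sketch(conn, fields)  # table and columns now exist
```

The check compares column names rather than row counts, so an empty but correctly shaped table still counts as set up in this sketch.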
- class cihai.data.unihan.dataset.Unihan
Bases: Dataset, SQLAlchemyMixin
UNIHAN Dataset for cihai.
- bootstrap(options=None)
Fetch, extract, import UNIHAN to DB, and initialize DB mapping.
- lookup_char(char)
Return character information from datasets.
- Parameters:
char (str) – character / string to look up
- Returns:
matches for the character
- Return type:
Query[Unihan]
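A small helper shows how the result of lookup_char might be consumed. It assumes the returned query supports .first(), as SQLAlchemy queries do, and that matched rows expose UNIHAN fields such as kDefinition as attributes. The helper name first_definition is hypothetical.

```python
def first_definition(unihan, char):
    """Return kDefinition for the first match of ``char``, or None.

    ``unihan`` is assumed to be a bootstrapped Unihan dataset object.
    """
    match = unihan.lookup_char(char).first()
    return None if match is None else match.kDefinition


# Assumed usage with cihai installed and UNIHAN bootstrapped:
#   from cihai.core import Cihai
#   c = Cihai()
#   if not c.unihan.is_bootstrapped:
#       c.unihan.bootstrap()
#   print(first_definition(c.unihan, "好"))
```

Taking the dataset as a parameter keeps the helper independent of how the Cihai object was configured.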
- reverse_char(hints)
Return a SQLAlchemy query of the results of a reverse lookup.
- Parameters:
hints – value(s) to reverse-match against the dataset
- Returns:
reverse matches
- Return type:
Query[Unihan]
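One plausible reading of a reverse lookup is "find the characters whose field values contain a given hint". The sketch below illustrates that idea with the standard-library sqlite3 module and two toy rows. The substring-matching semantics, the table layout, and the function name are assumptions for illustration and may differ from what reverse_char actually does.

```python
import sqlite3


def reverse_lookup_sketch(conn, hints, fields):
    """Return characters where any field contains any hint as a substring."""
    if isinstance(hints, str):
        hints = [hints]
    if not hints or not fields:
        return []
    # One "field LIKE ?" clause per (field, hint) pair, joined with OR.
    clauses = " OR ".join(f'"{field}" LIKE ?' for field in fields for _ in hints)
    params = [f"%{hint}%" for _ in fields for hint in hints]
    sql = f'SELECT "char" FROM "Unihan" WHERE {clauses} ORDER BY "char"'
    return [row[0] for row in conn.execute(sql, params)]


# Toy data only: two characters with shortened, illustrative definitions.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "Unihan" ("char" TEXT PRIMARY KEY, "kDefinition" TEXT)')
conn.executemany('INSERT INTO "Unihan" VALUES (?, ?)', [("好", "good"), ("你", "you")])
matches = reverse_lookup_sketch(conn, "good", ["kDefinition"])
```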
- with_fields(fields)
Return list of characters with information for certain fields.
- property is_bootstrapped: bool
Return True if UNIHAN and the database are set up.
- Returns:
True if Unihan application fixture data is installed.
- Return type:
bool
- sql: Database
- engine: Engine
sqlalchemy.engine.Engine instance.
- metadata: MetaData
sqlalchemy.schema.MetaData instance.
- session: Session
sqlalchemy.orm.session.Session instance.
- base: AutomapBase
sqlalchemy.ext.automap.AutomapBase instance.
Constants for UNIHAN cihai dataset.
- cihai.data.unihan.constants.UNIHAN_FILES = ['Unihan_DictionaryLikeData.txt', 'Unihan_IRGSources.txt', 'Unihan_NumericValues.txt', 'Unihan_RadicalStrokeCounts.txt', 'Unihan_Readings.txt', 'Unihan_Variants.txt']
List of files from unihan-etl (UNIHAN database)
- cihai.data.unihan.constants.UNIHAN_FIELDS: List[str] = ['kAccountingNumeric', 'kCangjie', 'kCantonese', 'kCheungBauer', 'kCihaiT', 'kCompatibilityVariant', 'kDefinition', 'kFenn', 'kFourCornerCode', 'kGradeLevel', 'kHDZRadBreak', 'kHKGlyph', 'kHangul', 'kHanyuPinlu', 'kHanyuPinyin', 'kJapaneseKun', 'kJapaneseOn', 'kKorean', 'kMandarin', 'kOtherNumeric', 'kPhonetic', 'kPrimaryNumeric', 'kRSAdobe_Japan1_6', 'kRSUnicode', 'kSemanticVariant', 'kSimplifiedVariant', 'kSpecializedSemanticVariant', 'kTang', 'kTotalStrokes', 'kTraditionalVariant', 'kVietnamese', 'kXHC1983', 'kZVariant']
List of field names from unihan-etl (UNIHAN database)
- cihai.data.unihan.constants.UNIHAN_ETL_DEFAULT_OPTIONS = {'expand': False, 'fields': ['kAccountingNumeric', 'kCangjie', 'kCantonese', 'kCheungBauer', 'kCihaiT', 'kCompatibilityVariant', 'kDefinition', 'kFenn', 'kFourCornerCode', 'kGradeLevel', 'kHDZRadBreak', 'kHKGlyph', 'kHangul', 'kHanyuPinlu', 'kHanyuPinyin', 'kJapaneseKun', 'kJapaneseOn', 'kKorean', 'kMandarin', 'kOtherNumeric', 'kPhonetic', 'kPrimaryNumeric', 'kRSAdobe_Japan1_6', 'kRSUnicode', 'kSemanticVariant', 'kSimplifiedVariant', 'kSpecializedSemanticVariant', 'kTang', 'kTotalStrokes', 'kTraditionalVariant', 'kVietnamese', 'kXHC1983', 'kZVariant'], 'format': 'python', 'input_files': ['Unihan_DictionaryLikeData.txt', 'Unihan_IRGSources.txt', 'Unihan_NumericValues.txt', 'Unihan_RadicalStrokeCounts.txt', 'Unihan_Readings.txt', 'Unihan_Variants.txt']}
Default settings passed to unihan-etl
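A common way to combine caller-supplied options with defaults like these is a shallow dictionary merge in which the caller's keys win. The sketch below uses shortened stand-ins for the constants above. Whether the bootstrap functions merge their options argument exactly this way is an assumption, and merge_options is a hypothetical helper.

```python
# Shortened stand-ins for the constants above (illustrative, not the full lists).
DEFAULT_OPTIONS = {
    "expand": False,
    "fields": ["kDefinition", "kMandarin"],
    "format": "python",
    "input_files": ["Unihan_Readings.txt"],
}


def merge_options(defaults, options=None):
    """Return a new dict: defaults overridden by any caller-supplied keys."""
    merged = dict(defaults)  # shallow copy so the defaults stay untouched
    merged.update(options or {})
    return merged


# Restrict the import to a single field while keeping every other default.
custom = merge_options(DEFAULT_OPTIONS, {"fields": ["kDefinition"]})
```

Because the merge is shallow, a caller-supplied fields list replaces the default list outright instead of being appended to it.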
Variants plugin
- class cihai.data.unihan.dataset.UnihanVariants
Bases: DatasetPlugin, SQLAlchemyMixin
Support for CJK Variant lookups through UNIHAN dataset.
- sql: Database
- engine: Engine
sqlalchemy.engine.Engine instance.
- metadata: MetaData
sqlalchemy.schema.MetaData instance.
- session: Session
sqlalchemy.orm.session.Session instance.
- base: AutomapBase
sqlalchemy.ext.automap.AutomapBase instance.