Extending cihai

Build on cihai’s abstractions and your dataset’s users get easy configuration and SQL access, and your dataset joins a growing list of CJKV information sources.

Creating a new dataset

Expand cihai’s knowledge! Create a cihai.extend.Dataset.

You can also make your dataset available in open source so other cihai users can use it! If you do, bring it up on the issue tracker!


#!/usr/bin/env python
# -*- coding: utf8 -*-
from __future__ import print_function, unicode_literals

from cihai.core import Cihai
from cihai.extend import Dataset

data = {}  # any data source, internal, a file, on the internet, in a database...

class MyDataset(Dataset):
    def bootstrap(self):  # run automatically by .add_dataset, if defined
        # Use this to setup your dataset, check if updates are needed, etc.
        data.update({'好': 'Good', '好好': 'Hello'})

    def givemedata(self, key):
        return data[key]

    def search(self, needle):
        return {k: v for k, v in data.items() if needle in k}

    def backwards(self, needle):
        return [k for k, v in data.items() if needle in v]

def run():
    c = Cihai(unihan=False)

    c.add_dataset(MyDataset, namespace='moo')

    print('Definitions exactly for 好:', c.moo.givemedata('好'))

    print('Definitions matching with 好:', ', '.join(c.moo.search('好')))

    print('Reverse definition with Good:', ', '.join(c.moo.backwards('Good')))

if __name__ == '__main__':
    run()

In addition, see our reference implementation of UNIHAN, which is incorporated as a dataset: cihai.data.unihan.dataset.Unihan.

Plugins: Adding features to a dataset

Extend a dataset with custom behavior to avoid repetition. Create a cihai.extend.DatasetPlugin.

See our reference implementation of cihai.data.unihan.dataset.UnihanVariants

Datasets can be augmented with computed methods.

These utilize a dataset to pull information out, but are frequently used / generic enough to write a plugin for.

An example of this would be the suggestion to add variant lookups for UNIHAN.
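The relationship between a dataset and a plugin can be sketched without cihai installed. The classes below (ToyDataset, ToyVariants) and their methods are hypothetical stand-ins, not cihai.extend.Dataset or cihai.extend.DatasetPlugin, and the variant values are simplified toy data rather than real UNIHAN field contents. The sketch only illustrates the idea of attaching a computed lookup to a dataset under a namespace.

```python
# Stand-ins only: these mimic the dataset/plugin relationship described
# above and are NOT cihai's real classes or signatures.


class ToyDataset:
    """Hypothetical dataset holding character -> field mappings."""

    def __init__(self):
        self.records = {}

    def bootstrap(self):
        # Toy data; a real dataset would load from a file or a database.
        self.records.update({
            "说": {"kTraditionalVariant": "說"},
            "說": {"kSimplifiedVariant": "说"},
            "好": {},
        })

    def lookup(self, char):
        return self.records[char]

    def add_plugin(self, plugin_cls, namespace):
        # Attach the plugin as an attribute so callers can write
        # dataset.<namespace>.<method>().
        plugin = plugin_cls(self)
        setattr(self, namespace, plugin)
        return plugin


class ToyVariants:
    """Hypothetical plugin: computed variant lookups on top of ToyDataset."""

    VARIANT_FIELDS = ("kTraditionalVariant", "kSimplifiedVariant")

    def __init__(self, dataset):
        self.dataset = dataset

    def variants_of(self, char):
        # Collect whichever variant fields are present for this character.
        record = self.dataset.lookup(char)
        return [record[f] for f in self.VARIANT_FIELDS if f in record]

    def with_variants(self):
        # Characters that have at least one variant recorded.
        return sorted(c for c in self.dataset.records if self.variants_of(c))


dataset = ToyDataset()
dataset.bootstrap()
dataset.add_plugin(ToyVariants, namespace="variants")
print(dataset.variants.variants_of("说"))
```

Keeping the variant logic in its own class means every user of the dataset gets the same lookup without rewriting it, which is the repetition a plugin is meant to remove.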

Combining datasets

Combining datasets is usually considered general library usage. But if your usage is common and saves others from repetition, it’s worth considering making it into a reusable extension and open sourcing it.
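As a rough illustration of what combining might look like, the sketch below merges answers from two toy sources into one result. The classes DefinitionSource and ReadingSource and the helper combined_lookup are hypothetical names invented for this example; they are not part of cihai, and real datasets would be registered on a Cihai object rather than kept in a plain dict.

```python
# Hypothetical sources; in practice each would be a dataset with its own
# storage and bootstrap step.


class DefinitionSource:
    """Toy source mapping a character to an English gloss."""

    data = {"好": "good", "水": "water"}

    def lookup(self, char):
        return self.data.get(char)


class ReadingSource:
    """Toy source mapping a character to a Mandarin reading."""

    data = {"好": "hǎo", "火": "huǒ"}

    def lookup(self, char):
        return self.data.get(char)


def combined_lookup(char, sources):
    """Merge what each named source knows about char into one dict."""
    result = {}
    for name, source in sources.items():
        value = source.lookup(char)
        if value is not None:  # skip sources with no entry for char
            result[name] = value
    return result


sources = {"definition": DefinitionSource(), "reading": ReadingSource()}
print(combined_lookup("好", sources))
```

If a helper like this keeps reappearing across projects, that is a sign it may be worth packaging as a reusable extension.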

Using the library to mix and match data from various sources is what cihai is meant to do! If you have a way of using cihai that you think would be helpful, definitely create an issue, a gist, a GitHub repo, etc! Please license it permissively (MIT, BSD, ISC, etc!)