Extending cihai

Build on cihai’s abstraction and users of your dataset get easy configuration and SQL access, and your dataset joins a growing list of CJKV information sources.

Creating a new dataset

Expand cihai’s knowledge! Create a cihai.extend.Dataset.

You can also make your dataset available in open source so other cihai users can use it! If you do, bring it up on the issue tracker!

examples/dataset.py:

#!/usr/bin/env python
from cihai.core import Cihai
from cihai.extend import Dataset

data = {}  # any data source, internal, a file, on the internet, in a database...


class MyDataset(Dataset):
    def bootstrap(self):  # run automatically by .add_dataset, if defined
        # Use this to setup your dataset, check if updates are needed, etc.
        data.update({"好": "Good", "好好": "Hello"})

    def givemedata(self, key):
        return data[key]

    def search(self, needle):
        return {k: v for k, v in data.items() if needle in k}

    def backwards(self, needle):
        return [k for k, v in data.items() if needle in v]


def run():
    c = Cihai(unihan=False)

    c.add_dataset(MyDataset, namespace="moo")
    c.moo.bootstrap()

    print("Definitions exactly for 好:", c.moo.givemedata("好"))

    print("Definitions matching with 好:", ", ".join(c.moo.search("好")))

    print("Reverse definition with Good:", ", ".join(c.moo.backwards("Good")))


if __name__ == "__main__":
    run()

In addition, see our reference implementation of UNIHAN, which is incorporated as a dataset: cihai.data.unihan.dataset.Unihan.

Plugins: Adding features to a dataset

Extend a dataset with custom behavior to avoid repetition. Create a cihai.extend.DatasetPlugin.

See our reference implementation, cihai.data.unihan.dataset.UnihanVariants.

Datasets can be augmented with computed methods.

These use a dataset to pull information out, and are used frequently enough, or are generic enough, to be worth writing as a plugin.

An example of this would be the suggestion to add variant lookups for UNIHAN.
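The plugin idea can be sketched without cihai installed. The classes below (SketchDataset, VariantsPlugin) and their methods are hypothetical stand-ins written for illustration, not cihai's API, and the data is a toy sample. For real code, subclass cihai.extend.DatasetPlugin and consult the UnihanVariants reference implementation.

```python
#!/usr/bin/env python
# Illustrative stand-ins only: these classes are NOT part of cihai.


class SketchDataset:
    """Minimal stand-in for a dataset mapping a character to its variants."""

    def __init__(self, data):
        self.data = data

    def add_plugin(self, plugin_cls, namespace):
        # Instantiate the plugin, give it a reference back to this dataset,
        # and expose it as an attribute named after the namespace.
        plugin = plugin_cls()
        plugin.dataset = self
        setattr(self, namespace, plugin)


class VariantsPlugin:
    """Hypothetical plugin: computed lookups built on top of the dataset."""

    dataset = None

    def variants_of(self, char):
        # Unknown characters simply have no variants.
        return self.dataset.data.get(char, [])

    def has_variants(self, char):
        return bool(self.variants_of(char))


ds = SketchDataset({"好": [], "国": ["國"]})  # toy data for the sketch
ds.add_plugin(VariantsPlugin, namespace="variants")

print("Variants of 国:", ", ".join(ds.variants.variants_of("国")))
print("Does 好 have variants?", ds.variants.has_variants("好"))
```

The dataset stays a thin wrapper around the data, while the repeated lookup logic lives in one place that any user of the dataset can attach.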

Combining datasets

Combining datasets is usually considered general library usage. But if your usage is common and saves others from repetition, it’s worth considering making it into a reusable extension and open sourcing it.
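As a rough sketch of what a reusable combination might look like, the helper below gathers what several dataset-like objects know about one character. DictDataset, lookup, and describe are hypothetical names invented for this example, and the data is a toy sample; with cihai you would query the datasets registered on your Cihai instance instead.

```python
#!/usr/bin/env python
# Illustrative stand-ins only: these names are NOT part of cihai.


class DictDataset:
    """Hypothetical dataset backed by a plain dict."""

    def __init__(self, data):
        self.data = data

    def lookup(self, char):
        # Returns None when the dataset has no entry for the character.
        return self.data.get(char)


def describe(char, datasets):
    """Collect what each named dataset knows about char, skipping misses."""
    result = {}
    for name, dataset in datasets.items():
        value = dataset.lookup(char)
        if value is not None:
            result[name] = value
    return result


definitions = DictDataset({"好": "Good", "好好": "Hello"})
readings = DictDataset({"好": "hǎo"})
sources = {"definitions": definitions, "readings": readings}

print(describe("好", sources))
print(describe("好好", sources))
```

Keeping the combination logic in one function like this is what makes it easy to package and share once the same pattern shows up in more than one project.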

Using the library to mix and match data from various sources is what cihai is meant to do! If you are using cihai in a way you think would be helpful to others, definitely create an issue, a gist, a GitHub repo, etc! Please license it permissively (MIT, BSD, ISC, etc).