Extending cihai
Build on cihai’s abstraction and your dataset’s users get easy configuration and SQL access, while your data joins a growing collection of CJKV information.
Creating a new dataset
Expand cihai’s knowledge! Create a cihai.extend.Dataset.
You can also release your dataset as open source so other cihai users can benefit from it! If you do, let us know on the issue tracker!
examples/dataset.py:
#!/usr/bin/env python
from typing import Dict, List

from cihai.core import Cihai
from cihai.extend import Dataset

# any data source: internal, a file, on the internet, in a database...
data: Dict[str, str] = {}


class MyDataset(Dataset):
    def bootstrap(self) -> None:  # automatically run with .add_dataset, if it exists
        # Use this to set up your dataset, check if updates are needed, etc.
        data.update({"好": "Good", "好好": "Hello"})

    def givemedata(self, key: str) -> str:
        # Exact lookup; raises KeyError for unknown keys.
        return data[key]

    def search(self, needle: str) -> Dict[str, object]:
        # Entries whose key contains `needle`.
        return {k: v for k, v in data.items() if needle in k}

    def backwards(self, needle: str) -> List[str]:
        # Reverse lookup: keys whose definition contains `needle`.
        return [k for k, v in data.items() if needle in v]


def run() -> None:
    c = Cihai(unihan=False)
    c.add_dataset(MyDataset, namespace="moo")

    my_dataset = MyDataset()
    my_dataset.bootstrap()

    print("Definitions exactly for 好", my_dataset.givemedata("好"))
    print("Definitions matching with 好:", ", ".join(my_dataset.search("好")))
    print("Reverse definition with Good:", ", ".join(my_dataset.backwards("Good")))


if __name__ == "__main__":
    run()
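The example above keeps its data in a module-level dict, but the comment on `data` notes it could just as well come from a file. A minimal sketch of a bootstrap method that loads a tab-separated file follows. It is hypothetical: `FileDataset`, its `path` attribute, and the "key<TAB>definition" format are illustrations only, and a small stand-in `Dataset` base class replaces cihai.extend.Dataset so the code runs without cihai installed.

```python
import csv
import os
import tempfile
from typing import Dict, List


class Dataset:
    """Stand-in for cihai.extend.Dataset so this sketch runs without cihai."""

    def bootstrap(self) -> None:
        pass


class FileDataset(Dataset):
    """Hypothetical dataset backed by a tab-separated "key<TAB>definition" file."""

    path = "definitions.tsv"  # hypothetical default; subclasses override it

    def __init__(self) -> None:
        # No constructor arguments, like MyDataset in the example above.
        self.data: Dict[str, str] = {}

    def bootstrap(self) -> None:
        # Load the file once; a real dataset might also check for updates here.
        with open(self.path, encoding="utf-8", newline="") as f:
            rows = csv.reader(f, delimiter="\t")
            self.data = {row[0]: row[1] for row in rows if len(row) >= 2}

    def givemedata(self, key: str) -> str:
        return self.data[key]

    def search(self, needle: str) -> Dict[str, str]:
        return {k: v for k, v in self.data.items() if needle in k}

    def backwards(self, needle: str) -> List[str]:
        return [k for k, v in self.data.items() if needle in v]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Write a small sample file, then point a subclass at it.
        sample = os.path.join(tmp, "definitions.tsv")
        with open(sample, "w", encoding="utf-8", newline="") as f:
            f.write("好\tGood\n好好\tHello\n")

        class SampleDataset(FileDataset):
            path = sample

        ds = SampleDataset()
        ds.bootstrap()
        print(ds.givemedata("好"))
        print(", ".join(ds.search("Dummy" if False else "好")))
        print(", ".join(ds.backwards("Good")))
```

Keeping the constructor argument-free mirrors how the example instantiates MyDataset; the file location lives on the class instead, so a subclass can point at a different file.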
In addition, see our reference implementation of UNIHAN, which is incorporated as a dataset: cihai.data.unihan.dataset.Unihan.
Plugins: Adding features to a dataset
Extend a dataset with custom behavior to avoid repetition. Create a cihai.extend.DatasetPlugin.
See our reference implementation, cihai.data.unihan.dataset.UnihanVariants.
Datasets can be augmented with computed methods.
These methods use a dataset to pull information out, but are used frequently enough, or are generic enough, to be worth writing as a plugin.
An example of this is adding variant lookups to UNIHAN.
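As a rough illustration of a computed method, the sketch below builds a reverse index (definition to keys) once when the plugin is attached, then reuses it for every lookup. Everything here is hypothetical: the stand-in `Dataset.add_plugin` and `DatasetPlugin` classes only mimic the general shape of cihai’s classes, and handing the plugin a `dataset` reference is this sketch’s own assumption. cihai’s real wiring may differ, so consult cihai.extend for the actual API.

```python
from typing import Dict, List, Optional


class Dataset:
    """Hypothetical stand-in for cihai.extend.Dataset, with a minimal add_plugin."""

    def __init__(self) -> None:
        self.data: Dict[str, str] = {}

    def bootstrap(self) -> None:
        pass

    def add_plugin(self, cls: type, namespace: str) -> None:
        # Assumption: instantiate the plugin, give it the parent dataset,
        # attach it under `namespace`, then let it precompute its state.
        plugin = cls()
        plugin.dataset = self
        setattr(self, namespace, plugin)
        plugin.bootstrap()


class DatasetPlugin:
    """Hypothetical stand-in for cihai.extend.DatasetPlugin."""

    dataset: Optional[Dataset] = None

    def bootstrap(self) -> None:
        pass


class DefinitionIndex(DatasetPlugin):
    """Hypothetical plugin: a definition -> keys index computed from the dataset."""

    def bootstrap(self) -> None:
        if self.dataset is None:
            raise RuntimeError("plugin is not attached to a dataset")
        # Build the index once instead of rescanning the data on every lookup.
        self.index: Dict[str, List[str]] = {}
        for key, definition in self.dataset.data.items():
            self.index.setdefault(definition, []).append(key)

    def keys_for(self, definition: str) -> List[str]:
        return self.index.get(definition, [])


class MyDataset(Dataset):
    def bootstrap(self) -> None:
        self.data.update({"好": "Good", "佳": "Good", "好好": "Hello"})


ds = MyDataset()
ds.bootstrap()  # load data before plugins compute anything from it
ds.add_plugin(DefinitionIndex, namespace="index")
print(ds.index.keys_for("Good"))  # ['好', '佳']
```

The plugin owns the computed logic, so any dataset exposing the same `data` shape could reuse it without copying the indexing code.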
Combining datasets
Combining datasets is generally considered ordinary library usage. But if your usage is common and packaging it would save others from repetition, consider turning it into a reusable extension and open sourcing it.
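For a sense of what that ordinary library usage can look like, here is a minimal sketch that asks several datasets for the same key and collects whatever each one knows. `DictDataset`, `Lookup`, and `combined_lookup` are hypothetical names, and the sketch assumes each dataset exposes a `givemedata`-style method like the MyDataset example; it does not import cihai.

```python
from typing import Dict, List, Protocol


class Lookup(Protocol):
    """Any dataset exposing a givemedata-style method (an assumption here)."""

    def givemedata(self, key: str) -> str: ...


class DictDataset:
    """Hypothetical dict-backed dataset standing in for a real source."""

    def __init__(self, data: Dict[str, str]) -> None:
        self.data = data

    def givemedata(self, key: str) -> str:
        return self.data[key]


def combined_lookup(key: str, *datasets: Lookup) -> List[str]:
    """Collect the entry for `key` from every dataset that has one, in order."""
    results: List[str] = []
    for dataset in datasets:
        try:
            results.append(dataset.givemedata(key))
        except KeyError:
            continue  # this source has no entry for `key`
    return results


english = DictDataset({"好": "Good", "好好": "Hello"})
pinyin = DictDataset({"好": "hǎo"})
print(combined_lookup("好", english, pinyin))  # ['Good', 'hǎo']
print(combined_lookup("好好", english, pinyin))  # ['Hello']
```

Typing the sources with a Protocol rather than a shared base class means any object with a matching method can take part, which keeps the combining logic independent of where each dataset comes from.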
Mixing and matching data from various sources is exactly what cihai is meant for! If you’re using cihai in a way you think others would find helpful, definitely share it: create an issue, a gist, a GitHub repo, etc. Please license it permissively (MIT, BSD, ISC, etc.)!