Natural Language Processing library for the Shan language (တႆး), inspired by PyThaiNLP.
pip install git+https://github.com/NoerNova/ShanNLPOr from source:
git clone https://github.com/NoerNova/ShanNLP
pip install -r requirements.txtfrom shannlp import word_tokenize
text = "တိူၵ်ႈသွၼ်လိၵ်ႈသင်ၶၸဝ်ႈ တီႈဝဵင်းမိူင်းၶၢၵ်ႇ"
print(word_tokenize(text, keep_whitespace=False))
# ['တိူၵ်ႈ', 'သွၼ်လိၵ်ႈ', 'သင်ၶ', 'ၸဝ်ႈ', 'တီႈ', 'ဝဵင်း', 'မိူင်းၶၢၵ်ႇ']from shannlp import spell_correct, correct_sentence
# Single word
print(spell_correct("မိုင်း"))
# [('မိူင်း', 0.95), ('မိူင်', 0.82), ...]
# Full sentence
print(correct_sentence("ၵူၼ်မိူင်း ၵိၼ် ၶဝ်ႈ"))from shannlp.util import num_to_shanword, convert_years
print(num_to_shanword(2117)) # သွင်ႁဵင်ၼိုင်ႈပၢၵ်ႇသိပ်းၸဵတ်း
print(convert_years(2023, "ad", "mo")) # 2117| Module | Description |
|---|---|
shannlp.tokenize |
Word and syllable tokenization |
shannlp.spell |
Spell correction (word-level and context-aware) |
shannlp.corpus |
Shan language corpus (~19,904 words) |
shannlp.util |
Number, digit, date, and keyboard utilities |
Wannaphong Phatthiyaphaibun, Korakot Chaovavanich, Charin Polpanumas, Arthit Suriyawongkul,
Lalita Lowphansirikul, & Pattarawat Chormai. (2016, Jun 27). PyThaiNLP: Thai Natural Language
Processing in Python. Zenodo. http://doi.org/10.5281/zenodo.3519354@misc{pythainlp,
author = {Wannaphong Phatthiyaphaibun and Korakot Chaovavanich and Charin Polpanumas
and Arthit Suriyawongkul and Lalita Lowphansirikul and Pattarawat Chormai},
title = {{PyThaiNLP: Thai Natural Language Processing in Python}},
month = Jun,
year = 2016,
doi = {10.5281/zenodo.3519354},
publisher = {Zenodo},
url = {http://doi.org/10.5281/zenodo.3519354}
}