cjk-tokenizer
v0.1.0
A CJK text tokenizer
cjk-tokenizer
Extract terms from CJK text. The original idea is stolen from timdream/wordfreq.
Why?
A CJK text tokenizer that works as expected is missing from the magic world of JavaScript, so I decided to build one with these features:
- Chinese, Japanese and Korean support
- Extracted terms carry a score, their position in the original text, etc.
- A more common stop word collection
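To make the feature list concrete, here is a minimal sketch of what "terms with a score and a position" could look like. It is not this package's API: the function name `extractBigrams`, the `{ term, score, positions }` record shape, and the naive overlapping-bigram approach are all assumptions chosen for illustration, and the real implementation may work quite differently.

```javascript
// Hypothetical illustration only: NOT the cjk-tokenizer API.
// Matches runs of Han, Hiragana, Katakana and Hangul characters.
const CJK_RUN = /[\p{Script=Han}\p{Script=Hiragana}\p{Script=Katakana}\p{Script=Hangul}]+/gu;

// Splits each CJK run into overlapping two-character terms (bigrams),
// counts how often each term occurs (the "score") and records the
// index in the original text where each occurrence starts.
function extractBigrams(text, stopWords = new Set()) {
  const terms = new Map();
  for (const match of text.matchAll(CJK_RUN)) {
    const chars = Array.from(match[0]); // split by code point, not UTF-16 unit
    let offset = match.index;           // position of chars[i] in the original text
    for (let i = 0; i + 1 < chars.length; i++) {
      const term = chars[i] + chars[i + 1];
      if (!stopWords.has(term)) {
        const entry = terms.get(term) || { term, score: 0, positions: [] };
        entry.score += 1;
        entry.positions.push(offset);
        terms.set(term, entry);
      }
      offset += chars[i].length; // advance by the character's UTF-16 length
    }
  }
  // Highest score first.
  return [...terms.values()].sort((a, b) => b.score - a.score);
}

// Usage: the second argument is an optional set of terms to skip.
const result = extractBigrams('我爱北京,北京很大', new Set(['爱北']));
console.log(result);
```

A real tokenizer would likely go beyond fixed-length bigrams and ship its own stop word collection; the sketch only shows how a score and positions can be attached to each extracted term.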
Install
Use in a project:
npm i cjk-tokenizer --save
CLI:
npm i cjk-tokenizer -g