manee
v0.2.5
A Thai / English text classification tool using the ml library
Manee: Thai / English general-purpose text classification tool
A simple, easy-to-use text classifier for Node.js, built on TNThai and ml. The analyzer supports both Thai and English text. Text is converted to vectors using one-hot encoding. See Basic usage below for more details.
Briefly, one-hot encoding represents each word as a vector whose length equals the vocabulary size, with a 1 at the entry for that word and 0 everywhere else.
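To make the encoding concrete, here is a minimal JavaScript sketch. It assumes the text has already been split into words (for Thai, presumably handled by TNThai inside manee; the sketch does not use TNThai, and its Thai tokens are split by hand). The helper names buildVocabulary, oneHot and countVector are hypothetical. Summing the one-hot vectors of a text into a word-count vector is one common way to feed a multinomial model; whether manee combines them exactly this way is an assumption.

```javascript
// One-hot encoding sketch. Input texts are assumed to be pre-tokenized
// (hypothetical example data; Thai words segmented by hand, not by TNThai).

// Assign each distinct token a fixed index in the vocabulary.
function buildVocabulary(tokenizedTexts) {
  const vocab = new Map();
  for (const tokens of tokenizedTexts) {
    for (const token of tokens) {
      if (!vocab.has(token)) vocab.set(token, vocab.size);
    }
  }
  return vocab;
}

// One-hot vector for a single word: length = vocabulary size,
// 1 at the word's index and 0 everywhere else.
function oneHot(word, vocab) {
  const vec = new Array(vocab.size).fill(0);
  const idx = vocab.get(word);
  if (idx !== undefined) vec[idx] = 1; // words outside the vocabulary stay all-zero
  return vec;
}

// Sum of the one-hot vectors of every word in a text, i.e. a word-count
// vector (assumption: this is how a whole text is represented).
function countVector(tokens, vocab) {
  const vec = new Array(vocab.size).fill(0);
  for (const token of tokens) {
    const idx = vocab.get(token);
    if (idx !== undefined) vec[idx] += 1;
  }
  return vec;
}

const docs = [
  ["parcel", "delivery", "problem"],
  ["สมัครงาน", "ตำแหน่ง", "สมัครงาน"],
];
const vocab = buildVocabulary(docs);
console.log(oneHot("delivery", vocab)); // → [0, 1, 0, 0, 0]
console.log(countVector(docs[1], vocab)); // → [0, 0, 0, 2, 1]
```

The vocabulary here is parcel, delivery, problem, สมัครงาน, ตำแหน่ง (indices 0 to 4), so the repeated Thai word สมัครงาน contributes a count of 2.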
Features
- Training on free-text strings in Thai and English
- Each label must be a string representing a category.
- Only Multinomial Naive Bayes is supported
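Multinomial Naive Bayes picks the class c that maximizes log P(c) + Σ_j x_j · log P(w_j | c), where x_j is the count of word j in the text and P(w_j | c) = (N_cj + α) / (N_c + α·|V|) is the smoothed share of word j among all words seen in class c. The sketch below is a generic, self-contained JavaScript version with add-one (Laplace) smoothing, not manee's or ml's actual implementation; the class MultinomialNB and its train/classify methods are hypothetical, and it works on word-count vectors (for example, sums of one-hot word vectors).

```javascript
// Generic Multinomial Naive Bayes sketch (hypothetical class, not manee's API).
class MultinomialNB {
  constructor(alpha = 1) {
    this.alpha = alpha; // smoothing constant; alpha = 1 is Laplace smoothing
  }

  // X: array of equal-length count vectors, y: array of string labels
  train(X, y) {
    const nFeatures = X[0].length;
    this.classes = [...new Set(y)];
    this.logPrior = new Map();
    this.logLikelihood = new Map();
    for (const c of this.classes) {
      const rows = X.filter((_, i) => y[i] === c);
      this.logPrior.set(c, Math.log(rows.length / X.length));
      // Total count of each word across all texts of class c
      const counts = new Array(nFeatures).fill(0);
      for (const row of rows) row.forEach((v, j) => (counts[j] += v));
      const total = counts.reduce((a, b) => a + b, 0);
      // Smoothed log P(word j | class c); unseen words never give log(0)
      this.logLikelihood.set(
        c,
        counts.map((n) => Math.log((n + this.alpha) / (total + this.alpha * nFeatures)))
      );
    }
  }

  // Return the class with the highest log prior + count-weighted log likelihood
  classify(x) {
    let best = null;
    let bestScore = -Infinity;
    for (const c of this.classes) {
      const ll = this.logLikelihood.get(c);
      let score = this.logPrior.get(c);
      x.forEach((v, j) => (score += v * ll[j]));
      if (score > bestScore) {
        bestScore = score;
        best = c;
      }
    }
    return best;
  }
}

// Toy count vectors over a hypothetical vocabulary [parcel, delivery, job, apply]
const X = [
  [2, 1, 0, 0],
  [1, 2, 0, 0],
  [0, 0, 2, 1],
  [0, 0, 1, 2],
];
const y = ["Spam", "Spam", "Good", "Good"];
const nb = new MultinomialNB();
nb.train(X, y);
console.log(nb.classify([1, 1, 0, 0])); // → Spam
console.log(nb.classify([0, 0, 1, 1])); // → Good
```

Working in log space avoids floating-point underflow when many small word probabilities are multiplied, and the smoothing term keeps a single unseen word from zeroing out a class.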
Installation
npm install manee
or
npm install manee --save
Basic usage
const Manee = require('manee');
const textClassifier = new Manee();

// Sample email texts: English spam, then job applications in Thai and English
const texts = [
  "FedEx Parcel Support: Delivery Problem, 1st Attempt Hello We've tried",
  "Package Delivery Notification Dear Customer, Please review your parcel delivery label",
  "Improve trustmail SERP Position Very powerful SERP Booster Plan",
  "re: G Analytics traffic for trustmail hi Che%ap Social and Search traffic i%n Google Analyt*ics",
  "Delivery problem, parcel USPS Your item has arrived at the Post Office at Mon, 03 Apr 2017 12:36:51 -0700",
  "สมัครงานตำแหน่ง IT Support Web เรียน ฝ่ายบุคคล กระผมมีความสนใจที่จะสมัครงานในตำแหน่ง สมัครงานในตำแหน่ง",
  "สมัครงานตำแหน่ง Production Supervisor (พระนครศรีอยุธยา) เรียนผู้จัดการฝ่ายบุคคล บริษัท ",
  "Application (Planning) T. Maenumkhu A.Pluakdaeng Rayong 21140 April 12 2017 Personal Manager",
  "สมัครงานตำแหน่ง ผู้จัดการแผนกบุคคล เรียน ผู้จัดการฝ่ายทรัพยากรบุคคล เนื่องจากดิฉันนางสาว มีความสนใจร่วม",
  "ส่งเอกสารสมัครงาน เรียน ฝ่ายบุคคล กระผมมีความประสงค์ที่จะสมัครงานในตำแหน่ง \" เจ้าหน้าที่ RD; \""
];
// One string label per text, in the same order
const labels = ["Spam", "Spam", "Spam", "Spam", "Spam", "Good", "Good", "Good", "Good", "Unknown"];

textClassifier.train(texts, labels);
textClassifier.classify(texts);
// → ["Spam", "Spam", "Spam", "Spam", "Spam", "Good", "Good", "Good", "Good", "Unknown"]

// Evaluate on the training samples and print a report
textClassifier.evaluate();
/* Training Set has : 10 Samples
Correct Label : Spam,Spam,Spam,Spam,Spam,Good,Good,Good,Good,Unknown
Classified Label : Spam,Spam,Spam,Spam,Spam,Good,Good,Good,Good,Unknown
Whole set evaluation : 100% */

// Save the trained model to 'test.model', then load it into a new instance
textClassifier.save('test.model');
const newTextClassifier = new Manee();
newTextClassifier.load('test.model');
newTextClassifier.classify(texts);
// → ["Spam", "Spam", "Spam", "Spam", "Spam", "Good", "Good", "Good", "Good", "Unknown"]
To-Do List
- Generalize the ML interface so that more ML models are supported
- Implement cross-validation evaluation
- Filter out high-entropy words
- Add a word embedding technique reference
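As its "Training Set" report shows, evaluate() currently scores the classifier on the same samples it was trained on, which tends to overstate accuracy on new text. Cross-validation, listed above as a to-do, instead trains on part of the data and scores the held-out rest, so that every sample is tested exactly once. The sketch below is a generic JavaScript version; kFoldAccuracy, trainFn, classifyFn and the majority-label toy model are hypothetical names for illustration, not part of manee.

```javascript
// k-fold cross-validation sketch (hypothetical helper, not manee's API).
// Samples go to folds round-robin (i % k); in real use, shuffle the data
// first so each fold gets a representative mix of labels.
function kFoldAccuracy(texts, labels, k, trainFn, classifyFn) {
  const n = texts.length;
  let correct = 0;
  for (let fold = 0; fold < k; fold++) {
    const trainIdx = [];
    const testIdx = [];
    for (let i = 0; i < n; i++) {
      if (i % k === fold) testIdx.push(i);
      else trainIdx.push(i);
    }
    if (testIdx.length === 0) continue; // happens only when k > n
    // Train a fresh model on the other folds, then score the held-out fold
    const model = trainFn(
      trainIdx.map((i) => texts[i]),
      trainIdx.map((i) => labels[i])
    );
    const predicted = classifyFn(model, testIdx.map((i) => texts[i]));
    predicted.forEach((p, j) => {
      if (p === labels[testIdx[j]]) correct++;
    });
  }
  // Every sample is held out exactly once, so divide by n
  return correct / n;
}

// Toy stand-in model for the demo: predicts the most frequent training label
function trainMajority(texts, labels) {
  const counts = {};
  for (const l of labels) counts[l] = (counts[l] || 0) + 1;
  return Object.keys(counts).reduce((a, b) => (counts[b] > counts[a] ? b : a));
}
function classifyMajority(model, texts) {
  return texts.map(() => model);
}

const sampleTexts = ["t1", "t2", "t3", "t4", "t5", "t6"];
const sampleLabels = ["Spam", "Spam", "Spam", "Spam", "Good", "Good"];
console.log(kFoldAccuracy(sampleTexts, sampleLabels, 3, trainMajority, classifyMajority));
```

With manee, trainFn could create a new instance and call train(texts, labels) on it, and classifyFn could return that instance's classify(texts); this wiring assumes both methods behave synchronously, as in Basic usage.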
License
MIT