sqlite-bayes

v0.0.2

Published

Naive Bayes Classifier for NodeJs using sqlite as data storage

Downloads

5

Readme

Jeez! Not another!

Node.js is teeming with Naive Bayes classifiers, so before someone asks why we need another, let me explain why this classifier is different and why you must love it! :-)

Motivation

Almost all naive classifiers out there save and consume their data in JSON format, allowing you to persist the data to file.

While this works for most cases, it is problematic when you want to train your classifier over several thousands of large documents. It becomes worse when you want to train persistently over a long time.

Imagine you were tracking BuzzFeed headlines and training your classifier to understand clickbait. Would it be convenient to train over a period of months using a JSON file that has to be loaded & held in memory?

What happens if your code exits unexpectedly on the millionth document, just before you have persisted to disk?

Is this method of training sustainable and, most of all, scalable?

If you see my point, read on...

So, it turns out there isn't a simple SQL-based Naive Bayes classifier out there. Know one? Show me, please.

Actually, there are a few gists and examples, but most are written for a specific dataset and their logic is often convoluted, involving copying this data to that temporary table and so on.

But Naive Bayes classifiers, in their simplest form, are simple. All they need to know is which document goes into which class. The rest, really, is just arithmetic.
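To make "just arithmetic" concrete, here is a minimal sketch of how a class can be picked from nothing but counts. The counts, the function names and the add-one (Laplace) smoothing are assumptions made for illustration; this is not necessarily how sqlite-bayes computes its probabilities.

```javascript
// Hypothetical training counts: documents seen per class, and how often
// each word appeared in the documents of that class.
var docCounts = { positive: 2, negative: 1 };
var wordCounts = {
  positive: { amazing: 2, awesome: 1, great: 1 },
  negative: { terrible: 1, sucks: 1 }
};

// Safe lookup so tokens such as "constructor" do not hit Object.prototype.
function countOf(category, word) {
  var words = wordCounts[category];
  return Object.prototype.hasOwnProperty.call(words, word) ? words[word] : 0;
}

// Number of distinct words across all classes (needed for smoothing).
function vocabularySize() {
  var seen = new Set();
  Object.keys(wordCounts).forEach(function (category) {
    Object.keys(wordCounts[category]).forEach(function (word) { seen.add(word); });
  });
  return seen.size;
}

// log P(class) + sum of log P(word | class), with add-one smoothing (an assumption).
function logScore(tokens, category) {
  var totalDocs = Object.keys(docCounts).reduce(function (n, c) {
    return n + docCounts[c];
  }, 0);
  var wordsInClass = Object.keys(wordCounts[category]).reduce(function (n, w) {
    return n + wordCounts[category][w];
  }, 0);
  var vocab = vocabularySize();
  var score = Math.log(docCounts[category] / totalDocs);
  tokens.forEach(function (token) {
    score += Math.log((countOf(category, token) + 1) / (wordsInClass + vocab));
  });
  return score;
}

// The predicted class is simply the one with the highest score.
function categorize(tokens) {
  return Object.keys(docCounts).reduce(function (best, category) {
    return logScore(tokens, category) > logScore(tokens, best) ? category : best;
  });
}

console.log(categorize(['amazing', 'great'])); // prints "positive"
```

Working in log space avoids multiplying many tiny probabilities together, which would underflow on long documents.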

So this classifier implements a database schema that mimics the JSON objects, encoding classes, documents and their respective counts.

Using simple, straightforward SQL, your database is atomically updated each time you classify a new document and the probabilities change automagically.
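As a rough illustration of the idea, the sketch below shows what a count-based schema and an incremental update could look like. The table names, column names and the `wordCountRows` helper are hypothetical and are not taken from the sqlite-bayes source; the upsert syntax assumes SQLite 3.24 or newer.

```javascript
// Hypothetical schema: one row per class, one row per (class, word) pair.
var SCHEMA_SQL = [
  'CREATE TABLE IF NOT EXISTS categories (',
  '  name TEXT PRIMARY KEY,',
  '  doc_count INTEGER NOT NULL DEFAULT 0',
  ');',
  'CREATE TABLE IF NOT EXISTS word_counts (',
  '  category TEXT NOT NULL,',
  '  word TEXT NOT NULL,',
  '  occurrences INTEGER NOT NULL DEFAULT 0,',
  '  PRIMARY KEY (category, word)',
  ');'
].join('\n');

// Upsert: insert a new row, or add to the existing count if the row exists.
var UPSERT_WORD_SQL =
  'INSERT INTO word_counts (category, word, occurrences) VALUES (?, ?, ?) ' +
  'ON CONFLICT (category, word) DO UPDATE ' +
  'SET occurrences = occurrences + excluded.occurrences';

// Turn one tokenized document into parameter rows for UPSERT_WORD_SQL.
function wordCountRows(tokens, category) {
  var tally = new Map();
  tokens.forEach(function (token) {
    tally.set(token, (tally.get(token) || 0) + 1);
  });
  return Array.from(tally, function (entry) {
    return [category, entry[0], entry[1]];
  });
}

console.log(wordCountRows(['good', 'good', 'bad'], 'positive'));
```

Running all the upserts for one document inside a single transaction would be one way to get the all-or-nothing update described above: either every count for the document is applied, or none is.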

You will never need to load heavy files ever again, and because this is SQL(ite), you can carry your data and plug it in wherever you go!

Best of all, you can train whenever you come across new documents without affecting any ongoing classifications.

npm install sqlite-bayes

Usage


var bayes = require('./lib/naive_bayes');
var path = require('path');

// Some options
var options = {
  dbPath: path.join(__dirname, 'data'), // directory where the database is saved
  dbName: 'sentiment-db', // database name
  stopwords: ['en', 'sw'], // stopword lists to use; see https://www.npmjs.com/package/multi-stopwords for more
  stemmer: 'lancaster', // stemmer to use; currently supports 'lancaster' and 'porter' via https://www.npmjs.com/package/natural
  returnProbabilities: 3, // how many probabilities to return; especially useful when you have many classes
  trace: true // log what's happening?
};

// All options are optional!
var classifier = bayes(options);

// Teach our classifier a few facts
classifier.learn('amazing, awesome movie!! Yeah!! Oh boy.', 'positive');
classifier.learn('Sweet, this is incredibly, amazing, perfect, great!!', 'positive');
classifier.learn('terrible, shitty thing. Damn. Sucks!!', 'negative');

// You must call saveDocs() to commit the data to the database.
// Also, if this is the first time you are training the classifier,
// run categorize() only after the data has been committed.
classifier.saveDocs(function () {

  // Now ask it to categorize a document it has never seen before
  classifier.categorize('this is some incredibly shitty day', function (classification) {
    console.log(JSON.stringify(classification, null, 4));
  });

});

API

var classifier = bayes([options])

Returns an instance of a Sqlite-Bayes Classifier.

Pass in an optional options object to configure the instance. If you specify a stemmer function in options, it will be used as the instance's tokenizer. The default tokenizer removes punctuation and splits on spaces.
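For reference, a tokenizer that "removes punctuation and splits on spaces" could be sketched as follows. The `tokenize` name and the exact regular expressions are assumptions made for illustration; the tokenizer shipped with sqlite-bayes may differ, for example in how it treats non-ASCII letters.

```javascript
// Replace anything that is not a word character or whitespace with a space,
// then split on runs of whitespace and drop the empty strings left behind.
function tokenize(text) {
  return text
    .replace(/[^\w\s]/g, ' ')
    .split(/\s+/)
    .filter(Boolean);
}

console.log(tokenize('amazing, awesome movie!! Yeah!!'));
// prints [ 'amazing', 'awesome', 'movie', 'Yeah' ]
```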

NOTE:

  • Once you have created a database using one stemmer (or none), you cannot change that stemmer later. SQLite-Bayes stores your initial stemmer and will always use it, no matter what. This averts a situation where data stemmed one way is used to categorize text stemmed another way, which would be extremely inaccurate.
  • Text submitted for classification goes through the same stemming that the database was initialized with, so that it is harmonized with the stored data before any classification. This process is automatic and requires no intervention from you!

classifier.learn(text, category)

Teach your classifier what category the text belongs to. The more you teach your classifier, the more reliable it becomes. It will use what it has learned to identify new documents that it hasn't seen before.

classifier.categorize(text, callback)

Works out which category text most likely belongs to and passes the resulting classification to callback, as in the usage example above. Its judgement is based on what you have taught it with .learn().

Heads Up

This classifier borrows a lot from Bayes by Tolga Tezel.