ml-classify-text

v2.0.1

Published

Text classification using n-grams and cosine similarity

Downloads

1,279


📄 ClassifyText (JS)


Use machine learning to classify text using n-grams and cosine similarity.

A minimal library, usable both in the browser and in Node.js, that lets you train a model on a large number of text samples (with corresponding labels) and then use that model to quickly predict one or more appropriate labels for new text samples.
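For reference, cosine similarity between two term-count vectors a and b (one entry per vocabulary term) is conventionally defined as below. This is the textbook definition, not a statement of how the library weights or normalizes terms internally.

```latex
\cos(\mathbf{a}, \mathbf{b})
  = \frac{\mathbf{a} \cdot \mathbf{b}}{\lVert \mathbf{a} \rVert \, \lVert \mathbf{b} \rVert}
  = \frac{\sum_i a_i b_i}{\sqrt{\sum_i a_i^2} \, \sqrt{\sum_i b_i^2}}
```

Because term counts are never negative, the result lies between 0 (no shared terms) and 1 (vectors pointing in the same direction).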

Installation

Using npm

npm install ml-classify-text

Using yarn

yarn add ml-classify-text

Getting started

Import as an ES6 module

import Classifier from 'ml-classify-text'

Import as a CommonJS module

const { Classifier } = require('ml-classify-text')

Basic usage

Setting up a new Classifier instance

const classifier = new Classifier()

Training a model

const positive = [
	'This is great, so cool!',
	'Wow, I love it!',
	'It really is amazing'
]

const negative = [
	'This is really bad',
	'I hate it with a passion',
	'Just terrible!'
]

classifier.train(positive, 'positive')
classifier.train(negative, 'negative')
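With the default bag-of-words setting, training can be thought of as counting how often each term appears under each label. The dependency-free sketch below illustrates that idea only; `words` and `trainCounts` are hypothetical names, and the lowercase/strip-punctuation rule is an assumption rather than the library's documented behavior. The counts it produces agree with the serialized model shown later in this README.

```javascript
// Own-property check, so terms like "constructor" don't hit Object.prototype.
const has = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key)

// Hypothetical tokenizer: lowercase, drop punctuation, split on whitespace.
// The library's real normalization rules may differ.
function words(text) {
	return text
		.toLowerCase()
		.replace(/[^\p{L}\p{N}\s]/gu, '')
		.split(/\s+/)
		.filter(Boolean)
}

// Hypothetical sketch: add every term of every sample to the label's counts.
function trainCounts(model, samples, label) {
	if (!has(model, label)) model[label] = {}
	const counts = model[label]
	for (const sample of samples) {
		for (const term of words(sample)) {
			counts[term] = (has(counts, term) ? counts[term] : 0) + 1
		}
	}
	return model
}

const model = {}
trainCounts(model, ['This is great, so cool!', 'Wow, I love it!', 'It really is amazing'], 'positive')
trainCounts(model, ['This is really bad', 'I hate it with a passion', 'Just terrible!'], 'negative')

console.log(model.positive)
```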

Getting a prediction

const predictions = classifier.predict('It sure is pretty great!')

if (predictions.length) {
	predictions.forEach((prediction) => {
		console.log(`${prediction.label} (${prediction.confidence})`)
	})
} else {
	console.log('No predictions returned')
}

Returning:

positive (0.5423261445466404)
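That confidence value can be reproduced by hand with plain cosine similarity over raw term counts. "It sure is pretty great!" has five terms counted once each, so its length is √5. The positive label's squared counts sum to 17, so its length is √17. The shared terms contribute 1·2 (it) + 1·2 (is) + 1·1 (great) = 5, and 5 / (√5 · √17) = 5 / √85 ≈ 0.5423. Counting the unseen words sure and pretty toward the input's length is what makes the figure match. The sketch below is a hypothetical illustration of such a prediction step, not the library's code. The label vectors are copied from the training example, and returning only the top match by default (the hypothetical `maxMatches` parameter) is an assumption based on the single line of output above.

```javascript
// Cosine similarity between two sparse term-count objects.
function cosineSimilarity(a, b) {
	let dot = 0
	for (const term of Object.keys(a)) {
		if (Object.prototype.hasOwnProperty.call(b, term)) {
			dot += a[term] * b[term]
		}
	}
	// Euclidean length of a count vector (every term counts, seen or not).
	const norm = (v) => Math.sqrt(Object.values(v).reduce((sum, n) => sum + n * n, 0))
	const denominator = norm(a) * norm(b)
	return denominator === 0 ? 0 : dot / denominator
}

// Hypothetical: score every label, drop zero matches, keep the best `maxMatches`.
function predict(inputCounts, labelVectors, maxMatches = 1) {
	return Object.entries(labelVectors)
		.map(([label, vector]) => ({ label, confidence: cosineSimilarity(inputCounts, vector) }))
		.filter((prediction) => prediction.confidence > 0)
		.sort((x, y) => y.confidence - x.confidence)
		.slice(0, maxMatches)
}

// Term counts produced by the training example.
const labelVectors = {
	positive: { this: 1, is: 2, great: 1, so: 1, cool: 1, wow: 1, i: 1, love: 1, it: 2, really: 1, amazing: 1 },
	negative: { this: 1, is: 1, really: 1, bad: 1, i: 1, hate: 1, it: 1, with: 1, a: 1, passion: 1, just: 1, terrible: 1 }
}

// 'It sure is pretty great!' as lowercase term counts.
const input = { it: 1, sure: 1, is: 1, pretty: 1, great: 1 }

for (const { label, confidence } of predict(input, labelVectors)) {
	console.log(`${label} (${confidence})`)
}
```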

Advanced usage

Configuration

The following configuration options can be passed either directly to a new Model, or indirectly by passing them to the Classifier constructor.

Options

| Property   | Type                  | Default | Description                                                                                          |
| ---------- | --------------------- | ------- | ---------------------------------------------------------------------------------------------------- |
| nGramMin   | int                   | 1       | Minimum n-gram size                                                                                  |
| nGramMax   | int                   | 1       | Maximum n-gram size                                                                                  |
| vocabulary | Array \| Set \| false | []      | Terms mapped to indexes in the model data, set to false to store terms directly in the data entries |
| data       | Object                | {}      | Key-value store of labels and training data vectors                                                  |

Using n-grams

The default behavior is to split up texts by single words (known as a bag of words, or unigrams).

This has a few limitations, since by ignoring the order of words, it's impossible to correctly match phrases and expressions.

This is where n-grams come in. When set to use more than one word per term, they act like a sliding window that moves across the text, producing continuous sequences of the specified number of words. This can greatly improve the accuracy of predictions.

Example of using n-grams with a size of 2 (bigrams)

const classifier = new Classifier({
	nGramMin: 2,
	nGramMax: 2
})

const tokens = classifier.tokenize('I really dont like it')

console.log(tokens)

Returning:

{
    'i really': 1,
    'really dont': 1,
    'dont like': 1,
    'like it': 1
}
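The sliding window described above can be sketched without the library. `ngrams` below is a hypothetical helper, not the library's `tokenize`. It assumes input is lowercased and stripped of punctuation; the output above only confirms the lowercasing. It emits every window of nGramMin through nGramMax words and counts occurrences, mirroring the shape of the `tokenize` result.

```javascript
// Hypothetical sketch: count every n-gram with nGramMin <= n <= nGramMax.
function ngrams(text, nGramMin = 1, nGramMax = 1) {
	// Assumed normalization: lowercase, strip punctuation, split on whitespace.
	const words = text
		.toLowerCase()
		.replace(/[^\p{L}\p{N}\s]/gu, '')
		.split(/\s+/)
		.filter(Boolean)

	const counts = {}
	for (let n = nGramMin; n <= nGramMax; n++) {
		// Slide a window of n words across the text, one word at a time.
		for (let start = 0; start + n <= words.length; start++) {
			const term = words.slice(start, start + n).join(' ')
			const seen = Object.prototype.hasOwnProperty.call(counts, term)
			counts[term] = (seen ? counts[term] : 0) + 1
		}
	}
	return counts
}

console.log(ngrams('I really dont like it', 2, 2))
```

A range such as nGramMin: 1 with nGramMax: 2 would keep single words while also capturing two-word phrases, at the cost of a larger vocabulary.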

Serializing a model

After training a model on large sets of data, you'll want to store that data so you can later set up a new model from it and quickly make predictions.

To do this, simply use the serialize method on your Model, and either save the data structure to a file, send it to a server, or store it in any other way you want.

const model = classifier.model

console.log(model.serialize())

Returning:

{
    nGramMin: 1,
    nGramMax: 1,
    vocabulary: [
        'this',    'is',      'great',
        'so',      'cool',    'wow',
        'i',       'love',    'it',
        'really',  'amazing', 'bad',
        'hate',    'with',    'a',
        'passion', 'just',    'terrible'
    ],
    data: {
        positive: {
            '0': 1, '1': 2, '2': 1,
            '3': 1, '4': 1, '5': 1,
            '6': 1, '7': 1, '8': 2,
            '9': 1, '10': 1
        },
        negative: {
            '0': 1, '1': 1, '6': 1,
            '8': 1, '9': 1, '11': 1,
            '12': 1, '13': 1, '14': 1,
            '15': 1, '16': 1, '17': 1
        }
    }
}
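The vocabulary/index layout above can be reproduced from plain term counts. The sketch below is a hypothetical illustration; the function names are made up, and this is not the library's `serialize` implementation. Each new term gets the next free index in order of first appearance, and each label's counts are re-keyed by that index. A matching `deserializeCounts` shows how term counts could be restored when setting up a model later.

```javascript
// Hypothetical sketch: term-keyed counts -> { vocabulary, data } keyed by index.
function serializeCounts(countsByLabel) {
	const vocabulary = []
	const indexOf = new Map()
	const data = {}
	for (const [label, counts] of Object.entries(countsByLabel)) {
		data[label] = {}
		for (const [term, count] of Object.entries(counts)) {
			if (!indexOf.has(term)) {
				// First time this term is seen: assign the next free index.
				indexOf.set(term, vocabulary.length)
				vocabulary.push(term)
			}
			data[label][indexOf.get(term)] = count
		}
	}
	return { vocabulary, data }
}

// Reverse mapping: index keys -> terms, using the stored vocabulary.
function deserializeCounts({ vocabulary, data }) {
	const countsByLabel = {}
	for (const [label, entries] of Object.entries(data)) {
		countsByLabel[label] = {}
		for (const [index, count] of Object.entries(entries)) {
			countsByLabel[label][vocabulary[Number(index)]] = count
		}
	}
	return countsByLabel
}

// Term counts from the training example, in order of first appearance.
const counts = {
	positive: { this: 1, is: 2, great: 1, so: 1, cool: 1, wow: 1, i: 1, love: 1, it: 2, really: 1, amazing: 1 },
	negative: { this: 1, is: 1, really: 1, bad: 1, i: 1, hate: 1, it: 1, with: 1, a: 1, passion: 1, just: 1, terrible: 1 }
}

const serialized = serializeCounts(counts)
console.log(JSON.stringify(serialized))
console.log(deserializeCounts(serialized))
```

Storing each term once in `vocabulary` avoids repeating term strings across labels; per the options table, setting `vocabulary` to false stores terms directly in the data entries instead.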

Documentation

Contributing

Read the contribution guidelines.

Changelog

Refer to the changelog for a full history of the project.

License

ClassifyText is licensed under the MIT license.