textclass

v0.1.1

A simple text classifier library for JavaScript

Downloads

8

Readme

TextClass // Text classification algorithms

With TextClass, you get a text classification library with zero dependencies.

Example use cases:

  • Spam filter
  • Potential insult detection
  • Simple chatbot
  • Self-learning bot applications
  • Text sample detection

How the algorithms work

The algorithms in TextClass work slightly differently from, for example, Bayesian classification. They are also not based on neural networks or any other kind of AI. Instead, TextClass works with a points system.

Inputs are first tokenized into lower-case words with all special characters removed. In the "weighted" classes, the tokens are additionally counted by frequency. When a class is run, it uses these frequencies and weightings to calculate a rating for the text. The difference between the normal and the "weighted" classes is that, in the "weighted" classes, the frequency and importance of each token matter as well, not just the number of matches.

A big difference from other classifiers such as Bayes is that TextClass needs less training data, but that data has to be more accurate for the results to be accurate. If there is a lot of data, I recommend the "weighted" classes, because they capture the importance of the tokens.

Furthermore, the "single" classes do not need any additional training data for comparison. That means that if you build a spam filter, you don't need both spam AND normal data for training; the spam data alone is enough. And that's how TextClass works!
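The tokenizing and frequency-counting steps described above could look roughly like the sketch below. This is not TextClass's own code: `tokenize` and `countFrequencies` are hypothetical helper names, and the exact rules (what counts as a "special char", how apostrophes are handled) are assumptions.

```javascript
// Hypothetical helper (not part of TextClass): lower-case the text, turn every
// run of characters that are not letters or digits into a space, then split.
function tokenize(text) {
  return text
    .toLowerCase()
    .replace(/[^\p{L}\p{N}]+/gu, ' ')
    .trim()
    .split(/\s+/)
    .filter(Boolean); // drops the empty string produced by empty input
}

// Hypothetical helper: count how often each token occurs, as a "weighted"
// class might do while learning.
function countFrequencies(tokens) {
  const counts = new Map();
  for (const token of tokens) {
    counts.set(token, (counts.get(token) || 0) + 1);
  }
  return counts;
}

console.log(tokenize('Hi, how are you?')); // [ 'hi', 'how', 'are', 'you' ]
console.log(countFrequencies(tokenize('Spam, spam and ham!')).get('spam')); // 2
```

Under these assumptions the tokens for 'Hi, how are you?' match the `inputTokens` shown in the greeting example.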

Examples

Simple greeting detection

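// Create a single-class model and train it on a few greetings
const tcs = new TextClass.Single();

tcs.learn('Hello there!');
tcs.learn('Hi');
tcs.learn('Welcome everyone');
tcs.learn('Hey how are you?');

// Every token of this input was seen during training
const result1 = tcs.run('Hi, how are you?');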
/*
{
  inputTokens: [ 'hi', 'how', 'are', 'you' ],
  matchedTokens: [ 'hi', 'how', 'are', 'you' ],
  result: { confidence: 1, percent: 100 }
}
*/

let result2 = tcs.run('I want to have food');
/*
{
  inputTokens: [ 'i', 'want', 'to', 'have', 'food' ],
  matchedTokens: [],
  result: { confidence: 0, percent: 0 }
}
*/

Classes

TextClass.Single()

The simplest and most primitive classifier. It only checks whether the input tokens are present in the model, then counts the matches to calculate the points for the result.

Methods:

learn(text: string)

Train the model of your instance with text

run(text: string)

Classify text and get results

Returns:

let result = {
    inputTokens: Array,
    matchedTokens: Array,
    result: {
        confidence: Number,
        percent: Number
    }
}

If the model contains no data, run() returns null.
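Because run() may return null, calling code probably wants to guard against that before reading the result. The sketch below shows one way to do so; `isMatch` is a hypothetical helper and the default threshold of 50 percent is an arbitrary assumption, not a value recommended by TextClass.

```javascript
// Hypothetical helper (not part of TextClass): decide whether a run() result
// clears a percent threshold, treating null (untrained model) as "no match".
function isMatch(runResult, minPercent = 50) {
  if (runResult === null) return false; // nothing has been learned yet
  return runResult.result.percent >= minPercent;
}

// A result object written by hand in the documented shape
const sample = {
  inputTokens: ['hi', 'how', 'are', 'you'],
  matchedTokens: ['hi', 'how', 'are', 'you'],
  result: { confidence: 1, percent: 100 },
};

console.log(isMatch(sample)); // true
console.log(isMatch(null)); // false
```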

Properties:

model: Array

The model. For TextClass.Single, it's just an array containing all tokens.
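To make the description of TextClass.Single more concrete, here is a minimal standalone sketch of a classifier with the same learn/run surface and result shape. It is not the library's implementation: `SketchSingle`, its tokenizer, and the confidence formula (matched tokens divided by input tokens) are assumptions, chosen because they reproduce the two outputs in the greeting example.

```javascript
// Hypothetical stand-in for TextClass.Single; all internals are assumptions.
class SketchSingle {
  constructor() {
    this.model = []; // flat array of every learned token
  }

  // Assumed tokenizer: lower-case words, special characters removed
  static tokenize(text) {
    return text
      .toLowerCase()
      .replace(/[^\p{L}\p{N}]+/gu, ' ')
      .trim()
      .split(/\s+/)
      .filter(Boolean);
  }

  learn(text) {
    this.model.push(...SketchSingle.tokenize(text));
  }

  run(text) {
    if (this.model.length === 0) return null; // no data in the model
    const inputTokens = SketchSingle.tokenize(text);
    const matchedTokens = inputTokens.filter((t) => this.model.includes(t));
    // Assumed scoring: share of input tokens that are known to the model
    const confidence =
      inputTokens.length === 0 ? 0 : matchedTokens.length / inputTokens.length;
    return {
      inputTokens,
      matchedTokens,
      result: { confidence, percent: confidence * 100 },
    };
  }
}

const greetings = new SketchSingle();
['Hello there!', 'Hi', 'Welcome everyone', 'Hey how are you?'].forEach((t) =>
  greetings.learn(t)
);

console.log(greetings.run('Hi, how are you?').result); // { confidence: 1, percent: 100 }
console.log(greetings.run('I want to have food').result); // { confidence: 0, percent: 0 }
```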

TextClass.SingleWeighted()

A more advanced classifier. It calculates the importance of every token in the model and can deliver more accurate results on large amounts of data.

Methods:

learn(text: string)

Train the model of your instance with text

run(text: string)

Classify text and get results

Returns:

let result = {
    inputTokens: Array,
    matchedTokens: Array,
    result: {
        confidence: Number,
        percent: Number
    }
}

If the model contains no data, run() returns null.

Properties:

model: Object

The model. For TextClass.SingleWeighted, it looks like this:

let model = {
    tokens: Object,
    importantTokens: Object,
    processed: Boolean
}
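The README does not spell out what the three model fields hold or how the weighting is calculated, so the following is only a guess at how such a model could work. In this sketch `tokens` is assumed to map each token to its raw count, `importantTokens` to map each token to a weight (its count divided by the highest count), and `processed` to flag whether the weights are up to date. `SketchSingleWeighted` and its scoring formula are hypothetical.

```javascript
// Hypothetical stand-in for TextClass.SingleWeighted; all internals are assumptions.
class SketchSingleWeighted {
  constructor() {
    this.model = {
      tokens: Object.create(null), // assumed: token -> raw count
      importantTokens: Object.create(null), // assumed: token -> weight in (0, 1]
      processed: false, // assumed: true once weights reflect the counts
    };
  }

  static tokenize(text) {
    return text
      .toLowerCase()
      .replace(/[^\p{L}\p{N}]+/gu, ' ')
      .trim()
      .split(/\s+/)
      .filter(Boolean);
  }

  learn(text) {
    for (const token of SketchSingleWeighted.tokenize(text)) {
      this.model.tokens[token] = (this.model.tokens[token] || 0) + 1;
    }
    this.model.processed = false; // weights must be recomputed
  }

  // Recompute weights: the most frequent token gets weight 1
  process() {
    const maxCount = Math.max(...Object.values(this.model.tokens));
    this.model.importantTokens = Object.create(null);
    for (const [token, count] of Object.entries(this.model.tokens)) {
      this.model.importantTokens[token] = count / maxCount;
    }
    this.model.processed = true;
  }

  run(text) {
    if (Object.keys(this.model.tokens).length === 0) return null;
    if (!this.model.processed) this.process();
    const inputTokens = SketchSingleWeighted.tokenize(text);
    const weights = this.model.importantTokens;
    const matchedTokens = inputTokens.filter((t) => t in weights);
    // Assumed scoring: summed weight of the matches, averaged over all input tokens
    const score = matchedTokens.reduce((sum, t) => sum + weights[t], 0);
    const confidence = inputTokens.length === 0 ? 0 : score / inputTokens.length;
    return {
      inputTokens,
      matchedTokens,
      result: { confidence, percent: confidence * 100 },
    };
  }
}

const spam = new SketchSingleWeighted();
spam.learn('free money now');
spam.learn('free prize');

// "free" was seen twice (weight 1), "money" once (weight 0.5)
console.log(spam.run('free money').result); // { confidence: 0.75, percent: 75 }
```

With this scoring, a match on a frequently seen token counts for more than a match on a rare one, which is one possible reading of "the frequency and value of importance" mattering.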

TextClassMulti

Coming soon

TextClassMultiWeighted

Coming soon

WordClass

Coming soon