npm package discovery and stats viewer.

Discover Tips

  • General search

    [free text search, go nuts!]

  • Package details

    pkg:[package-name]

  • User packages

    @[username]

Sponsor

Optimize Toolset

I’ve always been into building performant and accessible sites, but lately I’ve been taking it extremely seriously. So much so that I’ve been building a tool to help me optimize and monitor the sites that I build to make sure that I’m making an attempt to offer the best experience to those who visit them. If you’re into performant, accessible and SEO friendly sites, you might like it too! You can check it out at Optimize Toolset.

About

Hi, 👋, I’m Ryan Hefner  and I built this site for me, and you! The goal of this site was to provide an easy way for me to check the stats on my npm packages, both for prioritizing issues and updates, and to give me a little kick in the pants to keep up on stuff.

As I was building it, I realized that I was actually using the tool to build the tool, and figured I might as well put this out there and hopefully others will find it to be a fast and useful way to search and browse npm packages as I have.

If you’re interested in other things I’m working on, follow me on Twitter or check out the open source projects I’ve been publishing on GitHub.

I am also working on a Twitter bot for this site to tweet the most popular, newest, random packages from npm. Please follow that account now and it will start sending out packages soon–ish.

Open Software & Tools

This site wouldn’t be possible without the immense generosity and tireless efforts from the people who make contributions to the world and share their work via open source initiatives. Thank you 🙏

© 2024 – Pkg Stats / Ryan Hefner

@geoffcox/pretty-good-nlp

v1.0.0

Published

A simple natural language processing (NLP) recognizer you can use in minutes.

Downloads

4

Readme

@geoffcox/pretty-good-nlp package

Pretty-good-nlp is a deterministic, match-based, recognizer for natural language processing (NLP) scenarios.

This readme covers installation and usage. You can read about the NLP concepts in the repository readme.

Live Demo

Installation

npm install @geoffcox/pretty-good-nlp

Basic Usage

1. Define an intent

An intent made up of a set of examples.

const intent : Intent = {
    name: 'Turn on oven',
    examples: [];
};

2. Add examples to your intent

Each example consists of an ordered set of parts to match.

const intent : Intent = {
    name: 'Turn on oven',
    examples: [
        {
            name: "Turn the oven on to 450 degrees for 2 hours",
            parts: [],                
        },
        //...
    ];
};

3. Add parts to your example

Add the parts in the order you expect them to be in the example. Each part can have literal phrases, patterns, and/or regular expressions.

const intent : Intent = {
    name: 'Turn on oven',
    examples: [
        {
            name: "Turn the oven on to 450 degrees for 2 hours",
            parts: [
                { phrases: ["Turn on the oven to"] },                
                { patterns: ["###"] },
                { phrases: ["degrees"] },
                { phrases: ["for"] },
                { regularExpressions: ["\\d+"] },
                { phrases: ["hours"] },
            ],            
        }
        //...
    ];
};

Of course, you would have more than one phrase in most parts. As you think of variations that fit within the example format, add them.

phrases: ["Turn on the oven", "Turn the oven on", "Bake at", "Broil at"]

If you find a variation that doesn't fit, it might mean you need to define a new example. If you find that you are covering too many permutations of a phrase, you might need to break it up into more parts.

4. Define a variable name in each part that should be extracted

When a part with a variable name matches, the matched text is extract and returned as the value for that variable.

const intent : Intent = {
    name: 'Turn on oven',
    examples: [
        {
            name: "Turn the oven on to 450 degrees for 2 hours",
            parts: [
                { phrases: ["Turn on the oven to"] },
                { patterns: ["###"], variable: "temperature" },
                { phrases: ["degrees"], variable: "temperatureUnit" },
                { phrases: ["for"] },
                { regularExpressions: ["\\d+"], variable: "duration" },
                { phrases: ["hours"], variable: "durationUnit" },
            ],            
        }
        //...
    ];
};

5. Call recognize

The recognize method takes the text to recognize, the intent you created, and some options. The options are covered later in the advanced usage section.

function recognize(
text: string,
intent: Intent,
options?: RecognizeOptions
): IntentRecognition;

Recognize returns an IntentRecognition.

It has the name of the intent, a recognition score, and a dictionary of extracted variable name/values. There is also a details object that contains more information specific to this recognizer.

export type IntentRecognition = {
name: string;
score: number;
variableValues: Record<string, string[]>;
details: {
    examples: ExampleRecognition[];
    textTokenMap: TokenMap;
};

Advanced

How scoring works

  • The recognition score for the intent is the highest example recognition score.
  • The score will be between 0 (not recognized) and 1 (exactly recognized) inclusive.
  • Each example recognition is scored by a ratio of the actual/expected part matches.
    • The score is adjusted based on the relative weight of each part.
    • There is a deduction if the matches are out of order up to a maximum of 0.15.
    • There is also a deduction for noise (i.e. words in the text that are not matched) up to a maximum of 0.05.

About variable values:

  • The variable values will contain the variables extracted for the higest scoring example.
  • Sometimes there are multiple possible matches for a variable. In this case there will be more than one value in the values array.
  • The values array associated with each varaible name will be in order from best to worst match.

About details:

  • The example recognitions are in the same order as the examples in the intent.
  • Each example recognition can be inspected to review the example's name, score, recognized parts, recognized never parts, and some metrics from the scoring process.
  • The text token map can be inspected to review the input text and the tokens from tokenization. Tokenization is breaking up the input text into words.

Dealing with negatives

There are words that indicate the opposite of an intent. You can handle these cases by adding parts to the example using the neverParts property. If any of these part match then the example gets a score of 0.

const intent : Intent = {
    name: 'Turn on oven',
    examples: [
        {
            name: "Turn the oven on to 450 degrees for 2 hours",
            parts: [],
            neverParts: [
                { phrases: ["Don't", "Do not", "Cancel", "Stop", "Off"]}
            ],
        },
        //...
    ];
};

Handling Importance

You can set options on an example part to indicate if it is more/less important than other parts.

  • You can weight a part relative to other parts. A weight of zero indictes an optional part. The default is 1.

  • You can make a part required. If a required part is not found, the entire example gets a score of 0.

  • You can indicate that a part can appear in any order within the example.

    const intent : Intent = {
        name: 'Turn on oven',
        examples: [
            {
                name: "Turn the oven on to 450 degrees for 2 hours",
                parts: [
                    { phrases: ["Turn on the oven to"] },
                    { patterns: ["###"], variable: "temperature", weight: 4 },
                    { phrases: ["degrees"], variable: "temperatureUnit" },
                    { phrases: ["for"], weight: 0 },
                    { regularExpressions: ["\\d+"], variable: "duration", weight: 2 },
                    { phrases: ["hours"], variable: "durationUnit" },
                ],
                neverParts: [],
            }
            //...
        ];
    };

Ignoring Order

Parts are expected to appear in order and the score is reduced for out of order parts. You can indicate that a part can appear in any order within the example by the ignoreOrder property.

//...
parts: [
    { phrases: "Please", ignoreOrder: true}
    //...
],
//...

Sharing phrases, patterns, and regular expressions

Use the shared property to specify named sets of phrases, patterns, and regular expressions to use across intents and examples.

const options = {
  shared: {
    temperatureUnits: ['fahrenheit', 'celcius', 'kelvin'],
    timeDurations: ['hours', 'minutes'],
    datePatterns: ['####-##-##','##/##/##','##-####'],
    timeRegexs: ['\\d\\d:\\d\\d', '\\d+']
  }
};

You can then include them into example parts by reference.

const part : ExamplePart = {
  // This resolves to ['degrees', fahrenheit', 'celcius', 'kelvin']
  phrases: ['degrees', '$ref=temperatureUnits']
}

Tuning out of order and noise penalties

Set the maxOutOfOrderPenalty or maxNoisePenalty to control how severe the penalties are when scoring examples. Values must be between 0 and 1 inclusive.

const options = {
  maxOutOfOrderPenalty: 0.2,
  maxNoisePenalty: 0    
};

Using a different tokenizer

The tokenize method takes a string of text and returns a TokenMap. A TokenMap is the original text and an array of character ranges with one range per token.

export type Tokenizer = (text: string) => TokenMap;

You can implement a tokenizer to separate words based on a particular language, or if you want to break on different delimiters. The default tokenizer breaks up words based on ' .,:;' (i.e. space, period, comma, colon, semicolon, question mark, and exclamation point). Pass your tokenizer in the options.

const options = {
  tokenizer: myTokenizer,  
};