© 2024 – Pkg Stats / Ryan Hefner

natural-language-parser

v1.0.0


A parser for the English language in TypeScript

Downloads: 5


Natural Language Parser in TypeScript


The purpose of this tool is to create an abstract syntax tree (AST) from an English sentence. You can use the generated AST to analyze the semantics of the sentence and as input for a natural language interpreter.

Structure

The rules that describe the language grammar are defined in grammar/BNF.txt using the Backus–Naur metasyntax. The implementation follows OOP principles, using TypeScript classes.
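To picture how grammar productions can map onto classes, here is a small hypothetical sketch in plain JavaScript. The `Token` and `Rule` classes and their `toHumanReadableJSON` logic are assumptions modelled on the JSON output shown in the JavaScript API section; they are not the package's actual classes.

```javascript
// A terminal: one word tagged with its word class.
class Token {
  constructor(wordClass, word) {
    this.wordClass = wordClass; // e.g. "determiner", "noun", "verb"
    this.word = word;
  }
}

// A non-terminal: `type` is the production's left-hand side (e.g. "NounPhrase"),
// `parts` is the right-hand side (Tokens and/or nested Rules).
class Rule {
  constructor(type, parts) {
    this.type = type;
    this.parts = parts;
  }

  // Wraps the body under the rule's own type, like the README's top-level key.
  toHumanReadableJSON() {
    return { [this.type]: this.body() };
  }

  // Nested rules are keyed by their type, tokens by their word class.
  body() {
    const out = {};
    for (const part of this.parts) {
      if (part instanceof Rule) out[part.type] = part.body();
      else out[part.wordClass] = part.word;
    }
    return out;
  }
}

// NP -> D N, then VP -> NP V, built by hand for "the dog is".
const np = new Rule("NounPhrase", [new Token("determiner", "the"), new Token("noun", "dog")]);
const vp = new Rule("VerbPhrase", [np, new Token("verb", "is")]);
console.log(JSON.stringify(vp.toHumanReadableJSON()));
// {"VerbPhrase":{"NounPhrase":{"determiner":"the","noun":"dog"},"verb":"is"}}
```

Under this keying scheme two sibling rules of the same type would overwrite each other, which may be why the compound-sentence output further down uses distinct keys such as verbPhraseA and verbPhraseB.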

Usage

Install using npm:

npm i natural-language-parser

Import using CommonJS and create an instance:

const Parser = require('natural-language-parser').default
const parser = new Parser()
const parsed = parser.parse('the dog is in the park') // will create a Rule instance

JavaScript API

Import the parser using CommonJS:

const Parser = require('natural-language-parser').default

parserInstance.parse()

The parse method returns a Rule instance that contains all matched sentence parts as properties:

const parsed = parser.parse('the dog is in the park')

returns an object with the following structure:

verbPhrase: VerbPhraseRule {
  type: 'VerbPhrase',
  verb: VerbPhraseRule {
    type: 'VerbPhrase',
    noun: [NounPhraseRule],
    verb: [VerbPhraseRule]
  },
  preposition: [Preposition],
  noun: NounPhraseRule {
    type: 'NounPhrase',
    determiner: [Determiner],
    noun: [NounPhraseRule]
  }
}

parsed.toHumanReadableJSON()

Use the toHumanReadableJSON method to create a simplified JSON representation:

const parsed = parser.parse('the dog is in the park')
console.log(parsed.toHumanReadableJSON())

outputs JSON with a simplified structure:

{
  "VerbPhrase": {
    "VerbPhrase": {
      "NounPhrase": {
        "determiner": "the",
        "noun": "dog"
      },
      "verb": "is"
    },
    "preposition": "in",
    "NounPhrase": {
      "determiner": "the",
      "noun": "park"
    }
  }
}
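A natural language interpreter will usually need to walk this JSON. The helpers below are a hypothetical sketch, not part of the package: `collectWords` recovers the words in reading order and `findFirst` returns the first value stored under a given key. The `ast` literal is the output shown above.

```javascript
// Simplified JSON for "the dog is in the park", copied from the output above.
const ast = {
  VerbPhrase: {
    VerbPhrase: {
      NounPhrase: { determiner: "the", noun: "dog" },
      verb: "is",
    },
    preposition: "in",
    NounPhrase: { determiner: "the", noun: "park" },
  },
};

// Collects leaf words depth-first. JavaScript keeps insertion order for
// non-integer string keys, so the words come back in sentence order here.
function collectWords(node) {
  if (typeof node === "string") return [node];
  return Object.values(node).flatMap(collectWords);
}

// Depth-first, key-order search for the first value stored under `key`.
function findFirst(node, key) {
  if (node === null || typeof node !== "object") return undefined;
  for (const [k, child] of Object.entries(node)) {
    if (k === key) return child;
    const found = findFirst(child, key);
    if (found !== undefined) return found;
  }
  return undefined;
}

console.log(collectWords(ast).join(" ")); // the dog is in the park
// In this sentence the first NounPhrase found is the one before the verb.
console.log(findFirst(ast, "NounPhrase").noun); // dog
```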

CLI

Use the nlp-cli command to parse a sentence:


nlp-cli parse -s "the balrog sleeps in Moria"

will produce:

{
  "VerbPhrase": {
    "VerbPhrase": {
      "NounPhrase": {
        "determiner": "the",
        "noun": "balrog"
      },
      "verb": "sleeps"
    },
    "preposition": "in",
    "noun": "Moria"
  }
}

Configuration

The parser needs a dictionary to recognize words as verbs, nouns, prepositions, etc. It ships with a built-in dictionary that supports the most common English verbs, nouns, prepositions, determiners, and conjunctions.

A dictionary.js file

To specify a custom dictionary, create a dictionary.js file in the root of your project:

node_modules/
index.js
dictionary.js
...

The dictionary file exports a list of words for each word class supported by the parser:

module.exports = {
    nouns: ['road'],
    verbs: ['drive'],
    conjunctions: ['and'],
    prepositions: ['in'],
    determiners: ['the'],
    modalVerbs: ['should'],
}

If any of the word classes listed above is missing, the parser will use the built-in dictionary. The dictionary is case-sensitive.
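Because a missing word class silently falls back to the built-in dictionary, it can help to check a custom dictionary before relying on it. A minimal sketch of such a check follows; `checkDictionary` is a hypothetical helper (not part of the package), and it assumes every word class is an array of strings, as in the example above.

```javascript
// The six word classes from the example dictionary.js above.
const WORD_CLASSES = ["nouns", "verbs", "conjunctions", "prepositions", "determiners", "modalVerbs"];

// Reports word classes that are absent (these would fall back to the built-in
// dictionary) and those that are present but not arrays of strings.
function checkDictionary(dictionary) {
  const missing = [];
  const invalid = [];
  for (const wordClass of WORD_CLASSES) {
    const words = dictionary[wordClass];
    if (words === undefined) {
      missing.push(wordClass);
    } else if (!Array.isArray(words) || !words.every((w) => typeof w === "string")) {
      invalid.push(wordClass);
    }
  }
  return { missing, invalid };
}

console.log(checkDictionary({ nouns: ["road"], verbs: ["drive"] }));
```

The check compares nothing by case, so entries such as "Road" and "road" are left as distinct words, matching the case-sensitive lookup described above.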

Custom dictionary file

To load the dictionary from a custom-named file that is not in the root of the repo, use an nlpconfig.js file. The config file must be located in the root of the repo and must export a dictionaryPath property:

module.exports = {
    dictionaryPath: 'some-folder/dictionary-custom.js'
}
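The README does not say which file wins when both dictionary.js and nlpconfig.js exist, or how dictionaryPath is resolved. The following hypothetical sketch assumes nlpconfig.js takes precedence, that dictionaryPath is relative to the project root, and that the built-in dictionary is used when neither file is found. `resolveDictionary` and its injected `fileExists`/`readConfig` callbacks are illustrative names, not package APIs.

```javascript
const path = require("path");

// Hypothetical lookup order (an assumption, not documented behaviour):
// 1. nlpconfig.js in the project root with a dictionaryPath property
// 2. dictionary.js in the project root
// 3. otherwise null, meaning "use the built-in dictionary"
// fileExists and readConfig are injected so the logic can be tested without disk access.
function resolveDictionary(rootDir, fileExists, readConfig) {
  const configPath = path.join(rootDir, "nlpconfig.js");
  if (fileExists(configPath)) {
    const config = readConfig(configPath);
    if (config && typeof config.dictionaryPath === "string") {
      // Assumes dictionaryPath is relative to the project root.
      return path.resolve(rootDir, config.dictionaryPath);
    }
  }
  const defaultPath = path.join(rootDir, "dictionary.js");
  return fileExists(defaultPath) ? defaultPath : null;
}

// Example with an in-memory "file system" instead of real files.
const files = new Set([path.join("/app", "nlpconfig.js")]);
const resolved = resolveDictionary(
  "/app",
  (p) => files.has(p),
  () => ({ dictionaryPath: "some-folder/dictionary-custom.js" })
);
console.log(resolved);
```

In a real project, fs.existsSync and require could be passed as fileExists and readConfig.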

How it Works

The parser accepts English input, breaks it down into its building components, and builds a syntax tree representing the hierarchical structure of the sentence.


First, the parser splits the input into tokens; this process is called tokenization. It then recursively checks whether the tokens can be substituted with items from the grammar's set; this is called the production operation. The production rules are defined in the grammar of the parser. For example, a noun phrase is made up of a determiner and a noun ("The sun"): NP -> D N. A verb phrase is made up of a verb and a noun phrase ("The sun rises"): VP -> V NP | NP V. Once no more productions are possible, the parser stops and outputs the result.

The parser uses a bottom-up (shift-reduce) parsing algorithm. It pushes the next word of the input sentence onto a stack (the shift operation), then checks whether a sequence of tokens corresponds to the right-hand side of a production rule and, if so, substitutes it with the left-hand side of that rule (the reduce operation). For example, V NP is replaced with VP.

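The shift and reduce steps described above can be sketched in a few dozen lines. This is not the package's implementation: the toy lexicon, the four productions (including VP -> VP P NP, added here so the prepositional example parses), and the greedy reduce strategy are assumptions chosen to mirror the README's "the dog is in the park" example.

```javascript
// Toy lexicon mapping words to word classes (D = determiner, N = noun,
// V = verb, P = preposition). A Map avoids inherited keys like "constructor".
const LEXICON = new Map([
  ["the", "D"], ["dog", "N"], ["park", "N"], ["sun", "N"], ["car", "N"],
  ["is", "V"], ["rises", "V"], ["drive", "V"], ["in", "P"],
]);

// Productions as [lhs, rhs]; the first rule whose rhs matches the stack top wins.
const GRAMMAR = [
  ["VP", ["VP", "P", "NP"]],
  ["NP", ["D", "N"]],
  ["VP", ["NP", "V"]],
  ["VP", ["V", "NP"]],
];

// Tokenization: split on whitespace and tag each word (case-sensitive lookup).
function tokenize(sentence) {
  return sentence
    .trim()
    .split(/\s+/)
    .map((word) => ({ type: LEXICON.get(word) ?? "UNKNOWN", word }));
}

// Returns [lhs, rhsLength] for the first production matching the top of the stack.
function findReduction(stack) {
  for (const [lhs, rhs] of GRAMMAR) {
    if (rhs.length > stack.length) continue;
    const top = stack.slice(stack.length - rhs.length);
    if (top.every((node, i) => node.type === rhs[i])) return [lhs, rhs.length];
  }
  return null;
}

function parse(sentence) {
  const stack = [];
  for (const token of tokenize(sentence)) {
    stack.push(token); // shift
    let reduction;
    while ((reduction = findReduction(stack)) !== null) {
      const [lhs, size] = reduction;
      const children = stack.splice(stack.length - size, size); // reduce
      stack.push({ type: lhs, children });
    }
  }
  return stack; // exactly one node means a full tree was built
}

// Renders a node as a bracketed tree such as (NP (D the) (N dog)).
function show(node) {
  if (!node.children) return `(${node.type} ${node.word})`;
  return `(${node.type} ${node.children.map(show).join(" ")})`;
}

console.log(parse("the dog is in the park").map(show).join(" "));
// (VP (VP (NP (D the) (N dog)) (V is)) (P in) (NP (D the) (N park)))
```

An unrecognized word is tagged UNKNOWN and never reduced, so the stack ends with more than one node, echoing the limitation that an unknown token prevents a full tree. Because this sketch reduces greedily as soon as a right-hand side matches, a sentence like "the dog chases the car" (if chases were added as a verb) would reduce NP V too early and get stuck at VP NP; general-purpose shift-reduce parsers resolve such conflicts with lookahead, typically via an LR parse table.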

For more information on natural language parsing, refer to Natural Language Processing with Python.

Limitations & Known Issues

This is an experimental project and, as such, it has the following limitations and known issues:

  • It does not fully support the English language. The supported grammar is described in Backus–Naur form in the BNF.txt file.
  • It will not produce a full tree if a token is not recognized by the dictionary.
  • Compound-complex sentences are not fully supported; currently only a sentence that consists of [<verb_phrase> <conjunction> <verb_phrase>] will be parsed successfully:
 nlp-cli parse -s "the balrog should not pass and sleeps in Moria"

will output:

AST: {
  "conjunction": "and",
  "verbPhraseA": {
    "VerbPhrase": {
      "NounPhrase": {
        "determiner": "the",
        "noun": "balrog"
      },
      "ModalVerbPhrase": {
        "modalVerb": "should",
        "conjunction": "not",
        "verb": "pass"
      }
    }
  },
  "verbPhraseB": {
    "VerbPhrase": {
      "verb": "sleeps",
      "preposition": "in",
      "noun": "Moria"
    }
  }
}

Any other compound-complex sentence outputs a single Rule instance, the last token that was reduced. For example:

nlp-cli parse -s "the balrog should not pass and sleeps in Moria and should not sleep"

will output:

AST: sleep
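When scripting nlp-cli, it can be useful to tell a full tree apart from this single-token fallback. The sketch below is a hypothetical helper, not part of the package; it assumes the CLI prints either JSON (optionally after an "AST:" prefix, as in the examples above) or a bare word.

```javascript
// Hypothetical helper: classifies nlp-cli output as a full tree or the
// single-token fallback. Assumes JSON output, optionally prefixed with "AST:".
function classifyCliOutput(text) {
  const body = text.trim().replace(/^AST:\s*/, "");
  try {
    const ast = JSON.parse(body);
    // Only an object counts as a tree; a bare JSON string or number does not.
    if (ast !== null && typeof ast === "object") return { complete: true, ast };
  } catch {
    // Not valid JSON (e.g. "sleep"): treat it as the fallback token.
  }
  return { complete: false, lastToken: body };
}

console.log(classifyCliOutput("AST: sleep").complete); // false
console.log(classifyCliOutput('AST: {"conjunction": "and"}').complete); // true
```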