word-ngrams
v0.2.0
A package for building and analyzing word nGrams
####Getting Started
Install package with:
npm install word-ngrams
####Features:
- buildNGrams
- listAllNGrams
- getNGramsByFrequency
- getMostCommonNGrams
- listNGramsByCount
####Documentation
- buildNGrams: function(text, unit [, options])
- Maps all nGrams of length unit (1=unigram, 2=bigram, 3=trigram, ...) found in the input text.
- In constructing the nGram, terminal sentence punctuation (such as periods, question marks, and exclamation marks) and semicolons are considered words, as they also carry meaning. Apostrophes and compound word hyphens are ignored. To signify the end of a paragraph or body of text, null will be used.
- Options include caseSensitive and includePunctuation.
- If includePunctuation is set to false, then terminal sentence punctuation and the end of the body of text are not included in the nGram.
- caseSensitive and includePunctuation both default to false.
- Example:
buildNGrams("Hello, World! How's the world weather today? Hello, World!", 2, {caseSensitive: true, includePunctuation: true})
// returns { Hello: { ",": 2 }, ",": { World: 2 }, World: { "!": 2 }, "!": { "How's": 1, null: 1 }, "How's": { the: 1 }, the: { world: 1 }, world: { weather: 1 }, weather: { today: 1 }, today: { "?": 1 }, "?": { Hello: 1 } }
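To make the nested output format concrete, here is a minimal sketch of how a bigram map of this shape could be built. It is not the package's source: `sketchBigrams` is a hypothetical name, it covers only the bigram case with punctuation included, and the tokenizing regular expression (which treats commas as tokens, as the example output above does) is an assumption.

```javascript
// Hypothetical illustration only; not the word-ngrams implementation.
// Builds a bigram map of the form { first: { second: count } }.
function sketchBigrams(text, caseSensitive = false) {
  const source = caseSensitive ? text : text.toLowerCase();
  // Assumed tokenizer: words (keeping inner apostrophes and hyphens)
  // or a single punctuation mark.
  const tokens = source.match(/[\p{L}\p{N}]+(?:['’-][\p{L}\p{N}]+)*|[.,!?;]/gu) || [];
  // Prototype-free objects so words like "constructor" are safe as keys.
  const nGrams = Object.create(null);
  tokens.forEach((token, i) => {
    // The last token is followed by the end-of-text marker, stored under "null".
    const next = i + 1 < tokens.length ? tokens[i + 1] : 'null';
    if (!nGrams[token]) nGrams[token] = Object.create(null);
    nGrams[token][next] = (nGrams[token][next] || 0) + 1;
  });
  return nGrams;
}

const bigrams = sketchBigrams('Hello, World! Hello, World!', true);
console.log(bigrams['!']); // followers of "!" with their counts
```

Each outer key is the first word of a bigram and each inner key is a word that followed it, so looking up a word gives the counts of everything seen directly after it.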
- listAllNGrams: function(nGrams)
- Given an input set of nGrams (of the same format as the buildNGrams output), listAllNGrams will return a list of unique nGrams found in the text.
- Example:
// Example input nGram for "Hello World. Goodbye World!", without punctuation
listAllNGrams({ hello: { world: 1 }, goodbye: { world: 1 } })
// returns ["hello world", "goodbye world"]
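As a rough illustration of what listing nGrams involves, the sketch below flattens a two-level bigram map into strings. `sketchListBigrams` is a hypothetical helper, not the package's function; joining the words with a single space is an assumption based on the example output above, and deeper maps (trigrams and beyond) are not handled.

```javascript
// Hypothetical illustration only; handles bigram maps of the form
// { first: { second: count } }.
function sketchListBigrams(nGrams) {
  const list = [];
  for (const [first, followers] of Object.entries(nGrams)) {
    for (const second of Object.keys(followers)) {
      // Each (first, second) pair appears once in the map, so the list is unique.
      list.push(`${first} ${second}`);
    }
  }
  return list;
}

console.log(sketchListBigrams({ hello: { world: 1 }, goodbye: { world: 1 } }));
// prints [ 'hello world', 'goodbye world' ]
```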
- getNGramsByFrequency: function(nGrams, frequency)
- Given an input set of nGrams (of the same format as the buildNGrams output), getNGramsByFrequency will return a list of all nGrams that occur exactly frequency times.
- Example:
// Example input nGram for "Hello World", without punctuation
getNGramsByFrequency({ hello: { world: 1 } }, 1)
// returns ["hello world"]
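Filtering by frequency can be pictured as walking the nested map and keeping the pairs whose count matches. The sketch below assumes a bigram map and an exact-match comparison; `sketchByFrequency` is a hypothetical stand-in, not the package's code.

```javascript
// Hypothetical illustration only; handles bigram maps of the form
// { first: { second: count } }.
function sketchByFrequency(nGrams, frequency) {
  return Object.entries(nGrams).flatMap(([first, followers]) =>
    Object.entries(followers)
      .filter(([, count]) => count === frequency) // keep exact matches only
      .map(([second]) => `${first} ${second}`)
  );
}

console.log(sketchByFrequency({ hello: { world: 1 } }, 1)); // prints [ 'hello world' ]
console.log(sketchByFrequency({ hello: { world: 1 } }, 2)); // prints []
```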
- getMostCommonNGrams: function(nGrams)
- Given an input set of nGrams (of the same format as the buildNGrams output), getMostCommonNGrams will return a list of the most common nGrams.
- Example:
// Example input nGram for "Hello World! Goodbye World!", with punctuation
getMostCommonNGrams({ Hello: { World: 1 }, World: { "!": 2 }, "!": { Goodbye: 1, null: 1 }, Goodbye: { World: 1 } })
// returns ["World!"]
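One way to find the most common nGrams is a single pass that tracks the highest count seen so far. The sketch below is an assumption-laden illustration rather than the package's method: `sketchMostCommon` is a hypothetical name, it handles bigram maps only, it returns every nGram tied for the top count, and it joins words with a space, so its formatting of punctuation may differ from the package's output.

```javascript
// Hypothetical illustration only; handles bigram maps of the form
// { first: { second: count } }.
function sketchMostCommon(nGrams) {
  let best = 0;
  let winners = [];
  for (const [first, followers] of Object.entries(nGrams)) {
    for (const [second, count] of Object.entries(followers)) {
      if (count > best) {
        // New maximum: discard the previous leaders.
        best = count;
        winners = [];
      }
      if (count === best) winners.push(`${first} ${second}`);
    }
  }
  return winners;
}

console.log(sketchMostCommon({ a: { b: 1, c: 3 }, d: { e: 3 } }));
// prints [ 'a c', 'd e' ]
```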
- listNGramsByCount: function(nGrams)
- Given an input set of nGrams (of the same format as the buildNGrams output), listNGramsByCount will return all nGrams sorted into buckets by count.
- Example:
// Example input for "Hello, World! How's the weather? Goodbye, World!"
listNGramsByCount({ hello: 1, world: 2, "how's": 1, the: 1, weather: 1, goodbye: 1 })
// returns { 1: ["hello", "how's", "the", "weather", "goodbye"], 2: ["world"] }
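Bucketing by count amounts to inverting the map so that each count points at the words that have it. The sketch below works on a flat unigram map like the one in the example above; `sketchBucketByCount` is a hypothetical helper and not the package's implementation.

```javascript
// Hypothetical illustration only; handles flat unigram maps of the form
// { word: count }.
function sketchBucketByCount(counts) {
  const buckets = {};
  for (const [word, count] of Object.entries(counts)) {
    if (!buckets[count]) buckets[count] = [];
    buckets[count].push(word); // words keep their original order within a bucket
  }
  return buckets;
}

console.log(sketchBucketByCount({ hello: 1, world: 2, goodbye: 1 }));
// prints { '1': [ 'hello', 'goodbye' ], '2': [ 'world' ] }
```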
View the full specs and check out more text analysis in my Text Analysis Suite.