document-tfidf
v0.2.1
A TFIDF analysis package that allows for tokens of any word length
####Getting Started
Install the package with:
npm install document-tfidf
####Features:
- countTermFrequencies
- storeTermFrequencies
- normalizeTermFrequencies
- identifyUniqueTerms
- fullTFIDFAnalysis
Documentation
- Term Frequency - Inverse Document Frequency (TFIDF) Module:
  - countTermFrequencies: function(text [, options])
    - Counts the number of times each token appears in the input text.
    - Current options include tokenLength, which sets the number of words in each token. tokenLength defaults to 1.
    - Depends on the nGrams module, which extracts tokens of arbitrary length.
  - storeTermFrequencies: function(tokenSet, TFStorage)
    - Adds the tokenSet to TFStorage so that the analysis improves as more documents are stored.
    - Saving TFStorage in a persistent data store is recommended but not required.
    - If TFStorage is not provided, it is created as a new object and returned.
  - normalizeTermFrequencies: function(tokenSet, TFStorage)
    - For each token in tokenSet, divides its count by the total number found in TFStorage and returns the token set with normalized counts.
  - identifyUniqueTerms: function(normalizedTokenSet [, options])
    - Returns the most unique tokens in the input normalizedTokenSet, defined as those with the highest TFIDF scores.
    - Current options include uniqueThreshold. If specified, identifyUniqueTerms returns all terms with a TFIDF score greater than or equal to uniqueThreshold.
  - fullTFIDFAnalysis: function(text [, options])
    - Completes all of the above TFIDF calculations.
    - options correspond to the options for each piece of the analysis.
      - countTermFrequencies: function(text [, options])
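
To make the steps above concrete, here is a minimal, self-contained sketch of the pipeline in plain JavaScript. It is not the package's source code. The functions reuse the documented names for readability only, and several details are assumptions: the tokenizer (lowercased runs of letters, digits and apostrophes), the reading of "total number found in TFStorage" as the per-token total across stored documents, and the fallback for tokens missing from storage.

```javascript
// Illustrative re-implementation only; the real package may behave differently.

// Read a count from a plain object without picking up inherited properties.
const getCount = (obj, key) =>
  Object.prototype.hasOwnProperty.call(obj, key) ? obj[key] : 0;

// Count every run of `tokenLength` consecutive words (assumed tokenizer).
function countTermFrequencies(text, options = {}) {
  const tokenLength = options.tokenLength || 1;
  const words = text.toLowerCase().match(/[a-z0-9']+/g) || [];
  const counts = {};
  for (let i = 0; i + tokenLength <= words.length; i++) {
    const token = words.slice(i, i + tokenLength).join(' ');
    counts[token] = getCount(counts, token) + 1;
  }
  return counts;
}

// Merge one document's counts into the collection totals.
// Creates the storage object when none is passed, then returns it.
function storeTermFrequencies(tokenSet, TFStorage = {}) {
  for (const token of Object.keys(tokenSet)) {
    TFStorage[token] = getCount(TFStorage, token) + tokenSet[token];
  }
  return TFStorage;
}

// Divide each count by that token's total in storage (assumed interpretation).
// Tokens absent from storage fall back to their own count, giving a score of 1.
function normalizeTermFrequencies(tokenSet, TFStorage) {
  const normalized = {};
  for (const token of Object.keys(tokenSet)) {
    const total = getCount(TFStorage, token) || tokenSet[token];
    normalized[token] = tokenSet[token] / total;
  }
  return normalized;
}

// Keep tokens scoring at or above uniqueThreshold, highest score first.
function identifyUniqueTerms(normalizedTokenSet, options = {}) {
  const threshold =
    options.uniqueThreshold === undefined ? 0 : options.uniqueThreshold;
  return Object.keys(normalizedTokenSet)
    .filter((token) => normalizedTokenSet[token] >= threshold)
    .sort((a, b) => normalizedTokenSet[b] - normalizedTokenSet[a]);
}

// Usage: store two documents, then find what is distinctive about the second.
const storage = {};
const docA = countTermFrequencies('the cat sat on the mat');
const docB = countTermFrequencies('the dog sat');
storeTermFrequencies(docA, storage);
storeTermFrequencies(docB, storage);

const unique = identifyUniqueTerms(
  normalizeTermFrequencies(docB, storage),
  { uniqueThreshold: 1 }
);
console.log(unique); // [ 'dog' ]

// Two-word tokens via the tokenLength option.
const bigrams = countTermFrequencies('a b a b', { tokenLength: 2 });
console.log(bigrams);
```

In this sketch a token that appears only in the current document scores 1, while tokens shared with earlier documents score lower, which is why only `dog` passes a threshold of 1.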
View the full specs and check out more text analysis in my Text Analysis Suite.