atext-wordz
v1.0.4
Provides a list of the words found in an entire text, along with a few statistics
Getting started
- Install the package
$ npm i atext-wordz
- Require its functions
const { getWStatsList , getWStatsObj , getWordList } = require( "atext-wordz" );
- Call its functions
Depending on your needs, pick the format in which you want to get the result.
NOTE : You have 3 choices
const result = getWStatsList( text , options );
// ==> [ {}:wordstats(1), {}, {}, ... , {}:wordstats(N) ]
// OR
const result = getWStatsObj( text , options );
// ==> { wordstats1, wordstats2, ..., wordstatsN }
// OR
const result = getWordList( text , options );
// ==> ["word 1", "word 2", ... "word N"]
Light demo
Assuming you have a demo.txt file in a demo folder at the same level as this .js file and you want to get word stats.
const fs = require( "fs" );
const { getWStatsList , getWStatsObj , getWordList } = require( "../atext-wordz" );

fs.readFile( "./demo/demo.txt" , "utf8" , ( err , text ) => {
  // Stop early if the file could not be read
  if ( err ) throw err;
  // Sort sentence, written in the byStr~Sort syntax
  const sortString = ` by number of a > than b's `;
  const cbOnNewWord = ( word ) => {
    // TODO: handle the newly found word here
  };
  const options = { sortString , cbOnNewWord };
  const result = getWStatsList( text , options );
  console.log( result );
  // outputs :
  // < an array of word statistics sorted by most used words >
});
Options
There are a few options to meet your requirements at this time. Here is the definition table.
|option|type|default|
|-|-|-|
|sortString|string|""|
|minimumLength|number|2|
|cbOnNewWord|function|(word:string) => {}|
sortString You can sort your words and stats before the service wraps everything up, thanks to the integrated byStr~Sort npm module. You may find it useful to check its sortString section.
const sortString = ` by order of a greater than b's then by number of a < than b's `;
NOTE : Every sort sentence starts with "by" and can end with "then" to chain further sort sentences.
minimumLength You can define the minimum length of the words considered during the analysis phase.
cbOnNewWord Provides a callback function that is called whenever a new word is encountered, which means only once per distinct word.
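To illustrate how minimumLength and cbOnNewWord could interact, here is a minimal, self-contained sketch. It does not call atext-wordz: collectWords is a hypothetical helper, splitting on whitespace is a simplifying assumption rather than the package's word detection, and it assumes that words shorter than minimumLength are simply dropped.

```javascript
// Hypothetical helper, not part of atext-wordz: collects distinct words
// and calls cbOnNewWord the first time each word is seen.
function collectWords( text , options = {} ) {
  const { minimumLength = 2 , cbOnNewWord = () => {} } = options;
  const seen = new Set();
  // Assumption: plain whitespace splitting stands in for real word detection.
  for ( const word of text.split( /\s+/ ) ) {
    if ( word.length < minimumLength ) continue; // too short, skipped
    if ( seen.has( word ) ) continue;            // already known, no callback
    seen.add( word );
    cbOnNewWord( word );                         // fired once per distinct word
  }
  return [ ...seen ];
}

const announced = [];
const words = collectWords( "to be or not to be a bee" , {
  minimumLength : 2 ,
  cbOnNewWord : ( word ) => announced.push( word ) ,
});
console.log( words );     // [ 'to', 'be', 'or', 'not', 'bee' ]
console.log( announced ); // the same five words, each announced once
```

The repeated "to" and "be" do not trigger the callback a second time, and "a" never reaches it because it is shorter than the minimum length.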
Stats
The services return stats matching an instance of IStatsOfWords or IStatsOfWordsObject, or a simple array of strings.
Here are the definitions for each of them:
IStatsOfWords
|field|type|notes|
|-|-|-|
|word|string|the word|
|order|number|the order of appearance|
|number|number|the number of appearances|
|length|number|the word's length|
IStatsOfWordsObject
Each word will be a key and its stats will be the value of that pair
||order|number|length|
|-|-|-|-|
|type|number|number|number|
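The relationship between the two shapes can be sketched as follows. This is an illustration only, not the package's implementation: toStatsList and toStatsObj are hypothetical helpers, the input is an already-split list of words, and the assumption that order starts at 1 and counts distinct words is mine.

```javascript
// Hypothetical helpers, not the package's implementation.
// Builds IStatsOfWords-shaped entries from an already-split list of words.
function toStatsList( words ) {
  const byWord = new Map();
  for ( const word of words ) {
    const stats = byWord.get( word );
    if ( stats ) {
      stats.number += 1; // one more appearance of a known word
    } else {
      // Assumption: order counts distinct words, starting at 1.
      byWord.set( word , { word , order : byWord.size + 1 , number : 1 , length : word.length } );
    }
  }
  return [ ...byWord.values() ];
}

// Re-keys the list by word to get the IStatsOfWordsObject shape.
function toStatsObj( statsList ) {
  // A prototype-less object keeps words such as "constructor" from clashing with built-ins.
  const obj = Object.create( null );
  for ( const { word , order , number , length } of statsList ) {
    obj[ word ] = { order , number , length };
  }
  return obj;
}

const statsList = toStatsList( [ "tea" , "for" , "two" , "tea" ] );
const statsObj = toStatsObj( statsList );
console.log( statsList[ 0 ] ); // { word: 'tea', order: 1, number: 2, length: 3 }
console.log( statsObj.two );   // { order: 3, number: 1, length: 3 }
```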
Word detection
Detecting words is not easy when a text is large and contains a lot of noise. Splitting on every space is not enough, because normal text also relies on punctuation.
Fortunately, French and English punctuation differ very little, if at all.
Therefore, anything that does not match a "special" character can be considered part of a word. Things become much more complicated with languages that do not strictly separate words, such as Japanese or Chinese, to name just a few.
Here is the regex used to detect special characters, so that everything else counts as part of a word:
const special =
/[�\d\s\\[\]\x20-\x40\-`{-~\xA0-\xBF×Ø÷øʹ͵ͺ;!?♪╚-╬┘-▀\uFF3B\uFF40\uFF5B-\uFF65¥・()]/i;
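As a rough illustration of the approach, the sketch below splits a text on runs of special characters. The character class is a deliberately shortened, hypothetical stand-in for the full regex above (whitespace, ASCII punctuation and digits, and the \xA0-\xBF symbol range only), and splitWords is not a function exported by the package.

```javascript
// Shortened, hypothetical version of the "special" character class.
// \x20-\x40 covers the space, ASCII punctuation and digits up to "@".
const simpleSpecial = /[\s\x20-\x40\[\]\\`{-~\xA0-\xBF]+/;

// Splits on runs of special characters and drops the empty pieces
// left at the start or end of the text.
function splitWords( text ) {
  return text.split( simpleSpecial ).filter( ( word ) => word.length > 0 );
}

const european = splitWords( "Hello, world! C'est l'été (2024)." );
console.log( european ); // [ 'Hello', 'world', 'C', 'est', 'l', 'été' ]

// No separators between words: the whole sentence stays in one piece.
const japanese = splitWords( "今日は晴れです" );
console.log( japanese.length ); // 1
```

The second call shows the limitation mentioned above: without spaces or punctuation between words, a character-class split cannot isolate them.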