extract-lemmatized-nonstop-words
v1.0.19
Extract lemmatized nonstop words
Extracts a pure list of lemmatized words of a text filtered by stop words.
Features
- Removing stopwords.
- Removing proper nouns.
- Regular past tense and past participle verbs to present form: created → create
- Present form (3rd person) to present form: creates → create
- Plural nouns to singular: cats → cat
- Gerund form verbs to present form: creating → create
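The first two features (removing stop words and proper nouns) can be illustrated with a minimal sketch that works on token objects shaped like the ones `extract` returns (see Usage). This is not the package's code: the function name `dropStopwordsAndProperNouns` and the tiny stop-word list are hypothetical, the tokens are hand-tagged, and the package ships its own default list.

```javascript
// Hypothetical sketch, not the package's internal code.
// Tokens are assumed to carry the { value, normal, pos } fields shown under Usage.
const PROPER_NOUN_TAGS = new Set(['NNP', 'NNPS']);

function dropStopwordsAndProperNouns(tokens, stopwords) {
  // Compare case-insensitively against the stop-word list.
  const stop = new Set(stopwords.map((w) => w.toLowerCase()));
  return tokens.filter(
    (t) => !PROPER_NOUN_TAGS.has(t.pos) && !stop.has(t.normal.toLowerCase())
  );
}

// Hand-tagged tokens and a tiny, illustrative stop-word list.
const tokens = [
  { value: 'Alex', normal: 'alex', pos: 'NNP' },
  { value: 'created', normal: 'created', pos: 'VBD' },
  { value: 'these', normal: 'these', pos: 'DT' },
  { value: 'categories', normal: 'categories', pos: 'NNS' },
];
const kept = dropStopwordsAndProperNouns(tokens, ['he', 'these', 'and', 'they', 'are']);
console.log(kept.map((t) => t.value)); // [ 'created', 'categories' ]
```

"Alex" is dropped because of its `NNP` tag and "these" because it is on the stop-word list.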
Install
Install using Yarn:
yarn add extract-lemmatized-nonstop-words
Install using npm:
npm i --save extract-lemmatized-nonstop-words
Usage
const extract = require('extract-lemmatized-nonstop-words');
const words = extract('He created these categories and they are better.');
Returns:
[
  { lemma: 'create', normal: 'created', pos: 'VBD', tag: 'word', value: 'created', vocabulary: 'create' },
  { lemma: 'category', normal: 'categories', pos: 'NNS', tag: 'word', value: 'categories', vocabulary: 'category' },
  { lemma: 'good', normal: 'better', pos: 'JJR', tag: 'word', value: 'better', vocabulary: 'better' }
]
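Because the result is a plain array, the lemmatized strings can be pulled out with `Array.prototype.map`. The sketch below hard-codes the documented output instead of calling the package, so it runs on its own. The README does not say what the `vocabulary` field represents; the output above only shows that it can differ from `lemma`.

```javascript
// Copied from the documented output of
// extract('He created these categories and they are better.').
const words = [
  { lemma: 'create', normal: 'created', pos: 'VBD', tag: 'word', value: 'created', vocabulary: 'create' },
  { lemma: 'category', normal: 'categories', pos: 'NNS', tag: 'word', value: 'categories', vocabulary: 'category' },
  { lemma: 'good', normal: 'better', pos: 'JJR', tag: 'word', value: 'better', vocabulary: 'better' },
];

// Keep only the lemmatized strings.
const lemmas = words.map((w) => w.lemma);
console.log(lemmas); // [ 'create', 'category', 'good' ]

// `lemma` and `vocabulary` can differ (see "better"), so pick the field deliberately.
const differing = words.filter((w) => w.lemma !== w.vocabulary).map((w) => w.value);
console.log(differing); // [ 'better' ]
```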
API
extract(text, filter) ⇒ Array.<Object>
Extracts a pure list of lemmatized words of a text filtered by stop words. It removes non-word tokens, tokens shorter than three characters, and tokens that contain non-alphabetic characters.
| Param | Type | Description |
| --- | --- | --- |
| text | String | Input text. |
| filter | Array.<String> | List of custom stop words that replaces the default list. Pass `false` to disable stop-word filtering. |
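The token rules and the `filter` parameter described above could be implemented roughly as follows. This is a minimal sketch under stated assumptions, not the package's source: `isCandidateToken`, `resolveStopwords`, `applyFilters`, and `DEFAULT_STOPWORDS` are hypothetical names, the default list is a tiny placeholder, and treating "non-word tokens" as tokens whose `tag` is not `"word"` is an assumption based on the output shown under Usage.

```javascript
// Hypothetical placeholder; the package ships its own default stop-word list.
const DEFAULT_STOPWORDS = ['he', 'these', 'and', 'they', 'are'];

// Assumed token rules: word tokens only, at least 3 characters, letters A-Z only.
function isCandidateToken(token) {
  return token.tag === 'word' && token.value.length >= 3 && /^[a-z]+$/i.test(token.value);
}

// An array replaces the defaults, false disables stop-word filtering,
// anything else (e.g. undefined) falls back to the defaults.
function resolveStopwords(filter) {
  if (filter === false) return null;
  return Array.isArray(filter) ? filter : DEFAULT_STOPWORDS;
}

function applyFilters(tokens, filter) {
  const list = resolveStopwords(filter);
  const stop = list === null ? null : new Set(list.map((w) => w.toLowerCase()));
  return tokens.filter(
    (t) => isCandidateToken(t) && (stop === null || !stop.has(t.value.toLowerCase()))
  );
}

// Illustrative tokens; the 'punctuation' tag value is an assumption.
const tokens = [
  { tag: 'word', value: 'He' },
  { tag: 'word', value: 'they' },
  { tag: 'word', value: 'created' },
  { tag: 'word', value: 'B2B' },
  { tag: 'punctuation', value: '.' },
];
console.log(applyFilters(tokens).map((t) => t.value));              // [ 'created' ]
console.log(applyFilters(tokens, false).map((t) => t.value));       // [ 'they', 'created' ]
console.log(applyFilters(tokens, ['created']).map((t) => t.value)); // [ 'they' ]
```

The last call shows the replacement semantics: with a custom list, "they" is no longer a stop word, while "created" now is.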
Annotation Specification
| Annotation | Name | Example |
| --- | --- | --- |
| NN | Noun | dog, man |
| NNS | Plural noun | dogs, men |
| NNP | Proper noun | London, Alex |
| NNPS | Plural proper noun | Smiths |
| VB | Base form verb | be |
| VBP | Present form verb | throw |
| VBZ | Present form (3rd person) | throws |
| VBG | Gerund form verb | throwing |
| VBD | Past tense verb | threw |
| VBN | Past participle verb | thrown |
| MD | Modal verb | can, shall, will, may, must, ought |
| JJ | Adjective | big, fast |
| JJR | Comparative adjective | bigger |
| JJS | Superlative adjective | biggest |
| RB | Adverb | not, quickly, closely |
| RBR | Comparative adverb | less-closely, faster |
| RBS | Superlative adverb | fastest |
| DT | Determiner | the, a, some, both |
| PDT | Predeterminer | all, quite |
| PRP | Personal pronoun | I, you, he, she |
| PRP$ | Possessive pronoun | my, your, his, her |
| POS | Possessive ending | 's |
| IN | Preposition | of, by, in |
| RP | Particle | up, off |
| TO | to | to |
| WDT | Wh-determiner | which, that, whatever, whichever |
| WP | Wh-pronoun | who, whoever, whom, what |
| WP$ | Wh-possessive | whose |
| WRB | Wh-adverb | how, where |
| EX | Expletive there | there |
| CC | Coordinating conjunction | &, and, nor, or |
| CD | Cardinal number | 1, 7, 77, one |
| LS | List item marker | 1, B, C, One |
| UH | Interjection | ah, oh, oops |
| FW | Foreign word | viva, mon, toujours |
| , | Comma | , |
| : | Mid-sentence punctuation | :, ;, ... |
| . | Sentence-final punctuation | ., !, ? |
| ( | Left parenthesis | (, {, [ |
| ) | Right parenthesis | ), }, ] |
| # | Pound sign | # |
| $ | Currency symbol | $, €, £, ¥ |
| SYM | Other symbol | +, *, /, <, > |
| EM | Emoji or emoticon | :), ❤ |
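When post-processing the `pos` field, it can be handy to collapse the tags above into coarse word classes. The helper below is hypothetical and not part of the package; it relies only on the tag prefixes in the table (for example, `NNS` and `NNP` both start with `NN`) and returns `"other"` for every tag it does not cover.

```javascript
// Hypothetical helper: maps a tag from the table above to a coarse word class.
const PREFIX_TO_CLASS = [
  ['NN', 'noun'],      // NN, NNS, NNP, NNPS
  ['VB', 'verb'],      // VB, VBP, VBZ, VBG, VBD, VBN
  ['JJ', 'adjective'], // JJ, JJR, JJS
  ['RB', 'adverb'],    // RB, RBR, RBS
];

function coarseClass(pos) {
  // startsWith (not includes) keeps WRB, the Wh-adverb, out of the adverb class.
  const hit = PREFIX_TO_CLASS.find(([prefix]) => pos.startsWith(prefix));
  return hit ? hit[1] : 'other';
}

console.log(['VBD', 'NNS', 'JJR', 'WRB', 'DT'].map((pos) => coarseClass(pos)));
// [ 'verb', 'noun', 'adjective', 'other', 'other' ]
```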