subtlex-word-frequencies
v2.0.0
List of 74,286 words sorted by frequency of use in spoken English
The word counts are derived from SUBTLEXus, a corpus of American English subtitles of movies.
Install
npm:
npm install subtlex-word-frequencies
Use
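var words = require('subtlex-word-frequencies')
// Total number of entries
console.log(words.length)
// The three most frequent words
console.log(words.slice(0, 3))
// The first five entries whose word contains "chick"
console.log(words.filter(d => d.word.match(/chick/)).slice(0, 5))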
Yields:
74286
[
{word: 'you', count: 2134713},
{word: 'I', count: 2038529},
{word: 'the', count: 1501908}
]
[
{word: 'chicken', count: 3148},
{word: 'chick', count: 1334},
{word: 'chicks', count: 742},
{word: 'chickens', count: 520},
{word: 'chickenshit', count: 85}
]
API
subtlexWordFrequencies
Array.<Entry>: list of all entries in SUBTLEXus.
Each entry has the following properties:
word (string): unique word (example: git)
count (number): number of times the word appears in the corpus (example: 101)
word starts with a capital letter when it occurs more often with an uppercase first letter than with a lowercase one in the corpus (example: I).
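Because the list is a plain array, looking up a single word means scanning it. A minimal sketch of a faster lookup follows, assuming entries have the {word, count} shape shown in the sample output above. The names countByWord and countOf are hypothetical and not part of the package, and the sample array stands in for the full list.

```javascript
// Sample entries copied from the output above; in real use, load the
// full list with require('subtlex-word-frequencies') instead.
const words = [
  {word: 'you', count: 2134713},
  {word: 'I', count: 2038529},
  {word: 'the', count: 1501908}
]

// Index the list once so each lookup is a constant-time Map access.
const countByWord = new Map(words.map(d => [d.word, d.count]))

// Hypothetical helper: returns the count for a word, or 0 if absent.
// Lookups are case-sensitive, so 'I' and 'i' are different keys.
function countOf(word) {
  return countByWord.get(word) || 0
}

console.log(countOf('the')) // 1501908
console.log(countOf('i')) // 0
```

The lookup is deliberately case-sensitive: since some words are stored capitalized, lowercasing the query would miss entries such as I.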
The entire original corpus consists of 51 million words.
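Raw counts can be turned into a rough frequency per million words by dividing by the corpus size. The sketch below assumes the rounded figure of 51 million words stated above, so the results are approximate. The helper perMillion is hypothetical and not part of the package.

```javascript
// Assumed total size of the corpus: the "51 million words" figure above,
// which is rounded, so the results are approximate.
const corpusSize = 51e6

// Hypothetical helper: converts a raw count into occurrences per
// million words of the corpus.
function perMillion(count) {
  return (count / corpusSize) * 1e6
}

// Counts copied from the sample output above.
console.log(perMillion(1501908).toFixed(0)) // 'the'
console.log(perMillion(742).toFixed(2)) // 'chicks'
```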