@neopass/wordlist

v0.5.2

Generate a word list from various sources, including SCOWL

wordlist

Generate a word list from various sources, including system dictionaries and SCOWL.

Includes a default list of ~86,000 English words.

Additional dictionary/wordlist paths can be configured via the options. System dictionaries exist at locations such as /usr/share/dict/words, /usr/share/dict/british-english, etc.

Installation

npm install @neopass/wordlist

Usage

There are three functions available for creating word lists: wordList, wordListSync, and listBuilder. The default list is used when no options are given, so no configuration is required.

wordList builds and returns the list asynchronously:

const { wordList } = require('@neopass/wordlist')

wordList().then(list => console.log(list.length)) // 86748

wordListSync builds and returns the list synchronously:

const { wordListSync } = require('@neopass/wordlist')

const list = wordListSync()
console.log(list.length) // 86748

listBuilder calls back each word asynchronously:

const { listBuilder } = require('@neopass/wordlist')

const builder = listBuilder()
const list = []

builder(word => list.push(word))
  .then(() => console.log(list.length)) // 86748

Options

export interface IListOptions {
  /**
   * Word list paths to search for in order. Only the first
   * one found is used. This option is ignored if 'combine'
   * is a non-empty array.
   *
   * default: [
   *  '$default',
   * ]
   */
  paths?: string[]
  /**
   * Word list paths to combine. All found files are used.
   */
  combine?: string[]
  /**
   * Mutate the list by filtering on lower-case words, converting to
   * lower case, or applying a custom mutator function.
   */
  mutator?: 'only-lower'|'to-lower'|Mutator
}

paths: Allows alternate, fallback lists to be used.

combine: Allows multiple lists to be combined into one.

mutator: mutates the list depending on the value provided.

  • only-lower: Filter out words that are not strictly comprised of characters [a-z].
  • to-lower: Convert words to lower case.
  • Mutator: (word: string) => string|string[]|void: a custom function that receives a word and returns one or more words, or undefined. Used for custom transformation/exclusion of words in the list.

Return values:

  • string: the returned string is added to the list.
  • string[]: all returned strings are added to the list.
  • For any other return value the word is not added.

const assert = require('assert')
const { wordList } = require('@neopass/wordlist')

/**
 * Create a custom mutator for splitting hyphenated words
 * and converting them to lower case.
 */
function customMutator(word) {
  // Will return ['west', 'ender'] for an input of 'West-ender'.
  return word.split('-').map(part => part.toLowerCase())
}

const options = {
  paths: ['/some/list/path/words.txt'],
  mutator: customMutator,
}

// Assumes the list at the above path contains 'West-ender'.
wordList(options).then((list) => {
  assert(list.includes('west'))
  assert(list.includes('ender'))
})
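
The return-value rules above can be exercised without loading a list. The sketch below pairs a mutator that drops short words with a small applyMutator helper. The helper is a hypothetical stand-in that mimics the documented rules (string, string[], anything else); it is not the package's implementation. The three-character minimum is an arbitrary choice for illustration.

```javascript
/**
 * Mutator that lower-cases words and drops anything shorter
 * than three characters by returning undefined.
 */
function minLengthMutator(word) {
  if (word.length < 3) {
    return undefined // any non-string, non-array value excludes the word
  }
  return word.toLowerCase()
}

/**
 * Hypothetical stand-in for how a mutator's return value is
 * interpreted, following the rules documented above.
 */
function applyMutator(words, mutator) {
  const list = []
  for (const word of words) {
    const result = mutator(word)
    if (typeof result === 'string') {
      list.push(result)
    } else if (Array.isArray(result)) {
      list.push(...result)
    }
    // Any other return value: the word is not added.
  }
  return list
}

console.log(applyMutator(['An', 'Apple', 'of', 'Zanzibar'], minLengthMutator))
// [ 'apple', 'zanzibar' ]
```

With the package installed, the same function would be passed as the mutator option, for example wordList({ mutator: minLengthMutator }).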

Specify Alternate Word Lists

The paths specified in options are searched in order and the first list found is used. This allows for the use of system word lists with different names and/or locations on various platforms. A common location for the system word list is /usr/share/dict/words.

const { wordList } = require('@neopass/wordlist')

// Prefer british-english list.
const options = {
  paths: [
    '/usr/share/dict/british-english',  // if found, use this one
    '/usr/share/dict/american-english', // else if found, use this one
    '/usr/share/dict/words',            // else if found, use this one
    '$default',  // else use this one
  ]
}

wordList(options)
  .then(list => console.log(list.length)) // 101825
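
The search order above can be pictured as a "first match wins" scan. The following is a minimal sketch of that idea, not the package's code: firstAvailable is a hypothetical helper, and it assumes that entries starting with $ are aliases that always resolve.

```javascript
const fs = require('fs')

/**
 * Return the first entry that is either an alias (assumed to always
 * resolve) or an existing path. The `exists` parameter is injectable
 * so the scan can be tested without touching the disk.
 */
function firstAvailable(paths, exists = fs.existsSync) {
  for (const candidate of paths) {
    if (candidate.startsWith('$') || exists(candidate)) {
      return candidate
    }
  }
  return undefined
}

const candidates = [
  '/usr/share/dict/british-english',
  '/usr/share/dict/american-english',
  '/usr/share/dict/words',
  '$default',
]

// Prints whichever system list exists on this machine, else '$default'.
console.log(firstAvailable(candidates))
```

Running this on a target platform is a quick way to see which list a given paths array would select there.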

Combine Lists

Lists can be combined into one with the combine option:

const { wordList } = require('@neopass/wordlist')

// Combine multiple dictionaries.
const options = {
  combine: [
    // System dictionary.
    '/usr/share/dict/words', // use this one
    '$default',              // and use this one
  ]
}

wordList(options)
  .then(list => console.log(list.length)) // 335427

Important: Using combine with wordList/wordListSync will result in duplicates if the lists overlap. It is recommended to use combine with listBuilder to control how words are added. For example, a Set can be used to eliminate duplicates from combined lists:

const { listBuilder } = require('@neopass/wordlist')

// Combine multiple lists.
const options = {
  combine: [
    // System dictionary.
    '/usr/share/dict/words',
    // Default list.
    '$default',
  ]
}

// Create a list builder.
const builder = listBuilder(options)

// Create a set to avoid duplicate words.
const set = new Set()

// Run the builder.
builder(word => set.add(word))
  .then(() => console.log(set.size)) // 299569
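
The Set-based approach above can be wrapped in a reusable helper. This sketch assumes only what the examples show: that a builder is a function that takes a per-word callback and returns a promise. collectUnique and fakeBuilder are hypothetical names; the fake stands in for listBuilder(options) so the snippet runs without any word lists installed.

```javascript
/**
 * Run a builder and collect each distinct word once,
 * in order of first appearance.
 */
async function collectUnique(builder) {
  const set = new Set()
  await builder(word => set.add(word))
  return [...set]
}

// Stand-in for listBuilder(options): emits words from two
// overlapping "lists" and resolves when done.
function fakeBuilder(onWord) {
  const combined = ['apple', 'pear', 'pear', 'plum', 'apple']
  combined.forEach(word => onWord(word))
  return Promise.resolve()
}

collectUnique(fakeBuilder)
  .then(words => console.log(words)) // [ 'apple', 'pear', 'plum' ]
```

In real use, collectUnique(listBuilder(options)) would replace the fake builder.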

The Default List

The default list is a ~86,000-word, PG-13, lower-case list taken from English SCOWL sources, with some other additions including slang.

Suggestions for additions to the default list are welcome by submitting an issue. Whole lists are definitely preferred to single-word suggestions, e.g., "notable extraterrestrials in history", "insects of upper polish honduras", or "names of horses in modern literature". Suggestions for inappropriate word removal are also welcome (curse words, coarse words/slang, racial slurs, etc.).

By default the list alias, $default, is included in the options. This allows wordlist to create a largish list without any additional configuration.

export const defaultOptions: IListOptions = {
  paths: [
    '$default'
  ]
}

const { wordListSync } = require('@neopass/wordlist')

/**
 * We don't need to specify a config because the `$default` alias
 * is part of the default configuration.
 */
const list = wordListSync()

The $default alias (along with other aliases) resolves to a path at run time.

Generate a List From Scowl Sources

SCOWL word lists are included as aliases, and can be used to generate custom lists:

const { listBuilder } = require('@neopass/wordlist')

// Combine multiple lists from scowl.
const options = {
  combine: [
    '$english-words.10',
    '$english-words.20',
    '$english-words.35',
    '$special-hacker.50',
  ]
}

// Create a list builder.
const builder = listBuilder(options)

// We'll add the words to a set.
const set = new Set()

// Run the builder.
builder(word => set.add(word))
  .then(() => console.log(set.size)) // 49130

Warning: Some SCOWL sources contain words not appropriate for all audiences, including swear words, racial slurs, and words of a sexual nature. You'll most likely want to scrutinize these sources depending on your use case and intended audience.

SCOWL is primarily intended as a source for spell checkers. From the SCOWL website:

SCOWL (Spell Checker Oriented Word Lists) and Friends is a database of information on English words useful for creating high-quality word lists suitable for use in spell checkers of most dialects of English. The database primary contains information on how common a word is, differences in spelling between the dialects if English, spelling variant information, and (basic) part-of-speech and inflection information.

Note: SCOWL sources contain some words with a trailing apostrophe-s ('s), as well as Unicode characters. Handle these according to your needs. For example, we can transform words to remove any trailing 's and then accept only words that consist of the letters a-z:

const { listBuilder } = require('@neopass/wordlist')

/**
 * Remove trailing `'s` from words.
 */
function transform(word) {
  if (word.endsWith(`'s`)) {
    return word.slice(0, -2)
  }
  return word
}

/**
 * Determine if a word should be added.
 */
function accept(word) {
  // Only accept words with characters a-z (case insensitive).
  return (/^[a-z]+$/i).test(word)
}

// Combine multiple lists from scowl.
const options = {
  combine: [
    '$english-words.10',
    '$english-words.20',
    '$english-words.35',
    '$special-hacker.50',
  ]
}

// Create a list builder.
const builder = listBuilder(options)

// Create a set to avoid duplicate words.
const set = new Set()

// Run the builder.
const _builder = builder((word) => {
  word = transform(word)

  if (accept(word)) {
    set.add(word)
  }
})

_builder.then(() => console.log(set.size)) // 38714
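
For words containing accented characters, one option (an assumption here, not something the package prescribes) is to fold them to plain ASCII before applying the a-z test, rather than discarding them. The sketch uses the standard String.prototype.normalize method; foldAccents and clean are hypothetical helper names.

```javascript
/**
 * Decompose accented letters (NFD) and strip the combining marks,
 * so that 'café' becomes 'cafe'.
 */
function foldAccents(word) {
  return word.normalize('NFD').replace(/[\u0300-\u036f]/g, '')
}

/**
 * Mutator-style function: strip a trailing 's, fold accents, and
 * keep the word only if it then consists of the letters a-z.
 */
function clean(word) {
  const base = word.endsWith(`'s`) ? word.slice(0, -2) : word
  const folded = foldAccents(base)
  return (/^[a-z]+$/i).test(folded) ? folded : undefined
}

console.log(clean(`caf\u00e9's`)) // cafe
console.log(clean('na\u00efve'))  // naive
console.log(clean(`don't`))       // undefined
```

Letters that have no decomposed form, such as ø or ß, are not folded by this approach, so words containing them are still rejected. Because clean returns a string or undefined, it has the shape of a custom mutator.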

Scowl Aliases

A path alias is defined for every SCOWL source list. SCOWL aliases consist of the $ character followed by the source file name. The following is a representative sample of the available source aliases.

$american-abbreviations.70
$american-abbreviations.95
$american-proper-names.80
$american-proper-names.95
$american-upper.50
$american-upper.80
$american-upper.95
$american-words.35
$american-words.80
$australian-abbreviations.35
$australian-abbreviations.80
$australian-contractions.35
$australian-proper-names.35
$australian-proper-names.80
$australian-proper-names.95
$australian-upper.60
$australian-upper.95
$australian-words.35
$australian-words.80
$australian_variant_1-abbreviations.95
$australian_variant_1-contractions.60
$australian_variant_1-proper-names.80
$australian_variant_1-proper-names.95
$australian_variant_1-upper.80
$australian_variant_1-upper.95
$australian_variant_1-words.80
$australian_variant_1-words.95
$australian_variant_2-abbreviations.80
$australian_variant_2-abbreviations.95
$australian_variant_2-contractions.50
$australian_variant_2-contractions.70
$australian_variant_2-proper-names.95
$australian_variant_2-upper.80
$australian_variant_2-words.55
$australian_variant_2-words.95
$british-abbreviations.35
$british-abbreviations.80
$british-proper-names.80
$british-proper-names.95
$british-upper.50
$british-upper.95
$british-words.10
$british-words.20
$british-words.35
$british-words.95
$british_variant_1-abbreviations.55
$british_variant_1-contractions.35
$british_variant_1-contractions.60
$british_variant_1-upper.95
$british_variant_1-words.10
$british_variant_1-words.95
$british_variant_2-abbreviations.70
$british_variant_2-contractions.50
$british_variant_2-upper.35
$british_variant_2-upper.95
$british_variant_2-words.80
$british_variant_2-words.95
$british_z-abbreviations.80
$british_z-abbreviations.95
$british_z-proper-names.80
$british_z-proper-names.95
$british_z-upper.50
$british_z-upper.95
$british_z-words.10
$british_z-words.95
$canadian-abbreviations.55
$canadian-proper-names.80
$canadian-proper-names.95
$canadian-upper.50
$canadian-upper.95
$canadian-words.10
$canadian-words.95
$canadian_variant_1-abbreviations.55
$canadian_variant_1-contractions.35
$canadian_variant_1-proper-names.95
$canadian_variant_1-upper.35
$canadian_variant_1-upper.80
$canadian_variant_1-words.35
$canadian_variant_1-words.95
$canadian_variant_2-abbreviations.70
$canadian_variant_2-contractions.50
$canadian_variant_2-upper.35
$canadian_variant_2-upper.80
$canadian_variant_2-words.35
$canadian_variant_2-words.80
$english-abbreviations.20
$english-abbreviations.80
$english-contractions.35
$english-contractions.80
$english-contractions.95
$english-proper-names.35
$english-proper-names.80
$english-upper.35
$english-upper.80
$english-words.80
$english-words.95
$special-hacker.50
$special-roman-numerals.35
$variant_1-abbreviations.55
$variant_1-abbreviations.95
$variant_1-contractions.35
$variant_1-proper-names.80
$variant_1-proper-names.95
$variant_1-upper.35
$variant_1-upper.80
$variant_1-words.20
$variant_1-words.80
$variant_2-abbreviations.70
$variant_2-abbreviations.95
$variant_2-contractions.50
$variant_2-contractions.70
$variant_2-upper.35
$variant_2-upper.95
$variant_2-words.35
$variant_2-words.95
$variant_3-abbreviations.40
$variant_3-abbreviations.95
$variant_3-words.35
$variant_3-words.95

See the SCOWL Readme for a description of SCOWL sources.
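
Because each alias is just $ followed by a source file name, a combine array can be generated instead of typed out. A minimal sketch follows; scowlAliases is a hypothetical helper, and it does not check that a given category and size pair actually exists among the shipped sources.

```javascript
/**
 * Build SCOWL aliases of the form '$<category>.<size>'.
 */
function scowlAliases(category, sizes) {
  return sizes.map(size => `$${category}.${size}`)
}

// Same four aliases as in the combine examples above.
const options = {
  combine: [
    ...scowlAliases('english-words', [10, 20, 35]),
    ...scowlAliases('special-hacker', [50]),
  ],
}

console.log(options.combine)
```

The resulting options object can be passed to listBuilder in the same way as a hand-written one.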

Create a Custom Word List File

A custom word list file from miscellaneous sources can be assembled with the wordlist-gen binary, or the word-gen utility in the wordlist repo.

From the @neopass/wordlist package:

npx wordlist-gen --sources <path1 path2 ...> [options]

From the wordlist repo:

git clone [email protected]:neopass/wordlist.git
cd wordlist
node bin/word-gen --sources <path1 path2 ...> [options]

First, set up a directory of book and/or word list files, for example:

root
  +-- data
    +-- books
    | -- modern steam engine design.txt
    | -- how to skin a rabbit.txt
    +-- lists
    | -- names.txt
    | -- animals.txt
    | -- slang.txt
    +-- scowl
    | -- english-words.10
    | -- english-words.20
    | -- english-words.35
    | -- special-hacker.50
    +-- exclusions
    | -- patterns.txt

The directory structure doesn't matter. Files should be UTF-8 text and can contain one or more words per line. The exclusions directory is optional.

npx wordlist-gen --sources data/books data/lists data/scowl --out my-words.txt

sources can specify multiple files and/or directories.

Note: only words consisting of letters a-z are added, and they're all lower-cased.
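
The note above implies a normalization step for each source line. The sketch below shows one plausible reading of that rule; it is not the generator's actual code. In particular, splitting on whitespace and trimming surrounding punctuation are assumptions, and extractWords is a hypothetical name.

```javascript
/**
 * Split a line of text into candidate words, lower-case them, and
 * keep only those made up entirely of the letters a-z.
 */
function extractWords(line) {
  return line
    .split(/\s+/)
    // Assumption: trim punctuation around a token, e.g. 'ed.' -> 'ed'.
    .map(token => token.replace(/^[^a-z0-9]+|[^a-z0-9]+$/gi, ''))
    .map(token => token.toLowerCase())
    // Tokens with digits, hyphens, or apostrophes are dropped.
    .filter(token => /^[a-z]+$/.test(token))
}

console.log(extractWords('The Modern steam-engine, 2nd ed.'))
// [ 'the', 'modern', 'ed' ]
```

Under this reading, hyphenated and numeric tokens are discarded rather than split; the real tool may treat them differently.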

Exclusions

Words can be scrubbed by specifying exclusions:

node bin/word-gen <...> --exclude data/exclusions

Much like the sources, exclusions can consist of multiple files and/or directories in the following format:

# Exclude whole words (case insensitive):
spoon
fork
Tongs

# Exclude patterns (as regular expressions):
/^fudge/i   # words starting with 'fudge'
/crikey/i   # words containing 'crikey'
/shazam$/   # words ending in lowercase 'shazam'
/^BLASTED$/ # exact match for uppercase 'blasted'
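
To make the format concrete, here is a rough sketch of how such a file could be interpreted. It is not the tool's own parser: parseExclusions and isExcluded are hypothetical names, and the comment handling (a # at the start of a line or after whitespace) is an assumption based on the sample above.

```javascript
/**
 * Parse exclusion text into whole words and regular expressions.
 */
function parseExclusions(text) {
  const words = new Set()
  const patterns = []
  for (const raw of text.split(/\r?\n/)) {
    // Drop full-line and trailing comments, then surrounding whitespace.
    const line = raw.replace(/(^|\s)#.*$/, '').trim()
    if (line === '') continue
    const match = /^\/(.+)\/([a-z]*)$/.exec(line)
    if (match) {
      patterns.push(new RegExp(match[1], match[2]))
    } else {
      words.add(line.toLowerCase()) // whole words are case insensitive
    }
  }
  return { words, patterns }
}

/**
 * True if the word matches a whole-word entry or any pattern.
 */
function isExcluded(word, { words, patterns }) {
  return words.has(word.toLowerCase()) || patterns.some(p => p.test(word))
}

const sample = [
  '# Exclude whole words (case insensitive):',
  'spoon',
  'Tongs',
  '/^fudge/i   # words starting with fudge',
  '/shazam$/   # words ending in lowercase shazam',
  '/^BLASTED$/ # exact match for uppercase blasted',
].join('\n')

const rules = parseExclusions(sample)
console.log(isExcluded('Spoon', rules))      // true
console.log(isExcluded('fudgesicle', rules)) // true
console.log(isExcluded('KASHAZAM', rules))   // false
```

Whether the real tool applies exclusions before or after lower-casing words is not stated here, so case-sensitive patterns such as /^BLASTED$/ should be checked against actual output.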

Using the Custom List

Use path.resolve or path.join to create an absolute path to your custom word list file:

const path = require('path')
const { wordList } = require('@neopass/wordlist')

const options = {
  paths: [
    // Use a path relative to the location of this module.
    path.resolve(__dirname, '../my-words.txt')
  ]
}

wordList(options)
  .then(list => console.log(list.length)) // 124030

SCOWL License

Copyright 2000-2016 by Kevin Atkinson

Permission to use, copy, modify, distribute and sell these word
lists, the associated scripts, the output created from the scripts,
and its documentation for any purpose is hereby granted without fee,
provided that the above copyright notice appears in all copies and
that both that copyright notice and this permission notice appear in
supporting documentation. Kevin Atkinson makes no representations
about the suitability of this array for any purpose. It is provided
"as is" without express or implied warranty.

Full License | SCOWL