@vuldin/trie

v1.0.4

Published 2 years ago

rank text by best match of included words and phrases

vuldin

trie phrase word match stem search

@vuldin/trie

A trie implementation with a focus on matching phrases. The package is published as ES, CommonJS, and UMD modules. Add it to an existing project via npm:

npm i @vuldin/trie

Usage

To use this library, create an instance and then add any number of words or phrases with the add function. This function accepts a single word, one or more sentences, or whole paragraphs, and builds a trie that differs somewhat from most other implementations (see Data structure below).

Parsing text

Strings passed to this library (whether while building the data structure or during lookup) go through several parsing steps. First, the string is split into phrases or sentences. Then each sentence is stripped of common words. The remaining, uncommon words are converted to their stem form, so that later comparisons ignore differences in tense or plurality. The lowercase versions of these stems are then used to build the data structure (described below). For example, the following text:

Here is a test sentence that contains some common words in English.

becomes:

['here', 'is', 'test', 'sentenc', 'contain', 'common', 'word', 'english']
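The parsing steps above can be sketched in JavaScript as follows. This is a minimal illustration, not the package's code: the parse and stem functions and the STOP_WORDS set are hypothetical, and stem is a deliberately tiny suffix-stripping stand-in for a real stemming algorithm (which stemmer the package uses is not stated here). Text is lowercased before stop-word filtering so the STOP_WORDS lookup is case-insensitive; with this particular word list the sketch happens to reproduce the example output above.

```javascript
// Hypothetical stop-word list for illustration; the package's real list is not documented here.
const STOP_WORDS = new Set(['a', 'an', 'the', 'that', 'some', 'in', 'of', 'and', 'to'])

// Toy stemmer (NOT a real stemming algorithm such as Porter's):
// drops a trailing "s" (but not "ss") from words longer than 3 letters,
// then a trailing "e" from words longer than 4 letters.
function stem (word) {
  if (word.length > 3 && word.endsWith('s') && !word.endsWith('ss')) word = word.slice(0, -1)
  if (word.length > 4 && word.endsWith('e')) word = word.slice(0, -1)
  return word
}

// Returns one array of stems per sentence.
function parse (text) {
  return text
    .split(/[.!?]+/) // 1. break the text into sentences
    .map(sentence => (sentence.toLowerCase().match(/[a-z0-9]+/g) || []) // lowercase + tokenize
      .filter(word => !STOP_WORDS.has(word)) // 2. drop common words
      .map(stem)) // 3. reduce the remaining words to stems
    .filter(stems => stems.length > 0) // skip empty fragments, e.g. after a final "."
}

console.log(JSON.stringify(parse('Here is a test sentence that contains some common words in English.')))
// [["here","is","test","sentenc","contain","common","word","english"]]
```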

Data structure

This data structure is designed so that phrases contained in the text can be matched. Since we also want to match on any single (uncommon) word, each of these words is additionally added directly under the root node, with the words that follow it chained beneath. As a result, each word is added to the trie as many times as its 1-based position in the array: the first word once, the second twice, and so on. The tradeoff is a larger data structure in exchange for more flexible lookup (by any uncommon word or phrase).
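To make the size tradeoff concrete: for a sentence of n stems, the k-th stem is inserted k times, so one sentence costs 1 + 2 + ... + n = n(n + 1)/2 node insertions (36 for the 8-stem example sentence). The sketch below inserts every suffix of the stem array and keeps a per-node counter of how many suffixes pass through it; that counter equals the number of times the phrase ending at that node occurs. The PhraseTrie class and its insertStems and countPhrase methods are hypothetical names for illustration, not part of the package.

```javascript
// Hypothetical illustration of the structure described above, not the package's source.
class PhraseTrie {
  constructor () {
    this.root = { children: new Map(), count: 0 }
    this.insertions = 0 // total node visits made while inserting
  }

  // Insert every suffix of `stems`, so that a lookup may start at any word.
  insertStems (stems) {
    for (let start = 0; start < stems.length; start++) {
      let node = this.root
      for (let i = start; i < stems.length; i++) {
        const stem = stems[i]
        if (!node.children.has(stem)) {
          node.children.set(stem, { children: new Map(), count: 0 })
        }
        node = node.children.get(stem)
        node.count++ // one more inserted suffix passes through this node
        this.insertions++
      }
    }
    return this
  }

  // Number of times `stems` occurs as a contiguous run in the inserted arrays.
  countPhrase (stems) {
    if (stems.length === 0) return 0
    let node = this.root
    for (const stem of stems) {
      node = node.children.get(stem)
      if (!node) return 0
    }
    return node.count
  }
}

const stems = ['here', 'is', 'test', 'sentenc', 'contain', 'common', 'word', 'english']
const trie = new PhraseTrie().insertStems(stems)
console.log(trie.insertions) // 36, i.e. 8 * 9 / 2
console.log(trie.countPhrase(['test', 'sentenc'])) // 1
console.log(trie.countPhrase(['common'])) // 1
console.log(trie.countPhrase(['sentenc', 'test'])) // 0
```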

API

const Trie = require('@vuldin/trie')
const trie = new Trie()
trie.add('Here is a test sentence that contains some common words in English.')
trie.add('Strings can. Contain multiple sentences.')
// add returns the trie, so calls can be chained
trie.add('add function calls').add('can be chained')

// finding phrases
trie.find('test sentence') // { count: 2, exact: true }
// queries are parsed like added text, so common words are ignored
trie.find('test a sentence') // same result
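Because queries go through the same parsing as added text, 'test a sentence' reduces to the same words as 'test sentence'. The following self-contained sketch shows that effect, along with why returning the instance from add makes chaining possible. It is a toy under stated assumptions: MiniTrie, normalize and count are hypothetical names, the stop-word list is invented, stemming is omitted for brevity, and count returns a plain number rather than the { count, exact } object the package's find returns.

```javascript
// Hypothetical stop-word list; the package's real list is not documented here.
const STOP_WORDS = new Set(['a', 'an', 'the', 'that', 'some', 'in', 'of', 'and', 'to'])

// Lowercase, tokenize and drop stop words (stemming omitted for brevity).
const normalize = text =>
  (text.toLowerCase().match(/[a-z0-9]+/g) || []).filter(w => !STOP_WORDS.has(w))

class MiniTrie {
  constructor () {
    this.root = new Map() // word -> { count, next: Map }
  }

  add (text) {
    for (const sentence of text.split(/[.!?]+/)) {
      const words = normalize(sentence)
      // Start a path at every word so any word or phrase can be looked up.
      for (let start = 0; start < words.length; start++) {
        let level = this.root
        for (let i = start; i < words.length; i++) {
          if (!level.has(words[i])) level.set(words[i], { count: 0, next: new Map() })
          const node = level.get(words[i])
          node.count++
          level = node.next
        }
      }
    }
    return this // returning the instance is what allows .add(...).add(...)
  }

  // Queries are normalized exactly like added text before walking the trie.
  count (query) {
    let node = null
    let level = this.root
    for (const word of normalize(query)) {
      node = level.get(word)
      if (!node) return 0
      level = node.next
    }
    return node ? node.count : 0
  }
}

const t = new MiniTrie()
  .add('Here is a test sentence that contains some common words in English.')
  .add('Another test sentence.')
console.log(t.count('test sentence')) // 2
console.log(t.count('test a sentence')) // 2, because "a" is dropped
console.log(t.count('sentence test')) // 0
```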
