fuzzy-neon
v0.1.5
Published
Collection of fuzzy string matching algorithms written in rust
Downloads
8
Maintainers
Readme
fuzzy-neon
fuzzy-neon: Collection of fuzzy string matching algorithms written in rust
Created using neon.
Installation
Requires Node
and Cargo
to install.
Node
In order to install fuzzy-neon you may need to first install cargo-cp-artifact
:
$ npm i cargo-cp-artifcat
Installing the Module
$ npm i fuzzy-neon
Usage
const fuzzy = require('fuzzy-neon');
let score = fuzzy.hamming('nick', 'nice');
console.log(score);
// 1
Available String Matching Metrics
- Hamming distance
hamming(str1, str2)
- Levenshtein (recursive impl)
levenshtein(str1, str2)
- Wagner-Fischer (dynamic programming impl of levenshtein)
wagnerFischer(str1, str2)
- Damerau-Levenshtein
damerauLevenshtein(str1, str2)
- Jaro-Winkler Distance
jaroWinkler(str1, str2)
- Longest Common Subsequence
lcs(str1, str2)
- n-gram based distance metric
ngram(str1, str2, ngramSize)
- Jensen-Shannon Vector Distance
jensonshannonVector(str1, str2)
Algorithm Explanation/Useful Links
Hamming
Computes number of positions between two strings where characters differ. Expanded to allow strings of different lengths Example
const fuzzy = require('fuzzy-neon');
let score = fuzzy.hamming('nick', 'nice');
console.log(score);
// 1
Levenshtein
The Levenshtein distance between two strings is the minimum number of single-character edits (insertions, deletions or substitutions) to convert one word to the other. For efficient implementation use Wagner-Fischer Example
const fuzzy = require('fuzzy-neon');
let score = fuzzy.levenshtein('nick', 'nit');
console.log(score);
// 2
Wagner-Fischer
Implementation of Levenshtein using a faster, dynamic programming implementation - interesting to compare performance between the two. Example
const fuzzy = require('fuzzy-neon');
let score = fuzzy.wagnerFischer('nick', 'nit');
console.log(score);
// 2
Damerau-Levenshtein
Extension of the Levenshtein distance metric which allows for adjacent character transpositions Example
const fuzzy = require('fuzzy-neon');
let score = fuzzy.damerauLevenshtein('taco', 'atco');
console.log(score);
// 1
Jaro Winkler Distance
Extension of the Jaro distance between two strings; the Jaro distance uses the relative probability of each character in a string to calculate an edit score between two strings (see here for formula details). Winkler extended this to boost the scores of words which shared prefixes of some length. Returns a normalised score between 0 and 1. Example
const fuzzy = require('fuzzy-neon');
let score = fuzzy.jaroWinkler('nice', 'nick');
console.log(score);
// 0.11549999999999994
N-gram Based distance metric
n-gram based string distance metric implemented based on the work from this paper. Extremely good at integrating context into producing the metric score, O(n^2) complexity. Example
const fuzzy = require('fuzzy-neon');
let score = fuzzy.ngram('rupert', 'robert', 2); // last arg is size of ngram
console.log(score);
// 0.16666666666666666
Jensen Shannon Distance
Computes the relatively probability of events in the string (events being either single characters or bi-grams) porducing a probabilty vector. Then computes the Jensen Shannon distance metric over the two probability vectors. Produced from the work by Richard Connor et al in this paper. O(n) complexity. Example
const fuzzy = require('fuzzy-neon');
let score = fuzzy.jensonshannonVector('hi their', 'hi there');
console.log(score);
// 0.36638698518165513