fuzzy-predicate-leven
v1.1.0
Filter an array of objects (or an array of just about anything) down to only those objects with properties somewhat matching the provided query, optionally using a Levenshtein threshold.
var fuzzy = require("fuzzy-predicate-leven");
var data = [
  {
    name: "Dan Smith"
  },
  {
    name: "Issac Long"
  }
];
var result = data.filter(fuzzy("dan"));
console.log(result);
// [{
//   name: "Dan Smith"
// }]
result = data.filter(fuzzy("dun", 0.2));
console.log(result);
// [{
//   name: "Dan Smith"
// }]
Installation
npm install fuzzy-predicate-leven --save
Usage
1. Import the library
var fuzzy = require("fuzzy-predicate-leven");
2. Generate a predicate
// where "apple" is the data you're looking for
var predicate = fuzzy("apple");
3. Use the predicate to filter an array
var result = myArray.filter(predicate);
result now contains only the elements in myArray that somewhat match the query ("apple").
myArray could have been an array of strings, an array of numbers, an array of objects, or even an array of arrays. fuzzy-predicate-leven will recursively search through all the data trying to find something that matches the original query.
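The recursive search described above can be pictured with a small sketch. This is not the package's source: containsMatch and isApple are hypothetical names introduced here, and the sketch assumes that arrays and plain objects are walked depth-first while every other value is handed to a leaf test.

```javascript
// Hypothetical sketch, not the package's code: walk arrays and objects
// depth-first and report whether any leaf value passes `test`.
function containsMatch(value, test) {
  if (Array.isArray(value)) {
    return value.some(function (item) {
      return containsMatch(item, test);
    });
  }
  if (value !== null && typeof value === "object") {
    return Object.keys(value).some(function (key) {
      return containsMatch(value[key], test);
    });
  }
  return test(value); // a leaf: string, number, etc.
}

// Assumed leaf test for illustration: case-insensitive substring check.
function isApple(leaf) {
  return typeof leaf === "string" && leaf.toLowerCase().indexOf("apple") !== -1;
}

var nestedRows = ["Apple pie", ["pear", ["pineapple"]], { fruit: { name: "plum" } }, 7];
var appleRows = nestedRows.filter(function (row) {
  return containsMatch(row, isApple);
});
console.log(appleRows.length); // 2
```

Only the first two rows survive: the plain string matches directly, and the nested array matches through its innermost element.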
Example
fuzzy-predicate-leven is an ideal tool for using user input to filter a response from a Web service. Let's say you have an array of objects that each represent a user and you want to find the user(s) named "John":
var fuzzy = require("fuzzy-predicate-leven");
var data = [
  {
    id: "7128792",
    name: "John Doe",
    mail: "[email protected]",
    twitter: "john_doe"
  },
  {
    id: "1203922",
    name: "Jane Doe",
    mail: "[email protected]",
    twitter: "grannysmithapple"
  },
  {
    id: "9189701",
    name: "Dan Smith",
    mail: "[email protected]",
    twitter: "javascripz",
  }
];
var result = data.filter(fuzzy("john"));
In this scenario, result would be an array containing a single element:
{
  id: "7128792",
  name: "John Doe",
  mail: "[email protected]",
  twitter: "john_doe"
}
But what if the query was "smith"?
var result = data.filter(fuzzy("smith"));
result would contain two elements:
{
  id: "1203922",
  name: "Jane Doe",
  mail: "[email protected]",
  twitter: "grannysmithapple"
},
{
  id: "9189701",
  name: "Dan Smith",
  mail: "[email protected]",
  twitter: "javascripz",
}
When searching for "smith", fuzzy-predicate-leven found a match in the Twitter handle for Jane, and in the name (and email) properties for Dan.
Perhaps we only wanted to find people with a name matching "Smith":
var result = data.filter(fuzzy("smith", ["name"]));
This time, result would contain only one element:
{
  id: "9189701",
  name: "Dan Smith",
  mail: "[email protected]",
  twitter: "javascripz",
}
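To make the effect of restricting keys concrete, here is a minimal sketch. It is not the package's implementation: matchOnKeys and hasSmith are hypothetical names, the data is a trimmed copy of the users above, and the sketch assumes the restriction applies only to an object's top-level keys.

```javascript
// Hypothetical sketch, not the package's code: test only the listed
// top-level keys of an object, or every key when `keys` is omitted.
function matchOnKeys(item, keys, test) {
  var allowed = typeof keys === "string" ? [keys] : keys;
  return Object.keys(item).some(function (key) {
    if (allowed && allowed.indexOf(key) === -1) {
      return false; // key is outside the requested set
    }
    return test(item[key]);
  });
}

// Assumed value test for illustration: case-insensitive substring check.
function hasSmith(value) {
  return String(value).toLowerCase().indexOf("smith") !== -1;
}

var people = [
  { name: "Jane Doe", twitter: "grannysmithapple" },
  { name: "Dan Smith", twitter: "javascripz" }
];

var anyKey = people.filter(function (p) {
  return matchOnKeys(p, undefined, hasSmith);
});
var nameOnly = people.filter(function (p) {
  return matchOnKeys(p, ["name"], hasSmith);
});
console.log(anyKey.length, nameOnly.length); // 2 1
```

Without a key list both users match; with ["name"] Jane drops out because her only match was in the twitter property.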
Documentation
fuzzy(query, keys, leven)
Returns a filter predicate (function) suitable for passing to Array.prototype.filter.
- query: The filter query to use to reduce an array down to objects matching the query. This can be a string or a number.
- keys: Optionally restrict the search to a set of keys; only applied when filtering objects. This can be a string containing the name of a single key, or an array of keys.
- threshold (the third argument, shown as leven in the signature above): The Dice's Coefficient threshold used to consider matches. This means that differences such as "In my camp Dan has a fire" and "Dun has fire" can be tolerated. It's a fraction between 0 and 1 that sets the minimum degree of similarity required between the needle and haystack: 0 indicates completely different strings, 1 indicates identical strings.
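For readers unfamiliar with Dice's Coefficient, the following is a sketch of the textbook bigram form of the measure: twice the number of shared adjacent-character pairs, divided by the total number of pairs in both strings. The names bigrams and diceCoefficient are hypothetical, and the package's actual scoring may well differ (for example in how it tokenizes or which substrings it compares), so scores from this sketch should not be expected to reproduce the outputs shown earlier.

```javascript
// Hypothetical sketch of a textbook bigram Dice coefficient.
// The package may compute its score differently.
function bigrams(text) {
  var pairs = [];
  for (var i = 0; i < text.length - 1; i++) {
    pairs.push(text.slice(i, i + 2));
  }
  return pairs;
}

function diceCoefficient(a, b) {
  if (a === b) return 1;
  if (a.length < 2 || b.length < 2) return 0;
  var left = bigrams(a);
  var right = bigrams(b);
  // Count each bigram of `a` so repeated pairs are only matched once each.
  var remaining = Object.create(null);
  left.forEach(function (pair) {
    remaining[pair] = (remaining[pair] || 0) + 1;
  });
  var shared = 0;
  right.forEach(function (pair) {
    if (remaining[pair] > 0) {
      remaining[pair] -= 1;
      shared += 1;
    }
  });
  return (2 * shared) / (left.length + right.length);
}

console.log(diceCoefficient("night", "nacht")); // 0.25
```

"night" and "nacht" share one bigram ("ht") out of four each, giving 2 × 1 / 8 = 0.25, so under this sketch the pair would pass a threshold of 0.2 but fail one of 0.3.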
Normalization
What makes this a "fuzzy" filter is that it looks for values that somewhat match the query rather than exact matches.
When comparing strings, the needle (the query) and the haystack value are both normalized following these rules:
- Convert the string to a lowercase string
- Remove all non-word characters (characters matching the \W regex, plus underscores)
Then, instead of checking string equality, it checks to see if the haystack value contains the needle value (using indexOf). If it does, it's considered a match. If a threshold is supplied, a Dice's Coefficient algorithm is used to measure the degree of similarity between the needle and the haystack; if the score is greater than or equal to the threshold, it's a match.
This process only applies when comparing strings; numbers must be exactly equal to be considered a match.
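The rules above can be sketched in a few lines. This is an illustration under stated assumptions rather than the package's code: normalize and looseMatch are hypothetical names, and converting a numeric query to a string before comparing it against a string haystack is a guess at behavior the text does not spell out.

```javascript
// Hypothetical helper: lowercase, then strip \W characters and underscores.
function normalize(value) {
  return String(value).toLowerCase().replace(/[\W_]/g, "");
}

// Hypothetical helper: strings match by containment after normalization,
// numbers only by strict equality.
function looseMatch(needle, haystack) {
  if (typeof haystack === "number") {
    return haystack === needle;
  }
  if (typeof haystack === "string") {
    // Assumption: a numeric needle is stringified before comparison.
    return normalize(haystack).indexOf(normalize(needle)) !== -1;
  }
  return false;
}

console.log(normalize("john_doe")); // johndoe
console.log(looseMatch("smith", "grannysmithapple")); // true
console.log(looseMatch("dun", "Dan Smith")); // false
console.log(looseMatch(42, 43)); // false
```

Because underscores and spaces are stripped, "john_doe" and "John Doe" normalize to the same string, which is why a query can match across word boundaries.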
License
See LICENSE.
Contributing
I welcome pull requests containing bug fixes and documentation improvements for fuzzy-predicate-leven. Be sure to run the tests before submitting any changes.
And although I consider fuzzy-predicate-leven to be mostly feature complete, I welcome discussion on how it could be a more useful tool (e.g. if callers could customize how normalization worked).
Contributors
Emmanuel Mahuni - Added Dice algorithm option for truly fuzzy matches
Attribution
fuzzy-predicate: the original package wasn't maintained and wasn't accepting pull requests, so it was republished as fuzzy-predicate-leven.