handlens
v1.0.0-b8
🔎 handlens
Search like you expect
handlens is a document full-text search engine with zero dependencies.
Installation
npm install handlens
Setup
import handlens from "handlens";
Pass a function to the handlens() function.
The function is called with the new index as the context (the value of this), and as the first parameter.
Note that if you use an arrow function expression, you must use the first parameter, because an arrow function does not rebind its this value.
Using the index provided to your function, set fields to index, documents to search, and an optional document reference (the property to use to uniquely identify the document).
var mySearchableIndex = handlens( ( index ) => {
index.fields = [
"body",
"title"
];
index.documents = [
{
"bookId": 1,
"title": "A Tale of Two Cities",
"source": "https://en.wikiquote.org/wiki/A_Tale_of_Two_Cities",
"body": "It was the best of times, it was the worst of times, it was the age of wisdom, it was the age of foolishness, it was the epoch of belief, it was the epoch of incredulity, it was the season of Light, it was the season of Darkness, it was the spring of hope, it was the winter of despair, we had everything before us, we had nothing before us, we were all going direct to Heaven, we were all going direct the other way – in short, the period was so far like the present period, that some of its noisiest authorities insisted on its being received, for good or for evil, in the superlative degree of comparison only."
},
{
"bookId": 2,
"title": "Of Mice And Men",
"source": "https://2paragraphs.com/2012/08/of-mice-and-men/",
"body": "A few miles south of Soledad, the Salinas River drops in close to the hillside bank and runs deep and green."
}
];
} );
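The calling convention described above can be tried out without handlens itself. The sketch below uses a hypothetical stand-in, createIndex, that invokes the setup callback the way this README describes (the index as both this and the first argument). It is an illustration under that assumption, not the library's implementation.

```javascript
// Hypothetical stand-in for handlens(): NOT the library's code. It only mimics
// the documented calling convention (index passed as `this` and as the first argument).
function createIndex( setup ) {
    var index = { fields: [], documents: [] };
    if ( typeof setup === "function" ) {
        setup.call( index, index );
    }
    return index;
}

// A regular function expression can configure the index through `this`.
var viaThis = createIndex( function () {
    this.fields = [ "title" ];
} );

// An arrow function keeps the `this` of the enclosing scope, so it must use the parameter.
var arrowSawIndexAsThis = null;
var viaParameter = createIndex( ( index ) => {
    arrowSawIndexAsThis = ( this === index );
    index.fields = [ "body", "title" ];
} );

console.log( viaThis.fields );        // [ 'title' ]
console.log( viaParameter.fields );   // [ 'body', 'title' ]
console.log( arrowSawIndexAsThis );   // false
```

Both styles end up with a configured index; only the regular function can rely on this.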
All of these values can also be set after the index has been created, but you will then need to tell the index to rebuild.
var mySearchableIndex = handlens();
mySearchableIndex.fields = [ "body", "title" ];
mySearchableIndex.documents = [
{
"bookId": 1,
"title": "A Tale of Two Cities",
"source": "https://en.wikiquote.org/wiki/A_Tale_of_Two_Cities",
"body": "It was the best of times, it was the worst of times, it was the age of wisdom, it was the age of foolishness, it was the epoch of belief, it was the epoch of incredulity, it was the season of Light, it was the season of Darkness, it was the spring of hope, it was the winter of despair, we had everything before us, we had nothing before us, we were all going direct to Heaven, we were all going direct the other way – in short, the period was so far like the present period, that some of its noisiest authorities insisted on its being received, for good or for evil, in the superlative degree of comparison only."
},
{
"bookId": 2,
"title": "Of Mice And Men",
"source": "https://2paragraphs.com/2012/08/of-mice-and-men/",
"body": "A few miles south of Soledad, the Salinas River drops in close to the hillside bank and runs deep and green."
}
];
mySearchableIndex.rebuild();
Searching
Once you have created an index, it can be searched at any time.
var mySearchableIndex = handlens( ( index ) => {
index.fields = [
"body",
"title"
];
index.documents = [
{
"bookId": 1,
"title": "A Tale of Two Cities",
"source": "https://en.wikiquote.org/wiki/A_Tale_of_Two_Cities",
"body": "It was the best of times, it was the worst of times, it was the age of wisdom, it was the age of foolishness, it was the epoch of belief, it was the epoch of incredulity, it was the season of Light, it was the season of Darkness, it was the spring of hope, it was the winter of despair, we had everything before us, we had nothing before us, we were all going direct to Heaven, we were all going direct the other way – in short, the period was so far like the present period, that some of its noisiest authorities insisted on its being received, for good or for evil, in the superlative degree of comparison only."
},
{
"bookId": 2,
"title": "Of Mice And Men",
"source": "https://2paragraphs.com/2012/08/of-mice-and-men/",
"body": "A few miles south of Soledad, the Salinas River drops in close to the hillside bank and runs deep and green."
}
];
} );
mySearchableIndex.search( "mice cities" );
// returns: [ { "ref": "1" }, { "ref": "2" } ]
You can search only specific fields.
mySearchableIndex.search( "body:mice title:cities" );
// returns: [ { "ref": "1" } ]
You can combine terms with Boolean OR (the default) or Boolean AND.
// implicit Boolean OR
mySearchableIndex.search( "body:mice title:cities" );
// returns: [ { "ref": "1" } ]
// explicit Boolean OR
mySearchableIndex.search( "body:mice OR title:cities" );
// returns: [ { "ref": "1" } ]
// Boolean AND
mySearchableIndex.search( "title:mice AND title:cities" );
// returns: [] <-- No documents contain "mice" AND "cities" in the title
.search is great for allowing user input, but it requires a lot of inefficient string parsing. If you are searching programmatically, you should use .query instead. .search runs the string input through a query builder and then immediately calls .query with the resulting queries.
A query is an array of objects in the format:
{
    "bool": boolean,
    "fields": {
        "*": tokens,
        fieldName: tokens
    }
}
boolean is one of [ "OR", "AND" ]. If "OR", all tokens will be compared individually. A document that matches one token but does not match another given token will still be considered a match. If "AND", every given token must be matched in a single document for it to be considered matching.
tokens must be an array of token strings. Note that by default tokens are processed by splitting on whitespace, so if you provide tokens in another format, you will not get matches.
fieldName must be any registered field. That is, if you created the index with index.fields = [ "alpha" ]; the only allowable value for fieldName is "alpha".
The "any field" ("*") entry is required, but the value can be an empty array.
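To make the format concrete, here is a hand-written query array together with a small sketch of the OR/AND semantics described above. The helper queryMatches, the docTokens shape, the reading of "*" as "any field of the document", and the rule that a query without tokens matches nothing are all assumptions made for illustration; this is not handlens's matching code.

```javascript
// Two query objects in the documented shape. The "*" entry is always present.
var queries = [
    { "bool": "OR",  "fields": { "*": [], "body": [ "mice" ], "title": [ "cities" ] } },
    { "bool": "AND", "fields": { "*": [], "title": [ "mice", "cities" ] } }
];

// Illustrative only: decide whether one query matches one document.
// `docTokens` maps each field name to that document's (hand-written) tokens.
function queryMatches( query, docTokens ) {
    var checks = [];
    Object.keys( query.fields ).forEach( function ( fieldName ) {
        query.fields[ fieldName ].forEach( function ( token ) {
            // Assumption: "*" means the token may appear in any field.
            var haystacks = fieldName === "*"
                ? Object.keys( docTokens ).map( function ( f ) { return docTokens[ f ]; } )
                : [ docTokens[ fieldName ] || [] ];
            checks.push( haystacks.some( function ( tokens ) {
                return tokens.indexOf( token ) !== -1;
            } ) );
        } );
    } );
    if ( checks.length === 0 ) {
        return false; // Assumption: a query without tokens matches nothing.
    }
    return query.bool === "AND"
        ? checks.every( Boolean )   // every token must be found
        : checks.some( Boolean );   // one found token is enough
}

var taleOfTwoCities = {
    "title": [ "tale", "two", "cities" ],
    "body": [ "best", "times", "worst" ]
};

console.log( queryMatches( queries[ 0 ], taleOfTwoCities ) ); // true
console.log( queryMatches( queries[ 1 ], taleOfTwoCities ) ); // false
```

With a real index, an array shaped like queries is what .query is described as accepting.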
Settings
When creating an index (or once one has been created), you can change various settings that significantly alter the way handlens works.
var myIndex = handlens( ( index ) => {
    index.settings.documents.retainAfterIndex = false;
} );
myIndex.settings.documents.retainAfterIndex = true;
All Settings:
| Setting | Default | What It Does |
|---------|---------|--------------|
| settings.documents.retainAfterIndex | true | If this value is false, rebuilding the index will delete all of the source documents. Adding a document with this set to false will not store the new document. This is nice if you have an enormous amount of data and you can reference it elsewhere so that the index itself doesn't store a copy of everything. |
| settings.tokenize.separator | /\s+/ | Determines how the tokenizer splits strings. By default it grabs as much contiguous whitespace as possible and splits the tokens on that. This value is passed directly to String.prototype.split, so it can be either a RegExp or a String. |
| settings.tokenize.lowercase | true | If false, the tokenizer will not lowercase every token it finds. Convenient if you want case-sensitive searching, but keep in mind that hello and Hello are not the same token if lowercasing is turned off. |
| settings.stopwords.lang | "en" | The language to use when stripping stopwords from tokens. |
| settings.stopwords.list | { "en": [...], ... } | Very long lists of stopwords for a bunch of languages. Arrays of strings keyed by ISO 639-1 two-letter language code. |
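As a rough illustration of how the tokenize and stopword settings interact, the sketch below implements a small tokenizer driven by a settings object shaped like the table above. The tokenize function, the order of the steps (split, lowercase, then strip stopwords), and the five-word stopword sample are assumptions for this example and are not taken from handlens.

```javascript
// Illustrative tokenizer driven by settings shaped like the table above.
// The stopword list here is a tiny sample, not the list shipped with handlens.
var settings = {
    tokenize: { separator: /\s+/, lowercase: true },
    stopwords: { lang: "en", list: { "en": [ "a", "of", "the", "it", "was" ] } }
};

function tokenize( text, settings ) {
    var stopwords = settings.stopwords.list[ settings.stopwords.lang ] || [];
    return String( text )
        .split( settings.tokenize.separator )          // RegExp or String separator
        .filter( function ( token ) { return token.length > 0; } )
        .map( function ( token ) {
            return settings.tokenize.lowercase ? token.toLowerCase() : token;
        } )
        // Assumption: stopwords are compared after the optional lowercasing step.
        .filter( function ( token ) { return stopwords.indexOf( token ) === -1; } );
}

console.log( tokenize( "A Tale of Two Cities", settings ) );
// [ 'tale', 'two', 'cities' ]

settings.tokenize.lowercase = false;
console.log( tokenize( "A Tale of Two Cities", settings ) );
// [ 'A', 'Tale', 'Two', 'Cities' ]
```

In this sketch, turning lowercasing off means the capitalized A no longer matches the lowercase stopword a, which is the same kind of hello versus Hello mismatch the table warns about.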
Advanced
Rather than rebuilding the index every time the set of documents changes, even when the set differs by only one or two documents, you can use .addDocument.
var idx = handlens();
idx.documents = [ ..., ... ];
idx.rebuild();
idx.addDocument( { ... } );
This will index just that document without re-indexing every other document.
The same format is available for fields.
var idx = handlens();
idx.fields = [ "alpha" ];
idx.rebuild();
idx.addField( "beta" );
Note, however, that adding a field changes the entire root structure of the index, so a .rebuild is issued after adding a field.
It would be prudent to determine the list of fields before initializing the index. Likewise, if you need to add a number of fields, it would be best to simply push them onto the list and then issue a single .rebuild at the end. This method is provided for convenience only and is not the most efficient way to modify the list of fields.
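The cost difference can be seen with a toy model. The sketch below is a hypothetical stand-in index (createToyIndex, postings, rebuildCount, and indexDocument are invented for this example and say nothing about handlens's internals) that counts how many full rebuilds each approach triggers.

```javascript
// Hypothetical stand-in index, NOT handlens's internals. It counts rebuilds so
// repeated addField-style calls can be compared with a single rebuild.
function createToyIndex() {
    return {
        fields: [],
        documents: [],
        postings: {},      // "field:token" -> array of document positions
        rebuildCount: 0,
        indexDocument: function ( doc, position ) {
            var self = this;
            this.fields.forEach( function ( field ) {
                String( doc[ field ] || "" ).toLowerCase().split( /\s+/ ).forEach( function ( token ) {
                    if ( !token ) { return; }
                    var key = field + ":" + token;
                    ( self.postings[ key ] = self.postings[ key ] || [] ).push( position );
                } );
            } );
        },
        rebuild: function () {
            var self = this;
            this.postings = {};
            this.rebuildCount += 1;
            this.documents.forEach( function ( doc, i ) { self.indexDocument( doc, i ); } );
        },
        addDocument: function ( doc ) {
            // Only the new document is tokenized; existing postings are untouched.
            this.documents.push( doc );
            this.indexDocument( doc, this.documents.length - 1 );
        },
        addField: function ( field ) {
            // A new field touches every document, so a full rebuild follows.
            this.fields.push( field );
            this.rebuild();
        }
    };
}

var idx = createToyIndex();
idx.fields = [ "title" ];
idx.documents = [ { "title": "Of Mice And Men", "body": "Salinas River" } ];
idx.rebuild();
idx.addDocument( { "title": "A Tale of Two Cities", "body": "best of times" } );
console.log( idx.rebuildCount );   // 1

// One rebuild per added field...
idx.addField( "body" );
idx.addField( "source" );
console.log( idx.rebuildCount );   // 3

// ...versus pushing several fields and rebuilding once.
var idx2 = createToyIndex();
idx2.documents = idx.documents;
idx2.fields.push( "title", "body", "source" );
idx2.rebuild();
console.log( idx2.rebuildCount );  // 1
```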
Planned Features
- Parenthetical groups, distribution, and expansion
  title:(mice cities) AND body:(winter river)
  - This query should search for:
    - mice in the title AND winter in the body
    - OR mice in the title AND river in the body
    - OR cities in the title AND winter in the body
    - OR cities in the title AND river in the body
  - This behavior is currently achievable by being extremely verbose
    title:mice AND body:winter title:mice AND body:river title:cities AND body:winter title:cities AND body:river
- Affinities
  - Matches should be ranked by how high the affinity is between the document and the query.
  - A document that is nothing but the word "cats" repeated hundreds of times should have a much higher affinity for a search like cats than an article that is regular English, even if it is about the topic of cats (and may therefore contain the word cats a few times).
- Field Boosting
  - It should be possible to boost a field at query time or at index time to increase the affinity of matches found in that field
- Token Boosting
  - It should be possible to boost a single token at query time so that matches for that token have an increased affinity
- Fuzzy matching
  - hallo should match hello with a small affinity hit
- Stemming
  - automobile and automotive should be stemmed to automo and searches for automotive should also match automobile with a small affinity hit (and vice versa).
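Until parenthetical groups are supported, the verbose form from the first planned feature can be generated rather than typed by hand, since it is a Cartesian product over the grouped tokens. The sketch below is one possible way to do that; expandGroups and its input shape are hypothetical and not part of handlens.

```javascript
// Hypothetical helper: expands grouped field tokens into the verbose query string.
// Input: an array of { field, tokens } groups that are ANDed together.
function expandGroups( groups ) {
    // Build every combination that takes exactly one token from each group.
    var combinations = groups.reduce( function ( partials, group ) {
        var next = [];
        partials.forEach( function ( partial ) {
            group.tokens.forEach( function ( token ) {
                next.push( partial.concat( group.field + ":" + token ) );
            } );
        } );
        return next;
    }, [ [] ] );

    // Terms within a combination are joined with AND; combinations are
    // separated by whitespace, which this README describes as an implicit OR.
    return combinations
        .map( function ( terms ) { return terms.join( " AND " ); } )
        .join( " " );
}

var verbose = expandGroups( [
    { field: "title", tokens: [ "mice", "cities" ] },
    { field: "body", tokens: [ "winter", "river" ] }
] );

console.log( verbose );
// title:mice AND body:winter title:mice AND body:river title:cities AND body:winter title:cities AND body:river
```

The resulting string is the same verbose query shown in the list above and could be passed to .search as-is.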