surch

v0.3.0

Create and query searchable document indices.

Downloads: 2,829

surch

Create and query searchable document indices in Node.js.

Why would I want to do that?

You probably don't. Use the index functionality provided by your database layer or a proper search engine instead.

For my part, I wanted to implement in-memory fuzzy search as a hack for services that don't support fuzziness in their API.

How does it work?

When documents are added to the index, strings are broken down into n-grams (trigrams by default) to be used as keys in an inverted index. When searching, queries are broken down in the same way. This approach has two benefits:

  • Matches are fuzzy. A single query can match completely separate portions of an indexed string.

  • Queries are fast. Each n-gram is a key into the inverted index, so there is no need to iterate through every string.
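To make the n-gram idea concrete, here is a minimal sketch of a trigram inverted index. It assumes lower-cased input and treats a document as a candidate only when it contains every trigram of the query. The helpers `ngrams`, `buildIndex` and `lookup` are hypothetical names for illustration, not surch's internal implementation.

```javascript
// Sketch only: a trigram inverted index, assuming lower-cased text.
// Not surch's actual internals.

// Break a string into overlapping n-grams (trigrams by default).
function ngrams(text, n = 3) {
  const normalised = text.toLowerCase();
  const grams = [];
  for (let i = 0; i + n <= normalised.length; i += 1) {
    grams.push(normalised.slice(i, i + n));
  }
  return grams;
}

// Map each n-gram to the set of document ids whose text contains it.
function buildIndex(documents, key, idKey = '_id') {
  const inverted = new Map();
  for (const doc of documents) {
    for (const gram of ngrams(doc[key])) {
      if (!inverted.has(gram)) {
        inverted.set(gram, new Set());
      }
      inverted.get(gram).add(doc[idKey]);
    }
  }
  return inverted;
}

// Each query n-gram is a direct key lookup; the candidate ids are the
// documents that appear under every one of the query's n-grams.
function lookup(inverted, query) {
  let result = null;
  for (const gram of ngrams(query)) {
    const ids = inverted.get(gram) || new Set();
    result = result === null
      ? new Set(ids)
      : new Set([...result].filter(id => ids.has(id)));
  }
  return result === null ? [] : [...result];
}

const exampleDocs = [
  { _id: 'ffox1', foo: 'Down in the valley there were three farms.' },
  { _id: 'ffox2', foo: 'The owners of these farms had done well.' },
  { _id: 'ffox3', foo: 'They were rich men.' }
];
const inverted = buildIndex(exampleDocs, 'foo');
console.log(lookup(inverted, 'farm')); // [ 'ffox1', 'ffox2' ]
```

Queries shorter than the n-gram length produce no keys in this sketch, so they return no matches. That is one plausible reading of the `minLength` option described below.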

To satisfy the most common use case (i.e. mine), whitespace in each query is treated as a separator between subqueries. Subquery results are then intersected by document id to produce the overall result.
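The subquery behaviour can be sketched as follows. This assumes each subquery resolves to a list of matching document ids. `searchAll`, `intersectById` and the toy lookup table are hypothetical and exist only for this example.

```javascript
// Sketch only: split a query on whitespace, run each subquery, and keep
// the document ids that every subquery matched.

function intersectById(resultSets) {
  if (resultSets.length === 0) {
    return [];
  }
  const [first, ...rest] = resultSets;
  return first.filter(id => rest.every(ids => ids.includes(id)));
}

function searchAll(query, searchOne) {
  const subqueries = query.split(/\s+/).filter(Boolean);
  return intersectById(subqueries.map(searchOne));
}

// Hypothetical single-subquery search backed by a fixed table of matches.
const toyMatches = {
  valley: ['ffox1'],
  farm: ['ffox2', 'ffox1']
};
const toySearchOne = subquery => toyMatches[subquery] || [];

console.log(searchAll('valley farm', toySearchOne)); // [ 'ffox1' ]
```

Intersection makes every extra word narrow the result set. This matches the `'valley farm'` example further down, where only `ffox1` survives.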

How do I install it?

Via npm:

npm i surch --save

Or if you just want the git repo:

git clone [email protected]:philbooth/surch.git

How do I use it?

Loading the library

Use require:

const surch = require('surch');

Creating an index

Call create(key), where key is the name of the property you wish to be indexed:

const index = surch.create('foo');

create also takes an optional second argument, which allows different aspects of the internal behaviour to be configured:

const index2 = surch.create('bar', {
  idKey: '_id',         // The identity key for documents. Defaults to `'_id'`.
  minLength: 3,         // The minimum queryable substring length. Defaults to `3`.
  caseSensitive: false, // Enables case-sensitive matching. Defaults to `false`.
  strict: false,        // Enables strict (entire words) matching. Defaults to `false`.
  fuzzy: false,         // Enables fuzzy matching. Defaults to `false`.
  coerceId: id => id    // Coercion function for object-based ids. Defaults to `id => id`.
});

Adding documents to an index

Call add(document), where document is the object you want to add to the index:

index.add({
  _id: 'ffox1',
  foo: 'Down in the valley there were three farms.'
});
index.add({
  _id: 'ffox2',
  foo: 'The owners of these farms had done well.'
});
index.add({
  _id: 'ffox3',
  foo: 'They were rich men.'
});

Searching for matching documents

Call search(query), where query is the string that you'd like to match against:

index.search('farm');
// Returns [
//   {
//     id: 'ffox2', indices: [ 20 ], score: 10,
//     match: 'The owners of these farms had done well.'
//   },
//   {
//     id: 'ffox1', indices: [ 36 ], score: 10,
//     match: 'Down in the valley there were three farms.'
//   }
// ]

index.search('valley farm');
// Returns [
//   {
//     id: 'ffox1', indices: [ 12, 36 ], score: 26,
//     match: 'Down in the valley there were three farms.'
//   }
// ]

The result is an array of objects that identify the matched document, the matching string, the indices of each matched substring within that string and a weighting score indicating the strength of the match as a whole.
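As a rough illustration of the `indices` field, here is a naive sketch. For each whitespace-separated subquery, it finds the start offset of that subquery within the matched string, ignoring case. It uses plain `indexOf` rather than surch's n-gram matching, so it only covers exact substring hits. `matchIndices` is a hypothetical helper.

```javascript
// Sketch only: start offset of each subquery in the text, case-insensitive.
// Returns -1 for a subquery that does not occur as an exact substring.

function matchIndices(query, text) {
  const haystack = text.toLowerCase();
  return query
    .toLowerCase()
    .split(/\s+/)
    .filter(Boolean)
    .map(subquery => haystack.indexOf(subquery));
}

const ffox1 = 'Down in the valley there were three farms.';
console.log(matchIndices('valley farm', ffox1)); // [ 12, 36 ]
```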

The maximum score is 100 and the array is sorted in descending score order. If two results have the same score, the match with the lowest index (i.e. closest to the start of the string) comes first.
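The ordering rule can be written as a comparator for `Array.prototype.sort`. This sketch assumes that "lowest index" means the smallest value in each result's `indices` array. That interpretation is a guess, and `compareResults` is a hypothetical name.

```javascript
// Sketch only: sort by descending score, then by the earliest match index.

function compareResults(a, b) {
  if (b.score !== a.score) {
    return b.score - a.score;
  }
  return Math.min(...a.indices) - Math.min(...b.indices);
}

const results = [
  { id: 'ffox1', indices: [36], score: 10 },
  { id: 'ffox2', indices: [20], score: 10 }
];
results.sort(compareResults);
console.log(results.map(result => result.id)); // [ 'ffox2', 'ffox1' ]
```

With equal scores, `ffox2` (index 20) sorts ahead of `ffox1` (index 36). This is the same order as the `search('farm')` example above.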

Deleting documents from an index

Call delete(id), where id identifies the document that you wish to delete:

index.delete('ffox2');

Updating documents in an index

Call update(document), where document is the updated object:

index.update({
  _id: 'ffox1',
  foo: 'Their names were Farmer Boggis, Farmer Bunce and Farmer Bean.'
});

Clearing an index

Call clear() to delete all documents from an index:

index.clear();

How is punctuation handled?

Punctuation is ignored. For instance, a document containing the string 'King\'s Cross' will be matched by both of the queries 'King\'s Cross' and 'Kings Cross'.
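One way to get this behaviour is to strip every Unicode punctuation character (the `\p{P}` property escape) from both documents and queries before they are broken into n-grams. This sketch assumes exactly that approach. surch's actual rules may differ, and `stripPunctuation` is a hypothetical helper.

```javascript
// Sketch only: remove Unicode punctuation (straight and curly apostrophes
// included) so that "King's" and "Kings" produce the same n-grams.

function stripPunctuation(text) {
  return text.replace(/\p{P}/gu, '');
}

console.log(stripPunctuation('King\'s Cross')); // Kings Cross
console.log(stripPunctuation('King\'s Cross') === stripPunctuation('Kings Cross')); // true
```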

Does it understand unicode?

Yes. Documents are indexed in their NFKC-normalised form, so lookalikes such as 'ma\xf1ana' and 'man\u0303ana' are matched identically.
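The effect of NFKC normalisation can be seen with the built-in `String.prototype.normalize`. The two spellings of "mañana" use different code units but compare equal once normalised. NFKC also folds compatibility characters such as the "ﬁ" ligature.

```javascript
// The same word spelled with different code points.
const precomposed = 'ma\xf1ana';   // "ñ" as the single code point U+00F1
const decomposed = 'man\u0303ana'; // "n" followed by combining tilde U+0303

console.log(precomposed === decomposed); // false
console.log(precomposed.normalize('NFKC') === decomposed.normalize('NFKC')); // true

// NFKC also replaces compatibility characters, e.g. the U+FB01 ligature.
console.log('\uFB01le'.normalize('NFKC') === 'file'); // true
```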

Does it handle object-based document ids?

Yes. Object-based document ids work out-of-the-box, but you may want to coerce them to a different type using the coerceId option to create.

Document ids are always compared using ===, so the same object reference must be passed to add, update and delete. The coerceId function is called on entry to each of these methods and can be used to ensure that object-based ids are handled sanely.
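A small sketch of why this matters: `===` compares object references, so two separate objects representing the same id are not equal. Coercing both to strings makes them comparable by value. `FakeObjectId` is a hypothetical stand-in for a class such as MongoDB's `ObjectId`.

```javascript
// Sketch only: a hypothetical id class with a string representation.
class FakeObjectId {
  constructor(hex) {
    this.hex = hex;
  }

  toString() {
    return this.hex;
  }
}

// The same kind of function you might pass as the coerceId option.
const coerceId = id => id.toString();

const added = new FakeObjectId('58847582a08c71481a672cc3');
const deleted = new FakeObjectId('58847582a08c71481a672cc3');

console.log(added === deleted); // false: different object references
console.log(coerceId(added) === coerceId(deleted)); // true: same string value
```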

For instance, to coerce MongoDB ObjectId references to strings, you could do the following:

const surch = require('surch');
const { ObjectId } = require('mongodb');

const index = surch.create('foo', {
  coerceId: id => id.toString()
});
index.add({
  _id: new ObjectId('58847582a08c71481a672cc3'),
  foo: 'The quick brown fox jumps over the lazy dog.'
});

Note that the coerceId option also affects the id property returned by search:

index.search('fox');
// Returns [
//   { id: '58847582a08c71481a672cc3', ... }
// ]

What should I be careful about?

It's entirely your responsibility to keep the index synchronised with your data store. Among other things, that means you need to handle restarts sanely. When your application starts, you need to populate the index with all of the documents from your database. And as you insert, update or delete items, you need to update the index accordingly.
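Here is one possible shape for that bookkeeping, as a sketch only. It assumes a hypothetical `store` object with async `findAll`, `insert`, `update` and `remove` methods, and an `index` exposing the clear/add/update/delete calls described above. `rebuildIndex` and `syncedWrites` are hypothetical names, not part of surch.

```javascript
// Sketch only: keep an in-memory index in step with a data store.

// On startup the index is empty, so repopulate it from the database.
async function rebuildIndex(index, store) {
  index.clear();
  const docs = await store.findAll();
  docs.forEach(doc => index.add(doc));
}

// Wrap each write so the index is only changed after the store succeeds.
function syncedWrites(index, store) {
  return {
    async insert(doc) {
      await store.insert(doc);
      index.add(doc);
    },
    async update(doc) {
      await store.update(doc);
      index.update(doc);
    },
    async remove(id) {
      await store.remove(id);
      index.delete(id);
    }
  };
}
```

Updating the index after the store write means a failed database write leaves the index untouched. A crash between the two steps can still leave them out of step, which is one reason to rebuild on every restart.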

Is there a change log?

Yes.

How do I set up the dev environment?

To install the dependencies:

npm i

To run the tests:

npm t

To lint the code:

npm run lint

What license is it released under?

MIT.