nickel-search

v0.6.1

Nickel Search Server is a basic serverless prefix search indexer

Nickel Search

Nickel Search implements a basic serverless word prefix search.

What is prefix search

In a full text search solution, you expect the server to return documents containing the searched words.

In a prefix search solution, you expect the server to return all documents containing words that start with a given prefix.

Given the advanced querying that almost any full text search engine allows, prefix search is a subset of the full text search problem. For example, with Lucene (and hence Solr, Elastic, and others) you can use the * syntax to search for prefixed words. E.g., adv* would return documents containing adventure, advanced, and other words that start with adv.
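To make the difference concrete, here is a minimal sketch that compares whole-word matching with prefix matching over a few titles. The helper names (tokenize, matchesWord, matchesPrefix) and the simple tokenizer are assumptions made for illustration; they are not part of the nickel-search API.

```typescript
// Hypothetical helpers for illustration only; not the nickel-search API.

// Lowercase the text and split it into words on anything that is not a letter or digit.
function tokenize(text: string): string[] {
    return text.toLowerCase().split(/[^a-z0-9]+/).filter((w) => w.length > 0);
}

// Whole-word match: the document must contain exactly this word.
function matchesWord(text: string, word: string): boolean {
    return tokenize(text).indexOf(word.toLowerCase()) !== -1;
}

// Prefix match: the document must contain a word that starts with the prefix.
function matchesPrefix(text: string, prefix: string): boolean {
    const p = prefix.toLowerCase();
    return tokenize(text).some((w) => w.slice(0, p.length) === p);
}

const titles = ["Advanced TypeScript", "An Adventure in Search", "Basic Indexing"];

// "adv" is not a whole word in any title, but it is a prefix of words in two of them.
const wordHits = titles.filter((t) => matchesWord(t, "adv"));
const prefixHits = titles.filter((t) => matchesPrefix(t, "adv"));
```

Here wordHits is empty, while prefixHits contains the first two titles.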

The goal of this project is to allow prefix search in a serverless way, so that you don't have to pay for servers hosting Solr, Elastic, or another server.

Current issues and TODO

  1. The search doesn't support multi-word search.
  2. The indexing takes a lot of time and RAM.
  3. No support for synonyms, stemming/lemmatization.
  4. No test coverage.
  5. More ranking samples needed.

How to use

There is a fully functional sample in the /samples directory, which also includes running the indexer as a Docker container on AWS Fargate. See README.md in the /samples directory for more info.

Install Nickel Search:

$> npm install nickel-search

Implement your index model and run indexer:

import nickel from "nickel-search";

// An interface is enough here: documents are loaded from plain JSON files.
interface MyBlogPost {
    Title: string;
    Author: string;
    Body: string;
}

const options = {
    // Set fields that will be returned with search results
    getDisplayedFields: (s3Uri: string, document: MyBlogPost) => ({
        Title: document.Title,
        Author: document.Author,
    }),
    // Set fields to search against
    getSearchedFields: (s3Uri: string, document: MyBlogPost) => ({
        Title: document.Title,
    }),
    // number of search results per page has to be set when creating the index
    resultsPageSize: 50,
    // save checkpoints every 100 changes to each hash value
    saveThreshold: 100,
    // shards in the index store
    indexShards: 1000,
    // Implement to set search results sort order.
    sort: (a: ISearchable, b: ISearchable) => {
        let sort = a.weight - b.weight;
        if (sort === 0) {
            sort = a.original.Title.localeCompare(b.original.Title);
        }
        return sort;
    },
    // Data source options
    source: nickel.createDataStore<MyBlogPost>({
        location: "../sample-data/", // existing folder with JSON files matching MyBlogPost
    }),
    // Index store options
    indexStore: nickel.createIndexStore({
        location: "../sample-index/", // existing folder that will store the search index
    }),
};

nickel.indexer(options).run();

In the sample above, the indexer will JSON.parse all files in ../sample-data/, apply getDisplayedFields and getSearchedFields to each file, and save the index in ../sample-index/. The indexer will split the index into 1000 'shards' ({ options.indexShards: 1000 }). The number of shards has to be the same when indexing and when searching against the same index.
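Why the shard count has to match can be illustrated with a small sketch. It assumes that an index key is mapped to a shard by hashing it modulo the shard count. This is only an assumption for illustration: shardFor and its hash are hypothetical and may differ from what Nickel Search actually does.

```typescript
// Hypothetical shard mapping; the real library may assign shards differently.
function shardFor(key: string, indexShards: number): number {
    let hash = 0;
    for (let i = 0; i < key.length; i++) {
        // Simple polynomial hash, kept as an unsigned 32-bit integer.
        hash = (hash * 31 + key.charCodeAt(i)) >>> 0;
    }
    return hash % indexShards;
}

// The indexer writes the entry for "nic" into a shard chosen with indexShards = 1000.
const shardAtIndexTime = shardFor("nic", 1000);

// A searcher configured with the same shard count looks in the same shard.
const shardSameCount = shardFor("nic", 1000);

// A searcher configured with a different shard count may look in another shard
// and miss the entry.
const shardOtherCount = shardFor("nic", 10);
```

With this kind of mapping, a mismatch in indexShards between the indexer and the searcher means lookups can land in the wrong shard.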

Run the indexer. When it's done, run the search:

import nickel from "nickel-search";

const indexStore = nickel.createIndexStore({
    location: "../sample-index/", // search index location
});

const ns = nickel.searcher({ indexShards: 1000 }, indexStore);

const searchResults = await ns.search('nic');

See an example in the ./samples directory.

Requirements

  • The indexer can take a long time to run.
    • In theory, the most time-consuming tasks could run in parallel, but this is not implemented.
  • The indexer stores the entire index in RAM before saving it, so it requires a lot of RAM.

Features

When to use Nickel Search

Nickel can help if all of the following are true:

  • You have a set of text documents that you want to be able to search using prefixes
  • Your dataset does not change often
  • You don't need advanced query syntax such as provided by Lucene or other implementations
  • You don't want to pay for an always-on search server (such as Elastic or Solr)

A simple example scenario is autocomplete search for book names, which doesn't need the advanced full text search query syntax provided by Lucene or other implementations. Many other autocomplete scenarios can be addressed in the same way.

When not to use Nickel Search

Don't use Nickel Search if:

  • You need to rank results when querying
  • You have KPIs on index update time
  • You need advanced syntax querying (AND/OR/etc.)
  • You need to get a response in less than 100ms
  • Your dataset is larger than RAM available for indexing
  • Your documents are in a language other than English (or maybe submit a PR to support that language?)

How it works

Nickel Search is a Node.js app that converts a set of documents into a prefix-queryable set of documents, so that you can use the capabilities of the storage system as your prefix-search server. I use it with AWS S3, so it provides a serverless search for my projects.
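The idea can be sketched as follows. This is a simplified illustration and not the actual Nickel Search implementation: BlogDoc, buildPrefixIndex, and lookup are hypothetical names, and an in-memory object stands in for the storage system. Every prefix of every word becomes a key, so a prefix query turns into a single key lookup (for example, fetching one object from S3).

```typescript
// Simplified illustration; not the actual Nickel Search implementation.
interface BlogDoc {
    id: string;
    title: string;
}

// Map every prefix of every word in the title to the ids of the documents containing it.
function buildPrefixIndex(docs: BlogDoc[]): Record<string, string[]> {
    const index: Record<string, string[]> = Object.create(null);
    docs.forEach((doc) => {
        const words = doc.title.toLowerCase().split(/[^a-z0-9]+/).filter((w) => w.length > 0);
        words.forEach((word) => {
            for (let len = 1; len <= word.length; len++) {
                const key = word.slice(0, len);
                const ids = index[key] || (index[key] = []);
                // Avoid listing the same document twice under one key.
                if (ids.indexOf(doc.id) === -1) {
                    ids.push(doc.id);
                }
            }
        });
    });
    return index;
}

const prefixIndex = buildPrefixIndex([
    { id: "1", title: "Nickel Search" },
    { id: "2", title: "Nice Autocomplete" },
    { id: "3", title: "Serverless Notes" },
]);

// At query time no scanning is needed: the prefix itself is the key.
function lookup(prefix: string): string[] {
    return prefixIndex[prefix.toLowerCase()] || [];
}

const nicHits = lookup("nic"); // documents with a word starting with "nic"
const noHits = lookup("xyz");  // unknown prefix, so no documents
```

The trade-off in this sketch is that all the work happens at indexing time, which fits the limitations listed above: indexing is slow and memory-hungry, while a query is a cheap read from storage.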

Future steps

TODO:

  • Deallocate the stack after indexing is done, keeping the source and target S3 buckets:

    • Move the S3 bucket definitions to a different stack, and reference them from the current stack.
    • Or delete the money-consuming objects from the created stack.

  • Add storage to the Docker container before indexing starts.

  • Remove storage from the Docker container when indexing finishes.

  • Create a project directory for fabu.

  • Make indexer resumable.

  • Optimize time and memory usage.

  • Try other features of mature full text search solutions and see if they can be added to Nickel.

Release notes

v0.3

  • Changed the tokenizer to split on more punctuation marks
  • Added local file buffer to reduce RAM consumption
  • Enhanced sorting performance