picosearch

v1.0.0

Tiny, customizable module for creating basic full-text search indices and queries using the BM25F algorithm.

Warning: As long as the version is in the 0.X.X range, version changes are most likely breaking!

Minimalistic, customizable module for creating basic full-text search indices and queries using the BM25F algorithm (used by Lucene, Elasticsearch etc.). The focus is on providing a simple and reusable implementation and configuration with no dependencies.

  • Stemmers and stopwords are not included and must be provided as config values.
  • JSON serializable indices
  • Highly customizable
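
To give a rough idea of the BM25F ranking mentioned above, here is a minimal, self-contained sketch. It is not picosearch's internal code: the sample documents, the `params` object (`k1`, `b`, `weights`) and the helper names are assumptions made for illustration. The idea is that per-field term frequencies are weighted and length-normalized, summed into one pseudo-frequency, and then saturated once.

```javascript
// Illustrative BM25F-style scoring; a sketch, not picosearch's internal code.
// Each document maps a field name to an array of already-analyzed tokens.
const docs = [
  { title: ['milk'], body: ['man', 'drink', 'milk'] },
  { title: ['bread'], body: ['man', 'eat', 'bread'] },
  { title: ['butter'], body: ['man', 'eat', 'bread', 'butter'] }
]

// Hypothetical tuning parameters: k1 controls term-frequency saturation,
// b controls length normalization, weights boost individual fields.
const params = { k1: 1.2, b: 0.75, weights: { title: 2, body: 1 } }
const fields = Object.keys(params.weights)

// Average field length across the corpus, needed for length normalization.
const avgLen = {}
for (const f of fields) {
  avgLen[f] = docs.reduce((sum, d) => sum + d[f].length, 0) / docs.length
}

// Number of documents that contain the term in any scored field.
const docFreq = (term) =>
  docs.filter((d) => fields.some((f) => d[f].includes(term))).length

// Inverse document frequency in the non-negative form used by Lucene's BM25.
const idf = (term) => {
  const n = docFreq(term)
  return Math.log(1 + (docs.length - n + 0.5) / (n + 0.5))
}

// Combine weighted, length-normalized term frequencies from all fields into
// one pseudo-frequency per term, then apply the saturation once.
const bm25f = (queryTerms, doc) =>
  queryTerms.reduce((score, term) => {
    let pseudoTf = 0
    for (const f of fields) {
      const tf = doc[f].filter((t) => t === term).length
      const norm = 1 - params.b + params.b * (doc[f].length / avgLen[f])
      pseudoTf += (params.weights[f] * tf) / norm
    }
    return score + idf(term) * (pseudoTf / (params.k1 + pseudoTf))
  }, 0)

const scores = docs.map((d) => bm25f(['bread'], d))
console.log(scores)
```

With these sample documents, the second document (a title match plus a short body) ranks above the third (body match only), and the first scores zero because it does not contain the term.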

Installation

yarn add picosearch

or

npm install picosearch

Quickstart

const { createIndex, indexDocument, searchIndex } = require('picosearch')
const porterStemmer = require('porter-stemmer')
const { eng } = require('stopword')

; (async () => {
  // define a (custom) tokenizer for splitting a sentence into tokens
  const tokenizer = (sentence) => sentence.split(' ').map(s => s.trim())

  // define a (custom) analyzer for preprocessing individual tokens/words
  const REGEXP_PATTERN_PUNCT = new RegExp("['!\"“”#$%&\\'()\*+,\-\.\/:;<=>?@\[\\\]\^_`{|}~']", 'g')
  const analyzer = (token) => {
    let newToken = token.trim().replace(REGEXP_PATTERN_PUNCT, '').toLowerCase()

    if (eng.includes(newToken)) {
      return ''
    }

    return porterStemmer.stemmer(newToken)
  }

  // create a new index with a specific mapping
  const index = createIndex({
    title: 'text',
    body: 'text',
    topic: 'keyword'
  })

  // index some documents
  // raw documents are not stored in the index by default to optimize the index size
  // that's why we keep the data in a lookup mapping that can be used by the search to
  // get the documents later
  const docsLookup = {
    doc1: { title: 'Milk', body: 'A man is drinking milk.', topic: 'a' },
    doc2: { title: 'Bread', body: 'A man is eating breads.', topic: 'a' },
    doc3: { title: 'Butter', body: 'A man is eating bread and butter.', topic: 'b' }
  }
  const docsArray = Object.entries(docsLookup).map(([docId, doc]) => ({ _id: docId, ...doc }))

  docsArray.forEach((doc) => indexDocument(index, doc, analyzer))

  // make an example search on the 'body' and 'title' fields
  console.log(
    await searchIndex(
      index,
      'bread', {
        size: 10,
        queryFields: ['body', 'title'],
        filter: {
          topic: 'a'
        },
        getDocument: docId => docsLookup[docId]
      },
      analyzer,
      tokenizer
    )
  )
  // returns:
  // {
  //   total: 1,
  //   maxScore: 0.08530260953900706,
  //   hits: [ { _id: 'doc2', _score: 0.08530260953900706, _source: [Object] } ]
  // }
})()

See examples/.

API

createIndex(mappings)

Parameters

  • mappings: Mappings An object defining the fields of a document. Possible field types: text, keyword, number, date.

Return Value

Returns an index object to be used for querying and scoring. The raw documents are not included. Depending on the size of the text corpus, the size of the index can vary.
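
Because indices are described as JSON serializable, persisting one and checking its size could look roughly like the sketch below. The `index` object here is only a stand-in (in real use it would be the value returned by `createIndex` after indexing), and the file name is a hypothetical choice for this example.

```javascript
const fs = require('fs')
const os = require('os')
const path = require('path')

// Stand-in for the object returned by createIndex(); the real index has its
// own internal structure, which this sketch does not try to reproduce.
const index = { placeholder: true, fields: { title: 'text', topic: 'keyword' } }

// Hypothetical file location, chosen only for this example.
const file = path.join(os.tmpdir(), 'picosearch-index-example.json')

// Persist the index as JSON ...
fs.writeFileSync(file, JSON.stringify(index))

// ... and load it again later, e.g. in another process.
const restored = JSON.parse(fs.readFileSync(file, 'utf8'))

// The serialized byte size gives a quick feel for how large the index is.
const sizeInBytes = Buffer.byteLength(JSON.stringify(restored))
console.log(sizeInBytes, 'bytes')
```

Since raw documents are not stored in the index, they have to be persisted separately if the search should return them later.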

indexDocument(index, document, analyzer, tokenizer)

Parameters

  • index The index.
  • document The document to index.
  • analyzer A function for analyzing an individual token.
  • tokenizer A function for splitting a text into individual tokens.
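
As a rough illustration of the `analyzer` and `tokenizer` parameters, the following sketch defines both without external dependencies. The stopword list and the `naiveStem` helper are hypothetical stand-ins (not a real stemmer such as Porter), and returning an empty string for dropped tokens simply mirrors what the quickstart analyzer does.

```javascript
// Tiny hypothetical stopword list; a real setup would likely use a fuller
// list, such as the one from the `stopword` package in the quickstart.
const STOPWORDS = new Set(['a', 'an', 'and', 'is', 'the'])

// Tokenizer: split text on any run of whitespace and drop empty strings.
const tokenizer = (text) => text.split(/\s+/).filter(Boolean)

// Naive stand-in for a stemmer: strips a trailing "s" from longer words.
// This is NOT the Porter algorithm, just enough to show where one plugs in.
const naiveStem = (word) =>
  word.length > 3 && word.endsWith('s') ? word.slice(0, -1) : word

// Analyzer: lowercase one token, strip everything that is not a letter or
// digit, and return '' for stopwords so they can be discarded.
const analyzer = (token) => {
  const cleaned = token.toLowerCase().replace(/[^\p{L}\p{N}]/gu, '')
  return STOPWORDS.has(cleaned) ? '' : naiveStem(cleaned)
}

const terms = tokenizer('A man is  eating breads.').map(analyzer).filter(Boolean)
console.log(terms) // → [ 'man', 'eating', 'bread' ]
```

Whichever functions are used, it is probably safest to pass the same analyzer and tokenizer when indexing and when searching, so that query terms and indexed terms are normalized the same way.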

searchIndex(index, query, options, analyzer, tokenizer)

Parameters

  • index The index.
  • query The search query.
  • options The search options. See the API Docs section below.
  • analyzer A function for analyzing an individual token.
  • tokenizer A function for splitting a query into individual tokens.

Return Value

A search results object. See the API Docs section below.
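
Going by the shape printed in the quickstart (`total`, `maxScore`, and `hits` with `_id`, `_score`, `_source`), post-processing a result could look like the sketch below. The sample object is copied from that output, `_source` is assumed to be whatever `getDocument` returned for the id, and `toRows` is a hypothetical helper that is not part of picosearch.

```javascript
// Sample result copied from the quickstart output; `_source` is assumed to be
// the document that getDocument returned for that id.
const results = {
  total: 1,
  maxScore: 0.08530260953900706,
  hits: [
    {
      _id: 'doc2',
      _score: 0.08530260953900706,
      _source: { title: 'Bread', body: 'A man is eating breads.', topic: 'a' }
    }
  ]
}

// Hypothetical helper: flatten hits into rows with a score relative to the
// best hit (1 = top result), which is convenient for display.
const toRows = ({ hits, maxScore }) =>
  hits.map(({ _id, _score, _source }) => ({
    id: _id,
    title: _source ? _source.title : undefined,
    relativeScore: maxScore > 0 ? _score / maxScore : 0
  }))

const rows = toRows(results)
console.log(rows)
```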

API Docs

See https://olastor.github.io/picosearch/ for more details.