npm package discovery and stats viewer.

Discover Tips

  • General search

    [free text search, go nuts!]

  • Package details

    pkg:[package-name]

  • User packages

    @[username]

Sponsor

Optimize Toolset

I’ve always been into building performant and accessible sites, but lately I’ve been taking it extremely seriously. So much so that I’ve been building a tool to help me optimize and monitor the sites that I build to make sure that I’m making an attempt to offer the best experience to those who visit them. If you’re into performant, accessible and SEO friendly sites, you might like it too! You can check it out at Optimize Toolset.

About

Hi, 👋, I’m Ryan Hefner  and I built this site for me, and you! The goal of this site was to provide an easy way for me to check the stats on my npm packages, both for prioritizing issues and updates, and to give me a little kick in the pants to keep up on stuff.

As I was building it, I realized that I was actually using the tool to build the tool, and figured I might as well put this out there and hopefully others will find it to be a fast and useful way to search and browse npm packages as I have.

If you’re interested in other things I’m working on, follow me on Twitter or check out the open source projects I’ve been publishing on GitHub.

I am also working on a Twitter bot for this site to tweet the most popular, newest, random packages from npm. Please follow that account now and it will start sending out packages soon–ish.

Open Software & Tools

This site wouldn’t be possible without the immense generosity and tireless efforts from the people who make contributions to the world and share their work via open source initiatives. Thank you 🙏

© 2024 – Pkg Stats / Ryan Hefner

simple-fast-prefix-completions

v0.5.0

Published

[![npm](https://img.shields.io/npm/v/simple-fast-prefix-completions)](https://www.npmjs.com/package/simple-fast-prefix-completions) [![npm bundle size](https://img.shields.io/bundlephobia/minzip/simple-fast-prefix-completions)](https://www.npmjs.com/packa

Downloads

36

Readme

Simple Fast Prefix Completions

npm npm bundle size GitHub Workflow Status Coverage Status

This package implements simple and fast prefix completions in javascript. Given a large list of words this package can rapidly perform a prefix search and return either all the words matching a given prefix in alphabetical order or the top k words matching a given prefix ordered by a numerical ranking key. The package also supports mapping words to unique ids.

import { SimpleFastPrefixCompletions } from "simple-fast-prefix-completions";

const wordsWithIds = [
  ["sally", 0],
  ["sells", 1],
  ["seashells", 2],
  ["by", 3],
  ["the", 4],
  ["seashore", 5],
];
const completions = new SimpleFastPrefixCompletions({
  wordsWithIds,
});

// search by prefix returns prefixes in lexicographically sorted order
console.log(completions.findWordsWithIds("se"));
// [["seashells", 2], ["seashore", 5], ["sells", 1]]

// note you can still use other methods which do not depend on ids
console.log(completions.findWords("se"));
// ["seashells", "seashore", "sells"]

// serialize to json
const serialized = completions.toJSON();

// deserialize from json
const deserialized = SimpleFastPrefixCompletions.fromJSON(serialized);

Usage

Observable Example for Prefix Completions

Observable Example for Top-K Completions

Construction

You can create the completion datastructure by passing one of the following options to the constructor words, wordsWithIds, rankedWords, rankedWordsWithIds. The package makes no assumptions about your ids. They can be strings, numbers, objects, basically anything you want. If you want to be able to serialize the datastructure your ids must be serializable though.

The rankings are assumed to be ordered such that lower rankings come before higher rankings.

words - expects an array of strings ["sally", "sells", "seashells"]

wordsWithIds - expects an array of [string, id] tuples. [["sally", 50], ["sells", 30], ["seashells", 25]]

rankedWords - expects an array of [string, number] tuples. The number is a ranking. [["sally", 0], ["sells", 1], ["seashells", 2]]

rankedWordsWithIds - expects an array of [string, number, id] tuples. The number is a ranking and the id is the id. [["sally", 0, 50], ["sells", 1, 30], ["seashells", 2, 25]]

// example using the words option
const words = ["sally", "sells", "seashells"];
const completions = new SimpleFastPrefixCompletions({
  words,
});

Querying

The datastructure exposes four methods for querying data.

findWords(prefix): string[] - takes a prefix and returns words which start with that prefix in alphabetical order

findWordsWithIds(prefix): [string, id][] - takes a prefix and returns words which start with that prefix in alphabetical order and their associated ids

findTopKWords(prefix, k): [string][] - takes a prefix and a number k and returns the top k words which start with that prefix ordered by the passed in ranking

findTopKWordsWithIds(prefix, k): [string, id][] - takes a prefix and a number k and returns the top k words which start with that prefix ordered by the passed in ranking along with the ids associated with each word

If you call a method and have not provided data the method needs then the method will error. For example if you passed words and query for findWordsWithIds you will get an error since you have not provided ids.

Serializing

You can save time building the datastructure by building the datastructure once and then serializing it with toJSON and fromJSON

const words = ["sally", "sells", "seashells"];
const completions = new SimpleFastPrefixCompletions({
  words,
});
// serialize to json string
const serialized = completions.toJSON();

// deserialize from json string
const deserialized = SimpleFastPrefixCompletions.fromJSON(serialized);

Separator

Internally the datastructure concatenates all your strings together separated by a single character sentinel \u0001. If this character is in your dataset you can pass your own sentinel, just choose a single width character that is not in your dataset.

Rough Approximation of Runtime and Space Constraints

The memory consumed should be approximately O(5n + c) where n is the number of words provided and c is the concatenated length of all the words. 2n for the segment tree, n for word starts, c for words concatenated into a single string, n for word ids, and n for word rankings.

The build time is approximately O(n log n) + O(n).

Query time to find m matches for prefix of length p is approximately O(p log n) + O(m)

Query time to find k top-k matches for a prefix of length p is approximately O(p log n) + O(k log k).

Internal Architecture

Take the input words, sort them lexicographically, and concatenate them into a single string separated by Separator. While concatenating the strings build an array containing the offset into the string for the start of each word. When the user searches for a prefix, we binary search the array of offsets twice, finding the indices of the leftmost and rightmost offsets which match the prefix.

The array of offsets is already sorted so if the user is simply searching for prefix matches we iterate over the array from the leftmost index to the rightmost index and return the words matching the prefix. To return a word we iterate from the start offset to the first instance of a Separator.

To support top-k indices we build a SegmentTree over the indices of the sorted word array which can be used to query index of the minimum value for any interval in the word array. When we find our leftmost and rightmost indices we add the interval to a priority queue. While the queue is not empty we pop the interval associated with the lowest ranked term, and then add the left and right intervals to the queue.