npm package discovery and stats viewer.

Discover Tips

  • General search

    [free text search, go nuts!]

  • Package details

    pkg:[package-name]

  • User packages

    @[username]

Sponsor

Optimize Toolset

I’ve always been into building performant and accessible sites, but lately I’ve been taking it extremely seriously. So much so that I’ve been building a tool to help me optimize and monitor the sites that I build to make sure that I’m making an attempt to offer the best experience to those who visit them. If you’re into performant, accessible and SEO friendly sites, you might like it too! You can check it out at Optimize Toolset.

About

Hi, 👋, I’m Ryan Hefner  and I built this site for me, and you! The goal of this site was to provide an easy way for me to check the stats on my npm packages, both for prioritizing issues and updates, and to give me a little kick in the pants to keep up on stuff.

As I was building it, I realized that I was actually using the tool to build the tool, and figured I might as well put this out there and hopefully others will find it to be a fast and useful way to search and browse npm packages as I have.

If you’re interested in other things I’m working on, follow me on Twitter or check out the open source projects I’ve been publishing on GitHub.

I am also working on a Twitter bot for this site to tweet the most popular, newest, random packages from npm. Please follow that account now and it will start sending out packages soon–ish.

Open Software & Tools

This site wouldn’t be possible without the immense generosity and tireless efforts from the people who make contributions to the world and share their work via open source initiatives. Thank you 🙏

© 2024 – Pkg Stats / Ryan Hefner

approx-string-match

v2.0.0

Published

A library for approximate string matching.

Downloads

15,324

Readme

approx-string-match

A library for approximate string matching.

This can be used to find occurrences of a pattern P (of length m) in a text T (of length n) allowing for up to a given number of errors (k), where errors may include insertions, substitutions or deletions of characters from the pattern.

For example the pattern "annd" occurs in the string "four score and seven" with one error.

The implementation uses a bit-parallel algorithm by G. Myers [1] which, to the best of my knowledge, is the state of the art algorithm for the online version of the problem (where the text and pattern cannot be preprocessed in advance). It runs in O((k/w) * n) expected-time where k <= m and w is the word size (32 in JavaScript). It also includes some additional optimizations suggested in [3]. See comments in the code for more details.

Usage

npm install --save approx-string-match
import search from "approx-string-match";

const text = "Four score and seven";
const pattern = "annd";
const matches = search(text, pattern, 2 /* max errors */);
console.log(matches);

// Outputs `[{ start: 11, end: 14, errors: 1 }]`

API

The library exports a single function search(text, pattern, maxErrors) which returns an array of the closest matches for pattern in text allowing up to maxErrors errors.

interface Match {
  start: number;
  end: number;
  errors: number;
}

search(text: string, pattern: string, maxErrors: number): Match[]

Implementation notes

Word size

The algorithm uses bitwise operations on integers. Since JavaScript only supports bitwise operations on 32-bit integers, that is the word size, regardless of the platform.

If JS gains support for bitwise operations on larger integers in future, that support could be used to speed up this library.

Unicode

The library currently works on code units rather than code points, where the code unit is a UTF-16 value. What this means is that a change to a unicode character which requires multiple characters to represent in a JavaScript string, such as emoji, would actually count as two changes rather than one. This is because such chars require two elements to represent in a string (eg. "😊".length is 2).

Related reading

For an overview of the different approaches to approximate string matching and the history of the development of solutions, there is a good survey paper [2].

References

[1] G. Myers, “A Fast Bit-Vector Algorithm for Approximate String Matching Based on Dynamic Programming,” vol. 46, no. 3, pp. 395–415, 1999.

[2] G. Navarro, “A guided tour to approximate string matching,” ACM Comput. Surv., vol. 33, no. 1, pp. 31–88, 2001.

[3] Šošić, M. (2014). "An SIMD dynamic programming c/c++ library" (Doctoral dissertation, Fakultet Elektrotehnike i računarstva, Sveučilište u Zagrebu).