npm package discovery and stats viewer.

Discover Tips

  • General search

    [free text search, go nuts!]

  • Package details

    pkg:[package-name]

  • User packages

    @[username]

Sponsor

Optimize Toolset

I’ve always been into building performant and accessible sites, but lately I’ve been taking it extremely seriously. So much so that I’ve been building a tool to help me optimize and monitor the sites that I build to make sure that I’m making an attempt to offer the best experience to those who visit them. If you’re into performant, accessible and SEO friendly sites, you might like it too! You can check it out at Optimize Toolset.

About

Hi, 👋, I’m Ryan Hefner  and I built this site for me, and you! The goal of this site was to provide an easy way for me to check the stats on my npm packages, both for prioritizing issues and updates, and to give me a little kick in the pants to keep up on stuff.

As I was building it, I realized that I was actually using the tool to build the tool, and figured I might as well put this out there and hopefully others will find it to be a fast and useful way to search and browse npm packages as I have.

If you’re interested in other things I’m working on, follow me on Twitter or check out the open source projects I’ve been publishing on GitHub.

I am also working on a Twitter bot for this site to tweet the most popular, newest, random packages from npm. Please follow that account now and it will start sending out packages soon–ish.

Open Software & Tools

This site wouldn’t be possible without the immense generosity and tireless efforts from the people who make contributions to the world and share their work via open source initiatives. Thank you 🙏

© 2024 – Pkg Stats / Ryan Hefner

fast-streamsearch

v1.0.1

Published

Streaming searching for Node.js, but up to 100 times faster than streamsearch 🚀

Downloads

6

Readme

⚠️ This repo has Moved to @stream-toolbox/search

fast-streamsearch

fast-streamsearch 是一款 Node.js 流搜索工具,最快可以比 streamsearch 快 100 倍 🚀。

简体中文English

特点

  • 速度快
  • 零依赖,仅 100 多行代码
  • 匹配成功时可以为后面的内容设置新的 needle
  • 所有回调的 data 都是安全的(不会被改变)

安装

npm i fast-streamsearch

快速开始

const { inspect } = require("util");
const FastStreamSearch = require("fast-streamsearch");

const needle = Buffer.from("\r\n");
const fss = new FastStreamSearch(needle, (isMatch, data) => {
  if (isMatch) {
    console.log("match!");
  } else {
    console.log("data: " + inspect(data.toString("latin1")));
  }
});

const chunks = ["foo", " bar", "\r", "\n", "baz, hello\r", "\n world.", "\r\n Node.JS rules!!\r\n\r\n"];
for (const chunk of chunks) {
  fss.push(Buffer.from(chunk));
}

// data: 'foo'
// data: ' bar'
// match!
// data: 'baz, hello'
// match!
// data: ' world.'
// match!
// data: ' Node.JS rules!!'
// match!
// match!

你甚至可以使用不同的 needle:

const { inspect } = require("util");
const FastStreamSearch = require("fast-streamsearch");

const foo = Buffer.from("foo");
const bar = Buffer.from("bar");

const fss = new FastStreamSearch(foo, (isMatch, data) => {
  if (isMatch) {
    console.log(`match ${fss.needle.toString("latin1")}!`);
    if (fss.needle === foo) {
      fss.setNeedle(bar);
    }
  } else {
    console.log("data: " + inspect(data.toString("latin1")));
  }
});

const chunks = ["ab", "foo", "cd", "foo", "ef", "bar"];
for (const chunk of chunks) {
  fss.push(Buffer.from(chunk));
}

// data: 'ab'
// match foo!
// data: 'cd'
// data: 'foo'
// data: 'ef'
// match bar!

API

构造函数

new FastStreamSearch(needle: Buffer, callback: function)

创建一个用于搜索 needle 的实例。当匹配成功时或有肯定不匹配的数据时,回调函数会执行,回调参数如下:

  • isMatch
    • 类型: boolean
    • 说明: 表示是否匹配到了 needle
  • data
    • 类型: Buffer
    • 说明: 不可能匹配上 needle 的数据。当 isMatchtrue 时,data 为空,反之,data 不为空

属性

  • needle
    • 类型: Buffer
    • 说明: 当前匹配过程正在使用的 needle

方法

  • push(chunk: Buffer): 添加新的待搜索 chunk
  • end(): 当流结束时调用,会回调出最后剩余的未匹配数据。
  • setNeedle(needle: Buffer): 更新搜索过程中使用的 needle,只有当匹配成功时可以调用,否则会导致不可预期的后果。
  • init(): 重置内部状态,如果你想搜索一个新的流时,可以使用,否则请不要调用。

streamsearch vs fast-streamsearch

以下结果为搜索一个 4GB 大小的流的耗时情况

为什么 fast-streamsearch 更快

fast-streamsearch 的整个搜索过程采用的是 boyer-moore-horspool 算法,但当待匹配内容可以直接使用 Buffer 原生的 indexOf 方法进行匹配时,会优先使用 indexOf 来进行匹配,并根据 indexOf 的结果推动整体的搜索指针前进。这么做的好处在于原生的 indexOf 通常比 horspool 算法有更快的速度,特别是 needle 较短时,速度优势非常明显,horspool 算法在 needle 较长时会比 indexOf 快一点,但优势不明显。所以 fast-streamsearch 通常会比只基于 horspool 算法的 streamsearch 更快,尤其是 needle 较短时。当 needle 长度为 1 时,甚至可以比 streamsearch 快 100 倍之多,needle 长度在 256 以下时,也有至少 2 倍的性能优势,大部分搜索场景,needle 都不会太长。所以 fast-streamsearch 通常是更好的选择,不过 streamsearch 也是一个很棒的工具。

附上一段 horspool 与 indexOf 速度对比的代码:

function horspool(haystack, needle) {
  const len = needle.length;
  const last = len - 1;
  const move = new Array(256).fill(len);
  for (let i = 0; i < last; i++) {
    move[needle[i]] = last - i;
  }

  let i = 0;
  let j = 0;
  while (i <= haystack.length - len) {
    const start = i;

    while (j < len && haystack[i] === needle[j]) {
      i++;
      j++;
    }

    if (j === len) {
      return start;
    }

    i = start + move[haystack[start + last]];
    j = 0;
  }
  return -1;
}

const haystack = Buffer.allocUnsafe(2 ** 30); // 1GB
for (let i = 0; i < haystack.length; i++) {
  haystack[i] = Math.floor(Math.random() * 128);
}

const needleLen = 1;
const needle = Buffer.allocUnsafe(needleLen);
for (let i = 0; i < needle.length; i++) {
  needle[i] = 128 + Math.floor(Math.random() * 128);
}

console.time("indexOf");
haystack.indexOf(needle);
console.timeEnd("indexOf");

console.time("horspool");
horspool(haystack, needle);
console.timeEnd("horspool");