
tiny-toatie-cache v1.4.1

Prototype cache system for sliding window lookup operations

Tiny Toatie Cache

This library caches search operations performed against a Node.js Buffer. The maximum size of the underlying storage Buffer is bounded (the upper bound is specified on instantiation). New data is appended to the front and, if the upper size limit is hit, the oldest items are deleted from the back.

A search operation determines whether a key exists within the Buffer and, if it does, what its offset is. It is primarily designed for algorithms that use a sliding window.
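The bounded, append-to-front behaviour can be sketched roughly like this (the class and method names are illustrative only, not the library's actual internals):

```javascript
// Sketch of a bounded buffer: new data goes on the front, and the
// oldest bytes fall off the back once capacity is reached.
class BoundedFrontBuffer {
  constructor(capacity) {
    this.capacity = capacity;
    this.buffer = Buffer.alloc(0);
  }

  append(data) {
    // Prepend the new data, then trim the oldest bytes from the back.
    this.buffer = Buffer.concat([data, this.buffer]);
    if (this.buffer.length > this.capacity) {
      this.buffer = this.buffer.subarray(0, this.capacity);
    }
  }

  find(key) {
    return this.buffer.indexOf(key);
  }
}

const buf = new BoundedFrontBuffer(4);
buf.append(Buffer.from([0x01, 0x02]));
buf.append(Buffer.from([0x03, 0x04, 0x05]));
// Capacity is 4, so the oldest byte (0x02) has been evicted;
// buf.buffer is now <Buffer 03 04 05 01>.
```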

Use case

You interact with this library the same way you interact with an associative array (such as a hash map). The main feature Tiny Toatie Cache offers is a reasonably strict bound on the amount of memory it can use. The first time you perform a lookup for a specific key, it calls Buffer.indexOf(key); for each subsequent call, the internal cache short-circuits the need for a repeat call to indexOf. Offsets are tracked relative to an internal pointer, which means cache entries still have enough information to tell you a key's offset within the current internal Buffer (rather than what it was when the entry was originally cached). Expired cache entries are never returned, because the cache can determine the validity of an entry before returning it, with negligible overhead.
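The pointer-relative offset idea can be illustrated with a minimal sketch (again, hypothetical names, not the library's real internals): store each hit as a position relative to a running "total bytes ever appended" counter, then translate back to a current offset on every later lookup.

```javascript
// Sketch: cached offsets stay valid while the buffer shifts, because
// each entry is stored relative to an ever-growing append counter.
class OffsetCache {
  constructor() {
    this.totalAppended = 0;   // internal pointer: bytes appended so far
    this.entries = new Map(); // key (hex string) -> pointer-relative position
    this.buffer = Buffer.alloc(0);
  }

  append(data) {
    this.buffer = Buffer.concat([data, this.buffer]);
    this.totalAppended += data.length;
  }

  find(key) {
    const id = key.toString('hex');
    const cached = this.entries.get(id);
    if (cached !== undefined) {
      // Translate the stored position into a current offset.
      const offset = this.totalAppended - cached;
      if (offset >= 0 && offset + key.length <= this.buffer.length) {
        return offset;           // cache hit: no indexOf needed
      }
      this.entries.delete(id);   // entry has aged out of the buffer
    }
    const offset = this.buffer.indexOf(key); // cold lookup
    if (offset !== -1) {
      this.entries.set(id, this.totalAppended - offset);
    }
    return offset;
  }
}
```

After more data is appended to the front, the cached entry still resolves to the key's new, shifted offset without re-running indexOf.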

This was made for an experimental implementation of LZ77, a streaming compression algorithm. It is called 'streaming' because each compression packet can be decompressed in sequential order, producing the next chunk of clear-text data; some other compression algorithms must decompress the entire stream of compression packets before they can get at any of the clear text. LZ77 also doesn't need to be distributed with a static dictionary: it performs search operations against a sliding window, which fulfils a role equivalent to a dictionary.
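A naive LZ77-style match search against the sliding window can be sketched as follows (function and parameter names are illustrative; this is the unoptimized approach, not this library's cached one):

```javascript
// Find the longest prefix of input[pos..] that occurs in the window.
// Tries the longest candidate first and stops at the first match.
function findLongestMatch(window, input, pos, maxLen) {
  for (let len = Math.min(maxLen, input.length - pos); len > 0; --len) {
    const candidate = input.subarray(pos, pos + len);
    const offset = window.indexOf(candidate);
    if (offset !== -1) {
      // A match becomes an (offset, length) back-reference.
      return { offset, length: len };
    }
  }
  return null; // no match: the compressor would emit a literal byte
}

const window = Buffer.from('abcabc');
const input = Buffer.from('abcx');
// 'abcx' is not in the window, but 'abc' is, at offset 0:
console.log(findLongestMatch(window, input, 0, 8));
```

Every call may scan the whole window via indexOf, which is what makes the naive version so slow on large dictionaries.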

I wrote a prototype of LZ77 and, after finally creating something reliable, found the performance, in terms of both speed and compression, to be quite horrific. The main problem is that the dictionary buffer needs to be in the region of 2 megabytes (2,048,000 bytes) for good space savings, but that means up to 2,048,000 comparisons every time it has to find the next back-reference. In the worst case, LZ77 could have to perform a full search across the entire history buffer for every single byte in its input stream, so without any sort of optimization my prototype is O(n^2). Even with the more modest dictionary size the LZ77 prototype actually uses, it performs terribly.

I originally spent time looking for a data structure to represent the history buffer. My naive stab at a solution would be some sort of tree with O(1) insertion and O(log N) lookup (O(N log N) in context). The big problem with a tree is the sheer size of the input streams it needs to handle (consider how big a 2,048,000-byte (2 MB) file actually is when you process it a byte at a time), which means you will almost always hit the dictionary size limit. You cannot exceed the dictionary size without causing both data-integrity issues (the decompression process needs to use the same value as the compressor) and memory-space issues. With a tree-like structure, you'd either need a very complex purging mechanism or you'd have to rebuild the tree constantly.

Caching will hopefully prove simple and fast with low housekeeping overhead, but it's ultimately just a research project.

Technical design

The library uses cloakroom-smart-buffer-proxy (created specifically for this library) behind the scenes as a sort of bookkeeper.

Building

/tiny-toatie-cache <master> % yarn install
/tiny-toatie-cache <master> % yarn build

Unit tests

The unit tests use Jest and the Yarn command below runs them.

tiny-toatie-cache ‹master*› % yarn test
yarn run v1.13.0
$ jest .spec.js --coverage
 PASS  __tests__/search/search.spec.js
 PASS  __tests__/cache/lookup-handlers/cache-lookup-handler.spec.js
 PASS  __tests__/cache/lookup-handlers/cache-short-circuit-check.spec.js
 PASS  __tests__/cache/lookup-handlers/cold-lookup-handler.spec.js
 PASS  __tests__/cache/lookup-dispatcher.spec.js
 PASS  __tests__/cache-store/store.spec.js
 PASS  __tests__/cache/cache.spec.js
 PASS  __tests__/public-factory/index.spec.js
-------------------------------|----------|----------|----------|----------|-------------------|
File                           |  % Stmts | % Branch |  % Funcs |  % Lines | Uncovered Line #s |
-------------------------------|----------|----------|----------|----------|-------------------|
All files                      |      100 |      100 |      100 |      100 |                   |
 cache                         |      100 |      100 |      100 |      100 |                   |
  index.js                     |      100 |      100 |      100 |      100 |                   |
  lookup-dispatcher.js         |      100 |      100 |      100 |      100 |                   |
 cache-store                   |      100 |      100 |      100 |      100 |                   |
  index.js                     |      100 |      100 |      100 |      100 |                   |
 cache/lookup-handlers         |      100 |      100 |      100 |      100 |                   |
  cache-lookup-handler.js      |      100 |      100 |      100 |      100 |                   |
  cache-short-circuit-check.js |      100 |      100 |      100 |      100 |                   |
  cold-lookup-handler.js       |      100 |      100 |      100 |      100 |                   |
  handler-response-enum.js     |        0 |        0 |        0 |        0 |                   |
  result-source-enum.js        |        0 |        0 |        0 |        0 |                   |
 public-factory                |      100 |      100 |      100 |      100 |                   |
  index.js                     |      100 |      100 |      100 |      100 |                   |
 search                        |      100 |      100 |      100 |      100 |                   |
  index.js                     |      100 |      100 |      100 |      100 |                   |
-------------------------------|----------|----------|----------|----------|-------------------|

Test Suites: 8 passed, 8 total
Tests:       64 passed, 64 total
Snapshots:   0 total
Time:        3.455s
Ran all test suites matching /.spec.js/i.
✨  Done in 4.49s.

Adding to your project

First, add the dependency to your project like so:

/your-rad-project ‹master*› % yarn add tiny-toatie-cache

And then you just use the library like the example below (use import syntax if you have Babel configured for it):

const ToatieCache = require('tiny-toatie-cache');

const cacheStorageSize = 1234456;
const cache = ToatieCache.build(cacheStorageSize);

cache.append(Buffer.from([0x66, 0x84, 0x33, 0x56, 0x11, 0x34, 0x89, 0xff]));
for (let i = 0; i < 64000; ++i) {
  cache.append(Buffer.from([0xff]));
}

cache.on('hit', timeTook => {
  console.log(`Cache hit! Operation time: ${timeTook}ms`);
});

cache.on('miss', timeTook => {
  console.log(`Cache miss! Operation time: ${timeTook}ms`);
});

/** First call, a cold lookup.
 *
 *  response: `{ offset: 64005, value: <Buffer 33 56>, length: 2 }`
 **/
console.log(cache.find(Buffer.from([0x33, 0x56])));

/** Second call, a cache hit
 *
 *  response: `{ offset: 64005, value: <Buffer 33 56>, length: 2 }`
 **/
console.log(cache.find(Buffer.from([0x33, 0x56])));

/** Third call, a cache hit
 *
 *  response: `{ offset: 64005, value: <Buffer 33 56>, length: 2 }`
 **/
console.log(cache.find(Buffer.from([0x33, 0x56])));

for (let i = 0; i < 16; ++i) {
  cache.append(Buffer.from([0xff]));
}

/** Fourth call, after appending data, cache hit
 *
 *  response: `{ offset: 64021, value: <Buffer 33 56>, length: 2 }`
 **/
console.log(cache.find(Buffer.from([0x33, 0x56])));

And its output:

/your-rad-project ‹master*› % node sample.js
Cache miss! Operation time: 20ms
Cache hit! Operation time: 1ms
Cache hit! Operation time: 0ms

Performance

driving-range-test.js tests scenarios by generating a set of random words (the exact number is controlled by input_key_count), appending them to the cache, and then performing a set of search queries against the cache. number_of_search_attempts controls the number of queries it makes, and only_dictionary_search_queries controls whether it uses words it knows are in the dictionary or random words.

It performs one test with the cache off as a control, then another with the cache enabled. Both experiments are timed, and all of the configuration is displayed.
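The timing pattern behind these experiments might look roughly like this (a hedged sketch; the real driving-range scripts differ, and the names here are illustrative):

```javascript
// Run a batch of lookups through `lookupFn` and report the elapsed time,
// mirroring the control-vs-cached comparison described above.
function timeExperiment(label, lookupFn, queries) {
  const start = process.hrtime.bigint();
  for (const q of queries) {
    lookupFn(q);
  }
  const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;
  console.log(`${label}: ${elapsedMs.toFixed(3)}ms`);
  return elapsedMs;
}

// Example: time a trivial lookup function over three queries.
timeExperiment('Control experiment', q => q * 2, [1, 2, 3]);
```

Running the same query set through the uncached and cached lookup paths and comparing the two elapsed times gives the numbers shown in the logs below.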

cache-on-vs-cache-off-no-negative-queries.js

tiny-toatie-cache ‹master*› % node performance/driving-range/cache-on-vs-cache-off-no-negative-queries.js

Test label: Control experiment (cache disabled)
Cache hits: 0, Cache misses: 1000000
Experiment configuration:
{ input_key_count: 2560,
  dictionary_size: 2560000,
  words_per_key: 2,
  only_dictionary_search_queries: true,
  number_of_search_attempts: 1000000,
  cache_bypass: true }
Experiment time taken: 9063.742370999998


Test label: Cache enabled experiment
Cache hits: 0, Cache misses: 1000000
Experiment configuration:
{ input_key_count: 2560,
  dictionary_size: 2560000,
  words_per_key: 2,
  only_dictionary_search_queries: true,
  number_of_search_attempts: 1000000,
  cache_bypass: false }
Experiment time taken: 1330.6336329999995

cache-on-vs-cache-off-randomised-queries.js

tiny-toatie-cache ‹master*› % node performance/driving-range/cache-on-vs-cache-off-randomised-queries.js

Test label: Control experiment (cache disabled)
Cache hits: 0, Cache misses: 1000000
Experiment configuration:
{ input_key_count: 2560,
  dictionary_size: 2560000,
  words_per_key: 2,
  only_dictionary_search_queries: false,
  number_of_search_attempts: 1000000,
  cache_bypass: true }
Experiment time taken: 17834.606991


Test label: Cache enabled experiment
Cache hits: 0, Cache misses: 3545
Experiment configuration:
{ input_key_count: 2560,
  dictionary_size: 2560000,
  words_per_key: 2,
  only_dictionary_search_queries: false,
  number_of_search_attempts: 1000000,
  cache_bypass: false }
Experiment time taken: 17229.577994