npm package discovery and stats viewer.

Discover Tips

  • General search

    [free text search, go nuts!]

  • Package details

    pkg:[package-name]

  • User packages

    @[username]

Sponsor

Optimize Toolset

I’ve always been into building performant and accessible sites, but lately I’ve been taking it extremely seriously. So much so that I’ve been building a tool to help me optimize and monitor the sites that I build to make sure that I’m making an attempt to offer the best experience to those who visit them. If you’re into performant, accessible and SEO friendly sites, you might like it too! You can check it out at Optimize Toolset.

About

Hi, 👋, I’m Ryan Hefner  and I built this site for me, and you! The goal of this site was to provide an easy way for me to check the stats on my npm packages, both for prioritizing issues and updates, and to give me a little kick in the pants to keep up on stuff.

As I was building it, I realized that I was actually using the tool to build the tool, and figured I might as well put this out there and hopefully others will find it to be a fast and useful way to search and browse npm packages as I have.

If you’re interested in other things I’m working on, follow me on Twitter or check out the open source projects I’ve been publishing on GitHub.

I am also working on a Twitter bot for this site to tweet the most popular, newest, random packages from npm. Please follow that account now and it will start sending out packages soon–ish.

Open Software & Tools

This site wouldn’t be possible without the immense generosity and tireless efforts from the people who make contributions to the world and share their work via open source initiatives. Thank you 🙏

© 2024 – Pkg Stats / Ryan Hefner

sparse-array-rled

v2.0.1

Published

Sparse array with run-length encoded deletions

Downloads

1,488

Readme

sparse-array-rled

Sparse array with run-length encoded deletions

npm i --save sparse-array-rled

About

This package provides SparseArray<T>, a sparse array with values of type T.

SparseArray<T> behaves similarly to an ordinary Array<T> used in sparse mode. However, it is additionally optimized for the following tasks:

  1. Convert between the array and a compact JSON representation with run-length encoded deletions (SerializedSparseArray<T>). For example, the sparse array ["foo", "bar", , , , "X", "yy"] serializes to [["foo", "bar"], 3, ["X", "yy"]].
  2. Iterate over present values only.
  3. Convert between a count c and the c-th present entry.

For ordinary array tasks, SparseArray aims to have comparable memory usage and acceptable speed relative to an ordinary Array. However, indexed accesses are slower, due to internal searches.

For special cases, SparseString and SparseIndices implement the same functionality with additional optimizations:

  • SparseString is functionally identical to a SparseArray with single-char values, but it uses strings (e.g. "abc") instead of arrays (e.g. ["a", "b", "c"]) in its internal state and serialized form. This typically uses less memory (2x in our benchmarks) and results in smaller JSON, though with a slight cost in mutation speed. SparseString<E> can also contain embedded objects of type E (e.g., an image inline in a text document), each taking the place of one char.
  • SparseIndices is functionally identical to a SparseArray, except that it only stores which indices are present, not their associated values. This typically uses much less memory (4x in our benchmarks) and results in much smaller JSON.

Example Use Cases

I use this package in collaborative situations, where individual users perform actions in order, but some of these actions may be deleted/undone/not-yet-received - causing sparsity.

  1. Collaborative text/list editing: Group sequential insertions by a single user into "bunches", which map a bunch ID to its sequence of values. Later, some values may be deleted, making the sequence sparse.
    • This is how list-positions represents the state of a List<T>: as a Map<bunchID, SparseArray<T>>.
  2. General collaboration or peer-to-peer networking: Track which messages you've received from another user/peer using a SparseIndices. Typically, this will be a single number ("all messages 0 through n-1"), but dropped/reverted messages could make the indices sparse.
    • A Map<peerID, SparseIndices> generalizes vector clocks and provides a space-optimized alternative to dotted vector clocks, described here.

Usage

Create and mutate a sparse array:

import { SparseArray } from "sparse-array-rled";

const arr = SparseArray.new<string>();
arr.set(0, "a");
arr.set(1, "b");
arr.set(1, "c");
arr.set(4, "d");
arr.delete(0);

console.log([...arr.entries()]); // Prints [[1, 'c'], [4, 'd']]

Basic queries:

arr.get(1); // 'c'
arr.get(4); // 'd'
arr.get(0); // undefined
arr.get(10000); // undefined

arr.has(1); // true
arr.has(0); // false

// Length is the last present index + 1 (or 0 if empty).
console.log(arr.length); // Prints 5
// Note: All methods accept index arguments `>= this.length`, acting as if
// the array ends with infinitely many deleted indices (empty slots).

Queries that only consider present values:

const arr2 = SparseArray.new<string>();
arr2.set(0, "e");
arr2.set(1, "f");
arr2.set(5, "g");
arr2.set(6, "h");

// Total present values.
arr2.count(); // 4

// Present values up to but excluding a given index.
arr2.countAt(4); // 2
arr2.countAt(6); // 3

// Find the c-th present index, or -1 if c is too large.
arr2.indexOfCount(1); // 1
arr2.indexOfCount(2); // 5
arr2.indexOfCount(5); // -1
arr2.indexOfCount(1000); // -1

Bulk mutations are specifically optimized:

const arr3 = SparseArray.new<string>();

// Set multiple values (the rest parameters).
arr3.set(0, "m", "n", "o", "p", "q");
// Delete multiple values (the second arg, which says how many to delete -
// *not* the index to end at.).
arr3.delete(3, 2);

console.log([...arr3.entries()]); // Prints [[0, 'm'], [1, 'n'], [2, 'o']]

Mutations return the previous values as a SparseArray:

// arr3 starts as above: entries [[0, 'm'], [1, 'n'], [2, 'o']].
const previous = arr3.delete(1, 5);
console.log([...previous.entries()]); // Prints [[0, 'n'], [1, 'o']]
console.log(previous.length); // Prints 2 (last present index + 1) - not necessarily the delete count.

Serialized form

The serialized form, SerializedSparseArray<T>, uses run-length encoded deletions. Specifically, it is an array that alternates between:

  • arrays of present values, and
  • numbers, representing that number of deleted indices (empty slots).

For example:

const arr4 = SparseArray.new<string>();
arr4.set(0, "foo");
arr4.set(1, "bar");
arr4.set(5, "X");
arr4.set(6, "yy");

console.log(arr4.serialize()); // Prints [['foo', 'bar'], 3, ['X', 'yy']]

Deserialize with const arr3 = SparseArray.deserialize(serialized).

arr.toString() returns the JSON-encoded serialized form.

Iterators

entries() and keys() have the same signature as Array, but they do not visit empty slots (as seen in the code snippets above). items() is an optimized alternative that iterates over runs of present values.

newSlicer() is an additional iterator that lets you iterate through the array one "slice" at a time.

Iterators (entries, keys, newSlicer) are invalidated by concurrent mutations, unlike the built-in Array iterators (but like most Java Collections). We do not attempt to detect concurrent mutations and throw errors.

SparseString and SparseIndices

SparseString and SparseIndices have essentially the same API as SparseArray<T>, except that values: T[] is replaced by chars: string for SparseString and count: number for SparseIndices.

Internals

Internally, the state of a SparseArray<T> is stored as a singly-linked list of nodes, where each node represents either an array of present values or a number of deleted indices. The nodes are normalized so that they are never empty and adjacent nodes cannot be merged any further. In other words, a SparseArray<T>'s internal state is a singly-linked list representation of its serialized state. (Exception: The linked list may end with a deleted node, while the serialized state will not.)

To reduce repetition and code size, most functionality for the three exported classes (SparseArray, SparseString, SparseIndices) is inherited from a common superclass, SparseItems<I>. It is a template class that defines mutations and queries in terms of items of type I, which are the content of present nodes: T[] for SparseArray, string | E for SparseString<E>, number for SparseIndices.

Performance

To benchmark the library, I applied the operations corresponding to a collaborative text-editing trace (Martin Kleppmann's automerge-perf), simulating this library's usage by the list-positions library as described above. The trace uses 3301 sparse arrays with average final length 40.4 (max final length 7352). It is 260k ops long, with 182k sets and 77k deletes, and the ending state has 105k chars.

In addition to this library's classes, the benchmarks test two ways of using a plain Array<string> in sparse mode (see benchmarks/impls/plain_array.ts).

Results:

| Implementation | Total time (ms) | Ending memory usage (MB) | | -------------- | --------------- | ------------------------ | | SparseArray | 81.2 +- 9.0 | 1.91 | | SparseString | 80.9 +- 5.8 | 1.10 | | SparseIndices | 63.3 +- 7.5 | 0.47 | | PlainArray | 76.7 +- 10.5 | 2.02 | | PlainArray2 | 51.8 +- 6.9 | 1.90 |

For additional microbenchmarks, see benchmark_results.txt, which reports the time to perform 1,000,000 operations of various types (implemented in benchmarks/traces.ts).