sparse-array-rled
v2.0.1
Published
Sparse array with run-length encoded deletions
Downloads
1,488
Maintainers
Readme
sparse-array-rled
Sparse array with run-length encoded deletions
npm i --save sparse-array-rled
About
This package provides SparseArray<T>
, a sparse array with values of type T
.
SparseArray<T>
behaves similarly to an ordinary Array<T>
used in sparse mode.
However, it is additionally optimized for the following tasks:
- Convert between the array and a compact JSON representation
with run-length encoded deletions (
SerializedSparseArray<T>
). For example, the sparse array["foo", "bar", , , , "X", "yy"]
serializes to[["foo", "bar"], 3, ["X", "yy"]]
. - Iterate over present values only.
- Convert between a count
c
and thec
-th present entry.
For ordinary array tasks, SparseArray
aims to have comparable
memory usage and acceptable speed relative to an ordinary Array
. However, indexed accesses are slower, due to internal searches.
For special cases, SparseString
and SparseIndices
implement the same functionality with additional optimizations:
SparseString
is functionally identical to aSparseArray
with single-char values, but it uses strings (e.g."abc"
) instead of arrays (e.g.["a", "b", "c"]
) in its internal state and serialized form. This typically uses less memory (2x in our benchmarks) and results in smaller JSON, though with a slight cost in mutation speed.SparseString<E>
can also contain embedded objects of typeE
(e.g., an image inline in a text document), each taking the place of one char.SparseIndices
is functionally identical to aSparseArray
, except that it only stores which indices are present, not their associated values. This typically uses much less memory (4x in our benchmarks) and results in much smaller JSON.
Example Use Cases
I use this package in collaborative situations, where individual users perform actions in order, but some of these actions may be deleted/undone/not-yet-received - causing sparsity.
- Collaborative text/list editing: Group sequential insertions by a single user into "bunches", which map a bunch ID to its sequence of values. Later, some values may be deleted, making the sequence sparse.
- This is how list-positions represents the state of a
List<T>
: as aMap<bunchID, SparseArray<T>>
.
- This is how list-positions represents the state of a
- General collaboration or peer-to-peer networking: Track which messages you've received from another user/peer using a
SparseIndices
. Typically, this will be a single number ("all messages 0 through n-1"), but dropped/reverted messages could make the indices sparse.- A
Map<peerID, SparseIndices>
generalizes vector clocks and provides a space-optimized alternative to dotted vector clocks, described here.
- A
Usage
Create and mutate a sparse array:
import { SparseArray } from "sparse-array-rled";
const arr = SparseArray.new<string>();
arr.set(0, "a");
arr.set(1, "b");
arr.set(1, "c");
arr.set(4, "d");
arr.delete(0);
console.log([...arr.entries()]); // Prints [[1, 'c'], [4, 'd']]
Basic queries:
arr.get(1); // 'c'
arr.get(4); // 'd'
arr.get(0); // undefined
arr.get(10000); // undefined
arr.has(1); // true
arr.has(0); // false
// Length is the last present index + 1 (or 0 if empty).
console.log(arr.length); // Prints 5
// Note: All methods accept index arguments `>= this.length`, acting as if
// the array ends with infinitely many deleted indices (empty slots).
Queries that only consider present values:
const arr2 = SparseArray.new<string>();
arr2.set(0, "e");
arr2.set(1, "f");
arr2.set(5, "g");
arr2.set(6, "h");
// Total present values.
arr2.count(); // 4
// Present values up to but excluding a given index.
arr2.countAt(4); // 2
arr2.countAt(6); // 3
// Find the c-th present index, or -1 if c is too large.
arr2.indexOfCount(1); // 1
arr2.indexOfCount(2); // 5
arr2.indexOfCount(5); // -1
arr2.indexOfCount(1000); // -1
Bulk mutations are specifically optimized:
const arr3 = SparseArray.new<string>();
// Set multiple values (the rest parameters).
arr3.set(0, "m", "n", "o", "p", "q");
// Delete multiple values (the second arg, which says how many to delete -
// *not* the index to end at.).
arr3.delete(3, 2);
console.log([...arr3.entries()]); // Prints [[0, 'm'], [1, 'n'], [2, 'o']]
Mutations return the previous values as a SparseArray
:
// arr3 starts as above: entries [[0, 'm'], [1, 'n'], [2, 'o']].
const previous = arr3.delete(1, 5);
console.log([...previous.entries()]); // Prints [[0, 'n'], [1, 'o']]
console.log(previous.length); // Prints 2 (last present index + 1) - not necessarily the delete count.
Serialized form
The serialized form, SerializedSparseArray<T>
, uses run-length encoded deletions. Specifically, it is an array that alternates between:
- arrays of present values, and
- numbers, representing that number of deleted indices (empty slots).
For example:
const arr4 = SparseArray.new<string>();
arr4.set(0, "foo");
arr4.set(1, "bar");
arr4.set(5, "X");
arr4.set(6, "yy");
console.log(arr4.serialize()); // Prints [['foo', 'bar'], 3, ['X', 'yy']]
Deserialize with const arr3 = SparseArray.deserialize(serialized)
.
arr.toString()
returns the JSON-encoded serialized form.
Iterators
entries()
and keys()
have the same signature as Array
, but they do not visit empty slots (as seen in the code snippets above). items()
is an optimized alternative that iterates over runs of present values.
newSlicer()
is an additional iterator that lets you iterate through the array one "slice" at a time.
Iterators (entries
, keys
, newSlicer
) are invalidated by concurrent mutations, unlike the built-in Array
iterators (but like most Java Collections). We do not attempt to detect concurrent mutations and throw errors.
SparseString and SparseIndices
SparseString
and SparseIndices
have essentially the same API as SparseArray<T>
, except that values: T[]
is replaced by chars: string
for SparseString
and count: number
for SparseIndices
.
Internals
Internally, the state of a SparseArray<T>
is stored as a singly-linked list of nodes, where each node represents either an array of present values or a number of deleted indices. The nodes are normalized so that they are never empty and adjacent nodes cannot be merged any further. In other words, a SparseArray<T>
's internal state is a singly-linked list representation of its serialized state. (Exception: The linked list may end with a deleted node, while the serialized state will not.)
To reduce repetition and code size, most functionality for the three exported classes (SparseArray
, SparseString
, SparseIndices
) is inherited from a common superclass, SparseItems<I>
. It is a template class that defines mutations and queries in terms of items of type I
, which are the content of present nodes: T[]
for SparseArray
, string | E
for SparseString<E>
, number
for SparseIndices
.
Performance
To benchmark the library, I applied the operations corresponding to a collaborative text-editing trace (Martin Kleppmann's automerge-perf), simulating this library's usage by the list-positions library as described above. The trace uses 3301 sparse arrays with average final length 40.4 (max final length 7352). It is 260k ops long, with 182k sets and 77k deletes, and the ending state has 105k chars.
In addition to this library's classes, the benchmarks test two ways of using a plain Array<string>
in sparse mode (see benchmarks/impls/plain_array.ts).
Results:
| Implementation | Total time (ms) | Ending memory usage (MB) | | -------------- | --------------- | ------------------------ | | SparseArray | 81.2 +- 9.0 | 1.91 | | SparseString | 80.9 +- 5.8 | 1.10 | | SparseIndices | 63.3 +- 7.5 | 0.47 | | PlainArray | 76.7 +- 10.5 | 2.02 | | PlainArray2 | 51.8 +- 6.9 | 1.90 |
For additional microbenchmarks, see benchmark_results.txt, which reports the time to perform 1,000,000 operations of various types (implemented in benchmarks/traces.ts).