@suptxt/quickset

v0.4.6

Published

a year ago

A fast and performant *Least Frequently Used* (LFU) sorted set implementation for working with reasonably sized integers (unsigned). Trades memory for performance, optimised for frequently updating and counting a relatively small set of integers (integer

Downloads

QuickSet

A fast and performant Least Frequently Used (LFU) sorted set implementation for working with reasonably sized integers (unsigned). Trades memory for performance, optimised for frequently updating and counting a relatively small set of integers (integers ranging from 0 to 2^16) or extracting unique integers from a large pool of numbers in one go (integers ranging from 0 to 2^28). Sorted in natural ascending order of integers by default.

Use cases

Unique integer extraction
Duplicate integer counting
Finding top-k most frequent items in one or many lists
Nearest neighbour finding based on frequency of occurrence
Sorting random integers for binary operations
A lightweight key/value dictionary

Documentation

See the full API documentation for in-depth settings and examples.

See the ObservableHQ collection for hands-on examples.

How it works

Once initialised, QuickSet allocates a TypedArray based on the expected range of integers (numbers between 0 and 2^28) and frequency of occurrence (counts between 0 and 2^32). Additionally, it keeps track of how often individual integers are added to the set, providing a top-k window of most frequent integers.

Two modes are provided for establishing top-k ranks, minsum and winsum. Both eject the least frequent integer from their ranks upon integer insertion, yielding a ranked 'window' that guarantees the k-most occurring elements of the set to 'bubble up' (ejecting the Least Frequently Used or LFU). Whereas minsum overwrites the least frequent integer (i.e. random access), winsum keeps integers sorted in decreasing order of occurrence (slightly more computationally expensive).

This makes QuickSet a fast alternative to counting and sorting all elements in a given set, preventing costly sorting operations and returning a ranked window of most frequent integers up till a point of your choosing. This enables working with frequently occurring items 'earlier' compared to processing and sorting the input data in full, especially if the underlying integers follow a non-uniform distribution.

Quickstart

npm install @suptxt/quickset

After installing, create a QuickSet by calling:


let set = new QuickSet()

This instantiates a new set with default parameters and a top-k window of 0-length, which may need additional configuring to suit your needs. As a rule of thumb:

If you are interested in using unweighted set operations only, use add or put for single and unique for bulk insertions.
If you want to assign weights to integers, use sum for single and batch for bulk insertions. Updates to the top-k window are only made when the slot parameter is set.

Methods can be mixed and matched to your liking, but may yield unwanted results if used without caution:

add , put and unique overwrite previous values and do not update the top-k window on integer insertion
sum and batch maintain previous values and do update the top-k window on integer insertion

See the tombstoning example for why this is useful.

Tips

Reuse a single instance
Randomly switch between modes
Use multiple QuickSets with a small integer span
Maintain a new Map() for reverse value lookups
Set freq to a value higher than 1 for top-k window speed-ups
Subtract the minimum and adjust the span parameter to the new maximum expected integer to save on memory when working with a set of large integers
Use multiple QuickSets with custom offsets to increase the maximum integer range

Caveats

Decreased performance on large sets (>2^24 uniformly distributed integers)
Only limited top-k slots available (<64)
No set size parameter yet
No type checking of unsigned integers
Reverse iteration not yet implemented
Backing array resizing not yet implemented

Benchmarks

On a Intel(R) Core(TM) i7-8565U CPU @ 1.80GHz-1.99 GHz with 16GB RAM, the average time to extract unique keys from 5 runs of uniformly distributed random integers:

| Random keys | instance | ms | factor | | -: | :- | -: | :- | | 2^28 | native | size exceeded | - | | = 268 435 456 | QuickSet | 4592 | - | | 2^24 | native | 6095 | - | | = 16 777 216 | QuickSet| 212 | 28x | | 2^16 | native | 4.4 | - | | = 65 536 |QuickSet | 1.3 | 3x |

Published

Vulnerabilities

Links

Maintainers

Keywords

Readme