@suptxt/quickset
v0.4.6
Published
A fast and performant *Least Frequently Used* (LFU) sorted set implementation for working with reasonably sized integers (unsigned). Trades memory for performance, optimised for frequently updating and counting a relatively small set of integers (integer
Downloads
29
Maintainers
Readme
QuickSet
A fast and performant Least Frequently Used (LFU) sorted set implementation for working with reasonably sized integers (unsigned). Trades memory for performance, optimised for frequently updating and counting a relatively small set of integers (integers ranging from 0 to 2^16) or extracting unique integers from a large pool of numbers in one go (integers ranging from 0 to 2^28). Sorted in natural ascending order of integers by default.
Use cases
- Unique integer extraction
- Duplicate integer counting
- Finding top-k most frequent items in one or many lists
- Nearest neighbour finding based on frequency of occurrence
- Sorting random integers for binary operations
- A lightweight key/value dictionary
Documentation
See the full API documentation for in-depth settings and examples.
See the ObservableHQ collection for hands-on examples.
How it works
Once initialised, QuickSet
allocates a TypedArray based on the expected range of integers (numbers between 0 and 2^28) and frequency of occurrence (counts between 0 and 2^32).
Additionally, it keeps track of how often individual integers are added to the set, providing a top-k window of most frequent integers.
Two modes are provided for establishing top-k ranks, minsum
and winsum
.
Both eject the least frequent integer from their ranks upon integer insertion, yielding a ranked 'window' that guarantees the k-most occurring elements of the set to 'bubble up' (ejecting the Least Frequently Used or LFU).
Whereas minsum
overwrites the least frequent integer (i.e. random access), winsum
keeps integers sorted in decreasing order of occurrence (slightly more computationally expensive).
This makes QuickSet
a fast alternative to counting and sorting all elements in a given set, preventing costly sorting operations and returning a ranked window of most frequent integers up till a point of your choosing.
This enables working with frequently occurring items 'earlier' compared to processing and sorting the input data in full, especially if the underlying integers follow a non-uniform distribution.
Quickstart
npm install @suptxt/quickset
After installing, create a QuickSet
by calling:
let set = new QuickSet()
This instantiates a new set with default parameters and a top-k window of 0-length, which may need additional configuring to suit your needs. As a rule of thumb:
- If you are interested in using unweighted set operations only, use
add
orput
for single andunique
for bulk insertions. - If you want to assign weights to integers, use
sum
for single andbatch
for bulk insertions. Updates to the top-k window are only made when theslot
parameter is set.
Methods can be mixed and matched to your liking, but may yield unwanted results if used without caution:
add
,put
andunique
overwrite previous values and do not update the top-k window on integer insertionsum
andbatch
maintain previous values and do update the top-k window on integer insertion
See the tombstoning example for why this is useful.
Tips
- Reuse a single instance
- Randomly switch between modes
- Use multiple QuickSets with a small integer span
- Maintain a
new Map()
for reverse value lookups - Set
freq
to a value higher than 1 for top-k window speed-ups - Subtract the minimum and adjust the
span
parameter to the new maximum expected integer to save on memory when working with a set of large integers - Use multiple QuickSets with custom offsets to increase the maximum integer range
Caveats
- Decreased performance on large sets (>2^24 uniformly distributed integers)
- Only limited top-k slots available (<64)
- No set size parameter yet
- No type checking of unsigned integers
- Reverse iteration not yet implemented
- Backing array resizing not yet implemented
Benchmarks
On a Intel(R) Core(TM) i7-8565U CPU @ 1.80GHz-1.99 GHz with 16GB RAM, the average time to extract unique keys from 5 runs of uniformly distributed random integers:
| Random keys | instance | ms | factor | | -: | :- | -: | :- | | 2^28 | native | size exceeded | - | | = 268 435 456 | QuickSet | 4592 | - | | 2^24 | native | 6095 | - | | = 16 777 216 | QuickSet| 212 | 28x | | 2^16 | native | 4.4 | - | | = 65 536 |QuickSet | 1.3 | 3x |