hyperloglog32
v1.0.0
HyperLogLog using a 32-bit murmurhash3 for node and browser
HyperLogLog distinct value estimator for node and the browser using a 32-bit murmurhash3. Fork of hyperloglog (MIT © Optimizely, Inc). From Wikipedia: HyperLogLog is an algorithm for the count-distinct problem, approximating the number of distinct elements in a multiset (the cardinality).
Jump to: api / install / license
example
Insert two distinct values (one of them twice) into an HLL structure with 12-bit indices. Hashing is done for you:
var HyperLogLog = require('hyperloglog32')
var h = HyperLogLog(12)
h.add('value 1')
h.add('value 2')
h.add('value 1')
h.count() === 2;
api
h = HyperLogLog(n)
Construct an HLL data structure with n-bit indices. This implies 2^n buckets, requiring 2^n octets of memory. A typical value for n is 12, which uses 4096 buckets and yields a standard relative error of about 1.625% (1.04 / sqrt(4096)). Higher values use more memory but provide greater precision. Here's a nice table.
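The trade-off between n, memory, and precision can be sketched with a few lines of arithmetic. This is a minimal illustration, assuming one octet per bucket and the usual HyperLogLog standard error of 1.04 / sqrt(buckets); `hllSizing` is a hypothetical helper and not part of this package.

```javascript
// Hypothetical helper (not part of hyperloglog32): relate the index width n
// to bucket count, memory use, and expected standard error.
function hllSizing(n) {
  var buckets = Math.pow(2, n)                 // 2^n buckets
  return {
    buckets: buckets,
    octets: buckets,                           // assumes one octet per bucket
    standardError: 1.04 / Math.sqrt(buckets)   // usual HLL standard error
  }
}

var sizing = hllSizing(12)
console.log(sizing.buckets)        // 4096
console.log(sizing.standardError)  // about 0.01625, i.e. 1.625%
```

Each extra index bit doubles the memory and shrinks the expected error by a factor of about 1.41 (the square root of 2).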
h.add(string)
Add a value.
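Conceptually, adding a value hashes it to 32 bits, uses some of those bits to pick a bucket, and records how early the first 1-bit appears in the rest. The sketch below shows one common way to do that on an already-hashed integer. It assumes the top n bits select the bucket; this README does not say how the package splits the hash, and `updateRegisters` is a hypothetical name.

```javascript
// Hypothetical sketch of a single HLL update on a precomputed 32-bit hash.
// Assumption: the top n bits pick the bucket and the remaining bits give the
// rank. hyperloglog32 may split the hash differently.
function updateRegisters(registers, n, hash32) {
  var hash = hash32 >>> 0             // force an unsigned 32-bit integer
  var index = hash >>> (32 - n)       // top n bits select the bucket
  var rest = (hash << n) >>> 0        // remaining 32 - n bits, left-aligned
  // rank = 1-based position of the first 1-bit in the remaining bits,
  // capped at 32 - n + 1 when all of them are zero
  var rank = Math.min(Math.clz32(rest) + 1, 32 - n + 1)
  if (rank > registers[index]) registers[index] = rank  // keep the maximum
  return index
}

var registers = new Uint8Array(Math.pow(2, 12))
updateRegisters(registers, 12, 0xABC00001)  // bucket 0xABC gets rank 20
updateRegisters(registers, 12, 0xABC80000)  // same bucket, rank 1: no change
console.log(registers[0xABC])               // 20
```

Because a bucket only ever keeps its maximum rank, adding the same value twice leaves the registers unchanged, which is why duplicates do not affect the estimate.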
h.count()
Get the current estimate of the number of distinct values.
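For intuition, here is a minimal sketch of the standard HyperLogLog estimator applied to an array of registers. It is not taken from this package: `estimateCardinality` is a hypothetical helper, the bias constant assumes at least 128 buckets, and the large-range correction for 32-bit hashes is left out, so `count()` may differ in its details.

```javascript
// Hypothetical sketch of the standard HLL estimator over a register array.
// Assumes registers.length is a power of two >= 128; omits the large-range
// correction. The package's count() may apply different corrections.
function estimateCardinality(registers) {
  var m = registers.length
  var sum = 0
  var zeros = 0
  for (var i = 0; i < m; i++) {
    sum += Math.pow(2, -registers[i])   // harmonic-mean term per bucket
    if (registers[i] === 0) zeros++     // count empty buckets
  }
  var alpha = 0.7213 / (1 + 1.079 / m)  // bias-correction constant
  var raw = alpha * m * m / sum
  // Small-range correction: use linear counting while buckets are still empty
  if (raw <= 2.5 * m && zeros > 0) return m * Math.log(m / zeros)
  return raw
}

var registers = new Uint8Array(4096)
registers[10] = 1
registers[2000] = 3
console.log(Math.round(estimateCardinality(registers)))  // 2
```

With only two occupied buckets out of 4096, the linear-counting branch applies and the estimate rounds to 2.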
h.state()
Get the internal HLL state as a Buffer.
h.merge(h2 || Buffer)
Merge another HLL's state into this HLL. If the incoming data has fewer buckets than this HLL, this one will be folded down to be the same size as the incoming data, with a corresponding loss of precision. If the incoming data has more buckets, it will be folded down as it is merged. The result is that this HLL will be updated as though it had processed all values that were previously processed by either HLL.
var h1 = HyperLogLog(12)
var h2 = HyperLogLog(12)
h1.add('value 1')
h1.add('value 2')
h2.add('value 2')
h2.add('value 3')
h1.merge(h2)
h1.count() === 3;
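Merging works because each bucket stores a maximum: the maximum over the union of two inputs is the larger of the two per-bucket maxima. A minimal sketch of the equal-size case follows. `mergeRegisters` is a hypothetical helper operating on plain register arrays, and the folding step for differently sized HLLs is not shown because its details depend on how the package lays out its buckets.

```javascript
// Hypothetical sketch of merging two equal-size register arrays.
// Folding between different sizes is deliberately not handled here.
function mergeRegisters(target, source) {
  if (target.length !== source.length) {
    throw new Error('this sketch only handles equal sizes')
  }
  for (var i = 0; i < target.length; i++) {
    // keep the larger rank seen by either HLL for this bucket
    if (source[i] > target[i]) target[i] = source[i]
  }
  return target
}

var a = Uint8Array.from([1, 0, 3, 2])
var b = Uint8Array.from([0, 2, 1, 5])
mergeRegisters(a, b)
console.log(Array.from(a))  // [ 1, 2, 3, 5 ]
```

The operation is commutative and idempotent, so merging the same state twice, or in a different order, gives the same registers.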
h.error()
Estimate the relative error for this HLL.
install
With npm do:
npm i hyperloglog32
Use browserify to bundle it for the browser.