cvm-lib

v0.1.1

CVM Library

Estimate the number of distinct values in a set using the simple and space-efficient CVM algorithm.

Getting Started

Install

NPM:

npm install cvm-lib

Yarn:

yarn add cvm-lib

JSR:

jsr add @rojas/cvm

Examples

See the examples/ directory for all examples.

Hamlet

Estimate unique words in Shakespeare's Hamlet:

node ./examples/hamlet/index.js
  • Total words: 31991
  • CVM capacity: 2161
  • Expected uniques: 4762 ± 10.00%
  • Estimated uniques: 4728 (-0.71%)

1M

Estimate unique integers among 1 million random integers:

node ./examples/hamlet/index.js
  • Total values: 1000000
  • CVM capacity: 10631
  • Expected uniques: 994384 ± 5.00%
  • Estimated uniques: 996480 (0.21%)
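
The percentage in parentheses on the last line of each run appears to be the signed relative error of the estimate against the expected count. A minimal sketch of that arithmetic follows; `relativeErrorPct` is a hypothetical helper name, not part of cvm-lib.

```javascript
// Hypothetical helper, not part of cvm-lib:
// signed relative error of an estimate, expressed as a percentage.
function relativeErrorPct(estimated, expected) {
  return ((estimated - expected) / expected) * 100;
}

console.log(relativeErrorPct(4728, 4762).toFixed(2)); // -0.71 (Hamlet)
console.log(relativeErrorPct(996480, 994384).toFixed(2)); // 0.21 (1M)
```

Both results fall well inside the ± 10.00% and ± 5.00% bounds reported for the two runs.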

API

Functions

calculateCapacity(n, epsilon?, delta?)

Calculates the space required to estimate the number of distinct values in a set with a given accuracy and confidence.

  • n: The total number of values in the set, or an estimate if unknown. Must be a positive number.
  • epsilon (optional): An estimate's relative error. Controls accuracy. Must be between 0 and 1. Defaults to 0.05.
  • delta (optional): The probability an estimate is not accurate. Controls confidence. Must be between 0 and 1. Defaults to 0.01.
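
The capacities printed in the examples above (2161 for Hamlet, 10631 for 1M) are consistent with the bound ceil(log2(n / delta) / epsilon^2). The sketch below reproduces those two numbers. It is inferred from the example output rather than taken from the library's source, `sketchCapacity` is a hypothetical name, and the Hamlet run is assumed to use the default delta of 0.01.

```javascript
// Hypothetical re-derivation, not cvm-lib's calculateCapacity.
// Assumed formula: ceil(log2(n / delta) / epsilon^2).
function sketchCapacity(n, epsilon = 0.05, delta = 0.01) {
  if (!(n > 0)) throw new RangeError("n must be a positive number");
  if (!(epsilon > 0 && epsilon < 1)) {
    throw new RangeError("epsilon must be between 0 and 1");
  }
  if (!(delta > 0 && delta < 1)) {
    throw new RangeError("delta must be between 0 and 1");
  }
  return Math.ceil(Math.log2(n / delta) / epsilon ** 2);
}

console.log(sketchCapacity(31991, 0.1)); // 2161 (Hamlet, epsilon = 10%)
console.log(sketchCapacity(1000000)); // 10631 (1M, default epsilon = 5%)
```

Under this formula, halving epsilon roughly quadruples the capacity, while n and delta only contribute logarithmically.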

Classes

Estimator<T>

Estimates the number of distinct values in a set using the CVM algorithm.

  • Constructors

    • new (capacity): Creates an instance with the given capacity, which must be a positive integer.
    • new (config): Creates an instance from a config object (see EstimatorConfig).
  • Properties

    • capacity: Gets the maximum number of samples in memory.
    • randomFn: Gets or sets the random number generator function (e.g. Math.random).
    • sampleRate: Gets the base sample rate (e.g. 0.5).
    • size: Gets the number of samples in memory.
  • Methods

    • add(value): Adds a value.
    • clear(): Clears/resets the instance.
    • estimate(): Gets the estimated number of distinct values.
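
For orientation, the basic CVM procedure from the source paper can be sketched in a few lines. This is a minimal illustration under stated assumptions, not cvm-lib's implementation: `SketchEstimator` is a hypothetical class, it hard-codes a sample rate of 0.5, and it re-halves in a loop when a halving round frees no space (the paper's version reports failure in that case).

```javascript
// Hypothetical sketch of the CVM algorithm; not cvm-lib's source code.
class SketchEstimator {
  constructor(capacity, randomFn = Math.random) {
    if (!Number.isInteger(capacity) || capacity < 1) {
      throw new RangeError("capacity must be a positive integer");
    }
    this.capacity = capacity;
    this.randomFn = randomFn;
    this.samples = new Set();
    this.rate = 1; // current probability that an incoming value is kept
  }

  get size() {
    return this.samples.size;
  }

  add(value) {
    // Forget any earlier decision about this value...
    this.samples.delete(value);
    // ...then skip it with probability 1 - rate.
    if (this.randomFn() >= this.rate) return;
    this.samples.add(value);
    // Once the buffer is full, drop each sample with probability 1/2
    // and halve the rate so that future values are kept less often.
    while (this.samples.size >= this.capacity) {
      for (const v of this.samples) {
        if (this.randomFn() < 0.5) this.samples.delete(v);
      }
      this.rate /= 2;
    }
  }

  estimate() {
    // Each retained sample stands in for 1 / rate distinct values.
    return this.samples.size / this.rate;
  }

  clear() {
    this.samples.clear();
    this.rate = 1;
  }
}

const est = new SketchEstimator(1024);
for (const word of "to be or not to be".split(" ")) est.add(word);
console.log(est.estimate()); // 4: the stream never fills the buffer, so the count is exact
```

While the number of distinct values stays below the capacity, the rate remains 1 and the estimate is exact; sampling only begins once the buffer fills.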

Interfaces

EstimatorConfig<T>

A configuration object used to create Estimator instances.

  • capacity: The maximum number of samples in memory. Must be a positive integer.
  • randomFn (optional): The random number generator function. Should return random or pseudorandom values between 0 and 1.
  • sampleRate (optional): The sampling rate for managing samples. Must be between 0 and 1.
    • Note: Custom values may reduce accuracy. In general, the further the rate is from 0.5, the larger the effect. If capacity was calculated via calculateCapacity, the expected accuracy and confidence may no longer hold.
  • storage (optional): An object that implements SampleSet for storing samples.
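
Because randomFn is injectable, a seeded generator can make runs reproducible, which is handy in tests. The sketch below uses mulberry32, a small and widely used 32-bit PRNG that is not part of cvm-lib; the seed and the capacity value are illustrative assumptions.

```javascript
// mulberry32: a compact seeded PRNG returning values in [0, 1).
// Not part of cvm-lib; shown only as one possible randomFn.
function mulberry32(seed) {
  let a = seed | 0;
  return function () {
    a = (a + 0x6d2b79f5) | 0;
    let t = Math.imul(a ^ (a >>> 15), 1 | a);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// A config object using the documented capacity and randomFn fields.
// The values 1024 and 42 are arbitrary choices for illustration.
const config = { capacity: 1024, randomFn: mulberry32(42) };
```

Two generators created with the same seed produce the same sequence, so two estimators configured this way should see identical random draws.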

SampleSet<T>

Represents a generic set for storing samples.

  • size: The number of values in the set.
  • add(value): Adds a value to the set.
  • clear(): Clears all values from the set.
  • delete(value): Removes a specified value from the set.
  • [Symbol.iterator](): Iterates through the set's values.
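
The members listed above are a subset of JavaScript's built-in Set, so a custom store only needs those five. As an illustration, here is a hypothetical `CaseInsensitiveSet` (not part of cvm-lib) that treats strings differing only in case as one value. Its return values mirror the built-in Set; this README does not say whether the library relies on them.

```javascript
// Hypothetical storage class, not part of cvm-lib.
// Stores strings in lower case so "Hamlet" and "HAMLET" count as one value.
class CaseInsensitiveSet {
  #values = new Set();

  get size() {
    return this.#values.size;
  }

  add(value) {
    this.#values.add(value.toLowerCase());
    return this; // mirrors Set.prototype.add
  }

  clear() {
    this.#values.clear();
  }

  delete(value) {
    return this.#values.delete(value.toLowerCase()); // true if a value was removed
  }

  [Symbol.iterator]() {
    return this.#values[Symbol.iterator]();
  }
}

const words = new CaseInsensitiveSet();
words.add("Hamlet").add("HAMLET").add("Ophelia");
console.log(words.size); // 2
console.log([...words]); // [ 'hamlet', 'ophelia' ]
```

An object like this would be supplied through the storage field of the config object.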

Community and Support

Contributions are welcome!

  • Questions / Discussions: Please contact us via GitHub discussions.

  • Bug Reports: Please use the GitHub issue tracker to report any bugs. Include a detailed description and any relevant code snippets or logs.

  • Feature Requests: Please submit feature requests as issues, clearly describing the feature and its potential benefits.

  • Pull Requests: Please ensure your code adheres to the existing style of the project and include any necessary tests and documentation.

For more information, check out the contributor's guide.

Build

  1. Clone the project from GitHub:

git clone git@github.com:havelessbemore/cvm-lib.git
cd cvm-lib

  2. Install dependencies:

npm install

  3. Build the project:

npm run build

This will output ECMAScript (.mjs) and CommonJS (.cjs) modules in the dist/ directory.

Format

To run the code linter:

npm run lint

To automatically fix linting issues, run:

npm run format

Test

To run tests:

npm test

To run tests with a coverage report:

npm run test:coverage

A coverage report is generated at ./coverage/index.html.

References

  1. Source paper: Chakraborty, S., Vinodchandran, N. V., & Meel, K. S. (2023). Distinct Elements in Streams: An Algorithm for the (Text) Book.
  2. Notes by Donald Knuth: Knuth, D. E. (2023). The CVM Algorithm for Estimating Distinct Elements in Streams. Stanford Computer Science Department.
  3. Wikipedia: CVM Algorithm.
  4. High-level summary: Nadis, S. (2024, May 16). Computer Scientists Invent an Efficient New Way to Count. Quanta Magazine.