npm package discovery and stats viewer.

Discover Tips

  • General search

    [free text search, go nuts!]

  • Package details

    pkg:[package-name]

  • User packages

    @[username]

Sponsor

Optimize Toolset

I’ve always been into building performant and accessible sites, but lately I’ve been taking it extremely seriously. So much so that I’ve been building a tool to help me optimize and monitor the sites that I build to make sure that I’m making an attempt to offer the best experience to those who visit them. If you’re into performant, accessible and SEO friendly sites, you might like it too! You can check it out at Optimize Toolset.

About

Hi, 👋, I’m Ryan Hefner  and I built this site for me, and you! The goal of this site was to provide an easy way for me to check the stats on my npm packages, both for prioritizing issues and updates, and to give me a little kick in the pants to keep up on stuff.

As I was building it, I realized that I was actually using the tool to build the tool, and figured I might as well put this out there and hopefully others will find it to be a fast and useful way to search and browse npm packages as I have.

If you’re interested in other things I’m working on, follow me on Twitter or check out the open source projects I’ve been publishing on GitHub.

I am also working on a Twitter bot for this site to tweet the most popular, newest, random packages from npm. Please follow that account now and it will start sending out packages soon–ish.

Open Software & Tools

This site wouldn’t be possible without the immense generosity and tireless efforts from the people who make contributions to the world and share their work via open source initiatives. Thank you 🙏

© 2024 – Pkg Stats / Ryan Hefner

nesity-statistics

v4.3.2

Published

This package provides a set of functions for statistical analysis.

Downloads

124,104

Readme

nesity-statistics

This package provides a set of functions for statistical analysis.

It focuses on comparing datasets acquired from benchmarks.

Installation

yarn install nesity-statistics

Functionality

Comparing two datasets

The compare function can be used to assess whether two datasets differ from each other, and if so, what's the effect size. It uses the t-test to assess whether the two datasets differ from each other, and Cohen's d to assess the effect size.

Special features:

  • The function has built-in denoising mechanisms that can improve the accuracy of results if the sample size is small.
  • It will also work in case the datasets have multiple modalities, by matching the modalities and performing separate comparisons on them, then combining the results into a single result, weighted by the number of samples present.

For best results ensure that:

  • Each modality is either normally distributed, or uniform.
  • The datasets have a similar number of samples.
import { compare } from 'nesity-statistics'

const result = compare({
  data1: [
    /*...*/
  ],
  data2: [
    /*...*/
  ],
  confidenceLevel: 0.95, // use alpha = 0.05 for the t-test
  minimalModalitySize: 4, // reject all modalities with less than 4 samples
  denoisingAndModalitySplittingOptions: {
    // see section on splitting modalities for the differences between 'quantile' and 'kde' splitting
    type: 'quantiles',
  },
})

Splitting a dataset's modalities

Two functions are provided to split a dataset's modalities:

  • splitMultimodalDistributionWithQuantiles - a very fast function that uses large jump in upper quantiles to find the splits in the dataset's modalities.
  • splitMultimodalDistributionWithKDE - a computationally intensive function that uses Kernel Density Estimation to find the splits in the dataset's modalities.

The quality of the results is similar in most cases, but the splitMultimodalDistributionWithQuantiles function is much faster. In certain cases using KDE will result in better results, but it's not recommended to use it unless you have a good reason to do so.

Matching modalities

Function matchModalities can be used to match the modalities of two datasets by their means.

For example, given the following datasets:

const data1 = [
  [1, 2, 3],
  [4, 5, 6],
]
const data2 = [
  [2, 3, 4],
  [5, 6, 7],
  [8, 9, 10],
]
const expectedResult = [
  // modality 1:
  [
    // data from 1
    [1, 2, 3],
    // data from 2
    [2, 3, 4],
  ],
  // modality 2:
  [
    [4, 5, 6],
    [5, 6, 7],
  ],
  // modality 3:
  [undefined, [8, 9, 10]],
]

The function will prioritize modalities with similar number of samples, and will try to match them by their means. You may adjust how the modalities are prioritized using the following parameters:

  • meanDistanceWeight - the weight of the mean distance in the comparison
  • sizeRatioWeight - the weight of the sample size ratio in the comparison
  • prioritizeSizeRatioAbove - prioritize modalities with a sample size ratio above this value

Optimize

The optimize function is a generic utility function to find the "best" input parameter/output result combination. It works by computing results, and then comparing them with each other result. It can be used to optimize an objective function by repeatedly evaluating it with different input parameters, and comparing the results of these evaluations to determine the best result and input parameter combination. The optimization is performed in a series of iterations, with each iteration producing a set of results. Each iteration result is compared with each other iteration result, and the results are sorted by their comparison rank.

It returns an array of optimization results sorted by their comparison rank. Each result is an array with the iteration argument, the iteration result, the comparison metadata, the comparison rank and a done flag.

One example usage of the optimize function could be to find the parameters to pass to a noise reduction function, which result in the best noise reduction in a 2-sample dataset.

Let's say we want to find the parameters that result in the lowest standard deviation:

import { optimize } from 'nesity-statistics'

const data = [
  [
    /* set 1: ... */
  ],
  [
    /* set 2: ... */
  ],
]
const iterate = (parameters): number[] => {
  return reduceNoise(data, parameters)
}

const getNextIterationArgument = (iteration: number) => {
  return {
    x: iteration * 0.1,
  }
}

const compare = (
  [a1, a2]: number[],
  [b1, b2]: number[],
): [number, { diff: number }] | INVALID_LEFT | INVALID_RIGHT => {
  // we should have at least 2 values for the result to be considered valid:
  if (a1.length < 2 || a2.length < 2) {
    return INVALID_LEFT
  }
  if (b1.length < 2 || b2.length < 2) {
    return INVALID_RIGHT
  }
  const diffA = stdev(a1) - stdev(a2)
  const diffB = stdev(b1) - stdev(b2)
  const comparisonDiff = Math.abs(diffA) - Math.abs(diffB)
  // the 2nd argument of the return value is the comparison metadata
  // if the comparison is expensive,
  // it could be used to keep a reference to any data about the "winning" result,
  // used to perform the comparison
  return [comparisonDiff, comparisonDiff <= 0 ? diffA : diffB]
}

const sortedResults = optimize({
  iterate,
  iterations: 100,
  getNextIterationArgument,
  compare,
})

const bestResult = sortedResults[0]
const [bestParameters, bestResult, bestMeta] = bestResult