npm package discovery and stats viewer.

Discover Tips

  • General search

    [free text search, go nuts!]

  • Package details

    pkg:[package-name]

  • User packages

    @[username]

Sponsor

Optimize Toolset

I’ve always been into building performant and accessible sites, but lately I’ve been taking it extremely seriously. So much so that I’ve been building a tool to help me optimize and monitor the sites that I build to make sure that I’m making an attempt to offer the best experience to those who visit them. If you’re into performant, accessible and SEO friendly sites, you might like it too! You can check it out at Optimize Toolset.

About

Hi, 👋, I’m Ryan Hefner  and I built this site for me, and you! The goal of this site was to provide an easy way for me to check the stats on my npm packages, both for prioritizing issues and updates, and to give me a little kick in the pants to keep up on stuff.

As I was building it, I realized that I was actually using the tool to build the tool, and figured I might as well put this out there and hopefully others will find it to be a fast and useful way to search and browse npm packages as I have.

If you’re interested in other things I’m working on, follow me on Twitter or check out the open source projects I’ve been publishing on GitHub.

I am also working on a Twitter bot for this site to tweet the most popular, newest, random packages from npm. Please follow that account now and it will start sending out packages soon–ish.

Open Software & Tools

This site wouldn’t be possible without the immense generosity and tireless efforts from the people who make contributions to the world and share their work via open source initiatives. Thank you 🙏

© 2024 – Pkg Stats / Ryan Hefner

stat-methods

v0.2.0

Published

A library which provides functions for calculating mathematical statistics of numeric data

Downloads

6

Readme

stat-methods

Build Status tested with jest Codacy Badge

NPM

Getting Started

A library which provides methods for calculating mathematical statistics of numeric data. The library is heavily inspired by The Python Standard Library statistics module.

Installation

npm i stat-methods

Documentation

Table of contents

  1. Averages and measures of central location

  2. Measures of spread

  3. Descriptive statistics

  4. Measures of similarity

  5. Regressions

Averages and measures of central location

These methods compute an average or typical value from a population or sample.

| Method | Description | | ------------------------------- | ------------------------------------------------ | | mean | Arithmetic mean ('average') | | harmonicMean | Harmonic mean ('subcontrary mean') | | geometricMean | Geometric mean | | median | Median (middle value) | | medianLow | Low median | | medianHigh | High median | | medianGrouped | Median of grouped data | | quartiles | Quartiles (4-quantile) | | midRange | Average of minimum and maximum | | mode | Modes (most common data points) of discrete data | | rms | Root Mean Square | | percentile | Percentile | | kurtosis | Kurtosis |

Note: The methods do not require the data given to them to be sorted.

mean

mean(arr);

Returns the sample arithmetic mean of a numeric data array arr.

The arithmetic mean is the sum of values of the data points divided by the number of data points.

mean([-1.0, 2.5, 3.25, 5.75]); // -> 2.625

If the data array is empty or contains a non finite Number, the method returns undefined.

mean(['a', 2.5, 'b', 5.75]); // -> undefined
mean([NaN, 2.5, 3, 5.75]); // -> undefined
mean([]); // -> undefined
mean(3); // -> undefined

harmonicMean

harmonicMean(arr);

Return the harmonic mean of a numeric data array arr.

The harmonic mean is the reciprocal of the arithmetic mean of the reciprocals of the data. It is the number of data points divided by the sum of the reciprocals of the data points. For example, the harmonic mean of three values a, b and c will be equivalent to 3/(1/a + 1/b + 1/c).

harmonicMean([2.5, 3, 10]) * 10; // -> 36

The harmonicMean is typically appropriate compared with the arithmetic mean when evaluating the average of rates or ratios (for example speeds or densities).

If the data array contains elements with value 0, the method returns undefined.

harmonicMean([2.5, 3, 0]); // -> undefined

If the data array is empty or contains a non finite Number, the method returns undefined.

geometricMean

geometricMean(arr);

Return the geometric mean of a numeric data array arr.

The geometric mean is the nth root of the product of the n data points (n is the number of data points) For example, the geometric mean of three values a, b and c will be equivalent to (a*b*c) ^ (1/3).

geometricMean([4, 1, 1 / 32]); // -> 0.5

The geometric mean indicates the central tendency or typical value of a set of numbers and is often used when comparing different items — finding a single "figure of merit" for these items — when each item has multiple properties that have different numeric ranges.

If the data array contains an even total number of elements and an odd number of negative elements, the method returns undefined.

geometricMean([1, -2, 3, 4]); // -> undefined

If the data array is empty or contains a non finite Number, the method returns undefined.

median

median(arr);

Return the median (middle value) of a numeric data array arr.

The median is the value separating the higher half from the lower half of a data sample. The median method uses the “mean of middle two” method:

  • If there is an odd number of numbers, the median is the middle one.
median([1, 2, 3, 4, 5]); // -> 3
  • If there is an even number of observations, then there is no single middle value; the median is then defined as the mean of the two middle values.
median([1, 2, 3, 4, 5, 6]); // -> 3.5

If the data array is empty or contains a non finite Number, the method returns undefined. In case the data array is non numeric but supports order operations, the medianLow and medianHigh methods are recommended.

medianLow

medianLow(arr[, compareFunction]);

Return the low median of a data array arr. An optional compareFunction parameter can be provided for non numerica data arrays.

The low median is always a member of the data set. The medianLow method accepts both numeric and non numeric data arrays.

  • When the number of observations is odd, the middle value is returned.
medianLow([1, 2, 3, 4, 5]); // -> 3
  • When the number of observations is even, the smaller of the two middle values is returned.
medianLow([1, 2, 3, 4, 5, 6]); // -> 3

The median low can be computed with non numeric data arrays, provided they can be sorted and a compare function similar to the compare function required by the standard javascript Array.prototype.sort() method is provided.

function compareFunction(elt1, elt2) {
  return elt1.charCodeAt(0) - elt2.charCodeAt(0);
}
medianLow(['a', 'c', 'b', 'd'], compareFunction); // -> 'b'

By default, the compare function orders the data array in ascending order, in the numerical sense. Using arbitrary values for the compare function might result in invalid results.

If the data array is empty, the method returns undefined.

medianHigh

medianHigh(arr[, compareFunction]);

Return the high median of a data array arr. An optional compareFunction parameter can be provided for non numerica data arrays.

The high median is always a member of the data set. The medianHigh method accepts both numeric and non numeric data arrays.

  • When the number of observations is odd, the middle value is returned.
medianHigh([1, 2, 3, 4, 5]); // -> 3
  • When the number of observations is even, the larger of the two middle values is returned.
medianHigh([1, 2, 3, 4, 5, 6]); // -> 4

The median high can be computed with non numeric data arrays, provided they can be sorted and a compare function similar to the compare function required by the standard javascript Array.prototype.sort() method is provided.

function compareFunction(elt1, elt2) {
  return elt1.charCodeAt(0) - elt2.charCodeAt(0);
}
medianHigh(['a', 'c', 'b', 'd'], compareFunction); // -> 'c'

By default, the compare function orders the data array in ascending order, in the numerical sense. Using arbitrary values for the compare function might result in invalid results.

If the data array is empty, the method returns undefined.

medianGrouped

medianGrouped(arr[, width]);

Return the median (middle value) of grouped continuous numeric data arr, using interpolation.

medianGrouped([52, 52, 53, 54]); // -> 52.5
medianGrouped([1, 2, 2, 3, 4, 4, 4, 4, 4, 5]); // -> 3.7

The medianGrouped method takes an optional argument width which represents the class width, and defaults to 1. Changing the class width will change the result.

medianGrouped([1, 3, 3, 5, 7]); // -> 3.25
medianGrouped([1, 3, 3, 5, 7], 2); // -> 3.5

If the data array is empty or contains a non finite Number, the method returns undefined.

quartiles

quartiles(arr);

Return the quartiles of a numeric data array arr.

  • The first quartile (Q1) is defined as the middle number between the smallest number and the median of the data set.
  • The second quartile (Q2) is the median of the data.
  • The third quartile (Q3) is the middle value between the median and the highest value of the data set.
quartiles([2, 2, 3, 4]); // -> [2, 2.5, 3.5]

The data set if first ordered, from smallest to highest.

The median (Q2) is used to divide the ordered data set into two halves.

  • If there are an odd number of data points in the original ordered data set the median is not included in either half.
  • If there are an even number of data points in the original ordered data set, the sata set is split exactly in half.

The lower quartile value (Q1) is the median of the lower half of the data. The upper quartile value (Q3) is the median of the upper half of the data.

quartiles([6, 7, 15, 36, 39, 40, 41, 42, 43, 47, 49]); // -> [15, 40, 43]
quartiles([7, 15, 36, 39, 40, 41]); // -> [15, 37.5, 40]

If the data array contains less than 4 elements and/or contains a non finite Number, the method returns undefined.

midRange

midRange(arr);

Return the mid-range of the data array arr.

The mid-range of a data set is the arithmetic mean of the maximum and minimum values in the data set.

midRange([1, 4, 6, -1]); // -> 2.5;

If the data array is empty or contains a non-numeric value, the method returns undefined.

mode

mode(arr);

Return the mode(s) of a data array arr.

The mode is the most common data point from the data array. The method mode returns the mode(s) in an array.

mode([1, 1, 2]); // -> [1]

If there are multiple data points with the same larger number of occurences in the data array, there are multiple modes and they are all returned as an array.

mode([1, 2, 3, 3, 4, 4]); // [3, 4]

The mode method also applies to non-numeric data arrays.

mode(['a', 'c', 'b', 'd', 'c']); // -> ['c']

If the data array is empty, the method returns undefined.

rms

rms(arr);

Return the root mean square (rms) of the data array arr.

The Root Mean Square (rms) is the square root of the arithmetic mean of the squares of a set of numbers.

rms([4, 1, 1, 3]); // -> 2.598076211353316;

If the data array is empty or contains a non-numeric value, the method returns undefined.

percentile

percentile(arr, k);

Returns the k^{th} percentile of the data array.

A percentile (or a centile) is a measure used in statistics indicating the value below which a given percentage of observations in a group3 of observations falls.

percentile([13, 20, 8, 8, 7, 10, 3, 15, 16, 6], 0.25); // -> 7

If the data array is empty or contains a non-numeric value, the method returns undefined. If the value of k is non-numeric and not in the interval [0, 1], the method returns undefined.

kurtosis

kurtosis(arr);

Returns the sample kurtosis of the data array.

The sample kurtosis is a measure of the "tailedness" of a data array.

const arr = [0, 3, 4, 1, 2, 3, 0, 2, 1, 3, 2, 0, 2, 2, 3, 2, 5, 2, 3, 999];
kurtosis(arr).toFixed(2); // -> '15.05';

If the data array is empty or contains a non-numeric value, the method returns undefined.

Measures of spread

These methods compute a measure of the variability in a sample or population, how much the sample or population tends to deviate from the typical or average values.

| Method | Description | | ----------------------- | ----------------------------- | | pVariance | Population variance | | pStdev | Population standard deviation | | variance | Sample variance | | stdev | Sample standard deviation | | range | Range | | mad | Median Absolute Deviation |

pVariance

pVariance(arr[, mu]);

Return the population variance of a numeric data array arr.

The variance, or second moment about the mean, is a measure of the spread of a sample or population. A large variance indicates that the data is spread out; a small variance indicates it is clustered closely around the mean.

pVariance([0.0, 0.25, 0.25, 1.25, 1.5, 1.75, 2.75, 3.25]); // -> 1.25

The mean of the data array mu can be provided as an optional argument if previously computed.

const pop = [1, 2, 3, 4, 5];
const mu = mean(pop); // -> 3
pVariance(pop, mu); // -> 2

If ommited, the mean is automatically computed. The function does not verify that the provided mean is accurate. Using arbitrary values for the mean might lead to invalid results.

This method is appropriate for computing the variance of the entire population. To estimate the variance from a sample, the variance method is recommended.

If the data array is empty or contains a non finite Number, the method returns undefined.

pStdev

pStdev(arr[, mu]);

Return the population standard deviation of a numeric data array arr.

The standard deviation is a measure that is used to quantify the amount of variation or dispersion of a set of data values, computed as the square root of the variance.

The mean of the data array mu can be provided as an optional argument if previously computed.

pStdev([1.5, 2.5, 2.5, 2.75, 3.25, 4.75]); // -> 0.986893273527251
const mu = 3;
pStdev([1, 2, 3, 4, 5], mu); // -> 1.4142135623730951

Please refer to the pVariance method for further details.

variance

variance(arr[, xBar]);

Return the sample variance of a numeric data array arr.

The variance, or second moment about the mean, is a measure of the spread of a sample or population. A large variance indicates that the data is spread out; a small variance indicates it is clustered closely around the mean.

variance([0.0, 0.25, 0.25, 1.25, 1.5, 1.75, 2.75, 3.25]); // -> 1.4285714285714286

The mean of the data array xBar can be provided as an optional argument if previously computed.

const sample = [1, 2, 3, 4, 5];
const xBAr = mean(sample); // -> 3
variance(sample, xBar); // -> 2.5

If ommited, the mean is automatically computed. The function does not verify that the provided mean is accurate. Using arbitrary values for the mean might lead to invalid results.

This method is appropriate for computing the variance of a sample from a population. To compute the variance of an entire population, the pVariance method is recommended.

If the data array is empty, contains a single value or contains a non finite Number, the method returns undefined.

stdev

stdev(arr[, xBar]);

Return the sample standard deviation of a numeric data array arr.

The standard deviation is a measure that is used to quantify the amount of variation or dispersion of a set of data values, computed as the square root of the variance.

The mean of the data array xBar can be provided as an optional argument if previously computed.

stdev([1.5, 2.5, 2.5, 2.75, 3.25, 4.75]); // -> 1.0810874155219827
const xBar = 3;
stdev([1, 2, 3, 4, 5], xBar); // -> 1.5811388300841898

Please refer to the variance method for further details.

range

range(arr);

Return the range of a numeric data array arr.

The range of a set of data is the difference between the largest and smallest values.

range([89, 73, 84, 91, 87, 77, 94]); // -> 21

If the data array is empty or contains a non finite Number, the method returns undefined.

mad

mad(arr);

Return the median absolute deviation (mad) of a numeric data array arr.

The media absolute deviation is the median of the absolute deviations from the data median.

mad([3, 8, 8, 8, 8, 9, 9, 9, 9]); // -> 1

If the data array is empty or contains a non finite Number, the method returns undefined.

Descriptive statistics

These methods compute a summary statistic that quantitatively describes features of a data array.

| Method | Description | | ------------------- | --------------------------- | | min | Minimum | | max | Maximum | | product | Product of all the elements | | sum | Sum of all the elements |

min

min(arr);

Return the minimum value of a numeric data array arr.

The minimum is the smallest number in the data array.

min([2.5, 3.25, -2, 5.75]); // -> -2

If the data array is empty or contains a non finite Number, the method returns undefined.

max

max(arr);

Return the maximum value of a numeric data array arr.

The maximum is the largest number in the data array.

max([2.5, 3.25, -2, 5.75]); // -> 5.75

If the data array is empty or contains a non finite Number, the method returns undefined.

product

product(arr);

Return the product of all elements of a numeric data array arr.

product([1, 2, 3, 4]); // -> 24

If the data array is empty or contains a non finite Number, the method returns undefined.

sum

sum(arr);

Return the sum of all elements of a numeric data array arr. The method implement the Kahan summation algorithm in order to minimise numerical error.

sum([
  0.1,
  0.2,
  0.3,
  0.4,
  0.5,
  0.6,
  0.7,
  0.8,
  0.9,
  1.0,
  1.1,
  1.2,
  1.3,
  1.4,
  1.5,
  1.6,
  1.7,
]); // -> 15.3

If the data array is empty or contains a non finite Number, the method returns undefined.

Measures of similarity

These methods compute a measure of the similarity between samples or populations.

| Method | Description | | --------------------------- | ------------------- | | covariance | Joint variability | | correlation | Linear relationship |

covariance

covariance(x, y);

Return the sample covariance between two numeric data arrays x and y.

The covariance is a measure of the joint variability of two data arrays.

covariance([5, 12, 18, 23, 45], [2, 8, 18, 20, 28]); // -> 146.1

The covariance method will return undefined in the following cases:

  • At least one of the arguments is not an array.
  • At least one of the data arrays contains at least one non finite Number.
  • At least one of the data arrays contains less than two elements.
  • The two data arrays do not have the same number of elements.
covariance(3, [2, 2]); // -> undefined
covariance([3, 2.5, 5.1, 5.75], ['a', 2.5, 'b', 5.75]); // -> undefined
covariance([NaN, 2.5, 3, 5.75], [3, 2.5, 5.1, 5.75]); // -> undefined
covariance([2, 1], [3, 2.5, 5.1, 5.75]); // -> undefined
covariance([3], [2]); // -> undefined

correlation

correlation(x, y);

Return the correlation between two numeric data arrays x and y.

The correlation is a measure of how close two datasets are to having a linear relationship.

correlation([1, 2, 3, 5], [1, 3, 8, 10]); // -> 0.9519450934357727

The correlation is computed as the ratio between the covariance and the product of the standard deviations of the tow data arrays. The correlations is between -1 and 1.

Please refer to the covariance method for further details.

Averages and measures of central location

These methods compute regression models for estimating the relationships among variables.

| Method | Description | | ----------------- | ----------------- | | linReg | Linear Regression |

linReg

linReg(x, y);

Return the simple linear regression model between two variables, one independent x and one dependent y. The linear regression model is a linear function (straight line) which predicts with the highest possible accuracy the dependent variable as a function of the independent variable.

linReg([5, 12, 18, 23, 45], [2, 8, 18, 20, 28]); // ->
/*
{
  slope: 0.6316472114137484,
  y0: 2.1880674448767827,
  deltaSlope: 0.1364335975819171,
  deltaY0: 3.3680036696904967,
} */

The method uses the least-squares approach, minimising the sum of squared residuals. The confidence intervals are computed as defined here and with t*_{n-2} equal to 1.

The linReg method will return undefined in the cases defined in covariance.

License

The library is MIT licensed.