wilson-interval

v4.0.5

Published

2 years ago

Used to calculate the high bound, low bound, and center of a Wilson score interval. Features support for continuity correction and Singleton's adjustment.

erikfox

wilson score reddit confidence statistics proportion interval rank votes upvotes downvotes

Wilson Interval

A comprehensive module used to calculate the high bound, low bound, and center of a Wilson score interval. Features support for known populations (i.e. Singleton's adjustment).

Popularized by Reddit's Comment/Best Sort and similar voting algorithms.

Install

npm install wilson-interval

Include

import wilson from 'wilson-interval';

Usage

wilson(observed, sample[, population ][, options ]);

observed - Number of observed positive outcomes.
sample - Size of sample.

Optional arguments:

population - Default false. Total population from which sample was taken (to use Singleton's adjustment[1]).
options - Default {}. Options object. Available parameters:
- confidence - Default 0.95. Desired confidence level of interval.
- precision - Default 20. Number of significant figures to use in calculations and output.

Example

wilson(5, 100);

returns

{
  "center": "0.066647073981204927863",
  "high": "0.11175046869375655694",
  "low": "0.021543679268653298792"
}
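
For readers who want to see where these numbers come from, the following is a minimal sketch of the standard Wilson score interval formula in plain double-precision JavaScript. It is not the module's implementation: `wilsonSketch` and `Z_95` are hypothetical names, the z-score is hard-coded for 95% confidence, and results are ordinary numbers rather than the high-precision strings shown above.

```javascript
// Illustrative sketch only: a plain double-precision Wilson score interval.
// `wilsonSketch` and `Z_95` are hypothetical names, not part of wilson-interval.
const Z_95 = 1.959963984540054; // two-sided z-score for 95% confidence

function wilsonSketch(observed, sample, z = Z_95) {
  if (sample <= 0 || observed < 0 || observed > sample) {
    throw new RangeError('expected 0 <= observed <= sample and sample > 0');
  }
  const p = observed / sample;   // observed proportion
  const z2 = z * z;
  const denom = 1 + z2 / sample; // denominator shared by center and width
  const center = (p + z2 / (2 * sample)) / denom;
  const halfWidth =
    (z * Math.sqrt((p * (1 - p)) / sample + z2 / (4 * sample * sample))) / denom;
  return { center, low: center - halfWidth, high: center + halfWidth };
}

const result = wilsonSketch(5, 100);
console.log(result);
```

For `wilsonSketch(5, 100)` the three values should agree with the example output above to roughly six decimal places; the remaining digits depend on the precision used for the z-score and the arithmetic.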

Use cases

Low bound sorting

Most often, the low bound of the interval will be used as the sorting parameter (e.g. Reddit's Comment/Best Sort). This places more importance on confidence than total score.

Even if a ranked item has 100% positive responses, it won't reach the top of the ranking until enough data has been gathered for the algorithm to be confident that the ratio reflects what the item really deserves.
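
As a rough illustration of low bound sorting, the sketch below ranks three items by the lower bound of a 95% Wilson interval. It is self-contained and does not call the module: `wilsonLowerBound` is a hypothetical helper, the vote counts are made up, and items with no votes are assumed to rank last.

```javascript
// Hypothetical helper: lower bound of a 95% Wilson score interval.
function wilsonLowerBound(positive, total, z = 1.959963984540054) {
  if (total === 0) return 0; // assumption: items with no votes rank at the bottom
  const p = positive / total;
  const z2 = z * z;
  const center = p + z2 / (2 * total);
  const spread = z * Math.sqrt((p * (1 - p)) / total + z2 / (4 * total * total));
  return (center - spread) / (1 + z2 / total);
}

// Made-up vote counts for illustration.
const comments = [
  { id: 'a', up: 1, total: 1 },    // 100% positive, but only one vote
  { id: 'b', up: 90, total: 100 }, // 90% positive, many votes
  { id: 'c', up: 4, total: 5 },    // 80% positive, few votes
];

// Sort descending by lower bound without mutating the original array.
const ranked = [...comments].sort(
  (x, y) => wilsonLowerBound(y.up, y.total) - wilsonLowerBound(x.up, x.total)
);
console.log(ranked.map((c) => c.id)); // [ 'b', 'c', 'a' ]
```

Item `a` has a perfect ratio but ends up last, because a single vote leaves its lower bound at roughly 0.21, well below the roughly 0.83 of item `b`.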

Singleton's adjustment

Uses a known, finite population size to inform the degree of uncertainty of the prediction.

Descriptive statistics summarises the sample as if it were the entire population, whereas inferential statistics assumes the sample is a tiny subset of the population. Between these two extremes, when the sample is a large part of the population, the confidence interval on observations is reduced.[1]

USE WHEN:

Your sample size represents a significant portion of the population.
You have an imperfect original sample, from which you can only verify a subsample. The original sample can serve as a "population" to produce a verification interval to be combined with the first.[2]
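
One common way to apply a finite population correction to a Wilson interval is to compute ν² = (N − n) / (N − 1) and substitute the effective sample size n / ν² into the usual formula. The sketch below assumes that formulation; the source does not confirm that the module computes the adjustment this way, and `wilsonFinite` is a hypothetical name.

```javascript
// Sketch under assumptions: finite population correction applied by rescaling n.
// `wilsonFinite` is a hypothetical name; the module's own internals may differ.
function wilsonFinite(observed, sample, population, z = 1.959963984540054) {
  if (sample <= 0 || observed < 0 || observed > sample || population < sample) {
    throw new RangeError('expected 0 <= observed <= sample <= population, sample > 0');
  }
  // nu^2 = (N - n) / (N - 1); taken as 0 when the sample is the whole population.
  const nu2 = population === sample ? 0 : (population - sample) / (population - 1);
  const n = sample / nu2; // effective sample size (Infinity when nu2 is 0)
  const p = observed / sample;
  const z2 = z * z;
  const denom = 1 + z2 / n;
  const center = (p + z2 / (2 * n)) / denom;
  const halfWidth = (z * Math.sqrt((p * (1 - p)) / n + z2 / (4 * n * n))) / denom;
  return { center, low: center - halfWidth, high: center + halfWidth };
}

const wide = wilsonFinite(5, 100, 1e9);   // sample is a tiny part of the population
const narrow = wilsonFinite(5, 100, 120); // sample is most of the population
const exact = wilsonFinite(5, 100, 100);  // sample is the whole population
console.log(wide, narrow, exact);
```

Under this formulation the interval narrows as the sample approaches the population size, and collapses to the observed proportion when the two are equal.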

Sources

[1] Wallis, Sean 2012. Inferential Statistics — and other animals. London: Survey of English Usage, UCL.

[2] Wallis, Sean 2014. Coping with imperfect data. London: Survey of English Usage, UCL.

Special thanks to Sean Wallis—Senior Research Fellow, Survey of English Usage—for his aid in transcribing equations, and for his blog posts which inspired many of the features of this module.
