wilson-interval
v4.0.5
Used to calculate the high bound, low bound, and center of a Wilson score interval. Features support for continuity correction and Singleton's adjustment.
Wilson Interval
A comprehensive module used to calculate the high bound, low bound, and center of a Wilson score interval. Features support for known populations (i.e. Singleton's adjustment).
Popularized by Reddit's Comment/Best Sort and similar voting algorithms.
Install
npm install wilson-interval
Include
import wilson from 'wilson-interval';
Usage
wilson(observed, sample[, population ][, options ]);
observed
- Number of observed positive outcomes.
sample
- Size of the sample.
Optional arguments:
population
- Default false. Total population from which the sample was taken (used for Singleton's adjustment [1]).
options
- Default {}. Options object. Available parameters:
  confidence
  - Default 0.95. Desired confidence level of the interval.
  precision
  - Default 20. Number of significant figures used in calculations and output.
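For reference, the standard Wilson score interval (without continuity correction) can be written in terms of the arguments above. This is the textbook formula, not a transcription of the module's source; the Example output is consistent with it. Here n is sample, p̂ is observed / sample, and z is the two-sided critical value of the standard normal distribution for the chosen confidence (about 1.96 for 0.95):

```latex
\hat{p} = \frac{\text{observed}}{n}, \qquad n = \text{sample}, \qquad
z = \Phi^{-1}\!\left(\frac{1 + \text{confidence}}{2}\right)

\text{center} = \frac{\hat{p} + \dfrac{z^2}{2n}}{1 + \dfrac{z^2}{n}}, \qquad
\text{high}, \text{low} = \text{center} \pm \frac{z}{1 + \dfrac{z^2}{n}}
  \sqrt{\frac{\hat{p}(1 - \hat{p})}{n} + \frac{z^2}{4n^2}}
```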
Example
wilson(5, 100);
returns an object whose values are strings at the default precision of 20 significant figures:
{
  "center": "0.066647073981204927863",
  "high": "0.11175046869375655694",
  "low": "0.021543679268653298792"
}
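To see where these numbers come from, here is a minimal standalone sketch of the same interval using ordinary double-precision floats. The function name wilsonInterval is hypothetical and is not part of this module's API, and the hard-coded z (the two-sided 95% critical value, about 1.959964) is an assumption. The results agree closely with the strings above, though the module's own critical value and arbitrary-precision arithmetic may differ in the trailing digits.

```javascript
// Hypothetical standalone helper (NOT the wilson-interval API).
// Computes the Wilson score interval with plain floating-point numbers.
// Assumes z = 1.959963984540054, the two-sided critical value for 95% confidence.
function wilsonInterval(observed, sample, z = 1.959963984540054) {
  const p = observed / sample; // observed proportion of positive outcomes
  const z2 = z * z;
  const denom = 1 + z2 / sample;
  // Center is pulled from p toward 0.5 by the z^2 / (2n) term.
  const center = (p + z2 / (2 * sample)) / denom;
  const halfWidth =
    (z / denom) * Math.sqrt((p * (1 - p)) / sample + z2 / (4 * sample * sample));
  return { center, high: center + halfWidth, low: center - halfWidth };
}

// Approximately center 0.066647, high 0.11175, low 0.021544.
console.log(wilsonInterval(5, 100));
```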
Use cases
Low bound sorting
Most often, the low bound of the interval is used as the sorting key (e.g. Reddit's Comment/Best Sort). This places more weight on confidence in an item's rating than on its raw score.
Even an item with 100% positive responses won't be ranked at the top until enough responses have been gathered to be confident that its ratio reflects how good it really is.
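A minimal sketch of low bound sorting, assuming each item stores a count of positive votes and a total vote count. The helper wilsonLowerBound and the sample items are hypothetical and invented for illustration; in practice you would sort on the low bound returned by this module. An item with a single positive vote ends up below items with more evidence behind a lower ratio.

```javascript
// Hypothetical standalone helper (NOT the wilson-interval API).
// Lower bound of the Wilson score interval; z defaults to ~1.96 (95% confidence).
function wilsonLowerBound(positive, total, z = 1.959963984540054) {
  if (total === 0) return 0; // no votes yet: rank at the bottom
  const p = positive / total;
  const z2 = z * z;
  const center = p + z2 / (2 * total);
  const margin = z * Math.sqrt((p * (1 - p)) / total + z2 / (4 * total * total));
  return (center - margin) / (1 + z2 / total);
}

// Illustrative items (invented data).
const items = [
  { id: "a", up: 1, total: 1 },    // 100% positive, but only one vote
  { id: "b", up: 90, total: 100 }, // 90% positive over 100 votes
  { id: "c", up: 40, total: 50 },  // 80% positive over 50 votes
];

// Sort descending by lower bound, without mutating the original array.
const ranked = [...items].sort(
  (x, y) => wilsonLowerBound(y.up, y.total) - wilsonLowerBound(x.up, x.total)
);
console.log(ranked.map((item) => item.id)); // [ 'b', 'c', 'a' ]
```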
Singleton's adjustment
Uses a known, finite population size to inform the degree of uncertainty of the prediction: the larger the share of the population that was sampled, the narrower the interval.
Figure (from [1]): descriptive statistics summarises the sample as if it were the entire population (left), whereas inferential statistics assumes the sample is a tiny subset of the population (right). If the sample is a large part of the population, the confidence interval on observations is reduced (middle).
USE WHEN:
- Your sample size represents a significant portion of the population.
- You have an imperfect original sample from which you can only verify a subsample. The original sample can then serve as the "population" for a verification interval, which is combined with the interval from the original sample.[2]
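The idea can be sketched with the standard finite population correction factor, nu = sqrt((N - n) / (N - 1)). This sketch assumes the correction is applied by scaling the critical value z by nu inside the Wilson formula, which is equivalent to shrinking the standard error by nu; the module's actual implementation of Singleton's adjustment may differ. The function name wilsonFinitePopulation is hypothetical.

```javascript
// Hypothetical sketch (NOT the wilson-interval API).
// ASSUMPTION: the finite population correction nu = sqrt((N - n) / (N - 1))
// is applied by replacing z with z * nu in the Wilson score interval.
function wilsonFinitePopulation(observed, sample, population, z = 1.959963984540054) {
  if (population < sample) throw new RangeError("population must be >= sample");
  // Sampling the whole population leaves no uncertainty (nu = 0).
  const nu = population > 1 ? Math.sqrt((population - sample) / (population - 1)) : 0;
  const zAdj = z * nu; // smaller effective z => narrower interval
  const p = observed / sample;
  const z2 = zAdj * zAdj;
  const denom = 1 + z2 / sample;
  const center = (p + z2 / (2 * sample)) / denom;
  const halfWidth =
    (zAdj / denom) * Math.sqrt((p * (1 - p)) / sample + z2 / (4 * sample * sample));
  return { center, high: center + halfWidth, low: center - halfWidth };
}

// A sample covering half the population gives a narrower interval
// than the same sample drawn from a very large population.
const half = wilsonFinitePopulation(5, 100, 200);
const huge = wilsonFinitePopulation(5, 100, 1e9);
console.log(half.high - half.low < huge.high - huge.low); // true
```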
Sources
[1] Wallis, Sean 2012. Inferential Statistics — and other animals. London: Survey of English Usage, UCL.
[2] Wallis, Sean 2014. Coping with imperfect data. London: Survey of English Usage, UCL.
Special thanks to Sean Wallis—Senior Research Fellow, Survey of English Usage—for his aid in transcribing equations, and for his blog posts which inspired many of the features of this module.