wilson-score-interval

v2.1.0

binomial proportion confidence interval

Wilson Score Interval

A simple JavaScript implementation of the Wilson score interval. Useful wherever you want to make a confident estimate about the actions or preferences of a general population, given a sample of data (e.g. assigning scores for ranking comments by upvotes, products by popularity, and more).

Installation

$ npm i wilson-score-interval

Usage

const wilson = require('wilson-score-interval');

/*
  wilson(upVotes, total);
  // upVotes === whatever result you want to estimate the confidence interval for
  // total === your total sample size
*/

wilson(430, 474); // { left: 0.8776750858242243, right: 0.9301239839930541 }
wilson(392, 436); // { left: 0.8672311846637769, right: 0.9239627360567735 }
wilson(10, 14);   // { left: 0.4535045882751561, right: 0.882788120898909 }

Explanation

Less technical:

If you know what a sample population thinks, you can use this tool to estimate the preferences of the population at large.

Suppose your site has a population of 10,000 users. One product has ratings from 100 users (your sample size): 40 upvotes and 60 downvotes. You want to understand how popular the product would be across the whole population. So you run wilson(40, 100), which returns { left: 0.3093997461136029, right: 0.4979992153815976 }. You can now estimate with 95% confidence that between 30.9% and 49.8% of all users would upvote this product.

It is common to use the lower bound of this interval (here, 30.9%) as the score, as it is the most conservative estimate of the "real" score.

For a beginner-friendly introduction to confidence intervals for population proportions, see this YouTube video.

More technical:

The Wilson score interval, developed by American mathematician Edwin Bidwell Wilson in 1927, is a confidence interval for a proportion in a statistical population. It assumes that the statistical sample used for the estimation has a binomial distribution. A binomial distribution indicates, in general, that:

  1. the experiment is repeated a fixed number of times;
  2. the experiment has two possible outcomes ('success' and 'failure');
  3. the probability of success is equal for each experiment;
  4. the trials are statistically independent.

This package uses a z-score of 1.96 by default, which translates to a confidence level of 95%.
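To make the role of the z-score concrete, here is a minimal sketch of the standard Wilson score formula with the z-score exposed as a parameter. The function name `wilsonInterval`, its `z` argument, and the handling of an empty sample are assumptions made for this illustration; this is not the package's source, and whether the package itself accepts a z-score argument is not stated here.

```javascript
// Hypothetical helper for illustration only; not the package's actual source.
// Computes the Wilson score interval for `successes` out of `total` trials.
function wilsonInterval(successes, total, z = 1.96) {
  if (total <= 0) {
    // Assumed convention: with no data, the proportion could be anywhere in [0, 1].
    return { left: 0, right: 1 };
  }
  const p = successes / total; // observed proportion
  const z2 = z * z;
  const denominator = 1 + z2 / total;
  const center = (p + z2 / (2 * total)) / denominator;
  const halfWidth =
    (z * Math.sqrt((p * (1 - p)) / total + z2 / (4 * total * total))) / denominator;
  return { left: center - halfWidth, right: center + halfWidth };
}

const interval95 = wilsonInterval(40, 100);        // z = 1.96, roughly 95% confidence
const interval99 = wilsonInterval(40, 100, 2.576); // z = 2.576, roughly 99% confidence
console.log(interval95);
console.log(interval99);
```

A larger z-score buys more confidence at the cost of a wider interval, so the 99% interval printed above encloses the 95% one.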

For more, please see the Wikipedia page on the Wilson score interval and this blog post.

Comparison with other scoring methods

Using a simple calculation of score = (positive ratings) - (negative ratings) or score = average rating = (positive ratings) / (total ratings) proves to be problematic when working with smaller sample sizes, or differences in sample sizes across populations. See this blog post comparing scoring methods for details and examples.
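The difference between these scoring methods is easiest to see side by side. The sketch below ranks three made-up items by net score, by average rating, and by the Wilson lower bound. The helper `wilsonLowerBound` is a hypothetical stand-in written for this example (standard formula, z = 1.96), not the package's code, and the ratings are invented.

```javascript
// Hypothetical helper: lower bound of the Wilson score interval.
function wilsonLowerBound(positive, total, z = 1.96) {
  if (total === 0) return 0; // assumed convention for items with no ratings
  const p = positive / total;
  const z2 = z * z;
  const center = p + z2 / (2 * total);
  const margin = z * Math.sqrt((p * (1 - p)) / total + z2 / (4 * total * total));
  return (center - margin) / (1 + z2 / total);
}

// Invented ratings for three items.
const items = [
  { name: 'A', up: 1, down: 0 },     // a single positive rating
  { name: 'B', up: 190, down: 10 },  // 95% positive across 200 ratings
  { name: 'C', up: 600, down: 400 }, // 60% positive across 1000 ratings
];

const scored = items.map((item) => {
  const total = item.up + item.down;
  return {
    name: item.name,
    net: item.up - item.down,                 // positive minus negative
    average: item.up / total,                 // positive over total
    wilson: wilsonLowerBound(item.up, total), // conservative estimate
  };
});

// Sort a copy in descending order of the chosen score and list the names.
const rankBy = (key) =>
  [...scored].sort((a, b) => b[key] - a[key]).map((s) => s.name).join(' > ');

console.log('net:    ', rankBy('net'));
console.log('average:', rankBy('average'));
console.log('wilson: ', rankBy('wilson'));
```

With these numbers, the net score puts C first because it has the most votes, the average puts A first on the strength of a single vote, and the Wilson lower bound puts B first.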

The Wilson score interval is known for performing well given small sample sizes/extreme probabilities as compared to the normal approximation interval, because the formula accounts for uncertainties in those scenarios.
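As a rough illustration of that difference, the sketch below computes both intervals for a small sample and for an extreme proportion. Both functions are hypothetical helpers written for this example using the textbook formulas with z = 1.96; neither is taken from the package.

```javascript
// Normal approximation (Wald) interval: p ± z * sqrt(p(1 - p) / n).
function normalApproxInterval(successes, total, z = 1.96) {
  const p = successes / total;
  const margin = z * Math.sqrt((p * (1 - p)) / total);
  return { left: p - margin, right: p + margin };
}

// Wilson score interval (hypothetical helper, standard formula).
function wilsonInterval(successes, total, z = 1.96) {
  const p = successes / total;
  const z2 = z * z;
  const denominator = 1 + z2 / total;
  const center = (p + z2 / (2 * total)) / denominator;
  const halfWidth =
    (z * Math.sqrt((p * (1 - p)) / total + z2 / (4 * total * total))) / denominator;
  return { left: center - halfWidth, right: center + halfWidth };
}

// Extreme proportion: 10 successes out of 10 trials.
const waldExtreme = normalApproxInterval(10, 10);
const wilsonExtreme = wilsonInterval(10, 10);
console.log(waldExtreme);   // collapses to a zero-width interval at 1
console.log(wilsonExtreme); // lower bound stays well below 1

// Small sample: 1 success out of 5 trials.
const waldSmall = normalApproxInterval(1, 5);
const wilsonSmall = wilsonInterval(1, 5);
console.log(waldSmall);   // lower bound falls below 0
console.log(wilsonSmall); // lower bound stays above 0
```

In the first case the normal approximation reports no uncertainty at all after only ten observations; in the second it produces a negative proportion. The Wilson interval avoids both problems.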

This paper offers a more technical comparison of the Wilson interval with other statistical approaches.

Use cases

Apart from sorting by rating, the Wilson score interval has a lot of potential applications! You can use the Wilson score interval anywhere you need a confident estimate for what percentage of people took or would take a specific action.

You can even use it in cases where the data doesn't break cleanly into two specific outcomes (e.g. 1-5 star ratings), as long as you are able to creatively abstract the outcomes into two buckets (e.g. % of users who voted 4 stars and above vs % of users who didn't).
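One possible way to do that bucketing for star ratings is sketched below. The data, the function name `toBinaryOutcome`, and the 4-star threshold are all assumptions chosen for illustration.

```javascript
// Invented data: number of ratings received at each star level (1-5).
const starCounts = { 1: 4, 2: 6, 3: 10, 4: 30, 5: 50 };

// Collapse star ratings into two buckets: ratings at or above `threshold`
// count as "positive", everything else as "not positive".
function toBinaryOutcome(counts, threshold = 4) {
  let positive = 0;
  let total = 0;
  for (const [stars, count] of Object.entries(counts)) {
    total += count;
    if (Number(stars) >= threshold) positive += count;
  }
  return { positive, total };
}

const { positive, total } = toBinaryOutcome(starCounts);
console.log(positive, total); // 80 100

// With the package installed, these two numbers are what you would pass in:
//   const wilson = require('wilson-score-interval');
//   wilson(positive, total);
```

Where to draw the threshold is a judgment call; moving it changes what "success" means and therefore what the interval estimates.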

Examples:

  • Most romantic city on Yelp (wilson(num_romantic_searches, num_total_searches))
  • Sorting comments by upvotes on Reddit (wilson(num_upvotes, num_total_votes))
  • Creating a 'most shared' list (wilson(num_shares, num_total_views))
  • Spam/abuse detection (wilson(num_marked_spam, num_total_votes))