npm package discovery and stats viewer.

Discover Tips

  • General search

    [free text search, go nuts!]

  • Package details

    pkg:[package-name]

  • User packages

    @[username]

Sponsor

Optimize Toolset

I’ve always been into building performant and accessible sites, but lately I’ve been taking it extremely seriously. So much so that I’ve been building a tool to help me optimize and monitor the sites that I build to make sure that I’m making an attempt to offer the best experience to those who visit them. If you’re into performant, accessible and SEO friendly sites, you might like it too! You can check it out at Optimize Toolset.

About

Hi, 👋, I’m Ryan Hefner  and I built this site for me, and you! The goal of this site was to provide an easy way for me to check the stats on my npm packages, both for prioritizing issues and updates, and to give me a little kick in the pants to keep up on stuff.

As I was building it, I realized that I was actually using the tool to build the tool, and figured I might as well put this out there and hopefully others will find it to be a fast and useful way to search and browse npm packages as I have.

If you’re interested in other things I’m working on, follow me on Twitter or check out the open source projects I’ve been publishing on GitHub.

I am also working on a Twitter bot for this site to tweet the most popular, newest, random packages from npm. Please follow that account now and it will start sending out packages soon–ish.

Open Software & Tools

This site wouldn’t be possible without the immense generosity and tireless efforts from the people who make contributions to the world and share their work via open source initiatives. Thank you 🙏

© 2024 – Pkg Stats / Ryan Hefner

online-linear-regression

v1.0.0

Published

Online linear regression

Downloads

693

Readme

online-linear-regression

Description

Linear regression is a beautiful algorithm that models relationship between two or more variables. The algorithm estimates linear weights ⍵0, ⍵1 … ⍵n given data (y, X) and assumed model:

y = ⍵0 + ⍵1x1 + … + ⍵nxn    (1)

The classical approach for calculating is minimizing the residual sum of squares:

RSS(⍵) = ∑i=1..N (yi - f(xi))2 = (yX⍵)T(yX⍵)    (2)

After differentiating the RSS we find it extrema:

* = (XTX)−1XTy    (3)

The formula is really great, however there are couple of problems with it from the computational point of view:

  1. Matrix inverse is quite expensive procedure
  2. We have to store all values of x and y
  3. To update the model with new sample we need to iterate over all dataset

Instead of using the classical formula for ⍵ (3) the online-linear-regression is based on the recursive least squares algorithm. It stores only one n×n matrix P and two n-dimensional vectors K and ⍵, where n - number of weights.

Ki+1 = Pixi+1(1 + xTi+1Pixi+1)−1i+1 = ⍵i + Ki+1(yi+1 - xTi+1⍵) Pi+1 = Pi − Ki+1xTi+1Pi

Installation

npm install online-linear-regression

const LinReg = require('online-linear-regression')

To use in browsers, bundle with browserify

Usage

I've tried to make the function interface as simple as possible. However because of inner values and on-line nature it's still needed to initialize the function before usage:

const LinReg = require('online-linear-regression')
const linreg = LinReg()

then just call linreg with 2 arguments (x, y) when training the model or one argument (x) when predicting ŷ.

// Example of polynomial regression
for (let i = 0; i < 5; i += Math.random() * 0.5) {
  let noise = Math.random() * 10 - 5
  let y = 3 * i ** 2 - 2 * i ** 2 + 9 * i + 10 + noise
  linreg([i, i ** 2, i ** 3], y)
}

console.log(linreg([2, 4, 8])) // 43.22

LSE

The problem of finding solutions for a system of independent equations when the number of equations is greater than the number of variables is known as Least Squares Estimation. In our case we have data (X, y) with N pairs of input/ouput values. Each input is a vector xi of length n-1, output — scalar value of yi. We try to find weight vector of length n with coefficients ⍵0,⍵1…⍵n. If N > n (i.e. data length is bigger that number of input variables) it's impossible to find one unique solution for . So we should use the LSE method.

y0 = ⍵0 + ⍵1x01 + … + ⍵nx0n y1 = ⍵0 + ⍵1x11 + … + ⍵nx1n yN = ⍵0 + ⍵1xN1 + … + ⍵nxNn

or in vector form:

y = X

we are trying to find such that minimizes the error function (3). To do that let's find the function's derivative:

RSS(⍵) = (yX⍵)T(yX⍵) = (yTTXT)(yX⍵) = yTy - TXTy - yTX⍵ + TXTX⍵

Understanding that TXTy is scalar and its transpose (TXTy)T = yTX⍵ doesn't change its value:

RSS(⍵) = yTy - 2TXTy + TXTX⍵

To differentiate the above formula, we use such math statements:

/∂⍵ (TXTy) = XTy /∂⍵ (TXTX⍵) = 2XTX⍵

Now differentiate RSS(⍵) and find a closed-form formula for when ∂RSS(⍵)/∂⍵ equals to 0 (function extrema)

/∂⍵ RSS(⍵) = -2XTy + 2XTX⍵ = 0 = (XTX)−1XTy

RLSE

We are interested in a recursive algorithm for such that when new data arrives it'd be possible to update weights without iterating over the dataset. Let's start from the formula (3) for i+1 case:

i+1 = (XTi+1Xi+1)−1XTi+1yi+1 = Pi+1XTi+1yi+1    (4)

Where:

Pi+1 = (XTi+1Xi+1)−1 = (XTiXi + xi+1xTi+1)−1 = (Pi + xi+1xTi+1)−1    (5)

Using Woodbury matrix identity we get rid of matrix inversion in (5):

Pi+1 = Pi - αPixi+1xTi+1Pi

α is a scalar value equal to

α = (1 + xTi+1Pixi+1)-1

Now we can re-write the equation (4) in a recursive way:

i+1 = Pi+1XTi+1yi+1 = Pi+1(XTiyi + xi+1yi+1) = (Pi - αPixi+1xTi+1Pi)(XTiyi + xi+1yi+1) i+1 = i + ki+1(yi+1 - xTi+1i)   (6)

where

ki+1 = Pi+1xi+1 = αPixi+1 = (1 + xTi+1Pixi+1)-1Pixi+1   (7)

The last thing we need is an updated formula of Pi+1 based on knowledge of k:

Pi+1 = Pi - αPixi+1xTi+1Pi = Pi - ki+1xTi+1Pi   (8)

Now we have all needed pieces to calculate weights iteratively. For each new sample:

  1. Calculate vector ki+1 using previous values of the matrix Pi and new input values x0,x1…xn
  2. Calculate vector i+1
  3. Update n×n matrix Pi+1