ndarray-linear-regression

v0.1.0

Linear regression with ndarray

Fit linear regression models using QR decomposition on ndarray data structures. It currently supports fitting, prediction intervals and standard errors for coefficients.

License: ISC

Installing

npm install ndarray-linear-regression

Usage

An example of how to fit a linear regression model to the mtcars dataset. The model is mpg ~ hp + cyl, i.e. can we predict miles per gallon (mpg) from a linear combination of horsepower (hp) and number of cylinders (cyl)?

const fit = require("ndarray-linear-regression")
const mtcars = require("mtcars")
const ndarray = require("ndarray")
const pool = require("ndarray-scratch")

const mpg = mtcars.map((x) => x.mpg)
const m = mpg.length
const n = 2
const hp = mtcars.map((x) => x.hp)
const cyl = mtcars.map((x) => x.cyl)
const response = ndarray(new Float64Array(mpg), [m])

// column 0 holds hp, column 1 holds cyl
// newDataMatrix is a separate copy because fit() mutates designMatrix
const designMatrix = pool.zeros([m, n])
const newDataMatrix = pool.zeros([m, n])
for (let i = 0; i < m; i++) {
  for (let j = 0; j < n; j++) {
    const value = j === 0 ? hp[i] : cyl[i]
    designMatrix.set(i, j, value)
    newDataMatrix.set(i, j, value)
  }
}

// fit the model
// note: response and designMatrix are reused (mutated) during the fitting process,
// so the values in those data structures should not be used by any other
// functions afterwards
const model = fit(response, designMatrix)

// the coefficients are here (an ndarray of dimension [n])
const coefficients = model.coefficients

// you can use the resulting model object to make predictions for new data
const prediction = model.predict(newDataMatrix)

// you can compute the standard errors for the coefficients
const SEs = model.computeCoefficentSEs()

// and also prediction intervals
const predIntervals = model.predictionInterval(0.05, newDataMatrix)

API

Fit

In order to fit a linear regression model you need two data structures:

  • A response vector: an ndarray of floats of dimension [m].
  • A so-called design matrix: an ndarray of dimension [m, n], with one row per element of the response. In machine learning, the columns of that matrix are called "features".

Using the design matrix, you try to find a linear model that can predict the values in the response vector.
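The package states that it fits the model with a QR decomposition. As a rough, dependency-free sketch of that idea (not the package's actual implementation), the function below solves the least-squares problem on plain arrays using modified Gram-Schmidt. The name fitLeastSquares and the sample data are hypothetical, and the sketch assumes the design matrix has full column rank.

```javascript
// Least squares via a thin QR decomposition (modified Gram-Schmidt).
// X is an array of m rows with n columns, y is an array of length m.
// Assumes the columns of X are linearly independent.
function fitLeastSquares(X, y) {
  const m = X.length
  const n = X[0].length
  // Q is stored column by column; copying leaves the caller's data untouched.
  const Q = []
  for (let j = 0; j < n; j++) Q.push(X.map((row) => row[j]))
  const R = Array.from({ length: n }, () => new Array(n).fill(0))

  for (let j = 0; j < n; j++) {
    // Normalise column j ...
    let norm = 0
    for (let i = 0; i < m; i++) norm += Q[j][i] * Q[j][i]
    norm = Math.sqrt(norm)
    R[j][j] = norm
    for (let i = 0; i < m; i++) Q[j][i] /= norm
    // ... and remove its direction from the remaining columns.
    for (let k = j + 1; k < n; k++) {
      let dot = 0
      for (let i = 0; i < m; i++) dot += Q[j][i] * Q[k][i]
      R[j][k] = dot
      for (let i = 0; i < m; i++) Q[k][i] -= dot * Q[j][i]
    }
  }

  // Solve R * beta = Q^T y by back substitution.
  const qty = Q.map((col) => col.reduce((sum, q, i) => sum + q * y[i], 0))
  const beta = new Array(n).fill(0)
  for (let j = n - 1; j >= 0; j--) {
    let s = qty[j]
    for (let k = j + 1; k < n; k++) s -= R[j][k] * beta[k]
    beta[j] = s / R[j][j]
  }
  return beta
}

// Illustrative data: y = 1 + 2 * x, with a column of ones for the intercept.
const X = [[1, 1], [1, 2], [1, 3], [1, 4]]
const y = [3, 5, 7, 9]
console.log(fitLeastSquares(X, y)) // approximately [1, 2]
```

The sketch copies its inputs for readability; the package instead works in place on the ndarrays you pass in.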

The following call shows how to fit a model:

const model = fit(response, designMatrix)

The returned result is an object whose named elements are described in subsequent sections.

It is very important to note that both the response and the designMatrix will be mutated during the fitting process. Other internal functions depend on the correctness of those values. This means that you need to make sure that the two data structures are not used elsewhere. The consequence is that the memory footprint is lower, but we have mutable state 🙈
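One way to live with the mutation is to keep the source data in plain arrays and hand fit() a disposable copy each time. The sketch below only illustrates the copying step without any dependencies; disposableBuffer and the sample values are hypothetical, and the commented lines show where the ndarray and fit calls from the usage example would go.

```javascript
// Keep the source data in a plain array that is never handed to fit().
// (Illustrative values, not taken from a real dataset.)
const mpgValues = [21.0, 22.8, 18.7, 24.4]

// Float64Array.from copies the values into a new, independent buffer.
function disposableBuffer(values) {
  return Float64Array.from(values)
}

const buffer = disposableBuffer(mpgValues)
// With the packages installed, the copy would be wrapped and passed on:
//   const response = ndarray(buffer, [buffer.length])
//   const model = fit(response, designMatrix)

// Simulate fit() overwriting the buffer: the source array is unaffected.
buffer.fill(0)
console.log(mpgValues[0]) // 21
```

This trades away some of the memory savings, so it only makes sense when the original values are needed again after fitting.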

Model diagnostics, interpretation and inference

The following options are available to assess the fitted model:

  • coefficients - an ndarray of dimension [n] with the estimated coefficients of the fitted model.
  • residuals - an ndarray of dimension [m] holding the residuals. The residuals are the initial response vector minus the fitted values (i.e. the predictions on the training dataset).
  • computeCoefficentSEs() - a function that computes the standard errors of the model coefficients. It returns an ndarray of dimension [n]. These values can be used to test whether your model variables have a statistically significant effect on the response.
  • computeVcov() - a function that computes the variance-covariance matrix of the model coefficients.
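To turn the coefficients and their standard errors into something you can test with, a common step is to compute a t value per coefficient (estimate divided by standard error). The helper below is a hypothetical sketch, not part of the package: it only relies on .shape and .get(i), which ndarray provides, and uses a small stand-in object with made-up numbers so it runs without dependencies.

```javascript
// Combine estimates and standard errors into rows with a t value.
// Both arguments are expected to be 1-D ndarrays of the same length;
// only .shape and .get(i) are used.
function coefficientTable(coefficients, standardErrors) {
  const rows = []
  for (let j = 0; j < coefficients.shape[0]; j++) {
    const estimate = coefficients.get(j)
    const standardError = standardErrors.get(j)
    rows.push({ estimate, standardError, tValue: estimate / standardError })
  }
  return rows
}

// Stand-in for a 1-D ndarray so the sketch runs without dependencies.
const fakeVector = (values) => ({ shape: [values.length], get: (i) => values[i] })

// Made-up numbers, purely for illustration.
const table = coefficientTable(fakeVector([2.0, -0.5]), fakeVector([0.5, 0.25]))
console.log(table.map((row) => row.tValue)) // [ 4, -2 ]
```

With a fitted model, the call would presumably look like coefficientTable(model.coefficients, model.computeCoefficentSEs()). A t value that is large in absolute terms, compared against a t distribution with m - n degrees of freedom, suggests the corresponding variable has a significant effect.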

Prediction

In order to make predictions, use the functions below:

  • predict(newData) - a function that takes a new design matrix and uses the fitted model to make predictions on unseen data. It returns an ndarray with one prediction per row of newData (dimension [m] for a matrix with m rows).
  • predictionInterval(alpha, newData) - a function with two parameters:
    • The first parameter, alpha, is a float between 0 and 1: the so-called significance level. A common choice for alpha is 0.05. The smaller this value, the wider your prediction intervals.
    • The second parameter is a new design matrix, as for predict.
    • It returns an object with three elements: fit, lowerLimit and upperLimit. The first is the expected value of your prediction; the other two are the lower and upper limits of your (1 - alpha) prediction intervals. This is especially handy when you want to give an estimate of the uncertainty around your prediction.
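The result of predictionInterval can be reshaped into one row per observation, which is often easier to print or plot. The helper below is a hypothetical sketch that assumes fit, lowerLimit and upperLimit are 1-D ndarrays of equal length (the README does not state their type explicitly); it uses a stand-in object with made-up numbers so it runs without dependencies.

```javascript
// Turn { fit, lowerLimit, upperLimit } into one row per observation.
// Assumes each element is a 1-D ndarray; only .shape and .get(i) are used.
function intervalRows(interval) {
  const rows = []
  for (let i = 0; i < interval.fit.shape[0]; i++) {
    const lower = interval.lowerLimit.get(i)
    const upper = interval.upperLimit.get(i)
    rows.push({ fit: interval.fit.get(i), lower, upper, width: upper - lower })
  }
  return rows
}

// Stand-in for a 1-D ndarray so the sketch runs without dependencies.
const standInVector = (values) => ({ shape: [values.length], get: (i) => values[i] })

// Made-up numbers, purely for illustration.
const intervals = intervalRows({
  fit: standInVector([20, 25]),
  lowerLimit: standInVector([15, 19]),
  upperLimit: standInVector([25, 31]),
})
console.log(intervals.map((row) => row.width)) // [ 10, 12 ]
```

With a fitted model, the input would be the object returned by model.predictionInterval(0.05, newDataMatrix).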

Inspiration

The following links give more information and inspired the creation of this package.

  • https://www.stat.wisc.edu/courses/st849-bates/lectures/Orthogonal.pdf
  • https://stackoverflow.com/questions/38109501/how-does-predict-lm-compute-confidence-interval-and-prediction-interval
  • https://genomicsclass.github.io/book/pages/qr_and_regression.html

Contributing

If you have a question or have difficulties using ndarray-linear-regression, please double-check your code and setup first. If you think you have found a bug or want to propose a feature, refer to the issues page.