npm package discovery and stats viewer.

Discover Tips

  • General search

    [free text search, go nuts!]

  • Package details

    pkg:[package-name]

  • User packages

    @[username]

Sponsor

Optimize Toolset

I’ve always been into building performant and accessible sites, but lately I’ve been taking it extremely seriously. So much so that I’ve been building a tool to help me optimize and monitor the sites that I build to make sure that I’m making an attempt to offer the best experience to those who visit them. If you’re into performant, accessible and SEO friendly sites, you might like it too! You can check it out at Optimize Toolset.

About

Hi, 👋, I’m Ryan Hefner  and I built this site for me, and you! The goal of this site was to provide an easy way for me to check the stats on my npm packages, both for prioritizing issues and updates, and to give me a little kick in the pants to keep up on stuff.

As I was building it, I realized that I was actually using the tool to build the tool, and figured I might as well put this out there and hopefully others will find it to be a fast and useful way to search and browse npm packages as I have.

If you’re interested in other things I’m working on, follow me on Twitter or check out the open source projects I’ve been publishing on GitHub.

I am also working on a Twitter bot for this site to tweet the most popular, newest, random packages from npm. Please follow that account now and it will start sending out packages soon–ish.

Open Software & Tools

This site wouldn’t be possible without the immense generosity and tireless efforts from the people who make contributions to the world and share their work via open source initiatives. Thank you 🙏

© 2024 – Pkg Stats / Ryan Hefner

@modelx/data

v1.1.2

Published

modelscript is a javascript module with simple and efficient tools for data mining and data analysis in JavaScript. When modelscript used with ML.js, pandas-js, and numjs, you're left with the equivalent R/Python tool set in JavaScript.

Downloads

235

Readme

@modelx/data

Coverage Status Build, Test & Coverage

quickly generate UMDs and other module types with rollup and typescript

Getting started

Clone the repo and drop your module in the src directory.

# Install Prerequisites
$ npm install rollup typedoc jest sitedown --g

Basic Usage

$ npm run build #builds type declarations, created bundled artifacts with rollup and generates documenation

Description

ModelScript is a javascript module with simple and efficient tools for data mining and data analysis in JavaScript. ModelScript can be used with ML.js, pandas-js, and numjs, to approximate the equivalent R/Python tool chain in JavaScript.

In Python, data preparation is typically done in a DataFrame, ModelScript encourages a more R like workflow where the data preparation is in it's native structure.

Installation

$ npm i modelscript

Full Documentation

Usage (basic)

ModelScript is an EcmaScript module and designed to be imported in an ES2015+ environment. In order to use in older environment, please use const modelscript = require('modelscript/build/modelscript.cjs.js') for older versions of node and <script type="text/javascript" src=".../path/to/.../modelscript/build/modelscript.umd.js"/>

"modelscript" : {
  ml:{ //see https://github.com/mljs/ml
    UpperConfidenceBound [Class: UpperConfidenceBound]{ // Implementation of the Upper Confidence Bound algorithm
      predict(), //returns next action based off of the upper confidence bound
      learn(), //single step training method
      train(), //training method for upper confidence bound calculations
    },
    ThompsonSampling [Class: ThompsonSampling]{ //Implementation of the Thompson Sampling algorithm
      predict(), //returns next action based off of the thompson sampling
      learn(), //single step training method
      train(), //training method for thompson sampling calculations
    },
  },
  nlp:{ //see https://github.com/NaturalNode/natural
    ColumnVectorizer [Class: ColumnVectorizer]{ //class creating sparse matrices from a corpus
      get_tokens(), // Returns a distinct array of all tokens after fit_transform
      get_vector_array(), //Returns array of arrays of strings for dependent features from sparse matrix word map
      fit_transform(options), //Fits and transforms data by creating column vectors (a sparse matrix where each row has every word in the corpus as a column and the count of appearances in the corpus)
      get_limited_features(options), //Returns limited sets of dependent features or all dependent features sorted by word count
      evaluateString(testString), //returns word map with counts
      evaluate(testString), //returns new matrix of words with counts in columns
    }
  },
  csv:{
    loadCSV: [Function: loadCSV], //asynchronously loads CSVs, either a filepath or a remote URI
    loadTSV: [Function: loadTSV], //asynchronously loads TSVs, either a filepath or a remote URI
  },
  model_selection: {
    train_test_split: [Function: train_test_split], // splits data into training and testing sets
    cross_validation_split: [Function: kfolds], //splits data into k-folds
    cross_validate_score: [Function: cross_validate_score],//test model variance and bias
    grid_search: [Function: grid_search], // tune models with grid search for optimal performance
  },
  DataSet [Class: DataSet]: { //class for manipulating an array of objects (typically from CSV data)
    columnMatrix(vectors), //returns a matrix of values by combining column arrays into a matrix
    columnArray(columnName, options), // - returns a new array of a selected column from an array of objects, can filter, scale and replace values
    columnReplace(columnName, options), // - returns a new array of a selected column from an array of objects and replaces empty values, encodes values and scales values
    columnScale(columnName, options), // - returns a new array of scaled values which can be reverse (descaled). The scaling transformations are stored on the DataSet
    columnDescale(columnName, options), // - Returns a new array of descaled values
    selectColumns(columns, options), //returns a list of objects with only selected columns as properties
    labelEncoder(columnName, options), // - returns a new array and label encodes a selected column
    labelDecode(columnName, options), // - returns a new array and decodes an encoded column back to the original array values
    oneHotEncoder(columnName, options), // - returns a new object of one hot encoded values
    columnMatrix(columnName, options), // - returns a matrix of values from multiple columns
    columnReducer(newColumnName, options), // - returns a new array of a selected column that is passed a reducer function, this is used to create new columns for aggregate statistics
    columnMerge(name, data), // - returns a new column that is merged onto the data set
    filterColumn(options), // - filtered rows of data,
    fitColumns(options), // - mutates data property of DataSet by replacing multiple columns in a single command
    static reverseColumnMatrix(options), // returns an array of objects by applying labels to matrix of columns
    static reverseColumnVector(options), // returns an array of objects by applying labels to column vector
  },
  calc:{
    getTransactions: [Function getTransactions], // Formats an array of transactions into a sparse matrix like format for Apriori/Eclat
    assocationRuleLearning: [async Function assocationRuleLearning], // returns association rule learning results using apriori
  },
  util: {
    range: [Function], // range helper function
    rangeRight: [Function], //range right helper function
    scale: [Function: scale], //scale / normalize data
    avg: [Function: arithmeticMean], // aritmatic mean
    mean: [Function: arithmeticMean], // aritmatic mean
    sum: [Function: sum],
    max: [Function: max],
    min: [Function: min],
    sd: [Function: standardDeviation], // standard deviation
    StandardScalerTransforms: [Function: StandardScalerTransforms], // returns two functions that can standard scale new inputs and reverse scale new outputs
    MinMaxScalerTransforms: [Function: MinMaxScalerTransforms], // returns two functions that can mix max scale new inputs and reverse scale new outputs
    StandardScaler: [Function: StandardScaler], // standardization (z-scores)
    MinMaxScaler: [Function: MinMaxScaler], // min-max scaling
    ExpScaler: [Function: ExpScaler], // exponent scaling
    LogScaler: [Function: LogScaler], // natual log scaling
    squaredDifference: [Function: squaredDifference], // Returns an array of the squared different of two arrays
    standardError: [Function: standardError], // The standard error of the estimate is a measure of the accuracy of predictions made with a regression line
    coefficientOfDetermination: [Function: coefficientOfDetermination],
    adjustedCoefficentOfDetermination: [Function: adjustedCoefficentOfDetermination],
    adjustedRSquared: [Function: adjustedCoefficentOfDetermination],
    rBarSquared: [Function: adjustedCoefficentOfDetermination],
    r: [Function: coefficientOfCorrelation],
    coefficientOfCorrelation: [Function: coefficientOfCorrelation],
    rSquared: [Function: rSquared], //r^2
    pivotVector: [Function: pivotVector], // returns an array of vectors as an array of arrays
    pivotArrays: [Function: pivotArrays], // returns a matrix of values by combining arrays into a matrix
    standardScore: [Function: standardScore], // Calculates the z score of each value in the sample, relative to the sample mean and standard deviation.
    zScore: [Function: standardScore], // alias for standardScore.
    approximateZPercentile: [Function: approximateZPercentile], // approximate the p value from a z score
  },
  preprocessing: {
    DataSet: [Class DataSet],
  },
}

Examples (JavaScript / Python / R)

Loading CSV Data

Javascript
import { default as jsk } from 'modelscript';
let dataset;

//In JavaScript, by default most I/O Operations are asynchronous, see the notes section for more
ms.loadCSV('/some/file/path.csv')
  .then(csvData=>{
    dataset = new ms.DataSet(csvData);
    console.log({csvData});
    /* csvData [{
      'Country': 'Brazil',
      'Age': '44',
      'Salary': '72000',
      'Purchased': 'N',
    },
    ...
    {
      'Country': 'Mexico',
      'Age': '27',
      'Salary': '48000',
      'Purchased': 'Yes',
    }] */
  })
  .catch(console.error);

// or from URL
ms.loadCSV('https://example.com/some/file/path.csv')
Python
import pandas as pd

#Importing the dataset
dataset = pd.read_csv('/some/file/path.csv')
R
# Importingd the dataset
dataset = read.csv('Data.csv')

Handling Missing Data

Javascript
//column Array returns column of data by name
// [ '44','27','30','38','40','35','','48','50', '37' ]
const OringalAgeColumn = dataset.columnArray('Age'); 

//column Replace returns new Array with replaced missing data
//[ '44','27','30','38','40','35',38.77777777777778,'48','50','37' ]
const ReplacedAgeMeanColumn = dataset.columnReplace('Age',{strategy:'mean'}); 

//fit Columns, mutates dataset
dataset.fitColumns({
  columns:[{name:'Age',strategy:'mean'}]
});
/*
dataset
class DataSet
  data:[
    {
      'Country': 'Brazil',
      'Age': '38.77777777777778',
      'Salary': '72000',
      'Purchased': 'N',
    }
    ...
  ]
*/
Python
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, 3].values

# Taking care of of missing data
from sklearn.preprocessing import Imputer
imputer = Imputer(missing_values='NaN', strategy = 'mean', axis=0)
imputer = imputer.fit(X[:, 1:3])
X[:, 1:3] = imputer.transform(X[:, 1:3])
R
# Taking care of the missing data
dataset$Age = ifelse(is.na(dataset$Age),
                ave(dataset$Age,FUN = function(x) mean(x,na.rm =TRUE)),
                dataset$Age)

One Hot Encoding and Label Encoding

Javascript
// [ 'Brazil','Mexico','Ghana','Mexico','Ghana','Brazil','Mexico','Brazil','Ghana', 'Brazil' ]
const originalCountry = dataset.columnArray('Country'); 
/*
{ originalCountry:
   { Country_Brazil: [ 1, 0, 0, 0, 0, 1, 0, 1, 0, 1 ],
     Country_Mexico: [ 0, 1, 0, 1, 0, 0, 1, 0, 0, 0 ],
     Country_Ghana: [ 0, 0, 1, 0, 1, 0, 0, 0, 1, 0 ] },
    }
*/
const oneHotCountryColumn = dataset.oneHotEncoder('Country');

// [ 'N', 'Yes', 'No', 'f', 'Yes', 'Yes', 'false', 'Yes', 'No', 'Yes' ]
const originalPurchasedColumn = dataset.labelEncoder('Purchased');
// [ 0, 1, 0, 0, 1, 1, 1, 1, 0, 1 ]
const encodedBinaryPurchasedColumn = dataset.labelEncoder('Purchased',{ binary:true });
// [ 0, 1, 2, 3, 1, 1, 4, 1, 2, 1 ]
const encodedPurchasedColumn = dataset.labelEncoder('Purchased');
// [ 'N', 'Yes', 'No', 'f', 'Yes', 'Yes', 'false', 'Yes', 'No', 'Yes' ]
const decodedPurchased = dataset.labelDecode('Purchased', { data: encodedPurchasedColumn, });


//fit Columns, mutates dataset
dataset.fitColumns({
  columns:[
    {
      name: 'Purchased',
      options: {
        strategy: 'label',
        labelOptions: {
          binary: true,
        },
      },
    },
    {
      name: 'Country',
      options: {
        strategy: 'onehot',
      },
    },
  ]
});
Python
# Encoding  categorical data
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
labelencoder_X = LabelEncoder()
X[:, 0] = labelencoder_X.fit_transform(X[:, 0])
onehotencoder = OneHotEncoder(categorical_features=[0])
X = onehotencoder.fit_transform(X).toarray()
labelencoder_y = LabelEncoder()
y = labelencoder_y.fit_transform(y)
R
# Encoding categorical data
dataset$Country = factor(dataset$Country,
                         levels = c('Brazil', 'Mexico', 'Ghana'),
                         labels = c(1, 2, 3))

dataset$Purchased = factor(dataset$Purchased,
                         levels = c('No', 'Yes'),
                         labels = c(0, 1))

Cross Validation

Javascript
const testArray = [20, 25, 10, 33, 50, 42, 19, 34, 90, 23, ];

// { train: [ 50, 20, 34, 33, 10, 23, 90, 42 ], test: [ 25, 19 ] }
const trainTestSplit = ms.cross_validation.train_test_split(testArray,{ test_size:0.2, random_state: 0, });

// [ [ 50, 20, 34, 33, 10 ], [ 23, 90, 42, 19, 25 ] ] 
const crossValidationArrayKFolds = ms.cross_validation.cross_validation_split(testArray, { folds: 2, random_state: 0, });
Python
#splitting the dataset into trnaing set and test set
from sklearn.cross_validation import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)
R
# Splitting the dataset into the training set and test set
library(caTools)
set.seed(1)
split = sample.split(dataset$Purchased, SplitRatio = 0.8)
training_set = subset(dataset, split == TRUE)
test_set = subset(dataset, split == FALSE)

Scaling (z-score / min-mix)

Javascript
dataset.columnArray('Salary',{ scale:'standard'}); 
dataset.columnArray('Salary',{ scale:'minmax'}); 
Python
from sklearn.preprocessing import StandardScaler
sc_X = StandardScaler()
X_train = sc_X.fit_transform(X_train)
X_test = sc_X.transform(X_test)

Notes

Check out https://repetere.github.io/modelscript for the full modelscript Documentation

A quick word about asynchronous JavaScript

Most machine learning tutorials in Python and R are not using their asynchronous equivalents; however, there is a bias in JavaScript to default to non-blocking operations.

With the advent of ES7 and Node.js 7+ there are syntax helpers with asynchronous functions. It may be easier to use async/await in JS if you want an approximation close to what a workflow would look like in R/Python

import * as fs from 'fs-extra';
import * as np from 'numjs'; 
import { default as ml } from 'ml';
import { default as pd } from 'pandas-js';
import { default as mpn } from 'matplotnode';
import { loadCSV, preprocessing } from 'modelscript';
const plt = mpn.plot;

void async () => {
  const csvData = await loadCSV('../Data.csv');
  const rawData = new preprocessing.DataSet(csvData);
  const fittedData = rawData.fitColumns({
    columns: [
      { name: 'Age' },
      { name: 'Salary' },
      {
        name: 'Purchased',
        options: {
          strategy: 'label',
          labelOptions: {
            binary: true,
          },
        }
      },
    ]
  });
  const dataset = new pd.DataFrame(fittedData);
  const X = dataset.iloc(
    [ 0, dataset.length ],
    [ 0, 3 ]).values;
  const y = dataset.iloc(
    [ 0, dataset.length ],
    3).values;
  console.log({
    X,
    y
  });
}();