fussy

v0.1.2

node-fussy

JSON prediction and recommendation engine

Summary

Fussy is an inference engine for Node. You can use it to guess missing values in a JSON object.

It tries to find the most probable values for missing attributes by scanning a database of previously seen objects and computing a weighted average based on similarity.

Examples

Demo examples (included in the project):

More examples:

License

BSD (see LICENCE.txt file).

Current issues

Fussy is an experimental project, and has a number of pitfalls:

  • no tests yet (but soon)
  • code is alpha quality, not reviewed
  • all data must be loaded into memory (cannot use a remote db yet)
  • extremely slow (see mushroom demo..)
  • the "0" value is not supported well
  • not tested for strings (distance function is a bit broken, and will be rewritten)
  • and probably many other bugs..

How it works

Algorithm

The algorithm works as follows:

For a given JSON object with some missing fields, it tries to determine the most probable value of these fields by looking at all the past values and computing their average.

However, it is a weighted average: a different "trust" is given to each past object, depending on how close and relevant it is to the object to repair.

To do this, Fussy iterates over all the stored objects (in a map-reduce fashion) and computes a distance score for each one, based on the similarity between values: strings, numbers..

This distance will be used to weight the value of the missing field when computing the average.
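The steps above can be sketched in a few lines of plain JavaScript. This is only an illustration of the idea, not Fussy's actual code: the function names similarity and predict are hypothetical, and the scoring rule (exact match for strings, inverse absolute difference for numbers) is an assumption chosen for simplicity. It uses a similarity score where higher means closer, which plays the role of the distance described above.

```javascript
// Illustrative sketch only, not Fussy's implementation.
// similarity() and predict() are hypothetical names.

// Score how alike two objects are on the fields they both define (0..1).
function similarity(a, b) {
  var shared = Object.keys(a).filter(function (key) {
    return key in b;
  });
  if (shared.length === 0) return 0;
  var total = shared.reduce(function (sum, key) {
    if (typeof a[key] === 'number' && typeof b[key] === 'number') {
      // Numbers: closer values score nearer to 1.
      return sum + 1 / (1 + Math.abs(a[key] - b[key]));
    }
    // Everything else: exact match or nothing.
    return sum + (a[key] === b[key] ? 1 : 0);
  }, 0);
  return total / shared.length;
}

// Weighted average of a numeric `field` over all stored objects.
function predict(stored, partial, field) {
  var weightSum = 0;
  var valueSum = 0;
  stored.forEach(function (past) {
    if (typeof past[field] !== 'number') return;
    var weight = similarity(partial, past);
    weightSum += weight;
    valueSum += weight * past[field];
  });
  return weightSum === 0 ? undefined : valueSum / weightSum;
}

var history = [
  { day: 'Monday', temperature: 10 },
  { day: 'Monday', temperature: 20 },
  { day: 'Tuesday', temperature: 40 }
];

// The two Monday records get weight 1, the Tuesday record gets weight 0.
var guess = predict(history, { day: 'Monday' }, 'temperature');
console.log(guess); // 15
```

Because every stored object is visited for every prediction, a sketch like this is linear in the size of the dataset, which is consistent with the performance caveat listed under "Current issues".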

Quick-start

Installation

As a dependency

Go to your Node (and NPM-managed) project, and run:

$ npm install fussy --save

From sources

To download the sources, compile the CoffeeScript, and link the package into your system:

$ git clone git@github.com:jbilcke/node-fussy.git
$ cd node-fussy
$ npm run build
$ npm link

Initialization

First, create an instance of the Fussy class:

var Fussy = require('fussy');
var fussy = new Fussy();

Inserting JSON data

Then you can insert documents. You have a few ways of doing this.

You can use the insert method, which takes a JSON object or an array of objects:

fussy
  .insert({ 'food': 'rice',  'taste': 'good' })
  .insert([
    { 'food': 'salad', 'taste': 'good' },
    { 'food': 'grass', 'taste': 'bad' }
  ]);

Importing a dataset

After spending some time using Fussy on various datasets, I found it handy to write a small importer for CSV files. So here we go!

The import function takes an input CSV file and a list of columns as parameters:

var data = fussy.import('thermal.csv', [ 'day', 'temperature' ]);

This second parameter can be used to define types:

fussy.import('thermal.csv', [
    ['day','String'],
    ['temperature','Number']
]);

You can also define a dictionary of values, when using Strings:

fussy.import('thermal.csv', [
    ['day', {
      'mon': 'Monday',
      'tue': 'Tuesday',
      'wed': 'Wednesday',
      'thu': 'Thursday',
      'fri': 'Friday',
      'sat': 'Saturday',
      'sun': 'Sunday'
      }],
    ['temperature','Number']
]);
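To make the column definitions more concrete, here is one plausible reading of what they describe, written as a stand-alone function. This is an assumption, not Fussy's documented behaviour: applySchema is a hypothetical helper, and the idea that a dictionary translates raw CSV values (such as 'mon') into the mapped value (such as 'Monday') is a guess based on the example above.

```javascript
// Hypothetical helper, not part of Fussy.
// Converts one parsed CSV row (array of strings) into an object,
// using [name, type] column definitions like the ones shown above.
function applySchema(row, schema) {
  var out = {};
  schema.forEach(function (column, index) {
    var name = column[0];
    var type = column[1];
    var raw = row[index];
    if (type === 'Number') {
      out[name] = Number(raw);
    } else if (type !== null && typeof type === 'object') {
      // Dictionary: translate known keys, keep unknown values untouched.
      out[name] = Object.prototype.hasOwnProperty.call(type, raw)
        ? type[raw]
        : raw;
    } else {
      out[name] = String(raw);
    }
  });
  return out;
}

var schema = [
  ['day', { mon: 'Monday', tue: 'Tuesday' }],
  ['temperature', 'Number']
];

var record = applySchema(['mon', '21.5'], schema);
console.log(record); // { day: 'Monday', temperature: 21.5 }
```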

Using the dataset function

Sometimes you need to do some operations on a dataset before using it.

For instance, maybe you only want to keep a subset of the dataset, or do random sampling, so you need access to the array before importing it.

Fussy provides a function to create a dataset (array of JSONs), available in the fussy.toolbox object.

The fussy.toolbox.dataset function takes an input CSV file and a list of columns as parameters.

It works like the import function:

var data = fussy.toolbox.dataset('thermal.csv', [ 'day', 'temperature' ]);

You can then manipulate this array, before importing it. For instance:

var data = fussy.toolbox.shuffle(
  fussy.toolbox.dataset('data.csv', 'schema.json')
);

This loads the dataset and shuffles it.
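If you want random sampling rather than a full shuffle, the same idea can be written against a plain array. The sketch below does not use Fussy at all: sample is a hypothetical helper built on a standard Fisher-Yates shuffle, and the rows array stands in for whatever the dataset function returns.

```javascript
// Hypothetical helper, not part of Fussy's toolbox.
// Returns `n` randomly chosen rows without modifying the input array.
function sample(rows, n) {
  var copy = rows.slice();
  // Fisher-Yates shuffle on the copy.
  for (var i = copy.length - 1; i > 0; i--) {
    var j = Math.floor(Math.random() * (i + 1));
    var tmp = copy[i];
    copy[i] = copy[j];
    copy[j] = tmp;
  }
  return copy.slice(0, n);
}

// Stand-in for a loaded dataset (an array of JSON objects).
var rows = [
  { day: 'Monday', temperature: 10 },
  { day: 'Tuesday', temperature: 20 },
  { day: 'Wednesday', temperature: 40 },
  { day: 'Thursday', temperature: 15 }
];

var subset = sample(rows, 2);
console.log(subset.length); // 2
```

Since insert accepts an array of objects, a subset produced this way could then be passed to it directly.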

Predicting data

var query = fussy.query({
  select: ['column'],
  where: {
    foo: '',
    bar: ''
  }
});

Using the results

Calling fussy.query returns a result set, or "view", on the data. This view has the following methods:

best()

The best function returns the best value for a given field. It actually just takes the first element of the all function.

Depending on the distribution and the category of problem you are trying to solve, this might not be the best choice for you.

mix()

The mix function computes the weighted average value for a numeric field.

For instance, say there are 3 possible values for a "temperature" field: 10, 20 and 40.

While the all function returns an array of (value, weight) tuples, the mix function directly returns the weighted average, e.g. 23.33 when the three values carry equal weights.
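The arithmetic behind that figure can be checked with a small stand-alone function. This is a sketch, not Fussy's code: weightedAverage is a hypothetical name, and the weights are made up for illustration, since the text does not say which weights the three temperatures carry.

```javascript
// Hypothetical helper: weighted average over [value, weight] tuples.
function weightedAverage(distribution) {
  var weightSum = 0;
  var valueSum = 0;
  distribution.forEach(function (entry) {
    valueSum += entry[0] * entry[1];
    weightSum += entry[1];
  });
  return valueSum / weightSum;
}

// Equal weights: (10 + 20 + 40) / 3
var equal = weightedAverage([[10, 1], [20, 1], [40, 1]]);
console.log(equal.toFixed(2)); // "23.33"

// Trusting the first value three times as much pulls the result down:
// (10*3 + 20*1 + 40*1) / 5
var skewed = weightedAverage([[10, 3], [20, 1], [40, 1]]);
console.log(skewed); // 18
```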

all()

The all function returns the distribution of values: an array sorted by weight, of all possible choices for the requested fields.

This is actually an array of (value, weight) tuples.

Use this function if you want access to the raw data and need to make multiple, weighted decisions (e.g. for investment or risk management use cases).
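As a sketch of what working with such a distribution might look like, the snippet below turns raw weights into shares of the total and applies a simple confidence threshold. It assumes the result is a plain array of [value, weight] pairs sorted by descending weight; the sample data, the toShares helper, and the 0.5 threshold are all hypothetical.

```javascript
// Assumed shape of a distribution: [[value, weight], ...], heaviest first.
var distribution = [['good', 6], ['bad', 3], ['unknown', 1]];

// Hypothetical helper: convert raw weights into shares of the total.
function toShares(dist) {
  var total = dist.reduce(function (sum, entry) {
    return sum + entry[1];
  }, 0);
  return dist.map(function (entry) {
    return { value: entry[0], share: total === 0 ? 0 : entry[1] / total };
  });
}

var shares = toShares(distribution);
var top = shares[0];

// Only act on the top choice if it holds at least half of the total weight.
var confident = top.share >= 0.5;
console.log(top.value, confident); // good true
```

Keeping the whole distribution, rather than only the first entry, makes it possible to decline a decision when the weights are spread too evenly.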