krakens

v1.9.0

A reactive & declarative dataframe implemented in pure JavaScript using RxJS.

Downloads

14

Krakens (alpha)

Note: This is a work in progress. It's not recommended for public use yet. You can expect a stable version to arrive in July 2019.

Krakens is a simple, declarative and reactive dataframe built in pure JavaScript using RxJS. It is part of the Swashbuckler machine learning toolchain.

Declarative and platform-agnostic. Krakens lets data-driven applications declare which data streams they want and build business logic around them, without worrying about where the dataframe stores its data or how it runs its computations. You can run the exact same data pipelines and operations on input data regardless of whether it is stored in memory, in a CSV file or 1000 miles away in a MongoDB cluster. Krakens can offload many calculations to an underlying computational engine (like a MongoDB cluster or GraphQL API), but it also supports in-memory computations.

Built for analysts. Krakens is designed to suit the needs of data analysts and machine learning engineers. It provides helpful functions to support data analysis tasks like data exploration, queries, descriptive statistics and calculations across columns.

Everything is an Observable. Every Krakens function returns an RxJS Observable. Columns are Observable streams. Rows are also Observable streams. All of the normal RxJS operators can be applied to columns and rows of data since they are simply RxJS Observables.

Universal. Krakens works in browsers, Node.js, React Native, Electron and IoT devices.

Reactive. Krakens dataframes stream their data from a source. The data source can be static (like an in-memory matrix) but it doesn't need to be since Krakens can also stream data asynchronously on-demand. This allows users to run incremental analysis as additional rows are streamed into the dataframe.
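
To make the idea of incremental analysis concrete, here is a minimal sketch in plain JavaScript that uses no Krakens or RxJS calls. The `createRunningMean` helper is hypothetical and is not part of the Krakens API. It only illustrates how a statistic can be updated as each new row arrives instead of being recomputed over the whole dataset:

```javascript
// Hypothetical helper (not part of Krakens): keeps a running mean of one
// field and updates it each time a new row arrives.
function createRunningMean(field) {
  let count = 0;
  let mean = 0;
  return function onRow(row) {
    count += 1;
    // Incremental mean update: earlier rows do not need to stay in memory.
    mean += (row[field] - mean) / count;
    return mean;
  };
}

// Simulate rows arriving one at a time.
const onRow = createRunningMean('booty');
const runningMeans = [
  {name: 'Blackbeard', booty: 100},
  {name: 'Sparrow', booty: 50},
  {name: 'Captain Crunch', booty: 150},
].map(row => onRow(row));
// runningMeans is [100, 75, 100]
```

In a reactive pipeline the same update step would run inside a subscription, once per emitted row.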

Composable. Since every column in a Krakens dataframe is simply an RxJS Observable, Krakens dataframes can be composed from any data source or multiple data sources. For example, if you want to combine certain fields from a collection in MongoDB with additional calculated fields stored in memory or a CSV file, Krakens can do this. The only requirement is that the rows need to be loaded into Krakens in the same order (at the same index).
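
The same-index requirement is easier to see in a small sketch. The plain JavaScript below does not use Krakens; `zipColumns` is a hypothetical helper that combines columns from different sources purely by position, which is why the row order has to agree across sources:

```javascript
// Hypothetical helper (not part of Krakens): combines columns from different
// sources into rows by position, so row i takes value i from every column.
function zipColumns(columns) {
  const names = Object.keys(columns);
  const length = names.length ? columns[names[0]].length : 0;
  for (const name of names) {
    if (columns[name].length !== length) {
      throw new Error(`Column "${name}" has a different number of rows`);
    }
  }
  return Array.from({length}, (_, i) =>
    Object.fromEntries(names.map(name => [name, columns[name][i]]))
  );
}

// The names might come from MongoDB and the calculated field from a CSV file.
// The result is only meaningful because both are in the same row order.
const rows = zipColumns({
  name: ['Blackbeard', 'Sparrow'],
  bootySquared: [10000, 25],
});
// rows[1] is {name: 'Sparrow', bootySquared: 25}
```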

Installation

npm i --save krakens
yarn add krakens

Create a Dataframe

From an array of rows or an Observable that emits the rows:

import { create } from 'krakens';
import { from } from 'rxjs';

const pirates = [
  {name: 'Blackbeard', booty: 100},
  {name: 'Sparrow', booty: 5},
  {name: 'Captain Crunch', booty: 10000000},
];

// Create a dataframe from an array (containing the rows):
const dfFromArray$ = create(pirates);

// Create a dataframe from an observable containing the rows:
const pirate$ = from(pirates);
const dfFromObservable$ = create(pirate$);

From a remote or local CSV file:

import { create } from 'krakens-csv';

// Local files:
const dfFromLocalCsv$ = create('./pirates.csv');
// Remote files work, too:
const dfFromRemoteCsv$ = create('https://datasets.buccaneer.ai/pirates.csv');
// Gzipping remote files is a good idea. Krakens supports it:
const gzippedCsv$ = create('https://datasets.buccaneer.ai/pirates.csv.gz');
// AWS S3 is wildly popular so Krakens handles S3 CSV files out of the box:
const dfFromS3$ = create('s3://mybucket/pirates.csv');

From a MongoDB collection:

import { create } from 'krakens-mongo';

const dfFromMongoCollection$ = create({
  mongoUrl: process.env.MONGO_URL,
  collection: 'pirates',
});

From custom data sources:

You can often just pull all the data into an observable and then create a Krakens dataframe. Here's an example that pulls data from a RESTful HTTP endpoint:

import { from } from 'rxjs';
import { mergeMap } from 'rxjs/operators';
import superagent from 'superagent'; // install an HTTP client
import { create } from 'krakens';

// Stream data from an HTTP API.
// Assumes the API responds with JSON shaped like {data: [...rows]}.
const httpResponse = superagent.get('https://api.buccaner.ai/pirates');
const row$ = from(httpResponse).pipe(mergeMap(res => from(res.body.data)));
const df$ = create(row$);

If you want your Krakens client to fetch only what it needs or delegate computations to another software system, it's easy to implement your own Krakens client. Fear not! It really isn't hard!

Using Operators

Krakens implements a handful of pipeable operators that play nice with databases and tabular data. Operators are declarative, meaning that you declare what transformations you want the dataframe to run without worrying about how the data source implements the operator.

Note: Every operator expects its source stream to emit a dataframe. If you pipe an operator onto a stream that emits anything else, your pipeline will error out. For example:

import { of, from } from 'rxjs';
import { mergeMap } from 'rxjs/operators';
import { create } from 'krakens';
import { count } from 'krakens/operators';

const nauticalFriends = [{name: 'Ahab'}, {name: 'Nemo'}, {name: 'Ariel'}];
const df$ = create(nauticalFriends);

// Yarrr. This works.
df$.pipe(count());

// You can also pipe other operators ahead of Krakens operators, as long as
// the stream emits a dataframe by the time it reaches them:
of({foo: 'bar'}).pipe(
  mergeMap(() => df$),
  count()
);

// No, Matey! This will fail.
of('notadataframe').pipe(count()); // errors out when subscribed

count()

Returns an observable containing the row count for the dataframe. As the example shows, you can pass a query to count only the matching rows.

import { create } from 'krakens';
import { count } from 'krakens/operators';

const df$ = create([
  {name: 'Blackbeard', booty: 100},
  {name: 'Sparrow', booty: 5},
  {name: 'Captain Crunch', booty: 10000000},
]);
const allRowCount$ = df$.pipe(
  count()
);
const filteredRowCount$ = df$.pipe(
  count({name: {$eq: 'Blackbeard'}})
);

createColumn()

import { create } from 'krakens';
import { createColumn } from 'krakens/operators';

const pirates = [
  {name: 'Blackbeard', booty: 100},
  {name: 'Sparrow', booty: 5},
  {name: 'Captain Crunch', booty: 10000000},
];
const df$ = create(pirates);

cols(columnNames<Array><String,Integer>)

Given an array of column names (Strings) or column indexes (Integers), returns an observable containing the data from those columns.

import { create } from 'krakens';
import { cols } from 'krakens/operators';

const df$ = create([
  {name: 'Blackbeard', booty: 100},
  {name: 'Sparrow', booty: 5},
  {name: 'Captain Crunch', booty: 10000000},
]);

const bootyCol$ = df$.pipe(
  cols(['booty'])
);

const columnsByIndex$ = df$.pipe(
  cols([1]) // this is the second key, which in this case is "booty"
);

const columnsByColumnName$ = df$.pipe(
  cols(['drinksRum', 'booty'])
);
const rowWithColIndex$ = df$.pipe(
  cols([0, 2])
);
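
As a rough illustration of how name and index selectors can be resolved, here is a plain JavaScript sketch. It is not Krakens code: `selectColumns` is a hypothetical helper, and it assumes that an integer refers to the position of a key in the row's key order, as the "second key" comment above suggests. The source does not say what happens to selectors that match no column, so this sketch simply skips them:

```javascript
// Hypothetical helper (not part of Krakens): picks columns from one row.
// A string selector is used as the key; an integer selector is resolved to
// the key at that position in the row's key order.
function selectColumns(row, selectors) {
  const keys = Object.keys(row);
  const picked = {};
  for (const selector of selectors) {
    const key = Number.isInteger(selector) ? keys[selector] : selector;
    // Assumption: selectors that match no column in this row are skipped.
    if (key !== undefined && key in row) {
      picked[key] = row[key];
    }
  }
  return picked;
}

const row = {name: 'Blackbeard', booty: 100};
const byName = selectColumns(row, ['booty']);      // {booty: 100}
const byIndex = selectColumns(row, [1]);           // {booty: 100}
const mixed = selectColumns(row, [0, 'booty', 2]); // {name: 'Blackbeard', booty: 100}
```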

mapCols(mapper<Function>, options<{fields<Array><String,Number>, colName<String>}>)

  • Creates a new column by mapping one or more existing columns and transforming them to a new value.
  • Creates a new dataframe column and pushes the new dataframe (containing the new column) to the dataframe's Subject.

import { create, mapCols } from 'krakens';

const df$ = create([
  {name: 'Blackbeard', booty: 100},
  {name: 'Sparrow', booty: 5},
  {name: 'Captain Crunch', booty: 10000000},
]);

const mappedCol$ = df$.pipe(
  mapCols(
    row => row.booty * row.booty, 
    {fields: ['booty'], colName: 'bootySquared'}
  )
);
mappedCol$.subscribe(); // emits the value of the new 'bootySquared' column

// Henceforth, df$ will emit a dataframe with the new column "bootySquared", 
// which is saved in memory.
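
For readers who want to see the row-level effect without RxJS, the sketch below shows the same transformation on a plain array. `addColumn` is a hypothetical helper, not a Krakens function, and it returns new row objects rather than pushing to a Subject:

```javascript
// Hypothetical helper (not part of Krakens): returns new rows that carry an
// extra column computed from each existing row. The input is not mutated.
function addColumn(rows, mapper, colName) {
  return rows.map(row => ({...row, [colName]: mapper(row)}));
}

const pirates = [
  {name: 'Blackbeard', booty: 100},
  {name: 'Sparrow', booty: 5},
];
const withSquares = addColumn(
  pirates,
  row => row.booty * row.booty,
  'bootySquared'
);
// withSquares[0] is {name: 'Blackbeard', booty: 100, bootySquared: 10000}
```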

where(query<Query>)

Filters the dataframe's rows using a MongoDB-style query object.

import { create, where } from 'krakens';

const pirates = [
  {name: 'Blackbeard', booty: 100},
  {name: 'Sparrow', booty: 5},
  {name: 'Captain Crunch', booty: 10000000},
];
const df$ = create(pirates);

// where() supports the following operators, similar to MongoDB:
const results0$ = df$.pipe(where({booty: {$eq: 100}})); // equal to 100
const results1$ = df$.pipe(where({booty: {$ne: 5}})); // not equal to 5
const results2$ = df$.pipe(where({booty: {$in: [5, 100]}})); // equal to 5 or 100
const results3$ = df$.pipe(where({booty: {$nin: [5, 100]}})); // neither 5 nor 100
const results4$ = df$.pipe(where({booty: {$exists: 1}})); // has a value for booty field
const results5$ = df$.pipe(where({booty: {$exists: 0}})); // has no value for booty field
const results6$ = df$.pipe(where({booty: {$gt: 100}})); // greater than 100
const results7$ = df$.pipe(where({booty: {$gte: 100}})); // greater than or equal to 100
const results8$ = df$.pipe(where({booty: {$lt: 100}})); // less than 100
const results9$ = df$.pipe(where({booty: {$lte: 100}})); // less than or equal to 100

// Conditions on several fields can be combined:
const results10$ = df$.pipe(where({booty: {$gt: 5}, drinksRum: {$eq: true}}));
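
To clarify what these query operators mean, here is a minimal plain JavaScript sketch of a matcher for the operators listed above. It is not how Krakens implements `where()`: the `comparators` table and the `matches` function are hypothetical, and the sketch assumes that combined conditions must all hold (a logical AND), as they do in MongoDB:

```javascript
// Hypothetical sketch (not Krakens's implementation): evaluates a
// MongoDB-style query object against a single row.
const comparators = {
  $eq: (value, arg) => value === arg,
  $ne: (value, arg) => value !== arg,
  $in: (value, arg) => arg.includes(value),
  $nin: (value, arg) => !arg.includes(value),
  $exists: (value, arg) => (value !== undefined) === Boolean(arg),
  $gt: (value, arg) => value > arg,
  $gte: (value, arg) => value >= arg,
  $lt: (value, arg) => value < arg,
  $lte: (value, arg) => value <= arg,
};

function matches(row, query) {
  // Assumption: every condition on every field must hold (logical AND).
  return Object.entries(query).every(([field, conditions]) =>
    Object.entries(conditions).every(([op, arg]) => {
      const compare = comparators[op];
      if (!compare) throw new Error(`Unsupported operator: ${op}`);
      return compare(row[field], arg);
    })
  );
}

const pirates = [
  {name: 'Blackbeard', booty: 100, drinksRum: true},
  {name: 'Sparrow', booty: 5, drinksRum: true},
  {name: 'Captain Crunch', booty: 10000000, drinksRum: false},
];
const richRumDrinkers = pirates
  .filter(row => matches(row, {booty: {$gt: 5}, drinksRum: {$eq: true}}))
  .map(row => row.name);
// richRumDrinkers is ['Blackbeard']
```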

License

MIT