
aws-data-science

v0.2.4 · Published

Basic building blocks for data scientists analyzing AWS cloud platforms

Downloads: 28

AWS Data Science


Pragmatic take on being a data scientist for AWS-based applications and systems. Basically, you take typical AWS data sources, apply transformations, and gather the results into reports.

While AWS does offer services for data handling at different scales, a data scientist might want to crunch data on demand, getting a feel for things and answering questions right away before implementing bigger architectural systems.

Simple Example

Just a glimpse of what it feels like to use this library (via TypeScript):

import { Aggregate, Collect, Origin, Transform } from 'aws-data-science';

(async () => {
  // interesting for us: count occurrences of numbers matching a criterion
  const counter = new Aggregate.Count(num => num > 3);

  // start the stream and `await` for the pipeline to end
  // here: just start a stream from a simple array of numbers
  const result = await new Origin.Array<number>([1, 2, 3, 4, 5, 6, 7])
    // apply typical stream modifications, like keeping only even numbers
    .pipe(new Transform.Filter<number>(num => num % 2 === 0))
    // data mining for some statistics: use counter from above
    .pipe(counter)
    // transform data one item at a time
    .pipe(new Transform.Map<number, number>(num => num + 1))
    // data must stream into something that collects it, like an in-memory array
    .pipe(new Collect.Array<number>())
    // every collector implements this; the promise resolves when all is done
    .promise();

  // output the results:
  console.log('result elements:', result)    // => [ 3, 5, 7 ]
  console.log('counter:', counter.result())  // => 2
})();

As you might notice, these are functional building blocks implemented on top of the Node.js stream module with a charming API. When used via TypeScript, generics are leveraged to aid you when building your data pipelines (less debugging); this is optional for plain JS. Also, you never have to implement .on('data') or .on('end') event handlers when using Collectors, since they expose a .promise() which can be awaited.

It is also quite easy to parallelize multiple pipelines: don't await a single one, but collect the promises of many pipelines into an array and await them all at once, e.g. with await Promise.all(myPipelines).
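
As a minimal sketch of that idea (reusing only the Origin, Transform, and Collect classes from the example above), two pipelines can be kicked off first and awaited together afterwards:

import { Collect, Origin, Transform } from 'aws-data-science';

(async () => {
  // build two independent pipelines; note that neither is awaited yet
  const evens = new Origin.Array<number>([1, 2, 3, 4, 5, 6])
    .pipe(new Transform.Filter<number>(num => num % 2 === 0))
    .pipe(new Collect.Array<number>())
    .promise();

  const doubled = new Origin.Array<number>([10, 20, 30])
    .pipe(new Transform.Map<number, number>(num => num * 2))
    .pipe(new Collect.Array<number>())
    .promise();

  // both pipelines stream concurrently; await them together
  const [evenResult, doubledResult] = await Promise.all([evens, doubled]);
  console.log(evenResult)    // => [ 2, 4, 6 ]
  console.log(doubledResult) // => [ 20, 40, 60 ]
})();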

Installation

npm install -S aws-data-science

This package also requires aws-sdk as a peer dependency, so install it alongside (npm install -S aws-sdk).

Data Sources ("Origins")

All data sources (called "Origins") implement the stream.Readable interface and must be the starting point of all data analysis efforts. The following data sources can currently be used for data mining:

  • [x] Origin.Array: start stream from simple arrays
  • [x] Origin.String: start stream from string, emits words
  • [x] Origin.CloudWatchLog: stream CloudWatchLog entries
  • [ ] CloudFront Logs (via S3)
  • [ ] CloudTrail Logs
  • [ ] Billing API
  • [ ] DynamoDB Tables
  • ...
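
For example, a small sketch of streaming words out of a string via Origin.String. The constructor argument here (passing the raw text directly) is an assumption about this library's API rather than something documented above:

import { Collect, Origin } from 'aws-data-science';

(async () => {
  // assumption: Origin.String takes the raw text and emits one word at a time
  const words = await new Origin.String('the quick brown fox')
    .pipe(new Collect.Array<string>())
    .promise();

  console.log(words) // => [ 'the', 'quick', 'brown', 'fox' ]
})();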

Transformations

On every data stream, you can apply as many transformation steps as you wish. Since the stream pipe data flow model applies backpressure for you, your machine should handle practically unlimited amounts of data without hassle.

  • [x] Transform.Map: same as .map() in javascript
  • [x] Transform.Filter: same as .filter() in javascript
  • [x] Transform.ParseLambdaLog: unifies multi-line event outputs from Lambda
  • ...
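
As a rough illustration of chaining several steps (using only the classes already shown in the example above), each transform processes one item at a time, so even a large origin array stays memory friendly:

import { Collect, Origin, Transform } from 'aws-data-science';

(async () => {
  // a larger input; backpressure keeps only a small window in memory at once
  const input = Array.from({ length: 100000 }, (_, i) => i);

  const result = await new Origin.Array<number>(input)
    .pipe(new Transform.Filter<number>(num => num % 1000 === 0)) // keep every 1000th value
    .pipe(new Transform.Map<number, number>(num => num / 1000))  // rescale the survivors
    .pipe(new Collect.Array<number>())
    .promise();

  console.log(result.length) // => 100
})();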

Aggregations

This is where data mining comes into play. You can pipe your data stream into several "Aggregators" to generate additional data, for example counting even numbers in a number stream, or occurrences of words within a text corpus.

  • [x] Aggregate.Count: count truthy statements in stream
  • [x] Aggregate.List: store things from the stream in an array
  • [x] Aggregate.Mean: count numbers from the stream and return the mean value
  • [x] Aggregate.Rank: count occurrences of things and sort by highscore
  • [x] Aggregate.Sum: add all numbers in a stream
  • ...
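
For instance, counting even numbers in a number stream (the example mentioned above) only needs pieces already shown earlier; the pipeline still has to end in a collector, even if the aggregate is all you care about:

import { Aggregate, Collect, Origin } from 'aws-data-science';

(async () => {
  // count every even number that passes through the stream
  const evenCounter = new Aggregate.Count(num => num % 2 === 0);

  await new Origin.Array<number>([1, 2, 3, 4, 5, 6, 7, 8])
    .pipe(evenCounter)
    .pipe(new Collect.Array<number>()) // the stream still needs a sink
    .promise();

  console.log(evenCounter.result()) // => 4
})();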

Collectors

Once your data pipeline has done everything you want, you must choose where the data should end up. You might collect everything in an in-memory array, store it in files, or even discard it completely if you only want some aggregated information.

  • [x] Collect.Array: stream sink as simple array
  • [x] Collect.JsonFile: stream sink directly into JSON array file
  • [x] Collect.Nothing: when you don't need the data any longer
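
A hypothetical sketch of the other two sinks follows. The constructor arguments (a file path for Collect.JsonFile, none for Collect.Nothing) and the .result() call on Aggregate.Sum are assumptions about this library's API, patterned after the documented example rather than confirmed by it:

import { Aggregate, Collect, Origin } from 'aws-data-science';

(async () => {
  // persist the streamed items as a JSON array file (assumed: path argument)
  await new Origin.Array<number>([1, 2, 3])
    .pipe(new Collect.JsonFile('./numbers.json'))
    .promise();

  // or discard the data entirely and keep only an aggregate (assumed: no-arg constructors)
  const sum = new Aggregate.Sum();
  await new Origin.Array<number>([1, 2, 3])
    .pipe(sum)
    .pipe(new Collect.Nothing())
    .promise();

  console.log(sum.result()) // => 6
})();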