© 2024 – Pkg Stats / Ryan Hefner

datahub-client

v0.5.8


APIs for interacting with DataHub

Downloads

57

Readme

Node client and utilities for interacting with https://DataHub.io and handling Data Packages.


Introduction

The DataHub platform stores many different datasets: packages of useful data together with a description (here is the dataset specification). Data stored on DataHub has a clear structure, views, and a description that help people gain insights.

You can also store and share your own datasets on DataHub.

As a programmer, you may want to automate getting or storing data, or integrate your project with DataHub.

The datahub-client library is designed for this. Let's explore it together.

Important notes:

  • You need Node.js version > 7.6.
  • Wherever you see the await keyword, wrap that piece of code in an async function.
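For example, a minimal sketch of that pattern; `fetchSomething()` is a hypothetical stand-in for any promise-returning datahub-client call:

```javascript
// Hypothetical stand-in for a promise-returning datahub-client call
function fetchSomething() {
  return Promise.resolve('some data')
}

// `await` is only valid inside an async function here: top-level `await`
// is not allowed in CommonJS modules (the require() style used in this README)
async function main() {
  const result = await fetchSomething()
  console.log(result)
  return result
}

// Run it and report any rejection instead of failing silently
main().catch(err => {
  console.error(err)
  process.exitCode = 1
})
```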

Install

npm install datahub-client --save
const datahub = require('datahub-client')

Quick overview

With datahub-client you can:

  • log in to DataHub
  • authenticate with a JWT token
  • push a dataset
  • get data from DataHub
  • initialize a new dataset
  • verify that an existing dataset is correct
  • transform tabular files into different formats

Let's explore datahub-client features more deeply below.

login and authenticate

Documentation is not ready yet. It will be added here once the Login module has been refactored.

DataHub class

The DataHub class contains the push() and pushFlow() methods, which are used to upload a dataset to DataHub.io.

Push a dataset (the dataset must be an instance of the Dataset class: https://github.com/datahq/data.js#datasets):

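const {DataHub} = require('datahub-client')
const {Dataset} = require('data.js')

/* The secure JWT token and userId are taken from the Login & Auth module */
const datahubConfigs = {
  apiUrl: 'http://api.datahub.io/',
  token: 'jwt token',
  debug: false,
  ownerid: 'userId'
}
const datahub = new DataHub(datahubConfigs)

async function push(datasetPath) {
  // Load the data package you want to upload as a data.js Dataset
  const dataset = await Dataset.load(datasetPath)
  const pushOptions = {findability: 'unlisted'}
  const res = await datahub.push(dataset, pushOptions)
  console.log(res)
}

push('path/to/your/dataset').catch(console.error) // placeholder path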

Possible push options:

  • findability: one of 'unlisted', 'published', 'private'
  • sheets: the Excel sheets to push; a sheet number, a sheet name, or an array of numbers or names
  • schedule: 'every X[m|h|d|w]' (minutes, hours, days, weeks)
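A sketch of an options object using these fields, plus a hypothetical `isValidSchedule()` helper (not part of datahub-client) that checks the schedule string before pushing. The regular expression assumes X is a positive whole number; the library's own parsing may accept more or fewer forms:

```javascript
// Hypothetical helper, NOT part of datahub-client: checks a `schedule`
// value against the documented 'every X[m|h|d|w]' form, assuming X is
// a positive whole number
const SCHEDULE_PATTERN = /^every [1-9]\d*[mhdw]$/

function isValidSchedule(schedule) {
  return typeof schedule === 'string' && SCHEDULE_PATTERN.test(schedule)
}

// Push options using the documented fields (values are illustrative)
const pushOptions = {
  findability: 'published', // 'unlisted', 'published' or 'private'
  sheets: [1, 2],           // sheet numbers; names are also accepted
  schedule: 'every 12h'     // every 12 hours
}

console.log(isValidSchedule(pushOptions.schedule)) // true
console.log(isValidSchedule('every day'))          // false
```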

This is an example of a successful datahub.push() response:

{ dataset_id: 'username/finance-vix',
  errors: [],
  flow_id: 'username/finance-vix/40',
  success: true }

If you get any errors, set the debug option (datahubConfigs = {debug: true, ...}) to see a detailed log.
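To act on the response in code, a sketch like the following could help. `summarizePushResult()` is a hypothetical helper, not part of datahub-client, and the format of entries in `errors` is an assumption since only an empty list is shown above:

```javascript
// Hypothetical helper: error entries may be strings, Error objects or
// plain objects (their real format is not documented)
function describeError(e) {
  if (e instanceof Error) return e.message
  return typeof e === 'string' ? e : JSON.stringify(e)
}

// Hypothetical helper, NOT part of datahub-client: turns a push response
// with the fields shown above (success, errors, dataset_id, flow_id)
// into a one-line summary
function summarizePushResult(res) {
  const errors = res.errors || []
  if (res.success && errors.length === 0) {
    return `Pushed ${res.dataset_id} (flow ${res.flow_id})`
  }
  const details = errors.map(describeError).join('; ')
  return `Push failed: ${details || 'no error details'}`
}

const exampleResponse = {
  dataset_id: 'username/finance-vix',
  errors: [],
  flow_id: 'username/finance-vix/40',
  success: true
}
console.log(summarizePushResult(exampleResponse))
// Pushed username/finance-vix (flow username/finance-vix/40)
```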

Note: pushFlow() is an experimental method; its documentation is not ready yet.

Get data using the datahub-client

const datahub = require('datahub-client');
const {Dataset} = require('data.js');

async function getResources(datasetUrl) {
  const dataset = await Dataset.load(datasetUrl);
  // Resolves to an array of data.js File objects
  return datahub.get(dataset);
}

Dataset.load() takes a path to the data package and returns a dataset object: https://github.com/datahq/data.js#datasets

The datahub.get() method accepts the dataset object and returns an array of its resources.

Each item in resources is a File object from the data.js library: https://github.com/datahq/data.js#files
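A hypothetical helper for printing what `datahub.get()` returned. It assumes each data.js File exposes its resource descriptor as `file.descriptor`, with the usual Data Package `name` and `path` fields:

```javascript
// Hypothetical helper, NOT part of datahub-client: lists resources by
// name, falling back to path. Assumes data.js File objects expose their
// resource descriptor as `file.descriptor`
function describeResources(resources) {
  return resources.map((file, i) => {
    const d = file.descriptor || {}
    return `${i}: ${d.name || d.path || '(unnamed)'}`
  })
}

// Usage inside an async function, after
//   const resources = await datahub.get(dataset)
// call:
//   describeResources(resources).forEach(line => console.log(line))
```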

info

The info module contains two methods:

  • infoPackage(dpObj): shows the metadata of the dataset. @param {data.js/Datapackage object} @return {string}

  • async infoResource(resource): shows information about one particular resource. @param {data.js/File object}, only tabular file objects are supported. @return {string}, an ASCII table

const data = require('data.js');
const datahub = require('datahub-client');

async function showInfo() {
  const dataset = await data.Dataset.load('http://github.com/datasets/finance-vix');

  console.log(
    datahub.info.infoPackage(dataset),
    await datahub.info.infoResource(dataset.resources[0])
  );
}

showInfo().catch(console.error);

init

The init module is used to interactively create a new data package or update an existing one:

  1. init.init() scans the files and directories in the current directory.
  2. It asks the user interactively whether to add each file it finds to the data package.
  3. It generates or extends a datapackage.json file and asks the user to confirm it.
  4. It saves datapackage.json to disk.
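To make the steps concrete, here is a non-interactive sketch of steps 1, 3 and 4 using only Node's `fs` and `path` modules. It is not how init.init() is implemented: it skips the prompts, includes every CSV file it finds, and skips hidden entries such as ".idea". `findDataFiles()`, `buildDescriptor()` and `saveDescriptor()` are hypothetical names:

```javascript
const fs = require('fs')
const path = require('path')

// Step 1: recursively collect data files under rootDir as relative,
// '/'-separated paths, skipping hidden entries such as ".idea"
function findDataFiles(rootDir, exts = ['.csv'], rel = '') {
  const found = []
  for (const name of fs.readdirSync(path.join(rootDir, rel))) {
    if (name.startsWith('.')) continue
    const relPath = rel ? `${rel}/${name}` : name
    if (fs.statSync(path.join(rootDir, relPath)).isDirectory()) {
      found.push(...findDataFiles(rootDir, exts, relPath))
    } else if (exts.includes(path.extname(name).toLowerCase())) {
      found.push(relPath)
    }
  }
  return found.sort()
}

// Step 3 without the prompts: one resource per data file found
function buildDescriptor(rootDir, name, title) {
  return {
    name,
    title,
    resources: findDataFiles(rootDir).map(p => ({
      name: path.basename(p, path.extname(p)).toLowerCase(),
      path: p
    }))
  }
}

// Step 4: save the descriptor as datapackage.json in rootDir
function saveDescriptor(rootDir, descriptor) {
  const target = path.join(rootDir, 'datapackage.json')
  fs.writeFileSync(target, JSON.stringify(descriptor, null, 2) + '\n')
  return target
}
```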

Example: save this code into init.js

const datahub = require('datahub-client');
datahub.init.init()

Run the snippet in the terminal:

node init.js

This process initializes a new datapackage.json file.
Once there is a datapackage.json file, you can still run `data init` to
update/extend it.
Press ^C at any time to quit.

? Enter Data Package name - Some-Package-Name
? Enter Data Package title - Some-Package-Title
? Do you want to scan following directory ".idea" - y/n? n
? Do you want to scan following directory "basic-csv" - y/n? y
? Do you want to add following file as a resource "comma.csv" - y/n? y
comma.csv is just added to resources
? Going to write to /home/user/data/datapackage.json:

{
  "name": "some-name",
  "title": "some-title",
  "resources": [
    {
      "path": "basic-csv/comma.csv",
<<<<<<<<< cut >>>>>>>>

Is that OK - y/n? y
datapackage.json file is saved in /home/user/data/datapackage.json

validate

This module contains the Validator class, which checks that:

  • the data package data is valid against the descriptor schema
  • the descriptor itself is correct.

Usage:

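const datahub = require('datahub-client')

// Placeholder: path to the descriptor you want to validate
const pathToDescriptor = 'path/to/datapackage.json'
const validator = new datahub.Validator({identifier: pathToDescriptor})
validator.validate().then(console.log)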

If the data package is valid, the validator returns true. Otherwise, it returns an object containing:

  • a TableSchemaError exception
  • a list of the errors that were found
  • information to help you find where each error is
{ TableSchemaError: There are 1 type and format mismatch errors (see error.errors') ...
  _errors:
   [ { TableSchemaError: The value "true" in column "boolean" is not type "date" and format "default" ... } ],
  rowNumber: 2,
  resource: 'comma',
  path: '/home/user/work/basic-csv/comma.csv' }
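A hypothetical `reportValidation()` helper (not part of datahub-client) that turns either result into readable text. It assumes the error object has the fields shown in the output above; the list of errors is read from `errors`, falling back to `_errors`:

```javascript
// Hypothetical helper, NOT part of datahub-client: formats the result of
// validator.validate(), assuming it is either `true` or an error object
// shaped like the output above (resource, rowNumber, path, errors/_errors)
function reportValidation(result) {
  if (result === true) {
    return 'Data package is valid'
  }
  // The output above points to error.errors; the raw list is _errors
  const errors = result.errors || result._errors || []
  const lines = [
    `Invalid data in resource "${result.resource}" at row ${result.rowNumber} (${result.path})`
  ]
  for (const err of errors) {
    lines.push(`  - ${err && err.message ? err.message : String(err)}`)
  }
  return lines.join('\n')
}

// Usage:
//   validator.validate().then(result => console.log(reportValidation(result)))
```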

cat

This module lets you read tabular data from a data.js File and transform it into different formats.

The cat module has several writer functions, one per format:

  • ascii, csv, md, xlsx, html

Each writer function converts the given source file into a stream in the corresponding format.

The module exports a writers object that contains all of these functions:

writers = {
  ascii: dumpAscii,
  csv: dumpCsv,
  md: dumpMarkdown,
  xlsx: dumpXlsx,
  html: dumpHtml
}

Example of use:

const fs = require('fs')
const {writers} = require('datahub-client').cat
const data = require('data.js')

const resource = data.File.load('data.csv')

async function main() {
  const stream = await writers.ascii(resource)
  // Print the ASCII table to the terminal
  stream.pipe(process.stdout)

  // or you can save the stream into a file:
  const writeStream = fs.createWriteStream('filename', {flags: 'w'})
  stream.pipe(writeStream)
}

main().catch(console.error)
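If you want the output format to follow the target filename, a small lookup is enough. `writerKeyFor()` is a hypothetical helper, not part of datahub-client, and mapping unknown extensions to 'ascii' is an arbitrary choice for this sketch:

```javascript
const path = require('path')

// Hypothetical helper, NOT part of datahub-client: maps an output
// filename's extension to a key of the `writers` object
const WRITER_BY_EXTENSION = {
  '.csv': 'csv',
  '.md': 'md',
  '.xlsx': 'xlsx',
  '.html': 'html',
  '.htm': 'html'
}

function writerKeyFor(filename) {
  // Unknown extensions fall back to the ASCII table writer
  return WRITER_BY_EXTENSION[path.extname(filename).toLowerCase()] || 'ascii'
}

// Usage inside an async function, with `writers` and `resource` as above:
//   const fs = require('fs')
//   const stream = await writers[writerKeyFor('report.md')](resource)
//   stream.pipe(fs.createWriteStream('report.md'))
```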

Output for writers.ascii:

┌────────────────────────────────┬────────────────────────────────┬────────────────────────────────┐
│ number                         │ string                         │ boolean                        │
├────────────────────────────────┼────────────────────────────────┼────────────────────────────────┤
│ 1                              │ one                            │ true                           │
├────────────────────────────────┼────────────────────────────────┼────────────────────────────────┤
│ 2                              │ two                            │ false                          │
└────────────────────────────────┴────────────────────────────────┴────────────────────────────────┘

Output for writers.md:

| number | string | boolean |
| ------ | ------ | ------- |
| 1      | one    | true    |
| 2      | two    | false   |
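The Markdown layout above is simple to reproduce. This sketch is not the library's dumpMarkdown implementation; it only shows how the padded columns are derived, assuming the first row holds the headers:

```javascript
// Minimal sketch, NOT the library's dumpMarkdown: renders a header row
// plus data rows as a padded Markdown table
function toMarkdownTable(rows) {
  const cells = rows.map(row => row.map(String))
  // Each column is as wide as its longest cell (at least 3 for the dashes)
  const widths = cells[0].map((_, col) =>
    Math.max(3, ...cells.map(r => r[col].length)))
  const line = r => `| ${r.map((c, col) => c.padEnd(widths[col])).join(' | ')} |`
  const separator = `| ${widths.map(w => '-'.repeat(w)).join(' | ')} |`
  return [line(cells[0]), separator, ...cells.slice(1).map(line)].join('\n')
}

console.log(toMarkdownTable([
  ['number', 'string', 'boolean'],
  [1, 'one', true],
  [2, 'two', false]
]))
```

Running it prints the same table as the writers.md output above.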

CSV:

number,string,boolean
1,one,true
2,two,false

HTML:

<table class="table table-striped table-bordered">
<thead>
<th>number</th>
<th>string</th>
<th>boolean</th>
</thead>
<tbody>
<tr>
<td>1</td>
<td>one</td>
....................

XLSX: an Excel file

For developers

You need to have Node.js version >7.6

Install

$ npm install

Running tests

We use AVA for our tests. To run them:

$ [sudo] npm test

To run tests in watch mode:

$ [sudo] npm run watch:test

Lint

We use XO to check our code against JS standards, conventions and style:

# When you run tests, it first runs lint:
$ npm test

# To run lint separately:
$ npm run lint # shows errors only

# Fixing errors automatically:
$ xo --fix