dpm2
v0.4.0
Published
Like npm but for data packages!
Usage:
CLI
$ dpm --help
Usage: dpm <command> [options] where command is:
- cat <datapackage name>[@<version>]
- get <datapackage name>[@<version>] [-f, --force] [-c, --cache]
- clone <datapackage name>[@<version>] [-f, --force]
- install <datapackage name 1>[@<version>] <datapackage name 2>[@<version>] ... [-c, --cache] [-s, --save] [-f, --force]
- publish
- unpublish <datapackage name>[@<version>]
- adduser
- owner <subcommand> where subcommand is:
- ls <datapackage name>
- add <user> <datapackage name>
- rm <user> <datapackage name>[@<version>]
- search [search terms]
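Most commands above take an argument of the form `<datapackage name>[@<version>]`. As a rough illustration of that convention, here is a minimal sketch of how such an argument could be split. `parseSpec` is a hypothetical helper written for this example, not a function exposed by dpm, and dpm's own parsing may differ.

```javascript
// Hypothetical helper (not part of dpm): split "<datapackage name>[@<version>]"
// into its name and optional version.
function parseSpec(spec) {
  const at = spec.lastIndexOf('@');
  // No "@" (or only a leading one) means no version was given.
  if (at <= 0) {
    return { name: spec, version: null };
  }
  return { name: spec.slice(0, at), version: spec.slice(at + 1) };
}

console.log(parseSpec('mydpkg'));
console.log(parseSpec('mydpkg@0.0.0'));
```

When the version is omitted, the sketch returns `null` and leaves the choice of version to the caller.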
Publishing and getting data packages
Given a data package:
$ cat package.json
{
"name": "mydpkg",
"description": "my datapackage",
"version": "0.0.0",
"keywords": ["test", "datapackage"],
"resources": [
{
"name": "inline",
"schema": { "fields": [ {"name": "a", "type": "string"}, {"name": "b", "type": "integer"}, {"name": "c", "type": "number"} ] },
"data": [ {"a": "a", "b": 1, "c": 1.2}, {"a": "x", "b": 2, "c": 2.3}, {"a": "y", "b": 3, "c": 3.4} ]
},
{
"name": "csv1",
"format": "csv",
"schema": { "fields": [ {"name": "a", "type": "integer"}, {"name": "b", "type": "integer"} ] },
"path": "x1.csv"
},
{
"name": "csv2",
"format": "csv",
"schema": { "fields": [ {"name": "c", "type": "integer"}, {"name": "d", "type": "integer"} ] },
"path": "x2.csv"
}
]
}
stored on the disk as
$ tree
.
├── package.json
├── scripts
│ └── test.r
├── x1.csv
└── x2.csv
we can:
$ dpm publish
dpm http PUT http://registry.standardanalytics.io/mydpkg/0.0.0
dpm http 201 http://registry.standardanalytics.io/mydpkg/0.0.0
+ mydpkg@0.0.0
and reclone it:
$ dpm clone mydpkg
dpm http GET http://registry.standardanalytics.io/mydpkg?clone=true
dpm http 200 http://registry.standardanalytics.io/mydpkg?clone=true
dpm http GET http://registry.standardanalytics.io/mydpkg/0.0.0/debug
dpm http 200 http://registry.standardanalytics.io/mydpkg/0.0.0/debug
dpm http GET http://registry.standardanalytics.io/mydpkg/0.0.0/csv1
dpm http GET http://registry.standardanalytics.io/mydpkg/0.0.0/csv2
dpm http 200 http://registry.standardanalytics.io/mydpkg/0.0.0/csv1
dpm http 200 http://registry.standardanalytics.io/mydpkg/0.0.0/csv2
.
└─┬ mydpkg
├── package.json
├─┬ scripts
│ └── test.r
├── x1.csv
└── x2.csv
To save space, or because you only need one resource, you can instead get just a package.json in which all the resource data have been replaced by a URL.
$ dpm get mydpkg
dpm http GET http://registry.standardanalytics.io/mydpkg
dpm http 200 http://registry.standardanalytics.io/mydpkg
.
└─┬ mydpkg
└── package.json
For instance (using jsontool)
$ cat mydpkg/package.json | json resources | json -c 'this.name === "csv1"' | json 0.url
returns:
http://registry.standardanalytics.io/mydpkg/0.0.0/csv1
Then you can consume the resources you want with the module data-streams.
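The jsontool lookup above can also be done directly in Node. The sketch below assumes that each resource in the fetched package.json carries a `url` property, as the csv1 example suggests. The `pkg` object is a trimmed stand-in for mydpkg/package.json, and `findResource` is a hypothetical helper, not part of dpm.

```javascript
// Trimmed stand-in for mydpkg/package.json after `dpm get mydpkg`.
// The csv1 URL is taken from the example above; the csv2 URL is assumed
// to follow the same pattern.
const pkg = {
  name: 'mydpkg',
  version: '0.0.0',
  resources: [
    { name: 'csv1', format: 'csv', url: 'http://registry.standardanalytics.io/mydpkg/0.0.0/csv1' },
    { name: 'csv2', format: 'csv', url: 'http://registry.standardanalytics.io/mydpkg/0.0.0/csv2' }
  ]
};

// Hypothetical helper: return the resource with the given name,
// or undefined when the package has no such resource.
function findResource(pkg, name) {
  return (pkg.resources || []).find(function (r) {
    return r.name === name;
  });
}

const csv1 = findResource(pkg, 'csv1');
console.log(csv1.url);
```

In a real project the object would come from `JSON.parse(fs.readFileSync('mydpkg/package.json', 'utf8'))` rather than an inline literal.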
Conversely, you can also cache all the resource data (including external URLs) in a standard directory structure, available for all the data packages stored on the registry.
$ dpm get mydpkg --cache
dpm http GET http://registry.standardanalytics.io/mydpkg
dpm http 200 http://registry.standardanalytics.io/mydpkg
dpm http GET http://registry.standardanalytics.io/mydpkg/0.0.0/inline
dpm http GET http://registry.standardanalytics.io/mydpkg/0.0.0/csv2
dpm http GET http://registry.standardanalytics.io/mydpkg/0.0.0/csv1
dpm http 200 http://registry.standardanalytics.io/mydpkg/0.0.0/inline
dpm http 200 http://registry.standardanalytics.io/mydpkg/0.0.0/csv1
dpm http 200 http://registry.standardanalytics.io/mydpkg/0.0.0/csv2
.
└─┬ mydpkg
├── package.json
└─┬ data
├── inline.json
├── csv1.csv
└── csv2.csv
Each resource in package.json now has a path property. For instance:
$ cat mydpkg/package.json | json resources | json -c 'this.name === "csv1"' | json 0.path
returns
data/csv1.csv
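Since each `path` is relative to the package directory, a consumer has to join it with that directory to locate the cached file. The following is a minimal sketch under that assumption. `resourceFiles` is a hypothetical helper, and the `cached` object is a stand-in for the cached package.json: only the csv1 path appears above, and the other two are inferred from the directory tree.

```javascript
const path = require('path');

// Stand-in for mydpkg/package.json after `dpm get mydpkg --cache`.
// Only csv1's path is shown in the text; the others are assumed from the tree.
const cached = {
  name: 'mydpkg',
  resources: [
    { name: 'inline', path: 'data/inline.json' },
    { name: 'csv1', path: 'data/csv1.csv' },
    { name: 'csv2', path: 'data/csv2.csv' }
  ]
};

// Hypothetical helper: map each resource name to its file location,
// resolved against the directory that contains package.json.
function resourceFiles(pkgDir, pkg) {
  const files = {};
  pkg.resources.forEach(function (r) {
    if (r.path) {
      files[r.name] = path.posix.join(pkgDir, r.path);
    }
  });
  return files;
}

console.log(resourceFiles('mydpkg', cached).csv1); // mydpkg/data/csv1.csv
```

`path.posix.join` is used only to keep the output identical across platforms; `path.join` would be the usual choice when the result is passed straight to `fs`.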
Installing data packages as dependencies of your project
Given a package.json with
{
"name": "test",
"version": "0.0.0",
"dataDependencies": {
"mydpkg": "0.0.0"
}
}
one can run
$ dpm install
dpm http GET http://registry.standardanalytics.io/versions/mydpkg
dpm http 200 http://registry.standardanalytics.io/versions/mydpkg
dpm http GET http://registry.standardanalytics.io/mydpkg/0.0.0
dpm http 200 http://registry.standardanalytics.io/mydpkg/0.0.0
.
├── data_modules
└─┬ mydpkg
└── package.json
Combined with the --cache option, you get:
$ dpm install --cache
dpm http GET http://registry.standardanalytics.io/versions/mydpkg
dpm http 200 http://registry.standardanalytics.io/versions/mydpkg
dpm http GET http://registry.standardanalytics.io/mydpkg/0.0.0
dpm http 200 http://registry.standardanalytics.io/mydpkg/0.0.0
dpm http GET http://registry.standardanalytics.io/mydpkg/0.0.0/inline
dpm http GET http://registry.standardanalytics.io/mydpkg/0.0.0/csv2
dpm http GET http://registry.standardanalytics.io/mydpkg/0.0.0/csv1
dpm http 200 http://registry.standardanalytics.io/mydpkg/0.0.0/inline
dpm http 200 http://registry.standardanalytics.io/mydpkg/0.0.0/csv1
dpm http 200 http://registry.standardanalytics.io/mydpkg/0.0.0/csv2
.
├── data_modules
└─┬ mydpkg
├── package.json
└─┬ data
├── inline.json
├── csv1.csv
└── csv2.csv
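To find installed data packages from your own code, you can read the `dataDependencies` property and look under `data_modules/`. The sketch below assumes the layout `data_modules/<name>/package.json` suggested by the listings above. `listDataDependencies` is a hypothetical helper, not part of dpm.

```javascript
const path = require('path');

// The project package.json from the example above.
const project = {
  name: 'test',
  version: '0.0.0',
  dataDependencies: {
    mydpkg: '0.0.0'
  }
};

// Hypothetical helper: for each data dependency, report the requested
// version and where its package.json is expected after `dpm install`.
function listDataDependencies(project, root) {
  const deps = project.dataDependencies || {};
  return Object.keys(deps).map(function (name) {
    return {
      name: name,
      version: deps[name],
      packageJson: path.posix.join(root, 'data_modules', name, 'package.json')
    };
  });
}

console.log(listDataDependencies(project, '.'));
```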
dpm aims to bring all the goodness of the npm workflow to your data needs. Run dpm --help to see the available options.
Using dpm programmatically
You can also use dpm programmatically.
var Dpm = require('dpm2');
var dpm = new Dpm(conf);
See bin/dpm for examples.
Using dpm with npm
dpm uses the dataDependencies property of package.json and stores the dependencies in a data_modules/ directory, so it can be used safely, without conflict, as a post-install script of npm.
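One way to wire this up is npm's standard `postinstall` script hook. The sketch below builds a package.json object that runs `dpm install` after `npm install`. It assumes the `dpm` executable is on the PATH when npm runs the script, and `withDpmPostinstall` is a hypothetical helper used only to show the resulting shape.

```javascript
// Hypothetical helper: return a copy of a package.json object whose
// "postinstall" script runs `dpm install`. Existing scripts are preserved.
function withDpmPostinstall(pkg) {
  const scripts = Object.assign({}, pkg.scripts, { postinstall: 'dpm install' });
  return Object.assign({}, pkg, { scripts: scripts });
}

const project = {
  name: 'test',
  version: '0.0.0',
  dataDependencies: { mydpkg: '0.0.0' }
};

// Print the package.json that would be written to disk.
console.log(JSON.stringify(withDpmPostinstall(project), null, 2));
```

Because data packages live in data_modules/ and are declared under dataDependencies, this step does not touch node_modules/ or the regular dependencies field.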
Registry
By default, dpm uses our CouchDB-powered data registry hosted on Cloudant.
Why dpm2?
There is already a dpm being developed here but it leverages npm and the npm registry.
License
MIT