
dcat

v0.1.4


Archive and make discoverable data and links with schema.org metadata.


Usage (CLI)

tl;dr

dcat --help

Registering a user (adduser)

Run

dcat adduser

and follow the prompting wizard.

Publishing (publish)

Simple document

dcat lets you publish JSON-LD documents that use the dcat.io context. This context extends schema.org with terms relevant to I/O and data integrity (such as filePath and Checksum).

A minimal document must contain:

  • a context (@context) set to https://dcat.io;
  • an id (@id), used to uniquely identify things published on dcat.io with URLs. All relative URLs are resolved against the base https://dcat.io (defined as @base in the context).

For example:

{
  "@context": "https://dcat.io",
  "@id": "mydoc"
}

To publish this document, save it in a file named JSONLD and, in the directory containing it, run:

dcat publish

After publication the document will be available at https://dcat.io/mydoc.
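Because @id values are resolved against the https://dcat.io base, the relative id mydoc becomes https://dcat.io/mydoc. The resolution step can be sketched with the URL class built into Node.js; the helper resolveId is hypothetical and not part of dcat:

```javascript
// Hypothetical helper (not part of dcat): resolve a document's @id against
// the https://dcat.io base described above.
const DCAT_BASE = 'https://dcat.io';

function resolveId(doc) {
  const id = doc['@id'];
  if (typeof id !== 'string' || id === '') {
    throw new Error('a dcat document needs a non-empty @id');
  }
  // Relative ids are resolved against the base; absolute URLs are kept as-is.
  return new URL(id, DCAT_BASE).href;
}

const doc = { '@context': 'https://dcat.io', '@id': 'mydoc' };
console.log(resolveId(doc)); // https://dcat.io/mydoc
```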

Documents can contain any properties from schema.org or from any other ontology, as long as the associated @context is provided.

Versioning

If a version property is specified in the document, the document is versioned: each update requires a new version value to be published, which prevents existing versions from being overwritten.

When appropriate, version numbers SHOULD follow Semantic Versioning.

For example:

{
  "@context": "https://dcat.io",
  "@id": "mydoc",
  "version": "0.0.1"
}

After publication, this document will be available at https://dcat.io/mydoc?version=0.0.1, whereas the latest version will always be available at https://dcat.io/mydoc.

If the document is versioned following Semantic Versioning, a range (e.g. <0.0.1) can be given as the version (e.g. https://dcat.io/mydoc?version=<0.0.1).
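The sketch below shows two things hinted at above: how a client would build a versioned URL (a literal < is not a valid URL character, so it is sent percent-encoded as %3C), and what "the latest version matching <X.Y.Z" means. It assumes plain MAJOR.MINOR.PATCH versions and handles only the < operator; the function names are hypothetical, and the registry presumably relies on a full Semantic Versioning range implementation instead.

```javascript
// Sketch only (hypothetical helpers, not dcat's code).

// Build a versioned dcat.io URL; URLSearchParams percent-encodes "<" as %3C.
function versionUrl(id, version) {
  const url = new URL(id, 'https://dcat.io');
  url.searchParams.set('version', version);
  return url.href;
}

// Parse a strict MAJOR.MINOR.PATCH string into three numbers.
function parseVersion(v) {
  const m = /^(\d+)\.(\d+)\.(\d+)$/.exec(v);
  if (!m) throw new Error('not a MAJOR.MINOR.PATCH version: ' + v);
  return m.slice(1).map(Number);
}

// Numeric comparison, so 2.0.10 sorts after 2.0.3.
function compareVersions(a, b) {
  const pa = parseVersion(a);
  const pb = parseVersion(b);
  for (let i = 0; i < 3; i++) {
    if (pa[i] !== pb[i]) return pa[i] - pb[i];
  }
  return 0;
}

// Highest version strictly lower than the bound of a "<X.Y.Z" range,
// or null when nothing matches.
function latestBelow(versions, range) {
  if (!range.startsWith('<')) throw new Error('only "<" ranges are handled here');
  const bound = range.slice(1);
  const matching = versions.filter((v) => compareVersions(v, bound) < 0);
  matching.sort(compareVersions);
  return matching.length ? matching[matching.length - 1] : null;
}

console.log(versionUrl('mydoc', '0.0.1'));  // https://dcat.io/mydoc?version=0.0.1
console.log(versionUrl('mydoc', '<0.0.1')); // https://dcat.io/mydoc?version=%3C0.0.1
console.log(latestBelow(['0.0.1', '2.0.3', '2.1.0', '2.0.10'], '<2.1.0')); // 2.0.10
```

Comparing numerically rather than as strings matters here: a plain string sort would place 2.0.10 before 2.0.3.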

Nodes

Documents can be arbitrarily complex (having multiple nodes), and sometimes it makes sense to assign a URL to a node so that it can be referred to. This is achieved by setting an @id property on the desired nodes, for example:

{
  "@context": "https://dcat.io",
  "@id": "mydoc",
  "version": "0.0.1",
  "hasPart": {
    "@id": "mydoc/data",
    "@type": "Dataset",
    "description": "a dataset part of the document"
  }
}

The whole document can be retrieved at https://dcat.io/mydoc, whereas the part can be retrieved at https://dcat.io/mydoc/data.

Note: node ids can be any valid URLs, but they have to be namespaced within the top-level @id (for a document with "@id": "mydoc", "@id": "mydoc/arbitrarily/long/pathname" is valid whereas "@id": "part" is not).
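The namespacing rule can be checked before publishing. A minimal sketch, assuming ids resolve against https://dcat.io as described earlier; collectIds, isNamespaced and invalidNodeIds are hypothetical helpers, not part of dcat:

```javascript
// Hypothetical pre-publish check (not part of dcat): every node @id must be
// the top-level @id itself or live under "<top-level @id>/".

// Recursively gather every @id in the document (skipping @context).
function collectIds(node, ids = []) {
  if (Array.isArray(node)) {
    node.forEach((child) => collectIds(child, ids));
  } else if (node !== null && typeof node === 'object') {
    if (typeof node['@id'] === 'string') ids.push(node['@id']);
    for (const [key, value] of Object.entries(node)) {
      if (key !== '@context') collectIds(value, ids);
    }
  }
  return ids;
}

function isNamespaced(topId, nodeId) {
  const base = 'https://dcat.io';
  const top = new URL(topId, base).href;
  const node = new URL(nodeId, base).href;
  // Require a "/" after the prefix so that "mydocs" is not accepted for "mydoc".
  return node === top || node.startsWith(top + '/');
}

function invalidNodeIds(doc) {
  return collectIds(doc).filter((id) => !isNamespaced(doc['@id'], id));
}

const doc = {
  '@context': 'https://dcat.io',
  '@id': 'mydoc',
  hasPart: [
    { '@id': 'mydoc/data', '@type': 'Dataset' },
    { '@id': 'part', '@type': 'Dataset' }
  ]
};
console.log(invalidNodeIds(doc)); // [ 'part' ]
```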

Adding metadata to existing URLs

dcat can be used to add machine-readable metadata to any resource already published on the web. For instance, running:

dcat init https://github.com/standard-analytics/dcat.git

produces a basic machine-readable document:

{
  "@context": "https://dcat.io",
  "@id": "mydoc",
  "@type": "Code",
  "codeRepository": "https://github.com/standard-analytics/dcat",
  "encoding": {
    "@type": "MediaObject",
    "contentUrl": "https://api.github.com/repos/standard-analytics/dcat/tarball/master",
    "encodingFormat": "application/x-gzip",
    "contentSize": 690980
  }
}

This document should be extended with more properties, from schema.org (such as author, contributor, about, programmingLanguage, runtime, ...) or from any other web ontology (taking care to add the corresponding contexts in the latter case), to improve the discoverability and reusability of the resource.

Note: in addition to absolute URLs, dcat supports CURIEs for the prefixes defined in the dcat.io @context. Using a CURIE, the previous command simplifies to:

dcat init github:standard-analytics/dcat.git
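A CURIE is expanded by replacing its prefix with the IRI the context maps it to. A minimal sketch with a hypothetical prefix map: the ldr mapping follows the Unpublishing section (ldr stands for https://dcat.io; the trailing slash is assumed so that ldr:mydoc yields https://dcat.io/mydoc), while the github mapping is only inferred from the two equivalent dcat init commands above and may not match the real dcat.io context.

```javascript
// Hypothetical prefix map; the real one lives in the dcat.io @context.
const PREFIXES = {
  ldr: 'https://dcat.io/',
  github: 'https://github.com/' // assumption inferred from the example above
};

// Expand "prefix:reference" when the prefix is known; anything else
// (including absolute URLs such as "https://...") is returned unchanged.
function expandCurie(value, prefixes = PREFIXES) {
  const colon = value.indexOf(':');
  if (colon === -1) return value;
  const prefix = value.slice(0, colon);
  if (!Object.prototype.hasOwnProperty.call(prefixes, prefix)) return value;
  return prefixes[prefix] + value.slice(colon + 1);
}

console.log(expandCurie('github:standard-analytics/dcat.git'));
// https://github.com/standard-analytics/dcat.git
console.log(expandCurie('https://github.com/standard-analytics/dcat.git'));
// https://github.com/standard-analytics/dcat.git (unchanged)
```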

Files (raw data)

For all subclasses of schema.org/CreativeWork (e.g. Dataset, Code, SoftwareApplication, Article, Book, ImageObject, VideoObject, AudioObject, ...), dcat lets you publish raw data from files (datasets, binaries, images, media, ...) along with documents.

For instance, if you have a MedicalScholarlyArticle as a PDF and an associated Dataset as a CSV file, you can run:

dcat init --main article.pdf::MedicalScholarlyArticle --part data.csv

Note: the ::MedicalScholarlyArticle suffix associates a type (@type) with the resource (article.pdf).

This will generate a machine-readable document (JSONLD) that you can edit to provide additional metadata.

{
  "@context": "https://dcat.io",
  "@id": "mydoc",
  "@type": "MedicalScholarlyArticle",
  "encoding": {
    "@type": "MediaObject",
    "filePath": "article.pdf"
  },
  "hasPart": {
    "@type": "Dataset",
    "distribution": {
      "@type": "DataDownload",
      "filePath": "data.csv"
    }
  }
}
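The path::Type argument syntax used with --main can be illustrated with a small parser that builds the top of the document shown above. This is a hypothetical sketch of the mapping suggested by the example, not dcat's own init code; parseFileArg and mainNode are invented names, and how dcat types parts (Dataset with a DataDownload for the CSV) is not modelled.

```javascript
// Hypothetical parser for "path::Type" (not dcat's implementation).
// Splitting on the last "::" keeps any earlier "::" inside the path.
function parseFileArg(arg) {
  const sep = arg.lastIndexOf('::');
  if (sep === -1) return { filePath: arg, type: null };
  return { filePath: arg.slice(0, sep), type: arg.slice(sep + 2) };
}

// Build the main node following the example above: the type goes on the
// document itself and the file path on its encoding.
function mainNode(id, arg) {
  const { filePath, type } = parseFileArg(arg);
  return {
    '@context': 'https://dcat.io',
    '@id': id,
    ...(type ? { '@type': type } : {}),
    encoding: { '@type': 'MediaObject', filePath }
  };
}

console.log(JSON.stringify(mainNode('mydoc', 'article.pdf::MedicalScholarlyArticle'), null, 2));
```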

After publication (dcat publish), the document acquires additional contentUrl properties, generated by dcat, that can be dereferenced to retrieve the original raw data:

{
  "@context": "https://dcat.io",
  "@id": "mydoc",
  "@type": "MedicalScholarlyArticle",
  "encoding": {
    "@type": "MediaObject",
    "filePath": "article.pdf",
    "contentUrl": "http://example.com/article.pdf"
  },
  "hasPart": {
    "@type": "Dataset",
    "distribution": {
      "@type": "DataDownload",
      "filePath": "data.csv",
      "contentUrl": "http://example.com/data.csv"
    }
  }
}

Note: dcat init supports globbing, so you can run commands like:

dcat init --main article.pdf --part *.csv

or repeat --part (or the shorter -p) if you need more complex matching, e.g.:

dcat init --main article.pdf -p *.csv -p *.jpg
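To make explicit which files patterns such as *.csv select, here is a simplified matcher: * matches any run of characters other than /, and ? matches exactly one such character. It is an illustrative stand-in, not dcat's actual matching code. Also note that in a POSIX shell, unquoted patterns are usually expanded by the shell itself before dcat runs.

```javascript
// Simplified glob matcher for illustration only (not dcat's code).
function globToRegExp(pattern) {
  let source = '';
  for (const ch of pattern) {
    if (ch === '*') source += '[^/]*';     // any run of non-"/" characters
    else if (ch === '?') source += '[^/]'; // exactly one non-"/" character
    else source += ch.replace(/[\\^$.*+?()[\]{}|]/g, '\\$&'); // escape the rest
  }
  return new RegExp('^' + source + '$');
}

// Keep files matching at least one pattern, as with repeated -p options.
function selectFiles(files, patterns) {
  const regexps = patterns.map(globToRegExp);
  return files.filter((f) => regexps.some((re) => re.test(f)));
}

const files = ['article.pdf', 'data.csv', 'extra.csv', 'figure.jpg', 'notes.txt'];
// Roughly what "-p *.csv -p *.jpg" selects from this list:
console.log(selectFiles(files, ['*.csv', '*.jpg']));
// [ 'data.csv', 'extra.csv', 'figure.jpg' ]
```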

TODO describe directories

Unpublishing (unpublish)

To delete a specific version of a document with "@id": "mydoc", run:

dcat unpublish ldr:mydoc?version=0.1.1

ldr is the prefix used for https://dcat.io (defined in the dcat.io @context).

To delete all versions of a document with "@id": "mydoc", run:

dcat unpublish ldr:mydoc
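The difference between the two commands is only whether a version query is present. A minimal sketch that splits such a target into the document URL and an optional version, assuming ldr stands for https://dcat.io as stated above; parseTarget is hypothetical and not dcat's own parser:

```javascript
// Hypothetical helper (not dcat's code): split an ldr: target into the
// document URL and its version; a null version means "all versions".
function parseTarget(curie) {
  if (!curie.startsWith('ldr:')) throw new Error('expected an ldr: CURIE');
  const url = new URL(curie.slice('ldr:'.length), 'https://dcat.io/');
  const version = url.searchParams.get('version'); // null when absent
  url.search = ''; // drop the query to keep only the document URL
  return { url: url.href, version };
}

const one = parseTarget('ldr:mydoc?version=0.1.1');
console.log(one.url, one.version); // https://dcat.io/mydoc 0.1.1
const all = parseTarget('ldr:mydoc');
console.log(all.url, all.version); // https://dcat.io/mydoc null
```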

Retrieving documents and raw data (search, show, clone)

Search

Documents containing keywords, name, or description properties can be searched with dcat search followed by a list of keywords.

For more powerful search, all data published on dcat.io are valid linked data fragments and can be queried using SPARQL.
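As a client-side illustration of which properties the keyword search looks at, the sketch below filters an in-memory list of documents. It is not the registry's implementation, and requiring every keyword to match (AND rather than OR) is an assumption, since the behaviour is not specified here.

```javascript
// Client-side illustration only; the dcat.io registry performs real searches.
// Collect the searchable text of a document from keywords, name, description.
function searchableText(doc) {
  const parts = [];
  for (const key of ['keywords', 'name', 'description']) {
    const value = doc[key];
    if (Array.isArray(value)) parts.push(...value);
    else if (typeof value === 'string') parts.push(value);
  }
  return parts.join(' ').toLowerCase();
}

// Assumption: a document matches only if it contains every keyword.
function search(docs, keywords) {
  const terms = keywords.map((k) => k.toLowerCase());
  return docs.filter((doc) => {
    const text = searchableText(doc);
    return terms.every((t) => text.includes(t));
  });
}

const docs = [
  { '@id': 'mydoc', name: 'hello world', keywords: ['greeting'] },
  { '@id': 'other', description: 'A dataset about weather' }
];
console.log(search(docs, ['hello']).map((d) => d['@id'])); // [ 'mydoc' ]
```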

Show (expanded, compacted, flattened, normalized)

dcat show followed by a CURIE prints to stdout the latest JSON-LD document corresponding to that CURIE.

The options -e/--expand, -f/--flatten, -c/--compact and -n/--normalize produce different representations of the document. For instance,

dcat show 'ldr:mydoc?version=<2.1.0' --normalize

serializes the latest version lower than 2.1.0 of the document with "@id": "mydoc" to N-Quads (RDF). The quotes prevent the shell from interpreting < as an input redirection.

Clone

dcat clone followed by a CURIE downloads the raw data associated with a document and stores it, along with the document, on disk at the paths specified by the filePath properties.
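Conceptually, cloning needs the list of (filePath, contentUrl) pairs found in a published document. A minimal sketch that walks the published example from the Files section and builds that list; downloadPlan is a hypothetical name, and the actual download and write steps are deliberately left out.

```javascript
// Hypothetical helper (not dcat's code): collect every node that has both a
// filePath and a contentUrl, i.e. what a clone would need to fetch.
function downloadPlan(node, plan = []) {
  if (Array.isArray(node)) {
    node.forEach((child) => downloadPlan(child, plan));
  } else if (node !== null && typeof node === 'object') {
    if (typeof node.filePath === 'string' && typeof node.contentUrl === 'string') {
      plan.push({ filePath: node.filePath, contentUrl: node.contentUrl });
    }
    Object.values(node).forEach((value) => downloadPlan(value, plan));
  }
  return plan;
}

// The published example document from the Files section.
const published = {
  '@context': 'https://dcat.io',
  '@id': 'mydoc',
  '@type': 'MedicalScholarlyArticle',
  encoding: {
    '@type': 'MediaObject',
    filePath: 'article.pdf',
    contentUrl: 'http://example.com/article.pdf'
  },
  hasPart: {
    '@type': 'Dataset',
    distribution: {
      '@type': 'DataDownload',
      filePath: 'data.csv',
      contentUrl: 'http://example.com/data.csv'
    }
  }
};

for (const { filePath, contentUrl } of downloadPlan(published)) {
  console.log(filePath + ' <- ' + contentUrl);
}
// article.pdf <- http://example.com/article.pdf
// data.csv <- http://example.com/data.csv
```

A real implementation writing these files should also check that each filePath stays inside the target directory (rejecting values such as ../outside), since the paths come from the document.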

Listing / Adding / Removing maintainers (maintainer)

Only maintainers of a document can publish or remove versions of a document. Maintainers of a document can be listed with:

dcat maintainer ls <CURIE>

Maintainers can give users maintainer rights by running:

dcat maintainer add <user CURIE> <doc CURIE>

Note: every dcat.io user has a CURIE of the form ldr:users/{username}.

Maintainers can remove maintainer rights by running:

dcat maintainer rm <user CURIE> <doc CURIE>

API

dcat can also be used programmatically.

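var Dcat = require('dcat');
var dcat = new Dcat();

var doc = {
  '@context': 'https://dcat.io',
  '@id': 'test',
  name: 'hello world'
};

// Publish the document; the callback receives an error (if any) and the
// compacted version of the published document.
dcat.publish(doc, function (err, cdoc) {
  if (err) return console.error(err);
  console.log(cdoc); // cdoc is compacted
});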

See test/test.js for more examples.

History

package.json -> datapackage.json -> package.jsonld -> JSON-LD + schema.org + hydra + linked data fragment.

Registry

By default, dcat uses the dcat.io linked data registry, hosted on Cloudant.

Tests

You need a local instance of the linked data registry running on your machine on port 3000. Then, run:

npm test

License

Apache-2.0.