dga-sync

v1.1.0

Sync datasets from data.gov.au, the Australian government's open data website.

dga-sync README

Sync data.gov.au datasets easily

The Australian government's data.gov.au website references a growing number of public, open government data resources: more than 3700 datasets at the time of writing. Although data.gov.au provides an API for accessing some datasets, it does not do so for all of them. For this reason and others, it is often advantageous to download the data for local use or to re-package it. The dga-sync utility eases the task of synchronising that data to a local file system.

dga-sync uses the JSON metadata stored on data.gov.au for each dataset to ensure that data files are only downloaded if they are newer than what has previously been downloaded. A local copy of the metadata is also stored.

Getting started

npm install dga-sync

Simple usage

For each data.gov.au dataset, there is a JSON metadata file (accessed from the JSON button on the web page) that leads to a URL of the following form:

http://data.gov.au/api/3/action/package_show?id=23218e8f-babe-4e37-81d1-5424a4d1c568

Use the id parameter to identify the package to sync:

var sync = require('dga-sync');
sync.syncByPackageId('23218e8f-babe-4e37-81d1-5424a4d1c568');
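If all you have is the metadata URL, the id can be read out of its query string rather than copied by hand. This is a small sketch, not part of dga-sync; it assumes a Node.js version in which URL is available as a global.

```javascript
// Pull the `id` query parameter out of a package_show URL.
var metadataUrl = 'http://data.gov.au/api/3/action/package_show?id=23218e8f-babe-4e37-81d1-5424a4d1c568';
var packageId = new URL(metadataUrl).searchParams.get('id');

console.log(packageId); // 23218e8f-babe-4e37-81d1-5424a4d1c568
```

The resulting string can then be passed to syncByPackageId as shown above.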

This is what the console output looks like (actual output is colourised where supported):

fetching metadata for package ID: 23218e8f-babe-4e37-81d1-5424a4d1c568
found: "Public Barbeques"
reply lists 5 resources:
   barbeque.kmz "2014 Public Barbeques" @ 2014-09-16T02:05:54.523Z
   wfs?request=GetFeature&typeName=23218e8f_babe_4e37_81d1_5424a4d1c568&outputFormat=csv "Public Barbeques CSV" @ 2014-09-16T02:05:54.523Z
   wfs?request=GetFeature&typeName=23218e8f_babe_4e37_81d1_5424a4d1c568&outputFormat=json "Public Barbeques GeoJSON" @ 2014-09-16T02:05:54.523Z
   wms?request=GetCapabilities "Public Barbeques - Preview this Dataset (WMS)" @ 2014-09-16T02:05:54.523Z
   wfs?request=GetCapabilities "Public Barbeques Web Feature Service API Link" @ 2014-09-16T02:05:54.523Z
preparing to download barbeque.kmz
preparing to download wfs?request=GetFeature&typeName=23218e8f_babe_4e37_81d1_5424a4d1c568&outputFormat=csv
preparing to download wfs?request=GetFeature&typeName=23218e8f_babe_4e37_81d1_5424a4d1c568&outputFormat=json
preparing to download wms?request=GetCapabilities
preparing to download wfs?request=GetCapabilities
downloading completed
  .. moving data/._DGA_DOWNLOAD_barbeque.kmz to data/barbeque.kmz
  .. moving data/._DGA_DOWNLOAD_wfs?request=GetFeature&typeName=23218e8f_babe_4e37_81d1_5424a4d1c568&outputFormat=csv to data/wfs?request=GetFeature&typeName=23218e8f_babe_4e37_81d1_5424a4d1c568&outputFormat=csv
  .. moving data/._DGA_DOWNLOAD_wfs?request=GetFeature&typeName=23218e8f_babe_4e37_81d1_5424a4d1c568&outputFormat=json to data/wfs?request=GetFeature&typeName=23218e8f_babe_4e37_81d1_5424a4d1c568&outputFormat=json
  .. moving data/._DGA_DOWNLOAD_wms?request=GetCapabilities to data/wms?request=GetCapabilities
  .. moving data/._DGA_DOWNLOAD_wfs?request=GetCapabilities to data/wfs?request=GetCapabilities
writing download metadata to: data/._METADATA_.json

At this point, a directory called data under the current working directory will have been created and will contain the downloaded resources plus a metadata file created by dga-sync:

$ ls -lhA data
total 744K
-rw-r--r-- 1 sam sam  44K Sep 24 11:14 barbeque.kmz
-rw-r--r-- 1 sam sam 6.0K Sep 24 11:15 ._METADATA_.json
-rw-r--r-- 1 sam sam  72K Sep 24 11:15 wfs?request=GetCapabilities
-rw-r--r-- 1 sam sam  95K Sep 24 11:14 wfs?request=GetFeature&typeName=23218e8f_babe_4e37_81d1_5424a4d1c568&outputFormat=csv
-rw-r--r-- 1 sam sam 384K Sep 24 11:14 wfs?request=GetFeature&typeName=23218e8f_babe_4e37_81d1_5424a4d1c568&outputFormat=json
-rw-r--r-- 1 sam sam 139K Sep 24 11:14 wms?request=GetCapabilities

The metadata file ensures that the next time we sync, only resources newer than the ones we already have are downloaded, saving bandwidth.
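One way to picture that check is as a timestamp comparison per resource. The sketch below is an illustration only, not dga-sync's actual code: the function name isNewer, the shape of the stored metadata (a map from resource ID to an ISO timestamp), and the later timestamp in the second call are all assumptions.

```javascript
// Hypothetical sketch: decide whether a resource needs downloading.
// `previous` maps a resource ID to the ISO timestamp recorded at the last sync.
function isNewer(resourceId, remoteTimestamp, previous) {
  var last = previous[resourceId];
  if (last === undefined) {
    return true; // never downloaded before
  }
  return Date.parse(remoteTimestamp) > Date.parse(last);
}

var previous = { 'barbeque.kmz': '2014-09-16T02:05:54.523Z' };

// Same timestamp as last time: nothing to do.
console.log(isNewer('barbeque.kmz', '2014-09-16T02:05:54.523Z', previous)); // false
// Made-up later timestamp: the resource would be fetched again.
console.log(isNewer('barbeque.kmz', '2014-10-01T00:00:00.000Z', previous)); // true
// No record of this ID: it would be fetched.
console.log(isNewer('other.csv', '2014-09-16T02:05:54.523Z', previous)); // true
```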

Limiting what gets downloaded

As shown above, all resources are downloaded by default. To change this, pass an idFilter regex option. To keep only the KMZ file from our example:

sync.syncByPackageId(
  '23218e8f-babe-4e37-81d1-5424a4d1c568',
  {
    idFilter: /.*\.kmz$/,
    deleteUnlisted: true
  }
);

deleteUnlisted is optional: it tells dga-sync to delete previously downloaded files that the filter now excludes. The contents of data are now:

$ ls -lhA data
total 48K
-rw-r--r-- 1 sam sam  44K Sep 24 11:14 barbeque.kmz
-rw-r--r-- 1 sam sam 1.4K Sep 24 11:26 ._METADATA_.json
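To see why only barbeque.kmz survives, the regex can be tried against the five resource IDs from the earlier console output. This is a standalone sketch; how dga-sync applies the regex internally (here, with RegExp test) is an assumption.

```javascript
// Canonical resource IDs as they appear in the console output above.
var ids = [
  'barbeque.kmz',
  'wfs?request=GetFeature&typeName=23218e8f_babe_4e37_81d1_5424a4d1c568&outputFormat=csv',
  'wfs?request=GetFeature&typeName=23218e8f_babe_4e37_81d1_5424a4d1c568&outputFormat=json',
  'wms?request=GetCapabilities',
  'wfs?request=GetCapabilities'
];

var idFilter = /.*\.kmz$/;

// Keep only the IDs that the filter matches.
var kept = ids.filter(function (id) {
  return idFilter.test(id);
});

console.log(kept); // [ 'barbeque.kmz' ]
```

Because the pattern is anchored only at the end, the shorter /\.kmz$/ matches the same IDs.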

API

There is currently only one method:

syncByPackageId(packageId, options, andThen)

packageId - the ID of the package/dataset

options - an object with the following options:

  • idFieldName - specifies the field in a resource to use as the resource ID [default: 'url']

  • idCanonicaliser - a function that takes the resource ID (according to the idFieldName option) and creates a canonical ID for comparison in later sync operations [default: split the ID at each '/' and use the last part; this assumes that idFieldName has its default value of 'url']

  • idFilter - applied to the (canonicalised) resource ID to choose which resources will be synced [default: undefined - that is, accept all IDs]

  • dataDestination - the directory to store the downloaded resources in (in the example above, with no options given, this was data under the current working directory)

  • deleteUnlisted - boolean: true means delete extraneous files in the destination directory that don't correspond to a resource ID in the filtered list [default: false]
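The default idCanonicaliser behaviour described in the list above can be sketched as follows. The function name lastPathPart and the example.org URLs are hypothetical and used only for illustration; this is not the library's source.

```javascript
// Sketch of the documented default: split the ID at '/' and keep the last part.
function lastPathPart(id) {
  var parts = id.split('/');
  return parts[parts.length - 1];
}

// Hypothetical resource URLs, used only for illustration.
console.log(lastPathPart('http://example.org/files/barbeque.kmz'));
// barbeque.kmz
console.log(lastPathPart('http://example.org/geoserver/wfs?request=GetCapabilities'));
// wfs?request=GetCapabilities
```

Note that a query string containing a '/' would be cut at that point too, which is one reason you might supply your own idCanonicaliser.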

andThen(err) - optional callback, where err is any error encountered that prevented successful completion
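Putting the three arguments together, a call with options and a callback might look like the sketch below. The helper syncKmzOnly is hypothetical; it takes the module as a parameter so that the snippet does not depend on the package being installed. Treating dataDestination as relative to the working directory is an assumption based on the earlier example.

```javascript
// Hypothetical helper: `sync` is expected to be the object returned by require('dga-sync').
function syncKmzOnly(sync, packageId, done) {
  var options = {
    idFilter: /\.kmz$/,       // keep only IDs ending in .kmz
    dataDestination: 'data',  // assumed to be relative to the working directory
    deleteUnlisted: false     // leave other files in place
  };
  sync.syncByPackageId(packageId, options, function (err) {
    if (err) {
      console.error('sync failed:', err);
    }
    done(err);
  });
}

// Usage (needs the package installed and network access):
// syncKmzOnly(require('dga-sync'), '23218e8f-babe-4e37-81d1-5424a4d1c568', function (err) {
//   if (!err) { console.log('sync finished'); }
// });
```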