gather-cli

v0.2.6

Merge JSON files, with a twist: optionally add metadata from the filename or the files' stats to each dataset.

Gather

Gather is a command-line tool that merges JSON files, with a twist: gather can optionally add metadata from the filename or the file's stats to each dataset. Because sometimes filenames are just meaningless descriptors, but often they're not.

Install with NPM (bundled with node.js):

npm install gather-cli -g

Examples

Combine all of last month's analytics data into a single file, without losing track of when those analytics were recorded:

gather 'analytics/{date}.json' > metrics.json

Convert your Markdown blogposts with YAML frontmatter into JSON, bundle them together with Gather and then render them:

yaml2json posts \
    --output posts \
    --prose \
    --convert markdown
gather 'posts/{year}-{month}-{day}-{permalink}.json' \
    --annotate \
    --output posts/all.json
render post.jade \
    --input posts/all.json \
    --output 'build/{year}/{permalink}.html' \
    --many

Reorganize your data with a gather-and-groupby one-two punch:

gather 'staff/{department}/{username}.json' | \
groupby 'staff/{office}/{firstName}-{lastName}.json' --unique

Path metadata

By default, filled-in filename placeholders will get added to the data.

With this gather command...

gather 'analytics/{date}.json' > metrics.json

... each object in the resulting metrics.json file will contain a date key:

[
    {
        "date": "2014-10-01", 
        ...
    }, 
    {
        "date": "2014-10-02", 
        ...
    }, 
    ...
]
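
To make the placeholder idea concrete, here is a minimal sketch of how a pattern such as 'analytics/{date}.json' could be matched against a file path. The extractPlaceholders helper is hypothetical and written only for illustration; it is not gather's API, and gather's actual matching rules may differ.

```javascript
// Hypothetical helper for illustration; not part of gather's API.
function extractPlaceholders(pattern, filePath) {
    // Splitting on {name} with a capture group yields literal text at even
    // indices and placeholder names at odd indices.
    var parts = pattern.split(/\{(\w+)\}/);
    var names = [];
    var source = parts.map(function (part, index) {
        if (index % 2 === 1) {
            names.push(part);
            return '([^/]+?)'; // lazily match text within one path segment
        }
        // Escape regex metacharacters in the literal parts of the pattern
        return part.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
    }).join('');
    var match = new RegExp('^' + source + '$').exec(filePath);
    if (!match) return null; // the path does not fit the pattern
    var metadata = {};
    names.forEach(function (name, index) {
        metadata[name] = match[index + 1];
    });
    return metadata;
}

console.log(extractPlaceholders('analytics/{date}.json', 'analytics/2014-10-01.json'));
// prints { date: '2014-10-01' }
```

In this sketch each placeholder matches lazily within a single path segment, so a pattern with several dash-separated placeholders gives the leftover text to the last one.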

File metadata

File metadata includes:

  • an extended JSON representation of the file's created, modified and accessed dates
  • if the file path contains {year}, {month} and {day} placeholders, a date inferred from these variables in the same extended JSON format
  • the file's absolute and relative path, basename and extension

While path metadata is enabled by default, file metadata is not. Use the --annotate flag to enable file metadata.

Here's an example of file metadata:

{
    "origin": {
        "relative": "...", 
        "absolute": "...", 
        "basename": "...", 
        "extension": "..."
    }, 
    "date": {
        "accessed": {
            "iso": "...", 
            "year": ..., 
            "month": ..., 
            "day": ...,
            ...
        }, 
        "modified": ..., 
        "created": ..., 
        "inferred": ...
    }
}
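
For a sense of where this information comes from, the sketch below builds a similar object from node.js' own fs and path modules. The describeFile and describeDate helpers are hypothetical, and details such as whether basename includes the extension or whether dates are in UTC are assumptions rather than documented gather behavior. The inferred date is left out.

```javascript
var fs = require('fs');
var path = require('path');

// Hypothetical helper: expand a Date into a few of the fields shown above.
// Using UTC here is an assumption.
function describeDate(date) {
    return {
        iso: date.toISOString(),
        year: date.getUTCFullYear(),
        month: date.getUTCMonth() + 1, // getUTCMonth() is zero-based
        day: date.getUTCDate()
    };
}

// Hypothetical helper: collect origin and date metadata for one file.
function describeFile(filePath) {
    var stats = fs.statSync(filePath);
    var extension = path.extname(filePath);
    return {
        origin: {
            relative: path.relative(process.cwd(), filePath),
            absolute: path.resolve(filePath),
            // Assumption: basename is the filename without its extension
            basename: path.basename(filePath, extension),
            extension: extension
        },
        date: {
            accessed: describeDate(stats.atime),
            modified: describeDate(stats.mtime),
            created: describeDate(stats.birthtime)
        }
    };
}
```

Calling describeFile('staff/smith.json') on an existing file would return an object with the origin and date keys filled in from the filesystem.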

Compact, underscored and extended metadata naming schemes

Metadata from the filename or from the file's stats can conflict with keys already present in the data. If you are concerned about naming clashes, there are two ways to avoid them:

  • ask gather to underscore any metadata with the --scheme underscored option
  • put the original data under data and metadata under metadata with --scheme extended, as opposed to merging those in at the root.

An example of the extended naming scheme:

{
    "origin": "file path, extension et cetera", 
    "date": "created, modified, accessed and inferred dates", 
    "metadata": "metadata extracted from path placeholders", 
    "data": "the original data"
}
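
The difference between the schemes can be sketched as a small merge function. This is an illustration under stated assumptions, not gather's implementation: applyScheme is a hypothetical name, "underscored" is assumed to mean a leading underscore on each metadata key, and in the compact case metadata is assumed to overwrite data on a clash.

```javascript
// Hypothetical helper for illustration; not part of gather's API.
function applyScheme(data, metadata, scheme) {
    if (scheme === 'extended') {
        // Keep both objects apart so their keys can never clash
        return { metadata: metadata, data: data };
    }
    // Assumption: "underscored" prefixes metadata keys with "_",
    // while "compact" merges them in unchanged.
    var prefix = scheme === 'underscored' ? '_' : '';
    var result = {};
    Object.keys(data).forEach(function (key) {
        result[key] = data[key];
    });
    Object.keys(metadata).forEach(function (key) {
        // Assumption: on a clash in the compact scheme, metadata wins
        result[prefix + key] = metadata[key];
    });
    return result;
}

var data = { date: 'reported on Monday', visits: 12 };
var metadata = { date: '2014-10-01' };
console.log(applyScheme(data, metadata, 'compact'));
console.log(applyScheme(data, metadata, 'underscored'));
console.log(applyScheme(data, metadata, 'extended'));
```

With the sample objects, the compact result loses the original date value, while the underscored and extended results keep both.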

Partial rebuilds

When adding additional metadata using the --annotate option, the origin of each piece of data that makes up the merged dataset will be a part of the output. This metadata makes it possible, on subsequent gathering operations, to only update or remove data that has changed rather than redoing the entire merge from scratch.

For example, say you've added a new staff member at /staff/smith.json and would like to update the staff.json file, which contains thousands of staff members. For every staff member in /staff, gather will first check whether it can get up-to-date information from the existing staff.json file. Only smith.json is missing there, so only smith.json needs to be loaded and parsed from disk.

Especially when merging thousands of files, these partial rebuilds dramatically speed up gathering operations. Because the caching mechanism is generally safe (it will never use stale data, it will remove data for files that are no longer there, et cetera) it is enabled by default.

Nevertheless, it is possible to disable partial rebuilds: use --force to force a full redo of the merge. Alternatively, just rm the output file before using gather.
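
One way such a cache check could work is sketched below. The planRebuild function and the shape of its inputs are hypothetical, and comparing modification timestamps is an assumption about how staleness is detected; gather's real logic may differ.

```javascript
// Hypothetical helper for illustration; not part of gather's API.
// previous: entries from an earlier annotated output, each with
//           origin.absolute and date.modified.iso
// files:    files currently on disk, as { absolute, modified } objects
function planRebuild(previous, files) {
    var cached = new Map();
    previous.forEach(function (entry) {
        cached.set(entry.origin.absolute, entry);
    });
    var plan = { reuse: [], reload: [] };
    files.forEach(function (file) {
        var entry = cached.get(file.absolute);
        if (entry && entry.date.modified.iso === file.modified) {
            plan.reuse.push(entry); // unchanged since the last merge
        } else {
            plan.reload.push(file.absolute); // new or modified file
        }
    });
    // Entries whose files no longer exist are never looked up,
    // so they drop out of the result.
    return plan;
}
```

Only the paths in plan.reload would need to be read and parsed again; everything in plan.reuse can be copied over from the previous output.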

Use from node.js

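var gather = require('gather-cli');
// Location of the JSON files to merge
var source = 'examples/staff';
var options = {
    "extended": true, 
    "scheme": "underscored"
};
gather(source, options, function(err, staffMembers) {
    // Stop early if gathering failed
    if (err) throw err;
    // Each element is one merged dataset
    staffMembers.forEach(function(staff){
        console.log(staff.name);
    });
});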