npm package discovery and stats viewer.

Discover Tips

  • General search

    [free text search, go nuts!]

  • Package details

    pkg:[package-name]

  • User packages

    @[username]

Sponsor

Optimize Toolset

I’ve always been into building performant and accessible sites, but lately I’ve been taking it extremely seriously. So much so that I’ve been building a tool to help me optimize and monitor the sites that I build to make sure that I’m making an attempt to offer the best experience to those who visit them. If you’re into performant, accessible and SEO friendly sites, you might like it too! You can check it out at Optimize Toolset.

About

Hi, 👋, I’m Ryan Hefner  and I built this site for me, and you! The goal of this site was to provide an easy way for me to check the stats on my npm packages, both for prioritizing issues and updates, and to give me a little kick in the pants to keep up on stuff.

As I was building it, I realized that I was actually using the tool to build the tool, and figured I might as well put this out there and hopefully others will find it to be a fast and useful way to search and browse npm packages as I have.

If you’re interested in other things I’m working on, follow me on Twitter or check out the open source projects I’ve been publishing on GitHub.

I am also working on a Twitter bot for this site to tweet the most popular, newest, random packages from npm. Please follow that account now and it will start sending out packages soon–ish.

Open Software & Tools

This site wouldn’t be possible without the immense generosity and tireless efforts from the people who make contributions to the world and share their work via open source initiatives. Thank you 🙏

© 2024 – Pkg Stats / Ryan Hefner

@bearer/shape-hash

v0.0.7

Published

A PID-free, lightweight format, for saving and analyzing the behavior of APIs at scale.

Downloads

1,335

Readme

Shape Hash

A PID-free, lightweight format, for saving and analyzing the behavior of APIs at scale.

Usage

npm install shape-hash --save
const {toHash} = require('shape-hash')
const hash1 = toHash({firstName: 'Aidan', os: 'mac'})
const hash2 = toHash({firstName: 'Lisa', os: 'linux'})
const hash3 = toHash({firstName: 'Dev', os: ['mac', 'linux']})

console.log(hash1)
console.log(hash2)
console.log(hash3)

Live Demo

Specification

Abstract

The Shape Hasher produces the same hash for two similar JSON-like objects. The Shape Hash answers abstract questions about the shape of the data, but strips out the data making it safe to share with logging tools and persist for later analysis.

{
  "firstname": "Aidan",
  "homestate": "PA"
}
{
  "firstname": "Dev",
  "homestate": "New Jersey"
}

Produce the same hash: a163f39e56668850646b63d7d32a4e64a0ffd469

As they are not intended to be written manually by developers, Shape Hashes have a far simpler specification than JSON Schema. This makes them faster to generate from real data, compare to other shape hashes, and implement with identical behavior in any language.

Use Cases at Optic and Beyond

  • Chaos engineering / anomaly detection. Have the APIs I depend on changed their behavior? You see 1M requests a day with a distinct set of 12 Shape Hashes being used. Now you see 4 new Hashes and no traffic to two others you know our code works well with. Something changed, better look!
  • Analyzing API Behavior at Scale. JSON Schema and other schema languages require parallel traversal of the schema and data. Doing that 1M times requires you to store a huge amount of traffic (megabytes/gigabytes) and takes many minutes to compute. Alternatively, computing a shape hash from data is a single fast traversal of the data and then comparing it to known hashes is just string comparison. [Insert big data problem regarding APIs here]
  • Quickly documenting hundreds of APIs from traffic (what Optic does) without having to store/share any sensitive data

The Shape Hash Data Structure

A Shape Hash is encoded as a Protocol Buffer message. There are only two message types:

  • FieldDesciptor -- a named field, typically used in an object or map
  • ShapeDescriptor -- a description of a shape. Includes type (enum), fields (array) and items (array). The root of any Shape Hash must always be a ShapeDesriptor
syntax = "proto3";
package optic_shape_hash;

message Field {
  string key = 1;
  ShapeHash hash = 2;
}

message ShapeDescriptor {
  enum PrimitiveType {
    OBJECT = 0;
    ARRAY = 1;
    STRING = 2;
    NUMBER = 3;
    BOOLEAN = 4;
    NULL = 5;
  }
  PrimitiveType type = 1;
  repeated Field fields = 2;
  repeated ShapeHash items = 3;
}

This structure allows you to encode basic information about a Shape. It intentionally does not try to handle any kind of polymorphism like optionals or one-ofs. That information can only be inferred by looking at Shape Hashes in the aggregate and should not be a concern of the hasher itself which should be stateless.

Building the Shape Hash

Building a Shape Hash is as easy as traversing a JSON-like structure recursively and outputting a ShapeDescriptor Tree.

The input:

{
  "firstname": "Aidan",
  "homestate": "PA"
}

Is represented by:

 {
    "type": 0,
    "fields": [
        {
            "key": "firstname",
            "hash": {
                "type": 2,
                "fields": [],
                "items": []
            }
        },
        {
            "key": "homestate",
            "hash": {
                "type": 2,
                "fields": [],
                "items": []
            }
        }
    ],
    "items": []
}

There's a full reference implementation used in production by Optic in the repo here:

Encoding
  • All shapes hashes should be serialized to byte arrays using Protocol Buffers (above example becomes) 0800120f0a0966697273746e616d6512020802120f0a09686f6d65737461746512020802
  • These hashes are decodable using the original proto into the structure above. That's helpful for tools like Optic that want to be able to diff a Shape against an expected Shape or Schema
  • If you don't care about being able to decode and strickly want to compare the shapes to one another, we suggest putting the serialized bytes through a SHA-256 hasher to reduce the size of the message and further obscure data.