data-prism

v0.0.30

Downloads

297

Data Prism

Data Prism is a library for working with data graphs. It uses a user-provided schema to power various graph operations, including fetching, creating, updating, deleting, and querying resources.

Schema

The schema describes the resource types in your graph along with their attributes and relationships. The library makes extensive use of it, so a well-thought-out schema is worth its weight in gold.

Schema Structure

{
  resources: {
    [resourceType]: {
      idAttribute?: "id",
      attributes: {
        [attr1]: { type: "boolean" },
        [attr2]: { type: "string", pattern: "^someregex.*$" }
      },
      relationships: {
        [rel1]: {
          type: "otherResourceType",
          cardinality: "many",
          inverse: "referenceBackToThisResource"
        }
      }
    }
  }
}
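As a concrete illustration, here is a minimal sketch of a schema for the teams and matches used in the query examples later in this document. The attribute and relationship names, and the `checkRelationships` helper, are assumptions made for illustration; they are not part of the library.

```javascript
// Hypothetical schema for illustration only; all names are assumptions.
const schema = {
  resources: {
    teams: {
      attributes: {
        name: { type: "string" },
      },
      relationships: {
        matches: { type: "matches", cardinality: "many", inverse: "teams" },
      },
    },
    matches: {
      attributes: {
        field: { type: "string" },
        ageGroup: { type: "integer" },
      },
      relationships: {
        teams: { type: "teams", cardinality: "many", inverse: "matches" },
      },
    },
  },
};

// Hypothetical sanity check: every relationship must point at a defined
// resource type, and its inverse (if given) must exist on that type.
function checkRelationships(schema) {
  const problems = [];
  for (const [resType, def] of Object.entries(schema.resources)) {
    for (const [relName, rel] of Object.entries(def.relationships ?? {})) {
      const target = schema.resources[rel.type];
      if (!target) {
        problems.push(`${resType}.${relName}: unknown type "${rel.type}"`);
      } else if (rel.inverse && !(target.relationships ?? {})[rel.inverse]) {
        problems.push(`${resType}.${relName}: missing inverse "${rel.inverse}"`);
      }
    }
  }
  return problems;
}
```

A check like this catches typos in `type` and `inverse` early, before any queries are run against the schema.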

Data Types

There are various data types that determine how data is cast and validated. Some of these data types have subtypes that provide further precision.

Basic Types

  • array
  • boolean
  • integer
  • null
  • number
  • object
  • string

Additional Types (see AJV Guide)

  • date
  • time
  • date-time
  • duration - RFC3339
  • uri
  • uri-reference
  • url - (deprecated)
  • email
  • hostname
  • ipv4
  • ipv6
  • regex
  • uuid - RFC4122
  • json-pointer - RFC6901
  • relative-json-pointer - according to this draft

GeoJSON Types

  • geojson - including subtypes:
    • point
    • line-string
    • polygon
    • multi-point
    • multi-line-string
    • multi-polygon
    • geometry-collection
    • feature
    • feature-collection

This borrows a great deal from AJV formats. However, it is important to note that rather than being formats, the options are elevated to actual types. Data Prism will do the work of transforming the Data Prism schema into a JSON Schema.

Other JSON Schema properties are available to be added to any attribute definition.
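To make the "types rather than formats" idea concrete, here is a rough sketch of how an elevated type might be lowered into a JSON Schema fragment. This is not Data Prism's actual implementation: the function name `toJsonSchemaAttribute` is hypothetical, and the sketch assumes every non-basic type maps to a string with a matching format, which would not hold for the GeoJSON types.

```javascript
// Hypothetical sketch; not the library's real transformation.
const JSON_SCHEMA_TYPES = new Set([
  "array", "boolean", "integer", "null", "number", "object", "string",
]);

function toJsonSchemaAttribute(attrDef) {
  const { type, ...rest } = attrDef;
  // Basic types pass through untouched, along with any other properties.
  if (JSON_SCHEMA_TYPES.has(type)) return { type, ...rest };
  // Assumption: an elevated type becomes a string with that format.
  return { type: "string", format: type, ...rest };
}

const createdAt = toJsonSchemaAttribute({ type: "date-time" });
const ageGroup = toJsonSchemaAttribute({ type: "integer", minimum: 0 });
```

Under these assumptions, `createdAt` ends up as a string with the `date-time` format, while `ageGroup` is returned unchanged.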

Graph Queries

This library exposes the ability to query graphs to receive result trees using a robust query language. In addition it provides a suite of utility functions to help wrangle data into appropriate formats and interact with data structures effectively. See helper functions.

This document focuses on constructing queries, as this is the most common use case for the library and requires a fair bit of explanation.

Resource Data

Resource data is a representation of the graph of data to be queried on. It should be presented in normal form, which looks like this:

{
  [resourceType]: {
    [resourceId]: {
      attributes: {
        attr1: "value1",
        attr2: "value2"
      },
      relationships: {
        relationship1: { type: "other_resource", id: "1234" },
        relationship2: [
          { type: "bar_resource", id: 1 },
          { type: "bar_resource", id: 2 }
        ]
      }
    }
  }
}

Some effort may be required to get resources into this form, but it is designed to be as straightforward as possible. Having structure like this allows the query engine to make good assumptions about the data and allows it to execute many of the more powerful query features.
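For example, flat records that hold related IDs directly could be converted with a small helper along these lines. The helper name `toNormalResource`, its options, and the use of null for an empty to-one relationship are all assumptions for illustration, not library API.

```javascript
// Hypothetical helper: turn one flat record into a normal-form resource.
// `attributes` lists attribute names; `relationships` maps each
// relationship name to the resource type it points at.
function toNormalResource(record, { attributes, relationships }) {
  const resource = { attributes: {}, relationships: {} };
  for (const attr of attributes) {
    resource.attributes[attr] = record[attr];
  }
  for (const [rel, type] of Object.entries(relationships)) {
    const value = record[rel];
    if (Array.isArray(value)) {
      // To-many: one ref per related ID.
      resource.relationships[rel] = value.map((id) => ({ type, id }));
    } else {
      // To-one: a single ref, or null when empty (an assumption).
      resource.relationships[rel] = value == null ? null : { type, id: value };
    }
  }
  return resource;
}

const team = { id: 1, name: "Arizona Bay FC", matches: [1, 2] };
const graph = {
  teams: {
    [team.id]: toNormalResource(team, {
      attributes: ["name"],
      relationships: { matches: "matches" },
    }),
  },
};
```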

Queries

Queries are what make the library useful. A query is shaped as closely as possible like the result tree you want back. There are many things you can do within a query. Here's a small example first:

{
	"type": "resource type",
	"select": ["attribute1", "attribute2"]
}

Here's the overall structure. Notice that it can be represented in JSON. Also, try not to get overwhelmed with the number of things going on.

{
	"type": "[resource type]",
	"id": "[resource id]",
	"select": [
		"attribute1",
		"relationship ref1",
		{
			"relationship 2": {
				"subquery": "goes here"
			},
			"some sum": { "$sum": "numeric field" }
		}
	],
	"where": {
		"some": "criterion"
	},
	"order": [{ "some field": "asc" }],
	"limit": 5,
	"offset": 3
}

There are a lot of options and power in there. Let's break it down by top-level key first:

  • type indicates the type of resource being queried. It's required at the root, but not in subqueries.
  • id gets a single resource with an ID. It's optional, and can't be used in subqueries.
  • select is a required field that tells the engine which fields to return. There are a few types of things it can do, but we'll return to those in a moment.
  • where adds filters to the data that comes back. It's optional.
  • order sorts the resources, using one or more fields. It's also optional.
  • limit and offset take a subset of results. They can be useful for pagination and such. These are also optional.

The focus of most queries is what gets selected, so select gets the most attention below, after a brief look at type and id.

type

The type key determines which type of resource is being queried at the root level. It's required.

id

The id key targets the query at a specific resource. With it, you'll get a single resource; without it, you'll get a collection of resources. If the ID isn't found in the graph, you'll get a result of null.
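The single-versus-collection behaviour can be sketched like this. It only illustrates the rule just described; `rootResources` is a hypothetical name and not how the engine is necessarily written.

```javascript
// Illustrative only: pick the starting resource(s) for a root query.
function rootResources(graph, query) {
  const ofType = graph[query.type] ?? {};
  if ("id" in query) {
    // With an id: a single resource, or null when it isn't in the graph.
    return ofType[query.id] ?? null;
  }
  // Without an id: every resource of that type.
  return Object.values(ofType);
}

const graph = {
  teams: {
    1: { attributes: { name: "Arizona Bay FC" }, relationships: {} },
    2: { attributes: { name: "Scottsdale Surf" }, relationships: {} },
  },
};

const one = rootResources(graph, { type: "teams", id: 1 });
const missing = rootResources(graph, { type: "teams", id: 99 });
const all = rootResources(graph, { type: "teams" });
```

Here `one` is the Arizona Bay FC resource, `missing` is null, and `all` is an array of both teams.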

select

select can be either an array or an object.

If it's an array, its members should be strings naming attributes (or relationship refs) to get, or objects that add further select fields.

If it's an object, each value can be one of three types:

  • A string, in which case that attribute or relationship ref will be returned (and possibly be renamed).
  • A subquery, where a relationship will be traversed.
  • An expression, which processes the resource's data in some way (we'll come back to these much later as they can safely be ignored).

A couple of examples:

Select an Attribute
{ "type": "teams", "select": ["name"] }

Might return:

[{ "name": "Arizona Bay FC" }, { "name": "Scottsdale Surf" }]

In this example we select the name from each team in our resources.

Rename an Attribute
{ "type": "teams", "select": { "nombre": "name" } }

Might return:

[{ "nombre": "Arizona Bay FC" }, { "nombre": "Scottsdale Surf" }]

Here we rename the "name" attribute to "nombre". You may have noticed that { "select": ["name"] } is equivalent to { "select": { "name": "name" } }.
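That equivalence suggests one way to think about select: the array form can always be rewritten into the object form. Here is a sketch of that rewrite, plus a projection that handles plain attributes only. Both function names are hypothetical and this is not the library's code.

```javascript
// Hypothetical: rewrite an array select into the equivalent object form.
function normalizeSelect(select) {
  if (!Array.isArray(select)) return { ...select };
  const out = {};
  for (const item of select) {
    if (typeof item === "string") out[item] = item; // "name" -> { name: "name" }
    else Object.assign(out, item);                  // merge additional fields
  }
  return out;
}

// Hypothetical: apply string selects to a resource's attributes.
// Subqueries and expressions are ignored in this sketch.
function project(attributes, select) {
  const out = {};
  for (const [key, source] of Object.entries(normalizeSelect(select))) {
    if (typeof source === "string") out[key] = attributes[source];
  }
  return out;
}

const teamAttributes = { name: "Arizona Bay FC" };
const plain = project(teamAttributes, ["name"]);
const renamed = project(teamAttributes, { nombre: "name" });
```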

Run a Subquery
{
  "type": "teams",
  "id": 1,
  "select": [
    "name",
    {
      "matches": {
        "select": ["field"]
      }
    }
  ]
}

Might return:

{
	"name": "Arizona Bay FC",
	"matches": [{ "field": "Phoenix Park 1" }, { "field": "Mesa Elementary B" }]
}

Here we add an id key, meaning that we'll get a single resource back. Additionally, we've reached into one of its relationships and run a query there. The type of the subquery can be inferred from what's in the parent resource's relationships. Presumably we'd see something like this for the Arizona Bay resource:

{
	"attributes": {
		"name": "Arizona Bay FC"
	},
	"relationships": {
		"matches": [
			{ "type": "matches", "id": 1 },
			{ "type": "matches", "id": 2 }
		]
	}
}

This is one reason why the normal form for resources is important: the refs in a resource's relationships let us traverse from it to other resources elsewhere in the graph.
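A stripped-down resolver shows how that traversal could work. This is a sketch under simplifying assumptions: it handles only attribute selects and subqueries (no where, order, or expressions), and `runSelect` is a hypothetical name rather than the library's API.

```javascript
// Sketch only: resolve attribute selects and subqueries for one resource.
function runSelect(graph, resource, select) {
  // Treat array members like object entries: "name" -> ["name", "name"].
  const entries = Array.isArray(select)
    ? select.flatMap((s) => (typeof s === "string" ? [[s, s]] : Object.entries(s)))
    : Object.entries(select);

  const result = {};
  for (const [key, value] of entries) {
    if (typeof value === "string") {
      result[key] = resource.attributes[value];
    } else {
      // Subquery: follow the refs stored under the relationship named `key`.
      const refs = resource.relationships[key];
      const follow = (ref) => runSelect(graph, graph[ref.type][ref.id], value.select);
      result[key] = Array.isArray(refs) ? refs.map(follow) : refs ? follow(refs) : null;
    }
  }
  return result;
}

const graph = {
  teams: {
    1: {
      attributes: { name: "Arizona Bay FC" },
      relationships: {
        matches: [
          { type: "matches", id: 1 },
          { type: "matches", id: 2 },
        ],
      },
    },
  },
  matches: {
    1: { attributes: { field: "Phoenix Park 1" }, relationships: {} },
    2: { attributes: { field: "Mesa Elementary B" }, relationships: {} },
  },
};

const result = runSelect(graph, graph.teams[1], [
  "name",
  { matches: { select: ["field"] } },
]);
```

Note that the subquery never names a type: each ref already carries one, so the resolver knows where in the graph to look.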

Conclusion

We've seen the basics of querying. Expressions will be discussed later, but any data can be fetched without them. Hopefully you've noticed that the results of the queries closely line up with what's in the select field, including the nested subqueries. For more examples, you can check out the test suite.

where

The where property allows you to filter the result based on either properties, expressions, or property expressions. We'll leave the full expressions for later, but touch on the property expressions a little bit here because they're an integral part of some results and hopefully don't introduce too much complexity.

Equality
{
	"type": "matches",
	"select": ["field", "ageGroup"],
	"where": {
		"field": "Phoenix Park 1"
	}
}

Might give us:

[
	{ "field": "Phoenix Park 1", "ageGroup": 11 },
	{ "field": "Phoenix Park 1", "ageGroup": 14 }
]

The where clause has whittled the results down to just the matches with the correct field name.

Numeric Comparison
{
	"type": "matches",
	"select": ["field", "ageGroup"],
	"where": {
		"ageGroup": { "$gt": 11 }
	}
}

Might give us:

[
  { "field": "Mesa HS", "ageGroup": 17 },
  { "field": "Phoenix Park 1", "ageGroup": 14 }
]

{ "$gt": 11 } is an expression that does a "greater than" comparison for its filtering. I'll document these at some point.
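To give a feel for how equality and comparison criteria might be evaluated, here is a small sketch of a where filter. It is an illustration rather than the engine's implementation: `matchesWhere` is a hypothetical name, and only `$gt` is handled because it is the only operator shown in this document.

```javascript
// Only $gt appears in this document, so it is the only operator supported.
const comparators = {
  $gt: (actual, operand) => actual > operand,
};

// Hypothetical: true when a result satisfies every criterion in `where`.
function matchesWhere(attributes, where) {
  return Object.entries(where).every(([field, criterion]) => {
    if (criterion !== null && typeof criterion === "object") {
      // Property expression such as { "$gt": 11 }.
      return Object.entries(criterion).every(([op, operand]) => {
        const compare = comparators[op];
        if (!compare) throw new Error(`Unsupported operator: ${op}`);
        return compare(attributes[field], operand);
      });
    }
    // Plain value: strict equality.
    return attributes[field] === criterion;
  });
}

const matches = [
  { field: "Phoenix Park 1", ageGroup: 11 },
  { field: "Mesa HS", ageGroup: 17 },
  { field: "Phoenix Park 1", ageGroup: 14 },
];

const atPhoenixPark = matches.filter((m) => matchesWhere(m, { field: "Phoenix Park 1" }));
const olderThan11 = matches.filter((m) => matchesWhere(m, { ageGroup: { $gt: 11 } }));
```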

order

The order clause sorts results. It takes an array of field/direction pairs and sorts by them in order. If two results tie on the first field, the second is applied, and so on.

{
	"order": [{ "ageGroup": "desc" }, { "field": "asc" }]
}
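The tie-breaking behaviour can be sketched as a comparator that walks the order clauses until one of them tells two results apart. `sortByOrder` is a hypothetical helper for illustration, not the library's code.

```javascript
// Hypothetical: sort results by a list of { field: direction } clauses.
function sortByOrder(results, order) {
  // Copy first so the input array is left untouched.
  return [...results].sort((a, b) => {
    for (const clause of order) {
      const [[field, direction]] = Object.entries(clause);
      if (a[field] === b[field]) continue; // tie: fall through to next clause
      const cmp = a[field] < b[field] ? -1 : 1;
      return direction === "desc" ? -cmp : cmp;
    }
    return 0; // equal on every clause
  });
}

const sorted = sortByOrder(
  [
    { field: "Phoenix Park 1", ageGroup: 14 },
    { field: "Mesa HS", ageGroup: 17 },
    { field: "Mesa HS", ageGroup: 14 },
  ],
  [{ ageGroup: "desc" }, { field: "asc" }]
);
```

With this input, the age 17 match comes first, and the two age 14 matches are then ordered by field name.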

limit and offset

These two properties work in tandem to reduce a list of results to a particular size. [1, 2, 3, 4] with limit 2, offset 1 would be [2, 3] for example. This pattern is well documented within the SQL world.
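In plain JavaScript terms this amounts to a slice. The `paginate` helper below is a hypothetical illustration of the behaviour described, not part of the library.

```javascript
// Hypothetical: skip `offset` results, then keep at most `limit` of them.
function paginate(results, { limit, offset = 0 } = {}) {
  const end = limit === undefined ? undefined : offset + limit;
  return results.slice(offset, end);
}

const page = paginate([1, 2, 3, 4], { limit: 2, offset: 1 }); // [2, 3]
```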