@openactive/dataset-utils

v2.0.1

Utilities for working with OpenActive data catalogs and dataset sites

dataset-utils

@openactive/dataset-utils is a Node.js utility library designed to simplify the handling of OpenActive data catalogs and dataset sites. The library facilitates fetching, parsing, and manipulating data from various dataset URLs within a specified catalog, ensuring a seamless interaction with OpenActive data.

Features

  • Recursive Data Catalog Crawling: Methodically navigates through data catalogs, fetches datasets, and extracts JSON-LD from dataset HTML.
  • Data URL Retrieval: Efficiently retrieves an array of dataset site URLs from data catalogs and part collections.
  • Metadata Extraction: Extracts JSON-LD metadata from HTML dataset pages.

Installation

Install the package via npm:

npm install @openactive/dataset-utils

Usage

getAllDatasetSiteUrls(dataCatalogUrl)

Description

This is a recursive function that returns an array of dataset site URLs. If the URL supplied is a data catalog collection, it gets all the data catalogs in hasPart and crawls them. If the URL supplied is a data catalog, it gets the dataset array and flattens it.
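
The recursion described above can be sketched roughly as follows. This is an illustration only, not the library's implementation: crawlCatalog and the injected fetchJson helper are hypothetical names, and the sketch assumes that hasPart and dataset hold arrays of URL strings, which is inferred from the description rather than from the source code.

```javascript
// Hypothetical sketch of the recursive crawl; not the library's actual code.
// fetchJson is an injected async function: (url) => parsed JSON-LD object.
async function crawlCatalog(url, fetchJson, errors = []) {
  let catalog;
  try {
    catalog = await fetchJson(url);
  } catch (err) {
    // Record the failure and carry on, mirroring the documented errors array.
    errors.push({ url, status: err.status, message: err.message });
    return { urls: [], errors };
  }

  // A data catalog collection lists child catalogs in hasPart: crawl each one.
  if (Array.isArray(catalog.hasPart)) {
    const nested = await Promise.all(
      catalog.hasPart.map((partUrl) => crawlCatalog(partUrl, fetchJson, errors))
    );
    return { urls: nested.flatMap((result) => result.urls), errors };
  }

  // A plain data catalog lists dataset site URLs directly in dataset.
  return { urls: Array.isArray(catalog.dataset) ? catalog.dataset : [], errors };
}
```

Passing fetchJson in as an argument keeps the sketch independent of any particular HTTP client and makes it easy to exercise with canned data.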

Parameters

  • dataCatalogUrl (string, optional): URL of the data catalog or data catalog collection to crawl.

Returns

A Promise that resolves with an object containing:

  • catalogMetadata: A JSON-LD object of the root data catalog provided.
  • urls: An array of strings, each being a URL for a dataset.
  • errors: An array of error objects, each containing details about errors encountered during the retrieval process. If no errors were encountered, this array is empty. Each error object includes:
    • url: The URL from which data was being fetched when the error occurred.
    • status: HTTP status code of the error response (if available).
    • message: A descriptive message detailing the nature of the error.

Example

const { getAllDatasetSiteUrls } = require('@openactive/dataset-utils');

// await is only valid inside an async function in a CommonJS script
async function main() {
  const { urls, errors } = await getAllDatasetSiteUrls();

  console.log(`Retrieved ${urls.length} dataset URLs`);
  if (errors.length > 0) {
    console.error(`${errors.length} errors encountered during retrieval:`);
    errors.forEach(error => {
      console.error(`- [${error.status}] ${error.url}: ${error.message}`);
    });
  }
}

main();

extractJSONLDfromHTML(url, html)

This function extracts JSON-LD metadata from the HTML of a given Dataset Site, using the provided url to resolve relative URLs within the JSON-LD.

Note that relative URLs are not generally permitted within OpenActive data, but the underlying JSON-LD library still requires the url to be specified.

Parameters:

  • url: The URL used to resolve relative URLs in the HTML page.
  • html: HTML content from which JSON-LD data will be extracted.

Returns:

An object representing the extracted JSON-LD, or null if extraction fails.

Example:

const { extractJSONLDfromHTML } = require('@openactive/dataset-utils');

const jsonld = extractJSONLDfromHTML('https://example.com/dataset', '<html>...</html>');
console.log(jsonld);
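
To make the idea of embedded JSON-LD concrete, here is a minimal sketch that pulls the first application/ld+json script block out of an HTML string. It is not how the library works internally: findEmbeddedJsonLd is a hypothetical name, the regular expression is a simplification, and unlike extractJSONLDfromHTML it does not resolve relative URLs against a base url.

```javascript
// Hypothetical sketch; the library itself relies on a JSON-LD library
// and also resolves relative URLs, which this simplified version does not.
function findEmbeddedJsonLd(html) {
  // Match the first <script type="application/ld+json"> ... </script> block.
  const pattern =
    /<script[^>]*type=["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/i;
  const match = pattern.exec(html);
  if (!match) {
    return null; // no embedded JSON-LD found
  }
  try {
    return JSON.parse(match[1]);
  } catch (err) {
    return null; // malformed JSON is treated as a failed extraction
  }
}
```

Returning null for both the missing and the malformed case mirrors the documented behaviour of returning null when extraction fails.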

getAllDatasets([dataCatalogUrl])

This function recursively crawls through a data catalog, fetches datasets, and extracts JSON-LD from the dataset HTML. It combines getAllDatasetSiteUrls() and extractJSONLDfromHTML().

The errors array it returns details any issues that occurred while fetching and extracting data from URLs. This array can be large, because OpenActive feeds are maintained by many separate parties.

Parameters:

  • dataCatalogUrl (string, optional): URL of the data catalog or data catalog collection to crawl.

Returns:

A Promise that resolves with an object containing:

  • catalogMetadata: A JSON-LD object of the root data catalog provided.
  • datasets: An array of extracted JSON-LD objects from the Dataset Sites.
  • errors: An array of error objects indicating any issues encountered during fetching. Each error object includes:
    • url: The URL from which data was being fetched when the error occurred.
    • status: HTTP status code of the error response (if available).
    • message: A descriptive message detailing the nature of the error.

Example:

const { getAllDatasets } = require('@openactive/dataset-utils');

getAllDatasets().then(({ datasets, errors }) => {
  console.log(datasets);
  
  // Iterating through the errors
  errors.forEach(error => {
    console.log(`Error fetching URL: ${error.url}`);
    console.log(`HTTP Status Code: ${error.status}`);
    console.log(`Message: ${error.message}`);
  });
});
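
Because the errors array can be long, it may help to summarise it before logging every entry. The helper below is a small sketch, not part of the library: countErrorsByStatus is a hypothetical name, and it assumes only the documented url, status and message fields, with status possibly missing.

```javascript
// Hypothetical helper: count errors per HTTP status code.
// Errors without a status (for example, network failures) are grouped
// under the key 'unknown'.
function countErrorsByStatus(errors) {
  const counts = {};
  for (const error of errors) {
    const key =
      error.status === undefined || error.status === null
        ? 'unknown'
        : String(error.status);
    counts[key] = (counts[key] || 0) + 1;
  }
  return counts;
}
```

The result of getAllDatasets() could then be passed through it, for example with console.table(countErrorsByStatus(errors)), to see at a glance which failure types dominate.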

validateJsonLdId(id, expectHtml)

Description

This function validates the @id (or id, for backwards compatibility) property within a JSON-LD Dataset or DataCatalog. It fetches JSON-LD data from a specified URL, checks whether the data is embedded in HTML or raw JSON-LD, extracts the JSON-LD, and ensures that the @id field within the document matches the provided id. This function acts as a safety check, affirming that the expected identifier aligns exactly with the identifier found within the fetched JSON-LD document. Note that @id is case sensitive and must match exactly.

Parameters

  • id (string): A string that specifies the expected @id or id value in the JSON-LD document.
  • expectHtml (boolean): A flag indicating whether the fetched data is expected to be embedded within HTML, as for a Dataset Site (when true), or to be raw JSON-LD, as for a Data Catalog (when false).

Returns

A Promise that resolves with an object containing:

  • isValid: A boolean that is true if the validation is successful (the expected @id matches the found @id) and false otherwise.
  • error: A string describing the error encountered during the validation process, or null if the validation is successful.

Example

const { validateJsonLdId } = require('@openactive/dataset-utils');

async function exampleUsage() {
  const id = "https://example.com/data.jsonld";
  const { isValid, error } = await validateJsonLdId(id, false);

  if (isValid) {
    console.log(`Validation successful for ID: ${id}`);
  } else {
    console.error(`Validation failed for ID: ${id}. Error: ${error}`);
  }
}

exampleUsage();
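
The comparison at the heart of this check can be sketched as below. This illustrates the rule stated in the description (an exact, case-sensitive match on @id, falling back to id) and is not the library's code: idMatches is a hypothetical name, and giving @id precedence over id when both are present is an assumption.

```javascript
// Hypothetical sketch of the identifier comparison.
// Assumption: '@id' takes precedence; 'id' is the backwards-compatible fallback.
function idMatches(jsonld, expectedId) {
  const foundId = jsonld['@id'] !== undefined ? jsonld['@id'] : jsonld.id;
  // Strict equality: no case folding and no URL normalisation.
  return foundId === expectedId;
}
```

Under this rule a document whose @id differs only in letter case or by a trailing slash from the expected value does not match.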

Testing

Execute test cases using:

npm test

The test suite, located in ./test/getAllDatasets-test.js, utilises mocks to simulate various use cases.

Contributions

We welcome your contributions! Feel free to submit a pull request.