@openactive/dataset-utils
v2.0.1
Utilities for working with OpenActive data catalogs and dataset sites
dataset-utils
@openactive/dataset-utils is a Node.js utility library designed to simplify the handling of OpenActive data catalogs and dataset sites. The library fetches, parses, and manipulates data from the dataset URLs within a specified catalog.
Features
- Recursive Data Catalog Crawling: Methodically navigates through data catalogs, fetches datasets, and extracts JSON-LD from dataset HTML.
- Data URL Retrieval: Efficiently retrieves an array of dataset site URLs from data catalogs and part collections.
- Metadata Extraction: Extracts JSON-LD metadata from HTML dataset pages.
Installation
Install the package via npm:
npm install @openactive/dataset-utils
Usage
getAllDatasetSiteUrls([dataCatalogUrl])
Description
This recursive function collects the dataset site URLs reachable from a data catalog.
If the URL supplied is a data catalog collection, it gets all the data catalogs in hasPart and crawls them.
If the URL supplied is a data catalog, it gets the dataset array and flattens it.
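The recursion described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the library's implementation: collectDatasetUrls, fetchJson, fakeCatalogs and fakeFetchJson are hypothetical names, and the network is replaced by an in-memory lookup so the sketch runs on its own.

```javascript
// Sketch only: fetchJson is an injected, hypothetical helper that resolves
// a URL to a parsed JSON-LD object (or throws an Error carrying a status).
async function collectDatasetUrls(catalogUrl, fetchJson) {
  const urls = [];
  const errors = [];
  let catalog;
  try {
    catalog = await fetchJson(catalogUrl);
  } catch (err) {
    // Record the failure and carry on rather than aborting the whole crawl
    errors.push({ url: catalogUrl, status: err.status, message: err.message });
    return { urls, errors };
  }
  // A catalog collection lists child catalogs in hasPart: crawl each one
  for (const partUrl of catalog.hasPart || []) {
    const child = await collectDatasetUrls(partUrl, fetchJson);
    urls.push(...child.urls);
    errors.push(...child.errors);
  }
  // A data catalog lists dataset site URLs in dataset: flatten them in
  urls.push(...(catalog.dataset || []));
  return { urls, errors };
}

// Hypothetical in-memory stand-in for the network
const fakeCatalogs = {
  'https://example.com/collection': {
    hasPart: ['https://example.com/a', 'https://example.com/b'],
  },
  'https://example.com/a': {
    dataset: ['https://example.com/a/1', 'https://example.com/a/2'],
  },
};

async function fakeFetchJson(url) {
  if (!(url in fakeCatalogs)) {
    const err = new Error('Not found');
    err.status = 404;
    throw err;
  }
  return fakeCatalogs[url];
}

const crawlResult = collectDatasetUrls('https://example.com/collection', fakeFetchJson);
crawlResult.then(({ urls, errors }) => {
  console.log(urls);
  console.log(errors);
});
```

Collecting errors alongside results, instead of throwing on the first failure, mirrors the urls and errors shape that the function is documented to return.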
Parameters
- dataCatalogUrl (optional): A custom data catalog URL. Defaults to the OpenActive Data Catalog.
Returns
A Promise that resolves with an object containing:
- catalogMetadata: A JSON-LD object of the root data catalog provided.
- urls: An array of strings, each being a URL for a dataset.
- errors: An array of error objects, each containing details about errors encountered during the retrieval process. If no errors were encountered, this array is empty. Each error object includes:
  - url: The URL from which data was being fetched when the error occurred.
  - status: HTTP status code of the error response (if available).
  - message: A descriptive message detailing the nature of the error.
Example
const { getAllDatasetSiteUrls } = require('@openactive/dataset-utils');

// Top-level await is not available in CommonJS, so wrap the call in an async function
async function main() {
  const { urls, errors } = await getAllDatasetSiteUrls();
  console.log(`Retrieved ${urls.length} dataset URLs`);
  if (errors.length > 0) {
    console.error(`${errors.length} errors encountered during retrieval:`);
    errors.forEach(error => {
      // status may be undefined when no HTTP response was received
      console.error(`- [${error.status}] ${error.url}: ${error.message}`);
    });
  }
}

main();
extractJSONLDfromHTML(url, html)
This function extracts JSON-LD metadata from the html of a given Dataset Site, using the provided url to resolve relative URLs within the JSON-LD.
Note that relative URLs are not generally permissible within OpenActive data; however, the underlying JSON-LD library still requires that a url be specified.
Parameters:
- url: The URL used to resolve relative URLs in the HTML page.
- html: HTML content from which JSON-LD data will be extracted.
Returns:
An object representing the extracted JSON-LD, or null if extraction fails.
Example:
const { extractJSONLDfromHTML } = require('@openactive/dataset-utils');
const jsonld = extractJSONLDfromHTML('https://example.com/dataset', '<html>...</html>');
console.log(jsonld);
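To illustrate what extracting JSON-LD from HTML involves, here is a minimal sketch that pulls the first application/ld+json script block out of a page with a regular expression. It is a simplification and not how the library works: the text above notes that the library relies on a JSON-LD library, and this sketch does not model the url argument or relative URL resolution. extractFirstJsonLd and sampleHtml are hypothetical names.

```javascript
// Sketch only: a regex-based stand-in, not the library's parser.
// Returns the parsed contents of the first JSON-LD script tag, or null.
function extractFirstJsonLd(html) {
  const pattern = /<script[^>]*type=["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/i;
  const match = pattern.exec(html);
  if (!match) return null;
  try {
    return JSON.parse(match[1]);
  } catch (err) {
    // Malformed JSON is treated as an extraction failure
    return null;
  }
}

// Hypothetical Dataset Site markup
const sampleHtml = `<html><head>
<script type="application/ld+json">
{ "@context": "https://schema.org/", "@type": "Dataset", "@id": "https://example.com/dataset", "name": "Example sessions" }
</script>
</head><body></body></html>`;

const extracted = extractFirstJsonLd(sampleHtml);
console.log(extracted);
```

Returning null on failure matches the documented contract of extractJSONLDfromHTML, so callers can use the same null check in either case.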
getAllDatasets([dataCatalogUrl])
This function recursively crawls through a data catalog, fetches datasets, and extracts JSON-LD from the dataset HTML. It combines getAllDatasetSiteUrls() and extractJSONLDfromHTML().
The errors array it returns details any issues that occurred while fetching and extracting data from URLs. This array can be large because OpenActive feeds are maintained independently by many different publishers.
Parameters:
- dataCatalogUrl (optional): A custom data catalog URL. Defaults to the OpenActive Data Catalog.
Returns:
A Promise that resolves with an object containing:
- catalogMetadata: A JSON-LD object of the root data catalog provided.
- datasets: An array of extracted JSON-LD objects from the Dataset Sites.
- errors: An array of error objects indicating any issues encountered during fetching. Each error object includes:
  - url: The URL from which data was being fetched when the error occurred.
  - status: HTTP status code of the error response (if available).
  - message: A descriptive message detailing the nature of the error.
Example:
const { getAllDatasets } = require('@openactive/dataset-utils');
getAllDatasets().then(({ datasets, errors }) => {
  console.log(datasets);
  // Iterating through the errors
  errors.forEach(error => {
    console.log(`Error fetching URL: ${error.url}`);
    console.log(`HTTP Status Code: ${error.status}`);
    console.log(`Message: ${error.message}`);
  });
});
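Because the errors array can be long, it may be easier to summarise it than to print every entry. The sketch below groups error objects by status; it assumes only the url, status and message fields described above. countErrorsByStatus and sampleErrors are hypothetical names, and the sample data is made up for illustration.

```javascript
// Sketch only: tallies error objects by HTTP status code.
// Errors with no status (for example network failures) share one bucket.
function countErrorsByStatus(errors) {
  const counts = {};
  for (const { status } of errors) {
    const key = status == null ? 'no status' : String(status);
    counts[key] = (counts[key] || 0) + 1;
  }
  return counts;
}

// Hypothetical errors in the documented { url, status, message } shape
const sampleErrors = [
  { url: 'https://example.com/one', status: 404, message: 'Not Found' },
  { url: 'https://example.com/two', status: 404, message: 'Not Found' },
  { url: 'https://example.com/three', message: 'Connection timed out' },
];

const errorSummary = countErrorsByStatus(sampleErrors);
console.log(errorSummary);
```

The same helper could be passed the errors array resolved by getAllDatasets() or getAllDatasetSiteUrls().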
validateJsonLdId(id, expectHtml)
Description
This function validates the @id (or id, for backwards compatibility) property within a JSON-LD Dataset or DataCatalog. It fetches JSON-LD data from a specified URL, checks whether the data is embedded in HTML or raw JSON-LD, extracts the JSON-LD, and ensures that the @id field within the document matches the provided id. This function acts as a safety check, affirming that the expected identifier aligns exactly with the identifier found within the fetched JSON-LD document. Note that @id is case sensitive and must match exactly.
Parameters
- id (string): The expected @id or id value in the JSON-LD document.
- expectHtml (boolean): A flag indicating whether the fetched data is expected to be embedded within HTML, such as for a Dataset Site (when true), or expected to be raw JSON-LD, such as for a Data Catalog (when false).
Returns
A Promise that resolves with an object containing:
- isValid: A boolean that is true if the validation is successful (the expected @id matches the found @id) and false otherwise.
- error: A string describing the error encountered during the validation process, or null if the validation is successful.
Example
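const { validateJsonLdId } = require('@openactive/dataset-utils');

async function exampleUsage() {
  const id = "https://example.com/data.jsonld";
  // false: expect raw JSON-LD rather than JSON-LD embedded in HTML
  const { isValid, error } = await validateJsonLdId(id, false);
  if (isValid) {
    console.log(`Validation successful for ID: ${id}`);
  } else {
    console.error(`Validation failed for ID: ${id}. Error: ${error}`);
  }
}

exampleUsage();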
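The comparison step at the heart of this check can be sketched as below. This covers only the matching logic on an already-parsed document, not fetching or HTML extraction, and it is not the library's code: checkJsonLdId is a hypothetical name and the error strings are invented for illustration.

```javascript
// Sketch only: compares an expected identifier with the one in a parsed
// JSON-LD document, preferring @id and falling back to id.
function checkJsonLdId(expectedId, jsonld) {
  if (!jsonld || typeof jsonld !== 'object') {
    return { isValid: false, error: 'No JSON-LD document found' };
  }
  const foundId = jsonld['@id'] !== undefined ? jsonld['@id'] : jsonld.id;
  // Strict equality: the match is exact and case sensitive
  if (foundId === expectedId) {
    return { isValid: true, error: null };
  }
  return {
    isValid: false,
    error: `Expected @id "${expectedId}" but found "${foundId}"`,
  };
}

const exactMatch = checkJsonLdId('https://example.com/data.jsonld', {
  '@id': 'https://example.com/data.jsonld',
});
const caseMismatch = checkJsonLdId('https://example.com/data.jsonld', {
  '@id': 'https://example.com/Data.jsonld',
});
const legacyId = checkJsonLdId('https://example.com/data.jsonld', {
  id: 'https://example.com/data.jsonld',
});

console.log(exactMatch.isValid, caseMismatch.isValid, legacyId.isValid);
```

The case-mismatch sample shows why an identifier that differs only in letter case is reported as invalid.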
Testing
Execute test cases using:
npm test
The test suite, located in ./test/getAllDatasets-test.js, utilises mocks to simulate various use cases.
Contributions
We welcome your contributions! Feel free to submit a pull request.