@netwerk-digitaal-erfgoed/ld-workbench

v2.8.0

LDWorkbench is a Linked Data Transformation tool designed to use only SPARQL as transformation language.

LD Workbench

LD Workbench is a command-line tool for transforming large RDF datasets using pure SPARQL.

Note: Although LD Workbench is stable, we consider it a proof of concept. Please use the software and report any issues you encounter.

Approach

Components

Users define LD Workbench pipelines. An LD Workbench pipeline reads data from SPARQL endpoints, transforms it using SPARQL queries, and writes the result to a file or triple store.

A pipeline consists of one or more stages. Each stage has:

  • an iterator, which selects URIs from a dataset using a paginated SPARQL SELECT query, binding each URI to a $this variable
  • one or more generators, which generate triples about each URI using SPARQL CONSTRUCT queries.

Stages can be chained together, with the output of one stage becoming the input of the next. The combined output of all stages becomes the final output of the pipeline.
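The iterator/generator flow described above can be pictured with a small in-memory sketch. It is written in Python rather than SPARQL, and every name in it (`iterate`, `run_stage`, `add_type`, `add_label`, the sample URIs) is hypothetical. It does not reflect LD Workbench's actual code, only how URIs move from one iterator to one or more generators.

```python
from typing import Callable, Iterable, Iterator

Triple = tuple[str, str, str]


def iterate(uris: list[str], page_size: int) -> Iterator[str]:
    """Stand-in for a paginated SELECT query: yield URIs one page at a time."""
    for offset in range(0, len(uris), page_size):
        yield from uris[offset:offset + page_size]


def run_stage(
    uris: Iterable[str],
    generators: list[Callable[[str], list[Triple]]],
) -> list[Triple]:
    """Run every generator for every URI and collect the resulting triples."""
    output: list[Triple] = []
    for this in uris:
        for generate in generators:
            output.extend(generate(this))
    return output


# Hypothetical generators, each standing in for one CONSTRUCT query.
def add_type(this: str) -> list[Triple]:
    return [(this, "rdf:type", "schema:CreativeWork")]


def add_label(this: str) -> list[Triple]:
    return [(this, "rdfs:label", this.rsplit("/", 1)[-1])]


books = [
    "http://example.com/book/1",
    "http://example.com/book/2",
    "http://example.com/book/3",
]
triples = run_stage(iterate(books, page_size=2), [add_type, add_label])
```

In this sketch, selecting URIs and generating triples are separate functions, which mirrors the separation the pipeline design relies on.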

Design principles

The main design principles are scalability and extensibility.

LD Workbench is scalable due to its iterator/generator approach, which separates the selection of URIs from the generation of triples.

LD Workbench is extensible because it uses pure SPARQL queries (instead of code or a DSL) for configuring transformation pipelines. The SPARQL query language is a widely supported W3C standard, so users will not be locked into a proprietary tool or technology.

Usage

To get started with LD Workbench, you can either use the NPM package or a Docker image.

To use the NPM package, install Node.js, then run:

npx @netwerk-digitaal-erfgoed/ld-workbench@latest --init

Alternatively, to run the Docker image, first create a directory to store your pipeline configurations, then run the Docker image (mounting the pipelines/ directory with -v, using -it for an interactive and colorful console):

mkdir pipelines
docker run -it -v "$(pwd)/pipelines:/pipelines" ghcr.io/netwerk-digitaal-erfgoed/ld-workbench:latest

This creates an example LD Workbench pipeline in the pipelines/configurations/example directory and runs that pipeline right away. The output is written to pipelines/data.

To run the pipeline again:

npx @netwerk-digitaal-erfgoed/ld-workbench@latest

Your workbench is now ready for use. You can continue by creating your own pipeline configurations.

Configuration

An LD Workbench pipeline is defined with a YAML configuration file, validated by a JSON Schema.

A pipeline must have a name and one or more stages, and may have a description. Multiple pipelines can be configured as long as they have unique names. See the example configuration file for a boilerplate configuration. You can find more examples in the ld-workbench-configuration repository.
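The rules above (required name, at least one stage, optional description, unique names across pipelines) can be illustrated with a hand-rolled check. This is only a sketch under the assumption that a configuration has already been parsed into a Python dict; LD Workbench itself validates against its JSON Schema, and the function names `validate_pipeline` and `duplicate_names` are hypothetical.

```python
def validate_pipeline(config: dict) -> list[str]:
    """Return a list of problems; an empty list means the config passes."""
    problems: list[str] = []
    name = config.get("name")
    if not isinstance(name, str) or not name:
        problems.append("name is required")
    stages = config.get("stages")
    if not isinstance(stages, list) or len(stages) == 0:
        problems.append("at least one stage is required")
    # The description is optional, but must be text when present.
    if "description" in config and not isinstance(config["description"], str):
        problems.append("description must be a string")
    return problems


def duplicate_names(configs: list[dict]) -> set[str]:
    """Return the pipeline names that occur more than once."""
    seen: set[str] = set()
    duplicates: set[str] = set()
    for config in configs:
        name = config.get("name")
        if name is None:
            continue
        if name in seen:
            duplicates.add(name)
        seen.add(name)
    return duplicates


ok = {"name": "MyPipeline", "stages": [{"name": "Stage1"}]}
bad = {"description": "No name and no stages"}
ok_problems = validate_pipeline(ok)
bad_problems = validate_pipeline(bad)
clashes = duplicate_names([ok, dict(ok), {"name": "Other", "stages": [{"name": "Stage1"}]}])
```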

Iterator

Each stage has a single iterator. The iterator selects URIs from a dataset that match certain criteria. The iterator SPARQL SELECT query must return a $this binding for each URI that will be passed to the generator(s).

The query can be specified either inline:

# config.yml
stages:
  - name: Stage1
    iterator:
      query: "SELECT $this WHERE { $this a <https://schema.org/Thing> }"

or by referencing a file:

# config.yml
stages:
  - name: Stage1
    iterator:
      query: file://iterator.rq

# iterator.rq
prefix schema: <https://schema.org/>

select $this where {
  $this a schema:Thing .
}

Tip: LD Workbench paginates iterator queries (using SPARQL LIMIT/OFFSET) to support large datasets. However, a large OFFSET can be slow on SPARQL endpoints. Therefore, prefer creating multiple stages to process subsets (for example each RDF type separately) over processing the entire dataset in a single stage.
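To make the LIMIT/OFFSET pagination concrete, here is a minimal Python sketch of the general technique. It is not LD Workbench's implementation: `paginate`, `fake_fetch` and the in-memory `DATA` list are hypothetical, and the fake endpoint simply slices a list instead of talking to a SPARQL server.

```python
from typing import Callable, Iterator


def paginate(
    query: str,
    fetch_page: Callable[[str], list[str]],
    batch_size: int = 10,
) -> Iterator[str]:
    """Yield all results of `query`, requesting `batch_size` rows per page."""
    offset = 0
    while True:
        page = fetch_page(f"{query} LIMIT {batch_size} OFFSET {offset}")
        yield from page
        if len(page) < batch_size:
            return  # A short (or empty) page means the results are exhausted.
        offset += batch_size


# Fake endpoint: records each query and serves slices of an in-memory list.
DATA = [f"http://example.com/thing/{i}" for i in range(5)]
sent: list[str] = []


def fake_fetch(paged_query: str) -> list[str]:
    sent.append(paged_query)
    parts = paged_query.split()
    limit = int(parts[parts.index("LIMIT") + 1])
    offset = int(parts[parts.index("OFFSET") + 1])
    return DATA[offset:offset + limit]


results = list(
    paginate(
        "SELECT $this WHERE { $this a <https://schema.org/Thing> }",
        fake_fetch,
        batch_size=2,
    )
)
```

Each page is a separate request with a growing OFFSET, and on a real endpoint later pages typically have to skip over all earlier rows first. That is why splitting a large dataset across several stages keeps offsets small.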

Generator

A stage has one or more generators, which are run for each individual URI from the iterator. Each generator is a SPARQL CONSTRUCT query that takes a $this binding from the iterator and generates triples about it.

Just as with the iterator query, the query can be specified either inline or by referencing a file:

# config.yml
stages:
  - name: Stage1
    generator:
      - query: "CONSTRUCT { $this a <https://schema.org/CreativeWork> } WHERE { $this a <https://schema.org/Book> }"
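As a rough illustration of what "taking a $this binding" means, the Python sketch below substitutes a concrete URI for the `$this` variable in a CONSTRUCT query. This is an assumption made for illustration: how LD Workbench actually injects the binding is not described here, and `bind_this` is a hypothetical helper.

```python
import re


def bind_this(construct_query: str, uri: str) -> str:
    """Replace every $this / ?this variable with the IRI <uri>.

    SPARQL treats $this and ?this as the same variable, so both are matched.
    The word boundary keeps longer names such as $thisOther untouched.
    """
    return re.sub(r"[$?]this\b", lambda _match: f"<{uri}>", construct_query)


query = (
    "CONSTRUCT { $this a <https://schema.org/CreativeWork> } "
    "WHERE { $this a <https://schema.org/Book> }"
)
bound = bind_this(query, "http://example.com/book/1")
```

After substitution, the query only concerns the single URI `http://example.com/book/1`, so the endpoint does not have to scan every book to answer it.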

Stores

To query large local files, you may need to load them into a SPARQL store first. Do so by starting a SPARQL store, for example Oxigraph:

docker run --rm -v "$PWD/data:/data" -p 7878:7878 oxigraph/oxigraph --location /data serve --bind 0.0.0.0:7878

Then configure the store in your pipeline. Define at least one store under stores, and set the iterator’s importTo parameter to that store’s queryUrl so that the endpoint’s data is imported into the store:

# config.yml
stores:
  - queryUrl: "http://localhost:7878/query" # SPARQL endpoint for read queries.
    storeUrl: "http://localhost:7878/store" # SPARQL Graph Store HTTP Protocol endpoint. 

stages:
  - name: ...
    iterator:
      query: ...
      endpoint: file://data.nt
      importTo: http://localhost:7878/query
    generator:
      - query: ...

The data is loaded into a named graph <import:filename>, so in this case <import:data.nt>.

Example configuration

# config.yml
name: MyPipeline
description: Example pipeline configuration
destination: output/result.ttl
stages:
  - name: Stage1
    iterator:
      query: "SELECT $this WHERE { $this a <https://schema.org/Thing> }"
      endpoint: "http://example.com/sparql-endpoint"
    generator:
      - query: "CONSTRUCT { ?s ?p ?o } WHERE { ?s ?p ?o }"
        batchSize: 50
    destination: output/stage1-result.ttl
  - name: Stage2
    iterator:
      query: file://queries/iteratorQuery.rq
      endpoint: "http://example.com/sparql-endpoint-1"
      batchSize: 200
    generator:
      - query: file://queries/generator1Query.rq
        endpoint: "http://example.com/sparql-endpoint-1"
        batchSize: 200
      - query: file://queries/generator2Query.rq
        endpoint: "http://example.com/sparql-endpoint-2"
        batchSize: 100
    destination: output/stage2-result.ttl

Configuration options

For a full overview of configuration options, please see the schema.

Development

If you want to help develop LD Workbench, please see the CONTRIBUTING.md file.