npm package discovery and stats viewer.

Discover Tips

  • General search

    [free text search, go nuts!]

  • Package details

    pkg:[package-name]

  • User packages

    @[username]

Sponsor

Optimize Toolset

I’ve always been into building performant and accessible sites, but lately I’ve been taking it extremely seriously. So much so that I’ve been building a tool to help me optimize and monitor the sites that I build to make sure that I’m making an attempt to offer the best experience to those who visit them. If you’re into performant, accessible and SEO friendly sites, you might like it too! You can check it out at Optimize Toolset.

About

Hi, 👋, I’m Ryan Hefner  and I built this site for me, and you! The goal of this site was to provide an easy way for me to check the stats on my npm packages, both for prioritizing issues and updates, and to give me a little kick in the pants to keep up on stuff.

As I was building it, I realized that I was actually using the tool to build the tool, and figured I might as well put this out there and hopefully others will find it to be a fast and useful way to search and browse npm packages as I have.

If you’re interested in other things I’m working on, follow me on Twitter or check out the open source projects I’ve been publishing on GitHub.

I am also working on a Twitter bot for this site to tweet the most popular, newest, random packages from npm. Please follow that account now and it will start sending out packages soon–ish.

Open Software & Tools

This site wouldn’t be possible without the immense generosity and tireless efforts from the people who make contributions to the world and share their work via open source initiatives. Thank you 🙏

© 2024 – Pkg Stats / Ryan Hefner

@rdfc/sds-processors-ts

v1.1.0

Published

[![Bun CI](https://github.com/rdf-connect/sds-processors/actions/workflows/build-test.yml/badge.svg)](https://github.com/rdf-connect/sds-processors/actions/workflows/build-test.yml) [![npm](https://img.shields.io/npm/v/@rdfc/sds-processors-ts.svg?style=po

Downloads

465

Readme

sds-processors

Bun CI npm

Collection of RDF-Connect Typescript processors for handling SDS (Smart Data Streams)-related operations. It currently exposes 5 functions:

js:Sdsify

This processor takes as input a stream of (batched) RDF data entities and wraps them as individual SDS records to be further processed downstream. By default, it will extract individual entities by taking every single named node subject and extracting a Concise Bounded Description (CBD) of that entity with respect to the input RDF graph.

Alternatively, a set of types may be specified (js:typeFilter) to target concrete entities. A SHACL shape can be given to concretely define the bounds target entities and their properties, that want to be extracted and packaged as SDS records. This processor relies on the member extraction algorithm implemented by the W3C TREE Hypermedia community group.

If the js:timestampPath is specified, the set of SDS records will be streamed out in temporal order to avoid out of order writing issues downstream.

An example of how to use this processor within a RDF-Connect pipeline definition is shown next:

@prefix : <https://w3id.org/conn#>.
@prefix js: <https://w3id.org/conn/js#>.
@prefix sh: <http://www.w3.org/ns/shacl#>.

[ ] a js:Sdsify;
    js:input <inputChannelReader>;
    js:output <outputChannerWriter>;
    js:stream <http://ex.org/myStream>;
    js:typeFilter ex:SomeClass, ex:SomeOtherClass;
    js:timestampPath <http://ex.org/timestamp>;
    js:shape """
        @prefix sh: <http://www.w3.org/ns/shacl#>.
        @prefix ex: <http://ex.org/>.

        [ ] a sh:NodeShape;
            sh:xone (<shape1> <shape2>).

        <shape1> a sh:NodeShape;
            sh:targetClass ex:SomeClass;
            sh:property [ sh:path ex:someProperty ].

        <shape2> a sh:NodeShape;
            sh:targetClass ex:SomeOtherClass;
            sh:property [ 
                sh:path ex:someProperty 
            ], [
                sh:path ex:someOtherProperty;
                sh:node [
                    a sh:NodeShape;
                    sh:targetClass ex:YetAnotherClass
                ]
            ].
    """.

js:Bucketize

This processor takes as input a stream of SDS records and SDS metadata and proceeds to bucketize them according to a predefined strategy (see example). The SDS metadata will be also transformed to reflect this transformation. Multiple SDS streams can be present on the incoming data channel.

You can define bucketizers as follows:

Example of a subject and page fragmentation

<bucketize> a js:Bucketize;
  js:channels [
    js:dataInput <...data input>;
    js:metadataInput <... metadata input>;
    js:dataOutput <... data output>;
    js:metadataOutput <... metadata output>;
  ];
  js:bucketizeStrategy ( [            # One or more bucketize strategies
    a tree:SubjectFragmentation;      # Create a bucket based on this path
    tree:fragmentationPath ( );
  ] [
    a tree:PageFragmentation;         # Create a new bucket when the previous bucket has 2 members
    tree:pageSize 2;
  ] );
  js:savePath <./buckets_save.json>;
  js:outputStreamId <MyEpicStream>.

Example of a time-based fragmentation

<bucketize> a js:Bucketize;
  js:channels [
    js:dataInput <...data input>;
    js:metadataInput <... metadata input>;
    js:dataOutput <... data output>;
    js:metadataOutput <... metadata output>;
  ];
  js:bucketizeStrategy ( [
    a tree:TimebasedFragmentation;
    tree:timestampPath <https://www.w3.org/ns/activitystreams#published>;
    tree:maxSize 100;
    tree:k 4;
    tree:minBucketSpan 3600;        # In seconds
  ]);
  js:savePath <./buckets_save.json>;
  js:outputStreamId <MyEpicStream>.

This will create buckets based on a time-based fragmentation. The tree:timestampPath specifies the path to the timestamp property in the SDS records. The tree:maxSize specifies the maximum size of a bucket. When the bucket reaches the maximum size, it will be split into tree:k new buckets, each with 1/k of the original bucket's timespan. The members will be redistributed to the new buckets based on their timestamps. The tree:minBucketSpan specifies the minimum timespan of a bucket. If a bucket is full, but splitting the bucket would result in a bucket with a timespan smaller than tree:minBucketSpan, the bucket will not be split, but a relation will be added to a new page bucket with same timespan as the full bucket, similar to the page fragmentation.

The members need to be arrived in order of their timestamps. When a member arrives, all buckets that hold members with a timestamp older than the new member's timestamp will be made immutable and no new members can be added to them.

js:Ldesify

This processor takes a stream of raw entities (e.g., out from a RML transformation process) and creates versioned entities appending the current timestamp to the entity IRI to make it unique. It is capable of keeping a state so that unmodified entities are filtered.

js:LdesifySDS

Transform SDS-records in SDS-members, creating versioned objects. The resulting objects are encapsulated in a graph (overriding other graphs).

Specify:

  • js:input input channel
  • js:output output channel
  • js:statePath path for state file
  • optional js:sourceStream
  • js:targetStream newly created sds stream id
  • optional js:timestampPath, defaults to http://purl.org/dc/terms/modified
  • optional js:versionOfPath, defaults to http://purl.org/dc/terms/isVersionOf

js:Shapify

Execute Extract CBD Shape algorithm on all sds records. Note: this processor does not create a new sds stream.

Specify:

  • js:input input channel
  • js:output output channel
  • js:shape used sh:NodeShape

js:StreamJoin

This processor can be used to join multiple input streams or Reader Channels (js:input) and pipe their data flow into a single output stream or Writer Channel (js:output). The processor will guarantee that all data elements are delivered downstream and will close the output if all inputs are closed.

js:Generate

This a simple RDF data generator function used for testing. This processor will periodically generate RDF objects with 3 to 4 predicates.