npm package discovery and stats viewer.

Discover Tips

  • General search

    [free text search, go nuts!]

  • Package details

    pkg:[package-name]

  • User packages

    @[username]

Sponsor

Optimize Toolset

I’ve always been into building performant and accessible sites, but lately I’ve been taking it extremely seriously. So much so that I’ve been building a tool to help me optimize and monitor the sites that I build to make sure that I’m making an attempt to offer the best experience to those who visit them. If you’re into performant, accessible and SEO friendly sites, you might like it too! You can check it out at Optimize Toolset.

About

Hi, 👋, I’m Ryan Hefner  and I built this site for me, and you! The goal of this site was to provide an easy way for me to check the stats on my npm packages, both for prioritizing issues and updates, and to give me a little kick in the pants to keep up on stuff.

As I was building it, I realized that I was actually using the tool to build the tool, and figured I might as well put this out there and hopefully others will find it to be a fast and useful way to search and browse npm packages as I have.

If you’re interested in other things I’m working on, follow me on Twitter or check out the open source projects I’ve been publishing on GitHub.

I am also working on a Twitter bot for this site to tweet the most popular, newest, random packages from npm. Please follow that account now and it will start sending out packages soon–ish.

Open Software & Tools

This site wouldn’t be possible without the immense generosity and tireless efforts from the people who make contributions to the world and share their work via open source initiatives. Thank you 🙏

© 2024 – Pkg Stats / Ryan Hefner

@treecg/actor-init-ldes-client

v4.0.9

Published

An init actor that fetches members from an Event Stream API

Downloads

96

Readme

@treecg/actor-init-ldes-client

Metadata harvester for a Linked Data Event Stream.

Install

npm install -g @treecg/actor-init-ldes-client

In order to use it as a library, you can leave out the -g.

How to use it

Usage from the command line

actor-init-ldes-client --parameter ${PARAMETER} ${URL}

URL can be a tree:Node from where relations will be followed OR the URI of a tree:Collection. For the latter, the collection's URI will be dereferenced and one tree:view will be followed.

Possible parameters are:

| Parameter | Description | Possible values | | ------------- |------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| ------------- | | pollingInterval | Number of milliseconds before refetching uncacheable fragments | for example: 5000 | | mimeType | the MIME type of the output | application/ld+json, text/turtle... | | context | path to a file with the JSON-LD context you want to use when MIME type is application/ld+json. Pro-tip: provide the full context without references to external context files, because there is no cache and thus they are requested each time a page gets loaded. | for example: ./context.jsonld | | requestHeadersPath | path to a file with the HTTP request headers you want to use | for example: ./headers.json | | fromTime | datetime to prune relations that have a lower datetime value | for example: 2020-01-01T00:00:00 | | emitMemberOnce | whether to emit a member only once, because collection contains immutable version objects. | true / false | | disableSynchronization | whether to disable synchronization or not (by default set to "false", syncing is enabled) | true / false | | disableFraming | whether to disable JSON-LD framing when mimeType is 'application/ld+json' or when representation is 'Object' (by default set to "false"). Value can be set to "true" or "false" | true / false | | dereferenceMembers | whether to dereference members, because the collection pages do not contain all information (by default: false). | true / false | | requestsPerMinute | how many requests per minutes may be sent to the same host (optional) | any number | | loggingLevel | The detail level of logging; useful for debugging problems. (default: info) | 'error', 'warn', 'info', 'verbose', 'debug', 'silly' | | processedURIsCount | The maximum number of processed URIs (members and fragments) that remain in the cache. (default: 10000) | any number |

Example commando with parameters:

actor-init-ldes-client --pollingInterval 5000 --mimeType application/ld+json --context context.jsonld --fromTime 2021-02-03T15:48:12.309Z --emitMemberOnce true --disablePolling true https://apidg.gent.be/opendata/adlib2eventstream/v1/dmg/objecten

Usage within application

The easiest way to create an engine (with default config) is as follows:

const newEngine = require('@treecg/actor-init-ldes-client').newEngine;

const LDESClient = new newEngine();

With the engine or client created, you can now use it to call the async createReadStream(url, options) or createReadStream(url, options, state) method. Note that next to retrieving a serialized string (mimeType option) of member data, an Object (JSON-LD) or Quads representation is also possible with the Javascript API using the representation option. If you want to pause the streama and want to resume it later, you can export the state of the stream using the exportState() method once the stream is paused using pause().

Here is an example synchronizing with a TREE root node of an Event Stream with polling interval of 5 seconds:

import { newEngine } from '@treecg/actor-init-ldes-client';
try {
    let url = "https://apidg.gent.be/opendata/adlib2eventstream/v1/dmg/objecten";
    let options = {
        "pollingInterval": 5000, // millis
        "representation": "Object", //Object or Quads
        "requestHeaders": { // Optional request headers, useful when e.g. the endpoint requires Auth headers
            Accept: 'application/ld+json', 
        }
        "fromTime": new Date("2021-02-03T15:46:12.307Z"),
        "emitMemberOnce": true,
        "disableSynchronization": true,
        "disableFraming": true,
        "jsonLdContext": { //Only necessary for Object representation
            "@context": [
                "https://apidg.gent.be/opendata/adlib2eventstream/v1/context/cultureel-erfgoed-object-ap.jsonld",
                "https://apidg.gent.be/opendata/adlib2eventstream/v1/context/persoon-basis.jsonld",
                "https://apidg.gent.be/opendata/adlib2eventstream/v1/context/cultureel-erfgoed-event-ap.jsonld",
                {
                     "dcterms:isVersionOf": {
                         "@type": "@id"
                    },
                    "prov": "http://www.w3.org/ns/prov#"
                }
            ]
        }
    };
    let LDESClient = new newEngine();
    let eventstreamSync = LDESClient.createReadStream(url, options);
    // OR if you have a previous state
    // let eventstreamSync = LDESClient.createReadStream(url, options, state);
    eventstreamSync.on('data', (member) => {
        if (options.representation) {
            if (options.representation === "Object") {
                const memberURI = member.id;
                console.log(memberURI);
                const object = member.object;
                console.log(object);
            } else if (options.representation === "Quads") {
                /* When using Quads representation, the members adhere to the [@Treecg/types Member interface](https://github.com/TREEcg/types/blob/main/lib/Member.ts) 
                    interface Member {
                        id: RDF.Term;
                        quads: Array<RDF.Quad>;
                    }
                */
                const memberURI = member.id.value;
                console.log(memberURI);
                const quads = member.quads;
                console.log(quads);
            }
        } else {
            console.log(member);
        }

        // Want to pause event stream?
        eventstreamSync.pause();
    });
    eventstreamSync.on('metadata', (metadata) => {
        if (metadata.treeMetadata) console.log(metadata.treeMetadata); // follows the structure of the TREE metadata extractor (https://github.com/TREEcg/tree-metadata-extraction#extracted-metadata)
        console.log(metadata.url); // page from where metadata has been extracted
    });
    eventstreamSync.on('pause', () => {
        // Export current state, but only when paused!
        let state = eventstreamSync.exportState();
    });
    eventstreamSync.on('end', () => {
        console.log("No more data!");
    });
} catch (e) {
    console.error(e);
}

How resuming works

We save and load the EventStream state:

  • during or after the run of the LDES Client, we pause it, and export its state.
  • before a run of the LDES Client, we can load a previous state

Pro-tip: write/read the state from a JSON file, check out the example code.

interface State {
    bookkeeper: Bookkeeper;
    memberBuffer: Array<Member>;
    processedURIs: LRUCache;
}

interface Bookkeeper {
    queue: PriorityQueue;
    queued: LRUCache;
    blacklist: Set<string>;
}
  • queue is a priorityQueue that stores all page ULRs that will be fetched, sorted on ascending refetch time. A page will not be re-added if the page-cache is set to immutable.
  • queued is a least-recently-used Cache containings the last 500 URLS added to the queue, to prevent adding to many duplicates to the queue.
  • blacklist is a Set containing blacklisted URLs that should not be added to the queue.
  • memberBuffer is the internal Buffer containing the unread Members from the EventStream.
  • processedURIs is a least-recently used Cache containing all processed URIs. When refetching pages, members having their URI in this set should not be emitted again when "emitMemberOnce": true.

example

import { newEngine } from '@treecg/actor-init-ldes-client';

// load previous state here (e.g. load from a json file on disk)
const previousState;

try {
    let url = "https://apidg.gent.be/opendata/adlib2eventstream/v1/dmg/objecten";
    let options = {
        "representation": "Quads", //Object or Quads
        "emitMemberOnce": true,
        "disableSynchronization": false,
    };
    let LDESClient = new newEngine();
    
    if (previousState === undefined || previousState === null) {
        // if you don't have a previous state, the created EventStream will start from scratch
        let eventstreamSync = LDESClient.createReadStream(url, options);
    }
    else {
        // if you have a previous state, use it to create the EventStream
        let eventstreamSync = LDESClient.createReadStream(url, options, previousState);
    }
    
    // If the run takes longer than x minutes, pause the LDES Client
    cont timeoutms = 3600000; // amount of milliseconds before timeout
    const timeout = setTimeout(() => eventstreamSync.pause(), timeoutms);
    
    eventstreamSync.on('data', (member) => {
        console.log(member);
    });
    
    eventstreamSync.on('metadata', (metadata) => {
        if (metadata.treeMetadata)
            // follows the TREE metadata extractor structure (https://github.com/TREEcg/tree-metadata-extraction#extracted-metadata)
            console.log(metadata.treeMetadata);
        console.log(metadata.url); // page from where metadata has been extracted
    });
    
    eventstreamSync.on('now only syncing', () => {
        // All known pages have been fetched at least once when receiving this event.
        // This would be the point where we receive the `end` event in the `"disableSynchronization": true` equivalent
        timeout.unref();
        eventstreamSync.pause();
    });
    
    eventstreamSync.on('pause', () => {
        // Export current state, but only when paused!
        let state = eventstreamSync.exportState();
        // Save state here to reuse in a later run (e.g. save as a json file on disk)
    });
    
    eventstreamSync.on('end', () => {
        timeout.unref();
        console.log("No more data!");
    });
} catch (e) {
    console.error(e);
}

Project Assumptions

This contains a list of things to look out for or possibly fix in the following improvements.

  • Not a really maintained project. Some library dependencies are deprecated and no longer supported.
  • In a JSON-LD output, all the floating type values are represented in a scientific notation.
  • Some public helper methods don't have an implementation.
  • One can provide a context URL via the jsonLdContext configuration option, but in certain cases due the dependency on the jsonld.context-parser.js might throw an exception due to remote throttling of a certain context URL. Therefor it would be better that this part should be split in the application so that the application contains a cache of given context URL's with related context values. => This will solve issue that one notice when trying to resolve NGSI-LD related context files.