LDESTS: Linked Data Event Streams for Time Series
Introduction
LDESTS aims to make time series data streams using RDF feasible on platforms such as Solid. It achieves this by using a data model (also referred to as "shape") defining the per-sample structure. By defining this structure in advance, repetition in structure can be avoided by filtering out the unique properties of individual samples. As only these unique properties are encoded in the resulting data, a lossless, highly compressed stream can be achieved.
Currently supported
This tool is developed using Kotlin Multiplatform, allowing for multiple target platforms. Currently, only JS/TS on Node.js is supported. There are currently no plans to support JS/TS on Web. There are plans to add Java (and Android) integration at a later stage.
Getting started
Notice: building from source has only been tested on Linux. While the build process should work for any platform capable of using Gradle, other platforms have not been tested. If you come across an issue building the library from source, be sure to let us know by creating an issue.
JS/TS on Node.js
NPM
Currently, no NPM releases are available, but they are planned. If you want to integrate LDESTS into your own project right now, you have to build the library from source.
From source
The build process requires the commands git, yarn and npm to function, so make sure these are properly installed and configured before proceeding.
Begin by cloning the repository to a known location:
[user@host ~]$ git clone https://github.com/SolidLabResearch/LDESTS.git
Next, navigate to the new folder and start the build process (either debug if you want to diagnose a problem in the source code, or release):
[user@host ~]$ cd LDESTS
[user@host LDESTS]$ ./gradlew js:release
This process can take some time, as it fetches and configures all dependencies for the build to take place. After the process has finished, the folder bin/js should contain the resulting library. From here, you can add it as a dependency to your own project by running:
[user@host MyProject]$ npm install file:path/to/LDESTS/bin/js/ldests-$version.tgz
Finally, you should be able to include the library into your own project, with type definitions available in TS projects:
import { LDESTS, Shape } from "ldests";
How to use
Node.js
After having built and included LDESTS in your own project, you can take inspiration from the example found in jsTest to get started.
Creating a stream
Integrating custom streams is typically a two-step process. First, create a data model/shape:
const shape: Shape = Shape.Companion.parse({
    type: "myDataType",
    identifier: "myDataIdentifier",
    constants: {
        "myConstantProperty": ["myConstantValues", "..."]
    },
    variables: {
        "myVariableProperty": "myVariableDatatype"
    }
});
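For illustration only, a concrete shape for a hypothetical temperature sensor could follow the same template. The property URIs and datatype identifiers below are assumptions made for this example, not values prescribed by LDESTS:

// Hypothetical shape: a temperature measurement with a fixed unit and two per-sample values.
// All URIs and datatype strings are placeholders chosen for this example.
const temperatureShape: Shape = Shape.Companion.parse({
    type: "https://example.org/ns#TemperatureMeasurement",
    identifier: "https://example.org/ns#sensor",
    constants: {
        // values that never vary between samples, so they are not re-encoded per sample
        "https://example.org/ns#unit": ["https://example.org/ns#Celsius"]
    },
    variables: {
        // values that differ per sample, together with their (assumed) datatype identifiers
        "https://example.org/ns#value": "float",
        "https://example.org/ns#timestamp": "dateTime"
    }
});

Here, only the value and timestamp vary per sample, so only those properties end up being encoded in the resulting stream.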
Then, initialise or continue an LDESTS stream using the shape from above:
const stream = await new LDESTS.Builder("myStream")
    .shape(shape)
    .build();
The stream created above can be configured to your needs: you can
- customise the configuration to have fragments of your desired size;
- attach different publishers (ranging from Solid pods to in-memory N3Stores);
- configure how to fragment your stream.
If you no longer need your stream object, you can close the stream, allowing all pending transactions to finalise and any connections to stop properly:
await stream.close();
Inserting data
New data can be appended to a stream through single triple or entire file insertion:
stream.insertTriple(triple); // adds a single triple "asynchronously" to the input stream
await stream.insertFile("path/to/file.nt"); // adds an entire RDF file to the stream and `await`s until finished
It is possible for the resulting stream to not reflect new data yet. To make sure the stream has these new additions available to consumers, the operations have to be awaited and the stream has to be flushed:
await stream.flush(); // ensures all additional data is processed and published before it returns
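As a minimal sketch of a full insertion, a single sample could be built with N3's DataFactory (the Node.js integration already relies on RDFJS/N3, see Credits) and pushed into the stream. The subject and property URIs are placeholders, and it is assumed here that insertTriple accepts RDFJS-style quads:

import { DataFactory } from "n3";
const { namedNode, literal, quad } = DataFactory;

// Placeholder URIs; in practice they should match the properties declared in the shape.
const triple = quad(
    namedNode("https://example.org/sample/1"),
    namedNode("https://example.org/ns#value"),
    literal("21.5")
);

stream.insertTriple(triple);                 // queued "asynchronously"
await stream.insertFile("path/to/file.nt");  // or insert a whole RDF file instead
await stream.flush();                        // make the new data visible to consumers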
Consuming a stream
First, the stream instance has to be created (as seen here). Later, it will be possible to automatically infer the stream's settings (including the shape) when using a single publisher. Currently, only Solid pods are compatible with querying.
const pod = { type: PublisherType.Solid, url: "http://solid.local" } as SolidPublisherConfig;
const query = "SELECT * WHERE { ?s ?p ?o . }";
// looks for "myStream" as defined by the creation of `stream` above on the pod "solid.local"
await stream.query(pod, query, (binding) => { callback(binding) });
Later, time and property constraints found in the provided query will be used to filter the available data so only relevant fragments are retrieved. Awaiting the result of query is not required, but can help with flow control.
Note: as these triples are regenerated from the compressed stream, the exact subject URIs are lost. Every sample's subject is still unique throughout the query, however.
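Putting the pieces together, a consumer could look roughly like the sketch below. It reuses only the calls shown above, with placeholder names and URLs, and assumes PublisherType and SolidPublisherConfig are exported by the same package:

import { LDESTS, Shape, PublisherType, SolidPublisherConfig } from "ldests";

// Recreate the stream with the same name and shape the producer used (placeholder values).
const shape: Shape = Shape.Companion.parse({
    type: "myDataType",
    identifier: "myDataIdentifier",
    constants: { "myConstantProperty": ["myConstantValues"] },
    variables: { "myVariableProperty": "myVariableDatatype" }
});
const stream = await new LDESTS.Builder("myStream")
    .shape(shape)
    .build();

// Query the Solid pod the producer published to (placeholder URL).
const pod = { type: PublisherType.Solid, url: "http://solid.local" } as SolidPublisherConfig;
await stream.query(pod, "SELECT * WHERE { ?s ?p ?o . }", (binding) => {
    console.log(binding); // handle each regenerated sample binding
});

await stream.close();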
How it works
The stream's data model describes every possible variation of the incoming data. By providing a property or a set of properties the stream has to fragment the incoming data with, all possible fixed data models can be generated. Every resulting model gets its own data stream, which is appended to whenever incoming data matches that model.
Matching what data belongs to which data stream is done through multiple SPARQL queries. As the individual data models represent a template for the individual samples of that stream, generic queries for these samples can be made, allowing multiple SPARQL queries to run over the input stream. By analysing the bindings generated from these queries, the data can be associated with the right fragment of that model's data stream.
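As a rough, purely illustrative sketch (the actual queries are generated internally by the library and may look different), a generic per-sample query derived from the hypothetical temperature shape above could resemble the following, shown here as a plain string:

// Illustrative only: the constant property is fixed and the variable properties are selected,
// so each binding returned by the query describes one sample of this fixed model.
const sampleQuery = `
SELECT ?sample ?value ?timestamp WHERE {
    ?sample a <https://example.org/ns#TemperatureMeasurement> ;
        <https://example.org/ns#unit> <https://example.org/ns#Celsius> ;
        <https://example.org/ns#value> ?value ;
        <https://example.org/ns#timestamp> ?timestamp .
}`;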
Reading this data back also relies heavily on the data models: by first analysing the requested data's constraints, a selection of data streams can be made. The fragments making up the various streams of interest are then filtered based on time. Every matching fragment is read and parsed, which involves recreating all triples making up the original data sample, thanks to the presence of the data models.
Roadmap/Future work
The following features and changes are either planned or possible (depending on requirements/demand), in no particular order:
- Support for the JVM (& Android)
- Automatic shape generation using samples from the data streams
- Inferring stream properties
- Supporting variable stream layout on a per-publisher basis
- Manual publisher management, including more granular control over their configurations
- Support for custom publisher implementations
- Proper (automated) releases for Node.js JS/TS on NPM
Credits
This research was partly funded by the Flemish Government under the “Onderzoeksprogramma Artificiële Intelligentie (AI) Vlaanderen” programme, the SolidLab Vlaanderen project (Flemish Government, EWI and RRF project VV023/10) and the FWO Project FRACTION (Nr. G086822N).
Under the hood, the Node.js JS/TS integration uses:
- RDFJS/N3 for everything related to RDF storage (both in-memory and Turtle files);
- Comunica for querying triple sources, such as in-memory stores and Solid pods, through SPARQL queries;
- Incremunica for querying ongoing RDF streams, created by manual insertion, through SPARQL queries.