sparql-engine
v0.8.3
A framework for building SPARQL query engines in Javascript
sparql-engine
An open-source framework for building SPARQL query engines in Javascript/Typescript.
Main features:
- Build a SPARQL query engine on top of any data storage system.
- Supports the full features of the SPARQL syntax by implementing a single class!
- Supports all SPARQL property paths.
- Implements advanced SPARQL query rewriting techniques for transparently optimizing SPARQL query processing.
- Supports full text search queries.
- Supports Custom SPARQL functions.
- Supports Semantic Caching, to speed up query evaluation of recurrent patterns.
- Supports the SPARQL UPDATE protocol.
- Supports Basic Federated SPARQL queries using SERVICE clauses.
- Customize every step of SPARQL query processing, thanks to a modular architecture.
- Supports the SPARQL Graph Management protocol.
Table of contents
- Installation
- Getting started
- Enable caching
- Full text search
- Federated SPARQL Queries
- Custom Functions
- Advanced Usage
- Documentation
- Acknowledgments
- References
Installation
npm install --save sparql-engine
Getting started
The sparql-engine framework allows you to build a custom SPARQL query engine on top of any data storage system.
In short, to support SPARQL queries on top of your data storage system, you need to:
- Implement a subclass of Graph, which provides access to the data storage system.
- Gather all your Graphs as a Dataset (using your own implementation or the default one).
- Instantiate a PlanBuilder and use it to execute SPARQL queries.
Examples
As a starting point, we provide you with two examples of integration:
- With N3.js, available here.
- With LevelGraph, available here.
Preliminaries
SPARQL.js algebra and TypeScript
The sparql-engine framework uses the SPARQL.js library for parsing and manipulating SPARQL queries as JSON objects. For TypeScript compilation, we use a custom package, sparqljs-legacy-type, to provide the type information.
Thus, if you are working with sparql-engine in TypeScript, you will need to install the sparqljs-legacy-type package.
If you want to know why we use a custom types package, see the discussion of this issue.
RDF triples representation
This framework represents RDF triples using plain JavaScript objects. Below, in TypeScript syntax, is the shape of such an object.
interface TripleObject {
subject: string; // The Triple's subject
predicate: string; // The Triple's predicate
object: string; // The Triple's object
}
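For illustration, here are two hypothetical objects of that shape: a concrete triple and a triple pattern. The way each term is encoded as a string (plain IRIs, quoted literals with an optional @lang tag or ^^datatype, a leading ? for variables, _: for blank nodes) is an assumption based on the legacy SPARQL.js string format the framework builds on, and termKind is a hypothetical helper, not a library function.

```javascript
// Hypothetical example objects with the TripleObject shape.
// Assumed term encoding (legacy SPARQL.js style): plain IRIs, quoted literals with an
// optional @lang tag or ^^datatype, '?' prefix for variables, '_:' prefix for blank nodes.
const triple = {
  subject: 'http://example.org#alice',
  predicate: 'http://xmlns.com/foaf/0.1/name',
  object: '"Alice"@en'
}

// A triple pattern uses the same shape, with variables in some positions.
const pattern = {
  subject: '?person',
  predicate: 'http://xmlns.com/foaf/0.1/age',
  object: '"42"^^http://www.w3.org/2001/XMLSchema#integer'
}

// Classifies a term string under the encoding assumed above.
function termKind (term) {
  if (term.startsWith('?')) return 'variable'
  if (term.startsWith('_:')) return 'blank node'
  if (term.startsWith('"')) return 'literal'
  return 'IRI'
}
```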
PipelineStage
The sparql-engine framework uses a pipeline of iterators to execute SPARQL queries. Thus, many methods in this framework need to return a PipelineStage<T>, i.e., an object that generates items of type T in a pull-based fashion.
A PipelineStage<T> can easily be created from one of the following:
- An array of elements of type T.
- A JavaScript Iterator, which yields elements of type T.
- An EventEmitter, which emits elements of type T on a data event.
- A Readable stream, which produces elements of type T.
To create a new PipelineStage<T>
from one of these objects, you can use the following code:
const { Pipeline } = require('sparql-engine')
// the object to convert into a PipelineStage (here, an array)
const sourceObject = ['a', 'b', 'c']
const stage = Pipeline.getInstance().from(sourceObject)
For more information on how to create and manipulate the pipeline, please refer to the documentation of Pipeline and PipelineEngine.
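Pipeline.from performs this conversion for you. Purely to illustrate what pulling items from each kind of source involves, here is a hypothetical toAsyncIterable helper built only on Node.js built-ins; it is not how sparql-engine implements Pipeline.from. Readable streams and arrays are already iterable, a bare iterator is stepped with next(), and the data events of an EventEmitter are buffered until the consumer asks for them.

```javascript
const { EventEmitter } = require('events')

// Hypothetical helper, NOT part of sparql-engine: it turns each kind of source
// listed above into a single pull-based async iterable.
async function * toAsyncIterable (source) {
  if (typeof source[Symbol.asyncIterator] === 'function') {
    // Readable streams (and other async iterables) can be pulled directly
    yield* source
  } else if (typeof source[Symbol.iterator] === 'function') {
    // arrays, Sets, generators and other sync iterables
    yield* source
  } else if (typeof source.next === 'function') {
    // a bare iterator object that does not implement Symbol.iterator
    for (let step = source.next(); !step.done; step = source.next()) {
      yield step.value
    }
  } else if (source instanceof EventEmitter) {
    // buffer 'data' events until the consumer pulls them, stop after 'end'
    // (error handling is omitted for brevity)
    const buffer = []
    let ended = false
    let wake = () => {}
    source.on('data', item => { buffer.push(item); wake() })
    source.on('end', () => { ended = true; wake() })
    while (true) {
      if (buffer.length > 0) {
        yield buffer.shift()
      } else if (ended) {
        return
      } else {
        // wait until the next 'data' or 'end' event
        await new Promise(resolve => { wake = resolve })
      }
    }
  } else {
    throw new TypeError('unsupported source')
  }
}
```

A consumer then reads every kind of source the same way, with for await (const item of toAsyncIterable(source)).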
RDF Graphs
The first thing to do is to implement a subclass of the Graph
abstract class. A Graph
represents an RDF Graph and is responsible for inserting, deleting and searching for RDF triples in the database.
The main method to implement is Graph.find(triple), which is used by the framework to find RDF triples matching a triple pattern in the RDF Graph.
This method must return a PipelineStage<TripleObject>, which will be consumed to find matching RDF triples. You can find an example of such an implementation in the N3 example.
Similarly, to support the SPARQL UPDATE protocol, you have to provide a graph that implements the Graph.insert(triple) and Graph.delete(triple) methods, which insert and delete RDF triples from the graph, respectively. These methods must return Promises, which are fulfilled when the insertion/deletion operation is completed.
Finally, the sparql-engine framework also lets you customize how Basic Graph Patterns (BGPs) are evaluated against the RDF graph. The engine provides a default implementation based on the Graph.find method and the Index Nested Loop Join algorithm. However, if you wish to supply your own implementation for BGP evaluation, you just have to implement a Graph with an evalBGP(triples) method.
This method must return a PipelineStage<Bindings>
. You can find an example of such implementation in the LevelGraph example.
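The Index Nested Loop Join strategy behind the default BGP evaluation can be sketched as follows: for each partial solution, substitute its bindings into the next triple pattern and probe the graph with the resulting, more selective pattern. This is a simplified, array-based illustration rather than sparql-engine's code; it assumes string terms with variables prefixed by ?, and a hypothetical find(pattern) function returning an array of matching triples (the role Graph.find plays in the engine).

```javascript
// Simplified Index Nested Loop Join for a BGP (illustration, not sparql-engine's code).
const POSITIONS = ['subject', 'predicate', 'object']
const isVariable = term => term.startsWith('?')

// Replace the variables already bound in `bindings` by their values.
function bindPattern (pattern, bindings) {
  const bound = {}
  for (const pos of POSITIONS) {
    const term = pattern[pos]
    bound[pos] = isVariable(term) && term in bindings ? bindings[term] : term
  }
  return bound
}

// `find` is a hypothetical lookup: pattern -> array of matching triples.
function evalBGP (triples, find) {
  let solutions = [{}] // start from a single empty set of bindings
  for (const pattern of triples) {
    const next = []
    for (const bindings of solutions) {
      // probe the graph ("index") with the partially bound pattern
      for (const triple of find(bindPattern(pattern, bindings))) {
        const extended = { ...bindings }
        let consistent = true
        for (const pos of POSITIONS) {
          const term = pattern[pos]
          if (!isVariable(term)) continue
          // a variable repeated inside the same pattern must get the same value
          if (term in extended && extended[term] !== triple[pos]) consistent = false
          extended[term] = triple[pos]
        }
        if (consistent) next.push(extended)
      }
    }
    solutions = next
  }
  return solutions
}
```

Because each probe reuses the bindings produced by the previous patterns, ordering the patterns from most to least selective usually reduces the number of probes.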
Below is a skeleton of a Graph subclass; the expected parameter and return types are given in JSDoc comments.
const { Graph } = require('sparql-engine')

class CustomGraph extends Graph {
  /**
   * Returns an iterator that finds RDF triples matching a triple pattern in the graph.
   * @param {TripleObject} triple - Triple pattern to find
   * @param {Object} options - Execution options
   * @return {PipelineStage<TripleObject>} A PipelineStage which produces RDF triples matching the triple pattern
   */
  find (triple, options) { /* ... */ }

  /**
   * Inserts an RDF triple into the RDF Graph
   * @param {TripleObject} triple - RDF Triple to insert
   * @return {Promise<void>} A Promise fulfilled when the insertion has been completed
   */
  insert (triple) { /* ... */ }

  /**
   * Deletes an RDF triple from the RDF Graph
   * @param {TripleObject} triple - RDF Triple to delete
   * @return {Promise<void>} A Promise fulfilled when the deletion has been completed
   */
  delete (triple) { /* ... */ }
}
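To make the find/insert/delete contract concrete, here is a hypothetical in-memory store with the same three methods. It deliberately does not extend Graph, so that it runs without sparql-engine installed; in a real subclass, find would wrap its array with Pipeline.getInstance().from(...) to return a PipelineStage. Treating terms that start with ? as variables is an assumption of this sketch.

```javascript
// Hypothetical in-memory triple store illustrating the Graph contract (not a Graph subclass).
const isVariable = term => term.startsWith('?')

class InMemoryTripleStore {
  constructor () {
    this.triples = [] // a plain array: fine for a sketch, far too slow for real data
  }

  // Triples whose terms equal every non-variable term of the pattern.
  find (pattern) {
    return this.triples.filter(t =>
      ['subject', 'predicate', 'object'].every(pos => isVariable(pattern[pos]) || pattern[pos] === t[pos]))
  }

  // Resolves once the triple has been stored.
  insert (triple) {
    this.triples.push({ ...triple })
    return Promise.resolve()
  }

  // Resolves once every identical triple has been removed.
  delete (triple) {
    this.triples = this.triples.filter(t =>
      !(t.subject === triple.subject && t.predicate === triple.predicate && t.object === triple.object))
    return Promise.resolve()
  }
}
```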
RDF Datasets
Once your subclass of Graph is ready, you need to build a collection of RDF Graphs, called an RDF Dataset. A default implementation, HashMapDataset, is made available by the framework, but you can build your own by subclassing Dataset.
const { HashMapDataset } = require('sparql-engine')
const CustomGraph = require('./custom-graph') // hypothetical path to your Graph subclass
const GRAPH_A_IRI = 'http://example.org#graph-a'
const GRAPH_B_IRI = 'http://example.org#graph-b'
const graph_a = new CustomGraph(/* ... */)
const graph_b = new CustomGraph(/* ... */)
// set graph_a as the Default Graph of the dataset
const dataset = new HashMapDataset(GRAPH_A_IRI, graph_a)
// insert graph_b as a Named Graph
dataset.addNamedGraph(GRAPH_B_IRI, graph_b)
Running a SPARQL query
Finally, to run a SPARQL query on your RDF dataset, you need to use the PlanBuilder
class. It is responsible for parsing SPARQL queries and building a pipeline of iterators to evaluate them.
const { PlanBuilder } = require('sparql-engine')
// Get the name of all people in the Default Graph
const query = `
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name
WHERE {
?s a foaf:Person .
?s foaf:name ?name .
}`
// Creates a plan builder for the RDF dataset
const builder = new PlanBuilder(dataset)
// Get an iterator to evaluate the query
const iterator = builder.build(query)
// Read results
iterator.subscribe(
bindings => console.log(bindings),
err => console.error(err),
() => console.log('Query evaluation complete!')
)
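If you prefer async/await, the three-callback subscribe call above can be wrapped in a Promise. The collectBindings helper below is hypothetical (it is not exported by sparql-engine) and relies only on the subscribe(onData, onError, onEnd) signature shown above. Note that it buffers every solution in memory, so for large result sets it is preferable to consume bindings as they arrive.

```javascript
// Hypothetical helper: turns the callback-based subscribe() into a Promise
// that resolves with all solutions once evaluation is complete.
function collectBindings (iterator) {
  return new Promise((resolve, reject) => {
    const results = []
    iterator.subscribe(
      bindings => { results.push(bindings) }, // called once per solution
      err => reject(err),                     // evaluation failed
      () => resolve(results)                  // evaluation complete
    )
  })
}
```

With the builder from the previous example, this would be used as const solutions = await collectBindings(builder.build(query)) inside an async function.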
Enable caching
The sparql-engine framework provides support for automatic caching of Basic Graph Pattern evaluation using the Semantic Cache algorithm. Basically, the cache saves the results of BGPs already evaluated and, when the engine wants to evaluate a BGP, it looks for the largest subset of the BGP in the cache. If one is available, it re-uses the cached results to speed up query processing.
By default, semantic caching is disabled. You can turn it on/off using the PlanBuilder.useCache and PlanBuilder.disableCache methods, respectively. The useCache method accepts an optional parameter, so you can provide your own implementation of the semantic cache. By default, it uses an in-memory LRU cache which stores up to 500MB of items for 20 minutes.
// get an instance of a PlanBuilder
const builder = new PlanBuilder(/* ... */)
// activate the cache
builder.useCache()
// disable the cache
builder.disableCache()
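The subset lookup described above can be illustrated with a deliberately simplified, hypothetical cache; this is not the library's implementation. Each entry stores the triple patterns of an evaluated BGP together with its results, and a lookup returns the largest cached BGP contained in the one being evaluated, along with the patterns that still need to be evaluated. Patterns are compared textually, so variable renaming is not handled, and eviction (such as the LRU policy mentioned above) is omitted.

```javascript
// Toy semantic cache (illustration only). Entries are keyed by the set of
// triple patterns of an already evaluated BGP.
const patternKey = ({ subject, predicate, object }) => `${subject} ${predicate} ${object}`

class ToySemanticCache {
  constructor () {
    this.entries = [] // [{ patterns: Set<string>, results }]
  }

  update (bgp, results) {
    this.entries.push({ patterns: new Set(bgp.map(patternKey)), results })
  }

  // Returns the cached results of the largest BGP that is a subset of `bgp`,
  // plus the triple patterns left to evaluate, or null on a cache miss.
  findLargestSubset (bgp) {
    const keys = new Set(bgp.map(patternKey))
    let best = null
    for (const entry of this.entries) {
      const isSubset = [...entry.patterns].every(key => keys.has(key))
      if (isSubset && (best === null || entry.patterns.size > best.patterns.size)) best = entry
    }
    if (best === null) return null
    const remaining = bgp.filter(tp => !best.patterns.has(patternKey(tp)))
    return { results: best.results, remaining }
  }
}
```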
Full Text Search
The sparql-engine framework provides a non-standard full text search functionality, allowing users to execute approximate string matching on RDF Terms retrieved by SPARQL queries.
To accomplish this integration, it follows an approach similar to BlazeGraph and defines several magic predicates that are given special meaning: when encountered in a SPARQL query, they are interpreted as configuration parameters for a full text search query.
The simplest way to integrate a full text search into a SPARQL query is to use the magic predicate ses:search inside of a SPARQL join group. In the following query, this predicate is used to search for the keywords neil and gaiman in the values bound to the ?o position of the triple pattern.
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX ses: <https://callidon.github.io/sparql-engine/search#>
SELECT * WHERE {
?s foaf:knows ?o .
?o ses:search "neil gaiman" .
}
In a way, full text search queries allow users to express more complex SPARQL filters that perform approximate string matching over RDF terms. Each result is annotated with a relevance score (how much it matches the keywords; higher is better) and a rank (results ranked in descending order of relevance score). These two values are not bound into the query results by default, but you can use magic predicates to get access to them (see below). Note that the meaning of relevance scores is specific to the implementation of the full text search.
The full list of magic predicates that you can use in a full text search query is:
- ses:search defines keywords to search as a list of keywords separated by spaces.
- ses:matchAllTerms indicates that only values that contain all of the specified search terms should be considered.
- ses:minRelevance and ses:maxRelevance limit the search to matches with a minimum/maximum relevance score, respectively. In the default implementation, scores are floating-point numbers ranging from 0.0 to 1.0 with a precision of 4 digits.
- ses:minRank and ses:maxRank limit the search to matches with a minimum/maximum rank value, respectively. In the default implementation, ranks are non-negative integers starting at 0.
- ses:relevance binds each term's relevance score to a SPARQL variable.
- ses:rank binds each term's rank to a SPARQL variable.
Below is a more complete example that uses most of these magic predicates to customize the full text search.
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX ses: <https://callidon.github.io/sparql-engine/search#>
SELECT ?s ?o ?score ?rank WHERE {
?s foaf:knows ?o .
?o ses:search "neil gaiman" .
?o ses:minRelevance "0.25" .
?o ses:maxRank "1000" .
?o ses:relevance ?score .
?o ses:rank ?rank .
?o ses:matchAllTerms "true" .
}
To provide a custom implementation for the full text search that is more integrated with your backend,
you simply need to override the fullTextSearch
method of the Graph
class.
You can find the full signature of this method in the relevant documentation.
The sparql-engine framework provides a default implementation of this method, which computes relevance scores as the average ratio of keywords matched by words in the RDF terms.
Notice that this default implementation is not suited for production usage.
It performs well enough on small RDF datasets, but, when possible, you should always provide a dedicated implementation that leverages your backend.
For example, with PostgreSQL, you could use GIN or GiST indexes.
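As a rough picture of how such scoring and the magic predicates fit together, the hypothetical sketch below reads "average ratio of keywords matched" as the fraction of query keywords found inside some word of the term, rounded to 4 digits, then applies the relevance, matchAllTerms and rank options described earlier. The matching rule, the order in which filters are applied and the function names are assumptions; this is not sparql-engine's fullTextSearch code.

```javascript
// Hypothetical scoring: fraction of keywords contained in some word of the term.
function scoreTerm (keywords, value) {
  const words = value.toLowerCase().split(/\s+/)
  const matched = keywords.filter(k => words.some(w => w.includes(k.toLowerCase())))
  return { matched: matched.length, score: Number((matched.length / keywords.length).toFixed(4)) }
}

// Scores, filters and ranks candidate values using options named after the magic predicates.
function searchTerms (values, query, options = {}) {
  const {
    minRelevance = 0, maxRelevance = 1, minRank = 0, maxRank = Infinity, matchAllTerms = false
  } = options
  const keywords = query.split(/\s+/).filter(k => k.length > 0)
  if (keywords.length === 0) return []
  return values
    .map(value => ({ value, ...scoreTerm(keywords, value) }))
    .filter(r => r.matched > 0) // no keyword matched: not a result
    .filter(r => !matchAllTerms || r.matched === keywords.length) // ses:matchAllTerms
    .filter(r => r.score >= minRelevance && r.score <= maxRelevance) // ses:min/maxRelevance
    .sort((a, b) => b.score - a.score) // descending relevance (stable sort keeps ties in input order)
    .map((r, rank) => ({ value: r.value, score: r.score, rank })) // rank 0 = most relevant
    .filter(r => r.rank >= minRank && r.rank <= maxRank) // ses:min/maxRank
}
```

A backend-specific implementation would push this work into an index instead of scanning every candidate value in JavaScript.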
Federated SPARQL Queries
The sparql-engine
framework provides automatic support for evaluating federated SPARQL queries, using the SERVICE
keyword.
To enable them, you need to set a Graph Factory for the RDF dataset used to evaluate SPARQL queries.
This Graph factory is used by the dataset to create new RDF Graphs on demand.
To set it, you need to use the Dataset.setGraphFactory
method, as detailed below.
It takes a callback as parameter, which will be invoked to create a new graph from an IRI.
It's your responsibility to define the graph creation logic, depending on your application.
const { HashMapDataset } = require('sparql-engine')
const CustomGraph = require('./custom-graph') // hypothetical path to your Graph subclass
const my_graph = new CustomGraph(/* ... */)
const dataset = new HashMapDataset('http://example.org#graph-a', my_graph)
// set the Graph factory of the dataset
dataset.setGraphFactory(iri => {
// return a new graph for the provided iri
return new CustomGraph(/* .. */)
})
Once the Graph factory is set, you have nothing more to do! Just execute your federated SPARQL queries as regular queries, like before!
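The lookup a dataset performs once a factory is set can be sketched with a hypothetical ToyDataset class, which is not HashMapDataset's code: known IRIs resolve to the default or named graphs, and any other IRI, such as the endpoint of a SERVICE clause, is handed to the factory. Keeping the created graph for later lookups is a design choice of this sketch, not documented behavior of the library.

```javascript
// Hypothetical dataset: a default graph, a Map of named graphs and an optional factory.
class ToyDataset {
  constructor (defaultIRI, defaultGraph) {
    this.defaultIRI = defaultIRI
    this.graphs = new Map([[defaultIRI, defaultGraph]])
    this.factory = null
  }

  addNamedGraph (iri, graph) {
    this.graphs.set(iri, graph)
  }

  setGraphFactory (factory) {
    this.factory = factory
  }

  // Known IRIs resolve directly; unknown ones (e.g. SERVICE endpoints) go to the factory.
  getNamedGraph (iri) {
    if (this.graphs.has(iri)) return this.graphs.get(iri)
    if (this.factory === null) throw new Error(`Unknown graph: ${iri}`)
    const graph = this.factory(iri)
    this.graphs.set(iri, graph) // sketch's choice: reuse the graph on later lookups
    return graph
  }
}
```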
Custom Functions
SPARQL allows custom functions in expressions so that queries can be used on domain-specific data.
The sparql-engine framework provides support for declaring such custom functions.
A SPARQL value function is an extension point of the SPARQL query language that allows an IRI to name a function in the query processor.
It is invoked by its IRI in a FILTER, BIND or HAVING expression.
To register custom functions, you must create a JSON object that maps each function's IRI
to a Javascript function that takes a variable number of RDF Terms arguments and returns one of the following:
- A new RDF Term (an IRI, a Literal or a Blank Node) in RDF.js format.
- An array of RDF Terms.
- An Iterable or a Generator that yields RDF Terms.
- The null value, to indicate that the function's evaluation has failed.
RDF Terms are represented using the RDF.js data model.
The rdf subpackage exposes many utility methods to create and manipulate RDF.js terms in the context of custom SPARQL functions.
The following shows a declaration of some simple custom functions.
// load the utility functions used to manipulate RDF terms
const { rdf } = require('sparql-engine')
// define some custom SPARQL functions
const customFunctions = {
// reverse a RDF literal
'http://example.com#REVERSE': function (rdfTerm) {
const reverseValue = rdfTerm.value.split("").reverse().join("")
return rdf.shallowCloneTerm(rdfTerm, reverseValue)
},
// Test if a RDF Literal is a palindrome
'http://example.com#IS_PALINDROME': function (rdfTerm) {
const result = rdfTerm.value.split("").reverse().join("") === rdfTerm.value
return rdf.createBoolean(result)
},
// Test if a number is even
'http://example.com#IS_EVEN': function (rdfTerm) {
if (rdf.termIsLiteral(rdfTerm) && rdf.literalIsNumeric(rdfTerm)) {
const jsValue = rdf.asJS(rdfTerm.value, rdfTerm.datatype.value)
const result = jsValue % 2 === 0
return rdf.createBoolean(result)
}
return rdf.createBoolean(false)
}
}
Then, this JSON object is passed into the constructor of your PlanBuilder.
const builder = new PlanBuilder(dataset, {}, customFunctions)
Now, you can execute SPARQL queries with your custom functions! For example, here is a query that uses our newly defined custom SPARQL functions.
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX example: <http://example.com#>
SELECT ?length
WHERE {
?s foaf:name ?name .
# this bind is not critical, but is here for illustrative purposes
BIND(<http://example.com#REVERSE>(?name) as ?reverse)
BIND(STRLEN(?reverse) as ?length)
# only keep palindromes
FILTER (example:IS_PALINDROME(?name))
}
GROUP BY ?length
HAVING (example:IS_EVEN(?length))
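Custom functions may also return several terms or signal failure. In the hypothetical functions below (the IRIs are illustrative), SPLIT_WORDS returns a generator that yields one literal per word, and FIRST_CHAR returns null to signal a failed evaluation. So that the sketch runs without sparql-engine, terms are plain objects with the RDF.js termType, value, language and datatype fields; in real code, prefer the rdf helpers shown earlier, since RDF.js terms also define an equals method that these plain objects lack.

```javascript
// Plain objects following the RDF.js data model (no equals() method, unlike real terms).
const XSD_STRING = 'http://www.w3.org/2001/XMLSchema#string'

function plainLiteral (value) {
  return { termType: 'Literal', value, language: '', datatype: { termType: 'NamedNode', value: XSD_STRING } }
}

const moreCustomFunctions = {
  // split a literal into words, e.g. "Neil Gaiman" -> "Neil", "Gaiman"
  'http://example.com#SPLIT_WORDS': function * (rdfTerm) {
    if (rdfTerm.termType !== 'Literal') return // yields nothing for IRIs and blank nodes
    for (const word of rdfTerm.value.split(/\s+/)) {
      if (word.length > 0) yield plainLiteral(word)
    }
  },
  // first character of a literal, or null (evaluation failed) for an empty string
  'http://example.com#FIRST_CHAR': function (rdfTerm) {
    if (rdfTerm.value.length === 0) return null
    return plainLiteral(rdfTerm.value[0])
  }
}
```

Such an object would be passed to the PlanBuilder constructor exactly like customFunctions above.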
Advanced usage
Customize the pipeline implementation
The class PipelineEngine
(and its subclasses) is the main component used by sparql-engine
to evaluate all SPARQL operations. It defines basic operations (map
, filter
, etc) that can be used
to manipulate intermediate results and evaluate SPARQL queries.
By default, the framework uses an implementation of PipelineEngine based on rxjs, to implement a SPARQL query execution plan as a pipeline of iterators.
However, you can switch to other implementations of PipelineEngine using Pipeline.setInstance.
const { Pipeline, PipelineEngine } = require('sparql-engine')
class CustomEngine extends PipelineEngine {
// ...
}
// add this before creating a new plan builder
Pipeline.setInstance(new CustomEngine())
// ...
Two implementations of PipelineEngine are provided by default:
- RxjsPipeline, based on rxjs, which provides a pure pipeline approach. This approach is selected by default when loading the framework.
- VectorPipeline, which materializes all intermediate results at each pipeline computation step. This approach is more efficient CPU-wise, but also consumes a lot more memory.
These implementations can be imported as follows:
const { RxjsPipeline, VectorPipeline } = require('sparql-engine')
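The trade-off between the two can be pictured with a toy comparison; neither function is sparql-engine code. The eager version builds a full intermediate array at every step, as VectorPipeline materializes intermediate results, which costs memory but runs tight array loops with little per-item overhead. The lazy version pulls one item at a time through all steps and, when a limit is reached, can stop pulling from its source early.

```javascript
// Illustration only: the same map -> filter -> limit chain, evaluated two ways.
function eagerChain (items, mapFn, filterFn, limit) {
  const mapped = items.map(mapFn) // full intermediate array
  const filtered = mapped.filter(filterFn) // another full intermediate array
  return filtered.slice(0, limit)
}

function * lazyChain (items, mapFn, filterFn, limit) {
  if (limit <= 0) return
  let emitted = 0
  for (const item of items) {
    const value = mapFn(item) // only one item is in flight at a time
    if (!filterFn(value)) continue
    yield value
    emitted++
    if (emitted >= limit) return // stop pulling from the source early
  }
}
```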
Customize query execution
A PlanBuilder
implements a Builder pattern in order to create a physical query execution plan for a given SPARQL query.
Internally, it defines stage builders to generate operators for executing all types of SPARQL operations.
For example, the OrderByStageBuilder
is invoked when the PlanBuilder
needs to evaluate an ORDER BY
modifier.
If you want to customize how query execution plans are built, you have to implement your own stage builders, by extending existing ones.
Then, you need to configure your plan builder to use them, with the use
function.
const { PlanBuilder, HashMapDataset, stages } = require('sparql-engine')
class MyOrderByStageBuilder extends stages.OrderByStageBuilder {
  /* Define your custom execution logic for ORDER BY */
}
const dataset = new HashMapDataset(/* ... */) // an RDF dataset
// Creates a plan builder for the RDF dataset
const builder = new PlanBuilder(dataset)
// Plug-in your custom stage builder
builder.use(stages.SPARQL_OPERATION.ORDER_BY, new MyOrderByStageBuilder(dataset))
// Now, execute SPARQL queries as before with your PlanBuilder
You will find below a reference table of all stage builders used by sparql-engine
to evaluate SPARQL queries. Please see the API documentation for more details.
Executors
| SPARQL Operation | Default Stage Builder | Symbol |
|------------------|-----------------------|--------|
| Aggregates | AggregateStageBuilder | SPARQL_OPERATION.AGGREGATE |
| Basic Graph Patterns | BGPStageBuilder | SPARQL_OPERATION.BGP |
| BIND | BindStageBuilder | SPARQL_OPERATION.BIND |
| DISTINCT | DistinctStageBuilder | SPARQL_OPERATION.DISTINCT |
| FILTER | FilterStageBuilder | SPARQL_OPERATION.FILTER |
| Property Paths | PathStageBuilder | SPARQL_OPERATION.PROPERTY_PATH |
| GRAPH | GraphStageBuilder | SPARQL_OPERATION.GRAPH |
| MINUS | MinusStageBuilder | SPARQL_OPERATION.MINUS |
| OPTIONAL | OptionalStageBuilder | SPARQL_OPERATION.OPTIONAL |
| ORDER BY | OrderByStageBuilder | SPARQL_OPERATION.ORDER_BY |
| SERVICE | ServiceStageBuilder | SPARQL_OPERATION.SERVICE |
| UNION | UnionStageBuilder | SPARQL_OPERATION.UNION |
| UPDATE | UpdateStageBuilder | SPARQL_OPERATION.UPDATE |
Documentation
To generate the documentation in the docs directory:
git clone https://github.com/Callidon/sparql-engine.git
cd sparql-engine
yarn install
npm run doc
Acknowledgments
This framework has been developed since 2018 by many contributors, and we thank them very much for their contributions to this project! Here is the full list of our amazing contributors.
- Corentin Marionneau (@Slaanaroth)
- Corentin created the first version of sparql-engine during his research internship at the Laboratoire des Sciences du Numérique de Nantes (LS2N). He is now a Web developer at SII Atlantique.
- Merlin Barzilai (@Rintarou)
- Merlin designed the first SPARQL compliance tests for the framework during a research internship at the LS2N.
- Dustin Whitney (@dwhitney)
- Dustin implemented the support for custom SPARQL functions and provided a lot of feedback during the early stages of development.
- Julien Aimonier-Davat (@Lastshot97)
- Julien implemented the support for SPARQL Property Paths evaluation during his research internship at the LS2N. He is now a Ph.D. student at the University of Nantes.
- Arnaud Grall (@folkvir)
- Arnaud contributed to many bugfixes and provided a lot of feedback throughout the development of the framework. He is now a Software Engineer at SII Atlantique.
- Thomas Minier (@Callidon)
- Thomas developed the framework during his PhD thesis in the Team "Gestion des Données Distribuées" (GDD) and has supervised its evolution ever since. He is now a Software Engineer at SII Atlantique.