rdf-stores
v1.0.0
Published
A TypeScript/JavaScript implementation of the RDF/JS store interface with support for quoted triples.
Downloads
16,852
Maintainers
Readme
RDF Stores
This package provides an in-memory triple/quad store with triple/quad pattern access. It allows you to configure indexes to tune performance for specific cases. It works in both JavaScript and TypeScript.
Main features:
- 🧠 In-memory indexing
- ⚙️ Full configurability of indexes and dictionaries
- 🔮 Quoted triples support (RDF-star / RDF 1.2)
- 🚀 Highly performant: Fastest JavaScript store in terms of query speed
- ✅ Extensively tested (39.331 unit tests)
- 👥 Implements the RDF/JS Store and RDF/JS DatasetCore interfaces
If using TypeScript, it is recommended to use this in conjunction with @rdfjs/types
.
Installation
$ npm install rdf-stores
or
$ yarn add rdf-stores
This package also works out-of-the-box in browsers via tools such as webpack and browserify.
Quick start
The example below shows how to create a new store with default settings, adding two quads, and querying it.
import { RdfStore } from 'rdf-stores';
import { DataFactory } from 'rdf-data-factory';
// Create a new store with default settings
const store = RdfStore.createDefault();
// Ingest manually defined data
const DF = new DataFactory();
store.addQuad(
DF.quad(DF.namedNode('ex:s1'), DF.namedNode('ex:p1'), DF.namedNode('ex:o1')),
);
store.addQuad(
DF.quad(DF.namedNode('ex:s1'), DF.namedNode('ex:p2'), DF.namedNode('ex:o2')),
);
// Find data matching '<ex:s1> ?p ?o'
const stream = store.match(DF.namedNode('ex:s1'), undefined, undefined);
stream.on('data', (quad) => {
console.log(quad);
});
stream.on('end', () => {
console.log('Done!');
});
// Interacting with the store as a DatasetCore object
const dataset = store.asDataset();
console.log(dataset.size);
dataset.add(DF.quad(DF.namedNode('ex:s1'), DF.namedNode('ex:p1'), DF.namedNode('ex:o1')));
console.log(dataset.has(DF.quad(DF.namedNode('ex:s1'), DF.namedNode('ex:p1'), DF.namedNode('ex:o1'))));
Usage
All public getters and methods of an RdfStore
are illustrated below.
The examples assume the following imports and objects:
import { DataFactory } from 'rdf-data-factory';
const streamifyArray = require('streamify-array');
const DF = new DataFactory();
size
Determining the number of (asserted) quads inside the store:
console.log(store.size);
addQuad
Adding a quad to the store:
store.addQuad(
DF.quad(DF.namedNode('ex:s1'), DF.namedNode('ex:p1'), DF.namedNode('ex:o1')),
);
removeQuad
Removing a quad from the store:
store.removeQuad(
DF.quad(DF.namedNode('ex:s1'), DF.namedNode('ex:p1'), DF.namedNode('ex:o1')),
);
remove
Remove a stream of quads from the store:
const result = store.remove(streamifyArray([
DF.quad(DF.namedNode('ex:s1'), DF.namedNode('ex:p1'), DF.namedNode('ex:o1')),
DF.quad(DF.namedNode('ex:s1'), DF.namedNode('ex:p1'), DF.namedNode('ex:o2')),
]));
result.on('end', () => {
console.log('Done!');
});
removeMatches
Remove all quads matching the given quad pattern from the store:
const result = store.remove(DF.namedNode('ex:s1'), undefined, DF.namedNode('ex:o1'), undefined);
result.on('end', () => {
console.log('Done!');
});
deleteGraph
Remove all quads with the given graph element from the store:
const result = store.deleteGraph('ex:g1');
result.on('end', () => {
console.log('Done!');
});
import
Add a stream of quads into the store:
const result = store.import(streamifyArray([
DF.quad(DF.namedNode('ex:s1'), DF.namedNode('ex:p1'), DF.namedNode('ex:o1')),
DF.quad(DF.namedNode('ex:s1'), DF.namedNode('ex:p1'), DF.namedNode('ex:o2')),
]));
result.on('end', () => {
console.log('Done!');
});
readQuads
Returns an iterable iterator producing all quads matching the given pattern:
for (const quad of store.readQuads(DF.namedNode('ex:s1'), undefined, DF.namedNode('ex:o1'), undefined)) {
console.log(quad);
}
getQuads
Returns an array containing all quads matching the given pattern:
const array = store.getQuads(DF.namedNode('ex:s1'), undefined, DF.namedNode('ex:o1'), undefined)''
console.log(array);
match
Returns a stream producing all quads matching the given pattern:
const stream = store.match(DF.namedNode('ex:s1'), undefined, DF.namedNode('ex:o1'), undefined);
stream.on('data', (quad) => {
console.log(quad);
});
stream.on('end', () => {
console.log('Done!');
});
countQuads
Count the number of quads matching the given pattern:
const count = store.countQuads(DF.namedNode('ex:s1'), undefined, DF.namedNode('ex:o1'), undefined);
asDataset
Interact with this store using the RDF/JS DatasetCore
interface.
const dataset = store.asDataset();
console.log(dataset.size);
dataset.add(DF.quad(DF.namedNode('ex:s1'), DF.namedNode('ex:p1'), DF.namedNode('ex:o1')));
console.log(dataset.has(DF.quad(DF.namedNode('ex:s1'), DF.namedNode('ex:p1'), DF.namedNode('ex:o1'))));
Configuring a store
Instead of using the default settings, you may optionally decide to configure the following aspects of a store:
- Index combinations: In what orders quads should be stored, which will determine storage size, and query efficiency.
- Index type: The type of index datastructure that will be used for each index combination.
- Dictionary: The dictionary that will be used for encoding RDF terms.
- Data Factory: The RDF/JS data factory for creating quads and terms.
Below, you can learn more about each of these aspects.
Default settings
When creating a new store using RdfStore.createDefault()
,
a store with the following settings will be created:
- Index combinations:
GSPO
,GPOS
,GOSP
. - Index type:
RdfStoreIndexNestedRecord
- Dictionary:
TermDictionaryQuotedIndexed
withTermDictionaryNumberRecordFullTerms
. - Data factory:
DataFactory
fromrdf-data-factory
.
These default settings correspond to the following invocation:
const store = new RdfStore<number>({
indexCombinations: [
[ 'graph', 'subject', 'predicate', 'object' ],
[ 'graph', 'predicate', 'object', 'subject' ],
[ 'graph', 'object', 'subject', 'predicate' ],
],
indexConstructor: subOptions => new RdfStoreIndexNestedMapQuoted(subOptions),
dictionary: new TermDictionaryQuotedIndexed(new TermDictionaryNumberRecordFullTerms()),
dataFactory: new DataFactory(),
});
Note: These default settings are considered the "best" for average usage. It is possible that future updates may tweak these default settings. Therefore, if you want more predictable performance across updates, it may be safer to manually configure your store.
Index combinations
The indexCombinations
option inside the RdfStore
constructor allows you
to configure in what orders quads should be stored.
The value of this option must always be an array containing one or more representations of quad component orders,
where each order must always contain the following 4 elements in any order:
'subject'
, 'predicate'
, 'object'
, 'graph'
.
For example, the following will store all triples in a single index using GSPO
order:
{
indexCombinations: [
[ 'graph', 'subject', 'predicate', 'object' ],
]
}
The following will contain 2 indexes, the first in GPOS
order, and the second in GOSP
order:
{
indexCombinations: [
[ 'graph', 'predicate', 'object', 'subject' ],
[ 'graph', 'object', 'subject', 'predicate' ],
]
}
These indexes enable a trade-off between storage size and query performance.
The more indexes, the higher the storage requirements, but the faster query performance.
Therefore, if memory is limited, it is better to pick fewer (at least one) indexes,
but if query performance is more important, then more indexes could be configured.
If the order of the returned triples is not important, then the default index combinations
(GSPO
, GPOS
, GOSP
) should provide sufficient level of performance,
as all triple pattern queries can efficiently be resolved using these indexes.
Index types
This library implements different approaches for storing indexes.
RdfStoreIndexNestedRecord
: Stores quads inside nestedRecord
objects. (Fastest ingestion)RdfStoreIndexNestedRecordQuoted
: Stores quads inside nestedRecord
objects, and supports quoted triples.RdfStoreIndexNestedMap
: Stores quads inside nestedMap
objects. (Fastest querying)RdfStoreIndexNestedMapQuoted
: Stores quads inside nestedMap
objects, and supports quoted triples. (Fastest querying and ingestion for quoted triples)
The following types also exist, but are mainly for illustration purposes, as they are always outperformed by other approaches:
RdfStoreIndexNestedMapRecursive
: Stores quads inside nestedMap
objects, and traverses the tree using recursive methods.RdfStoreIndexNestedMapRecursiveQuoted
: Stores quads inside nestedMap
objects, supports quoted triples, and traverses the tree using recursive methods.
Different JavaScript engine implementations may lead to different levels of performance across these index types.
For example, the following will use RdfStoreIndexNestedRecord
for all indexes:
{
indexConstructor: subOptions => new RdfStoreIndexNestedRecord(subOptions)
}
Dictionaries
This library implements different approaches for dictionary encoding.
TermDictionaryNumberMap
: Encodes stringified representations of terms tonumber
usingMap
objects.TermDictionaryNumberRecord
: Encodes stringified representations of terms tonumber
usingRecord
objects.TermDictionaryNumberRecordFullTerms
: Encodes stringified representations of terms tonumber
usingRecord
objects, but keeps track of original term objects during decoding. (Fastest when not requiring quoted triples)TermDictionaryQuoted
: Delegates quoted triples and other RDF terms to separate dictionaries.TermDictionaryQuotedIndexed
: Stores quoted triples inside an index structure, and other RDF terms using a separate dictionary. (Fastest when requiring quoted triples)TermDictionaryQuotedReferential
: Delegates quoted triples and other RDF terms to separate dictionaries, but terms inside quoted triples are stored in the plain terms dictionary.TermDictionarySymbol
: Encodes stringified representations of terms toSymbol
usingMap
objects.
For example, the following will use TermDictionaryNumberRecordFullTerms
:
{
dictionary: new TermDictionaryNumberRecordFullTerms()
}
For example, the following will use TermDictionaryQuotedIndexed
with a TermDictionaryNumberRecordFullTerms
for non-quoted-triple terms:
{
dictionary: new TermDictionaryQuotedIndexed(new TermDictionaryNumberRecordFullTerms())
}
Data Factory
When terms are decoded from indexes, a dictionary is used to construct terms and quads. Any RDF/JS data factory implementation can be used for this.
Performance
Experimental results show the following:
- A single
RdfStoreIndexNestedRecord
inGSPO
order withTermDictionaryNumberRecordFullTerms
achieves similar ingestion speeds asN3Store
. - Storing multiple indexes improves query performance, at the cost of slower ingestion.
RdfStoreIndexNestedMap
outperformsRdfStoreIndexNestedRecord
andN3Store
on query performance.TermDictionaryNumberRecordFullTerms
is generally the most efficient dictionary implementation, and it can be used in combination withTermDictionaryQuotedIndexed
if quoted triples are to be used.RdfStoreIndexNestedMapQuoted
andRdfStoreIndexNestedRecordQuoted
have a small overhead (~10%) on ingestion and query performance compared to their non-quoted index variants.
These conclusions are draw from the measurements of the command node perf/run.js -d 128 -o
(part of this repository):
# N3Store
- Adding 2097152 triples to the default graph: 1.676s
* Memory usage for triples: 141MB
- Finding all 2097152 triples in the default graph 2097152 times (0 variables): 5.523s
- Finding all 2097152 triples in the default graph 32768 times (1 variable): 1.114s
- Finding all 2097152 triples in the default graph 384 times (2 variables): 1.057s
- Finding all 2097152 triples in the default graph 32768 times (1 variable) via a stream: 1.745s
- Counting all 2097152 triples in the default graph 32768 times (1 variable): 194.804ms
- Adding 1048576 quads: 1.126s
* Memory usage for quads: 180MB
- Finding all 1048576 quads 131072 times: 724.579ms
# 3 Map indexes (number) OPT-QUERY
- Adding 2097152 triples to the default graph: 3.247s
* Memory usage for triples: 277MB
- Finding all 2097152 triples in the default graph 2097152 times (0 variables): 3.267s
- Finding all 2097152 triples in the default graph 32768 times (1 variable): 1.037s
- Finding all 2097152 triples in the default graph 384 times (2 variables): 1.032s
- Finding all 2097152 triples in the default graph 32768 times (1 variable) via a stream: 1.365s
- Counting all 2097152 triples in the default graph 32768 times (1 variable): 96.452ms
- Adding 1048576 quads: 2.044s
* Memory usage for quads: 366MB
- Finding all 1048576 quads 131072 times: 828.931ms
- Adding 262144 quoted triples: 563.647ms
* Memory usage for quoted triples: 448MB
- Finding all 262144 quoted triples 192 times: 9.132s
# 3 Record indexes (number) OPT-INGEST
- Adding 2097152 triples to the default graph: 2.148s
* Memory usage for triples: 262MB
- Finding all 2097152 triples in the default graph 2097152 times (0 variables): 3.098s
- Finding all 2097152 triples in the default graph 32768 times (1 variable): 1.016s
- Finding all 2097152 triples in the default graph 384 times (2 variables): 1.062s
- Finding all 2097152 triples in the default graph 32768 times (1 variable) via a stream: 1.513s
- Counting all 2097152 triples in the default graph 32768 times (1 variable): 165.439ms
- Adding 1048576 quads: 1.501s
* Memory usage for quads: 181MB
- Finding all 1048576 quads 131072 times: 907.828ms
- Adding 262144 quoted triples: 1.213s
* Memory usage for quoted triples: 668MB
- Finding all 262144 quoted triples 192 times: 9.048s
# 1 Map indexes (number) OPT-QUERY
- Adding 2097152 triples to the default graph: 1.966s
* Memory usage for triples: 451MB
- Finding all 2097152 triples in the default graph 2097152 times (0 variables): 3.560s
- Finding all 2097152 triples in the default graph 32768 times (1 variable): 1.463s
- Finding all 2097152 triples in the default graph 384 times (2 variables): 2.019s
- Finding all 2097152 triples in the default graph 32768 times (1 variable) via a stream: 2.026s
- Counting all 2097152 triples in the default graph 32768 times (1 variable): 382.02ms
- Adding 1048576 quads: 1.099s
* Memory usage for quads: 454MB
- Finding all 1048576 quads 131072 times: 1.174s
- Adding 262144 quoted triples: 344.364ms
* Memory usage for quoted triples: 309MB
- Finding all 262144 quoted triples 192 times: 8.755s
# 1 Record indexes (number) OPT-INGEST
- Adding 2097152 triples to the default graph: 1.801s
* Memory usage for triples: 727MB
- Finding all 2097152 triples in the default graph 2097152 times (0 variables): 3.502s
- Finding all 2097152 triples in the default graph 32768 times (1 variable): 1.733s
- Finding all 2097152 triples in the default graph 384 times (2 variables): 1.491s
- Finding all 2097152 triples in the default graph 32768 times (1 variable) via a stream: 2.851s
- Counting all 2097152 triples in the default graph 32768 times (1 variable): 240.031ms
- Adding 1048576 quads: 1.068s
* Memory usage for quads: 634MB
- Finding all 1048576 quads 131072 times: 1.386s
- Adding 262144 quoted triples: 414.742ms
* Memory usage for quoted triples: 641MB
- Finding all 262144 quoted triples 192 times: 10.448s
# 3 Nested Map Quoted indexes with indexed quoted dict (number) OPT-QUERY
- Adding 2097152 triples to the default graph: 3.309s
* Memory usage for triples: 829MB
- Finding all 2097152 triples in the default graph 2097152 times (0 variables): 4.469s
- Finding all 2097152 triples in the default graph 32768 times (1 variable): 1.071s
- Finding all 2097152 triples in the default graph 384 times (2 variables): 1.057s
- Finding all 2097152 triples in the default graph 32768 times (1 variable) via a stream: 1.494s
- Counting all 2097152 triples in the default graph 32768 times (1 variable): 110.942ms
- Adding 1048576 quads: 1.952s
* Memory usage for quads: 881MB
- Finding all 1048576 quads 131072 times: 856.267ms
- Adding 262144 quoted triples: 663.763ms
* Memory usage for quoted triples: 1046MB
- Finding all 262144 quoted triples 192 times: 252.158ms
# 3 Nested Record Quoted indexes with indexed quoted dict (number) OPT-INGEST
- Adding 2097152 triples to the default graph: 2.338s
* Memory usage for triples: 188MB
- Finding all 2097152 triples in the default graph 2097152 times (0 variables): 3.840s
- Finding all 2097152 triples in the default graph 32768 times (1 variable): 1.022s
- Finding all 2097152 triples in the default graph 384 times (2 variables): 1.116s
- Finding all 2097152 triples in the default graph 32768 times (1 variable) via a stream: 1.743s
- Counting all 2097152 triples in the default graph 32768 times (1 variable): 190.12ms
- Adding 1048576 quads: 1.607s
* Memory usage for quads: 239MB
- Finding all 1048576 quads 131072 times: 984.885ms
- Adding 262144 quoted triples: 1.713s
* Memory usage for quoted triples: 670MB
- Finding all 262144 quoted triples 192 times: 321.245ms
Note that memory usage measurements are inaccurate due to all stores running in the same process, and no garbage collection occurring.
License
This software is written by Ruben Taelman.
This code is released under the MIT license.