LDTR
A Linked Data (as in RDF) Transcriber.
Try out the Demo Web Application for visualization, format conversion and light editing.
Install the NPM Package.
Check out the Source Code on GitHub.
About
LDTR turns various representations of RDF into JSON-LD.
LDTR works by transcribing the input syntax verbatim into a valid JSON-LD structure that represents the same RDF. Only use this output directly when the input is strictly under your control. For any other case, LDTR includes a JSON-LD expansion implementation that turns the data into a normalized, fully predictable RDF representation. Use that if you want to use LDTR for general RDF data processing.
This tool strives to be usable in the browser, on the command line and on the server (mainly Node.js). It is built using modern JS and ES modules, along with some minimal provisioning for different runtimes.
Input Formats
- TriG (the parser is generated from the W3C TriG grammar, which also covers Turtle and N-Triples).
- RDFa 1.1 embedded in HTML.
- RDF/XML (all the old RDF/XML out there on the web).
- JSON-LD itself: while JSON-LD is the output format, this tool can of course either pass data through verbatim, or expand it and act upon that.
- Experimental support for RDF-star, expressed as TriG-star and JSON-LD-star. For RDF/XML, the asserted reification shorthand using rdf:ID is mapped directly to RDF-star annotations; this goes beyond any spec (see the illustration after this list).
- Experimental support for named graphs in RDF/XML.
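As a rough illustration of that rdf:ID mapping (example data only, not verbatim LDTR output), an RDF/XML statement reified via rdf:ID:

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ex="http://example.org/ns#">
  <rdf:Description rdf:about="http://example.org/book">
    <ex:author rdf:ID="claim1" rdf:resource="http://example.org/alice"/>
  </rdf:Description>
  <rdf:Description rdf:about="#claim1">
    <ex:certainty>high</ex:certainty>
  </rdf:Description>
</rdf:RDF>

corresponds, in TriG-star annotation syntax, to something like:

PREFIX ex: <http://example.org/ns#>
<http://example.org/book> ex:author <http://example.org/alice> {| ex:certainty "high" |} .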
Output Forms and Formats
For flexible compact JSON-LD you can, and often should, also pass the result through a JSON-LD processor (such as jsonld.js) in order to have control over the shapes and terms of the results.
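For example, a minimal sketch of reshaping LDTR output with jsonld.js (the context below is purely illustrative):

import * as ldtr from 'ldtr'
import jsonld from 'jsonld'

// Parse any supported input into LDTR's JSON-LD transcript
const transcript = await ldtr.read('some-data.trig')

// Compact it against a context of your own choosing
const context = { '@vocab': 'http://example.org/ns#' }
const compacted = await jsonld.compact(transcript, context)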
While primarily designed to produce a native data structure representable as JSON-LD, LDTR also includes serializers for:
- TriG
- RDF/XML (including named graphs and RDF-star annotations)
These work similarly to the parsers, by transcribing the JSON-LD as directly as possible. This "shortcut" imposes some restrictions on the data, depending on which compaction features of JSON-LD the output format can faithfully transcribe. Regular "URI-to-PName" term compaction is guaranteed to work; this is the form of compact JSON-LD that the parsers output, representing a kind of common intersection between these formats.
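In concrete terms, that guaranteed form is plain prefix compaction, i.e. JSON-LD along these lines (illustrative data, not output from a specific run):

{
  "@context": {"ex": "http://example.org/ns#"},
  "@id": "ex:book",
  "ex:title": "Some Title",
  "ex:author": {"@id": "ex:alice"}
}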
Install
$ npm install ldtr
Command Line Usage
Examples:
$ ldtr RDF_FILE_OR_URL
$ cat TURTLE_FILE | ldtr
$ cat RDFA_FILE | ldtr -t html
$ ldtr RDF_FILE_OR_URL -o trig
CLI options:
$ ldtr -h
Usage: ldtr [options] [arguments]
Options:
  -t, --type TYPE           Media type or file suffix
  -b, --base BASE           Base URL if different from input URL
  -e, --expand              Expand JSON-LD
  -i, --index               Index on keys, types and reverses
  -p, --pattern             Use RDFa pattern copying
  -o, --output OUTPUT       Media type or file suffix
  --max-redirects NUMBER
  -v, --verbose
  -h, --help
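For instance, the -e flag performs the expansion step described under About, and flags can be combined (input names are placeholders, as above):

$ ldtr RDF_FILE_OR_URL -e
$ cat RDFA_FILE | ldtr -t html -o trig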
Library Usage
Use the top level interface: read and write, with input data, and optionally a media type if it isn't "obvious" (e.g. a DOM Document, a URL or a file with a common suffix).
For text-based formats, the input is expected to be a regular string. For XML- and HTML-based formats, the input can also be a DOM Document. (Any W3C XML DOM Level 2 Core compliant DOMParser and XMLSerializer will do.)
In a browser, you can use the internals by themselves. See the demo web application for an example of how to do that.
Parsing:
import * as ldtr from 'ldtr'
let data
// Guess type by suffix
data = await ldtr.read('some-data.trig')
// Supply file path and type
data = await ldtr.read('some-data.trig', 'application/trig')
// Supply URL and use response content-type
data = await ldtr.read('http://www.w3.org/1999/02/22-rdf-syntax-ns')
// Supply URL and type
data = await ldtr.read('http://example.org', 'application/trig')
// Supply data and type
data = await ldtr.read({ data: '<a> :b "c" .', type: 'text/turtle' })
// Parse RDF/XML from a DOM Document
let doc = new DOMParser().parseFromString(rdfStr, 'text/xml')
data = await ldtr.read({ data: doc })
// Parse RDFa from a DOM Document
doc = new DOMParser().parseFromString(rdfStr, 'text/html')
data = await ldtr.read({ data: doc })
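Serializing (only a hedged sketch: the write signature is assumed here to mirror read with an output media type, in line with the CLI's -o option, and may differ in practice):

// data as parsed in the examples above; assumed signature
let trigText = await ldtr.write(data, 'application/trig')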
Internals
The TriG parser is generated from a grammar file (based on the TriG W3C EBNF Grammar) using PEG.js.
By default on Node (e.g. when using the CLI), LDTR uses xmldom for HTML and XML parsing.
(Caveat: Internal XML entity declarations are not handled by xmldom yet.)
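On Node there is no built-in DOMParser, so LDTR falls back to xmldom internally; you can do the same if you want to hand it a pre-parsed document. A small sketch, assuming xmldom's exported DOMParser (rdfaHtmlStr is a placeholder for your HTML string):

import * as ldtr from 'ldtr'
import { DOMParser } from 'xmldom'

let doc = new DOMParser().parseFromString(rdfaHtmlStr, 'text/html')
let data = await ldtr.read({ data: doc })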
Rationale
RDF is about meaning, not structure. Of course, meaning is always – indirectly but intrinsically – conveyed by structure. And if the structure is yours to begin with, you can leverage its shape directly for speed and convenience. As such, the practice of using JSON-LD as plain JSON is a bit like using C: very effective and close to the metal, but rather dangerous if you don't know what you're doing.
Up to a point, this tool can be used as a teaching aid, for showing the isomorphisms between different RDF serializations. Note that prefix mechanisms (QNames/CURIEs/PNames) are basically only useful in RDF syntaxes for humans, when reading and writing data directly.
Crucially, they are not intended to be handled directly (syntactically, from the source) in code. Thus, by producing a JSON-LD compliant semi-compact transcript like LDTR does, consumers who are unaware of what the tokens really mean (in RDF) may be misled into considering them fixed and atomic, instead of the locally defined shorthand forms they really are. This is why this form can only be trusted when you are in control of the source data. When you are, however, the compact form can keep both data and code fairly succinct, which may benefit certain applications. You do trade away general RDF processing by doing so, though. It's a matter of tradeoffs.