@wikipathways/cxml
v0.2.14
Advanced schema-aware streaming XML parser
cxml
NOTE: the master branch of the source repo for this project did not compile, and it did not support XPath queries. This fork updates the code so that master compiles, extends the tests and adds XPath support.
cxml aims to be the most advanced schema-aware streaming XML parser for JavaScript and TypeScript.
It fully supports namespaces, derived types and substitution groups.
It can handle pretty hairy schema such as GML, WFS and extensions to them defined by INSPIRE.
Output is fully typed and structured according to the actual meaning of input data, as defined in the schema.
Introduction
For example this XML:
<dir name="123">
  <owner>me</owner>
  <file name="test" size="123">
    data
  </file>
</dir>
can become this JSON (run npm test to see it happen):
{
  "dir": {
    "name": "123",
    "owner": "me",
    "file": [
      {
        "name": "test",
        "size": 123,
        "content": "data"
      }
    ]
  }
}
Note the following:
- "123" can be a string or a number depending on the context.
- The name attribute and owner child element are represented in the same way.
- A dir has a single owner but can contain many files, so file is an array but owner is not.
- Output data types are as simple as possible while correctly representing the input.
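The points above can be restated as TypeScript types. This is only a sketch of the output shape for this one example: the names FileSketch and DirSketch are hypothetical, and the real typings are the ones cxsd generates from the schema.

```typescript
// Hypothetical type names, written by hand for illustration.
// The real types are generated from the schema by cxsd.
interface FileSketch {
  name: string;
  size: number;    // declared numeric in the schema, so "123" becomes 123
  content: string; // text content of the element
}

interface DirSketch {
  name: string;       // attribute, declared as a string, so "123" stays "123"
  owner: string;      // child element, represented just like an attribute
  file: FileSketch[]; // may occur many times, so it is an array
}

// The JSON from the example, checked against the sketched types.
const parsed: { dir: DirSketch } = {
  dir: {
    name: "123",
    owner: "me",
    file: [{ name: "test", size: 123, content: "data" }],
  },
};

console.log(typeof parsed.dir.name);                      // string
console.log(parsed.dir.file.map((f) => typeof f.size));   // [ 'number' ]
```

The same characters "123" end up as a string in one place and a number in another purely because of the declared types, which is the context dependence the first point refers to.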
See the example schema that makes it happen. Schemas for formats like GML and SVG are nastier, but you don't have to look at them to use them through cxml.
Relevant schema files should be downloaded and compiled using cxsd before using them to parse documents. Check out the example schema converted to TypeScript.
There's much more. What if we parse an empty dir:
import * as cxml from 'cxml';
import * as example from 'cxml/test/xmlns/dir-example';
var parser = new cxml.Parser();
var result = parser.parse('<dir name="empty"></dir>', example.document);
Now we can print the result and try some magical features:
result.then((doc: example.document) => {
  console.log( JSON.stringify(doc) ); // {"dir":{"name":"empty"}}
  var dir = doc.dir;
  console.log( dir instanceof example.document.dir.constructor ); // true
  console.log( dir instanceof example.document.file.constructor ); // false
  console.log( dir instanceof example.DirType ); // true
  console.log( dir instanceof example.FileType ); // false
  console.log( dir._exists ); // true
  console.log( dir.file[0]._exists ); // false (not an error!)
});
Unseen in the JSON output, every object is an instance of a constructor for the appropriate XSD schema type.
Its prototype also contains placeholders for valid children, which means you can refer to a.b.c.d._exists even if a.b doesn't exist.
This saves irrelevant checks when only the existence of a deeply nested item is interesting.
The magical _exists flag is true in the prototypes and false in the placeholder instances, so it consumes no memory per object.
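The placeholder idea can be imitated with plain prototype inheritance. The following is a minimal sketch and not cxml's source code: FileNode, DirNode, missingFile, dirPrototype and makeDir are hypothetical names, and the real parser may build its prototypes differently.

```typescript
// Illustrative sketch only; this is not how cxml is necessarily implemented.
interface FileNode {
  _exists: boolean;
  name?: string;
}

interface DirNode {
  _exists: boolean;
  name?: string;
  file: readonly FileNode[];
}

// A single shared placeholder carries its own `_exists = false`.
const missingFile: FileNode = Object.freeze({ _exists: false });

// The prototype holds `_exists = true` plus a placeholder for each valid child.
const dirPrototype: DirNode = {
  _exists: true,
  file: Object.freeze([missingFile]),
};

/** Creates a dir object that stores only the data it actually has. */
function makeDir(dirName: string): DirNode {
  const dir: DirNode = Object.create(dirPrototype);
  dir.name = dirName; // the only own property
  return dir;
}

const emptyDir = makeDir("empty");

console.log(emptyDir._exists);          // true, inherited from the prototype
console.log(emptyDir.file[0]._exists);  // false, read from the shared placeholder
console.log(Object.prototype.hasOwnProperty.call(emptyDir, "_exists")); // false
console.log(JSON.stringify(emptyDir));  // {"name":"empty"}
```

Because the flag and the placeholders live on the prototype, each parsed object only stores its real data, and JSON.stringify skips the inherited members.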
We can also process data as soon as the parser sees it in the incoming stream:
parser.attach(class DirHandler extends (example.document.dir.constructor) {
  /** Fires when the opening <dir> and attributes have been parsed. */
  _before() {
    console.log('Before ' + this.name + ': ' + JSON.stringify(this));
  }
  /** Fires when the closing </dir> and children have been parsed. */
  _after() {
    console.log('After ' + this.name + ': ' + JSON.stringify(this));
  }
});
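To make the timing of the two hooks concrete, here is a small self-contained sketch that replays a hand-written list of open and close events. It does not use cxml at all: StreamEvent, ElementHooks and runHooks are hypothetical names, and the event list stands in for whatever a real streaming tokenizer would emit.

```typescript
// Hypothetical event stream standing in for a real streaming tokenizer.
type StreamEvent =
  | { kind: "open"; tag: string; attrs: Record<string, string> }
  | { kind: "close"; tag: string };

interface ElementHooks {
  before?(element: Record<string, unknown>): void;
  after?(element: Record<string, unknown>): void;
}

/** Replays events, calling `before` after an opening tag and `after` at the closing tag. */
function runHooks(events: StreamEvent[], hooks: Record<string, ElementHooks>): void {
  const stack: Record<string, unknown>[] = [];
  for (const event of events) {
    if (event.kind === "open") {
      // Only attributes are known at this point.
      const element: Record<string, unknown> = { ...event.attrs };
      const parentElement = stack[stack.length - 1];
      if (parentElement) parentElement[event.tag] = element;
      stack.push(element);
      hooks[event.tag]?.before?.(element);
    } else {
      // All children have been attached by the time the element closes.
      const element = stack.pop();
      if (element) hooks[event.tag]?.after?.(element);
    }
  }
}

const hookLog: string[] = [];
runHooks(
  [
    { kind: "open", tag: "dir", attrs: { name: "123" } },
    { kind: "open", tag: "owner", attrs: {} },
    { kind: "close", tag: "owner" },
    { kind: "close", tag: "dir" },
  ],
  {
    dir: {
      before: (el) => hookLog.push("Before: " + JSON.stringify(el)),
      after: (el) => hookLog.push("After: " + JSON.stringify(el)),
    },
  },
);

console.log(hookLog.join("\n"));
// Before: {"name":"123"}
// After: {"name":"123","owner":{}}
```

The first hook sees only the attributes, while the second sees the children as well, which mirrors the difference between _before and _after described in the handler comments.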
The best part: your code is fully typed with comments pulled from the schema! See the screenshot at the top.
Related projects
- node-xml4js uses schema information to read XML into nicely structured objects.
License
Copyright (c) 2016-2017 BusFaster Ltd