# @biblioteksentralen/xml-utils
XML parsing utils based on @xmldom/xmldom and xpath.
Usage example:
```ts
import fs from "node:fs";
import { parseXml } from "@biblioteksentralen/xml-utils";

const data = fs.readFileSync("test-fixtures/marcxchange-v1.xml", "utf-8");

const xml = parseXml(data, {
  namespaces: {
    marc: "info:lc/xmlns/marcxchange-v1",
  },
});

const recordIds = xml
  .elements("//marc:record") // returns XmlElement[]
  .map((record) => record.text("./marc:controlfield[@tag='001']"));

console.log(recordIds);
```
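Since the package is based on `@xmldom/xmldom` and `xpath`, the same query can also be expressed directly against those two packages. This is a minimal sketch for orientation, not the package's actual implementation, and it assumes the `data` string from the example above:

```ts
import { DOMParser } from "@xmldom/xmldom";
import xpath from "xpath";

// Parse the document and bind the marc prefix for XPath queries
const doc = new DOMParser().parseFromString(data, "text/xml");
const select = xpath.useNamespaces({ marc: "info:lc/xmlns/marcxchange-v1" });

// Roughly what the elements()/text() wrappers boil down to
const records = select("//marc:record", doc) as Node[];
const recordIds = records.map((record) =>
  select("string(./marc:controlfield[@tag='001'])", record)
);
```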
A helper method is also included for removing namespaces. It uses fairly simple regular expressions, so use it at your own risk: it should be safe for most documents, but is likely to fail in edge cases (a sketch of the approach, and one such edge case, follows the example below).
```ts
import { parseXml, stripNamespaces } from "@biblioteksentralen/xml-utils";

const xml = parseXml(stripNamespaces(data));

const recordIds = xml
  .elements("//record")
  .map((record) => record.text("./controlfield[@tag='001']"));
```
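To make the caveat concrete, here is a hypothetical regex-based stripper in the same spirit (not the package's actual code), along with the kind of input that trips this approach up:

```ts
// Hypothetical sketch of a regex-based namespace stripper (not the package's actual code)
const stripNs = (xml: string) =>
  xml
    .replace(/\sxmlns(:\w+)?="[^"]*"/g, "") // drop namespace declarations
    .replace(/(<\/?)\w+:/g, "$1"); // drop prefixes from tag names

// Fine for typical documents:
stripNs('<marc:record xmlns:marc="info:lc/xmlns/marcxchange-v1"/>');
// => '<record/>'

// ...but a regex cannot see context, so markup-like text inside CDATA is mangled too:
stripNs("<note><![CDATA[</marc:record> is an end tag]]></note>");
// => '<note><![CDATA[</record> is an end tag]]></note>'
```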
## Why not use fast-xml-parser (or something similar)?
- [fast-xml-parser](https://www.npmjs.com/package/fast-xml-parser) outputs nice and friendly JSON for simple XML documents, but only because it ignores attributes and namespaces and doesn't preserve ordering by default. It excels at converting XML documents that probably should have been JSON in the first place. It can be configured to keep attributes and namespaces and to preserve ordering, but the output then becomes quite verbose, since every element gets an extra level, and is no longer something you can easily infer a nice-looking JSON schema from.
- Any XML element can be repeated, and you cannot tell from the structure alone which ones are. fast-xml-parser solves this by guessing: the author field of a book with one author becomes an object, while the same field of a book with multiple authors becomes an array (see the sketch after this list). It can be configured to always parse specific fields as arrays, but it's hard to know whether your list is exhaustive without knowing the source really well.
- We don't need the "fast" part, since I/O will usually be the bottleneck and we're not doing anything time-sensitive.
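A small sketch of the guessing behaviour described above. The `XMLParser` class and its `isArray` option are fast-xml-parser's real API; the `author` field is just an assumed example document:

```ts
import { XMLParser } from "fast-xml-parser";

const oneAuthor = "<book><author>Ann</author></book>";
const twoAuthors = "<book><author>Ann</author><author>Bob</author></book>";

// Default behaviour: the output shape depends on how many authors the document happens to have
const parser = new XMLParser();
console.log(parser.parse(oneAuthor).book.author); // "Ann" – a plain value
console.log(parser.parse(twoAuthors).book.author); // ["Ann", "Bob"] – an array

// isArray forces a consistent shape, but you have to enumerate the repeatable fields yourself
const strictParser = new XMLParser({ isArray: (name) => name === "author" });
console.log(strictParser.parse(oneAuthor).book.author); // ["Ann"]
```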