@giancosta86/wiki-transform

v1.3.0

wiki-transform

Stream transforming raw XML into wiki pages

License: MIT

Overview

wiki-transform provides WikiTransform, a hybrid stream for Node.js: it takes XML chunks as input and outputs WikiPage objects.

It is designed to be very fast: internally, it combines a SAX parser with a deliberately minimal algorithm.

Last but not least, WikiTransform is a standard stream, so you can use it in pipelines, or you can manually control it via the usual stream methods.

Installation

npm install @giancosta86/wiki-transform

or

yarn add @giancosta86/wiki-transform

The public API entirely resides in the root package index, so you shouldn't reference specific modules.

Usage

Create a new instance of WikiTransform, optionally passing options to the constructor. You will then be able to:

  • add it to a pipeline, via a chain of .pipe() method calls or via the pipeline() function provided by Node.js

  • call its standard methods - like .write(), .end(), .on() and .once()
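The second style, manual control, can be sketched with the standard stream methods alone. The helper below is a minimal illustration, not part of the library: runManually is a hypothetical name, and a PassThrough is used as a stand-in so that the snippet runs without the package installed. It ignores the return value of .write(), so it is only suitable for small inputs.

```typescript
import { PassThrough, Transform } from "node:stream";

// Writes every chunk to the transform, ends it, and resolves with all
// the items it emitted. Works with any Transform, so it is not specific
// to this library. (Hypothetical helper, for illustration only.)
export function runManually<T>(
  transform: Transform,
  chunks: readonly string[]
): Promise<T[]> {
  return new Promise((resolve, reject) => {
    const items: T[] = [];

    transform.on("data", (item: T) => items.push(item));
    transform.once("end", () => resolve(items));
    transform.once("error", reject);

    for (const chunk of chunks) {
      transform.write(chunk);
    }
    transform.end();
  });
}

// Stand-in so that the sketch runs on its own; in real code, one would
// pass `new WikiTransform()` and receive WikiPage objects instead.
const standIn = new PassThrough({ encoding: "utf8" });

runManually<string>(standIn, ["<page>", "</page>"]).then((items) =>
  console.log(items.join(""))
);
```

For large inputs, the pipeline style is usually the safer choice, because it handles backpressure and error propagation automatically.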

Supported format

WikiTransform will create a WikiPage object whenever it encounters the following XML pattern:

<page>
  <title>The title</title>
  <text>The text</text>
</page>

with the following rules:

  • The order of the subfields is ignored

  • Additional subfields are ignored

  • Ancestor nodes are ignored

  • Whitespace is ignored

  • XML entities like &gt; are substituted with their actual characters

  • CDATA blocks within significant fields are correctly parsed, and can be freely mixed with non-CDATA text

  • Instead of <page>, the tag enclosing each page can be something else: pass its name (without angle brackets) to the pageTag constructor option

Please note: this library does NOT support nested tags within the <text> element. To handle them, rely on a dedicated SAX parser instead.
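To make the rules above more concrete, here is a hypothetical input that exercises most of them: ancestor nodes, an extra subfield, swapped subfield order, entities, and CDATA mixed with plain text. Going by those rules, it should yield two pages; the element names other than page, title and text are made up for the example. The toChunks helper (also hypothetical) splits the document at arbitrary offsets, mimicking chunks that arrive from a file or network stream.

```typescript
// Hypothetical input exercising the rules listed above.
export const sampleXml = `
<wiki>
  <meta>Ancestor and sibling nodes are skipped</meta>
  <pages>
    <page>
      <text>Subfields may appear in any order</text>
      <id>42</id>
      <title>First page</title>
    </page>
    <page>
      <title>Second &amp; last page</title>
      <text>Plain text, then <![CDATA[a <raw> block]]>, then 1 &gt; 0</text>
    </page>
  </pages>
</wiki>
`;

// Splits the XML into fixed-size pieces, regardless of tag boundaries.
export function toChunks(xml: string, size: number): string[] {
  const chunks: string[] = [];
  for (let i = 0; i < xml.length; i += size) {
    chunks.push(xml.slice(i, i + size));
  }
  return chunks;
}
```

Such chunks could then be written to a WikiTransform instance one by one, or wrapped with Readable.from() and used as the source of a pipeline.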

Example

This basic but fairly general-purpose function:

  • extracts wiki pages from any source stream that emits XML chunks, for example an HTTP connection or a file

  • outputs such WikiPage objects to the given target stream

import { Readable, Writable } from "node:stream";
import { pipeline } from "node:stream/promises";
import { WikiTransform } from "@giancosta86/wiki-transform";

export async function extractWikiPages(
  source: Readable,
  target: Writable
): Promise<void> {
  const wikiTransform = new WikiTransform();

  return pipeline(source, wikiTransform, target);
}
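Because the transform emits objects rather than bytes, the target stream has to be a Writable in object mode. One possible target, sketched below with the standard library only, simply collects whatever it receives into an array. createCollector is a hypothetical helper, and the demo uses plain objects in place of real WikiPage instances so that it runs without the package.

```typescript
import { Readable, Writable } from "node:stream";
import { pipeline } from "node:stream/promises";

// Creates an object-mode Writable that appends every received item to
// the returned array. (Hypothetical helper, for illustration only.)
export function createCollector<T>(): { target: Writable; items: T[] } {
  const items: T[] = [];

  const target = new Writable({
    objectMode: true,
    write(
      item: T,
      _encoding: BufferEncoding,
      callback: (error?: Error | null) => void
    ) {
      items.push(item);
      callback();
    }
  });

  return { target, items };
}

// Demo with plain objects standing in for WikiPage instances.
async function demo(): Promise<void> {
  const { target, items } = createCollector<{ title: string }>();
  await pipeline(Readable.from([{ title: "A" }, { title: "B" }]), target);
  console.log(items.length);
}

demo().catch(console.error);
```

With the library installed, the same target could be passed as the second argument of extractWikiPages(), with a file or HTTP response stream as the source.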

Constructor parameters

  • pageTag: if present, defines the tag opening each page, without angle brackets. Default: "page"

  • logger: an object implementing the Logger interface exported by unified-logging. Default: no logger

  • highWaterMark: if present, passed to the base constructor

  • signal: if present, passed to the base constructor
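Since highWaterMark and signal are forwarded to the base constructor, they should behave like the stock Node.js stream options of the same name. The sketch below shows that stock behavior on a plain PassThrough, used here as a stand-in so that the snippet runs without the package; the value 16 is arbitrary, and pageTag and logger are not covered.

```typescript
import { PassThrough } from "node:stream";

const controller = new AbortController();

// PassThrough is a stand-in: per the list above, WikiTransform forwards
// these two options to its base stream constructor.
const stream = new PassThrough({
  objectMode: true,
  highWaterMark: 16, // in object mode, counted in objects rather than bytes
  signal: controller.signal
});

// Aborting the signal destroys the stream and surfaces as an error event.
stream.once("error", (err: Error) => {
  console.log(err.name);
});

controller.abort();
```

In a pipeline, an aborted stream makes the whole pipeline reject, so the caller of a function like extractWikiPages() would see the cancellation as a rejected promise.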

Additional notes

As a convenience utility, especially for testing, the package also provides a wikiPageToXml() function, which converts a WikiPage to XML - using a CDATA block in every field.
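To give an idea of what "a CDATA block in every field" means, here is an independent sketch of such a serializer. It is not the library's implementation: PageLike and pageToXmlSketch are hypothetical names, the title and text fields are assumed from the XML pattern described earlier, and the exact output layout of wikiPageToXml() may differ.

```typescript
// Assumed shape; the library's actual WikiPage type may differ.
interface PageLike {
  title: string;
  text: string;
}

// A CDATA section cannot contain "]]>", so that sequence is split
// across two adjacent sections.
function cdata(value: string): string {
  const escaped = value.split("]]>").join("]]]]><![CDATA[>");
  return `<![CDATA[${escaped}]]>`;
}

// Serializes a page using one CDATA block per field.
export function pageToXmlSketch(page: PageLike): string {
  return (
    "<page>" +
    `<title>${cdata(page.title)}</title>` +
    `<text>${cdata(page.text)}</text>` +
    "</page>"
  );
}

console.log(pageToXmlSketch({ title: "Home", text: "1 < 2 & 3 > 2" }));
```

Wrapping fields in CDATA avoids having to escape characters such as < and &, which is convenient when generating test fixtures.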

Further reference

For additional examples, please consult the unit tests in the source code repository.