union-replacer

v2.0.1

One-pass String.prototype.replace-like processor with multiple regexps and replacements

Downloads

128,299

Readme

UnionReplacer

UnionReplacer provides one-pass global search and replace functionality using multiple regular expressions and corresponding replacements. Otherwise the behavior matches String.prototype.replace(regexp, newSubstr|function).

Installation and usage

In browsers:

<script src="https://unpkg.com/union-replacer/dist/union-replacer.umd.js"></script>

Using npm:

npm install union-replacer

In Node.js:

const UnionReplacer = require('union-replacer');

With TypeScript:

// with esModuleInterop enabled in tsconfig (recommended):
import UnionReplacer from 'union-replacer';
// without esModuleInterop enabled in tsconfig:
import * as UnionReplacer from 'union-replacer';
// regardless of the esModuleInterop setting:
import UnionReplacer = require('union-replacer');

Synopsis

replacer = new UnionReplacer(replace_pairs, [flags])
newStr = replacer.replace(str)

Parameters

  • replace_pairs: array of [regexp, replacement] arrays, where replacement is a string or a function, as in String.prototype.replace(regexp, newSubstr|function).
  • flags: regular expression flags to be set on the main underlying regexp, defaults to gm.

API updates

  • v2.0 removes the addReplacement() method, see #4 for details.
  • v2.0 introduces TypeScript type definitions along with precise JSDoc type definitions.

Examples

Convenient one-pass escaping of HTML special chars

const htmlEscapes = [
  [/</, '&lt;'],
  [/>/, '&gt;'],
  [/"/, '&quot;'],

  // not affected by the previous replacements producing '&'
  [/&/, '&amp;']
];
const htmlEscaper = new UnionReplacer(htmlEscapes);
const toBeHtmlEscaped = '<script>alert("inject & control")</script>';
console.log(htmlEscaper.replace(toBeHtmlEscaped));

Output:

&lt;script&gt;alert(&quot;inject &amp; control&quot;)&lt;/script&gt;

Simple Markdown highlighter

Highlighting Markdown special characters while preserving code blocks and spans. Only a subset of Markdown syntax is supported for simplicity.

const mdHighlighter = new UnionReplacer([

  // opening fence = at least three backticks
  // closing fence = opening fence or longer
  // regexp backreferences are ideal to match this
  // (?![\s\S]) matches only at the very end of the input (JavaScript has no \Z anchor)
  [/^(`{3,}).*\n([\s\S]*?)(^\1`*\s*?$|(?![\s\S]))/, (match, fence1, pre, fence2) => {
    let block = `<b>${fence1}</b><br />\n`
    block += `<pre>${htmlEscaper.replace(pre)}</pre><br />\n`
    block += `<b>${fence2}</b>`
    return block;
  }],

  // Code spans are delimited by two same-length backtick strings.
  // Note that backreferences within the regexp are numbered as usual,
  // i.e. \1 still means first capturing group.
  // Union replacer renumbers them when composing the final internal regexp.
  [/(^|[^`])(`+)(?!`)(.*?[^`]\2)(?!`)/, (match, lead, delim, code) => {
    return `${htmlEscaper.replace(lead)}<code>${htmlEscaper.replace(code)}</code>`
  }],

  // Subsequent replaces are performed only outside code blocks and spans.
  [/[*~=+_-`]+/, '<b>$&</b>'],
  [/\n/, '<br />\n']

// HTML entity-like strings would be interpreted too
].concat(htmlEscapes));

const toBeMarkdownHighlighted = '\
**Markdown** code to be "highlighted"\n\
with special care to fenced code blocks:\n\
````\n\
_Markdown_ within fenced code blocks is not *processed*:\n\
```\n\
Even embedded "fence strings" work well with **UnionEscaper**\n\
```\n\
````\n\
*CommonMark is sweet & cool.*';

console.log(mdHighlighter.replace(toBeMarkdownHighlighted));

Produces:

<b>**</b>Markdown<b>**</b> code to be &quot;highlighted&quot;<br />
with special care to fenced code blocks:<br />
<b>````</b><br />
<pre>_Markdown_ within fenced code blocks is not *processed*:
```
Even embedded &quot;fence strings&quot; work well with **UnionEscaper**
```
</pre><br />
<b>````</b><br />
<b>*</b>CommonMark is sweet &amp; cool.<b>*</b>

Conservative markdown escaping

The code below escapes text so that special Markdown sequences are protected from being interpreted. Two considerations apply:

  1. Avoid cluttering the output with too many unnecessary escapes.
  2. GFM autolinks are a special case, as escaping the special chars in them would cripple the rendered result. We need to detect them and keep them untouched.

const mdEscaper = new UnionReplacer([

  // Keep urls untouched (simplified for demonstration purposes).
  // The same should apply for GFM email autolinks.
  [/\bhttps?:\/\/(?!\.)(?:\.?[\w-]+)+(?:[^\s<]*?)(?=[?!.,:*~]*(?:\s|$))/, '$&'],

  // global backslash escapes
  [/[\\*_[\]`&<>]/, '\\$&'],

  // backslash-escape at line start
  [/^(?:~~~|=+)/, '\\$&'],

  // strike-through w/o lookbehinds
  [/~+/, m => m.length == 2 ? `\\${m}` : m],

  // backslash-escape at line start if followed by space
  [/^(?:[-+]|#{1,6})(?=\s)/, '\\$&'],

  // backslash-escape the dot to suppress ordered list
  [/^(\d+)\.(?=\s)/, '$1\\. ']
]);

const toBeMarkdownEscaped = '\
A five-*starred* escaper:\n\
1. Would preserve _underscored_ in the http://example.com/_underscored_/ URL.\n\
2. Would also preserve backslashes (\\) in http://example.com/\\_underscored\\_/.';

console.log(mdEscaper.replace(toBeMarkdownEscaped));

Produces:

A five-\*starred\* escaper:
1\.  Would preserve \_underscored\_ in the http://example.com/_underscored_/ URL.
2\.  Would also preserve backslashes (\\) in http://example.com/\_underscored\_/.

Background

The library was created to support complex text processing in situations where some configurability is desired. The initial need occurred when using the Turndown project. It is an excellent and flexible tool, but we faced several hard-to-solve difficulties with escaping special sequences.

Without UnionReplacer

When text processing with several patterns is required, there are two approaches:

  1. Iterative processing of the full text, such as
    // Without UnionReplacer
    return unsafe
      .replace(/&/g, '&amp;')
      .replace(/</g, '&lt;')
      .replace(/>/g, '&gt;')
      .replace(/"/g, '&quot;')
    The issue is not only the performance. Since the subsequent replacements are performed on a partially-processed result, the developer has to ensure that no intermediate steps affect the processing. E.g.:
    // Without UnionReplacer
    return 'a "tricky" task'
      .replace(/"/g, '&quot;')
      .replace(/&/g, '&amp;')
    // desired: 'a &quot;tricky&quot; task'
    // actual: 'a &amp;quot;tricky&amp;quot; task'
    So 'a "tricky" task' became 'a &amp;quot;tricky&amp;quot; task'. This particular task is manageable by carefully choosing the processing order. But when the processing is context-dependent, iterative processing becomes impossible.
  2. One-pass processing using regexp with alternations, which is correct, but it might easily become overly complex, hard to read and hard to manage. As one can see, the result seems pretty static and very fragile in terms of keeping track of all the individual capture groups:
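    // Without UnionReplacer
    // (?![\s\S]) matches only at the very end of the input (JavaScript has no \Z anchor)
    const mdHighlightRe = /(^(`{3,}).*\n([\s\S]*?)(^\2`*\s*?$|(?![\s\S])))|((^|[^`])(`+)(?!`)(.*?[^`]\7)(?!`))|([*~=+_-`]+)|(\n)|(<)|(>)|(")|(&)/gm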
    return md.replace(mdHighlightRe,
      (match, fenced, fence1, pre, fence2, codespan, lead, delim, code, special, nl, lt, gt, quot, amp) => {
        if (fenced) {
          let block = `<b>${fence1}</b><br />\n`
          block += `<pre>${htmlEscaper.replace(pre)}</pre><br />\n`
          block += `<b>${fence2}</b>`
          return block;
        } else if (codespan) {
          return `${htmlEscaper.replace(lead)}<code>${htmlEscaper.replace(code)}</code>`
        } else if (special) {
          return `<b>${special}</b>`
        } else if (nl) {
          return '<br />\n'
        } // else etc.
      });
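The ordering pitfall of the first approach can be reproduced with nothing but native String.prototype.replace. The sketch below is illustrative only: the names escapeIteratively, escapeOnePass and entities are hypothetical and are not part of UnionReplacer.

```javascript
// Hypothetical helper: iterative escaping applied in an unlucky order.
function escapeIteratively(text) {
  return text
    .replace(/"/g, '&quot;')  // produces '&' characters...
    .replace(/&/g, '&amp;');  // ...which this second pass escapes again
}

// Hypothetical helper: a single pass over the input with one regexp.
const entities = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;' };
function escapeOnePass(text) {
  // Each character of the input is examined exactly once,
  // so replacement output is never re-processed.
  return text.replace(/[&<>"]/g, (ch) => entities[ch]);
}

console.log(escapeIteratively('a "tricky" task')); // a &amp;quot;tricky&amp;quot; task
console.log(escapeOnePass('a "tricky" task'));     // a &quot;tricky&quot; task
```

A lookup table works here because every pattern is a single character. Once the patterns need context (fences, code spans, URLs), the alternation has to be composed from full regexps, as in the second approach.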

Introducing UnionReplacer

Iterative processing is simple and readable, but very limited. Developers often end up trading simplicity for bugs.

While a regexp with alternations is the way to go, we wanted to provide an easy way to build it, use it and even allow composing it at runtime.

Instead of using a single long regexp, developers can use an array of individual smaller regexps, which are merged together by the UnionReplacer class. Its usage is as simple as the iterative processing approach.
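To make the merging idea concrete, here is a heavily simplified sketch of how an array of [regexp, replacement] pairs could be combined into one alternation. It is not the library's implementation. The names unionReplace and countGroups are hypothetical, and the sketch assumes patterns without backreferences or named groups and inserts string replacements literally (no $& expansion).

```javascript
// Hypothetical helper: number of capture groups in a regexp.
// Matching '' against `source|` always succeeds, and the result array
// holds one slot per capture group plus one for the whole match.
function countGroups(re) {
  return new RegExp(re.source + '|').exec('').length - 1;
}

// Hypothetical sketch of a one-pass "union" replace.
function unionReplace(pairs, text, flags = 'gm') {
  const rules = [];
  let index = 1; // number of the wrapper group of the next rule
  const source = pairs.map(([re, replacement]) => {
    const groups = countGroups(re);
    rules.push({ index, groups, replacement });
    index += groups + 1;
    return `(${re.source})`; // wrap each pattern in its own group
  }).join('|');

  return text.replace(new RegExp(source, flags), (...args) => {
    // args = [match, group1, ..., groupN, offset, input]
    const rule = rules.find((r) => args[r.index] !== undefined);
    if (typeof rule.replacement !== 'function') return rule.replacement;
    // Hand the rule only its own capture groups.
    const own = args.slice(rule.index + 1, rule.index + 1 + rule.groups);
    return rule.replacement(args[0], ...own);
  });
}

const out = unionReplace([
  [/"/, '&quot;'],
  [/&/, '&amp;'],
  [/(\d+)-(\d+)/, (m, a, b) => `${b}-${a}`],
], 'say "1-2" & go');
console.log(out); // say &quot;2-1&quot; &amp; go
```

The wrapper group around each pattern is what lets the callback tell which rule matched. The sketch leaves out backreference renumbering and named groups.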

Features

  • Fast. The processing is one-pass and native regexps are used. There might be a tiny resource penalty when initially constructing the internal compound regexp.
  • Supports regexp backreferences. Backreferences in the compound regexp are renumbered, so that the user does not have to care about it.
  • Also supports ES2018 named capture groups. See limitations.
  • You can reuse everything used with String.prototype.replace(), namely:
    • String replacements work the very same.
    • Function replacements work the same with just a subtle difference for ES2018 named capture groups.
  • Standard regexp alternation semantics. The first replace that matches consumes the match from input, no matter how long the match is. An example follows.

Alternation semantics

// The order of replaces is important
const replacer1 = new UnionReplacer([
  [/foo/, '(FOO)'],    // when foo is matched, subsequent parts are not examined
  [/.+/, '(nonfoo)']   // no matter that this also matches foo
]);
// replacer1 still eats the rest of the input when foo is not matched
const replacer2 = new UnionReplacer([
  [/foo/, '(FOO)'],
  [/.+?(?=foo|$)/, '(nonfoo)'] // non-greedy match up to next foo or line end
]);
const text = 'foobarfoobaz'
replacer1.replace(text); // (FOO)(nonfoo)
replacer2.replace(text); // (FOO)(nonfoo)(FOO)(nonfoo)
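The same first-alternative-wins behavior can be observed with a plain native regexp. The sketch below uses only String.prototype.replace; the label helper is hypothetical.

```javascript
// Hypothetical helper: report which alternative produced the match.
const label = (match, foo) => (foo !== undefined ? '(FOO)' : '(nonfoo)');

const input = 'foobarfoobaz';

// The greedy second alternative swallows the rest of the line,
// including the second 'foo'.
const greedy = input.replace(/(foo)|.+/g, label);

// The lazy second alternative stops right before the next 'foo' or line end.
const lazy = input.replace(/(foo)|.+?(?=foo|$)/gm, label);

console.log(greedy); // (FOO)(nonfoo)
console.log(lazy);   // (FOO)(nonfoo)(FOO)(nonfoo)
```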

Performance

Above all, the code was written with performance in mind.

At runtime, UnionReplacer performs one-pass processing driven by a single native regexp. Internally, replacements are always performed by an arrow function, even for string replacements. Any performance impact of this is engine-dependent.

Feel free to benchmark the library and please share the results.

Limitations

Named capture groups

ES2018 named capture groups work with the following limitations:

  • Replacement functions are always provided with all the named captures, i.e. not limited to the matched rule.
  • Capture group names must be unique amongst all capture rules.
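The first limitation mirrors how native regexps treat named groups inside an alternation: the groups object lists every named group, and those from alternatives that did not match are undefined. The following sketch uses plain String.prototype.replace and assumes, without confirmation from the library's source, that this is the behavior replacement functions inherit. The pattern and group names are made up for illustration.

```javascript
// Two alternatives, each with its own named group.
const re = /(?<year>\d{4})|(?<word>[a-z]+)/g;

const seen = [];
'2024 abc'.replace(re, (...args) => {
  // With named groups present, the groups object is the last argument.
  const groups = args[args.length - 1];
  seen.push({ ...groups });
  return args[0]; // leave the text unchanged
});

console.log(seen);
// [ { year: '2024', word: undefined }, { year: undefined, word: 'abc' } ]
```

A replacement function should therefore check a named capture for undefined before using it.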

Octal escapes

Not supported. The syntax is the same as backreferences (\1) and their interpretation is input-dependent even in native regexps. It is better to avoid them completely and use hex escapes instead (\xNN).
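A naive renumbering step shows where the ambiguity comes from. The sketch below is an assumption about how backreferences could be shifted when patterns are concatenated, not the library's actual code, and shiftBackrefs is a hypothetical name. It treats every \N with N ≥ 1 as a backreference, so an octal escape written as \1 would be shifted too.

```javascript
// Hypothetical helper: shift numeric backreferences in a regexp source.
// Simplification: escapes inside character classes are not special-cased.
function shiftBackrefs(source, offset) {
  // First alternative: a backreference such as \1 or \12.
  // Second alternative: any other escaped character, returned unchanged,
  // so an escaped backslash followed by a digit is left alone.
  return source.replace(/\\([1-9]\d*)|\\[\s\S]/g, (m, n) =>
    (n === undefined ? m : `\\${Number(n) + offset}`));
}

// The code span pattern from the Markdown highlighter example.
const codeSpan = /(^|[^`])(`+)(?!`)(.*?[^`]\2)(?!`)/;

// Five capture groups precede it in the hand-written compound regexp.
console.log(shiftBackrefs(codeSpan.source, 5));
// (^|[^`])(`+)(?!`)(.*?[^`]\7)(?!`)
```

A hex escape such as \x26 contains no \N sequence, so a step like this leaves it alone.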

Regexp flags

Any flags on the individual search regexps are ignored. The resulting replacement always uses the flags from the constructor call, which default to global (g) and multiline (m).
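Native regexps show why per-pattern flags cannot survive composition: a regexp's source does not carry its flags. The sketch below assumes the compound regexp is built from each pattern's source; the README states only the resulting behavior, not the mechanism.

```javascript
const caseInsensitive = /foo/i;
console.log(caseInsensitive.source); // foo
console.log(caseInsensitive.flags);  // i

// Rebuilding from .source with the documented default flags loses the 'i'.
const rebuilt = new RegExp(caseInsensitive.source, 'gm');
console.log(rebuilt.test('FOO')); // false

// Option 1: encode case-insensitivity in the pattern itself.
const inPattern = new RegExp(/[Ff][Oo][Oo]/.source, 'gm');
console.log(inPattern.test('FOO')); // true

// Option 2: request the flag for the whole compound regexp;
// it then applies to every pattern.
const globalFlag = new RegExp(caseInsensitive.source, 'gmi');
console.log(globalFlag.test('FOO')); // true
```

With UnionReplacer, option 2 corresponds to passing the flags as the second constructor argument, as shown in the synopsis.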