@nacelle/html-serializer

v1.0.5

Convert HTML (or a `hast` syntax tree) to a valid Rich Text `rast` document

Downloads

16

html-serializer

This package contains utilities to convert HTML (or a hast syntax tree) to a Rich Text rast (Rich Text Abstract Syntax Tree) document.

Please refer to the rast format docs to learn more about the syntax tree format and the available nodes.

Usage

The main utility in this package is htmlToRichText which takes a string of HTML and transforms it into a valid rast document.

htmlToRichText returns a Promise that resolves with a Rich Text document.

import { htmlToRichText } from 'html-serializer';

const html = `
  <article>
    <h1>Nacelle</h1>
    <p>Headless Commerce Explained</p>
  </article>
`;

htmlToRichText(html).then((richText) => {
  console.log(richText);
});

htmlToRichText is meant to be used in a browser environment.

In Node.js you can use the parse5ToRichText helper which instead takes a document generated with parse5.

import parse5 from 'parse5';
import { parse5ToRichText } from 'html-serializer';

parse5ToRichText(
  parse5.parse(html, {
    sourceCodeLocationInfo: true,
  }),
).then((richText) => {
  console.log(richText);
});

Internally, both utilities work on a hast tree. If you already have a hast tree, you can use a third utility called hastTorast.

Validate rast documents

rast is a strict format for Rich Text fields. As such the resulting document is generally a simplified, content-centric version of the input HTML.

When possible, the library relies on semantic HTML to generate a valid rast document.

The rich-text-utils package provides a validate utility that checks whether the resulting tree is compatible with a Rich Text field.

import { validate } from 'rich-text-utils';

// ...

htmlToRichText(html).then((richText) => {
  const { valid, message } = validate(richText);

  if (!valid) {
    throw new Error(message);
  }
});

We recommend validating every rast document to avoid errors later when creating records.

Advanced Usage

Options

All the *ToRichText utils accept an optional options object as the second argument:

type Options = Partial<{
  newlines: boolean,
  // Override existing `hast` node handlers or add new ones.
  handlers: Record<string, CreateNodeFunction>,
  // Allows to tweak the `hast` tree before transforming it to a `rast` document.
  preprocess: (hast: HastRootNode) => HastRootNode,
  // Array of allowed Block nodes.
  allowedBlocks: Array<
    BlockquoteType | CodeType | HeadingType | LinkType | ListType,
  >,
  // Array of allowed marks.
  allowedMarks: Mark[],
}>;

Transforming Nodes

The utils in this library traverse a hast tree and transform supported nodes to rast nodes. The transformation is done by working on a hast node with a handler (async) function.

Handlers are associated with hast nodes by tagName, or by type when node.type !== 'element', and look as follows:

import { visitChildren } from 'html-serializer';

// Handler for the <p> tag.
async function p(createrastNode, hastNode, context) {
  return createrastNode('paragraph', {
    children: await visitChildren(createrastNode, hastNode, context),
  });
}
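
The lookup rule stated before the handler above (tagName for elements, type for everything else) can be sketched as a small function. This is a self-contained illustration and not the library's implementation; `resolveHandler` and the placeholder handlers are hypothetical names introduced here.

```javascript
// Hypothetical helper: picks the handler for a hast node.
// Elements are matched by tagName; every other node is matched by its type.
function resolveHandler(handlers, node) {
  const key = node.type === 'element' ? node.tagName : node.type;
  return handlers[key];
}

// Placeholder handlers, only used to show which key gets selected.
const handlers = {
  p: () => 'paragraph handler',
  text: () => 'text handler',
};

const paragraph = { type: 'element', tagName: 'p', children: [] };
const text = { type: 'text', value: 'Hello' };
const comment = { type: 'comment', value: 'ignored' };

console.log(resolveHandler(handlers, paragraph)()); // "paragraph handler"
console.log(resolveHandler(handlers, text)()); // "text handler"
console.log(resolveHandler(handlers, comment)); // undefined (nothing registered)
```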

Handlers can return a promise that resolves to a rast node, an array of rast nodes, or undefined to skip the current node.
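
The three result shapes might look like this in practice. This is a minimal sketch under assumptions: `createNode` is a stand-in for the factory the library passes to handlers, the `span` and `paragraph` node shapes are simplified, and `collect` is a hypothetical helper showing how a caller could normalize the results.

```javascript
// Stand-in for the node factory passed to handlers (assumed signature:
// a node type plus a props object, as in the <p> handler above).
const createNode = (type, props) => ({ type, ...props });

// Resolves to undefined, so the node is skipped.
async function skipScript() {
  return undefined;
}

// Resolves to a single node.
async function textToSpan(createNode, hastNode) {
  return createNode('span', { value: hastNode.value });
}

// Resolves to an array of nodes: one paragraph per blank-line-separated chunk.
async function textToParagraphs(createNode, hastNode) {
  return hastNode.value.split(/\n{2,}/).map((chunk) =>
    createNode('paragraph', {
      children: [createNode('span', { value: chunk })],
    }),
  );
}

// Hypothetical normalizer: turns any of the three shapes into a flat list.
function collect(result) {
  if (result === undefined) return [];
  return Array.isArray(result) ? result : [result];
}

textToParagraphs(createNode, { type: 'text', value: 'one\n\ntwo' }).then(
  (result) => console.log(collect(result).length), // 2
);
```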

To ensure that a valid rast is generated the default handlers also check that the current hastNode is a valid rast node for its parent and, if not, they ignore the current node and continue visiting its children.

Information about the parent rast node name is available in context.parentNodeType.
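
A handler that follows this rule could be sketched as below. Everything here is illustrative: `visitChildren` is a simplified stand-in for the library export (it only converts text children and does not update the context), `headingParents` is a hypothetical rule table, and the `heading` node shape with a `level` property is an assumption.

```javascript
const createNode = (type, props) => ({ type, ...props });

// Simplified stand-in for the library's visitChildren.
async function visitChildren(createNode, hastNode, context) {
  return (hastNode.children || [])
    .filter((child) => child.type === 'text')
    .map((child) => createNode('span', { value: child.value }));
}

// Hypothetical rule: headings are only valid directly under the root.
const headingParents = ['root'];

async function h1(createNode, hastNode, context) {
  const children = await visitChildren(createNode, hastNode, context);
  if (!headingParents.includes(context.parentNodeType)) {
    // Not valid under this parent: drop the heading, keep its content.
    return children;
  }
  return createNode('heading', { level: 1, children });
}

const headingElement = {
  type: 'element',
  tagName: 'h1',
  children: [{ type: 'text', value: 'Title' }],
};

h1(createNode, headingElement, { parentNodeType: 'root' }).then((node) =>
  console.log(node.type), // "heading"
);
h1(createNode, headingElement, { parentNodeType: 'listItem' }).then((nodes) =>
  console.log(Array.isArray(nodes)), // true
);
```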

Please take a look at the default handlers implementation for examples.

The default handlers are available on context.defaultHandlers.

context

Every handler receives a context object that includes the following information:

export interface GlobalContext {
  // Whether the library has found a <base> tag or should not look further.
  // See https://developer.mozilla.org/en-US/docs/Web/HTML/Element/base
  baseUrlFound?: boolean;
  // <base> tag url. This is used for resolving relative URLs.
  baseUrl?: string;
}

export interface Context {
  // The current parent `rast` node type.
  parentNodeType: NodeType;
  // The parent `hast` node.
  parentNode: HastNode;
  // A reference to the current handlers - merged default + user handlers.
  handlers: Record<string, Handler<unknown>>;
  // A reference to the default handlers record (map).
  defaultHandlers: Record<string, Handler<unknown>>;
  // true if the content can include newlines, and false if not (such as in headings).
  wrapText: boolean;
  // Marks for span nodes.
  marks?: Mark[];
  // Prefix for language detection in code blocks.
  // Detection is done on a class name eg class="language-html"
  // Default is `language-`
  codePrefix?: string;
  // Array of allowed Block types.
  allowedBlocks: Array<
    BlockquoteType | CodeType | HeadingType | LinkType | ListType,
  >;
  // Array of allowed marks.
  allowedMarks: Mark[];
  // Properties in this object are available to every handler as Context
  // is not deeply cloned.
  global: GlobalContext;
}
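
As an illustration of how a custom handler might use these fields, the sketch below resolves a relative URL against `global.baseUrl` and reads a code language using `codePrefix`. Both `resolveUrl` and `detectLanguage` are hypothetical helpers written for this example; the library's own logic may differ.

```javascript
// Hypothetical helper: resolve an href against context.global.baseUrl.
function resolveUrl(href, context) {
  const { baseUrl } = context.global;
  if (!baseUrl) return href;
  try {
    return new URL(href, baseUrl).toString();
  } catch {
    // Invalid base or href: fall back to the original value.
    return href;
  }
}

// Hypothetical helper: find the language in a list of class names,
// e.g. class="language-html" yields "html".
function detectLanguage(classNames, context) {
  const prefix = context.codePrefix || 'language-';
  const match = classNames.find((name) => name.startsWith(prefix));
  return match ? match.slice(prefix.length) : undefined;
}

const context = {
  global: { baseUrlFound: true, baseUrl: 'https://example.com/docs/' },
};

console.log(resolveUrl('./img.png', context)); // "https://example.com/docs/img.png"
console.log(detectLanguage(['hljs', 'language-html'], context)); // "html"
```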

Custom Handlers

It is possible to register custom handlers and override the default behavior via options:

import { paragraphHandler } from './customHandlers';

htmlToRichText(html, {
  handlers: {
    p: paragraphHandler,
  },
}).then((richText) => {
  console.log(richText);
});

It is highly encouraged to validate the rast when using custom handlers because handlers are responsible for dictating valid parent-children relationships and therefore generating a tree that is compliant with Rich Text.
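
A module like `./customHandlers` could contain something along these lines. This is a sketch, not the package's code: the default `p` handler here is a stand-in, and the rule of skipping whitespace-only paragraphs is just an example of custom behavior that delegates to `context.defaultHandlers` otherwise.

```javascript
const createNode = (type, props) => ({ type, ...props });

// Stand-in for the library's default handlers.
const defaultHandlers = {
  p: async (createNode, hastNode) =>
    createNode('paragraph', {
      children: hastNode.children.map((child) =>
        createNode('span', { value: child.value }),
      ),
    }),
};

// Hypothetical custom handler: skip paragraphs that contain only whitespace,
// otherwise fall back to the default handler available on the context.
async function paragraphHandler(createNode, hastNode, context) {
  const text = hastNode.children.map((child) => child.value || '').join('');
  if (text.trim() === '') return undefined;
  return context.defaultHandlers.p(createNode, hastNode, context);
}

// User handlers take precedence over defaults registered under the same key.
const handlers = { ...defaultHandlers, p: paragraphHandler };
const context = { handlers, defaultHandlers };

const emptyParagraph = {
  type: 'element',
  tagName: 'p',
  children: [{ type: 'text', value: '   ' }],
};

handlers
  .p(createNode, emptyParagraph, context)
  .then((node) => console.log(node)); // undefined
```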

Preprocessing

Because of the strictness of the rast spec, some semantics or elements might be lost during the transformation.

To improve the final result, you might want to modify the hast before it is transformed to rast with the preprocess hook.

import { findAll } from 'unist-utils-core';

const html = `
  <p>convert this to an h1</p>
`;

htmlToRichText(html, {
  preprocess: (tree) => {
    // Transform <p> to <h1>
    findAll(tree, (node) => {
      if (node.type === 'element' && node.tagName === 'p') {
        node.tagName = 'h1';
      }
    });
  },
}).then((richText) => {
  console.log(richText);
});
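
To see what the hook does to the tree without running a full conversion, the same transformation can be applied to a hand-written hast-like object. In this sketch `walk` is a minimal stand-in for `findAll`, and the tree literal is a simplified assumption of what the parser would produce.

```javascript
// Minimal depth-first walker over a hast-like tree (stand-in for findAll).
function walk(node, visitor) {
  visitor(node);
  (node.children || []).forEach((child) => walk(child, visitor));
}

// Same logic as the preprocess hook above: mutates the tree in place.
function preprocess(tree) {
  walk(tree, (node) => {
    if (node.type === 'element' && node.tagName === 'p') {
      node.tagName = 'h1';
    }
  });
}

const tree = {
  type: 'root',
  children: [
    {
      type: 'element',
      tagName: 'p',
      properties: {},
      children: [{ type: 'text', value: 'convert this to an h1' }],
    },
  ],
};

preprocess(tree);
console.log(tree.children[0].tagName); // "h1"
```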

Examples

In rast, images can be represented as Block nodes, but these are not allowed inside ListItem nodes (ul/ol lists). In this example we split the list into three pieces and lift the image up.

The same approach can be used to split other types of branches and lift up nodes to become root nodes.

import { find, visit } from 'unist-utils-core';

const html = `
  <ul>
    <li>item 1</li>
    <li><div><img src="./img.png" alt></div></li>
    <li>item 2</li>
  </ul>
`;

const rast = await htmlToRichText(html, {
  preprocess: (tree) => {
    const liftedImages = new WeakSet();
    const body = find(tree, (node) => node.tagName === 'body');

    visit(body, (node, index, parents) => {
      if (
        !node ||
        node.tagName !== 'img' ||
        liftedImages.has(node) ||
        parents.length === 1 // is a top level img
      ) {
        return;
      }
      // remove image

      const imgParent = parents[parents.length - 1];
      imgParent.children.splice(index, 1);

      let i = parents.length;
      let splitChildrenIndex = index;
      let childrenAfterSplitPoint = [];

      while (--i > 0) {
        // Example: i == 2
        // [ 'body', 'div', 'h1' ]
        const /* h1 */ parent = parents[i];
        const /* div */ parentsParent = parents[i - 1];

        // Delete the siblings after the image and save them in a variable
        childrenAfterSplitPoint /* [ 'h1.2' ] */ = parent.children.splice(
          splitChildrenIndex,
        );
        // parent.children is now == [ 'h1.1' ]

        // parentsParent.children = [ 'h1' ]
        splitChildrenIndex = parentsParent.children.indexOf(parent);
        // splitChildrenIndex = 0

        let nodeInserted = false;

        // If we reached the 'div' add the image's node
        if (i === 1) {
          splitChildrenIndex += 1;
          parentsParent.children.splice(splitChildrenIndex, 0, node);
          liftedImages.add(node);

          nodeInserted = true;
        }

        splitChildrenIndex += 1;
        // Create a new branch with childrenAfterSplitPoint if we have any i.e.
        // <h1>h1.2</h1>
        if (childrenAfterSplitPoint.length > 0) {
          parentsParent.children.splice(splitChildrenIndex, 0, {
            ...parent,
            children: childrenAfterSplitPoint,
          });
        }
        // Remove the parent if empty
        if (parent.children.length === 0) {
          splitChildrenIndex -= 1;
          parentsParent.children.splice(
            nodeInserted ? splitChildrenIndex - 1 : splitChildrenIndex,
            1,
          );
        }
      }
    });
  },
  handlers: {
    img: async (createNode, node, context) => {
      // In a real scenario you would upload the image and get back an id.
      const entry = '123';
      return createNode('block', {
        entry,
      });
    },
  },
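});

A simpler variant appends the image to the root's children instead of splitting the list:

import { findAll } from 'unist-utils-core';

const html = `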
  <ul>
    <li>item 1</li>
    <li><div><img src="./img.png" alt>item 2</div></li>
    <li>item 3</li>
  </ul>
`;
const rast = await htmlToRichText(html, {
  preprocess: (tree) => {
    findAll(tree, (node, index, parent) => {
      if (node.tagName === 'img') {
        // Add the image to the root's children.
        tree.children.push(node);
        // remove the image from the parent's children array.
        parent.children.splice(index, 1);
        return;
      }
    });
  },
  handlers: {
    img: async (createNode, node, context) => {
      // In a real scenario you would upload the image and get back an id.
      const entry = '123';
      return createNode('block', {
        entry,
      });
    },
  },
});
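
The shorter example above removes nodes from a children array while the tree is still being traversed. Depending on how the traversal iterates, that can cause a sibling to be skipped. A more defensive variant, sketched here on a hand-written hast-like tree, collects the images first and moves them afterwards. `liftImages` and `collectImages` are hypothetical names, not part of the library.

```javascript
// Pass 1: collect nested <img> elements with their parents, without mutating.
function collectImages(node, parent, root, found) {
  if (node.type === 'element' && node.tagName === 'img' && parent !== root) {
    found.push({ node, parent });
  }
  (node.children || []).forEach((child) =>
    collectImages(child, node, root, found),
  );
  return found;
}

// Pass 2: detach each image from its parent and append it to the root.
function liftImages(tree) {
  const found = collectImages(tree, null, tree, []);
  for (const { node, parent } of found) {
    parent.children.splice(parent.children.indexOf(node), 1);
    tree.children.push(node);
  }
  return tree;
}

const img = { type: 'element', tagName: 'img', properties: { src: './img.png' } };
const div = {
  type: 'element',
  tagName: 'div',
  children: [img, { type: 'text', value: 'item 2' }],
};
const tree = {
  type: 'root',
  children: [
    {
      type: 'element',
      tagName: 'ul',
      children: [{ type: 'element', tagName: 'li', children: [div] }],
    },
  ],
};

liftImages(tree);
console.log(tree.children.map((node) => node.tagName)); // [ 'ul', 'img' ]
```

The function can be passed as the `preprocess` option in place of the inline callback, since it mutates the tree it receives.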

Utilities

To work with hast and rast trees we recommend using the unist-utils-core library.

License

MIT