@beforesemicolon/html-parser

v0.6.0

HTML Parser

HTML parser for any JavaScript runtime environment. Small, fast, easy to use, and highly customizable.

Motivation

Most HTML parsers force you to learn their own JavaScript API just to work with the parse result. They don't let you tap into the processing to access nodes as they are parsed, and they don't let you create your own API for the final result, one that adapts to your project instead of the other way around.

This parser

  • Is one of the fastest HTML parsers out there, averaging under 2 ms per HTML page across pages of different sizes. See Benchmark.
  • Uses a DOM-like API by default: a custom lightweight DOM built for performance
  • Can use the browser DOM API or jsdom to produce the parsed HTML, so it works in any JavaScript runtime environment
  • Lets you supply your own custom DOM-like API to gain absolute control
  • Accepts a callback, so you can access the nodes as they are being parsed
  • Is super simple to use: no extensive options list, and everything is parsed in a performant way
  • Handles SVG and HTML easily, including comments and script tags with HTML inside

Install

Node

npm install @beforesemicolon/html-parser

Browser

<!DOCTYPE html>
<html lang="en">
<head>

  <!-- Grab the latest version -->
  <script src="https://unpkg.com/@beforesemicolon/html-parser/dist/client.js"></script>

  <!-- Or a specific version -->
  <script src="https://unpkg.com/@beforesemicolon/html-parser@X.Y.Z/dist/client.js"></script>

</head>
<body></body>
</html>

Good to know

  • Only works with HTML and SVG tags. Duh!
  • Handles custom tags, style tags, script tags and comments by default, with no difference in performance
  • The <!Doctype> tag is ignored
  • Honors the source format by keeping all whitespace, which is returned as text nodes
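
Because whitespace is preserved as text nodes, code that walks a parsed tree often wants to skip the whitespace-only ones. The following is a minimal sketch, assuming parsed nodes expose the standard DOM nodeType and nodeValue properties. The helper names are hypothetical, and the plain objects are stand-ins for parsed nodes, not output from this parser.

```javascript
// Standard DOM value of Node.TEXT_NODE.
const TEXT_NODE = 3;

// Returns true for text nodes that contain only whitespace.
// Assumption: nodes carry DOM-style nodeType and nodeValue properties.
function isBlankText(node) {
  return node.nodeType === TEXT_NODE && String(node.nodeValue ?? "").trim() === "";
}

// Keeps every node except whitespace-only text nodes.
function withoutBlankText(nodes) {
  return Array.from(nodes).filter((node) => !isBlankText(node));
}

// Hypothetical stand-ins for the children of "<ul>\n  <li>a</li>\n</ul>".
const sampleChildren = [
  { nodeType: 3, nodeValue: "\n  " },
  { nodeType: 1, tagName: "LI" },
  { nodeType: 3, nodeValue: "\n" },
];

const meaningful = withoutBlankText(sampleChildren);
```

With a real parse result, the same filter could be applied to the child list of any element or fragment.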

Usage

By default, parse returns a document fragment as the root. The API is DOM-like: if you know the DOM API, you already know this one. It is minimal and built for performance, so the same code runs in the browser, Node, Deno, or any other JavaScript runtime environment.

See the Creating your custom handler section to understand what this Document-like API looks like.

import {parse} from "@beforesemicolon/html-parser";

const frag = parse('<h1>site title</h1>'); // return DocumentFragment-like object

frag.children[0] // h1 Element

To get real DOM nodes instead, pass a Document object as the second argument. In Node, Deno, or any other JavaScript runtime environment without a native DOM, import jsdom or similar and provide its Document object.

import * as jsdom from "jsdom";
const {JSDOM} = jsdom;
const document = new JSDOM('').window.document;

// import the parser
import {parse} from "@beforesemicolon/html-parser";

const frag = parse('<h1>site title</h1>', document); // return DocumentFragment

frag.children[0] // h1 Element

Browser

<script>
  const {parse, Doc} = window.BFS;
  
  // uses a Document-like object by default
  const frag1 = parse('<h1>site title</h1>'); // returns DocumentFragment-like
  
  // use the native DOM Document object
  const frag2 = parse('<h1>site title</h1>', document); // returns DocumentFragment object
  
  frag1.children[0] // h1 Element
  frag2.children[0] // h1 Element
</script>

Callback option

You may also pass a callback function as the second parameter. It is called as the nodes are being parsed and created. In this case the parser uses the default document, so the callback is called with the Nodes and Elements that document creates.

const frag = parse('<h1>site title</h1>', (node) => {
  // handle node here
});
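
As a sketch of what such a callback might do, the helper below builds a handler that records the tag name of every element it sees. It assumes each node passed to the callback carries the standard DOM nodeType and tagName properties. createTagCollector is a hypothetical helper, and the calls at the end simulate the parser with plain stand-in objects.

```javascript
// Standard DOM value of Node.ELEMENT_NODE.
const ELEMENT_NODE = 1;

// Hypothetical helper: returns a callback plus the list it fills in.
function createTagCollector() {
  const tags = [];
  const onNode = (node) => {
    // Assumption: elements expose DOM-style nodeType and tagName.
    if (node.nodeType === ELEMENT_NODE) {
      tags.push(String(node.tagName).toLowerCase());
    }
  };
  return { onNode, tags };
}

const collector = createTagCollector();

// With the real parser this would be: parse(html, collector.onNode);
// Here the calls are simulated with stand-in nodes.
[
  { nodeType: 3, nodeValue: "site title" },
  { nodeType: 1, tagName: "H1" },
].forEach(collector.onNode);
```

Keeping the collected state next to the callback makes it easy to run several independent collectors over the same input.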

Benchmark

The parser itself is fast, but performance varies with the API you use to build the final parsed result, since each has its own algorithm. Here are two examples using htmlparser-benchmark.

import {parse} from "@beforesemicolon/html-parser";

parse(aReallyMassiveHTMLString);
// avg duration: 1.86113 ms/file ± 1.09698

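This is up to 30 times faster than the DOM Document API. On the averages shown here (1.86113 ms/file against 27.3563 ms/file with jsdom), the difference is roughly 15 times.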

Using jsdom Document

This uses a jsdom Document in Node.js:

import * as jsdom from "jsdom";
import {parse} from "@beforesemicolon/html-parser";

const {JSDOM} = jsdom;
const document = new JSDOM('').window.document;

parse(aReallyMassiveHTMLString, document);
// avg duration: 27.3563 ms/file ± 19.1060
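
Figures like the ones above are a mean and a spread of per-file parse times. The following is a minimal sketch of such a measurement, not the htmlparser-benchmark implementation. measure and summarize are hypothetical helpers, and the trivial countTags function stands in for a real parser call.

```javascript
// Hypothetical helper: times fn once per input and summarizes the results.
function measure(fn, inputs) {
  const durations = inputs.map((input) => {
    const start = performance.now();
    fn(input);
    return performance.now() - start; // elapsed milliseconds
  });
  return summarize(durations);
}

// Computes the mean and the population standard deviation.
function summarize(durations) {
  const mean = durations.reduce((sum, d) => sum + d, 0) / durations.length;
  const variance =
    durations.reduce((sum, d) => sum + (d - mean) ** 2, 0) / durations.length;
  return { mean, stdDev: Math.sqrt(variance) };
}

// Stand-in workload; swap in (html) => parse(html) to time the parser.
const countTags = (html) => (html.match(/<[a-z]/gi) || []).length;

const stats = measure(countTags, ["<h1>a</h1>", "<p><b>b</b></p>"]);
console.log(
  `avg duration: ${stats.mean.toFixed(5)} ms/file ± ${stats.stdDev.toFixed(5)}`
);
```

Single runs over tiny inputs are noisy, so a real comparison would use many files and several repetitions.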

Creating your custom handler

The best thing about this parser is the ability to create your own handler to transform HTML into anything you like.

Here is an example of a simple implementation you can start from.

import {parse} from "@beforesemicolon/html-parser";

const MyCustomDoc = {
	createComment: (value: string) => ({type: 'comment', value}),
	createTextNode: (value: string) => ({type: 'text', value}),
	createDocumentFragment: () => {
		const children: unknown[] = []

		return {
			type: 'fragment',
			children,
			appendChild: (node: unknown) => {
				children.push(node)
			}
		}
	},
	createElementNS: (namespace: string, tagName: string) => {
		const children: unknown[] = []
		const attributes: Record<string, unknown> = {}

		return {
			namespace,
			tagName,
			children,
			attributes,
			type: 'node',
			appendChild(node: unknown) {
				children.push(node)
			},
			setAttribute(name: string, value: string) {
				attributes[name] = value;
			}
		}
	},
}

const result = parse<typeof MyCustomDoc>(`...`, MyCustomDoc);
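
To show what can be done with the result of a custom handler, here is a sketch that turns a tree of the shape MyCustomDoc produces back into an HTML string. toHTML is a hypothetical helper, and the tree is built by hand so the example stays self-contained. Attribute values are not escaped and void elements such as br are not special-cased.

```javascript
// Serializes nodes shaped like the ones MyCustomDoc creates.
function toHTML(node) {
  switch (node.type) {
    case "text":
      return node.value;
    case "comment":
      return `<!--${node.value}-->`;
    case "fragment":
      return node.children.map((child) => toHTML(child)).join("");
    case "node": {
      const attrs = Object.entries(node.attributes)
        .map(([name, value]) => ` ${name}="${value}"`)
        .join("");
      const inner = node.children.map((child) => toHTML(child)).join("");
      return `<${node.tagName}${attrs}>${inner}</${node.tagName}>`;
    }
    default:
      throw new Error(`Unknown node type: ${node.type}`);
  }
}

// Hand-built stand-in for a result such as parsing
// '<h1 class="title">site title</h1>' with MyCustomDoc.
const tree = {
  type: "fragment",
  children: [
    {
      type: "node",
      namespace: "http://www.w3.org/1999/xhtml",
      tagName: "h1",
      attributes: { class: "title" },
      children: [{ type: "text", value: "site title" }],
    },
  ],
};

const html = toHTML(tree);
```

The same recursive switch over the type field works for other targets too, for example producing a virtual DOM or a plain-text summary.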