mashtml

v1.2.0

html tokenizer

mashtml is a sufficiently compliant HTML tokenizer with a small footprint.

Use cases

  • inspection
  • validation
  • sanitization
  • templating

Quick Start

mashtml is a pure ESM package.

import { tokenStream, tokenArray } from "mashtml";

// handle tokens one by one
tokenStream(`<a href="#h1">top</a>`, console.log);

// or collect tokens in an array
const tokens = tokenArray(`<a href="#h1">top</a>`);
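
The two entry points differ only in how tokens are delivered: one hands each token to a callback, the other returns them all at once. That relationship can be sketched roughly as follows. This is not mashtml's implementation; `collect` and `fakeStream` are hypothetical names, and the stand-in tokens are plain strings because the real token shape is not described in this readme.

```javascript
// Hypothetical helper: turn any callback-style token stream into an array.
// `stream` is assumed to share the call shape tokenStream(input, callback).
function collect(stream, input) {
  const tokens = [];
  stream(input, (token) => tokens.push(token));
  return tokens;
}

// Stand-in for a real tokenizer so the sketch runs on its own:
// it splits the input into tag-like chunks and text runs.
function fakeStream(input, onToken) {
  for (const part of input.match(/<[^>]*>|[^<]+/g) ?? []) {
    onToken(part);
  }
}

console.log(collect(fakeStream, `<a href="#h1">top</a>`));
```

The callback form avoids holding every token in memory, which may matter for large inputs; the array form is more convenient for inspection.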

Compliance

This is a specialized tokenizer. It complies with the HTML5 specification to the extent that is reasonable for its use cases. Specifically, section 13.2.5 (Tokenization) of the specification defines the parser states; the checked states below have been implemented.

  • [x] 13.2.5.1 Data state
  • [x] 13.2.5.2 RCDATA state
  • [x] 13.2.5.3 RAWTEXT state
  • [x] 13.2.5.4 Script data state
  • [x] 13.2.5.5 PLAINTEXT state
  • [x] 13.2.5.6 Tag open state
  • [x] 13.2.5.7 End tag open state
  • [x] 13.2.5.8 Tag name state
  • [x] 13.2.5.9 RCDATA less-than sign state
  • [x] 13.2.5.10 RCDATA end tag open state
  • [x] 13.2.5.11 RCDATA end tag name state
  • [x] 13.2.5.12 RAWTEXT less-than sign state
  • [x] 13.2.5.13 RAWTEXT end tag open state
  • [x] 13.2.5.14 RAWTEXT end tag name state
  • [x] 13.2.5.15 Script data less-than sign state
  • [x] 13.2.5.16 Script data end tag open state
  • [x] 13.2.5.17 Script data end tag name state
  • [ ] 13.2.5.18 Script data escape start state
  • [ ] 13.2.5.19 Script data escape start dash state
  • [ ] 13.2.5.20 Script data escaped state
  • [ ] 13.2.5.21 Script data escaped dash state
  • [ ] 13.2.5.22 Script data escaped dash dash state
  • [ ] 13.2.5.23 Script data escaped less-than sign state
  • [ ] 13.2.5.24 Script data escaped end tag open state
  • [ ] 13.2.5.25 Script data escaped end tag name state
  • [ ] 13.2.5.26 Script data double escape start state
  • [ ] 13.2.5.27 Script data double escaped state
  • [ ] 13.2.5.28 Script data double escaped dash state
  • [ ] 13.2.5.29 Script data double escaped dash dash state
  • [ ] 13.2.5.30 Script data double escaped less-than sign state
  • [ ] 13.2.5.31 Script data double escape end state
  • [x] 13.2.5.32 Before attribute name state
  • [x] 13.2.5.33 Attribute name state
  • [x] 13.2.5.34 After attribute name state
  • [x] 13.2.5.35 Before attribute value state
  • [x] 13.2.5.36 Attribute value (double-quoted) state
  • [x] 13.2.5.37 Attribute value (single-quoted) state
  • [x] 13.2.5.38 Attribute value (unquoted) state
  • [x] 13.2.5.39 After attribute value (quoted) state
  • [x] 13.2.5.40 Self-closing start tag state
  • [x] 13.2.5.41 Bogus comment state
  • [x] 13.2.5.42 Markup declaration open state
  • [x] 13.2.5.43 Comment start state
  • [x] 13.2.5.44 Comment start dash state
  • [x] 13.2.5.45 Comment state
  • [x] 13.2.5.46 Comment less-than sign state
  • [x] 13.2.5.47 Comment less-than sign bang state
  • [x] 13.2.5.48 Comment less-than sign bang dash state
  • [x] 13.2.5.49 Comment less-than sign bang dash dash state
  • [x] 13.2.5.50 Comment end dash state
  • [x] 13.2.5.51 Comment end state
  • [x] 13.2.5.52 Comment end bang state
  • [x] 13.2.5.53 DOCTYPE state
  • [ ] 13.2.5.54 Before DOCTYPE name state
  • [ ] 13.2.5.55 DOCTYPE name state
  • [ ] 13.2.5.56 After DOCTYPE name state
  • [ ] 13.2.5.57 After DOCTYPE public keyword state
  • [ ] 13.2.5.58 Before DOCTYPE public identifier state
  • [ ] 13.2.5.59 DOCTYPE public identifier (double-quoted) state
  • [ ] 13.2.5.60 DOCTYPE public identifier (single-quoted) state
  • [ ] 13.2.5.61 After DOCTYPE public identifier state
  • [ ] 13.2.5.62 Between DOCTYPE public and system identifiers state
  • [ ] 13.2.5.63 After DOCTYPE system keyword state
  • [ ] 13.2.5.64 Before DOCTYPE system identifier state
  • [ ] 13.2.5.65 DOCTYPE system identifier (double-quoted) state
  • [ ] 13.2.5.66 DOCTYPE system identifier (single-quoted) state
  • [ ] 13.2.5.67 After DOCTYPE system identifier state
  • [x] 13.2.5.68 Bogus DOCTYPE state
  • [ ] 13.2.5.69 CDATA section state
  • [ ] 13.2.5.70 CDATA section bracket state
  • [ ] 13.2.5.71 CDATA section end state
  • [x] 13.2.5.72 Character reference state
  • [ ] 13.2.5.73 Named character reference state
  • [x] 13.2.5.74 Ambiguous ampersand state
  • [x] 13.2.5.75 Numeric character reference state
  • [x] 13.2.5.76 Hexadecimal character reference start state
  • [x] 13.2.5.77 Decimal character reference start state
  • [x] 13.2.5.78 Hexadecimal character reference state
  • [x] 13.2.5.79 Decimal character reference state
  • [x] 13.2.5.80 Numeric character reference end state
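
To give a feel for what a group of these states does, here is a minimal sketch of the numeric character reference states (13.2.5.75 to 13.2.5.80) folded into one function. It is an illustration written against the specification, not mashtml's code; `decodeNumericReference` is a hypothetical name, and the sketch leaves out parse error reporting and the specification's replacement table for C1 control codes.

```javascript
// `pos` must point just past "&#". Returns the decoded text and the index
// after the reference, or null when no digits follow, in which case the
// caller would emit "&#" literally.
function decodeNumericReference(input, pos) {
  let i = pos;
  let radix = 10;
  let isDigit = (c) => c >= "0" && c <= "9";

  // Numeric character reference state: "x" or "X" selects hexadecimal.
  if (input[i] === "x" || input[i] === "X") {
    radix = 16;
    isDigit = (c) => /^[0-9a-fA-F]$/.test(c);
    i += 1;
  }

  // Start states: at least one digit is required.
  if (i >= input.length || !isDigit(input[i])) return null;

  // Digit states: accumulate the code, saturating just past the Unicode range.
  let code = 0;
  while (i < input.length && isDigit(input[i])) {
    code = Math.min(code * radix + parseInt(input[i], radix), 0x110000);
    i += 1;
  }

  // A missing ";" is a parse error, but the reference still decodes.
  if (input[i] === ";") i += 1;

  // End state: null, out-of-range and surrogate codes become U+FFFD.
  const isSurrogate = code >= 0xd800 && code <= 0xdfff;
  if (code === 0 || code > 0x10ffff || isSurrogate) code = 0xfffd;

  return { text: String.fromCodePoint(code), next: i };
}

console.log(decodeNumericReference("&#x41;", 2)); // text "A", next 6
```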

Spec compliance is not intended to be complete for the following reasons:

  • Trust is lost as soon as a non-empty script tag is encountered, and escaping its contents won't bring it back, so the script data escaped states are not implemented.
  • Doctype details pose no threat and are preserved untreated.
  • The tokenizer is unaware of document namespaces and treats CDATA sections as bogus comments.
  • Named character references pose no threat and are preserved untreated.
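
The CDATA point can be made concrete with a small sketch of what the specification prescribes outside foreign content: `<![CDATA[` opens a bogus comment whose data starts with `[CDATA[` and runs to the next `>`. The function name `cdataAsBogusComment` and the returned object shape are hypothetical, chosen only for illustration; mashtml's actual token format may differ.

```javascript
// `pos` must point at the "<" of "<![CDATA[". Returns a comment-like object
// and the index after it, or null when the input does not start a CDATA section.
function cdataAsBogusComment(input, pos) {
  const opener = "<![CDATA[";
  if (!input.startsWith(opener, pos)) return null;

  const dataStart = pos + 2; // skip "<!", keep "[CDATA[" as comment data
  const close = input.indexOf(">", dataStart);
  const dataEnd = close === -1 ? input.length : close;

  return {
    type: "comment", // hypothetical token shape
    // The bogus comment state replaces NUL characters with U+FFFD.
    data: input.slice(dataStart, dataEnd).replace(/\0/g, "\uFFFD"),
    next: close === -1 ? input.length : close + 1,
  };
}

console.log(cdataAsBogusComment("<![CDATA[x]]>", 0));
```

For `<![CDATA[x]]>` the comment data is `[CDATA[x]]`. Because a bogus comment ends at the first `>`, an input such as `<![CDATA[a>b]]>` yields the comment `[CDATA[a`, and the remaining `b]]>` is tokenized as ordinary text.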

Tests

mashtml is tested against the html5lib-tests suite (https://github.com/html5lib/html5lib-tests).

npm test
npm run test:coverage

In line with the compliance scope described above, tests involving any of the omitted parser states are skipped, and the test criteria for doctypes have been relaxed. See ./test/html5lib.test.ts for details.
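
As a rough idea of how such skipping can work, the html5lib tokenizer tests are JSON objects that may carry an `initialStates` list. The sketch below filters on that field only. It is an assumption for illustration, not a description of ./test/html5lib.test.ts; `SKIPPED_STATES`, `runnableStates` and the sample data are hypothetical, and a test that reaches an omitted state part-way through its input would need a separate check.

```javascript
// Assumption: initial states this sketch treats as unsupported,
// named the way html5lib-tests names them.
const SKIPPED_STATES = new Set(["CDATA section state"]);

// Keep only the initial states that can be exercised.
// A test with none left would be skipped entirely.
function runnableStates(test) {
  const states = test.initialStates ?? ["Data state"]; // default when omitted
  return states.filter((state) => !SKIPPED_STATES.has(state));
}

// Made-up sample entries in the html5lib-tests shape.
const sample = [
  { description: "plain text", input: "a", output: [["Character", "a"]] },
  {
    description: "cdata only",
    input: "x]]>",
    output: [["Character", "x"]],
    initialStates: ["CDATA section state"],
  },
];

for (const test of sample) {
  const states = runnableStates(test);
  console.log(test.description, states.length > 0 ? states : "skipped");
}
```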