high5

v1.0.0

Published

3 years ago

(eventually) spec-compliant html 5 parser

Vulnerabilities: 0 high, 0 medium, 0 low

feedic

html5 parser tokenizer

# high5

(eventually) spec-compliant html5 parser

### Goals

My previous HTML parser, htmlparser2, reached a point where a clean cut was needed. high5 is this cut, even though it's based on htmlparser2 and will try to be backwards compatible (I even tried to preserve the git history, so all previous committers are still credited).

Some of the things that will be supported (checked items are done):

- [x] Doctypes are parsed. (htmlparser2 treated them as processing instructions and did not parse them at all.)
- [x] Several token types that htmlparser2 handled as processing instruction tokens are handled as (bogus) comments, as in the HTML5 spec.
- [ ] The `xmlMode` option will still be available and will conditionally switch these features on.
- [ ] Add a document mode. (htmlparser2 is always in fragment mode, meaning that e.g. the empty document (`""`) results in an empty DOM.)
- [ ] Implicit opening and closing tags. (htmlparser2 only checks the top element of the stack for the latter.)
- [ ] Foster parenting (e.g. `<table><a>foo</a>…` should be handled as `<a>foo</a><table>…`).
- [ ] (Potentially) handle character encodings (?).

### State

Spec-compliant* tokenizer
Rudimentary tag-handling (still a long way to go, only marginally better than htmlparser2).

* The tokenizer takes several shortcuts that greatly increase the speed of a JavaScript implementation but disobey the spec implementation-wise. The output should be spec-compliant, though.
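As a rough illustration of what spec-compliant output means for the doctype and bogus-comment cases mentioned under Goals, the sketch below classifies a construct by the characters that follow `<`. It follows the HTML5 tokenizer's tag open, end tag open, and markup declaration open states. The function name `classifyMarkup` and its return labels are hypothetical and are not part of high5's API. It assumes HTML (not SVG or MathML) content, so `<![CDATA[` counts as a bogus comment.

```javascript
// Hypothetical helper: reports which kind of token the HTML5 tokenizer
// would start for the construct at the beginning of `input`.
function classifyMarkup(input) {
  const isAlpha = (c) => c !== undefined && /[a-z]/i.test(c);

  if (input[0] !== "<") return "text";
  const next = input[1];

  if (next === "!") {
    // Markup declaration open state.
    if (input.startsWith("--", 2)) return "comment";
    if (input.slice(2, 9).toLowerCase() === "doctype") return "doctype";
    // Everything else, including "[CDATA[" in HTML content, is a bogus comment.
    return "bogus comment";
  }

  if (next === "?") {
    // "<?xml ...?>" and similar: a bogus comment, not a processing instruction.
    return "bogus comment";
  }

  if (next === "/") {
    // End tag open state.
    const c = input[2];
    if (isAlpha(c)) return "end tag";
    if (c === ">") return "ignored"; // "</>" produces no token
    if (c === undefined) return "text"; // "</" at end of input is emitted as text
    return "bogus comment";
  }

  if (isAlpha(next)) return "start tag";

  // Any other character after "<": the "<" is emitted as plain text.
  return "text";
}

console.log(classifyMarkup("<!DOCTYPE html>"));
console.log(classifyMarkup('<?xml version="1.0"?>'));
console.log(classifyMarkup("<!ELEMENT br EMPTY>"));
```

A real tokenizer works incrementally on chunks and tracks much more state. The sketch only shows the first branching decision for each construct.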
