html-tokeniser

v0.0.2

Published

a year ago

HTML tokeniser that supports streaming interfaces. Converts HTML into tokens and provides options to transform those tokens and convert them back into an HTML string.


wjarosz

#HTML Tokeniser

A streaming tokeniser that uses htmlparser2 to convert string content into a stream of token objects, plus a second transform stream that converts those tokens back into HTML at the other end.

This is something I found myself reusing a bit: a quick HTML parser that extracts HTML attributes and produces a flat stream of tokens instead of a DOM tree. It also converts recognised self-closing tags, so `<link rel="stylesheet"></link>` is transformed into `<link rel="stylesheet" />`.
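To illustrate what "a flat stream of tokens instead of a DOM tree" means, here is a minimal sketch that flattens a small nested tree into a token list. The node shape (`name`, `attributes`, `children`, `text`) and the token shape (`type`, `name`, `data`) are assumptions made up for this example; the tokens this package actually emits may look different.

```javascript
// ASSUMPTION: hypothetical node and token shapes, for illustration only.
// A DOM-style tree nests children inside their parent element...
var tree = {
  name: 'p',
  attributes: { 'class': 'intro' },
  children: [
    { text: 'Hello ' },
    { name: 'b', attributes: {}, children: [{ text: 'world' }] }
  ]
};

// ...whereas a flat token list records the same structure as a sequence
// of open, text and close tokens that can be processed one at a time.
function flatten(node) {
  if (node.text !== undefined) {
    return [{ type: 'text', data: node.text }];
  }
  var tokens = [{ type: 'open', name: node.name, attributes: node.attributes }];
  node.children.forEach(function (child) {
    tokens = tokens.concat(flatten(child));
  });
  tokens.push({ type: 'close', name: node.name });
  return tokens;
}
```

Because each token stands alone, a consumer can inspect or rewrite it without holding the whole document in memory, which is what makes the streaming interface possible.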

##Usage

Files

var t = require('html-tokeniser'),
    fs = require('fs');

fs.createReadStream('file.html')
  .pipe(t.tokeniser())
  .on('data', function (token) {
    console.log(token);
  });

URLs

// npm install request
var t = require('html-tokeniser'),
    request = require('request');

request('http://www.example.com')
  .pipe(t.tokeniser())
  .on('data', function (token) {
    console.log(token);
  });

###Re-forming HTML from tokens

// npm install map-stream request
var t = require('html-tokeniser'),
    fs = require('fs'),
    map = require('map-stream'),
    request = require('request');

request('http://www.example.com')
  .pipe(t.tokeniser())
  .pipe(map(function(token, cb){
    // Transform the token here; call cb() with no arguments to drop it
    cb(null, token);
  }))
  .pipe(t.toHTML())
  .pipe(fs.createWriteStream('example.html'));
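If you would rather not depend on map-stream, the same kind of step can be sketched with Node's built-in `stream.Transform` in object mode. The example below drops comment tokens. It assumes tokens carry a `type` property with the value `'comment'`; that shape is a guess for illustration, so check the tokens you actually receive before relying on it. `isComment` and `dropComments` are hypothetical helper names.

```javascript
var Transform = require('stream').Transform;

// ASSUMPTION: comment tokens look like { type: 'comment', ... }.
// The real token shape emitted by html-tokeniser may differ.
function isComment(token) {
  return Boolean(token) && token.type === 'comment';
}

// Object-mode transform that forwards every token except comments.
function dropComments() {
  return new Transform({
    objectMode: true,
    transform: function (token, encoding, done) {
      if (isComment(token)) {
        done();            // nothing is pushed, so the token is dropped
      } else {
        done(null, token); // pass the token through unchanged
      }
    }
  });
}
```

It would sit in the pipeline where the `map(...)` call is, for example `.pipe(t.tokeniser()).pipe(dropComments()).pipe(t.toHTML())`.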

###tokeniser options

Any options passed to tokeniser are automatically passed on to htmlparser2.

Example:

//...
t.tokeniser({
  recognizeCDATA: false
})
//...

###toHTML options

`selfClosingTags`: a list of tag names to self-close automatically. Ensure that these tags do NOT contain content, otherwise the rendered output will be malformed.

Default: `['meta', 'link', 'br', 'hr', 'input', 'img', 'embed', 'keygen', 'base', 'area', 'col', 'command', 'param', 'source', 'track', 'wbr']`
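As a rough sketch of what self-closing means when tokens are rendered, and why content inside such tags causes trouble, the function below renders a list of tokens to a string. It is not the package's implementation: `renderTokens` and the token shapes (`type`, `name`, `attributes`, `data`) are assumptions for illustration only.

```javascript
// Default list copied from the selfClosingTags option described above.
var SELF_CLOSING = ['meta', 'link', 'br', 'hr', 'input', 'img', 'embed',
  'keygen', 'base', 'area', 'col', 'command', 'param', 'source', 'track', 'wbr'];

// ASSUMPTION: hypothetical token shapes, for illustration only:
//   { type: 'open',  name: 'link', attributes: { rel: 'stylesheet' } }
//   { type: 'close', name: 'link' }
//   { type: 'text',  data: 'hello' }
// Attribute values are not escaped in this sketch.
function renderTokens(tokens, selfClosingTags) {
  var selfClosing = selfClosingTags || SELF_CLOSING;
  return tokens.map(function (token) {
    var isSelfClosing = selfClosing.indexOf(token.name) !== -1;
    if (token.type === 'open') {
      var attributes = token.attributes || {};
      var attrs = Object.keys(attributes).map(function (key) {
        return ' ' + key + '="' + attributes[key] + '"';
      }).join('');
      return '<' + token.name + attrs + (isSelfClosing ? ' />' : '>');
    }
    if (token.type === 'close') {
      // The open token already closed the tag, so emit nothing here.
      return isSelfClosing ? '' : '</' + token.name + '>';
    }
    return token.data || '';
  }).join('');
}
```

In this sketch, any text token placed between the open and close tokens of a self-closing tag is emitted after the ` />`, outside the element, which is the kind of malformed output the warning above refers to.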
