@thenja/html-parser
v1.1.3
Html-Parser
A simple, forgiving HTML parser for JavaScript (browser and Node.js).
Features
- Works in Node.js or the browser
- Parses HTML that may not be valid
- Parses HTML into a JSON structure that can be modified and converted back into HTML
- Cleans HTML, for example by removing empty tags and whitespace-only text nodes
Not supported
- CDATA in html is not supported
How to use
Installation
npm install @thenja/html-parser --save
Typescript
import { HtmlParser } from "@thenja/html-parser";
let htmlParser = new HtmlParser();
Javascript (browser)
<script src="dist/thenja-html-parser.min.js" type="text/javascript"></script>
var htmlParser = new Thenja.HtmlParser();
Parse HTML
Basic usage
let html = "<div><p>Hello world!</p></div>";
let output = htmlParser.parse(html);
Example output
[
{
"type": "tag",
"tagType": "default",
"name": "div",
"attributes": {},
"children": [
{
"type": "tag",
"tagType": "default",
"name": "p",
"attributes": {},
"children": [
{
"type": "text",
"data": "Hello world!"
}
]
}
]
}
]
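The output is a plain tree, so it can be traversed with ordinary recursion. The following is a minimal sketch that collects all text from a parsed tree. The `TextNode`, `TagNode` and `ParsedNode` types and the `collectText` helper are assumptions inferred from the example output above; they are not part of the library, whose own type definitions may differ.

```typescript
// Hypothetical types inferred from the example output; not the library's own.
interface TextNode {
  type: "text";
  data: string;
}

interface TagNode {
  type: "tag";
  tagType: string;
  name: string;
  attributes: Record<string, string>;
  children?: ParsedNode[];
}

type ParsedNode = TextNode | TagNode;

// Depth-first walk that concatenates the data of every text node.
function collectText(nodes: ParsedNode[]): string {
  let text = "";
  for (const node of nodes) {
    if (node.type === "text") {
      text += node.data;
    } else {
      text += collectText(node.children || []);
    }
  }
  return text;
}

// The same structure as the example output, written as a literal.
const sampleOutput: ParsedNode[] = [
  {
    type: "tag",
    tagType: "default",
    name: "div",
    attributes: {},
    children: [
      {
        type: "tag",
        tagType: "default",
        name: "p",
        attributes: {},
        children: [{ type: "text", data: "Hello world!" }],
      },
    ],
  },
];

console.log(collectText(sampleOutput)); // prints "Hello world!"
```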
Parse html and reverse the output
let html = "<div><p>Hello world!</p></div>";
let output = htmlParser.parse(html);
let reversedHtml = htmlParser.reverse(output);
Listen for errors
let html = "<div><p>Hello world!</p></div>";
let output = htmlParser.parse(html, (err) => {
// handle errors here
});
Listen for nodes being added when parsing
// In this example we will replace .jpg extensions with .png
let html = "<div><img src='my-picture.jpg' /></div>";
let output = htmlParser.parse(html, null, (node, parentNode) => {
if(node.name === 'img' && node.attributes && node.attributes.src) {
node.attributes.src = node.attributes.src.replace('.jpg', '.png');
}
});
let newHtml = htmlParser.reverse(output);
// newHtml will equal: <div><img src='my-picture.png' /></div>
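The callback can also be written as a standalone function, which makes it testable without running the parser. The sketch below is one possible variant: `NodeLike` and `swapImageExtension` are hypothetical names, and the node shape is assumed from the examples in this README. It anchors the match to the end of the string, because `replace('.jpg', '.png')` with a string pattern changes only the first occurrence, which may not be the file extension.

```typescript
// Hypothetical minimal node shape, assumed from the examples in this README.
interface NodeLike {
  name?: string;
  attributes?: Record<string, string>;
}

// Rewrites a trailing .jpg extension on <img> nodes to .png, in place.
function swapImageExtension(node: NodeLike): void {
  if (node.name === "img" && node.attributes && node.attributes.src) {
    // The $ anchor restricts the replacement to the end of the value.
    node.attributes.src = node.attributes.src.replace(/\.jpg$/i, ".png");
  }
}

const img = { name: "img", attributes: { src: "photos.jpg/my-picture.jpg" } };
swapImageExtension(img);
console.log(img.attributes.src); // prints "photos.jpg/my-picture.png"
```

Assuming the callback signature shown above, a function like this could be passed as the third argument to `parse` in place of the inline arrow function.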
Listen for nodes being stringified when reversing
// In this example we will remove the class attribute
let html = "<div class='my-style'></div>";
let output = htmlParser.parse(html);
let newHtml = htmlParser.reverse(output, (node) => {
if(node.name === 'div') {
delete node.attributes['class'];
}
});
// newHtml will equal: <div></div>
Clean up the html
The clean function allows you to remove unwanted html tags (such as empty tags) and empty text nodes.
Available options:
|Option|Description|
|------|-----------|
|removeEmptyTags|Remove empty HTML tags, such as <p></p>|
|removeEmptyTextNodes|Remove a text node if it contains only whitespace|
let html = "<div>Hi there<p></p></div>";
// Both options default to true; they are passed explicitly here only for demonstration
let cleanOptions = { removeEmptyTags: true, removeEmptyTextNodes: true };
let output = htmlParser.parse(html);
output = htmlParser.clean(output, cleanOptions);
let newHtml = htmlParser.reverse(output);
// newHtml will equal: <div>Hi there</div>
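To make the two options concrete, here is a rough sketch of how such a clean pass could work on the parsed tree. This is not the library's implementation: `cleanTree`, `CleanOptions`, `VOID_TAGS` and the simplified node types are all assumptions made for illustration, and the real `clean` function may treat edge cases (void elements in particular) differently.

```typescript
// Simplified, hypothetical node types; the library's real nodes carry more fields.
interface TextNode {
  type: "text";
  data: string;
}

interface TagNode {
  type: "tag";
  name: string;
  attributes: Record<string, string>;
  children: TreeNode[];
}

type TreeNode = TextNode | TagNode;

interface CleanOptions {
  removeEmptyTags: boolean;
  removeEmptyTextNodes: boolean;
}

// Assumption: elements that never have children are not treated as "empty".
const VOID_TAGS = ["br", "hr", "img", "input", "link", "meta"];

// Returns a cleaned copy of the tree; the input is not modified.
function cleanTree(nodes: TreeNode[], options: CleanOptions): TreeNode[] {
  const result: TreeNode[] = [];
  for (const node of nodes) {
    if (node.type === "text") {
      const isBlank = node.data.trim() === "";
      if (!(options.removeEmptyTextNodes && isBlank)) {
        result.push(node);
      }
      continue;
    }
    // Clean children first so a tag holding only blank text counts as empty.
    const children = cleanTree(node.children, options);
    const isEmpty = children.length === 0 && VOID_TAGS.indexOf(node.name) === -1;
    if (!(options.removeEmptyTags && isEmpty)) {
      result.push({ ...node, children });
    }
  }
  return result;
}

// Tree equivalent to <div>Hi there<p></p></div>
const tree: TreeNode[] = [
  {
    type: "tag",
    name: "div",
    attributes: {},
    children: [
      { type: "text", data: "Hi there" },
      { type: "tag", name: "p", attributes: {}, children: [] },
    ],
  },
];

const cleaned = cleanTree(tree, { removeEmptyTags: true, removeEmptyTextNodes: true });
console.log(JSON.stringify(cleaned));
```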
Development
npm run init
- Set up the app for development (run once after cloning)
npm run dev
- Run this command when you want to work on this app. It compiles TypeScript, runs the tests and watches for file changes.
Distribution
npm run build -- -v <version>
- Create a distribution build of the app.
-v (version) - [Optional] Either "patch", "minor" or "major". Increments the version number in the package.json file.
The build command creates a /compiled directory containing the compiled JavaScript and the TypeScript definitions. It also creates a /dist directory containing a minified JavaScript file.
Testing
Tests are run automatically when you do a build.
npm run test
- Run the tests. The tests run in a Node.js environment.
You can run the tests in a browser environment by opening the file /spec/in-browser/SpecRunner.html.
License
MIT © Nathan Anderson
ToDo
Add in more unit tests
Add in a flattenText() function. This will flatten many nested text nodes into one text node.
<p>My name is <strong>Nathan</strong></p>
Flattened to:
<p>My name is Nathan</p>
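Since `flattenText()` is only planned, the following is one possible sketch of it rather than the library's code. The function name comes from the ToDo item; the `textOf` helper, the simplified node types and the choice to return a copy instead of mutating the node are assumptions.

```typescript
// Simplified, hypothetical node types; the library's real nodes carry more fields.
interface TextNode {
  type: "text";
  data: string;
}

interface TagNode {
  type: "tag";
  name: string;
  attributes: Record<string, string>;
  children: TreeNode[];
}

type TreeNode = TextNode | TagNode;

// Concatenates the text of a node and all of its descendants, in document order.
function textOf(node: TreeNode): string {
  if (node.type === "text") {
    return node.data;
  }
  let text = "";
  for (const child of node.children) {
    text += textOf(child);
  }
  return text;
}

// Returns a copy of the tag whose children are replaced by a single text node.
function flattenText(node: TagNode): TagNode {
  const flattened: TextNode = { type: "text", data: textOf(node) };
  return { ...node, children: [flattened] };
}

// Tree equivalent to <p>My name is <strong>Nathan</strong></p>
const paragraph: TagNode = {
  type: "tag",
  name: "p",
  attributes: {},
  children: [
    { type: "text", data: "My name is " },
    {
      type: "tag",
      name: "strong",
      attributes: {},
      children: [{ type: "text", data: "Nathan" }],
    },
  ],
};

const flat = flattenText(paragraph);
console.log(JSON.stringify(flat.children));
// prints [{"type":"text","data":"My name is Nathan"}]
```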