clean-html-js
v1.3.16
extract reading material from a url
Clean HTML content for reading. Pass in your content as an HTML string and get back a readability object.
Installation Instructions
$ yarn add clean-html-js
Example
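import cleanHtml from "clean-html-js";
const url = "https://www.a11ywatch.com";
// Fetch the page yourself, then pass the HTML and its source url.
async function grabReaderData() {
  const source = await fetch(url);
  const html = await source.text();
  const readabilityArticle = await cleanHtml(html, url);
  // return the result so the caller's .then() receives it
  return readabilityArticle;
}
// Pass an empty html string and let the library fetch the page from the url.
async function grabReaderDataSimple() {
  const readabilityArticle = await cleanHtml("", url);
  return readabilityArticle;
}
grabReaderData().then((data) => {
  console.log(data);
});
// or just the url
grabReaderDataSimple().then((data) => {
  console.log(data);
});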
- For more help getting started, check out the Example.
Available Params
| param     | default | type   | description                                                          |
| --------- | ------- | ------ | -------------------------------------------------------------------- |
| html      | ""      | string | Required: html string to parse                                       |
| sourceUrl | ""      | string | Optional: url of the html source to prevent fetching extra resources |
| config    | {}      | Config | Optional: config object                                              |
If html is not provided and sourceUrl is set, the library attempts to fetch the html from sourceUrl.
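The fallback described above can be sketched as a small standalone function. This is only an illustration of the documented behaviour, not the library's actual code: `resolveHtml` and its `fetchImpl` parameter are hypothetical names introduced here so the sketch can run without the package or network access.

```javascript
// Hypothetical helper (not part of clean-html-js) sketching the documented fallback:
// use the html string when one is given, otherwise fetch it from sourceUrl.
// fetchImpl is an assumed injection point; it defaults to the global fetch.
async function resolveHtml(html = "", sourceUrl = "", fetchImpl = globalThis.fetch) {
  if (html) {
    return html; // caller supplied markup, so no request is needed
  }
  if (!sourceUrl) {
    throw new Error("Provide html or sourceUrl");
  }
  const response = await fetchImpl(sourceUrl);
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  return response.text();
}
```

Calling `resolveHtml("", url)` mirrors the `cleanHtml("", url)` form from the example, while `resolveHtml(html, url)` skips the request entirely.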
Config
The props you pass are merged with the existing config.
| prop        | default | type             | description                                        |
| ----------- | ------- | ---------------- | -------------------------------------------------- |
| allowedTags | null    | array of strings | html elements allowed (note: svgs must be inlined) |
| nonTextTags | null    | array of strings | html elements that should not be treated as text   |
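As a rough illustration of the two props above, the sketch below builds a config object and checks it against the documented types. `checkConfig` is a hypothetical helper, not something the package exports, and the tag lists are illustrative choices rather than library defaults.

```javascript
// Hypothetical helper (not part of clean-html-js): checks a config object
// against the two documented props before it is passed on.
function checkConfig(config = {}) {
  for (const prop of ["allowedTags", "nonTextTags"]) {
    const value = config[prop];
    // null or undefined leaves the documented default (null) in place
    if (value == null) continue;
    const ok = Array.isArray(value) && value.every((tag) => typeof tag === "string");
    if (!ok) {
      throw new TypeError(`${prop} must be an array of strings or null`);
    }
  }
  return config;
}

// Illustrative tag choices, not library defaults.
const config = checkConfig({
  allowedTags: ["p", "h1", "h2", "a", "ul", "li"],
  nonTextTags: ["script", "style", "noscript"],
});
```

Assuming the params are positional in the order listed under Available Params, the object would then be passed as the third argument, for example `cleanHtml(html, url, config)`.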
Testing
To test custom pages, pass your params, separated by commas, into the jest test. For example:
$ yarn jest '-params=mozilla,https://www.mozilla.com'
or
$ yarn jest '-params=a11ywatch,https://www.a11ywatch.com'
The first param is the html file pulled from the examples folder, and the second is an optional uri for the resources.
npm test
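To make the shape of the `-params` argument concrete, here is a minimal sketch of how such a value could be split into its two parts. How the project's jest test actually reads the flag is not shown in this README, so `parseParams` is a hypothetical function and the returned field names are assumptions.

```javascript
// Hypothetical parser for the "-params=<name>,<uri>" argument shown above.
// The real test may read the flag differently; this only illustrates the format.
function parseParams(argv) {
  const prefix = "-params=";
  const arg = argv.find((a) => a.startsWith(prefix));
  if (!arg) return null; // flag not supplied
  // first value: html file name in the examples folder; second: optional uri
  const [name, uri = ""] = arg.slice(prefix.length).split(",");
  return { name, uri };
}

console.log(parseParams(["-params=mozilla,https://www.mozilla.com"]));
// → { name: 'mozilla', uri: 'https://www.mozilla.com' }
```

When only the file name is given, as in `-params=mozilla`, the sketch returns an empty string for `uri`, matching the description of the second param as optional.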