@candlelib/html
v0.2.10
Published
HTML Parser and DOM Polyfill
Downloads
11
Readme
CandleLibrary HTML is a HTML parser that builds a node graph of HTML elements. It provides methods for hooking into the parsing process to generate custom HTML node graphs.
Install
NPM
npm install --save @candlelib/html
Usage
note: This script uses ES2015 module syntax, and has the extension .mjs. To include this script in a project, you may need to use the node flag
--experimental-modules
; or, use a bundler that supports ES modules, such as rollup.
import html from "@candlelib/html"
html(`<div><a>hello world!</a></div>`).then(root=>{
root.tag //=> div
root.get //
})
Notes
CandleLibrary HTML makes use of a none standard attribute to provide asynchronous HTML building. The url
attribute can be used to fetch arbitrary data and insert that into the inner HTML of the element that has the attribute.
e.g.
<!--file src.html -->
<h1>
<button style="background-color:red">Don't Touch</button>
</h1>
In Javascript
//javascript file in same folder
html(`<div url="./src.html"></div>`).then( root=>{
const button = root.getTag("button", true)[0];
button.toString() //=> "<button style="background-color:red">Don't Touch</button>"
})
Members
HTMLNode
mixin @candlelib/ll - tree
import {HTMLNode} from "@candlelib/html"
Constructor
new HTMLNode ( )
Properties
class - String The class attribute value.
classList - Array Array of all class values.
DTD - Boolean True if the HTMLNode is a DTD element, such as a comment or
<!DOCTYPE>
.id - String The id attribute value.
nextElementSibling - HTMLNode Returns the next sibling HTMLNode or null
parentElement - HTMLNode Returns the parent HTMLNode or null;
previousElementSibling HTMLNode Returns the previous sibling HTMLNode or null
single - Boolean True if the element is a single tag element, such as <input>
tag - String The tag name of the object.
tagName - String Same as
tag
.type (Read-Only) - Number
0
(HTML
).url - CandleLibrary URL If the element tag in the orignal HTML string contained an attribute named url, then value of that attribute is applied to
url
.
Methods
HTMLElement - build ( [ parent ] ) Builds an HTMLElement tree from parsed nodes. If an HTMLNode is passed as
parent
, the HTMLElements will be appended toparent
.Object - getAttrib ( prop ) Returns the value of an attribute whose name matches
prop
, or it returns null if no attributes match the value.String - getClass ( class_name [ , INCLUDE_DESCENDANTS [ , array ] ] ) Returns an array of HTMLNodes that have values in their class attribute that matches
_class
. IfINCLUDE_DESCENDANTS
is set to true, all descendants of the node will searched, otherwise only the immediate children of the node will be searched. An optional Array can be passed asarray
to store the results in.String - getID ( id [ , INCLUDE_DESCENDANTS ] ) Returns an array of HTMLNodes whose id property matches
id
. IfINCLUDE_DESCENDANTS
is set to true, all descendants of the node will searched, otherwise only the immediate children of the node will be searched. An optional Array can be passed asarray
to store the results in.String - getTag ( tag [ , INCLUDE_DESCENDANTS [ , array ] ] ) Returns an array of HTMLNodes whose tag property matches
tag
. IfINCLUDE_DESCENDANTS
is set to true, all descendants of the node will searched, otherwise only the immediate children of the node will be searched. An optional Array can be passed asarray
to store the results in.Promise - parse ( lex [ , url ] ) Parses HTML string. Accepts a Whind Lexer or a string as the value for
lex
.String - toString ( [ offset ] ) Returns a string representation of the HTMLNode. This rebuilds the original HTML string starting at the calling node. A number can passed to
offset
to indent string offset spaces.
Private
TextNode - createTextNode ( lex , start , end ) Called by
parseRunner
to create a newTextNode
.parseOpenTag ( lex , DTD , old_wurl ) Called by
parseRunner
to parse an open HTML tag.parseRunner ( lex , OPENED , IGNORE_TEXT_TILL_CLOSE_TAG , parent , last_url) Called by various methods to continue parsing an HTML input string.
Hooks - Methods that can be overridden in derived objects
HTMLNode - createHTMLNodeHook ( tag , start ) Override this method to create a different node type for the given value of
tag
. Thestart
value is the character position offset at the start of the element open tag.If overridden, returned object should support:
- Linked List methods and properties provided by @candlelib/ll mixins.
- All properties and methods in HTMLNode
Boolean - endOfElementHook ( lex , parent ) Override this method to hook into the last stage of element parsing.
lex
will be set to just after the close tag of the element within the input string. The value oflex.off
combined with thestart
value passed increateHTMLNodeHook
define the bounds of the element in the input string, starting at the beginning of the open tag (start
) through to the end the>
character of the close tag (lex.off
).parent
is the parent HTMLNode.Boolean - ignoreTillHook ( tag ) Override this method and return true to tell the parser to not to parse inner HTML data of a tag and simply skip over it.
Object - processAttributeHook ( name , lex ) Override this method to parse attribute data. The returned object of this function should contain
name
andvalue
properties to allow the object to work with thegetAttrib
function eg:return {name:"id", value:"mango"}
. If null is returned instead, nothing will be inserted into theattributes
array.name
is a string value with the name of the attribute in the original HTML.lex
is a fenced Whind Lexer that contains the string value of the attribute.
Promise or null - processFetchHook ( lexer , OPENED , IGNORE_TEXT_TILL_CLOSE_TAG , parent , url ) Override this method to process how a url based resource is fetched.
If overridden:
This function should return either null or a Promise. If a Promise is returned, the parser will wait until the promise is resolved. This enables external content to be fetched and parsed.
If you want to continue processing the returned data with the HTMLNode parse mechanism, call
this.parseRunner
, and pass the string value of the fetched data wrapped in a Whind Lexer, OPENED , IGNORE_TEXT_TILL_CLOSE_TAG, parent, and url to the function. Passing these values will preserve the state of the parser.
e.g:
import whind from "@candlelib/whind" /*... ... ...*/ DerivedNode.prototype.processFetchHook = function(lexer, OPENED, IGNORE_TEXT_TILL_CLOSE_TAG, parent, url){ return fetch(url) .then(res => {res.text() .then(txt => this.parsesRunner(whind(txt), OPENED, IGNORE_TEXT_TILL_CLOSE_TAG, parent, url)) }) }
Warning: It is up to the implementer to follow best practices when dealing with external data with regard to client and server safety. Additional issues can occur if URL recursion is not taken into account, which can lead to an infinite fetching loop within the parser! Check that the URL has not already been fetched by an ancestor HTMLNode before attempting to fetch a resource.
TextNode - processTextNodeHook ( lex , IS_INNER_HTML ) Override this to process inner HTML text before creating and returning a TextNode. If null is returned, then the text data will be omitted from the resulting HTMLNode tree.
lex
is a fenced Whind Lexer that contains the raw text data that is to inserted into the TextNode.IS_INNER_HTML
a Boolean value set to true if the lex data contains the entirety of the elements inner HTML. If false, then the data is the text data between sibling HTMLNodes.
Boolean - selfClosingTagHook ( tag ) Override this method and return
true
to tell the parser that the HTML tag nametag
is self closing and to not look for a matching close tag. e.g.return (tag === "input") ? true : false;
TextNode
mixin @candlelib/ll - tree
import {TextNode} from "@candlelib/html"
Constructor
new TextNode ( [ str ] )
Properties
txt - String The string contents of the node.
type (Read-Only) - Number
1
(TEXT
)
Methods
HTMLTextNode - build ( ) Builds a and returns a HTMLTextNode.
String - toString ( [ offset ] ) Returns a string representation of the TextNode. A number can passed to
offset
to indent stringoffset
spaces.