als-document
v1.4.0
A powerful HTML parser & DOM manipulation library for both backend and frontend.
als-document: HTML Parser & DOM Manipulation Library
Overview
als-document is a powerful library for parsing HTML and XML and for building and manipulating a virtual DOM structure on both the backend and the frontend. It provides a robust and intuitive API for querying and interacting with DOM elements using selectors, making it a valuable tool for web developers.
Installation
To install the als-document library, use the following npm command:
npm i als-document
Including the Library
The library provides three different files to cater to different module systems:
- index.js: This file uses the CommonJS module system. It's suitable for projects using Node.js or bundlers like Browserify or Webpack. The entry point in package.json for this file is "main".
const { parseHTML, Node, Query, TextNode, SingleNode, Root, Document } = require('als-document');
- index.mjs: This file uses the ES Modules (ESM) system. It's suitable for modern JavaScript environments that support ESM. The entry point in package.json for this file is "module".
import { parseHTML, Node, Query, TextNode, SingleNode, Root, Document } from 'als-document';
- document.js: Including this file creates a constant named alsDocument, which wraps all the exports.
<script src="/node_modules/als-document/document.js"></script>
<script>
const { parseHTML, Node, Query, TextNode, SingleNode, buildFromCache, cacheDoc, Root, Document } = alsDocument
</script>
Change log for 1.3
- added getter and setter for node.innerText
- prev and next now work with childIndex=0
- querySelector no longer includes the parent
- Document has new getters and setters, including clone
- tagName is upper case, _tagName is lower case
parseHTML
parseHTML is a function that takes an HTML string and constructs a DOM tree representation from it. It recognizes various HTML elements, such as comments, scripts, styles, and CDATA, and organizes them into nodes that can be manipulated and queried.
API:
parseHTML(html: string) -> Node
Parses an HTML string and returns a tree structure representing its content.
html: The HTML string to parse.
Returns: A Node object representing the root of the parsed HTML content tree.
Expected Outcome:
When using the parseHTML function, the output will be a tree of nodes representing the HTML content. Each node can be one of the following:
- Node: A standard HTML element node with tag name, attributes, and child nodes.
- SingleNode: Represents self-closing or void HTML elements.
- TextNode: Represents text content in the HTML.
Each node will have a tag name, a dictionary of attributes, and a list of child nodes (if applicable).
Examples
const parsedHTML = parseHTML('<div class="container"><img src="image.jpg" alt="Image"/><p>Hello, world!</p></div>');
// The returned `parsedHTML` object will be a tree-like structure.
// For instance, parsedHTML.childNodes[0] would represent the <div> element,
// and parsedHTML.childNodes[0].childNodes[0] would represent the <img> element inside it.
const parsedScript = parseHTML('<script>console.log("Hello, world!");</script>');
// The returned `parsedScript` object will contain a `script` Node with a child node
// holding the JavaScript code as text content.
Remember, the actual tree structure will be more complex and detailed, but the provided examples give you a basic understanding of how to navigate through the parsed result.
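To make the navigation above concrete, here is a minimal sketch of a depth-first walk. It assumes only what this section states: element nodes expose tagName and childNodes. The walk helper and the hand-built sampleTree literal are illustrative stand-ins, not part of als-document; with the library installed you would pass the result of parseHTML instead. How text nodes expose their content is not specified here, so the stand-in uses a hypothetical text field.

```javascript
// Hypothetical helper: depth-first walk over any object that has `childNodes`.
function walk(node, visit, depth = 0) {
  visit(node, depth);
  for (const child of node.childNodes || []) {
    walk(child, visit, depth + 1);
  }
}

// Hand-built stand-in (an assumption, not real parseHTML output) for
// '<div class="container"><img src="image.jpg"/><p>Hello, world!</p></div>'.
const sampleTree = {
  childNodes: [
    {
      tagName: 'DIV',
      attributes: { class: 'container' },
      childNodes: [
        { tagName: 'IMG', attributes: { src: 'image.jpg' }, childNodes: [] },
        { tagName: 'P', attributes: {}, childNodes: [{ text: 'Hello, world!' }] },
      ],
    },
  ],
};

// Collect the tag names of element nodes in document order.
const tagNames = [];
walk(sampleTree, (node) => {
  if (node.tagName) tagNames.push(node.tagName);
});

console.log(tagNames.join(' > ')); // DIV > IMG > P
```

The visitor receives the depth as a second argument, which is handy for printing an indented outline of the tree.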
Node
Node is a fundamental class that represents an element node in the DOM tree. It provides functionality similar to the native DOM API in browsers, but with its own implementation.
Properties:
- tagName: Represents the tag name of the element (upper cased).
- _tagName: Represents the tag name of the element (lower cased).
- innerText: Gets or sets the text of the element.
- attributes: A dictionary of attributes and their values.
- childNodes: An array of child nodes for the element.
- isSingle: Boolean value to check if the node is a self-closing tag.
- parentNode, previousElementSibling, nextElementSibling, children: Navigation properties to move through the DOM tree.
- dataset, classList, style: Special properties for interacting with data-* attributes, classes, and inline styles.
Methods:
- getAttribute, setAttribute, removeAttribute: Manipulate element's attributes.
- remove: Removes the element from its parent.
- innerHTML, outerHTML: Get and set the inner or entire HTML of the element.
- querySelector, querySelectorAll: Find elements within the node based on CSS-like selectors.
  - limits: pseudo selectors like :first-of-type or :checked are not available
  - namespaces for tags (some:namespace) are available
  - there are additional methods: $ for querySelector and $$ for querySelectorAll
- getElementsByClassName, getElementsByTagName, getElementById: Get elements by class, tag, or id respectively.
- insertAdjacentElement, insertAdjacentHTML, insertAdjacentText: Insert content relative to the element.
- appendChild: Add a child node to the element.
- insert(place, element): place is a number (0-3) or a position name (beforebegin, afterbegin, ...); element is raw HTML or an element.
Examples:
const div = new Node('div');
div.setAttribute('class', 'container');
const img = new SingleNode('img', { src: 'image.jpg', alt: 'An image' });
div.appendChild(img);
console.log(div.outerHTML); // Outputs: <div class="container"><img src="image.jpg" alt="An image"></div>
const p = new Node('p',{},div); // adding as last child to parent div
p.textContent = "Hello, world!";
const foundP = div.querySelector('p');
console.log(foundP.textContent); // Outputs: Hello, world!
SingleNode
SingleNode extends the Node class and represents elements that don't have closing tags (self-closing tags) in HTML. Examples include <img>, <br>, and <!DOCTYPE>. This class has restricted methods and properties since these elements can't have child nodes.
TextNode
TextNode is a class that represents text content within the DOM. A TextNode holds raw text data and does not have child nodes.
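The difference between the three node kinds shows up most clearly when a tree is turned back into markup. The sketch below is not the library's serializer. It uses plain objects with a hypothetical text field for text nodes, a hypothetical tag field, and an isSingle flag (named after the property listed for Node) to illustrate why single nodes emit no closing tag and no children.

```javascript
// Illustrative serializer over plain objects (not als-document classes).
function toHTML(node) {
  // Text node stand-in: raw text, no children.
  if (typeof node.text === 'string') return node.text;

  const attrs = Object.entries(node.attributes || {})
    .map(([name, value]) => ` ${name}="${value}"`)
    .join('');

  // Single node: opening tag only, children are never visited.
  if (node.isSingle) return `<${node.tag}${attrs}>`;

  const inner = (node.childNodes || []).map(toHTML).join('');
  return `<${node.tag}${attrs}>${inner}</${node.tag}>`;
}

const sampleDiv = {
  tag: 'div',
  attributes: { class: 'container' },
  childNodes: [
    { tag: 'img', isSingle: true, attributes: { src: 'image.jpg' } },
    { tag: 'p', childNodes: [{ text: 'Hello, world!' }] },
  ],
};

const markup = toHTML(sampleDiv);
console.log(markup);
// <div class="container"><img src="image.jpg"><p>Hello, world!</p></div>
```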
Document node (extends Node)
Has additional getters and setters:
- get documentElement
- get html
- get head
- get body
- get title
- get charset
- set title
- get clone - returns a new, cloned instance of Document
Query
The Query class is designed to parse CSS selector strings and transform them into a structured object format, providing detailed insights into each selector and its components.
By using the class, one can expect to transform a CSS selector string into an array of objects.
Each object will represent a selector, containing detailed information such as its tag, identifier, classes, attributes, and associated selectors if any. This can be useful for further processing or analysis of CSS selectors in an application.
Example
let q1 = 'html>body>div.tabs~.some[type $= "radio and some"]>p+div>.some-id .tab-content~input[disabled] div.some'
let result = new Query(q1).selectors
let result1 = Query.get(q1)
// result and result1 must be the same
console.log(result)
Result:
[
{
"query": "div.some",
"tag": "div",
"classList": [
"some"
],
"ancestors": [
{
"query": ".some-id",
"classList": [
"some-id"
],
"parents": [
{
"query": "div",
"tag": "div"
}
],
"prev": {
"query": "p",
"tag": "p",
"parents": [
{
"query": ".some[0]",
"classList": [
"some"
],
"attribs": [
{
check:(f),
"query": "[type$=\"radio and some\"]",
"name": "type",
"value": "radio and some",
"sign": "$="
}
]
}
],
"prevAny": {
"query": "div.tabs",
"tag": "div",
"classList": [
"tabs"
],
"parents": [
{
"query": "html",
"tag": "html"
},
{
"query": "body",
"tag": "body"
}
]
},
"group": "html>body>div.tabs~.some[0]>p"
},
"group": "html>body>div.tabs~.some[0]>p+div>.some-id"
},
{
"query": "input[1]",
"tag": "input",
"attribs": [
{
"query": "[disabled]",
"name": "disabled"
}
],
"prevAny": {
"query": ".tab-content",
"classList": [
"tab-content"
]
},
"group": ".tab-content~input[1]"
}
],
"group": "html>body>div.tabs~.some[type $= \"radio and some\"]>p+div>.some-id .tab-content~input[disabled] div.some"
}
]
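One way to picture how such an object could drive matching: the sketch below tests node-like objects against the tag and classList fields of a single selector entry. matchesSimple and the sample objects are hypothetical; relations such as parents, ancestors, prev, and prevAny are deliberately left out, and this is not a description of how als-document matches internally.

```javascript
// Hypothetical matcher for the `tag` and `classList` fields of one selector entry.
function matchesSimple(node, selector) {
  if (selector.tag && node.tag !== selector.tag) return false;
  const nodeClasses = new Set(node.classes || []);
  return (selector.classList || []).every((name) => nodeClasses.has(name));
}

// Shaped like the outermost entry of the result above, minus its relations.
const selectorEntry = { query: 'div.some', tag: 'div', classList: ['some'] };

// Plain stand-in nodes with assumed `tag` and `classes` fields.
const candidates = [
  { tag: 'div', classes: ['some', 'other'] },
  { tag: 'div', classes: ['other'] },
  { tag: 'p', classes: ['some'] },
];

const matched = candidates.filter((node) => matchesSimple(node, selectorEntry));
console.log(matched.length); // 1
```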
Attribs and check function
If an attribute selector has a value, its attrib object contains a check function that takes one parameter: the value to test.
let s = Query.get('[test^="some"]')[0]
console.log(s.attribs[0].check('some value test')) // true
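As an illustration of what such a check function might do, the sketch below builds one from a sign and a value using the standard CSS attribute-operator semantics. makeCheck is a hypothetical name, and the library's real implementation may differ.

```javascript
// Hypothetical factory: returns a predicate for one CSS attribute operator.
function makeCheck(sign, expected) {
  switch (sign) {
    case '=':  return (actual) => actual === expected;
    case '^=': return (actual) => actual.startsWith(expected);
    case '$=': return (actual) => actual.endsWith(expected);
    case '*=': return (actual) => actual.includes(expected);
    case '~=': return (actual) => actual.split(/\s+/).includes(expected);
    case '|=': return (actual) => actual === expected || actual.startsWith(expected + '-');
    default:   throw new Error(`Unsupported operator: ${sign}`);
  }
}

const startsWithSome = makeCheck('^=', 'some');
console.log(startsWithSome('some value test')); // true
console.log(startsWithSome('value some'));      // false

const endsWithRadio = makeCheck('$=', 'radio and some');
console.log(endsWithRadio('a radio and some')); // true
```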
buildFromCache and cacheDoc
Building a DOM from raw HTML usually takes tens of milliseconds. You can now build the DOM once and save its cache as regular stringified JSON. Caching and building from cache each take less than 5ms and require very few resources.
How does it work?
const html = `` // some real html 255KB
const root = parseHTML(html); // 31.9ms
const cache = cacheDoc(root); // 2.4ms
const root1 = buildFromCache(cache); // 1.2ms
console.log(root.innerHTML === root1.innerHTML) // true
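The idea behind caching can be sketched without the library. A tree whose nodes point back to their parents cannot be passed to JSON.stringify directly, so the parent links are dropped when serializing and restored when rebuilding. toCache and fromCache below are hypothetical stand-ins that work on plain objects; the actual format produced by cacheDoc is not documented here and may look quite different.

```javascript
// Serialize: skip `parentNode` so the structure has no cycles.
function toCache(root) {
  return JSON.stringify(root, (key, value) => (key === 'parentNode' ? undefined : value));
}

// Rebuild: parse the JSON, then restore the parent links.
function fromCache(cache) {
  const root = JSON.parse(cache);
  const link = (node, parent) => {
    node.parentNode = parent;
    (node.childNodes || []).forEach((child) => link(child, node));
  };
  link(root, null);
  return root;
}

// Small stand-in tree with parent back-references.
const rootNode = { tag: 'div', childNodes: [], parentNode: null };
const para = { tag: 'p', childNodes: [], parentNode: rootNode };
rootNode.childNodes.push(para);

const cached = toCache(rootNode);
const rebuilt = fromCache(cached);

console.log(cached); // {"tag":"div","childNodes":[{"tag":"p","childNodes":[]}]}
console.log(rebuilt.childNodes[0].parentNode === rebuilt); // true
```

Because the cache is an ordinary string, it can be written to disk or kept in memory and reused across requests.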