als-document

v1.4.0

A powerful HTML parser & DOM manipulation library for both backend and frontend.

als-document: HTML Parser & DOM Manipulation Library

Overview

als-document is a library for parsing HTML and XML, and for building and manipulating a virtual DOM structure on both the backend and the frontend. It provides an API for querying and interacting with DOM elements using selectors.

Installation

To install the als-document library, use the following npm command:

npm i als-document

Including the Library

The library provides three different files to cater to different module systems:

  1. index.js: This file uses the CommonJS module system. It's suitable for projects using Node.js or bundlers like Browserify or Webpack. The entry point in package.json for this file is "main".

     const { parseHTML, Node, Query, TextNode, SingleNode, Root, Document } = require('als-document');

  2. index.mjs: This file uses the ES Modules (ESM) system. It's suitable for modern JavaScript environments that support ESM. The entry point in package.json for this file is "module".

     import { parseHTML, Node, Query, TextNode, SingleNode, Root, Document } from 'als-document';

  3. document.js: By including this file, a constant variable named alsDocument is created, which wraps all the exports.

     <script src="/node_modules/als-document/document.js"></script>
     <script>
        const { parseHTML, Node, Query, TextNode, SingleNode, buildFromCache, cacheDoc, Root, Document } = alsDocument
     </script>

Change log for 1.3

  • Added a getter and setter for node.innerText
  • prev and next now work with childIndex=0
  • querySelector no longer includes the parent
  • Document has new getters and setters, including clone
  • tagName is upper-cased, _tagName is lower-cased

parseHTML

parseHTML is a function that takes an HTML string and constructs a DOM tree representation from it. It recognizes various HTML elements, such as comments, scripts, styles, and CDATA, and organizes them into nodes that can be manipulated and queried.

API:

parseHTML(html: string) -> Node

Parses an HTML string and returns a tree structure representing its content.

  • html: The HTML string to parse.
  • Returns: A Node object representing the root of the parsed HTML content tree.

Expected Outcome:

When using the parseHTML function, the output will be a tree of nodes representing the HTML content. Each node can be one of the following:

  • Node: A standard HTML element node with tag name, attributes, and child nodes.
  • SingleNode: Represents self-closing or void HTML elements.
  • TextNode: Represents text content in the HTML.

Each node will have a tag name, a dictionary of attributes, and a list of child nodes (if applicable).

Examples

const parsedHTML = parseHTML('<div class="container"><img src="image.jpg" alt="Image"/><p>Hello, world!</p></div>');

// The returned `parsedHTML` object will be a tree-like structure. 
// For instance, parsedHTML.childNodes[0] would represent the <div> element, 
// and parsedHTML.childNodes[0].childNodes[0] would represent the <img> element inside it.
const parsedScript = parseHTML('<script>console.log("Hello, world!");</script>');

// The returned `parsedScript` object will contain a `script` Node with a child node 
// holding the JavaScript code as text content.

Remember, the actual tree structure will be more complex and detailed, but the provided examples give you a basic understanding of how to navigate through the parsed result.
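To make the navigation described above concrete, here is a minimal sketch of a depth-first walk over such a tree. The `walk` helper is hypothetical and not part of als-document. It relies only on the `childNodes` and `tagName` properties described in this README, and it runs on a hand-built stand-in object rather than a real parse result, so the exact shape of text nodes is an assumption.

```javascript
// Hypothetical helper (not part of als-document): depth-first walk over any
// tree whose nodes expose a `childNodes` array.
function walk(node, visit, depth = 0) {
  visit(node, depth);
  // Text-like nodes may have no childNodes, so fall back to an empty list.
  for (const child of node.childNodes || []) {
    walk(child, visit, depth + 1);
  }
}

// Stand-in tree shaped like the first example above. With the library you
// would pass the result of parseHTML(...) instead (assumption).
const mockRoot = {
  tagName: 'ROOT',
  childNodes: [
    {
      tagName: 'DIV',
      childNodes: [
        { tagName: 'IMG', childNodes: [] },
        { tagName: 'P', childNodes: [{ text: 'Hello, world!' }] },
      ],
    },
  ],
};

// Collect an indented outline of the tree, one line per node.
const visited = [];
walk(mockRoot, (node, depth) => {
  visited.push(`${'  '.repeat(depth)}${node.tagName || '#text'}`);
});
console.log(visited.join('\n'));
```

The outline lists ROOT, DIV, IMG, P and the text node in document order, each indented by its depth.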

Node

Node is a fundamental class that represents an element node in the DOM tree. It provides functionality similar to the native DOM API in browsers, but with its own implementation.

Properties:

  • tagName: Represents the tag name of the element (upper cased).
  • _tagName: Represents the tag name of the element (lower cased).
  • innerText: Gets or sets the text content of the element.
  • attributes: A dictionary of attributes and their values.
  • childNodes: An array of child nodes for the element.
  • isSingle: Boolean value to check if the node is a self-closing tag.
  • parentNode, previousElementSibling, nextElementSibling, children: Navigation properties to move through the DOM tree.
  • dataset, classList, style: Special properties for interacting with data-* attributes, classes, and inline styles.

Methods:

  • getAttribute, setAttribute, removeAttribute: Manipulate element's attributes.
  • remove: Removes the element from its parent.
  • innerHTML, outerHTML: Get and set the inner or entire HTML of the element.
  • querySelector, querySelectorAll: Find elements within the node based on CSS-like selectors.
    • Limitation: pseudo-selectors such as :first-of-type or :checked are not available.
    • Namespaced tags such as some:namespace are supported.
    • The shorthand methods $ (for querySelector) and $$ (for querySelectorAll) are also available.
  • getElementsByClassName, getElementsByTagName, getElementById: Get elements by class, tag, or id respectively.
  • insertAdjacentElement, insertAdjacentHTML, insertAdjacentText: Insert content relative to the element.
  • appendChild: Add a child node to the element.
  • insert(place, element): place is an index (0-3) or a position name (beforebegin, afterbegin, ...); element is raw HTML or an element.

Examples:

const div = new Node('div');
div.setAttribute('class', 'container');

const img = new SingleNode('img', { src: 'image.jpg', alt: 'An image' });
div.appendChild(img);

console.log(div.outerHTML);  // Outputs: <div class="container"><img src="image.jpg" alt="An image"></div>

const p = new Node('p',{},div); // adding as last child to parent div
p.textContent = "Hello, world!";

const foundP = div.querySelector('p');
console.log(foundP.textContent);  // Outputs: Hello, world!

SingleNode

SingleNode extends the Node class and represents elements that have no closing tag (self-closing tags) in HTML. Examples include <img>, <br>, and <!DOCTYPE>. This class has a restricted set of methods and properties, since these elements can't have child nodes.

TextNode

TextNode is a class that represents text content within the DOM. A TextNode holds raw text data and does not have child nodes.

Document node (extends Node)

Has additional getters and setters:

  • get documentElement
  • get html
  • get head
  • get body
  • get title
  • get charset
  • set title
  • get clone - returns a new, cloned instance of Document

Query

The Query class is designed to parse CSS selector strings and transform them into a structured object format, providing detailed insights into each selector and its components.

By using the class, one can expect to transform a CSS selector string into an array of objects.

Each object will represent a selector, containing detailed information such as its tag, identifier, classes, attributes, and associated selectors if any. This can be useful for further processing or analysis of CSS selectors in an application.

Example

let q1 = 'html>body>div.tabs~.some[type $= "radio and some"]>p+div>.some-id .tab-content~input[disabled] div.some'
let result = new Query(q1).selectors
let result1 = Query.get(q1)
// result and result1 should be the same
console.log(result)

Result:

[
   {
      "query": "div.some",
      "tag": "div",
      "classList": [
         "some"
      ],
      "ancestors": [
         {
            "query": ".some-id",
            "classList": [
               "some-id"
            ],
            "parents": [
               {
                  "query": "div",
                  "tag": "div"
               }
            ],
            "prev": {
               "query": "p",
               "tag": "p",
               "parents": [
                  {
                     "query": ".some[0]",
                     "classList": [
                        "some"
                     ],
                     "attribs": [
                        {
                           check:(f),
                           "query": "[type$=\"radio and some\"]",
                           "name": "type",
                           "value": "radio and some",
                           "sign": "$="
                        }
                     ]
                  }
               ],
               "prevAny": {
                  "query": "div.tabs",
                  "tag": "div",
                  "classList": [
                     "tabs"
                  ],
                  "parents": [
                     {
                        "query": "html",
                        "tag": "html"
                     },
                     {
                        "query": "body",
                        "tag": "body"
                     }
                  ]
               },
               "group": "html>body>div.tabs~.some[0]>p"
            },
            "group": "html>body>div.tabs~.some[0]>p+div>.some-id"
         },
         {
            "query": "input[1]",
            "tag": "input",
            "attribs": [
               {
                  "query": "[disabled]",
                  "name": "disabled"
               }
            ],
            "prevAny": {
               "query": ".tab-content",
               "classList": [
                  "tab-content"
               ]
            },
            "group": ".tab-content~input[1]"
         }
      ],
      "group": "html>body>div.tabs~.some[type $= \"radio and some\"]>p+div>.some-id .tab-content~input[disabled] div.some"
   }
]
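To illustrate how a single entry in a result like the one above could be derived, here is a minimal sketch that splits one compound selector into a tag, an identifier, and classes. The `parseCompound` function and the `id` key are hypothetical. This is not how the Query class is implemented, and it deliberately ignores combinators and attribute selectors.

```javascript
// Hypothetical sketch (not the library's Query implementation): parse one
// compound selector such as "div#main.tabs" into its parts.
// Combinators (>, +, ~, space) and [attr] selectors are out of scope here.
function parseCompound(query) {
  const result = { query };

  // A tag name, if present, must come first; ":" allows namespaced tags.
  const tag = query.match(/^[a-zA-Z][\w:-]*/);
  if (tag) result.tag = tag[0];

  // At most one "#identifier" is expected.
  const id = query.match(/#([\w-]+)/);
  if (id) result.id = id[1];

  // Every ".name" becomes an entry in classList.
  const classes = [...query.matchAll(/\.([\w-]+)/g)].map((m) => m[1]);
  if (classes.length) result.classList = classes;

  return result;
}

const tabs = parseCompound('div.tabs');
const someId = parseCompound('.some-id');
const full = parseCompound('div#main.a.b');
console.log(tabs.tag, tabs.classList);  // div [ 'tabs' ]
console.log(someId.classList);          // [ 'some-id' ]
console.log(full.id, full.classList);   // main [ 'a', 'b' ]
```

The keys `query`, `tag` and `classList` follow the result shown above. Keys are omitted when a part is absent, as in the printed result.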

Attribs and check function

If an attribute selector has a value, its attrib object contains a check function that takes one parameter: the value to test.

let s = Query.get('[test^="some"]')[0]
console.log(s.attribs[0].check('some value test')) // true
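For readers unfamiliar with the attribute operators, the sketch below shows how a check function of this kind could be built from the operator (`sign`) and the expected value. The `makeCheck` helper is hypothetical. The semantics follow the standard CSS attribute selectors, and which operators als-document actually supports is not stated here.

```javascript
// Hypothetical sketch (not the library's code): build a predicate for a CSS
// attribute selector from its operator and expected value.
function makeCheck(sign, expected) {
  switch (sign) {
    case '=':  // exact match
      return (value) => value === expected;
    case '^=': // value starts with expected
      return (value) => value.startsWith(expected);
    case '$=': // value ends with expected
      return (value) => value.endsWith(expected);
    case '*=': // value contains expected anywhere
      return (value) => value.includes(expected);
    case '~=': // expected is one of the whitespace-separated words
      return (value) => value.split(/\s+/).includes(expected);
    case '|=': // exact match, or expected followed by a hyphen
      return (value) => value === expected || value.startsWith(expected + '-');
    default:
      throw new Error(`Unsupported operator: ${sign}`);
  }
}

const startsWithSome = makeCheck('^=', 'some');
console.log(startsWithSome('some value test')); // true
console.log(startsWithSome('a value'));         // false

const endsWithPhrase = makeCheck('$=', 'radio and some');
console.log(endsWithPhrase('a radio and some')); // true
```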

buildFromCache and cacheDoc

Building a DOM from raw HTML usually takes tens of milliseconds. Instead, you can build the DOM once and save its cache as regular stringified JSON. Caching and building from the cache each take less than 5ms and require very few resources.

How does it work?

const html = `` // some real html 255KB
const root = parseHTML(html); // 31.9ms
const cache = cacheDoc(root); // 2.4ms
const root1 = buildFromCache(cache); // 1.2ms
console.log(root.innerHTML === root1.innerHTML) // true
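The general idea behind such a cache can be sketched as a round trip through JSON: strip the tree down to plain data, stringify it, and later rebuild the tree (including parent links) without re-parsing any HTML. The `toCache` and `fromCache` functions and the node shape below are hypothetical. The real format used by cacheDoc and buildFromCache is not documented here.

```javascript
// Hypothetical illustration only; not als-document's actual cache format.

// Serialize a tree to a JSON string. Parent links are dropped because they
// are circular and can be restored while rebuilding.
function toCache(node) {
  const strip = (n) => ({
    tag: n.tag,
    text: n.text,
    attributes: n.attributes,
    children: (n.children || []).map(strip),
  });
  return JSON.stringify(strip(node));
}

// Rebuild the tree from the JSON string and restore the parent links.
function fromCache(json) {
  const link = (n, parent) => {
    n.parent = parent;
    n.children.forEach((child) => link(child, n));
    return n;
  };
  return link(JSON.parse(json), null);
}

// Small stand-in tree with a circular parent reference.
const tree = { tag: 'div', attributes: { class: 'container' }, parent: null, children: [] };
const paragraph = { tag: 'p', attributes: {}, parent: tree, children: [{ text: 'Hello, world!' }] };
tree.children.push(paragraph);

const cached = toCache(tree);
const rebuilt = fromCache(cached);
console.log(toCache(rebuilt) === cached);            // true
console.log(rebuilt.children[0].parent === rebuilt); // true
```

Rebuilding from plain data skips tokenizing and tag matching entirely, which is the likely reason a cache of this kind is much faster than parsing the original HTML.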