xpath-html

v1.0.3

Easily use XPath to locate any element in an HTML DOM page

Downloads

3,763

XPath HTML

XPath stands for XML Path Language. It provides a flexible non-XML syntax to address (point to) different parts of an XML document.

xpath-html brings this powerful tool to HTML, letting you navigate the HTML DOM with XPath expressions.

If you want to learn more about XPath and how to use different XPath expressions to find complex or dynamic elements, take a look at this concise tutorial here.

Installation

xpath-html is available as a package on npm. Open a terminal and enter the following command:

npm install --save xpath-html

Usage

Hello XPath from HTML World

const fs = require("fs");
const xpath = require("xpath-html");

// Assuming you have an html file locally,
// Here is the content that I scraped from www.shopback.sg
const html = fs.readFileSync(`${__dirname}/shopback.html`, "utf8");

// Don't worry about the input much,
// you are able to use the HTML response of an HTTP request,
// as long as the argument is a string type, everything should be fine.
const node = xpath.fromPageSource(html).findElement("//*[contains(text(), 'with love')]");

console.log(`The matched tag name is "${node.getTagName()}"`);
console.log(`Your full text is "${node.getText()}"`);

# A quick way to download the .html file used above
$ curl https://www.shopback.sg -o shopback.html

# Or from my GitHub examples
$ curl -O https://raw.githubusercontent.com/hieuvp/xpath-html/master/examples/shopback.html

Bang 💥 The output should look something like:

The matched tag name is "div"
Your full text is "Made with love by"

Easy to follow, right? Now you can scroll down to the APIs below and dive into the details.

fromPageSource(html).findElement(expression)

Locate an element on a page. The returned node is a representation of the underlying DOM.

Arguments:

| Name         | Type     | Description                |
| ------------ | -------- | -------------------------- |
| html         | string   | Input HTML page's source   |
| expression   | string   | The given XPath expression |

Returns: Node

Example:

const fs = require("fs");
const xpath = require("xpath-html");

const html = fs.readFileSync(`${__dirname}/shopback.html`, "utf8");
const node = xpath.fromPageSource(html).findElement("//*[text()='Made with love by']");

console.log(node.toString());

Result:

<div xmlns="http://www.w3.org/1999/xhtml">Made with love by</div>
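
The expression above embeds the text to match directly between single quotes. When that text comes from a variable, quoting becomes a concern, because XPath 1.0 string literals have no escape character. The sketch below shows one common workaround; `xpathLiteral` is a hypothetical helper, not part of xpath-html.

```javascript
// Hypothetical helper (not part of xpath-html): turn an arbitrary string
// into an XPath 1.0 string literal. XPath 1.0 has no escape sequences, so a
// value containing both quote types must be assembled with concat().
function xpathLiteral(value) {
  if (!value.includes("'")) return `'${value}'`;
  if (!value.includes('"')) return `"${value}"`;
  // Split on single quotes and re-join the pieces with a "'" literal.
  const parts = value.split("'").map((part) => `'${part}'`);
  return `concat(${parts.join(`, "'", `)})`;
}

// Build an expression for text that is only known at runtime.
const label = "Made with love by";
const expression = `//*[text()=${xpathLiteral(label)}]`;
console.log(expression);
// prints //*[text()='Made with love by']
```

The resulting string can then be passed to findElement or findElements like any hand-written expression.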

fromPageSource(html).findElements(expression)

Search for multiple elements on a page.

Arguments:

| Name         | Type     | Description                |
| ------------ | -------- | -------------------------- |
| html         | string   | Input HTML page's source   |
| expression   | string   | The given XPath expression |

Returns: Array<Node>

Example:

const fs = require("fs");
const xpath = require("xpath-html");

const html = fs.readFileSync(`${__dirname}/shopback.html`, "utf8");
const nodes = xpath
  .fromPageSource(html)
  .findElements("//img[starts-with(@src, 'https://cloud.shopback.com')]");

console.log("Number of nodes found:", nodes.length);
console.log("nodes[0]:", nodes[0].toString());
console.log("nodes[1]:", nodes[1].toString());

Result:

Number of nodes found: 158
nodes[0]: <img src="https://cloud.shopback.com/raw/upload/static/images/navbar/sb-logo.png" xmlns="http://www.w3.org/1999/xhtml"/>
nodes[1]: <img src="https://cloud.shopback.com/raw/upload/static/images/navbar/desktop/icon-raf.svg" xmlns="http://www.w3.org/1999/xhtml"/>
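
The example above reads nodes[0] and nodes[1] directly, which assumes the expression matches at least two elements. A minimal sketch of a more defensive pattern follows; `previewNodes` is a hypothetical helper, not part of xpath-html, and it relies only on findElements returning an array of nodes with a toString() method, as documented above.

```javascript
// Hypothetical helper (not part of xpath-html): serialize at most `limit`
// nodes, so nothing breaks when fewer nodes match than expected.
function previewNodes(nodes, limit = 2) {
  return nodes
    .slice(0, limit)
    .map((node, index) => `nodes[${index}]: ${node.toString()}`);
}

// Usage with xpath-html, following the example above:
//   const nodes = xpath.fromPageSource(html).findElements("//img");
//   console.log("Number of nodes found:", nodes.length);
//   previewNodes(nodes).forEach((line) => console.log(line));
```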

fromNode(xhtml).findElement(expression)

Select an element from XHTML input. Similar to fromPageSource(html).findElement(expression), but it operates on a subset of an HTML page.

Arguments:

| Name         | Type               | Description                                                         |
| ------------ | ------------------ | ------------------------------------------------------------------- |
| xhtml        | Node or string     | Either a node returned from a query or a well-formed XHTML string   |
| expression   | string             | The given XPath expression                                          |

Returns: Node

Notes:

  • The input xhtml must declare the namespace xmlns="http://www.w3.org/1999/xhtml", e.g. <div xmlns="http://www.w3.org/1999/xhtml">Made with love by</div>

Example:

const fs = require("fs");
const xpath = require("xpath-html");

const html = fs.readFileSync(`${__dirname}/shopback.html`, "utf8");
const group = xpath.fromPageSource(html).findElement("//div[@class='ui-store-group']");

const node = xpath.fromNode(group).findElement("//a[@href='/aliexpress']");

console.log(node.toString());

Result:

<a class="store-logo-wrapper" href="/aliexpress" title="AliExpress Coupons &amp; Promo Codes" xmlns="http://www.w3.org/1999/xhtml"><img class="store-logo" src="https://cloud.shopback.com/t_sd_250_pad,f_auto,fl_lossy,q_auto/sg-store/49/49_logo_86958e96.png" alt="AliExpress Coupons &amp; Promo Codes"/></a>
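
When fromNode is given a string instead of a node, the note above says the string must carry the XHTML namespace. The following is a rough sketch of adding it to a hand-written fragment; `withXhtmlNamespace` is a hypothetical helper, not part of xpath-html, and it assumes the fragment begins with a single root element whose opening tag has no ">" inside attribute values.

```javascript
const XHTML_NS = "http://www.w3.org/1999/xhtml";

// Hypothetical helper (not part of xpath-html): add the XHTML namespace to
// the root element of a fragment unless it already declares one.
function withXhtmlNamespace(fragment) {
  const trimmed = fragment.trim();
  const rootTag = /^<([A-Za-z][\w.-]*)/.exec(trimmed);
  if (!rootTag) {
    throw new TypeError("Expected a fragment that starts with an element");
  }
  // Only inspect the opening tag of the root element.
  const openingTag = trimmed.slice(0, trimmed.indexOf(">") + 1);
  if (/\sxmlns\s*=/.test(openingTag)) return trimmed;
  const insertAt = rootTag[0].length; // position right after "<tagname"
  return `${trimmed.slice(0, insertAt)} xmlns="${XHTML_NS}"${trimmed.slice(insertAt)}`;
}

console.log(withXhtmlNamespace("<div>Made with love by</div>"));
// prints <div xmlns="http://www.w3.org/1999/xhtml">Made with love by</div>
```

Nodes returned by an earlier query already serialize with this namespace, as the results above show, so the helper is only relevant for strings written by hand.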

fromNode(xhtml).findElements(expression)

Select multiple elements from XHTML input. Same as fromPageSource(html).findElements(expression), but it queries only part of an HTML page.

Arguments:

| Name         | Type               | Description                                                         |
| ------------ | ------------------ | ------------------------------------------------------------------- |
| xhtml        | Node or string     | Either a node returned from a query or a well-formed XHTML string   |
| expression   | string             | The given XPath expression                                          |

Returns: Array<Node>

Notes:

  • The input xhtml must declare the namespace xmlns="http://www.w3.org/1999/xhtml", e.g. <div xmlns="http://www.w3.org/1999/xhtml">Made with love by</div>

Example:

const fs = require("fs");
const xpath = require("xpath-html");

const html = fs.readFileSync(`${__dirname}/shopback.html`, "utf8");
const group = xpath.fromPageSource(html).findElement("//div[@class='ui-store-group']");

const nodes = xpath.fromNode(group).findElements("//img[contains(@src,'shopily')]");

console.log("Number of nodes found:", nodes.length);
console.log("nodes[0]:", nodes[0].toString());
console.log("nodes[1]:", nodes[1].toString());

Result:

Number of nodes found: 2
nodes[0]: <img class="store-logo" src="https://shopily-sg.s3.amazonaws.com/uploads/stores/504/504_logo_200c4121.png" alt="zChocolat Coupons &amp; Promo Codes" xmlns="http://www.w3.org/1999/xhtml"/>
nodes[1]: <img class="store-logo" src="https://shopily-sg.s3.amazonaws.com/uploads/stores/2498/2498_logo_81f0a24d.png" alt="Bed Bath &amp; Beyond Coupons &amp; Promo Codes" xmlns="http://www.w3.org/1999/xhtml"/>
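
Serializing whole nodes is often more than is needed; frequently only one attribute of each match matters. A small sketch, assuming only the documented getAttribute(name) method: `collectAttribute` is a hypothetical helper, not part of xpath-html, and it skips both null and empty values because the result for a missing attribute may depend on the underlying DOM implementation.

```javascript
// Hypothetical helper (not part of xpath-html): read one attribute from
// every node and drop nodes where it is missing or empty.
function collectAttribute(nodes, name) {
  return nodes
    .map((node) => node.getAttribute(name))
    .filter((value) => typeof value === "string" && value !== "");
}

// Usage with xpath-html, following the example above:
//   const nodes = xpath.fromNode(group).findElements("//img[contains(@src,'shopily')]");
//   console.log(collectAttribute(nodes, "alt"));
```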

node.getTagName()

Retrieve the node's tag name.

Arguments: None

Returns: string

Example:

const fs = require("fs");
const xpath = require("xpath-html");

const html = fs.readFileSync(`${__dirname}/shopback.html`, "utf8");
const node = xpath.fromPageSource(html).findElement("//*[text()='Made with love by']");

console.log("Single node's tag name:", node.getTagName());

const nodes = xpath
  .fromPageSource(html)
  .findElements("//img[starts-with(@src, 'https://cloud.shopback.com')]");

console.log("First nodes[0] tag name:", nodes[0].getTagName());
console.log("Second nodes[1] tag name:", nodes[1].getTagName());

Result:

Single node's tag name: div
First nodes[0] tag name: img
Second nodes[1] tag name: img

node.getText()

Get the visible innerText of the node.

Arguments: None

Returns: string

Example:

const fs = require("fs");
const xpath = require("xpath-html");

const html = fs.readFileSync(`${__dirname}/shopback.html`, "utf8");
const node = xpath.fromPageSource(html).findElement("//*[text()='Made with love by']");

console.log("Text of the node:", node.getText());

const nodes = xpath
  .fromPageSource(html)
  .findElements("//div[@id='home-page-container']//*[@class='title-text']");

console.log("Text of nodes[0]:", nodes[0].getText());
console.log("Text of nodes[1]:", nodes[1].getText());

Result:

Text of the node: Made with love by
Text of nodes[0]: Up to 10.0% Cash Rewards
Text of nodes[1]: Up to 7.0% Cashback
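
Text returned by getText() usually needs further parsing before it is useful as data. As an illustration, the sketch below extracts the percentage from strings shaped like the results above; `parsePercent` is a hypothetical helper, not part of xpath-html.

```javascript
// Hypothetical helper (not part of xpath-html): pull the first percentage
// out of a piece of text, or return null when there is none.
function parsePercent(text) {
  const match = /(\d+(?:\.\d+)?)\s*%/.exec(text);
  return match ? Number.parseFloat(match[1]) : null;
}

console.log(parsePercent("Up to 10.0% Cash Rewards")); // prints 10
console.log(parsePercent("Up to 7.0% Cashback")); // prints 7
console.log(parsePercent("Made with love by")); // prints null
```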

node.getAttribute(name)

Retrieve the current value of the given attribute of this node.

Arguments:

| Name   | Type     | Description                        |
| ------ | -------- | ---------------------------------- |
| name   | string   | The name of the attribute to query |

Returns: string

Example:

const fs = require("fs");
const xpath = require("xpath-html");

const html = fs.readFileSync(`${__dirname}/shopback.html`, "utf8");
const node = xpath.fromPageSource(html).findElement("//a[text()='View All Popular Stores']");

console.log("The href value:", node.getAttribute("href"));

Result:

The href value: /all-stores
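
The href above is relative to the site it was scraped from. To follow it, resolve it against the page's URL first. A minimal sketch using the standard URL class built into Node.js; `toAbsoluteUrl` is a hypothetical helper, not part of xpath-html, and the base URL is the one used in the download command earlier.

```javascript
// Hypothetical helper (not part of xpath-html): resolve an href taken from
// node.getAttribute("href") against the URL the page was downloaded from.
function toAbsoluteUrl(href, baseUrl) {
  // The WHATWG URL constructor (a Node.js global) resolves relative references.
  return new URL(href, baseUrl).href;
}

console.log(toAbsoluteUrl("/all-stores", "https://www.shopback.sg"));
// prints https://www.shopback.sg/all-stores
```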

Dependencies

Special thanks to all contributors to these libraries, which are the foundation xpath-html is built upon.

  1. xpath
  2. xmldom
  3. xmlserializer
  4. parse5

License

MIT

Made with ❤ from ShopBack.