jq-html-parser

v0.3.0

A jQuery powered parser for extracting text (strings) from HTML documents

Example Usage:


var Parser, request, config, url;

// npm dependencies
Parser  = require("jq-html-parser");
request = require("request");

// config, etc.
config = {
  title: "title",
  logo: {
    selector: "#hplogo",
    attribute: "style",
    regexp: "url\\(([/A-Za-z0-9]+\\.png)\\)"
  }
};
url = "http://google.co.uk";

// request a page
request.get(url, function(err, res, body){

  // handle error and non-200 response here
  if (err || res.statusCode !== 200) {
    return console.log("An error occurred.");
  }

  var parser, result;

  // parse body
  parser = new Parser(config);
  result = parser.parse(body);

  console.log(result.title); // "Google"
  console.log(result.logo);  // "/images/srpr/logo11w.png"

});

Options:

selector (required)

The jQuery selector, used to locate the desired element.

For example, if html equals:

<!-- head, etc. -->
<h1>Hello, world!</h1>
<!-- more html... -->
<h1>Another Hello!</h1>
<!-- even more html... -->

And our JavaScript is:

var parser = new Parser({
  myElement: { selector: "h1" }
});
var result = parser.parse(html);

The parser would match the first h1 element from the DOM, and assign its text to the myElement property:

// value of 'result':
{ myElement: "Hello, world!"}

NOTE: To return all h1 elements from the DOM, see the multiple option below.

multiple (optional, defaults to false)

Returns an array of values when set to true. By default, or when set to false, returns a string containing the first value found for the specified jQuery selector.

For example, if html equals:

<!-- head, etc. -->
<h1>Hello, world!</h1>
<!-- more html... -->
<h1>Another Hello!</h1>
<!-- even more html... -->

And our JavaScript is:

var parser = new Parser({
  myElement: { selector: "h1", multiple: true }
});
var result = parser.parse(html);

The parser would match all h1 elements from the DOM, and assign the array of their text values to the myElement property:

// value of 'result':
{
  myElement: ["Hello, world!", "Another Hello!"]
}

attribute (optional)

Returns the text value of the specified attribute, for the given selector.

For example, if html equals:

<!-- head, etc. -->
<h1 data-secret-title="Shh, don't tell...">Hello, world!</h1>
<!-- more html... -->

And our JavaScript is:

var parser = new Parser({
  myElement: { selector: "h1", attribute: "data-secret-title" }
});
var result = parser.parse(html);

Then our result would contain "Shh, don't tell...", as that is the value of the data-secret-title attribute of the h1 tag:

// value of 'result':
{
  myElement: "Shh, don't tell..."
}

NOTE: Setting the attribute option causes the remove and html options to be ignored, as those options relate to the inner contents of an element, and not to its attributes.

remove (optional)

Specifies any descendant elements to remove before parsing the value of the selected elements.

For example, if html equals:

<!-- head, etc. -->
<article>
  <p>Keep me. </p>
  <p class="advert">Annoying advert</p>
  <p>Keep me too.</p>
</article>
<!-- more html... -->

And our JavaScript is:

var parser = new Parser({
  myElement: { selector: "article", remove: ".advert" }
});
var result = parser.parse(html);

Then our result will contain everything inside of article, except the content from .advert:

// value of 'result':
{
  myElement: "Keep me. Keep me too."
}

html (optional, defaults to false)

When set to true, returns the selected element as HTML. By default, or when set to false, returns the contents of the selected element as text.

For example, if html equals:

<!-- head, etc. -->
<article>
  <p>I am the text</p>
</article>
<!-- more html... -->

And our JavaScript is:

var parser = new Parser({
  myElement: { selector: "article", html: true }
});
var result = parser.parse(html);

Then our result will be a string of HTML, instead of the text value:

// value of 'result':
{
  myElement: "<article><p>I am the text</p></article>" // this would simply be "I am the text" if html=false
}

regexp (optional)

Uses a regular expression to extract data from the parsed value(s) of your selected elements. regexp is the second-to-last option applied to the text, just before transform.

For example, if html equals:

<!-- head, etc. -->
<h1>The title is 'jQuery Rocks'</h1>
<!-- more html... -->

And our JavaScript is:

var parser = new Parser({
  myElement: { selector: "h1", regexp: "The title is '(.*)'" }
});
var result = parser.parse(html);

Then our result will be "jQuery Rocks".

// value of 'result':
{
  myElement: "jQuery Rocks"
}

NOTE: Only the first match will be returned. For more advanced transformations, like combining multiple regular expression matches together, use the transform option.
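To make the "first match only" behaviour concrete, here is a minimal sketch in plain JavaScript. The firstCapture helper is hypothetical and is not part of jq-html-parser; it only assumes that the regexp option is compiled with the standard RegExp constructor and that the first capture group is what gets returned.

```javascript
// Hypothetical helper, not part of jq-html-parser: returns the first capture
// group of the first match, or null when the pattern does not match.
function firstCapture(text, pattern) {
  var match = new RegExp(pattern).exec(text);
  if (!match) {
    return null;
  }
  // Fall back to the whole match when the pattern has no capture group.
  return match[1] !== undefined ? match[1] : match[0];
}

var title = firstCapture("The title is 'jQuery Rocks'", "The title is '(.*)'");
console.log(title); // "jQuery Rocks"
```

Trying a pattern against sample text this way can be a quick check before putting it into a parser config.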

transform (optional)

Used to transform parsed values of selected element(s).

For example, if html equals:

<!-- head, etc. -->
<div class="points">Total points: 21</div>
<!-- more html... -->

And our JavaScript is:

var parser = new Parser({
  pointsScored: {
    selector: ".points",
    regexp: "Total points: (.*)",
    transform: function (val) {
      // val is the text captured by regexp; fall back to 0 when it is empty
      return val ? parseInt(val, 10) : 0;
    }
  }
});
var result = parser.parse(html);

Then our result will be the numerical value for "points scored":

// value of 'result':
{
  pointsScored: 21
}
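Because a transform is an ordinary function, it can be written and checked on its own before it goes into a config. The sketch below uses a hypothetical toPoints function and assumes the parser passes the transform either the captured string or a falsy value when nothing matched; that assumption comes from the example above, not from the library's source.

```javascript
// Hypothetical transform: converts captured text to an integer,
// returning 0 for missing or non-numeric input.
function toPoints(val) {
  var points = parseInt(val, 10);
  return isNaN(points) ? 0 : points;
}

console.log(toPoints("21"));  // 21
console.log(toPoints(null));  // 0
console.log(toPoints("n/a")); // 0
```

Checking for NaN, rather than only for a falsy value, also covers text that matched the regular expression but is not a number.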

Combining Options:

All options can be combined, and when being parsed they will be processed in a particular order, depending on their specificity.

If you find your returned object doesn't match what you were expecting, it may be due to conflicting configuration options.

Configuration options will be evaluated in the following order:

  1. selector
  2. multiple
  3. attribute
  4. remove (skipped if attribute is defined)
  5. html (skipped if attribute is defined)
  6. regexp
  7. transform
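The tail end of that order (multiple, then regexp, then transform) can be sketched without a DOM. The finishValues function below is hypothetical and is not the library's implementation: it takes an array of strings standing in for whatever steps 1 and 3 to 5 extracted, and it assumes a failed regexp match yields null.

```javascript
// Hypothetical sketch of steps 2, 6 and 7. "values" stands in for the text
// already extracted by selector, attribute, remove and html.
function finishValues(values, options) {
  // Step 2: keep every value, or only the first one.
  var selected = options.multiple ? values : values.slice(0, 1);

  var results = selected.map(function (value) {
    // Step 6: reduce the value to the first capture group (assumed: null on no match).
    if (options.regexp) {
      var match = new RegExp(options.regexp).exec(value);
      value = match ? match[1] : null;
    }
    // Step 7: hand whatever is left to the transform function.
    if (typeof options.transform === "function") {
      value = options.transform(value);
    }
    return value;
  });

  return options.multiple ? results : results[0];
}

console.log(finishValues(["Total points: 21"], {
  regexp: "Total points: (.*)",
  transform: function (val) { return val ? parseInt(val, 10) : 0; }
})); // 21
```

Seen this way, a transform always receives the output of regexp, so a surprising value is usually worth tracing back one step at a time.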

Support

If you need any help, please let me know via the "issues" tab on GitHub.

Contributions are also welcome, so please feel free to fork the code, play around, then put in a PR.