html-extract-data

v1.2.3

Extract data from the DOM using a JSON config

Downloads

1,705

Installation

yarn add html-extract-data
# or
npm i -S html-extract-data

Usage

Basic

import extractFromHTML from 'html-extract-data';

extractFromHTML(
  html, // an HTML DOM element
  {
    query: '.grid-item',
    data: {
      title: 'h2',
      description: { query: 'p', html: true },
    }
  },
);

// Output:
{
  title: 'title',
  description: 'description <b>bold</b>'
}

Advanced

import extractFromHTML from 'html-extract-data';

const data = extractFromHTML(
  // an HTML DOM element
  html,
  {
    // query element within the html
    query: '.grid-item',
    
    // if list, it will use querySelectorAll and return an array
    list: true,
    
    // extract data (mostly attributes) from the element itself
    self: {
    
      // grab the `data-category` attribute and put it in the `category` field
      'category': 'data-category',
      
      // convert the value to a number
      'id': { attr: 'data-id', convert: 'number' },
    },
    
    // extract extra data from child elements
    data: {
    
      // get the text value from the `h2` element
      title: 'h2',
      
      // get the html value from the `p` element
      description: { query: 'p', html: true },
      
      // get the text value from the `.tag` elements, and return as an array
      tags: { query: '.tags > .tag', list: true },

      // option to convert your extracted value, provide a user function
      price: { query: '.price', convert: parseFloat },
      
      // or use any of the built-in converts (number, float, boolean, date)
      date: { query: '.date', convert: 'date' },
      
      // when passed a function, you can do your own logic,
      // extract and process any information you want, and return a value
      // the extract function passed is bound to the parent element
      // the parent element itself is also passed
      image: (extract, element) => ({
      
        // in here you can call and pass the same information as above
        alt: extract({ query: '.js-image', attr: 'alt' }),
        
        // or use the shorthand syntax
        src: extract('.js-image', { attr: 'src' }),
      }),
      
      // alternative option for the above
      image2: (extract) =>
      
        // if we just want to extract info from a single element
        // we can just pass a data object with shorthand extracts (see below)
        extract('.js-image', {
          data: { src: 'src', alt: 'alt' }
        }),
      // use the shorthand syntax to extract information from a single element
      link: {
        // specify the query to that element
        query: 'a',
        data: {
        
          // when passed a string, it will extract the attribute
          href: 'href',
          
          // when passed as object, it will do the same as normal
          target: { attr: 'target' },
          
          // when passed true, it will grab the text content
          text: true,
          
          // this will extract the HTML content
          value: { html: true },
        },
      },
    },
  },
  
  // pass an additional object that will be merged in each extracted item
  {
    // normal property
    visible: false,
    
    // allows deep merging (this prepends a default value to the array)
    tags: ['select a value']
  }
);

Will output:

[{
  category: 'js',
  id: 1,
  title: 'title',
  description: 'description <b>bold</b>',
  tags: ['select a value', 'a', 'b', 'c'],
  price: 123.45,
  date: Date(2018-08-20 ... ),
  image: {
    src: 'foo.jpg',
    alt: 'foobar',
  },
  image2: {
    src: 'foo.jpg',
    alt: 'foobar',
  },
  link: {
    href: 'http://www.google.com',
    target: '_blank',
    text: 'google',
    value: '<b>google</b>'
  },
  visible: false
}]
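
The way the third argument is merged into each item (note how `tags` ends up with the default value first) can be illustrated with a small standalone sketch. `mergeDefaults` and `isPlainObject` are hypothetical helpers written for this illustration; they are not part of html-extract-data, and the library's actual merge logic may differ.

```javascript
// Hypothetical helper (an assumption, not the library's code): merges a
// defaults object into one extracted item. Arrays are concatenated with the
// default values first; nested plain objects are merged recursively; any
// other value from the item overrides the default.
function mergeDefaults(defaults, item) {
  const result = { ...defaults };
  for (const [key, value] of Object.entries(item)) {
    const base = result[key];
    if (Array.isArray(base) && Array.isArray(value)) {
      result[key] = [...base, ...value];
    } else if (isPlainObject(base) && isPlainObject(value)) {
      result[key] = mergeDefaults(base, value);
    } else {
      result[key] = value;
    }
  }
  return result;
}

// True only for object literals, not for arrays, dates or null.
function isPlainObject(value) {
  return (
    value !== null &&
    typeof value === 'object' &&
    Object.getPrototypeOf(value) === Object.prototype
  );
}

const defaults = { visible: false, tags: ['select a value'] };
const extracted = [
  { title: 'title', tags: ['a', 'b', 'c'] },
  { title: 'other', tags: [] },
];

// Each item gets its own merged copy; `defaults` itself is not mutated.
const merged = extracted.map((item) => mergeDefaults(defaults, item));
console.log(merged[0].tags);
```

Because the defaults are copied for every item, an item without tags still ends up with the single default entry, and later items never see values added to earlier ones.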

Production

This library uses Joi to validate the input config structure, but Joi is quite large. The validation calls are therefore wrapped in process.env.NODE_ENV !== 'production' checks, so your build process can strip them out of production bundles.
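
The guard pattern described above looks roughly like the sketch below. `validateConfig` and `extract` are hypothetical stand-ins written for this illustration (the real library validates with Joi and its internals may look different). The sketch only shows the runtime effect of the check; removing the dead branch from a bundle additionally assumes a build tool that replaces `process.env.NODE_ENV` with a literal at build time.

```javascript
// Hypothetical stand-in for the Joi-based validation (an assumption made
// for this example, not the library's actual schema).
function validateConfig(config) {
  if (typeof config !== 'object' || config === null) {
    throw new TypeError('config must be an object');
  }
  if (config.query !== undefined && typeof config.query !== 'string') {
    throw new TypeError('config.query must be a string');
  }
}

// Hypothetical entry point that only validates outside production.
function extract(config) {
  // When a bundler substitutes the literal 'production' here, the condition
  // becomes constant and a minifier can drop the branch as dead code.
  if (process.env.NODE_ENV !== 'production') {
    validateConfig(config);
  }
  return { query: config.query }; // placeholder for the real extraction
}

// Development: the invalid config is rejected early with a clear message.
process.env.NODE_ENV = 'development';
try {
  extract({ query: 42 });
} catch (error) {
  console.log(error.message);
}

// Production: validation is skipped, so the same config passes through.
process.env.NODE_ENV = 'production';
console.log(extract({ query: 42 }));
```

The trade-off is that invalid configs are only reported during development, so it is worth exercising every config in a non-production build or in tests.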

Documentation

View the unit tests to see all the possible ways this module can be used.

Building

In order to build html-extract-data, ensure that you have Git and Node.js installed.

Clone a copy of the repo:

git clone https://github.com/ThaNarie/html-extract-data.git

Change to the html-extract-data directory:

cd html-extract-data

Install dev dependencies:

yarn

Use one of the following main scripts:

yarn build            # build this project
yarn test             # run the unit tests, including coverage
yarn test:dev         # run the unit tests in watch mode
yarn lint             # run tslint on this project

Contribute

View CONTRIBUTING.md

Changelog

View CHANGELOG.md

Authors

View AUTHORS.md

LICENSE

MIT © Tha Narie