html-play

v1.3.0

Fetch and parse dynamic HTML with Node.js like a boss 🕶

Features

  • Intuitive APIs for extracting useful content like links and images.
  • CSS selectors.
  • Mocked user-agent (like a real web browser).
  • Full JavaScript support.
    await htmlPlay(url, { browser: true })
    Using Chromium under the hood by default, thanks to Playwright.

Recipes

  • Grab a list of all links and images on the page.

    import { htmlPlay } from 'html-play'
    
    const { dom } = await htmlPlay('https://nodejs.org')
    // Will print all link URLs on the page
    console.log(dom.links)
    // Will print all image URLs on the page
    console.log(dom.images)
  • Select an element with a CSS selector.

    import { htmlPlay } from 'html-play'
    
    const { dom } = await htmlPlay('https://nodejs.org')
    const intro = dom.find('#home-intro', { containing: 'Node' })
    // Will print: 'Node.js® is an open-source, cross-platform...'
    console.log(intro.text)
  • Let's grab some wallpapers from unsplash.
    import { htmlPlay } from 'html-play'
    
    const { dom } = await htmlPlay('https://unsplash.com/t/wallpapers')
    const elements = dom.findAll('img[itemprop=thumbnailUrl]')
    const images = elements.map(({ image }) => image)
    // Will print something like
    // ['https://images.unsplash.com/photo-1705834008920-b08bf6a05223', ...]
    console.log(images)
  • Let's load some news from Hacker News.
    import { htmlPlay } from 'html-play'
    
    const { dom } = await htmlPlay('https://news.ycombinator.com')
    const titles = dom.findAll('.titleline')
    const news = titles.map(({ text, link }) => [text, link])
    // Will print something like
    // [['news 1', 'http://xxx.com'], ['news 2', 'http://yyy.com'], ...]
    console.log(news)
  • Load a dynamic website, which means its content is generated by JavaScript!
    // Search for images of "flower" with Google
    import { htmlPlay } from 'html-play'
    
    const { dom } = await htmlPlay('https://www.google.com/search?&q=flower&tbm=isch', { browser: true })
    // Filtering is still needed if you want this to work...
    console.log(dom.images)
  • Send requests with custom cookies.
    import { htmlPlay } from 'html-play'
    
    const { dom } = await htmlPlay('https://httpbin.org/cookies', {
      fetch: { fetchInit: { headers: { Cookie: 'a=1; b=2;' } } },
    })
    // Will print { "cookies": { "a": "1", "b": "2" } }
    console.log(dom.text)
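
When the cookies live in an object rather than a ready-made string, the Cookie header for the last recipe can be assembled first. This is a minimal sketch: toCookieHeader and withCookies are hypothetical helpers, not part of html-play, and only the fetch.fetchInit.headers shape comes from the recipe above.

```javascript
// Hypothetical helper (not part of html-play): turns { a: '1', b: '2' }
// into the 'a=1; b=2' string expected by the Cookie request header.
// Values are percent-encoded so that ';' or spaces cannot break the header.
const toCookieHeader = (cookies) =>
  Object.entries(cookies)
    .map(([name, value]) => `${name}=${encodeURIComponent(value)}`)
    .join('; ')

// Hypothetical helper: builds an options object in the shape used by the
// cookie recipe above.
const withCookies = (cookies) => ({
  fetch: { fetchInit: { headers: { Cookie: toCookieHeader(cookies) } } },
})

const options = withCookies({ a: '1', b: '2' })
console.log(options.fetch.fetchInit.headers.Cookie)
```

The resulting options object can then be passed as the second argument, as in htmlPlay('https://httpbin.org/cookies', options).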

Installation

npm i html-play

If you want to use a browser to "run" the page before parsing, you'll need to install Chromium with Playwright.

npm i playwright
npx playwright install chromium

APIs

  • Methods

    htmlPlay

    Fetch a certain URL and return its response with the parsed DOM.

    Example:
    import { htmlPlay } from 'html-play'
    
    const { dom } = await htmlPlay('http://example.com')
    Parameters:
    • url

      Type: string

      The URL to fetch.

    • options (Optional)

      Type: object

      Default: { fetch: true }

      • fetch (Optional)

        Type: boolean | object

        Default: true

        If set to true, we will use the Fetch API to load the requested URL. You can also specify the options for the Fetch API by passing an object here.

        • fetcher (Optional)

          Type: function

          The fetch function we are going to use. We can pass a polyfill here.

        • fetchInit (Optional)

          Type: object

          The fetch parameters passed to the fetch function. See fetch#options. You can set HTTP headers or cookies here.

      • browser (Optional)

        Type: boolean | object

        Default: false

        If set to true, we will use Playwright to load the requested URL. You can also specify the options for Playwright by passing an object here.

        • browser (Optional)

          Type: object

          The Playwright Browser instance to use.

        • page (Optional)

          Type: object

          The Playwright Page instance to use.

        • launchOptions (Optional)

          The launchOptions passed to Playwright when we are launching the browser. See BrowserType#browser-type-launch

        • beforeNavigate (Optional)

          A custom hook function that will be called before the page is loaded. page and browser can be accessed here as the properties of its first parameter to interact with the page.

        • afterNavigate (Optional)

          A custom hook function that will be called after the page is loaded. page and browser can be accessed here as the properties of its first parameter to interact with the page.

    Returns:

    A Promise of the Response instance (see below).
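
The fetcher option accepts a replacement fetch function, which also makes it a convenient place to wrap requests. The sketch below assumes html-play calls the fetcher with the same (input, init) arguments as the standard Fetch API, which the documentation does not spell out. makeRecordingFetcher and requestLog are hypothetical names.

```javascript
// Hypothetical wrapper: records every requested URL, then delegates to the
// underlying fetch implementation (the global fetch or a polyfill).
const makeRecordingFetcher = (baseFetch, log) => (input, init) => {
  log.push(String(input))
  return baseFetch(input, init)
}

const requestLog = []

// Options object in the shape documented above.
// Assumption: the fetcher receives the URL and the fetchInit object.
const options = {
  fetch: {
    fetcher: makeRecordingFetcher(
      (input, init) => globalThis.fetch(input, init),
      requestLog,
    ),
    fetchInit: { headers: { 'Accept-Language': 'en-US' } },
  },
}
```

Passing options as the second argument of htmlPlay should then leave the requested URLs in requestLog, provided the assumption about the call shape holds.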

  • Classes

    Response

    Properties
    • url

      Type: string

      The URL of the response. If the response is redirected from another URL, the value will be the final redirected URL.

    • status

      Type: number

      The HTTP status code of the response.

    • content

      Type: string

      The response content as a plain string.

    • dom

      Type: object

      The parsed root DOM. See DOMElement.

    • json

      Type: object | undefined

      The parsed response JSON. If the response is not a valid JSON, it will be undefined.

    • rawBrowserResponse

      Type: object

      The raw response object returned by Playwright.

    • rawFetchResponse

      Type: object

      The raw response object returned by the Fetch API.
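
The json property is undefined, rather than an error, when the body is not valid JSON. A stand-alone sketch of that contract follows; it is not html-play's actual implementation, and parseJsonOrUndefined is a hypothetical name.

```javascript
// Hypothetical helper mirroring the documented contract of `json`:
// the parsed value for valid JSON, undefined otherwise.
const parseJsonOrUndefined = (content) => {
  try {
    return JSON.parse(content)
  } catch {
    return undefined
  }
}

console.log(parseJsonOrUndefined('{"cookies":{"a":"1"}}')) // { cookies: { a: '1' } }
console.log(parseJsonOrUndefined('<html></html>')) // undefined
```

A caller can therefore check whether json is undefined and fall back to dom or content for non-JSON responses.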

    DOMElement

    Properties
    • html

      Type: string

      The "outerHTML" of this element.

    • link

      Type: string

      If the element is an anchor element, this will be the absolute URL of the element's link; otherwise it will be an empty string.

    • links

      Type: string[]

      All the link URLs of the anchor elements inside this element.

    • text

      Type: string

      The text of the element with whitespace and line breaks stripped.

    • rawText

      Type: string

      The original text of the element.

    • image

      Type: string

      If the element is an image embed element, this will be the absolute URL of the element's image; otherwise it will be an empty string.

    • images

      Type: string[]

      All the image URLs inside this element.

    • backgroundImage

      Type: string

      The background image source extracted from the element's inline style.

    • element

      Type: object

      The corresponding JSDOM element object.
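
The link, image and backgroundImage properties all deal with URLs found in markup. The sketch below shows one way such values can be derived with the standard URL class and a regular expression. It is not html-play's source, and toAbsoluteUrl and backgroundImageFromStyle are hypothetical helpers. Returning an empty string when nothing usable is found is an assumption borrowed from the link and image descriptions.

```javascript
// Hypothetical helper: resolves a possibly relative href/src against the page URL.
const toAbsoluteUrl = (value, pageUrl) => {
  try {
    return new URL(value, pageUrl).href
  } catch {
    return '' // assumption: unusable values become an empty string
  }
}

// Hypothetical helper: pulls the url(...) source out of an inline style string.
// Handles quoted and unquoted forms of background and background-image.
const backgroundImageFromStyle = (style) => {
  const match = /background(?:-image)?\s*:[^;]*url\(\s*(['"]?)(.*?)\1\s*\)/i.exec(style)
  return match ? match[2] : ''
}

console.log(toAbsoluteUrl('/about', 'https://nodejs.org/en/'))
// https://nodejs.org/about
console.log(backgroundImageFromStyle('color: red; background-image: url("/img/bg.jpg")'))
// /img/bg.jpg
```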

    Methods
    • find

      Find the first matched child DOMElement inside this element.

      Parameters
      • selector

        Type: string

        The CSS selector to use.

      • options (Optional)

        Type: object

        • containing (Optional)

          Type: string

          Only match elements that contain the specified substring.

    • findAll

      Find all matched child DOMElements inside this element.

      Parameters
      • selector

        Type: string

        The CSS selector to use.

      • options (Optional)

        Type: object

        • containing (Optional)

          Type: string

          Only match elements that contain the specified substring.

    • getAttribute

      Parameters
      • qualifiedName

        Type: string

        Returns the element's first attribute whose qualified name is qualifiedName, or undefined if there is no such attribute.
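
The containing option narrows a selector match by substring. The sketch below imitates that behavior on plain objects, assuming the check is a case-sensitive match against each element's text, which the documentation does not state explicitly. filterContaining and the sample data are hypothetical.

```javascript
// Stand-ins for DOMElement objects; only `text` and `link` are modeled here.
const elements = [
  { text: 'Download Node.js', link: 'https://nodejs.org/en/download' },
  { text: 'About', link: 'https://nodejs.org/en/about' },
  { text: 'Node.js releases', link: 'https://nodejs.org/en/about/releases' },
]

// Hypothetical helper: keeps the elements whose text includes the substring.
// Assumption: `containing` is a case-sensitive substring test on `text`.
const filterContaining = (items, containing) =>
  containing === undefined
    ? items
    : items.filter(({ text }) => text.includes(containing))

// findAll-like: every match. find-like: the first match only.
const all = filterContaining(elements, 'Node')
const first = all[0]

console.log(all.length) // 2
console.log(first.link) // https://nodejs.org/en/download
```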

Credits

This project is heavily inspired by Requests-HTML, another fabulous library for Python.

License

MIT