npm package discovery and stats viewer.

Discover Tips

  • General search

    [free text search, go nuts!]

  • Package details

    pkg:[package-name]

  • User packages

    @[username]

Sponsor

Optimize Toolset

I’ve always been into building performant and accessible sites, but lately I’ve been taking it extremely seriously. So much so that I’ve been building a tool to help me optimize and monitor the sites that I build to make sure that I’m making an attempt to offer the best experience to those who visit them. If you’re into performant, accessible and SEO friendly sites, you might like it too! You can check it out at Optimize Toolset.

About

Hi, 👋, I’m Ryan Hefner  and I built this site for me, and you! The goal of this site was to provide an easy way for me to check the stats on my npm packages, both for prioritizing issues and updates, and to give me a little kick in the pants to keep up on stuff.

As I was building it, I realized that I was actually using the tool to build the tool, and figured I might as well put this out there and hopefully others will find it to be a fast and useful way to search and browse npm packages as I have.

If you’re interested in other things I’m working on, follow me on Twitter or check out the open source projects I’ve been publishing on GitHub.

I am also working on a Twitter bot for this site to tweet the most popular, newest, random packages from npm. Please follow that account now and it will start sending out packages soon–ish.

Open Software & Tools

This site wouldn’t be possible without the immense generosity and tireless efforts from the people who make contributions to the world and share their work via open source initiatives. Thank you 🙏

© 2025 – Pkg Stats / Ryan Hefner

graphql-scraper

v1.2.1

Published

Extract structured data from the web using GraphQL.

Downloads

20

Readme

graphql-scraper

GraphQL lets us query all sorts of graph-shaped data - so why not use it to query the world's most useful graph, the web?

graphql-scraper is a command-line tool and reusable GraphQL schema which lets you easily extract data from HTML.

Check out a live demo here. You can easily spin up your own by using graphql-scraper-server.

Install

npm install -g graphql-scraper

The command-line tool

graphql-scraper <query-file>

Reads a GraphQL query from the path query-file, and prints the result.

If query-file is not given, reads the query from stdin.

Command-line options

  • --json Returns the result in JSON format, for use in other tools.
  • --help Prints a help string.

Variables

Any other named options you pass to the CLI will be used as a query variable.

For example, if you want to reuse the same query on several pages, you could write the following query file (query.graphql):

query ExampleQueryWithVariable($page: String) {
  page(url: $page) {
    items: queryAll(selector: "tr.athing") {
      rank: text(selector: "td span.rank")
      title: text(selector: "td.title a")
      sitebit: text(selector: "span.comhead a")
      url: attr(selector: "td.title a", name: "href")
      attrs: next {
        score: text(selector: "span.score")
        user: text(selector: "a:first-of-type")
        comments: text(selector: "a:nth-of-type(3)")
      }
    }
  }
}

...and execute the query like this:

graphql-scraper query.graphql --page="https://news.ycombinator.com/"

The schema

You can check out an auto-generated schema description here, but I recommend spinning up a graphql-scraper-server instance and exploring the types interactively. You can also play around with the schema in the live demo.

Re-using the schema in your own projects

The npm package exports the GraphQL schema which is used by the command-line tool. This an instance of graphql-js GraphQLSchema, which you can use anywhere that expects a schema, for example apollo-server or graphql-yoga.

Use npm install graphql-scraper or yarn add graphql-scraper to add the schema to your project.

Basic example with graphql

import { graphql } from 'graphql'
import schema from 'graphql-scraper'
// You can also import it as follows:
// const schema = require('graphql-scraper')


const query = `
{
  page(url: "http://news.ycombinator.com") {
    items: queryAll(selector: "tr.athing") {
      rank: text(selector: "td span.rank")
      title: text(selector: "td.title a")
      sitebit: text(selector: "span.comhead a")
      url: attr(selector: "td.title a", name: "href")
      attrs: next {
        score: text(selector: "span.score")
        user: text(selector: "a:first-of-type")
        comments: text(selector: "a:nth-of-type(3)")
      }
    }
  }
}
`

graphql(schema, query).then(response => {
  console.log(response)
})

Background

This project was inspired by gdom, which is written in Python and uses the Graphene GraphQL library.

If you want to switch over from gdom, please note some schema changes:

  • query(selector: String!) now only returns a single Element, rather than a list (like document.querySelector). Added a new queryAll(selector: String!): [Element] field, which behaves like document.querySelectorAll.
  • is(selector: String!) is renamed to has(selector: String!).
  • children, parent, siblings, next etc. no longer have a selector argument. If you need to select children with a specific selector, use child selectors (.foo > .bar).
  • parents is removed.
  • prev[All] is renamed to previous[All].

Maintainers

@lachenmayer

Contribute

PRs accepted.

License

MIT © 2018 harry lachenmayer