quick-scraper

v1.7.0

An easy, lightweight scraper for humans with many built-in features.

Downloads

9

Readme

Quick Scraper

An easy, lightweight scraper built with TypeScript for a good developer experience.

Features.

  • If it works in cheerio, it will work here.
  • Automatically converts any encoding to UTF-8.
  • Built with TypeScript.
  • Great editor support.

Cons.

  • It doesn't play well with nested structures like:
<p>
  abcd
  <a href="abcd">Some Url</a>
</p>

In this case, selecting only the text abcd won't work out of the box, because of limitations in the way jQuery handles it directly. To handle such cases, use the raw output and apply your own logic to it.
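
One way to apply that kind of logic is sketched below. `directText` is a hypothetical helper, not part of quick-scraper: it takes the outer HTML of a single element and returns only the text that sits directly inside it, skipping text inside child elements. It is a simplified tokenizer for illustration and assumes well-formed markup with no comments, no void elements such as <br>, and no `>` inside attribute values; a proper HTML parser would be more robust in real code.

```typescript
// Hypothetical helper, not part of quick-scraper.
// Returns the text that is a direct child of the root element in `outerHtml`.
function directText(outerHtml: string): string {
  const tokenPattern = /<\/?[^>]+>|[^<]+/g; // either a tag or a run of text
  const tokens = outerHtml.match(tokenPattern) ?? [];
  let depth = 0;
  const parts: string[] = [];

  for (const token of tokens) {
    if (token.startsWith("</")) {
      depth -= 1; // closing tag
    } else if (token.startsWith("<")) {
      if (!token.endsWith("/>")) depth += 1; // opening tag; self-closing tags keep depth
    } else if (depth === 1) {
      parts.push(token); // text directly inside the root element
    }
  }
  return parts.join("").trim();
}

const sample = `<p>
  abcd
  <a href="abcd">Some Url</a>
</p>`;

console.log(directText(sample));
```

The link text "Some Url" is skipped because it is nested one level deeper than the root element.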

Installation

Yarn

yarn add quick-scraper

NPM

npm i quick-scraper

Usage

import { quickScraper } from "quick-scraper"

await quickScraper({
  url: "https://typestrong.org/ts-node/",
  options: {
    title: {
      // This property can be changed to the name you want.
      selector: ".hero__subtitle",
    },
    docs: {
      selector: "a.navbar__item:nth-child(1)",
      text: false, // Text is enabled by default, so you need to disable it explicitly.
      href: true, // One of the attributes available by default.
    },
    releaseNotes:{
      selector: "a.navbar__item:nth-child(3)",
      text: true, // You can also enable multiple attributes at once.
      href: true,
    }
  },
});

// Output
/*

{
  rawString: "<html>{...}</html>", // The page's HTML as a string; load it into cheerio or process it however you like.
  data: {
    title: { text: 'TypeScript execution and REPL for node.js' },
    docs: { href: 'https://typestrong.org/ts-node/docs/' },
    releaseNotes: {
      text: 'Release Notes',
      href: 'https://github.com/TypeStrong/ts-node/releases'
    }
  }
}
*/
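
To make the shape of the result easier to work with, here is a minimal sketch of types inferred from the example output above. The names `ScrapedField`, `ScrapeResult`, and `summarize` are hypothetical and are not exported by quick-scraper; the library's own typings may differ. The sample object is copied from the example output rather than produced by a live scrape.

```typescript
// Hypothetical types inferred from the example output; not quick-scraper's own typings.
interface ScrapedField {
  text?: string;
  href?: string;
}

interface ScrapeResult {
  rawString: string;
  data: Record<string, ScrapedField>;
}

// Turn each scraped field into a "name: text (href)" line, skipping missing parts.
function summarize(result: ScrapeResult): string[] {
  return Object.entries(result.data).map(([name, field]) => {
    const parts = [field.text, field.href ? `(${field.href})` : undefined];
    return `${name}: ${parts.filter((part) => part !== undefined).join(" ")}`;
  });
}

// Sample object copied from the example output (rawString shortened).
const exampleResult: ScrapeResult = {
  rawString: "<html></html>",
  data: {
    title: { text: "TypeScript execution and REPL for node.js" },
    docs: { href: "https://typestrong.org/ts-node/docs/" },
    releaseNotes: {
      text: "Release Notes",
      href: "https://github.com/TypeStrong/ts-node/releases",
    },
  },
};

console.log(summarize(exampleResult));
```

Marking `text` and `href` as optional reflects that each key only appears in the output when it is enabled for that field.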

Scrape an HTML string.

// The process is similar to quickScraper; only a few things need to change.

import { scrapeHtml } from 'quick-scraper';

await scrapeHtml({
  html: "html source code from https://typestrong.org",
  options: {
    title: {
      // This property can be changed to the name you want.
      selector: ".hero__subtitle",
    },
    docs: {
      selector: "a.navbar__item:nth-child(1)",
      text: false, // Text is enabled by default, so you need to disable it explicitly.
      href: true, // One of the attributes available by default.
    },
    releaseNotes:{
      selector: "a.navbar__item:nth-child(3)",
      text: true, // You can also enable multiple attributes at once.
      href: true,
    }
  },
});

// Output
/*

{
  rawString: "<html></html>", // The page's HTML as a string; load it into cheerio or process it however you like.
  data: {
    title: { text: 'TypeScript execution and REPL for node.js' },
    docs: { href: 'https://typestrong.org/ts-node/docs/' },
    releaseNotes: {
      text: 'Release Notes',
      href: 'https://github.com/TypeStrong/ts-node/releases'
    }
  }
}
*/

Headless Quick Scrape

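import puppeteer from "puppeteer";
import { quickScraperHeadless } from "quick-scraper";

// Launch a headless browser and open the page that the scraper will use.
const browser = await puppeteer.launch({ headless: true });
const page = await browser.newPage();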

await quickScraperHeadless({
  url: "https://typestrong.org/ts-node/",
  options: {
    title: {
      // This property can be changed to the name you want.
      selector: ".hero__subtitle",
    },
    docs: {
      selector: "a.navbar__item:nth-child(1)",
      text: false, // Text is enabled by default, so you need to disable it explicitly.
      href: true, // One of the attributes available by default.
    },
    releaseNotes:{
      selector: "a.navbar__item:nth-child(3)",
      text: true, // You can also enable multiple attributes at once.
      href: true,
    }
  },
  page: page
});

// Output
/*

{
  rawString: "<html>{...}</html>", // The page's HTML as a string; load it into cheerio or process it however you like.
  data: {
    title: { text: 'TypeScript execution and REPL for node.js' },
    docs: { href: 'https://typestrong.org/ts-node/docs/' },
    releaseNotes: {
      text: 'Release Notes',
      href: 'https://github.com/TypeStrong/ts-node/releases'
    }
  }
}
*/

More Examples.

Custom Attribute

await scrapeHtml({
  html: "html source code from https://typestrong.org",
  options: {
    relStatus: {
      // This property can be changed to the name you want.
      selector: "a.navbar__item:nth-child(3)",
      attrs:{
        rel: true // Key will be the identifier of the attribute you want to scrape.
      }
    },
  },
});

All custom attributes are accessible under the `attrs` key of the corresponding entry in `output.data`.
// Output
/*

{
  data: {
    relStatus: { attrs: { rel: "noopener noreferrer" } }
  }
}
*/
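
As a small follow-up sketch, the snippet below reads a custom attribute from a result shaped like the output above and splits the `rel` value into tokens. `AttrField` and `relTokens` are hypothetical names introduced for illustration, and the sample object is copied from the example output rather than produced by a live scrape.

```typescript
// Hypothetical shape inferred from the example output above.
interface AttrField {
  attrs?: Record<string, string | undefined>;
}

// Split a scraped `rel` attribute into its space-separated tokens.
function relTokens(field: AttrField): string[] {
  const rel = field.attrs?.rel ?? "";
  return rel.split(/\s+/).filter((token) => token.length > 0);
}

// Sample object copied from the example output.
const sampleOutput = {
  data: {
    relStatus: { attrs: { rel: "noopener noreferrer" } },
  },
};

const tokens = relTokens(sampleOutput.data.relStatus);
console.log(tokens.includes("noopener"));
```

Treating `attrs` and each attribute as optional is an assumption: it keeps the helper safe if the selector matches nothing or the element lacks the attribute.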

List Item

import { quickScraper } from 'quick-scraper';

const scrapedData = await quickScraper({
  url: "https://www.ptwxz.com/html/11/11622/",
  options: {
    chapters: {
      selector: ".centent > ul > li",
      listItem: true,
    },
  },
});

// The matched items are available under scrapedData.data.chapters.lists
/*

{
  raw: [Function: initialize] {}, // It's the default output from cheerio, use it as you like.
  data: {
    chapters: {
      lists: [
        { text: '第一章 键盘侠' },
        { text: '第2章 杀机' },
        { text: '第3章 晴天霹雳' },
        { text: '第4章 第一部秘典' },
        { text: '第5章 坑爹的抽奖' },
        { text: '第6章 打赌' },
        { text: '第7章 突破' },
        { text: '第8章 信春哥' },
        { text: '第9章 老婆闺蜜' },
        ... 740 More Items
      ]
    }
  }
}
*/
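
A minimal sketch of reading the list from such a result follows. `ListField` and `chapterTitles` are hypothetical names, and the sample data is a shortened copy of the output above. The optional chaining assumes that `lists` may be absent when a selector matches nothing, which is not confirmed by the library's documentation.

```typescript
// Hypothetical shape inferred from the example output above.
interface ListField {
  lists?: { text?: string }[];
}

// Collect the text of every list item, dropping entries without text.
function chapterTitles(field: ListField | undefined): string[] {
  return (field?.lists ?? [])
    .map((item) => item.text)
    .filter((text): text is string => typeof text === "string");
}

// Shortened copy of the example output's data object.
const sampleData: Record<string, ListField> = {
  chapters: {
    lists: [{ text: "第一章 键盘侠" }, { text: "第2章 杀机" }, { text: "第3章 晴天霹雳" }],
  },
};

console.log(chapterTitles(sampleData.chapters)); // three titles
console.log(chapterTitles(sampleData.missing)); // empty array for an unknown key
```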

Libraries Used