@sesamestrong/json-scraper

v4.5.0

A tool to allow for quick running of JSON-based scrapers using request-promise and jsonframe-cheerio.

Downloads

5

JSON Scraper

Tool for creating complex, multi-step static web scrapers with cookies, auth and more

Installation

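JSON Scraper is built and published on the GitHub Package Registry. Install it with:

npm install @sesamestrong/json-scraper

Then load a scraper definition and run it:

const { runEntireScraper } = require("@sesamestrong/json-scraper");
(async () => {
  // Run the scraper defined in myScraper.json, passing a username and password.
  const input = { username: "exampleUsername", password: "exPw" };
  console.log(await runEntireScraper(require("./myScraper.json"), input));
})();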

Error Reporting

JSON Scraper adds a jsonData and a stepNumber property to any error that it may throw.
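
A minimal sketch of how a caller might use those two properties. The `describeScraperError` helper is hypothetical and not part of json-scraper, and the shape of `jsonData` is not documented here, so the sketch simply serializes whatever it finds.

```javascript
// Hypothetical helper: summarizes an error using the two properties
// that JSON Scraper is documented to attach (stepNumber and jsonData).
function describeScraperError(err) {
  // Fall back gracefully if the error did not come from JSON Scraper.
  const step = err.stepNumber !== undefined ? err.stepNumber : "unknown";
  // The shape of jsonData is an assumption; serialize whatever is there.
  const data = err.jsonData !== undefined ? JSON.stringify(err.jsonData) : "none";
  return `Scraper failed at step ${step}: ${err.message} (jsonData: ${data})`;
}

// Stand-in error for illustration; a real one would come from runEntireScraper.
const example = Object.assign(new Error("request failed"), {
  stepNumber: 0,
  jsonData: { steps: [] },
});
console.log(describeScraperError(example));
```

In practice, one would wrap the `await runEntireScraper(...)` call in a `try`/`catch` and pass the caught error to a helper like this one.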

Use

json-scraper is a library designed to both reduce boilerplate in web scraping and provide a secure, language-agnostic platform on which to write web scrapers.

One writes a scraper in a step-based format, such as the following:

{
  "steps": [
    {
      "headers": {
        "uri": "https://google.com/",
        "method": "POST",
        "headers": {
          "Content-Type": "application/json"
        }
      },
      "frame": {
        "%title": "meta[property='twitter:title'] @ content",
        "%imgSrc": "center img[title] @ src || https://.+$"
      }
    }
  ]
}

This example retrieves the title and image currently shown on the Google homepage. If one ran this scraper on the 4th of July, for instance, it would return the following:

{
  "title": "Happy Fourth of July!",
  "imgSrc": "https://google.com/logos/doodles/2019/fourth-of-july...2.2-I.png"
}

Currently, the only implementation is in Node.js. Certain parameters follow the format specified by the request library, which biases json-scraper towards Node.js. This problem can be solved in one of two ways:

  1. A change in the default spec: There is no specification for json-scraper yet; the only artifact is the library's code itself. When a specification is written, however, it can include more language-agnostic features.
  2. Options for individual formats: One could write JSON files in many styles (with different naming practices), with every style still compiling to one common format. This is the most likely option if json-scraper is to be continued, because it allows the library to be ergonomic for Python, Java, Node.js and Go developers alike.
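
The second option can be sketched as a small normalization pass. Everything here is hypothetical: json-scraper does not define these aliases, and `KEY_ALIASES` and `normalizeRequest` are names invented for illustration. The sketch assumes the common format is the request-style naming (`uri`, `method`, `headers`) used in the example scraper.

```javascript
// Hypothetical alias table: alternative key names mapped to the
// request-style names used in the example scraper (uri, method, headers).
const KEY_ALIASES = {
  url: "uri",
  verb: "method",
  http_headers: "headers",
};

// Rewrites one request description into the common format.
// Keys without an alias are passed through unchanged.
function normalizeRequest(request) {
  const normalized = {};
  for (const [key, value] of Object.entries(request)) {
    const hasAlias = Object.prototype.hasOwnProperty.call(KEY_ALIASES, key);
    normalized[hasAlias ? KEY_ALIASES[key] : key] = value;
  }
  return normalized;
}

// Two different spellings of the same request normalize to the same object.
console.log(normalizeRequest({ url: "https://google.com/", verb: "POST" }));
console.log(normalizeRequest({ uri: "https://google.com/", method: "POST" }));
```

A per-language alias table of this kind would let each style keep its own naming conventions while the scraper itself only ever sees the common format.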