

Scrappers.js

A set of utility classes for Node.js to make scraping the web easier.

There is support for custom browser headers, encodings and compression.

Install

npm install --save scrappers

Scrapper options

url

The URL of the target page.

parser

An object with a public "parse" method.

Example:

var hnParser = {
  //$ is cheerio (jquery) instance of the parsed page
  parse:function($){
    //get the text of the third link in a page
    return $('a').eq(3).text();
  }
};

encoding

The encoding of the target HTML page. This parameter is optional and defaults to "utf-8".

headers

An object containing key-value pairs of headers. Defaults to:

{
  'User-Agent': "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2062.94 Safari/537.36"
}

gzip

A flag to enable or disable gzip compression. By default it is enabled (set to true).

You will probably not want to disable this; if the page is not compressed, it will still be parsed correctly (see request).

Options can be passed on instantiation:

var scrapper = new PageScrapper({
  url: HACKER_NEWS_HOME,
  parser: hnParser
});
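
For illustration, here is a sketch that combines the optional settings described above (encoding, headers, gzip) with the required url and parser. The header and encoding values are placeholders, and it is assumed they are simply passed alongside the other options:

var scrapper = new PageScrapper({
  url: HACKER_NEWS_HOME,
  parser: hnParser,
  encoding: 'utf-8',   //optional, this is already the default
  headers: {
    //placeholder value; overrides the default User-Agent
    'User-Agent': 'my-custom-agent'
  },
  gzip: true           //optional, enabled by default
});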

Or on the get request:

scrapper.get(options, done);

Options passed in the get request will extend the options passed on instantiation for the duration of the request.
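
As a sketch of that behavior, the call below passes an alternate url for a single request; the url is just a placeholder, and the parser from instantiation is assumed to still apply:

//the url is assumed to be used only for this request,
//while the parser configured on instantiation is reused
scrapper.get({ url: "https://news.ycombinator.com/newest" }, function(err, parsed){
  console.log(parsed);
});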

Page

A base class for scraping a web page.

Example:

Get the third link from the Hacker News home page.

Import scrapper object


var PageScrapper = require('scrappers').PageScrapper;

Write a parser

The parse function will receive a cheerio instance with the HN HTML.

var hnParser = {
  //$ is cheerio (jquery) instance of the parsed page
  parse:function($){
    //get the text of the third link in a page
    return $('a').eq(3).text();
  }
};

Instantiate a scraper object


var HACKER_NEWS_HOME = "https://news.ycombinator.com/";
var scrapper = new PageScrapper({
  url: HACKER_NEWS_HOME,
  parser: hnParser
});

Parse!


scrapper.get(function(err,parsed){
  console.log("Third link on hacker news page is:", parsed);
});
Result:
Third link on hacker news page is: comments
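
The callback follows the usual Node.js error-first convention, so a small guard on err is a reasonable addition; exactly what ends up in err (a failed request, an unexpected status code, etc.) is an assumption here:

scrapper.get(function(err, parsed){
  if (err) {
    //assumed: request or parsing failures are passed through as err
    console.error("Failed to scrape hacker news:", err);
    return;
  }
  console.log("Third link on hacker news page is:", parsed);
});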

Rss

A base class for scraping an RSS feed.

Example:

Get a list of article titles from the Ask Hacker News RSS feed.

Import scrapper object

var RssScrapper = require('scrappers').RssScrapper;

Write a parser

The parse function will receive a JavaScript object representing a single RSS article.

var hnParser = {
  //gets a parsed rss article as an object
  parse:function(article){
    return article.title;
  }
};

Instantiate a scraper object


var HACKER_NEWS_RSS = "http://hnrss.org/ask";
var scrapper = new RssScrapper({
  url: HACKER_NEWS_RSS,
  parser: hnParser
});

Parse!


scrapper.get(function(err,parsed){
  //print all article titles from the rss feed
  console.log("Ask:HN titles:", parsed);
});
Result:
Ask:HN titles:
[
  'Ask HN: Do you like the idea of social network and learning?★',
  'Ask HN: How does Saved stories feature work?',
  'Ask HN: AGPL on a Code Generator App',
  'Ask HN: How do you read your programming books?',
  'Ask HN: Is OpenGL Worth Learning?',
  'Ask HN: How to produce vnc like Browserling?',
  'Ask HN: How do I solve problems/code outside of the book I used to learn python?',
  'Ask HN: Self Study Learning Path',
  'Ask HN: How to build quality software in a fast paced startup enviorment?',
  'Ask HN: Is Agar.io currently making or losing money?',
  'Ask HN: Any success with Toastmasters?',
  'Ask HN: Has anyone else found Angular to be destroying their productivity?',
  'Ask HN: How to survive a horrible tech job while looking for a new one?',
  'Ask HN: How can a successful startup adopt a strong testing workflow?',
  'Ask HN: What kind of software will be used to develop VR applications?',
  'Ask HN: How do you prepare for a Technical Interview',
  'Ask HN: Recommend one Business/Startup book',
  'Ask HN: Should I branch off my startup\'s technology into a separate company?',
  'Ask HN: Test/Play with 3D Printing Library',
  'Ask HN: What database storage engine do you use, and why?'
]

Development

To run the tests, use:

npm test