hoo

v0.0.3

command-line contact information scrapper

Hoo

A contact information scraping tool for programmatic and command-line use. Hoo scrapes webpages looking for personal websites, email addresses, Twitter handles, and GitHub usernames, and returns completed user profiles in JSON or CSV.

npm install -g hoo

Command-line Usage

This is a tool for quickly looking up contact information. Provide a Twitter handle (@compooter), a GitHub username (^andrejewski), or a plain website URL (chrisandrejewski.com), and Hoo will try to figure out the remaining details.

# these all do the same thing
hoo @compooter
hoo ^andrejewski
hoo chrisandrejewski.com
{ fullname: 'Chris Andrejewski',
  github: [ 'andrejewski' ],
  url: [ 'http://chrisandrejewski.com' ],
  email: [ '[email protected]' ],
  twitter: [ 'compooter' ] }
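
The prefix convention above can be sketched in a few lines. This is only an illustration, not Hoo's actual parser: the function name expandName and the profile URL patterns are assumptions made for this example.

```javascript
// Hypothetical helper, not part of Hoo's API: maps a command-line name
// to the URL a scrapper might start from.
function expandName(name) {
	if (name.startsWith('@')) {
		// Twitter handle, e.g. @compooter
		return 'https://twitter.com/' + name.slice(1);
	}
	if (name.startsWith('^')) {
		// GitHub username, e.g. ^andrejewski
		return 'https://github.com/' + name.slice(1);
	}
	// Anything else is treated as a website; add a scheme if it is missing.
	return /^https?:\/\//.test(name) ? name : 'http://' + name;
}

console.log(['@compooter', '^andrejewski', 'chrisandrejewski.com'].map(expandName));
```

Keeping the expansion in one small function mirrors the expandArg hook described under Scrappers, where each scrapper decides which prefixes it understands.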

Hoo accepts multiple names in one call, although each additional name adds to the running time.

hoo @compooter ^tj @iamdevloper

Output as JSON or CSV

By default, all output is in JSON. Passing the --csv flag will change all output to CSV.

hoo @compooter --csv
hoo @compooter -c
fullname,twitter,email,url,github
Chris Andrejewski,compooter,[email protected],http://chrisandrejewski.com,andrejewski

Writing to a file

Pass --output <filename> and Hoo will save the output to a file instead of printing it. The --csv flag can be combined with --output and works as you would expect.

hoo @compooter ^tj --output output.json
hoo @compooter ^tj -o output.json

For JSON, the results array is stored under the "people" key.

{
  "people": [
    {
      "fullname": "Chris Andrejewski",
      "twitter": [
        "compooter"
      ],
      "url": [
        "http://chrisandrejewski.com"
      ],
      "email": [
        "[email protected]"
      ],
      "github": [
        "andrejewski"
      ]
    },
    {
      "fullname": "TJ Holowaychuk",
      "github": [
        "tj"
      ],
      "url": [
        "http://tjholowaychuk.com"
      ],
      "email": [
        "[email protected]"
      ]
    }
  ]
}
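
In the sample above, the second record has no "twitter" key, so it appears that a field can be absent when nothing was found. Code that consumes the file should therefore not assume every key exists. Here is a minimal sketch, with sample data inlined so it runs on its own (emails omitted); in practice you would load the file with JSON.parse(fs.readFileSync('output.json', 'utf8')). The helper name collect is hypothetical.

```javascript
// Sample data shaped like the file written by --output.
var output = {
	people: [
		{ fullname: 'Chris Andrejewski', twitter: ['compooter'], github: ['andrejewski'] },
		{ fullname: 'TJ Holowaychuk', github: ['tj'] }
	]
};

// Hypothetical helper: gather every value of one field across all people,
// treating a missing key as an empty list.
function collect(people, field) {
	return people.reduce(function (values, person) {
		return values.concat(person[field] || []);
	}, []);
}

console.log(collect(output.people, 'github'));  // [ 'andrejewski', 'tj' ]
console.log(collect(output.people, 'twitter')); // [ 'compooter' ]
```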

More options

See hoo --help for more options including colored output, debugging activity, and selecting only certain fields.

Programmatic Usage

Hoo is designed to be entirely configurable. The command-line interface registers a default set of scrappers, but a new instance of the Hoo class starts with none. Add scrappers with .use(), the same way you would add Express/Connect middleware.

var Hoo = require('hoo');
var hoo = new Hoo()
	.use(Hoo.TwitterScrapper)
	.use(Hoo.GithubScrapper)
	.use(Hoo.DefaultScrapper);

var names = ['@compooter', '^tj'];
hoo.run(names, function(error, records) {
	// do something awesome
});
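
To illustrate the middleware analogy, here is a toy sketch of a chainable .use() registry. It is not Hoo's source: the class MiniHoo, its scrappers array, and the two stand-in scrapper classes are all invented for this example. It assumes, as the snippet above suggests, that .use() takes a class and returns the instance.

```javascript
// Toy stand-in for Hoo, for illustration only.
class MiniHoo {
	constructor(options) {
		this.options = options || {};
		this.scrappers = []; // starts empty, like a fresh Hoo instance
	}

	use(ScrapperClass) {
		// Each scrapper receives the options given to the container.
		this.scrappers.push(new ScrapperClass(this.options));
		return this; // returning `this` is what makes chaining work
	}
}

// Invented scrapper classes that only record their options.
class FirstScrapper { constructor(options) { this.options = options; } }
class SecondScrapper { constructor(options) { this.options = options; } }

var mini = new MiniHoo({ debug: true })
	.use(FirstScrapper)
	.use(SecondScrapper);

console.log(mini.scrappers.length); // 2
```

Because registration order is preserved in the array, a container built this way can run its scrappers in the order they were added, which is the usual middleware behavior.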

Scrappers

Hoo includes Email (Default), Twitter, and GitHub web scrappers, and you can add your own. All of the built-in scrappers extend the same base Scrapper class, so building a new one means implementing a few methods.

var Scrapper = require('hoo').Scrapper;

class MyScrapper extends Scrapper {
	constructor(options) {
		/* options passed to new Hoo() are passed to each Scrapper added to it */
		super(options); // a derived class must call super() before using `this`
	}

	expandArg(arg) {
		/* this allows the twitter/github scrappers to expand usernames to urls */
		return arg;
	}

	processWebpage(webpage, record, next) {
		/*
			Take any webpage, extract contact information onto the record,
			collect new webpage urls to visit, and call next when done.

			Process `webpage` like it's jQuery:
				var $ = webpage; $('#myElement').text();
			(See https://github.com/cheeriojs/cheerio)
		*/
		var urls = []; // optional: further urls for Hoo to visit
		next(null, urls); // pass an error as the first argument on failure
	}
}

Note that while ES6 classes are used here, your own scrapper does not need to extend the Scrapper class. A plain constructor function works too, as long as its prototype implements the methods shown above.
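
As a sketch of that last point, the scrapper below uses a plain constructor function and prototype methods instead of a class that extends Scrapper. The names MailtoScrapper and fakePage are invented here. fakePage is only a stub standing in for the cheerio-like webpage object so the example runs without Hoo installed, and the shape of the record (an email array) is assumed from the sample output.

```javascript
// Hypothetical scrapper written without ES6 classes.
function MailtoScrapper(options) {
	this.options = options || {};
}

MailtoScrapper.prototype.expandArg = function (arg) {
	return arg; // nothing to expand for this scrapper
};

MailtoScrapper.prototype.processWebpage = function (webpage, record, next) {
	var $ = webpage;
	// Read the first mailto: link, if any, and store the address on the record.
	var href = $('a[href^="mailto:"]').attr('href');
	if (href) {
		record.email = (record.email || []).concat(href.slice('mailto:'.length));
	}
	next(null, []); // no error, no further urls to visit
};

// Stub standing in for the jQuery-like page object, for this demo only.
function fakePage(selector) {
	return {
		attr: function (name) {
			return name === 'href' ? 'mailto:someone@example.com' : undefined;
		}
	};
}

var record = {};
new MailtoScrapper().processWebpage(fakePage, record, function (err, urls) {
	console.log(err, record, urls);
});
```

With the real library, the stub would be replaced by the page object Hoo passes in, and the scrapper would be registered with .use(MailtoScrapper).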

Contributing

If you like Hoo enough to contribute, sweet. As the markup of scraped webpages changes, Hoo will need to be updated to match, so open an issue or pull request if a scrapper is broken. If you have a scrapper you would like to add to Hoo, send a pull request. Any other issues are welcome too.

npm install # dependencies
npm run build # to build
npm run pre-publish # to pre-publish for pull requests

Follow me on Twitter for updates or just for the lolz and please check out my other repositories if I have earned it. I thank you for reading.