console-crawler
v0.2.0
A simple web crawler that keeps to the domain
Console Crawler
A Node app that crawls a given website.
npm install -g console-crawler;
console-crawler http://en.wikipedia.org/ --legs=8
console-crawler http://en.wikipedia.org/ --legs=2 --phantom
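As a rough illustration of what "keeps to the domain" can mean for a crawler, here is a minimal sketch of a same-host check. It is not taken from console-crawler's source; the function name `isSameDomain` and the choice to compare exact hostnames are assumptions made for this example, and the real tool may decide differently (for instance, about subdomains).

```javascript
// Hypothetical helper, not taken from console-crawler's source.
// Returns true when `link` resolves to the same hostname as `startUrl`.
function isSameDomain(startUrl, link) {
  try {
    // Resolve relative links against the start URL before comparing.
    const target = new URL(link, startUrl);
    return target.hostname === new URL(startUrl).hostname;
  } catch (err) {
    // Links that cannot be parsed are treated as off-domain.
    return false;
  }
}

console.log(isSameDomain('http://en.wikipedia.org/', '/wiki/Node.js'));       // true
console.log(isSameDomain('http://en.wikipedia.org/', 'http://example.com/')); // false
```

A crawler built this way would only queue links for which the check returns true, so it never wanders off to other sites.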
Quick Set-Up for dev
- This is a Node app, so you'll need node/npm to run it.
- Clone down the repo
- Install the dependencies
npm install
- Fire up the crawler.
Or, Copy-Paste
git clone https://github.com/robcolburn/console-crawler;
cd console-crawler;
npm install;
./console-crawler.js http://en.wikipedia.org/ --legs=8;
Notes
On Mac, you'll likely need the Xcode Command Line Tools installed.
If you'd like to use PhantomJS, you'll need to download and install it separately, since it ships as its own binary.
If you need to target a different "Host", you may just need to edit your hosts file. For instance, say I wanted to hit 5.5.5.5, but with the host name example.com, which isn't ready to go live just yet. I might add the following to my hosts file:
5.5.5.5 example.com
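Before crawling, it can help to confirm that the hosts entry is written the way you expect. The sketch below parses hosts-file text and reports which IP a name maps to. The helper name `lookupInHosts` and the sample text are made up for illustration; this is not part of console-crawler.

```javascript
// Hypothetical helper for illustration; not part of console-crawler.
// Finds the IP address that hosts-file text maps to `hostname`, or null.
function lookupInHosts(hostsText, hostname) {
  for (const rawLine of hostsText.split('\n')) {
    // Drop comments and surrounding whitespace.
    const line = rawLine.split('#')[0].trim();
    if (!line) continue;
    // Each entry is an IP address followed by one or more host names.
    const [ip, ...names] = line.split(/\s+/);
    if (names.includes(hostname)) return ip;
  }
  return null;
}

const sample = '127.0.0.1 localhost\n5.5.5.5 example.com # staging\n';
console.log(lookupInHosts(sample, 'example.com')); // 5.5.5.5
```

On macOS and Linux the hosts file usually lives at /etc/hosts, so you could pass in `require('fs').readFileSync('/etc/hosts', 'utf8')` instead of the sample string.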