@dschnare/chai
v2.0.1
A crawler that grabs the URL, page titles, and first H1 and H2 elements.
Chai
Chai is a simple web crawler that scrapes relevant SEO data from each page it visits.
Usage
npm install @dschnare/chai -g
chai http://mywebsite.com > crawl.json
Scraping
Chai will scrape the following data from each page it visits.
- Page title
- All H1 values
- All H2 values
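To illustrate what collecting these values involves, here is a rough TypeScript sketch that pulls the page title and every H1 and H2 out of an HTML string using regular expressions. It is not Chai's implementation. The names scrapePage, allElements, and textOf are hypothetical, and regular expressions only approximate HTML parsing; a real crawler would use a proper HTML parser.

```typescript
// Hypothetical, regex-based approximation of the data Chai collects per page.
// Not Chai's actual implementation; real HTML is better handled by a parser.

interface PageData {
  title: string;
  headings: { h1: string[]; h2: string[] };
}

// Strip nested tags and collapse whitespace inside an element's inner HTML.
function textOf(innerHtml: string): string {
  return innerHtml.replace(/<[^>]*>/g, "").replace(/\s+/g, " ").trim();
}

// Collect the text of every <tag ...>...</tag> element, case-insensitively.
function allElements(html: string, tag: string): string[] {
  const pattern = new RegExp(`<${tag}\\b[^>]*>([\\s\\S]*?)</${tag}>`, "gi");
  const values: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(html)) !== null) {
    values.push(textOf(match[1]));
  }
  return values;
}

// Title plus all H1 and H2 values; the crawler would add the page's url.
function scrapePage(html: string): PageData {
  const titles = allElements(html, "title");
  return {
    title: titles.length > 0 ? titles[0] : "",
    headings: { h1: allElements(html, "h1"), h2: allElements(html, "h2") },
  };
}
```

For example, scrapePage("<title>Home</title><h1>Welcome</h1>") returns { title: "Home", headings: { h1: ["Welcome"], h2: [] } }.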
The scrape data written to stdout is a JSON array of objects with the following shape:
{ title, url, headings: { h1: [], h2: [] } }
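Because the output is plain JSON, it can be post-processed with a short script. The TypeScript sketch below assumes the array from crawl.json has already been parsed with JSON.parse, and flags pages that have no H1 or more than one H1, a common SEO check done here outside of Chai. PageScrape, isPageScrape, and urlsWithH1Problems are hypothetical names that mirror the documented shape.

```typescript
// Hypothetical type mirroring the documented shape:
// { title, url, headings: { h1: [], h2: [] } }
interface PageScrape {
  title: string;
  url: string;
  headings: { h1: string[]; h2: string[] };
}

// Entries for URLs that failed have no headings, so this guard skips them.
function isPageScrape(entry: unknown): entry is PageScrape {
  return typeof entry === "object" && entry !== null && "headings" in entry;
}

// Return the URLs of pages with zero H1s or more than one H1.
function urlsWithH1Problems(entries: unknown[]): string[] {
  return entries
    .filter(isPageScrape)
    .filter((page) => page.headings.h1.length !== 1)
    .map((page) => page.url);
}
```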
For URLs that respond with an error, the scrape object has this shape:
{ url, statusCode, error }
Here, error is the error object returned by Superagent.
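A minimal TypeScript sketch, again assuming the parsed contents of crawl.json, that picks out the URLs whose requests failed with a particular status code such as 404. ErrorScrape, isErrorScrape, and urlsWithStatus are hypothetical names; error is typed as unknown because its exact structure comes from Superagent.

```typescript
// Hypothetical type mirroring the documented error shape: { url, statusCode, error }
interface ErrorScrape {
  url: string;
  statusCode: number;
  error: unknown; // Superagent's error object, left opaque here
}

// Successful page entries have no statusCode/error fields, so they are skipped.
function isErrorScrape(entry: unknown): entry is ErrorScrape {
  return (
    typeof entry === "object" &&
    entry !== null &&
    "statusCode" in entry &&
    "error" in entry
  );
}

// Return the URLs of failed entries whose statusCode matches `status`.
function urlsWithStatus(entries: unknown[], status: number): string[] {
  return entries
    .filter(isErrorScrape)
    .filter((entry) => entry.statusCode === status)
    .map((entry) => entry.url);
}
```

Calling urlsWithStatus(entries, 404) lists the pages that could not be found.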
Roadmap
- Expose a way to filter out URLs to be crawled
- Expose a way to customize the scraper
- Make it easier to identify 404 URLs
- Add an option to control verbosity