@headwall/url-crawler
v0.2.7
URL crawler for analysing web content
head-spider
URL crawler and web content analyser.
The idea is to create an instance of the crawler, then add one or more URLs to it along with one or more response/document processors. When the crawler has no more URLs in its queue, it has finished.
This can form the basis of a technical SEO crawler, or any other content crawler/scraper.
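As a rough illustration of the queue-driven loop described above, here is a minimal TypeScript sketch. It is not this package's API: SketchCrawler, addUrl, run and the fetchPage callback are all hypothetical names chosen for the example. The fetcher is synchronous and reads from an in-memory object so the sketch runs without network access, and link discovery is a naive regular expression.

```typescript
// Hypothetical sketch only: none of these names come from @headwall/url-crawler.
type Fetcher = (url: string) => string | undefined;

class SketchCrawler {
  private queue: string[] = [];
  private seen = new Set<string>();
  readonly visited: string[] = [];

  constructor(private fetchPage: Fetcher) {}

  // Queue a URL unless it has already been queued once.
  addUrl(url: string): void {
    if (!this.seen.has(url)) {
      this.seen.add(url);
      this.queue.push(url);
    }
  }

  // Keep fetching until the queue is empty; at that point the crawl has finished.
  run(): void {
    while (this.queue.length > 0) {
      const url = this.queue.shift()!;
      const body = this.fetchPage(url);
      if (body === undefined) {
        continue; // nothing fetched for this URL, so there is nothing to scan
      }
      this.visited.push(url);

      // Naive link discovery: absolute URLs in double-quoted href attributes.
      const linkPattern = /href="([^"]+)"/g;
      let match: RegExpExecArray | null;
      while ((match = linkPattern.exec(body)) !== null) {
        this.addUrl(match[1]);
      }
    }
  }
}

// Two in-memory pages that link to each other (assumed data for the example).
const pages: Record<string, string> = {
  "https://example.test/": '<a href="https://example.test/about">About</a>',
  "https://example.test/about": '<a href="https://example.test/">Home</a>',
};

const crawler = new SketchCrawler((url) => pages[url]);
crawler.addUrl("https://example.test/");
crawler.run();
console.log(crawler.visited);
```

Because each URL is recorded in the seen set before it is queued, the two pages linking to each other do not cause an endless loop, and the crawl ends once both have been visited.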
When a page has been fetched, a series of "processors" are run over it to extract structured data.
After all the processors have finished, the "analysers" are run. These can look for things like missing IMG alt text, out-of-sequence heading elements, or whatever else you want.
You can easily add your own processors and analysers.
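To make the processor/analyser split concrete, here is a small TypeScript sketch of the two stages. All names (PageData, Processor, Analyser, analysePage and the individual functions) are hypothetical and are not taken from this package. The regular expressions are deliberately naive; a real processor would more likely use an HTML parser.

```typescript
// Hypothetical sketch only: these types and functions are not the package's API.
interface ImageInfo {
  src: string;
  alt: string | null; // null means the alt attribute is absent
}

interface PageData {
  url: string;
  html: string;
  images: ImageInfo[];
  headings: number[]; // heading levels in document order, e.g. [1, 2, 2, 3]
  issues: string[];
}

type Processor = (page: PageData) => void; // extracts structured data
type Analyser = (page: PageData) => void; // inspects the extracted data

const imageProcessor: Processor = (page) => {
  const imgTag = /<img\b[^>]*>/gi;
  let tag: RegExpExecArray | null;
  while ((tag = imgTag.exec(page.html)) !== null) {
    const src = /\bsrc="([^"]*)"/i.exec(tag[0]);
    const alt = /\balt="([^"]*)"/i.exec(tag[0]);
    page.images.push({ src: src ? src[1] : "", alt: alt ? alt[1] : null });
  }
};

const headingProcessor: Processor = (page) => {
  const headingTag = /<h([1-6])\b/gi;
  let tag: RegExpExecArray | null;
  while ((tag = headingTag.exec(page.html)) !== null) {
    page.headings.push(Number(tag[1]));
  }
};

const missingAltAnalyser: Analyser = (page) => {
  for (const image of page.images) {
    if (image.alt === null) {
      page.issues.push("Image without alt text: " + image.src);
    }
  }
};

const headingOrderAnalyser: Analyser = (page) => {
  for (let i = 1; i < page.headings.length; i++) {
    const previous = page.headings[i - 1];
    const current = page.headings[i];
    if (current > previous + 1) {
      page.issues.push("Heading level jumps from h" + previous + " to h" + current);
    }
  }
};

// Run every processor first, then every analyser, mirroring the order described above.
function analysePage(
  url: string,
  html: string,
  processors: Processor[],
  analysers: Analyser[],
): PageData {
  const page: PageData = { url, html, images: [], headings: [], issues: [] };
  processors.forEach((run) => run(page));
  analysers.forEach((run) => run(page));
  return page;
}

const report = analysePage(
  "https://example.test/",
  '<h1>Title</h1><h3>Skipped a level</h3><img src="a.png"><img src="b.png" alt="Logo">',
  [imageProcessor, headingProcessor],
  [missingAltAnalyser, headingOrderAnalyser],
);
console.log(report.issues);
```

Keeping extraction and checking separate means a new analyser can reuse data that an existing processor already collected, without parsing the page again.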
This is still in early development as I'm working on the test suite and setting up some basic document processors.
You can run the test suite with npm run test.