# krake

- simple base library for crawl jobs, based on osmosis.
- it crawls a website recursively and emits events so you can take custom actions.
- it reports broken links (of static pages).
- it can be used to create a search index of a static website: see the example.
## why

making crawl jobs easier and more robust: krake handles the recursive crawling, link extraction and broken-link reporting, so you only implement the event handlers you need.
## how

### install

```bash
npm install --save krake
```
### use

```js
var Crawler = require('krake')

var crawler = new Crawler()

crawler
  .on('page', function (pageData) {
    // emitted for every crawled page
    console.log('page', pageData)
  })
  .on('link', function (linkData) {
    // emitted for every link found on a page
    console.log('link', linkData)
  })
  .on('error', function (err, pageData, linkData) {
    // emitted when crawling a page or link fails
    console.log('error', err.errorType, err.errorMessage)
  })
  .on('done', function (err) {
    // emitted when the crawl has finished; err carries the broken links, if any
    if (err) console.log('broken links', err.brokenLinks)
  })
  .crawl('http://localhost:8080/')
```

see also the example.
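
As mentioned above, krake can be used to build a search index of a static website. Below is a minimal sketch that collects the data of every `page` event into an array and writes it to a JSON file once the crawl is done. The `title` and `body` fields follow the default `pageDataSelectors` shown under options; the `url` field on `pageData` is an assumption.

```js
// minimal sketch: build a JSON search index from the 'page' events
var fs = require('fs')
var Crawler = require('krake')

var index = []
var crawler = new Crawler()

crawler
  .on('page', function (pageData) {
    index.push({
      url: pageData.url,     // assumed field on pageData
      title: pageData.title, // from the default pageDataSelectors
      body: pageData.body
    })
  })
  .on('done', function (err) {
    if (err) console.log('broken links', err.brokenLinks)
    fs.writeFileSync('search-index.json', JSON.stringify(index, null, 2))
  })
  .crawl('http://localhost:8080/')
```

the resulting `search-index.json` can then be fed into a client-side search library of your choice.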
## options

these are the default options:

```js
var crawler = new Crawler({
  // osmosis options: http://rchipka.github.io/node-osmosis/Osmosis.html
  osmosis: {
    ignore_http_errors: false,
    tries: 1
  },
  // krake options
  uri: 'http://localhost:8080',
  followExternalLinks: false,
  timeout: 500,
  pageDataSelectors: {
    title: 'head title',
    body: 'body'
  },
  linkSelectors: {
    url: '@href,@src'
  },
  linkTags: 'a,img,svg',
  linkIgnores: ':starts-with(javascript)'
})
```
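
a sketch of passing custom options: whether options you leave out fall back to the defaults above is an assumption, so when in doubt pass the full set you rely on. The extra `linkTags` entries assume that any tag carrying an `href` or `src` attribute can be checked with the default `linkSelectors`.

```js
var Crawler = require('krake')

// sketch: override a few of the defaults shown above
// (assumes unspecified options keep their default values)
var crawler = new Crawler({
  uri: 'http://localhost:8080',
  followExternalLinks: false,
  timeout: 1000,                     // allow slower pages
  linkTags: 'a,img,svg,link,script'  // also follow href/src of stylesheets and scripts
})

crawler
  .on('done', function (err) {
    if (err) console.log('broken links', err.brokenLinks)
  })
  .crawl('http://localhost:8080/')
```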
## author

Andi Neck | @andineck | [email protected] | intesso

## license

MIT