webcheck-crawl-once
v1.0.0
A plugin for webcheck to prevent multiple downloads of the same resource
How to install
npm install --save webcheck-crawl-once
How to use
var Webcheck = require('webcheck');
var CrawlOncePlugin = require('webcheck-crawl-once');
var plugin = CrawlOncePlugin({
    // filterUrl: /.html/,
    ignoreQuery: false
});
var webcheck = new Webcheck();
webcheck.addPlugin(plugin);
plugin.enable();
// now continue with your code...
Options
filterUrl: Filter for the URLs that should be crawled only once (default: all URLs).
ignoreQuery: Ignore the query string in the URL.
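To illustrate what ignoreQuery presumably implies, here is a minimal sketch that strips the query string before two URLs are compared. The helper name normalizeUrl is hypothetical and not part of the plugin, and the plugin's actual normalization may differ.

```javascript
// Hypothetical helper, not part of webcheck-crawl-once.
// Assumption: with ignoreQuery enabled, URLs that differ only in their
// query string count as the same resource.
function normalizeUrl(url, ignoreQuery) {
    if (!ignoreQuery) {
        return url;
    }
    var parsed = new URL(url); // global URL class (Node.js 10+)
    parsed.search = '';        // drop the "?a=1&b=2" part
    return parsed.toString();
}

var first = normalizeUrl('http://example.com/page?a=1', true);
var second = normalizeUrl('http://example.com/page?a=2', true);

// Both values are now identical, so the second would count as a repeat.
console.log(first === second);
```

With ignoreQuery set to false, the two URLs stay distinct and each would be treated as its own resource.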
Note for filters
Filters are regular expressions, but the plugin only calls their .test(str)
method to check a value. You can therefore supply your own, more complex
logic by passing an object that implements a test method, like this:
var opts = {
    filterSomething: {
        test: function (val) {
            // Return true or false, depending on whether val matches the filter.
            return true;
        }
    }
};
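As a more concrete variant of the object above, the following sketch builds a filter that matches only .html pages on a single host. The name htmlOnExampleCom and the host example.com are made up for illustration; only the filterUrl and ignoreQuery option names come from this README.

```javascript
// Hypothetical filter object: the only thing that matters is that it
// exposes a test(str) method returning a boolean, like a RegExp does.
var htmlOnExampleCom = {
    test: function (val) {
        var parsed;
        try {
            parsed = new URL(val); // global URL class (Node.js 10+)
        } catch (err) {
            return false; // not a parseable absolute URL
        }
        return parsed.hostname === 'example.com' &&
            /\.html$/.test(parsed.pathname);
    }
};

// The object is passed wherever a regular expression would be accepted.
var opts = {
    filterUrl: htmlOnExampleCom,
    ignoreQuery: false
};

console.log(opts.filterUrl.test('http://example.com/index.html?x=1'));
console.log(opts.filterUrl.test('http://other.org/index.html'));
```

Testing against the parsed pathname rather than the raw string keeps a query string from hiding the .html ending.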
Methods
reset(undefined | url): Reset a specific URL, or the complete ignore list.
ignore(url): Add a resource to the ignore list.
check(url): Check whether a resource is ignored.
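The semantics of these three methods can be pictured with a small in-memory model. This is only a sketch under the assumption that the ignore list behaves like a set of URLs; IgnoreList is a hypothetical name, and the code is not the plugin's source.

```javascript
// Hypothetical model of an ignore list, not the plugin's implementation.
function IgnoreList() {
    this.urls = Object.create(null); // url -> true
}

// Add a resource to the ignore list.
IgnoreList.prototype.ignore = function (url) {
    this.urls[url] = true;
};

// Check whether a resource is ignored.
IgnoreList.prototype.check = function (url) {
    return this.urls[url] === true;
};

// Reset one URL, or the complete list when called without an argument.
IgnoreList.prototype.reset = function (url) {
    if (url === undefined) {
        this.urls = Object.create(null);
    } else {
        delete this.urls[url];
    }
};

var list = new IgnoreList();
list.ignore('http://example.com/a.html');
list.ignore('http://example.com/b.html');
list.reset('http://example.com/a.html'); // forget a single URL
console.log(list.check('http://example.com/a.html'));
console.log(list.check('http://example.com/b.html'));
```

Resetting a single URL allows that resource to be crawled again, while reset() without an argument clears everything.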