spiders v2.0.0
A web crawler engine
SPIDERS
Crawl web pages efficiently
Features
- Persistence
- Optimization
- Lightweight
Installation
npm install spiders
or
yarn add spiders
Simple Usage Demo
ES6 syntax:
let Spider = require('spiders');
let spidy = new Spider();

// Crawl a page; the resolved value exposes jQuery-style functions
spidy.crawl('http://urltoscrape')
  .then($ => {
    let title = $('title').text(); // jQuery-style functions
    console.log(title);
  });
Options
Options can be passed as an argument during object initialization.
The following options are supported:
{
  persist: './fileToStore',
  toStore: (params, url) => {
  },
  fromStore: (obj, params, url) => {
  }
}
persist - Path of the file used for persistence. See the Persistence section below.
toStore - Returns an object that tells the spider how to store the given url and params.
fromStore - Specifies the match condition for the given object, url and params.
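A minimal sketch of how toStore and fromStore pair up, assuming the spider caches whatever toStore returns and later calls fromStore to find a matching cached entry (the URLs and shapes below are hypothetical illustrations, not part of the library):

```javascript
// Hypothetical toStore/fromStore pair: cache by URL only
const toStore = (url, params) => ({ url, params });
const fromStore = (obj, url, params) => obj.url === url;

// An entry the spider would have cached earlier
const stored = toStore('http://example.com/a', { lng: 'en' });

console.log(fromStore(stored, 'http://example.com/a', {})); // true  (same URL matches)
console.log(fromStore(stored, 'http://example.com/b', {})); // false (different URL)
```

Because fromStore here ignores params, two crawls of the same URL with different params would hit the same cached entry; include params in the comparison if they should be distinguished.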
Persistence
let spider = new Spider({ persist: './songs' });
spider.persist().then(() => {
  // Spider gets loaded with previously scraped details
  // Scrape fn here.
});
Methods
crawl(url, params)
Demo
let Spider = require('spiders');
let songSpidy = new Spider({
  persist: './persist/song',
  toStore: (url, params) => {
    return { url };
  },
  fromStore: (obj, url, params) => {
    return obj.url === url;
  }
});
songSpidy.persist().then(scrape);
function scrape() {
  songSpidy.crawl('pathtoSong', { lng: 'en' }).then($ => {
    let title = $('title').text();
  });
}
Note
For more details, read my blog post on Medium.