punch-scraper
v0.0.16
Config
- proxyManagerConfig - configuration for the punch proxy manager
- maxTry - how many times the scraper tries to fetch a link before reporting an error
- strategy - scraper strategy settings
  - name - strategy name; valid values: CASPERJS, HTTP, PHANTOMJS
  - proxy - proxy IP
  - lambda - AWS Lambda settings, for CASPERJS or PHANTOMJS only
    - aws_key - AWS key
    - aws_secret - AWS secret key
    - region - AWS region
    - lambda_name - AWS Lambda function name
- eval - code to evaluate, for CASPERJS or PHANTOMJS only
- services
  - include - array of proxy services to use
  - exclude - array of proxy services not to use
  - valid values: GIMMI_PROXY, HIDE_MY_ASS, IN_CLOCK, PROXY_SERVER_LIST, UK_PROXY, US_PROXY
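The constraints listed above (allowed strategy names, allowed proxy services, and options that apply to CASPERJS or PHANTOMJS only) can be checked before a config is handed to the scraper. The following is a minimal sketch, not part of punch-scraper: `validateConfig` is a hypothetical helper, and it assumes that `services`, `eval` and `maxTry` sit at the top level of the config object and that strategy names are matched case-insensitively.

```javascript
'use strict';

// Hypothetical helper, not part of punch-scraper.
// The value lists are copied from the Config section of this README.
const VALID_STRATEGIES = ['CASPERJS', 'HTTP', 'PHANTOMJS'];
const VALID_SERVICES = [
  'GIMMI_PROXY', 'HIDE_MY_ASS', 'IN_CLOCK',
  'PROXY_SERVER_LIST', 'UK_PROXY', 'US_PROXY'
];

// Returns an array of problem descriptions; an empty array means no problems found.
function validateConfig(config) {
  const errors = [];
  const strategy = config.strategy || {};
  // Assumption: names are compared case-insensitively.
  const name = String(strategy.name || '').toUpperCase();

  if (!VALID_STRATEGIES.includes(name)) {
    errors.push(`unknown strategy name: ${strategy.name}`);
  }
  // lambda and eval are documented as CASPERJS or PHANTOMJS only.
  if (name === 'HTTP' && strategy.lambda) {
    errors.push('lambda applies to CASPERJS or PHANTOMJS only');
  }
  if (name === 'HTTP' && config.eval) {
    errors.push('eval applies to CASPERJS or PHANTOMJS only');
  }

  // Assumption: services is a top-level key with include/exclude arrays.
  const services = config.services || {};
  for (const key of ['include', 'exclude']) {
    for (const service of services[key] || []) {
      if (!VALID_SERVICES.includes(service)) {
        errors.push(`unknown proxy service in ${key}: ${service}`);
      }
    }
  }

  if (config.maxTry !== undefined &&
      (!Number.isInteger(config.maxTry) || config.maxTry < 1)) {
    errors.push('maxTry must be a positive integer');
  }
  return errors;
}
```

Calling `validateConfig(config)` before `scrape` and refusing to continue when the returned array is non-empty keeps a misspelled strategy or service name from surfacing only as a failed scrape.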
Methods
- scrape(links, config) - scrape the given URLs; returns a promise that resolves with the results
- start() - start the scraper manager; returns a promise
- stop() - stop the scraper manager
Usage
'use strict';

const ScrapeManager = require('./scraper-manager/');
const scrapeManager = new ScrapeManager();

const config = {
  // Code evaluated by the headless browser (CASPERJS or PHANTOMJS only)
  eval: "response.write(page.content);response.close();",
  strategy: {
    name: 'phantomjs',
    // AWS Lambda function that runs the headless browser
    lambda: {
      aws_key: 'XXX-XXX-XXX',
      aws_secret: 'XXX-XXX-XXX',
      lambda_name: 'node-phantomjs-aws-lambda-server-development',
      region: 'us-west-2'
    }
  }
};

const links = [
  'http://www.google.com/',
  'http://www.google.com/'
];

scrapeManager.start()
  .then(() => scrapeManager.scrape(links, config))
  .then((results) => {
    console.log(results);
    console.log('done');
    scrapeManager.stop();
  })
  .catch((err) => {
    // Stop the manager on failure too, so the process can exit
    console.error(err);
    scrapeManager.stop();
  });
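The same start, scrape, stop sequence can also be written with async/await so that `stop` runs whether or not the scrape succeeds. This is a sketch under assumptions: `scrapeAll` is a hypothetical wrapper, and `FakeScrapeManager` is a stand-in that only mimics the three documented methods so the example runs on its own. The result shape it returns is invented for illustration and says nothing about what punch-scraper actually returns.

```javascript
'use strict';

// Stand-in for the real manager (assumption: start, scrape and stop may all be awaited).
// In real use this would be `new ScrapeManager()` as in the usage example.
class FakeScrapeManager {
  constructor() {
    this.running = false;
  }
  async start() {
    this.running = true;
  }
  async scrape(links, config) {
    if (!this.running) {
      throw new Error('manager not started');
    }
    // Invented result shape, for illustration only.
    return links.map((link) => ({ link, strategy: config.strategy.name }));
  }
  async stop() {
    this.running = false;
  }
}

// Hypothetical wrapper: starts the manager, scrapes, and always stops it afterwards.
async function scrapeAll(manager, links, config) {
  await manager.start();
  try {
    return await manager.scrape(links, config);
  } finally {
    // Runs on success and on failure alike.
    await manager.stop();
  }
}
```

A call such as `scrapeAll(new FakeScrapeManager(), ['http://www.google.com/'], { strategy: { name: 'http' } }).then(console.log)` exercises the flow; swapping in the real manager should only require that its methods return promises, as `start` and `scrape` do in the usage example.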