scramo
v0.2.0
Simple web scraping module for fun and profit
Scramo
Scramo is a simple, config-driven web scraper module for Node.js. Because it is config-driven, you provide a configuration that describes which elements to select and which data to extract from a given URL. Scraping is asynchronous, so `scrape` returns a promise; a Node-style callback is also supported.
Install
```shell
npm install scramo
```
Usage
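```javascript
var scramo = require('scramo');

// example target URL; replace with the page you want to scrape
var url = 'https://example.com/';

var config = {
  collection1: {
    selector: '.class-to-scrape',
    properties: [
      {
        name: 'utime',
        attr: 'data-utime' // read the element's data-utime attribute
      },
      {
        name: 'longText' // no attr: read the element's text
      }
    ]
  }
};

// using a promise
scramo
  .scrape(url, config)
  .then(function(result) {
    // result has the shape:
    // {
    //   collection1: [
    //     {
    //       utime: 'somevalue',
    //       longText: 'some long text'
    //     }
    //   ]
    // }
  })
  .fail(function(err) {
    console.error(err);
  });

// using a callback
scramo.scrape(url, config, function(err, result) {
  if (err) {
    return console.error(err);
  }
  // do something with result
});
```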
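Since `scrape` returns a promise, it should also work with `async`/`await` on Node.js 7.6 and later; `await` accepts any object with a `then` method, so this does not depend on which promise library Scramo uses internally. The sketch below is meant to run offline: `fakeScrape` and `scrapeUtimes` are hypothetical helpers, and `fakeScrape` only stands in for `scramo.scrape` with the same `(url, config)` promise signature used above.

```javascript
// Hypothetical stub standing in for scramo.scrape so the sketch runs offline;
// it resolves with data shaped like the result in the usage example.
function fakeScrape(url, config) {
  return Promise.resolve({
    collection1: [{ utime: 'somevalue', longText: 'some long text' }]
  });
}

// Hypothetical helper: await the scrape and collect every item's utime.
async function scrapeUtimes(scrape, url, config) {
  try {
    var result = await scrape(url, config);
    return result.collection1.map(function(item) {
      return item.utime;
    });
  } catch (err) {
    console.error(err);
    return [];
  }
}

// In real code, pass scramo.scrape.bind(scramo) instead of fakeScrape.
scrapeUtimes(fakeScrape, 'https://example.com/', {}).then(function(utimes) {
  console.log(utimes); // [ 'somevalue' ]
});
```

Passing the scrape function in as a parameter keeps the sketch testable; binding it to `scramo` is a precaution in case `scrape` relies on `this`.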
config
The config should consist of the following:
- CollectionName: the key under which the results are collected. In the example above, this is `collection1`. Its value is an object with:
  - selector: the selector that picks which elements to scrape (e.g. `.class-to-scrape`).
  - properties: a list of properties telling Scramo what to scrape from each selected element. Each property has:
    - name: the key used for the scraped value in the result.
    - attr: the element attribute to scrape. When it is omitted, the element's text is returned. The special value `html` scrapes the element's inner HTML.
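To make the property rules concrete, here is a minimal sketch of how one matched element could be turned into a result object. It is not Scramo's implementation: the element is modeled as a plain object with hypothetical `text`, `html` and `attrs` fields instead of a real HTML node, and `extractItem` is a hypothetical helper.

```javascript
// Hypothetical stand-in for a matched element; it only models the three
// things a property can read: text, inner HTML and attributes.
var element = {
  text: 'some long text',
  html: '<b>some long text</b>',
  attrs: { 'data-utime': 'somevalue' }
};

// Apply a list of property rules to one element (sketch, not Scramo's code).
function extractItem(el, properties) {
  var item = {};
  properties.forEach(function(prop) {
    if (prop.attr === undefined) {
      // no attr: use the element's text
      item[prop.name] = el.text;
    } else if (prop.attr === 'html') {
      // special value: the element's inner HTML
      item[prop.name] = el.html;
    } else {
      // any other value names an attribute to read
      item[prop.name] = el.attrs[prop.attr];
    }
  });
  return item;
}

var properties = [
  { name: 'utime', attr: 'data-utime' },
  { name: 'longText' },
  { name: 'markup', attr: 'html' }
];

var item = extractItem(element, properties);
// item is { utime: 'somevalue', longText: 'some long text', markup: '<b>some long text</b>' }
```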
License
MIT