great-reaper
v0.5.3
Scrape and collect data from URLs, HTML, JSON, and more.
Great reaper
great-reaper scrapes collections of data from web pages, using friendly jQuery-like (CSS) selectors to describe the scraping strategy.
Installation
npm install great-reaper
Examples
Get the top 3 Hacker News stories:
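    // Assumes the package's main export is the reap function.
    var reap = require('great-reaper');

    reap('https://news.ycombinator.com/')
      .group('table tr:nth-child(3) table tr')  // one group per story row
      .map({
        title: '.title a',
        url: '.title a@href'
      })
      .limit(3)
      .then(console.log);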
Results:
    [ { title: 'Engineer Anti-Patterns',
        url: 'http://dtrace.org/blogs/eschrock/2012/08/14/engineer-anti-patterns/' },
      { title: 'Hotel Wi-Fi blocking: Marriott is bad, and should feel bad',
        url: 'http://www.economist.com/blogs/gulliver/2015/01/hotel-wi-fi-blocking' },
      { title: 'Can\'t you just turn up the volume?',
        url: 'https://medium.com/@Amp/cant-you-just-turn-up-the-volume-4ecb7fc422a' } ]
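Judging by the results above, a field spec such as '.title a' reads the element text, while '.title a@href' reads the href attribute. This is inferred from the example output rather than documented. A minimal sketch of how such a spec could be split into its parts, where `parseFieldSpec` is a hypothetical helper and not part of great-reaper:

```javascript
// Hypothetical helper: splits a field spec such as '.title a@href'
// into a CSS selector and an optional attribute name.
// Assumption: the last '@' separates the selector from the attribute.
function parseFieldSpec(spec) {
  var at = spec.lastIndexOf('@');
  if (at === -1) {
    // No '@' present: the field would be read from the element text.
    return { selector: spec, attribute: null };
  }
  return {
    selector: spec.slice(0, at).trim(),
    attribute: spec.slice(at + 1).trim()
  };
}

console.log(parseFieldSpec('.title a'));
console.log(parseFieldSpec('.title a@href'));
```

The sketch does not handle selectors that themselves contain an '@' character, for example inside a quoted attribute value.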
Use transforms
Get hot questions from Stack Overflow together with their URLs.
The question links on the page are relative, so we prefix them with the site address to get absolute URLs.
    reap('http://stackoverflow.com/?tab=hot')
      .group('.question-summary')
      .map({
        question: '.question-hyperlink',
        url: '.question-hyperlink@href',
        views: '.views .mini-counts'
      })
      .transform({
        question: reap.t().lowercase(),
        // make the relative link absolute
        url: reap.t().prefix('http://stackoverflow.com'),
        // convert the view count from string to number
        views: reap.t().int()
      })
      .then(console.log);
Results:
    [ { question: 'program breaks from switch java',
        url: 'http://stackoverflow.com/questions/27840619/program-breaks-from-switch-java',
        views: 49 },
      { question: 'what is the z at the end of date',
        url: 'http://stackoverflow.com/questions/27840670/what-is-the-z-at-the-end-of-date',
        views: 28 },
      { question: 'convert array of objects into object',
        url: 'http://stackoverflow.com/questions/27840109/convert-array-of-objects-into-object',
        views: 18 }, .... ]
You can also chain transforms:
    ...
    .transform({
      summary: reap.t().lowercase().trim()
    })
    ...
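To illustrate how chaining of this kind can work in general, here is a small stand-alone sketch. The function `t` below is a hypothetical re-creation written for this example; it is not great-reaper's implementation and only covers three of the transforms.

```javascript
// Hypothetical chainable builder, for illustration only.
function t() {
  var steps = [];

  // The builder is itself a function: it runs every queued step in order.
  function run(value) {
    return steps.reduce(function (acc, step) {
      return step(acc);
    }, value);
  }

  // Queue a step and return the builder so calls can be chained.
  function add(step) {
    steps.push(step);
    return run;
  }

  run.lowercase = function () {
    return add(function (v) { return v.toLowerCase(); });
  };
  run.trim = function () {
    return add(function (v) { return v.trim(); });
  };
  run.prefix = function (s) {
    return add(function (v) { return s + v; });
  };

  return run;
}

var summary = t().lowercase().trim();
console.log(summary('  Hello World  ')); // 'hello world'
```

Each call to `t()` starts a fresh list of steps, so separate fields do not share state.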
Transforms
reap.transforms
contains the basic transform functions:
reap.t().trim()
Trims the field value
reap.t().prefix(string)
Prepends string to the field value
reap.t().postfix(string)
Appends string to the field value
reap.t().lowercase()
Lowercases the field value
reap.t().slice(from, to)
Slices the field value, same as String.prototype.slice
reap.t().split(separator)
Splits the string using the given separator and returns an array
reap.t().join(glue)
Joins the array using the given glue and returns a string
reap.t().int()
Typecasts the field value to int
reap.t().float()
Typecasts the field value to float
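For orientation, the listed transforms roughly correspond to the plain JavaScript operations below. This mapping is an assumption based on the descriptions above (for example, that int parses in base 10); the `equivalents` object is hypothetical and not part of great-reaper.

```javascript
// Plain-JavaScript stand-ins for the listed transforms (assumed behavior).
var equivalents = {
  trim: function (v) { return v.trim(); },
  prefix: function (v, s) { return s + v; },
  postfix: function (v, s) { return v + s; },
  lowercase: function (v) { return v.toLowerCase(); },
  slice: function (v, from, to) { return v.slice(from, to); },
  split: function (v, separator) { return v.split(separator); },
  join: function (v, glue) { return v.join(glue); },
  int: function (v) { return parseInt(v, 10); },
  float: function (v) { return parseFloat(v); }
};

console.log(equivalents.slice('49 views', 0, 2));      // '49'
console.log(equivalents.split('a, b, c', ', '));       // [ 'a', 'b', 'c' ]
console.log(equivalents.join(['a', 'b', 'c'], '-'));   // 'a-b-c'
console.log(equivalents.int('49'));                    // 49
console.log(equivalents.float('4.5'));                 // 4.5
```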
Custom transforms
You can use a custom transform function that receives the whole item:
    ...
    .transform(function (item) {
      // add a derived field to matching items
      if (item.type === 'good') {
        item.status = 'good item';
      }
      return item;
    })
    ...
Or apply a transform to a specific field:
    ...
    .transform({
      status: function (val) {
        return 'status: ' + val.toLowerCase();
      }
    })
    ...
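Conceptually, a per-field transform map replaces each named field with the result of its function. The sketch below shows that idea on a plain array; `applyFieldTransforms` is a hypothetical helper, and skipping fields that are missing from an item is an assumption, not documented great-reaper behavior.

```javascript
// Hypothetical helper: applies a { field: fn } map to every item.
function applyFieldTransforms(items, transforms) {
  return items.map(function (item) {
    var out = Object.assign({}, item); // copy so the input is not mutated
    Object.keys(transforms).forEach(function (field) {
      if (field in out) {
        out[field] = transforms[field](out[field]);
      }
    });
    return out;
  });
}

var items = [{ type: 'good', status: 'READY' }];
var result = applyFieldTransforms(items, {
  status: function (val) {
    return 'status: ' + val.toLowerCase();
  }
});

console.log(result); // [ { type: 'good', status: 'status: ready' } ]
```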
Filter results
Filters let you remove unwanted items from the collection. An item is kept when the function returns true:
    ...
    .filter(function (item) {
      return item.type === 'good';
    })
    ...
Property-specific filters receive only the value of the named property:
    ...
    .filter({
      type: function (type) {
        return type === 'good';
      }
    })
    ...
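The property-specific form can be pictured as a set of predicates, one per field. The sketch below applies such a map to a plain array; `applyFieldFilters` is a hypothetical helper, and requiring every predicate to pass (a logical AND) is an assumption about how multiple fields would combine.

```javascript
// Hypothetical helper: keeps items for which every field predicate returns true.
function applyFieldFilters(items, filters) {
  return items.filter(function (item) {
    return Object.keys(filters).every(function (field) {
      return filters[field](item[field]);
    });
  });
}

var collection = [
  { name: 'a', type: 'good' },
  { name: 'b', type: 'bad' },
  { name: 'c', type: 'good' }
];

var kept = applyFieldFilters(collection, {
  type: function (type) {
    return type === 'good';
  }
});

console.log(kept); // [ { name: 'a', type: 'good' }, { name: 'c', type: 'good' } ]
```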
LICENSE
MIT