bluemango-scraper
v1.0.0
Scraper
This library is a helper for writing a script that scrapes values from a page.
var scraper = new Scraper({
  title: {
    type: 'text',
    selector: 'h1.the-title' // this is a jQuery selector that will look for the first h1 with the class the-title
  }
})
return scraper.getResults() // {title:'hello world'}
The config
The config is an object in which every key stands for a field that will be returned in the results. The value of a config item can be a string, an object, or a function.
whitelistedDomain
Only allow the library to work when the value matches the current domain. Value should be a regular expression of type string.
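As a rough illustration of what such a check could look like, the sketch below tests a hostname against a regular expression supplied as a string. The helper name `isDomainAllowed` is hypothetical and not part of the library. It assumes that the string is used directly as the pattern source and that a missing value allows every domain.

```javascript
// Hypothetical helper, not the library's API: checks a hostname against
// a whitelist pattern that is supplied as a string.
function isDomainAllowed(whitelistedDomain, hostname) {
  // Assumption: no whitelist configured means every domain is allowed.
  if (!whitelistedDomain) return true
  return new RegExp(whitelistedDomain).test(hostname)
}

// Backslashes must be doubled inside a string literal.
var pattern = '(^|\\.)example\\.com$'

isDomainAllowed(pattern, 'shop.example.com') // true
isDomainAllowed(pattern, 'example.org') // false
```

In a browser the hostname would typically come from `window.location.hostname`.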
using an object
When a field is configured with an object, you select how its value is extracted with the type property. The available types are listed below.
text
Use text to extract text from a page
var scraper = new Scraper({
  title: {
    type: 'text', // default value
    selector: '.title span',
    test: '/[0-9]*/g' // this will test if the result only contains digits
  }
})
url
var scraper = new Scraper({
  clickUrl: {
    type: 'url', // will return the current page URL
    prefix: 'http://yourredirect.com/url=', // (optional) use when you want to prefix your url
    query: {myparam: ''} // (optional) returns the url with only myparam appended to the url
  }
})
image
var scraper = new Scraper({
  imageUrl: {
    type: 'image', // will return the src of an image
    selector: 'img#myImage'
  }
})
regex
var scraper = new Scraper({
  title: {
    type: 'regex',
    selector: '.title span',
    test: '/€([0-9]*)/g' // will return the first regex group
  }
})
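The regex type can be pictured as two steps: turn the `'/pattern/flags'` string into a regular expression, then return its first capture group. The sketch below illustrates that idea only. The helpers `parseRegexString` and `firstGroup` are hypothetical names, and how the library handles these steps internally is an assumption.

```javascript
// Hypothetical helper: converts a string such as '/€([0-9]*)/g'
// into a RegExp object.
function parseRegexString(str) {
  var parts = /^\/(.*)\/([a-z]*)$/.exec(str)
  // Assumption: a string without slashes is treated as a bare pattern.
  if (!parts) return new RegExp(str)
  return new RegExp(parts[1], parts[2])
}

// Hypothetical helper: returns the first capture group, or null
// when the text does not match or the pattern has no group.
function firstGroup(text, test) {
  var match = parseRegexString(test).exec(text)
  return match && match[1] !== undefined ? match[1] : null
}

firstGroup('Now only €25 per month', '/€([0-9]*)/g') // '25'
firstGroup('Price on request', '/€([0-9]*)/g') // null
```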
template
var scraper = new Scraper({
  title: {
    type: 'template',
    template: 'hello {{name}}' // {{name}} is replaced with the value of the name field
  },
  name: {
    type: 'text',
    selector: '.profile .name'
  }
})
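To make the placeholder substitution concrete, here is a minimal sketch of how `{{name}}` could be filled in from other field values. The function `renderTemplate` is a hypothetical name, and replacing unknown placeholders with an empty string is an assumption rather than documented behaviour.

```javascript
// Hypothetical helper: replaces every {{key}} placeholder with the
// matching value from the fields object.
function renderTemplate(template, fields) {
  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, function (placeholder, key) {
    // Assumption: unknown placeholders become an empty string.
    return Object.prototype.hasOwnProperty.call(fields, key) ? String(fields[key]) : ''
  })
}

renderTemplate('hello {{name}}', { name: 'Alice' }) // 'hello Alice'
```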
dictionary
var scraper = new Scraper({
  custom1: {
    type: 'dictionary',
    selector: '#yourdealCompareBlock > div > div > img',
    dictionary: {
      'VODAFONE': '/vodafone/g', // result is VODAFONE when vodafone is found in the selector text
      'TELFORT': '/telfort/g', // result is TELFORT when telfort is found in the selector text
      'T-MOBILE': '/tmobile/g', // result is T-MOBILE when tmobile is found in the selector text
      'TELE2': '/tele2/g',
      'BEN': '/ben/g',
      'KPN': '/kpn/g',
      'HI': '/hi/g'
    }
  }
})
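The dictionary lookup can be sketched as a loop that tests each pattern against the scraped text and returns the key of the first match. This is an illustration under assumptions: `lookupDictionary` is a hypothetical name, and both the first-match-wins order and the `null` fallback are guesses about behaviour the README does not specify.

```javascript
// Hypothetical helper: returns the first dictionary key whose pattern
// matches the text, or null when nothing matches.
function lookupDictionary(text, dictionary) {
  var keys = Object.keys(dictionary)
  for (var i = 0; i < keys.length; i++) {
    var value = dictionary[keys[i]]
    // Patterns are written as '/pattern/flags' strings.
    var parts = /^\/(.*)\/([a-z]*)$/.exec(value)
    var re = parts ? new RegExp(parts[1], parts[2]) : new RegExp(value)
    if (re.test(text)) return keys[i]
  }
  return null
}

var providers = {
  'VODAFONE': '/vodafone/g',
  'KPN': '/kpn/g',
  'HI': '/hi/g'
}

lookupDictionary('logo-vodafone-small', providers) // 'VODAFONE'
lookupDictionary('logo-unknown', providers) // null
```

Under this first-match assumption, short patterns such as `/hi/g` also match inside longer words (for example "this"), so more specific entries are best listed first.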
using a string
A string is shorthand for an object with the type text and that string as its selector. The two scrapers below are equivalent.
var scraper1 = new Scraper({
  title: 'h1.the-title'
})
var scraper2 = new Scraper({
  title: {
    type: 'text',
    selector: 'h1.the-title' // this is a jQuery selector that will look for the first h1 with the class the-title
  }
})
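One way to picture the three accepted value kinds (string, object, function) is a small normalisation step that brings each one into a common shape. The function `normalizeField` below is a hypothetical sketch, not the library's code. It assumes that text is the default type, as the text example suggests.

```javascript
// Hypothetical helper: brings a config value into a common shape.
function normalizeField(value) {
  // A string is shorthand for a text field with that selector.
  if (typeof value === 'string') return { type: 'text', selector: value }
  // A function is kept as-is and called later to compute the value.
  if (typeof value === 'function') return value
  // An object keeps its own settings; type falls back to 'text'.
  return Object.assign({ type: 'text' }, value)
}

normalizeField('h1.the-title') // { type: 'text', selector: 'h1.the-title' }
normalizeField({ selector: '.title span' }) // { type: 'text', selector: '.title span' }
normalizeField({ type: 'image', selector: 'img#myImage' }) // unchanged
```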
using a function
var scraper = new Scraper({
  number1: {type: 'template', template: '1'},
  number2: {type: 'template', template: '3'},
  sum: function (scraper) {
    var n1 = Number(scraper.getField('number1'))
    var n2 = Number(scraper.getField('number2'))
    return n1 + n2 // = 4
  }
})
Default fields
By default the scraper only returns the following fields: ['id', 'available', 'title', 'imageUrl', 'clickUrl', 'category', 'basket', 'description', 'priceNormal', 'priceDiscount', 'logoUrl', 'stickerText', 'custom1', 'custom2', 'custom3', 'custom4']
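The effect of this default list can be illustrated with a small filter that keeps only the listed keys. The function `pickDefaultFields` is a hypothetical name for this sketch. Whether and how the library lets you change the list is not covered here.

```javascript
var DEFAULT_FIELDS = ['id', 'available', 'title', 'imageUrl', 'clickUrl',
  'category', 'basket', 'description', 'priceNormal', 'priceDiscount',
  'logoUrl', 'stickerText', 'custom1', 'custom2', 'custom3', 'custom4']

// Hypothetical helper: keeps only the keys that appear in DEFAULT_FIELDS.
function pickDefaultFields(results) {
  var picked = {}
  DEFAULT_FIELDS.forEach(function (field) {
    if (Object.prototype.hasOwnProperty.call(results, field)) {
      picked[field] = results[field]
    }
  })
  return picked
}

pickDefaultFields({ title: 'hello world', number1: '1' }) // { title: 'hello world' }
```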