hapi-goldwasher
v1.0.4
A plugin for hapi to run goldwasher as a scraping API on the web. It is essentially a scraper proxy that returns the scraped information in the selected format, defaulting to JSON.
Installation
npm install hapi-goldwasher
If you aren't already running a hapi server, you also need to install hapi to run the example:
npm install hapi
Options
When registering the plugin with hapi, you have several options, none of them required:
path - the endpoint you mount the plugin on. Defaults to /goldwasher.
maxRedirects - the maximum number of redirects the scraper will accept before giving up. Defaults to 5.
cors - a CORS object. Defaults to false. See hapi docs for more information.
raw - enable raw output mode. This will enable output=raw that will return the raw, scraped result, usually HTML.
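As a rough illustration of how the documented defaults combine with the options you pass, here is a minimal sketch. The resolveOptions helper is hypothetical and is not part of the plugin; the default for raw is not stated above, so false is an assumption.

```javascript
// Defaults as documented above. The README does not state a default for
// `raw`; `false` is an assumption made for this sketch.
var DEFAULTS = {
  path: '/goldwasher',
  maxRedirects: 5,
  cors: false,
  raw: false
};

// Hypothetical helper (not part of hapi-goldwasher): overlay the options
// passed at registration on top of the defaults, without mutating either.
function resolveOptions(options) {
  return Object.assign({}, DEFAULTS, options || {});
}

// Only the options you set change; everything else keeps its default.
var resolved = resolveOptions({ raw: true, maxRedirects: 2 });
console.log(resolved);
```

This only mirrors the documented behaviour from the outside; how the plugin merges options internally may differ.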
Parameters
url - URL to scrape. Required.
selector - cheerio (jQuery) selector, a selection of target tags. Defaults to the default of goldwasher, usually 'h1, h2, h3, h4, h5, h6, p'.
search - only pick results containing these terms. Not sensitive to case or special characters.
limit - limit number of results.
output - output format (json, xml, atom, rss or, if enabled, raw).
filterTexts - stop texts that should be excluded.
filterKeywords - stop words that should be excluded as keywords.
filterLocale - stop words from external JSON file (see documentation on goldwasher).
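The parameters above can be assembled into a request URL as in the sketch below. It assumes the parameters are sent as query-string parameters (as output=raw suggests) and that the plugin is mounted at the default path on a local server at port 7979; buildScrapeUrl is a hypothetical helper, not something the plugin provides.

```javascript
// Hypothetical helper: build a request URL for the scraping endpoint.
// Assumes parameters are passed in the query string.
function buildScrapeUrl(base, params) {
  var url = new URL(base);
  Object.keys(params).forEach(function (name) {
    var value = params[name];
    // Skip parameters that were left unset.
    if (value !== undefined && value !== null) {
      url.searchParams.set(name, String(value));
    }
  });
  return url.toString();
}

// The host, port and target site are assumptions for illustration only.
var requestUrl = buildScrapeUrl('http://localhost:7979/goldwasher', {
  url: 'https://example.com',
  selector: 'h1, h2',
  limit: 5,
  output: 'xml'
});
console.log(requestUrl);
```

Using URL and searchParams takes care of percent-encoding, which matters because the url and selector values contain characters such as ':', '/' and ','.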
Example
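// Note: this example uses the callback-based hapi API (server.connection),
// which was removed in hapi v17.
var Hapi = require('hapi');
// When running from inside the hapi-goldwasher repository itself,
// use require('./index') instead.
var HapiGoldwasher = require('hapi-goldwasher');

var server = new Hapi.Server();
server.connection({ port: 7979 });

// Mount the plugin on /goldwasher and allow cross-origin requests from anywhere.
server.register({
  register: HapiGoldwasher,
  options: {
    path: '/goldwasher',
    cors: {
      origin: ['*']
    }
  }
}, function(err) {
  // Fail fast if the plugin could not be registered.
  if (err) {
    throw err;
  }

  server.start(function() {
    console.log('Server running at: ' + server.info.uri);
  });
});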
Go to the server URI and you will be presented with a JSON response containing documentation. I recommend using something like the Chrome JSON Formatter for readability.
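If you would rather read the JSON response in a terminal than in a browser extension, a small script can re-indent it. This is a minimal sketch using only Node's built-in http module; prettyJson and fetchPretty are hypothetical names, and the URL in the usage comment assumes the example server above is running with the default path.

```javascript
var http = require('http');

// Re-indent a JSON string with two spaces so nested fields are easy to scan.
function prettyJson(body) {
  return JSON.stringify(JSON.parse(body), null, 2);
}

// Fetch a URL over HTTP and hand the pretty-printed JSON body to the callback.
function fetchPretty(url, callback) {
  http.get(url, function (res) {
    var chunks = [];
    res.setEncoding('utf8');
    res.on('data', function (chunk) { chunks.push(chunk); });
    res.on('end', function () {
      try {
        callback(null, prettyJson(chunks.join('')));
      } catch (err) {
        // The body was not valid JSON (for example, output=raw HTML).
        callback(err);
      }
    });
  }).on('error', callback);
}

// Usage (assumes the example server is running on port 7979):
// fetchPretty('http://localhost:7979/goldwasher', function (err, text) {
//   if (err) { throw err; }
//   console.log(text);
// });
```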