liqen-scrapper
v2.1.0
tool to collect news about environmental issues
Liqen Scrapper 2
Find news articles and extract the relevant information from them.
This project uses:
- Google Custom Search to search media websites.
- Scraping techniques to extract the content of an article.
Usage
This package includes two functions that can be used together or separately:
- googleSearch(term, options) => Promise<Object> to perform a Google search
- downloadArticle(uri) => Promise<Object> to parse an article
Examples
Using only googleSearch
const { googleSearch } = require('liqen-scrapper')
const options = {
  apiKey: 'MY_GOOGLE_API_KEY',
  cx: 'MY_CX'
}
// Print the title and link of every search result
googleSearch('climate change', options)
  .then(result => result.items)
  .then(items => items.forEach(item => {
    console.log(item.title)
    console.log(item.link)
  }))
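For context, the apiKey and cx options presumably correspond to the key and cx query parameters of Google's Custom Search JSON API, where cx is the search engine ID. The sketch below only illustrates how such a request URL is formed. buildSearchUrl is a hypothetical helper and not part of liqen-scrapper, and the package may build its requests differently.

```javascript
// Hypothetical helper, not part of liqen-scrapper: builds a Custom Search
// JSON API request URL from a search term and the options shown above.
function buildSearchUrl (term, options) {
  const url = new URL('https://www.googleapis.com/customsearch/v1')
  url.searchParams.set('key', options.apiKey) // API key of your Google project
  url.searchParams.set('cx', options.cx) // ID of the custom search engine
  url.searchParams.set('q', term) // the search term, URL-encoded automatically
  return url.toString()
}

console.log(buildSearchUrl('climate change', {
  apiKey: 'MY_GOOGLE_API_KEY',
  cx: 'MY_CX'
}))
```

Using URL and searchParams rather than string concatenation keeps terms with spaces or special characters correctly encoded.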
Using only downloadArticle
const { downloadArticle } = require('liqen-scrapper')
// Download one article, then print its title and the start of its HTML body
downloadArticle('http://cultura.elpais.com/cultura/2017/02/08/actualidad/1486573775_868895.html')
  .then(article => {
    console.log(article.metadata.title)
    console.log(article.body.html.slice(0, 80))
  })
Using both functions together
const { googleSearch, downloadArticle } = require('liqen-scrapper')
const options = {
  apiKey: 'MY_GOOGLE_API_KEY',
  cx: 'MY_CX'
}
// Search, download every result, then print the start of each HTML body.
// Promise.all must wrap the array of download promises, not the search promise.
googleSearch('climate change', options)
  .then(result => result.items.map(item => item.link))
  .then(links => Promise.all(links.map(link => downloadArticle(link))))
  .then(articles => articles.map(article => article.body.html))
  .then(bodies => {
    bodies.forEach(body => {
      console.log(body.slice(0, 80))
    })
  })
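Promise.all rejects as soon as any single download fails, so one unreachable article discards the whole batch. A possible alternative, sketched below, is Promise.allSettled (Node.js 12.9 or later), which waits for every download and reports successes and failures separately. downloadAll is a hypothetical helper and not part of liqen-scrapper. It takes the download function as a parameter so that it works with downloadArticle or any other function returning a Promise.

```javascript
// Hypothetical helper, not part of liqen-scrapper. `download` is any function
// that takes a URI and returns a Promise, for example downloadArticle.
function downloadAll (links, download) {
  // Wrapping the call in a promise chain turns a synchronous throw from
  // `download` into a rejection instead of aborting the whole batch.
  const attempts = links.map(link => Promise.resolve().then(() => download(link)))
  return Promise.allSettled(attempts).then(results => {
    const articles = []
    const failures = []
    results.forEach((result, i) => {
      if (result.status === 'fulfilled') {
        articles.push(result.value)
      } else {
        failures.push({ link: links[i], reason: result.reason })
      }
    })
    return { articles, failures }
  })
}
```

With the package loaded, it could be chained after a search as googleSearch(term, options).then(result => downloadAll(result.items.map(item => item.link), downloadArticle)), after which articles holds the parsed results and failures lists the links that could not be downloaded.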
Docs
See the /docs directory for more documentation.