@elioway/spider
v1.2.5
Scrape Schema.org objects into mongoose schema files the elioWay.
spider
Get your schemon! Tim Bushell
spider is required by bones, but it can also be run as the boilerplate for a web-spidering project with scheming intentions.
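The core idea, turning a Schema.org type's properties into a mongoose schema definition, can be sketched in plain JavaScript. This is a hypothetical illustration, not spider's own code. The `schemaOrgToMongoose` function, its input shape, the `thingsAsStrings` flag and the range-to-type table are all assumptions. Each property is simplified to a single range, and type names are kept as strings so the sketch runs without mongoose installed. The flag mimics the idea behind forcing a `String` type instead of a 1 to 1 relationship to another Thing.

```javascript
// Hypothetical sketch, not spider's implementation.
// Schema.org primitive ranges mapped to mongoose type names (as strings).
const PRIMITIVE_RANGES = {
  Text: 'String',
  URL: 'String',
  Number: 'Number',
  Integer: 'Number',
  Boolean: 'Boolean',
  Date: 'Date',
  DateTime: 'Date',
};

// properties: { propertyName: 'SchemaOrgRange' } (assumed input shape).
// thingsAsStrings (hypothetical flag): when true, a property whose range is
// another Thing becomes a plain String field instead of an ObjectId reference.
function schemaOrgToMongoose(properties, thingsAsStrings) {
  const definition = {};
  for (const [name, range] of Object.entries(properties)) {
    if (Object.prototype.hasOwnProperty.call(PRIMITIVE_RANGES, range)) {
      definition[name] = { type: PRIMITIVE_RANGES[range] };
    } else if (thingsAsStrings) {
      definition[name] = { type: 'String' };
    } else {
      // A 1 to 1 relationship to another Thing's collection.
      definition[name] = { type: 'ObjectId', ref: range };
    }
  }
  return definition;
}

// A few properties of Schema.org's Person type, one range each.
const person = { name: 'Text', birthDate: 'Date', address: 'PostalAddress' };
console.log(schemaOrgToMongoose(person, false).address);
// { type: 'ObjectId', ref: 'PostalAddress' }
console.log(schemaOrgToMongoose(person, true).address);
// { type: 'String' }
```

In a real schema file, the string `'ObjectId'` would be replaced by mongoose's `Schema.Types.ObjectId`.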
Install
npm install @elioway/spider --save
Usage
// yourapp.js
const Spider = require('@elioway/spider');

const today = new Date();
// Create schemon the spider. The arguments are positional:
// version, depth, thingsSelector, useObjectFields.
const schemon = new Spider(
  // version: do change. getMonth() is zero-based, so add 1.
  today.getFullYear() + '.' + (today.getMonth() + 1) + '.' + today.getDate(),
  2, // depth: the deeper you go, the more objects you get. Go crazy.
  '#thing_tree', // thingsSelector: don't change, though there is a bigger tree on the page.
  true // useObjectFields: instead of 1 to 1 relationships to other Things, force String type.
);
// Let schemon do spider things.
schemon.spider(
  // Wrap what schemon scraped.
  data => Spider.optimize(data)
);
node yourapp
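Why a deeper `depth` yields more objects can be sketched as a breadth-first crawl that stops after a fixed number of hops. This assumes `depth` counts hops from the starting type to the types its properties point at; the `LINKS` graph and the `crawl` function are hypothetical and only loosely modeled on Schema.org, whereas the real spider scrapes pages.

```javascript
// Hypothetical link graph: each type lists the types its properties reference.
const LINKS = {
  Thing: ['CreativeWork', 'Person'],
  CreativeWork: ['Person', 'Organization'],
  Person: ['PostalAddress', 'Organization'],
  Organization: ['PostalAddress'],
  PostalAddress: [],
};

// Breadth-first crawl: expand one level of links per pass, `depth` passes total.
function crawl(start, depth) {
  const seen = new Set([start]);
  let frontier = [start];
  for (let level = 0; level < depth; level++) {
    const next = [];
    for (const type of frontier) {
      for (const linked of LINKS[type] || []) {
        if (!seen.has(linked)) {
          seen.add(linked); // Each type is collected once, however often it is linked.
          next.push(linked);
        }
      }
    }
    frontier = next;
  }
  return [...seen];
}

console.log(crawl('Thing', 1)); // [ 'Thing', 'CreativeWork', 'Person' ]
console.log(crawl('Thing', 2).length); // 5: Organization and PostalAddress join at depth 2
```

The `seen` set keeps cycles and shared targets (here `Organization` and `PostalAddress`) from being fetched twice, so the work grows with the number of distinct types reached rather than the number of links.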
Seeing is believing
git clone https://gitlab.com/elioschemers/spider/
cd spider
node test_spider
Credits
- http://sinonjs.org/
- https://github.com/underscopeio/sinon-mongoose
- https://cheerio.js.org/
- https://stackoverflow.com/questions/34368419/web-scraper-iterating-over-pages-with-rx-js
License
MIT Tim Bushell