qd-scraper
v1.0.4
Quick and dirty way to scrape specific html tags from a website for text data.
https://github.com/benlazzero/Quick-Dirty-Scrape
For scraping website text data quick and dirty.
Probably not the most accurate scraper/parser one-liner; however, it's very simple to use and most often returns usable results.
QDScraper has zero third-party dependencies; it relies on the Node.js https library. This package will most likely be dead on arrival. However, I will merge all pull requests that pass tests, simplify the code, or improve accuracy.
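To give a rough idea of what a dependency-free scraper can look like, here is a minimal sketch using only the built-in https module and a regular expression. This is not taken from the package source: the names fetchHtml and extractTagText are hypothetical, and the regex approach is an assumption that will mishandle nested tags of the same name and does not follow redirects.

```javascript
// Hypothetical sketch only; NOT qd-scraper's actual implementation.
const https = require('node:https');

// Fetch a page body as a string using only the built-in https module.
// Assumption: no redirect handling and no status-code checks.
function fetchHtml(url) {
  return new Promise((resolve, reject) => {
    https
      .get(url, (res) => {
        let body = '';
        res.setEncoding('utf8');
        res.on('data', (chunk) => { body += chunk; });
        res.on('end', () => resolve(body));
      })
      .on('error', reject);
  });
}

// Collect the text between each <tag ...> and </tag>, stripping inner markup.
// Defaults to 'li', mirroring the default described in this README.
function extractTagText(html, tag = 'li') {
  const pattern = new RegExp(`<${tag}\\b[^>]*>([\\s\\S]*?)</${tag}>`, 'gi');
  const results = [];
  for (const match of html.matchAll(pattern)) {
    const text = match[1].replace(/<[^>]*>/g, '').trim();
    if (text) results.push(text);
  }
  return results;
}
```

The two pieces would be combined as fetchHtml(url).then((html) => extractTagText(html, 'div')). A real HTML parser would be more accurate, at the cost of the zero-dependency goal.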
NPM Installation
WARNING: Requires Node.js >= 16.17.0
cd your-root-dir
npm install qd-scraper
Example
const qdScraper = require("qd-scraper");

// qdScraper returns a promise, so await it inside an async function.
const scrapeSite = async () => {
  const arrayOfData = await qdScraper('https://website.com/', 'div');
  console.log(arrayOfData); // ['text from first div', 'text from second div', ...]
};

scrapeSite();
Behavior
NOTE: Returns a promise
On success, the promise resolves to an array of strings containing the text between the opening and closing tags of each element matching the specified tag.
On failure, it resolves to an empty array. This can happen when the URL is bad or the tag is not found.
If no tag is passed as a parameter, the scraper defaults to the <li> tag.
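Because a bad URL and a missing tag both produce an empty array, callers may want to treat an empty result as an error. The wrapper below is a small sketch of that idea; scrapeOrThrow is a hypothetical helper, not part of the package, and it takes the scraper function as an argument so it works with anything that has the same (url, tag) call shape.

```javascript
// Hypothetical helper, not part of qd-scraper.
// `scrape` is any function shaped like (url, tag) => Promise<string[]>.
async function scrapeOrThrow(scrape, url, tag) {
  // Omit the tag argument entirely so the scraper can apply its own default.
  const results = tag === undefined ? await scrape(url) : await scrape(url, tag);

  // An empty array means "bad URL or tag not found"; surface it as an error.
  if (!Array.isArray(results) || results.length === 0) {
    throw new Error(`No text found for <${tag ?? 'li'}> at ${url}`);
  }
  return results;
}

// Usage (assumes qd-scraper is installed):
//   const qdScraper = require('qd-scraper');
//   scrapeOrThrow(qdScraper, 'https://website.com/', 'div')
//     .then((texts) => console.log(texts))
//     .catch((err) => console.error(err.message));
```

The error message cannot say which of the two failure causes occurred, since the scraper itself does not distinguish them.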
Build
git clone https://github.com/benlazzero/Quick-Dirty-Scrape
cd Quick-Dirty-Scrape
npm install
Running tests with Jest
npm test
Running the example file (uses nodemon to restart on saves)
npm start