scraptor
v0.1.0
My way to use Chrome headless and scrape.
scraptor
!!This library is a work in progress. The API will most likely change.!!
This library is my attempt to wrap puppeteer and cheerio to create a library
that allows me to easily construct web scrapers. A DSL implements common
patterns, while still allowing you to break out into the underlying libraries
if necessary.
Synopsis
import {browse, once, fillForm, click, html, usingHeadlessBrowser} from "scraptor";
import {flowP} from "combinators-p";

// Predicate, given as a string of JavaScript, that holds once the spinner is hidden.
const spinnerDone = "document.querySelector('.spinner').classList.contains('hide')";
const waitForSpinner = once(spinnerDone);

// Each step runs in order, starting from the URL.
const search = (url, term) =>
  flowP([
    browse,
    waitForSpinner,
    fillForm("#search"),
    click("button.search"),
    waitForSpinner,
    html("body"),
  ], url);

usingHeadlessBrowser(search("https://example.org", "Keith Johnstone"))
  .then(console.log); // Prints the inner HTML of <body>
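The synopsis threads a URL through a list of steps with `flowP`. The underlying pattern, left-to-right composition of promise-returning functions, can be sketched in plain JavaScript roughly as follows. The names `flowSketch`, `visit` and `clickOn` are hypothetical and exist only for this illustration; this is not the implementation used by combinators-p or scraptor, and no browser is involved.

```javascript
// Hypothetical helper: run promise-returning steps left to right,
// feeding each step the resolved result of the previous one.
const flowSketch = (steps, initial) =>
  steps.reduce((acc, step) => acc.then(step), Promise.resolve(initial));

// Stand-in steps that only record what they would do.
const visit = async (url) => ({url, log: [`visit ${url}`]});
const clickOn = (selector) => async (page) => ({
  ...page,
  log: [...page.log, `click ${selector}`],
});

// Thread a URL through both steps and print the recorded log.
flowSketch([visit, clickOn("button.search")], "https://example.org")
  .then((page) => console.log(page.log.join("\n")));
```

Because every step takes the previous result and returns a promise, a step can be replaced by any custom function with the same shape, which is presumably how a scrape can drop down to the underlying libraries.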
API
usingBrowser
: Execute a scrape in a browser session.
usingHeadlessBrowser
: Execute a scrape in a headless browser session.
browse
: Visit a URL and load the page.
html
: Select the inner HTML of a DOM node.
fillForm
: Input a string into a form field.
click
: Click on a DOM node.
once
: Continue the browser session once a predicate is fulfilled.
onceLoaded
: Continue the browser session once the page has loaded.
onceMs
: Continue the browser session once a set time has passed.
doUntil
: Run an action once a predicate is fulfilled.
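Several of the functions above wait for a predicate before continuing. The general idea of polling a predicate can be sketched as below. This is an assumption-laden illustration, not scraptor's implementation: `pollUntil` and its `intervalMs` and `timeoutMs` options are hypothetical, and the predicate here is a local function rather than a string evaluated in the page as in the synopsis.

```javascript
// Hypothetical helper: resolve once `predicate` returns a truthy value,
// re-checking every `intervalMs` and rejecting after `timeoutMs`.
const pollUntil = (predicate, {intervalMs = 50, timeoutMs = 5000} = {}) =>
  new Promise((resolve, reject) => {
    const started = Date.now();
    const check = async () => {
      try {
        if (await predicate()) return resolve(true);
        if (Date.now() - started >= timeoutMs) {
          return reject(new Error("pollUntil: timed out"));
        }
        setTimeout(check, intervalMs);
      } catch (err) {
        // A throwing predicate fails the wait instead of being retried.
        reject(err);
      }
    };
    check();
  });

// Usage: the predicate becomes true on the third check.
let checks = 0;
pollUntil(() => ++checks >= 3, {intervalMs: 10})
  .then(() => console.log(`ready after ${checks} checks`));
```

A timeout is included in the sketch so that a predicate that never holds fails the wait rather than stalling the session indefinitely.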