@keso/scraping
v1.0.5
Helpers to scrape a web page and notify a Discord channel of the result.
scraping utilities
Installation
npm i @keso/scraping
Usage
scrape
scrape is a one-off operation: each call scrapes the page once. To repeat it, schedule the script with a cron job or similar.
Example, scrape.js:
import { scrape } from "@keso/scraping";

scrape("https://petter.envall.se/", parser, analyzer);

// Document parser, returns scraped data
function parser() {
  return { pageTitle: document.title };
}

// Analyze and act on the data that was parsed
async function analyzer(data) {
  if (...) {
    ...
  }
}
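The package description mentions notifying a Discord channel, but this README does not show how. As a minimal sketch, an analyzer could post to a Discord webhook directly. The DISCORD_WEBHOOK_URL environment variable and the buildMessage helper are assumptions made for this illustration, not part of @keso/scraping, and the global fetch requires Node 18 or later.

```javascript
// Assumption: the webhook URL is supplied through an environment variable.
const WEBHOOK_URL = process.env.DISCORD_WEBHOOK_URL;

// Build the JSON body for a Discord webhook message.
// Discord limits the content field to 2000 characters, so the text is truncated.
function buildMessage(data) {
  const text = `Scraped page title: ${data.pageTitle}`;
  return { content: text.slice(0, 2000) };
}

// Analyzer: post a message whenever the parser found a page title.
async function analyzer(data) {
  if (!data.pageTitle) return; // nothing to report
  const response = await fetch(WEBHOOK_URL, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildMessage(data)),
  });
  if (!response.ok) {
    throw new Error(`Discord webhook responded with ${response.status}`);
  }
}
```

Keeping the message construction separate from the network call makes the formatting easy to check without sending anything.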
poll
poll is a simple way to scrape a page repeatedly.
Example, poll.js:
import { poll } from "@keso/scraping";

poll("https://example.com/", parser, analyzer, 5000);

// Document parser, returns scraped data
function parser() {
  return { pageTitle: document.title };
}

// Analyze and act on the data that was parsed
async function analyzer(data) {
  if (...) {
    ...
  }
}
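When polling, it is often useful to act only when the scraped data changes between runs. The following is a minimal sketch, assuming poll calls the analyzer from the same long-running process on every run (the README does not state this). createChangeDetector is a hypothetical helper, not part of the package.

```javascript
// Returns a function that reports whether the data differs from the previous call.
// The first call only records a baseline and never counts as a change.
function createChangeDetector() {
  let previous; // serialized data from the last run
  return function hasChanged(data) {
    const current = JSON.stringify(data);
    const changed = previous !== undefined && previous !== current;
    previous = current;
    return changed;
  };
}

const hasChanged = createChangeDetector();

// Analyzer that only reacts when the parsed data changed since the last run.
async function analyzer(data) {
  if (hasChanged(data)) {
    console.log("Page changed:", data);
  }
}
```

Comparing serialized JSON is enough for small, plain objects like the one returned by the parser above. It is sensitive to property order, so the parser should build its result the same way each time.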
getSession
Obtain a session object that can navigate to pages, interact with them, and parse data from them.
Example, session.js:
import { getSession } from "@keso/scraping";

async function run() {
  const session = await getSession();
  await session.nav("https://example.com/");
  const data = await session.parse(parser);
  const submitButton = await session.page.$(`input[type="submit"]`);
  if (submitButton) {
    // Start waiting before the click so the navigation is not missed
    await Promise.all([
      session.page.waitForNavigation(),
      submitButton.click(),
    ]);
    const data2 = await session.parse(parser);
  }
}

// Document parser, returns scraped data
function parser() {
  return { pageTitle: document.title };
}

run();
session API
The session object has the following API:
nav(str): navigates to a URL.
await session.nav(url: string);
parse(fn): parses the current page using a parser function. Returns a promise of the data returned by the parser function:
const data = await session.parse(parser: DocumentParser<T>);
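Because the example parsers read document, the parser function presumably runs inside the page rather than in Node. Under that assumption (the README does not spell it out), a parser should rely only on browser globals and return plain, serializable data. A slightly richer sketch follows; headingParser and the h2 selector are arbitrary choices for illustration.

```javascript
// Collect the page title and the text of every second-level heading.
// Uses only browser globals and returns a plain object of strings.
function headingParser() {
  const headings = Array.from(
    document.querySelectorAll("h2"),
    (element) => element.textContent.trim()
  );
  return { pageTitle: document.title, headings };
}
```

It would be passed to the session in the same way as the parser above, for example session.parse(headingParser).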
page: getter for the current page object. It is the "page" from the Puppeteer API.
const button = await session.page.$(`input[type="submit"]`);
if (button) {
  // Start waiting before the click so the navigation is not missed
  await Promise.all([session.page.waitForNavigation(), button.click()]);
}
setTextFieldValue(value: string, selector: string): sets the given string value in the text input or text field matching the selector.
await session.setTextFieldValue("foo", `input[name="bar"]`);
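Combining setTextFieldValue with the page getter gives a simple way to fill in and submit a form. This is a sketch under assumptions: fillAndSubmit and the shape of its fields argument (a map from CSS selector to value) are hypothetical, and it relies only on the session methods listed here plus Puppeteer's page.$, waitForNavigation, and click.

```javascript
// Fill text fields on the current page, then click the submit element
// and wait for the resulting navigation.
// Returns false when the submit element cannot be found.
async function fillAndSubmit(session, fields, submitSelector) {
  for (const [selector, value] of Object.entries(fields)) {
    await session.setTextFieldValue(value, selector);
  }
  const button = await session.page.$(submitSelector);
  if (!button) return false;
  // Start waiting before the click so the navigation is not missed.
  await Promise.all([session.page.waitForNavigation(), button.click()]);
  return true;
}
```

A call could look like fillAndSubmit(session, { 'input[name="bar"]': "foo" }, 'input[type="submit"]').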
close(): closes the browser session.
await session.close();
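Since a session keeps a browser open, it is worth closing it even when a step throws. A minimal sketch of that pattern follows; withSession is a hypothetical helper, and getSession is passed in as an argument so the sketch does not depend on how the package is imported.

```javascript
// Open a session, run the task with it, and always close the session afterwards,
// whether the task resolves or throws. Returns whatever the task returns.
async function withSession(getSession, task) {
  const session = await getSession();
  try {
    return await task(session);
  } finally {
    await session.close();
  }
}
```

For example, withSession(getSession, async (session) => { await session.nav("https://example.com/"); return session.parse(parser); }) would resolve to the parsed data after the browser has been closed.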