scrappy
v0.6.0
Scrappy
Extract rich metadata from URLs.
Installation
npm install scrappy --save
Usage
Scrappy attempts to parse and extract rich structured metadata from URLs.
import { scraper, urlScraper } from "scrappy";
import * as plugins from "scrappy/dist/plugins";
Scraper
Accepts a request function and a list of plugins to use. The request function is expected to return a "page" object, which has the same shape as the input to scrape(page).
const scrape = scraper({
  request,
  plugins: [plugins.htmlmetaparser, plugins.exifdata],
});

const res = await fetch("http://example.com"); // E.g. `popsicle`.

await scrape({
  url: res.url,
  status: res.status,
  headers: res.headers.asObject(),
  body: res.stream(), // Must stream the request instead of buffering to support large responses.
});
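The "page" object described above can be sketched as a small type plus a request function that produces it. This is a minimal sketch, not Scrappy's own code: the `Page`, `RawResponse`, and `Transport` types and the `createRequest` helper are hypothetical names, with the `Page` fields inferred from the example above. The transport is injected so that any HTTP client can be plugged in.

```typescript
import { Readable } from "node:stream";

// Hypothetical shape, inferred from the fields passed to `scrape(page)` above.
interface Page {
  url: string;
  status: number;
  headers: Record<string, string | string[]>;
  body: Readable; // A stream, so large responses are never buffered in full.
}

// Hypothetical minimal response shape; not a type from any real HTTP library.
interface RawResponse {
  url: string;
  status: number;
  headers: Record<string, string | string[]>;
  chunks: Iterable<string | Buffer> | AsyncIterable<string | Buffer>;
}

// Hypothetical transport: whatever actually performs the HTTP request.
type Transport = (url: string) => Promise<RawResponse>;

// Builds a `request(url)` function that resolves to a "page" object.
function createRequest(transport: Transport): (url: string) => Promise<Page> {
  return async (url: string): Promise<Page> => {
    const res = await transport(url);
    return {
      url: res.url,
      status: res.status,
      headers: res.headers,
      body: Readable.from(res.chunks), // Expose the chunks as a readable stream.
    };
  };
}
```

Keeping the body as a stream matches the note in the example above about supporting large responses.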
URL Scraper
A simpler wrapper around scraper that automatically makes a request(url) for the page.
const scrape = urlScraper({ request });
await scrape("http://example.com");
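Conceptually, the wrapper described above amounts to composing a request step with a scrape step. The sketch below illustrates that idea only and is not Scrappy's implementation: `composeUrlScraper` is a hypothetical name, and both functions it receives are stand-ins supplied by the caller.

```typescript
// Hypothetical helper: fetch the page for a URL, then hand it to the scraper.
function composeUrlScraper<P, R>(
  request: (url: string) => Promise<P>,
  scrape: (page: P) => Promise<R>
): (url: string) => Promise<R> {
  return async (url: string): Promise<R> => {
    const page = await request(url); // Make the request for the page.
    return scrape(page); // Extract metadata from the returned page.
  };
}
```

With this shape, the caller only ever passes a URL, and the page object stays an internal detail.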
License
Apache 2.0