@tuplo/fletcher

v2.23.1

Published

3 months ago

Web scraping HTTP request library

Downloads

0High
0Medium
0Low

tuplo

http-client request scraping

@tuplo/fletcher

HTTP request library, focused on web scraping.

Install

$ npm install @tuplo/fletcher

# or with yarn
$ yarn add @tuplo/fletcher

Usage

Fetch a HTML page and parse it using cheerio.

import fetch from '@tuplo/fletcher';

const $page = await fetch.html('https://foo.com/page.html');
const heading = $page.find('body > h1');

Fetch a JSON file and parse it.

const { foo } = await fetch.json('https://foo.com/page.html');

Find a script on a page and evaluate it.

const { foo } = await fetch.script('https://foo.com/page.html', {
  scriptPath: 'script:nth-of-type(3)',
});

Find JSON-LD metadata on a page.

const [jsonld] = await fetch.jsonld('https://foo.com/page.html');

Work with the raw Response.

const res = await fetch.response('https://foo.com');
console.log(res.headers);
console.log(await res.text());

Work with Puppeteer for headless browser automation.

const client = fetch.create({
  browserWSEndpoint: 'ws://localhost:3000',
});
const $page = await client.browser.html('https://foo.com');
const { foo } = await client.browser.json('https://foo.com', /ajax-list/);

Options

| Option | Description | Default | | ------------------- | ------------------------------------------------------------------------------ | ---------------------------------------------------------- | | browserWSEndpoint | Puppeteer web socket address | | cache | Caches requests in memory | false | | delay | Introduce a delay before the request (ms) | 1_000 | | formData | Object with key/value pairs to send as form data | | encoding | The encoding used by the source page, will be converted to UTF8 | | headers | A simple multi-map of names to values | | jsonData | Object with key/value pairs to send as json data | | log | Should log all request URLS to stderr | false | | onAfterRequest | Callback to be called right after request is resolved | proxy | Proxy configuration (host, port, username, password) | | retry | Retries failed responses | async-retry | | scriptFindFn | A function to find a script element on the page, execute and return it | | scriptPath | A CSS selector to pick a script element on the page, execute and return it | | scriptSandbox | An object to use as base on an execution of a piece of code found on the page | | urlSearchParams | A key-value object listing what parameters to add to the query string of url | | userAgent | Set a custom user agent | | validateStatus | A function to decide if the response status is an error and should be thrown |

API

`fletcher(url: string, options?: FletcherOptions) => http.Response`

Generic utility to return a HTTP Response

`fletcher.html(url: string, options?: FletcherOptions) => Cheerio<AnyNode>`

Requests a HTTP resource, parses it using Cheerio and returns its

const $page = await fletcher.html('https://foo.com/page.html');
const heading = $page.find('body > h1');

`fletcher.script<T>(url: string, options?: FletcherOptions) => T`

Requests a HTTP resource, finds a script on it, executes and returns its global context.

const { foo } = await fletcher.script('https://foo.com/page.html', {
	scriptPath: 'script:nth-of-type(3)',
});

`fletcher.text(url: string, options?: FletcherOptions) => string`

Requests a HTTP resource, returning it as a string

`fletcher.cookies(url: string, options?: FletcherOptions) => CookieJar`

Requests a HTTP resources, returning the cookies returned with it.

`fletcher.json<T>(url: string, options?: FletcherOptions) => T`

Requests a HTTP resource, returning it as a JSON object

`fletcher.jsonld(url: string, options?: FletcherOptions) => unknown[]`

Requests a HTTP resource, retrieving all the JSON-LD blocks found on the document

`fletcher.response(url: string, options?: FletcherOptions) => Response`

Requests a HTTP resource, returning the full HTTP Response object

`fletcher.browser.html(url: string) => Cheerio<AnyNode>`

Requests a HTTP resource using Puppeteer/Chrome, parses it using Cheerio and returns its.

`fletcher.browser.json<T>(url: string, requestUrl: string | RegExp) => T`

Requests a HTTP resource using Puppeteer/Chrome, intercepts a request made by that page and returns it as a JSON object

`fletcher.create(options: FletcherOptions) => Object`

Creates a new instance of fletcher with a custom config

const instance = fletcher.create({ headers: { foo: 'bar' } });
await instance.json('http://foo.com');

Published

Vulnerabilities

Links

Maintainers

Keywords

Readme

@tuplo/fletcher

Install

Usage

Options

API

fletcher(url: string, options?: FletcherOptions) => http.Response

fletcher.html(url: string, options?: FletcherOptions) => Cheerio<AnyNode>

fletcher.script<T>(url: string, options?: FletcherOptions) => T

fletcher.text(url: string, options?: FletcherOptions) => string

fletcher.cookies(url: string, options?: FletcherOptions) => CookieJar

fletcher.json<T>(url: string, options?: FletcherOptions) => T

fletcher.jsonld(url: string, options?: FletcherOptions) => unknown[]

fletcher.response(url: string, options?: FletcherOptions) => Response

fletcher.browser.html(url: string) => Cheerio<AnyNode>

fletcher.browser.json<T>(url: string, requestUrl: string | RegExp) => T

fletcher.create(options: FletcherOptions) => Object

`fletcher(url: string, options?: FletcherOptions) => http.Response`

`fletcher.html(url: string, options?: FletcherOptions) => Cheerio<AnyNode>`

`fletcher.script<T>(url: string, options?: FletcherOptions) => T`

`fletcher.text(url: string, options?: FletcherOptions) => string`

`fletcher.cookies(url: string, options?: FletcherOptions) => CookieJar`

`fletcher.json<T>(url: string, options?: FletcherOptions) => T`

`fletcher.jsonld(url: string, options?: FletcherOptions) => unknown[]`

`fletcher.response(url: string, options?: FletcherOptions) => Response`

`fletcher.browser.html(url: string) => Cheerio<AnyNode>`

`fletcher.browser.json<T>(url: string, requestUrl: string | RegExp) => T`

`fletcher.create(options: FletcherOptions) => Object`