GetScraping Node.js Client
This is the official Node.js client library for GetScraping.com, a powerful web scraping API service.
Installation
You can install the GetScraping client library using npm, yarn, or pnpm:
# Using npm
npm install get-scraping
# Using yarn
yarn add get-scraping
# Using pnpm
pnpm add get-scraping
Usage
To use the GetScraping client, you'll need an API key from GetScraping.com. Once you have your API key, you can start using the client as follows:
import { GetScrapingClient } from 'get-scraping';

const client = new GetScrapingClient('YOUR_API_KEY');

async function scrapeWebsite() {
  const result = await client.scrape({
    url: 'https://example.com',
    method: 'GET'
  });
  const html = await result.text();
  console.log(html);
}

scrapeWebsite();
Features
The GetScraping client supports a wide range of features, including:
- Basic web scraping
- JavaScript rendering
- Custom headers and cookies
- Proxy support (ISP, residential, and mobile)
- Retrying requests
- Programmable browser actions
API Reference
GetScrapingClient
The main class for interacting with the GetScraping API.
const client = new GetScrapingClient(api_key: string);
scrape(params: GetScrapingParams)
The primary method for scraping websites.
const result = await client.scrape(params);
GetScrapingParams
The GetScrapingParams object supports the following options:
export type GetScrapingParams = {
  /**
   * The url to scrape - should include http:// or https://
   */
  url: string;
  /**
   * The method to use when requesting this url.
   * Can be GET or POST.
   */
  method: 'GET' | 'POST';
  /**
   * The payload to include in a POST request.
   * Only used when method = 'POST'.
   */
  body?: string;
  /**
   * When defined, your GetScraping deployment will route the request through a browser
   * with the ability to render javascript and do certain actions on the webpage.
   */
  js_rendering_options?: JavascriptRenderingOptions;
  /**
   * Define any cookies you need included in your request.
   * ex: `cookies: ['SID=1234', 'SUBID=abcd', 'otherCookie=5678']`
   */
  cookies?: Array<string>;
  /**
   * The headers to attach to the scrape request. We fill in missing/common headers
   * by default; if you want only the headers defined here to be part of the request,
   * set 'omit_default_headers' to true.
   */
  headers?: Record<string, string>;
  /**
   * omit_default_headers will pass only the headers you define in the scrape request.
   * Defaults to false.
   */
  omit_default_headers?: boolean;
  /**
   * Set to true to route requests through our ISP proxies.
   * Note this may incur additional API credit usage.
   */
  use_isp_proxy?: boolean;
}
For more detailed information on these parameters, please refer to the GetScraping documentation.
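The POST-related options (`body`, `headers`, `cookies`, `omit_default_headers`) are documented above but not shown in the examples below. Here is a minimal sketch of how they might be combined; the endpoint URL, header values, and payload are illustrative placeholders, not part of the API.

```typescript
// Sketch: a POST scrape with a JSON body, custom headers, and cookies.
// Because omit_default_headers is true, only the headers listed here are sent.
const postParams = {
  url: 'https://example.com/api/search', // placeholder URL
  method: 'POST' as const,
  body: JSON.stringify({ query: 'shoes', page: 1 }), // placeholder payload
  headers: {
    'Content-Type': 'application/json',
    'User-Agent': 'my-scraper/1.0', // illustrative value
  },
  cookies: ['SID=1234', 'SUBID=abcd'],
  omit_default_headers: true,
};

// const result = await client.scrape(postParams);
// const responseBody = await result.text();
```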
Examples
Basic Scraping
const result = await client.scrape({
  url: 'https://example.com',
  method: 'GET'
});
const html = await result.text();
console.log(html);
Scraping with JavaScript Rendering
Render JavaScript to scrape dynamic sites. Note that rendering JS incurs an additional cost (5 requests).
const result = await client.scrape({
  url: 'https://example.com',
  method: 'GET',
  js_rendering_options: {
    render_js: true,
    wait_millis: 5000
  }
});
const html = await result.text();
console.log(html);
Using Various Proxies
The proxy types most effective at bypassing tough anti-bot measures are, in descending order, mobile, residential, ISP, and finally our default proxies.
We recommend starting requests with the default proxies and working your way up as necessary, since non-default proxies incur additional costs (1 request for default proxies, 5 for ISP, 25 for residential, and 50 for mobile).
const result = await client.scrape({
  url: 'https://example.com',
  method: 'GET',
  use_residential_proxy: true
});
const html = await result.text();
console.log(html);
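The escalation strategy described above (start cheap, upgrade only on failure) can be sketched as a small helper. This is not part of the library: it assumes the result object exposes a fetch-style `status` property alongside `text()`, and it uses only the `use_isp_proxy` and `use_residential_proxy` flags that appear in this README.

```typescript
// Sketch: try the default proxies first, then escalate to pricier tiers
// only when the cheaper tier fails to return a 200.
type ScrapeFn = (params: Record<string, unknown>) =>
  Promise<{ status: number; text(): Promise<string> }>;

async function scrapeWithEscalation(scrape: ScrapeFn, url: string): Promise<string> {
  // Cheapest first: default (1 credit), ISP (5), residential (25).
  const tiers: Record<string, unknown>[] = [
    {},
    { use_isp_proxy: true },
    { use_residential_proxy: true },
  ];
  for (const tier of tiers) {
    const result = await scrape({ url, method: 'GET', ...tier });
    if (result.status === 200) {
      return result.text();
    }
  }
  throw new Error(`All proxy tiers failed for ${url}`);
}
```

With the real client you would pass `(p) => client.scrape(p)` as the `scrape` argument.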
Retrying Requests
const result = await client.scrape({
  url: 'https://example.com',
  method: 'GET',
  retry_config: {
    num_retries: 3,
    success_status_codes: [200]
  }
});
const html = await result.text();
console.log(html);
Advanced Usage
For more advanced usage, including programmable browser actions and intercepting requests, please refer to the GetScraping documentation.
Support
If you encounter any issues or have questions, please visit our support page or open an issue in the GitHub repository.
License
This project is licensed under the ISC License. See the LICENSE file for details.