zyte-smartproxy-plugin
v1.0.8
A playwright-extra and puppeteer-extra plugin that adds Zyte Smart Proxy Manager services.
Zyte SmartProxy Plugin
A plugin for playwright-extra and puppeteer-extra that provides Zyte Smart Proxy Manager-specific functionality.
QuickStart for playwright-extra
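- Install Zyte SmartProxy Plugin and the other packages used by the sample (`cross-fetch` is required directly by the sample, so it is installed explicitly):
```sh
npm install playwright playwright-extra zyte-smartproxy-plugin puppeteer-extra-plugin-stealth @cliqz/adblocker-playwright cross-fetch
```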
- Create a file `sample.js` with the following content, replacing `<SPM_APIKEY>` with your Smart Proxy Manager API key:
```javascript
// playwright-extra is a drop-in replacement for playwright;
// it augments the installed playwright with plugin functionality
const { chromium } = require('playwright-extra');

// add zyte-smartproxy-plugin
const SmartProxyPlugin = require('zyte-smartproxy-plugin');
chromium.use(SmartProxyPlugin({
  spm_apikey: '<SPM_APIKEY>',
  static_bypass: false, // enable to save bandwidth (but may break some websites)
}));

// add stealth plugin and use defaults (all evasion techniques)
const StealthPlugin = require('puppeteer-extra-plugin-stealth');
chromium.use(StealthPlugin());

// create adblocker to block all ads (saves bandwidth)
const { PlaywrightBlocker } = require('@cliqz/adblocker-playwright');
const fetch = require('cross-fetch');

// playwright usage as normal
(async () => {
  const adBlocker = await PlaywrightBlocker.fromPrebuiltAdsAndTracking(fetch);
  const browser = await chromium.launch({ headless: false });
  const page = await browser.newPage({ ignoreHTTPSErrors: true });
  // uncomment to enable adBlocker (saves bandwidth but may break some websites)
  // await adBlocker.enableBlockingInPage(page);
  await page.goto('https://toscrape.com', { timeout: 180000 });
  await page.screenshot({ path: 'screenshot.png' });
  await browser.close();
})();
```
Make sure that you can make HTTPS requests through Smart Proxy Manager by following the guide "Fetching HTTPS pages with Zyte Smart Proxy Manager".
- Run `sample.js` using Node:
```sh
node sample.js
```
QuickStart for puppeteer-extra
- Install Zyte SmartProxy Plugin and the other packages used by the sample:
```sh
npm install puppeteer puppeteer-extra zyte-smartproxy-plugin puppeteer-extra-plugin-stealth puppeteer-extra-plugin-adblocker
```
- Create a file `sample.js` with the following content, replacing `<SPM_APIKEY>` with your Smart Proxy Manager API key:
```javascript
// puppeteer-extra is a drop-in replacement for puppeteer;
// it augments the installed puppeteer with plugin functionality
const puppeteer = require('puppeteer-extra');

// add zyte-smartproxy-plugin
const SmartProxyPlugin = require('zyte-smartproxy-plugin');
puppeteer.use(SmartProxyPlugin({
  spm_apikey: '<SPM_APIKEY>',
  static_bypass: false, // enable to save bandwidth (but may break some websites)
}));

// add stealth plugin and use defaults (all evasion techniques)
const StealthPlugin = require('puppeteer-extra-plugin-stealth');
puppeteer.use(StealthPlugin());

// uncomment to enable adblocker plugin (saves bandwidth but may break some websites)
// const AdBlockerPlugin = require('puppeteer-extra-plugin-adblocker');
// puppeteer.use(AdBlockerPlugin({ blockTrackers: true }));

// puppeteer usage as normal
(async () => {
  // in Puppeteer, ignoreHTTPSErrors is a launch option; browser.newPage() takes no options
  const browser = await puppeteer.launch({ headless: false, ignoreHTTPSErrors: true });
  const page = await browser.newPage();
  await page.goto('https://toscrape.com', { timeout: 180000 });
  await page.screenshot({ path: 'screenshot.png' });
  await browser.close();
})();
```
Make sure that you can make HTTPS requests through Smart Proxy Manager by following the guide "Fetching HTTPS pages with Zyte Smart Proxy Manager".
- Run `sample.js` using Node:
```sh
node sample.js
```
Zyte SmartProxy Plugin arguments
| Argument | Default Value | Description |
|----------|---------------|-------------|
| `spm_apikey` | `undefined` | Zyte Smart Proxy Manager API key, which can be found in your zyte.com account. |
| `spm_host` | `http://proxy.zyte.com:8011` | Zyte Smart Proxy Manager proxy host. |
| `static_bypass` | `true` | When `true`, Zyte SmartProxy Plugin skips the proxy (saving proxy bandwidth) for static assets matched by `static_bypass_regex`; pass `false` to send them through the proxy. |
| `static_bypass_regex` | `/.*?\.(?:txt\|json\|css\|less\|gif\|ico\|jpe?g\|svg\|png\|webp\|mkv\|mp4\|mpe?g\|webm\|eot\|ttf\|woff2?)$/` | Regex used to select the URLs that `static_bypass` applies to. |
| `headers` | `{'X-Crawlera-No-Bancheck': '1', 'X-Crawlera-Profile': 'pass', 'X-Crawlera-Cookies': 'disable'}` | Headers to be added to requests. |
| `spm_session_id` | `undefined` | When specified, Zyte SmartProxy Plugin uses this existing Zyte Smart Proxy Manager session; otherwise, it creates a new session. |
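The default `static_bypass_regex` decides which requests skip the proxy when `static_bypass` is enabled. Below is a minimal, dependency-free sketch for checking which URLs the default pattern would send directly. It assumes the plugin tests each request's full URL string against the regex; `isBypassed` and the sample URLs are hypothetical and only for illustration. As written, the pattern is case-sensitive, matches only an extension at the very end of the URL (so a query string defeats it), and also matches `.json` URLs.

```javascript
// Default static_bypass_regex from the arguments table (pipes unescaped).
const STATIC_BYPASS_REGEX =
  /.*?\.(?:txt|json|css|less|gif|ico|jpe?g|svg|png|webp|mkv|mp4|mpe?g|webm|eot|ttf|woff2?)$/;

// Hypothetical helper: true if the URL matches the regex, i.e. it would be
// fetched without the proxy when static_bypass is enabled.
// Assumption: the plugin tests the full URL string, not just the pathname.
function isBypassed(url, regex = STATIC_BYPASS_REGEX) {
  return regex.test(url);
}

const samples = [
  'https://toscrape.com/static/style.css',      // extension at the end
  'https://toscrape.com/static/style.css?v=2',  // query string after the extension
  'https://toscrape.com/img/LOGO.PNG',          // upper-case extension
  'https://toscrape.com/api/data.json',         // JSON is in the list too
  'https://toscrape.com/',                      // regular page
];

for (const url of samples) {
  console.log(`${isBypassed(url) ? 'direct' : 'proxy '}  ${url}`);
}
```

Running the script prints each URL prefixed with `direct` or `proxy`. If static assets you care about carry query strings, or JSON API responses must go through the proxy, pass your own `static_bypass_regex` or set `static_bypass: false`.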
Notes
- Some websites may not work when AdBlocker or `static_bypass` (enabled by default) is on. Try disabling them if you encounter any issues.
- In `headless: true` mode, the browser generates slightly different values for some browser-specific headers, which websites may detect. In that case, try using `'X-Crawlera-Profile': 'desktop'`:
```javascript
puppeteer.use(SmartProxyPlugin({
  spm_apikey: '<SPM_APIKEY>',
  headers: {
    'X-Crawlera-No-Bancheck': '1',
    'X-Crawlera-Profile': 'desktop',
    'X-Crawlera-Cookies': 'disable',
  },
}));
```
- When connecting to a remote Chrome browser instance, launch it with these arguments:
```
--proxy-server=http://proxy.zyte.com:8011 --disable-site-isolation-trials
```
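The headless and remote-Chrome notes above can be folded into two small helpers. This is a sketch only: `buildPluginOptions` and `remoteChromeArgs` are hypothetical names, not part of zyte-smartproxy-plugin. Because the headless example passes all three default headers again, the sketch assumes a custom `headers` object replaces the defaults rather than extending them, so it copies every documented default and overrides only the profile. Deriving `--proxy-server` from `spm_host` is likewise an assumption for setups that change the default host.

```javascript
// Default headers documented in the arguments table.
const DEFAULT_SPM_HEADERS = {
  'X-Crawlera-No-Bancheck': '1',
  'X-Crawlera-Profile': 'pass',
  'X-Crawlera-Cookies': 'disable',
};

// Default spm_host documented in the arguments table.
const DEFAULT_SPM_HOST = 'http://proxy.zyte.com:8011';

// Hypothetical helper: builds the options object for SmartProxyPlugin().
// In headless mode it switches X-Crawlera-Profile to 'desktop', as the note
// suggests; the other default headers are copied unchanged.
function buildPluginOptions(apikey, { headless = false, staticBypass = true } = {}) {
  const headers = { ...DEFAULT_SPM_HEADERS }; // copy, so the defaults are never mutated
  if (headless) {
    headers['X-Crawlera-Profile'] = 'desktop';
  }
  return { spm_apikey: apikey, static_bypass: staticBypass, headers };
}

// Hypothetical helper: Chrome flags for a remote browser instance,
// assuming --proxy-server should point at the same host as spm_host.
function remoteChromeArgs(spmHost = DEFAULT_SPM_HOST) {
  return [`--proxy-server=${spmHost}`, '--disable-site-isolation-trials'];
}

console.log(buildPluginOptions('<SPM_APIKEY>', { headless: true }));
console.log(remoteChromeArgs().join(' '));
```

With the plugin installed, the result could be passed through as `puppeteer.use(SmartProxyPlugin(buildPluginOptions(apiKey, { headless: true })))`, and with the default host `remoteChromeArgs()` yields exactly the two flags listed in the last note.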