automation-extra-interception-proxy
v0.9.0
Puppeteer extra plugin for advanced interception and proxying requests
A simple way to work with a site's requests and responses.
A spiritual successor to puppeteer-page-proxy: the same behavior, but extended and built on promises.
Using a proxy is optional.
Supported proxies (through proxy-agent):
- http://proxy-server-over-tcp.com:3128
- https://proxy-server-over-tls.com:3129
- socks://username:password@some-socks-proxy.com:9050 (username & password are optional)
- socks5://username:password@some-socks-proxy.com:9050 (username & password are optional)
- socks4://some-socks-proxy.com:9050
- pac+http://www.example.com/proxy.pac
Tested with Puppeteer/Chromium only!
Known issues in managed mode
- GREASE ciphers are missing.
- Some headers are missing.
- [CORS] Preflight OPTIONS requests are not sent before the actual request is executed.
- [CORS] Headers are close to correct, but not exact.
- WebSockets are handled by the browser itself (an IP leak may occur if you set a proxy in this package but not in Puppeteer itself).
- Performance may degrade under high load.
Installing
Using npm
npm install automation-extra-interception-proxy
Using yarn
yarn add automation-extra-interception-proxy
Why?
This package solves the following problems:
1. Traffic sniffing
From time to time you need information from the browser's requests. By default, only header information is easy to reach. You can also read all responses, but this occasionally throws errors for one of the following reasons.
First, the page may already be closed, in which case your code throws an error.
Second, some sites use service workers to request information. Unfortunately, you can't handle this situation without making the request manually and then converting the result for Puppeteer.
2. Data manipulation
If you want to adjust some requests or responses, you have to do it manually.
For example, you want to get the original request/response and make some adjustments. This package makes that easy: you get what you want with a single function call.
3. Set proxy
Yes, Puppeteer already supports proxies through additional process arguments. However, you may have to maintain proxy credentials manually for each request (unverified), and SOCKS proxies may not be usable (unverified).
4. Asynchronous decisions for requests
Even with cooperative mode, you cannot make your decisions asynchronously. Here you can build a chain of handlers that process the request decision one by one. A handler can also mark its decision as final, so the remaining handlers in the chain are not asked. A handler can also adjust the request/response for the next one.
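The chain of handlers described above can be modeled in plain JavaScript. This is a minimal, hypothetical sketch, not this package's API: `runHandlerChain`, the handler signature, and the `{ final: true }` return value are assumptions chosen only to illustrate sequential asynchronous decisions with an early stop.

```javascript
/**
 * Hypothetical model of an asynchronous handler chain (not the package API).
 * Each handler receives the shared request object, may mutate it for the
 * next handler, and may return { final: true } to stop the chain.
 */
async function runHandlerChain(handlers, request) {
  for (const handler of handlers) {
    // Wait for this handler's (possibly asynchronous) decision.
    const decision = await handler(request);
    if (decision && decision.final) {
      // Final decision: the remaining handlers are not asked.
      return { request, stoppedBy: handler.name || "anonymous" };
    }
  }
  return { request, stoppedBy: null };
}

// Example handlers (hypothetical).
async function addTraceHeader(request) {
  // Asynchronous work is allowed before deciding.
  await new Promise((resolve) => setTimeout(resolve, 10));
  request.headers["x-trace"] = "1";
}

async function blockTrackers(request) {
  if (request.url.includes("tracker")) {
    request.aborted = true;
    return { final: true };
  }
}

async function lateHandler(request) {
  // Only reached when no earlier handler made a final decision.
  request.headers["x-late"] = "1";
}
```

Running `runHandlerChain([addTraceHeader, blockTrackers, lateHandler], request)` on a tracker URL stops at `blockTrackers`, so `lateHandler` never sees that request.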
Motivation
We live in a world where almost every website has an internal API. When you look at the Network tab in Chrome DevTools, it's easy to see where and what. The data is already yours, but you can't just take what you want; you have to fight for the information you need. So let's fight together!
API
Table of Contents
- wrapPage
- IConfig
- continue
- ignoreResponseBodyIfPossible
- flushLocal
- recordError
- recordInternalError
- recordWarning
- RequestMode
- RequestStage
- IRequestOptions
- _bodyError
- IAbortReason
- InterceptionProxyRequest
wrapPage
Adds interception ability to the page (see the samples below).
Parameters
- page (Puppeteer.Page): page for future interceptions
- config (IConfig, optional): plugin configuration
Returns Promise<InterceptionProxyPageConfig>
IConfig
Plugin configuration object
cooperativePriority
Puppeteer's "Cooperative Intercept Mode" priority.
This package uses its own way to manage cooperation.
Use only if you know what it does.
requestMode
Request handling mode:
- ignore: the plugin does nothing with the original request.
- native: the plugin only listens to the original request/response data, and all requests are fulfilled by Puppeteer itself. Some plugin functionality may be unavailable.
- managed: the plugin performs all requests, either through requestHandlers or by itself. All plugin features are available.
Default: managed
Type: RequestMode
proxy
Proxy for requests. Automatically sets the agent property using proxy-agent.
Examples:
- http://proxy-server-over-tcp.com:3128
- https://proxy-server-over-tls.com:3129
- socks://username:password@some-socks-proxy.com:9050 (username & password are optional)
- socks5://username:password@some-socks-proxy.com:9050 (username & password are optional)
- socks4://some-socks-proxy.com:9050
- pac+http://www.example.com/proxy.pac
Default: null
Type: (string | null)
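Before passing a string as `proxy`, it can help to check it against the schemes listed above. The sketch below is a hypothetical helper, not part of this package, and it only accepts the schemes shown in this README; proxy-agent itself may accept more. It relies on Node's built-in WHATWG `URL` class.

```javascript
// Hypothetical helper (not part of the package): validate a proxy string
// against the schemes listed in this README before using it as `proxy`.
const SUPPORTED_PROXY_PROTOCOLS = new Set([
  "http:",
  "https:",
  "socks:",
  "socks4:",
  "socks5:",
  "pac+http:",
]);

function describeProxy(proxy) {
  if (proxy === null) {
    return null; // Documented default: no proxy.
  }
  const url = new URL(proxy); // Throws a TypeError on malformed input.
  if (!SUPPORTED_PROXY_PROTOCOLS.has(url.protocol)) {
    throw new Error(`Unsupported proxy protocol: ${url.protocol}`);
  }
  return {
    protocol: url.protocol,
    host: url.hostname,
    port: url.port,
    // Username & password are optional for SOCKS proxies.
    hasCredentials: url.username !== "",
  };
}
```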
agent
Your agent for handling requests.
Set by the proxy property. Setting it directly clears the proxy property.
Default: null
Type: (Agent | null)
Meta
- deprecated: Use the proxy property instead. Deprecated because request handling may be reworked in the future.
logger
You can handle all plugin messages.
Type: any
timeout
Request timeout in milliseconds (actual execution only)
Type: number
nativeContinueIfPossible
If you didn't change the request or response, let Puppeteer handle the request by itself.
Default: false
Type: boolean
ignoreResponseBodyIfPossible
If you don't use the plugin's response object, the plugin will not retrieve the response from Puppeteer, for better performance.
Applies to native mode only.
Type: boolean
enableLegacyCookieHandling
For old versions of Puppeteer, the plugin has to handle cookies by itself.
Enable this option if you have issues with cookies.
Upgrading your Puppeteer version is recommended instead.
Type: boolean
gotHooks
Not recommended. Use other properties of this library instead.
Modifies requests in a more advanced way through interaction with got.
Type: Hooks
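To make the documented defaults concrete, here is a hypothetical sketch of resolving an IConfig-shaped object. It only fills in the defaults stated above (requestMode managed, proxy null, agent null, nativeContinueIfPossible false) and mirrors the rule that setting `agent` directly clears `proxy`. The mode strings "ignore" and "native" are assumed to match the RequestMode names; `resolveConfig` is illustrative, not the plugin's internal code.

```javascript
// Hypothetical sketch: resolve an IConfig-shaped object using only the
// defaults documented in this README. Not the plugin's own implementation.
const DOCUMENTED_DEFAULTS = Object.freeze({
  requestMode: "managed",
  proxy: null,
  agent: null,
  nativeContinueIfPossible: false,
});

// Assumed string values of RequestMode ("managed" appears in the samples).
const REQUEST_MODES = ["ignore", "native", "managed"];

function resolveConfig(userConfig = {}) {
  // Options without a documented default (timeout, logger, ...) stay as given.
  const config = { ...DOCUMENTED_DEFAULTS, ...userConfig };
  if (!REQUEST_MODES.includes(config.requestMode)) {
    throw new Error(`Unknown requestMode: ${config.requestMode}`);
  }
  if (userConfig.agent) {
    // Documented above: setting `agent` directly clears `proxy`.
    config.proxy = null;
  }
  return config;
}
```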
continue
Sends the gathered response back to Puppeteer immediately.
If the response has not been collected yet, calls getResponse first.
Returns Promise<void>
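The "collect the response if needed, then send it" behavior of `continue` can be sketched as a small class with a cached response. This is a hypothetical model: `SketchRequest`, `fetchResponse`, and `sendToBrowser` are illustrative stand-ins, not the plugin's classes or callbacks.

```javascript
// Hypothetical model of the `continue` behavior described above.
// `fetchResponse` and `sendToBrowser` stand in for the real plumbing.
class SketchRequest {
  constructor(fetchResponse, sendToBrowser) {
    this.fetchResponse = fetchResponse; // () => Promise<response>
    this.sendToBrowser = sendToBrowser; // (response) => Promise<void>
    this.response = null;
  }

  async getResponse() {
    // Collect the response once and cache it for later calls.
    if (this.response === null) {
      this.response = await this.fetchResponse();
    }
    return this.response;
  }

  async continue() {
    // If the response has not been collected yet, collect it first.
    const response = await this.getResponse();
    await this.sendToBrowser(response);
  }
}
```

Because the response is cached, calling `getResponse()` in a handler and then `continue()` fetches the response only once.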
ignoreResponseBodyIfPossible
If you set this specific property, the global ignoreResponseBodyIfPossible option is ignored.
Type: boolean
flushLocal
Flushes the local configuration.
Parameters
- key (any, optional): if provided, flushes only that parameter at the local level
Returns void
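The relationship between local settings (such as a per-request ignoreResponseBodyIfPossible) and the global configuration can be modeled as a lookup that prefers local overrides. A hypothetical sketch: only the name `flushLocal` comes from this README; `LayeredConfig`, `setLocal`, and `get` are illustrative.

```javascript
// Hypothetical sketch of global vs. local configuration layering.
// Only `flushLocal` is named in the docs; everything else is illustrative.
class LayeredConfig {
  constructor(globalConfig) {
    this.globalConfig = globalConfig;
    this.local = new Map();
  }

  setLocal(key, value) {
    this.local.set(key, value);
  }

  get(key) {
    // A local override wins over the global value.
    return this.local.has(key) ? this.local.get(key) : this.globalConfig[key];
  }

  flushLocal(key) {
    if (key === undefined) {
      this.local.clear(); // Flush every local override.
    } else {
      this.local.delete(key); // Flush only the given parameter.
    }
  }
}
```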
recordError
Passes an error to the logger.
Parameters
- message (any): flow description
- error (any, optional): original error object
- meta (...any): non-specific meta information
Returns void
recordInternalError
Passes an internal error to the logger.
Parameters
- message (any): flow/error description
- meta (...any): non-specific meta information
Returns void
recordWarning
Passes a warning to the logger.
Parameters
- message (any): flow/error description
- meta (...any): non-specific meta information
Returns void
RequestMode
Plugin mode for handling requests
ignore
The plugin does nothing with the original request.
Type: string
native
The plugin only listens to the original request/response data, and all requests are fulfilled by Puppeteer itself. Some plugin functionality may be unavailable.
Type: string
managed
The plugin performs all requests by itself. All plugin features are available.
Type: string
RequestStage
Current stage of the request
gotRequest
We received a new request from Puppeteer, which includes all the necessary information about it.
At this stage the request can be adjusted.
Type: string
sentRequest
The request is being executed.
At this stage the request can no longer be adjusted, and there is no response to work with yet.
Type: string
gotResponse
We received the response to the request (which the user may have modified), and the user can now adjust the response.
At this stage the response can be adjusted, but the request can no longer be overridden.
Type: string
sentResponse
We sent the final response to the browser.
It's too late to adjust the request or response.
Type: string
closed
The page was closed and nothing more can be done.
From a technical perspective, this looks the same as sentResponse.
Type: string
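The stage descriptions above imply simple rules about what can be adjusted when. A hypothetical sketch that encodes those rules; the string values of the stages are assumptions, since the docs only state that each stage is a string.

```javascript
// Hypothetical sketch: request stages and the adjustment rules implied by
// their descriptions. The string values are assumptions.
const RequestStage = Object.freeze({
  gotRequest: "gotRequest",
  sentRequest: "sentRequest",
  gotResponse: "gotResponse",
  sentResponse: "sentResponse",
  closed: "closed",
});

// Only the gotRequest stage allows adjusting the request.
function canAdjustRequest(stage) {
  return stage === RequestStage.gotRequest;
}

// Only the gotResponse stage allows adjusting the response.
function canAdjustResponse(stage) {
  return stage === RequestStage.gotResponse;
}

// sentResponse and closed look the same: nothing more can be done.
function isFinished(stage) {
  return stage === RequestStage.sentResponse || stage === RequestStage.closed;
}
```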
IRequestOptions
The plugin's request options. This request differs significantly from Puppeteer's request.
The options can be modified; all changes are applied to the actual Puppeteer request and executed.
method
Request method.
Once the request has been executed, this property can no longer be changed.
Type: Method
url
Request URL.
Once the request has been executed, this property can no longer be changed.
Type: string
headers
Request headers.
Once the request has been executed, this property can no longer be changed.
Type: Headers
body
Request body.
Once the request has been executed, this property can no longer be changed.
Type: (string | Buffer | undefined)
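The rule "editable until executed" can be illustrated with a JavaScript `Proxy` that rejects top-level writes after a flag flips. This is a hypothetical sketch: `createRequestOptions` and `markExecuted` are illustrative names, and the guard covers only top-level properties (method, url, headers, body), not writes inside the headers object.

```javascript
// Hypothetical sketch: IRequestOptions-shaped data that rejects top-level
// changes once the request has been executed. Not the plugin's code.
function createRequestOptions(initial) {
  let executed = false;
  // Copy so the caller's object (and its headers) are not mutated.
  const target = { ...initial, headers: { ...initial.headers } };
  const options = new Proxy(target, {
    set(obj, prop, value) {
      if (executed) {
        throw new Error(`Cannot change "${String(prop)}" after execution`);
      }
      obj[prop] = value;
      return true;
    },
  });
  return {
    options,
    markExecuted() {
      executed = true;
    },
  };
}
```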
_bodyError
Type: string
IAbortReason
This option overrides the response.
- aborted: An operation was aborted (due to user action).
- accessdenied: Permission to access a resource, other than the network, was denied.
- addressunreachable: The IP address is unreachable. This usually means that there is no route to the specified host or network.
- blockedbyclient: The client chose to block the request.
- blockedbyresponse: The request failed because the response was delivered along with requirements which are not met ('X-Frame-Options' and 'Content-Security-Policy' ancestor checks, for instance).
- connectionaborted: A connection timed out as a result of not receiving an ACK for data sent.
- connectionclosed: A connection was closed (corresponding to a TCP FIN).
- connectionfailed: A connection attempt failed.
- connectionrefused: A connection attempt was refused.
- connectionreset: A connection was reset (corresponding to a TCP RST).
- internetdisconnected: The Internet connection has been lost.
- namenotresolved: The host name could not be resolved.
- timedout: An operation timed out.
- failed: A generic failure occurred.
Type: ErrorCode
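These are the same error codes Puppeteer accepts in `HTTPRequest.abort()`. A hypothetical helper that normalizes an arbitrary value to one of them, falling back to the generic "failed" code; `toAbortReason` is illustrative and not part of this package.

```javascript
// Hypothetical helper (not part of the package): normalize an abort reason
// to one of the error codes listed above, defaulting to "failed".
const ABORT_REASONS = new Set([
  "aborted",
  "accessdenied",
  "addressunreachable",
  "blockedbyclient",
  "blockedbyresponse",
  "connectionaborted",
  "connectionclosed",
  "connectionfailed",
  "connectionrefused",
  "connectionreset",
  "internetdisconnected",
  "namenotresolved",
  "timedout",
  "failed",
]);

function toAbortReason(code) {
  return ABORT_REASONS.has(code) ? code : "failed";
}
```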
InterceptionProxyRequest
Extends RequestBase
The plugin's request. It differs significantly from Puppeteer's request.
Parameters
- initial (INewRequestInitialArgs)
- requestOptions (IRequestOptions)
Samples
Page proxy
```javascript
/**
 * This example shows how to enable a proxy for a single page.
 */
// require libs
const puppeteer = require('puppeteer');
const InterceptionUtils = require('automation-extra-interception-proxy');

// do everything async
(async () => {
  // launch a browser
  const browser = await puppeteer.launch({
    headless: false,
  });
  // get a page
  const page = await browser.newPage();
  // attach interception commands
  await InterceptionUtils.wrapPage(page, {
    requestMode: "managed",
    // optional, will be handled by https://www.npmjs.com/package/proxy-agent
    proxy: "socks5://username:password@some-socks-proxy.com:9050",
  });
  // go to our destination and wait for the response
  await page.goto('https://www.npmjs.com/package/automation-extra-interception-proxy');
  // close the browser
  await browser.close();
})(); // end of our thread
```
Single page interception
```javascript
/**
 * This example shows how to enable interceptions for a single page.
 *
 * This code gets some wallpaper image URLs from bing.com.
 *
 * This code could break if Bing changes its behavior.
 */
// require libs
const puppeteer = require('puppeteer');
const InterceptionUtils = require('automation-extra-interception-proxy');

// do everything async
(async () => {
  // launch a browser
  const browser = await puppeteer.launch({
    headless: false,
  });
  // get a page
  const page = await browser.newPage();
  // attach interception commands
  await InterceptionUtils.wrapPage(page, {
    requestMode: "managed",
    // optional, will be handled by https://www.npmjs.com/package/proxy-agent
    // proxy: "socks5://username:password@some-socks-proxy.com:9050",
  });
  // create a promise callback for async processing
  let callback;
  const promise = new Promise((resolve) => { callback = resolve; });
  // add a listener
  page.interceptions.addRequestListener('bing-images', async request => {
    // filter out everything else
    if (request.url !== 'https://www.bing.com/hp/api/model') {
      // just letting you know that we got something else here
      console.log('Ignoring', request.url.slice(0, 50));
      return;
    }
    // get response data
    const response = await request.getResponse();
    // grab data directly from their API response
    const apiData = response.json;
    // do anything you like
    const imageUrls = apiData.MediaContents.map(({ ImageContent }) =>
      `https://www.bing.com${ImageContent.Image.Url}`);
    // back to the async thread
    callback(imageUrls);
  }); // end of listener
  // go to our destination and wait for the response
  const [imageUrls] = await Promise.all([
    promise,
    page.goto('https://www.bing.com/'),
  ]);
  // print our image URLs
  console.log('imageUrls', imageUrls);
  // not necessary: clean up our listener
  page.interceptions.deleteLocalRequestListener('bing-images');
  // close the browser
  await browser.close();
})(); // end of our thread
```
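The Bing sample waits on a promise that resolves only if the expected API request is seen, and its header comment warns that this can break. Below is a minimal sketch of the same deferred pattern with a timeout, in plain Node.js. `createDeferred` and `withTimeout` are hypothetical helpers, not part of this package.

```javascript
// Hypothetical helpers (not package APIs): a deferred promise like the one in
// the sample, plus a timeout so the script cannot hang forever.
function createDeferred() {
  let resolve;
  let reject;
  const promise = new Promise((res, rej) => {
    resolve = res;
    reject = rej;
  });
  return { promise, resolve, reject };
}

function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`Timed out after ${ms} ms`)), ms);
  });
  // Clear the timer whichever promise settles first.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}
```

In the sample, the listener would call `deferred.resolve(imageUrls)`, and the main flow would await `withTimeout(deferred.promise, 30000)` instead of the bare promise (the 30-second value is an arbitrary choice).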
Troubleshooting
Cookies do not work
You are probably using an old version of Puppeteer. Try upgrading first.
If you don't want to upgrade, or cookies still don't work, enable enableLegacyCookieHandling.
Are CORS requests broken?
Yes, the implementation is still rough.
TODO:
- finalize CORS managed requests (need to pass the CORS test)
- add tests
  - plugin flow
- documentation
  - improve docs command
  - improve
  - describe wrapPage
  - describe
  - describe InterceptionProxyPlugin class
  - describe
- add more proxy API
  - waitRequest
- WebSocket support
- migrate to automation-extra-plugin
- support GREASE ciphers
License
Copyright © 2021 - 2023, Utyfua. Released under the MIT License.