@browserless.io/browserless
v2.23.0
The browserless platform
[!NOTE]
Looking for v1.x.x of browserless? You can find it here, although we recommend migrating to v2.
Browserless allows remote clients to connect and execute headless work, all inside of Docker. It supports the standard, unforked Puppeteer and Playwright libraries, as well as offering REST-based APIs for common actions like data collection, PDF generation and more.
We take care of common issues such as missing system fonts and missing external libraries, add performance improvements, and handle edge cases like downloading files and managing sessions. For details, check out the documentation site built into the project, which includes OpenAPI docs.
If you've been struggling to deploy headless browsers without running into issues or bloated resource requirements, then Browserless was built for you. Run the browsers in our cloud or your own, free for non-commercial uses.
Table of Contents
- External links
- Features
- How it works
- Extending (NodeJS SDK)
- Debugger
- Usage with other libraries
- Motivations
- Licensing
External links
Features
General
- Parallelism and request-queueing are built-in + configurable.
- Fonts and emojis work out of the box.
- Debug Viewer for actively viewing/debugging running sessions.
- An interactive Puppeteer debugger, so you can see what the headless browser is doing and use its DevTools.
- Works with unforked Puppeteer and Playwright.
- Configurable session timers and health-checks to keep things running smoothly.
- Error tolerant: if Chrome dies, Browserless won't.
- Support for running and developing on Apple's M1 machines.
Cloud-only
Our cloud accounts include all the general features plus extras, such as:
- Inbuilt residential proxy
- /unblock API for avoiding detectors
- Automated captcha solving for getting past mandatory checks
- Hybrid automations for streaming login windows during scripts
- /reconnect API for keeping browsers alive for reuse
- REST APIs for tasks such as retrieving HTML, PDFs or Lighthouse metrics
- SSO, tokens and user roles
How it works
Browserless listens both for incoming websocket requests, which is how most libraries connect, and for requests to pre-built REST APIs that handle common tasks (PDF generation, images and so on). When a websocket connects to Browserless, it starts Chrome and proxies your request into it. Once the session is done, it closes Chrome and waits for more connections. Some libraries use Chrome's HTTP endpoints, like /json, to inspect debuggable targets, which Browserless also supports.
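As a rough illustration of the /json endpoint mentioned above, the following sketch lists the debuggable targets a running instance reports. It assumes Browserless is reachable at the given base URL and that the response has the same shape Chrome itself returns from /json (an array of objects with fields such as `id`, `type`, `url` and `webSocketDebuggerUrl`). The `listTargets` helper is hypothetical and not part of Browserless; a secured deployment may also require a token, which is omitted here.

```javascript
// Hypothetical helper: list debuggable targets via the Chrome-style /json endpoint.
// Assumes Node 18+ (global fetch) or a fetch implementation passed as fetchImpl.
async function listTargets(baseUrl, fetchImpl = fetch) {
  const url = new URL('/json', baseUrl);
  const response = await fetchImpl(url);
  if (!response.ok) {
    throw new Error(`GET ${url} failed with status ${response.status}`);
  }
  const targets = await response.json();
  // Keep only the fields a client typically needs to attach to a target.
  return targets.map((target) => ({
    id: target.id,
    type: target.type,
    url: target.url,
    webSocketDebuggerUrl: target.webSocketDebuggerUrl,
  }));
}

// Example usage against a local container (not executed here):
// listTargets('http://localhost:3000').then(console.log);
```

Passing the fetch implementation as a parameter keeps the helper easy to exercise without a running container.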
You still execute the script itself, which gives you total control over which library you use and when you upgrade it. This also has the benefit of keeping your code proprietary and able to run on numerous platforms. We simply take care of all the browser aspects and offer a management layer on top of the browser.
Docker
[!TIP] See more options on our full documentation site.
docker run -p 3000:3000 ghcr.io/browserless/chromium
- Visit http://localhost:3000/docs to see the documentation site.
- See more at our docker package.
Hosting Providers
We offer a first-class hosted product located here. Alternatively you can host this image on just about any major platform that offers hosting for docker. Our hosted service takes care of all the machine provisioning, notifications, dashboards and monitoring plus more:
- Easily upgrade and toggle between versions at the press of a button. No managing repositories and other code artifacts.
- Never need to update or pull anything from docker. There's literally zero software to install to get started.
- Scale your consumption up or down with different plans. We support up to thousands of concurrent sessions at a given time.
If you're interested in using this image for commercial purposes, please read the licensing section below.
Puppeteer
Puppeteer allows you to specify a remote location for Chrome via the browserWSEndpoint option. Setting this for Browserless is a single line of code change.
Before
const browser = await puppeteer.launch();
After
const browser = await puppeteer.connect({
browserWSEndpoint: 'ws://localhost:3000',
});
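If the endpoint comes from configuration rather than a hard-coded string, a small helper can normalize it before it is handed to `puppeteer.connect`. The sketch below is illustrative only: `buildWsEndpoint` is a hypothetical name, and passing the token as a `token` query parameter is an assumption based on the debugger URL format shown later in this README.

```javascript
// Hypothetical helper: build a browserWSEndpoint value from a base URL.
// Assumption: when a token is required, it is sent as a `token` query parameter.
function buildWsEndpoint(baseUrl, { token } = {}) {
  // Accept http(s) URLs as well and convert them to the websocket scheme.
  const url = new URL(baseUrl.replace(/^http(s?):/i, 'ws$1:'));
  if (url.protocol !== 'ws:' && url.protocol !== 'wss:') {
    throw new Error(`Unsupported protocol for a websocket endpoint: ${url.protocol}`);
  }
  if (token) {
    url.searchParams.set('token', token);
  }
  return url.toString();
}

// Example usage with Puppeteer (not executed here):
// const browser = await puppeteer.connect({
//   browserWSEndpoint: buildWsEndpoint('http://localhost:3000', { token: 'YOUR_TOKEN' }),
// });
```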
Playwright
We support Playwright out of the box via its browsers' remote connection interface. Just make sure that your Docker image, Playwright browser type and endpoint match:
Before
import pw from "playwright";
const browser = await pw.firefox.launch();
After
docker run -p 3000:3000 ghcr.io/browserless/firefox
# or ghcr.io/browserless/multi for all the browsers
import pw from "playwright-core";
const browser = await pw.firefox.connect(
'ws://localhost:3000/firefox/playwright',
);
After that, the rest of your code remains the same with no other changes required.
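One way to keep the Docker image, browser type and endpoint in sync is to derive all three from a single lookup. The sketch below is a hypothetical helper, not part of Browserless. The Firefox entry mirrors the example above; the Chromium path is an assumption that follows the same `/<browser>/playwright` pattern, so check it against the documentation before relying on it.

```javascript
// Hypothetical lookup from Playwright browser type to Docker image and endpoint path.
// The firefox entry comes from the example above; the chromium path is assumed by analogy.
const PLAYWRIGHT_TARGETS = new Map([
  ['firefox', { image: 'ghcr.io/browserless/firefox', path: '/firefox/playwright' }],
  ['chromium', { image: 'ghcr.io/browserless/chromium', path: '/chromium/playwright' }],
]);

function playwrightTarget(browserType, baseUrl = 'ws://localhost:3000') {
  const target = PLAYWRIGHT_TARGETS.get(browserType);
  if (!target) {
    throw new Error(`No known Browserless endpoint for browser type "${browserType}"`);
  }
  return {
    image: target.image,
    wsEndpoint: new URL(target.path, baseUrl).toString(),
  };
}

// Example usage with Playwright (not executed here):
// const { wsEndpoint } = playwrightTarget('firefox');
// const browser = await pw.firefox.connect(wsEndpoint);
```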
Extending (NodeJS SDK)
Browserless comes with built-in extension capabilities, and allows for extending nearly any aspect of the system (for Version 2+). For more details on how to write your own routes, build docker images, and more, see our SDK README.md or simply run "npx @browserless.io/browserless create" in a terminal and follow the onscreen prompts.
Debugger
You can install a first-party interactive debugger for Browserless that makes writing scripts faster and more interactive. You can take advantage of things like debugger; calls and the page's console output to see what's happening on the page while your script is running. All of the Chrome DevTools are at your disposal.
A small list of features includes:
- Running debugger; and console.log calls
- Errors in the script are caught and show up in the console tab
- DOM inspection, watch network requests, and even see how the page is rendering
- Exporting your debugging script as a Node project
- Everything included in Chrome DevTools
Install debugger
Installing the debugger is as simple as running the install:debugger script after the project has been built. This way:
$ npm run build
$ npm run install:debugger # or npm run install:dev
You will then see the debugger URL during the startup process.
---------------------------------------------------------
| browserless.io
| To read documentation and more, load in your browser:
|
| OpenAPI: http://localhost:3000/docs
| Full Documentation: https://docs.browserless.io/
| Debugger: http://localhost:3000/debugger/?token=6R0W53R135510
---------------------------------------------------------
Usage with other libraries
Most libraries allow you to specify a remote instance of Chrome to interact with. They are looking for either a websocket endpoint, a host and port, or some other address. Browserless supports these by default. If you're having issues, please open an issue in this project and we'll try to work with the library authors to get them integrated with Browserless. Please note that in V2 we no longer support Selenium or WebDriver integrations.
You can find a much larger list of supported libraries on our documentation site.
Motivations
Running Chrome on Lambda or on your own is a fantastic idea, but in practice it is quite challenging in production. You're met with pretty tough cloud limits, possibly building Chrome yourself, and then dealing with odd invocation issues should everything else go OK. A lot of issues in various repositories come down to the challenges of getting Chrome running smoothly in AWS (see here). You can see for yourself by going to nearly any library and sorting issues by most commented.
Getting Chrome running well in Docker is also a challenge, as there are quite a few packages you need in order to get Chrome running. Once that's done, there are still missing fonts, getting libraries to work with it, and limitations on service reliability. This is also ignoring CVEs, access controls, and scaling strategies.
All of these issues prompted us to build a first-class image and workflow for interacting with Chrome in a more streamlined way. With Browserless you never have to worry about fonts, extra packages, library support, security, or anything else. It just works reliably like any other modern web service. On top of that it comes with a prescribed approach on how you interact with Chrome, which is through socket connections (similar to a database or any other external appliance). What this means is that you get the ability to drive Chrome remotely without having to do updates/releases to the thing that runs Chrome since it's divorced from your application.
Licensing
SPDX-License-Identifier: SSPL-1.0 OR Browserless Commercial License.
If you want to use Browserless to build commercial sites, applications, or in a continuous-integration system that's closed-source then you'll need to purchase a commercial license. This allows you to keep your software proprietary whilst still using browserless. You can purchase a commercial license here. A commercial license grants you:
- Priority support on issues and features.
- On-premise running as well as running on public cloud providers for commercial/CI purposes for proprietary systems.
- Ability to modify the source (forking) for your own purposes.
- A new admin user-interface.
Not only does it grant you a license to run such a critical piece of infrastructure, but you are also supporting further innovation in this space and our ability to contribute to it.
If you are creating an open source application under a license compatible with the Server Side Public License 1.0, you may use Browserless under those terms.