@browserbasehq/mcp-stagehand

v0.5.1

Published

3 months ago

MCP server for AI web browser automation using Stagehand

paul-klein-browserbase

desaadi

anirudhkamath

Stagehand MCP Server

A Model Context Protocol (MCP) server that provides AI-powered web automation capabilities using Stagehand. This server enables LLMs to interact with web pages, perform actions, extract data, and observe possible actions in a real browser environment.

Get Started

Run npm install to install the dependencies, then run npm run build to generate dist/index.js.
Configure Claude Desktop to launch the server by adding the following entry to its configuration file (claude_desktop_config.json):

{
  "mcpServers": {
    "stagehand": {
      "command": "node",
      "args": ["path/to/mcp-server-browserbase/stagehand/dist/index.js"],
      "env": {
        "BROWSERBASE_API_KEY": "<YOUR_BROWSERBASE_API_KEY>",
        "BROWSERBASE_PROJECT_ID": "<YOUR_BROWSERBASE_PROJECT_ID>",
        "OPENAI_API_KEY": "<YOUR_OPENAI_API_KEY>"
      }
    }
  }
}

Restart the Claude Desktop app. The tools should then be listed when you click the 🔨 icon.
Start using the tools! The demo video shows Claude running a Google search for OpenAI using the Stagehand MCP server, with Browserbase providing the remote headless browser.
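
If you prefer to generate the configuration rather than edit it by hand, the entry from the setup steps above can be sketched in TypeScript. This is only an illustration: buildStagehandConfig is a hypothetical helper, not something this package exports, and the path and key values are the same placeholders used above. Serializing with JSON.stringify avoids hand-editing mistakes such as trailing commas, which strict JSON does not allow.

```typescript
// Shape of one entry under "mcpServers", mirroring the JSON shown above.
interface McpServerEntry {
  command: string;
  args: string[];
  env: Record<string, string>;
}

// Hypothetical helper (not part of this package): builds the "stagehand" entry.
function buildStagehandConfig(
  indexJsPath: string,
  keys: { browserbaseApiKey: string; browserbaseProjectId: string; openaiApiKey: string }
): { mcpServers: { stagehand: McpServerEntry } } {
  return {
    mcpServers: {
      stagehand: {
        command: "node",
        args: [indexJsPath],
        env: {
          BROWSERBASE_API_KEY: keys.browserbaseApiKey,
          BROWSERBASE_PROJECT_ID: keys.browserbaseProjectId,
          OPENAI_API_KEY: keys.openaiApiKey,
        },
      },
    },
  };
}

// Placeholder values; substitute your own path and credentials.
const config = buildStagehandConfig(
  "path/to/mcp-server-browserbase/stagehand/dist/index.js",
  {
    browserbaseApiKey: "<YOUR_BROWSERBASE_API_KEY>",
    browserbaseProjectId: "<YOUR_BROWSERBASE_PROJECT_ID>",
    openaiApiKey: "<YOUR_OPENAI_API_KEY>",
  }
);

// Pretty-printed JSON, ready to merge into the Claude Desktop configuration.
const configJson = JSON.stringify(config, null, 2);
console.log(configJson);
```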

Tools

Stagehand commands

stagehand_navigate
- Navigate to any URL in the browser
- Input:
  - url (string): The URL to navigate to
stagehand_act
- Perform an action on the web page
- Inputs:
  - action (string): The action to perform (e.g., "click the login button")
  - variables (object, optional): Variables used in the action template
stagehand_extract
- Extract data from the web page based on an instruction and schema
- Inputs:
  - instruction (string): Instruction for extraction (e.g., "extract the price of the item")
  - schema (object): JSON schema for the extracted data
stagehand_observe
- Observe actions that can be performed on the web page
- Input:
  - instruction (string, optional): Instruction for observation
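
To make the input shapes concrete, here is a rough sketch of the JSON-RPC tools/call requests an MCP client would send for each tool. The argument types are transcribed from the descriptions above; the toolCall helper, the example URL, and the example extraction schema are assumptions made for illustration, and in practice an MCP client such as Claude Desktop builds these messages for you.

```typescript
// Argument shapes transcribed from the tool descriptions above.
interface NavigateArgs { url: string }
interface ActArgs { action: string; variables?: Record<string, unknown> }
interface ExtractArgs { instruction: string; schema: Record<string, unknown> }
interface ObserveArgs { instruction?: string }

// Maps each tool name to its argument type.
type ToolArgs = {
  stagehand_navigate: NavigateArgs;
  stagehand_act: ActArgs;
  stagehand_extract: ExtractArgs;
  stagehand_observe: ObserveArgs;
};

// An MCP "tools/call" request in JSON-RPC 2.0 form.
interface ToolCallRequest<N extends keyof ToolArgs> {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: N; arguments: ToolArgs[N] };
}

let nextId = 1;

// Hypothetical helper: wraps a tool name and its arguments in a request.
function toolCall<N extends keyof ToolArgs>(name: N, args: ToolArgs[N]): ToolCallRequest<N> {
  return { jsonrpc: "2.0", id: nextId++, method: "tools/call", params: { name, arguments: args } };
}

const requests = [
  toolCall("stagehand_navigate", { url: "https://example.com" }),
  toolCall("stagehand_act", { action: "click the login button" }),
  toolCall("stagehand_extract", {
    instruction: "extract the price of the item",
    // Example JSON schema; the field name "price" is illustrative only.
    schema: { type: "object", properties: { price: { type: "string" } }, required: ["price"] },
  }),
  // instruction is optional for stagehand_observe, so empty arguments are valid.
  toolCall("stagehand_observe", {}),
];

for (const request of requests) {
  console.log(JSON.stringify(request));
}
```

Typing the arguments per tool name means a missing url or schema is caught at compile time rather than surfacing as a tool error.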

Resources

The server provides access to two types of resources:

Console Logs (console://logs)
- Browser console output in text format
- Includes all console messages from the browser
Screenshots (screenshot://<name>)
- PNG images of captured screenshots
- Accessible via the screenshot name specified during capture
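
As a sketch of how a client addresses these resources, the snippet below builds MCP resources/read requests for both URI schemes. The helper names and the screenshot name "homepage" are hypothetical, and the snippet assumes the screenshot name is placed in the URI unchanged. Under the MCP specification, text resources are returned in a text field and binary resources as a base64-encoded blob, so the console logs would be expected as text and the PNG screenshots as blobs.

```typescript
// URI of the console log resource, as listed above.
const CONSOLE_LOGS_URI = "console://logs";

// Builds the URI for a named screenshot (assumes the name is used as-is).
function screenshotUri(name: string): string {
  return `screenshot://${name}`;
}

// Hypothetical helper: wraps a URI in a JSON-RPC "resources/read" request.
function readResource(id: number, uri: string) {
  return {
    jsonrpc: "2.0" as const,
    id,
    method: "resources/read" as const,
    params: { uri },
  };
}

const logsRequest = readResource(1, CONSOLE_LOGS_URI);
// "homepage" is an illustrative screenshot name, not one defined by the server.
const shotRequest = readResource(2, screenshotUri("homepage"));

console.log(JSON.stringify(logsRequest));
console.log(JSON.stringify(shotRequest));
```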

Key Features

- AI-powered web automation
- Perform actions on web pages
- Extract structured data from web pages
- Observe possible actions on web pages
- Simple and extensible API
- Model-agnostic support for various LLM providers

License

Licensed under the MIT License.
