webmarker-js

v0.5.0

A library for marking web pages for Set-of-Mark prompting with vision-language models.

Overview

WebMarker adds visual markings with labels to elements on a web page. This can be used for Set-of-Mark prompting, which improves visual grounding abilities of vision-language models such as GPT-4o, Claude 3.5, and Google Gemini 1.5.

Screenshot of marked Google homepage

How it works

1. Call the mark() function

This marks the interactive elements on the page and returns an object that maps each mark label (a string) to an object with the following properties:

  • element: The interactive element that was marked.
  • markElement: The label element that was added to the page.
  • boundingBoxElement: The bounding box element that was added to the page.

You can use this information to build your prompt for the vision-language model.
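As a minimal sketch of how the returned object could feed a prompt, the helper below turns it into one descriptive line per label. The name describeMarkedElements is hypothetical and not part of WebMarker; it relies only on the element property listed above plus the standard DOM tagName and textContent fields. It has to run in the page context, because the values are live DOM elements.

```javascript
// Hypothetical helper (not part of WebMarker): builds one prompt line per
// mark label from the object returned by mark().
function describeMarkedElements(markedElements) {
  return Object.entries(markedElements)
    .map(([label, { element }]) => {
      const tag = element.tagName.toLowerCase();
      // Collapse whitespace and cap the length so the prompt stays compact.
      const text = (element.textContent || "")
        .replace(/\s+/g, " ")
        .trim()
        .slice(0, 40);
      return text ? `- ${label}: <${tag}> "${text}"` : `- ${label}: <${tag}>`;
    })
    .join("\n");
}
```

Including the tag name and visible text alongside each label is a design choice rather than a requirement; the labels alone are enough for Set-of-Mark prompting.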

2. Send a screenshot of the marked page to a vision-language model, along with your prompt

Example prompt:

let markedElements = mark();

let prompt = `The following is a screenshot of a web page.

Interactive elements have been marked with red bounding boxes and labels.

When referring to elements, use the labels to identify them.

Return an action and element to perform the action on.

Available actions: click, hover

Available elements:
${Object.keys(markedElements)
  .map((label) => `- ${label}`)
  .join("\n")}

Example response: click 0
`;

3. Programmatically interact with the marked elements

In a web browser (e.g. via Playwright), interact with the marked elements as needed.

For prompting or agent ideas, see the WebVoyager paper.
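To connect the model's reply to an interaction, something has to translate a response such as "click 0" into an action and a selector. The sketch below is one way to do that, assuming the reply follows the "action label" shape from the example prompt and that the default markAttribute ("data-mark-label") is in use. Both function names are hypothetical and not part of WebMarker.

```javascript
// Actions offered to the model in the example prompt.
const ALLOWED_ACTIONS = ["click", "hover"];

// Hypothetical helper: parses a reply such as "click 0" into
// { action, label }, or returns null if the reply is not usable.
function parseModelAction(response, knownLabels) {
  const match = response.trim().match(/^(\w+)\s+(.+)$/);
  if (!match) return null;
  const action = match[1].toLowerCase();
  const label = match[2].trim();
  // Reject unknown actions and labels that were never placed on the page.
  if (!ALLOWED_ACTIONS.includes(action) || !knownLabels.includes(label)) {
    return null;
  }
  return { action, label };
}

// Hypothetical helper: builds a CSS attribute selector for a mark label.
function selectorForLabel(label, markAttribute = "data-mark-label") {
  // Escape backslashes and double quotes so the label is a valid CSS string.
  const escaped = label.replace(/["\\]/g, "\\$&");
  return `[${markAttribute}="${escaped}"]`;
}
```

With Playwright, the result could then be applied as `await page.locator(selectorForLabel(result.label))[result.action]()`, since locators expose both click() and hover(). Validating against the known labels guards against the model naming a label that does not exist.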

Playwright example

// Inject the WebMarker library into the page
await page.addScriptTag({
  url: "https://cdn.jsdelivr.net/npm/webmarker-js/dist/main.js",
});

// Mark the page and get the marked elements
let markedElements = await page.evaluate(async () => await WebMarker.mark());

// Click a marked element
await page.locator('[data-mark-label="0"]').click();

// (Optional) Check if page is marked
let isMarked = await page.evaluate(async () => await WebMarker.isMarked());

// (Optional) Unmark the page
await page.evaluate(async () => await WebMarker.unmark());

Options

selector

A custom CSS selector to specify which elements to mark.

  • Type: string
  • Default: "button, input, a, select, textarea"

getLabel

Provide a function for generating labels. By default, labels are generated as integers starting from 0.

  • Type: (element: Element, index: number) => string
  • Default: (_, index) => index.toString()
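As an illustration of the getLabel signature, the sketch below generates alphabetic labels (A, B, ..., Z, AA, AB, ...) instead of integers. The function name alphabeticLabel is hypothetical; only the (element, index) => string shape comes from the option described above.

```javascript
// Example getLabel implementation: maps 0 -> "A", 25 -> "Z", 26 -> "AA", ...
// The element argument is unused here, hence the leading underscore.
function alphabeticLabel(_element, index) {
  let label = "";
  let n = index;
  do {
    // Prepend the letter for the current base-26 digit.
    label = String.fromCharCode(65 + (n % 26)) + label;
    // Move to the next digit; the -1 makes "AA" follow "Z" (no zero digit).
    n = Math.floor(n / 26) - 1;
  } while (n >= 0);
  return label;
}
```

It would be passed as `mark({ getLabel: alphabeticLabel })`. Letter labels may be easier for a model to tell apart from numbers that already appear on the page, though that is a judgment call rather than something the library prescribes.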

markAttribute

A custom attribute to add to the marked elements. This attribute contains the label of the mark.

  • Type: string
  • Default: "data-mark-label"

markPlacement

The placement of the mark relative to the element.

  • Type: 'top' | 'top-start' | 'top-end' | 'right' | 'right-start' | 'right-end' | 'bottom' | 'bottom-start' | 'bottom-end' | 'left' | 'left-start' | 'left-end'
  • Default: 'top-start'

markStyle

A CSS style to apply to the label element. You can also specify a function that returns a CSS style object.

  • Type: Partial<CSSStyleDeclaration> | (element: Element) => Partial<CSSStyleDeclaration>
  • Default: {backgroundColor: "red", color: "white", padding: "2px 4px", fontSize: "12px", fontWeight: "bold"}
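The function form of markStyle can vary the label style per element. The sketch below colors link labels blue and everything else red. The function name markStyleByTag is hypothetical. It returns every property from the default style, on the assumption (not confirmed by this README) that a custom style may replace the defaults rather than merge with them.

```javascript
// Hypothetical style function for the markStyle option: links get a blue
// label, all other elements keep the default red.
function markStyleByTag(element) {
  const isLink = element.tagName.toLowerCase() === "a";
  return {
    backgroundColor: isLink ? "blue" : "red",
    color: "white",
    padding: "2px 4px",
    fontSize: "12px",
    fontWeight: "bold",
  };
}
```

It would be passed as `mark({ markStyle: markStyleByTag })`.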

boundingBoxStyle

A CSS style to apply to the bounding box element. You can also specify a function that returns a CSS style object. Bounding boxes are only shown if showBoundingBoxes is true.

  • Type: Partial<CSSStyleDeclaration> | (element: Element) => Partial<CSSStyleDeclaration>
  • Default: {outline: "2px dashed red", backgroundColor: "transparent"}

showBoundingBoxes

Whether or not to show bounding boxes around the elements.

  • Type: boolean
  • Default: true

containerElement

Provide a container element to query the elements to be marked. By default, the container element is document.body.

  • Type: Element
  • Default: document.body

viewPortOnly

Only mark elements that are visible in the current viewport.

  • Type: boolean
  • Default: false
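For intuition about what "visible in the current viewport" can mean, the sketch below shows one common overlap test based on an element's bounding rectangle. This is an illustration only. WebMarker's actual check is not documented here and may differ, for example by requiring full rather than partial visibility. The function name intersectsViewport is hypothetical.

```javascript
// Illustrative only: returns true if a rectangle (in viewport coordinates,
// as returned by getBoundingClientRect) overlaps the viewport at all.
function intersectsViewport(rect, viewportWidth, viewportHeight) {
  return (
    rect.bottom > 0 &&
    rect.right > 0 &&
    rect.top < viewportHeight &&
    rect.left < viewportWidth
  );
}
```

In a browser it could be called as `intersectsViewport(el.getBoundingClientRect(), window.innerWidth, window.innerHeight)`.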

Advanced example

const markedElements = mark({
  // Only mark buttons and inputs
  selector: "button, input",
  // Use test id attribute for marker labels
  markAttribute: "data-test-id",
  // Use a blue mark with white text
  markStyle: { color: "white", backgroundColor: "blue", padding: "5px" },
  // Use a blue dashed outline with a transparent and slightly blue background
  boundingBoxStyle: { outline: "2px dashed blue", backgroundColor: "rgba(0, 0, 255, 0.1)" },
  // Place the mark at the top right corner of the element
  markPlacement: "top-end",
  // Show bounding boxes over elements (defaults to true)
  showBoundingBoxes: true,
  // Generate labels as 'Element 0', 'Element 1', 'Element 2'...
  // Defaults to '0', '1', '2'... if not provided.
  getLabel: (element, index) => `Element ${index}`,
  // A custom container element to query the elements to be marked.
  // Defaults to the document.body.
  containerElement: document.body.querySelector("main"),
  // Only mark elements that are visible in the current viewport
  viewPortOnly: true,
});