navaiguide-ts

v0.0.3

NavAIGuide: Extensible components toolkit for integrating LLM Web Agents and Browser Companions, enhancing web interfaces with intelligent AI capabilities.

🤖 NavAIGuide-TS

🤔 What is NavAIGuide?

NavAIGuide is an extensible TypeScript component toolkit for integrating LLMs into navigation agents and browser companions. Key features include:

  • Natural Language Task Detection: Supports both visual (using GPT-4V) and textual modes to identify tasks from web pages.
  • Automation Code Generation: Automates the creation of code for predicted tasks with options for Playwright (requires Node) or native JavaScript Browser APIs.
  • Visual Grounding: Enhances the accuracy of locating visual elements on web pages for better interaction.
  • Efficient DOM Processing and Token Reduction: Utilizes advanced strategies for DOM element management, significantly reducing the number of tokens required for accurate grounding and action detection.
  • Reliability: Includes a retry mechanism with exponential backoff to handle transient failures in LLM calls.
  • JSON Mode & Action-based Framework: Utilizes JSON mode and reproducible outputs for predictable outcomes and an action-oriented approach for task execution.

NavAIGuide Agents extend the core toolkit with advanced automation solutions:

  • Preview of Playwright-based Agents: Initial offerings for browser automation.
  • Cross-platform Appium Support: Future updates will introduce compatibility with Appium for broader device coverage.

NavAIGuide aims to streamline the development process for web navigation assistants, offering a comprehensive suite of tools for developers to leverage LLMs in web automation efficiently.
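
As an illustration of the reliability feature listed above, here is a minimal sketch of a retry wrapper with exponential backoff. The names `backoffDelayMs` and `withRetry`, as well as the default delays, are assumptions made for this example and are not taken from the navaiguide-ts API.

```typescript
// Hypothetical helpers for illustration; not part of the navaiguide-ts API.

/** Delay before retry number `attempt` (0-based): baseMs * 2^attempt, capped at maxMs. */
function backoffDelayMs(attempt: number, baseMs = 500, maxMs = 8000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

/**
 * Calls `fn` until it resolves, retrying up to `maxRetries` times.
 * The wait between attempts doubles each time (exponential backoff).
 * If every attempt fails, the last error is rethrown.
 */
async function withRetry<T>(
  fn: () => Promise<T>,
  maxRetries = 3,
  baseMs = 500,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;
      // Only wait if another attempt will follow.
      if (attempt < maxRetries) {
        await sleep(backoffDelayMs(attempt, baseMs));
      }
    }
  }
  throw lastError;
}
```

Any call that may fail transiently, such as an LLM request, could be wrapped as `withRetry(() => callModel())`, where `callModel` stands for your own function.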

⚡️ Quick Install

You can install NavAIGuide with npm, Yarn, or pnpm.

npm:

  npm install navaiguide-ts
  # With Playwright:
  npm install --save-dev "@playwright/test"
  npx playwright install

Yarn:

  yarn add navaiguide-ts
  # With Playwright:
  yarn add --dev "@playwright/test"
  npx playwright install

💻 Getting Started

Prerequisites

  • Node.js
  • Access to OpenAI or AzureAI services
  • Playwright for automation capabilities

OpenAI & AzureAI Key Configuration

Configure the necessary environment variables, for example locally through a .env.local file (requires dotenv):

  • OPENAI_API_KEY: Your OpenAI API key.
  • Azure AI API keys and related configurations. Because model availability varies by region, more than one Azure AI project deployment might be required.
    • AZURE_AI_API_GPT4TURBOVISION_DEPLOYMENT_NAME: Deployment of GPT-4 Turbo with Vision.
    • AZURE_AI_API_GPT35TURBO_DEPLOYMENT_NAME: Deployment of GPT-3.5 Turbo with JSON mode.
    • AZURE_AI_API_GPT35TURBO16K_DEPLOYMENT_NAME: Deployment of GPT-3.5 with 16k max request tokens.
    • AZURE_AI_API_GPT4TURBOVISION_KEY: API key for GPT-4 Turbo with Vision.
    • AZURE_AI_API_GPT35TURBO_KEY: API key for GPT-3.5 Turbo with JSON mode and GPT-3.5 with 16k.
    • AZURE_AI_API_GPT35TURBO_INSTANCE_NAME: Instance name for GPT-3.5 Turbo with JSON mode and GPT-3.5 with 16k.
    • AZURE_AI_API_GPT4TURBOVISION_INSTANCE_NAME: Instance name for GPT-4 Turbo with Vision.

You can also explicitly provide the variables as part of the constructor of the NavAIGuide class.
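
Before constructing the class, it can help to fail fast when a variable is missing. The sketch below checks an environment object against the variable names listed above. The helper `missingEnvVars` and the two constant arrays are hypothetical and exist only for this example.

```typescript
// Variable names come from the list above; the helper itself is hypothetical.
const OPENAI_VARS = ["OPENAI_API_KEY"] as const;

const AZURE_VARS = [
  "AZURE_AI_API_GPT4TURBOVISION_DEPLOYMENT_NAME",
  "AZURE_AI_API_GPT35TURBO_DEPLOYMENT_NAME",
  "AZURE_AI_API_GPT35TURBO16K_DEPLOYMENT_NAME",
  "AZURE_AI_API_GPT4TURBOVISION_KEY",
  "AZURE_AI_API_GPT35TURBO_KEY",
  "AZURE_AI_API_GPT35TURBO_INSTANCE_NAME",
  "AZURE_AI_API_GPT4TURBOVISION_INSTANCE_NAME",
] as const;

type Env = Record<string, string | undefined>;

/** Returns the names in `required` that are unset or blank in `env`. */
function missingEnvVars(env: Env, required: readonly string[]): string[] {
  return required.filter((name) => (env[name] ?? "").trim() === "");
}

// Example usage in Node.js, after loading .env.local with dotenv:
//   const missing = missingEnvVars(process.env, OPENAI_VARS);
//   if (missing.length > 0) throw new Error(`Missing: ${missing.join(", ")}`);
```

With dotenv, the file can be loaded first through `config({ path: ".env.local" })`, so that `process.env` is populated before the check runs.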

NavAIGuide Agent

The NavAIGuideAgent base class orchestrates the process of performing and reasoning about actions on a web page towards achieving a specified end goal.

Example Playwright Agent scenario:

import { Page } from "@playwright/test";
import { PlaywrightAgent } from "navaiguide-ts";

// playwrightPage is an existing Playwright Page, e.g. the `page` fixture of a test.
const navAIGuideAgent = new PlaywrightAgent({
  page: playwrightPage,
  openAIApiKey: "API_KEY", // if not provided as process.env
});
const findResearchPaperQuery = "Help me view the research paper titled 'Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V' and download its pdf.";

const results = await navAIGuideAgent.runAsync({
  query: findResearchPaperQuery
});

for (const result of results) {
  console.log(result);
}

NavAIGuide Core Functionalities

import { NavAIGuide } from "navaiguide-ts";
let navAIGuide: NavAIGuide = new NavAIGuide({
    openAIApiKey: "API_KEY", // if not provided as process.env
});

Some of the queries NavAIGuide is able to handle today:

const findResearchPaperQuery = 
  "Help me find the research paper titled 'Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V'.";

const candleLightsTicketQuery =
  "Help me find tickets for the earliest candle lights concert in Dublin";

const dundrumCinemaQuery =
  "What movies are playing in Dundrum Dublin cinema today?";

NavAIGuide's underlying process is divided into distinct steps:

Start Task Identification: The Agent determines the starting point based on the nature of the query. It could be a general search engine, specialized services, or a specific URL.

const startTask = await navAIGuide.classifyStartTask({
  endGoal: findResearchPaperQuery,
});

await playwrightPage.goto(startTask.startPage);

HTML DOM and Screenshot Processing: The page's HTML DOM is analyzed, condensed and chunked. Screenshots are also taken if running in Visual mode.

const inputPage = await NavAIGuidePage.fromPlaywrightAsync({ playwrightPage });
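
To make the condense-and-chunk idea concrete, here is a minimal sketch that works on a raw HTML string. It is a simple regex-based illustration, not the strategy NavAIGuide uses internally. The function names are hypothetical, and character counts serve only as a rough proxy for tokens.

```typescript
// Hypothetical helpers for illustration; not part of the navaiguide-ts API.

/** Drops comments and non-content elements, then collapses whitespace. */
function condenseHtml(html: string): string {
  return html
    .replace(/<!--[\s\S]*?-->/g, "") // HTML comments
    .replace(/<(script|style|noscript|svg)\b[\s\S]*?<\/\1\s*>/gi, "") // non-content elements
    .replace(/\s+/g, " ") // runs of whitespace become one space
    .replace(/>\s+</g, "><") // whitespace between tags
    .trim();
}

/**
 * Splits `html` into chunks of at most `maxChars` characters,
 * preferring to cut right after a closing '>' so tags stay intact.
 */
function chunkHtml(html: string, maxChars: number): string[] {
  if (maxChars <= 0) throw new RangeError("maxChars must be positive");
  const chunks: string[] = [];
  let start = 0;
  while (start < html.length) {
    let end = Math.min(start + maxChars, html.length);
    if (end < html.length) {
      const lastTagEnd = html.lastIndexOf(">", end - 1);
      // Fall back to a hard cut when no tag boundary lies inside the window.
      if (lastTagEnd >= start) end = lastTagEnd + 1;
    }
    chunks.push(html.slice(start, end));
    start = end;
  }
  return chunks;
}
```

Each chunk can then be sent to the model separately. A production implementation would more likely walk the parsed DOM than rely on regular expressions.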

Action Prediction: Depending on the specified end goal and past actions, the Agent predicts the next action using either:

  • Textual Analysis: (Faster, less reliable) Grounds the website structure from text and predicts actions based on it.
  • Visual Analysis: (Slower, more reliable) Employs GPT-4V for processing screenshots, focusing on visual elements to guide actions.

const previousActions: NLAction[] = [ /* any previous NL actions as history */ ];
const nextAction = await navAIGuide.predictNextNLAction({
  page: inputPage,
  endGoal: findResearchPaperQuery,
  previousActions: previousActions,
  mode: "visual", // or "textual"
});

Code Inference and Automation: Converts the predicted natural language action into automation code (Playwright, JS Browser APIs), including retry patterns for feedback on unsuccessful attempts.

const codeActionResult = await navAIGuide.runCodeActionWithRetry({
  inputPage: inputPage,
  endGoal: findResearchPaperQuery,
  nextAction: nextAction,
  maxRetries: 3,
  // Your logic for injecting the code into the page goes here.
  codeEvalFunc: async (code) => await tryAsyncEval({ page: playwrightPage }, code),
});
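
The retry-with-feedback pattern mentioned above could look roughly like the following sketch, in which each failed attempt is handed back to the code generator so the next attempt can take the error into account. The `CodeAttempt` type and `runWithFeedback` function are hypothetical and do not describe how `runCodeActionWithRetry` is implemented.

```typescript
// Hypothetical types and helper for illustration; not the navaiguide-ts API.
interface CodeAttempt {
  code: string;
  success: boolean;
  error?: string;
}

/**
 * Generates code, evaluates it, and on failure retries up to `maxRetries`
 * times, passing all earlier failed attempts back to the generator.
 * Returns every attempt made; the last one succeeded if any did.
 */
async function runWithFeedback(
  generateCode: (previousFailures: CodeAttempt[]) => Promise<string>,
  evaluateCode: (code: string) => Promise<void>,
  maxRetries: number,
): Promise<CodeAttempt[]> {
  const attempts: CodeAttempt[] = [];
  for (let i = 0; i <= maxRetries; i++) {
    // Pass a copy so the generator cannot mutate the attempt log.
    const code = await generateCode([...attempts]);
    try {
      await evaluateCode(code);
      attempts.push({ code, success: true });
      return attempts;
    } catch (error) {
      const message = error instanceof Error ? error.message : String(error);
      attempts.push({ code, success: false, error: message });
    }
  }
  return attempts;
}
```

In an agent, `generateCode` would wrap the LLM call and `evaluateCode` would inject the code into the page, for example through Playwright.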

Reasoning Steps: Two reasoning steps can optionally be performed to improve the reliability of NavAIGuide-based agents:

  • Reasoning whether the action had the expected result by assessing any page state changes.

const actionFeedbackReasoningResult =
  await this.navAIGuide.RunActionFeedbackReasoning({
    beforePage: currentNavAIGuidePage,
    afterPage: nextNavAIGuidePage,
    takenAction: nextAction,
  });

if (
  !actionFeedbackReasoningResult ||
  !actionFeedbackReasoningResult.actionSuccess
) {
  // Optional chaining guards against a missing reasoning result.
  console.log(`The action did not have the expected results: ${actionFeedbackReasoningResult?.pageStateChanges}.`);
  nextAction.actionSuccess = false;
  continue; // inside the agent loop: move on to the next iteration
}
console.log(`The action had the expected results: ${actionFeedbackReasoningResult.pageStateChanges}.`);

  • Reasoning whether the end goal has been achieved and retrieving any relevant information.

const { endGoalMet, relevantData } = await this.navAIGuide.RunGoalCheckReasoning({
  page: nextNavAIGuidePage,
  endGoal: findResearchPaperQuery,
  newInformation: actionFeedbackReasoningResult.newInformation,
});

if (endGoalMet) {
  console.log(`Goal was met. Found relevant data:`);
  for (const data of relevantData) {
    if (data[0] && data[1]) {
      console.log(`${data[0]} - ${data[1]}`);
    }
  }
  return relevantData;
}
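
Taken together, the steps above form a loop. The sketch below shows one possible shape for it, with each stage injected as a function so the control flow can be read and tested in isolation. The `AgentSteps` interface, its method names, and `runAgentLoop` are assumptions for this example and only mirror the stages described above.

```typescript
// Hypothetical step signatures for illustration; not the navaiguide-ts API.
interface HistoryEntry<A> {
  action: A;
  actionSuccess: boolean;
}

interface AgentSteps<P, A, D> {
  capturePage(): Promise<P>;
  predictNextAction(page: P, history: HistoryEntry<A>[]): Promise<A>;
  runAction(page: P, action: A): Promise<void>;
  actionSucceeded(before: P, after: P, action: A): Promise<boolean>;
  checkGoal(page: P): Promise<{ endGoalMet: boolean; relevantData: D[] }>;
}

/**
 * Repeats predict, act, and reason until the goal is met or `maxSteps`
 * is reached. Returns the relevant data, or undefined if the goal was not met.
 */
async function runAgentLoop<P, A, D>(
  steps: AgentSteps<P, A, D>,
  maxSteps: number,
): Promise<D[] | undefined> {
  const history: HistoryEntry<A>[] = [];
  for (let step = 0; step < maxSteps; step++) {
    const before = await steps.capturePage();
    const action = await steps.predictNextAction(before, history);
    await steps.runAction(before, action);
    const after = await steps.capturePage();

    // Record the outcome so the next prediction can see failed actions.
    const actionSuccess = await steps.actionSucceeded(before, after, action);
    history.push({ action, actionSuccess });
    if (!actionSuccess) continue;

    const { endGoalMet, relevantData } = await steps.checkGoal(after);
    if (endGoalMet) return relevantData;
  }
  return undefined; // step budget exhausted without meeting the goal
}
```

Bounding the loop with `maxSteps` is a design choice of this sketch: it keeps a stuck agent from issuing LLM calls indefinitely.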

🚀 Challenges and Focus

NavAIGuide still faces challenges in long-horizon planning and code inference accuracy. The current focus is on enhancing the stability of the NavAIGuide agent.

🤓 Contributing

We welcome contributions. Please follow the standard fork-and-pull-request workflow.

🛂 License

NavAIGuide is under the MIT License.

🚑 Support

For support, questions, or feature requests, open an issue in the GitHub repository.