jest-ai
Custom jest matchers for testing AI applications
The problem
Developing AI tools and applications requires a lot of manual testing and prompt tweaking. On top of that, for many developers the world of AI still feels like "uncharted land".
This solution
The jest-ai library provides a set of custom jest matchers that you can use to extend jest. These allow you to test the calls and responses of LLMs in a more familiar way.
Installation
This module is distributed via npm, which is bundled with node, and should be installed as one of your project's devDependencies:
npm install --save-dev jest-ai
or, for installation with the yarn package manager:
yarn add --dev jest-ai
Usage
First things first, make sure you have OPENAI_API_KEY set in your environment variables, as this library uses the OpenAI API to run the tests.
Import jest-ai once (for instance in your tests setup file) and you're good to go:
// In your own jest-setup.js (or any other name)
import "jest-ai";
// In jest.config.js add (if you haven't already)
setupFilesAfterEnv: ["<rootDir>/jest-setup.js"];
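For reference, a complete minimal jest.config.js with this setting might look like the sketch below; the testEnvironment value is an assumption, so merge it into your existing configuration rather than copying it wholesale.
// jest.config.js - minimal sketch, not a prescribed configuration
module.exports = {
  testEnvironment: "node", // assumption; keep whatever environment you already use
  setupFilesAfterEnv: ["<rootDir>/jest-setup.js"],
};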
With @jest/globals
If you are using @jest/globals with injectGlobals: false, you will need to use a different import in your tests setup file:
// In your own jest-setup.js (or any other name)
import "jest-ai/jest-globals";
With TypeScript
If you're using TypeScript, make sure your setup file is a .ts file and not a .js file so that the necessary types are included.
You will also need to include your setup file in your tsconfig.json if you haven't already:
// In tsconfig.json
"include": [
...
"./jest-setup.ts"
],
If TypeScript is not able to resolve the matcher methods, you can add the following to your tsconfig.json:
{
  "compilerOptions": {
    "types": ["jest", "jest-ai"]
  }
}
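Putting both snippets together, a rough sketch of a minimal tsconfig.json; the "src" entry is an assumption, so adapt the include list to your project layout and merge it into your existing config rather than replacing it:
// tsconfig.json - rough sketch
{
  "compilerOptions": {
    "types": ["jest", "jest-ai"]
  },
  "include": ["src", "./jest-setup.ts"]
}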
Custom matchers
toSemanticallyMatch
toSemanticallyMatch();
This allows checking if the response from the AI matches or includes the expected response. It uses semantic comparison, which means that "What is your age?" and "When were you born?" could both pass. This is to accommodate the natural and flexible nature of AI responses.
Examples
const response = await ai.getResponse("Hello");
// AI Response: "Hello, I am a chatbot set to help you with information for your flight. Can you please share your flight number with me?"
await expect(response).toSemanticallyMatch("What is your flight number?");
or
await expect("What is your surname?").toSemanticallyMatch(
"What is your last name?"
);
:warning: This matcher is async: use async/await when calling the matcher. This library uses cosine similarity to measure the distance between the two strings. When running a semantic match, a range of responses can pass or fail; currently, the threshold is set to 0.75.
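For intuition, a cosine similarity check over two embedding vectors might look like the sketch below. This is not jest-ai's actual implementation; the embedding model name is an assumption, and only the 0.75 threshold mirrors the default described above.
// Sketch only: illustrates the cosine similarity idea behind toSemanticallyMatch.
import OpenAI from "openai";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

function cosineSimilarity(a: number[], b: number[]): number {
  const dot = a.reduce((sum, value, i) => sum + value * b[i], 0);
  const norm = (v: number[]) => Math.sqrt(v.reduce((sum, value) => sum + value * value, 0));
  return dot / (norm(a) * norm(b));
}

async function semanticallyMatches(actual: string, expected: string): Promise<boolean> {
  // The embedding model is an assumption, not necessarily the one jest-ai uses.
  const { data } = await openai.embeddings.create({
    model: "text-embedding-ada-002",
    input: [actual, expected],
  });
  return cosineSimilarity(data[0].embedding, data[1].embedding) >= 0.75;
}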
toSatisfyStatement
toSatisfyStatement();
This checks if the response from the AI satisfies a simple true or false statement. It uses a custom prompt and a separate chat completion to determine the truthiness of the statement. If the truthiness of the statement cannot be determined from the response, the assertion will fail.
Examples
const response = await ai.getResponse("Hello");
// AI Response: "Hello, I am a chatbot set to help you with information for your flight. Can you please share your flight number with me?"
await expect(response).toSatisfyStatement(
  "It contains a question asking for your flight number."
);
or
await expect("What is your surname?").toSatisfyStatement(
  "It asks for your last name."
);
:warning: This matcher is async: use async/await when calling the matcher. This assertion uses the OpenAI chat completion API, with the gpt-4-turbo model by default. As always, be aware of your API usage!
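Conceptually, the statement check amounts to asking a second model whether the statement holds for the response. A rough sketch of that idea follows; the prompt wording and return handling are assumptions, not the library's actual prompt:
// Sketch only: illustrates checking a statement with a separate chat completion.
import OpenAI from "openai";

const openai = new OpenAI();

async function satisfiesStatement(response: string, statement: string): Promise<boolean> {
  const completion = await openai.chat.completions.create({
    model: "gpt-4-turbo",
    messages: [
      {
        role: "system",
        content:
          "Answer strictly TRUE or FALSE: does the statement hold for the given text? " +
          "If it cannot be determined from the text, answer FALSE.",
      },
      { role: "user", content: `Text: ${response}\nStatement: ${statement}` },
    ],
  });
  return completion.choices[0].message.content?.trim().toUpperCase() === "TRUE";
}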
toHaveUsedSomeTools
toHaveUsedSomeTools();
Assert that a Chat Completion response requests the use of a particular tool.
Examples
const getResponse = async () =>
  await ai.getResponse("Will my KL1234 flight be delayed?");

await expect(getResponse).toHaveUsedSomeTools(["get_flight_status"]);

await expect(getResponse).toHaveUsedSomeTools([
  { name: "get_flight_status", arguments: "KL1234" },
]);
:warning: This matcher is async: use async/await when calling the matcher. This matcher uses the OpenAI chat completion API to check tool calls.
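The examples above pass a getResponse function rather than a resolved value. A hypothetical helper that produces a Chat Completion with a tool available might look like this; the helper name, tool schema, and model are assumptions made for illustration and are not part of jest-ai:
// Hypothetical helper for the examples above; not part of jest-ai itself.
import OpenAI from "openai";

const openai = new OpenAI();

const getFlightStatusTool = {
  type: "function" as const,
  function: {
    name: "get_flight_status",
    description: "Look up the current status of a flight by its flight number.",
    parameters: {
      type: "object",
      properties: { flightNumber: { type: "string" } },
      required: ["flightNumber"],
    },
  },
};

export const ai = {
  getResponse: (prompt: string) =>
    openai.chat.completions.create({
      model: "gpt-4-turbo",
      messages: [{ role: "user", content: prompt }],
      tools: [getFlightStatusTool],
    }),
};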
toHaveUsedSomeAssistantTools
toHaveUsedSomeAssistantTools();
Assert that an Assistants API Run response requests the use of a particular tool.
Examples
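The examples below assume an initialized OpenAI client and a getWeatherTool definition. A hypothetical version of both, with an illustrative tool schema:
// Hypothetical setup for the examples below; the tool schema is illustrative only.
import OpenAI from "openai";

const openai = new OpenAI();

const getWeatherTool = {
  type: "function" as const,
  function: {
    name: "getWeather",
    description: "Get the current weather for a city.",
    parameters: {
      type: "object",
      properties: { city: { type: "string" } },
      required: ["city"],
    },
  },
};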
const assistant = await openai.beta.assistants.create({
  name: "Weather Reporter",
  instructions: "You are a reporter who answers questions on the weather.",
  tools: [getWeatherTool],
  model: "gpt-3.5-turbo-0125",
});

const thread = await openai.beta.threads.create();

await openai.beta.threads.messages.create(thread.id, {
  role: "user",
  content: "What is the weather in New York City?",
});

let run = await openai.beta.threads.runs.create(thread.id, {
  assistant_id: assistant.id,
});
// Assert on just function name
await expect(run).toHaveUsedSomeAssistantTools(["getWeather"]);

// Assert on function name and arguments
await expect(run).toHaveUsedSomeAssistantTools([
  { name: "getWeather", arguments: "New York City" },
]);
:warning: This matcher is async: use async/await when calling the matcher. This matcher polls the OpenAI Run API to check for tool calls.
toHaveUsedAllTools
toHaveUsedAllTools();
Checks if all the tools given to the LLM were used. Will fail if any of the tools were not used.
Examples
const getResponse = async () =>
  await ai.getResponse("Will my KL1234 flight be delayed?");

await expect(getResponse).toHaveUsedAllTools([
  "get_flight_status",
  "get_flight_delay",
]);

await expect(getResponse).toHaveUsedAllTools([
  { name: "get_flight_status", arguments: "KL1234" },
  { name: "get_flight_delay", arguments: "KL1234" },
]);
:warning: This matcher is async: use async/await when calling the matcher. This matcher uses the OpenAI chat completion API to check tool calls.
toHaveUsedAllAssistantTools
toHaveUsedAllAssistantTools();
Assert that an Assistants API Run response requests the use of all the given tools. Will fail if any of the tools were not used.
Examples
const assistant = await openai.beta.assistants.create({
  name: "Weather Reporter",
  instructions: "You are a reporter who answers questions on the weather.",
  tools: [getWeatherTool],
  model: "gpt-3.5-turbo-0125",
});

const thread = await openai.beta.threads.create();

await openai.beta.threads.messages.create(thread.id, {
  role: "user",
  content: "What is the weather in New York City and in San Francisco?",
});

let run = await openai.beta.threads.runs.create(thread.id, {
  assistant_id: assistant.id,
});
// Assert simply on function name
await expect(run).toHaveUsedAllAssistantTools(["getWeather"]);

// Assert on function name and arguments
await expect(run).toHaveUsedAllAssistantTools([
  { name: "getWeather", arguments: "New York City" },
  { name: "getWeather", arguments: "San Francisco" },
]);
:warning: This matcher is async: use async/await when calling the matcher. This matcher polls the OpenAI Run API to check for tool calls.
toMatchZodSchema
toMatchZodSchema();
Often, we would like our LLMs to respond in a JSON format that's easier to work with later. This matcher allows us to check whether the response from the LLM matches a given Zod schema.
Examples
import { z } from "zod";

const response = await ai.getResponse(`
  Name 3 animals, their height, and weight. Respond in the following JSON format:
  {
    "animals": [
      {
        "name": "Elephant",
        "height": "3m",
        "weight": "6000kg"
      },
    ]
  }
`);

const expectedSchema = z.object({
  animals: z.array(
    z.object({
      name: z.string(),
      height: z.string(),
      weight: z.string(),
    })
  ),
});

expect(response).toMatchZodSchema(expectedSchema);
LICENSE
MIT