Rehearsal
Prompt evaluation and regression testing
Modifying a prompt, even in small details, can have a big impact on the output. Rehearsal makes it easy to run various tests and evaluations against LLM output. Use cases for Rehearsal include:
- regression testing
- QA
- helping with prompt iteration
Installation
yarn add -D llm-rehearsal
Usage
One important aspect of Rehearsal is that it is completely agnostic of what is used to generate the text. Simply provide an async function that returns a `{ text: "llm response" }` object:
import { rehearsal, expectations } from 'llm-rehearsal';
const { includesString } = expectations;
// Provide an LLM function
const { testCase, run } = rehearsal(async (input: { country: string }) => {
// your custom code to call LLM here
const textResponse = await callLLM({
prompt: `What is the capital of ${input.country}?`,
});
return { text: textResponse }; // only requirement is to return llm response in `text` property
});
// Define test cases
testCase('France', {
input: { country: 'France' },
expect: [includesString('paris')],
});
testCase('Germany', {
input: { country: 'Germany' },
expect: [includesString('berlin')],
});
// Start test suite
run();
To run the tests, don't forget to call `run()` at the end and execute your file (with plain `node` for JS or `ts-node` for TS).
Expectations for all test cases
To run expectations on all test cases, use `expectForAll()`:
const { testCase, run, expectForAll } = rehearsal(
async (input: { country: string }) => {
// your custom code to call LLM here
const textResponse = await callLLM({
prompt: `What is the capital of ${input.country}?`,
});
return { text: textResponse }; // only requirement is to return llm response in `text` property
},
);
// This expectation will be run for all testCase
expectForAll([not(includesString('as a large language model'))]);
Mixing expectations
Expectations can be composed with boolean logic:
import { rehearsal, expectations } from 'llm-rehearsal';
const { includesString, not, and, or } = expectations;
const { testCase } = rehearsal(llmFunction);
testCase("don't say yellow", {
input: {
/* input variables */
},
expect: [not(includesString('yellow'))],
});
testCase('potato/tomato', {
input: {
/* input variables */
},
expect: [or(includesString('potato'), includesString('tomato'))],
});
testCase('the cake is a lie', {
input: {
/* input variables */
},
expect: [and(includesString('cake'), includesString('lie'))],
});
Built-in expectations
- `includesString` - checks if the LLM response contains a given string
- `matchesRegex` - checks if the LLM response matches a given regular expression
- `not` - negates an expectation
- `and` - composes multiple expectations with AND logic
- `or` - composes multiple expectations with OR logic
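To illustrate the semantics of these combinators, here is a minimal self-contained sketch in plain TypeScript. This is an illustration, not the library's actual source; in particular, the case-insensitive matching in `includesString` is an assumption based on the examples above (`includesString('paris')` matching "Paris").

```typescript
// Shapes mirroring this README's examples (assumed, not the library's source)
type Output = { text: string };
type ExpectationResult = { pass: boolean; message?: string };
type Expectation = (output: Output) => ExpectationResult;

// Case-insensitive substring check (case-insensitivity is an assumption)
const includesString =
  (needle: string): Expectation =>
  (output) =>
    output.text.toLowerCase().includes(needle.toLowerCase())
      ? { pass: true }
      : { pass: false, message: `Expected output to include "${needle}"` };

const matchesRegex =
  (re: RegExp): Expectation =>
  (output) =>
    re.test(output.text)
      ? { pass: true }
      : { pass: false, message: `Expected output to match ${re}` };

// Inverts the result of an expectation
const not =
  (e: Expectation): Expectation =>
  (output) =>
    e(output).pass
      ? { pass: false, message: 'Expected negated expectation to fail' }
      : { pass: true };

// Passes only if every expectation passes; reports the first failure
const and =
  (...es: Expectation[]): Expectation =>
  (output) => {
    const failed = es.map((e) => e(output)).find((r) => !r.pass);
    return failed ?? { pass: true };
  };

// Passes if at least one expectation passes
const or =
  (...es: Expectation[]): Expectation =>
  (output) =>
    es.some((e) => e(output).pass)
      ? { pass: true }
      : { pass: false, message: 'Expected at least one expectation to pass' };
```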
Coming soon:
- `includesWord` - check for separate words, not just substrings
- `askGPT` - perform evaluation through a GPT prompt
Custom expectations
Custom expectations are easy to create:
import { createExpectation } from 'llm-rehearsal';
const { isLongerThan } = createExpectation(
'isLongerThan',
(count: number) => (output) => {
return output.text.length > count
? { pass: true }
: {
pass: false,
message: `Expected output text to be > ${count} characters, but instead is ${output.text.length}`,
};
},
);
// use it like the built-in expectations
testCase('long output', {
input: {
/* input variables */
},
expect: [isLongerThan(9000)],
});
// custom expectations can also be composed with boolean logic:
testCase('long output with sandwich in it', {
input: {
/* input variables */
},
expect: [and(isLongerThan(9000), includesString('sandwich'))],
});
If your function returns more than just text (such as metadata or the results of intermediate steps), you can create type-safe expectations:
import { rehearsal, expectations } from 'llm-rehearsal';
// notice that `createExpectation` is returned by the rehearsal() function,
// and is typed according to the input/output of the LLM function
const { testCase, createExpectation } = rehearsal(
async (input: { country: string }) => {
// your custom code to call LLM here
const { textResponse, documents } = await callLLMChain({
prompt: `What is the capital of ${input.country}?`,
});
return { text: textResponse, documents }; // we return more than just `text`
},
);
const { usesDocuments } = createExpectation('usesDocuments', () => (output) => {
return output.documents.length > 0 // output is properly typed
? { pass: true }
: { pass: false, message: 'Expected documents to be returned, found none' };
});
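The factory pattern behind `createExpectation` can be sketched in plain TypeScript. This is a self-contained illustration of the assumed semantics (a named checker factory whose name can surface in test reports), not the library's implementation:

```typescript
type Result = { pass: boolean; message?: string };

// Hypothetical createExpectation-style factory (assumed semantics):
// wraps a parameterized checker and attaches the name for reporting.
function createExpectation<Out, Args extends unknown[]>(
  name: string,
  make: (...args: Args) => (output: Out) => Result,
) {
  const factory = (...args: Args) => {
    const check = make(...args);
    // attach the name so a runner could print it in results
    return Object.assign((output: Out) => check(output), { expectationName: name });
  };
  // returned under its name so it can be destructured, as in the README
  return { [name]: factory } as { [key: string]: typeof factory };
}

// usage mirroring the isLongerThan example above
const { isLongerThan } = createExpectation(
  'isLongerThan',
  (count: number) => (output: { text: string }) =>
    output.text.length > count
      ? { pass: true }
      : { pass: false, message: `Expected more than ${count} characters` },
);
```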
Labels for expectations
To make test results more readable, an expectation can be given a label:
testCase('my test case', {
input: {},
expect: [
[includesString('banana'), 'include banana'],
[matchesRegex(/^hello/), 'starts with "hello"'],
// also works with composed expectations:
[
not(
or(
includesString('hamburger'),
includesString('fries'),
includesString('hotdog'),
includesString('chicken nuggets'),
includesString('burritos'),
),
),
'no fastfood',
],
],
});
Describe
Just like in most testing libraries, you can group test cases using `describe`:
import { rehearsal, expectations, describe } from 'llm-rehearsal';
const { includesString } = expectations;
const { testCase, run } = rehearsal(async (input: { country: string }) => {
// your custom code to call LLM here
const textResponse = await callLLM({
prompt: `What is the capital of ${input.country}?`,
});
return { text: textResponse };
});
describe('Countries', () => {
testCase('France', {
input: { country: 'France' },
expect: [includesString('paris')],
});
testCase('Germany', {
input: { country: 'Germany' },
expect: [includesString('berlin')],
});
});
Note: `describe` does not support `only`. This should be supported in the future.
Only
To isolate a test case and run only this one (or only a few), use `testCase.only`:
testCase('France', {
input: { country: 'France' },
expect: [includesString('paris')],
});
testCase.only('Germany', {
input: { country: 'Germany' },
expect: [includesString('berlin')],
});
This will run only the Germany test case. Multiple test cases can be marked "only" to run a selected set.
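The filtering behaviour described above can be sketched as follows. This is an illustration of the assumed semantics, not the library's source: when at least one case is marked "only", the runner keeps just those cases.

```typescript
// Minimal sketch of the assumed "only" selection rule
type Case = { name: string; only?: boolean };

function selectCases(cases: Case[]): Case[] {
  const onlyCases = cases.filter((c) => c.only);
  // if any case is marked only, run just those; otherwise run everything
  return onlyCases.length > 0 ? onlyCases : cases;
}
```

With this rule, marking several cases "only" runs exactly that selected set.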
Local development
To install a local build of Rehearsal, the recommended method is to use Yalc. Make sure to install yalc globally.
- Build the library:
yarn build
- Publish to the yalc local store (does not leave your computer):
yarn publish-local
- On the consuming side (the NodeJS project where you want to install Rehearsal):
yalc install llm-rehearsal
Note
Keep in mind that Yalc copies the package to its store, and then copies it again when it is installed on the consuming side. After a new build, you'll need to run `yarn publish-local` in this repository and also `yalc update` on the consuming side.