Langium AI
Notice: This package has moved to langium-ai-tools! Please use the new package going forward; the package here has been deprecated.
Overview
This project makes it easier to build agents for Langium DSLs. In particular, Langium AI tries to help solve the following problems:
- How to pick a good base model to start developing with
- How to develop good natural language interfaces for DSLs
- How to build an agent stack that can support development of programs in Langium DSLs
- How to split DSL documents in a way that makes sense for the language and agent
- How to evaluate DSL output from an agent with respect to the language's syntax & semantics
To solve these problems Langium AI provides these key features:
- Splitting Support: Use your DSL's parser to pre-process documents before ingestion (e.g. into a vector DB)
- Evaluation Support: Assess the output of your model + RAG + whatever else is in your stack against a structured input/output evaluation suite
So in a nutshell, Langium AI is a tool to help build document splitting logic for DSLs, and to build proper DSL evaluations for model output.
What's also important is what Langium AI doesn't provide, and why:
- We don't choose your model for you. We believe this is your choice, and we don't want to presume we know best or lock you in. All we assume is that you have a model (or stack) that we can use.
- We don't choose your stack for you. There are many excellent choices for hosting providers, databases, caches, and other supporting services (local & remote). There are so many, and they change so often, that we decided it was best not to assume what works here, and instead to support preparing information for whatever stack you choose.
LLMs (and transformers in general) are evolving quite rapidly. With this approach, we see Langium AI as a tool that helps you build your own tooling whilst letting you keep up with the latest and greatest.
Installation
You can install Langium AI by running:
npm i --save langium-ai
Usage
Splitting
Langium AI provides several utilities to support splitting.
The simplest approach is, of course, not to split at all. For smaller DSL programs this may be perfectly viable, but in all likelihood you're reading this because you need to handle medium to large programs -- or a large number of smaller programs with overlapping constructs.
In most cases you can split on specific AST nodes. These nodes map directly to the types generated from your Langium grammar rules, which makes it easy to mark where you want to delineate chunks; a minimal sketch follows.
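As a rough illustration of the idea, here is a sketch that chunks a document by AST node type using Langium's own parser API (Langium 3 here). createMyDSLServices, the './my-dsl-module' path, and the 'FunctionDef' type are placeholders for whatever your generated language module and grammar define; langium-ai's splitting utilities build on these same services.
import { AstUtils, EmptyFileSystem } from 'langium';
// placeholder: the services module generated for your DSL
import { createMyDSLServices } from './my-dsl-module';

const services = createMyDSLServices(EmptyFileSystem).MyDSL;

// parse the source text and emit one chunk per AST node of the given types
function splitByNodeType(text: string, types: string[]): string[] {
    const parseResult = services.parser.LangiumParser.parse(text);
    return AstUtils.streamAllContents(parseResult.value)
        .filter(node => types.includes(node.$type))
        .map(node => node.$cstNode?.text ?? '')
        .toArray();
}

const myDocumentText = '...'; // your DSL source text
// e.g. one chunk per function definition, ready to embed into a vector DB
const chunks = splitByNodeType(myDocumentText, ['FunctionDef']);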
Evaluation
Regardless of how you've sourced your model, you'll need a metric for determining the quality of your output.
For Langium DSLs, we provide a series of evaluator utilities to help assess the correctness of DSL output.
It's important to point out that evaluations are not tests; this is closer in spirit to OpenAI's evals framework. The idea is to grade or score outputs against an expected output for a known input. This is a simple but effective way to determine, in a structured fashion, whether your model is generally doing what you expect and not something else.
Take the following evaluator as an example. Let's assume you have Ollama running locally and the ollama-js package installed. From a given base model you can define evaluations like so:
import { LangiumEvaluator, EvaluatorScore } from 'langium-ai/evaluator';
import { EmptyFileSystem } from 'langium';
import ollama from 'ollama';
// import the services module generated for your DSL (adjust the path to your project)
import { createMyDSLServices } from './my-dsl-module';
// get your language's services
const services = createMyDSLServices(EmptyFileSystem).MyDSL;
// define an evaluator using your language's services
// this effectively uses your existing parser & validations to 'grade' the response
const evaluator = new LangiumEvaluator(services);
// make some prompt
const response = await ollama.chat({
    model: 'llama3.2',
    messages: [{
        role: 'user',
        content: 'Write me a hello world program written in MyDSL.'
    }]
});
const es: EvaluatorScore = evaluator.evaluate(response.message.content);
// print out your score!
console.log(es);
You can also define custom evaluators that are more closely tuned to the needs of your DSL: handling diagnostics in a very specific fashion, extracting code out of the response itself to check (as sketched below), using an evaluation model to grade the response, or combining techniques to get a more accurate score for your model's output.
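For instance, here is a lightweight sketch of the code-extraction approach; the extractCode helper is hypothetical, not part of the package. It strips the prose around a fenced code block in the chat response before handing it to the evaluator, so surrounding explanation doesn't trip the parser.
// hypothetical helper (not provided by langium-ai): pull the first fenced
// code block out of a chat response, falling back to the raw text
function extractCode(response: string): string {
    const match = response.match(/```[\w-]*\n([\s\S]*?)```/);
    return match ? match[1] : response;
}

// grade only the extracted code
const score = evaluator.evaluate(extractCode(response.message.content));
console.log(score);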
In general we stick to focusing on what Langium can do to help with evaluation, but leave the opportunity open for you to extend, supplement, or modify evaluation logic as you see fit.
Examples
This project also includes some helpful examples:
- An example of using Evaluation for a local model w/ ChromaDB
- An example of using Splitting along with ChromaDB
- An example of using Splitting along with LlamaIndex
- An example VSCode Extension providing a simple chatbot interface for a Langium DSL
Contributing
If you want to help, feel free to open an issue or a PR. As a general note, we're open to accepting changes that improve how we support agent development for Langium DSLs, but we don't plan on integrating anything beyond that. That is, we don't want to provide explicit bindings to LlamaIndex, Ollama, LangChain, or other frameworks, and similarly we don't plan to provide direct bindings for OpenAI or Anthropic. It's not that we don't think these are excellent frameworks and providers (they are!), but we want to keep Langium AI focused as a tool for building agent development tooling. To this end we may provide integration examples, but no direct bindings.