npm package discovery and stats viewer.

Discover Tips

  • General search

    [free text search, go nuts!]

  • Package details

    pkg:[package-name]

  • User packages

    @[username]

Sponsor

Optimize Toolset

I’ve always been into building performant and accessible sites, but lately I’ve been taking it extremely seriously. So much so that I’ve been building a tool to help me optimize and monitor the sites that I build to make sure that I’m making an attempt to offer the best experience to those who visit them. If you’re into performant, accessible and SEO friendly sites, you might like it too! You can check it out at Optimize Toolset.

About

Hi, 👋, I’m Ryan Hefner  and I built this site for me, and you! The goal of this site was to provide an easy way for me to check the stats on my npm packages, both for prioritizing issues and updates, and to give me a little kick in the pants to keep up on stuff.

As I was building it, I realized that I was actually using the tool to build the tool, and figured I might as well put this out there and hopefully others will find it to be a fast and useful way to search and browse npm packages as I have.

If you’re interested in other things I’m working on, follow me on Twitter or check out the open source projects I’ve been publishing on GitHub.

I am also working on a Twitter bot for this site to tweet the most popular, newest, random packages from npm. Please follow that account now and it will start sending out packages soon–ish.

Open Software & Tools

This site wouldn’t be possible without the immense generosity and tireless efforts from the people who make contributions to the world and share their work via open source initiatives. Thank you 🙏

© 2025 – Pkg Stats / Ryan Hefner

ad-llama

v0.2.5

Published

Structured inference with LLaMa 2 in your browser

Downloads

22

Readme

ad-llama 🦙

Use tagged template literals for structured Llama 2 inference locally in browser via mlc-llm. Runs on Chromium browsers with WebGPU support.

https://github.com/gsuuon/ad-llama/assets/6422188/d38ef8c3-8ba1-4757-a992-ea8720776780

Check out the playground, a simple dnd npc generator, or a multi-step color generator built on solid-js. There's also a documentation page -- say hi in discord!

Usage

npm install -S ad-llama

import { loadModel, ad, report } from 'ad-llama'

const loadedModel = await loadModel('Llama-2-7b-chat-hf-q4f32_1', report(console.info))
const { context, a } = ad(loadedModel)

const dm = context(
  'You are a dungeon master.',
  'Create an NPC based on the Dungeons and Dragons universe.'
)

const npc = dm`{
  "description": "${a('short description')}",
  "name": "${a('character name')}",
  "weapon": "${a('weapon')}",
  "class": "${a('primary class')}"
}`

const generatedNpc = await npc.collect(partial => {
  switch (partial.type) {
    case 'gen':
    case 'lit':
      console.info(partial.content)
      break
  }
})

console.log(generatedNpc)

For an example of more complicated usage including validation, retry logic and transforms check the hackernews who's hiring example.

Generation

Each expression in the template literal is a new prompt and options. The prompt given for each expression is added to the system and preprompt established in context, and prior completion text (literal parts and as well as inferences) are added to the end of the LLM prompt as a partially completed assistant response (i.e. after [/INST]).

Each template expression can be configured independently - you can set a different temperature, token count, max length and more. Check the TemplateExpressionOptions type in ./src/types.ts for all options. a adds the preword to the expression prompt (by default "Generate"), use prompt to provide a naked prompt or set the preword option. A plain string gets inserted as literal text, just like normal template literals.

template`{
  "name": "${characterName}",
  "description": "${a('clever description', {
    maxTokens: 1000,
    stops: ['\n']
  })}",
  "class": "${a('a primary class for the character')}"
}`

Biased Sampling

You can modify the sampler for each expression -- this allows you to adjust the logits before sampling. You can, for example, only accept number characters for one expression, while in another avoid specific strings. The main example here shows some of these options. You can build your own, but there are some helper functions exposed as sample to build samplers. The loaded model object has a bias field which configures the sampling - avoid, prefer allow you to adjust relative likelihood of certain tokens ids, while reject, accept change the logits to negative infinity for some ids (or all other ids).

import { sample } from 'ad-llama'

const model = await loadModel(...)

const { bias } = model
const { oneOf, consistsOf } = sample

const { context, a } = ad(model)
const character = context('Create a character')

character`{
  "weapon": "${a('special weapon', {
    sampler: bias.prefer(oneOf(['Nun-chucks', 'Beam Cannon']), 10),
  })}",
  "age": ${a('age', {
    sampler: bias.accept(consistsOf(['0','1','2','3','4','5','6', '7', '8', '9'])),
    maxTokens: 3
  })},
}`

As tokenizers are stateful, a simple encoding of just the provided strings won't necessarily produce tokens that would fit into the existing sequence. For example, with Llama 2's tokenizer foo and "foo" encode to completely different tokens:

encode('"foo"') === [376, 5431, 29908]
encode('foo') === [7953]
decode([5431]) === 'foo'
decode([7953]) === 'foo'

oneOf, consistsOf try to figure out relevant tokens for the provided strings within the context of the currently inferring expression.

consistsOf is for classes of characters - each sample will have the same token logits modified based on the given relevant tokens. So in consistsOf(['a','b']), every sample will have tokens for 'a' and 'b' modified.

oneOf is for strings - each sample has logits modified depending on the tokens which are still relevant given the already sampled tokens. For example, if we have oneOf(['ranger', 'wizard']) and we've already sampled 'w', the only next relevant tokens would be from 'izard'. If you want oneOf to stop at one of the choices, include the stop character (by default the next character after the expression), eg: oneOf(['ranger"', 'wizard"']).

Even though reject(oneOf(['ranger', 'wizard'])) will never make it past the first token for either of the strings, giving the entire string still lets you target the correct tokens for completing those specific strings.

Validation

You can provide an expression validation function with a retry count. If validation fails, that expression will be attempted again up to retry times, after which whatever was generated is taken. You can also transform the result of the expression generation (this happens whether validation passes or not).

validate?: {
  check?: (partial: string) => boolean
  transform?: (partial: string) => string
  retries: number
}

Vite HMR

Waiting for models to reload can be tedious, even when they're cached. ad-llama should work with vite HMR so the loaded models stay in memory. Put this in your source file to create an HMR boundary:

import { loadModel, ad, guessModelSpecFromPrebuiltId } from 'ad-llama'

+ if (import.meta.hot) { import.meta.hot.accept() }

const loadedModel = await loadModel(guessModelSpecFromPrebuiltId('Llama-2-7b-chat-hf-q4f32_1'))

Build

  • pre-reqs
    • emcc: https://emscripten.org/docs/getting_started/downloads.html
  • build the tvmjs dependency first
    git clone https://github.com/gsuuon/ad-llama.git --recursive
    cd 3rdparty/relax/web
    
    # source ~/emsdk/emsdk_env.sh
    make
    npm install
    npm run build
  • then either npm run build or npm run dev (which watches src/ and serves public/)

Motivation

I was inspired by guidance but felt that tagged template literals were a better way to express structured inference. I also think grammar based sampling is neat, and wanted to add a way to plug something like that into MLC infrastructure.

Todos

  • [ ] runs on Deno
  • [ ] can target cpu
  • [ ] repeatable subtemplates
  • [ ] template expressions can reference previously generated expressions