
A TypeScript Library for Formatting Chat

Chat Formatter is a TypeScript library for formatting chat history with Nunjucks templating (similar to Jinja2). It ships with templates for various models, including ChatML, Llama-3, Phi-3, Gemma-it, and H2O.ai's danube2 and danube3.

Chat models are trained with model-specific formats that turn a conversation history, like:

chat = [
  {"role": "user", "content": "Hello, how are you?"},
  {"role": "assistant", "content": "I'm doing great. How can I help you today?"},
  {"role": "user", "content": "I'd like to show off how chat templating works!"},
]

into a single tokenizable string, like:

"<s>[INST] Hello, how are you? [/INST]I'm doing great. How can I help you today?</s> [INST] I'd like to show off how chat templating works! [/INST]"

Each model requires a different format, and deviating from it can degrade output quality. Several Python tools exist for this, and Hugging Face tokenizers use the chat_template field, a Jinja template, to format conversations correctly.

For those who need this in a TypeScript or JavaScript runtime, this library uses the Nunjucks templating engine, which is largely compatible with Jinja2; with minor tweaks, most Jinja2 chat templates work as-is.
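
One such tweak: Jinja2 templates often call Python string methods like .strip(), which Nunjucks does not support; the closest Nunjucks built-in is the trim filter. A sketch of the substitution (illustrative only, not taken from the library's sources):

  Jinja2:   {{ '<|prompt|>' + message['content'].strip() + eos_token }}
  Nunjucks: {{ '<|prompt|>' + (message['content'] | trim) + eos_token }}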

Installation

You can install Chat Formatter via npm:

npm install chat-formatter

Usage

Importing the Library

import { applyTemplate, Conversation } from 'chat-formatter';
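
As the examples below suggest, a Conversation is an array of role/content messages. A rough sketch of the shape (the exact exported type may differ):

type Message = { role: 'system' | 'user' | 'assistant'; content: string };
type Conversation = Message[];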

Formatting a Conversation

Here's how you can format a conversation using the default template (i.e., the ChatML format):

const conversation: Conversation = [
  { role: 'user', content: 'Hi there!' },
  { role: 'assistant', content: 'Nice to meet you!' },
  { role: 'user', content: 'Can I ask a question?' }
];

// Using the default template without a generation prompt
const formatted = await applyTemplate(conversation);
console.log(formatted);

Expected output:

<|im_start|>user
Hi there!<|im_end|>
<|im_start|>assistant
Nice to meet you!<|im_end|>
<|im_start|>user
Can I ask a question?<|im_end|>

And with a generation prompt:

const formatted = await applyTemplate(conversation, {addGenerationPrompt: true});

Expected output:

<|im_start|>user
Hi there!<|im_end|>
<|im_start|>assistant
Nice to meet you!<|im_end|>
<|im_start|>user
Can I ask a question?<|im_end|>
<|im_start|>assistant

Using Different Templates (Llama-3)

You can specify a different template using the templateKey option:

const conversation: Conversation = [
  { role: 'system', content: 'System prompt.' },
  { role: 'user', content: 'Hi there!' },
  { role: 'assistant', content: 'Nice to meet you!' },
  { role: 'user', content: 'Can I ask a question?' }
];

const result = await applyTemplate(conversation, { 
  templateKey: 'llama3', 
  addGenerationPrompt: true 
});
console.log(result);

Expected output:

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

System prompt.<|eot_id|><|start_header_id|>user<|end_header_id|>

Hi there!<|eot_id|><|start_header_id|>assistant<|end_header_id|>

Nice to meet you!<|eot_id|><|start_header_id|>user<|end_header_id|>

Can I ask a question?<|eot_id|><|start_header_id|>assistant<|end_header_id|>

Using Different Templates (Danube3)

Some models, such as Danube3, do not accept system prompts; Danube3's original template goes as far as raising an exception:

  "chat_template": "{% for message in messages %}{% if message['role'] == 'system' %}{{ raise_exception('System role not supported') }}{% endif %}{% if ((message['role'] == 'user') != (loop.index0 % 2 == 0)) or ((message['role'] == 'assistant') != (loop.index0 % 2 == 1)) %}{{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }}{% endif %}{% if message['role'] == 'user' %}{{ '<|prompt|>' + message['content'].strip() + eos_token }}{% elif message['role'] == 'assistant' %}{{ '<|answer|>' + message['content'].strip() + eos_token }}{% endif %}{% endfor %}{% if add_generation_prompt %}{{ '<|answer|>' }}{% endif %}",

Instead of raising an exception, the templates in this repo incorporate the system prompt, staying as close as possible to a working workaround.
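
For illustration, such a workaround might emit the system message as a bare prefix before the alternating turns. This is a hypothetical sketch, not necessarily the exact template shipped in this repo:

  "{% for message in messages %}{% if message['role'] == 'system' %}{{ (message['content'] | trim) + eos_token }}{% elif message['role'] == 'user' %}{{ '<|prompt|>' + (message['content'] | trim) + eos_token }}{% elif message['role'] == 'assistant' %}{{ '<|answer|>' + (message['content'] | trim) + eos_token }}{% endif %}{% endfor %}{% if add_generation_prompt %}{{ '<|answer|>' }}{% endif %}"

Using the built-in danube3 template: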

const conversation: Conversation = [
  { role: 'system', content: 'System prompt.' },
  { role: 'user', content: 'Hi there!' },
  { role: 'assistant', content: 'Nice to meet you!' },
  { role: 'user', content: 'Can I ask a question?' }
];
const resultWithPrompt = await applyTemplate(conversation, {
  templateKey: 'danube3',
  addGenerationPrompt: true
});
console.log(resultWithPrompt);

Expected output:

'System prompt.</s><|prompt|>Hi there!</s><|answer|>Nice to meet you!</s><|prompt|>Can I ask a question?</s><|answer|>'

Using Custom Templates

Let's say this library doesn't have a built-in template for Qwen1.5.

First, we need the model's chat template (most likely published somewhere in Jinja format). For Qwen1.5, we head to the model repository on Hugging Face and look at line 31 of tokenizer_config.json, where chat_template is defined:

    "chat_template": "{% for message in messages %}{% if loop.first and messages[0]['role'] != 'system' %}{{ '<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n' }}{% endif %}{{'<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n'}}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant\n' }}{% endif %}",

We need to make sure this is compatible with Nunjucks.

Luckily, in this case, we don't need to change anything; we are just going to use the template as is:

const template: TemplateConfig = {
  bosToken: '',
  eosToken: '<|im_end|>',
  chatTemplate:
    "{% for message in messages %}{% if loop.first and messages[0]['role'] != 'system' %}{{ '<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n' }}{% endif %}{{'<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n'}}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant\n' }}{% endif %}"
};

The values of bosToken and eosToken are also taken from tokenizer_config.json. Note that if either of these tokens is null, we use an empty string.
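
For example, when pulling these values out of a parsed tokenizer_config.json, a null token can be normalized like this (a minimal sketch; tokenizerConfig is a hypothetical variable holding the parsed JSON, with field names following the Hugging Face config format):

// Fall back to an empty string when the config sets a token to null.
const bosToken = tokenizerConfig.bos_token ?? '';
const eosToken = tokenizerConfig.eos_token ?? '';

Now apply the custom template: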

const conversation: Conversation = [
  { role: 'user', content: 'Hi there!' },
  { role: 'assistant', content: 'Nice to meet you!' },
  { role: 'user', content: 'Can I ask a question?' }
];

const result = await applyTemplate(conversation, {
  customTemplate: template
});
console.log('result: ', result);

Expected output:

<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
Hi there!<|im_end|>
<|im_start|>assistant
Nice to meet you!<|im_end|>
<|im_start|>user
Can I ask a question?<|im_end|>

Contributing

Contributions are welcome!

License

This project is licensed under the MIT License - see the LICENSE.md file for details.