llm-distillery
v1.2.0
Use LLMs to run map-reduce summarization tasks on large documents until a target token size is met.
🍶 LLM Distillery
Use LLMs to distill large texts down to a manageable size with a map-reduce approach. This ensures that the text fits within a specified token limit, which is crucial when interfacing with Large Language Models (LLMs) in downstream tasks.
Features
- Text Distillation: Reduces the size of text based on token count without losing the essence of the content.
- Chunking and Summarization: Uses the semantic-chunking library to intelligently split text into manageable chunks that are then summarized.
- Customizable Parameters: Allows fine-tuning of various parameters like target token size, API base URL, and chunking thresholds.
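The map-reduce idea behind these features can be sketched in a few lines. This is a conceptual illustration only, not llm-distillery's actual implementation: whitespace word counts stand in for real token counts, fixed-size word windows stand in for semantic chunking, and the hypothetical summarizeChunk function stands in for an LLM call. The wordsPerChunk parameter is also an assumption made for this sketch.

```javascript
// Conceptual sketch only; not llm-distillery's actual implementation.

// Assumption: whitespace-separated words stand in for real tokens.
const countTokens = (text) => text.split(/\s+/).filter(Boolean).length;

// Assumption: fixed-size word windows stand in for semantic chunking.
function chunkText(text, wordsPerChunk) {
  const words = text.split(/\s+/).filter(Boolean);
  const chunks = [];
  for (let i = 0; i < words.length; i += wordsPerChunk) {
    chunks.push(words.slice(i, i + wordsPerChunk).join(' '));
  }
  return chunks;
}

// Hypothetical stand-in for an LLM summary: keeps the first half of each chunk.
async function summarizeChunk(chunk) {
  const words = chunk.split(' ');
  return words.slice(0, Math.max(1, Math.ceil(words.length / 2))).join(' ');
}

// Repeat map (summarize each chunk) and reduce (join the summaries)
// until the text fits the target or the loop limit is reached.
async function distill(text, { targetTokenSize, maxDistillationLoops = 5, wordsPerChunk = 50 }) {
  let current = text;
  for (let loop = 0; loop < maxDistillationLoops; loop++) {
    if (countTokens(current) <= targetTokenSize) break; // target met
    const chunks = chunkText(current, wordsPerChunk);
    const summaries = await Promise.all(chunks.map(summarizeChunk));
    current = summaries.join(' ');
  }
  return current;
}
```

Each pass shrinks the text, so the loop either reaches the target size or stops at the iteration limit and returns its best effort.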
Getting Started
Prerequisites
- Node.js installed on your system.
- An API key for an inference provider that serves LLMs through an OpenAI API compatible endpoint (together.ai, etc.).
Installation
Add the library to your project with npm:
npm install llm-distillery
Basic Usage
The llmDistillery function can be imported and used in your Node.js applications as follows:
import { llmDistillery } from 'llm-distillery';

const text = "Your long text here...";
const options = {
  targetTokenSize: 2048, // adjust as needed
  baseUrl: "<openai-api-compatible-url-endpoint>", // example: https://api.together.xyz/v1
  apiKey: "<your_llm_api_key>",
  llmModel: "<llm_model>", // example: meta-llama/Llama-3-70b-chat-hf (Llama 3 model name on together.ai)
  stopTokens: ["<|eot_id|>"], // stop tokens for Llama 3
  logging: true // set to true for verbose logging
};

llmDistillery(text, options)
  .then(processedText => console.log(processedText))
  .catch(error => console.error(error));
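To keep the API key out of source code, the options object can be built from environment variables and the call written with async/await. The following is a minimal sketch: buildOptions and distillText are hypothetical helpers, and the variable names LLM_API_KEY, LLM_BASE_URL, and LLM_MODEL are assumptions, not something the package defines.

```javascript
// Hypothetical helper: builds the options object from environment variables.
// The variable names (LLM_API_KEY, LLM_BASE_URL, LLM_MODEL) are assumptions.
function buildOptions(env = process.env) {
  if (!env.LLM_API_KEY) {
    throw new Error('LLM_API_KEY is not set');
  }
  return {
    targetTokenSize: 2048,
    baseUrl: env.LLM_BASE_URL || 'https://api.together.xyz/v1',
    apiKey: env.LLM_API_KEY,
    llmModel: env.LLM_MODEL || 'meta-llama/Llama-3-70b-chat-hf',
    stopTokens: ['<|eot_id|>'],
    logging: false,
  };
}

// async/await variant of the promise-based call.
async function distillText(text) {
  // Dynamic import keeps this snippet usable from both ES modules and CommonJS.
  const { llmDistillery } = await import('llm-distillery');
  return llmDistillery(text, buildOptions());
}
```

Calling distillText("Your long text here...") returns a promise for the distilled text, so errors can be handled with try/catch around an await.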
Options Object Parameters
- targetTokenSize: Desired token size limit for the distilled text. (default: 2048)
- baseUrl: The base URL of the OpenAI API compatible endpoint. (default: "https://api.together.xyz/v1")
- apiKey: Your API key for accessing the LLM endpoint.
- llmModel: The model identifier of the LLM on your chosen endpoint. (default: "meta-llama/Llama-3-70b-chat-hf" on together.ai)
- stopTokens: Array of stop tokens for LLM responses, based on your chosen model. (default: ["<|eot_id|>"])
- maxDistillationLoops: Maximum number of iterations to run while distilling. (default: 5)
- tokenizerModel: Tokenizer model used to calculate token sizes. (See the Tokenizer Models table below for options; default: "Xenova/paraphrase-multilingual-MiniLM-L12-v2")
- semanticEmbeddingModel: Semantic embedding model used to calculate text similarity. (See https://github.com/jparkerweb/semantic-chunking?tab=readme-ov-file#curated-onnx-embedding-models for options; default: "Xenova/paraphrase-multilingual-MiniLM-L12-v2")
- semanticEmbeddingModelQuantized: Whether to use the quantized version of the embedding model. (default: true)
- modelCacheDir: Directory to cache models in. (default: null; set to a string for a custom cache directory, for example "models/")
- chunkingThreshold: Threshold for segmenting text into chunks for summarization and distillation. Can be a number between 0 and 1. A lower number results in greater distillation per iteration and runs faster. (default: 0.25)
- llmContextLength: Context length of the LLM you are using, that is, the maximum number of tokens the LLM can accept when generating chunk summaries. (default: 4096; most LLMs have larger context windows, for example Llama 3's is 8k)
- llmMaxGenLength: Maximum generation length of the LLM you are using, that is, the maximum number of tokens the LLM can generate in a single response. (default: 2048)
- llmApiRateLimit: Delay in milliseconds between API calls to your chosen LLM provider. This helps manage the rate at which requests are sent so that your application does not overload the service or exceed usage policies. (default: 500; set to 0 to disable)
- logging: Enable logging to monitor the various stages of distillation, compression percentages of the original text, etc. (default: false)
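As an illustration of how these options fit together, the sketch below merges caller-supplied values over the documented defaults and applies two sanity checks. resolveOptions is a hypothetical helper, not part of llm-distillery, and treating apiKey as mandatory is an assumption based on the prerequisites; the library may validate its input differently.

```javascript
// Defaults as documented in the options list; apiKey has no documented default.
const DOCUMENTED_DEFAULTS = {
  targetTokenSize: 2048,
  baseUrl: 'https://api.together.xyz/v1',
  llmModel: 'meta-llama/Llama-3-70b-chat-hf',
  stopTokens: ['<|eot_id|>'],
  maxDistillationLoops: 5,
  tokenizerModel: 'Xenova/paraphrase-multilingual-MiniLM-L12-v2',
  semanticEmbeddingModel: 'Xenova/paraphrase-multilingual-MiniLM-L12-v2',
  semanticEmbeddingModelQuantized: true,
  modelCacheDir: null,
  chunkingThreshold: 0.25,
  llmContextLength: 4096,
  llmMaxGenLength: 2048,
  llmApiRateLimit: 500,
  logging: false,
};

// Hypothetical helper (not part of llm-distillery): caller values win over defaults.
function resolveOptions(userOptions = {}) {
  const options = { ...DOCUMENTED_DEFAULTS, ...userOptions };
  // Assumption: an API key is always needed to reach the endpoint.
  if (!options.apiKey) {
    throw new Error('apiKey is required');
  }
  // Documented range: chunkingThreshold is a number between 0 and 1.
  if (!(options.chunkingThreshold >= 0 && options.chunkingThreshold <= 1)) {
    throw new RangeError('chunkingThreshold must be between 0 and 1');
  }
  return options;
}

// Example: raise the context length for a model with an 8k window.
const llama3Options = resolveOptions({ apiKey: '<your_llm_api_key>', llmContextLength: 8192 });
```

Only the overridden keys change; everything else keeps its documented default.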
Tokenizer Models
| model name                                   |
|----------------------------------------------|
| Xenova/all-MiniLM-L6-v2                      |
| Xenova/paraphrase-multilingual-MiniLM-L12-v2 |
| Xenova/bert-base-uncased                     |
| Xenova/gpt2                                  |
| Xenova/roberta-base                          |
| Xenova/all-distilroberta-v1                  |
| Xenova/multilingual-e5-large                 |
| Xenova/bert-base-multilingual-uncased        |
| Xenova/xlm-roberta-base                      |
| BAAI/bge-base-en-v1.5                        |
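To estimate ahead of time whether a text already fits a target token size, one of the tokenizer models listed here can be used to count tokens directly. The sketch below assumes the models are loaded with transformers.js (the @xenova/transformers package, installed separately); whether llm-distillery counts tokens in exactly this way is an assumption, and countModelTokens is a hypothetical helper.

```javascript
// Hypothetical helper: counts tokens with a Hugging Face tokenizer.
// Assumption: @xenova/transformers is installed; the first call downloads the tokenizer files.
async function countModelTokens(text, tokenizerModel = 'Xenova/paraphrase-multilingual-MiniLM-L12-v2') {
  // Dynamic import keeps this snippet usable from both ES modules and CommonJS.
  const { AutoTokenizer } = await import('@xenova/transformers');
  const tokenizer = await AutoTokenizer.from_pretrained(tokenizerModel);
  // encode() returns an array of token ids (special tokens included by default).
  return tokenizer.encode(text).length;
}
```

Comparing the returned count with targetTokenSize gives a rough indication of how much distillation a text will need.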
NOTE 🚨 The initial run of llm-distillery might take a moment as the Tokenizer Model will be downloaded and saved to this package's cache directory.