gpt-semantic-cache

v1.0.0

An NPM package for semantic caching of GPT responses using Redis and ANN.

GPT Semantic Cache

An NPM package for semantic caching of GPT responses using Redis and Approximate Nearest Neighbors (ANN) search.

Introduction

GPT Semantic Cache is a Node.js package that provides semantic caching for GPT responses. It uses semantic embeddings and approximate nearest neighbors search to cache and retrieve responses based on the semantic similarity of user queries. When a new query is close enough in meaning to a cached one, the cached response is returned instead of calling the GPT API again. This reduces redundant API calls, lowers cost, and improves response times for end users.

Here are several areas where this can be used:

  • Technical Customer Support: Answers are specific and drawn from technical documents, so semantically similar queries can be served from the cache.
  • Product Support: Questions about online shopping products, where the specifications and the answers are largely static.
  • Other support-based services.

Features

  • Semantic Caching: Efficiently cache GPT responses based on semantic similarity.
  • Supports Multiple Embedding Sources: Use OpenAI or local models for generating embeddings.
  • Redis Integration: Utilize Redis for fast storage and retrieval of cached data.
  • Approximate Nearest Neighbors (ANN) Search: Quickly find similar queries using ANN algorithms.
  • Customizable Settings: Adjust similarity thresholds, cache TTL, and more according to your needs.

Installation

npm install gpt-semantic-cache

Quick Start

Here's a quick example to get you started:

const { SemanticGPTCache } = require('gpt-semantic-cache');

(async () => {
  const cache = new SemanticGPTCache({
    embeddingOptions: {
      type: 'openai',
      openAIApiKey: 'YOUR_OPENAI_API_KEY',
    },
    gptOptions: {
      openAIApiKey: 'YOUR_OPENAI_API_KEY',
      model: 'gpt-3.5-turbo',
    },
    cacheOptions: {
      redisUrl: 'redis://localhost:6379',
      similarityThreshold: 0.8,
      cacheTTL: 3600, // Cache Time-To-Live in seconds
      embeddingSize: 1536, // OpenAI's embedding size
    },
  });

  await cache.initialize();

  const response = await cache.query('What is the capital of France?');
  console.log(response);
})();

Usage

Initialization

To initialize SemanticGPTCache, provide configuration options for embeddings, the GPT model, and caching.

const cache = new SemanticGPTCache({
  embeddingOptions: {
    type: 'local', // 'openai' or 'local'
    modelName: 'sentence-transformers/all-MiniLM-L6-v2', // Only for local models
    openAIApiKey: 'YOUR_OPENAI_API_KEY', // Only for OpenAI embeddings
  },
  gptOptions: {
    openAIApiKey: 'YOUR_OPENAI_API_KEY',
    model: 'gpt-3.5-turbo', // GPT model queried on a cache miss
    promptPrefix: 'You are an AI assistant.',
  },
  cacheOptions: {
    redisUrl: 'redis://localhost:6379',
    similarityThreshold: 0.8, // Cosine similarity threshold for cache hits
    cacheTTL: 3600, // Time-to-live for cache entries in seconds
    embeddingSize: 384, // Embedding size (384 for local models, 1536 for OpenAI)
  },
});

await cache.initialize();

Initialization Options Explained:

  • embeddingOptions:

    • type: 'openai' or 'local'. Specifies the source of embeddings.
    • modelName: The name of the local embedding model to use (e.g., 'sentence-transformers/all-MiniLM-L6-v2').
    • openAIApiKey: Your OpenAI API key (required if type is 'openai').
  • gptOptions:

    • openAIApiKey: Your OpenAI API key for accessing the GPT model.
    • model: The GPT model to use (e.g., 'gpt-3.5-turbo') in case of cache miss.
    • promptPrefix: An optional string to prepend to every prompt sent to the GPT model.
  • cacheOptions:

    • redisUrl: The URL of your Redis instance (e.g., 'redis://localhost:6379').
    • similarityThreshold: A number between 0 and 1 representing the cosine similarity threshold for cache hits.
    • cacheTTL: The time-to-live for cache entries in seconds.
    • embeddingSize: The dimensionality of the embeddings used (e.g., 384 for the all-MiniLM-L6-v2 local model, 1536 for OpenAI). It must match the output size of the chosen embedding model.

Querying

To query the cache and get a response:

const response = await cache.query('Your query here', 'Additional context if any');
console.log(response);

  • If a similar query exists in the cache (based on the similarity threshold), the cached response is returned.
  • If no similar query is found, the GPT API is called, and the response is cached for future queries.

Configuration Options

The package allows you to customize various settings to fit your needs:

  • Similarity Threshold: Adjust the similarityThreshold in cacheOptions to control how similar a query needs to be to hit the cache. A higher threshold means only very similar queries will hit the cache.

  • Cache Time-To-Live (TTL): Set cacheTTL to control how long entries remain in the cache.

  • Embedding Size: Ensure embeddingSize matches the size of embeddings produced by your chosen embedding model.
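
A mismatched embeddingSize or an out-of-range threshold is easy to miss until the first query runs. The sketch below is a hypothetical helper, not part of the gpt-semantic-cache API, that checks cacheOptions before they are passed to the constructor. The function name and the size table are assumptions; the 384 and 1536 values are the ones quoted in this README, and other local models may produce different sizes.

```javascript
// Hypothetical helper, NOT part of gpt-semantic-cache: sanity-check cacheOptions
// before handing them to the SemanticGPTCache constructor.
const EXPECTED_EMBEDDING_SIZE = {
  openai: 1536, // size quoted in this README for OpenAI embeddings
  local: 384,   // size quoted in this README for all-MiniLM-L6-v2
};

// Returns a list of human-readable problems; an empty list means the options look sane.
function validateCacheOptions(embeddingType, cacheOptions) {
  const problems = [];
  const { similarityThreshold, cacheTTL, embeddingSize } = cacheOptions;

  if (typeof similarityThreshold !== 'number' || similarityThreshold < 0 || similarityThreshold > 1) {
    problems.push('similarityThreshold must be a number between 0 and 1');
  }
  if (typeof cacheTTL !== 'number' || !(cacheTTL > 0)) {
    problems.push('cacheTTL must be a positive number of seconds');
  }
  const expected = EXPECTED_EMBEDDING_SIZE[embeddingType];
  if (expected !== undefined && embeddingSize !== expected) {
    problems.push(`embeddingSize ${embeddingSize} does not match ${expected} expected for '${embeddingType}'`);
  }
  return problems;
}

// A local MiniLM setup accidentally configured with the OpenAI embedding size.
console.log(validateCacheOptions('local', {
  similarityThreshold: 0.8,
  cacheTTL: 3600,
  embeddingSize: 1536,
}));
```

Running a check like this at startup turns a silent configuration mistake into an explicit message.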

Science Behind the Package

Semantic Embeddings

Semantic embeddings are vector representations of text that capture the meaning and context of the text. By converting both user queries and cached queries into embeddings, we can compare them in a high-dimensional space to find semantic similarities.

Approximate Nearest Neighbors Search

To efficiently find similar embeddings in the cache, the package uses the Hierarchical Navigable Small World (HNSW) algorithm for Approximate Nearest Neighbors search. HNSW constructs a graph of embeddings that allows for fast retrieval of nearest neighbors without comparing the query against every cached embedding.
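
For comparison, here is a minimal sketch of the exact (brute-force) search that an ANN index such as HNSW approximates. It is not the package's implementation: the class and method names are hypothetical, and the three-dimensional vectors are toy values chosen for readability. Each query is compared against every stored vector, which is the linear cost HNSW is designed to avoid.

```javascript
// Exact nearest-neighbor search by linear scan: the baseline that HNSW approximates.
// Vectors are normalized once when added, so cosine similarity reduces to a dot product.
function normalize(vector) {
  const norm = Math.sqrt(vector.reduce((sum, x) => sum + x * x, 0));
  if (norm === 0) throw new Error('cannot normalize a zero vector');
  return vector.map((x) => x / norm);
}

class ExactIndex {
  constructor() {
    this.items = []; // { label, unit } pairs, where unit is the normalized vector
  }

  add(label, vector) {
    this.items.push({ label, unit: normalize(vector) });
  }

  // Returns the k most similar items, best first.
  // Cost is O(n * d) per query for n stored vectors of dimension d.
  search(queryVector, k) {
    const q = normalize(queryVector);
    return this.items
      .map(({ label, unit }) => ({
        label,
        similarity: unit.reduce((sum, x, i) => sum + x * q[i], 0),
      }))
      .sort((a, b) => b.similarity - a.similarity)
      .slice(0, k);
  }
}

// Toy 3-dimensional "embeddings"; real ones have hundreds of dimensions.
const index = new ExactIndex();
index.add('capital-of-france', [0.9, 0.1, 0.0]);
index.add('tell-a-joke', [0.0, 0.2, 0.9]);
index.add('paris-population', [0.8, 0.3, 0.1]);

console.log(index.search([1.0, 0.1, 0.0], 2));
```

An HNSW index returns approximately the same neighbors while visiting only a small part of the graph, trading a little recall for much lower query time on large caches.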

Cosine Similarity

Cosine similarity measures the cosine of the angle between two vectors in a multidimensional space. It is a commonly used metric to determine how similar two embeddings are. In this package, after retrieving the nearest neighbors using ANN search, cosine similarity is computed to ensure the retrieved embeddings meet the specified similarity threshold.
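
As a concrete reference, cosine similarity is the dot product of two vectors divided by the product of their lengths. The function below is a small standalone sketch rather than the package's internal code; returning 0 for a zero-length vector is a convention chosen here.

```javascript
// Cosine similarity: dot(a, b) / (|a| * |b|). The result lies in [-1, 1];
// 1 means same direction, 0 means orthogonal.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new Error(`dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // convention chosen here for zero vectors
  return dot / Math.sqrt(normA * normB);
}

console.log(cosineSimilarity([1, 0], [0, 1]));       // orthogonal vectors
console.log(cosineSimilarity([1, 2, 3], [2, 4, 6])); // same direction, different magnitude
console.log(cosineSimilarity([1, 0], [-1, 0]));      // opposite directions
```

Because only the angle matters, two embeddings that point the same way score as identical even when their magnitudes differ. The dimension check also shows why embeddingSize has to match the embedding model.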

Caching Mechanism

The caching mechanism works as follows:

  1. Embedding Generation: When a query is received, it's converted into an embedding using the specified embedding model.

  2. ANN Search: The embedding is used to search the ANN index for similar embeddings.

  3. Similarity Check: Retrieved embeddings are compared using cosine similarity to ensure they meet the similarity threshold.

  4. Cache Hit or Miss:

    • Cache Hit: If a similar embedding is found, the associated response is retrieved from Redis and returned.
    • Cache Miss: If no similar embedding is found, the query is sent to the GPT API. The response is then cached along with the embedding for future queries.
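
The four steps above can be sketched end to end with in-memory stand-ins. Everything in this example is hypothetical and exists only to make the flow runnable without external services: a toy letter-count embedder replaces the embedding model, a linear scan replaces the HNSW index, an array with expiry timestamps replaces Redis, and a stub function replaces the GPT API. The calls are synchronous for brevity, whereas the real ones would be asynchronous.

```javascript
// Toy embedder (assumption, for demonstration only): counts of the letters a-z.
function toyEmbed(text) {
  const v = new Array(26).fill(0);
  for (const ch of text.toLowerCase()) {
    const i = ch.charCodeAt(0) - 97;
    if (i >= 0 && i < 26) v[i] += 1;
  }
  return v;
}

function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

class ToySemanticCache {
  constructor({ similarityThreshold, cacheTTL, callModel }) {
    this.similarityThreshold = similarityThreshold;
    this.ttlMs = cacheTTL * 1000;
    this.callModel = callModel; // stand-in for the GPT API call
    this.entries = [];          // stand-in for Redis: { embedding, response, expiresAt }
  }

  query(text, now = Date.now()) {
    // 1. Embedding generation
    const embedding = toyEmbed(text);

    // Drop expired entries (Redis would do this through key TTLs).
    this.entries = this.entries.filter((e) => e.expiresAt > now);

    // 2. Nearest-neighbor search (linear scan instead of an ANN index)
    let best = null;
    for (const entry of this.entries) {
      const similarity = cosine(embedding, entry.embedding);
      if (best === null || similarity > best.similarity) best = { entry, similarity };
    }

    // 3. Similarity check, then 4. cache hit or miss
    if (best !== null && best.similarity >= this.similarityThreshold) {
      return { response: best.entry.response, cacheHit: true };
    }
    const response = this.callModel(text);
    this.entries.push({ embedding, response, expiresAt: now + this.ttlMs });
    return { response, cacheHit: false };
  }
}

let modelCalls = 0;
const cache = new ToySemanticCache({
  similarityThreshold: 0.8,
  cacheTTL: 3600,
  callModel: (text) => {
    modelCalls += 1;
    return `stub answer to: ${text}`;
  },
});

console.log(cache.query('What is the capital of France?'));  // miss: cache is empty
console.log(cache.query('What is the capital of France??')); // hit: same letters as the first query
console.log(cache.query('Tell me a joke.'));                 // miss: too dissimilar
console.log('model calls:', modelCalls);
```

The letter-count embedder captures spelling rather than meaning, so it only illustrates the control flow. With a real embedding model, paraphrases of the same question are what land above the threshold.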

Examples

Using a Local Embedding Model

const cache = new SemanticGPTCache({
  embeddingOptions: {
    type: 'local',
    modelName: 'sentence-transformers/all-MiniLM-L6-v2',
  },
  gptOptions: {
    openAIApiKey: 'YOUR_OPENAI_API_KEY',
    model: 'gpt-3.5-turbo',
  },
  cacheOptions: {
    redisUrl: 'redis://localhost:6379',
    similarityThreshold: 0.75,
    cacheTTL: 7200, // 2 hours
    embeddingSize: 384, // For MiniLM model
  },
});

await cache.initialize();

const response = await cache.query('Tell me a joke.');
console.log(response);

Adjusting Similarity Threshold

You can adjust the similarityThreshold to control cache sensitivity:

// Higher threshold - only very similar queries will hit the cache
cache.cacheOptions.similarityThreshold = 0.9;

// Lower threshold - more queries will hit the cache, but responses may be less relevant
cache.cacheOptions.similarityThreshold = 0.6;
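
To see the trade-off in concrete terms, the sketch below applies three thresholds to a fixed set of similarity scores. The scores are made-up illustrative values, not measurements from any model, and the `>=` comparison is an assumption about how the threshold is applied. The cached query is taken to be "How do I reset my password?".

```javascript
// Made-up similarity scores between incoming queries and the cached query
// "How do I reset my password?". Illustrative values only.
const incoming = [
  { query: 'How can I reset my password?', similarity: 0.96 },
  { query: 'How do I change my password?', similarity: 0.82 },
  { query: 'How do I reset my router?', similarity: 0.68 },
  { query: 'What are your opening hours?', similarity: 0.15 },
];

// Queries that would be answered from the cache at a given threshold.
function cacheHits(threshold) {
  return incoming.filter((c) => c.similarity >= threshold).map((c) => c.query);
}

for (const threshold of [0.9, 0.75, 0.6]) {
  console.log(`threshold ${threshold}:`, cacheHits(threshold));
}
```

With these numbers, a threshold of 0.6 would answer the router question with the cached password instructions. That is the loss of relevance that a low threshold risks.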

License

This project is licensed under the MIT License.