easy-embeddings

v1.0.0, published 5 months ago

Easy, fast and WASM/WebGPU accelerated vector embedding for the web platform. Locally via ONNX/Transformers.js and via API. Compatible with Browsers, Workers, Web Extensions, Node.js & co.


kyr0

vector embeddings web gpu simd transformers bert openai voyage mixedbread dataset api fast easy wasm webassembly n-dimensional

easy-embeddings

Easy vector embeddings for the web platform. Use open source embedding models locally or an API (OpenAI, Voyage, Mixedbread).

🔥 Please note: This project relies on the currently unreleased V3 branch of @xenova/transformers.js combined with a patched development build of onnxruntime-web. This enables the latest features (WebGPU and WASM acceleration) and broad compatibility (it even works in Web Extension Service Workers).

📚 Install

npm install easy-embeddings
yarn add easy-embeddings
bun add easy-embeddings

⚡ Use

Remote inference (call an API)

Single text vector embedding

// assumption: the EmbeddingResponse type is exported by the package
import { embed, type EmbeddingResponse } from "easy-embeddings";

// single embedding, german embedding model
const embedding: EmbeddingResponse = await embed("Hallo, Welt!", "mixedbread-ai", {
  model: "mixedbread-ai/deepset-mxbai-embed-de-large-v1",
  normalized: true,
  dimensions: 512,
}, { apiKey: import.meta.env[`mixedbread-ai_api_key`] })

Multi-text vector embeddings

// assumption: the EmbeddingResponse type is exported by the package
import { embed, type EmbeddingResponse } from "easy-embeddings";

// multiple embeddings in one call, OpenAI embedding model
const embedding: EmbeddingResponse = await embed(["Hello", "World"], "openai", {
  model: "text-embedding-3-small"
}, { apiKey: import.meta.env[`openai_api_key`] })

Local inference

import { embed } from "easy-embeddings";

// local inference, multilingual E5 embedding model
const embedResult = await embed(
  ["query: Foo", "passage: Bar"],
  "local",
  {
    // https://huggingface.co/intfloat/multilingual-e5-small
    model: "Xenova/multilingual-e5-small",
    modelParams: {
      pooling: "mean",
      normalize: true, // unit-length vectors, so a single dot product of two vectors yields the similarity score
      quantize: true, // use a quantized variant (more efficient, slightly less accurate)
    },
  },
  {
    modelOptions: {
      hideOnnxWarnings: false, // surface ONNX runtime warnings as errors to help diagnose runtime problems
      allowRemoteModels: false, // do not download remote models from huggingface.co
      allowLocalModels: true,
      localModelPath: "/models", // loads the model from public dir subfolder "models"
      onnxProxy: false,
    },
  },
);
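The comment on normalize above says that one dot product is enough to score similarity. The following is a minimal sketch of why that holds, using plain number arrays as stand-ins for embedding vectors. The helper names (dot, l2Normalize, cosineSimilarity) and the toy vectors are hypothetical and not part of easy-embeddings; how vectors are read out of the embed() result is not shown here.

```typescript
// Plain number arrays stand in for embedding vectors (illustration only).
type Vector = number[];

// Dot product of two vectors of equal length.
function dot(a: Vector, b: Vector): number {
  if (a.length !== b.length) {
    throw new Error(`dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    sum += a[i] * b[i];
  }
  return sum;
}

// Scale a vector to unit length (L2 norm of 1).
function l2Normalize(v: Vector): Vector {
  const norm = Math.sqrt(dot(v, v));
  if (norm === 0) throw new Error("cannot normalize a zero vector");
  return v.map((x) => x / norm);
}

// Full cosine similarity, needed only when vectors are NOT normalized.
function cosineSimilarity(a: Vector, b: Vector): number {
  return dot(a, b) / (Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b)));
}

// Toy vectors, made up for illustration (not real model output).
const query = l2Normalize([3, 4, 0]);
const passage = l2Normalize([4, 3, 0]);

// For unit-length vectors both scores agree up to floating point error.
const fast = dot(query, passage);
const full = cosineSimilarity([3, 4, 0], [4, 3, 0]);
console.log(fast.toFixed(4), full.toFixed(4)); // prints "0.9600 0.9600"
```

With normalized output, the cheaper dot product can be used when ranking many passages against one query, since the two square roots and the division per comparison are no longer needed.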

Advanced: Using a custom WASM runtime loader

import { embed } from "easy-embeddings";
// @ts-ignore
import getModule from "./public/ort-wasm-simd-threaded.jsep";

// local inference with a custom WASM module loader
const embedResult = await embed(
  ["query: Foo", "passage: Bar"],
  "local",
  {
    // https://huggingface.co/intfloat/multilingual-e5-small
    model: "Xenova/multilingual-e5-small",
    modelParams: {
      pooling: "mean",
      normalize: true, // unit-length vectors, so a single dot product of two vectors yields the similarity score
      quantize: true, // use a quantized variant (more efficient, slightly less accurate)
    },
  },
  {
    importWasmModule:  async (
      _mjsPathOverride: string,
      _wasmPrefixOverride: string,
      _threading: boolean,
    ) => {
      return [
        undefined,
        async (moduleArgs = {}) => {
          return await getModule(moduleArgs);
        },
      ];
    },
    modelOptions: {
      hideOnnxWarnings: false, // surface ONNX runtime warnings as errors to help diagnose runtime problems
      allowRemoteModels: false, // do not download remote models from huggingface.co
      allowLocalModels: true,
      localModelPath: "/models", // loads the model from public dir subfolder "models"
      onnxProxy: false,
    },
  },
);

Download models locally

To serve a model from your own public folder instead of fetching it from huggingface.co at runtime, download it ahead of time with a small script:

import { downloadModel } from "easy-embeddings/tools";

// downloads the model into the models folder
await downloadModel('Xenova/multilingual-e5-small', 'public/models')

Help improve this project!

Setup

Clone this repo, install the dependencies (bun is recommended for speed), and run npm run test to verify the installation was successful. You may want to play with the experiments.