web-vad

v0.0.6

Published

5 months ago

Web Voice Activity Detection (VAD)

makeitmore

speech-recognition wake-word-detection machine-learning voice-activity-detection vad onnx voice-control offline-speech-recognition

Web Voice Activity Detection (VAD)

An adaptation of @ricky0123's vad library that slightly changes the API to only support passing a media stream, addresses some TypeScript issues, and reduces the codebase where possible. The primary purpose of this adaptation is to support real-time voice agents, such as those provided by Pipecat.

Getting started

npm install onnxruntime-web web-vad

Copy Silero model somewhere accessible

Ensure silero_vad.onnx (included in this repo here) is hosted somewhere your app can fetch it from (e.g. a public / static path).
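A missing or mis-served model file may only surface later as an opaque load failure, so it can help to check the URL up front. The helper below is a hypothetical sketch: `isModelReachable` is not part of web-vad, the fetch function is passed in as a parameter, and it assumes your server answers HEAD requests.

```typescript
// Hypothetical helper, not part of web-vad.
// Minimal structural type so the helper works with the browser's fetch or a stub.
type FetchLike = (
  url: string,
  init?: { method?: string }
) => Promise<{ ok: boolean }>;

// Resolves to true if the model URL responds with a 2xx status.
// Assumes the server supports HEAD requests for static files.
async function isModelReachable(
  modelUrl: string,
  fetchFn: FetchLike
): Promise<boolean> {
  try {
    const res = await fetchFn(modelUrl, { method: "HEAD" });
    return res.ok;
  } catch {
    // Network error, blocked request, etc.
    return false;
  }
}
```

In a browser this could be called as `await isModelReachable("/silero_vad.onnx", fetch)` before constructing the VAD.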

Ensure audio worker is available globally

For security reasons, browsers do not allow audio worklets to be imported as regular modules; they must be loaded from a URL. Either import the worklet URL with your framework-specific syntax (e.g. import AudioWorkletURL from "web-vad/dist/worklet.js?worker&url";) or include it manually in a <script> declaration higher up in the page.

Example project

A barebones example is included in this repo:

cd test-site
yarn
yarn run build # Copies onnx wasm to dist directory
yarn run dev

Navigate to the URL shown in your terminal

Usage

import { VAD } from "web-vad";
import AudioWorkletURL from "web-vad/dist/worklet.js?worker&url";


const localAudioTrack = ... // Get mic or other audio track
const stream = new MediaStream([localAudioTrack!]);

const vad = new VAD({
    workletURL: AudioWorkletURL,
    modelUrl: "path-to-silero.onnx",
    stream,
    onSpeechStart: () => {
        console.log("speaking start");
    },
    onVADMisfire: () => {
        console.log("misfire");
    },
    onSpeechEnd: () => {
        console.log("speaking end");
    },
});

// Initialize and load models
await vad.init();

// Start when ready
vad.start();

console.log(vad.state); 
// > VADState.listening
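For a voice agent UI, the three callbacks above can drive a small piece of state rather than just logging. The following is a minimal sketch; `reduceSpeech` and `makeCallbacks` are hypothetical names, not part of web-vad, and it assumes a misfire means a detected speech start was discarded without a matching onSpeechEnd (as in the upstream library).

```typescript
// Hypothetical helpers, not part of web-vad.
type SpeechEvent = "speechStart" | "speechEnd" | "misfire";

interface SpeechState {
  speaking: boolean;
  completedSegments: number;
  misfires: number;
}

const initialState: SpeechState = {
  speaking: false,
  completedSegments: 0,
  misfires: 0,
};

// Pure reducer: returns the next state for a given VAD event.
function reduceSpeech(state: SpeechState, event: SpeechEvent): SpeechState {
  switch (event) {
    case "speechStart":
      return { ...state, speaking: true };
    case "speechEnd":
      return {
        ...state,
        speaking: false,
        completedSegments: state.completedSegments + 1,
      };
    case "misfire":
      // Assumption: a misfire cancels the segment that just started.
      return { ...state, speaking: false, misfires: state.misfires + 1 };
  }
}

// Builds the three callbacks and reports every state change to onChange.
function makeCallbacks(onChange: (state: SpeechState) => void) {
  let state = initialState;
  const dispatch = (event: SpeechEvent) => {
    state = reduceSpeech(state, event);
    onChange(state);
  };
  return {
    onSpeechStart: () => dispatch("speechStart"),
    onSpeechEnd: () => dispatch("speechEnd"),
    onVADMisfire: () => dispatch("misfire"),
  };
}
```

The result can be spread into the constructor options, e.g. `new VAD({ workletURL, modelUrl, stream, ...makeCallbacks(render) })`, where `render` is whatever updates your UI.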

Next / Vite support

Web VAD uses the WASM files shipped with onnxruntime-web. Although these can be loaded at runtime, it is recommended to copy them into your build / deployment. Here is an example vite.config.js that copies them across at build time:

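// vite.config.js
import path from "path";
import { defineConfig } from "vite";
import { viteStaticCopy } from "vite-plugin-static-copy";

export default defineConfig({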
  assetsInclude: ["**/*.onnx"],
  server: {
    headers: {
      "Cross-Origin-Embedder-Policy": "require-corp",
      "Cross-Origin-Opener-Policy": "same-origin",
    },
  },
  resolve: {
    alias: {
      "@": path.resolve(__dirname, "./src"),
    },
  },
  plugins: [
    viteStaticCopy({
      targets: [
        {
          src: "node_modules/onnxruntime-web/dist/*.wasm",
          dest: "./",
        },
      ],
    }),
  ],
});
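Only a Vite config is shown above. For Next.js, a rough sketch of setting the same two response headers might look like the following. This assumes the same cross-origin isolation headers are wanted there and that your Next.js version accepts a TypeScript config file; it does not cover copying the WASM files, and the `/:path*` source pattern is an illustrative choice.

```typescript
// next.config.ts (sketch, not taken from the web-vad repo)

// The same two headers used in the Vite dev server config above.
const isolationHeaders = [
  { key: "Cross-Origin-Embedder-Policy", value: "require-corp" },
  { key: "Cross-Origin-Opener-Policy", value: "same-origin" },
];

const nextConfig = {
  // Next.js calls headers() to collect custom response headers per route.
  async headers() {
    return [
      {
        source: "/:path*", // assumption: apply to every route
        headers: isolationHeaders,
      },
    ];
  },
};

export default nextConfig;
```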

Precaching models

Both the Silero ONNX model and the ONNX Runtime WASM files are quite large (~10 MB). The VAD class exposes a static method for precaching them:

import {VAD} from "web-vad";

async function run() {
  console.log("Precaching models");
  await VAD.precacheModels("/silero-vad.onnx");
  console.log("Download complete!");
  
  //...start()
}
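If several parts of an app may ask for the models at once, it can be useful to make sure the precache call is only in flight once. The wrapper below is a generic sketch; `once` is a hypothetical helper, and it only assumes that the wrapped function returns a promise, as the `await` in the example above suggests for `VAD.precacheModels`.

```typescript
// Hypothetical helper, not part of web-vad.
// Wraps an async loader so concurrent callers share a single in-flight promise.
function once<T>(load: () => Promise<T>): () => Promise<T> {
  let pending: Promise<T> | undefined;
  return () => {
    if (!pending) {
      pending = load().catch((err) => {
        // Forget the failed attempt so a later call can retry.
        pending = undefined;
        throw err;
      });
    }
    return pending;
  };
}
```

Usage might look like `const precache = once(() => VAD.precacheModels("/silero-vad.onnx"));` followed by `await precache();` wherever the models are needed.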

References

[1] Silero Team. (2021). Silero VAD: pre-trained enterprise-grade Voice Activity Detector (VAD), Number Detector and Language Classifier. GitHub repository, https://github.com/snakers4/silero-vad

[2] Ricky Samore. Original code, https://github.com/ricky0123/vad
