vad-web

v0.6.1

Published

10 days ago

Voice activity detector (VAD) for the browser

ocavue

vad-web

A Voice Activity Detection (VAD) library for the browser.

It is based on the Silero VAD model and Transformers.js.

Online demo

https://vad-web.vercel.app

source code

Installation

npm install vad-web

Usage

Call recordAudio to start recording audio from the microphone. It returns a promise that resolves to a dispose function, which you call to stop recording. Under the hood, it runs the Silero VAD model in a web worker to avoid blocking the main thread.

import { recordAudio } from 'vad-web'

const dispose = await recordAudio({
  onSpeechStart: () => {
    console.log('Speech detected')
  },
  onSpeechEnd: () => {
    console.log('Silence detected')
  },
  onSpeechAvailable: ({ audioData, sampleRate, startTime, endTime }) => {
    console.log(`Audio received with duration ${endTime - startTime}ms`)
    // Further processing can be done here
  }
})

API Reference

recordAudio #

function recordAudio(options: RecordAudioOptions): Promise<DisposeFunction>

Records audio from the microphone and calls the onSpeechAvailable callback with the speech data for each detected speech segment.

Returns

A function to dispose of the audio recorder.

RecordAudioOptions #

onSpeechStart?: () => void

A function that will be called when speech is detected.

onSpeechEnd?: () => void

A function that will be called when silence is detected.

onSpeechAvailable?: (data: SpeechData) => void

A function that will be called when speech audio data is available.
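The three callbacks can be wired to a small piece of shared state. The following is a minimal sketch, not part of vad-web: createSpeechTracker is a hypothetical helper, and the SpeechData interface is a local copy of the fields listed in this reference. The order in which the library fires the callbacks relative to each other is not documented here, so the sketch does not rely on it.

```typescript
// Local copy of the SpeechData shape described in the API reference.
interface SpeechData {
  startTime: number
  endTime: number
  audioData: Float32Array
  sampleRate: number
}

// Hypothetical helper (not part of vad-web): builds the three optional
// callbacks and records what they report in a shared state object.
function createSpeechTracker() {
  const state = { speaking: false, segments: [] as SpeechData[] }

  const callbacks = {
    // Flip the flag when speech is detected.
    onSpeechStart: (): void => {
      state.speaking = true
    },
    // Clear the flag when silence is detected.
    onSpeechEnd: (): void => {
      state.speaking = false
    },
    // Keep every delivered speech segment for later processing.
    onSpeechAvailable: (data: SpeechData): void => {
      state.segments.push(data)
    },
  }

  return { state, callbacks }
}
```

In the browser, the callbacks object can be passed as the options argument, for example recordAudio(tracker.callbacks), while tracker.state is read from UI code.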

readAudio #

function readAudio(options: ReadAudioOptions): Promise<DisposeFunction>

Reads audio data from an ArrayBuffer and calls the onSpeechAvailable callback with the speech data for each detected speech segment.

Returns

A function to dispose of the audio reader.

ReadAudioOptions #

audioData: ArrayBuffer

Audio file data contained in an ArrayBuffer that is loaded from fetch(), XMLHttpRequest, or FileReader.

realTime?: boolean

If true, simulates real-time processing by adding delays to match the audio duration.

Default: false

onSpeechStart?: () => void

A function that will be called when speech is detected.

onSpeechEnd?: () => void

A function that will be called when silence is detected.

onSpeechAvailable?: (data: SpeechData) => void

A function that will be called when speech audio data is available.
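As an illustration of these options, the sketch below reports the time range of each speech segment found in a file. It is written under assumptions: logSpeechRanges is a hypothetical helper, the types are local copies of the shapes in this reference, and the reader function is passed in as a parameter so the block stays self-contained. Whether processing has finished by the time the returned promise resolves is not stated in this reference, so the sketch only reacts to callbacks.

```typescript
// Local copies of the shapes described in the API reference.
interface SpeechData {
  startTime: number
  endTime: number
  audioData: Float32Array
  sampleRate: number
}
interface ReadAudioOptions {
  audioData: ArrayBuffer
  realTime?: boolean
  onSpeechStart?: () => void
  onSpeechEnd?: () => void
  onSpeechAvailable?: (data: SpeechData) => void
}
type DisposeFunction = () => Promise<void>
type ReadAudio = (options: ReadAudioOptions) => Promise<DisposeFunction>

// Hypothetical helper (not part of vad-web): starts reading a file and
// reports one line per detected speech segment through `log`.
function logSpeechRanges(
  readAudio: ReadAudio,
  audioData: ArrayBuffer,
  log: (line: string) => void = console.log,
): Promise<DisposeFunction> {
  return readAudio({
    audioData,
    // Process as fast as possible instead of simulating real time.
    realTime: false,
    onSpeechAvailable: ({ startTime, endTime }) => {
      log(`speech from ${startTime}ms to ${endTime}ms`)
    },
  })
}
```

In the browser, the first argument would be the readAudio function imported from 'vad-web', and the second an ArrayBuffer obtained from, for example, a fetch() response's arrayBuffer() method.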

SpeechData #

An object representing speech data.

startTime: number

A timestamp in milliseconds marking the start of the speech segment.

endTime: number

A timestamp in milliseconds marking the end of the speech segment.

audioData: Float32Array<ArrayBufferLike>

The audio samples of the speech segment.

sampleRate: number

The sample rate of the audio data
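A common next step is to turn a speech segment into a file that can be played back or uploaded. The sketch below encodes the samples as a 16-bit PCM WAV file. It rests on assumptions that this reference does not confirm: that audioData holds a single (mono) channel and that samples lie in the range -1 to 1. encodeWav and SpeechSegment are hypothetical names, not part of vad-web.

```typescript
// Local subset of the SpeechData fields that the encoder needs.
interface SpeechSegment {
  audioData: Float32Array
  sampleRate: number
}

// Hypothetical helper (not part of vad-web): encodes float samples as a
// 16-bit PCM WAV file. Assumes mono audio with samples in [-1, 1].
function encodeWav({ audioData, sampleRate }: SpeechSegment): ArrayBuffer {
  const bytesPerSample = 2
  const headerSize = 44
  const dataSize = audioData.length * bytesPerSample
  const buffer = new ArrayBuffer(headerSize + dataSize)
  const view = new DataView(buffer)

  const writeString = (offset: number, text: string): void => {
    for (let i = 0; i < text.length; i++) {
      view.setUint8(offset + i, text.charCodeAt(i))
    }
  }

  // RIFF container header.
  writeString(0, 'RIFF')
  view.setUint32(4, 36 + dataSize, true)
  writeString(8, 'WAVE')

  // Format chunk: PCM, one channel, 16 bits per sample.
  writeString(12, 'fmt ')
  view.setUint32(16, 16, true) // format chunk size
  view.setUint16(20, 1, true) // audio format 1 = PCM
  view.setUint16(22, 1, true) // channel count (assumed mono)
  view.setUint32(24, sampleRate, true)
  view.setUint32(28, sampleRate * bytesPerSample, true) // byte rate
  view.setUint16(32, bytesPerSample, true) // block align
  view.setUint16(34, 16, true) // bits per sample

  // Data chunk: clamp each sample and scale it to a signed 16-bit integer.
  writeString(36, 'data')
  view.setUint32(40, dataSize, true)
  audioData.forEach((sample, i) => {
    const clamped = Math.max(-1, Math.min(1, sample))
    const scaled = clamped < 0 ? clamped * 0x8000 : clamped * 0x7fff
    view.setInt16(headerSize + i * bytesPerSample, scaled, true)
  })

  return buffer
}
```

Inside onSpeechAvailable, the result could be wrapped as new Blob([encodeWav(data)], { type: 'audio/wav' }) for playback or upload.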

DisposeFunction #

A function that should be called to stop the recording or recognition session.

Type: () => Promise<void>
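This reference does not say whether a dispose function may safely be called more than once. When several code paths might trigger cleanup (a stop button, a page unload handler), one defensive option is to wrap it so the underlying function runs at most once. The sketch below assumes nothing beyond the type shown above; disposeOnce is a hypothetical helper, not part of vad-web.

```typescript
// Local copy of the DisposeFunction type from the API reference.
type DisposeFunction = () => Promise<void>

// Hypothetical helper (not part of vad-web): returns a wrapper that calls
// the underlying dispose function at most once. Later calls receive the
// same promise as the first call.
function disposeOnce(dispose: DisposeFunction): DisposeFunction {
  let pending: Promise<void> | undefined
  return () => {
    if (pending === undefined) {
      pending = dispose()
    }
    return pending
  }
}
```

For example, the function returned by recordAudio could be wrapped once and the wrapper shared between all cleanup paths.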
