vad-web
v0.6.1
Voice activity detector (VAD) for the browser
vad-web
An enterprise-grade Voice Activity Detection (VAD) library for the browser.
It is based on the Silero VAD model and Transformers.js.
Online demo
https://vad-web.vercel.app
Installation
npm install vad-web
Usage
Call recordAudio to start recording audio and get a dispose function.
Under the hood, it runs the Silero VAD model in a web worker to avoid blocking the main thread.
import { recordAudio } from 'vad-web'

const dispose = await recordAudio({
  onSpeechStart: () => {
    console.log('Speech detected')
  },
  onSpeechEnd: () => {
    console.log('Silence detected')
  },
  onSpeechAvailable: ({ audioData, sampleRate, startTime, endTime }) => {
    console.log(`Audio received with duration ${endTime - startTime}ms`)
    // Further processing can be done here
  }
})
API Reference
recordAudio
function recordAudio(options: RecordAudioOptions): Promise<DisposeFunction>
Records audio from the microphone and calls the onSpeechStart, onSpeechEnd,
and onSpeechAvailable callbacks as speech is detected.
Returns
A function to dispose of the audio recorder.
RecordAudioOptions
onSpeechStart?: () => void
A function that will be called when speech is detected.
onSpeechEnd?: () => void
A function that will be called when silence is detected.
onSpeechAvailable?: (data: SpeechData) => void
A function that will be called when speech audio data is available.
readAudio
function readAudio(options: ReadAudioOptions): Promise<DisposeFunction>
Reads audio data from an ArrayBuffer and calls the onSpeechStart, onSpeechEnd,
and onSpeechAvailable callbacks as speech is detected.
Returns
A function to dispose of the audio reader.
ReadAudioOptions
audioData: ArrayBuffer
Audio file data contained in an ArrayBuffer that is loaded from fetch(), XMLHttpRequest, or FileReader.
realTime?: boolean
If true, simulates real-time processing by adding delays to match the audio duration.
Default: false
onSpeechStart?: () => void
A function that will be called when speech is detected.
onSpeechEnd?: () => void
A function that will be called when silence is detected.
onSpeechAvailable?: (data: SpeechData) => void
A function that will be called when speech audio data is available.
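The three callbacks can be bundled into one object and spread into the options. The following is a minimal sketch, not part of vad-web: createSegmentCollector and SegmentSummary are hypothetical names, the SpeechData interface is a local copy of the shape documented in this README, and the URL in the usage comment is a placeholder. The README does not say when processing of the buffer finishes, so the sketch only accumulates segments as they arrive.

```typescript
// Local mirror of the SpeechData shape documented in this README.
interface SpeechData {
  startTime: number
  endTime: number
  audioData: Float32Array
  sampleRate: number
}

// Hypothetical summary type used only by this sketch.
interface SegmentSummary {
  startTime: number
  endTime: number
  durationMs: number
  sampleCount: number
}

// Hypothetical helper: builds the three callbacks and records what they report.
function createSegmentCollector() {
  const segments: SegmentSummary[] = []
  let speaking = false

  return {
    segments,
    isSpeaking: (): boolean => speaking,
    callbacks: {
      onSpeechStart: (): void => {
        speaking = true
      },
      onSpeechEnd: (): void => {
        speaking = false
      },
      onSpeechAvailable: (data: SpeechData): void => {
        segments.push({
          startTime: data.startTime,
          endTime: data.endTime,
          durationMs: data.endTime - data.startTime,
          sampleCount: data.audioData.length,
        })
      },
    },
  }
}

// Possible usage with vad-web (not executed here; '/speech.wav' is a placeholder):
//   const collector = createSegmentCollector()
//   const response = await fetch('/speech.wav')
//   const dispose = await readAudio({
//     audioData: await response.arrayBuffer(),
//     ...collector.callbacks,
//   })
```

Keeping the callbacks in a separate object means the same collector could also be passed to recordAudio, since both option types list the same three callbacks.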
SpeechData
An object representing speech data.
startTime: number
The start of the speech segment, as a timestamp in milliseconds
endTime: number
The end of the speech segment, as a timestamp in milliseconds
audioData: Float32Array<ArrayBufferLike>
The audio data
sampleRate: number
The sample rate of the audio data
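One common follow-up is to turn a segment into a WAV file for playback or upload. The sketch below is not part of vad-web: speechDataToWav and SpeechDataLike are hypothetical names, and it assumes the audio is mono with samples in the range [-1, 1], which this README does not state.

```typescript
// Local mirror of the SpeechData shape documented in this README.
interface SpeechDataLike {
  startTime: number
  endTime: number
  audioData: Float32Array
  sampleRate: number
}

// Hypothetical helper: encodes one segment as a 16-bit PCM WAV file.
// Assumptions (not confirmed by the README): mono audio, samples in [-1, 1].
function speechDataToWav(data: SpeechDataLike): Uint8Array {
  const { audioData, sampleRate } = data
  const bytesPerSample = 2
  const dataSize = audioData.length * bytesPerSample
  const buffer = new ArrayBuffer(44 + dataSize)
  const view = new DataView(buffer)

  const writeAscii = (offset: number, text: string): void => {
    for (let i = 0; i < text.length; i++) {
      view.setUint8(offset + i, text.charCodeAt(i))
    }
  }

  // 44-byte RIFF/WAVE header, little-endian.
  writeAscii(0, 'RIFF')
  view.setUint32(4, 36 + dataSize, true) // remaining file size
  writeAscii(8, 'WAVE')
  writeAscii(12, 'fmt ')
  view.setUint32(16, 16, true) // fmt chunk size
  view.setUint16(20, 1, true) // audio format: PCM
  view.setUint16(22, 1, true) // channel count (assumed mono)
  view.setUint32(24, sampleRate, true)
  view.setUint32(28, sampleRate * bytesPerSample, true) // byte rate
  view.setUint16(32, bytesPerSample, true) // block align
  view.setUint16(34, 16, true) // bits per sample
  writeAscii(36, 'data')
  view.setUint32(40, dataSize, true)

  // Clamp each float sample and scale it to a signed 16-bit integer.
  audioData.forEach((sample, i) => {
    const clamped = Math.max(-1, Math.min(1, sample))
    const scaled = clamped < 0 ? clamped * 0x8000 : clamped * 0x7fff
    view.setInt16(44 + i * bytesPerSample, scaled, true)
  })

  return new Uint8Array(buffer)
}
```

Inside onSpeechAvailable, the returned bytes could be wrapped in a Blob with type 'audio/wav' and handed to URL.createObjectURL for playback.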
DisposeFunction
A function that should be called to stop the recording or recognition session.
Type: () => Promise<void>
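Because the dispose function is the only handle on a running session, it can help to keep it behind a small start/stop wrapper, for example for a record button. This is a minimal sketch under stated assumptions: createSession is a hypothetical helper, not part of vad-web, and it takes the start function as a parameter so that it works with anything that resolves to a DisposeFunction. It does not guard against two start calls made concurrently.

```typescript
// Same shape as the DisposeFunction type documented in this README.
type DisposeFunction = () => Promise<void>

// Hypothetical helper: tracks one session and makes start/stop safe to repeat.
function createSession(start: () => Promise<DisposeFunction>) {
  let dispose: DisposeFunction | null = null

  return {
    // Starts a session unless one is already running.
    async start(): Promise<void> {
      if (dispose) return
      dispose = await start()
    },
    // Stops the running session, if any, and forgets its dispose function.
    async stop(): Promise<void> {
      if (!dispose) return
      const current = dispose
      dispose = null
      await current()
    },
    isRunning: (): boolean => dispose !== null,
  }
}

// Possible usage with vad-web (not executed here):
//   const session = createSession(() =>
//     recordAudio({ onSpeechAvailable: (data) => console.log(data.sampleRate) })
//   )
//   await session.start()
//   await session.stop()
```

Clearing the stored function before awaiting it means a second stop call returns immediately instead of disposing twice.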