web-voice-detection

v1.0.6

A WebAssembly-powered Voice Activity Detection library for the browser.

Downloads

489

Web Voice Detection

This project demonstrates real-time voice activity detection in a web browser using a pre-trained ONNX model running on WebAssembly. It captures audio from the user's microphone, processes it to identify speech segments, and provides callbacks for speech start and end events, along with per-frame speech probabilities and FFT data.

Please feel free to ask me questions about it (or to tweet me memes) on X @TheCodeTherapy

Live Demo

You can check the demo running live here.

The Live Demo app repository can be found here.

Features

  • Real-time voice detection using a pre-trained ONNX model.
  • Real-time FFT data to generate audio visualizers.
  • Customizable audio constraints and detection parameters.
  • Callbacks for speech start, speech end, and misfire events.
  • Integration with Web Audio API for audio processing.

Usage

  • Install the package
npm install web-voice-detection
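  • Usage example
// Assumption: `Detect` and `utils` are named exports of the package.
// `statusDiv` and `appendAudioElement` are defined by your own page code.
import { Detect, utils } from "web-voice-detection";

const detection = await Detect.new({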
  onSpeechStart: () => {
    statusDiv.textContent = "Speech detected!";
  },
  onSpeechEnd: (arr: Float32Array) => {
    statusDiv.textContent = "Speech ended.";

    // uses provided util to encode WAV from the Float32Array
    const wavBuffer = utils.encodeWAV(arr);
    // converts array buffer to base64 string
    const base64 = utils.arrayBufferToBase64(wavBuffer);
    // converts to base64 data URL
    const url = `data:audio/wav;base64,${base64}`;
    // do whatever you want with the wav audio url
    appendAudioElement(url);
  },
  onMisfire: () => {
    statusDiv.textContent = "Misfire!";
  },
  onFFTProcessed: (fftData) => {
    // you can use the FFT data to draw a visualizer
  },
  fftSize: 1024, // whatever reasonable size you want
});
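
As a rough illustration of what an onFFTProcessed handler could do with its data, here is a minimal sketch that reduces an FFT array to a handful of normalized bar heights for a visualizer. The helper name fftToBars is hypothetical and not part of this library, and the sketch assumes the FFT array holds non-negative magnitude values.

```typescript
// Hypothetical helper, not part of web-voice-detection.
// Assumes `fftData` holds non-negative magnitudes.
// Averages neighbouring bins into at most `barCount` groups, then scales
// every group to the 0..1 range relative to the loudest group.
function fftToBars(fftData: ArrayLike<number>, barCount: number): number[] {
  if (barCount <= 0 || fftData.length === 0) return [];

  const binsPerBar = Math.max(1, Math.floor(fftData.length / barCount));
  const bars: number[] = [];

  for (let bar = 0; bar < barCount; bar++) {
    const start = bar * binsPerBar;
    if (start >= fftData.length) break;
    const end = Math.min(start + binsPerBar, fftData.length);

    let sum = 0;
    for (let i = start; i < end; i++) sum += fftData[i];
    bars.push(sum / (end - start));
  }

  const peak = Math.max(...bars);
  // Silence (all zeros) maps to flat bars instead of dividing by zero.
  return peak > 0 ? bars.map((value) => value / peak) : bars.map(() => 0);
}
```

Inside onFFTProcessed you could call fftToBars(fftData, 32) and use each returned value as the relative height of one bar on a canvas.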

Configuration

You can customize the behavior of Detect using various options. Refer to the RealTimeDetectionOptions type definition for a complete list of available options. Some key options include:

  • onFrameProcessed: Callback function that receives audio frame data with the detection probabilities as the following object: { notSpeech: number, isSpeech: number }.

  • onFFTProcessed: Callback function that receives the audio FFT array based on the fftSize option passed to the constructor.

  • onSpeechStart: Callback function triggered when speech starts.

  • onSpeechEnd: Callback function triggered when speech ends. It receives the captured speech audio as a Float32Array.

  • onMisfire: Callback function triggered if a speech start is detected but the segment is too short.

  • frameSamples: Number of audio samples per frame (default: 1536).

  • positiveSpeechThreshold: Probability threshold for detecting speech (default: 0.5).

  • negativeSpeechThreshold: Probability threshold for detecting non-speech (default: 0.35).
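
To make the two thresholds and the frame size more concrete, here is a minimal sketch of how a pair of thresholds can act as hysteresis over per-frame probabilities. This is not the library's implementation, which likely applies further rules (for example, the minimum-length check behind onMisfire). The names createHysteresisDetector and frameDurationMs are hypothetical, and the 16 kHz sample rate used in the usage note is an assumption.

```typescript
// Shape of the per-frame probabilities described for onFrameProcessed.
type FrameProbabilities = { notSpeech: number; isSpeech: number };
type SpeechEvent = "start" | "end";

// Hypothetical sketch, not the library's internal logic.
// Speech starts once isSpeech reaches `positive`, and ends only after it
// drops below `negative`. Values between the two keep the current state.
function createHysteresisDetector(positive = 0.5, negative = 0.35) {
  let speaking = false;

  return (frame: FrameProbabilities): SpeechEvent | null => {
    if (!speaking && frame.isSpeech >= positive) {
      speaking = true;
      return "start";
    }
    if (speaking && frame.isSpeech < negative) {
      speaking = false;
      return "end";
    }
    return null;
  };
}

// Duration of one frame in milliseconds for a given sample rate.
function frameDurationMs(frameSamples: number, sampleRate: number): number {
  return (frameSamples * 1000) / sampleRate;
}
```

With the default thresholds, the isSpeech sequence 0.2, 0.6, 0.4, 0.3 produces a start event at 0.6 and an end event at 0.3; the 0.4 frame falls between the thresholds and changes nothing. If the audio were sampled at 16 kHz (an assumption), the default frame of 1536 samples would cover 96 ms.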

Diving into the source code

To run the example code in your browser locally from source:

git clone https://github.com/TheCodeTherapy/web-voice-detection.git

cd web-voice-detection

nvm install $(cat .nvmrc)

npm install

npm run watch:example

Examples

The example directory contains a basic example demonstrating how to use the Detect class.

You can also check the demo repository that consumes this library as an npm package here.

License

This project is licensed under the MIT License.