VoxSDK

VoxSDK is a comprehensive toolkit designed to facilitate easy integration of AI-driven speech recognition and synthesis into your applications. With a focus on simplicity and efficiency, VoxSDK offers a set of React hooks and utilities to seamlessly connect with AI services for voice interactions.

Features

  • VoxProvider: A context provider to encapsulate the SDK's functionalities and make them accessible throughout your React application.
  • useListen: A hook to capture and transcribe user speech in real-time.
  • useSpeak: A hook for text-to-speech functionality, converting text responses into natural-sounding speech.

Installation

Install VoxSDK using npm:

npm install vox-sdk

Or using yarn:

yarn add vox-sdk

VoxSDK also requires tslib; install it as a dev dependency.

Using npm

npm install tslib --save-dev

Using yarn

yarn add tslib -D

Setup

Server Setup

  • On your server, you will need to create a GET endpoint at /token.

  • Using your Azure speech key and region, generate an authorization token from Microsoft's APIs.

  • Set these values in the .env file as SPEECH_KEY and SPEECH_REGION.

  • The /token endpoint should return a response with the following shape:

      {
        token:string,
        region:string
      }
  • Here's a sample implementation of the /token endpoint.

import express from "express";
import cors from "cors";
import "dotenv/config";
import axios from "axios";

const app = express();

app.use(
  cors({
    origin: process.env.FRONTEND_URL,
  })
);

let token = null;
const speechKey = process.env.SPEECH_KEY;
const speechRegion = process.env.SPEECH_REGION;

const getToken = async () => {
  try {
    // Exchange the subscription key for a short-lived authorization token.
    const axiosConfig = {
      headers: {
        "Ocp-Apim-Subscription-Key": speechKey,
        "Content-Type": "application/x-www-form-urlencoded",
      },
    };

    const tokenResponse = await axios.post(
      `https://${speechRegion}.api.cognitive.microsoft.com/sts/v1.0/issueToken`,
      null,
      axiosConfig
    );

    token = tokenResponse.data;
  } catch (error) {
    console.error("Error while getting token:", error);
  }
};

app.get("/token", async (req, res) => {
  try {
    res.setHeader("Content-Type", "application/json");

    // The client can request a fresh token via ?refresh=true
    const refreshTheToken = req.query?.refresh;

    if (!token || refreshTheToken) {
      await getToken();
    }

    res.send({
      token: token,
      region: speechRegion,
    });
  } catch (error) {
    console.error("Error while handling /token request:", error);
    res.status(500).send({ error: "An error occurred while processing your request." });
  }
});

app.listen(8080, () => console.log("Server running on port 8080"));
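
For reference, the .env file referenced above might look like this (the values are placeholders):

SPEECH_KEY=your-azure-speech-key
SPEECH_REGION=your-azure-region
FRONTEND_URL=http://localhost:3000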

Client Setup

  • Wrap your application with VoxProvider to make the SDK available throughout your app:

    import { VoxProvider } from "vox-sdk";
    
    function App() {
      return <VoxProvider>{/* Your app components go here */}</VoxProvider>;
    }
    
    export default App;
  • VoxProvider expects a config object which includes:

    1. baseUrl : The URL of your backend, e.g. https://exampleapp.com. Ensure that the /token route serves the token and region.

    2. onAuthRefresh : A callback function that is invoked when an authentication error occurs or the token expires.

    3. headersForBaseUrl : Optional headers to send with requests to baseUrl (e.g. a Bearer authentication token).

  • Here's an implementation of the steps above.

    <VoxProvider
      config={{
        baseUrl: "https://exampleapp.com",
        onAuthRefresh: async () => {
          const { data } = await axios.get("https://exampleapp.com/token?refresh=true");
          return { token: data.token, region: data.region };
        },
        headersForBaseUrl: {
          //... Bearer Authentication token or other config
        },
      }}
    >
      <App />
    </VoxProvider>
  • The onAuthRefresh callback will refresh the token and return it with the region.

  • For more details, see the sample app implementation.

Usage

Using useListen Hook

After setting up the server and VoxProvider, we are ready to use useListen and useSpeak.

Integrate speech-to-text functionality in your components:

import { useListen } from "vox-sdk";
import React from "react";
const SpeechToText = () => {
  const { answers, loading, startSpeechRecognition, stopSpeechRecognition } = useListen({
    onEndOfSpeech: () => {
      console.log(answers);
    },
    automatedEnd: true,
    delay: 1000,
  });
  return (
    <>
      <button disabled={loading} onClick={startSpeechRecognition}>
        Start Listening
      </button>
      <button onClick={stopSpeechRecognition}> Stop Listening</button>
    </>
  );
};

export default SpeechToText;

The useListen hook expects the following parameters.

  1. automatedEnd :

    • Expects a boolean value, default is true.
    • When the user finishes speaking, the hook will automatically start the speech-to-text conversion.
    • To listen continuously until the user clicks stopSpeechRecognition, pass false.
  2. delay :

    • Expects a value in milliseconds.
    • This is the debounce duration for listening to the user.
    • The default is set to 2000ms.
  3. onEndOfSpeech :

    • Expects a callback function that is invoked when speech ends.
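
For example, a minimal continuous-listening configuration based on the parameters above (the values shown are illustrative, not defaults):

const { startSpeechRecognition, stopSpeechRecognition } = useListen({
  automatedEnd: false, // keep listening until stopSpeechRecognition is called
  delay: 500, // debounce transcription by 500ms
  onEndOfSpeech: () => {
    console.log("Recognition stopped");
  },
});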

The useListen hook returns:

  1. startSpeechRecognition : Function to start speech recognition.
  2. stopSpeechRecognition : Function to stop speech recognition.
  3. answers : Returns an array of strings containing all the transcribed text.
  4. answer : The last transcribed text.
  5. recognizerRef : A ref holding the underlying recognizer from microsoft-cognitiveservices-speech-sdk.
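
A short sketch consuming these return values to render a running transcript (the component and markup are illustrative):

import React from "react";
import { useListen } from "vox-sdk";

const Transcript = () => {
  const { answer, answers, startSpeechRecognition, stopSpeechRecognition } = useListen({
    automatedEnd: true,
  });

  return (
    <>
      <button onClick={startSpeechRecognition}>Start</button>
      <button onClick={stopSpeechRecognition}>Stop</button>
      <p>Last heard: {answer}</p>
      {/* answers accumulates every transcribed utterance */}
      <ul>
        {answers.map((text, i) => (
          <li key={i}>{text}</li>
        ))}
      </ul>
    </>
  );
};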

Using useSpeak Hook

Implement text-to-speech in your application:


import React, { useState } from "react";
import { useSpeak, SpeechVoices } from "vox-sdk";

const TextToSpeech = () => {
  const [text, setText] = useState("");
  const { interruptSpeech, speak, isSpeaking } = useSpeak({
    onEnd: () => {
      console.log("Speech ended");
    },
    shouldCallOnEnd: true,
    throttleDelay: 1000,
    voice: SpeechVoices.enUSAIGenerate1Neural, // AI voices
  });

  return (
    <>
      <h3>Text To Speech</h3>
      <input type="text" onChange={(e) => setText(e.target.value)} value={text} />
      <button
        onClick={() => {
          speak(text);
        }}
        disabled={isSpeaking}
      >
        Start Speaking
      </button>
      <button
        disabled={!isSpeaking}
        onClick={() => {
          interruptSpeech();
        }}
      >
        Stop Speaking
      </button>
    </>
  );
};

export default TextToSpeech;

The useSpeak hook expects the following parameters.

  1. voice :

    • Expects a string value.

    • Choose your preferred AI voice from Microsoft Azure.

    • Here's the list of available voices.

      export enum SpeechVoices {
        // Arabic
        arAEFatimaNeural = "ar-AE-FatimaNeural",
        arBHAliNeural = "ar-BH-AliNeural",
        arEGSalmaNeural = "ar-EG-SalmaNeural",
        arJOTaimNeural = "ar-JO-TaimNeural",
        arKWFahedNeural = "ar-KW-FahedNeural",
        arLYImanNeural = "ar-LY-ImanNeural",
        arQAAmalNeural = "ar-QA-AmalNeural",
        arSAHamedNeural = "ar-SA-HamedNeural",
        arSYAmanyNeural = "ar-SY-AmanyNeural",
        arTNHediNeural = "ar-TN-HediNeural",
        arYEMaryamNeural = "ar-YE-MaryamNeural",
      
        // Chinese
        zhCNXiaoxiaoNeural = "zh-CN-XiaoxiaoNeural",
        zhCNYunxiNeural = "zh-CN-YunxiNeural",
        zhCNYunyeNeural = "zh-CN-YunyeNeural",
        zhHKHiuGaaiNeural = "zh-HK-HiuGaaiNeural",
        zhHKHiuMaanNeural = "zh-HK-HiuMaanNeural",
        zhTWHsiaoChenNeural = "zh-TW-HsiaoChenNeural",
        zhTWHsiaoYuNeural = "zh-TW-HsiaoYuNeural",
      
        // Danish
        daDKChristelNeural = "da-DK-ChristelNeural",
        daDKJeppeNeural = "da-DK-JeppeNeural",
      
        // Dutch
        nlBEArnaudNeural = "nl-BE-ArnaudNeural",
        nlBEDenaNeural = "nl-BE-DenaNeural",
        nlNLColetteNeural = "nl-NL-ColetteNeural",
        nlNLFennaNeural = "nl-NL-FennaNeural",
      
        // English (Australia)
        enAUNatashaNeural = "en-AU-NatashaNeural",
        enAUWilliamNeural = "en-AU-WilliamNeural",
      
        // English (Canada)
        enCAClaraNeural = "en-CA-ClaraNeural",
        enCALiamNeural = "en-CA-LiamNeural",
      
        // English (India)
        enINNeerjaNeural = "en-IN-NeerjaNeural",
        enINPrabhatNeural = "en-IN-PrabhatNeural",
      
        // English (UK)
        enGBLibbyNeural = "en-GB-LibbyNeural",
        enGBRyanNeural = "en-GB-RyanNeural",
      
        // English (US)
        enUSAIGenerate1Neural = "en-US-AIGenerate1Neural",
        enUSAmberNeural = "en-US-AmberNeural",
        enUSAriaNeural = "en-US-AriaNeural",
        enUSAshleyNeural = "en-US-AshleyNeural",
        enUSBrandonNeural = "en-US-BrandonNeural",
        enUSChristopherNeural = "en-US-ChristopherNeural",
        enUSCoraNeural = "en-US-CoraNeural",
        enUSDavisNeural = "en-US-DavisNeural",
        enUSElizabethNeural = "en-US-ElizabethNeural",
        enUSEricNeural = "en-US-EricNeural",
        enUSGuyNeural = "en-US-GuyNeural",
        enUSJacobNeural = "en-US-JacobNeural",
        enUSJasonNeural = "en-US-JasonNeural",
        enUSJennyNeural = "en-US-JennyNeural",
        enUSMichelleNeural = "en-US-MichelleNeural",
        enUSMonicaNeural = "en-US-MonicaNeural",
        enUSSaraNeural = "en-US-SaraNeural",
        enUSTonyNeural = "en-US-TonyNeural",
      
        // Finnish
        fiFINooraNeural = "fi-FI-NooraNeural",
        fiFISelmaNeural = "fi-FI-SelmaNeural",
      
        // French (Canada)
        frCADiegoNeural = "fr-CA-DiegoNeural",
        frCAFelixNeural = "fr-CA-FelixNeural",
        frCAJeanNeural = "fr-CA-JeanNeural",
        frCASylvieNeural = "fr-CA-SylvieNeural",
      
        // French (France)
        frFRDeniseNeural = "fr-FR-DeniseNeural",
        frFREloiseNeural = "fr-FR-EloiseNeural",
        frFRHenriNeural = "fr-FR-HenriNeural",
      
        // German
        deDEKatjaNeural = "de-DE-KatjaNeural",
        deDEKillianNeural = "de-DE-KillianNeural",
      
        // Greek
        elGRAthinaNeural = "el-GR-AthinaNeural",
        elGRNestorasNeural = "el-GR-NestorasNeural",
      
        // Hindi
        hiINMadhurNeural = "hi-IN-MadhurNeural",
        hiINSwaraNeural = "hi-IN-SwaraNeural",
      
        // Italian
        itITDiegoNeural = "it-IT-DiegoNeural",
        itITElsaNeural = "it-IT-ElsaNeural",
      
        // Japanese
        jaJPAoiNeural = "ja-JP-AoiNeural",
        jaJPNanamiNeural = "ja-JP-NanamiNeural",
      
        // Korean
        koKRInJoonNeural = "ko-KR-InJoonNeural",
        koKRSunHiNeural = "ko-KR-SunHiNeural",
      
        // Portuguese (Brazil)
        ptBRFranciscaNeural = "pt-BR-FranciscaNeural",
        ptBRAntonioNeural = "pt-BR-AntonioNeural",
      
        // Russian
        ruRUDmitryNeural = "ru-RU-DmitryNeural",
        ruRUSvetlanaNeural = "ru-RU-SvetlanaNeural",
      
        // Spanish (Mexico)
        esMXJorgeNeural = "es-MX-JorgeNeural",
        esMXDaliaNeural = "es-MX-DaliaNeural",
      
        // Spanish (Spain)
        esESElviraNeural = "es-ES-ElviraNeural",
        esESAlvaroNeural = "es-ES-AlvaroNeural",
      
        // Swedish
        svSESofieNeural = "sv-SE-SofieNeural",
        svSEMattiasNeural = "sv-SE-MattiasNeural",
      }
  2. throttleDelay :

    • Expects a value in milliseconds.
    • This is the throttle duration applied to speech synthesis.
    • The default is set to 2000ms.
  3. onEnd :

    • Expects a callback function that is invoked when the AI speech ends.
    • To invoke this, set shouldCallOnEnd to true.
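
Putting these options together, a configuration sketch (the values are illustrative, not defaults):

const { speak } = useSpeak({
  voice: SpeechVoices.enGBLibbyNeural, // any value from the enum above
  throttleDelay: 500,
  shouldCallOnEnd: true, // required for onEnd to fire
  onEnd: () => console.log("Finished speaking"),
});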

The useSpeak hook returns:

  1. speak :

    • Function to start text-to-speech synthesis.
    • Expects a string argument to be converted to speech.
  2. interruptSpeech :

    • Function to stop the AI speech.
  3. hasAllSentencesBeenSpoken :

    • Returns a boolean value indicating whether all sentences have been spoken.
  4. isSpeaking :

    • Returns a boolean value indicating if the AI is speaking.
  5. streamedSentences :

    • Returns an array of strings with all streamed sentences.
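
And a short sketch surfacing hasAllSentencesBeenSpoken and streamedSentences (the component and markup are illustrative):

import React from "react";
import { useSpeak } from "vox-sdk";

const SpeechStatus = () => {
  const { speak, isSpeaking, hasAllSentencesBeenSpoken, streamedSentences } = useSpeak({
    shouldCallOnEnd: false,
  });

  return (
    <>
      <button onClick={() => speak("Hello from VoxSDK")} disabled={isSpeaking}>
        Speak
      </button>
      <p>{isSpeaking ? "Speaking..." : hasAllSentencesBeenSpoken ? "Done" : "Idle"}</p>
      {/* streamedSentences lists each sentence handed to the synthesizer */}
      <ol>
        {streamedSentences.map((sentence, i) => (
          <li key={i}>{sentence}</li>
        ))}
      </ol>
    </>
  );
};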

Contributing

Contributions are welcome! Please read our Contributing Guide for more information.

License

This project is licensed under the MIT License.