azure-speech-utilities

v1.0.0

Downloads

57

Readme

azure-speech-utilities

Provides a convenient abstraction layer over the Microsoft Cognitive Services Speech SDK, simplifying the integration of speech-to-text and text-to-speech functionality into client/browser applications. Using this package, developers can quickly integrate basic STT and TTS capabilities into their applications without the need to write intricate code.

Features:

  • Perform a single speech recognition operation with ease.
  • Enable continuous speech recognition for real-time applications.
  • Multilingual speech recognition.
  • Text to speech synthesis.
  • SSML/Text input for TTS.

Installing

Using npm:

npm install azure-speech-utilities

Function Description

CreateRecognizer

Creates a new speech recognizer instance.

| Parameter | Type | Default Value | Description |
| :-------: | :--: | :-----------: | :----------: |
| cogSvcSubKey | string | "" | The Cognitive Services subscription key for Speech Services. (Required, default is empty string) |
| cogSvcRegion | string | "" | The region of the Cognitive Services subscription. (Required, default is empty string) |
| recognitionLang | string[] | ["en-US"] | An array of recognition language codes. (Optional, default is ["en-US"]) |

RecognizeOnceAsync

Used for single-shot recognition, which recognizes a single utterance. The end of a single utterance is determined by listening for silence at the end or until a maximum of 15 seconds of audio is processed.

| Parameter | Type | Default Value | Description |
| :-------: | :--: | :-----------: | :----------: |
| recognizer | sdk.SpeechRecognizer | undefined | The speech recognizer instance to use. |

ContinuousRecognitionAsync

Unlike RecognizeOnceAsync, which recognizes a single utterance, continuous recognition delivers a real-time stream of recognized text. Call StopContinuousRecognitionAsync() when you want recognition to stop.

| Parameter | Type | Default Value | Description |
| :-------: | :--: | :-----------: | :----------: |
| recognizer | sdk.SpeechRecognizer | undefined | The speech recognizer instance to use. |
| callbackRecognized | (value: string) => void | value => console.log(value) | A callback function called with recognized text. |
| callbackRecognizing | (value: string) => void | value => console.log(value) | A callback function called while speech is being recognized. |
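The two callbacks play different roles: callbackRecognizing reports text while speech is still being recognized, and callbackRecognized reports the recognized text. A minimal sketch of one way to combine them into a single display string follows. The createTranscript helper is hypothetical and not part of this package, and it assumes that each callbackRecognizing call delivers the whole current partial text rather than an increment.

```javascript
// Hypothetical helper (not part of azure-speech-utilities): keeps committed
// utterances separate from the partial text that is still being recognized.
// Assumption: each "recognizing" call carries the full current partial text.
function createTranscript() {
  const finalized = []; // utterances delivered through the "recognized" callback
  let interim = "";     // latest partial text from the "recognizing" callback

  return {
    // Pass as callbackRecognizing: replaces the current partial text.
    onRecognizing(text) {
      interim = text;
    },
    // Pass as callbackRecognized: commits the utterance and clears the partial.
    onRecognized(text) {
      if (text && text.trim() !== "") {
        finalized.push(text.trim());
      }
      interim = "";
    },
    // Text to display: committed utterances followed by the live partial.
    text() {
      return [...finalized, interim].filter(Boolean).join(" ");
    },
  };
}

// Simulated callback sequence, so the sketch runs without a microphone.
const transcript = createTranscript();
transcript.onRecognizing("hello wor");
transcript.onRecognized("Hello world.");
transcript.onRecognizing("how are");
console.log(transcript.text()); // Hello world. how are
```

In an application, transcript.onRecognized and transcript.onRecognizing would be passed as the second and third arguments of ContinuousRecognitionAsync.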

StopContinuousRecognitionAsync

Stops ongoing continuous speech recognition.

| Parameter | Type | Default Value | Description |
| :-------: | :--: | :-----------: | :----------: |
| recognizer | sdk.SpeechRecognizer | undefined | The speech recognizer instance to use. |

Note: Pass the same recognizer instance that you passed to ContinuousRecognitionAsync().
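One way to guarantee that start and stop always receive the same instance is to capture the recognizer in a closure. The sketch below is only an illustration: createRecognitionSession is a hypothetical helper, and startFn and stopFn are stand-ins for ContinuousRecognitionAsync and StopContinuousRecognitionAsync, passed in as arguments so the code runs without the package.

```javascript
// Hypothetical wrapper (not part of azure-speech-utilities). startFn and stopFn
// stand for ContinuousRecognitionAsync and StopContinuousRecognitionAsync.
function createRecognitionSession(recognizer, startFn, stopFn) {
  let running = false;

  return {
    start(onRecognized, onRecognizing) {
      if (running) return undefined; // already started: do nothing
      running = true;
      return startFn(recognizer, onRecognized, onRecognizing);
    },
    stop() {
      if (!running) return undefined; // nothing to stop
      running = false;
      return stopFn(recognizer); // always the instance that was started
    },
    isRunning() {
      return running;
    },
  };
}

// Usage with stand-in functions that only record which instance they received.
const calls = [];
const session = createRecognitionSession(
  { id: "recognizer-1" },
  (recognizer) => calls.push(["start", recognizer.id]),
  (recognizer) => calls.push(["stop", recognizer.id])
);
session.start(console.log, console.log);
session.stop();
session.stop(); // ignored, recognition is no longer running
console.log(calls.length); // 2
```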

CreateSynthesizer

Creates a new speech synthesizer instance.

| Parameter | Type | Default Value | Description |
| :-------: | :--: | :-----------: | :----------: |
| cogSvcSubKey | string | "" | The Cognitive Services subscription key for Speech Services. (Required) |
| cogSvcRegion | string | "" | The region of the Cognitive Services subscription. (Required) |
| synthesisLang | string | "" | The language code for the speech synthesizer. (Required) |
| synthesisVoiceName | string | "" | The name of the voice to use for speech synthesis. (Optional, default is "") |
| createAudioConfig | boolean | false | Whether to create an audio config for speech output. (Optional, default is false) |

Note: If createAudioConfig is false, the synthesized audio is not played on the current active output device.

The voice that speaks is determined as follows, from lowest to highest priority:

  • If you only set synthesisLang, the default voice for the specified locale speaks.
  • If both synthesisVoiceName and synthesisLang are set, the synthesisLang setting is ignored. The voice that you specify by using synthesisVoiceName speaks.
  • If the voice element is set by using Speech Synthesis Markup Language (SSML), the synthesisVoiceName and synthesisLang settings are ignored.
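The voice rules above amount to a simple precedence check: an SSML voice element wins over synthesisVoiceName, which wins over synthesisLang. The sketch below restates that order as code. resolveVoice is a hypothetical helper for illustration only; it does not call the Speech SDK, and the ssmlVoice field is an assumed name for the voice given in an SSML document.

```javascript
// Hypothetical helper that mirrors the documented priority rules.
// It only reports which setting would decide the voice.
function resolveVoice({ ssmlVoice = "", synthesisVoiceName = "", synthesisLang = "" } = {}) {
  if (ssmlVoice) {
    // A voice element in SSML overrides both other settings.
    return { source: "ssml", voice: ssmlVoice };
  }
  if (synthesisVoiceName) {
    // An explicit voice name overrides the language setting.
    return { source: "synthesisVoiceName", voice: synthesisVoiceName };
  }
  if (synthesisLang) {
    // Only a language: the service picks the default voice for that locale.
    return { source: "synthesisLang", voice: `default voice for ${synthesisLang}` };
  }
  return { source: "none", voice: "" };
}

console.log(resolveVoice({ synthesisLang: "en-US", synthesisVoiceName: "en-US-JennyNeural" }).source);
// synthesisVoiceName
```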

SpeakAsync

Performs speech synthesis and passes the result, which carries the synthesized audio as an ArrayBuffer, to the callback.

| Parameter | Type | Default Value | Description |
| :-------: | :--: | :-----------: | :----------: |
| synthesizer | sdk.SpeechSynthesizer | undefined | The speech synthesizer instance to use. |
| inputString | string | "I'm excited to try text to speech" | The text to be synthesized. |
| inputType | string | "text" | The format of the input text. (Optional, default is "text") |
| callback | (result: sdk.SynthesisResult, error?: Error) => void | (result, error) => {} | A callback function called with the synthesis result or an error. |
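When inputType is "ssml", inputString must be a complete SSML document. If the spoken text comes from user input, characters such as & and < need escaping or the markup becomes invalid. The following is a minimal sketch of building such a string. escapeXml and buildSsml are hypothetical helpers, not part of this package, and the namespace used is the W3C SSML namespace.

```javascript
// Escape the five XML special characters so arbitrary text cannot break the markup.
// "&" is replaced first so that the entities produced below are not escaped twice.
function escapeXml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Hypothetical helper: wraps plain text in a single-voice SSML document.
function buildSsml(text, lang, voiceName) {
  return [
    `<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="${escapeXml(lang)}">`,
    `  <voice name="${escapeXml(voiceName)}">${escapeXml(text)}</voice>`,
    `</speak>`,
  ].join("\n");
}

const ssml = buildSsml("Tom & Jerry's <show>", "en-US", "en-US-JennyNeural");
console.log(ssml);
```

The resulting string could then be passed to SpeakAsync as inputString with inputType set to "ssml".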

Example

Recognize Once

import { CreateRecognizer, RecognizeOnceAsync } from "azure-speech-utilities"

const CGV_KEY = "AZURE_SPEECH_SERVICE_KEY"
const CGV_REGION = "AZURE_SPEECH_SERVICE_REGION"

async function recognizeSpeech() {
    // Recognize a single utterance in Hindi (India).
    const recognizer = CreateRecognizer(CGV_KEY, CGV_REGION, ["hi-IN"])
    try {
        const recognizedText = await RecognizeOnceAsync(recognizer)
        if (recognizedText.type === "text") {
            // Recognition succeeded: message holds the recognized text.
            console.log(recognizedText.message)
        } else {
            // Any other type is treated here as a failure and logged as an error.
            console.error(recognizedText.message)
        }
    } catch (error) {
        console.error(error)
    }
}

Continuous Recognition

import { CreateRecognizer, ContinuousRecognitionAsync, StopContinuousRecognitionAsync } from "azure-speech-utilities"

const CGV_KEY = "AZURE_SPEECH_SERVICE_KEY"
const CGV_REGION = "AZURE_SPEECH_SERVICE_REGION"

// Passing two or more recognition languages ("hi-IN" and "en-US") enables multilingual recognition.
const recognizer = CreateRecognizer(CGV_KEY, CGV_REGION, ["hi-IN", "en-US"])

// Called with recognized text.
function callbackRecognized(text) {
    console.log("RECOGNIZED: ", text)
}

// Called while speech is still being recognized.
function callbackRecognizing(text) {
    console.log("RECOGNIZING: ", text)
}

async function recognizeSpeech() {
    try {
        const response = await ContinuousRecognitionAsync(recognizer, callbackRecognized, callbackRecognizing)
        if (response.type === "success") {
            console.log(response.message)
        } else {
            console.error(response.message)
        }
    } catch (error) {
        console.error(error)
    }
}

// Stop recognition on the same recognizer instance that was started above.
function stopContinuousRecognition() {
    StopContinuousRecognitionAsync(recognizer)
}

Speak Async

import { CreateSynthesizer, SpeakAsync } from "azure-speech-utilities"

const CGV_KEY = "AZURE_SPEECH_SERVICE_KEY"
const CGV_REGION = "AZURE_SPEECH_SERVICE_REGION"
const SYNTHESIS_LANGUAGE = "en-US"
const SYNTHESIS_VOICE_NAME = "en-US-JennyNeural"

// Audio element used for playback. In a React component this could be a ref to an <audio> element instead.
const audio = new Audio()

function handleSpeak() {
    // By default, the input type is "text". If you change the input type to "ssml", the input string must be in the following SSML format.
    // const ssml = `
    // <speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="${SYNTHESIS_LANGUAGE}">
    //     <voice name="${SYNTHESIS_VOICE_NAME}">
    //         When you're on the freeway, it's a good idea to use a GPS.
    //     </voice>
    // </speak>
    // `

    const text = "When you're on the freeway, it's a good idea to use a GPS."

    // createAudioConfig is set to false, so the audio does not play by default on the currently active output device.
    const synthesizer = CreateSynthesizer(CGV_KEY, CGV_REGION, SYNTHESIS_LANGUAGE, SYNTHESIS_VOICE_NAME, false)

    SpeakAsync(synthesizer, text, "text", (result, error) => {
        if (error) {
            console.error(error)
            return
        }
        console.log(result)
        const audioBlob = new Blob([result.audioData], { type: "audio/wav" })

        // Use the object URL as an audio source, which gives the user control over starting, stopping, resetting, etc.
        audio.src = URL.createObjectURL(audioBlob)
        audio.play().catch(console.error)
    })
}

function stopSpeaking() {
    audio.pause()
}

Note: If you do not want to play the audio through an audio source, set createAudioConfig to true. The audio then plays on the current active output device by default, but the user cannot reset, play, or pause it.
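The SpeakAsync callback receives (result, error), with the result first, so Node-style promisify helpers that expect the error first do not fit it directly. If you prefer async/await, a small adapter is one option. The sketch below is an illustration under stated assumptions: promisifyResultFirst is a hypothetical helper, and fakeSpeak is a stand-in with the same parameter order as SpeakAsync so the code runs without the package or a subscription.

```javascript
// Wraps a function whose last argument is a (result, error) callback
// so that it returns a Promise instead.
function promisifyResultFirst(fn) {
  return (...args) =>
    new Promise((resolve, reject) => {
      fn(...args, (result, error) => {
        if (error) {
          reject(error);
        } else {
          resolve(result);
        }
      });
    });
}

// Stand-in for SpeakAsync (same parameter order), used only to make the sketch runnable.
function fakeSpeak(synthesizer, inputString, inputType, callback) {
  if (!inputString) {
    callback(undefined, new Error("empty input"));
  } else {
    callback({ audioData: new ArrayBuffer(8) }, undefined);
  }
}

const speak = promisifyResultFirst(fakeSpeak);
speak(null, "Hello", "text").then((result) => {
  console.log(result.audioData.byteLength); // 8
});
```

With the real package, promisifyResultFirst(SpeakAsync) would be used in place of the stand-in, provided SpeakAsync invokes its callback exactly once per call.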

Contributing

This project welcomes contributions and suggestions.