
ms-bing-speech-service

v2.0.6


Microsoft Speech to Text Service

(Unofficial) JavaScript service wrapper for the Microsoft Speech API. It implements the Speech WebSocket API specifically, which supports long-form speech recognition up to 10 minutes in length. Looking for Microsoft Speech HTTP API (short speech) support instead? This SDK can help you out :)

npm install ms-bing-speech-service

Installation

  1. Install NodeJS on your computer
  2. Create a new directory for your code project if you haven't already
  3. Open a terminal and run npm install ms-bing-speech-service from your project directory

Usage

✨ This library works in both browsers and NodeJS runtime environments ✨ Please see the examples directory in this repo for more in-depth examples than those below.

Microsoft Speech API

You'll first need to create a Microsoft Speech API key. You can do this while logged in to the Azure Portal.

The following code will get you up and running with the essentials in Node:

const speechService = require('ms-bing-speech-service');

const options = {
  language: 'en-US',
  subscriptionKey: '<your Bing Speech API key>'
};

const recognizer = new speechService(options);

recognizer
  .start()
  .then(_ => {
    recognizer.on('recognition', (e) => {
      if (e.RecognitionStatus === 'Success') console.log(e);
    });

    recognizer.sendFile('future-of-flying.wav')
      .then(_ => console.log('file sent.'))
      .catch(console.error);
  })
  .catch(console.error);

You can also use this library with the async/await pattern!

const speechService = require('ms-bing-speech-service');

(async function() {

  const options = {
    language: 'en-US',
    subscriptionKey: '<your Bing Speech API key>'
  };
	
  const recognizer = new speechService(options);
  await recognizer.start();

  recognizer.on('recognition', (e) => {
    if (e.RecognitionStatus === 'Success') console.log(e);
  });
  
  recognizer.on('turn.end', async (e) => {
    console.log('recognizer is finished.');
    
    await recognizer.stop();
    console.log('recognizer is stopped.');
  });
	
  await recognizer.sendFile('future-of-flying.wav');
  console.log('file sent.');

})();

And in the browser (a global window distribution is also available in the dist directory), use an ArrayBuffer instance in place of a file path:

import speechService from 'MsBingSpeechService';

const file = myArrayBuffer;

const options = {
  language: 'en-US',
  subscriptionKey: '<your Bing Speech API key>'
}

const recognizer = new speechService(options);

recognizer.start()
  .then(_ => {
    console.log('service started');

    recognizer.on('recognition', (e) => {
      if (e.RecognitionStatus === 'Success') console.log(e);
    });
    
    recognizer.sendFile(file);
  }).catch((error) => console.error('could not start service:', error));

The above examples will use your subscription key to create an access token with Microsoft's service.

In some instances you may not want to share your subscription key directly with your application. If you're creating an app with multiple users, you may want to issue access tokens from an external API so each user can connect to the speech service without exposing your subscription key.

To do this, replace "subscriptionKey" in the above code example with "accessToken" and pass in the provided token.

const options = {
  language: 'en-US',
  accessToken: '<your access token here>'
};
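One way to do this (a sketch, not part of the library) is to have your own server request tokens from Microsoft's issueToken endpoint on each client's behalf. The helper below is hypothetical; it only builds the request details, using the same issueToken URL style shown in the Custom Speech section below — check the Azure portal for the exact URL and region that match your subscription:

```javascript
// Hypothetical helper: builds the request options for the Cognitive
// Services token endpoint. The region and path are assumptions; verify
// them against your Azure subscription.
function buildTokenRequest(subscriptionKey, region = 'westus') {
  return {
    url: `https://${region}.api.cognitive.microsoft.com/sts/v1.0/issueToken`,
    method: 'POST',
    headers: {
      'Ocp-Apim-Subscription-Key': subscriptionKey,
      'Content-Length': '0'
    }
  };
}
```

Your server would POST with these options (via fetch, axios, etc.) and hand the returned token string to the client, which then supplies it as accessToken instead of subscriptionKey.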

Custom Speech Service

Yes! You can totally use this with Custom Speech Service. You'll need a few more details in your options object, though.

Your subscriptionKey will be the key displayed on your custom endpoint deployment page in the Custom Speech Management Portal. There, you can also find your websocket endpoint of choice to use.

The following code will get you up and running with the Custom Speech Service:

const speechService = require('ms-bing-speech-service');

const options = {
  subscriptionKey: '<your Custom Speech Service subscription key>',
  serviceUrl: 'wss://<your endpoint id>.api.cris.ai/speech/recognition/conversation/cognitiveservices/v1',
  issueTokenUrl: 'https://westus.api.cognitive.microsoft.com/sts/v1.0/issueToken'
};

const recognizer = new speechService(options);

recognizer
  .start()
  .then(_ => {
    recognizer.on('recognition', (e) => {
      if (e.RecognitionStatus === 'Success') console.log(e);
    });

    recognizer.sendFile('future-of-flying.wav');
  })
  .catch(console.error);

See the API section of these docs for details on configuration and methods.

API Reference

Methods

SpeechService(options)

  • options Object
  • Returns SpeechService

Creates a new instance of SpeechService.

const recognizer = new SpeechService(options);

Available options are below:

| name | type | description | default | required |
|------|------|-------------|---------|----------|
| subscriptionKey | String | your Speech API key | n/a | yes |
| accessToken | String | your Speech access token. Only required if the subscriptionKey option is not supplied. | n/a | no |
| language | String | the language you want to translate from. See supported languages in the official Microsoft Speech API docs. | 'en-US' | no |
| mode | String | which recognition mode you'd like to use. Choose from interactive, conversation, or dictation. | 'conversation' | no |
| format | String | format you'd like the recognition results to be returned in. Choose from simple or detailed. | 'simple' | no |
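For example, a configuration that overrides the default mode and format might look like this (a sketch; the key value is a placeholder):

```javascript
// Every option besides subscriptionKey (or accessToken) falls back to
// the default listed above when omitted.
const options = {
  subscriptionKey: '<your Speech API key>',
  language: 'en-US',
  mode: 'dictation',   // interactive | conversation | dictation
  format: 'detailed'   // simple | detailed
};
```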

recognizer.start()

Connects to the Speech API websocket on your behalf. Returns a promise.

recognizer.start().then(() => {
  console.log('recognizer service started.');
}).catch(console.error);

recognizer.stop()

Disconnects from the established websocket connection to the Speech API. Returns a promise.

recognizer.stop().then(() => {
  console.log('recognizer service stopped.');
}).catch(console.error);

recognizer.sendStream(stream)

  • stream Readable Stream

Sends an audio payload stream to the Speech API websocket connection. Audio payload is a native NodeJS Buffer stream (eg. a readable stream) or an ArrayBuffer in the browser. Returns a promise.

See the 'Sending Audio' section of the official Speech API docs for details on the data format needed.

NodeJS example:

const fs = require('fs');
const audioStream = fs.createReadStream('speech.wav');

recognizer.sendStream(audioStream).then(() => {
  recognizer.on('recognition', (message) => {
    console.log('new recognition:', message);
  });

  console.log('stream sent.');
}).catch(console.error);

recognizer.sendFile(filepath)

  • filepath String

Streams an audio file from disk to the Speech API websocket connection. Also accepts a NodeJS Buffer or browser ArrayBuffer. Returns a promise.

See the 'Sending Audio' section of the official Speech API docs for details on the data format needed for the audio file.

recognizer.sendFile('/path/to/audiofile.wav').then(() => {
  console.log('file sent.');
}).catch(console.error);

or


fetch('speech.wav')
  .then((response) => response.arrayBuffer())
  .then((audioBuffer) => recognizer.sendFile(audioBuffer))
  .then(() => console.log('file sent'))
  .catch((error) => console.log('something went wrong:', error));

Events

You can listen to the following events on the recognizer instance:

recognizer.on('recognition', callback)

  • callback Function

Event listener for incoming recognition message payloads from the Speech API. Message payload is a JSON object.

recognizer.on('recognition', (message) => {
  console.log('new recognition:', message);
});

recognizer.on('close', callback)

  • callback Function

Event listener for Speech API websocket connection closures.

recognizer.on('close', (error) => {
  console.log('Speech API connection closed');
  // you can optionally look for an error object (most closures currently report a 1006 even when intentional closure happens but we're looking into it!)
  console.log(error);
});

recognizer.on('error', callback)

  • callback Function

Event listener for incoming Speech API websocket connection errors.

recognizer.on('error', (error) => {
  console.log(error);
});

recognizer.on('turn.start', callback)

  • callback Function

Event listener for the Speech API websocket 'turn.start' event. Fires when the service detects an audio stream.

recognizer.on('turn.start', () => {
  console.log('start turn has fired.');
});

recognizer.on('turn.end', callback)

  • callback Function

Event listener for Speech API websocket 'turn.end' event. Fires after 'speech.endDetected' event and the turn has ended. This event is an ideal one to listen to in order to be notified when an entire stream of audio has been processed and all results have been received.

recognizer.on('turn.end', () => {
  console.log('end turn has fired.');
});
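Because 'turn.end' signals that all results for a stream have arrived, one convenient pattern is to wrap a whole file's recognition in a promise. The helper below is hypothetical (not part of the library); it only assumes the recognizer exposes the .on() events documented here:

```javascript
// Hypothetical helper: gather every 'recognition' payload until the
// service fires 'turn.end', then resolve with the complete list.
function collectRecognitions(recognizer) {
  return new Promise((resolve, reject) => {
    const results = [];
    recognizer.on('recognition', (message) => results.push(message));
    recognizer.on('error', reject);
    recognizer.on('turn.end', () => resolve(results));
  });
}

// Usage sketch: register the listener first, then send the audio.
// const done = collectRecognitions(recognizer);
// await recognizer.sendFile('future-of-flying.wav');
// const results = await done;
```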

recognizer.on('speech.startDetected', callback)

  • callback Function

Event listener for Speech API websocket 'speech.startDetected' event. Fires when the service has first detected speech in the audio stream.

recognizer.on('speech.startDetected', () => {
  console.log('speech startDetected has fired.');
});

recognizer.on('speech.endDetected', callback)

  • callback Function

Event listener for Speech API websocket 'speech.endDetected' event. Fires when the service has stopped being able to detect speech in the audio stream.

recognizer.on('speech.endDetected', () => {
  console.log('speech endDetected has fired.');
});

recognizer.on('speech.phrase', callback)

  • callback Function

Identical to the recognition event. Event listener for incoming recognition message payloads from the Speech API. Message payload is a JSON object.

recognizer.on('speech.phrase', (message) => {
  console.log('new phrase:', message);
});

recognizer.on('speech.hypothesis', callback)

  • callback Function

Event listener for Speech API websocket 'speech.hypothesis' event. Only fires when using interactive mode. Contains incomplete recognition results. This event will fire often - beware!

recognizer.on('speech.hypothesis', (message) => {
  console.log('new hypothesis:', message);
});

recognizer.on('speech.fragment', callback)

  • callback Function

Event listener for Speech API websocket 'speech.fragment' event. Only fires when using dictation mode. Contains incomplete recognition results. This event will fire often - beware!

recognizer.on('speech.fragment', (message) => {
  console.log('new fragment:', message);
});

License

MIT.

Credits

Big thanks to @michael-chi. Their bing speech example was a great foundation to build upon, particularly the response parser and header helper.