sber-salute-speech-recognition
v2.2.0
A library that produces audio transcriptions using the SBER Salute Speech service.
A library for obtaining audio transcriptions from the SBER Salute Speech synchronous recognition service: https://developers.sber.ru/docs/ru/salutespeech/recognition/recognition-sync.
Getting started
- Create an account on https://developers.sber.ru/portal/products/smartspeech
- Generate a client secret
- Copy the authorization key (AUTH_KEY); it is passed to the service constructor
Install
npm install sber-salute-speech-recognition
Usage
import {
SberSaluteSpeechRecognitionService,
AudioEncoding,
} from 'sber-salute-speech-recognition';
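const AUTH_KEY = '<your authorization key>'; // placeholder: the key from "Getting started"
const pathToAudioFile = './audio.mp3'; // hypothetical path to the audio file to transcribe
const recognitionService = new SberSaluteSpeechRecognitionService(AUTH_KEY);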
const { text, normalizedText } = await recognitionService.speechToText(
pathToAudioFile,
AudioEncoding.MP3,
);
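Rather than hard-coding AUTH_KEY, you may prefer to load it from the environment. The sketch below is not part of the library: the helper name `readAuthKey` and the variable name `SBER_SALUTE_AUTH_KEY` are hypothetical. It simply fails fast when the key is missing or blank, instead of letting the error surface later at request time.

```typescript
// Hypothetical helper (not part of sber-salute-speech-recognition):
// reads the authorization key from an environment-like record and
// throws early if it is missing or blank.
function readAuthKey(
  env: Record<string, string | undefined>,
  name: string = "SBER_SALUTE_AUTH_KEY", // hypothetical variable name
): string {
  const value = env[name]?.trim();
  if (!value) {
    throw new Error(`Environment variable ${name} is not set or is empty`);
  }
  return value;
}
```

In Node.js you would call it as `readAuthKey(process.env)` and pass the result to `new SberSaluteSpeechRecognitionService(...)`.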
AccessToken Scope
The scope property is set when the SberSaluteSpeechRecognitionService class is instantiated. If no value is provided, it defaults to Scope.Personal.
Here is an example of how to set the scope property:
import {
SberSaluteSpeechRecognitionService,
Scope,
} from 'sber-salute-speech-recognition';
const service = new SberSaluteSpeechRecognitionService(
AUTH_KEY,
undefined,
Scope.Corporate
);
In this example, the scope property is set to Scope.Corporate, which corresponds to the SALUTE_SPEECH_CORP scope.
To use the SALUTE_SPEECH_PERS scope, either pass Scope.Personal as the third argument or omit the third argument entirely, since it defaults to Scope.Personal.
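For context, SaluteSpeech access tokens are obtained from Sber's OAuth endpoint, and the scope value is sent in the form-encoded request body. The sketch below only builds such a request (it does not send it), to show where SALUTE_SPEECH_PERS or SALUTE_SPEECH_CORP ends up. The endpoint URL and headers follow SaluteSpeech's public authentication documentation; whether this library builds its request exactly this way is an assumption, and `buildTokenRequest` and `TokenRequest` are hypothetical names.

```typescript
// Hypothetical sketch: builds (but does not send) an OAuth token request
// for SaluteSpeech. Endpoint and headers follow SaluteSpeech's public docs;
// this library's internals may differ.
type ScopeValue = "SALUTE_SPEECH_PERS" | "SALUTE_SPEECH_CORP";

interface TokenRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

function buildTokenRequest(
  authKey: string, // the authorization key from the developer portal
  rqUid: string, // unique request ID (a UUID) expected by the endpoint
  scope: ScopeValue = "SALUTE_SPEECH_PERS", // mirrors the Scope.Personal default
): TokenRequest {
  return {
    url: "https://ngw.devices.sberbank.ru:9443/api/v2/oauth",
    method: "POST",
    headers: {
      Authorization: `Basic ${authKey}`,
      RqUID: rqUid,
      "Content-Type": "application/x-www-form-urlencoded",
      Accept: "application/json",
    },
    // The scope travels as a form-encoded field in the request body.
    body: new URLSearchParams({ scope }).toString(),
  };
}
```

In Node 18+ such an object could be sent with `fetch(req.url, req)`.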
Hints parameter
To improve speech recognition, you can pass a hints parameter to the speechToText method.
Here is an example of how to use the hints parameter:
import {
SberSaluteSpeechRecognitionService,
AudioEncoding,
} from 'sber-salute-speech-recognition';
const recognitionService = new SberSaluteSpeechRecognitionService(AUTH_KEY);
const { text, normalizedText } = await recognitionService.speechToText(
pathToAudioFile,
AudioEncoding.MP3,
{
words: ['card', 'name'],
enable_letters: true,
eou_timeout: "2s"
}
);
In this example, we pass an object with the properties words, enable_letters, and eou_timeout:
- words: a list of words or phrases whose recognition should be boosted. List here the words the user is likely to say.
- enable_letters: enables a short-phrase model that improves recognition of single letters and short words. Possible values: true and false.
- eou_timeout: configures end-of-utterance (EOU) detection. After the speaker stops, the service waits the specified number of seconds before treating the phrase as finished. The value is given as a string such as "2s"; possible values range from 0.5 to 5 seconds.
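If you build hints from user input or configuration, it can help to check the documented constraints before calling speechToText. The sketch below is an assumption-based helper, not part of the library: the `RecognitionHints` interface mirrors only the three properties described above, `makeHints` is a hypothetical name, and it formats eou_timeout as a seconds string like the "2s" used in the example.

```typescript
// Hypothetical helper mirroring the hint properties documented above;
// the library's own hints type may declare more or differently typed fields.
interface RecognitionHints {
  words?: string[];
  enable_letters?: boolean;
  eou_timeout?: string; // e.g. "2s", as in the example above
}

function makeHints(options: {
  words?: string[];
  enableLetters?: boolean;
  eouTimeoutSeconds?: number;
}): RecognitionHints {
  const hints: RecognitionHints = {};
  if (options.words !== undefined) {
    // Trim each entry and drop empty ones.
    hints.words = options.words.map((w) => w.trim()).filter((w) => w.length > 0);
  }
  if (options.enableLetters !== undefined) {
    hints.enable_letters = options.enableLetters;
  }
  if (options.eouTimeoutSeconds !== undefined) {
    const s = options.eouTimeoutSeconds;
    // Documented range: 0.5 to 5 seconds.
    if (!Number.isFinite(s) || s < 0.5 || s > 5) {
      throw new RangeError(`eou_timeout must be between 0.5 and 5 seconds, got ${s}`);
    }
    hints.eou_timeout = `${s}s`;
  }
  return hints;
}
```

For example, `makeHints({ words: ['card', 'name'], enableLetters: true, eouTimeoutSeconds: 2 })` yields the same object as the one passed in the example above.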
Channels count
The channels_count parameter lets you override the number of channels to recognize in the audio file. If it is not specified, the number of channels is determined automatically.
Possible values:
- undefined: the channel count is determined automatically
- 1: mono
- 2: stereo
import {
SberSaluteSpeechRecognitionService,
AudioEncoding,
} from 'sber-salute-speech-recognition';
const recognitionService = new SberSaluteSpeechRecognitionService(AUTH_KEY);
const { text, normalizedText } = await recognitionService.speechToText(
pathToAudioFile,
AudioEncoding.MP3,
1
);
In this example, the channel count is set to 1 (mono) by passing it as the third argument of speechToText.
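Because only three values are documented for channels_count, a small parser can reject anything else (for example, a count read from a config file) before it reaches the service. The `ChannelsCount` type and `parseChannelsCount` function below are hypothetical helpers, not exports of the library.

```typescript
// Hypothetical helper: the only documented values are undefined (auto-detect),
// 1 (mono) and 2 (stereo).
type ChannelsCount = 1 | 2 | undefined;

function parseChannelsCount(raw: string | number | undefined): ChannelsCount {
  // Missing or empty input means "let the service detect the channel count".
  if (raw === undefined || raw === "") {
    return undefined;
  }
  const n = typeof raw === "number" ? raw : Number(raw);
  if (n === 1) {
    return 1;
  }
  if (n === 2) {
    return 2;
  }
  throw new RangeError(`channels_count must be 1, 2 or undefined, got ${String(raw)}`);
}
```

The result can then be passed in the position where the example above passes 1.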