@solyarisoftware/voskjs

v1.2.8

Published

3 years ago

NodeJs developers API for Vosk-api speech-to-text engine.

solyarisoftware

Vosk Vosk-API Speech Recognition Speech to Text ASR offline nodeJs

VoskJs

VoskJs is a NodeJs developer toolkit for using the Vosk offline speech recognition engine, including multi-thread (server) usage examples. The project gives you:

Simple sentence-based and streaming-based transcript APIs
The command line utility voskjs
A demo HTTP transcript server voskjshttp

VoskJs can be used for speech recognition processing in different scenarios:

Single-user/standalone programs (e.g. perfect for single-user embedded systems)
Multi-user/multi-core server architectures

What's Vosk?

Vosk is an open source embedded (offline/on-prem) speech-to-text engine that can run with very low latency (< 500 ms on my PC). Vosk is based on a common DNN-HMM architecture: a deep neural network is used for sound scoring (acoustic scoring), while HMM and WFST frameworks are used for time models (language models). It's based on Kaldi, but Nikolay V. Shmyrev's Vosk offers a smart, simplified and performant interface! More details are on the Vosk home page and GitHub repo.

What's VoskJs?

The goal of the project is to create a simple function API layer on top of the existing Vosk nodejs binding, supplying both sentence-based and streaming-based speech-to-text functionality.

Sentence-based transcript API

In this mode, a file or a PCM buffer is processed asynchronously to get the full text transcript of the given speech. Using the simple transcript interface you can build your own standalone application, calling async functions that suit an ordinary single-thread nodejs program.

Pseudo code:

// Load a specific Vosk engine model from a model directory into RAM, once.
const model = loadModel(modelDirectory)

// Transcribe a speech file or buffer (in WAV/PCM format) using the Vosk engine.
// The result supplies detailed speech-to-text transcript info.
const result = await transcriptFromFile(fileName, model, {options}) 

// or 
// const result = await transcriptFromBuffer(buffer, model, {options}) 

freeModel(model)
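
As a slightly more concrete sketch of the flow above, the helper below wraps load, transcript and free in a try/finally, so the model is released even when the transcript fails. The name `withModel` is hypothetical, the `{ text }` result shape is an assumption, and the three API functions are passed in through an `api` argument so that the sketch runs without the package installed; in a real program they would presumably come from `require('@solyarisoftware/voskjs')`.

```javascript
// Hypothetical helper: load a model, run one job, always free the model.
// `api` is assumed to expose loadModel, transcriptFromFile and freeModel,
// the functions named in the pseudo code above.
async function withModel(api, modelDirectory, job) {
  const model = api.loadModel(modelDirectory)
  try {
    return await job(model)
  } finally {
    // runs on success and on error, so the model memory is always released
    api.freeModel(model)
  }
}

// Stub API, for illustration only; swap in the real module to transcribe audio.
const stubApi = {
  freed: false,
  loadModel: (modelDirectory) => ({ modelDirectory }),
  transcriptFromFile: async (fileName, model, options) => ({
    text: `stub transcript of ${fileName}`
  }),
  freeModel: () => { stubApi.freed = true }
}

const pending = withModel(stubApi, 'models/vosk-model-small-en-us-0.15', (model) =>
  stubApi.transcriptFromFile('audio/sentencesWithSilences.wav', model, {})
)
pending.then((result) => console.log(result.text))
```

Passing the job as a callback keeps the model lifetime in one place: several files can be transcribed inside the same job while the model is loaded only once.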

Streaming-based transcript API (DRAFT)

Following the Vosk-api recognizer result functions, VoskJs emits these nodejs events:

| Event name | Vosk-api recognizer function | Description |
| ------------- | ---------------------------- | ------------------------------------------- |
| partial | recognizer.partialResult() | silence (text = '') or one or more new words |
| endOfSpeech | recognizer.result() | end of speech (words followed by a silence) |
| final | recognizer.finalResult() | last part of the audio |

Pseudo code:

// Load a specific Vosk engine model from a model directory into RAM, once.
const model = loadModel(modelDirectory)

const transcriptEvents = transcriptEventsFromFile(fileName, model, {options}) 
// or
// const transcriptEvents = transcriptEventsFromBuffer(buffer, model, {options}) 

// a new word is detected
transcriptEvents.on('partial', data => console.log(data) ) 

// a complete sentence (followed by silence) is detected 
transcriptEvents.on('endOfSpeech', data => console.log(data) )

// final (last) sentence is detected
transcriptEvents.on('final', data => console.log(data) )

freeModel(model)
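
To illustrate how the three events fit together, here is a minimal sketch that rebuilds a full transcript from the event stream. The helper name `collectTranscript` is hypothetical, and the payload shape (an object with a `text` string) is an assumption. A plain Node `EventEmitter` stands in for the object returned by `transcriptEventsFromFile`, so the sketch runs without Vosk installed.

```javascript
const { EventEmitter } = require('events')

// Hypothetical helper: gather the text of every 'endOfSpeech' event and hand
// the joined transcript to `onDone` when 'final' arrives.
// Assumption: each event payload is an object with a `text` string.
function collectTranscript(transcriptEvents, onDone) {
  const sentences = []
  transcriptEvents.on('endOfSpeech', (data) => {
    if (data.text) sentences.push(data.text)
  })
  transcriptEvents.on('final', (data) => {
    // the last chunk may hold trailing words not yet closed by a silence
    if (data.text) sentences.push(data.text)
    onDone(sentences.join(' '))
  })
}

// Stand-in emitter replaying a few events by hand.
const fakeEvents = new EventEmitter()
let transcript = ''
collectTranscript(fakeEvents, (text) => { transcript = text })

fakeEvents.emit('partial', { text: 'one' })
fakeEvents.emit('endOfSpeech', { text: 'one two three' })
fakeEvents.emit('partial', { text: '' })
fakeEvents.emit('endOfSpeech', { text: 'four five six seven eight' })
fakeEvents.emit('final', { text: '' })

console.log(transcript)
```

The `partial` events are ignored here on purpose: they repeat words that a later `endOfSpeech` event delivers again, so they are useful for live display rather than for the final text.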

Command line tools

voskjs: command line program to test Vosk transcript with specific models (some tests and command line usage here).

The utility can also be configured to print events as a table. For example:

voskjs --audio=audio/sentencesWithSilences.wav --model=models/vosk-model-small-en-us-0.15 --tableevents

voskjs is a CLI utility to test Vosk-api features
package @solyarisoftware/voskjs version 1.2.7, Vosk-api version 0.3.30

Statistics:

model directory      : models/vosk-model-small-en-us-0.15
speech file name     : audio/sentencesWithSilences.wav
grammar              : not specified. Default: NO
sample rate          : not specified. Default: 16000
max alternatives     : undefined
text only / JSON     : JSON
Vosk debug level     : -1

load model latency   : 2001ms
transcript latency   : 1707ms
transcript text      : one two three four five six seven eight nine zero one two three stop 

Events table:

| time   | event        | text                                     |
| ------ | ------------ | ---------------------------------------- |
|     66 | partial      | 
|    489 | partial      | one
|    538 | partial      | one two
|    592 | partial      | one two three
|    635 | endOfSpeech  | one two three
|    668 | partial      | 
|    847 | partial      | for
|    882 | partial      | four five six
|    977 | partial      | four five six seven
|   1099 | partial      | four five six seven eight
|   1169 | endOfSpeech  | four five six seven eight
|   1194 | partial      | 
|   1322 | partial      | nine
|   1381 | partial      | nine zero
|   1456 | partial      | nine zero one
|   1498 | partial      | nine zero one two
|   1550 | partial      | nine zero one two three
|   1630 | partial      | nine zero one two three stop
|   1649 | endOfSpeech  | nine zero one two three stop
|   1677 | partial      | 
|   1706 | final        |

voskjshttp: a simple demo HTTP server to transcribe speech files. Using the API above, you can build your own server. Some usage examples here.
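
As a rough idea of what building your own server could look like, the sketch below wires a POST body into a transcript function using only Node's built-in `http` module. It is not the voskjshttp implementation: `makeHandler`, `createTranscriptServer` and the JSON error shape are hypothetical, and `transcribe` is assumed to be any async function that takes a Buffer, for example a wrapper around `transcriptFromBuffer`.

```javascript
const http = require('http')

// Hypothetical request handler: reads the POST body as audio bytes and replies
// with the JSON returned by `transcribe` (an assumed async Buffer => object).
function makeHandler(transcribe) {
  return (req, res) => {
    if (req.method !== 'POST') {
      res.writeHead(405, { 'Content-Type': 'application/json' })
      res.end(JSON.stringify({ error: 'use POST' }))
      return
    }
    const chunks = []
    req.on('data', (chunk) => chunks.push(chunk))
    req.on('end', async () => {
      try {
        const result = await transcribe(Buffer.concat(chunks))
        res.writeHead(200, { 'Content-Type': 'application/json' })
        res.end(JSON.stringify(result))
      } catch (error) {
        res.writeHead(500, { 'Content-Type': 'application/json' })
        res.end(JSON.stringify({ error: error.message }))
      }
    })
  }
}

// Builds the server without listening; the caller picks the port.
function createTranscriptServer(transcribe) {
  return http.createServer(makeHandler(transcribe))
}

// Usage sketch (port 3000 is arbitrary, myTranscribe is yours to supply):
// createTranscriptServer(myTranscribe).listen(3000)
```

Keeping the handler separate from the server makes it possible to exercise the request logic with fake request and response objects, without opening a socket.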

🛍 Install

1. Install Vosk engine and this nodejs module

Install the vosk-api engine:
```
pip3 install -U vosk 
```
See also: https://alphacephei.com/vosk/install
Install this module as a global package if you want to use the CLI command voskjs:
```
npm install -g @solyarisoftware/voskjs@latest
```

2. Install/Download Vosk models

mkdir -p your/path/models && cd your/path/models

# English large model
wget https://alphacephei.com/vosk/models/vosk-model-en-us-aspire-0.2.zip
unzip vosk-model-en-us-aspire-0.2.zip

# English small model
wget http://alphacephei.com/vosk/models/vosk-model-small-en-us-0.15.zip
unzip vosk-model-small-en-us-0.15.zip

# Italian small model
wget https://alphacephei.com/vosk/models/vosk-model-small-it-0.4.zip
unzip vosk-model-small-it-0.4.zip

More about available Vosk models here: https://alphacephei.com/vosk/models

3. Demo audio files

The audio directory contains some English-language speech audio files, coming from a Mozilla DeepSpeech repo (source: Mozilla DeepSpeech audio samples). These files are used for some tests and comparisons.

🧐 Examples

Some VoskJs usage examples:

🛠 Tests

Some tests/notes:

Transcript using English language, large model
Transcript using English language, small model
Comparison between Vosk and Mozilla DeepSpeech (latencies)
Multi-thread stress test (10 requests in parallel)
HTTP Server benchmark test
Latency tests

🎁 Bonus track

audioutils: some audio utility functions, such as toPCM, a fast transcoder to PCM that uses an ffmpeg process (install ffmpeg first).
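
For readers who want to see what an ffmpeg-based transcoding step might look like, here is a minimal sketch. It is not the library's toPCM code: `ffmpegPcmArgs` and `transcodeToPcm` are hypothetical names, and the target format (16-bit mono WAV at 16000 Hz, the default sample rate shown in the CLI output above) is an assumption about what the recognizer expects.

```javascript
const { spawn } = require('child_process')

// Hypothetical helper, not the library's toPCM: builds ffmpeg arguments that
// convert an input file to 16-bit mono PCM WAV at the given sample rate.
function ffmpegPcmArgs(inputFile, outputFile, sampleRate = 16000) {
  return [
    '-y',                    // overwrite the output file if it exists
    '-i', inputFile,
    '-ac', '1',              // one channel (mono)
    '-ar', String(sampleRate),
    '-acodec', 'pcm_s16le',  // signed 16-bit little-endian samples
    outputFile
  ]
}

// Runs ffmpeg and resolves when it exits successfully; ffmpeg must be on PATH.
function transcodeToPcm(inputFile, outputFile, sampleRate) {
  return new Promise((resolve, reject) => {
    const args = ffmpegPcmArgs(inputFile, outputFile, sampleRate)
    // stdio is ignored so ffmpeg's progress output cannot fill a pipe
    const ffmpeg = spawn('ffmpeg', args, { stdio: 'ignore' })
    ffmpeg.on('error', reject)
    ffmpeg.on('close', (code) =>
      code === 0 ? resolve(outputFile) : reject(new Error(`ffmpeg exited with code ${code}`)))
  })
}

console.log(ffmpegPcmArgs('speech.webm', 'speech.wav').join(' '))
```

The file names in the last line are placeholders; only the argument list is printed, so nothing is transcoded until `transcodeToPcm` is called.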

To do

To reduce latency, rethink the transcript interface, maybe with an initialization phase that includes Model and Recognizer(s) object creation. Possible architecture: Stateful & low latency ASR architecture
Explore grammar usage further, with more examples
Improve the catching of Vosk-API errors
voskjshttp:
- Review stress and performances tests (especially for the HTTP server)
- HTTP POST management:
  - set mandatory audio format mime type in the header request (--header "Content-Type: audio/wav")
  - audio-transcoding using function toPcm if input speech files are not specified as wav in header request (e.g. --header "Content-Type: audio/webm") see https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-audio-formats#audio-formats-list
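
One possible shape for the Content-Type handling described in the POST items above is sketched below. The function name `audioAction`, the three return values and the list of WAV mime types are all assumptions made for illustration, not part of voskjshttp.

```javascript
// Hypothetical check: decide from the Content-Type header whether the body
// can be transcribed as is, must be transcoded first, or should be rejected.
function audioAction(contentType) {
  if (!contentType) return 'reject' // the header is meant to be mandatory
  // drop parameters such as "; codecs=opus" and normalize the case
  const mimeType = contentType.split(';')[0].trim().toLowerCase()
  if (!mimeType.startsWith('audio/')) return 'reject'
  const wavTypes = ['audio/wav', 'audio/x-wav', 'audio/wave']
  return wavTypes.includes(mimeType) ? 'transcript' : 'transcode'
}

console.log(audioAction('audio/wav'))
console.log(audioAction('audio/webm; codecs=opus'))
```

A request handler could branch on the returned value: pass the body straight to the recognizer, run it through a PCM transcoding step first, or answer with an error status.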

How to contribute

If you like the project, please ⭐️ star this repository to show your support! 🙏

Any contribution is welcome:

Discussions. Please open a new discussion (a public chat on GitHub) for any specific open topic, for clarifications, change request proposals, etc.
Issues. Please submit issues for bugs, etc.
E-mail. You can contact me privately via email.

💣 Status

The project is at a very early draft stage.
Warning: multi-threading causes a crash: https://github.com/solyarisoftware/voskJs/issues/3. The issue has a temporary workaround: https://github.com/alphacep/vosk-api/issues/516#issuecomment-833462121

🙏 Credits

Thanks to Nikolay V. Shmyrev, author of the Vosk project, for his help with the nodeJs API bindings for multi-threading management.

License

MIT (c) Giorgio Robino
