npm package discovery and stats viewer.

Discover Tips

  • General search

    [free text search, go nuts!]

  • Package details

    pkg:[package-name]

  • User packages

    @[username]

Sponsor

Optimize Toolset

I’ve always been into building performant and accessible sites, but lately I’ve been taking it extremely seriously. So much so that I’ve been building a tool to help me optimize and monitor the sites that I build to make sure that I’m making an attempt to offer the best experience to those who visit them. If you’re into performant, accessible and SEO friendly sites, you might like it too! You can check it out at Optimize Toolset.

About

Hi, 👋, I’m Ryan Hefner  and I built this site for me, and you! The goal of this site was to provide an easy way for me to check the stats on my npm packages, both for prioritizing issues and updates, and to give me a little kick in the pants to keep up on stuff.

As I was building it, I realized that I was actually using the tool to build the tool, and figured I might as well put this out there and hopefully others will find it to be a fast and useful way to search and browse npm packages as I have.

If you’re interested in other things I’m working on, follow me on Twitter or check out the open source projects I’ve been publishing on GitHub.

I am also working on a Twitter bot for this site to tweet the most popular, newest, random packages from npm. Please follow that account now and it will start sending out packages soon–ish.

Open Software & Tools

This site wouldn’t be possible without the immense generosity and tireless efforts from the people who make contributions to the world and share their work via open source initiatives. Thank you 🙏

© 2024 – Pkg Stats / Ryan Hefner

@solyarisoftware/voskjs

v1.2.8

Published

NodeJs developers API for Vosk-api speech-to-text engine.

Downloads

61

Readme

VoskJs

VoskJs is a NodeJs developers toolkit to use Vosk offline speech recognition engine, including multi thread (server) usage examples. The project gives you:

  • A simple sentence-based and streaming-based transcript APIs
  • The command line utility voskjs
  • A demo HTTP transcript server voskjshttp

VoskJs can be used for speech recognition processing in different scenarios:

  • Single-user/standalone programs (e.g. perfect for single-user embedded systems)
  • Multi-user/multi-core server architectures

What's Vosk?

Vosk is an open source embedded (offline/on-prem) speech-to-text engine which can run with very low latencies (< 500msecs on my PC). Vosk is based on a common DNN-HMM architecture. Deep neural network is used for sound scoring (acoustic scoring), HMM and WFST frameworks are used for time models (language models). It's based on Kaldi, but Nikolay V. Shmyrev's Vosk offers a smart, simplified and performing interface! More details in the Vosk home page and github repo.

What's VoskJs?

The goal of the project is to create an simple function API layer on top of already existing Vosk nodejs binding, supplying both sentence-based and streaming-based speech-to-text functionalities.

Sentence-based transcript API

In this mode, a file or a PCM buffer are processed asynchronously, to get the full text transcript of the given speech. Using the simple transcript interface you can build your standalone custom application, accessing async functions suitable to run on a usual single thread nodejs program.

Pseudo code:

//Loads once in RAM memory a specific Vosk engine model from a model directory.
const model = loadModel(modelDirectory)

// transcripts a speech file or buffer (in WAV/PCM format), using Vosk engine. 
// It supply speech-to-text transcript detailed info.
const result = await transcriptFromFile(fileName, model, {options}) 

// or 
// const result = await transcriptFromBuffer(buffer, model, {options}) 

freeModel(model)

Streaming-based transcript API (DRAFT)

Following Vosk-api recognizer result functions, VoskJs emit these nodejs events:

| Event name | Vosk-api recognizer function | description | | ------------- | ---------------------------- | ------------------------------------------- | | partial | recognizer.patialResult() | silent (text = '') or new word or new words | | endOfSpeech | recognizer.result() | end of speech (words followed by a silence) | | final | recognizer.finalResult() | last part of the audio |

Pseudo code:

//Loads once in RAM memory a specific Vosk engine model from a model directory.
const model = loadModel(modelDirectory)

const transcriptEvents = transcriptEventsFromFile(fileName, model, {options}) 
// or
// const transcriptEvents = transcriptEventsFromBuffer(buffer, model, {options}) 

// an new word is detected
transcriptEvents.on('partial', data => console.log(data) ) 

// a complete sentence (followed by silence) is detected 
transcriptEvents.on('endOfSpeech', data => console.log(data) )

// final (last) sentence is detected
transcriptEvents.on('final', data => console.log(data) )

freeModel(model)

Command line tools

  • voskjs: command line program to test Vosk transcript with specific models (some tests and command line usage here).

    BTW the utility can be configured to tabularize events. By example:

    voskjs --audio=audio/sentencesWithSilences.wav --model=models/vosk-model-small-en-us-0.15 --tableevents
    voskjs is a CLI utility to test Vosk-api features
    package @solyarisoftware/voskjs version 1.2.7, Vosk-api version 0.3.30
    
    Statistics:
    
    model directory      : models/vosk-model-small-en-us-0.15
    speech file name     : audio/sentencesWithSilences.wav
    grammar              : not specified. Default: NO
    sample rate          : not specified. Default: 16000
    max alternatives     : undefined
    text only / JSON     : JSON
    Vosk debug level     : -1
    
    load model latency   : 2001ms
    transcript latency   : 1707ms
    transcript text      : one two three four five six seven eight nine zero one two three stop 
    
    Events table:
    
    | time   | event        | text                                     |
    | ------ | ------------ | ---------------------------------------- |
    |     66 | partial      | 
    |    489 | partial      | one
    |    538 | partial      | one two
    |    592 | partial      | one two three
    |    635 | endOfSpeech  | one two three
    |    668 | partial      | 
    |    847 | partial      | for
    |    882 | partial      | four five six
    |    977 | partial      | four five six seven
    |   1099 | partial      | four five six seven eight
    |   1169 | endOfSpeech  | four five six seven eight
    |   1194 | partial      | 
    |   1322 | partial      | nine
    |   1381 | partial      | nine zero
    |   1456 | partial      | nine zero one
    |   1498 | partial      | nine zero one two
    |   1550 | partial      | nine zero one two three
    |   1630 | partial      | nine zero one two three stop
    |   1649 | endOfSpeech  | nine zero one two three stop
    |   1677 | partial      | 
    |   1706 | final        | 
  • voskjshttp: a simple demo HTTP server to transcript speech files. Using above API you can build your own server. Some usage examples here.

🛍 Install

1. Install Vosk engine and this nodejs module

  • Install vosk-api engine

    pip3 install -U vosk 

    See also: https://alphacephei.com/vosk/install

  • Install this module, as global package if you want to use CLI command voskjs

    npm install -g @solyarisoftware/voskjs@latest

2. Install/Download Vosk models

mkdir your/path/models && cd models

# English large model
wget https://alphacephei.com/vosk/models/vosk-model-en-us-aspire-0.2.zip
unzip vosk-model-en-us-aspire-0.2.zip

# English small model
wget http://alphacephei.com/vosk/models/vosk-model-small-en-us-0.15.zip
unzip vosk-model-small-en-us-0.15.zip

# Italian model model
wget https://alphacephei.com/vosk/models/vosk-model-small-it-0.4.zip
unzip vosk-model-small-it-0.4.zip

More about available Vosk models here: https://alphacephei.com/vosk/models

3. Demo audio files

Directory audio contains some English language speech audio files, coming from a Mozilla DeepSpeech repo. Source: Mozilla DeepSpeech audio samples These files are used for some tests and comparisons.

🧐 Examples

Some VoskJs usage examples:

🛠 Tests

Some tests/notes:

  • Transcript using English language, large model
  • Transcript using English language, small model
  • Comparison between Vosk and Mozilla DeepSpeech (latencies)
  • Multi-thread stress test (10 requests in parallel)
  • HTTP Server benchmark test
  • Latency tests

🎁 Bonus track

audioutils some audio utility functions as toPCM, a fast transcoding to PCM, using ffmpeg process (install ffmpeg before).

To do

  • To speedup latencies, rethink transcript interface, maybe with an initialization phases, including Model and Recognizer(s) object creation. Possible architecture: Stateful & low latency ASR architecture
  • Deepen grammar usage with more examples
  • Deepen Vosk-API errors catching
  • voskjshttp:
    • Review stress and performances tests (especially for the HTTP server)
    • HTTP POST management:
      • set mandatory audio format mime type in the header request (--header "Content-Type: audio/wav")
      • audio-transcoding using function toPcm if input speech files are not specified as wav in header request (e.g. --header "Content-Type: audio/webm") see https://cloud.ibm.com/docs/speech-to-text?topic=speech-to-text-audio-formats#audio-formats-list

How to contribute

If you like the project, please ⭐️ star this repository to show your support! 🙏

Any contribute is welcome:

  • Discussions. Please open a new discussion (a publich chat on github) for any specific open topic, for a clarification, change request proposals, etc.
  • Issues Please submit issues for bugs, etc
  • e-mail You can contact me privately, via email

💣 Status

  • Project is in a very draft stage
  • Warning: multi-threading causes a crash: https://github.com/solyarisoftware/voskJs/issues/3 The issue has a temporary workaround: https://github.com/alphacep/vosk-api/issues/516#issuecomment-833462121

🙏 Credits

Thanks to Nicolay V. Shmyrev, author of Vosk project, for the help about nodeJs API bindings for multi-threading management

License

MIT (c) Giorgio Robino


top