whisper-node

v1.1.1

Published

a year ago

Node.js bindings for OpenAI's Whisper. Runs locally on the CPU.


ariym

OpenAI Whisper CPP C++Bindings Transcribe Transcriber Transcript Transcription Audio Speech Speech-to-Text STT TTS SRT Diarization local

whisper-node

Node.js bindings for OpenAI's Whisper. Transcription is done locally.

Features

Output transcripts to JSON (also .txt .srt .vtt)
Optimized for CPU (Including Apple Silicon ARM)
Timestamp precision down to a single word

Installation

Add the dependency to your project

npm install whisper-node

Download a Whisper model of your choice

npx whisper-node download

Requirement for Windows: install the make command.

Usage

import whisper from 'whisper-node';

const transcript = await whisper("example/sample.wav");

console.log(transcript); // output: [ {start,end,speech} ]

Output (JSON)

[
  {
    "start":  "00:00:14.310", // time stamp begin
    "end":    "00:00:16.480", // time stamp end
    "speech": "howdy"         // transcription
  }
]
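The output format above can be post-processed with ordinary array code. The following is a minimal sketch, not part of whisper-node: the helper names `timestampToSeconds` and `summarizeTranscript` are hypothetical, and it assumes every timestamp has the `HH:MM:SS.mmm` shape shown in the sample. Joining segments with a single space is also an assumption and may not suit every language or timestamp setting.

```javascript
// Hypothetical helpers (not part of whisper-node) for working with
// the [ {start, end, speech} ] array shown above.

// Convert a timestamp such as "00:00:14.310" (HH:MM:SS.mmm) to seconds.
function timestampToSeconds(timestamp) {
  const [hours, minutes, seconds] = timestamp.split(':').map(Number);
  return hours * 3600 + minutes * 60 + seconds;
}

// Join the "speech" fields into one string and report the covered time span.
function summarizeTranscript(segments) {
  if (segments.length === 0) {
    return { text: '', startSeconds: 0, endSeconds: 0 };
  }
  return {
    text: segments.map((segment) => segment.speech.trim()).join(' '),
    startSeconds: timestampToSeconds(segments[0].start),
    endSeconds: timestampToSeconds(segments[segments.length - 1].end),
  };
}

// Sample data copied from the output example above.
const sample = [
  { start: '00:00:14.310', end: '00:00:16.480', speech: 'howdy' },
];
console.log(summarizeTranscript(sample));
```

Converting the timestamps to numbers makes it straightforward to compute durations or to sort and filter segments by time.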

Usage with Additional Options

import whisper from 'whisper-node';

const filePath = "example/sample.wav"; // required

const options = {
  modelName: "base.en",       // default
  // modelPath: "/custom/path/to/model.bin", // use model in a custom directory (cannot use along with 'modelName')
  whisperOptions: {
    language: 'auto',         // default (use 'auto' for auto detect)
    gen_file_txt: false,      // outputs .txt file
    gen_file_subtitle: false, // outputs .srt file
    gen_file_vtt: false,      // outputs .vtt file
    word_timestamps: true     // timestamp for every word
    // timestamp_size: 0      // cannot use along with word_timestamps:true
  }
}

const transcript = await whisper(filePath, options);
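The comments in the options example name two combinations that cannot be used together: `modelName` with `modelPath`, and `timestamp_size` with `word_timestamps: true`. A small check before calling `whisper` can catch these early. This is a sketch only: `checkOptions` is a hypothetical helper, not a whisper-node API, and how the library itself reacts to these combinations is not documented here.

```javascript
// Hypothetical helper (not part of whisper-node): list option combinations
// that the example comments describe as incompatible.
function checkOptions(options = {}) {
  const problems = [];

  // modelPath "cannot use along with 'modelName'".
  if (options.modelName !== undefined && options.modelPath !== undefined) {
    problems.push('modelName and modelPath cannot be used together');
  }

  // timestamp_size "cannot use along with word_timestamps:true".
  const whisperOptions = options.whisperOptions ?? {};
  if (
    whisperOptions.word_timestamps === true &&
    whisperOptions.timestamp_size !== undefined
  ) {
    problems.push('timestamp_size cannot be used with word_timestamps: true');
  }

  return problems;
}

const problems = checkOptions({
  modelName: 'base.en',
  whisperOptions: { language: 'auto', word_timestamps: true },
});
console.log(problems.length === 0 ? 'options look consistent' : problems);
```

Returning a list of problems rather than throwing lets the caller decide whether to abort or just log a warning.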

Files must be .wav with a 16 kHz sample rate

Use FFmpeg to convert an example .mp3 with this command: ffmpeg -i input.mp3 -ar 16000 output.wav
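To confirm that a file meets the sample-rate requirement before transcribing it, the rate can be read from the WAV header. The sketch below is not part of whisper-node: `wavSampleRate` is a hypothetical helper that walks the RIFF chunks of a Node.js `Buffer` and returns the rate stored in the `fmt ` chunk. It assumes a standard little-endian RIFF/WAVE file.

```javascript
// Hypothetical helper (not part of whisper-node): read the sample rate from
// the "fmt " chunk of a WAV file held in a Buffer.
// Returns null if the data is not RIFF/WAVE or has no complete "fmt " chunk.
function wavSampleRate(buffer) {
  if (buffer.length < 12) return null;
  if (buffer.toString('ascii', 0, 4) !== 'RIFF') return null;
  if (buffer.toString('ascii', 8, 12) !== 'WAVE') return null;

  let offset = 12; // first chunk starts right after the RIFF/WAVE header
  while (offset + 8 <= buffer.length) {
    const chunkId = buffer.toString('ascii', offset, offset + 4);
    const chunkSize = buffer.readUInt32LE(offset + 4);
    if (chunkId === 'fmt ') {
      // Chunk body: format tag (2 bytes), channel count (2 bytes), sample rate (4 bytes).
      if (offset + 16 > buffer.length) return null;
      return buffer.readUInt32LE(offset + 12);
    }
    // Skip this chunk; chunk bodies are padded to an even number of bytes.
    offset += 8 + chunkSize + (chunkSize % 2);
  }
  return null;
}

// True when the buffer is a WAV file sampled at 16000 Hz.
function isSixteenKilohertzWav(buffer) {
  return wavSampleRate(buffer) === 16000;
}
```

In a script, the buffer would typically come from `fs.readFileSync("output.wav")`. If `isSixteenKilohertzWav` returns false, converting the file with the FFmpeg command above is the next step.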


Roadmap

[x] Support projects not using TypeScript
[x] Allow custom directory for storing models
[ ] Config files as alternative to model download cli
[ ] Remove path, shelljs and prompt-sync package for browser, react-native expo, and webassembly compatibility
[ ] fluent-ffmpeg to support more audio formats
[ ] pyannote diarization for speaker names
[ ] Implement WhisperX as optional alternative model for diarization and higher precision timestamps (as alternative to C++ version)
[ ] Add option for viewing detected language as described in Issue 16
[ ] Include TypeScript types in d.ts file
[x] Add support for language option
[ ] Add support for transcribing audio streams as already implemented in whisper.cpp

Modifying whisper-node

npm run dev - runs nodemon and tsc on '/src/test.ts'

npm run build - runs tsc, outputs to '/dist' and gives sh permission to 'dist/download.js'
