whisper-node-server
Node.js bindings for OpenAI's Whisper. Transcription runs locally on your machine.
Features
- Output transcripts to JSON (plus optional .txt, .srt, and .vtt files)
- Optimized for CPU, including Apple Silicon (ARM)
- Word-level timestamp precision
- Server mode with automatic audio conversion
- Optional CUDA support for GPU acceleration
Installation
- Add the dependency to your project
npm install whisper-node-server
- Download whisper model of choice [OPTIONAL]
npx whisper-node-server download
- Build whisper.cpp
On Windows, build with w64devkit and CMake.
Usage
Direct Usage
import whisper from 'whisper-node-server';
const transcript = await whisper("example/sample.wav");
console.log(transcript); // output: [ {start,end,speech} ]
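Each entry in the returned array carries start and end timestamps plus the recognized text (see the JSON output below), so you can iterate it directly:
// Print each segment with its timestamps
for (const { start, end, speech } of transcript) {
  console.log(`[${start} --> ${end}] ${speech}`); // e.g. [00:00:14.310 --> 00:00:16.480] howdy
}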
Server Mode
- Set up environment variables:
WHISPER_MODEL=base.en
AUDIO_SAMPLE_RATE=16000
AUDIO_CHANNELS=1
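These can live in a .env file; a minimal sketch for loading them at startup, assuming the dotenv package (not a dependency of whisper-node-server):
// Populate process.env from a local .env file (requires: npm install dotenv)
import 'dotenv/config';
console.log(process.env.WHISPER_MODEL); // "base.en"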
- Create the server:
import express from 'express';
import multer from 'multer';
import whisper from 'whisper-node-server';
import { exec } from 'child_process';
import { promisify } from 'util';
import fs from 'fs';
const app = express();
const upload = multer({ dest: 'uploads/' });
const execPromise = promisify(exec);
// Transcribe endpoint
app.post('/transcribe', upload.single('audio'), async (req, res) => {
try {
if (!req.file) {
return res.status(400).send('No audio file uploaded');
}
const inputPath = req.file.path;
const outputPath = `${inputPath}_converted.wav`; // multer filenames have no extension, so don't rely on a .wav suffix
// Convert audio to configured sample rate using FFmpeg
await execPromise(`ffmpeg -y -i "${inputPath}" -ar ${process.env.AUDIO_SAMPLE_RATE ?? 16000} -ac ${process.env.AUDIO_CHANNELS ?? 1} -c:a pcm_s16le "${outputPath}"`);
// Transcribe the audio
const options = {
modelName: process.env.WHISPER_MODEL ?? 'base.en', // fall back to the package default
whisperOptions: {
language: 'auto',
word_timestamps: true
}
};
const transcript = await whisper(outputPath, options);
// Clean up temp files
fs.unlinkSync(inputPath);
fs.unlinkSync(outputPath);
// Extract speech text
const text = transcript ? (Array.isArray(transcript) ?
transcript.map(t => t.speech).join(' ') :
transcript.toString()) : '';
res.json({ text });
} catch (error) {
console.error('Transcription error:', error);
res.status(500).send('Error processing audio: ' + error.message);
}
});
app.listen(8080, () => {
console.log('Server running on port 8080');
});
- Send audio for transcription:
// Convert your audio to a blob
const wavBlob = await float32ArrayToWav(audio);
const formData = new FormData();
formData.append('audio', wavBlob, 'recording.wav');
// Send to server
const response = await fetch('http://localhost:8080/transcribe', {
method: 'POST',
body: formData,
});
if (!response.ok) {
throw new Error('Transcription failed');
}
const data = await response.json();
console.log('Transcription:', data.text);
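The float32ArrayToWav helper above is not provided by this package; here is a minimal sketch that wraps mono Float32 samples in a 16-bit PCM WAV Blob, assuming a 16 kHz capture rate to match the server's FFmpeg settings:
// Minimal WAV encoder for mono Float32 PCM (illustrative helper, not part of whisper-node-server)
function float32ArrayToWav(samples, sampleRate = 16000) {
  const buffer = new ArrayBuffer(44 + samples.length * 2);
  const view = new DataView(buffer);
  const writeString = (offset, str) => {
    for (let i = 0; i < str.length; i++) view.setUint8(offset + i, str.charCodeAt(i));
  };
  writeString(0, 'RIFF');
  view.setUint32(4, 36 + samples.length * 2, true); // RIFF chunk size
  writeString(8, 'WAVE');
  writeString(12, 'fmt ');
  view.setUint32(16, 16, true);             // fmt chunk size
  view.setUint16(20, 1, true);              // audio format: PCM
  view.setUint16(22, 1, true);              // channels: mono
  view.setUint32(24, sampleRate, true);     // sample rate
  view.setUint32(28, sampleRate * 2, true); // byte rate (sampleRate * blockAlign)
  view.setUint16(32, 2, true);              // block align (channels * bytesPerSample)
  view.setUint16(34, 16, true);             // bits per sample
  writeString(36, 'data');
  view.setUint32(40, samples.length * 2, true); // data chunk size
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i])); // clamp to [-1, 1]
    view.setInt16(44 + i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  }
  return new Blob([buffer], { type: 'audio/wav' });
}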
Output (JSON)
[
{
"start": "00:00:14.310", // time stamp begin
"end": "00:00:16.480", // time stamp end
"speech": "howdy" // transcription
}
]
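If you use TypeScript, each segment matches this shape (an illustrative type, not exported by the package):
// Illustrative type for one transcript segment (not exported by whisper-node-server)
interface TranscriptSegment {
  start: string;  // "HH:MM:SS.mmm" where the segment begins
  end: string;    // "HH:MM:SS.mmm" where the segment ends
  speech: string; // transcribed text
}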
Full Options List
import whisper from 'whisper-node-server';
const filePath = "example/sample.wav"; // required
const options = {
modelName: "base.en", // default
// modelPath: "/custom/path/to/model.bin", // use model in a custom directory (cannot use along with 'modelName')
whisperOptions: {
language: 'auto', // default (use 'auto' for auto detect)
gen_file_txt: false, // outputs .txt file
gen_file_subtitle: false, // outputs .srt file
gen_file_vtt: false, // outputs .vtt file
word_timestamps: true // timestamp for every word
// timestamp_size: 0 // cannot use along with word_timestamps:true
}
}
const transcript = await whisper(filePath, options);
Input File Format
Files must be .wav with a 16 kHz (16,000 Hz) sample rate.
Example: convert an .mp3 file with FFmpeg: ffmpeg -i input.mp3 -ar 16000 output.wav
Modifying whisper-node-server
npm run dev
- runs nodemon and tsc on '/src/test.ts'
npm run build
- runs tsc, outputs to '/dist', and marks 'dist/download.js' as executable