whisper-node-server
Node.js bindings for OpenAI's Whisper. Transcription runs locally on your machine.
Features
- Output transcripts to JSON (plus optional .txt, .srt, and .vtt files)
- Optimized for CPU, including Apple Silicon (ARM)
- Word-level timestamp precision
- Server mode with automatic audio conversion
- Optional CUDA support for GPU acceleration
Installation
- Add the dependency to your project
npm install whisper-node-server
- Download whisper model of choice [OPTIONAL]
npx whisper-node-server download
- Build whisper.cpp
On Windows, build with w64devkit and CMake.
Usage
Direct Usage
import whisper from 'whisper-node-server';
const transcript = await whisper("example/sample.wav");
console.log(transcript); // output: [ {start,end,speech} ]
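Each entry in the returned array carries start and end timestamps plus the recognized text (see the JSON output below), so you can iterate it directly:
// Print each segment with its timestamps
for (const { start, end, speech } of transcript) {
  console.log(`[${start} --> ${end}] ${speech}`); // e.g. [00:00:14.310 --> 00:00:16.480] howdy
}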
Server Mode
- Set up environment variables:
WHISPER_MODEL=base.en
AUDIO_SAMPLE_RATE=16000
AUDIO_CHANNELS=1
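These can live in a .env file; a minimal sketch for loading them at startup, assuming the dotenv package (not a dependency of whisper-node-server):
// Populate process.env from a local .env file (requires: npm install dotenv)
import 'dotenv/config';
console.log(process.env.WHISPER_MODEL); // "base.en"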
- Create the server:
import express from 'express';
import multer from 'multer';
import whisper from 'whisper-node-server';
import { exec } from 'child_process';
import { promisify } from 'util';
import fs from 'fs';
const app = express();
const upload = multer({ dest: 'uploads/' });
const execPromise = promisify(exec);
// Transcribe endpoint
app.post('/transcribe', upload.single('audio'), async (req, res) => {
try {
if (!req.file) {
return res.status(400).send('No audio file uploaded');
}
const inputPath = req.file.path;
const outputPath = `${inputPath}_converted.wav`; // multer filenames have no extension, so don't rely on a .wav suffix
// Convert audio to configured sample rate using FFmpeg
await execPromise(`ffmpeg -y -i "${inputPath}" -ar ${process.env.AUDIO_SAMPLE_RATE ?? 16000} -ac ${process.env.AUDIO_CHANNELS ?? 1} -c:a pcm_s16le "${outputPath}"`);
// Transcribe the audio
const options = {
modelName: process.env.WHISPER_MODEL ?? 'base.en', // fall back to the package default
whisperOptions: {
language: 'auto',
word_timestamps: true
}
};
const transcript = await whisper(outputPath, options);
// Clean up temp files
fs.unlinkSync(inputPath);
fs.unlinkSync(outputPath);
// Extract speech text
const text = transcript ? (Array.isArray(transcript) ?
transcript.map(t => t.speech).join(' ') :
transcript.toString()) : '';
res.json({ text });
} catch (error) {
console.error('Transcription error:', error);
res.status(500).send('Error processing audio: ' + error.message);
}
});
app.listen(8080, () => {
console.log('Server running on port 8080');
});
- Send audio for transcription:
// Convert your audio to a blob
const wavBlob = await float32ArrayToWav(audio);
const formData = new FormData();
formData.append('audio', wavBlob, 'recording.wav');
// Send to server
const response = await fetch('http://localhost:8080/transcribe', {
method: 'POST',
body: formData,
});
if (!response.ok) {
throw new Error('Transcription failed');
}
const data = await response.json();
console.log('Transcription:', data.text);
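The float32ArrayToWav helper above is not provided by this package; here is a minimal sketch that wraps mono Float32 samples in a 16-bit PCM WAV Blob, assuming a 16 kHz capture rate to match the server's FFmpeg settings:
// Minimal WAV encoder for mono Float32 PCM (illustrative helper, not part of whisper-node-server)
function float32ArrayToWav(samples, sampleRate = 16000) {
  const buffer = new ArrayBuffer(44 + samples.length * 2);
  const view = new DataView(buffer);
  const writeString = (offset, str) => {
    for (let i = 0; i < str.length; i++) view.setUint8(offset + i, str.charCodeAt(i));
  };
  writeString(0, 'RIFF');
  view.setUint32(4, 36 + samples.length * 2, true); // RIFF chunk size
  writeString(8, 'WAVE');
  writeString(12, 'fmt ');
  view.setUint32(16, 16, true);             // fmt chunk size
  view.setUint16(20, 1, true);              // audio format: PCM
  view.setUint16(22, 1, true);              // channels: mono
  view.setUint32(24, sampleRate, true);     // sample rate
  view.setUint32(28, sampleRate * 2, true); // byte rate (sampleRate * blockAlign)
  view.setUint16(32, 2, true);              // block align (channels * bytesPerSample)
  view.setUint16(34, 16, true);             // bits per sample
  writeString(36, 'data');
  view.setUint32(40, samples.length * 2, true); // data chunk size
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i])); // clamp to [-1, 1]
    view.setInt16(44 + i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  }
  return new Blob([buffer], { type: 'audio/wav' });
}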
Output (JSON)
[
{
"start": "00:00:14.310", // time stamp begin
"end": "00:00:16.480", // time stamp end
"speech": "howdy" // transcription
}
]
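If you use TypeScript, each segment matches this shape (an illustrative type, not exported by the package):
// Illustrative type for one transcript segment (not exported by whisper-node-server)
interface TranscriptSegment {
  start: string;  // "HH:MM:SS.mmm" where the segment begins
  end: string;    // "HH:MM:SS.mmm" where the segment ends
  speech: string; // transcribed text
}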
Full Options List
import whisper from 'whisper-node-server';
const filePath = "example/sample.wav"; // required
const options = {
modelName: "base.en", // default
// modelPath: "/custom/path/to/model.bin", // use model in a custom directory (cannot use along with 'modelName')
whisperOptions: {
language: 'auto', // default (use 'auto' for auto detect)
gen_file_txt: false, // outputs .txt file
gen_file_subtitle: false, // outputs .srt file
gen_file_vtt: false, // outputs .vtt file
word_timestamps: true // timestamp for every word
// timestamp_size: 0 // cannot use along with word_timestamps:true
}
}
const transcript = await whisper(filePath, options);
Input File Format
Files must be .wav with a 16 kHz (16,000 Hz) sample rate.
Example: convert an .mp3 file with FFmpeg: ffmpeg -i input.mp3 -ar 16000 output.wav
Modifying whisper-node-server
npm run dev
- runs nodemon and tsc on '/src/test.ts'
npm run build
- runs tsc, outputs to '/dist', and marks 'dist/download.js' as executable