audio-sentence-detector
v1.0.5
Advanced audio sentence detection using signal processing and voice activity detection
Audio Sentence Detector
An advanced audio sentence detection library that uses voice activity detection, silence analysis, and acoustic features to segment audio into sentences.
Installation
npm install audio-sentence-detector
Usage
const AudioSentenceDetector = require('audio-sentence-detector');
// Create detector with custom options
const detector = new AudioSentenceDetector({
  minSilenceDuration: 0.5,
  silenceThreshold: 0.01
});
// Process audio buffer (detect() is awaited, so run this inside an async function)
const sentences = await detector.detect(audioBuffer);
Configuration Options
The AudioSentenceDetector constructor accepts an options object with the following parameters:
Basic Sentence Detection Options
| Option | Default | Description |
|--------|---------|-------------|
| `minSilenceDuration` | 0.5 | Minimum duration of silence (in seconds) to be considered a sentence boundary |
| `silenceThreshold` | 0.01 | RMS threshold below which audio is considered silence |
| `minSentenceLength` | 1 | Minimum length of a sentence in seconds |
| `maxSentenceLength` | 15 | Maximum length of a sentence in seconds |
| `windowSize` | 2048 | Size of the analysis window in samples |
| `idealSentenceLength` | 5 | Ideal length of a sentence in seconds (used for probability calculations) |
| `idealSilenceDuration` | 0.8 | Ideal duration of silence between sentences |
| `allowGaps` | true | Whether to allow gaps between sentences |
| `minSegmentLength` | 0 | Minimum length for merged segments |
| `alignToAudioBoundaries` | false | Whether to align sentences with audio file boundaries |
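
To make the roles of `windowSize`, `silenceThreshold`, and `minSilenceDuration` more concrete, here is a minimal sketch of RMS-based silence detection. It is an illustration of the general technique, not the library's internal code: the helper names `rms` and `findSilences`, the `sampleRate` argument, and the assumption of mono samples in a `Float32Array` with values in [-1, 1] are all hypothetical.

```javascript
// Illustrative sketch only: NOT the library's internal implementation.
// Assumes mono PCM samples in a Float32Array with values in [-1, 1].

// Root-mean-square level of samples[start, end).
function rms(samples, start, end) {
  let sumSquares = 0;
  for (let i = start; i < end; i++) {
    sumSquares += samples[i] * samples[i];
  }
  return Math.sqrt(sumSquares / (end - start));
}

// Returns silent stretches as [{ start, end }], in seconds.
// A stretch counts only if it lasts at least minSilenceDuration.
function findSilences(samples, sampleRate, options = {}) {
  const {
    windowSize = 2048,
    silenceThreshold = 0.01,
    minSilenceDuration = 0.5,
  } = options;

  const silences = [];
  let silenceStart = null; // sample index where the current silent run began

  const closeRun = (endSample) => {
    const duration = (endSample - silenceStart) / sampleRate;
    if (duration >= minSilenceDuration) {
      silences.push({
        start: silenceStart / sampleRate,
        end: endSample / sampleRate,
      });
    }
    silenceStart = null;
  };

  // Walk the signal one non-overlapping window at a time.
  for (let pos = 0; pos < samples.length; pos += windowSize) {
    const end = Math.min(pos + windowSize, samples.length);
    const isSilent = rms(samples, pos, end) < silenceThreshold;

    if (isSilent && silenceStart === null) {
      silenceStart = pos;
    } else if (!isSilent && silenceStart !== null) {
      closeRun(pos);
    }
  }
  // A silent run may extend to the end of the signal.
  if (silenceStart !== null) closeRun(samples.length);

  return silences;
}
```

In this sketch, a larger `windowSize` gives a steadier RMS estimate but coarser boundary timing, since silence can only start or end on a window edge.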
Voice Detection Options
| Option | Default | Description |
|--------|---------|-------------|
| `fundamentalFreqMin` | 85 | Minimum fundamental frequency for voice detection (Hz) |
| `fundamentalFreqMax` | 255 | Maximum fundamental frequency for voice detection (Hz) |
| `voiceActivityThreshold` | 0.4 | Threshold for voice activity detection |
| `minVoiceActivityDuration` | 0.1 | Minimum duration of voice activity (seconds) |
| `energySmoothing` | 0.95 | Smoothing factor for energy calculations |
| `formantEmphasis` | 0.7 | Emphasis factor for formant detection |
| `zeroCrossingRateThreshold` | 0.3 | Threshold for zero-crossing rate in voice detection |
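
Two of the quantities named above, the zero-crossing rate and a smoothed energy, can be sketched in a few lines. This is a generic illustration of the measurements, not the library's code: the function names `zeroCrossingRate` and `smoothEnergy` are hypothetical, and how the library compares these values against `zeroCrossingRateThreshold` or applies `energySmoothing` is not documented here. As general background, voiced speech tends to have a lower zero-crossing rate than unvoiced sounds or broadband noise.

```javascript
// Illustrative sketch only: NOT the library's internal implementation.

// Fraction of adjacent sample pairs whose sign differs (0 to 1).
function zeroCrossingRate(frame) {
  if (frame.length < 2) return 0;
  let crossings = 0;
  for (let i = 1; i < frame.length; i++) {
    if ((frame[i - 1] >= 0) !== (frame[i] >= 0)) crossings++;
  }
  return crossings / (frame.length - 1);
}

// Exponential smoothing: each output keeps `factor` of the previous
// smoothed value and mixes in (1 - factor) of the new measurement.
// The first smoothed value is seeded with the first measurement.
function smoothEnergy(energies, factor = 0.95) {
  const smoothed = [];
  let previous = energies.length > 0 ? energies[0] : 0;
  for (const energy of energies) {
    previous = factor * previous + (1 - factor) * energy;
    smoothed.push(previous);
  }
  return smoothed;
}
```

With a factor close to 1, such as the default 0.95, the smoothed curve reacts slowly to sudden changes, which suppresses brief spikes at the cost of some lag.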
Debug Option
| Option | Default | Description |
|--------|---------|-------------|
| `debug` | false | Enable debug logging |
Return Value
The `detect()` method returns an array of sentence objects, each containing:
{
  index: number,       // Index of the sentence
  start: number,       // Start time in seconds
  end: number,         // End time in seconds
  duration: number,    // Duration in seconds
  probability: number  // Confidence score (0-1)
}
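
As a sketch of how this result might be consumed, the snippet below filters out low-confidence sentences and prints a timestamped summary. The sample array is made-up data shaped like the structure above, and `formatTime`, `summarize`, and the 0.5 cutoff are hypothetical helpers and values chosen for illustration; none of them is part of the library.

```javascript
// Hypothetical sample shaped like the documented detect() output.
const sentences = [
  { index: 0, start: 0.0, end: 2.5, duration: 2.5, probability: 0.91 },
  { index: 1, start: 3.1, end: 3.9, duration: 0.8, probability: 0.42 },
  { index: 2, start: 4.6, end: 9.85, duration: 5.25, probability: 0.77 },
];

// Format a time in seconds as m:ss.mmm
function formatTime(seconds) {
  const totalMs = Math.round(seconds * 1000);
  const minutes = Math.floor(totalMs / 60000);
  const secs = Math.floor((totalMs % 60000) / 1000);
  const ms = totalMs % 1000;
  return `${minutes}:${String(secs).padStart(2, '0')}.${String(ms).padStart(3, '0')}`;
}

// Keep sentences at or above the confidence cutoff and describe each one.
function summarize(items, minProbability = 0.5) {
  return items
    .filter((s) => s.probability >= minProbability)
    .map(
      (s) =>
        `#${s.index} ${formatTime(s.start)} - ${formatTime(s.end)} ` +
        `(p=${s.probability.toFixed(2)})`
    );
}

console.log(summarize(sentences).join('\n'));
```

A suitable cutoff depends on the recording; it is worth inspecting the `probability` values on real audio before fixing one.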
Example
const fs = require('fs');
const AudioSentenceDetector = require('audio-sentence-detector');
// Create detector with custom settings
const detector = new AudioSentenceDetector({
  minSilenceDuration: 0.3,
  silenceThreshold: 0.02,
  minSentenceLength: 1.5,
  maxSentenceLength: 10,
  debug: true
});
// await is not allowed at the top level of a CommonJS module,
// so the call is wrapped in an async function
async function main() {
  try {
    // Process audio file
    const audioBuffer = fs.readFileSync('speech.wav');
    const sentences = await detector.detect(audioBuffer);
    console.log('Detected sentences:', sentences);
  } catch (error) {
    console.error('Error processing audio:', error);
  }
}
main();
License
MIT