@maia-id/maleo
v1.1.2
Published
A JavaScript library for speaker diarization - the process of partitioning an audio stream into segments according to speaker identity.
Downloads
137
Readme
MALEO: Multi Platform Speaker Diarization
A JavaScript library for speaker diarization - the process of partitioning an audio stream into segments according to speaker identity.
Features
- Audio preprocessing with customizable options
- CPU, GPU, WebGPU, and WASM support
- Progress tracking during inference
- Flexible audio input handling
- Silence removal and audio normalization capabilities
Prerequisites
GPU Support
If you plan to use GPU acceleration, ensure you have the required CUDA libraries installed:
libcublasLt.so.12
For CUDA installation instructions, refer to the NVIDIA cuDNN Installation Guide.
Installation
npm install @maia-id/maleo
Usage
Basic Example
import { SpeakerDiarization } from 'speaker-diarization';
// Example usage
const example = async () => {
const speakerDiarization = new SpeakerDiarization();
const result = await speakerDiarization.inference({
audio: './examples/audio.wav', // File path for Node.js
language: 'en',
device: 'cpu', // Device support : 'cpu', 'cuda', 'webgpu', or 'wasm'
audioOptions: {
targetSampleRate: 16000,
normalizeAudio: true,
removeSilence: true,
silenceThreshold: -50,
},
progress_callback: (progress) => console.log('Progress:', progress)
});
console.table(result.segments);
};
example();
Running the Example
node examples/inference.js
Configuration Options
Audio Options
| Option | Type | Default | Description | |--------|------|---------|-------------| | targetSampleRate | number | 16000 | Target sample rate for audio processing | | normalizeAudio | boolean | true | Whether to normalize audio amplitude | | removeSilence | boolean | true | Whether to remove silence segments | | silenceThreshold | number | -50 | Threshold (in dB) for silence detection |
Inference Options
| Option | Type | Description | |--------|------|-------------| | audio | string | Path to the audio file | | device | 'cpu' | 'cuda' | 'webgpu' | 'wasm' | Processing device to use | | progress_callback | function | Callback for tracking progress |
Output Format
The inference method returns a result object containing segments with the following structure:
interface Segment {
start: number; // Start time in seconds
end: number; // End time in seconds
speaker: string; // Speaker identifier
confidence: number; // Confidence score
}
Citation
If you use this library in your research, please cite:
@inproceedings{irawan2025cross,
title = {Cross-Platform Speaker Diarization: Evaluating the Scalability of Maleo},
author = {Eka Tresna Irawan and Ardi Mardiana and Dedy Hariyadi and I Putu Agus Eka Pratama},
booktitle = {International Conference on Discoveries in Applied Sciences & Advanced Technology 2025},
year = {2025}
}
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
Acknowledgments
- NVIDIA for CUDA support