ai-text-to-speech
v1.0.5
Published
A powerful and straightforward Node.js module for generating AI speech audio files from text using the OpenAI API (support for other providers is in the works). ai-text-to-speech offers a simple and robust interface to convert text into high-quality speec
Downloads
36
Maintainers
Keywords
Readme
ai-text-to-speech
Simplify your AI text-to-speech integration!
A powerful and straightforward Node.js module for generating speech audio from text using the OpenAI API (support for other TTS providers in the works). ai-text-to-speech offers a simple and robust interface to convert text into high-quality speech audio files in various formats and voices.
Developed by Jerry Kapron for everyone to use freely 👍🏼
☕️ Buy me a coffee
Table of Contents
- ai-text-to-speech
Features
- Easy Integration: Seamlessly integrate text-to-speech functionality into your Node.js applications.
- Multiple Voices: Choose from a variety of high-quality voices to suit your application's needs.
- Flexible Output Formats: Supports various audio formats like MP3, WAV, FLAC, and more.
- Customizable File Naming: Control the output file naming with suffix options to prevent overwrites.
- Robust Error Handling: Comprehensive validation and descriptive error messages for easy debugging.
Installation
Install ai-text-to-speech via NPM:
npm install ai-text-to-speech
Usage
How to load the module in your code
// Use this import statement if your project supports ES Modules
import aiSpeech from 'ai-text-to-speech';
OR
// Use this require statement if your project uses CommonJS modules
const aiSpeech = require('ai-text-to-speech');
Basic Example with async/await
(async () => { // or nested inside another async function
try {
const audioFilePath = await aiSpeech({
input: 'Buy me a coffee if it works for you.'
// If the OPENAI_API_KEY environment variable is already set,
// you don't have to specify the api_key option
});
console.log(`Audio file saved at: ${audioFilePath}`);
} catch (error) {
console.error('Error generating speech audio:', error.message);
}
})();
Basic Example with Promise .then()/.catch()
// This approach can be useful if you prefer working with promises directly
// or if you're in an environment where async/await is not supported.
aiSpeech({
input: 'Buy me a coffee if it works for you.',
// You can explicitly provide your OpenAI API key here
api_key: 'YOUR_OPENAI_API_KEY', // if process.env.OPENAI_API_KEY is not set
})
.then((audioFilePath) => {
console.log(`Audio file saved at: ${audioFilePath}`);
})
.catch((error) => {
console.error('Error generating speech audio:', error.message);
});
Advanced Usage with async/await
(async () => { // or nested inside another async function
try {
const audioFilePath = await aiSpeech({
input: 'Buy me a coffee if it works for you.',
dest_dir: './audio',
file_name: 'welcome-message',
voice: 'echo',
model: 'tts-1-hd',
response_format: 'wav',
suffix_type: 'nano',
api_key: 'YOUR_OPENAI_API_KEY', // if process.env.OPENAI_API_KEY is not set
});
console.log(`Audio file saved at: ${audioFilePath}`);
} catch (error) {
console.error('Error generating speech audio:', error.message);
}
})();
Advanced Usage with Promise .then()/.catch()
// This approach can be useful if you prefer working with promises directly
// or if you're in an environment where async/await is not supported.
aiSpeech({
input: 'Buy me a coffee if it works for you.',
dest_dir: './audio',
file_name: 'welcome-message',
voice: 'echo',
model: 'tts-1-hd',
response_format: 'wav',
suffix_type: 'nano',
api_key: 'YOUR_OPENAI_API_KEY', // if process.env.OPENAI_API_KEY is not set
})
.then((audioFilePath) => {
console.log(`Audio file saved at: ${audioFilePath}`);
})
.catch((error) => {
console.error('Error generating speech audio:', error.message);
});
Options
input
(string, required)
The text to generate audio for. Maximum length is 4096 characters.
dest_dir
(string, optional)
The destination directory to save the audio file. Default: './'
(current directory).
file_name
(string, optional)
The base name of the output file. Default: 'speech-audio'
.
voice
(string, optional)
The voice to use for speech synthesis. Default: 'nova'
.
model
(string, optional)
The TTS model to use. Default: 'tts-1'
.
response_format
(string, optional)
The audio format for the output file. Default: 'mp3'
.
suffix_type
(string, optional)
The type of unique suffix used in the file name. Default: 'uuid'
.
api_key
(string, optional)
Your OpenAI API key. Default: The value of the OPENAI_API_KEY
environment variable.
Allowed Values
Voices
alloy
echo
fable
onyx
nova
(default)shimmer
Models
tts-1
(default)tts-1-hd
Response Formats
mp3
(default)opus
aac
flac
wav
pcm
Suffix Types
uuid
(default): A unique UUID string.milli
: Timestamp in milliseconds.micro
: Timestamp in microseconds.nano
: Timestamp in nanoseconds.none
: No suffix. Warning: May overwrite existing files if filenames collide.
Edge Cases and Warnings
- Input Length: The
input
text must not exceed 4096 characters. Exceeding this limit will result in an error. - File Overwrite Risk: Using
suffix_type: 'none'
without specifying a uniquefile_name
may lead to overwriting existing files. - Directory Permissions: Ensure the
dest_dir
exists and the application has write permissions. The module will throw an error if it cannot write to the directory. - API Key Requirement: An OpenAI API key is required. Set it via the
api_key
option or theOPENAI_API_KEY
environment variable. - Network Errors: Network issues or incorrect API endpoints will result in errors. Ensure you have a stable internet connection.
- Unsupported Values: Providing unsupported values for
voice
,model
,response_format
, orsuffix_type
will result in an error.
License
This project is licensed under the MIT License.
Acknowledgements
For more information on voice options, see the OpenAI Text-to-Speech Voice Options.
Audio Format Descriptions:
- Opus: Ideal for internet streaming and communication due to low latency.
- AAC: Preferred for digital audio compression; widely used on YouTube, Android, and iOS.
- FLAC: Suitable for lossless audio compression; favored by audio enthusiasts for archiving.
- WAV: Uncompressed audio, suitable for applications requiring minimal decoding overhead.
- PCM: Raw audio samples in 24kHz (16-bit signed, little-endian), without headers.
Note: Ensure compliance with OpenAI's usage policies when integrating this module into your applications.