ai-text-to-speech

v1.0.5

Simplify your AI text-to-speech integration!

A powerful and straightforward Node.js module for generating speech audio from text using the OpenAI API (support for other TTS providers is in the works). ai-text-to-speech offers a simple and robust interface for converting text into high-quality speech audio files in a range of formats and voices.

Developed by Jerry Kapron for everyone to use freely 👍🏼
☕️ Buy me a coffee

Features

  • Easy Integration: Seamlessly integrate text-to-speech functionality into your Node.js applications.
  • Multiple Voices: Choose from a variety of high-quality voices to suit your application's needs.
  • Flexible Output Formats: Supports various audio formats like MP3, WAV, FLAC, and more.
  • Customizable File Naming: Control the output file naming with suffix options to prevent overwrites.
  • Robust Error Handling: Comprehensive validation and descriptive error messages for easy debugging.

Installation

Install ai-text-to-speech via npm:

npm install ai-text-to-speech

Usage

How to load the module in your code

// Use this import statement if your project supports ES Modules
import aiSpeech from 'ai-text-to-speech';

OR

// Use this require statement if your project uses CommonJS modules
const aiSpeech = require('ai-text-to-speech');

Basic Example with async/await

(async () => { // or nested inside another async function

  try {
    const audioFilePath = await aiSpeech({
      input: 'Buy me a coffee if it works for you.'
      // If the OPENAI_API_KEY environment variable is already set,
      // you don't have to specify the api_key option
    });
    console.log(`Audio file saved at: ${audioFilePath}`);
  } catch (error) {
    console.error('Error generating speech audio:', error.message);
  }

})();

Basic Example with Promise .then()/.catch()

// This approach can be useful if you prefer working with promises directly
// or if you're in an environment where async/await is not supported.

aiSpeech({
  input: 'Buy me a coffee if it works for you.',
  // You can explicitly provide your OpenAI API key here
  api_key: 'YOUR_OPENAI_API_KEY', // if process.env.OPENAI_API_KEY is not set
})
  .then((audioFilePath) => {
    console.log(`Audio file saved at: ${audioFilePath}`);
  })
  .catch((error) => {
    console.error('Error generating speech audio:', error.message);
  });

Advanced Usage with async/await

(async () => { // or nested inside another async function

  try {
    const audioFilePath = await aiSpeech({
      input: 'Buy me a coffee if it works for you.',
      dest_dir: './audio',
      file_name: 'welcome-message',
      voice: 'echo',
      model: 'tts-1-hd',
      response_format: 'wav',
      suffix_type: 'nano',
      api_key: 'YOUR_OPENAI_API_KEY', // if process.env.OPENAI_API_KEY is not set
    });
    console.log(`Audio file saved at: ${audioFilePath}`);
  } catch (error) {
    console.error('Error generating speech audio:', error.message);
  }

})();

Advanced Usage with Promise .then()/.catch()

// This approach can be useful if you prefer working with promises directly
// or if you're in an environment where async/await is not supported.

aiSpeech({
  input: 'Buy me a coffee if it works for you.',
  dest_dir: './audio',
  file_name: 'welcome-message',
  voice: 'echo',
  model: 'tts-1-hd',
  response_format: 'wav',
  suffix_type: 'nano',
  api_key: 'YOUR_OPENAI_API_KEY', // if process.env.OPENAI_API_KEY is not set
})
  .then((audioFilePath) => {
    console.log(`Audio file saved at: ${audioFilePath}`);
  })
  .catch((error) => {
    console.error('Error generating speech audio:', error.message);
  });
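
Converting Several Texts in Sequence

When more than one piece of text needs converting, the calls can be run one after another. The following is a minimal sketch, not part of the module: generateAll is a hypothetical helper, and the part-N file names are made up for illustration. It takes the speech function as a parameter (pass aiSpeech in real use), so it can also be tried with a stub, and it relies only on the behaviour shown in the examples above: the call returns a promise that resolves to the saved file path.

```javascript
// Hypothetical helper (not part of ai-text-to-speech): convert several texts
// one at a time. `speak` is expected to behave like aiSpeech, i.e. take an
// options object and return a promise that resolves to the saved file path.
async function generateAll(texts, speak, baseOptions = {}) {
  const paths = [];
  for (let i = 0; i < texts.length; i += 1) {
    // Sequential on purpose: only one request is in flight at a time.
    const savedPath = await speak({
      ...baseOptions,
      input: texts[i],
      file_name: `part-${i + 1}`, // illustrative naming scheme
    });
    paths.push(savedPath);
  }
  return paths;
}

// Usage (assumes the module is installed and an API key is available):
// const aiSpeech = require('ai-text-to-speech');
// generateAll(['First line.', 'Second line.'], aiSpeech, { dest_dir: './audio' })
//   .then((paths) => console.log(paths))
//   .catch((error) => console.error(error.message));
```

Because the loop awaits each call, the first rejected call stops the batch and its error reaches the caller's .catch().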

Options

input (string, required)

The text to generate audio for. Maximum length is 4096 characters.
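
Text longer than the limit has to be split before it is passed in. Here is one possible sketch, assuming that breaking at spaces is acceptable for your content. splitInput is a hypothetical helper, not something the module provides, and it measures length with JavaScript's string length (UTF-16 code units), which may differ from how the API counts characters.

```javascript
const MAX_INPUT_LENGTH = 4096; // limit stated in this README

// Hypothetical helper: split text into chunks of at most `limit` characters,
// preferring to break at a space so that words stay intact.
function splitInput(text, limit = MAX_INPUT_LENGTH) {
  const chunks = [];
  let rest = text.trim();
  while (rest.length > limit) {
    // Find the last space at or before the limit.
    let cut = rest.lastIndexOf(' ', limit);
    if (cut <= 0) cut = limit; // no space found: fall back to a hard cut
    chunks.push(rest.slice(0, cut).trim());
    rest = rest.slice(cut).trim();
  }
  if (rest.length > 0) chunks.push(rest);
  return chunks;
}
```

Each chunk can then be passed as input in its own call.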

dest_dir (string, optional)

The destination directory to save the audio file. Default: './' (current directory).

file_name (string, optional)

The base name of the output file. Default: 'speech-audio'.

voice (string, optional)

The voice to use for speech synthesis. Default: 'nova'.

model (string, optional)

The TTS model to use. Default: 'tts-1'.

response_format (string, optional)

The audio format for the output file. Default: 'mp3'.

suffix_type (string, optional)

The type of unique suffix used in the file name. Default: 'uuid'.

api_key (string, optional)

Your OpenAI API key. Default: The value of the OPENAI_API_KEY environment variable.
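
As a quick reference, the documented defaults can be written out as a plain object. This is only an illustration of the values listed above: DOCUMENTED_DEFAULTS and resolveOptions are hypothetical names, and the module applies its own defaults internally.

```javascript
// Defaults as documented in the Options section. Illustration only; the
// module does not export this object.
const DOCUMENTED_DEFAULTS = Object.freeze({
  dest_dir: './',
  file_name: 'speech-audio',
  voice: 'nova',
  model: 'tts-1',
  response_format: 'mp3',
  suffix_type: 'uuid',
});

// Hypothetical helper: preview the effective options for a call.
// Explicit options win over the documented defaults.
function resolveOptions(options) {
  return {
    ...DOCUMENTED_DEFAULTS,
    api_key: process.env.OPENAI_API_KEY,
    ...options,
  };
}

// Example: only input and voice are given, everything else falls back.
const effective = resolveOptions({ input: 'Hello there.', voice: 'echo' });
console.log(effective.voice, effective.model, effective.response_format);
```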

Allowed Values

Voices

  • alloy
  • echo
  • fable
  • onyx
  • nova (default)
  • shimmer

Models

  • tts-1 (default)
  • tts-1-hd

Response Formats

  • mp3 (default)
  • opus
  • aac
  • flac
  • wav
  • pcm

Suffix Types

  • uuid (default): A unique UUID string.
  • milli: Timestamp in milliseconds.
  • micro: Timestamp in microseconds.
  • nano: Timestamp in nanoseconds.
  • none: No suffix. Warning: May overwrite existing files if filenames collide.
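
If you do use suffix_type: 'none', you may want to check for an existing file before generating audio. The sketch below makes one assumption that this README does not confirm: that without a suffix the output is written as dest_dir/file_name.response_format. wouldOverwrite is a hypothetical helper, not part of the module.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical pre-check for suffix_type: 'none'.
// Assumption: the unsuffixed output path is
// <dest_dir>/<file_name>.<response_format>; verify this against real output.
function wouldOverwrite({
  dest_dir = './',
  file_name = 'speech-audio',
  response_format = 'mp3',
} = {}) {
  const target = path.join(dest_dir, `${file_name}.${response_format}`);
  return fs.existsSync(target);
}

// Usage:
// if (wouldOverwrite({ dest_dir: './audio', file_name: 'welcome-message' })) {
//   console.warn('A file with that name already exists.');
// }
```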

Edge Cases and Warnings

  • Input Length: The input text must not exceed 4096 characters. Exceeding this limit will result in an error.
  • File Overwrite Risk: Using suffix_type: 'none' without specifying a unique file_name may lead to overwriting existing files.
  • Directory Permissions: Ensure the dest_dir exists and the application has write permissions. The module will throw an error if it cannot write to the directory.
  • API Key Requirement: An OpenAI API key is required. Set it via the api_key option or the OPENAI_API_KEY environment variable.
  • Network Errors: Network issues or incorrect API endpoints will result in errors. Ensure you have a stable internet connection.
  • Unsupported Values: Providing unsupported values for voice, model, response_format, or suffix_type will result in an error.
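
Several of these cases can be caught before any request is made. The following pre-flight check is a sketch built from the limits and allowed values listed in this README. checkOptions is a hypothetical helper, and the module's own validation and error messages may differ.

```javascript
// Allowed values copied from the lists in this README.
const ALLOWED = {
  voice: ['alloy', 'echo', 'fable', 'onyx', 'nova', 'shimmer'],
  model: ['tts-1', 'tts-1-hd'],
  response_format: ['mp3', 'opus', 'aac', 'flac', 'wav', 'pcm'],
  suffix_type: ['uuid', 'milli', 'micro', 'nano', 'none'],
};

// Hypothetical pre-flight check: returns a list of problems (empty if none).
function checkOptions(options, env = process.env) {
  const problems = [];
  if (typeof options.input !== 'string') {
    problems.push('input is required and must be a string');
  } else if (options.input.length > 4096) {
    problems.push('input exceeds 4096 characters');
  }
  for (const [name, allowed] of Object.entries(ALLOWED)) {
    const value = options[name];
    // Omitted options are fine; they fall back to the documented defaults.
    if (value !== undefined && !allowed.includes(value)) {
      problems.push(`${name} must be one of: ${allowed.join(', ')}`);
    }
  }
  if (!options.api_key && !env.OPENAI_API_KEY) {
    problems.push('no API key: set api_key or OPENAI_API_KEY');
  }
  return problems;
}
```

Returning a list instead of throwing on the first problem lets a caller report everything that is wrong in one pass.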

License

This project is licensed under the MIT License.

Acknowledgements

  • OpenAI for providing the API used in this module.
  • Node.js community for their continuous support.

For more information on voice options, see the OpenAI Text-to-Speech Voice Options.

Audio Format Descriptions:

  • Opus: Ideal for internet streaming and communication due to low latency.
  • AAC: Preferred for digital audio compression; widely used on YouTube, Android, and iOS.
  • FLAC: Suitable for lossless audio compression; favored by audio enthusiasts for archiving.
  • WAV: Uncompressed audio, suitable for applications requiring minimal decoding overhead.
  • PCM: Raw audio samples at 24 kHz (16-bit signed, little-endian), without headers.
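
Because PCM output has no header, most players cannot open it directly. As an illustration, the raw samples can be wrapped in a standard 44-byte WAV header. This is a sketch, not part of the module: pcmToWav is a hypothetical helper, and the mono channel count is an assumption, since only the sample rate, bit depth, and byte order are stated above.

```javascript
// Wrap raw PCM samples in a minimal 44-byte WAV (RIFF) header.
// From the README: 24 kHz, 16-bit signed, little-endian.
// Assumption: the audio is mono; the README does not state the channel count.
function pcmToWav(pcm, { sampleRate = 24000, channels = 1, bitsPerSample = 16 } = {}) {
  const blockAlign = channels * (bitsPerSample / 8); // bytes per sample frame
  const byteRate = sampleRate * blockAlign;          // bytes per second
  const header = Buffer.alloc(44);
  header.write('RIFF', 0, 'ascii');
  header.writeUInt32LE(36 + pcm.length, 4); // size of everything after this field
  header.write('WAVE', 8, 'ascii');
  header.write('fmt ', 12, 'ascii');
  header.writeUInt32LE(16, 16);             // fmt chunk size for linear PCM
  header.writeUInt16LE(1, 20);              // audio format 1 = linear PCM
  header.writeUInt16LE(channels, 22);
  header.writeUInt32LE(sampleRate, 24);
  header.writeUInt32LE(byteRate, 28);
  header.writeUInt16LE(blockAlign, 32);
  header.writeUInt16LE(bitsPerSample, 34);
  header.write('data', 36, 'ascii');
  header.writeUInt32LE(pcm.length, 40);     // size of the sample data in bytes
  return Buffer.concat([header, pcm]);
}

// Usage (file names are placeholders):
// const fs = require('fs');
// fs.writeFileSync('speech.wav', pcmToWav(fs.readFileSync('speech.pcm')));
```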

Note: Ensure compliance with OpenAI's usage policies when integrating this module into your applications.

