@picovoice/orca-node

v1.0.0

Picovoice Orca Node.js binding

Downloads

322

Orca Binding for Node.js

Orca Streaming Text-to-Speech Engine

Made in Vancouver, Canada by Picovoice

Orca is an on-device streaming text-to-speech engine that is designed for use with LLMs, enabling zero-latency voice assistants. Orca is:

  • Private: All voice processing runs locally.
  • Cross-Platform:
    • Linux (x86_64), macOS (x86_64, arm64), and Windows (x86_64)
    • Android and iOS
    • Chrome, Safari, Firefox, and Edge
    • Raspberry Pi (3, 4, 5)

Compatibility

  • Node.js 16+
  • Runs on Linux (x86_64), macOS (x86_64, arm64), Windows (x86_64), and Raspberry Pi (3, 4, 5).

Installation

npm install @picovoice/orca-node

AccessKey

Orca requires a valid Picovoice AccessKey at initialization. The AccessKey acts as your credentials when using Orca SDKs. You can get your AccessKey for free. Make sure to keep your AccessKey secret. Sign up or log in to Picovoice Console to get your AccessKey.

Usage

Orca supports two modes of operation: streaming and single synthesis. In the streaming synthesis mode, Orca processes an incoming text stream in real-time and generates audio in parallel. In the single synthesis mode, the complete text needs to be known in advance and is synthesized in a single call to the Orca engine.

Create an instance of the Orca engine:

const { Orca } = require("@picovoice/orca-node");

const accessKey = "${ACCESS_KEY}"; // Obtained from the Picovoice Console (https://console.picovoice.ai/)

const orca = new Orca(accessKey);

Replace ${ACCESS_KEY} with your AccessKey obtained from Picovoice Console.
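Since the AccessKey should stay secret, one option is to read it from the environment instead of hard-coding it. The following is a minimal sketch, not part of the Orca API: readAccessKey is a hypothetical helper and PICOVOICE_ACCESS_KEY is an assumed variable name.

```javascript
// Hypothetical helper: reads the AccessKey from an environment-like object.
// PICOVOICE_ACCESS_KEY is an assumed variable name, not one defined by Orca.
function readAccessKey(env = process.env) {
  const accessKey = env.PICOVOICE_ACCESS_KEY;
  if (typeof accessKey !== "string" || accessKey.trim() === "") {
    throw new Error("Set PICOVOICE_ACCESS_KEY to your Picovoice AccessKey");
  }
  // Trim stray whitespace that often sneaks in when copying keys.
  return accessKey.trim();
}
```

The returned value can then be passed as the accessKey argument to new Orca(accessKey).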

To synthesize a text stream, create an OrcaStream object and add text to it chunk by chunk:

const stream = orca.streamOpen();

for (const textChunk of textGenerator()) {
  const pcm = stream.synthesize(textChunk);
  if (pcm !== null) {
    // handle pcm
  }
}

The textGenerator() function can be any stream generating text, for example an LLM response. The OrcaStream object buffers input text until there is enough to generate audio. If there is not enough text to generate audio, null is returned.

To ensure smooth transitions between chunks, the stream.synthesize() function returns an audio chunk that only includes the audio for a portion of the text that has been added.
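For experimenting without an LLM, textGenerator() can be stubbed. The sketch below is a hypothetical stand-in that yields a fixed text one word at a time; the default sentence and the word-level chunking are assumptions chosen for illustration.

```javascript
// Hypothetical stand-in for an LLM response: yields the text one word at a time.
// Each chunk keeps its surrounding whitespace, so joining the chunks
// reproduces the input exactly.
function* textGenerator(text = "Hello from Orca. This text arrives in small pieces.") {
  for (const chunk of text.match(/\s*\S+\s*/g) ?? []) {
    yield chunk;
  }
}

// Usage: print each chunk as it would be passed to stream.synthesize().
for (const textChunk of textGenerator()) {
  console.log(JSON.stringify(textChunk));
}
```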

When done, call flush to synthesize any remaining text, and close to delete the OrcaStream object.

const flushedPcm = stream.flush();
if (flushedPcm !== null) {
  // handle flushed pcm
}

stream.close();
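If the audio is needed as one buffer rather than chunk by chunk, the pieces returned by synthesize and flush can be collected and merged. This is a minimal sketch, assuming each non-null chunk is an Int16Array of 16-bit PCM samples; concatPcm is a hypothetical helper, not part of the Orca API.

```javascript
// Hypothetical helper: merges PCM chunks into one Int16Array.
// Assumption: every non-null chunk is an Int16Array of 16-bit samples.
function concatPcm(chunks) {
  // synthesize() and flush() may return null, so drop those entries first.
  const present = chunks.filter((chunk) => chunk !== null);
  const total = present.reduce((sum, chunk) => sum + chunk.length, 0);
  const merged = new Int16Array(total);
  let offset = 0;
  for (const chunk of present) {
    merged.set(chunk, offset);
    offset += chunk.length;
  }
  return merged;
}
```

In the streaming loop, each returned value (including null) can be pushed onto an array that is passed to concatPcm once the stream has been flushed.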

If the complete text is known before synthesis, single synthesis mode can be used to generate speech in a single call to Orca:

const result = orca.synthesize("${TEXT}");

const alignments = orca.synthesizeToFile("${TEXT}", "${OUTPUT_PATH}");

Replace ${TEXT} with the text to be synthesized and ${OUTPUT_PATH} with the path to save the generated audio as a single-channel 16-bit PCM WAV file. In single synthesis mode, Orca returns metadata of the synthesized audio in the form of a list of OrcaAlignment objects.
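PCM obtained some other way than synthesizeToFile, for example from a stream, can be wrapped in the same kind of WAV container by hand. The sketch below writes the standard 44-byte header for single-channel 16-bit PCM. pcmToWav is a hypothetical helper, and it assumes the samples are in an Int16Array and that the sample rate is taken from orca.sampleRate.

```javascript
// Hypothetical helper: wraps 16-bit mono PCM samples in a WAV container.
// Assumption: pcm is an Int16Array and sampleRate is in Hz.
function pcmToWav(pcm, sampleRate) {
  const bytesPerSample = 2;
  const dataSize = pcm.length * bytesPerSample;
  const wav = Buffer.alloc(44 + dataSize);

  wav.write("RIFF", 0, "ascii");
  wav.writeUInt32LE(36 + dataSize, 4); // size of everything after this field
  wav.write("WAVE", 8, "ascii");
  wav.write("fmt ", 12, "ascii");
  wav.writeUInt32LE(16, 16); // size of the fmt chunk
  wav.writeUInt16LE(1, 20); // audio format 1 = uncompressed PCM
  wav.writeUInt16LE(1, 22); // one channel
  wav.writeUInt32LE(sampleRate, 24);
  wav.writeUInt32LE(sampleRate * bytesPerSample, 28); // bytes per second
  wav.writeUInt16LE(bytesPerSample, 32); // bytes per sample frame
  wav.writeUInt16LE(16, 34); // bits per sample
  wav.write("data", 36, "ascii");
  wav.writeUInt32LE(dataSize, 40);

  // Samples follow the header as little-endian signed 16-bit integers.
  pcm.forEach((sample, i) => wav.writeInt16LE(sample, 44 + i * bytesPerSample));
  return wav;
}
```

The resulting Buffer can be saved with fs.writeFileSync from Node's built-in fs module.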

When done, make sure to explicitly release the resources using:

orca.release();
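To make sure release runs even when synthesis throws, the call can be placed in a finally block. The following sketch shows one way to do that; withEngine is a hypothetical helper, and the only thing it assumes about the engine is the release() method described above.

```javascript
// Hypothetical helper: creates an engine, runs a callback with it,
// and always releases the engine afterwards, even if the callback throws.
function withEngine(createEngine, use) {
  const engine = createEngine();
  try {
    return use(engine);
  } finally {
    engine.release();
  }
}
```

With Orca, createEngine would be something like () => new Orca(accessKey), and use would contain the synthesis calls.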

Text input

Orca accepts the 26 lowercase (a-z) and 26 uppercase (A-Z) letters of the English alphabet, numbers, basic symbols, as well as common punctuation marks. You can get a list of all supported characters from the orca.validCharacters property. Characters or words not covered by this list can be pronounced using custom pronunciations.
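Checking text against the supported characters before synthesis can be sketched as follows. findUnsupportedCharacters is a hypothetical helper, and it assumes the list of valid characters is an array of single-character strings.

```javascript
// Hypothetical helper: returns the distinct characters of `text`
// that do not appear in `validCharacters`.
// Assumption: validCharacters is an array of single-character strings.
function findUnsupportedCharacters(text, validCharacters) {
  const allowed = new Set(validCharacters);
  const unsupported = new Set();
  for (const character of text) {
    if (!allowed.has(character)) {
      unsupported.add(character);
    }
  }
  return [...unsupported];
}
```

An empty result means every character in the text is in the list.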

Custom pronunciations

Orca allows custom pronunciations to be embedded in the text via the syntax: {word|pronunciation}.
The pronunciation is expressed in ARPAbet phonemes, for example:

  • "This is a {custom|K AH S T AH M} pronunciation"
  • "{read|R IY D} this as {read|R EH D}, please."
  • "I {live|L IH V} in {Sevilla|S EH V IY Y AH}. We have great {live|L AY V} sports!"
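When the text is assembled programmatically, the markup can be generated rather than typed by hand. This is a small sketch of that idea; withPronunciation is a hypothetical helper and does not validate the phonemes.

```javascript
// Hypothetical helper: builds the {word|pronunciation} markup
// from a word and a list of ARPAbet phonemes.
function withPronunciation(word, phonemes) {
  return `{${word}|${phonemes.join(" ")}}`;
}

const text = `This is a ${withPronunciation("custom", ["K", "AH", "S", "T", "AH", "M"])} pronunciation`;
console.log(text);
```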

Voices

Orca can synthesize speech with various voices, each of which is characterized by a model file located in lib/common. To create an instance of the engine with a specific voice, use:

const orca = new Orca(accessKey, { modelPath: "${MODEL_PATH}" });

and replace ${MODEL_PATH} with the path to the model file with the desired voice.

Speech control

Orca accepts optional parameters that control the synthesized speech. They can be passed as an object to the streamOpen method or to the single synthesis methods synthesize and synthesizeToFile:

  • speechRate: Controls the speed of the generated speech. Valid values are within [0.7, 1.3]. A higher (lower) value produces speech that is faster (slower). The default is 1.0.
  • randomState: Sets the random state for sampling during synthesis. This can be used to ensure that the synthesized speech is deterministic across different runs. Valid values are all non-negative integers. If not provided, a random seed will be chosen and the synthesis process will be non-deterministic.

const synthesizeParams = {
  speechRate: 1.3,
  randomState: 42,
};

// Streaming synthesis
const stream = orca.streamOpen(synthesizeParams);

// Single synthesis
const result = orca.synthesize("${TEXT}", synthesizeParams);
const alignments = orca.synthesizeToFile("${TEXT}", "${OUTPUT_PATH}", synthesizeParams);
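The documented ranges can be checked before the parameters reach Orca, which gives clearer messages for user-supplied values. The sketch below only encodes the constraints stated above; validateSynthesizeParams is a hypothetical helper, not part of the Orca API.

```javascript
// Hypothetical helper: returns a list of problems found in the parameters.
// An empty list means both values are within the documented ranges.
function validateSynthesizeParams({ speechRate, randomState } = {}) {
  const errors = [];
  if (speechRate !== undefined) {
    const inRange = typeof speechRate === "number" && speechRate >= 0.7 && speechRate <= 1.3;
    if (!inRange) {
      errors.push("speechRate must be a number within [0.7, 1.3]");
    }
  }
  if (randomState !== undefined) {
    // Number.isInteger also rejects non-numbers such as strings.
    if (!Number.isInteger(randomState) || randomState < 0) {
      errors.push("randomState must be a non-negative integer");
    }
  }
  return errors;
}
```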

Orca properties

To obtain the set of valid characters, read orca.validCharacters. To retrieve the maximum number of characters allowed, read orca.maxCharacterLimit. The sample rate of the audio produced by Orca is available as orca.sampleRate.
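Text longer than the character limit has to be split before synthesis. One possible approach is to break at whitespace, as in the sketch below. splitByCharacterLimit is a hypothetical helper; it collapses runs of whitespace into single spaces and does not handle custom pronunciation markup, which itself contains spaces.

```javascript
// Hypothetical helper: splits text into parts of at most `maxCharacters`
// characters each, breaking only at whitespace.
function splitByCharacterLimit(text, maxCharacters) {
  const parts = [];
  let current = "";
  for (const word of text.split(/\s+/).filter(Boolean)) {
    if (word.length > maxCharacters) {
      throw new RangeError(`Word exceeds the limit of ${maxCharacters} characters`);
    }
    const candidate = current === "" ? word : `${current} ${word}`;
    if (candidate.length > maxCharacters) {
      // The current part is full: store it and start a new one with this word.
      parts.push(current);
      current = word;
    } else {
      current = candidate;
    }
  }
  if (current !== "") {
    parts.push(current);
  }
  return parts;
}
```

Each part can then be synthesized separately, with orca.maxCharacterLimit as the second argument.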

Alignment Metadata

Along with the raw PCM or saved audio file, Orca returns metadata for the synthesized audio in single synthesis mode. The OrcaAlignment object has the following properties:

  • Word: String representation of the word.
  • Start Time: Indicates when the word started in the synthesized audio. Value is in seconds.
  • End Time: Indicates when the word ended in the synthesized audio. Value is in seconds.
  • Phonemes: An array of OrcaPhoneme objects.

The OrcaPhoneme object has the following properties:

  • Phoneme: String representation of the phoneme.
  • Start Time: Indicates when the phoneme started in the synthesized audio. Value is in seconds.
  • End Time: Indicates when the phoneme ended in the synthesized audio. Value is in seconds.
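The alignment metadata can be turned into a simple timing listing, for example for captions. The sketch below assumes the fields are exposed as word, startSec, endSec and phonemes, and as phoneme, startSec and endSec on each phoneme; these names are assumptions and should be checked against the binding's type definitions. formatAlignments is a hypothetical helper, and the sample data is made up.

```javascript
// Hypothetical helper: one line per word with its time span and phonemes.
// Assumed field names: word, startSec, endSec, phonemes[].phoneme.
function formatAlignments(alignments) {
  return alignments.map(({ word, startSec, endSec, phonemes }) => {
    const symbols = phonemes.map(({ phoneme }) => phoneme).join(" ");
    return `${startSec.toFixed(2)}-${endSec.toFixed(2)}s ${word} [${symbols}]`;
  });
}

// Made-up sample data for illustration, not real Orca output.
const sampleAlignments = [
  {
    word: "hi",
    startSec: 0,
    endSec: 0.25,
    phonemes: [
      { phoneme: "HH", startSec: 0, endSec: 0.1 },
      { phoneme: "AY", startSec: 0.1, endSec: 0.25 },
    ],
  },
];

console.log(formatAlignments(sampleAlignments).join("\n"));
```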

Demos

The Orca Node.js demo package provides command-line utilities for synthesizing speech using Orca.