
@cartesia/cartesia-js

v1.3.0

Client for the Cartesia API.

Downloads

6,452

Cartesia JavaScript Client

This client provides convenient access to Cartesia's TTS models. Sonic is the fastest text-to-speech model around: it can generate a second of audio in just 650ms and stream out the first audio chunk in just 135ms. Alongside Sonic, we also offer an extensive prebuilt voice library for a variety of use cases.

The JavaScript client is a thin wrapper around the Cartesia API. You can view docs for the API at docs.cartesia.ai.

Installation

# NPM
npm install @cartesia/cartesia-js
# Yarn
yarn add @cartesia/cartesia-js
# PNPM
pnpm add @cartesia/cartesia-js
# Bun
bun add @cartesia/cartesia-js

Usage

CRUD on Voices

import Cartesia from "@cartesia/cartesia-js";

const cartesia = new Cartesia({
	apiKey: "your-api-key",
});

// List all voices.
const voices = await cartesia.voices.list();
console.log(voices);

// Get a voice.
const voice = await cartesia.voices.get("<voice-id>");
console.log(voice);

// Clone a voice from a file.
const clonedVoiceEmbedding = await cartesia.voices.clone({
	mode: "clip",
	clip: myFile, // Pass a File object or a Blob.
});

// Mix voices together.
const mixedVoiceEmbedding = await cartesia.voices.mix({
	voices: [{ id: "<voice-id-1>", weight: 0.6 }, { id: "<voice-id-2>", weight: 0.4 }],
});

// Localize a voice.
const localizedVoiceEmbedding = await cartesia.voices.localize({
	embedding: Array(192).fill(1.0),
	original_speaker_gender: "female",
	language: "es",
});

// Create a voice.
const newVoice = await cartesia.voices.create({
	name: "Tim",
	description: "A deep, resonant voice.",
	embedding: Array(192).fill(1.0),
});
console.log(newVoice);

TTS over WebSocket

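import Cartesia from "@cartesia/cartesia-js";
// Only needed in Node.js: a WebSocket constructor to pass to connect() below.
import WS from "ws";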

const cartesia = new Cartesia({
	apiKey: "your-api-key",
});

// Initialize the WebSocket. Make sure the output format you specify is supported.
const websocket = cartesia.tts.websocket({
	container: "raw",
	encoding: "pcm_f32le",
	sampleRate: 44100,
});

try {
	await websocket.connect({
		// If using Node.js, you can pass a custom WebSocket constructor, such as from `ws`.
		// This is not needed for browser usage, so you can call connect() without any arguments.
		WebSocket: WS,
	});
} catch (error) {
	console.error(`Failed to connect to Cartesia: ${error}`);
}

// Create a stream.
const response = await websocket.send({
	model_id: "sonic-english",
	voice: {
		mode: "id",
		id: "a0e99841-438c-4a64-b679-ae501e7d6091",
	},
	transcript: "Hello, world!",
	// The WebSocket sets output_format on your behalf.
});

// Access the raw messages from the WebSocket.
response.on("message", (message) => {
	// Raw message.
	console.log("Received message:", message);
});

// You can also access messages using a for-await-of loop.
for await (const message of response.events('message')) {
	// Raw message.
	console.log("Received message:", message);
}
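
With the output format used above (container raw, encoding pcm_f32le, sample rate 44100), each sample is a 4-byte little-endian float, so byte counts convert directly into sample counts and durations. The following is a minimal sketch of that arithmetic. `decodePcmF32le` and `durationSeconds` are hypothetical helpers, not part of the client, and the sketch assumes you have already extracted the raw audio bytes from a message as a `Uint8Array` and that the audio is mono.

```javascript
// Hypothetical helpers (not part of @cartesia/cartesia-js).
// Assumes `bytes` holds raw mono PCM audio as 32-bit little-endian floats.
const BYTES_PER_SAMPLE = 4;

function decodePcmF32le(bytes) {
	if (bytes.byteLength % BYTES_PER_SAMPLE !== 0) {
		throw new RangeError("pcm_f32le data must be a multiple of 4 bytes");
	}
	// Respect byteOffset so that subarray views decode correctly.
	const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
	const samples = new Float32Array(bytes.byteLength / BYTES_PER_SAMPLE);
	for (let i = 0; i < samples.length; i++) {
		// `true` selects little-endian byte order.
		samples[i] = view.getFloat32(i * BYTES_PER_SAMPLE, true);
	}
	return samples;
}

// Duration of a mono buffer in seconds.
function durationSeconds(sampleCount, sampleRate) {
	return sampleCount / sampleRate;
}
```

For example, `durationSeconds(decodePcmF32le(bytes).length, 44100)` gives the playback length of one chunk at the sample rate configured above.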

Input Streaming with Contexts

const contextOptions = {
	context_id: "my-context",
	model_id: "sonic-english",
	voice: {
		mode: "id",
		id: "a0e99841-438c-4a64-b679-ae501e7d6091",
	},
};

// Initial request on the context uses websocket.send().
// This response object will aggregate the results of all the inputs sent on the context.
const response = await websocket.send({
	...contextOptions,
	transcript: "Hello, world!",
});

// Subsequent requests on the same context use websocket.continue().
await websocket.continue({
	...contextOptions,
	transcript: " How are you today?",
});

See the input streaming docs for more information.
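
When the transcript arrives in several pieces, the send-then-continue pattern above can be wrapped in a small loop. This is only a sketch: `streamTranscript` is a hypothetical helper, not part of the client, and it relies solely on the `websocket.send()` and `websocket.continue()` calls shown above.

```javascript
// Hypothetical helper (not part of @cartesia/cartesia-js).
// Sends the first chunk with send() and every later chunk with continue(),
// all on the context described by `contextOptions`.
async function streamTranscript(websocket, contextOptions, chunks) {
	if (chunks.length === 0) {
		throw new Error("at least one transcript chunk is required");
	}
	const [first, ...rest] = chunks;
	// The initial request returns the response that aggregates the whole context.
	const response = await websocket.send({ ...contextOptions, transcript: first });
	for (const transcript of rest) {
		await websocket.continue({ ...contextOptions, transcript });
	}
	return response;
}
```

Calling `await streamTranscript(websocket, contextOptions, ["Hello, world!", " How are you today?"])` would issue the same two requests as the example above and return the aggregated response object.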

Timestamps

To receive timestamps in responses, set the add_timestamps field in the request object to true.

const response = await websocket.send({
	model_id: "sonic-english",
	voice: {
		mode: "id",
		id: "a0e99841-438c-4a64-b679-ae501e7d6091",
	},
	transcript: "Hello, world!",
	add_timestamps: true,
});

You can then listen for timestamps on the returned response object.

response.on("timestamps", (timestamps) => {
	console.log("Received timestamps for words:", timestamps.words);
	console.log("Words start at:", timestamps.start);
	console.log("Words end at:", timestamps.end);
});

// You can also access timestamps using a for-await-of loop.
for await (const timestamps of response.events('timestamps')) {
	console.log("Received timestamps for words:", timestamps.words);
	console.log("Words start at:", timestamps.start);
	console.log("Words end at:", timestamps.end);
}
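
If you want one record per word, the three fields can be combined. The sketch below assumes that `words`, `start`, and `end` are parallel arrays in which index i describes the i-th word; this README does not state that explicitly, so check the API docs before relying on it. `zipWordTimestamps` is a hypothetical helper, not part of the client.

```javascript
// Hypothetical helper (not part of @cartesia/cartesia-js).
// Assumes `words`, `start`, and `end` are parallel arrays of equal length.
function zipWordTimestamps(timestamps) {
	const { words, start, end } = timestamps;
	if (words.length !== start.length || words.length !== end.length) {
		throw new RangeError("words, start, and end must have the same length");
	}
	// Build one { word, start, end } record per word.
	return words.map((word, i) => ({ word, start: start[i], end: end[i] }));
}
```

Inside the `timestamps` handler, `zipWordTimestamps(timestamps)` would then give an array that is easier to feed into captions or highlighting.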

Speed and emotion controls [Alpha]

The API has experimental support for speed and emotion controls. These controls are not covered by semantic versioning and may change without notice. To use them, set the speed and emotion fields under voice.__experimental_controls in the request object.

const response = await websocket.send({
	model_id: "sonic-english",
	voice: {
		mode: "id",
		id: "a0e99841-438c-4a64-b679-ae501e7d6091",
		__experimental_controls: {
			speed: "fastest",
			emotion: ["sadness", "surprise:high"],
		},
	},
	transcript: "Hello, world!",
});
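
The emotion entries in the example above are plain strings, either a bare name ("sadness") or a name with a level ("surprise:high"). A small builder can keep that formatting in one place. This is a sketch inferred only from the example: `emotionTag` and `buildControls` are hypothetical helpers, and they do not validate which speeds, emotion names, or levels the API accepts.

```javascript
// Hypothetical helpers (not part of @cartesia/cartesia-js).
// The "name" / "name:level" format is inferred from the example above.
function emotionTag(name, level) {
	return level === undefined ? name : `${name}:${level}`;
}

// Builds the object placed under voice.__experimental_controls.
function buildControls(speed, emotions) {
	return {
		speed,
		emotion: emotions.map(({ name, level }) => emotionTag(name, level)),
	};
}
```

`buildControls("fastest", [{ name: "sadness" }, { name: "surprise", level: "high" }])` produces the same controls object as the request above.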

Multilingual TTS [Alpha]

You can define the language of the text you want to synthesize by setting the language field in the request object. Make sure that you are using model_id: "sonic-multilingual" in the request object.

Supported languages are listed at docs.cartesia.ai.
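
As a rough illustration, a multilingual request differs from the earlier ones only in its model_id and the added language field. `buildMultilingualRequest` is a hypothetical helper, not part of the client, and the "es" code in the usage note is borrowed from the voice localization example; confirm the supported codes in the docs.

```javascript
// Hypothetical helper (not part of @cartesia/cartesia-js).
// Builds a request object with the multilingual model and a language code.
function buildMultilingualRequest(voiceId, transcript, language) {
	return {
		model_id: "sonic-multilingual",
		voice: { mode: "id", id: voiceId },
		transcript,
		language,
	};
}
```

The result can be passed straight to the WebSocket, for example `await websocket.send(buildMultilingualRequest("<voice-id>", "Hola, mundo!", "es"))`.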

Playing audio in the browser

(The WebPlayer class only works in the browser, and it only plays raw PCM audio with pcm_f32le encoding, as configured in the WebSocket example above.)

// If you're using the client in the browser, you can control audio playback using our WebPlayer:
import { WebPlayer } from "@cartesia/cartesia-js";

console.log("Playing stream...");

// Create a Player object.
const player = new WebPlayer();

// Play the audio. (`response` includes a custom Source object that the Player can play.)
// The call resolves when the audio finishes playing.
await player.play(response.source);

console.log("Done playing.");

React

We export a React hook that simplifies the process of using the TTS API. The hook manages the WebSocket connection and provides a simple interface for buffering, playing, pausing and restarting audio.

import { useState } from 'react';
import { useTTS } from '@cartesia/cartesia-js/react';

function TextToSpeech() {
	const tts = useTTS({
		apiKey: "your-api-key",
		sampleRate: 44100,
	});

	const [text, setText] = useState("");

	const handlePlay = async () => {
		// Begin buffering the audio.
		await tts.buffer({
			model_id: "sonic-english",
			voice: {
				mode: "id",
				id: "a0e99841-438c-4a64-b679-ae501e7d6091",
			},
			transcript: text,
		});

		// Immediately play the audio. (You can also buffer in advance and play later.)
		await tts.play();
	}

	return (
		<div>
			<input type="text" value={text} onChange={(event) => setText(event.target.value)} />
			<button onClick={handlePlay}>Play</button>

			<div>
				{tts.playbackStatus} | {tts.bufferStatus} | {tts.isWaiting}
			</div>
		</div>
	);
}