@koi-rtc/speech-sdk

v1.0.3

A unified speech service SDK that supports the ASR and TTS services of multiple cloud providers.

Downloads

224

Speech SDK

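A unified speech service SDK that supports automatic speech recognition (ASR) and text-to-speech (TTS) across several major cloud providers.

Features

  • Supports several major cloud providers
    • Microsoft Azure Speech service
    • Tencent Cloud speech service
    • Alibaba Cloud Intelligent Speech service
    • Google Cloud speech service
  • Unified API
  • Supports speech-to-text (ASR) and text-to-speech (TTS)
  • Supports real-time speech recognition
  • Built-in caching
  • TypeScript support

Installation

npm install @koi-rtc/speech-sdk
# or
yarn add @koi-rtc/speech-sdk

Quick Start

Basic usage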

const { SpeechFactory } = require('@koi-rtc/speech-sdk');

async function example() {
  // Initialize the speech service factory
  const factory = new SpeechFactory({
    key: 'your-api-key',
    region: 'your-region'
  });

  // Choose a service provider (microsoft/tencent/aliyun/google)
  const provider = factory.initialize('microsoft');

  // Speech to text
  const text = await provider.speechToText('audio.wav');
  console.log('Recognition result:', text);

  // Text to speech
  const audioBuffer = await provider.textToSpeech('你好,世界!', {
    voice: 'zh-CN-XiaoxiaoNeural',
    speed: 1.0
  });
}

example().catch(console.error);
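
The basic example leaves `audioBuffer` unused. As a minimal sketch, assuming `textToSpeech` resolves to a Node.js `Buffer` (the README does not state the return type), the audio could be written to disk as follows. `saveAudio` is a hypothetical helper and not part of the SDK; the stand-in bytes only let the sketch run without credentials.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper (not part of the SDK): write synthesized audio to disk.
// Assumes `audio` is a Buffer or Uint8Array; anything else is rejected.
function saveAudio(audio, filePath) {
  if (!Buffer.isBuffer(audio) && !(audio instanceof Uint8Array)) {
    throw new TypeError('expected audio to be a Buffer or Uint8Array');
  }
  fs.writeFileSync(filePath, audio);
  return fs.statSync(filePath).size; // size of the written file in bytes
}

// Stand-in bytes; in real use this would be the value resolved by
// provider.textToSpeech(...), assuming it is a Buffer.
const fakeAudio = Buffer.from([0x52, 0x49, 0x46, 0x46]); // ASCII "RIFF"
const outFile = path.join(os.tmpdir(), 'speech-sdk-demo.wav');
const bytesWritten = saveAudio(fakeAudio, outFile);
console.log(`wrote ${bytesWritten} bytes to ${outFile}`);
```

The type check is there because a provider might return something other than raw bytes (for example a stream or a URL), in which case this helper would need to be adapted.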

Browser-side speech services

const { TTSService, ASRService } = require('@koi-rtc/speech-sdk');

// Initialize the speech synthesis service
const tts = new TTSService({
  language: 'zh-CN',
  pitch: 1,
  rate: 1,
  volume: 1
});

// Speak the text aloud
tts.speak('你好,世界!');

// Initialize the speech recognition service
const asr = new ASRService({
  language: 'zh-CN',
  continuous: true,
  interimResults: true
});

// Listen for recognition results
asr.on('result', (result) => {
  console.log('Recognition result:', result.transcript);
  console.log('Is final result:', result.isFinal);
});

// Start recognition
asr.start();

Configuration

General configuration

{
  // Default speech service provider
  defaultProvider: 'microsoft',

  // Common settings
  common: {
    language: 'zh-CN',
    sampleRate: 16000,
    encoding: 'LINEAR16',
  },

  // Cache settings
  cache: {
    storage: 'memory', // or 'redis'
    ttl: 3600, // cache expiry time (seconds)
    redis: {
      host: 'localhost',
      port: 6379,
    }
  }
}
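
The `cache.ttl` setting is an expiry time in seconds. The README does not describe how the SDK's cache works internally, so the following is only an illustrative in-memory sketch of what TTL-based caching means. `TtlCache`, its methods, and the cache key format are hypothetical names, not SDK APIs.

```javascript
// Hypothetical in-memory cache with a time-to-live, for illustration only.
class TtlCache {
  constructor(ttlSeconds, now = () => Date.now()) {
    this.ttlMs = ttlSeconds * 1000;
    this.now = now; // injectable clock, so expiry can be tested without waiting
    this.entries = new Map();
  }

  set(key, value) {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }

  get(key) {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key); // expired entries are evicted lazily on read
      return undefined;
    }
    return entry.value;
  }
}

// Usage with a fake clock: the entry is served until 3600 seconds have passed.
let fakeNow = 0;
const cache = new TtlCache(3600, () => fakeNow);
cache.set('tts:zh-CN:hello', 'audio-bytes');
const beforeExpiry = cache.get('tts:zh-CN:hello');
fakeNow = 3600 * 1000;
const afterExpiry = cache.get('tts:zh-CN:hello');
console.log(beforeExpiry, afterExpiry);
```

A cache like this would presumably avoid repeated requests for identical input; whether the SDK caches ASR results, TTS results, or both is not stated in the README.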

Provider-specific configuration

Microsoft Azure

{
  key: 'your-subscription-key',
  region: 'eastasia',
  // Optional settings
  language: 'zh-CN',
  voice: 'zh-CN-XiaoxiaoNeural'
}

Tencent Cloud

{
  secretId: 'your-secret-id',
  secretKey: 'your-secret-key',
  region: 'ap-guangzhou'
}

Alibaba Cloud

{
  accessKeyId: 'your-access-key-id',
  accessKeySecret: 'your-access-key-secret',
  endpoint: 'http://nls-meta.cn-shanghai.aliyuncs.com'
}

Google Cloud

{
  googleCredentials: 'path/to/credentials.json',
  languageCode: 'zh-CN'
}
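
Since each provider expects different credential fields, it can help to check a config object before handing it to the SDK. The sketch below uses the field names listed above. Treating all of them as required is an assumption, except Azure's `language` and `voice`, which the README marks as optional. `REQUIRED_FIELDS` and `missingFields` are hypothetical names, not part of the SDK.

```javascript
// Field names come from the provider configs above; which ones are strictly
// required is an assumption made for this sketch.
const REQUIRED_FIELDS = {
  microsoft: ['key', 'region'],
  tencent: ['secretId', 'secretKey', 'region'],
  aliyun: ['accessKeyId', 'accessKeySecret', 'endpoint'],
  google: ['googleCredentials', 'languageCode'],
};

// Hypothetical helper: return the names of required fields that are absent
// or empty in `config` for the given provider.
function missingFields(providerName, config) {
  const required = REQUIRED_FIELDS[providerName];
  if (!required) {
    throw new Error(`unknown provider: ${providerName}`);
  }
  return required.filter(
    (field) => config[field] === undefined || config[field] === ''
  );
}

const missing = missingFields('tencent', {
  secretId: 'your-secret-id',
  region: 'ap-guangzhou',
});
console.log(missing); // [ 'secretKey' ]
```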

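API Reference

SpeechFactory

Factory class for creating and managing instances of the different speech service providers.

  • constructor(config): creates a factory instance
  • initialize(providerName): initializes the specified service provider
  • getProvider(): returns the current service provider instance

BaseProvider

Base class for all service providers; it defines the unified interface.

  • speechToText(audioFile): converts an audio file to text
  • textToSpeech(text, options): converts text to speech
  • startRealtimeSTT(options): starts real-time speech recognition
  • stopRealtimeSTT(): stops real-time speech recognition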
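
Because every provider exposes the same four methods, application code can be written against that shape and exercised with a stand-in object during tests. The following is a minimal sketch: `FakeProvider` and `roundTrip` are hypothetical, `FakeProvider` does not extend the SDK's `BaseProvider`, and the return types (a string from `speechToText`, a `Buffer` from `textToSpeech`) are assumptions.

```javascript
// Hypothetical stand-in with the same method names as the provider interface.
class FakeProvider {
  constructor() {
    this.realtimeActive = false;
  }

  async speechToText(audioFile) {
    return `transcript of ${audioFile}`; // canned result instead of a real API call
  }

  async textToSpeech(text, options = {}) {
    return Buffer.from(text, 'utf8'); // placeholder bytes, not real audio
  }

  startRealtimeSTT(options = {}) {
    this.realtimeActive = true;
  }

  stopRealtimeSTT() {
    this.realtimeActive = false;
  }
}

// Application code that depends only on the shared method names, so it would
// work the same way with any object of this shape.
async function roundTrip(provider, audioFile) {
  const text = await provider.speechToText(audioFile);
  const audio = await provider.textToSpeech(text, { speed: 1.0 });
  return { text, audioBytes: audio.length };
}

roundTrip(new FakeProvider(), 'audio.wav').then((result) => {
  console.log(result.text);
});
```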

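TTSService

Browser-side speech synthesis service.

  • speak(text): speaks the given text
  • pause(): pauses playback
  • resume(): resumes playback
  • stop(): stops playback
  • on(event, callback): registers an event listener

ASRService

Browser-side speech recognition service.

  • start(): starts recording and recognition
  • stop(): stops recording and recognition
  • on(event, callback): registers an event listener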
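
With `interimResults` enabled, the `result` event shown in the quick start fires for partial as well as final transcripts. A minimal sketch of keeping only the final segments follows. It uses Node's `EventEmitter` as a stand-in for `ASRService` so it can run outside a browser, and `collectFinalTranscripts` is a hypothetical helper; the `transcript` and `isFinal` fields are the ones used in the quick-start example.

```javascript
const { EventEmitter } = require('events');

// Hypothetical helper: subscribe to 'result' events and keep only the
// transcripts flagged as final, ignoring interim results.
function collectFinalTranscripts(asr) {
  const finals = [];
  asr.on('result', (result) => {
    if (result.isFinal) {
      finals.push(result.transcript);
    }
  });
  return finals;
}

// Stand-in emitter; in a browser this would be an ASRService instance,
// assuming its on() method behaves like a regular event subscription.
const fakeAsr = new EventEmitter();
const finals = collectFinalTranscripts(fakeAsr);

fakeAsr.emit('result', { transcript: 'hel', isFinal: false });
fakeAsr.emit('result', { transcript: 'hello', isFinal: true });
fakeAsr.emit('result', { transcript: 'world', isFinal: true });

console.log(finals.join(' ')); // hello world
```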