
@callmyai/ai

v1.1.0


AI-Voice

Description

Inspired by Vercel's Language Model Specification, this is a proposal for introducing a Speech Model Specification to streamline the integration of various speech providers into our platform. This specification aims to provide a standardized interface for interacting with different speech models, eliminating the complexity of dealing with unique APIs and reducing the risk of vendor lock-in.

Problem Statement

Currently, there are numerous speech providers available, each with its own distinct method for interfacing with their models. This lack of standardization complicates the process of switching providers and increases the likelihood of vendor lock-in. Developers are required to learn and implement different APIs for each provider, leading to increased development time and maintenance overhead.

The open-source community has created the following providers:

  • OpenAI Provider (@bishwenduk029-ai-voice/openai)

  • Elevenlabs Provider (@bishwenduk029-ai-voice/elevenlabs)

  • Deepgram Provider (@bishwenduk029-ai-voice/deepgram)

  • PlayHt Provider (@bishwenduk029-ai-voice/playht)

Features

  1. React Hook useVoiceChat for easy transcription and streaming voice chat on the client side.
  2. Consistent API for streaming speech responses from various AI speech providers.
  3. Works well with streamText from the Vercel AI SDK.

Installation

pnpm install @callmyai/ai


Usage

Client Side React Hook

🎙️ Real-Time Speech Transcription React Hook

Seamlessly integrate real-time speech transcription into your React applications with this powerful and efficient hook! 🚀

✨ Hook Capabilities

  • 🎤 Detect human speech end or silence using the robust @ricky0123/vad-react library
  • ⏱️ Intelligently debounce speech input, ensuring continuous recording and transcription as long as the user speaks within a configurable time frame (e.g., 500ms)
  • 🗣️ Gracefully handle speech interruptions, allowing users to pause and resume speaking naturally
  • 🌐 Efficiently trigger REST calls for transcription, optimizing performance by waiting for the user to pause before sending requests
  • 🔌 Easy to integrate into your existing React projects, with a simple and intuitive API
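The speaker-pause debounce described above can be sketched as a small timer helper. This is an illustrative sketch of the pattern only, not the hook's actual internals; `createSpeechDebouncer` is a hypothetical name, and `pauseMs` corresponds to the hook's `speakerPause` option.

```typescript
// Illustrative sketch (not the hook's internals): every detected speech
// event resets a timer, so the onPause callback fires only after the
// speaker has stayed silent for pauseMs milliseconds.
function createSpeechDebouncer(pauseMs: number, onPause: () => void) {
  let timer: ReturnType<typeof setTimeout> | null = null
  return {
    // Call whenever speech activity is detected (e.g. from a VAD event).
    onSpeech() {
      if (timer !== null) clearTimeout(timer)
      timer = setTimeout(onPause, pauseMs)
    },
    // Stop any pending pause callback, e.g. on unmount.
    cancel() {
      if (timer !== null) clearTimeout(timer)
      timer = null
    }
  }
}
```

With `speakerPause: 500`, the effect is that a transcription request is only sent once the user has been silent for 500 ms, rather than on every audio chunk.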
...
import { useVoiceChat } from '@callmyai/ai/ui'
...

export default function Chat({ id, initialMessages, className }: ChatProps) {
  const { speaking, listening, thinking, initialized, messages, setMessages } =
    useVoiceChat({
      api: '/api/chat/voice',
      initialMessages,
      transcribeAPI: '/api/transcribe',
      body: {
        id
      },
      speakerPause: 500,
      onSpeechCompletion: async () => {
        if (id) {
          const chat = await getChat(id)
          setMessages(chat?.messages || [])
        }
      }
    })

  return (
    <>
      <div className={cn('pb-[200px] pt-4 md:pt-10', className)}>
        {messages.length ? (
          <>
            <ChatList messages={messages} />
            <ChatScrollAnchor trackVisibility={initialized} />
          </>
        ) : (
          <EmptyScreen />
        )}
      </div>
      <ChatPanel
        id={id}
        initialized={initialized}
        speaking={speaking}
        listening={listening}
        thinking={thinking}
        messages={messages}
      />
    </>
  )
}

Hook up Server APIs

/api/chat/voice

import 'server-only'
...
import { streamText } from 'ai'
import { ollama, createOllama } from 'ollama-ai-provider'
import { openaiSpeech, playhtSpeech, streamSpeech } from '@callmyai/ai/server'
...

export const runtime = 'edge'

const model = ollama('llama3:latest')

export async function POST(req: Request) {
  const cookieStore = cookies()
  const supabase = createRouteHandlerClient<Database>({
    cookies: () => cookieStore
  })
  const json = await req.json()
  const { messages } = json
  // const userId = (await auth({ cookieStore }))?.user.id

  // if (!userId) {
  //   return new Response('Unauthorized', {
  //     status: 401
  //   })
  // }

  const systemPrompt = `
  Your role is to act as a friendly human assistant, addressing the user by their preferred name. Your given name is Nova.
  `

  const result = await streamText({
    model,
    messages: [
      {
        role: 'system',
        content: systemPrompt
      },
      ...messages.map((message: { role: any; content: any }) => ({
        role: message.role,
        content: message.content
      }))
    ]
  })

  // OpenAI - env:OPENAI_API_KEY
  const speechModel = openaiSpeech(
    'tts-1',   //openai_speech_model
    'nova'     //openai_voice_id
  )

  // ElevenLabsIO - env:ELEVENLABS_API_KEY
  // const speechModel = elevenlabsSpeech(
  //   'eleven_turbo_v2',   //elevenlabs_speech_model
  //   'DIBkDE5u33APYlfhjihh' //elevenlabs_voice_id
  // )

  // PlayHt - env:PLAYHT_API_KEY
  // const speechModel = playhtSpeech(
  //   'PlayHT2.0-turbo', //playht_speech_model
  //   '<your-playht-user-id>',
  //   "s3://voice-cloning-zero-shot/1afba232-fae0-4b69-9675-7f1aac69349f/delilahsaad/manifest.json"   //playht_voice_id
  // )

  // Deepgram - env:DEEPGRAM_API_KEY
  // const speechModel = deepgramSpeech("aura-asteria-en")

  try {
    const speech = await streamSpeech(speechModel)(result.textStream)
    return new Response(speech, {
      headers: { 'Content-Type': 'audio/mpeg' }
    })
  } catch (error) {
    console.error(error)
    return new Response(null, {
      status: 500
    })
  }
}

Limitations

Currently, speech streaming only works for English text streams. Multilingual support is planned for the future.

Roadmap

  • [x] Speech Model Specification
  • [ ] Improve the sentence-boundary detection algorithm used for text stream to sentence stream conversion
  • [ ] Add more tests
  • [ ] Enhance the specification to also cover WebSocket-based speech providers
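The text-stream-to-sentence-stream conversion mentioned in the roadmap can be sketched as an async generator that buffers incoming chunks and emits a sentence whenever terminal punctuation appears. This is a hypothetical, deliberately naive sketch (it will mis-split on abbreviations such as "e.g."), not the package's actual implementation:

```typescript
// Naive sketch: buffer incoming text chunks and yield a sentence each
// time a '.', '!' or '?' is followed by whitespace or end of buffer.
async function* sentenceStream(
  chunks: AsyncIterable<string>
): AsyncGenerator<string> {
  let buffer = ''
  for await (const chunk of chunks) {
    buffer += chunk
    // Flush every complete sentence currently in the buffer.
    let match: RegExpMatchArray | null
    while ((match = buffer.match(/^[\s\S]*?[.!?](?=\s|$)/)) !== null) {
      yield match[0].trim()
      buffer = buffer.slice(match[0].length)
    }
  }
  // Flush any trailing text that never received terminal punctuation.
  if (buffer.trim()) yield buffer.trim()
}
```

Feeding the result of each LLM chunk through a converter like this is what lets a speech provider start synthesizing audio per sentence instead of waiting for the full response.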