# ask-ai
c
Using the Node library, I want to use the TTS plugin (OpenAI or ElevenLabs) in a custom agent, but I'm having trouble figuring out how to use it. I have the string that I would like the TTS to say, but how do I take what the plugin outputs and stream it back to the non-agent participant?
g
To use the TTS plugin in a custom agent and stream the output back to a non-agent participant, you can follow these steps:

1. Initialize the TTS plugin: Use the TTS plugin from either OpenAI or ElevenLabs. For OpenAI, create an instance of the `TTS` class and set the necessary options like the API key and model.
2. Stream text to TTS: Use the `stream()` method of the TTS instance to push your text input and receive audio data. This method returns a `SynthesizeStream` that you can iterate over to get the audio frames.
3. Publish audio to the participant: Use `rtc.LocalAudioTrack` to create an audio track and publish it to the room (a sketch of the publish call follows the example below). You can capture frames from the TTS stream and send them to the participant.

Here's a basic example using OpenAI TTS:
```typescript
import { TTS } from '@livekit/agents-plugin-openai';
import { LocalAudioTrack, AudioSource } from '@livekit/rtc-node';

const tts = new TTS({ apiKey: 'YOUR_OPENAI_API_KEY', model: 'tts-1' });
const ttsStream = tts.stream();

async function sendAudio(text: string) {
  // Sample rate and channel count should match the audio the TTS produces
  const audioSource = new AudioSource(44100, 1);
  const track = LocalAudioTrack.createAudioTrack('agent-audio', audioSource);

  // Push text to the TTS stream and close the input
  ttsStream.push(text);
  ttsStream.endInput();

  // Capture the synthesized audio and feed it to the audio source
  for await (const audioFrame of ttsStream) {
    await audioSource.captureFrame(audioFrame);
  }
}

// Example usage
sendAudio("Hello, participant!");
```
Would you like more details on any specific part of this process?

Sources:
- LiveKit TTS Plugin Documentation
c
For the line `await audioSource.captureFrame(audioFrame);`, it's giving me the following error on `audioFrame`: "Argument of type 'SynthesizedAudio | unique symbol' is not assignable to parameter of type 'AudioFrame'. Type 'SynthesizedAudio' is missing the following properties from type 'AudioFrame': data, sampleRate, channels, samplesPerChannel, protoInfo" @millions-hair-24956
g
It seems likely that the `audioFrame` object you're trying to pass to `captureFrame` is not in the expected `AudioFrame` format. The `SynthesizedAudio` type from the TTS stream needs to be converted to an `AudioFrame` before it can be used with `captureFrame`. Here's how you can convert `SynthesizedAudio` to `AudioFrame`:

1. Extract audio data: Pull the raw audio out of the `SynthesizedAudio` object (its `frame` property).
2. Create an AudioFrame: Construct an `AudioFrame` using the extracted data.

Here's an example of how you might do this:
```typescript
import { AudioFrame } from '@livekit/rtc-node';

async function sendAudio(text: string) {
  // Should match the sample rate and channel count of the TTS output
  const audioSource = new AudioSource(44100, 1);
  const track = LocalAudioTrack.createAudioTrack('agent-audio', audioSource);

  ttsStream.push(text);
  ttsStream.endInput();

  for await (const synthesizedAudio of ttsStream) {
    // The stream can also yield an end-of-stream sentinel (the `unique symbol`
    // in the error); skip anything that isn't a SynthesizedAudio object.
    if (typeof synthesizedAudio === 'symbol') continue;

    // Build an AudioFrame from the synthesized audio's `frame` payload
    const audioFrame = new AudioFrame(
      synthesizedAudio.frame.data,              // raw PCM samples
      synthesizedAudio.frame.sampleRate,        // sample rate of the synthesized audio
      synthesizedAudio.frame.channels,          // number of channels
      synthesizedAudio.frame.samplesPerChannel, // samples per channel
    );

    await audioSource.captureFrame(audioFrame);
  }
}
```
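Note that the `unique symbol` part of the error appears to come from the stream's async iterator itself: besides `SynthesizedAudio` chunks, it can yield a sentinel value marking the end of the stream, which is why the loop above skips symbol values before building the frame. Also, if `synthesizedAudio.frame` is already an `AudioFrame` in your version of the plugin, you can pass it to `captureFrame` directly instead of rebuilding it.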
Would you like more details on any specific part of this process?

Sources:
- LiveKit SynthesizedAudio Documentation