# ask-ai
c
Using the Node library, I want to use the TTS plugin (OpenAI or ElevenLabs) in a custom agent, but I'm having trouble figuring out how to use it. I have the string that I would like the TTS to say, but how do I take what the plugin outputs and stream it back to the non-agent participant?
g
To use the TTS plugin in a custom agent and stream the output back to a non-agent participant, you can follow these steps:

1. Initialize the TTS plugin: Use the TTS plugin from either OpenAI or ElevenLabs. For OpenAI, create an instance of the `TTS` class and set the necessary options like the API key and model.
2. Stream text to TTS: Use the `stream()` method of the TTS instance to push your text input and receive audio data. This method returns a `SynthesizeStream` that you can iterate over to get the audio frames.
3. Publish audio to the participant: Use `rtc.LocalAudioTrack` to create an audio track and publish it to the room (a sketch of the publish call follows the example below). You can capture frames from the TTS stream and send them to the participant.

Here's a basic example using OpenAI TTS:
```typescript
import { TTS } from '@livekit/agents-plugin-openai';
import { LocalAudioTrack, AudioSource } from '@livekit/rtc-node';

const tts = new TTS({ apiKey: 'YOUR_OPENAI_API_KEY', model: 'tts-1' });
const ttsStream = tts.stream();

async function sendAudio(text: string) {
  // Sample rate and channel count should match the audio the TTS produces
  const audioSource = new AudioSource(44100, 1);
  const track = LocalAudioTrack.createAudioTrack('agent-audio', audioSource);

  // Push text to the TTS stream and close the input
  ttsStream.push(text);
  ttsStream.endInput();

  // Capture the synthesized audio and feed it to the audio source
  for await (const audioFrame of ttsStream) {
    await audioSource.captureFrame(audioFrame);
  }
}

// Example usage
sendAudio("Hello, participant!");
```
Would you like more details on any specific part of this process?

Sources:
- LiveKit TTS Plugin Documentation
c
For the line `await audioSource.captureFrame(audioFrame);`, it's giving me the following error on `audioFrame`: "Argument of type 'SynthesizedAudio | unique symbol' is not assignable to parameter of type 'AudioFrame'. Type 'SynthesizedAudio' is missing the following properties from type 'AudioFrame': data, sampleRate, channels, samplesPerChannel, protoInfo" @millions-hair-24956
g
It seems likely that the `audioFrame` object you're trying to pass to `captureFrame` is not in the expected `AudioFrame` format. The `SynthesizedAudio` type from the TTS stream needs to be converted to an `AudioFrame` before it can be used with `captureFrame`. Here's how you can convert `SynthesizedAudio` to `AudioFrame`:

1. Extract audio data: Pull the raw audio out of the `SynthesizedAudio` object (its `frame` property).
2. Create an AudioFrame: Construct an `AudioFrame` using the extracted data.

Here's an example of how you might do this:
```typescript
import { AudioFrame } from '@livekit/rtc-node';

async function sendAudio(text: string) {
  // Should match the sample rate and channel count of the TTS output
  const audioSource = new AudioSource(44100, 1);
  const track = LocalAudioTrack.createAudioTrack('agent-audio', audioSource);

  ttsStream.push(text);
  ttsStream.endInput();

  for await (const synthesizedAudio of ttsStream) {
    // The stream can also yield an end-of-stream sentinel (the `unique symbol`
    // in the error); skip anything that isn't a SynthesizedAudio object.
    if (typeof synthesizedAudio === 'symbol') continue;

    // Build an AudioFrame from the synthesized audio's `frame` payload
    const audioFrame = new AudioFrame(
      synthesizedAudio.frame.data,              // raw PCM samples
      synthesizedAudio.frame.sampleRate,        // sample rate of the synthesized audio
      synthesizedAudio.frame.channels,          // number of channels
      synthesizedAudio.frame.samplesPerChannel, // samples per channel
    );

    await audioSource.captureFrame(audioFrame);
  }
}
```
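Note that the `unique symbol` part of the error appears to come from the stream's async iterator itself: besides `SynthesizedAudio` chunks, it can yield a sentinel value marking the end of the stream, which is why the loop above skips symbol values before building the frame. Also, if `synthesizedAudio.frame` is already an `AudioFrame` in your version of the plugin, you can pass it to `captureFrame` directly instead of rebuilding it.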
Would you like more details on any specific part of this process?

Sources:
- LiveKit SynthesizedAudio Documentation