boundless-summer-54220
02/16/2025, 6:15 PM

gray-monkey-83525
02/16/2025, 6:15 PM
You can use the before_tts_cb callback to process the text before it is sent to the TTS module. According to the VoicePipelineAgent documentation, the before_tts_cb callback lets you modify the text before it is sent to the TTS model, which is useful for customizing pronunciation or altering the text content.
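For instance, a pronunciation tweak of the kind such a callback might perform can be sketched as plain string substitution (the substitution table below is a made-up example, not part of any LiveKit API):

```python
# Minimal sketch of a text transformation a before_tts_cb-style hook might apply.
# The substitution table is hypothetical, purely for illustration.
PRONUNCIATIONS = {
    "LiveKit": "Live Kit",  # help the TTS voice split the brand name
    "API": "A P I",         # spell out the acronym
}

def adjust_pronunciation(text: str) -> str:
    # Apply each substitution in turn before the text reaches the TTS model
    for written, spoken in PRONUNCIATIONS.items():
        text = text.replace(written, spoken)
    return text

print(adjust_pronunciation("The LiveKit API is easy to use."))
# → The Live Kit A P I is easy to use.
```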
To send the text as a single chunk to the LiveKit frontend, you might consider using the data channel to send the entire message. As per the LiveKit data messages documentation, you can publish arbitrary data messages to other participants in the same room using LocalParticipant.publishData, or RoomService.SendData in the server API. This lets you send the entire text as a single message, which can then be formatted as Markdown on the frontend.
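Because data messages are delivered as raw bytes, a common pattern (an assumption here, not something LiveKit requires) is to wrap the text in a small JSON envelope so the frontend can distinguish message types. The envelope fields below are illustrative:

```python
import json

def encode_text_message(text: str) -> bytes:
    # Build the byte payload you would hand to publish_data
    return json.dumps({"type": "chat", "text": text}).encode("utf-8")

def decode_text_message(payload: bytes) -> str:
    # Mirror of the frontend's TextDecoder + JSON.parse step
    return json.loads(payload.decode("utf-8"))["text"]

payload = encode_text_message("**Hello** from the agent")
print(decode_text_message(payload))
# → **Hello** from the agent
```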
Would you like more detailed guidance on implementing this approach?
Sources:
- VoicePipelineAgent documentation
- LiveKit data messages documentation

boundless-summer-54220
02/16/2025, 6:20 PM

gray-monkey-83525
02/16/2025, 6:20 PM

boundless-summer-54220
02/16/2025, 6:21 PM

gray-monkey-83525
02/16/2025, 6:21 PM
Building on the minimal_assistant example, you can follow these steps:
1. Modify the before_tts_cb callback: In the minimal_assistant.py example, modify before_tts_cb to filter out specific parts of the response (e.g., Markdown) and prepare the text to be sent as a single chunk.
2. Use the data channel: Use LiveKit's data channel to send the filtered text as a single message to the frontend. This can be done with the publish_data method.
Here's a code snippet to illustrate these steps:
import json
from livekit.agents import AutoSubscribe, JobContext, WorkerOptions, cli, llm
from livekit.agents.pipeline import VoicePipelineAgent
from livekit.plugins import deepgram, openai, silero

async def before_tts_cb(assistant, text):
    # Filter out Markdown or other unwanted parts
    if isinstance(text, str):
        filtered_text = text.replace("**", "")  # Example: remove Markdown bold
    else:
        # text is an async stream of chunks; collect it into one string
        async def process_stream():
            full_text = ""
            async for chunk in text:
                full_text += chunk.replace("**", "")
            return full_text
        filtered_text = await process_stream()
    # Send the filtered text as a single chunk to the frontend
    await assistant.ctx.room.local_participant.publish_data(
        payload=json.dumps({"text": filtered_text}),
        reliable=True,
    )
    return filtered_text

async def entrypoint(ctx: JobContext):
    initial_ctx = llm.ChatContext().append(role="system", text="You are a voice assistant.")
    await ctx.connect(auto_subscribe=AutoSubscribe.AUDIO_ONLY)
    participant = await ctx.wait_for_participant()
    agent = VoicePipelineAgent(
        vad=silero.VAD.load(),
        stt=deepgram.STT(model="nova-3-general"),
        llm=openai.LLM(),
        tts=openai.TTS(),
        chat_ctx=initial_ctx,
        before_tts_cb=before_tts_cb,
    )
    agent.start(ctx.room, participant)

# Run the agent
if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))
Frontend Handling: On the frontend, using the agents-playground, you can listen for data messages and format them as needed. Here's a basic example in JavaScript:
room.on(RoomEvent.DataReceived, (payload, participant, kind) => {
  const data = JSON.parse(new TextDecoder().decode(payload));
  if (data.text) {
    // Format and display the text as Markdown
    displayMarkdown(data.text);
  }
});

function displayMarkdown(text) {
  // Implement your Markdown rendering logic here
  console.log("Received text:", text);
}
gray-monkey-83525
02/16/2025, 6:22 PM

boundless-summer-54220
02/16/2025, 6:29 PM
assistant
AttributeError: 'VoicePipelineAgent' object has no attribute 'ctx'

gray-monkey-83525
02/16/2025, 6:29 PM
To resolve the AttributeError: 'VoicePipelineAgent' object has no attribute 'ctx', you should use the ctx from the JobContext directly instead of trying to access it from the VoicePipelineAgent. Here's how you can modify the code:
1. Access the ctx from the JobContext: Use the ctx object directly to publish data to the room.
Here's the corrected code snippet:
import json
from livekit.agents import AutoSubscribe, JobContext, WorkerOptions, cli, llm
from livekit.agents.pipeline import VoicePipelineAgent
from livekit.plugins import deepgram, openai, silero

async def before_tts_cb(assistant, text, ctx):
    # Filter out Markdown or other unwanted parts
    if isinstance(text, str):
        filtered_text = text.replace("**", "")  # Example: remove Markdown bold
    else:
        # text is an async stream of chunks; collect it into one string
        async def process_stream():
            full_text = ""
            async for chunk in text:
                full_text += chunk.replace("**", "")
            return full_text
        filtered_text = await process_stream()
    # Send the filtered text as a single chunk to the frontend,
    # reaching the room through the JobContext rather than the agent
    await ctx.room.local_participant.publish_data(
        payload=json.dumps({"text": filtered_text}),
        reliable=True,
    )
    return filtered_text

async def entrypoint(ctx: JobContext):
    initial_ctx = llm.ChatContext().append(role="system", text="You are a voice assistant.")
    await ctx.connect(auto_subscribe=AutoSubscribe.AUDIO_ONLY)
    participant = await ctx.wait_for_participant()
    agent = VoicePipelineAgent(
        vad=silero.VAD.load(),
        stt=deepgram.STT(model="nova-3-general"),
        llm=openai.LLM(),
        tts=openai.TTS(),
        chat_ctx=initial_ctx,
        # Bind the JobContext into the callback via a closure
        before_tts_cb=lambda assistant, text: before_tts_cb(assistant, text, ctx),
    )
    agent.start(ctx.room, participant)

# Run the agent
if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))
In this corrected version, the before_tts_cb function accepts a ctx parameter, which is then used to publish data to the room. A lambda is used to pass ctx to the before_tts_cb callback.
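The binding trick itself is ordinary Python: the lambda closes over ctx and forwards it as an extra argument while keeping the two-argument signature the agent expects. A self-contained sketch, with a dummy object standing in for the real JobContext:

```python
import asyncio

class DummyCtx:
    """Stand-in for JobContext, just to demonstrate the binding."""
    room_name = "demo-room"

async def callback_with_ctx(assistant, text, ctx):
    # The real callback would publish via ctx.room; here we just tag the text
    return f"[{ctx.room_name}] {text}"

ctx = DummyCtx()
# Two-argument callable with ctx bound via closure, as in the snippet above
bound_cb = lambda assistant, text: callback_with_ctx(assistant, text, ctx)

print(asyncio.run(bound_cb(None, "hello")))
# → [demo-room] hello
```

Because callback_with_ctx is async, the lambda returns a coroutine; the framework (or asyncio.run here) awaits it.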
Would you like further assistance or clarification on this implementation?
Sources:
- https://livekit-users.slack.com/archives/C07FY8WHGPM/p1727139659.431669
- https://docs.livekit.io/python/livekit/agents/voice_assistant/index.html#args

boundless-summer-54220
02/16/2025, 6:39 PM

gray-monkey-83525
02/16/2025, 6:39 PM
VoicePipelineAgent
in the callback. If this doesn't help, please reach out to the LiveKit support team for further assistance.