miniature-nail-35141
02/25/2025, 5:58 AM

gray-monkey-83525
02/25/2025, 5:58 AM
Use setMicrophoneEnabled or setCameraEnabled to enable tracks, as they handle common cases and permissions automatically (see Handling common track types).
Would you like more detailed guidance on any of these steps?
Sources:
- TrackPublishException class
- Handling common track types

miniature-nail-35141
02/25/2025, 6:03 AM

gray-monkey-83525
02/25/2025, 6:03 AM

miniature-nail-35141
02/25/2025, 6:06 AM

gray-monkey-83525
02/25/2025, 6:06 AM
2. VAD Configuration: The threshold, prefix_padding_ms, and silence_duration_ms parameters of the RealtimeModel turn detection should be configured to accurately detect when the participant is speaking. Adjust these settings if necessary to improve detection.
3. Event Handling: Make sure that the agent is set up to handle events such as user_started_speaking and user_stopped_speaking. These events can help manage the flow of conversation and ensure that the agent responds appropriately.
4. Debugging Logs: Enable detailed logging to capture more information about the interaction (see the sketch after this list). This can help identify where the process might be failing. Look for logs related to track subscription, VAD events, and any errors that might occur during the session.
5. Session Management: Ensure that the session is being managed correctly, and that the agent is not prematurely ending the session or missing audio input due to session mismanagement.
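For point 4, here is a minimal sketch of enabling detailed logging. It uses only the standard library logging module; the "livekit" and "livekit.agents" logger names are an assumption about how the SDK names its loggers:

import logging

# Assumption: the LiveKit SDK and agents framework log under these names.
logging.basicConfig(level=logging.DEBUG)
logging.getLogger("livekit").setLevel(logging.DEBUG)
logging.getLogger("livekit.agents").setLevel(logging.DEBUG)

With DEBUG enabled, look for track-subscription and VAD-related lines to narrow down where the pipeline stops.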
If you need more detailed guidance on any of these points, please let me know.
Sources:
- https://docs.livekit.io/python/livekit/agents/pipeline/human_input.html#livekit.agents.pipeline.human_input.HumanInput
- https://docs.livekit.io/client-sdk-js/#md%3Aaudio-playback

miniature-nail-35141
02/25/2025, 6:07 AM

gray-monkey-83525
02/25/2025, 6:07 AM
from __future__ import annotations
import logging

from dotenv import load_dotenv
from livekit import rtc
from livekit.agents import (
    AutoSubscribe,
    JobContext,
    WorkerOptions,
    cli,
    llm,
)
from livekit.agents.multimodal import MultimodalAgent
from livekit.plugins import openai

load_dotenv(dotenv_path=".env.example")
logger = logging.getLogger("my-worker")
logger.setLevel(logging.INFO)


async def entrypoint(ctx: JobContext):
    logger.info(f"connecting to room {ctx.room.name}")
    await ctx.connect(auto_subscribe=AutoSubscribe.AUDIO_ONLY)

    # Wait for a participant to join before starting the agent.
    participant = await ctx.wait_for_participant()
    run_multimodal_agent(ctx, participant)
    logger.info("agent started")


def run_multimodal_agent(ctx: JobContext, participant: rtc.RemoteParticipant):
    logger.info("starting multimodal agent")
    model = openai.realtime.RealtimeModel(
        instructions=(
            "You are a voice assistant created by LiveKit. Your interface with users will be voice. "
            "You should use short and concise responses, and avoid usage of unpronounceable punctuation. "
            "You were created as a demo to showcase the capabilities of LiveKit's agents framework."
        ),
        voice="shimmer",
        turn_detection=openai.realtime.ServerVadOptions(
            threshold=0.8, prefix_padding_ms=300, silence_duration_ms=200
        ),
        temperature=0.7,
        modalities=["audio", "text"],
    )

    chat_ctx = llm.ChatContext()
    chat_ctx.append(
        text="Context about the user: you are talking to a software engineer who's building voice AI applications. "
        "Greet the user with a friendly greeting and ask how you can help them today.",
        role="assistant",
    )

    agent = MultimodalAgent(
        model=model,
        chat_ctx=chat_ctx,
    )
    agent.start(ctx.room, participant)

    # Event handlers must be plain synchronous callbacks. The agent generates
    # responses on its own once it detects end of speech, so no manual
    # process_speech-style call is needed here.
    @agent.on("user_started_speaking")
    def on_user_started_speaking():
        logger.info("User started speaking")

    @agent.on("user_stopped_speaking")
    def on_user_stopped_speaking():
        logger.info("User stopped speaking")


if __name__ == "__main__":
    cli.run_app(
        WorkerOptions(
            entrypoint_fnc=entrypoint,
        )
    )
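To try this locally, save the file and launch the worker through the agents CLI, e.g. python my_worker.py dev (the filename my_worker.py is an assumption; cli.run_app provides the dev and start subcommands).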
gray-monkey-83525
02/25/2025, 6:08 AM
- Turn Detection: The turn_detection settings are crucial for detecting when the user starts and stops speaking. Adjust these parameters to suit your environment.
- Event Handling: The handlers for the user_started_speaking and user_stopped_speaking events are used to manage the conversation flow.
- Logging: Use logging to monitor the agent's activity and troubleshoot any issues.
This setup should help the agent listen and respond to user input effectively. If you need further assistance, feel free to ask!
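To make the tuning advice concrete, here is a hedged sketch of more conservative VAD settings for a noisy environment. The ServerVadOptions fields mirror the code above; the specific values are illustrative assumptions, not recommendations:

from livekit.plugins import openai

# Illustrative values only: raise the speech-probability threshold and
# lengthen the end-of-turn silence window to tolerate background noise.
noisy_room_vad = openai.realtime.ServerVadOptions(
    threshold=0.9,            # require a stronger speech signal before triggering
    prefix_padding_ms=500,    # keep more audio from just before detected speech
    silence_duration_ms=700,  # wait longer before treating silence as end of turn
)

Pass this as turn_detection=noisy_room_vad when constructing the RealtimeModel.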
Sources:
- https://docs.livekit.io/python/livekit/agents/pipeline/pipeline_agent.html#args
- https://docs.livekit.io/agents/openai/multimodal-agent/#radix-%3AR7tkvffep9tkq%3A-content-Python