refined-van-59846
07/26/2025, 3:04 PM
Agent class and how it interacts with AgentSession
tall-airline-16756
07/26/2025, 3:04 PM
07/26/2025, 3:05 PM
The `Agent` class and `AgentSession` work together to create a complete voice agent system in LiveKit, each with a distinct role:
Agent Class:
• The `Agent` class defines the behavior and capabilities of an agent. It is initialized with instructions and can optionally specify its own AI components (such as its own STT, LLM, or TTS) instead of relying on the session's. For example:
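```python
from livekit.agents import Agent

class MyAgent(Agent):
    def __init__(self):
        # The instructions act as the system prompt for this agent
        super().__init__(
            instructions="Your name is Kelly. You would interact with users via voice. Keep your responses concise and to the point."
        )
```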
• It provides lifecycle hooks such as `on_enter()`, which is called when the agent becomes active:
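```python
# Defined inside an Agent subclass; runs when this agent becomes active
async def on_enter(self):
    # Have the LLM produce a reply right away, so the agent speaks first
    self.session.generate_reply()
```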
• Agents can define function tools with the `@function_tool` decorator, which exposes a method that the LLM can call:
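```python
from livekit.agents import RunContext, function_tool

# Defined inside an Agent subclass such as MyAgent
@function_tool
async def lookup_weather(self, context: RunContext, location: str, latitude: str, longitude: str):
    """Called when the user asks for weather-related information."""
    # Placeholder result; a real tool would query a weather service here
    return "Sunny with a temperature of 70 degrees."
```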
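The following toy model in plain Python shows why the decorator and the docstring matter. A tool's name, docstring, and typed parameters are enough to describe it to an LLM. Everything here is hypothetical and written for illustration only: `tool`, `ToyAgent`, `WeatherAgent`, and the schema format are not LiveKit's API. LiveKit's real `@function_tool` builds its own, richer schema.

```python
import inspect
from typing import Any, Callable

# Hypothetical stand-in for a tool decorator: it only tags the function.
def tool(fn: Callable) -> Callable:
    fn._is_tool = True
    return fn

class ToyAgent:
    """Simplified model of an agent: instructions plus tagged tool methods."""

    def __init__(self, instructions: str) -> None:
        self.instructions = instructions

    def tool_schemas(self) -> list[dict[str, Any]]:
        """Describe each tagged method so an LLM could be told how to call it."""
        schemas = []
        for name, method in inspect.getmembers(self, predicate=inspect.ismethod):
            if not getattr(method, "_is_tool", False):
                continue
            params = {
                p.name: p.annotation.__name__
                for p in inspect.signature(method).parameters.values()
                if p.name != "context"  # runtime context is injected, not chosen by the LLM
            }
            schemas.append({
                "name": name,
                "description": inspect.getdoc(method) or "",
                "parameters": params,
            })
        return schemas

class WeatherAgent(ToyAgent):
    def __init__(self) -> None:
        super().__init__(instructions="Your name is Kelly. Keep responses concise.")

    @tool
    async def lookup_weather(self, context: object, location: str, latitude: str, longitude: str) -> str:
        """Called when the user asks for weather-related information."""
        return "Sunny with a temperature of 70 degrees."
```

Calling `WeatherAgent().tool_schemas()` yields one entry, named `lookup_weather`, with `location`, `latitude`, and `longitude` as string parameters. The `context` argument is left out because the runtime supplies it rather than the model.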
AgentSession Class:
• The `AgentSession` orchestrates the entire voice agent system, linking audio, video, and text I/O with components like STT, VAD, TTS, and LLM. It handles turn detection, endpointing, interruptions, and multi-step tool calls.
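• It is initialized with AI components and configuration options:
```python
from livekit.agents import AgentSession
from livekit.plugins import deepgram, openai
from livekit.plugins.turn_detector.multilingual import MultilingualModel

# Inside the job entrypoint, where ctx is the JobContext
session = AgentSession(
    vad=ctx.proc.userdata["vad"],  # VAD model preloaded into the process's userdata
    llm=openai.LLM(model="gpt-4o-mini"),
    stt=deepgram.STT(model="nova-3", language="multi"),
    tts=openai.TTS(voice="ash"),
    turn_detection=MultilingualModel(),
)
```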
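• Key configuration parameters include `turn_detection` (how the session decides the user has finished speaking), `allow_interruptions` (whether user speech can interrupt the agent), and `max_tool_steps` (the maximum number of consecutive tool calls in a single LLM turn).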
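The following sketch shows a multi-step tool loop capped by a step limit, in the spirit of `max_tool_steps`. It is a hypothetical, simplified model: `ToolCall`, `LLMTurn`, `run_turn`, and the scripted fake LLM are invented for illustration. `AgentSession` implements this orchestration internally, and its exact behavior when the limit is reached may differ from the stop message used here.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class ToolCall:
    name: str
    args: dict

@dataclass
class LLMTurn:
    """One hypothetical LLM response: either a tool call or final text."""
    tool_call: Optional[ToolCall] = None
    text: str = ""

def run_turn(
    llm: Callable[[list[str]], LLMTurn],
    tools: dict[str, Callable[..., str]],
    user_text: str,
    max_tool_steps: int = 3,
) -> tuple[str, list[str]]:
    """Let the LLM call tools repeatedly, but at most max_tool_steps times per turn."""
    history = [f"user: {user_text}"]
    steps = 0
    while True:
        turn = llm(history)
        if turn.tool_call is None:
            return turn.text, history  # final answer, no more tools needed
        if steps >= max_tool_steps:
            return "(stopped: tool step limit reached)", history
        result = tools[turn.tool_call.name](**turn.tool_call.args)
        history.append(f"tool {turn.tool_call.name}: {result}")  # feed the result back
        steps += 1

def scripted_llm(history: list[str]) -> LLMTurn:
    # Fake LLM: calls the weather tool once, then answers from its result.
    if not any(h.startswith("tool ") for h in history):
        return LLMTurn(tool_call=ToolCall("lookup_weather", {"location": "Paris"}))
    return LLMTurn(text="It's " + history[-1].split(": ", 1)[1])

answer, history = run_turn(
    scripted_llm,
    {"lookup_weather": lambda location: "sunny, 70 degrees"},
    "What's the weather in Paris?",
)
```

Here `answer` becomes "It's sunny, 70 degrees" after one tool step. An LLM that kept requesting tools would be cut off once `max_tool_steps` calls had run.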
Interaction Between Agent and AgentSession:
• The session is started with an agent instance:
```python
# Inside the async job entrypoint, where ctx is the JobContext
await session.start(
    agent=MyAgent(),
    room=ctx.room,
)
```
• The agent accesses its session through `self.session`, which lets it control session behavior:
```python
# self.session is the AgentSession this agent is running in
async def on_enter(self):
    self.session.generate_reply()
```
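• An agent can hand off control to another agent by returning a new agent instance from a function tool:
```python
@function_tool
async def talk_to_echo(self, ctx: RunContext):
    """Called when the user wants to speak with Echo"""
    # Returning an Agent instance hands control of the session to it;
    # EchoAgent is another Agent subclass defined elsewhere
    return EchoAgent()
```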
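The handoff pattern can be modeled as a session that holds one active agent. The session sets each agent's back-reference, fires `on_enter` when the agent is activated, and swaps agents when a tool returns a new agent. This is a hypothetical, synchronous toy: `ToySession`, `ToyAgent`, `KellyAgent`, `EchoAgent`, and `call_tool` are illustrative names only. The real LiveKit classes are async and do much more.

```python
from __future__ import annotations

from typing import Optional

class ToySession:
    """Hypothetical stand-in for a session: tracks the currently active agent."""

    def __init__(self) -> None:
        self.current_agent: Optional[ToyAgent] = None
        self.log: list[str] = []

    def start(self, agent: ToyAgent) -> None:
        self._activate(agent)

    def _activate(self, agent: ToyAgent) -> None:
        agent.session = self  # back-reference, mirroring Agent.session
        self.current_agent = agent
        agent.on_enter()  # lifecycle hook fires on activation

    def call_tool(self, name: str) -> None:
        """Run a tool on the active agent; an agent-typed result is a handoff."""
        result = getattr(self.current_agent, name)()
        if isinstance(result, ToyAgent):
            self._activate(result)

class ToyAgent:
    def __init__(self, name: str) -> None:
        self.name = name
        self.session: Optional[ToySession] = None

    def on_enter(self) -> None:
        self.session.log.append(f"{self.name} is now active")

class KellyAgent(ToyAgent):
    def __init__(self) -> None:
        super().__init__("Kelly")

    def talk_to_echo(self) -> ToyAgent:
        return EchoAgent()  # hand off by returning a new agent

class EchoAgent(ToyAgent):
    def __init__(self) -> None:
        super().__init__("Echo")

session = ToySession()
session.start(KellyAgent())
session.call_tool("talk_to_echo")
```

After these three calls, the session log records Kelly and then Echo becoming active. The session, not the agents, owns the decision of which agent is current, which mirrors the division of responsibilities described here.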
This architecture separates concerns: `AgentSession` handles real-time media processing and orchestration of the AI components, while `Agent` classes define conversational behavior, instructions, and tools.
Sources: tests/test_evals.py