steep-balloon-41261
07/13/2025, 12:23 AM

tall-airline-16756
07/13/2025, 12:23 AM
LiveKit raises speech events directly on your AgentSession instance:
• Speech started/stopped events: The session emits input_speech_started
and input_speech_stopped
events as users begin and finish speaking. These events can be handled using event listeners in your Python code.
• User input transcription events: Use the user_input_transcribed
event to get transcriptions of what the user said, whether interim or final:
from livekit.agents import UserInputTranscribedEvent

@session.on("user_input_transcribed")
def on_user_input(event: UserInputTranscribedEvent):
    # Callbacks registered with session.on must be synchronous;
    # spawn a task (e.g. asyncio.create_task) if you need async work here.
    if event.is_final:
        print(f"User said: {event.transcript}")
    else:
        print(f"User is saying: {event.transcript}")
• VAD configuration: Server-side VAD (enabled by default) determines when a user starts/stops speaking. You can configure its parameters (like sensitivity, silence duration, etc.) when setting up your session if you need finer control.
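For example, a hedged sketch of passing a tuned VAD to the session. The parameter names below follow the livekit-plugins-silero loader as I understand it; exact names and defaults may differ in your installed version, so check its signature before relying on them:

```python
from livekit.agents import AgentSession
from livekit.plugins import silero

# Assumed parameter names for the Silero VAD plugin (verify against
# your installed version):
vad = silero.VAD.load(
    min_silence_duration=0.55,  # seconds of silence before "stopped speaking"
    activation_threshold=0.5,   # higher = less sensitive to quiet speech
)

session = AgentSession(
    vad=vad,
    # ... your stt / llm / tts configuration ...
)
```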
You do not need external webhooks for this purpose; LiveKit handles speech event detection and raises the relevant events directly in your Python server process using the realtime session object.
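To make the registration pattern concrete, here is a minimal, self-contained sketch using a hypothetical StubSession in place of a real AgentSession (so it runs without LiveKit installed); handlers registered on the real session follow the same decorator shape:

```python
from typing import Callable

class StubSession:
    """Hypothetical stand-in mimicking AgentSession's .on(...) registration."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Callable[[], None]]] = {}

    def on(self, event: str):
        # Return a decorator that registers the handler for `event`.
        def register(fn: Callable[[], None]):
            self._handlers.setdefault(event, []).append(fn)
            return fn
        return register

    def emit(self, event: str) -> None:
        # Invoke every handler registered for `event`.
        for fn in self._handlers.get(event, []):
            fn()

session = StubSession()
log: list[str] = []

@session.on("input_speech_started")
def on_started() -> None:
    log.append("user started speaking")

@session.on("input_speech_stopped")
def on_stopped() -> None:
    log.append("user stopped speaking")

# Simulate the server-side VAD firing both events in order.
session.emit("input_speech_started")
session.emit("input_speech_stopped")
print(log)  # ['user started speaking', 'user stopped speaking']
```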
Let me know if you need specific guidance on your session setup or on customizing VAD parameters.
Sources: Events and error handling | LiveKit Docs | livekit-agents/livekit/agents/vad.py | OpenAI Realtime API integration guide | LiveKit Docs | Turn detection | LiveKit Docs