# ask-ai
Thanks for reaching out, Rishabh. I'm an AI assistant and I'm here to help. Give me a moment while I research this topic.
You can detect user speech events on a Python server using LiveKit's built-in voice activity detection (VAD) and event system with the Realtime API. The primary workflow involves listening for speech-related events emitted from your `AgentSession` instance:

- **Speech started/stopped events:** The session emits `input_speech_started` and `input_speech_stopped` events as users begin and finish speaking. These events can be handled with event listeners in your Python code.
- **User input transcription events:** Use the `user_input_transcribed` event to get transcriptions of what the user said, whether interim or final:
```python
from livekit.agents import UserInputTranscribedEvent

# Handlers registered with session.on() must be synchronous;
# schedule coroutines with asyncio.create_task() if you need async work.
@session.on("user_input_transcribed")
def on_user_input(event: UserInputTranscribedEvent):
    if event.is_final:
        print(f"User said: {event.transcript}")
    else:
        print(f"User is saying: {event.transcript}")
```
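The speech started/stopped events follow the same decorator pattern. Since this sketch can't assume a live LiveKit session, it uses a hypothetical stand-in emitter (`FakeSession` is not a LiveKit class) purely to illustrate how the listeners fire in order during a user turn:

```python
# Minimal stand-in for the session's event API, to show the listener
# pattern without a live connection. Event names mirror those above;
# the emitter itself is hypothetical.
from collections import defaultdict
from typing import Callable

class FakeSession:
    def __init__(self):
        self._handlers = defaultdict(list)

    def on(self, event_name: str) -> Callable:
        # Returns a decorator that registers the handler for event_name.
        def register(fn):
            self._handlers[event_name].append(fn)
            return fn
        return register

    def emit(self, event_name: str, payload=None):
        for fn in self._handlers[event_name]:
            fn(payload)

session = FakeSession()
events_seen = []

@session.on("input_speech_started")
def on_started(_):
    events_seen.append("started")

@session.on("input_speech_stopped")
def on_stopped(_):
    events_seen.append("stopped")

# Simulate one user turn: speech begins, then ends.
session.emit("input_speech_started")
session.emit("input_speech_stopped")
print(events_seen)  # → ['started', 'stopped']
```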
- **VAD configuration:** Server-side VAD (enabled by default) determines when a user starts and stops speaking. You can configure its parameters (such as sensitivity and silence duration) when setting up your session if you need finer control.

You do not need external webhooks for this purpose; LiveKit handles speech event detection and raises the relevant events directly in your Python server process via the realtime session object. Let me know if you need specific guidance for your session setup or for customizing VAD parameters.

Sources:
- Events and error handling | LiveKit Docs
- livekit-agents/livekit/agents/vad.py
- OpenAI Realtime API integration guide | LiveKit Docs
- Turn detection | LiveKit Docs
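To give a feel for what a silence-duration parameter controls, here is a toy end-of-speech detector over per-frame speech probabilities. This is not LiveKit's VAD implementation, and the threshold, frame length, and silence window are made-up illustrative numbers, not LiveKit defaults:

```python
def end_of_speech_index(speech_probs, frame_ms=30,
                        activation_threshold=0.5,
                        min_silence_ms=550):
    """Return the frame index where speech is considered finished:
    the first frame of a silent run that lasts at least min_silence_ms
    after speech was detected. Returns None if that never happens."""
    needed = min_silence_ms // frame_ms  # silent frames required
    silent_run = 0
    speaking = False
    for i, p in enumerate(speech_probs):
        if p >= activation_threshold:
            speaking = True
            silent_run = 0
        elif speaking:
            silent_run += 1
            if silent_run >= needed:
                return i - needed + 1  # first frame of the silence run
    return None

# 10 speech frames (300 ms) followed by a long silence:
probs = [0.9] * 10 + [0.1] * 30
print(end_of_speech_index(probs))  # → 10
```

Raising `min_silence_ms` makes the detector wait longer before declaring the turn over (fewer false end-of-turn cuts mid-sentence, but slower responses), which is the trade-off the session-level VAD settings tune.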