steep-balloon-41261
07/13/2025, 12:18 AM tall-airline-16756
07/13/2025, 12:20 AM
• VAD Events: Listen for START_OF_SPEECH and END_OF_SPEECH using LiveKit's VAD interfaces to detect when users start and stop speaking. For example:
def on_start_of_speech(self, ev):
    self._session._update_user_state('speaking')

def on_end_of_speech(self, ev):
    self._session._update_user_state('listening')
• STT Events: If you're using STT (like Deepgram or Cartesia), you can listen for SpeechStarted and SpeechEnded event types from their APIs and dispatch events on your server accordingly.
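To illustrate the "dispatch events on your server" part, here is a minimal, provider-agnostic sketch. The SpeechEventDispatcher class and its method names are hypothetical helpers, not part of any SDK; you would call dispatch() from wherever your STT provider's callback fires:

```python
from typing import Any, Callable


class SpeechEventDispatcher:
    """Hypothetical helper: maps provider event names (e.g. Deepgram's
    "SpeechStarted") to your own server-side handlers."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Callable[[dict], Any]]] = {}

    def on(self, event_type: str, handler: Callable[[dict], Any]) -> None:
        # Register a handler for a given provider event type.
        self._handlers.setdefault(event_type, []).append(handler)

    def dispatch(self, event_type: str, payload: dict) -> int:
        # Invoke every handler registered for this event type;
        # return how many handlers ran.
        handlers = self._handlers.get(event_type, [])
        for handler in handlers:
            handler(payload)
        return len(handlers)


# Usage: wire the provider's callback into the dispatcher.
dispatcher = SpeechEventDispatcher()
dispatcher.on("SpeechStarted", lambda p: print("user started speaking", p))
dispatcher.on("SpeechEnded", lambda p: print("user stopped speaking", p))
dispatcher.dispatch("SpeechStarted", {"channel": 0})
```

The dispatcher keeps provider-specific event names at the edge of your system, so handlers stay reusable if you swap STT providers.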
• Session-Level Events: With LiveKit Agents, you can listen for user_input_transcribed events on your session to get final and interim transcripts, speaker IDs, and more:
from livekit.agents import UserInputTranscribedEvent

@session.on("user_input_transcribed")
def on_user_input_transcribed(event: UserInputTranscribedEvent):
    print(
        f"User input transcribed: {event.transcript}, "
        f"final: {event.is_final}, speaker id: {event.speaker_id}"
    )
• Webhooks: To notify external systems, make HTTP calls inside these event handlers that trigger your own webhook endpoints and carry the speech event data.
This system lets you detect speech events from low-level audio (VAD), high-level transcripts (STT), or at the session level, and use those events as triggers for your server logic or external integrations. For full implementation details on each approach, see the "VAD," "STT," and "session event" integration methods in the official SDK documentation.
Sources: livekit-plugins/livekit-plugins-deepgram/livekit/plugins/deepgram/stt.py | Turn detection | LiveKit Docs | Events and error handling | LiveKit Docs | Agent speech and audio | LiveKit Docs