steep-balloon-41261
07/07/2025, 5:17 PM

tall-airline-16756
07/07/2025, 5:17 PM
Several STT plugins emit SpeechEvent objects that include start_time and end_time properties. Below are some examples (a short consumer-side sketch follows the list):
• *Baseten Plugin:*
```python
event = stt.SpeechEvent(
    type=stt.SpeechEventType.FINAL_TRANSCRIPT,
    alternatives=[
        stt.SpeechData(
            language=language,
            text=text,
            confidence=confidence,
            start_time=start_time,
            end_time=end_time,
        )
    ],
)
```
• *Cartesia Plugin:*
```python
speech_data = stt.SpeechData(
    language=language,
    start_time=0,  # Cartesia doesn't provide word-level timestamps in this version
    end_time=data.get("duration", 0),  # This is the duration transcribed so far
    confidence=data.get("probability", 1.0),
    text=text,
)
```
• *Sarvam Plugin:*
```python
# Try to get timestamps if available
timestamps_data = response_json.get("timestamps")
if timestamps_data and isinstance(timestamps_data, dict):
    words_ts_start = timestamps_data.get("start_time_seconds")
    words_ts_end = timestamps_data.get("end_time_seconds")
    if isinstance(words_ts_start, list) and len(words_ts_start) > 0:
        start_time = words_ts_start[0]
    if isinstance(words_ts_end, list) and len(words_ts_end) > 0:
        end_time = words_ts_end[-1]
```
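To consume these timestamps on the application side, here's a minimal sketch. It assumes the plugin's recognize stream yields SpeechEvent objects when iterated asynchronously and that final transcripts carry the fields shown above; the function name and surrounding setup are illustrative, not a fixed API:
```python
from livekit.agents import stt


async def print_timed_transcripts(stream) -> None:
    """Print each final transcript with its start/end timestamps.

    `stream` is assumed to be an STT stream that asynchronously yields
    SpeechEvent objects (e.g. one created by an STT plugin's stream()).
    """
    async for event in stream:
        if event.type == stt.SpeechEventType.FINAL_TRANSCRIPT:
            data = event.alternatives[0]
            print(f"[{data.start_time:.2f}s - {data.end_time:.2f}s] {data.text}")
```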
Additionally, the TranscriptSynchronizer component can handle timed text through its push_text method, which accepts text with timing annotations:
```python
def push_text(self, text: str) -> None:
    start_time, end_time = None, None
    if isinstance(text, io.TimedString):
        start_time = text.start_time or None
        end_time = text.end_time or None
```
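Pushing timed text into the synchronizer could then look roughly like this. The import path and constructor keywords here are assumptions inferred from the attribute names in the snippet above, so verify them against your installed livekit-agents version:
```python
# Hypothetical usage sketch: TimedString's module and constructor signature
# are assumptions based on the start_time/end_time attributes read above.
from livekit.agents.voice import io

timed = io.TimedString("hello there", start_time=0.25, end_time=1.10)
synchronizer.push_text(timed)  # `synchronizer`: an existing TranscriptSynchronizer
```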
You can use these examples to implement transcript generation with timing information in your application.
Sources: GitHub::livekit::agents | livekit-plugins/livekit-plugins-sarvam/livekit/pl...

blue-account-81612
07/07/2025, 5:22 PM

tall-airline-16756
07/07/2025, 5:23 PM
Here's a minimal entrypoint that wires Deepgram STT and ElevenLabs TTS into an AgentSession:
```python
from livekit.agents import Agent, AgentSession, JobContext
from livekit.plugins import deepgram, elevenlabs, openai, silero


async def entrypoint(ctx: JobContext):
    await ctx.connect()
    agent = Agent(
        instructions="You are a friendly voice assistant built by LiveKit.",
        tools=[lookup_weather],  # lookup_weather is a user-defined function tool
    )
    session = AgentSession(
        vad=silero.VAD.load(),
        # using Deepgram for STT and ElevenLabs for TTS
        stt=deepgram.STT(model="nova-3"),
        llm=openai.LLM(model="gpt-4o-mini"),
        tts=elevenlabs.TTS(),
    )
    await session.start(agent=agent, room=ctx.room)
    await session.generate_reply(instructions="greet the user and ask about their day")
```
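Since the snippet references lookup_weather without defining it, here's a sketch of such a function tool plus the standard worker runner. The weather lookup body is a placeholder, not a real API call:
```python
from livekit.agents import RunContext, WorkerOptions, cli, function_tool


@function_tool
async def lookup_weather(context: RunContext, location: str) -> str:
    """Look up the current weather for a location."""
    # Placeholder: call a real weather API here.
    return f"It's sunny and 22°C in {location}."


if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))
```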
To get this working smoothly, you'll just need to set up a few environment variables (in addition to your LiveKit credentials: LIVEKIT_URL, LIVEKIT_API_KEY, LIVEKIT_API_SECRET):
• DEEPGRAM_API_KEY
• OPENAI_API_KEY
• ELEVEN_API_KEY (for the ElevenLabs plugin)
Pro tip: Installing the plugins is super easy – just run:
pip install livekit-plugins-deepgram livekit-plugins-openai livekit-plugins-elevenlabs livekit-plugins-silero
And there you have it! This setup will help you integrate Deepgram's STT and ElevenLabs' TTS into your LiveKit app seamlessly. Happy coding! 🚀
Sources: GitHub::livekit::agents | livekit-plugins/livekit-plugins-deepgram/README.m...