# ask-ai
Thanks for reaching out, Dhruvil. I'm an AI assistant and I'm here to help. Give me a moment while I research this topic.
To generate transcripts with start and end times of spoken sentences by an agent or user in LiveKit, you can use the speech-to-text (STT) plugins provided by LiveKit. These plugins generate `SpeechEvent` objects whose `SpeechData` alternatives include `start_time` and `end_time` properties. Below are some examples:

**Baseten Plugin:**

```python
event = stt.SpeechEvent(
    type=stt.SpeechEventType.FINAL_TRANSCRIPT,
    alternatives=[
        stt.SpeechData(
            language=language,
            text=text,
            confidence=confidence,
            start_time=start_time,
            end_time=end_time,
        )
    ],
)
```

**Cartesia Plugin:**

```python
speech_data = stt.SpeechData(
    language=language,
    start_time=0,  # Cartesia doesn't provide word-level timestamps in this version
    end_time=data.get("duration", 0),  # this is the duration transcribed so far
    confidence=data.get("probability", 1.0),
    text=text,
)
```

**Sarvam Plugin:**

```python
# Try to get timestamps if available
timestamps_data = response_json.get("timestamps")
if timestamps_data and isinstance(timestamps_data, dict):
    words_ts_start = timestamps_data.get("start_time_seconds")
    words_ts_end = timestamps_data.get("end_time_seconds")
    if isinstance(words_ts_start, list) and len(words_ts_start) > 0:
        start_time = words_ts_start[0]
    if isinstance(words_ts_end, list) and len(words_ts_end) > 0:
        end_time = words_ts_end[-1]
```

Additionally, the `TranscriptSynchronizer` component can handle timed text through the `push_text` method, which accepts text with timing annotations:
```python
def push_text(self, text: str) -> None:
    start_time, end_time = None, None
    if isinstance(text, io.TimedString):
        start_time = text.start_time or None
        end_time = text.end_time or None
```
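Tying these pieces together: once you have the `start_time`/`end_time` fields, building a timed transcript is mostly bookkeeping. Here is a minimal sketch using a stand-in dataclass that mirrors the `SpeechData` fields shown above (the real class lives in `livekit.agents.stt`; the stand-in and the formatting helper are illustrative, not part of the LiveKit API):

```python
from dataclasses import dataclass


# Stand-in mirroring the fields of livekit.agents.stt.SpeechData
@dataclass
class SpeechData:
    language: str
    text: str
    confidence: float = 1.0
    start_time: float = 0.0
    end_time: float = 0.0


def to_transcript_lines(events: list[list[SpeechData]]) -> list[str]:
    """Format each event's best alternative as '[start-end] text'."""
    lines = []
    for alternatives in events:
        if not alternatives:
            continue
        best = alternatives[0]  # first alternative is the most likely one
        lines.append(f"[{best.start_time:.2f}-{best.end_time:.2f}] {best.text}")
    return lines


print(to_transcript_lines([[SpeechData("en", "Hello there", 0.98, 0.0, 1.2)]]))
# → ['[0.00-1.20] Hello there']
```

In a real agent you would populate the list from `FINAL_TRANSCRIPT` events as they arrive rather than all at once.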
You can use these examples to implement transcript generation with timing information in your application.
Sources: GitHub::livekit::agents | livekit-plugins/livekit-plugins-sarvam/livekit/pl...
I'm using Deepgram for STT & Elevenlabs for TTS
To use Deepgram for STT (Speech-to-Text) and ElevenLabs for TTS (Text-to-Speech) in LiveKit, here's a friendly walkthrough that'll make your voice assistant project a breeze! 😊
```python
from livekit.agents import Agent, AgentSession, JobContext
from livekit.plugins import deepgram, elevenlabs, openai, silero


async def entrypoint(ctx: JobContext):
    await ctx.connect()

    agent = Agent(
        instructions="You are a friendly voice assistant built by LiveKit.",
        tools=[lookup_weather],  # assumes a lookup_weather tool defined elsewhere
    )
    session = AgentSession(
        vad=silero.VAD.load(),
        # using Deepgram for STT and ElevenLabs for TTS
        stt=deepgram.STT(model="nova-3"),
        llm=openai.LLM(model="gpt-4o-mini"),
        tts=elevenlabs.TTS(),
    )

    await session.start(agent=agent, room=ctx.room)
    await session.generate_reply(instructions="greet the user and ask about their day")
```
To get this working smoothly, you'll just need to set up a few environment variables:
• `DEEPGRAM_API_KEY`
• `OPENAI_API_KEY`
• `ELEVEN_API_KEY` (for the ElevenLabs plugin)
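A quick sanity check at startup saves debugging time later. This is just a sketch; the `ELEVEN_API_KEY` name is the variable the ElevenLabs plugin conventionally reads, so verify it against your plugin version:

```python
import os

# Keys this setup reads; ELEVEN_API_KEY is assumed for the ElevenLabs plugin
REQUIRED_KEYS = ["DEEPGRAM_API_KEY", "OPENAI_API_KEY", "ELEVEN_API_KEY"]


def missing_keys(env: dict) -> list:
    """Return the required keys that are unset or empty."""
    return [key for key in REQUIRED_KEYS if not env.get(key)]


if __name__ == "__main__":
    missing = missing_keys(dict(os.environ))
    if missing:
        raise SystemExit(f"Missing environment variables: {', '.join(missing)}")
```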
Pro tip: installing the plugins used above is super easy – just run:

```shell
pip install livekit-agents livekit-plugins-deepgram livekit-plugins-elevenlabs livekit-plugins-openai livekit-plugins-silero
```
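If you want to fail fast when a plugin package is missing, you can probe for the module specs before starting the agent. The module paths below follow the `livekit.plugins.<name>` convention used by the official plugins; this is a convenience sketch, not part of the LiveKit API:

```python
import importlib.util

# Module paths assumed from the livekit.plugins.<name> convention
PLUGIN_MODULES = [
    "livekit.plugins.deepgram",
    "livekit.plugins.elevenlabs",
    "livekit.plugins.openai",
    "livekit.plugins.silero",
]


def missing_plugins(modules: list) -> list:
    """Return the modules that cannot be found on this interpreter."""
    missing = []
    for name in modules:
        try:
            if importlib.util.find_spec(name) is None:
                missing.append(name)
        except ModuleNotFoundError:
            # raised when a parent package (e.g. livekit) isn't installed
            missing.append(name)
    return missing
```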
And there you have it! This setup will help you integrate Deepgram's STT and ElevenLabs' TTS into your LiveKit app seamlessly. Happy coding! 🚀
Sources: GitHub::livekit::agents | livekit-plugins/livekit-plugins-deepgram/README.m...