# ask-ai
Thanks for reaching out, Dhruvil. I'm an AI assistant and I'm here to help. Give me a moment while I research this topic.
To generate transcripts with start and end times of spoken sentences by an agent or user in LiveKit, you can use the speech-to-text (STT) plugins provided by LiveKit. These plugins generate `SpeechEvent` objects whose `SpeechData` alternatives include `start_time` and `end_time` properties. Below are some examples:

**Baseten Plugin:**

```python
event = stt.SpeechEvent(
    type=stt.SpeechEventType.FINAL_TRANSCRIPT,
    alternatives=[
        stt.SpeechData(
            language=language,
            text=text,
            confidence=confidence,
            start_time=start_time,
            end_time=end_time,
        )
    ],
)
```

**Cartesia Plugin:**

```python
speech_data = stt.SpeechData(
    language=language,
    start_time=0,  # Cartesia doesn't provide word-level timestamps in this version
    end_time=data.get("duration", 0),  # this is the duration transcribed so far
    confidence=data.get("probability", 1.0),
    text=text,
)
```

**Sarvam Plugin:**

```python
# Try to get timestamps if available
timestamps_data = response_json.get("timestamps")
if timestamps_data and isinstance(timestamps_data, dict):
    words_ts_start = timestamps_data.get("start_time_seconds")
    words_ts_end = timestamps_data.get("end_time_seconds")
    if isinstance(words_ts_start, list) and len(words_ts_start) > 0:
        start_time = words_ts_start[0]
    if isinstance(words_ts_end, list) and len(words_ts_end) > 0:
        end_time = words_ts_end[-1]
```

Additionally, the `TranscriptSynchronizer` component can handle timed text through the `push_text` method, which accepts text with timing annotations:
```python
def push_text(self, text: str) -> None:
    start_time, end_time = None, None
    if isinstance(text, io.TimedString):
        start_time = text.start_time or None
        end_time = text.end_time or None
```
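Tying these pieces together: once you have the `start_time`/`end_time` fields, building a timed transcript is mostly bookkeeping. Here is a minimal sketch using a stand-in dataclass that mirrors the `SpeechData` fields shown above (the real class lives in `livekit.agents.stt`; the stand-in and the formatting helper are illustrative, not part of the LiveKit API):

```python
from dataclasses import dataclass


# Stand-in mirroring the fields of livekit.agents.stt.SpeechData
@dataclass
class SpeechData:
    language: str
    text: str
    confidence: float = 1.0
    start_time: float = 0.0
    end_time: float = 0.0


def to_transcript_lines(events: list[list[SpeechData]]) -> list[str]:
    """Format each event's best alternative as '[start-end] text'."""
    lines = []
    for alternatives in events:
        if not alternatives:
            continue
        best = alternatives[0]  # first alternative is the most likely one
        lines.append(f"[{best.start_time:.2f}-{best.end_time:.2f}] {best.text}")
    return lines


print(to_transcript_lines([[SpeechData("en", "Hello there", 0.98, 0.0, 1.2)]]))
# → ['[0.00-1.20] Hello there']
```

In a real agent you would populate the list from `FINAL_TRANSCRIPT` events as they arrive rather than all at once.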
You can use these examples to implement transcript generation with timing information in your application.
Sources: GitHub::livekit::agents | livekit-plugins/livekit-plugins-sarvam/livekit/pl...
I'm using Deepgram for STT & Elevenlabs for TTS
To use Deepgram for STT (Speech-to-Text) and ElevenLabs for TTS (Text-to-Speech) in LiveKit, here's a friendly walkthrough that'll make your voice assistant project a breeze! 😊
```python
from livekit.agents import Agent, AgentSession, JobContext
from livekit.plugins import deepgram, elevenlabs, openai, silero


async def entrypoint(ctx: JobContext):
    await ctx.connect()

    agent = Agent(
        instructions="You are a friendly voice assistant built by LiveKit.",
        tools=[lookup_weather],  # assumes a lookup_weather tool defined elsewhere
    )
    session = AgentSession(
        vad=silero.VAD.load(),
        # using Deepgram for STT and ElevenLabs for TTS
        stt=deepgram.STT(model="nova-3"),
        llm=openai.LLM(model="gpt-4o-mini"),
        tts=elevenlabs.TTS(),
    )

    await session.start(agent=agent, room=ctx.room)
    await session.generate_reply(instructions="greet the user and ask about their day")
```
To get this working smoothly, you'll just need to set up a few environment variables:
• `DEEPGRAM_API_KEY`
• `OPENAI_API_KEY`
• `ELEVEN_API_KEY` (for the ElevenLabs plugin)
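A quick sanity check at startup saves debugging time later. This is just a sketch; the `ELEVEN_API_KEY` name is the variable the ElevenLabs plugin conventionally reads, so verify it against your plugin version:

```python
import os

# Keys this setup reads; ELEVEN_API_KEY is assumed for the ElevenLabs plugin
REQUIRED_KEYS = ["DEEPGRAM_API_KEY", "OPENAI_API_KEY", "ELEVEN_API_KEY"]


def missing_keys(env: dict) -> list:
    """Return the required keys that are unset or empty."""
    return [key for key in REQUIRED_KEYS if not env.get(key)]


if __name__ == "__main__":
    missing = missing_keys(dict(os.environ))
    if missing:
        raise SystemExit(f"Missing environment variables: {', '.join(missing)}")
```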
Pro tip: installing the plugins used above is super easy – just run:

```shell
pip install livekit-agents livekit-plugins-deepgram livekit-plugins-elevenlabs livekit-plugins-openai livekit-plugins-silero
```
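If you want to fail fast when a plugin package is missing, you can probe for the module specs before starting the agent. The module paths below follow the `livekit.plugins.<name>` convention used by the official plugins; this is a convenience sketch, not part of the LiveKit API:

```python
import importlib.util

# Module paths assumed from the livekit.plugins.<name> convention
PLUGIN_MODULES = [
    "livekit.plugins.deepgram",
    "livekit.plugins.elevenlabs",
    "livekit.plugins.openai",
    "livekit.plugins.silero",
]


def missing_plugins(modules: list) -> list:
    """Return the modules that cannot be found on this interpreter."""
    missing = []
    for name in modules:
        try:
            if importlib.util.find_spec(name) is None:
                missing.append(name)
        except ModuleNotFoundError:
            # raised when a parent package (e.g. livekit) isn't installed
            missing.append(name)
    return missing
```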
And there you have it! This setup will help you integrate Deepgram's STT and ElevenLabs' TTS into your LiveKit app seamlessly. Happy coding! 🚀
Sources: GitHub::livekit::agents | livekit-plugins/livekit-plugins-deepgram/README.m...