acceptable-dream-85180
08/07/2025, 3:20 PM

tall-airline-16756
08/07/2025, 3:20 PM
• Listen for Transcription Events: Subscribe to RoomEvent.TranscriptionReceived. This event is triggered whenever a transcription update is received. The event includes a list of TranscriptionSegment objects, which contain the transcribed text, timestamps, and whether the segment is final.
• Extract Word-Level Data: Each TranscriptionSegment can represent a portion of the transcribed text. Depending on the STT service configuration, these segments might correspond to individual words, phrases, or sentences. To get word-level data, you may need to split the text into words and associate each word with its respective timestamp.
• Handle Interim and Final Transcripts: Interim transcriptions (SpeechEventType.INTERIM_TRANSCRIPT) are sent as the user speaks, providing near-real-time updates. Final transcriptions (SpeechEventType.FINAL_TRANSCRIPT) are sent once the STT service has finalized a segment of speech. Use these events to build a comprehensive transcript.
• Process and Store the Data: Store each word along with its start and end times. This allows for precise analysis or playback synchronization if needed.
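Since a TranscriptionSegment may span several words, one way to approximate per-word timing is to interpolate within the segment's start/end window. A minimal pure-Python sketch (the helper name and the length-proportional heuristic are my own; word boundaries reported by the STT provider itself will be more accurate):

```python
def split_segment_into_words(text: str, start_time: float, end_time: float) -> list[dict]:
    """Approximate per-word timestamps by spreading the segment's duration
    across its words proportionally to word length (a rough heuristic)."""
    words = text.split()
    if not words:
        return []
    total_chars = sum(len(w) for w in words)
    duration = end_time - start_time
    timed, cursor = [], start_time
    for w in words:
        span = duration * len(w) / total_chars
        timed.append({"word": w, "start": round(cursor, 3), "end": round(cursor + span, 3)})
        cursor += span
    return timed
```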
Let me know if you need further assistance with implementing this!

acceptable-dream-85180
08/07/2025, 3:23 PM

tall-airline-16756
08/07/2025, 3:23 PM
To get word-level timing for agent speech, add a transcription_node method to your agent. Here’s a friendly walkthrough:
• Enable TTS-Aligned Transcription: When you start your AgentSession, set use_tts_aligned_transcript=True so the transcription timing matches the speech.
• Implement the transcription_node Method: Just define async def transcription_node(...) in your agent class. This lets you handle each TimedString chunk, which comes with start_time and end_time for every word.
• Log or Use Word-Level Data: Inside transcription_node, you can check if a chunk is a TimedString and then grab its timing info to log or work with however you like.
• Pick the Right TTS Providers: Providers like Cartesia or ElevenLabs support word-level timestamps, so they’re great if you want precise sync between audio and transcription.
Here’s a quick example:
async def transcription_node(
    self, text: AsyncIterable[str | TimedString], model_settings: ModelSettings
) -> AsyncGenerator[str | TimedString, None]:
    async for chunk in text:
        if isinstance(chunk, TimedString):
            logger.info(f"TimedString: '{chunk.text}' (start: {chunk.start_time}, end: {chunk.end_time})")
        yield chunk  # pass every chunk through, timed or not
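Downstream of transcription_node, the timed chunks can be folded into a word-level timing table. A pure-Python sketch, where TimedChunk is a hypothetical stand-in with the same text/start_time/end_time shape as the real TimedString:

```python
from dataclasses import dataclass

@dataclass
class TimedChunk:
    # Hypothetical stand-in mirroring TimedString's text and timing attributes
    text: str
    start_time: float
    end_time: float

def collect_word_timings(chunks: list[TimedChunk]) -> list[dict]:
    """Build a word-level timing table, skipping whitespace-only chunks."""
    return [
        {"word": c.text.strip(), "start": c.start_time, "end": c.end_time}
        for c in chunks
        if c.text.strip()
    ]
```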
Just a heads-up: word-level timing depends on your TTS provider. Cartesia and ElevenLabs are solid choices for this feature. If you have any questions or want a hand setting this up, feel free to ask—I’m here to help!

acceptable-dream-85180
08/07/2025, 3:24 PM

tall-airline-16756
08/07/2025, 3:25 PM
The transcription_node method is made just for agent participants. It’s a handy part of the agent’s workflow that lets them tweak or fine-tune transcriptions before sharing them with clients. This helps agents manage and polish the transcriptions smoothly within LiveKit’s system.
Sources: examples/voice_agents/structured_output.py | Text and transcriptions | LiveKit Docs | Transcription Applications | livekit/livekit_composite | DeepWiki

acceptable-dream-85180
08/07/2025, 3:26 PM
session.history from there.

tall-airline-16756
08/07/2025, 3:26 PM
To work with session.history in your agent code, here’s a friendly walkthrough to help you out:
• Access Conversation History: Start by using session.history to pull the full conversation history, which includes messages from non-agent participants. Here’s how you can do it:
# Access the conversation history (a ChatContext)
history = session.history
# Loop through its items to spot messages from non-agent participants.
# Assumption: history items carry a role ("user" for non-agent speech)
# and a text_content property, rather than per-participant identities.
for item in history.items:
    if item.role == "user":
        # This message is from a non-agent participant
        transcript_data = item.text_content
        # Feel free to process the transcript data however you need
• Listen for Transcription Events: You can also catch transcript data as it happens by using the user_input_transcribed event:
@session.on("user_input_transcribed")
def on_transcript(transcript):  # event callbacks are plain (sync) functions
    if transcript.is_final:
        # This is the final transcript version
        print(f"User transcript: {transcript.transcript}")
• Enable TTS-Aligned Transcriptions: If you want word-level timing details, just make sure to enable TTS-aligned transcriptions in your AgentSession setup like this:
session = AgentSession(
    # ... other configurations ...
    use_tts_aligned_transcript=True,
)
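The interim/final pattern behind the user_input_transcribed event can be sketched as a small accumulator (a pure-Python illustration, not a LiveKit API): interim updates overwrite one another, and only final updates are committed to the transcript.

```python
class TranscriptBuilder:
    """Accumulate streaming STT updates into a running transcript."""

    def __init__(self) -> None:
        self.committed: list[str] = []  # finalized segments, in order
        self.interim: str = ""          # latest provisional text

    def update(self, text: str, is_final: bool) -> None:
        if is_final:
            self.committed.append(text)  # commit and clear provisional text
            self.interim = ""
        else:
            self.interim = text          # interim updates replace each other

    @property
    def transcript(self) -> str:
        parts = self.committed + ([self.interim] if self.interim else [])
        return " ".join(parts)
```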
If anything’s unclear or you want a hand with the implementation, just let me know—I’m here to help!