lively-coat-10905
08/07/2025, 12:22 PM

tall-airline-16756
08/07/2025, 12:22 PM

tall-airline-16756
08/07/2025, 12:22 PM

lively-coat-10905
08/07/2025, 12:26 PM

tall-airline-16756
08/07/2025, 12:26 PM

lively-coat-10905
08/07/2025, 12:34 PM

tall-airline-16756
08/07/2025, 12:35 PM

lively-coat-10905
08/07/2025, 12:45 PM

tall-airline-16756
08/07/2025, 12:45 PM
• Text Inputs: These trigger events like USER_TEXT_PARTIAL or InputTranscriptionCompleted.
• Voice Inputs: These go through speech-to-text (STT) processing and trigger events like input_speech_started and input_speech_stopped. The transcripts arrive a bit later, asynchronously.
So, by checking the session history, you can tell whether an input was text or voice just by looking at the event types and how the content appears. If you want more details or examples, just let me know!
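If an example helps, here’s a tiny sketch of catching voice turns as they’re transcribed. It assumes the user_input_transcribed event (and its transcript / is_final fields) emitted by recent livekit-agents AgentSession releases, so double-check the names against your version; typed text input never fires this event:

def log_voice_turns(session) -> None:
    # Assumption: user_input_transcribed only fires for audio that went
    # through STT, making it a simple "this turn was voice" signal.
    @session.on("user_input_transcribed")
    def _on_transcript(ev):
        if ev.is_final:
            print("voice input transcribed:", ev.transcript)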
Sources: livekit-agents/livekit/agents/voice/agent_activity.py

lively-coat-10905
08/07/2025, 12:48 PM

tall-airline-16756
08/07/2025, 12:48 PM
The transcript_confidence field doesn’t always show up in the session history. You’ll usually see it for voice inputs that go through speech-to-text, but sometimes it might be missing. This can happen if:
• The voice input doesn’t include confidence info.
• You’re using manual turn detection mode, where this field isn’t needed.
• There are errors or limitations in the speech-to-text processing.
If you want me to explain anything more or help out with this, just let me know!
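Because of that, it’s safest to treat the field as optional when you read an exported history. Here’s a quick sketch; the "items" key and field names are assumptions based on the to_dict() output discussed later in this thread:

import json

# Load a previously exported session history (see the export examples below)
with open("session_history.json") as f:
    history = json.load(f)

for item in history.get("items", []):
    # .get() returns None instead of raising when the field is absent
    confidence = item.get("transcript_confidence")
    if confidence is None:
        print("no confidence recorded for:", item.get("content"))
    else:
        print("confidence", confidence, "for:", item.get("content"))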
Sources: livekit-plugins/livekit-plugins-google/livekit/plugins/google/beta/realtime...

lively-coat-10905
08/07/2025, 1:06 PM

tall-airline-16756
08/07/2025, 1:06 PM
If you want to tell voice and text inputs apart without relying on transcript_confidence, here are some handy ways to do it:
• Event Types: Voice inputs come with speech events like START_OF_SPEECH and END_OF_SPEECH, which you won’t see with text inputs.
• Turn Detection Phases: Voice inputs have phases like USER_SPEAKING, while text inputs skip these.
• Audio Buffer Events: Voice inputs include audio buffer events such as input_audio_buffer.append, but text inputs don’t.
• Transcription Attributes: You might spot attributes like ATTR_TRANSCRIPTION_DELAY with voice inputs, which aren’t in text inputs.
• Activity Tracking: Voice inputs trigger events like input_speech_started and input_speech_stopped.
If you’d like me to walk you through any of these or help set them up, just give me a shout!
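As a concrete starting point, here’s a rough sketch of the activity-tracking idea, not a definitive implementation. It assumes the user_state_changed and conversation_item_added event names (and their payload fields) from recent livekit-agents versions, so verify them against your install:

def track_input_modality(session) -> None:
    state = {"user_was_speaking": False}

    @session.on("user_state_changed")
    def _on_user_state(ev):
        # VAD drives this state, so "speaking" should only fire for audio input
        if ev.new_state == "speaking":
            state["user_was_speaking"] = True

    @session.on("conversation_item_added")
    def _on_item(ev):
        # Tag each committed user turn with the modality we last observed
        if ev.item.role == "user":
            source = "voice" if state["user_was_speaking"] else "text"
            print(f"user input arrived via {source}")
            state["user_was_speaking"] = False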
Sources: livekit-plugins/livekit-plugins-openai/livekit/plugins/openai/stt.py | agents/src/multimodal/multimodal_agent.ts

lively-coat-10905
08/07/2025, 1:09 PM

tall-airline-16756
08/07/2025, 1:09 PM
In your session history, you can tell the two apart by the transcript_confidence field:
• Voice Inputs: These will have the transcript_confidence field, like the user message showing "transcript_confidence": 1.0.
• Text Inputs: These won’t include the transcript_confidence field.
So in this example, the user input with "transcript_confidence": 1.0 means it was a voice input. If you want me to explain more or have any questions, just let me know!
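Here’s a small helper along those lines, purely as a sketch over the exported dict; the "items" key and the role / transcript_confidence field names are assumptions drawn from the session.history.to_dict() layout in your example:

def classify_user_inputs(history: dict) -> list[tuple[str, dict]]:
    # Returns ("voice" | "text", item) pairs for each user turn
    results = []
    for item in history.get("items", []):
        if item.get("role") != "user":
            continue
        modality = "voice" if "transcript_confidence" in item else "text"
        results.append((modality, item))
    return results

You’d call it as classify_user_inputs(session.history.to_dict()).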
lively-coat-10905
08/07/2025, 1:09 PM

tall-airline-16756
08/07/2025, 1:10 PM
Sometimes a voice input won’t include the transcript_confidence field. When that happens, you can still tell voice and text inputs apart by looking at a few other clues:
• Event Types: Voice inputs usually come with speech-related events like START_OF_SPEECH or END_OF_SPEECH, which you won’t see with text inputs.
• Turn Detection Phases: Voice inputs go through phases like USER_SPEAKING, whereas text inputs skip these.
• Audio Buffer Events: Voice inputs include audio buffer events such as input_audio_buffer.append, which text inputs don’t have.
• Transcription Attributes: Sometimes voice inputs have attributes like ATTR_TRANSCRIPTION_DELAY, even if transcript_confidence is missing.
Putting these together should help you confidently tell voice and text inputs apart, even without transcript_confidence. If you want, I’m here to help you dig deeper!
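And if you run an STT stream yourself, the speech boundaries show up directly as events. This sketch uses SpeechEventType from the livekit-agents stt module; treat the exact event shapes as something to verify on your version:

from livekit.agents import stt

async def watch_speech_events(stream: stt.SpeechStream) -> None:
    async for event in stream:
        if event.type == stt.SpeechEventType.START_OF_SPEECH:
            print("voice turn started")  # never emitted for typed text
        elif event.type == stt.SpeechEventType.END_OF_SPEECH:
            print("voice turn ended")
        elif event.type == stt.SpeechEventType.FINAL_TRANSCRIPT:
            alt = event.alternatives[0]
            print("transcript:", alt.text, "confidence:", alt.confidence)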
lively-coat-10905
08/07/2025, 1:11 PM

tall-airline-16756
08/07/2025, 1:12 PM

lively-coat-10905
08/07/2025, 1:13 PM

tall-airline-16756
08/07/2025, 1:13 PM
Each item in the session history has a created_at property that shows exactly when it was added.
If you’d like to save the session history along with those timestamps, you can use a shutdown callback to export the history. Here’s a simple example:

import json
from datetime import datetime

async def write_transcript():
    # Timestamped filename so repeated sessions don't overwrite each other
    current_date = datetime.now().strftime("%Y%m%d_%H%M%S")
    filename = f"/tmp/transcript_{ctx.room.name}_{current_date}.json"
    with open(filename, "w") as f:
        json.dump(session.history.to_dict(), f, indent=2)

ctx.add_shutdown_callback(write_transcript)
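(For context: this snippet assumes it runs inside your agent’s entrypoint, where ctx is the JobContext and session is your AgentSession.)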
This will save the session history, timestamps included, into a JSON file. If you have any questions or want me to walk you through it, just let me know!

lively-coat-10905
08/07/2025, 1:15 PM

tall-airline-16756
08/07/2025, 1:16 PM
You can do this by reading the session.history property and including a timestamp for each entry. Here’s a simple example to guide you:
import json
from datetime import datetime

# Grab the session history as a dictionary
history = session.history.to_dict()

# Build a new dictionary with timestamps as keys
timestamped_history = {}
for item in history.get("items", []):  # to_dict() nests entries under "items"
    # Use the item's own timestamp if it already has one
    if "created_at" in item:
        timestamp = item["created_at"]
    else:
        # If not, generate one from the current time
        timestamp = datetime.now().isoformat()
    timestamped_history[timestamp] = item

# Save the new timestamped history to a file
with open("session_history.json", "w") as f:
    json.dump(timestamped_history, f, indent=2)
This way, each item in your session history is keyed by its timestamp, and any item without one gets a timestamp generated on the spot. If you want me to walk you through anything else or have questions, just let me know!
Sources: Session recording and transcripts | LiveKit Docs | VADEvent | LiveKit Agents | STTMetrics | LiveKit Agents