# ask-ai
**User:** When I interrupt a message using text, the text I interrupted gets added after the message I sent in the session history. But it works fine if I interrupt using voice.
**AI:** Thanks for reaching out, Computer. I'm an AI assistant and I'm here to help. Give me a moment while I research this topic.

When you interrupt a message using text, the interrupted text is added to the session history as a new message with an "interrupted" status. This ensures that the interruption is explicitly recorded. However, when using voice, the system typically truncates the audio playback and updates the transcription without adding a new entry to the history. This difference in behavior is due to how text and voice interruptions are processed. If you have specific requirements or need further clarification, let me know!

Sources: `livekit-agents/livekit/agents/voice/agent_activity.py`
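If it helps, here is a minimal sketch of how you could inspect that status yourself. It assumes `session` is your active `AgentSession` and that `to_dict()` returns a dict with an `"items"` list of message dicts carrying the `interrupted` flag (the dict shape shown later in this thread):

```python
import json

# Export the history and pretty-print it for inspection.
history = session.history.to_dict()
print(json.dumps(history, indent=2, default=str))

# List the messages that were cut off. The "items" wrapper is an
# assumption about the exported shape; adapt if your version differs.
for item in history.get("items", []):
    if item.get("interrupted"):
        print(f"interrupted {item['role']} message: {item['content']}")
```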
**User:** Is it possible to show the interrupted text messages in the correct order if I don't want to record the 'interrupted' status?
**AI:** Right now, it's a bit tricky to show interrupted text messages in the right order without keeping track of the 'interrupted' status. This status is actually a key part of how we manage the conversation and make sure the session history stays accurate. If you have any specific needs or want to explore other options, just let me know; I'm here to help figure out the best solution with you.

Sources: `livekit-agents/livekit/agents/voice/transcription/synchronizer.py`
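One hedged workaround, if you just need a sensible order on your side: post-process the exported history yourself. This is a sketch against the dict shape shown later in this thread, not an official API; it only reorders when every item actually carries a `created_at` timestamp:

```python
def ordered_items(history: dict) -> list[dict]:
    """Best-effort chronological ordering of exported history items."""
    items = history.get("items", [])
    # Only sort when every item has a created_at timestamp; otherwise
    # leave the exported order untouched rather than guess.
    if items and all("created_at" in item for item in items):
        return sorted(items, key=lambda item: item["created_at"])
    return items
```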
**User:** In the session history, will the user input ever get the interrupted status, or is it just the agent's output that gets `"interrupted": true`?
**AI:** Just a heads-up: in the session history, only the agent's responses can show an 'interrupted' status. Your input is always saved as part of the session but won't be marked as interrupted. If anything's unclear or you want to know more, don't hesitate to ask; I'm here to help!

Sources: `livekit-agents/livekit/agents/telemetry/trace_types.py` | Agent speech | LiveKit Docs
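You can verify that in your own sessions with a quick check over the exported history (same assumed `"items"` shape as the earlier sketch):

```python
from collections import Counter

history = session.history.to_dict()

# Count interrupted messages by role; per the answer above, only
# "assistant" should ever appear here.
roles = Counter(
    item["role"]
    for item in history.get("items", [])
    if item.get("interrupted")
)
print(roles)
```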
**User:** How do I differentiate user text and voice inputs in the session history?
**AI:** Hey there! To tell apart user text and voice inputs in the session history, you can take a look at the events logged during the session:

- **Text inputs:** These show up as text content right away and trigger events like `USER_TEXT_PARTIAL` or `InputTranscriptionCompleted`.
- **Voice inputs:** These go through speech-to-text (STT) processing and trigger events like `input_speech_started` and `input_speech_stopped`. The transcripts come in a bit later, asynchronously.

So, by checking the event types and how the content appears, you can easily spot whether the input was text or voice (see the sketch below). If you want, I'm happy to share more details or examples; just let me know!

Sources: `livekit-agents/livekit/agents/voice/agent_activity.py`
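As a sketch of that idea: rather than reconstructing the modality from the history afterwards, you could tag each user turn live as events arrive. The event names below (`user_state_changed`, `conversation_item_added`) are my assumptions about the `AgentSession` event emitter, not something confirmed in this thread; check your SDK version for the exact names and payloads:

```python
# App-maintained map: chat item id -> "voice" or "text".
modality_by_item_id: dict[str, str] = {}
user_was_speaking = False

@session.on("user_state_changed")
def _on_user_state(ev):
    # VAD-driven "speaking" states only ever fire for voice input.
    global user_was_speaking
    if ev.new_state == "speaking":
        user_was_speaking = True

@session.on("conversation_item_added")
def _on_item_added(ev):
    # Attribute the new user item to voice if speech preceded it.
    global user_was_speaking
    if ev.item.role == "user":
        modality_by_item_id[ev.item.id] = "voice" if user_was_speaking else "text"
        user_was_speaking = False
```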
**User:** The event type always shows 'message'. But I see that the voice inputs have an extra `transcript_confidence` field in the session history. Will this always be the case?
**AI:** Just a quick heads-up: the `transcript_confidence` field doesn't always show up in the session history. You'll usually see it for voice inputs that go through speech-to-text, but sometimes it might be missing. This can happen if:

- The voice input doesn't include confidence info.
- You're using manual turn detection mode, where this field isn't really needed.
- There are errors or some limits in the speech-to-text processing.

If you want me to explain anything more or help out with this, just let me know; I'm here for you!

Sources: `livekit-plugins/livekit-plugins-google/livekit/plugins/google/beta/realtime...`
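Because the field is optional, read it defensively when you walk the exported history (same assumed shape as before):

```python
history = session.history.to_dict()

for item in history.get("items", []):
    # .get() returns None when transcript_confidence is absent, so
    # text inputs and confidence-less voice inputs are both handled.
    confidence = item.get("transcript_confidence")
    if confidence is not None:
        print(f"{item['id']}: STT confidence {confidence:.2f}")
```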
**User:** How else do I differentiate between text and voice inputs from the session history?
**AI:** Hey there! If you want to tell apart text and voice inputs in your session history without using `transcript_confidence`, here are some handy ways to do it:

- **Event types:** Voice inputs come with speech events like `START_OF_SPEECH` and `END_OF_SPEECH`, which you won't see with text inputs.
- **Turn detection phases:** Voice inputs have phases like `USER_SPEAKING`, while text inputs skip these.
- **Audio buffer events:** Voice inputs include audio buffer events such as `input_audio_buffer.append`, but text inputs don't.
- **Transcription attributes:** You might spot attributes like `ATTR_TRANSCRIPTION_DELAY` with voice inputs, which aren't in text inputs.
- **Activity tracking:** Voice inputs trigger events like `input_speech_started` and `input_speech_stopped`.

If you'd like me to walk you through any of these or help set them up, just give me a shout! There's also a sketch below showing how to fold a tag like this back into the exported history.

Sources: `livekit-plugins/livekit-plugins-openai/livekit/plugins/openai/stt.py` | `agents/src/multimodal/multimodal_agent.ts`
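Since none of those signals end up in the history dict itself, the practical move is to record them yourself and merge them in afterwards. A sketch building on the live-tagging idea from earlier (`modality_by_item_id` is the app-maintained map from that sketch, not an SDK field):

```python
def history_with_modality(
    history: dict, modality_by_item_id: dict[str, str]
) -> dict:
    """Annotate exported user items with an app-defined "modality" key."""
    for item in history.get("items", []):
        if item.get("role") == "user":
            item["modality"] = modality_by_item_id.get(item["id"], "unknown")
    return history
```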
**User:** `session.history.to_dict()` gave the below output:

```
{
  "id": "item_a0c923a5d094",
  "type": "message",
  "role": "assistant",
  "content": ["Welcome to"],
  "interrupted": True
},
{
  "id": "item_1afcbd2e218f",
  "type": "message",
  "role": "user",
  "content": ["hello."],
  "interrupted": False,
  "transcript_confidence": 1.0
}
```

How do I differentiate the voice and text inputs?
**AI:** Hey there! To tell the difference between voice and text inputs in that output, you can look for the `transcript_confidence` field:

- **Voice inputs:** These will have the `transcript_confidence` field, like the user message showing `"transcript_confidence": 1.0`.
- **Text inputs:** These won't include the `transcript_confidence` field.

So in this example, the user input with `"transcript_confidence": 1.0` means it was a voice input (see the sketch below). If you want me to explain more or have any questions, just let me know; I'm happy to help!
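As a one-liner helper, that heuristic looks like this. It's a sketch only: as discussed next, the field isn't guaranteed on every voice input, so prefer an explicit app-side tag when you can:

```python
def looks_like_voice(item: dict) -> bool:
    """Heuristic: treat the presence of transcript_confidence as a voice marker."""
    return "transcript_confidence" in item
```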
**User:** But you said not all voice inputs will have `transcript_confidence`.
**AI:** You're absolutely right that not every voice input will have the `transcript_confidence` field. When that happens, you can still tell voice and text inputs apart by looking at a few other clues:

- **Event types:** Voice inputs usually come with speech-related events like `START_OF_SPEECH` or `END_OF_SPEECH`, which you won't see with text inputs.
- **Turn detection phases:** Voice inputs go through phases like `USER_SPEAKING`, whereas text inputs skip these.
- **Audio buffer events:** Voice inputs include audio buffer events such as `input_audio_buffer.append`, which text inputs don't have.
- **Transcription attributes:** Sometimes voice inputs have attributes like `ATTR_TRANSCRIPTION_DELAY`, even if `transcript_confidence` is missing.

Putting these together should help you confidently tell voice and text inputs apart, even without `transcript_confidence`. If you want, I'm here to help you dig deeper!
**User:** Will these be available in the dictionary? Could you give a list of all the values that I could find in the session history dictionary?
**AI:** Here's a quick rundown of what you'll usually find in the session history dictionary for each message:

- `id`: A unique ID that identifies the message.
- `type`: The kind of message it is, like "message" or other event types.
- `role`: Who sent it, either the "user" or the "assistant".
- `content`: The actual text or content of the message.
- `interrupted`: A yes/no flag showing whether the message got cut off.
- `transcript_confidence`: (Optional) How confident the system is about voice input accuracy.
- `timestamps`: (Optional) When the message was created or last updated.
- `attributes`: (Optional) Extra details like transcription delays or audio buffer info.

If you want me to explain any of these in more detail or have questions about specific parts, just let me know! A quick way to see which keys actually appear in your own sessions is sketched below.

Sources: Session recording and transcripts | LiveKit Docs | Workflows | LiveKit Docs
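Since the optional fields vary per item, the most reliable "list of values" is the one your own sessions produce. A small sketch to collect every key that actually appears:

```python
history = session.history.to_dict()

# Union of keys across all items; optional fields like
# transcript_confidence only show up where they were recorded.
seen_keys: set[str] = set()
for item in history.get("items", []):
    seen_keys.update(item.keys())
print(sorted(seen_keys))
```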
**User:** How do I add timestamps to the history?
**AI:** Just so you know, timestamps are automatically added to the session history for each message. Every item in the history has a `created_at` property that shows exactly when it was added. If you'd like to save the session history along with those timestamps, you can use a shutdown callback to export the history. Here's a simple example:
```python
import json
from datetime import datetime

# `ctx` (the JobContext) and `session` (the AgentSession) are assumed
# to be in scope, e.g. inside your agent's entrypoint.

async def write_transcript():
    current_date = datetime.now().strftime("%Y%m%d_%H%M%S")
    filename = f"/tmp/transcript_{ctx.room.name}_{current_date}.json"
    with open(filename, "w") as f:
        json.dump(session.history.to_dict(), f, indent=2)

ctx.add_shutdown_callback(write_transcript)
```

This will save the session history, timestamps included, into a JSON file. If you have any questions or want me to walk you through it, just let me know; I'm here to help!
**User:** Not to the filename. I meant the timestamp you mentioned in the previous message, as a dictionary key.
**AI:** Hey there! If you want timestamps as dictionary keys in your session history, you can do this by exporting `session.history` and re-keying each entry by its timestamp. Here's a simple example to guide you:
```python
import json
from datetime import datetime

# Grab the session history (`session` is your active AgentSession)
history = session.history.to_dict()

# Depending on the SDK version, to_dict() may wrap the messages in an
# "items" key; fall back to the raw value if it doesn't.
items = history.get("items", history) if isinstance(history, dict) else history

# Re-key the history by timestamp
timestamped_history = {}

for item in items:
    # Use the item's own timestamp if it has one...
    if "created_at" in item:
        timestamp = item["created_at"]
    else:
        # ...otherwise generate one from the current time. isoformat()
        # includes microseconds, which keeps generated keys unique.
        timestamp = datetime.now().isoformat()

    timestamped_history[timestamp] = item

# Save the re-keyed history to a file
with open("session_history.json", "w") as f:
    json.dump(timestamped_history, f, indent=2)
```

This way, each item in your session history is keyed by its timestamp, and any item without one gets a generated timestamp on the spot. One caveat: dictionary keys must be unique, so two items with an identical `created_at` value would collide; keep the original list around if that matters for your use case. If you want me to walk you through anything else or have questions, just let me know!

Sources: Session recording and transcripts | LiveKit Docs | VADEvent | LiveKit Agents | STTMetrics | LiveKit Agents