# ask-ai
i
What's the recommended approach for saving transcriptions? Does LiveKit offer something off the shelf? If not, how can I add my own function to process and save transcriptions?
t
Thanks for reaching out, Ed. I'm an AI assistant and I'm here to help. Give me a moment while I research this topic.
LiveKit provides two main approaches for saving transcriptions:
1. Server-side (Agent Process): You can create a text log of conversations by listening for agent events and logging user and agent speech to a text file. For example, you can log messages when user speech is committed (`user_speech_committed`) and when the agent stops speaking (`agent_stopped_speaking`). You can refer to example code in Python for saving conversations to a text file here: Conversation Persistor Example.
2. Client-side: For frontend applications, LiveKit provides transcription capabilities through both `VoicePipelineAgent` and `MultimodalAgent`. These can forward transcriptions to clients automatically if you implement support for receiving them in your frontend.
The choice between these approaches depends on whether you need server-side or client-side transcription capabilities for your use case. Let me know if you need further assistance! Sources: Recording agent sessions | LiveKit Docs | Transcriptions | LiveKit Docs
i
With the 1.0 RC agents framework it looks like you could potentially just add this logic directly in the transcription node?
t
It seems that the 1.0 RC Agents framework does provide flexibility, and you could potentially add logic directly in the transcription node to process and save transcriptions. However, the available documentation does not provide specific details or examples for implementing this directly in the transcription node. If you are looking for guidance on how to proceed, I recommend reviewing the Python example for saving conversations to a text file, which might offer insights that can be adapted to your use case. You can find it here: Conversation Persistor Example. Let me know if you need further assistance or clarification!
i
The documentation provides this example (storing in a database, but I imagine saving to S3 or somewhere different would be a small change). Could you educate me on what the format of "delta" is?
```python
from typing import AsyncIterable


async def transcription_node(self, text: AsyncIterable[str]) -> AsyncIterable[str]:
    """Process the LLM output to transcriptions and store in database"""

    async def store_in_db(text_chunk: str):
        # Method to store the transcription in a database
        pass

    async for delta in text:
        # Store each chunk of text as it comes in
        await store_in_db(delta)

        # Forward the text chunk to the next node in the pipeline
        yield delta
```
t
The "delta" in the provided transcription node example is a string type. It represents an incremental text update for the transcript, as part of the audio transcript delta events in LiveKit. This allows you to process and store each chunk of text as it is received. If you have further questions or need additional clarification, feel free to ask! Sources: ResponseAudioTranscriptDeltaEvent | LiveKit Agents
i
Does LiveKit offer off-the-shelf transcription saving? I don't mean incremental transcripts that happen during the call, but full transcripts after a call ends.
t
LiveKit does not appear to offer an off-the-shelf solution specifically for saving full transcripts after a call ends. The available documentation primarily focuses on incremental transcription during the call and recording agent sessions. However, you could implement this functionality by aggregating the incremental transcriptions during the call and saving the complete transcript at the end. If you need guidance on how to set this up, feel free to let me know!
It seems that the documentation does not provide specific details about the accessibility of incremental transcripts. However, based on the general functionality, incremental transcripts are typically forwarded in real-time during the call and can be processed or stored as needed. If you have a specific use case or need further clarification, feel free to share more details, and I’ll do my best to assist!
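To illustrate the aggregation approach suggested above, here is a hedged sketch built on the 1.0-style `Agent`/`JobContext` API: collect chunks in the transcription node, then persist the full transcript from a shutdown callback when the job ends. `save_full_transcript` and its storage target are placeholders.

```python
from typing import AsyncIterable

from livekit.agents import Agent, JobContext


class TranscriptCollectingAgent(Agent):
    def __init__(self) -> None:
        super().__init__(instructions="You are a helpful assistant.")
        self.transcript_parts: list[str] = []

    # signature per the doc example quoted above; newer releases also pass a
    # ModelSettings argument, which the default of None accommodates
    async def transcription_node(self, text: AsyncIterable[str], model_settings=None) -> AsyncIterable[str]:
        async for delta in text:
            self.transcript_parts.append(delta)
            yield delta


async def entrypoint(ctx: JobContext):
    agent = TranscriptCollectingAgent()

    async def save_full_transcript():
        full_text = "".join(agent.transcript_parts)
        # write `full_text` to a file, S3, a database, ...
        ...

    ctx.add_shutdown_callback(save_full_transcript)
    # ... start the AgentSession with `agent` as usual
```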
i
For the incremental transcripts, is it feasible to figure out which participant they came from?
t
Yes, it is feasible to determine which participant incremental transcripts came from. LiveKit's transcription system includes a `participant` property in the `TranscriptionReceived` events. This property references the specific participant associated with the transcription, along with a `publication` property that references the track publication these transcriptions apply to. This information is available in both `RoomEvent.TranscriptionReceived` and `ParticipantEvent.TranscriptionReceived` event types, allowing you to identify the participant and access their details. Let me know if you need further assistance! Sources: TranscriptionReceived | TranscriptionReceived
i
In the new agents version, rc 1.0, how is update_chat_ctx used?
t
Unfortunately, I couldn't find specific information about the usage of `update_chat_ctx` in LiveKit Agents RC 1.0. The documentation does mention a `set_chat_ctx` method, but there are no details available for `update_chat_ctx`. If you have additional context or details about where this method is referenced, feel free to share, and I'll do my best to assist further!
i
What's the practical difference between generate_reply() and say()?
t
The practical difference between `generate_reply()` and `say()` in LiveKit agents is as follows:
• `generate_reply()`: This method generates a reply from the agent based on the current conversation context. It commits the audio buffer (if server voice activity detection is not enabled) and creates a response. It also includes an `on_duplicate` parameter to handle duplicate responses with options like "cancel_existing," "cancel_new," or "keep_both."
• `say()`: This method plays a specific speech source through the voice assistant. It accepts parameters such as `source` (e.g., a string or an iterable), `allowInterruptions` (to control whether the speech can be interrupted), and `addToChatCtx` (to determine if the speech is added to the chat context). It returns a `SpeechHandle` for managing the speech playback.
In summary, `generate_reply()` is used for generating agent responses, while `say()` is used for playing specific speech content. Let me know if you need further clarification! Sources: VoicePipelineAgent | LiveKit Agents | livekit.agents.multimodal API documentation
i
So while generate_reply() and say() both force the bot to say something immediately, generate_reply() is more abstract (give the agent instructions and let it come up with its own response) while say() tells the agent to say something exactly as written?
t
Yes, that's correct! While both `generate_reply()` and `say()` prompt the bot to say something immediately, their purposes differ:
• `generate_reply()`: This is more abstract. You provide the agent with instructions or context, and it generates its own response based on the conversation context and its logic.
• `say()`: This is more direct. You explicitly tell the agent exactly what to say, and it will play that specific content as written or provided.
Let me know if you have any further questions!
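A short sketch of the difference against the 1.0-style `AgentSession` API (parameter and keyword names may differ slightly between versions); in practice you would call one or the other, and both return a `SpeechHandle`:

```python
from livekit.agents import AgentSession


def wrap_up_call(session: AgentSession) -> None:
    # say(): the agent speaks exactly this sentence, word for word
    handle = session.say("Thanks for calling, goodbye!", allow_interruptions=False)

    # generate_reply(): the LLM composes its own wording from an instruction
    handle = session.generate_reply(instructions="Politely wrap up the call and say goodbye.")
```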
i
`update_chat_ctx` looks like it can be used to provide the agent more instructions during an ongoing session, is that correct? And just to be clear, it doesn't append to the existing context, it replaces the entire existing context, and that's why the user is first instructed to copy the existing context and add a message to it before using that new context object in `update_chat_ctx`?
t
It seems that the documentation does not explicitly confirm the behavior of `update_chat_ctx`. However, based on your description, it appears that `update_chat_ctx` replaces the entire existing context rather than appending to it. This would explain why the recommended approach is to first copy the existing context, add the new message or instructions, and then use the updated context object with `update_chat_ctx`. If you have further questions or need clarification, feel free to ask!
i
can you explain the difference between Agent and AgentSession?
t
The documentation does not provide specific details about the difference between `Agent` and `AgentSession`. However, based on general naming conventions, an `Agent` likely refers to the AI participant itself, while an `AgentSession` might represent a specific instance or session of interaction involving the agent. If you have additional context or specific references, feel free to share, and I'll do my best to assist further!
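For illustration, a minimal sketch of how the two classes typically relate in the 1.0 API (providers and instructions are placeholders): `Agent` defines the persona and behavior, while `AgentSession` runs the voice pipeline that drives it inside a room.

```python
from livekit.agents import Agent, AgentSession, JobContext


class Assistant(Agent):
    def __init__(self) -> None:
        super().__init__(instructions="You are a friendly voice assistant.")


async def entrypoint(ctx: JobContext):
    await ctx.connect()
    session = AgentSession(
        # stt=..., llm=..., tts=..., vad=...  # plug in your providers here
    )
    await session.start(agent=Assistant(), room=ctx.room)
```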
i
For metrics collected with `usage_collector = metrics.UsageCollector()`, what's the purpose of them - are they displayed in the LiveKit Cloud dashboard?
t
The documentation does not provide specific details about the purpose of metrics collected with `usage_collector = metrics.UsageCollector()` or whether they are displayed in the LiveKit Cloud dashboard. It is possible that these metrics are intended for custom monitoring or analytics purposes rather than direct integration with the dashboard. If you have additional context or specific requirements, feel free to share, and I'll do my best to assist further!
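For reference, the typical wiring looks like the sketch below (1.0-style API): the collector aggregates per-turn metrics locally so you can log or export a usage summary yourself, for example from a shutdown callback.

```python
import logging

from livekit.agents import AgentSession, JobContext, MetricsCollectedEvent, metrics

logger = logging.getLogger("usage")


def wire_usage_metrics(ctx: JobContext, session: AgentSession) -> None:
    usage_collector = metrics.UsageCollector()

    @session.on("metrics_collected")
    def _on_metrics(ev: MetricsCollectedEvent):
        metrics.log_metrics(ev.metrics)      # per-turn logging
        usage_collector.collect(ev.metrics)  # running aggregate

    async def log_usage():
        summary = usage_collector.get_summary()
        logger.info("session usage summary: %s", summary)

    ctx.add_shutdown_callback(log_usage)
```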
i
Does the transcription_enabled option of the RoomOutputOptions when passed to session.start() impact the transcription node of the agent pipeline or just whether or not a frontend using the room has access to transcription?
t
The documentation does not specify whether the `transcription_enabled` option in `RoomOutputOptions` impacts the transcription node of the agent pipeline or only controls frontend access to transcription. It is possible that this option is primarily intended for enabling or disabling transcription availability for frontends using the room. If you need further clarification or have additional details, feel free to share, and I'll do my best to assist!
i
It seems like it might be possible to get SIP status callbacks (let's say from Twilio) through LiveKit itself, is that correct?
t
The documentation does not provide specific details about receiving SIP status callbacks (e.g., from Twilio) directly through LiveKit. While LiveKit supports SIP integration for connecting Twilio conferencing, it is unclear if status callbacks are natively handled by LiveKit or if they need to be managed separately through Twilio's callback mechanisms. If you have additional context or specific requirements, feel free to share, and I’ll do my best to assist further!
i
I'm talking about this example found in the documentation:
```python
start_time = perf_counter()
while perf_counter() - start_time < 30:
    call_status = participant.attributes.get("sip.callStatus")
    if call_status == "active":
        logger.info("user has picked up")
        return
    elif call_status == "automation":
        # if DTMF is used in the `sip_call_to` number, typically used to dial
        # an extension or enter a PIN.
        # during DTMF dialing, the participant will be in the "automation" state
        pass
    elif call_status == "hangup":
        # user hung up, we'll exit the job
        logger.info("user hung up, exiting job")
        break
    await asyncio.sleep(0.1)

logger.info("session timed out, exiting job")
ctx.shutdown()
```
t
Thank you for providing the example! Based on this, it appears that LiveKit can indeed access SIP status information (e.g., `callStatus`) through participant attributes. This allows you to monitor the SIP call's state, such as "active," "automation," or "hangup," and take appropriate actions based on the status. This functionality suggests that LiveKit can handle SIP status updates internally, potentially eliminating the need for separate callback mechanisms from Twilio. Let me know if you need further clarification or assistance with implementing this!
i
I'm wondering if we can access these via callbacks rather than having this inefficient while loop.
t
The documentation does not specify whether SIP status updates can be accessed via callbacks instead of using a polling loop. Based on the example provided, it seems that polling is the suggested approach for monitoring SIP call statuses. If callbacks are critical for your use case, you might need to explore custom implementations or integrations to handle SIP status updates more efficiently. Let me know if you'd like further assistance or have additional details to share!
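One event-driven alternative worth trying is sketched below. It assumes the Python `livekit.rtc` SDK emits a `participant_attributes_changed` event when the `sip.callStatus` attribute updates; verify the event name and callback signature against your SDK version.

```python
import asyncio

from livekit import rtc


async def wait_for_pickup(room: rtc.Room, timeout: float = 30.0) -> bool:
    """Return True once sip.callStatus becomes 'active', False on timeout."""
    picked_up = asyncio.Event()

    @room.on("participant_attributes_changed")
    def on_attrs(changed: dict[str, str], participant: rtc.Participant):
        # only the attributes that changed are delivered in `changed`
        if changed.get("sip.callStatus") == "active":
            picked_up.set()

    try:
        await asyncio.wait_for(picked_up.wait(), timeout)
        return True
    except asyncio.TimeoutError:
        return False
```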
i
Is it a good idea to shut down an agent once a call ends? Otherwise, if it stays active, can it take new jobs, and will its context be cleaned for those new jobs?
t
The documentation does not explicitly state whether an agent should be shut down after a call ends. However, here are some considerations:
• If the agent remains active, it may be able to take on new jobs. It is important to ensure that its context is properly cleaned or reset before starting a new job to avoid any carryover from the previous session.
• Shutting down the agent after a call ends can help free up resources and ensure a clean state for future sessions. You can also use shutdown callbacks to save any necessary user state before shutting down.
If you have specific requirements or concerns, feel free to share, and I'll do my best to assist further! Sources: Inside a session | LiveKit Docs | Anatomy of an Agent | LiveKit Docs
i
Is there an event I can use to determine when a participant left, in order to shut down the agent?
t
Yes, LiveKit provides events to detect when a participant leaves a room. The specific event depends on the SDK you are using:
• Flutter SDK: The `ParticipantDisconnectedEvent` is emitted by the `Room` class when a `RemoteParticipant` leaves the room.
• Swift SDK: You can implement the delegate method `room(_:participantDidDisconnect:)`, which is called when a `RemoteParticipant` leaves the room.
• Android SDK: The `ParticipantDisconnected` event is triggered when a `RemoteParticipant` leaves after the local participant has joined.
You can use these events to determine when to shut down the agent. Let me know if you need further assistance! Sources: ParticipantDisconnectedEvent class - livekit_client library - Dart API | room(_:participantDidDisconnect:) | Documentation | ParticipantDisconnected
i
No, I mean in the Python server SDK.
t
The documentation does not provide specific details about an event in the Python server SDK to detect when a participant leaves. However, similar functionality exists in other SDKs, so it is likely that the Python server SDK has a comparable mechanism. If you have additional context or need further assistance, feel free to share, and I’ll do my best to help!
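For the Python side, a minimal sketch of the equivalent pattern (assuming the `participant_disconnected` event exposed on `livekit.rtc.Room`, accessed here via the job context):

```python
from livekit import rtc
from livekit.agents import JobContext


def shutdown_when_caller_leaves(ctx: JobContext) -> None:
    @ctx.room.on("participant_disconnected")
    def on_disconnect(participant: rtc.RemoteParticipant):
        # in a 1:1 call there is only one remote participant, so any
        # disconnect means the caller is gone and the job can end
        ctx.shutdown(reason=f"{participant.identity} left the room")
```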
i
From LiveKit's docs: Webhook requests are HTTP POST requests sent to URLs that you specify in your config or Cloud dashboard. A `WebhookEvent` is encoded as JSON and sent in the body of the request. The `Content-Type` header of the request is set to `application/webhook+json`. Please ensure your webserver is configured to receive payloads with this content type. In order to ensure webhook requests are coming from LiveKit, these requests have an `Authorization` header containing a signed JWT token. The token includes a sha256 hash of the payload. LiveKit's server SDKs provide webhook receiver libraries which should help with validation and decoding of the payload.
I want to use webhooks like this, for example to know when a participant left the room, to trigger a shutdown for my agent. However, given that the agent has to be deployed with cli.run_app, does that mean it can't run in a server and obtain these webhooks?
t
Thank you for providing the details! You can indeed use webhooks to detect events like when a participant leaves the room and trigger actions such as shutting down your agent. However, since the agent is deployed using `cli.run_app`, it might not be running as a traditional server capable of receiving HTTP POST requests directly. To work around this, you could deploy a separate server to handle the webhooks. This server would receive the webhook events, validate and decode the payload using LiveKit's server SDKs, and then communicate with your agent (e.g., via an API call or message queue) to trigger the shutdown or other actions. Let me know if you need further guidance on setting this up!
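A sketch of that separate webhook server (FastAPI is just one option; `WebhookReceiver` and `TokenVerifier` come from the `livekit-api` Python package, so check the exact constructor arguments against your installed version):

```python
import os

from fastapi import FastAPI, Request
from livekit.api import TokenVerifier, WebhookReceiver

app = FastAPI()
receiver = WebhookReceiver(
    TokenVerifier(os.environ["LIVEKIT_API_KEY"], os.environ["LIVEKIT_API_SECRET"])
)


@app.post("/livekit/webhook")
async def livekit_webhook(request: Request):
    body = (await request.body()).decode("utf-8")
    auth = request.headers.get("Authorization", "")
    event = receiver.receive(body, auth)  # validates the signed JWT and payload hash
    if event.event == "participant_left":
        # notify the agent (API call, message queue, data message, ...) to shut down
        ...
    return {"ok": True}
```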
i
How would I use an API call to communicate with my agent?
t
The documentation does not provide specific details on how to communicate with a LiveKit agent using API calls. However, one possible approach is to expose an API endpoint within your agent application (if feasible) or use an intermediary server to relay commands to the agent. For example, you could set up a REST or WebSocket API in your agent application to listen for specific commands (e.g., "shutdown"). Your webhook handler could then send an API request to this endpoint to trigger the desired action in the agent. If you'd like more detailed guidance on implementing this, feel free to share additional details about your setup!
i
What's the process for communicating with already deployed agents (sending them messages/info, etc.)?
t
The documentation does not provide specific details about the process for communicating with already deployed agents. However, in general, communication with deployed agents could involve:
• Using APIs or WebSocket connections if the agent exposes endpoints for receiving messages or commands.
• Leveraging LiveKit's server SDKs or event mechanisms to send information to the agent indirectly.
• Implementing a message queue or intermediary service to relay messages to the agent.
If you have more details about your use case or setup, I'd be happy to help explore potential solutions!
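As an illustration of the event-mechanism option above: one common pattern is to send information to the agent over the room itself via data messages, which any backend holding a server token can publish. On the agent side it might look like this hedged sketch (the JSON schema and `shutdown` command are made up for the example; check the `data_received` callback signature against your `livekit.rtc` version):

```python
import json

from livekit import rtc
from livekit.agents import JobContext


def listen_for_commands(ctx: JobContext) -> None:
    @ctx.room.on("data_received")
    def on_data(packet: rtc.DataPacket):
        # payload format is entirely up to you; JSON with a "type" field is one option
        message = json.loads(packet.data)
        if message.get("type") == "shutdown":
            ctx.shutdown(reason="shutdown requested via data message")
```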