# ask-ai
i
What's the recommended approach for saving transcriptions? Does LiveKit offer something off the shelf? If not, how can I add my own function to process and save transcriptions?
t
Thanks for reaching out, Ed. I'm an AI assistant and I'm here to help. Give me a moment while I research this topic.
LiveKit provides two main approaches for saving transcriptions:
1. Server-side (Agent Process): You can create a text log of conversations by listening for agent events and logging user and agent speech to a text file. For example, you can log messages when user speech is committed (`user_speech_committed`) and when the agent stops speaking (`agent_stopped_speaking`). You can refer to example code in Python for saving conversations to a text file here: Conversation Persistor Example.
2. Client-side: For frontend applications, LiveKit provides transcription capabilities through both `VoicePipelineAgent` and `MultimodalAgent`. These can forward transcriptions to clients automatically if you implement support for receiving them in your frontend.
The choice between these approaches depends on whether you need server-side or client-side transcription capabilities for your use case. Let me know if you need further assistance! Sources: Recording agent sessions | LiveKit Docs | Transcriptions | LiveKit Docs
i
With the 1.0 RC agents framework it looks like you could potentially just add this logic directly in the transcription node?
t
It seems that the 1.0 RC Agents framework does provide flexibility, and you could potentially add logic directly in the transcription node to process and save transcriptions. However, the available documentation does not provide specific details or examples for implementing this directly in the transcription node. If you are looking for guidance on how to proceed, I recommend reviewing the Python example for saving conversations to a text file, which might offer insights that can be adapted to your use case. You can find it here: Conversation Persistor Example. Let me know if you need further assistance or clarification!
i
The documentation provides this example (storing in a database, but I imagine saving to S3 or somewhere different would be a small change). Could you educate me on what the format of "delta" is?
```python
from typing import AsyncIterable


async def transcription_node(self, text: AsyncIterable[str]) -> AsyncIterable[str]:
    """Process the LLM output to transcriptions and store in database"""

    async def store_in_db(text_chunk: str):
        # Method to store the transcription in a database
        pass

    async for delta in text:
        # Store each chunk of text as it comes in
        await store_in_db(delta)

        # Forward the text chunk to the next node in the pipeline
        yield delta
```
t
The "delta" in the provided transcription node example is a string type. It represents an incremental text update for the transcript, as part of the audio transcript delta events in LiveKit. This allows you to process and store each chunk of text as it is received. If you have further questions or need additional clarification, feel free to ask! Sources: ResponseAudioTranscriptDeltaEvent | LiveKit Agents
i
Does LiveKit offer off-the-shelf transcription saving? I don't mean incremental transcripts that happen during the call, but full transcripts after a call ends.
t
LiveKit does not appear to offer an off-the-shelf solution specifically for saving full transcripts after a call ends. The available documentation primarily focuses on incremental transcription during the call and recording agent sessions. However, you could implement this functionality by aggregating the incremental transcriptions during the call and saving the complete transcript at the end. If you need guidance on how to set this up, feel free to let me know!
It seems that the documentation does not provide specific details about the accessibility of incremental transcripts. However, based on the general functionality, incremental transcripts are typically forwarded in real-time during the call and can be processed or stored as needed. If you have a specific use case or need further clarification, feel free to share more details, and I’ll do my best to assist!
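To illustrate the aggregation approach suggested above, here is a hedged sketch built on the 1.0-style `Agent`/`JobContext` API: collect chunks in the transcription node, then persist the full transcript from a shutdown callback when the job ends. `save_full_transcript` and its storage target are placeholders.

```python
from typing import AsyncIterable

from livekit.agents import Agent, JobContext


class TranscriptCollectingAgent(Agent):
    def __init__(self) -> None:
        super().__init__(instructions="You are a helpful assistant.")
        self.transcript_parts: list[str] = []

    # signature per the doc example quoted above; newer releases also pass a
    # ModelSettings argument, which the default of None accommodates
    async def transcription_node(self, text: AsyncIterable[str], model_settings=None) -> AsyncIterable[str]:
        async for delta in text:
            self.transcript_parts.append(delta)
            yield delta


async def entrypoint(ctx: JobContext):
    agent = TranscriptCollectingAgent()

    async def save_full_transcript():
        full_text = "".join(agent.transcript_parts)
        # write `full_text` to a file, S3, a database, ...
        ...

    ctx.add_shutdown_callback(save_full_transcript)
    # ... start the AgentSession with `agent` as usual
```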
i
For the incremental transcripts, is it feasible to figure out which participant they came from?
t
Yes, it is feasible to determine which participant incremental transcripts came from. LiveKit's transcription system includes a `participant` property in the `TranscriptionReceived` events. This property references the specific participant associated with the transcription, along with a `publication` property that references the track publication these transcriptions apply to. This information is available in both `RoomEvent.TranscriptionReceived` and `ParticipantEvent.TranscriptionReceived` event types, allowing you to identify the participant and access their details. Let me know if you need further assistance! Sources: TranscriptionReceived | TranscriptionReceived
i
In the new agents version, rc 1.0, how is update_chat_ctx used?
t
Unfortunately, I couldn't find specific information about the usage of `update_chat_ctx` in LiveKit Agents RC 1.0. The documentation does mention a `set_chat_ctx` method, but there are no details available for `update_chat_ctx`. If you have additional context or details about where this method is referenced, feel free to share, and I'll do my best to assist further!
i
What's the practical difference between generate_reply() and say()?
t
The practical difference between `generate_reply()` and `say()` in LiveKit agents is as follows:
• `generate_reply()`: This method generates a reply from the agent based on the current conversation context. It commits the audio buffer (if server voice activity detection is not enabled) and creates a response. It also includes an `on_duplicate` parameter to handle duplicate responses with options like "cancel_existing," "cancel_new," or "keep_both."
• `say()`: This method plays a specific speech source through the voice assistant. It accepts parameters such as `source` (e.g., a string or an iterable), `allowInterruptions` (to control whether the speech can be interrupted), and `addToChatCtx` (to determine if the speech is added to the chat context). It returns a `SpeechHandle` for managing the speech playback.
In summary, `generate_reply()` is used for generating agent responses, while `say()` is used for playing specific speech content. Let me know if you need further clarification! Sources: VoicePipelineAgent | LiveKit Agents | livekit.agents.multimodal API documentation
i
So while generate_reply() and say() both force the bot to say something immediately, generate_reply() is more abstract (give the agent instructions and let it come up with its own response) while say() tells the agent to say something exactly as written?
t
Yes, that's correct! While both `generate_reply()` and `say()` prompt the bot to say something immediately, their purposes differ:
• `generate_reply()`: This is more abstract. You provide the agent with instructions or context, and it generates its own response based on the conversation context and its logic.
• `say()`: This is more direct. You explicitly tell the agent exactly what to say, and it will play that specific content as written or provided.
Let me know if you have any further questions!
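A short sketch of the difference against the 1.0-style `AgentSession` API (parameter and keyword names may differ slightly between versions); in practice you would call one or the other, and both return a `SpeechHandle`:

```python
from livekit.agents import AgentSession


def wrap_up_call(session: AgentSession) -> None:
    # say(): the agent speaks exactly this sentence, word for word
    handle = session.say("Thanks for calling, goodbye!", allow_interruptions=False)

    # generate_reply(): the LLM composes its own wording from an instruction
    handle = session.generate_reply(instructions="Politely wrap up the call and say goodbye.")
```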
i
`update_chat_ctx` looks like it can be used to provide the agent more instructions during an ongoing session, is that correct? And just to be clear, it doesn't append to the existing context, it replaces the entire existing context, and that's why the user is first instructed to copy the existing context and add a message to it before using that new context object in `update_chat_ctx`?
t
It seems that the documentation does not explicitly confirm the behavior of `update_chat_ctx`. However, based on your description, it appears that `update_chat_ctx` replaces the entire existing context rather than appending to it. This would explain why the recommended approach is to first copy the existing context, add the new message or instructions, and then use the updated context object with `update_chat_ctx`. If you have further questions or need clarification, feel free to ask!
i
can you explain the difference between Agent and AgentSession?
t
The documentation does not provide specific details about the difference between `Agent` and `AgentSession`. However, based on general naming conventions, an `Agent` likely refers to the AI participant itself, while an `AgentSession` might represent a specific instance or session of interaction involving the agent. If you have additional context or specific references, feel free to share, and I'll do my best to assist further!
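For illustration, a minimal sketch of how the two classes typically relate in the 1.0 API (providers and instructions are placeholders): `Agent` defines the persona and behavior, while `AgentSession` runs the voice pipeline that drives it inside a room.

```python
from livekit.agents import Agent, AgentSession, JobContext


class Assistant(Agent):
    def __init__(self) -> None:
        super().__init__(instructions="You are a friendly voice assistant.")


async def entrypoint(ctx: JobContext):
    await ctx.connect()
    session = AgentSession(
        # stt=..., llm=..., tts=..., vad=...  # plug in your providers here
    )
    await session.start(agent=Assistant(), room=ctx.room)
```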
i
For metrics collected with `usage_collector = metrics.UsageCollector()`, what's the purpose of them - are they displayed in the LiveKit Cloud dashboard?
t
The documentation does not provide specific details about the purpose of metrics collected with `usage_collector = metrics.UsageCollector()` or whether they are displayed in the LiveKit Cloud dashboard. It is possible that these metrics are intended for custom monitoring or analytics purposes rather than direct integration with the dashboard. If you have additional context or specific requirements, feel free to share, and I'll do my best to assist further!
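For reference, the typical wiring looks like the sketch below (1.0-style API): the collector aggregates per-turn metrics locally so you can log or export a usage summary yourself, for example from a shutdown callback.

```python
import logging

from livekit.agents import AgentSession, JobContext, MetricsCollectedEvent, metrics

logger = logging.getLogger("usage")


def wire_usage_metrics(ctx: JobContext, session: AgentSession) -> None:
    usage_collector = metrics.UsageCollector()

    @session.on("metrics_collected")
    def _on_metrics(ev: MetricsCollectedEvent):
        metrics.log_metrics(ev.metrics)      # per-turn logging
        usage_collector.collect(ev.metrics)  # running aggregate

    async def log_usage():
        summary = usage_collector.get_summary()
        logger.info("session usage summary: %s", summary)

    ctx.add_shutdown_callback(log_usage)
```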
i
Does the transcription_enabled option of the RoomOutputOptions when passed to session.start() impact the transcription node of the agent pipeline or just whether or not a frontend using the room has access to transcription?
t
The documentation does not specify whether the `transcription_enabled` option in `RoomOutputOptions` impacts the transcription node of the agent pipeline or only controls frontend access to transcription. It is possible that this option is primarily intended for enabling or disabling transcription availability for frontends using the room. If you need further clarification or have additional details, feel free to share, and I'll do my best to assist!
i
It seems like it might be possible to get SIP status callbacks (let's say from Twilio) through LiveKit itself, is that correct?
t
The documentation does not provide specific details about receiving SIP status callbacks (e.g., from Twilio) directly through LiveKit. While LiveKit supports SIP integration for connecting Twilio conferencing, it is unclear if status callbacks are natively handled by LiveKit or if they need to be managed separately through Twilio's callback mechanisms. If you have additional context or specific requirements, feel free to share, and I’ll do my best to assist further!
i
I'm talking about this example found in the documentation:
```python
start_time = perf_counter()
while perf_counter() - start_time < 30:
    call_status = participant.attributes.get("sip.callStatus")
    if call_status == "active":
        logger.info("user has picked up")
        return
    elif call_status == "automation":
        # if DTMF is used in the `sip_call_to` number, typically used to dial
        # an extension or enter a PIN.
        # during DTMF dialing, the participant will be in the "automation" state
        pass
    elif call_status == "hangup":
        # user hung up, we'll exit the job
        logger.info("user hung up, exiting job")
        break
    await asyncio.sleep(0.1)

logger.info("session timed out, exiting job")
ctx.shutdown()
```
t
Thank you for providing the example! Based on this, it appears that LiveKit can indeed access SIP status information (e.g., `callStatus`) through participant attributes. This allows you to monitor the SIP call's state, such as "active," "automation," or "hangup," and take appropriate actions based on the status. This functionality suggests that LiveKit can handle SIP status updates internally, potentially eliminating the need for separate callback mechanisms from Twilio. Let me know if you need further clarification or assistance with implementing this!
i
I'm wondering if we can access these via callbacks rather than having this inefficient while loop.
t
The documentation does not specify whether SIP status updates can be accessed via callbacks instead of using a polling loop. Based on the example provided, it seems that polling is the suggested approach for monitoring SIP call statuses. If callbacks are critical for your use case, you might need to explore custom implementations or integrations to handle SIP status updates more efficiently. Let me know if you'd like further assistance or have additional details to share!
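One event-driven alternative worth trying is sketched below. It assumes the Python `livekit.rtc` SDK emits a `participant_attributes_changed` event when the `sip.callStatus` attribute updates; verify the event name and callback signature against your SDK version.

```python
import asyncio

from livekit import rtc


async def wait_for_pickup(room: rtc.Room, timeout: float = 30.0) -> bool:
    """Return True once sip.callStatus becomes 'active', False on timeout."""
    picked_up = asyncio.Event()

    @room.on("participant_attributes_changed")
    def on_attrs(changed: dict[str, str], participant: rtc.Participant):
        # only the attributes that changed are delivered in `changed`
        if changed.get("sip.callStatus") == "active":
            picked_up.set()

    try:
        await asyncio.wait_for(picked_up.wait(), timeout)
        return True
    except asyncio.TimeoutError:
        return False
```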
i
Is it a good idea to shut down an agent once a call ends? Otherwise, if it stays active, can it take new jobs, and will its context be cleaned for those new jobs?
t
The documentation does not explicitly state whether an agent should be shut down after a call ends. However, here are some considerations:
• If the agent remains active, it may be able to take on new jobs. It is important to ensure that its context is properly cleaned or reset before starting a new job to avoid any carryover from the previous session.
• Shutting down the agent after a call ends can help free up resources and ensure a clean state for future sessions. You can also use shutdown callbacks to save any necessary user state before shutting down.
If you have specific requirements or concerns, feel free to share, and I'll do my best to assist further! Sources: Inside a session | LiveKit Docs | Anatomy of an Agent | LiveKit Docs
i
Is there an event I can use to determine when a participant left, in order to shut down the agent?
t
Yes, LiveKit provides events to detect when a participant leaves a room. The specific event depends on the SDK you are using:
• Flutter SDK: The `ParticipantDisconnectedEvent` is emitted by the `Room` class when a `RemoteParticipant` leaves the room.
• Swift SDK: You can implement the delegate method `room(_:participantDidDisconnect:)`, which is called when a `RemoteParticipant` leaves the room.
• Android SDK: The `ParticipantDisconnected` event is triggered when a `RemoteParticipant` leaves after the local participant has joined.
You can use these events to determine when to shut down the agent. Let me know if you need further assistance! Sources: ParticipantDisconnectedEvent class - livekit_client library - Dart API | room(_:participantDidDisconnect:) | Documentation | ParticipantDisconnected
i
No, I mean in the Python server SDK.
t
The documentation does not provide specific details about an event in the Python server SDK to detect when a participant leaves. However, similar functionality exists in other SDKs, so it is likely that the Python server SDK has a comparable mechanism. If you have additional context or need further assistance, feel free to share, and I’ll do my best to help!
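For the Python side, a minimal sketch of the equivalent pattern (assuming the `participant_disconnected` event exposed on `livekit.rtc.Room`, accessed here via the job context):

```python
from livekit import rtc
from livekit.agents import JobContext


def shutdown_when_caller_leaves(ctx: JobContext) -> None:
    @ctx.room.on("participant_disconnected")
    def on_disconnect(participant: rtc.RemoteParticipant):
        # in a 1:1 call there is only one remote participant, so any
        # disconnect means the caller is gone and the job can end
        ctx.shutdown(reason=f"{participant.identity} left the room")
```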
i
From LiveKit's docs: Webhook requests are HTTP POST requests sent to URLs that you specify in your config or Cloud dashboard. A `WebhookEvent` is encoded as JSON and sent in the body of the request. The `Content-Type` header of the request is set to `application/webhook+json`. Please ensure your webserver is configured to receive payloads with this content type. In order to ensure webhook requests are coming from LiveKit, these requests have an `Authorization` header containing a signed JWT token. The token includes a sha256 hash of the payload. LiveKit's server SDKs provide webhook receiver libraries which should help with validation and decoding of the payload.
I want to use webhooks like this, for example to know when a participant left the room, to trigger a shutdown for my agent. However, given that the agent has to be deployed with cli.run_app, does that mean it can't run in a server and obtain these webhooks?
t
Thank you for providing the details! You can indeed use webhooks to detect events like when a participant leaves the room and trigger actions such as shutting down your agent. However, since the agent is deployed using `cli.run_app`, it might not be running as a traditional server capable of receiving HTTP POST requests directly. To work around this, you could deploy a separate server to handle the webhooks. This server would receive the webhook events, validate and decode the payload using LiveKit's server SDKs, and then communicate with your agent (e.g., via an API call or message queue) to trigger the shutdown or other actions. Let me know if you need further guidance on setting this up!
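A sketch of that separate webhook server (FastAPI is just one option; `WebhookReceiver` and `TokenVerifier` come from the `livekit-api` Python package, so check the exact constructor arguments against your installed version):

```python
import os

from fastapi import FastAPI, Request
from livekit.api import TokenVerifier, WebhookReceiver

app = FastAPI()
receiver = WebhookReceiver(
    TokenVerifier(os.environ["LIVEKIT_API_KEY"], os.environ["LIVEKIT_API_SECRET"])
)


@app.post("/livekit/webhook")
async def livekit_webhook(request: Request):
    body = (await request.body()).decode("utf-8")
    auth = request.headers.get("Authorization", "")
    event = receiver.receive(body, auth)  # validates the signed JWT and payload hash
    if event.event == "participant_left":
        # notify the agent (API call, message queue, data message, ...) to shut down
        ...
    return {"ok": True}
```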
i
How would I use an API call to communicate with my agent?
t
The documentation does not provide specific details on how to communicate with a LiveKit agent using API calls. However, one possible approach is to expose an API endpoint within your agent application (if feasible) or use an intermediary server to relay commands to the agent. For example, you could set up a REST or WebSocket API in your agent application to listen for specific commands (e.g., "shutdown"). Your webhook handler could then send an API request to this endpoint to trigger the desired action in the agent. If you'd like more detailed guidance on implementing this, feel free to share additional details about your setup!
i
What's the process for communicating with already deployed agents (sending them messages/info, etc.)?
t
The documentation does not provide specific details about the process for communicating with already deployed agents. However, in general, communication with deployed agents could involve:
• Using APIs or WebSocket connections if the agent exposes endpoints for receiving messages or commands.
• Leveraging LiveKit's server SDKs or event mechanisms to send information to the agent indirectly.
• Implementing a message queue or intermediary service to relay messages to the agent.
If you have more details about your use case or setup, I'd be happy to help explore potential solutions!
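As an illustration of the event-mechanism option above: one common pattern is to send information to the agent over the room itself via data messages, which any backend holding a server token can publish. On the agent side it might look like this hedged sketch (the JSON schema and `shutdown` command are made up for the example; check the `data_received` callback signature against your `livekit.rtc` version):

```python
import json

from livekit import rtc
from livekit.agents import JobContext


def listen_for_commands(ctx: JobContext) -> None:
    @ctx.room.on("data_received")
    def on_data(packet: rtc.DataPacket):
        # payload format is entirely up to you; JSON with a "type" field is one option
        message = json.loads(packet.data)
        if message.get("type") == "shutdown":
            ctx.shutdown(reason="shutdown requested via data message")
```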