I have self-hosted livekit server, a self-hosted s...
# ask-ai
q
I have self-hosted livekit server, a self-hosted sip server. I would like to record conversation between user and agent, along with transcript text.
t
Thanks for reaching out, Rashed. I'm an AI assistant and I'm here to help. Give me a moment while I research this topic.
You can record conversations between users and agents—including those connected via your self-hosted SIP server—and obtain transcripts in the following ways: • Audio/Video Recording: Use LiveKit’s Egress feature for room composite recording. This records all audio and video in the room, including SIP participants. You can configure it to record audio-only, and have recordings automatically uploaded to a cloud storage provider such as Amazon S3, Google Cloud Storage, or Azure Blob Storage. Recording typically starts when the agent enters the room, but SIP calls can also be configured for auto-egress (automatic recording upon connection). • Text Transcript: For transcription, access real-time or automatic conversation transcripts using LiveKit's
llm_node
or
transcription_node
. Conversation history is available via the
session.history
property, which can be saved automatically when the call ends. You can also save transcripts by reacting to specific agent events (like
user_speech_committed
). Ensure both your LiveKit and SIP servers are interconnected (often using Redis) and that required ports are accessible. For real-time transcription display, use the transmitted transcription protocol over the frontend API. If you need a step-by-step guide (e.g., example code or detailed configuration), let me know which part of this setup you’d like more details on—audio recording, transcript saving, SIP integration, or another aspect. Sources: Session recording and transcripts | LiveKit Docs | Recording agent sessions | LiveKit Docs | Self-hosted SIP server | LiveKit Docs | AI Agents Framework | livekit/livekit_composite | DeepWiki | AI voice agents | LiveKit Docs
q
How the recorded conversation by egress which uploaded to gcp can be linked to the transcripts text on the agent side? In other words, how we know this recorded voice is for this transcript?
t
Hey there! Let me walk you through a super helpful way to keep your audio recordings and transcripts neatly organized and easy to match up. Think of it like labeling your boxes when you're moving – you want everything to have a clear, consistent identifier. Here's a friendly tip: when you're recording audio or video, use the same room name both in the filename and how you track things in your system. For example, you might save a recording like
recordings/room123_20240622_150200.ogg
and its matching transcript as
transcripts/room123_20240622_150200.json
. It's like giving each recording and transcript a matching name tag! Want to make things even more bulletproof? Here are some extra-credit suggestions: • Add some extra details like who was in the session, when it started and ended, and a unique session ID in both the filename and transcript. • Keep all your recordings and transcripts in the same organized folder in your Google Cloud Storage – think of it like a neat filing cabinet. • If you're feeling extra thorough, you could even create a small reference list that explicitly connects each audio file with its transcript. It's like creating a buddy system for your files! By using this approach, you'll always be able to find exactly which transcript goes with which recording, no matter when or where you're looking. Pretty neat, right? Sources: Session recording and transcripts | LiveKit Docs | AI voice agents | LiveKit Docs