Hi I want customer agent conversation audio recording and it LiveKit Community #ask-ai


future-cpu-86293

07/11/2025, 1:05 PM

Hi, I want to record customer-agent conversation audio and transcribe it. I'd like to use Deepgram's API to get word-level timestamps along with speaker IDs, but Deepgram isn't reliable here because it doesn't detect the speakers correctly. My idea is to record the agent and customer audio in separate channels, since Deepgram can provide transcription with word-level timestamps per channel. Can anyone help me record agent and customer audio in separate channels? Or is there another way to get a reliable transcription?

tall-airline-16756

07/11/2025, 1:05 PM

Thanks for reaching out, Maneesh. I'm an AI assistant and I'm here to help. Give me a moment while I research this topic.

tall-airline-16756

07/11/2025, 1:06 PM

You can record agent and customer audio in separate channels using LiveKit's dual-channel composite recording feature. When starting a room composite egress, set `audio_only` to true and set the `audio_mixing` mode to `DUAL_CHANNEL_AGENT`. This places the agent's audio in the left channel and all other (customer) audio in the right channel. Here is a Python example:

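```python
import asyncio

from livekit import api


async def main() -> None:
    req = api.RoomCompositeEgressRequest(
        room_name="your-room-name",
        audio_only=True,  # dual-channel mixing only applies to audio-only egress
        audio_mixing=api.AudioMixing.DUAL_CHANNEL_AGENT,  # agent left, customers right
        file_outputs=[api.EncodedFileOutput(
            file_type=api.EncodedFileType.OGG,
            filepath="recording.ogg",
            # ... your storage configuration
        )],
    )

    # Reads LIVEKIT_URL, LIVEKIT_API_KEY and LIVEKIT_API_SECRET from the environment.
    lkapi = api.LiveKitAPI()
    try:
        res = await lkapi.egress.start_room_composite_egress(req)
        print("egress started:", res.egress_id)
    finally:
        await lkapi.aclose()


asyncio.run(main())
```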

This approach creates an OGG file with two channels, so Deepgram can transcribe each channel separately (its multichannel option) and return word-level timestamps per channel.

Alternatively, if you want to record the agent and customer completely separately (as individual files), you can use Participant Egress: start one egress per participant, specifying that participant's identity in the API request. This produces one file per participant.

Either approach lets you transcribe each speaker on their own audio, so you no longer depend on Deepgram's speaker detection. Let me know if you need step-by-step guidance for a specific method or integration!

Sources: livekit-protocol/src/livekit.rs | src/main/kotlin/io/livekit/server/EgressServiceClient.kt | Recording Participants | LiveKit Docs | livekit-agents/livekit/agents/voice/room_io/room_io.py | How to set a fixed participant for agent audio processing | Session recording and transcripts | LiveKit Docs
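As a rough sketch of the transcription side of the dual-channel approach: assuming the OGG file is sent to Deepgram's pre-recorded `/v1/listen` endpoint with `multichannel=true`, each audio channel comes back with its own word list, which can then be merged into one speaker-labelled timeline. The channel-to-speaker mapping (channel 0 = agent, channel 1 = customer, following the `DUAL_CHANNEL_AGENT` layout), the helper names `transcribe_file` and `merge_channels`, and the use of the standard library's `urllib` instead of Deepgram's own SDK are all assumptions for illustration.

```python
import json
import urllib.request

# Deepgram pre-recorded endpoint with one transcript per audio channel.
DEEPGRAM_URL = "https://api.deepgram.com/v1/listen?multichannel=true&punctuate=true"

# Assumption: left channel (index 0) is the agent, right channel (index 1) the customer.
CHANNEL_SPEAKERS = {0: "agent", 1: "customer"}


def transcribe_file(path: str, api_key: str) -> dict:
    """Hypothetical helper: POST a local OGG file to Deepgram and return the JSON response."""
    with open(path, "rb") as f:
        audio = f.read()
    req = urllib.request.Request(
        DEEPGRAM_URL,
        data=audio,
        headers={"Authorization": f"Token {api_key}", "Content-Type": "audio/ogg"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def merge_channels(response: dict) -> list:
    """Hypothetical helper: flatten per-channel word lists into one timeline labelled by speaker."""
    words = []
    for index, channel in enumerate(response["results"]["channels"]):
        speaker = CHANNEL_SPEAKERS.get(index, f"channel_{index}")
        # Take the top-ranked alternative for each channel.
        for w in channel["alternatives"][0]["words"]:
            words.append({"speaker": speaker, "word": w["word"],
                          "start": w["start"], "end": w["end"]})
    # Sort by start time so the two speakers interleave in conversation order.
    words.sort(key=lambda w: (w["start"], w["end"]))
    return words
```

For example, `for w in merge_channels(transcribe_file("recording.ogg", "YOUR_DEEPGRAM_KEY")): print(w["start"], w["speaker"], w["word"])` prints the conversation word by word. The same merge works for the one-file-per-participant route if each file's words are labelled with that participant's identity instead of a channel index.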

future-cpu-86293

07/11/2025, 2:15 PM

This is still uploading audio with 1 channel

tall-airline-16756

07/11/2025, 2:16 PM

Hey there! I totally get how frustrating audio channel issues can be. Here's why you might still be seeing single-channel audio even though you're trying to set up dual channels:
• You'll only get dual-channel audio if you set `audio_only=True` when starting your room composite egress.
• At least one participant must be marked as an agent; LiveKit uses this to decide which audio goes in the left channel.
If you've already done both and still get single-channel output, double-check that the participant you want on the left channel is definitely marked as an agent, and that at least one non-agent participant is in the room. These checks should get the dual-channel file working for your transcription. Hang in there!

Sources: protobufs/livekit_egress.proto | test/edge.go
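To confirm whether a recording really came out with two channels, one option is to read the channel count straight from the file header rather than trusting a player or uploader. This sketch assumes the egress OGG file carries Opus audio; in the Ogg Opus format (RFC 7845), the identification header begins with the 8-byte magic `OpusHead`, followed by a 1-byte version and a 1-byte output channel count. The function names `ogg_opus_channel_count` and `is_dual_channel` are hypothetical.

```python
def ogg_opus_channel_count(data: bytes) -> int:
    """Return the channel count stored in the OpusHead identification header.

    Layout after the magic (RFC 7845): version (1 byte), channel count (1 byte), ...
    """
    idx = data.find(b"OpusHead")
    if idx < 0 or idx + 10 > len(data):
        raise ValueError("no OpusHead header found; is this an Ogg Opus file?")
    # Offset 8 is the version byte, offset 9 is the channel count.
    return data[idx + 9]


def is_dual_channel(path: str) -> bool:
    """Check a local recording; the OpusHead packet sits in the first Ogg page."""
    with open(path, "rb") as f:
        head = f.read(4096)  # a few KB comfortably covers the first page
    return ogg_opus_channel_count(head) == 2
```

Running `is_dual_channel("recording.ogg")` on the downloaded file returns `False` if egress still produced mono output, which points back at the `audio_only` and agent-participant checks above.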

future-cpu-86293

07/11/2025, 2:18 PM

How do I mark a participant as an agent?

tall-airline-16756

07/11/2025, 2:19 PM

I don't have the answer you're looking for. You could also try asking your question:
• in one of the other Slack channels, or
• to https://deepwiki.com/livekit/livekit_composite, which is trained on all LiveKit source code.
If you find the answer, please post it here to help others!
