# ask-ai
f
Hi, I want to record customer-agent conversation audio and transcribe it. I want to use Deepgram's API to get word-level timestamps along with speaker IDs, but Deepgram's speaker detection is not reliable for my recordings. I am thinking that if I record the agent and customer audio in separate channels, Deepgram can provide a transcription with word-level timestamps per channel. Can anyone please help me record agent and customer audio in separate channels? Or is there another way to get a reliable transcription?
t
Thanks for reaching out, Maneesh. I'm an AI assistant and I'm here to help. Give me a moment while I research this topic.
You can record agent and customer audio in separate channels using LiveKit's dual-channel composite recording feature. When starting a room composite egress, set the audio_mixing mode to DUAL_CHANNEL_AGENT. This will place the agent's audio in the left channel and all other (customer) audio in the right channel. Here is a Python example:
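import asyncio

from livekit import api

req = api.RoomCompositeEgressRequest(
    room_name="your-room-name",
    audio_mixing=api.AudioMixing.DUAL_CHANNEL_AGENT,  # Agent left, customers right
    file_outputs=[api.EncodedFileOutput(
        file_type=api.EncodedFileType.OGG,
        filepath="recording.ogg",
        # ... your storage configuration
    )]
)

async def main():
    # Reads LIVEKIT_URL, LIVEKIT_API_KEY and LIVEKIT_API_SECRET from the environment
    lkapi = api.LiveKitAPI()
    try:
        res = await lkapi.egress.start_room_composite_egress(req)
        print(res.egress_id)
    finally:
        await lkapi.aclose()

asyncio.run(main())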
This approach creates an OGG file with two channels, enabling tools like Deepgram to provide word-level timestamps per channel.
Alternatively, if you want to record agent and customer completely separately (as individual files), you can use Participant Egress: record each participant by specifying their identity separately in the API request. This produces one file per participant.
Either approach will let you perform more reliable transcription and speaker separation. Let me know if you need step-by-step guidance for a specific method or integration!
Sources: livekit-protocol/src/livekit.rs | src/main/kotlin/io/livekit/server/EgressServiceClient.kt | Recording Participants | LiveKit Docs | livekit-agents/livekit/agents/voice/room_io/room_io.py | How to set a fixed participant for agent audio processing | Session recording and transcripts | LiveKit Docs
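Once a two-channel recording exists, the per-channel transcript can be turned back into a speaker-labelled timeline. The following is only a minimal sketch, assuming the file was sent to Deepgram with multichannel transcription enabled and that the response follows the results.channels[].alternatives[].words[] shape. The function name words_by_speaker, the CHANNEL_SPEAKERS mapping (channel 0 = agent, channel 1 = customer), and the sample response are illustrative assumptions, not part of LiveKit or Deepgram.

```python
# Assumed mapping for a DUAL_CHANNEL_AGENT recording:
# channel 0 (left) = agent, channel 1 (right) = customer.
CHANNEL_SPEAKERS = {0: "agent", 1: "customer"}


def words_by_speaker(response, channel_speakers=CHANNEL_SPEAKERS):
    """Flatten a multichannel transcription response into one time-ordered word list."""
    words = []
    channels = response.get("results", {}).get("channels", [])
    for index, channel in enumerate(channels):
        alternatives = channel.get("alternatives", [])
        if not alternatives:
            continue
        # Fall back to a generic label for channels without a known speaker.
        speaker = channel_speakers.get(index, f"channel_{index}")
        for w in alternatives[0].get("words", []):
            words.append({
                "speaker": speaker,
                "word": w["word"],
                "start": w["start"],
                "end": w["end"],
            })
    # Interleave both channels by start time to rebuild the conversation order.
    words.sort(key=lambda w: w["start"])
    return words


# Hand-written sample in the assumed response shape (not real API output).
sample_response = {
    "results": {
        "channels": [
            {"alternatives": [{"words": [
                {"word": "hello", "start": 0.0, "end": 0.4},
                {"word": "there", "start": 0.5, "end": 0.9},
            ]}]},
            {"alternatives": [{"words": [
                {"word": "hi", "start": 0.45, "end": 0.48},
            ]}]},
        ]
    }
}

timeline = words_by_speaker(sample_response)
```

Because the speaker label comes from the channel index rather than from diarization, it stays correct even when both people talk at the same time.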
f
This is still uploading audio with 1 channel
t
Hey there! I totally get how frustrating audio channel issues can be. Let me break down why you might be seeing single-channel audio even when you're trying to set up dual channels. Here's the key thing to know:
• You'll only get dual-channel audio if you turn on audio_only=True when starting your composite egress.
• Make sure you've got at least one participant marked as an agent - LiveKit needs this to separate the audio channels.
If you've already done that and are still scratching your head about single-channel output, double-check that the participant you want on the left channel is definitely marked as an agent, and you've got at least one non-agent participant in the mix. I'm confident these tweaks will help you get that dual-channel audio file working perfectly for your transcription needs. Hang in there!
Sources: protobufs/livekit_egress.proto | test/edge.go
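To confirm whether a given recording really has two channels before sending it for transcription, the channel count can be read from the file header. This is a rough sketch, assuming the OGG output carries an Opus or Vorbis stream whose identification header sits within the first few kilobytes. The helper names ogg_channel_count and file_channel_count are made up for illustration.

```python
def ogg_channel_count(header: bytes):
    """Return the channel count from an Ogg Opus or Ogg Vorbis header, or None if not found."""
    pos = header.find(b"OpusHead")
    if pos != -1 and len(header) > pos + 9:
        # OpusHead layout: 8-byte magic, 1-byte version, 1-byte channel count.
        return header[pos + 9]
    pos = header.find(b"\x01vorbis")
    if pos != -1 and len(header) > pos + 11:
        # Vorbis identification header: 7-byte magic, 4-byte version, 1-byte channel count.
        return header[pos + 11]
    return None


def file_channel_count(path):
    """Read the start of a file and report its channel count (None if unrecognized)."""
    with open(path, "rb") as f:
        return ogg_channel_count(f.read(4096))


# Synthetic headers for illustration only (not complete, playable Ogg files).
stereo_opus = b"OggS" + b"\x00" * 24 + b"OpusHead" + b"\x01" + b"\x02" + b"\x00" * 9
mono_opus = b"OggS" + b"\x00" * 24 + b"OpusHead" + b"\x01" + b"\x01" + b"\x00" * 9
```

A result of 1 for file_channel_count("recording.ogg") would match the single-channel symptom described above, while 2 would indicate the dual-channel mix was applied.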
f
How to mark a participant as agent
t
I don't have the answer you're looking for. You could also try asking your question:
• in one of the other Slack channels, or
• to https://deepwiki.com/livekit/livekit_composite which is trained on all LiveKit source code
If you find the answer, please post it here to help others!