# ask-ai
i
Is there a way to transcribe audio from multiple participants with the server api?
t
Thanks for reaching out, Ed. I'm an AI assistant and I'm here to help. Give me a moment while I research this topic.
Yes, you can transcribe audio from multiple participants using the LiveKit server API by leveraging the LiveKit Agents framework. Here's a brief overview of how it works:

- Set up a speech-to-text provider (e.g., OpenAI Whisper, AWS, AssemblyAI, Deepgram, Google, or Azure).
- Create a track subscription handler that starts a transcription task for each audio track from different participants.
- For each audio track, create an audio stream, set up a forwarder to send transcriptions back to clients, and process the audio frames.
- Connect to the room with `AutoSubscribe.AUDIO_ONLY` to automatically subscribe to all audio tracks.

This setup allows you to transcribe audio from multiple participants simultaneously, with each participant's audio processed in a separate task. If you'd like more detailed guidance or code examples, feel free to ask!
i
can you show me a python example?
t
Here is a Python example for transcribing audio from multiple participants using the LiveKit Agents framework (this example uses the Deepgram STT plugin; any supported STT plugin works the same way):

```python
import asyncio

from livekit import rtc
from livekit.agents import (
    AutoSubscribe,
    JobContext,
    WorkerOptions,
    cli,
    stt,
    transcription,
)
from livekit.plugins import deepgram


async def _forward_transcription(stt_stream, stt_forwarder):
    """Forward STT events to the room so clients receive live transcriptions."""
    async for ev in stt_stream:
        stt_forwarder.update(ev)
        if ev.type == stt.SpeechEventType.FINAL_TRANSCRIPT:
            print(ev.alternatives[0].text)


async def entrypoint(ctx: JobContext):
    stt_impl = deepgram.STT()

    async def transcribe_track(participant, track):
        audio_stream = rtc.AudioStream(track)
        stt_forwarder = transcription.STTSegmentsForwarder(
            room=ctx.room, participant=participant, track=track
        )
        stt_stream = stt_impl.stream()
        asyncio.create_task(_forward_transcription(stt_stream, stt_forwarder))

        # Push each incoming audio frame into the STT stream.
        async for ev in audio_stream:
            stt_stream.push_frame(ev.frame)

    # Start an independent transcription task for every subscribed audio track.
    @ctx.room.on("track_subscribed")
    def on_track_subscribed(track, publication, participant):
        if track.kind == rtc.TrackKind.KIND_AUDIO:
            asyncio.create_task(transcribe_track(participant, track))

    # Automatically subscribe to all audio tracks in the room.
    await ctx.connect(auto_subscribe=AutoSubscribe.AUDIO_ONLY)


if __name__ == "__main__":
    # The worker reads LIVEKIT_URL, LIVEKIT_API_KEY, and LIVEKIT_API_SECRET
    # from the environment.
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))
```
Before running this code, install the required dependencies (quote the version specifiers so the shell doesn't treat `>` as a redirect):

```shell
pip install "livekit-agents>=0.12.20" "livekit-plugins-deepgram>=0.7.3" "python-dotenv~=1.0"
```
Also, set the necessary environment variables (the Deepgram key matches the STT plugin installed above):

```shell
export LIVEKIT_URL=wss://yourhost.livekit.cloud
export LIVEKIT_API_KEY=livekit-api-key
export LIVEKIT_API_SECRET=your-api-secret
export DEEPGRAM_API_KEY=your-deepgram-api-key
```
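The `python-dotenv` package from the dependency list handles this for you via `load_dotenv()`. As an illustration of what it does, here is a minimal stdlib-only sketch of the same idea (the `load_env_file` helper is hypothetical, not a LiveKit or dotenv API):

```python
import os


def load_env_file(path=".env"):
    """Minimal .env parser: KEY=VALUE lines; blanks and '#' comments skipped."""
    if not os.path.exists(path):
        return
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # setdefault: variables already set in the real environment win.
            os.environ.setdefault(key.strip(), value.strip())
```

In practice, prefer `from dotenv import load_dotenv; load_dotenv()` since it also handles quoting and multiline values.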
Once everything is set up, you can run the script to start transcribing audio from multiple participants. Let me know if you need further assistance!
i
ok so the agent by default can't hear multiple participants? We have to have an async task for each agent-to-participant track subscription?
t
Yes, that's correct. By default, the agent does not automatically handle multiple participants' audio tracks simultaneously. You need to create an asynchronous task for each participant's track subscription. This ensures that each audio track is processed independently, allowing the system to transcribe audio from multiple participants concurrently. The example provided earlier demonstrates how to set up a track subscription handler that listens for new audio tracks and starts a transcription task for each one. This approach ensures scalability and efficient handling of multiple participants in a LiveKit room. If you have further questions or need clarification, feel free to ask!
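The per-track fan-out described above can be illustrated without any LiveKit APIs (a pure-asyncio sketch; the track names and frame counts are made up): each subscription spawns its own task, so processing one participant's audio never blocks the others.

```python
import asyncio


async def transcribe(track_name, frames):
    """Stand-in for the real per-track loop that pushes frames to an STT stream."""
    results = []
    for i in range(frames):
        await asyncio.sleep(0)  # yield control, as awaiting a real audio stream would
        results.append(f"{track_name}-frame{i}")
    return results


async def main():
    # One task per subscribed audio track, mirroring the on_track_subscribed handler.
    tasks = [
        asyncio.create_task(transcribe("participant_a", 3)),
        asyncio.create_task(transcribe("participant_b", 2)),
    ]
    return await asyncio.gather(*tasks)


out = asyncio.run(main())
print(out)
```

Both tasks make progress concurrently on the event loop; `gather` returns their results in task order.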
i
From an existing room/SIP call that has participant A and an agent, if the agent makes an outbound call to another phone number to add another participant to the room, and participant B joins the room, I'm noticing that participant B does not get audio from participant A. Is there any workaround for this? So basically, the agent is having a conversation over SIP in a room with participant A, then the agent invites participant B to the call with create_sip_participant. But when participant B joins the room they can't hear participant A. Why?
t
I'm sorry, but I don't have the information you need at the moment. If you would like a member of the LiveKit team to respond, please ask your question in one of the other channels.