# ask-ai
i
Is there a way to transcribe audio from multiple participants with the server api?
t
Thanks for reaching out, Ed. I'm an AI assistant and I'm here to help. Give me a moment while I research this topic.
Yes, you can transcribe audio from multiple participants using the LiveKit server API by leveraging the LiveKit Agents framework. Here's a brief overview of how it works:

- Set up a speech-to-text provider (e.g., OpenAI Whisper, AWS, AssemblyAI, Deepgram, Google, or Azure).
- Create a track subscription handler that starts a transcription task for each audio track from different participants.
- For each audio track, create an audio stream, set up a forwarder to send transcriptions back to clients, and process the audio frames.
- Connect to the room with `AutoSubscribe.AUDIO_ONLY` to automatically subscribe to all audio tracks.

This setup allows you to transcribe audio from multiple participants simultaneously, with each participant's audio processed in a separate task. If you'd like more detailed guidance or code examples, feel free to ask!
i
can you show me a python example?
t
Here is a Python example for transcribing audio from multiple participants using the LiveKit Agents framework (this example uses the Deepgram STT plugin; any supported STT plugin works the same way):

```python
import asyncio

from livekit import rtc
from livekit.agents import (
    AutoSubscribe,
    JobContext,
    WorkerOptions,
    cli,
    stt,
    transcription,
)
from livekit.plugins import deepgram


async def _forward_transcription(stt_stream, stt_forwarder):
    """Forward STT events to the room so clients receive live transcriptions."""
    async for ev in stt_stream:
        stt_forwarder.update(ev)
        if ev.type == stt.SpeechEventType.FINAL_TRANSCRIPT:
            print(ev.alternatives[0].text)


async def entrypoint(ctx: JobContext):
    stt_impl = deepgram.STT()

    async def transcribe_track(participant, track):
        audio_stream = rtc.AudioStream(track)
        stt_forwarder = transcription.STTSegmentsForwarder(
            room=ctx.room, participant=participant, track=track
        )
        stt_stream = stt_impl.stream()
        asyncio.create_task(_forward_transcription(stt_stream, stt_forwarder))

        # Push each incoming audio frame into the STT stream.
        async for ev in audio_stream:
            stt_stream.push_frame(ev.frame)

    # Start an independent transcription task for every subscribed audio track.
    @ctx.room.on("track_subscribed")
    def on_track_subscribed(track, publication, participant):
        if track.kind == rtc.TrackKind.KIND_AUDIO:
            asyncio.create_task(transcribe_track(participant, track))

    # Automatically subscribe to all audio tracks in the room.
    await ctx.connect(auto_subscribe=AutoSubscribe.AUDIO_ONLY)


if __name__ == "__main__":
    # The worker reads LIVEKIT_URL, LIVEKIT_API_KEY, and LIVEKIT_API_SECRET
    # from the environment.
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))
```
Before running this code, install the required dependencies (quote the version specifiers so the shell doesn't treat `>` as a redirect):

```shell
pip install "livekit-agents>=0.12.20" "livekit-plugins-deepgram>=0.7.3" "python-dotenv~=1.0"
```
Also, set the necessary environment variables (the Deepgram key matches the STT plugin installed above):

```shell
export LIVEKIT_URL=wss://yourhost.livekit.cloud
export LIVEKIT_API_KEY=livekit-api-key
export LIVEKIT_API_SECRET=your-api-secret
export DEEPGRAM_API_KEY=your-deepgram-api-key
```
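The `python-dotenv` package from the dependency list handles this for you via `load_dotenv()`. As an illustration of what it does, here is a minimal stdlib-only sketch of the same idea (the `load_env_file` helper is hypothetical, not a LiveKit or dotenv API):

```python
import os


def load_env_file(path=".env"):
    """Minimal .env parser: KEY=VALUE lines; blanks and '#' comments skipped."""
    if not os.path.exists(path):
        return
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # setdefault: variables already set in the real environment win.
            os.environ.setdefault(key.strip(), value.strip())
```

In practice, prefer `from dotenv import load_dotenv; load_dotenv()` since it also handles quoting and multiline values.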
Once everything is set up, you can run the script to start transcribing audio from multiple participants. Let me know if you need further assistance!
i
ok so the agent by default can't hear multiple participants? We have to have an async task for each agent-to-participant track subscription?
t
Yes, that's correct. By default, the agent does not automatically handle multiple participants' audio tracks simultaneously. You need to create an asynchronous task for each participant's track subscription. This ensures that each audio track is processed independently, allowing the system to transcribe audio from multiple participants concurrently. The example provided earlier demonstrates how to set up a track subscription handler that listens for new audio tracks and starts a transcription task for each one. This approach ensures scalability and efficient handling of multiple participants in a LiveKit room. If you have further questions or need clarification, feel free to ask!
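The per-track fan-out described above can be illustrated without any LiveKit APIs (a pure-asyncio sketch; the track names and frame counts are made up): each subscription spawns its own task, so processing one participant's audio never blocks the others.

```python
import asyncio


async def transcribe(track_name, frames):
    """Stand-in for the real per-track loop that pushes frames to an STT stream."""
    results = []
    for i in range(frames):
        await asyncio.sleep(0)  # yield control, as awaiting a real audio stream would
        results.append(f"{track_name}-frame{i}")
    return results


async def main():
    # One task per subscribed audio track, mirroring the on_track_subscribed handler.
    tasks = [
        asyncio.create_task(transcribe("participant_a", 3)),
        asyncio.create_task(transcribe("participant_b", 2)),
    ]
    return await asyncio.gather(*tasks)


out = asyncio.run(main())
print(out)
```

Both tasks make progress concurrently on the event loop; `gather` returns their results in task order.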
i
From an existing room/SIP call that has participant A and an agent, if the agent makes an outbound call to another phone number to add another participant to the room, and participant B joins the room, I'm noticing that participant B does not get audio from participant A. Is there any workaround for this? So basically, the agent is having a conversation over SIP in a room with participant A, then the agent invites participant B to the call with create_sip_participant. But when participant B joins the room they can't hear participant A. Why?
t
I'm sorry, but I don't have the information you need at the moment. If you would like a member of the LiveKit team to respond, please ask your question in one of the other channels.