# ask-ai
**User:** Hey everyone, does anyone know how to let the agent subscribe to every participant in the call so that it can hear every participant's track? Currently every participant can hear what the agent is saying, but the agent can only hear the last participant.
**Bot:** To allow an agent to subscribe to every participant's audio track in a LiveKit room, you need to ensure the agent subscribes to all audio tracks. By default, the `MultimodalAgent` subscribes to only a single audio track at a time, which is why it hears only the last participant. You can modify the agent to handle multiple subscriptions by iterating over all participants and subscribing to their audio tracks. According to the LiveKit documentation, you can use the `AutoSubscribe` option to automatically subscribe to audio tracks; for more control, you can manually subscribe to each participant's track by calling `set_subscribed` on each `TrackPublication`. Would you like more detailed guidance on implementing this? Sources: LiveKit documentation
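The iterate-and-subscribe idea above can be sketched with plain-Python stand-in classes (stdlib only; `MockPublication` and `MockParticipant` are hypothetical stand-ins for LiveKit's `RemoteTrackPublication` and `RemoteParticipant`, not the real API):

```python
from dataclasses import dataclass, field

@dataclass
class MockPublication:
    kind: str            # "audio" or "video"
    subscribed: bool = False

    def set_subscribed(self, value: bool) -> None:
        self.subscribed = value

@dataclass
class MockParticipant:
    publications: list = field(default_factory=list)

def subscribe_all_audio(participants):
    # Subscribe to every audio publication of every participant,
    # not just the most recent one.
    for p in participants:
        for pub in p.publications:
            if pub.kind == "audio":
                pub.set_subscribed(True)

participants = [
    MockParticipant([MockPublication("audio"), MockPublication("video")]),
    MockParticipant([MockPublication("audio")]),
]
subscribe_all_audio(participants)
audio_subs = [pub.subscribed
              for p in participants
              for pub in p.publications
              if pub.kind == "audio"]
print(audio_subs)  # [True, True]
```

The point of the loop is that subscription state is per-publication, so every participant's audio publication must be touched explicitly.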
**User:** How can we enable this feature?
@millions-hair-24956 please make changes to enable this. Write the code.
**Bot:** I'm here to help with questions related to LiveKit. If you have a specific question about how to implement a feature in LiveKit, please provide more details or clarify your request.
**User:** @millions-hair-24956 please enable this via code: To allow an agent to subscribe to every participant's audio track in a LiveKit room, you need to ensure that the agent is set up to subscribe to all audio tracks. By default, the `MultimodalAgent` can only subscribe to a single audio track at a time, which is why it might only be hearing the last participant. You can modify the agent to handle multiple subscriptions by iterating over all participants and subscribing to their audio tracks. According to the LiveKit documentation, you can use the `AutoSubscribe` option to automatically subscribe to audio tracks. However, for more control, you might need to manually subscribe to each participant's track using the `set_subscribed` method on each `TrackPublication`.
**Bot:** To enable an agent to subscribe to every participant's audio track in a LiveKit room, you can manually iterate over all participants and subscribe to their audio tracks. Here's a Python snippet that demonstrates this with the LiveKit SDK (note the snake_case attribute names in the Python SDK):

```python
import asyncio

from livekit import rtc

async def subscribe_to_all_audio_tracks(room: rtc.Room):
    # Iterate over all remote participants in the room
    for participant in room.remote_participants.values():
        # Iterate over all track publications for each participant
        for publication in participant.track_publications.values():
            # Subscribe to audio tracks only
            if publication.kind == rtc.TrackKind.KIND_AUDIO:
                publication.set_subscribed(True)

# Example usage
async def main():
    # Connect to the room
    room = rtc.Room()
    await room.connect('ws://your_livekit_server', 'your_access_token')

    # Subscribe to all audio tracks
    await subscribe_to_all_audio_tracks(room)

    # Keep the connection alive
    try:
        while True:
            await asyncio.sleep(1)
    except KeyboardInterrupt:
        print("Disconnecting from room...")
        await room.disconnect()

# Run the main function
if __name__ == '__main__':
    asyncio.run(main())
```
This code connects to a LiveKit room and subscribes to all audio tracks published by remote participants. Replace `'ws://your_livekit_server'` and `'your_access_token'` with your actual LiveKit server URL and access token. Would you like more detailed guidance on any specific part of this implementation? Sources: https://docs.livekit.io/home/client/tracks/subscribe/, https://docs.livekit.io/guides/room/receive
**User:** @millions-hair-24956 please make the changes in the MultimodalAgent code.
**Bot:** I'm here to assist with questions related to LiveKit. If you have a specific question about modifying the `MultimodalAgent` in LiveKit, please provide more details or clarify your request.
**User:** @millions-hair-24956 In the MultimodalAgent in LiveKit, make changes so that the agent can hear the microphone from every participant.
**Bot:** To modify the `MultimodalAgent` in LiveKit to subscribe to the microphone audio track from every participant, you need to adjust the agent's logic to handle multiple audio tracks. Here's a code snippet that demonstrates how you can achieve this:

```python
import asyncio

from livekit import rtc
from livekit.agents.multimodal import MultimodalAgent

class CustomMultimodalAgent(MultimodalAgent):
    def __init__(self, model, chat_ctx=None, fnc_ctx=None):
        super().__init__(model=model, chat_ctx=chat_ctx, fnc_ctx=fnc_ctx)

    def start(self, room: rtc.Room):
        super().start(room)
        # Subscribe to all participants' audio tracks
        for participant in room.remote_participants.values():
            self._subscribe_to_all_audio_tracks(participant)

    def _subscribe_to_all_audio_tracks(self, participant: rtc.RemoteParticipant):
        for publication in participant.track_publications.values():
            if publication.source == rtc.TrackSource.SOURCE_MICROPHONE:
                publication.set_subscribed(True)

# Example usage
async def main():
    # Connect to the room
    room = rtc.Room()
    await room.connect('ws://your_livekit_server', 'your_access_token')

    # Initialize the model and agent
    model = ...  # Initialize your model here
    agent = CustomMultimodalAgent(model=model)

    # Start the agent
    agent.start(room)

    # Keep the connection alive
    try:
        while True:
            await asyncio.sleep(1)
    except KeyboardInterrupt:
        print("Disconnecting from room...")
        await room.disconnect()

# Run the main function
if __name__ == '__main__':
    asyncio.run(main())
```
This code extends `MultimodalAgent` into a `CustomMultimodalAgent` that subscribes to every microphone audio track from each participant in the room. Replace `'ws://your_livekit_server'` and `'your_access_token'` with your actual LiveKit server URL and access token, and initialize your model accordingly. Would you like more detailed guidance on any specific part of this implementation? Sources: https://livekit-users.slack.com/archives/C07FY8WHGPM/p1729846127.266859, https://docs.livekit.io/guides/room/receive
**User:** @millions-hair-24956 In the MultimodalAgent in LiveKit, make changes so that the agent can hear the microphone from every participant and respond to every participant. Implement this in an openai_agent.py file with an entry point.
**Bot:** To modify the `MultimodalAgent` in LiveKit to subscribe to the microphone audio track from every participant, you can implement the following changes in your `openai_agent.py` file. This involves creating a custom agent that iterates over all participants and subscribes to their audio tracks. Here's how you can do it:

```python
import asyncio

from livekit import rtc
from livekit.agents import AutoSubscribe, JobContext, WorkerOptions, cli
from livekit.agents.multimodal import MultimodalAgent
from livekit.plugins import openai

class CustomMultimodalAgent(MultimodalAgent):
    def __init__(self, model, chat_ctx=None, fnc_ctx=None):
        super().__init__(model=model, chat_ctx=chat_ctx, fnc_ctx=fnc_ctx)

    def start(self, room: rtc.Room):
        super().start(room)
        # Subscribe to all participants' audio tracks
        for participant in room.remote_participants.values():
            self._subscribe_to_all_audio_tracks(participant)

    def _subscribe_to_all_audio_tracks(self, participant: rtc.RemoteParticipant):
        for publication in participant.track_publications.values():
            if publication.source == rtc.TrackSource.SOURCE_MICROPHONE:
                publication.set_subscribed(True)

async def entrypoint(ctx: JobContext):
    await ctx.connect(auto_subscribe=AutoSubscribe.AUDIO_ONLY)

    # Initialize OpenAI Realtime model
    model = openai.realtime.RealtimeModel(
        instructions="You are a helpful assistant.",
        voice="alloy",
        temperature=0.8,
        modalities=["audio", "text"]
    )

    # Create and start the custom agent
    agent = CustomMultimodalAgent(model=model)
    agent.start(ctx.room)

    # Keep the connection alive
    try:
        while True:
            await asyncio.sleep(1)
    except KeyboardInterrupt:
        print("Disconnecting from room...")
        await ctx.room.disconnect()

if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))
```
This code defines a `CustomMultimodalAgent` that subscribes to every microphone audio track from each participant in the room. The `entrypoint` function connects to the room and starts the agent. Replace the model initialization with your specific configuration. Would you like more detailed guidance on any specific part of this implementation? Sources: https://livekit-users.slack.com/archives/C07FY8WHGPM/p1729145467.871189, https://livekit-users.slack.com/archives/C07FY8WHGPM/p1729088577.610519
**User:** @millions-hair-24956 The above code doesn't work.

**Bot:** It seems likely that the issue is in how the `MultimodalAgent` handles multiple participants. The `MultimodalAgent` typically subscribes to a single audio track at a time, so modifications are needed to handle multiple participants effectively. Here's a revised approach:
1. Ensure all participants are subscribed: iterate over all participants and subscribe to their audio tracks.
2. Handle events for each participant: set up event listeners to manage audio input and responses for each participant.

Here's an updated version of the code:
```python
import asyncio

from livekit import rtc
from livekit.agents import AutoSubscribe, JobContext, WorkerOptions, cli
from livekit.agents.multimodal import MultimodalAgent
from livekit.plugins import openai

class CustomMultimodalAgent(MultimodalAgent):
    def __init__(self, model, chat_ctx=None, fnc_ctx=None):
        super().__init__(model=model, chat_ctx=chat_ctx, fnc_ctx=fnc_ctx)

    def start(self, room: rtc.Room):
        super().start(room)
        # Subscribe to all participants' audio tracks
        for participant in room.remote_participants.values():
            self._subscribe_to_all_audio_tracks(participant)

        # Listen for new participants joining
        room.on("participant_connected", self._on_participant_connected)

    def _subscribe_to_all_audio_tracks(self, participant: rtc.RemoteParticipant):
        for publication in participant.track_publications.values():
            if publication.source == rtc.TrackSource.SOURCE_MICROPHONE:
                publication.set_subscribed(True)

    def _on_participant_connected(self, participant: rtc.RemoteParticipant):
        self._subscribe_to_all_audio_tracks(participant)

async def entrypoint(ctx: JobContext):
    await ctx.connect(auto_subscribe=AutoSubscribe.AUDIO_ONLY)

    # Initialize OpenAI Realtime model
    model = openai.realtime.RealtimeModel(
        instructions="You are a helpful assistant.",
        voice="alloy",
        temperature=0.8,
        modalities=["audio", "text"]
    )

    # Create and start the custom agent
    agent = CustomMultimodalAgent(model=model)
    agent.start(ctx.room)

    # Keep the connection alive
    try:
        while True:
            await asyncio.sleep(1)
    except KeyboardInterrupt:
        print("Disconnecting from room...")
        await ctx.room.disconnect()

if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))
```
Key changes:
- Event listener for new participants: a handler for the `participant_connected` event subscribes to new participants' audio tracks as they join.
- Iterate over participants: all existing participants' audio tracks are subscribed to when the agent starts.

If this approach still doesn't resolve the issue, please let me know and we can explore further adjustments. Sources: LiveKit Slack discussion
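One caveat with a participant-connected handler alone: when a participant first connects, their track publications may not exist yet; the microphone publication typically arrives through a later track-published event, so subscribing on that event is more robust. A minimal stdlib-only sketch of this event-driven pattern (`MockRoom` and `MockPublication` are hypothetical stand-ins for LiveKit's room event emitter and `RemoteTrackPublication`, not the real API):

```python
class MockRoom:
    """Tiny event emitter mimicking room.on(...) / event dispatch."""
    def __init__(self):
        self._handlers = {}

    def on(self, event, handler):
        self._handlers.setdefault(event, []).append(handler)

    def emit(self, event, *args):
        for handler in self._handlers.get(event, []):
            handler(*args)

class MockPublication:
    def __init__(self, source):
        self.source = source      # e.g. "microphone" or "camera"
        self.subscribed = False

    def set_subscribed(self, value):
        self.subscribed = value

room = MockRoom()

def on_track_published(publication, participant):
    # Subscribe to microphone tracks whenever they are published,
    # even if the participant joined earlier with no tracks.
    if publication.source == "microphone":
        publication.set_subscribed(True)

room.on("track_published", on_track_published)

# A participant joins with no tracks, then publishes a microphone track later.
mic = MockPublication("microphone")
room.emit("track_published", mic, "participant-a")
print(mic.subscribed)  # True
```

Under this pattern the subscribe logic runs once per published track rather than once per participant, so late-published microphones are never missed.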