# ask-ai
@millions-hair-24956 How do I use this code and run it with the command `python main.py dev`?

```python
from __future__ import annotations

import logging

from livekit import rtc
from livekit.agents import (
    AutoSubscribe,
    JobContext,
    WorkerOptions,
    cli,
    llm,
)
from livekit.agents.multimodal import MultimodalAgent
from livekit.plugins import openai


logger = logging.getLogger("myagent")
logger.setLevel(logging.INFO)

async def entrypoint(ctx: JobContext):
    <http://logger.info|logger.info>("starting entrypoint")

    await ctx.connect(auto_subscribe=AutoSubscribe.AUDIO_ONLY)

    participant = await ctx.wait_for_participant()

    model = openai.realtime.RealtimeModel(
        instructions="You are a helpful assistant and you love kittens",
        voice="shimmer",
        temperature=0.8,
        modalities=["audio", "text"],
    )
    assistant = MultimodalAgent(model=model)
    assistant.start(ctx.room)

    <http://logger.info|logger.info>("starting agent")

    session = model.sessions[0]
    session.conversation.item.create(
      llm.ChatMessage(
        role="assistant",
        content="Please begin the interaction with the user in a manner consistent with your instructions.",
      )
    )
    session.response.create()

if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))
```
To run the provided code, use the command `python main.py dev`. This starts the LiveKit agent in development mode, which is suitable for testing and debugging. Before running it, make sure your environment is set up with the necessary dependencies and environment variables: `LIVEKIT_API_KEY`, `LIVEKIT_API_SECRET`, and `LIVEKIT_URL`. Also confirm that the required packages, including `livekit-agents`, are installed and that the environment variables are configured correctly. You can verify the setup by printing the environment variables at the start of your script to make sure they are being read. If you need more detailed guidance on setting up your environment or running the script, please let me know.

Sources:
- https://docs.livekit.io/agents/openai/multimodal-agent/
- https://docs.livekit.io/agents/openai/multimodalagent/
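For example, a minimal sketch of that verification step (a standalone check script; the variable names follow the answer above, adjust to your setup):

```python
# check_env.py -- fail fast if the required LiveKit variables are missing
import os

from dotenv import load_dotenv

load_dotenv()  # read .env from the current working directory

REQUIRED = ("LIVEKIT_URL", "LIVEKIT_API_KEY", "LIVEKIT_API_SECRET")

missing = [name for name in REQUIRED if not os.getenv(name)]
if missing:
    raise SystemExit(f"missing environment variables: {', '.join(missing)}")
print("all LiveKit environment variables are set")
```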
@millions-hair-24956 I have LIVEKIT_API_KEY in my .env but I get this error:

```
  line 240, in run_worker
    worker = Worker(args.opts, devmode=args.devmode, loop=loop)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\User\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\livekit\agents\worker.py", line 238, in __init__
    raise ValueError(
ValueError: api_key is required, or add LIVEKIT_API_KEY in your environment
```
The error indicates that `LIVEKIT_API_KEY` is not being read from your environment variables. Here are a few steps to troubleshoot and resolve the issue:

1. Check .env File: Ensure that your `.env` file is in the correct directory and contains the `LIVEKIT_API_KEY` variable. The file should look something like this:

```
LIVEKIT_API_KEY=your_api_key_here
LIVEKIT_API_SECRET=your_api_secret_here
LIVEKIT_URL=wss://your_livekit_url_here
```

2. Load Environment Variables: Make sure you are using a library like `python-dotenv` to load the environment variables from the `.env` file. You can do this by adding the following code at the beginning of your script:

```python
from dotenv import load_dotenv
import os

load_dotenv()  # loads the environment variables from the .env file

# Optional: print to verify
print("LIVEKIT_API_KEY:", os.getenv("LIVEKIT_API_KEY"))
```

3. Verify Environment Variables: Before running your script, print the environment variables to confirm that `LIVEKIT_API_KEY` is available in your environment.

4. Check Python Environment: Ensure that the Python environment you use to run the script has access to the `.env` file and that the `python-dotenv` package is installed.

If you follow these steps and the issue persists, please let me know for further assistance.

Sources:
- https://docs.livekit.io/python/livekit/agents/index.html#livekit.agents.Worker
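If the `.env` file still isn't picked up, another option is to pass the credentials to `WorkerOptions` explicitly, which the error message itself hints at ("api_key is required, or add LIVEKIT_API_KEY in your environment"). A hedged sketch, assuming your `livekit-agents` version accepts these keyword arguments:

```python
import os

from dotenv import load_dotenv
from livekit.agents import WorkerOptions, cli

load_dotenv()

if __name__ == "__main__":
    cli.run_app(
        WorkerOptions(
            entrypoint_fnc=entrypoint,  # the entrypoint from your main script
            api_key=os.getenv("LIVEKIT_API_KEY"),
            api_secret=os.getenv("LIVEKIT_API_SECRET"),
            ws_url=os.getenv("LIVEKIT_URL"),
        )
    )
```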
@millions-hair-24956 Why does this error occur?

```
WARNING livekit.agents - failed to connect to livekit, retrying in 10s: Cannot connect to host localhost:7880 ssl:default
```
The error message indicates that the agent is unable to connect to the LiveKit server at `localhost:7880`. This could be due to several reasons:

1. Server Not Running: Ensure that the LiveKit server is running on your local machine. You can start it in development mode with `livekit-server --dev`, which binds the server to `localhost:7880` by default.

2. Incorrect Configuration: Verify that your configuration file or environment variables point to the LiveKit server. `LIVEKIT_URL` should be set to `ws://localhost:7880` if you are running the server locally.

3. Network Issues: Check whether any network issues or firewall settings are blocking the connection to the specified port.

4. SSL Configuration: If you are using SSL, ensure that the server is configured to handle SSL connections. If not, make sure the URL uses `ws://` instead of `wss://`.

If you need more detailed guidance on resolving this issue, please let me know.

Sources:
- https://livekit-users.slack.com/archives/C07FY8WHGPM/p1726262419.738319
- https://docs.livekit.io/client-sdk-flutter/livekit_client/ConnectException-class.html
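As a quick sanity check independent of LiveKit, you can probe the port directly. A minimal sketch using only the Python standard library (it assumes the dev server should be listening on `localhost:7880`, as described above):

```python
# probe_port.py -- check whether anything is listening on localhost:7880
import socket

with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    s.settimeout(2)  # don't hang if the host silently drops packets
    result = s.connect_ex(("localhost", 7880))

if result == 0:
    print("port 7880 is open -- the LiveKit server appears to be running")
else:
    print(f"cannot reach localhost:7880 (error code {result}) -- is livekit-server running?")
```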
@millions-hair-24956 How do I change this code so that I can also chat with the agent using text?

```python
from __future__ import annotations

import logging

from livekit import rtc
from livekit.agents import (
    AutoSubscribe,
    JobContext,
    WorkerOptions,
    cli,
    llm,
)
from livekit.agents.multimodal import MultimodalAgent
from livekit.plugins import openai
from dotenv import load_dotenv

load_dotenv()

logger = logging.getLogger("myagent")
logger.setLevel(logging.INFO)

async def entrypoint(ctx: JobContext):
    logger.info("starting entrypoint")

    await ctx.connect(auto_subscribe=AutoSubscribe.AUDIO_ONLY)

    participant = await ctx.wait_for_participant()

    model = openai.realtime.RealtimeModel(
        instructions="You are a helpful assistant and you love kittens",
        voice="shimmer",
        temperature=0.8,
        modalities=["audio", "text"],
    )
    assistant = MultimodalAgent(model=model)
    assistant.start(ctx.room)

    logger.info("starting agent")

    session = model.sessions[0]
    session.conversation.item.create(
        llm.ChatMessage(
            role="assistant",
            content="Please begin the interaction with the user in a manner consistent with your instructions.",
        )
    )
    session.response.create()

if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))
```
To enable text-based interaction with the agent, you need a mechanism to send and receive text messages. You can use `rtc.ChatManager` to handle chat messages within the LiveKit room. Here's how you can modify your code to include text chat functionality:

1. Initialize ChatManager: Create an instance of `rtc.ChatManager` to manage chat messages in the room.

2. Listen for Incoming Messages: Set up an event listener to handle incoming chat messages.

3. Send Messages: Use the `ChatManager` to send messages to the agent.

Here's the modified code:

```python
from __future__ import annotations

import logging
import asyncio
from livekit import rtc
from livekit.agents import (
    AutoSubscribe,
    JobContext,
    WorkerOptions,
    cli,
    llm,
)
from livekit.agents.multimodal import MultimodalAgent
from livekit.plugins import openai
from dotenv import load_dotenv

load_dotenv()

logger = logging.getLogger("myagent")
logger.setLevel(logging.INFO)

async def entrypoint(ctx: JobContext):
    logger.info("starting entrypoint")

    await ctx.connect(auto_subscribe=AutoSubscribe.AUDIO_ONLY)

    participant = await ctx.wait_for_participant()

    model = openai.realtime.RealtimeModel(
        instructions="You are a helpful assistant and you love kittens",
        voice="shimmer",
        temperature=0.8,
        modalities=["audio", "text"],
    )
    assistant = MultimodalAgent(model=model)
    assistant.start(ctx.room)

    logger.info("starting agent")

    session = model.sessions[0]
    session.conversation.item.create(
      llm.ChatMessage(
        role="assistant",
        content="Please begin the interaction with the user in a manner consistent with your instructions.",
      )
    )
    session.response.create()

    # Initialize ChatManager
    chat = rtc.ChatManager(ctx.room)
    logger.info("ChatManager initialized")

    # Listen for incoming chat messages
    @chat.on("message_received")
    def on_chat_received(msg: rtc.ChatMessage):
        logger.info("message_received event fired")
        logger.info(f"Received message object: {msg}")
        if msg.message:
            logger.info(f"Received chat message: {msg.message}")
            asyncio.create_task(answer_from_text(msg.message))

    async def answer_from_text(txt: str):
        chat_ctx = assistant.chat_ctx_copy()
        chat_ctx.append(role="user", text=txt)
        stream = assistant.llm.chat(chat_ctx=chat_ctx)
        await assistant.say(stream)

if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))
```

This code sets up a `ChatManager` to listen for incoming text messages and respond to them using the agent. The `answer_from_text` function processes the text input and generates a response. If you need further assistance or more detailed guidance, please let me know.

Sources:
- https://livekit-users.slack.com/archives/C07FY8WHGPM/p1729785405.894609
- https://docs.livekit.io/python/livekit/agents/multimodal/multimodal_agent.html#livekit.agents.multimodal.multimodal_agent.MultimodalAgent
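For completeness, the sending side would be another participant connected to the same room. A minimal sketch, assuming your installed `livekit` SDK version exposes `rtc.ChatManager.send_message` (worth verifying against your version):

```python
# sketch: send a text message to the agent from another participant
from livekit import rtc

async def send_text(room: rtc.Room, text: str) -> None:
    # the room must already be connected; ChatManager publishes over the
    # room's data channel, which the agent's ChatManager listens on
    chat = rtc.ChatManager(room)
    await chat.send_message(text)
```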
@millions-hair-24956 I have this error:

```
ERROR asyncio - Task exception was never retrieved
future: <Task finished name='Task-57' coro=<entrypoint.<locals>.answer_from_text() done, defined at C:\Users\User\Desktop\GoIT\ML-application-\main.py:56> exception=AttributeError("'MultimodalAgent' object has no attribute 'llm'")>
Traceback (most recent call last):
  File "C:\Users\User\Desktop\GoIT\ML-application-\main.py", line 59, in answer_from_text
    stream = assistant.llm.chat(chat_ctx=chat_ctx)
```
The error message indicates that the `MultimodalAgent` object does not have an `llm` attribute. The `MultimodalAgent` class does not directly expose an `llm` attribute for interacting with the language model; it manages the conversation flow internally. To resolve this, use the `MultimodalAgent`'s own methods to handle text input and generate responses. Here's how you can modify the `answer_from_text` function to work correctly:

1. Use the `MultimodalAgent`'s Methods: Instead of accessing `llm` directly, use the `MultimodalAgent` to handle the conversation.

Here's the updated code snippet:

```python
async def answer_from_text(txt: str):
    chat_ctx = assistant.chat_ctx_copy()
    chat_ctx.append(role="user", text=txt)
    # Use the MultimodalAgent to handle the chat context
    await assistant.set_chat_ctx(chat_ctx)
    # The agent will automatically handle the response generation
```

This modification lets the `MultimodalAgent` manage the conversation context and response generation without directly accessing an `llm` attribute. If you need further assistance or more detailed guidance, please let me know.

Sources:
- https://livekit-users.slack.com/archives/C07FY8WHGPM/p1732022090.671149
- https://livekit-users.slack.com/archives/C07FY8WHGPM/p1730673248.486419
@millions-hair-24956 The agent still doesn't answer my text messages:

```python
from __future__ import annotations

import logging
import asyncio
from livekit import rtc
from livekit.agents.multimodal import MultimodalAgent
from livekit.plugins import openai
from dotenv import load_dotenv
from livekit.agents import AutoSubscribe, JobContext, WorkerOptions, cli, llm

load_dotenv()

logger = logging.getLogger("myagent")
logger.setLevel(logging.INFO)

async def entrypoint(ctx: JobContext):
    logger.info("starting entrypoint")

    await ctx.connect(auto_subscribe=AutoSubscribe.AUDIO_ONLY)

    participant = await ctx.wait_for_participant()

    model = openai.realtime.RealtimeModel(
        instructions="You are a helpful assistant and you love kittens",
        voice="shimmer",
        temperature=0.8,
        modalities=["audio", "text"],
    )
    assistant = MultimodalAgent(model=model)
    assistant.start(ctx.room)

    logger.info("starting agent")

    session = model.sessions[0]
    session.conversation.item.create(
        llm.ChatMessage(
            role="assistant",
            content="Please begin the interaction with the user in a manner consistent with your instructions.",
        )
    )
    session.response.create()

    # Initialize ChatManager
    chat = rtc.ChatManager(ctx.room)
    logger.info("ChatManager initialized")

    # Listen for incoming chat messages
    @chat.on("message_received")
    def on_chat_received(msg: rtc.ChatMessage):
        logger.info("message_received event fired")
        logger.info(f"Received message object: {msg}")
        if msg.message:
            logger.info(f"Received chat message: {msg.message}")
            asyncio.create_task(answer_from_text(msg.message))

    async def answer_from_text(txt: str):
        chat_ctx = assistant.chat_ctx_copy()
        chat_ctx.append(role="user", text=txt)
        # Use the MultimodalAgent to handle the chat context
        await assistant.set_chat_ctx(chat_ctx)
        # The agent will automatically handle the response generation

if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))
```
The issue might be with how the `MultimodalAgent` is expected to handle text input and generate responses. Here are a few steps to ensure the agent responds to text messages:

1. Ensure Correct Context Handling: When you update the chat context with `set_chat_ctx`, make sure the agent is configured to generate a response. The `MultimodalAgent` should automatically handle response generation once the context is updated.

2. Check for Response Events: Ensure that the agent is emitting events when a response is generated. You can listen for events like `agent_speech_committed` to confirm that the agent is processing the input and generating a response.

3. Verify Model Configuration: Double-check that the `RealtimeModel` is correctly configured to handle both audio and text; the `modalities` parameter should include both "audio" and "text".

Here's a refined approach to ensure the agent responds:

```python
async def answer_from_text(txt: str):
    chat_ctx = assistant.chat_ctx_copy()
    chat_ctx.append(role="user", text=txt)
    await assistant.set_chat_ctx(chat_ctx)
    # Trigger a response generation
    session = model.sessions[0]
    session.response.create()

# Listen for response events
@assistant.on("agent_speech_committed")
def on_agent_speech_committed(msg: llm.ChatMessage):
    logger.info(f"Agent responded with: {msg.content}")
```

This setup logs the agent's response, so you can verify that the agent is processing the input correctly. If the issue persists, check the logs for any warnings or errors that might indicate why the agent is not responding. If you need further assistance or more detailed guidance, please let me know.

Sources:
- https://livekit-users.slack.com/archives/C07FY8WHGPM/p1732022090.671149
@millions-hair-24956 I get this error:

```
ERROR livekit - failed to emit event agent_speech_committed
Traceback (most recent call last):
  File "C:\Users\User\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\livekit\rtc\event_emitter.py", line 58, in emit
    callback(*callback_args)
  File "C:\Users\User\Desktop\GoIT\ML-application-\main.py", line 54, in on_agent_speech_committed
    logger.info(f"Agent responded with: {msg.content}")
```
The error message indicates an issue with the `on_agent_speech_committed` event handler, specifically with the logging statement. The problem seems to be how the logger is being called: the pasted code contains `<http://logger.info|logger.info>`, a Slack-mangled link that is not valid Python, and it should be replaced with a standard logging call. Here's how you can fix the logging statement:

1. Correct the Logger Usage: Replace the mangled logger syntax with the standard `logger.info` method.

Here's the corrected code snippet:

```python
# Listen for response events
@assistant.on("agent_speech_committed")
def on_agent_speech_committed(msg: llm.ChatMessage):
    logger.info(f"Agent responded with: {msg.content}")
```

Make sure that all logging statements in your code use the correct syntax, such as `logger.info("message")`, `logger.debug("message")`, and so on. If you continue to experience issues, ensure that the `logger` is properly configured at the beginning of your script and that the logging level is set to capture the desired detail. If you need further assistance or more detailed guidance, please let me know.

Sources:
- https://livekit-users.slack.com/archives/C07FY8WHGPM/p1732022090.671149
@millions-hair-24956 I'm still getting the same error:

```
ERROR livekit - failed to emit event agent_speech_committed
Traceback (most recent call last):
  File "C:\Users\User\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\livekit\rtc\event_emitter.py", line 58, in emit
    callback(*callback_args)
  File "C:\Users\User\Desktop\GoIT\ML-application-\main.py", line 54, in on_agent_speech_committed
    logger.info(f"Agent responded with: {msg.content}")
```
The error you're encountering is again the invalid syntax in the logging statement within the `on_agent_speech_committed` event handler. The `<http://logger.info|logger.info>` form is not valid Python; use the standard method provided by Python's `logging` module. Here's how you can fix the issue:

1. Correct the Logger Syntax: Replace the mangled logger syntax with the correct `logger.info` method.

Here's the corrected code snippet:

```python
# Listen for response events
@assistant.on("agent_speech_committed")
def on_agent_speech_committed(msg: llm.ChatMessage):
    logger.info(f"Agent responded with: {msg.content}")
```

Ensure that all logging statements in your code use the correct syntax, such as `logger.info("message")`, `logger.debug("message")`, and so on. This should resolve the error and allow the event to be emitted correctly. If you need further assistance or more detailed guidance, please let me know.

Sources:
- https://docs.livekit.io/agents/openai/multimodal-agent/
- https://docs.livekit.io/agents/openai/multimodalagent/
@millions-hair-24956 How do I modify my code so that the agent answers with both audio and text?

```python
from __future__ import annotations

import logging
import asyncio
from livekit import rtc
from livekit.agents.multimodal import MultimodalAgent
from livekit.plugins import openai
from dotenv import load_dotenv
from livekit.agents import AutoSubscribe, JobContext, WorkerOptions, cli, llm

load_dotenv()

logger = logging.getLogger("myagent")
logger.setLevel(logging.INFO)

async def entrypoint(ctx: JobContext):
    logger.info("starting entrypoint")

    await ctx.connect(auto_subscribe=AutoSubscribe.AUDIO_ONLY)

    participant = await ctx.wait_for_participant()

    model = openai.realtime.RealtimeModel(
        instructions="You are a helpful assistant and you love kittens",
        voice="shimmer",
        temperature=0.8,
        modalities=["audio", "text"],
    )
    assistant = MultimodalAgent(model=model)
    assistant.start(ctx.room)

    logger.info("starting agent")

    session = model.sessions[0]
    session.conversation.item.create(
        llm.ChatMessage(
            role="assistant",
            content="Please begin the interaction with the user in a manner consistent with your instructions.",
        )
    )
    session.response.create()

    @agent.on("user_speech_committed")
    def on_user_speech_committed(msg: llm.ChatMessage):
        # convert string lists to strings, drop images
        if isinstance(msg.content, list):
            msg.content = "\n".join(
                "[image]" if isinstance(x, llm.ChatImage) else x for x in msg
            )
        log_queue.put_nowait(f"[{datetime.now()}] USER:\n{msg.content}\n\n")

if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))
```
To modify your code so that the agent can respond to both audio and text inputs, ensure that the `MultimodalAgent` is set up to handle both modalities and that you have the appropriate event handlers to process and respond to these inputs. Here's how you can achieve this:

1. Ensure Modalities are Set: The `RealtimeModel` should have both "audio" and "text" in the `modalities` parameter, which you have already done.

2. Handle Text Input: Use the `ChatManager` to listen for text messages and process them.

3. Handle Audio Input: Use the `MultimodalAgent` to process audio input and generate responses.

4. Emit and Handle Events: Ensure that you have event handlers for both text and audio responses.

Here's the modified code:

```python
from __future__ import annotations

import logging
import asyncio
from livekit import rtc
from livekit.agents import AutoSubscribe, JobContext, WorkerOptions, cli, llm
from livekit.agents.multimodal import MultimodalAgent
from livekit.plugins import openai
from dotenv import load_dotenv
from datetime import datetime

load_dotenv()

logger = logging.getLogger("myagent")
logger.setLevel(logging.INFO)

async def entrypoint(ctx: JobContext):
    <http://logger.info|logger.info>("starting entrypoint")

    await ctx.connect(auto_subscribe=AutoSubscribe.AUDIO_ONLY)

    participant = await ctx.wait_for_participant()

    model = openai.realtime.RealtimeModel(
        instructions="You are a helpful assistant and you love kittens",
        voice="shimmer",
        temperature=0.8,
        modalities=["audio", "text"],
    )
    assistant = MultimodalAgent(model=model)
    assistant.start(ctx.room)

    <http://logger.info|logger.info>("starting agent")

    session = model.sessions[0]
    session.conversation.item.create(
      llm.ChatMessage(
        role="assistant",
        content="Please begin the interaction with the user in a manner consistent with your instructions.",
      )
    )
    session.response.create()

    # Initialize ChatManager for text input
    chat = rtc.ChatManager(ctx.room)
    <http://logger.info|logger.info>("ChatManager initialized")

    # Listen for incoming chat messages
    @chat.on("message_received")
    def on_chat_received(msg: rtc.ChatMessage):
        <http://logger.info|logger.info>("message_received event fired")
        <http://logger.info|logger.info>(f"Received message object: {msg}")
        if msg.message:
            <http://logger.info|logger.info>(f"Received chat message: {msg.message}")
            asyncio.create_task(answer_from_text(msg.message))

    async def answer_from_text(txt: str):
        chat_ctx = assistant.chat_ctx_copy()
        chat_ctx.append(role="user", text=txt)
        await assistant.set_chat_ctx(chat_ctx)
        session.response.create()

    # Handle user speech committed event
    @assistant.on("user_speech_committed")
    def on_user_speech_committed(msg: llm.ChatMessage):
        if isinstance(msg.content, list):
            msg.content = "\n".join(
                "[image]" if isinstance(x, llm.ChatImage) else x for x in msg.content
            )
        <http://logger.info|logger.info>(f"[{datetime.now()}] USER:\n{msg.content}\n\n")

    # Handle agent speech committed event
    @assistant.on("agent_speech_committed")
    def on_agent_speech_committed(msg: llm.ChatMessage):
        <http://logger.info|logger.info>(f"Agent responded with: {msg.content}")

if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))
```

This code sets up the `MultimodalAgent` to handle both audio and text inputs, and it includes event handlers to log and respond to these inputs. Make sure your environment is correctly set up with the necessary dependencies and configurations. If you need further assistance or more detailed guidance, please let me know.

Sources:
- https://docs.livekit.io/agents/openai/multimodal-agent/
- https://docs.livekit.io/agents/openai/multimodalagent/
@millions-hair-24956 I'm getting this error again:

```
ERROR livekit - failed to emit event agent_speech_committed
Traceback (most recent call last):
  File "C:\Users\User\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\livekit\rtc\event_emitter.py", line 58, in emit
    callback(*callback_args)
  File "C:\Users\User\Desktop\GoIT\ML-application-\main.py", line 75, in on_agent_speech_committed
    logger.info(f"Agent responded with: {msg.content}")
```
This is the same issue: the `on_agent_speech_committed` event handler uses the mangled `<http://logger.info|logger.info>` form, which is not valid Python. Use the standard method provided by Python's `logging` module. Here's how you can fix it:

1. Correct the Logger Syntax: Replace the mangled logger syntax with the correct `logger.info` method.

Here's the corrected code snippet:

```python
# Handle agent speech committed event
@assistant.on("agent_speech_committed")
def on_agent_speech_committed(msg: llm.ChatMessage):
    logger.info(f"Agent responded with: {msg.content}")
```

Ensure that all logging statements use the correct syntax (`logger.info("message")`, `logger.debug("message")`, etc.). This should resolve the error and allow the event to be emitted correctly. If you continue to experience issues, confirm that the `logger` is properly configured at the beginning of your script and that the logging level is set to capture the desired detail. If you need further assistance or more detailed guidance, please let me know.

Sources:
- https://docs.livekit.io/agents/openai/multimodal-agent/
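If your local file already uses a plain `logger.info(...)` call and the error persists, note that the pasted traceback is truncated before the actual exception, so the failing expression is unknown. A defensive variant of the handler (a sketch; it assumes `assistant`, `logger`, and `llm` as defined in the script above) can surface the real cause instead of the bare "failed to emit event" message:

```python
@assistant.on("agent_speech_committed")
def on_agent_speech_committed(msg: llm.ChatMessage):
    try:
        # lazy %s formatting defers string conversion to the logging module
        logger.info("Agent responded with: %s", msg.content)
    except Exception:
        # log the full traceback rather than letting the event emitter
        # report a generic failure
        logger.exception("failed to log agent response")
```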
@millions-hair-24956 How can I modify my code to include that text?

```python
from __future__ import annotations

import logging
import asyncio
from livekit import rtc
from livekit.agents import AutoSubscribe, JobContext, WorkerOptions, cli, llm
from livekit.agents.multimodal import MultimodalAgent
from livekit.plugins import openai
from dotenv import load_dotenv
from datetime import datetime

load_dotenv()

logger = logging.getLogger("myagent")
logger.setLevel(logging.INFO)

async def entrypoint(ctx: JobContext):
    logger.info("starting entrypoint")

    await ctx.connect(auto_subscribe=AutoSubscribe.AUDIO_ONLY)

    participant = await ctx.wait_for_participant()

    model = openai.realtime.RealtimeModel(
        instructions="You are a helpful assistant and you want to help people",
        voice="shimmer",
        temperature=0.8,
        modalities=["audio", "text"],
    )
    assistant = MultimodalAgent(model=model)
    assistant.start(ctx.room)

    logger.info("starting agent")

    session = model.sessions[0]
    session.conversation.item.create(
        llm.ChatMessage(
            role="assistant",
            content="Please begin the interaction with the user in a manner consistent with your instructions.",
        )
    )
    session.response.create()

    # Initialize ChatManager for text input
    chat = rtc.ChatManager(ctx.room)
    logger.info("ChatManager initialized")

    # Listen for incoming chat messages
    @chat.on("message_received")
    def on_chat_received(msg: rtc.ChatMessage):
        logger.info("message_received event fired")
        logger.info(f"Received message object: {msg}")
        if msg.message:
            logger.info(f"Received chat message: {msg.message}")
            asyncio.create_task(answer_from_text(msg.message))

    async def answer_from_text(txt: str):
        chat_ctx = assistant.chat_ctx_copy()
        chat_ctx.append(role="user", text=txt)
        await assistant.set_chat_ctx(chat_ctx)
        session.response.create()

    # Handle user speech committed event
    @assistant.on("user_speech_committed")
    def on_user_speech_committed(msg: llm.ChatMessage):
        if isinstance(msg.content, list):
            msg.content = "\n".join(
                "[image]" if isinstance(x, llm.ChatImage) else x for x in msg.content
            )
        logger.info(f"[{datetime.now()}] USER:\n{msg.content}\n\n")

    # Handle agent speech committed event
    @assistant.on("agent_speech_committed")
    def on_agent_speech_committed(msg: llm.ChatMessage):
        logger.info(f"Agent responded with: {msg.content}")

    await asyncio.sleep(1)
    await assistant.say("Hello, I am here to assist you in completing a Geriatric Depression Scale screening. The GDS-15 is a series of 15 questions that will help assess your emotional well-being. For each question, please answer with either 'yes' or 'no'. If you're unsure, please feel free to ask me for clarification. Let's get started!", allow_interruptions=True)

if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))
```
To handle both text and audio interactions correctly, make sure the logging statements use valid syntax and the agent is set up for both modalities. Here's how you can adjust your code:

1. Correct Logger Syntax: Replace the mangled logger calls with the standard `logger.info` method.

2. Ensure Modalities are Set: The `RealtimeModel` should have both "audio" and "text" in the `modalities` parameter, which you have already done.

3. Handle Text and Audio: Ensure that the agent can process both text and audio inputs and generate responses accordingly.

Here's the corrected code:

```python
from __future__ import annotations

import logging
import asyncio
from livekit import rtc
from livekit.agents import AutoSubscribe, JobContext, WorkerOptions, cli, llm
from livekit.agents.multimodal import MultimodalAgent
from livekit.plugins import openai
from dotenv import load_dotenv
from datetime import datetime

load_dotenv()

logger = logging.getLogger("myagent")
logger.setLevel(logging.INFO)

async def entrypoint(ctx: JobContext):
    <http://logger.info|logger.info>("starting entrypoint")

    await ctx.connect(auto_subscribe=AutoSubscribe.AUDIO_ONLY)

    participant = await ctx.wait_for_participant()

    model = openai.realtime.RealtimeModel(
        instructions="You are a helpful assistant and you want to help people",
        voice="shimmer",
        temperature=0.8,
        modalities=["audio", "text"],
    )
    assistant = MultimodalAgent(model=model)
    assistant.start(ctx.room)

    <http://logger.info|logger.info>("starting agent")

    session = model.sessions[0]
    session.conversation.item.create(
      llm.ChatMessage(
        role="assistant",
        content="Please begin the interaction with the user in a manner consistent with your instructions.",
      )
    )
    session.response.create()

    # Initialize ChatManager for text input
    chat = rtc.ChatManager(ctx.room)
    <http://logger.info|logger.info>("ChatManager initialized")

    # Listen for incoming chat messages
    @chat.on("message_received")
    def on_chat_received(msg: rtc.ChatMessage):
        <http://logger.info|logger.info>("message_received event fired")
        <http://logger.info|logger.info>(f"Received message object: {msg}")
        if msg.message:
            <http://logger.info|logger.info>(f"Received chat message: {msg.message}")
            asyncio.create_task(answer_from_text(msg.message))

    async def answer_from_text(txt: str):
        chat_ctx = assistant.chat_ctx_copy()
        chat_ctx.append(role="user", text=txt)
        await assistant.set_chat_ctx(chat_ctx)
        session.response.create()

    # Handle user speech committed event
    @assistant.on("user_speech_committed")
    def on_user_speech_committed(msg: llm.ChatMessage):
        if isinstance(msg.content, list):
            msg.content = "\n".join(
                "[image]" if isinstance(x, llm.ChatImage) else x for x in msg.content
            )
        <http://logger.info|logger.info>(f"[{datetime.now()}] USER:\n{msg.content}\n\n")

    # Handle agent speech committed event
    @assistant.on("agent_speech_committed")
    def on_agent_speech_committed(msg: llm.ChatMessage):
        <http://logger.info|logger.info>(f"Agent responded with: {msg.content}")

    await asyncio.sleep(1)
    await assistant.say("Hello, I am here to assist you in completing a Geriatric Depression Scale screening. The GDS-15 is a series of 15 questions that will help assess your emotional well-being. For each question, please answer with either 'yes' or 'no'. If you're unsure, please feel free to ask me for clarification. Let's get started!", allow_interruptions=True)

if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))
```

This code ensures that the agent can handle both text and audio inputs, and it includes event handlers to log and respond to these inputs. Make sure your environment is correctly set up with the necessary dependencies and configurations. If you need further assistance or more detailed guidance, please let me know.

Sources:
- https://docs.livekit.io/agents/openai/multimodal-agent/