# ask-ai
@millions-hair-24956 How do I use this code and run it with the command `python main.py dev`?

```python
from __future__ import annotations

import logging

from livekit import rtc
from livekit.agents import (
    AutoSubscribe,
    JobContext,
    WorkerOptions,
    cli,
    llm,
)
from livekit.agents.multimodal import MultimodalAgent
from livekit.plugins import openai


logger = logging.getLogger("myagent")
logger.setLevel(logging.INFO)

async def entrypoint(ctx: JobContext):
    <http://logger.info|logger.info>("starting entrypoint")

    await ctx.connect(auto_subscribe=AutoSubscribe.AUDIO_ONLY)

    participant = await ctx.wait_for_participant()

    model = openai.realtime.RealtimeModel(
        instructions="You are a helpful assistant and you love kittens",
        voice="shimmer",
        temperature=0.8,
        modalities=["audio", "text"],
    )
    assistant = MultimodalAgent(model=model)
    assistant.start(ctx.room)

    <http://logger.info|logger.info>("starting agent")

    session = model.sessions[0]
    session.conversation.item.create(
      llm.ChatMessage(
        role="assistant",
        content="Please begin the interaction with the user in a manner consistent with your instructions.",
      )
    )
    session.response.create()

if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))
```
To run the provided code, use the command `python main.py dev`. This starts the LiveKit agent in development mode, which is suitable for testing and debugging. Before running it, make sure your environment is set up with the necessary dependencies and environment variables: `LIVEKIT_API_KEY`, `LIVEKIT_API_SECRET`, and `LIVEKIT_URL`. Also confirm that the required packages, including `livekit-agents`, are installed and that the environment variables are configured correctly. You can verify the setup by printing the environment variables at the start of your script to make sure they are being read. If you need more detailed guidance on setting up your environment or running the script, please let me know.

Sources:
- https://docs.livekit.io/agents/openai/multimodal-agent/
- https://docs.livekit.io/agents/openai/multimodalagent/
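For example, a minimal sketch of that verification step (a standalone check script; the variable names follow the answer above, adjust to your setup):

```python
# check_env.py -- fail fast if the required LiveKit variables are missing
import os

from dotenv import load_dotenv

load_dotenv()  # read .env from the current working directory

REQUIRED = ("LIVEKIT_URL", "LIVEKIT_API_KEY", "LIVEKIT_API_SECRET")

missing = [name for name in REQUIRED if not os.getenv(name)]
if missing:
    raise SystemExit(f"missing environment variables: {', '.join(missing)}")
print("all LiveKit environment variables are set")
```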
@millions-hair-24956 I have LIVEKIT_API_KEY in my .env but I get this error:

```
  line 240, in run_worker
    worker = Worker(args.opts, devmode=args.devmode, loop=loop)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\User\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\livekit\agents\worker.py", line 238, in __init__
    raise ValueError(
ValueError: api_key is required, or add LIVEKIT_API_KEY in your environment
```
The error indicates that `LIVEKIT_API_KEY` is not being read from your environment variables. Here are a few steps to troubleshoot and resolve the issue:

1. Check .env File: Ensure that your `.env` file is in the correct directory and contains the `LIVEKIT_API_KEY` variable. The file should look something like this:

```
LIVEKIT_API_KEY=your_api_key_here
LIVEKIT_API_SECRET=your_api_secret_here
LIVEKIT_URL=wss://your_livekit_url_here
```

2. Load Environment Variables: Make sure you are using a library like `python-dotenv` to load the environment variables from the `.env` file. You can do this by adding the following code at the beginning of your script:

```python
from dotenv import load_dotenv
import os

load_dotenv()  # loads the environment variables from the .env file

# Optional: print to verify
print("LIVEKIT_API_KEY:", os.getenv("LIVEKIT_API_KEY"))
```

3. Verify Environment Variables: Before running your script, print the environment variables to confirm that `LIVEKIT_API_KEY` is available in your environment.

4. Check Python Environment: Ensure that the Python environment you use to run the script has access to the `.env` file and that the `python-dotenv` package is installed.

If you follow these steps and the issue persists, please let me know for further assistance.

Sources:
- https://docs.livekit.io/python/livekit/agents/index.html#livekit.agents.Worker
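If the `.env` file still isn't picked up, another option is to pass the credentials to `WorkerOptions` explicitly, which the error message itself hints at ("api_key is required, or add LIVEKIT_API_KEY in your environment"). A hedged sketch, assuming your `livekit-agents` version accepts these keyword arguments:

```python
import os

from dotenv import load_dotenv
from livekit.agents import WorkerOptions, cli

load_dotenv()

if __name__ == "__main__":
    cli.run_app(
        WorkerOptions(
            entrypoint_fnc=entrypoint,  # the entrypoint from your main script
            api_key=os.getenv("LIVEKIT_API_KEY"),
            api_secret=os.getenv("LIVEKIT_API_SECRET"),
            ws_url=os.getenv("LIVEKIT_URL"),
        )
    )
```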
@millions-hair-24956 Why does this error occur?

```
WARNING livekit.agents - failed to connect to livekit, retrying in 10s: Cannot connect to host localhost:7880 ssl:default
```
The error message indicates that the agent is unable to connect to the LiveKit server at `localhost:7880`. This could be due to several reasons:

1. Server Not Running: Ensure that the LiveKit server is running on your local machine. You can start it in development mode with `livekit-server --dev`, which binds the server to `localhost:7880` by default.

2. Incorrect Configuration: Verify that your configuration file or environment variables point to the LiveKit server. `LIVEKIT_URL` should be set to `ws://localhost:7880` if you are running the server locally.

3. Network Issues: Check whether any network issues or firewall settings are blocking the connection to the specified port.

4. SSL Configuration: If you are using SSL, ensure that the server is configured to handle SSL connections. If not, make sure the URL uses `ws://` instead of `wss://`.

If you need more detailed guidance on resolving this issue, please let me know.

Sources:
- https://livekit-users.slack.com/archives/C07FY8WHGPM/p1726262419.738319
- https://docs.livekit.io/client-sdk-flutter/livekit_client/ConnectException-class.html
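As a quick sanity check independent of LiveKit, you can probe the port directly. A minimal sketch using only the Python standard library (it assumes the dev server should be listening on `localhost:7880`, as described above):

```python
# probe_port.py -- check whether anything is listening on localhost:7880
import socket

with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    s.settimeout(2)  # don't hang if the host silently drops packets
    result = s.connect_ex(("localhost", 7880))

if result == 0:
    print("port 7880 is open -- the LiveKit server appears to be running")
else:
    print(f"cannot reach localhost:7880 (error code {result}) -- is livekit-server running?")
```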
@millions-hair-24956 How do I change this code so that I can also chat with the agent using text?

```python
from __future__ import annotations

import logging

from livekit import rtc
from livekit.agents import (
    AutoSubscribe,
    JobContext,
    WorkerOptions,
    cli,
    llm,
)
from livekit.agents.multimodal import MultimodalAgent
from livekit.plugins import openai
from dotenv import load_dotenv

load_dotenv()

logger = logging.getLogger("myagent")
logger.setLevel(logging.INFO)

async def entrypoint(ctx: JobContext):
    logger.info("starting entrypoint")

    await ctx.connect(auto_subscribe=AutoSubscribe.AUDIO_ONLY)

    participant = await ctx.wait_for_participant()

    model = openai.realtime.RealtimeModel(
        instructions="You are a helpful assistant and you love kittens",
        voice="shimmer",
        temperature=0.8,
        modalities=["audio", "text"],
    )
    assistant = MultimodalAgent(model=model)
    assistant.start(ctx.room)

    logger.info("starting agent")

    session = model.sessions[0]
    session.conversation.item.create(
        llm.ChatMessage(
            role="assistant",
            content="Please begin the interaction with the user in a manner consistent with your instructions.",
        )
    )
    session.response.create()

if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))
```
To enable text-based interaction with the agent, you need a mechanism to send and receive text messages. You can use `rtc.ChatManager` to handle chat messages within the LiveKit room. Here's how you can modify your code to include text chat functionality:

1. Initialize ChatManager: Create an instance of `rtc.ChatManager` to manage chat messages in the room.

2. Listen for Incoming Messages: Set up an event listener to handle incoming chat messages.

3. Send Messages: Use the `ChatManager` to send messages to the agent.

Here's the modified code:

```python
from __future__ import annotations

import logging
import asyncio
from livekit import rtc
from livekit.agents import (
    AutoSubscribe,
    JobContext,
    WorkerOptions,
    cli,
    llm,
)
from livekit.agents.multimodal import MultimodalAgent
from livekit.plugins import openai
from dotenv import load_dotenv

load_dotenv()

logger = logging.getLogger("myagent")
logger.setLevel(logging.INFO)

async def entrypoint(ctx: JobContext):
    logger.info("starting entrypoint")

    await ctx.connect(auto_subscribe=AutoSubscribe.AUDIO_ONLY)

    participant = await ctx.wait_for_participant()

    model = openai.realtime.RealtimeModel(
        instructions="You are a helpful assistant and you love kittens",
        voice="shimmer",
        temperature=0.8,
        modalities=["audio", "text"],
    )
    assistant = MultimodalAgent(model=model)
    assistant.start(ctx.room)

    logger.info("starting agent")

    session = model.sessions[0]
    session.conversation.item.create(
      llm.ChatMessage(
        role="assistant",
        content="Please begin the interaction with the user in a manner consistent with your instructions.",
      )
    )
    session.response.create()

    # Initialize ChatManager
    chat = rtc.ChatManager(ctx.room)
    logger.info("ChatManager initialized")

    # Listen for incoming chat messages
    @chat.on("message_received")
    def on_chat_received(msg: rtc.ChatMessage):
        logger.info("message_received event fired")
        logger.info(f"Received message object: {msg}")
        if msg.message:
            logger.info(f"Received chat message: {msg.message}")
            asyncio.create_task(answer_from_text(msg.message))

    async def answer_from_text(txt: str):
        chat_ctx = assistant.chat_ctx_copy()
        chat_ctx.append(role="user", text=txt)
        stream = assistant.llm.chat(chat_ctx=chat_ctx)
        await assistant.say(stream)

if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))
```

This code sets up a `ChatManager` to listen for incoming text messages and respond to them using the agent. The `answer_from_text` function processes the text input and generates a response. If you need further assistance or more detailed guidance, please let me know.

Sources:
- https://livekit-users.slack.com/archives/C07FY8WHGPM/p1729785405.894609
- https://docs.livekit.io/python/livekit/agents/multimodal/multimodal_agent.html#livekit.agents.multimodal.multimodal_agent.MultimodalAgent
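For completeness, the sending side would be another participant connected to the same room. A minimal sketch, assuming your installed `livekit` SDK version exposes `rtc.ChatManager.send_message` (worth verifying against your version):

```python
# sketch: send a text message to the agent from another participant
from livekit import rtc

async def send_text(room: rtc.Room, text: str) -> None:
    # the room must already be connected; ChatManager publishes over the
    # room's data channel, which the agent's ChatManager listens on
    chat = rtc.ChatManager(room)
    await chat.send_message(text)
```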
@millions-hair-24956 I have this error:

```
ERROR asyncio - Task exception was never retrieved
future: <Task finished name='Task-57' coro=<entrypoint.<locals>.answer_from_text() done, defined at C:\Users\User\Desktop\GoIT\ML-application-\main.py:56> exception=AttributeError("'MultimodalAgent' object has no attribute 'llm'")>
Traceback (most recent call last):
  File "C:\Users\User\Desktop\GoIT\ML-application-\main.py", line 59, in answer_from_text
    stream = assistant.llm.chat(chat_ctx=chat_ctx)
```
The error message indicates that the `MultimodalAgent` object does not have an `llm` attribute. The `MultimodalAgent` class does not directly expose an `llm` attribute for interacting with the language model; it manages the conversation flow internally. To resolve this, use the `MultimodalAgent`'s own methods to handle text input and generate responses. Here's how you can modify the `answer_from_text` function to work correctly:

1. Use the `MultimodalAgent`'s Methods: Instead of accessing `llm` directly, use the `MultimodalAgent` to handle the conversation.

Here's the updated code snippet:

```python
async def answer_from_text(txt: str):
    chat_ctx = assistant.chat_ctx_copy()
    chat_ctx.append(role="user", text=txt)
    # Use the MultimodalAgent to handle the chat context
    await assistant.set_chat_ctx(chat_ctx)
    # The agent will automatically handle the response generation
```

This modification lets the `MultimodalAgent` manage the conversation context and response generation without directly accessing an `llm` attribute. If you need further assistance or more detailed guidance, please let me know.

Sources:
- https://livekit-users.slack.com/archives/C07FY8WHGPM/p1732022090.671149
- https://livekit-users.slack.com/archives/C07FY8WHGPM/p1730673248.486419
@millions-hair-24956 The agent still doesn't answer my text messages:

```python
from __future__ import annotations

import logging
import asyncio
from livekit import rtc
from livekit.agents.multimodal import MultimodalAgent
from livekit.plugins import openai
from dotenv import load_dotenv
from livekit.agents import AutoSubscribe, JobContext, WorkerOptions, cli, llm

load_dotenv()

logger = logging.getLogger("myagent")
logger.setLevel(logging.INFO)

async def entrypoint(ctx: JobContext):
    logger.info("starting entrypoint")

    await ctx.connect(auto_subscribe=AutoSubscribe.AUDIO_ONLY)

    participant = await ctx.wait_for_participant()

    model = openai.realtime.RealtimeModel(
        instructions="You are a helpful assistant and you love kittens",
        voice="shimmer",
        temperature=0.8,
        modalities=["audio", "text"],
    )
    assistant = MultimodalAgent(model=model)
    assistant.start(ctx.room)

    logger.info("starting agent")

    session = model.sessions[0]
    session.conversation.item.create(
        llm.ChatMessage(
            role="assistant",
            content="Please begin the interaction with the user in a manner consistent with your instructions.",
        )
    )
    session.response.create()

    # Initialize ChatManager
    chat = rtc.ChatManager(ctx.room)
    logger.info("ChatManager initialized")

    # Listen for incoming chat messages
    @chat.on("message_received")
    def on_chat_received(msg: rtc.ChatMessage):
        logger.info("message_received event fired")
        logger.info(f"Received message object: {msg}")
        if msg.message:
            logger.info(f"Received chat message: {msg.message}")
            asyncio.create_task(answer_from_text(msg.message))

    async def answer_from_text(txt: str):
        chat_ctx = assistant.chat_ctx_copy()
        chat_ctx.append(role="user", text=txt)
        # Use the MultimodalAgent to handle the chat context
        await assistant.set_chat_ctx(chat_ctx)
        # The agent will automatically handle the response generation

if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))
```
The issue might be with how the `MultimodalAgent` is expected to handle text input and generate responses. Here are a few steps to ensure the agent responds to text messages:

1. Ensure Correct Context Handling: When you update the chat context with `set_chat_ctx`, make sure the agent is configured to generate a response. The `MultimodalAgent` should automatically handle response generation once the context is updated.

2. Check for Response Events: Ensure that the agent is emitting events when a response is generated. You can listen for events like `agent_speech_committed` to confirm that the agent is processing the input and generating a response.

3. Verify Model Configuration: Double-check that the `RealtimeModel` is correctly configured to handle both audio and text; the `modalities` parameter should include both "audio" and "text".

Here's a refined approach to ensure the agent responds:

```python
async def answer_from_text(txt: str):
    chat_ctx = assistant.chat_ctx_copy()
    chat_ctx.append(role="user", text=txt)
    await assistant.set_chat_ctx(chat_ctx)
    # Trigger a response generation
    session = model.sessions[0]
    session.response.create()

# Listen for response events
@assistant.on("agent_speech_committed")
def on_agent_speech_committed(msg: llm.ChatMessage):
    logger.info(f"Agent responded with: {msg.content}")
```

This setup logs the agent's response, so you can verify that the agent is processing the input correctly. If the issue persists, check the logs for any warnings or errors that might indicate why the agent is not responding. If you need further assistance or more detailed guidance, please let me know.

Sources:
- https://livekit-users.slack.com/archives/C07FY8WHGPM/p1732022090.671149
@millions-hair-24956 I get this error:

```
ERROR livekit - failed to emit event agent_speech_committed
Traceback (most recent call last):
  File "C:\Users\User\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\livekit\rtc\event_emitter.py", line 58, in emit
    callback(*callback_args)
  File "C:\Users\User\Desktop\GoIT\ML-application-\main.py", line 54, in on_agent_speech_committed
    logger.info(f"Agent responded with: {msg.content}")
```
The error message indicates an issue with the `on_agent_speech_committed` event handler, specifically with the logging statement. The problem seems to be how the logger is being called: the pasted code contains `<http://logger.info|logger.info>`, a Slack-mangled link that is not valid Python, and it should be replaced with a standard logging call. Here's how you can fix the logging statement:

1. Correct the Logger Usage: Replace the mangled logger syntax with the standard `logger.info` method.

Here's the corrected code snippet:

```python
# Listen for response events
@assistant.on("agent_speech_committed")
def on_agent_speech_committed(msg: llm.ChatMessage):
    logger.info(f"Agent responded with: {msg.content}")
```

Make sure that all logging statements in your code use the correct syntax, such as `logger.info("message")`, `logger.debug("message")`, and so on. If you continue to experience issues, ensure that the `logger` is properly configured at the beginning of your script and that the logging level is set to capture the desired detail. If you need further assistance or more detailed guidance, please let me know.

Sources:
- https://livekit-users.slack.com/archives/C07FY8WHGPM/p1732022090.671149
@millions-hair-24956 I'm still getting the same error:

```
ERROR livekit - failed to emit event agent_speech_committed
Traceback (most recent call last):
  File "C:\Users\User\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\livekit\rtc\event_emitter.py", line 58, in emit
    callback(*callback_args)
  File "C:\Users\User\Desktop\GoIT\ML-application-\main.py", line 54, in on_agent_speech_committed
    logger.info(f"Agent responded with: {msg.content}")
```
The error you're encountering is again the invalid syntax in the logging statement within the `on_agent_speech_committed` event handler. The `<http://logger.info|logger.info>` form is not valid Python; use the standard method provided by Python's `logging` module. Here's how you can fix the issue:

1. Correct the Logger Syntax: Replace the mangled logger syntax with the correct `logger.info` method.

Here's the corrected code snippet:

```python
# Listen for response events
@assistant.on("agent_speech_committed")
def on_agent_speech_committed(msg: llm.ChatMessage):
    logger.info(f"Agent responded with: {msg.content}")
```

Ensure that all logging statements in your code use the correct syntax, such as `logger.info("message")`, `logger.debug("message")`, and so on. This should resolve the error and allow the event to be emitted correctly. If you need further assistance or more detailed guidance, please let me know.

Sources:
- https://docs.livekit.io/agents/openai/multimodal-agent/
- https://docs.livekit.io/agents/openai/multimodalagent/
@millions-hair-24956 How do I modify my code so that the agent answers with both audio and text?

```python
from __future__ import annotations

import logging
import asyncio
from livekit import rtc
from livekit.agents.multimodal import MultimodalAgent
from livekit.plugins import openai
from dotenv import load_dotenv
from livekit.agents import AutoSubscribe, JobContext, WorkerOptions, cli, llm

load_dotenv()

logger = logging.getLogger("myagent")
logger.setLevel(logging.INFO)

async def entrypoint(ctx: JobContext):
    logger.info("starting entrypoint")

    await ctx.connect(auto_subscribe=AutoSubscribe.AUDIO_ONLY)

    participant = await ctx.wait_for_participant()

    model = openai.realtime.RealtimeModel(
        instructions="You are a helpful assistant and you love kittens",
        voice="shimmer",
        temperature=0.8,
        modalities=["audio", "text"],
    )
    assistant = MultimodalAgent(model=model)
    assistant.start(ctx.room)

    logger.info("starting agent")

    session = model.sessions[0]
    session.conversation.item.create(
        llm.ChatMessage(
            role="assistant",
            content="Please begin the interaction with the user in a manner consistent with your instructions.",
        )
    )
    session.response.create()

    @agent.on("user_speech_committed")
    def on_user_speech_committed(msg: llm.ChatMessage):
        # convert string lists to strings, drop images
        if isinstance(msg.content, list):
            msg.content = "\n".join(
                "[image]" if isinstance(x, llm.ChatImage) else x for x in msg
            )
        log_queue.put_nowait(f"[{datetime.now()}] USER:\n{msg.content}\n\n")

if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))
```
To modify your code so that the agent can respond to both audio and text inputs, ensure that the `MultimodalAgent` is set up to handle both modalities and that you have the appropriate event handlers to process and respond to these inputs. Here's how you can achieve this:

1. Ensure Modalities are Set: The `RealtimeModel` should have both "audio" and "text" in the `modalities` parameter, which you have already done.

2. Handle Text Input: Use the `ChatManager` to listen for text messages and process them.

3. Handle Audio Input: Use the `MultimodalAgent` to process audio input and generate responses.

4. Emit and Handle Events: Ensure that you have event handlers for both text and audio responses.

Here's the modified code:

```python
from __future__ import annotations

import logging
import asyncio
from livekit import rtc
from livekit.agents import AutoSubscribe, JobContext, WorkerOptions, cli, llm
from livekit.agents.multimodal import MultimodalAgent
from livekit.plugins import openai
from dotenv import load_dotenv
from datetime import datetime

load_dotenv()

logger = logging.getLogger("myagent")
logger.setLevel(logging.INFO)

async def entrypoint(ctx: JobContext):
    <http://logger.info|logger.info>("starting entrypoint")

    await ctx.connect(auto_subscribe=AutoSubscribe.AUDIO_ONLY)

    participant = await ctx.wait_for_participant()

    model = openai.realtime.RealtimeModel(
        instructions="You are a helpful assistant and you love kittens",
        voice="shimmer",
        temperature=0.8,
        modalities=["audio", "text"],
    )
    assistant = MultimodalAgent(model=model)
    assistant.start(ctx.room)

    <http://logger.info|logger.info>("starting agent")

    session = model.sessions[0]
    session.conversation.item.create(
      llm.ChatMessage(
        role="assistant",
        content="Please begin the interaction with the user in a manner consistent with your instructions.",
      )
    )
    session.response.create()

    # Initialize ChatManager for text input
    chat = rtc.ChatManager(ctx.room)
    <http://logger.info|logger.info>("ChatManager initialized")

    # Listen for incoming chat messages
    @chat.on("message_received")
    def on_chat_received(msg: rtc.ChatMessage):
        <http://logger.info|logger.info>("message_received event fired")
        <http://logger.info|logger.info>(f"Received message object: {msg}")
        if msg.message:
            <http://logger.info|logger.info>(f"Received chat message: {msg.message}")
            asyncio.create_task(answer_from_text(msg.message))

    async def answer_from_text(txt: str):
        chat_ctx = assistant.chat_ctx_copy()
        chat_ctx.append(role="user", text=txt)
        await assistant.set_chat_ctx(chat_ctx)
        session.response.create()

    # Handle user speech committed event
    @assistant.on("user_speech_committed")
    def on_user_speech_committed(msg: llm.ChatMessage):
        if isinstance(msg.content, list):
            msg.content = "\n".join(
                "[image]" if isinstance(x, llm.ChatImage) else x for x in msg.content
            )
        <http://logger.info|logger.info>(f"[{datetime.now()}] USER:\n{msg.content}\n\n")

    # Handle agent speech committed event
    @assistant.on("agent_speech_committed")
    def on_agent_speech_committed(msg: llm.ChatMessage):
        <http://logger.info|logger.info>(f"Agent responded with: {msg.content}")

if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))
```

This code sets up the `MultimodalAgent` to handle both audio and text inputs, and it includes event handlers to log and respond to these inputs. Make sure your environment is correctly set up with the necessary dependencies and configurations. If you need further assistance or more detailed guidance, please let me know.

Sources:
- https://docs.livekit.io/agents/openai/multimodal-agent/
- https://docs.livekit.io/agents/openai/multimodalagent/
@millions-hair-24956 I'm getting this error again:

```
ERROR livekit - failed to emit event agent_speech_committed
Traceback (most recent call last):
  File "C:\Users\User\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\livekit\rtc\event_emitter.py", line 58, in emit
    callback(*callback_args)
  File "C:\Users\User\Desktop\GoIT\ML-application-\main.py", line 75, in on_agent_speech_committed
    logger.info(f"Agent responded with: {msg.content}")
```
This is the same issue: the `on_agent_speech_committed` event handler uses the mangled `<http://logger.info|logger.info>` form, which is not valid Python. Use the standard method provided by Python's `logging` module. Here's how you can fix it:

1. Correct the Logger Syntax: Replace the mangled logger syntax with the correct `logger.info` method.

Here's the corrected code snippet:

```python
# Handle agent speech committed event
@assistant.on("agent_speech_committed")
def on_agent_speech_committed(msg: llm.ChatMessage):
    logger.info(f"Agent responded with: {msg.content}")
```

Ensure that all logging statements use the correct syntax (`logger.info("message")`, `logger.debug("message")`, etc.). This should resolve the error and allow the event to be emitted correctly. If you continue to experience issues, confirm that the `logger` is properly configured at the beginning of your script and that the logging level is set to capture the desired detail. If you need further assistance or more detailed guidance, please let me know.

Sources:
- https://docs.livekit.io/agents/openai/multimodal-agent/
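If your local file already uses a plain `logger.info(...)` call and the error persists, note that the pasted traceback is truncated before the actual exception, so the failing expression is unknown. A defensive variant of the handler (a sketch; it assumes `assistant`, `logger`, and `llm` as defined in the script above) can surface the real cause instead of the bare "failed to emit event" message:

```python
@assistant.on("agent_speech_committed")
def on_agent_speech_committed(msg: llm.ChatMessage):
    try:
        # lazy %s formatting defers string conversion to the logging module
        logger.info("Agent responded with: %s", msg.content)
    except Exception:
        # log the full traceback rather than letting the event emitter
        # report a generic failure
        logger.exception("failed to log agent response")
```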
@millions-hair-24956 How can I modify my code to include that text?

```python
from __future__ import annotations

import logging
import asyncio
from livekit import rtc
from livekit.agents import AutoSubscribe, JobContext, WorkerOptions, cli, llm
from livekit.agents.multimodal import MultimodalAgent
from livekit.plugins import openai
from dotenv import load_dotenv
from datetime import datetime

load_dotenv()

logger = logging.getLogger("myagent")
logger.setLevel(logging.INFO)

async def entrypoint(ctx: JobContext):
    logger.info("starting entrypoint")

    await ctx.connect(auto_subscribe=AutoSubscribe.AUDIO_ONLY)

    participant = await ctx.wait_for_participant()

    model = openai.realtime.RealtimeModel(
        instructions="You are a helpful assistant and you want to help people",
        voice="shimmer",
        temperature=0.8,
        modalities=["audio", "text"],
    )
    assistant = MultimodalAgent(model=model)
    assistant.start(ctx.room)

    logger.info("starting agent")

    session = model.sessions[0]
    session.conversation.item.create(
        llm.ChatMessage(
            role="assistant",
            content="Please begin the interaction with the user in a manner consistent with your instructions.",
        )
    )
    session.response.create()

    # Initialize ChatManager for text input
    chat = rtc.ChatManager(ctx.room)
    logger.info("ChatManager initialized")

    # Listen for incoming chat messages
    @chat.on("message_received")
    def on_chat_received(msg: rtc.ChatMessage):
        logger.info("message_received event fired")
        logger.info(f"Received message object: {msg}")
        if msg.message:
            logger.info(f"Received chat message: {msg.message}")
            asyncio.create_task(answer_from_text(msg.message))

    async def answer_from_text(txt: str):
        chat_ctx = assistant.chat_ctx_copy()
        chat_ctx.append(role="user", text=txt)
        await assistant.set_chat_ctx(chat_ctx)
        session.response.create()

    # Handle user speech committed event
    @assistant.on("user_speech_committed")
    def on_user_speech_committed(msg: llm.ChatMessage):
        if isinstance(msg.content, list):
            msg.content = "\n".join(
                "[image]" if isinstance(x, llm.ChatImage) else x for x in msg.content
            )
        logger.info(f"[{datetime.now()}] USER:\n{msg.content}\n\n")

    # Handle agent speech committed event
    @assistant.on("agent_speech_committed")
    def on_agent_speech_committed(msg: llm.ChatMessage):
        logger.info(f"Agent responded with: {msg.content}")

    await asyncio.sleep(1)
    await assistant.say("Hello, I am here to assist you in completing a Geriatric Depression Scale screening. The GDS-15 is a series of 15 questions that will help assess your emotional well-being. For each question, please answer with either 'yes' or 'no'. If you're unsure, please feel free to ask me for clarification. Let's get started!", allow_interruptions=True)

if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))
```
To handle both text and audio interactions correctly, make sure the logging statements use valid syntax and the agent is set up for both modalities. Here's how you can adjust your code:

1. Correct Logger Syntax: Replace the mangled logger calls with the standard `logger.info` method.

2. Ensure Modalities are Set: The `RealtimeModel` should have both "audio" and "text" in the `modalities` parameter, which you have already done.

3. Handle Text and Audio: Ensure that the agent can process both text and audio inputs and generate responses accordingly.

Here's the corrected code:

```python
from __future__ import annotations

import logging
import asyncio
from livekit import rtc
from livekit.agents import AutoSubscribe, JobContext, WorkerOptions, cli, llm
from livekit.agents.multimodal import MultimodalAgent
from livekit.plugins import openai
from dotenv import load_dotenv
from datetime import datetime

load_dotenv()

logger = logging.getLogger("myagent")
logger.setLevel(logging.INFO)

async def entrypoint(ctx: JobContext):
    <http://logger.info|logger.info>("starting entrypoint")

    await ctx.connect(auto_subscribe=AutoSubscribe.AUDIO_ONLY)

    participant = await ctx.wait_for_participant()

    model = openai.realtime.RealtimeModel(
        instructions="You are a helpful assistant and you want to help people",
        voice="shimmer",
        temperature=0.8,
        modalities=["audio", "text"],
    )
    assistant = MultimodalAgent(model=model)
    assistant.start(ctx.room)

    <http://logger.info|logger.info>("starting agent")

    session = model.sessions[0]
    session.conversation.item.create(
      llm.ChatMessage(
        role="assistant",
        content="Please begin the interaction with the user in a manner consistent with your instructions.",
      )
    )
    session.response.create()

    # Initialize ChatManager for text input
    chat = rtc.ChatManager(ctx.room)
    <http://logger.info|logger.info>("ChatManager initialized")

    # Listen for incoming chat messages
    @chat.on("message_received")
    def on_chat_received(msg: rtc.ChatMessage):
        <http://logger.info|logger.info>("message_received event fired")
        <http://logger.info|logger.info>(f"Received message object: {msg}")
        if msg.message:
            <http://logger.info|logger.info>(f"Received chat message: {msg.message}")
            asyncio.create_task(answer_from_text(msg.message))

    async def answer_from_text(txt: str):
        chat_ctx = assistant.chat_ctx_copy()
        chat_ctx.append(role="user", text=txt)
        await assistant.set_chat_ctx(chat_ctx)
        session.response.create()

    # Handle user speech committed event
    @assistant.on("user_speech_committed")
    def on_user_speech_committed(msg: llm.ChatMessage):
        if isinstance(msg.content, list):
            msg.content = "\n".join(
                "[image]" if isinstance(x, llm.ChatImage) else x for x in msg.content
            )
        <http://logger.info|logger.info>(f"[{datetime.now()}] USER:\n{msg.content}\n\n")

    # Handle agent speech committed event
    @assistant.on("agent_speech_committed")
    def on_agent_speech_committed(msg: llm.ChatMessage):
        <http://logger.info|logger.info>(f"Agent responded with: {msg.content}")

    await asyncio.sleep(1)
    await assistant.say("Hello, I am here to assist you in completing a Geriatric Depression Scale screening. The GDS-15 is a series of 15 questions that will help assess your emotional well-being. For each question, please answer with either 'yes' or 'no'. If you're unsure, please feel free to ask me for clarification. Let's get started!", allow_interruptions=True)

if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))
```

This code ensures that the agent can handle both text and audio inputs, and it includes event handlers to log and respond to these inputs. Make sure your environment is correctly set up with the necessary dependencies and configurations. If you need further assistance or more detailed guidance, please let me know.

Sources:
- https://docs.livekit.io/agents/openai/multimodal-agent/