# agents
  • swift-garage-98561 (08/06/2025, 7:05 PM)
    Hi @refined-appointment-81829 I'm getting a TimeoutError when deploying my LiveKit agent to a GCP VM (2 vCPUs, 3.8GB RAM). The agent works fine locally but fails on the VM during inference executor initialization. Error:
    TimeoutError
    File: /usr/local/lib/python3.13/site-packages/livekit/agents/ipc/supervised_proc.py, line 169
    Container stats:
    CPU: 132.64% (maxing out both cores)
    Memory: 276MiB / 3.821GiB (7.05%)
    Agent logs:
    appuser@8b1bc5f9b849:~$ PYTHONPATH=/home/appuser python -u src/agent.py start
    INFO:google_genai._api_client:The user provided project/location will take precedence over the Vertex AI API key from the environment variable.
    INFO:livekit.agents:starting worker
    {"message": "starting worker", "level": "INFO", "name": "livekit.agents", "version": "1.2.2", "rtc-version": "1.0.12", "timestamp": "2025-08-06T18:56:35.202302+00:00"}
    INFO:livekit.agents:preloading plugins
    {"message": "preloading plugins", "level": "INFO", "name": "livekit.agents", "packages": ["livekit.plugins.cartesia", "livekit.plugins.deepgram", "livekit.plugins.silero", "livekit.plugins.assemblyai", "livekit.plugins.openai", "livekit.plugins.groq", "livekit.plugins.elevenlabs", "livekit.plugins.aws", "livekit.plugins.turn_detector", "livekit.plugins.google", "livekit.plugins.anthropic", "av"], "timestamp": "2025-08-06T18:56:35.204163+00:00"}
    INFO:livekit.agents:starting inference executor
    {"message": "starting inference executor", "level": "INFO", "name": "livekit.agents", "timestamp": "2025-08-06T18:56:35.476252+00:00"}
    INFO:livekit.agents:initializing process
    {"message": "initializing process", "level": "INFO", "name": "livekit.agents", "pid": 86, "inference": true, "timestamp": "2025-08-06T18:57:04.267543+00:00"}
    INFO:livekit.agents:killing process
    {"message": "killing process", "level": "INFO", "name": "livekit.agents", "pid": 86, "inference": true, "timestamp": "2025-08-06T18:57:14.402054+00:00"}
    ERROR:livekit.agents:worker failed
    Traceback (most recent call last):
      File "/usr/local/lib/python3.13/asyncio/tasks.py", line 507, in wait_for
        return await fut
               ^^^^^^^^^
      File "/usr/local/lib/python3.13/site-packages/livekit/agents/ipc/channel.py", line 47, in arecv_message
        return _read_message(await dplx.recv_bytes(), messages)
                             ^^^^^^^^^^^^^^^^^^^^^^^
      File "/usr/local/lib/python3.13/site-packages/livekit/agents/utils/aio/duplex_unix.py", line 35, in recv_bytes
        len_bytes = await self._reader.readexactly(4)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/usr/local/lib/python3.13/asyncio/streams.py", line 769, in readexactly
        await self._wait_for_data('readexactly')
      File "/usr/local/lib/python3.13/asyncio/streams.py", line 539, in _wait_for_data
        await self._waiter
    asyncio.exceptions.CancelledError
    
    The above exception was the direct cause of the following exception:
    
    Traceback (most recent call last):
      File "/usr/local/lib/python3.13/site-packages/livekit/agents/cli/_run.py", line 79, in _worker_run
        await worker.run()
      File "/usr/local/lib/python3.13/site-packages/livekit/agents/worker.py", line 387, in run
        await self._inference_executor.initialize()
      File "/usr/local/lib/python3.13/site-packages/livekit/agents/ipc/supervised_proc.py", line 169, in initialize
        init_res = await asyncio.wait_for(
                   ^^^^^^^^^^^^^^^^^^^^^^^
        ...<2 lines>...
        )
        ^
      File "/usr/local/lib/python3.13/asyncio/tasks.py", line 506, in wait_for
        async with timeouts.timeout(timeout):
                   ~~~~~~~~~~~~~~~~^^^^^^^^^
      File "/usr/local/lib/python3.13/asyncio/timeouts.py", line 116, in __aexit__
        raise TimeoutError from exc_val
    TimeoutError
    {"message": "worker failed", "level": "ERROR", "name": "livekit.agents", "exc_info": "Traceback (most recent call last):\n  File \"/usr/local/lib/python3.13/asyncio/tasks.py\", line 507, in wait_for\n    return await fut\n           ^^^^^^^^^\n  File \"/usr/local/lib/python3.13/site-packages/livekit/agents/ipc/channel.py\", line 47, in arecv_message\n    return _read_message(await dplx.recv_bytes(), messages)\n                         ^^^^^^^^^^^^^^^^^^^^^^^\n  File \"/usr/local/lib/python3.13/site-packages/livekit/agents/utils/aio/duplex_unix.py\", line 35, in recv_bytes\n    len_bytes = await self._reader.readexactly(4)\n                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n  File \"/usr/local/lib/python3.13/asyncio/streams.py\", line 769, in readexactly\n    await self._wait_for_data('readexactly')\n  File \"/usr/local/lib/python3.13/asyncio/streams.py\", line 539, in _wait_for_data\n    await self._waiter\nasyncio.exceptions.CancelledError\n\nThe above exception was the direct cause of the following exception:\n\nTraceback (most recent call last):\n  File \"/usr/local/lib/python3.13/site-packages/livekit/agents/cli/_run.py\", line 79, in _worker_run\n    await worker.run()\n  File \"/usr/local/lib/python3.13/site-packages/livekit/agents/worker.py\", line 387, in run\n    await self._inference_executor.initialize()\n  File \"/usr/local/lib/python3.13/site-packages/livekit/agents/ipc/supervised_proc.py\", line 169, in initialize\n    init_res = await asyncio.wait_for(\n               ^^^^^^^^^^^^^^^^^^^^^^^\n    ...<2 lines>...\n    )\n    ^\n  File \"/usr/local/lib/python3.13/asyncio/tasks.py\", line 506, in wait_for\n    async with timeouts.timeout(timeout):\n               ~~~~~~~~~~~~~~~~^^^^^^^^^\n  File \"/usr/local/lib/python3.13/asyncio/timeouts.py\", line 116, in __aexit__\n    raise TimeoutError from exc_val\nTimeoutError", "timestamp": "2025-08-06T18:57:14.409090+00:00"}
    INFO:livekit.agents:draining worker
    {"message": "draining worker", "level": "INFO", "name": "livekit.agents", "id": "unregistered", "timeout": 1800, "timestamp": "2025-08-06T18:57:14.436013+00:00"}
    appuser@8b1bc5f9b849:~$ ad
    What I've tried:
    - All API keys are set correctly
    - Network connectivity works
    Questions:
    1. Is 2 vCPUs enough for LiveKit agent initialization?
    2. How can I increase the inference executor timeout?
    3. Any LiveKit config options to reduce initialization load?
    The agent initializes multiple AI services (STT, TTS, LLM) simultaneously and seems to time out after 10 seconds. Should I upgrade to 4+ vCPUs, or is there a configuration fix? Thanks! πŸ™
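On the timeout question: recent livekit-agents releases expose an `initialize_process_timeout` option on `WorkerOptions` (the default is around 10 seconds, which matches the gap between "initializing process" and "killing process" in the log above). This is a sketch, not a guaranteed fix; verify the parameter name against your installed version:

```python
# Sketch: raising the inference-executor init timeout for a slow 2-vCPU VM.
# `initialize_process_timeout` is assumed to exist on WorkerOptions in your
# livekit-agents version; check before relying on it.
from livekit.agents import WorkerOptions, cli

cli.run_app(
    WorkerOptions(
        entrypoint_fnc=entrypoint,        # your existing entrypoint function
        initialize_process_timeout=60.0,  # seconds; more headroom than the ~10 s default
    )
)
```

More vCPUs will also help, since model downloads and plugin preloading peg both cores during startup, but a longer timeout is the cheaper first experiment.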
  • wooden-beard-26644 (08/06/2025, 7:18 PM)
    Is there any way to provide job metadata values when using python main.py console? We configure a lot of behavior via call metadata and I'd like to be able to test it when running locally.
  • acceptable-motorcycle-5430 (08/06/2025, 10:50 PM)
    Hi, I have a question: how can I get the user transcript when a function call executes, so I can perform RAG?
  • rapid-salesclerk-34950 (08/07/2025, 12:24 AM)
    Hi, I am currently using the iOS vision demo GitHub project with Google Gemini. When I use screen sharing, the broadcast starts, but the agent cannot see my screen: the video stream is not being sent correctly. Video only works from the camera, not from screen sharing. Any tips on how to fix this?
  • rough-pizza-5956 (08/07/2025, 3:38 AM)
    How can I catch 429 errors? Is it possible from an on-error event?
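Provider SDKs raise different exception types, but most attach an HTTP status, so a best-effort classifier is one option. This is a heuristic sketch, not a LiveKit API:

```python
def is_rate_limit_error(exc: BaseException) -> bool:
    """Best-effort check for an HTTP 429 across provider SDKs.

    Many SDK exceptions expose a `status_code` (or `status`) attribute,
    and some only mention 429 in the message, so check both.
    """
    status = getattr(exc, "status_code", None) or getattr(exc, "status", None)
    if status == 429:
        return True
    return "429" in str(exc)
```

You could call this from a session error-event handler (event name to verify for your version), or sidestep the problem with the `FallbackAdapter` wrappers in livekit-agents, which can fail over to another provider on errors like 429.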
  • bumpy-student-61140 (08/07/2025, 7:43 AM)
    Hi @refined-appointment-81829 I have a problem. I think it is fixed in the latest release of livekit/agents, but I have no idea how to update. I know I can make a new folder and a new venv and reinstall all the requirements, but that seems unnecessarily tedious. Is there a normal way to update all the LiveKit dependencies and files to the latest? I currently run pip-review, make it update all the dependencies, and then run download-files again. Is there a simpler way to do this?
  • flaky-beard-91685 (08/07/2025, 7:50 AM)
    Hi @refined-appointment-81829 @tall-belgium-91876 For our Indian customers we need an Indian phone number, so we planned to integrate Exotel with LiveKit. I've been working with the Exotel team to whitelist the FQDN and destination protocols, and they came up with two requirements:
    1. Enable TCP port 5070.
    2. Enable the 10000-40000 port range for the UDP protocol.
    Is it possible for you to override these configurations just for our account? We have two paid accounts, one for staging and one for production. Could you enable these settings in both accounts? Please let me know if that's possible and I'll share the account details with you. Thanks
  • rhythmic-plumber-379 (08/07/2025, 8:44 AM)
    https://docs.ag-ui.com/introduction Does LiveKit have any plans to support this protocol in the future?
  • ambitious-ram-96835 (08/07/2025, 11:49 AM)
    Hi all! How can I pass metadata from the Next.js app to the Python worker, so I can override the session pre-launch? The goal is to dynamically change the LLM, STT and TTS models & providers based on end-users' preferences. Is adding metadata to AccessToken the way to go?
  • polite-oil-10264 (08/07/2025, 2:24 PM)
    How do I console-log the LLM output?
  • purple-rainbow-1246 (08/07/2025, 2:43 PM)
    Has anyone faced false interruptions by the agent? The user is still speaking and gets interrupted often. I've tried increasing min_endpointing_delay to 1.0, and made some changes to the silero.VAD.load options like min_speech_duration and min_silence_duration, with no luck.
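False interruptions usually mean the VAD is firing on background noise rather than an endpointing problem, so the knobs to try are VAD sensitivity and the minimum interruption length. A hedged tuning sketch; parameter names are from agents 1.x and should be verified against your installed version:

```python
# Make the VAD less sensitive and require a longer burst of detected speech
# before treating it as an interruption of the agent.
from livekit.agents import AgentSession
from livekit.plugins import silero

vad = silero.VAD.load(
    activation_threshold=0.6,   # higher than the ~0.5 default = less noise-sensitive
    min_speech_duration=0.1,
    min_silence_duration=0.6,
)

session = AgentSession(
    vad=vad,
    min_endpointing_delay=1.0,
    min_interruption_duration=0.8,  # ignore very short bursts while the agent speaks
    # ... stt / llm / tts as before
)
```

Enabling noise cancellation on the room input, if you aren't already, can also reduce spurious VAD activations.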
  • rhythmic-plumber-379 (08/07/2025, 4:09 PM)
    Is LiveKit going to support a2a? https://github.com/a2aproject/A2A Would be nice to have a way to bridge different agents over the network using a standard interface
  • ancient-judge-59849 (08/07/2025, 4:23 PM)
    Hey! I've been looking at the 1.2.0 release and I'm interested in understanding the experimental agent tasks. Could you explain the difference between using this and just spawning another agent myself and handing the call to it? And how could we customize it for other use cases?
  • delightful-mechanic-38378 (08/07/2025, 4:26 PM)
    In the benchmarking section https://docs.livekit.io/home/self-hosting/benchmark/ ... it uses 1 room. Does the tooling support multiple rooms? I want to test breaking point for multiple rooms - each with 1 agent and 1 participant. Any info/pointers/links/docs/code will be very helpful. (@refined-appointment-81829 anything from LK or other threads you help out with)
  • narrow-chef-82684 (08/07/2025, 5:09 PM)
    Hey, has anyone tried deploying an agent with Cloudflare Durable Objects? And the Cloudflare TTS API?
  • many-hair-70963 (08/07/2025, 5:26 PM)
    When using the Hedra plugin with LK, a source image generated from Hedra's video generator is a half-body image, but it appears as a head-only avatar. I understand the output is 512x512, but is there a way to use the half-body image as the avatar within the 512x512?
  • creamy-judge-56458 (08/07/2025, 6:38 PM)
    Hi, since upgrading to 1.2.2 and 1.2.3 we have been getting the following issue when deploying. Did something significantly change with how agents are started and initialized?
    2025-08-07T18:36:47.3107265Z stdout F {"message": "worker failed", "level": "ERROR", "name": "livekit.agents", "exc_info": "Traceback (most recent call last):\n File \"/usr/local/lib/python3.11/asyncio/tasks.py\", line 500, in wait_for\n return fut.result()\n ^^^^^^^^^^^^\n File \"/opt/venv/lib/python3.11/site-packages/livekit/agents/ipc/channel.py\", line 47, in arecv_message\n return _read_message(await dplx.recv_bytes(), messages)\n ^^^^^^^^^^^^^^^^^^^^^^^\n File \"/opt/venv/lib/python3.11/site-packages/livekit/agents/utils/aio/duplex_unix.py\", line 35, in recv_bytes\n len_bytes = await self._reader.readexactly(4)\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File \"/usr/local/lib/python3.11/asyncio/streams.py\", line 750, in readexactly\n await self._wait_for_data('readexactly')\n File \"/usr/local/lib/python3.11/asyncio/streams.py\", line 543, in _wait_for_data\n await self._waiter\nasyncio.exceptions.CancelledError\n\nThe above exception was the direct cause of the following exception:\n\nTraceback (most recent call last):\n File \"/opt/venv/lib/python3.11/site-packages/livekit/agents/cli/_run.py\", line 79, in _worker_run\n await worker.run()\n File \"/opt/venv/lib/python3.11/site-packages/livekit/agents/worker.py\", line 387, in run\n await self._inference_executor.initialize()\n File \"/opt/venv/lib/python3.11/site-packages/livekit/agents/ipc/supervised_proc.py\", line 169, in initialize\n init_res = await asyncio.wait_for(\n ^^^^^^^^^^^^^^^^^^^^^^^\n File \"/usr/local/lib/python3.11/asyncio/tasks.py\", line 502, in wait_for\n raise exceptions.TimeoutError() from exc\nTimeoutError", "timestamp": "2025-08-07T18:36:47.292634+00:00"}
    2025-08-07T18:36:47.311745748Z stdout F {"message": "draining worker", "level": "INFO", "name": "livekit.agents", "id": "unregistered", "timeout": 1800, "timestamp": "2025-08-07T18:36:47.309834+00:00"}
    2025-08-07T18:56:21.689725611Z stdout F {"message": "Exception in callback Future.set_result(None)\nhandle: <Handle Future.set_result(None)>", "level": "ERROR", "name": "asyncio", "exc_info": "Traceba.....
  • ambitious-ram-96835 (08/07/2025, 6:51 PM)
    This Dockerfile from the Python starter is causing errors like below - can anyone please share a working Dockerfile example? πŸ™
    36.58 E: Failed to fetch http://deb.debian.org/debian/pool/main/p/python3.11/libpython3.11-stdlib_3.11.2-6+deb12u6_amd64.deb  Hash Sum mismatch
    36.58    Hashes of expected file:
    36.58     - SHA256:409f354d3d5d5b605a5d2d359936e6c2262b6c8f2bb120ec530bc69cb318fac4
    36.58     - MD5Sum:a45c8d12a11e8ca44e191331917d6c37 [weak]
    36.58     - Filesize:1798500 [weak]
    36.58    Hashes of received file:
    36.58     - SHA256:a1ddb99826b09a928d6556bbf35f059c8ce643057d28ed4c5b46448a5733edd6
    36.58     - MD5Sum:c7fe270c74141b29950a0eadde47272a [weak]
    36.58     - Filesize:1798500 [weak]
    36.58    Last modification reported: Sat, 03 May 2025 18:08:56 +0000
    36.58 E: Unable to fetch some archives, maybe run apt-get update or try with --fix-missing?
  • cuddly-cartoon-47334 (08/07/2025, 7:35 PM)
    Hi. For agents 1.x do we still have to do this:
    await ctx.connect(auto_subscribe=AutoSubscribe.AUDIO_ONLY)
    participant = await ctx.wait_for_participant()
    lkapi = LiveKitAPI()
    ? Also, how do we add shutdown callbacks in agents 1.x? It seems the 0.x way does not work.
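On the shutdown-callback half of the question: in 1.x the hook is registered on the `JobContext`. A minimal sketch, assuming the `add_shutdown_callback` API exists in your installed version (verify before relying on it):

```python
from livekit.agents import JobContext


async def entrypoint(ctx: JobContext):
    async def on_shutdown():
        # Flush transcripts, close HTTP clients, etc.
        ...

    ctx.add_shutdown_callback(on_shutdown)
    await ctx.connect()
    # In 1.x an explicit wait_for_participant() is usually unnecessary for
    # voice agents; AgentSession handles participant linking itself.
```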
  • flaky-rain-27278 (08/07/2025, 8:53 PM)
    Is GPT-5 supported?
  • better-horse-7195 (08/07/2025, 9:08 PM)
    Why is it so complex to do a dynamic greeting when the user joins an outbound call? I play my greeting and then the LLM plays a greeting again, which is so irritating.
  • better-horse-7195 (08/07/2025, 9:08 PM)
    Has anyone solved this? This is my agent.py:
    from __future__ import annotations
    
    import asyncio
    import json
    import logging
    import os
    from typing import Any, Set
    
    from dotenv import load_dotenv
    from livekit import api, rtc
    from livekit.agents import (
        Agent, AgentSession, JobContext, JobProcess, RunContext,
        cli, WorkerOptions, RoomInputOptions, function_tool,
        BackgroundAudioPlayer, AudioConfig, BuiltinAudioClip, get_job_context,
    )
    from livekit.plugins import deepgram, openai, cartesia, silero, noise_cancellation
    from livekit.plugins.noise_cancellation import BVCTelephony
    from livekit.plugins.turn_detector.multilingual import MultilingualModel
    
    from metadata import JobMetadata   # your pydantic model
    
    load_dotenv(".env.local")
    
    logger = logging.getLogger("outbound-caller")
    logger.setLevel(logging.INFO)
    
    PLEASANTRIES: Set[str] = {"hi", "hello", "hey", "yes"}
    
    
    # ───────────────────────────── Agent ──────────────────────────────── #
    class OutboundCaller(Agent):
        """Minimal agent – greeting & hang-up handled in entrypoint."""
    
        def __init__(self, *, instructions: str):
            super().__init__(instructions=instructions)
            self.participant: rtc.RemoteParticipant | None = None
    
        def set_participant(self, participant: rtc.RemoteParticipant):
            self.participant = participant
    
        # ------ LLM-visible tools ----------------------------------------
        @function_tool()
        async def end_call(self, ctx: RunContext):
            """Hang up when user or LLM decides the call is over."""
            await get_job_context().api.room.delete_room(
                api.DeleteRoomRequest(
                    room=get_job_context().room.name,
                )
            )
    
        @function_tool()
        async def detected_answering_machine(self, ctx: RunContext):
            """Hang up if voicemail is detected."""
        logger.info("AMD Detected")
            await get_job_context().api.room.delete_room(
                api.DeleteRoomRequest(
                    room=get_job_context().room.name,
                )
            )
    
    
    # ──────────────────────────  Pre-warm VAD  ────────────────────────── #
    def prewarm(proc: JobProcess):
        proc.userdata["vad"] = silero.VAD.load()
    
    
    # ─────────────────────────── Entry point ──────────────────────────── #
    async def entrypoint(ctx: JobContext):
        session: AgentSession | None = None
        try:
            # 0  Parse metadata & inject API keys
            meta = JobMetadata(**json.loads(ctx.job.metadata))
            os.environ.update(
                DEEPGRAM_API_KEY=meta.deepgram_api_key,
                CARTESIA_API_KEY=meta.cartesia_api_key,
                OPENAI_API_KEY=meta.openai_api_key,
            )
    
            # 1  Build agent & session
            agent = OutboundCaller(instructions=meta.instructions)
    
            session = AgentSession(
                vad=ctx.proc.userdata["vad"],
                llm=openai.LLM(model="gpt-4o-mini"),
                stt=deepgram.STT(model="nova-3", interim_results=True, language="multi"),
                tts=cartesia.TTS(model="sonic-2", voice="694f9389-aac1-45b6-b726-9d9369183238"),
                turn_detection="stt",
                preemptive_generation=True,
                allow_interruptions=True,
            )
    
            # Pleasantry filter & first-turn latch
            first_turn = asyncio.Event()
    
            @session.on("user_input_transcribed")
            def _filter(ev):
                if ev.is_final:
                    if ev.transcript.strip().lower() in PLEASANTRIES:
                        ev.add_to_chat_ctx = False
                    first_turn.set()
    
            # Background ambience
            background_audio = BackgroundAudioPlayer(
                ambient_sound=AudioConfig(BuiltinAudioClip.OFFICE_AMBIENCE, volume=0.8),
                thinking_sound=[
                    AudioConfig(BuiltinAudioClip.KEYBOARD_TYPING, volume=0.8),
                    AudioConfig(BuiltinAudioClip.KEYBOARD_TYPING2, volume=0.7),
                ],
            )
    
            # 2  Start session as background task & dial
            session_started = asyncio.create_task(
                session.start(
                    agent=agent,
                    room=ctx.room,
                    room_input_options=RoomInputOptions(
                        noise_cancellation=noise_cancellation.BVCTelephony()
                    ),
                )
            )
    
            await ctx.api.sip.create_sip_participant(
                api.CreateSIPParticipantRequest(
                    room_name=ctx.room.name,
                    sip_trunk_id=meta.sip_outbound_trunk_id,
                    sip_call_to=meta.phone_number,
                    participant_identity=meta.phone_number,
                    wait_until_answered=True,
                )
            )
    
            # 3  Wait for session start and participant join
            await session_started
            participant = await ctx.wait_for_participant(identity=meta.phone_number)
            agent.set_participant(participant)
    
            # Start background audio after session is fully established
            await background_audio.start(room=ctx.room, agent_session=session)
    
            try:
                await asyncio.wait_for(first_turn.wait(), timeout=1.5)
            except asyncio.TimeoutError:
                pass  # silent pick-up
    
            # 4  Deterministic greeting (valid SSML)
            greeting = (
                meta.greeting or "Hello, this is Sara from ABC Finance."
            )
    
            await session.say(greeting, allow_interruptions=True, add_to_chat_ctx=False)
    
            # Session will continue running naturally - no session.run() needed
    
        except Exception as exc:
            logger.exception(f"Outbound-caller fatal error: {exc}")
            # best-effort room cleanup
            try:
                await ctx.api.room.delete_room(api.DeleteRoomRequest(room=ctx.room.name))
            except Exception:
                pass
    
        finally:
            # Dump conversation history (works on all SDK versions)
            if session and getattr(session, "history", None):
                h = session.history
                try:
                    out = json.dumps(h.to_dict(), indent=2)      # β‰₯1.0.2
                except AttributeError:
                    out = getattr(h, "to_json", lambda **_: str(h))(indent=2)
                print("\n--- Call Transcript ---")
                print(out)
                print("--- End Transcript ---\n")
    
    
    # ─────────────────────────── CLI runner ──────────────────────────── #
    if __name__ == "__main__":
        cli.run_app(
            WorkerOptions(
                entrypoint_fnc=entrypoint,
                agent_name="outbound-caller",
                prewarm_fnc=prewarm,   # drop if cold-start latency isn't a concern
            )
        )
  • dry-france-22717 (08/07/2025, 10:16 PM)
    Hi there, I am trying to have the LK agent talk to the caller using a custom websocket backend, without any instructions / LLM prompt on LK directly. Expectation: the custom websocket receives transcripts after STT and sends back messages to be used for TTS by the LK agent. Does anyone have a reference for something similar?
  • acceptable-psychiatrist-80817 (08/07/2025, 11:24 PM)
    So I have a multi-agent setup running; the voice agent calls the multi-agent for tasks. I'm trying to check whether the user gave a task that's already running. Do I need to use an external LLM, or can I use my voice agent for this? I have the current tasks listed in self.current_tasks.
  • careful-analyst-10302 (08/08/2025, 4:21 AM)
    C:\Users\86177\Desktop\livekit>python agent.py download-files
    2025-08-08 12:18:33,028 - INFO livekit.agents - Downloading files for <livekit.plugins.google.GooglePlugin object at 0x000002487C7154F0>
    2025-08-08 12:18:33,029 - INFO livekit.agents - Finished downloading files for <livekit.plugins.google.GooglePlugin object at 0x000002487C7154F0>
    2025-08-08 12:18:33,029 - INFO livekit.agents - Downloading files for <livekit.plugins.silero.SileroPlugin object at 0x000002487DB3BE30>
    2025-08-08 12:18:33,029 - INFO livekit.agents - Finished downloading files for <livekit.plugins.silero.SileroPlugin object at 0x000002487DB3BE30>
    2025-08-08 12:18:33,029 - INFO livekit.agents - Downloading files for <livekit.plugins.turn_detector.EOUPlugin object at 0x000002487DBB47A0>
    2025-08-08 12:18:59,789 - INFO livekit.agents - Finished downloading files for <livekit.plugins.turn_detector.EOUPlugin object at 0x000002487DBB47A0>

    C:\Users\86177\Desktop\livekit>python agent.py console
    2025-08-08 12:21:08,874 - DEBUG asyncio - Using proactor: IocpProactor
    ==================================================
    Livekit Agents - Console
    ==================================================
    Press [Ctrl+B] to toggle between Text/Audio mode, [Q] to quit.
    2025-08-08 12:21:08,877 - INFO livekit.agents - starting worker {"version": "1.2.2", "rtc-version": "1.0.12"}
    2025-08-08 12:21:08,878 - INFO livekit.agents - starting inference executor
    2025-08-08 12:21:08,929 - INFO livekit.agents - initializing process {"pid": 56776, "inference": true}
    2025-08-08 12:21:14,077 - DEBUG livekit.agents - initializing inference runner {"runner": "lk_end_of_utterance_multilingual", "pid": 56776, "inference": true}
    2025-08-08 12:21:18,935 - INFO livekit.agents - killing process {"pid": 56776, "inference": true}
    2025-08-08 12:21:18,937 - ERROR livekit.agents - worker failed
    Traceback (most recent call last):
      File "C:\Users\86177\AppData\Local\Programs\Python\Python312\Lib\asyncio\tasks.py", line 520, in wait_for
        return await fut
      File "C:\Users\86177\AppData\Local\Programs\Python\Python312\Lib\site-packages\livekit\agents\ipc\channel.py", line 47, in arecv_message
        return _read_message(await dplx.recv_bytes(), messages)
      File "C:\Users\86177\AppData\Local\Programs\Python\Python312\Lib\site-packages\livekit\agents\utils\aio\duplex_unix.py", line 35, in recv_bytes
        len_bytes = await self._reader.readexactly(4)
      File "C:\Users\86177\AppData\Local\Programs\Python\Python312\Lib\asyncio\streams.py", line 752, in readexactly
        await self._wait_for_data('readexactly')
      File "C:\Users\86177\AppData\Local\Programs\Python\Python312\Lib\asyncio\streams.py", line 545, in _wait_for_data
        await self._waiter
    asyncio.exceptions.CancelledError

    The above exception was the direct cause of the following exception:

    Traceback (most recent call last):
      File "C:\Users\86177\AppData\Local\Programs\Python\Python312\Lib\site-packages\livekit\agents\cli\_run.py", line 79, in _worker_run
        await worker.run()
      File "C:\Users\86177\AppData\Local\Programs\Python\Python312\Lib\site-packages\livekit\agents\worker.py", line 387, in run
        await self._inference_executor.initialize()
      File "C:\Users\86177\AppData\Local\Programs\Python\Python312\Lib\site-packages\livekit\agents\ipc\supervised_proc.py", line 169, in initialize
        init_res = await asyncio.wait_for(
      File "C:\Users\86177\AppData\Local\Programs\Python\Python312\Lib\asyncio\tasks.py", line 519, in wait_for
        async with timeouts.timeout(timeout):
      File "C:\Users\86177\AppData\Local\Programs\Python\Python312\Lib\asyncio\timeouts.py", line 115, in __aexit__
        raise TimeoutError from exc_val
    TimeoutError
    Why?
  • refined-toddler-89382 (08/08/2025, 7:45 AM)
    Hello, the agent audio is not being played when the user does the following on our web page:
    1. Loads the page in an incognito tab
    2. Waits until the LiveKit room is joined and the agent has joined
    3. Then grants microphone permission
    4. Then interacts with the agent by clicking a "Play" button that prompts the agent with a preset prompt
    My LiveKit initialization phase looks like this: 1.:
    public async createAndConnectRoom(userName?: string, version?: string) {
        try {
          const queryOptions: { ROOM_NAME: string, USER_NAME?: string, VERSION?: string } = {
            ROOM_NAME: this.roomName
          }
    
          // Only add 'name' if userName is not null or empty
          if (userName && userName.trim() !== '') {
            queryOptions.USER_NAME = userName
          }
          queryOptions.VERSION = version ?? 'prod'
    
          const query = new URLSearchParams(queryOptions)
    
          console.log('Trying to retrieve LiveKit Token...')
          const response = await fetch(`/api/liveKitToken?${query}`)
          if (!response.ok) {
            throw new Error(`Failed to fetch token: ${response.statusText}`)
          }
          this.liveKitToken = await response.text()
        }
        catch (error) {
          this.logAndSend(`Error retrieving LiveKit token: ${error}`)
        }
        if (this.liveKitToken === undefined) {
          this.logAndSend('Failed to retrieve LiveKit token!')
          return
        }
        console.log('Retrieved LiveKit Token.')
        try {
          // Register VoicePipeline Agent Readyness
          this.registerAgentReady()
    
          let url = this.productionUrl
          switch (version) {
            case 'test':
              url = this.testUrl
              break
            case 'dev':
              url = this.devUrl
              break
            default:
              url = this.productionUrl
              break
          }
          console.log(`Connecting to LiveKit room at ${url} with environment: ${version ?? 'prod'}`)
    
          await this.room.connect(url, this.liveKitToken)
    
          // Set up event listeners after connecting
          this.setupParticipantEventListeners()
    
          // Handle already connected participants (e.g., TTS agent)
          console.log(this.room.remoteParticipants.size + ' remote participants already connected')
          this.room.remoteParticipants.forEach((participant) => {
            this.setupVoicePipelineAgent(participant)
          })
        }
        catch (error) {
          this.logAndSend(`Error connecting to LiveKit room: ${error}`)
        }
      }
    1a.:
    private setupParticipantEventListeners() {
        // Listener for transcriptions received
        this.room.registerTextStreamHandler('lk.transcription', async (reader, participantInfo) => {
          const info = reader.info
    
          if (info.attributes !== undefined) {
            const transcriptionId = info.attributes['lk.transcribed_track_id']
            const transcriptionFinal = info.attributes['lk.transcription_final'] === 'true'
    
            const participantIdentity = participantInfo.identity
            // Option 1: Process the stream incrementally using a for-await loop.
            for await (const chunk of reader) {
              // Process only if the transcription is from your own participant
              if (participantIdentity === this.room.localParticipant.identity) {
                // STT
                this.onOwnTranscriptionReceivedListeners.forEach(listener =>
                  listener(chunk, transcriptionFinal, transcriptionId)
                )
              }
              else {
                // TTS
                this.onForeignTranscriptionReceivedListeners.forEach(listener =>
                  listener(chunk, transcriptionFinal, transcriptionId)
                )
              }
            }
            // TTS finished
            if (participantInfo.identity !== this.room.localParticipant.identity) {
              this.onForeignTranscriptionReceivedListeners.forEach(listener =>
                listener(undefined, true, transcriptionId)
              )
            }
          }
        })
      }
    1b.:
    private setupVoicePipelineAgent(participant: RemoteParticipant) {
        console.log('Setting up VoicePipeLine-Agent:', participant.identity)
    
        // Listener for track subscriptions
        participant.on(
          ParticipantEvent.TrackSubscribed,
          (track/* , publication */) => {
            if (track.kind === Track.Kind.Audio) {
              this.handleTTSAudioTrack(track as RemoteAudioTrack)
            }
          }
        )
    
        // Listener for track unsubscriptions
        participant.on(
          ParticipantEvent.TrackUnsubscribed,
          (track/* , publication */) => {
            if (track.kind === Track.Kind.Audio) {
              this.cleanupTTSAudioTrack(track as RemoteAudioTrack)
            }
          }
        )
    
        // Subscribe to existing audio tracks
        participant.trackPublications.forEach((publication: TrackPublication) => {
          if (
            publication.track
            && publication.track.kind === Track.Kind.Audio
            && publication.isSubscribed
          ) {
            this.handleTTSAudioTrack(publication.track as RemoteAudioTrack)
          }
        })
      }
    2.:
    Copy code
    public async publishMicrophoneTrack() {
        if (this.room.state !== ConnectionState.Connected) {
          this.logAndSend('Not connected to room yet!')
          return
        }
        try {
          // This will prompt the user for microphone permissions
          this.microphoneTrack = await createLocalAudioTrack()
          // Publish the track if permission is granted
          const audioTrack = await this.room.localParticipant.publishTrack(this.microphoneTrack, { name: 'microphone' })
    
          this.publishedAudioTracks.set('microphone', audioTrack)
          console.log('Microphone access granted and track published.')
        }
        catch (error) {
          this.logAndSend(`Microphone access denied: ${error}`)
        }
      }
    3.:
    Copy code
    public async toggleMicrophoneEnabled(enabled: boolean) {
        if (this.microphoneTrack !== undefined) {
          await this.room.localParticipant.setMicrophoneEnabled(enabled)
        }
        else {
          this.logAndSend('No microphone track to toggle!')
        }
      }
    -> Of course I await all of these functions. I even call this when the user clicks the play button:
    await this.room.startAudio()
    But that did not help either. The sound only plays if the user answers the microphone-permission prompt very quickly, or if they reload the page after the permission has already been granted.
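    The timing described here is the classic autoplay-policy symptom: `startAudio()` has to run inside the click handler itself, before any `await` on the permission prompt consumes the user-gesture window. A minimal sketch of that ordering — `RoomLike` is a stand-in interface for this example, though livekit-client's real `Room` does expose `startAudio()` and `canPlaybackAudio`:

    ```typescript
    // Stand-in for the parts of livekit-client's Room used in this sketch.
    interface RoomLike {
      canPlaybackAudio: boolean
      startAudio: () => Promise<void>
    }

    // Call this directly from the click handler. startAudio() should be the
    // first async work in the gesture context -- awaiting a permission
    // prompt (e.g. createLocalAudioTrack) first can invalidate the gesture.
    async function onPlayClicked(room: RoomLike): Promise<boolean> {
      if (!room.canPlaybackAudio) {
        await room.startAudio()
      }
      return room.canPlaybackAudio
    }
    ```

    livekit-client also emits `RoomEvent.AudioPlaybackStatusChanged`, which can drive a "tap to enable audio" button whenever `canPlaybackAudio` flips to false.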
  • m

    magnificent-dusk-62723

    08/08/2025, 11:04 AM
    Has anyone had issues with the AI agent just speaking gibberish, like "baglalala"? It then goes strange; sometimes it recovers and sometimes it never comes back.
  • c

    calm-article-62769

    08/08/2025, 12:06 PM
    👋 Hello, team! If anyone has a working agent file where screen sharing functions correctly, please share it with me so I can try to replicate it. I was trying to set up an agent and wanted to test the screen-sharing feature. For the worker agent, I used this Python script.
    Copy code
    import asyncio
    import base64
    import logging
    
    from dotenv import load_dotenv
    from livekit.agents import (
        Agent,
        AgentSession,
        JobContext,
        RoomInputOptions,
        WorkerOptions,
        cli,
        get_job_context,
    )
    from livekit.agents.llm import ImageContent
    from livekit.plugins import openai, silero
    
    # Load environment variables from .env
    load_dotenv()
    
    logger = logging.getLogger("vision-assistant")
    
    
    class VisionAssistant(Agent):
        def __init__(self) -> None:
            self._tasks = []
            super().__init__(
                instructions=""" You are a helpful voice assistant Tom.""",
                llm=openai.LLM(model="gpt-4o-mini"),
                stt=openai.STT(model="whisper-1"),
                tts=openai.TTS(model="tts-1", voice="nova"),
                vad=silero.VAD.load(),
            )
    
        async def on_enter(self):
            def _image_received_handler(reader, participant_identity):
                task = asyncio.create_task(
                    self._image_received(reader, participant_identity)
                )
                self._tasks.append(task)
                task.add_done_callback(lambda t: self._tasks.remove(t))
                
            get_job_context().room.register_byte_stream_handler("test", _image_received_handler)
    
            self.session.generate_reply(
                instructions="Briefly greet the user and offer your assistance."
            )
        
        async def _image_received(self, reader, participant_identity):
            logger.info("Received image from %s: '%s'", participant_identity, reader.info.name)
            try:
                image_bytes = bytes()
                async for chunk in reader:
                    image_bytes += chunk
    
                chat_ctx = self.chat_ctx.copy()
                chat_ctx.add_message(
                    role="user",
                    content=[
                        ImageContent(
                            image=f"data:image/png;base64,{base64.b64encode(image_bytes).decode('utf-8')}"
                        )
                    ],
                )
                await self.update_chat_ctx(chat_ctx)
                print("Image received", self.chat_ctx.copy().to_dict(exclude_image=False))
            except Exception as e:
                logger.error("Error processing image: %s", e)
    
    
    async def entrypoint(ctx: JobContext):
        await ctx.connect()
        
        session = AgentSession()
        await session.start(
            agent=VisionAssistant(),
            room=ctx.room,
            room_input_options=RoomInputOptions(
                video_enabled=True
            ),
        )
    
    
    if __name__ == "__main__":
        cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))
    For the UI interface, I'm using the
    agent-starter-react
    package from here. My agent is joining the room and can communicate with me properly, but whenever I share my screen and ask the bot if it's visible, it says it can't see my screen or anything I'm showing. It keeps replying with something like, "I can't see anything." So, is there any issue with my
    agent.py
    file, or could it be something else?
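    One hedged possibility: with a pipelined (non-realtime) LLM like gpt-4o-mini, enabling video input does not by itself attach frames to each chat turn; the documented vision pattern is to keep the latest frame yourself and append it in `on_user_turn_completed`. A sketch of that pattern — the track wiring (e.g. a `track_subscribed` handler that calls `watch_track` for the screen-share track) is left out, and names follow the LiveKit Agents v1 API as I understand it, so verify against the current docs:

    ```python
    # Sketch only -- assumes the livekit / livekit-agents packages.
    import asyncio

    from livekit import rtc
    from livekit.agents import Agent, ChatContext
    from livekit.agents.llm import ChatMessage, ImageContent


    class ScreenAwareAgent(Agent):
        def __init__(self) -> None:
            super().__init__(instructions="You can see the user's shared screen.")
            self._latest_frame: rtc.VideoFrame | None = None

        def watch_track(self, track: rtc.RemoteVideoTrack) -> None:
            # Keep only the most recent frame of the shared screen.
            async def _reader() -> None:
                async for event in rtc.VideoStream(track):
                    self._latest_frame = event.frame

            asyncio.create_task(_reader())

        async def on_user_turn_completed(
            self, turn_ctx: ChatContext, new_message: ChatMessage
        ) -> None:
            # Attach the latest frame so the pipelined LLM actually receives it.
            if self._latest_frame is not None:
                new_message.content.append(ImageContent(image=self._latest_frame))
                self._latest_frame = None
    ```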
  • g

    gorgeous-gpu-30432

    08/08/2025, 12:47 PM
    @refined-appointment-81829 @victorious-nest-89511 @tall-belgium-91876 @dry-elephant-14928 @able-branch-17267 @refined-toddler-89382 We have a few questions about details of the Gemini and LiveKit components:
    • When using Gemini, we found that there are literals for Gemini voices defined here. Could we use new Gemini voices that aren't defined in this Literal, or should we wait for a Google plugin update to include them? What's the best strategy here?
    • Based on the Google docs, there are 3 different modes for function calling, explained here. What's the default function-call mode, and how can we override it?
    • Will there be an update to LiveKit to include support for Elasticsearch?
    • We need a subtitle feature for our application. Currently we stream LLM response chunks in our transcription node to get synchronized subtitles, but there is always some trade-off between the final stream and the subtitle (the subtitle jumps ahead of the actual audio). Is there a proposed fix for this?
  • m

    mysterious-van-40803

    08/08/2025, 1:25 PM
    Hi @refined-appointment-81829 @tall-belgium-91876, I'm working on integrating LiveKit Agents Cloud into my project so an autonomous agent worker can join interview rooms. Here's my current setup:
    • App: creates a room via /api/prepare-room (a Next.js API route) before participants join. I also use Node.js, but since I see that it's not really supported (v1), I am working on a Python refactor.
    • We include roomConfig.agents = [{ agentName: process.env.LIVEKIT_AGENT_NAME }] in the request so the Cloud should auto-dispatch the agent worker.
    • Frontend: connects to the room, sends instructions to the agent over the data channel, and subscribes to remote audio.
    • Env vars in the app: LIVEKIT_URL, LIVEKIT_API_KEY, LIVEKIT_API_SECRET, LIVEKIT_AGENT_NAME, OPENAI_API_KEY.
    The issue: when I log into cloud.livekit.io and open my project, I don't see the Agents tab in the left sidebar (between "Recordings" and "Settings"). Without this, I can't create or view workers, configure dispatch rules, or monitor agent jobs.
    What I've checked so far:
    • I'm in LiveKit Cloud (not self-hosted).
    • I've confirmed I'm in the correct project that my API keys belong to.
    • I've looked in Project Settings but don't see an "Enable Agents" option.
    Could you please check whether Agents Cloud is enabled for my account/project? If not, could you enable it so I can deploy my worker and set up dispatch?
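    One detail worth noting for this setup: a worker registered with an agentName is excluded from automatic dispatch and only joins when the room configuration (or a dispatch API call) explicitly requests it. A hedged Python sketch of minting a join token that carries that configuration — class names mirror the livekit-api package as I understand it, so verify against the current docs:

    ```python
    # Sketch -- assumes the livekit-api package; LIVEKIT_API_KEY and
    # LIVEKIT_API_SECRET are read from the environment when AccessToken()
    # is constructed without arguments.
    from livekit import api


    def make_interview_token(room_name: str, identity: str, agent_name: str) -> str:
        # Embedding RoomConfiguration.agents in the token asks the server to
        # dispatch the named agent when the participant joins the room.
        return (
            api.AccessToken()
            .with_identity(identity)
            .with_grants(api.VideoGrants(room_join=True, room=room_name))
            .with_room_config(
                api.RoomConfiguration(
                    agents=[api.RoomAgentDispatch(agent_name=agent_name)],
                )
            )
            .to_jwt()
        )
    ```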