# inference
  • stale-gpu-14856

    10/01/2025, 7:23 PM
    Hello! Does an existing LiveKit Cloud BAA apply to models used via Inference?
    👍 2
    💡 2
  • brave-printer-20093

    10/01/2025, 9:55 PM
    Congrats on the launch, team! I had a few questions up front as I'm thinking through whether/when we can utilize LiveKit Inference.
    1. The classic latency consideration is to keep STT, LLM, and TTS as close to our agent service as possible. What is the mental model we should have now using the Inference service? From the perspective of a self-hosted agent + LiveKit Cloud, how do we proactively ensure that we've done our part to optimize for latency? The Global co-location section of the blog entry announcing Inference seems to suggest that latency benefits only kick in when deploying agents to LiveKit Cloud... is that the case?
    2. Currently we have to manage through provider outages via FallbackAdapters. Is it the intention that Inference would obviate the need for a FallbackAdapter altogether, or is it possible/recommended to configure LiveKit Inference with a fallback to providers' public inference endpoints? (See the sketch after this message.)
    3. I assume LiveKit Inference will not be able to handle features like custom cloned TTS voices via its provisioned inference capacity, only the standard voices. Can you confirm whether this is the case?
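    On question 2, a minimal sketch of the fallback pattern, assuming the Inference components can be instantiated directly from livekit.agents.inference (constructor details may differ; check the current docs):

    from livekit.agents import tts, inference
    from livekit.plugins import elevenlabs

    # Wrap an Inference-served TTS and a direct provider plugin (using your
    # own API key) in a FallbackAdapter: the plugin endpoint takes over if
    # the Inference gateway is unavailable.
    fallback_tts = tts.FallbackAdapter([
        inference.TTS(model="cartesia/sonic-2"),  # via LiveKit Inference
        elevenlabs.TTS(),  # direct endpoint; reads ELEVEN_API_KEY
    ])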
  • crooked-dawn-90821

    10/02/2025, 12:46 PM
    I’m excited about the potential performance gain using livekit inference. With plugins I got used to tweaking the parameters used for the STT and TTS (voice speed for example). How can I leverage those options when using livekit inference? The syntax in the examples seems to be limited to choosing a model provider and the voice.
    ➕ 1
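    A possible route for parameter tweaking, assuming the string shorthand is just sugar for constructing an inference component (parameter names here are illustrative; verify which provider options inference.TTS actually exposes in the current docs):

    from livekit.agents import inference

    # Instantiate the component directly instead of using the
    # "provider/model:voice" string, so constructor options can be passed.
    tts = inference.TTS(
        model="cartesia/sonic-2",
        voice="6f84f4b8-58a2-430c-8c79-688dad597532",
    )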
  • many-hair-70963

    10/02/2025, 2:17 PM
    Any estimates as to performance improvements (reduced latency) when leveraging LK inference especially when using avatars?
  • kind-branch-59377

    10/02/2025, 7:37 PM
    Great job on shipping Inference. Super excited for this. Would love to see Orpheus-TTS (with Baseten) on the Inference stack.
  • quaint-waitress-91864

    10/02/2025, 8:06 PM
    Is it possible to do VAD or noise cancellation with Inference? I used to use the Silero plugin, but it seems like it's not working since I switched to Inference.
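    For reference, VAD is not an Inference component; it still runs locally via the Silero plugin, and noise cancellation is applied on the room input. A minimal sketch mixing both with Inference string descriptors:

    from livekit.agents import AgentSession, RoomInputOptions
    from livekit.plugins import silero, noise_cancellation

    session = AgentSession(
        vad=silero.VAD.load(),  # local VAD, independent of Inference
        stt="assemblyai/universal-streaming",
        llm="openai/gpt-4.1-mini",
        tts="cartesia/sonic-2",
    )
    # Noise cancellation attaches to the room input, not the model pipeline:
    # await session.start(agent=agent, room=ctx.room,
    #     room_input_options=RoomInputOptions(
    #         noise_cancellation=noise_cancellation.BVC()))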
  • alert-honey-38248

    10/03/2025, 6:44 AM
    Hello, how do we fetch the models and voices for each provider if we use Inference for STT? And how do we fetch the models and supported languages for each provider if we use Inference for TTS?
  • brave-island-45242

    10/03/2025, 7:27 PM
    Hi folks, are the models (providers) used with inference covered under BAA?
  • many-forest-60185

    10/04/2025, 9:33 PM
    Switched from the plugin to LK Inference. STT and TTS work fine (Deepgram and ElevenLabs respectively). However, I get the following error with llm="openai/gpt-4.1-mini":

    livekit.agents._exceptions.APIStatusError: Error proxying completions: provider: azure model: gpt-4.1-mini-provisioned, message: POST "https://agent-gateway.cognitiveservices.azure.com/openai/v1/chat/completions": 400 Bad Request {
        "message": "Missing required parameter: 'response_format.json_schema'.",
        "type": "invalid_request_error",
        "param": "response_format.json_schema",
        "code": "missing_required_parameter"
    }
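    A possible stopgap while a gateway error like this is investigated: bypass Inference for the LLM only and call the provider directly through its plugin (reads OPENAI_API_KEY from the environment). A workaround sketch, not the root-cause fix:

    from livekit.plugins import openai

    # Direct plugin path, skipping the Inference gateway for the LLM only.
    llm = openai.LLM(model="gpt-4.1-mini")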
  • green-tent-71322

    10/06/2025, 2:54 AM
    I can get Cartesia working in Inference, but I can't get any of the ElevenLabs voices to work.
  • rough-gpu-50664

    10/06/2025, 11:59 PM
    I work on our inference service at LiveKit. If you're interested in meeting me to discuss your questions and feedback, please DM me. I'd love to meet you!
    ❤️ 2
  • busy-restaurant-25888

    10/07/2025, 1:00 PM
    Hello everyone, I tried to implement the Inference service in my project, but it's not working. Please help me.
    session = AgentSession[UserData](
      userdata=userdata,
      vad=ctx.proc.userdata["vad"],
      stt="assemblyai/universal-streaming",
      llm="openai/gpt-4.1-mini",
      tts="cartesia/sonic-2:6f84f4b8-58a2-430c-8c79-688dad597532",
      turn_detection=MultilingualModel()
    )
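    One quick sanity check for "not working" reports like this: Inference authenticates with your LiveKit Cloud credentials rather than per-provider API keys, so the usual LiveKit environment variables must be present:

    import os

    # Inference needs no provider keys, but the LiveKit credentials
    # themselves must be set for the gateway to accept requests.
    for var in ("LIVEKIT_URL", "LIVEKIT_API_KEY", "LIVEKIT_API_SECRET"):
        assert os.environ.get(var), f"{var} is not set"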
  • helpful-lizard-52428

    10/09/2025, 2:04 AM
    Hi @rough-gpu-50664, I've run out of inference credits and I'm trying to figure out how to recharge them. Looking at the pricing page, it seems like LiveKit Cloud and Inference are packaged together. Is there a way to get just inference credits if I don't need LiveKit Cloud services?
  • crooked-dawn-90821

    10/11/2025, 5:59 PM
    I've been trying to use Inference for ElevenLabs with the Python SDK and it's failing silently (no TTS comes out, but no error appears in the logs). Using the ElevenLabs plugin works fine.
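    Silent TTS failures like this are easier to diagnose with the SDK's own logs turned up; a small sketch using the Python SDK's standard livekit.agents logger:

    import logging

    # Surface synthesis errors that otherwise show up only as missing audio.
    logging.basicConfig(level=logging.INFO)
    logging.getLogger("livekit.agents").setLevel(logging.DEBUG)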
  • refined-scientist-34781

    10/16/2025, 12:39 PM
    Hi! We are on the LiveKit Ship plan for our production app. We are using Inference and it's working great so far. I'm not sure how to create a development agent, though. I guess LiveKit doesn't support environments and wants you to create a new project for the development environment. That's not ideal for us because our LiveKit subscription is not shared and we'd need to sign up for another plan for the development project. Is there any way to share Inference billing with our production project?
  • worried-knife-36498

    10/18/2025, 11:36 AM
    Can I add some sort of filter so that it only recognizes my voice? i.e. a pre-filter or something?
  • rough-gpu-50664

    10/21/2025, 9:25 PM
    Hi LiveKit community, if you're on our Build tier (i.e. free tier) and you used up all your included inference credits, what did you do? Please click on one of the emojis below to share what you did.
    💳: you signed up for our Ship or higher tier, putting in a credit card. This gets you more credits each month and allows you to pay for additional inference credits.
    🔌: you removed LiveKit Inference from your agent pipeline and put in the plugins with your own API key for each STT/LLM/TTS plugin.
    🏃: you didn't understand the error, got stuck, and moved on to a different platform.
    Feel free to DM me with your experience, if you prefer.
    💳 3
    🔌 2
    🏃 1
  • narrow-engineer-85614

    10/24/2025, 6:48 PM
    We're currently using Deepgram for all of our STT work, but have had struggles with its multilingual capabilities. Gladia seems to do better, and we'd potentially want to use them. Is that on the roadmap for Inference?
  • bulky-pager-62731

    10/29/2025, 8:23 AM
    Hi everyone! We are currently using Inference to iterate with different STT/LLM/TTS models, and we are on the Ship plan. As I was testing my application, out of nowhere I got the voice agent saying "you have hit your daily limit for advance voice". Now when I start my server, I'm getting logs indicating that it cannot connect to the STT model. Even when I change the model we're using, it gives the same error and the room closes on its own. In the docs, I couldn't see where the API limit for each model is given; I thought that by choosing the Ship plan we'd avoid the limits. Can anyone nudge me in the right direction as to where the daily limit for advanced voices is documented, and how to resolve this?
  • modern-restaurant-53740

    10/30/2025, 3:05 AM
    Does Inference run only in us-east? Any chance I can get access to an ap-south instance? I plan on deploying the agent worker in ap-south... it would be great if Inference were close by.
  • gentle-traffic-46145

    10/31/2025, 2:25 PM
    Hi! Will GPT-5-Chat-latest be available soon, please? Thanks! 🙂
  • rapid-van-16677

    11/02/2025, 5:49 AM
    Hey everyone, is anyone suddenly experiencing increased latency with the Gemini realtime model?
  • quick-gpu-24854

    11/04/2025, 11:17 AM
    Hey! I'm self-hosting a LiveKit STT-LLM-TTS agent. It works flawlessly on my machine:
    • starts with session.generateReply() - I can hear it speaking
    • then I can converse with it as expected
    But when deployed on Fly.io, it behaves like this:
    • starts with session.generateReply() - I can hear it speaking - GOOD ✅
    • then as soon as I start speaking, I get this error:
    2025-11-04T11:06:36Z app[d8d4295f266538] fra [info]2025-11-04T11:06:36.590Z [uncaughtException] Error [ERR_IPC_CHANNEL_CLOSED]: Channel closed
    2025-11-04T11:06:36Z app[d8d4295f266538] fra [info]    at target.send (node:internal/child_process:753:16)
    2025-11-04T11:06:36Z app[d8d4295f266538] fra [info]    at InferenceProcExecutor.doInference (file:///app/node_modules/@livekit/agents/dist/ipc/inference_proc_executor.js:60:15)
    2025-11-04T11:06:36Z app[d8d4295f266538] fra [info]    at #doInferenceTask (file:///app/node_modules/@livekit/agents/dist/ipc/job_proc_executor.js:63:50)
    2025-11-04T11:06:36Z app[d8d4295f266538] fra [info]    at ChildProcess.<anonymous> (file:///app/node_modules/@livekit/agents/dist/ipc/job_proc_executor.js:49:58)
    2025-11-04T11:06:36Z app[d8d4295f266538] fra [info]    at ChildProcess.emit (node:events:531:35)
    2025-11-04T11:06:36Z app[d8d4295f266538] fra [info]    at emit (node:internal/child_process:949:14)
    2025-11-04T11:06:36Z app[d8d4295f266538] fra [info]    at process.processTicksAndRejections (node:internal/process/task_queues:91:21)
    I've spent a lot of time debugging this and I have no clue what goes wrong. Any ideas? Thanks!!!
    This is my new voice.AgentSession():
    export function createAgentSession({
      vad,
      userData,
    }: {
      vad: silero.VAD;
      userData: UserContext;
    }): voice.AgentSession {
      return new voice.AgentSession({
        stt: createSTT(),
        llm: createLLM(),
        tts: createTTS(),
        turnDetection: new livekit.turnDetector.MultilingualModel(),
        vad,
        voiceOptions: VOICE_OPTIONS,
        userData,
      });
    }
  • elegant-businessperson-51313

    11/05/2025, 11:18 AM
    Hi @rough-gpu-50664, a while back you mentioned that Inference is available in every region where LiveKit is available. Does that mean that with agent deployments now available in the EU, Inference is going to be there as well? Sorry if that was already answered somewhere else.
  • white-postman-10482

    11/11/2025, 9:37 PM
    Hey! Will ElevenLabs' new STT model Scribe v2, released today, be available to use with LiveKit?
  • future-continent-10147

    11/12/2025, 11:31 AM
    Hi guys, I am new to LiveKit and started my journey today. I've been playing around and got everything working on my local dev machine, testing via the Playground. Late this afternoon I started getting the error below when I launch my agent. Even the one in production running on LiveKit Cloud has stopped working. Have I used all my credits on the Build plan? The total usage seems very low.
    [212023.822] WARN (110380): failed to recognize speech, retrying in 2000s tts: "inference.STT" attempt: 3 error: { "body": null, "retryable": true, "name": "APIConnectionError" }
  • gentle-traffic-46145

    11/12/2025, 7:26 PM
    Hi, I have a question, and I'd appreciate it if anyone knows the answer! 🙂 For GPT-4.1, the prompt can be cached for $0.50, but is it the same with LiveKit Inference? The pricing lists input and output, but not caching. cc @rough-gpu-50664
  • better-house-57730

    11/16/2025, 4:05 PM
    Hi folks, does having LiveKit Agents deployed in eu-central make the STT/LLM/TTS inference calls go to Europe as well? Say we have our agent deployed in eu-central and using:
    • LiveKit Inference - Deepgram
    • LiveKit Inference - Gemini
    Are these going through EU endpoints, or do we have to set up plugins with regional params (like location for Vertex AI)? Not thinking from the compliance standpoint, but latency: if LiveKit Agents are in Europe but Deepgram/Gemini/Eleven inference still goes to the US, then from a latency standpoint it would be better to keep LiveKit Agents in the US anyway, right? (A sketch of the plugin route follows.)
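    If a region-pinned endpoint matters today, the plugin path exposes that knob directly. A sketch using the Vertex AI location parameter the message mentions (region and model names are illustrative):

    from livekit.plugins import google

    # Pin Gemini inference to an EU Vertex AI region, independent of
    # where LiveKit Inference routes its own calls.
    llm = google.LLM(
        model="gemini-2.0-flash",
        vertexai=True,
        location="europe-west1",
    )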