# ask-ai
p
```json
{
  "message": "inference is slower than realtime",
  "level": "WARNING",
  "name": "livekit.plugins.silero",
  "delay": 0.4698081036210059,
  "pid": 188,
  "job_id": "AJ_",
  "timestamp": "2025-08-29T03:4"
}
```
What is this?
t
Thanks for reaching out, Avinash. I'm an AI assistant and I'm here to help. Give me a moment while I research this topic.
p
Sure
t
Hey there! That warning message "inference is slower than realtime" in `livekit.plugins.silero` means that voice activity detection (VAD) is taking longer to process the audio than the audio itself lasts, which can make real-time audio processing feel a little less snappy. The warning fires when the accumulated processing delay goes over 200 ms. To give you an idea: at a 16 kHz sample rate, Silero VAD handles 512 samples at once (about 32 ms of audio), so if a chunk takes more than 32 ms to process and the delay adds up beyond 200 ms, you'll see this warning.

Here are a few friendly tips to help fix this:
• Use compute-optimized hardware instead of burstable instances when you're running things in production.
• Since the Silero plugin is built to run on CPU, you can force it to use the CPU right from the start.
• Prewarming the VAD model during agent setup can also help reduce the load time when it's actually running.

Don't worry, this warning won't stop your processing; it's just a heads-up that things might slow down if the system stays under heavy load for a while.

Sources: LiveKit turn detector plugin | LiveKit Docs | livekit-plugins/livekit-plugins-aws/livekit/plugins/aws/experimental/realti... | plugins/silero/CHANGELOG.md
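The arithmetic behind those numbers can be sketched in a few lines (a minimal sketch, assuming a 16 kHz sample rate, 512-sample chunks, and the 200 ms threshold described above; the constant names are illustrative, not from the plugin):

```python
# Sketch of the arithmetic behind the "slower than realtime" warning.
SAMPLE_RATE = 16_000       # Hz, as in the explanation above
CHUNK_SAMPLES = 512        # samples Silero VAD processes at once
WARN_THRESHOLD_S = 0.2     # accumulated delay that triggers the warning

# Each chunk covers 512 / 16000 seconds of audio.
chunk_duration = CHUNK_SAMPLES / SAMPLE_RATE
print(f"chunk duration: {chunk_duration * 1000:.0f} ms")  # → 32 ms

# The delay reported in the log message above:
delay = 0.4698081036210059
print(delay > WARN_THRESHOLD_S)  # → True: ~470 ms behind, well past 200 ms
```

So the logged `delay` of roughly 0.47 s is already more than double the warning threshold.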
p
How to do these:
• Use compute-optimized hardware instead of burstable instances when you’re running this in production.
• Remember, the Silero plugin is made to run on CPU, and you can even force it to use CPU when you start it up.
• Prewarm the VAD model during agent initialization — this helps cut down on load time during inference.
t
Here’s a friendly guide on how you can implement the requested optimizations:

1. Switch to compute-optimized hardware instead of burstable instances: For your production environment, use compute-optimized instances (like AWS C5 or GCP n2-standard) instead of burstable ones (such as AWS T2/T3). This way, you’ll get steady performance for your real-time processing tasks.

2. Make sure the Silero plugin uses the CPU during initialization. In Python, you can do this with:
```python
from livekit.plugins import silero

vad = silero.VAD.load(force_cpu=True)
```
And in Node.js:
```typescript
import { VAD } from '@livekit/agents-plugin-silero';

const vad = await VAD.load({ forceCPU: true });
```
This makes sure the plugin explicitly runs on the CPU for inference.

3. Prewarm the VAD model when initializing the agent. In Python, you can use the `prewarm` function like this:
```python
from livekit.agents import AgentSession, JobContext, JobProcess
from livekit.plugins import silero

def prewarm(proc: JobProcess):
    # Load the model once per worker process, before any job starts.
    proc.userdata["vad"] = silero.VAD.load()

async def entrypoint(ctx: JobContext):
    session = AgentSession(vad=ctx.proc.userdata["vad"])
    # rest of your agent logic
```
And in Node.js:
```typescript
// These options go inside your agent/worker definition.
prewarm: async (proc: JobProcess) => {
  proc.userData.vad = await VAD.load();
},
entry: async (ctx: JobContext) => {
  const vad = ctx.proc.userData.vad as VAD;
  const session = new AgentSession({ vad });
},
```
This approach loads the model just once per worker process, which helps cut down latency for later jobs. If you have any questions or want to walk through these steps together, just let me know; I’m here to help!

Sources: Silero VAD plugin | LiveKit Docs | livekit-plugins/livekit-plugins-silero/livekit/plugins/silero/py.typed | plugins/silero/src/vad.ts | Worker options | LiveKit Docs
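One more thing that can help with step 1: before switching instance types, you can sanity-check whether your hardware keeps up by measuring a realtime factor for the inference call. This is a stdlib-only sketch; `measure_realtime_factor` and `fake_infer` are hypothetical names for illustration, not part of the LiveKit API, and you would swap `fake_infer` for the real VAD call:

```python
import time

def measure_realtime_factor(infer, chunk, chunk_duration_s, n=100):
    """Return processing time / audio time for n chunks.

    A value above 1.0 means inference is slower than realtime
    for this workload on this machine.
    """
    start = time.perf_counter()
    for _ in range(n):
        infer(chunk)
    elapsed = time.perf_counter() - start
    return elapsed / (n * chunk_duration_s)

# Hypothetical stand-in for the real VAD inference; replace with the model call.
def fake_infer(chunk):
    return sum(chunk)

# 512 samples at 16 kHz = 32 ms of audio per chunk, as discussed above.
rtf = measure_realtime_factor(fake_infer, [0.0] * 512, 512 / 16_000)
print(f"realtime factor: {rtf:.4f}")
```

If the factor stays comfortably below 1.0 under production-like load, the warning should not appear; if it hovers near or above 1.0, compute-optimized hardware (or forcing CPU and prewarming as above) is worth trying.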