powerful-hairdresser-78108
08/29/2025, 3:43 AM
{
"message": "inference is slower than realtime",
"level": "WARNING",
"name": "livekit.plugins.silero",
"delay": 0.4698081036210059,
"pid": 188,
"job_id": "AJ_",
"timestamp": "2025-08-29T03:4"
}
What is this?
tall-airline-16756
08/29/2025, 3:43 AM
powerful-hairdresser-78108
08/29/2025, 3:44 AM
tall-airline-16756
08/29/2025, 3:44 AM
This warning from livekit.plugins.silero basically means that the voice activity detection (VAD) is taking longer to process the audio than the duration of the audio itself. This can add latency to real-time audio processing.
This warning pops up when the processing time goes over 200 ms. To give you an idea, at a 16 kHz sample rate Silero VAD handles 512 samples at a time (512 / 16,000 = 32 ms of audio). So if each chunk takes more than 32 ms to process, the VAD falls behind, and once the delay adds up beyond 200 ms you’ll see this warning.
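To make the arithmetic concrete, here is a minimal sketch of how a per-window overrun can add up. It is a simplified model of the behavior described above, not the plugin's actual source; the function names and the 50 ms inference time are assumptions chosen for illustration.

```python
SAMPLE_RATE_HZ = 16_000
WINDOW_SAMPLES = 512
WARN_THRESHOLD_S = 0.2  # the 200 ms threshold mentioned above


def window_duration(samples: int = WINDOW_SAMPLES, rate_hz: int = SAMPLE_RATE_HZ) -> float:
    """Seconds of audio covered by one inference window."""
    return samples / rate_hz


def accumulated_delay(inference_times_s, window_s):
    """Running backlog: each window adds (inference time - window length), floored at 0."""
    delay = 0.0
    history = []
    for t in inference_times_s:
        delay = max(0.0, delay + t - window_s)
        history.append(delay)
    return history


window = window_duration()
# Hypothetical load: every window takes 50 ms to process instead of <= 32 ms.
slow = accumulated_delay([0.050] * 12, window)
first_over = next(i for i, d in enumerate(slow, start=1) if d > WARN_THRESHOLD_S)

print(f"window = {window * 1000:.0f} ms")
print(f"backlog passes 200 ms after {first_over} windows")
```

In this model each slow window adds 18 ms of backlog, so the 200 ms mark is crossed on the 12th window. Windows that finish faster than 32 ms let the backlog drain back toward zero.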
Here are a few tips to help fix this:
• Use compute-optimized hardware instead of burstable instances when you’re running in production.
• The Silero plugin is built to run on CPU, and you can force CPU inference explicitly when you load it.
• Prewarm the VAD model during agent setup so it is already loaded when a job starts.
Don’t worry, this warning won’t stop your processing. It’s just a heads-up that latency can build up if the system stays under heavy load for a while.
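If you want a rough idea of whether a given machine can keep up, one option is to time the per-window work and compare it with the 32 ms window. The sketch below is only an illustration: `realtime_factor` and `fake_inference` are hypothetical names, and the workload is a stand-in that you would replace with your real per-window call.

```python
import time
from statistics import mean


def realtime_factor(process_window, window_s: float = 0.032, runs: int = 50) -> float:
    """Average processing time per call divided by the audio window length.

    Below 1.0 means the work finishes faster than real time on this machine.
    """
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        process_window()
        durations.append(time.perf_counter() - start)
    return mean(durations) / window_s


def fake_inference():
    # Hypothetical stand-in workload; replace with your real per-window call.
    sum(i * i for i in range(1_000))


factor = realtime_factor(fake_inference)
print(f"real-time factor: {factor:.3f}", "(keeps up)" if factor < 1 else "(falls behind)")
```

Running this on a burstable instance after its CPU credits are used up, and again on a compute-optimized one, should show the difference directly.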
Sources: LiveKit turn detector plugin | LiveKit Docs | livekit-plugins/livekit-plugins-aws/livekit/plugins/aws/experimental/realti... | plugins/silero/CHANGELOG.md
tall-airline-16756
08/29/2025, 3:44 AM
This warning from livekit.plugins.silero is just letting you know that the voice activity detection (VAD) is taking longer to process audio frames than the real-time duration of those frames. This can add latency to your real-time audio processing.
Here’s what’s happening: the warning shows up when inference takes more than 0.2 seconds (200ms). For instance, with a 16kHz sample rate, Silero VAD handles 512 samples at once (which is about 32ms of audio). If it takes longer than 32ms to process that chunk and the delay piles up beyond 200ms, you’ll see this warning.
But don’t worry! To help smooth things out, you can:
• Use compute-optimized hardware instead of burstable instances when you’re running this in production.
• Remember, the Silero plugin is made to run on CPU, and you can even force it to use CPU when you start it up.
• Prewarm the VAD model during agent initialization — this helps cut down on load time during inference.
This warning won’t stop your processing; it’s just a friendly heads-up that things might slow down if the system stays under heavy load for a while.
powerful-hairdresser-78108
08/29/2025, 3:45 AM
tall-airline-16756
08/29/2025, 3:47 AM
In Python:
from livekit.plugins import silero
vad = silero.VAD.load(force_cpu=True)
And in Node.js:
import { VAD } from '@livekit/agents-plugin-silero';
const vad = await VAD.load({ forceCPU: true });
This makes sure the plugin explicitly runs on the CPU for inference.
3. Prewarm the VAD model when initializing the agent:
In Python, you can use the prewarm function like this:
from livekit.agents import AgentSession, JobContext, JobProcess
from livekit.plugins import silero

def prewarm(proc: JobProcess):
    proc.userdata["vad"] = silero.VAD.load()

async def entrypoint(ctx: JobContext):
    session = AgentSession(vad=ctx.proc.userdata["vad"])
    # rest of your agent logic
And in Node.js:
prewarm: async (proc: JobProcess) => {
  proc.userData.vad = await VAD.load();
},
entry: async (ctx: JobContext) => {
  const vad = ctx.proc.userData.vad as VAD;
  const session = new AgentSession({ vad });
}
This approach loads the model just once per worker process, which helps cut down latency for later jobs. In Python, register the function by passing prewarm_fnc=prewarm to WorkerOptions.
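The load-once idea can be seen in isolation with a small simulation. Everything here is a hypothetical stand-in (`FakeProcess`, `load_model`, `handle_job`) and not LiveKit code; it only shows that the expensive load runs once while several jobs reuse the cached object.

```python
class FakeProcess:
    """Hypothetical stand-in for a worker process that carries a userdata dict."""

    def __init__(self):
        self.userdata = {}


load_count = 0


def load_model():
    """Hypothetical expensive model load; counts how often it runs."""
    global load_count
    load_count += 1
    return object()


def prewarm(proc):
    # Runs once when the process starts, before any job is assigned.
    proc.userdata["vad"] = load_model()


def handle_job(proc):
    # Each job reuses the cached model instead of loading it again.
    return proc.userdata["vad"]


proc = FakeProcess()
prewarm(proc)
models = [handle_job(proc) for _ in range(3)]

print(f"loads: {load_count}, same model reused: {all(m is models[0] for m in models)}")
```

Three jobs run against a single load, which is why prewarming moves the loading cost out of the job itself.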
If you have any questions or want to walk through these steps together, just let me know—I’m here to help!
Sources: Silero VAD plugin | LiveKit Docs | livekit-plugins/livekit-plugins-silero/livekit/plugins/silero/py.typed | plugins/silero/src/vad.ts | Worker options | LiveKit Docs