Thanks for reaching out! While LiveKit doesn’t come with a built-in plugin to directly run the open-source Whisper-3 model, you can definitely integrate it by hosting the model yourself and setting it up as a WebSocket or HTTP API endpoint. Here’s a friendly step-by-step to help you along:
• Host the Whisper-3 model: Deploy the model on your own infrastructure or a hosted platform, and make sure the deployment can serve real-time transcription through a streaming API.
• Expose a WebSocket API: Set up an endpoint that accepts binary audio frames and returns transcription results as JSON (a minimal server sketch follows this list).
• Adapt the Baseten plugin: Use the Baseten STT plugin as a starting point for connecting LiveKit to your Whisper-3 endpoint, tweaking its request and response handling to match your server's input and output format (see the client-side sketch below).
• Alternative: If your Whisper-3 model is served behind an HTTP API that matches OpenAI's transcription format, you can simply use the LiveKit OpenAI STT plugin with a custom base URL (see the last sketch below).
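To make the first two steps more concrete, here's a minimal sketch of what such a server could look like. It assumes faster-whisper for inference and FastAPI for the WebSocket layer, and that the client streams 16 kHz mono PCM16 frames; the endpoint path, chunking strategy, and JSON shape are placeholders you'd align with whatever your plugin ends up expecting:

```python
# Minimal sketch of a self-hosted Whisper transcription endpoint (not production-ready).
import numpy as np
from fastapi import FastAPI, WebSocket, WebSocketDisconnect
from faster_whisper import WhisperModel

app = FastAPI()
# "large-v3" stands in for whichever Whisper-3 checkpoint you host.
model = WhisperModel("large-v3", device="cuda", compute_type="float16")

SAMPLE_RATE = 16000
BYTES_PER_SAMPLE = 2   # PCM16
CHUNK_SECONDS = 2      # naive fixed-window chunking; a real server would use VAD or true streaming

@app.websocket("/stt")
async def transcribe(ws: WebSocket):
    await ws.accept()
    buffer = bytearray()
    try:
        while True:
            frame = await ws.receive_bytes()  # binary PCM16 audio frame from the client
            buffer.extend(frame)
            if len(buffer) >= SAMPLE_RATE * BYTES_PER_SAMPLE * CHUNK_SECONDS:
                # Convert PCM16 bytes to float32 samples in [-1, 1], the input Whisper expects
                audio = np.frombuffer(bytes(buffer), dtype=np.int16).astype(np.float32) / 32768.0
                segments, _ = model.transcribe(audio, language="en")
                text = "".join(seg.text for seg in segments)
                await ws.send_json({"type": "final_transcript", "text": text})
                buffer.clear()
    except WebSocketDisconnect:
        pass
```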
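On the plugin side, the core of the adaptation is streaming audio frames to that endpoint and parsing the JSON it sends back. This sketch uses the standalone websockets library just to illustrate the exchange; the URL and message fields mirror the hypothetical server above and are not part of any LiveKit plugin API:

```python
# Sketch of the client-side exchange an adapted plugin needs to perform.
import asyncio
import json
import websockets

async def stream_to_whisper(frames):
    """frames: an async iterator yielding binary PCM16 audio chunks."""
    async with websockets.connect("ws://your-whisper-host:8000/stt") as ws:

        async def sender():
            async for frame in frames:
                await ws.send(frame)

        async def receiver():
            async for message in ws:
                event = json.loads(message)
                if event.get("type") == "final_transcript":
                    print("transcript:", event["text"])

        # A real plugin would also flush remaining audio and close the socket cleanly.
        await asyncio.gather(sender(), receiver())
```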
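For the OpenAI-compatible alternative, the setup could be as simple as pointing the plugin at your server. This sketch assumes your server implements OpenAI's transcription route and that you pass your endpoint through the plugin's base_url option; the host URL and model name are placeholders:

```python
# Sketch: pointing the LiveKit OpenAI STT plugin at a self-hosted, OpenAI-compatible Whisper-3 server.
from livekit.plugins import openai

stt = openai.STT(
    model="whisper-large-v3",                     # whatever model name your server expects
    base_url="http://your-whisper-host:8000/v1",  # your self-hosted endpoint
    api_key="not-needed",                         # the underlying client expects a value even if your server ignores it
)
```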
If you'd like help setting up the WebSocket or HTTP endpoint, or some tips on adapting the plugin, just let me know!
Sources:
Groq STT integration guide | LiveKit Docs
fal STT integration guide | LiveKit Docs
livekit-plugins/livekit-plugins-baseten/livekit/plugins/baseten/stt.py
plugins/openai/src/stt.ts