# ask-ai
b
Based on the new GPT real-time model, is there a way I can configure an STT model and use the realtime model with text input and audio output?
t
Thanks for reaching out, Sebastian. I'm an AI assistant and I'm here to help. Give me a moment while I research this topic.
LiveKit makes it straightforward to pair an STT model with the GPT realtime model for text input and audio output. Here's a quick rundown:
• Built-in STT: OpenAI's Realtime API handles STT for audio input out of the box. Pick a transcription model (like "whisper-1") and tune settings such as language and turn detection to fit your needs.
• Text input and audio output: Send text input straight to GPT-4o through the Realtime API, and it will respond with audio using its built-in TTS. Set `modalities=['text', 'audio']` in the RealtimeModel to enable this; see the sketch after this list.
• External STT: You can also plug in an external STT service for faster or more precise transcriptions, including interim results.
These features are a good fit for voice assistants, text-controlled voice bots, and multilingual translators. For more detail, see the LiveKit OpenAI guide, or try it yourself in the LiveKit Playground.
Sources: Realtime model integrations | LiveKit Docs | Live Conversations with AI using ChatGPT and WebRTC | OpenAI and LiveKit partner to turn Advanced Voice into an API | livekit-plugins/livekit-plugins-openai/livekit/plugins/openai/realtime/real... | OpenAI STT integration guide | LiveKit Docs
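For example, here's a minimal sketch of that configuration. It assumes the livekit-plugins-openai RealtimeModel accepts the `modalities` and `voice` parameters as shown; names can vary between plugin versions, so check the current docs.

```python
# Minimal sketch, not verified against a specific plugin version:
# assumes RealtimeModel accepts `modalities` and `voice`.
from livekit.plugins import openai

llm = openai.realtime.RealtimeModel(
    # request both a text transcript and spoken audio in each reply
    modalities=["text", "audio"],
    voice="alloy",  # built-in TTS voice
)
```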
b
I mean, if I add stt=deepgram.STT and llm=openai.realtime.RealtimeModel(), will it work using the Deepgram STT?
t
Absolutely, you can integrate Deepgram STT with OpenAI's RealtimeModel for real-time processing. Here's a simple way to do it:
• STT configuration: Set Deepgram as your transcription provider like this:
```python
stt=deepgram.STT(
    model="nova-3",
    language="en-US",
    api_key="YOUR_DEEPGRAM_API_KEY",
)
```
• RealtimeModel configuration: To avoid conflicts, disable OpenAI's internal STT and turn detection like so:
```python
llm=openai.realtime.RealtimeModel(
    input_audio_transcription=None,
    turn_detection=None,
    api_key="YOUR_OPENAI_API_KEY",
)
```
• Turn detection: If you need more advanced handling, you can use LiveKit's multilingual turn detector instead (shown in the sketch below).
• Dependencies: Install the required plugins:
```bash
pip install "livekit-agents[deepgram]" "livekit-agents[openai]"
```
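Putting the pieces together, a sketch of a complete agent might look like this. It assumes the LiveKit Agents 1.x AgentSession API plus the Silero VAD and turn-detector extras (`pip install "livekit-agents[silero,turn-detector]"`); the entrypoint wiring is illustrative, not something confirmed in this thread.

```python
# Sketch combining the snippets above; assumes the LiveKit Agents 1.x API.
from livekit import agents
from livekit.agents import Agent, AgentSession
from livekit.plugins import deepgram, openai, silero
from livekit.plugins.turn_detector.multilingual import MultilingualModel

async def entrypoint(ctx: agents.JobContext):
    session = AgentSession(
        # Deepgram produces the text transcript of the user's speech...
        stt=deepgram.STT(model="nova-3", language="en-US"),
        # ...and the realtime model consumes that text and speaks the reply,
        # with its internal transcription and turn detection disabled.
        llm=openai.realtime.RealtimeModel(
            input_audio_transcription=None,
            turn_detection=None,
        ),
        # LiveKit's multilingual turn detector decides end-of-turn instead
        vad=silero.VAD.load(),
        turn_detection=MultilingualModel(),
    )
    await session.start(
        room=ctx.room,
        agent=Agent(instructions="You are a helpful voice assistant."),
    )
    await ctx.connect()

if __name__ == "__main__":
    agents.cli.run_app(agents.WorkerOptions(entrypoint_fnc=entrypoint))
```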
This setup lets Deepgram handle the transcription while OpenAI processes the text and generates the audio output. If you have any questions or want a walkthrough, just let me know! Sources: livekit-plugins/livekit-plugins-turn-detector/README.md | Deepgram STT integration guide | LiveKit Docs | Voice AI quickstart | LiveKit Docs