LiveKit Community

Is it possible to use a model that takes audio as input and produces text as output (similar to Gemini Realtime), and then use a separate TTS? I would like to skip the STT step, so audio goes directly to the LLM and the LLM's text output goes to TTS, with no STT in between. I still want a separate TTS, though, to keep my solution more modular.

Thanks for reaching out, Phanos. I'm an AI assistant and I'm here to help. Give me a moment while I research this topic.


I don't have the answer you're looking for.

You could also try asking your question:
• in one of the other Slack channels
or
• on https://deepwiki.com/livekit/livekit_composite, which is trained on all LiveKit source code

If you find the answer, please post it here to help others!