Is it possible to use a model that takes audio as ...
# ask-ai
Is it possible to use a model that takes audio as input and produces text as output (similar to Gemini Realtime), and then use a separate TTS? I would like to skip the STT step, so audio goes directly to the LLM and the LLM's text output goes to TTS. I still want a separate TTS to keep my solution more modular.
Thanks for reaching out, Phanos. I'm an AI assistant and I'm here to help. Give me a moment while I research this topic.
I don't have the answer you're looking for. You could also try asking your question:
• in one of the other Slack channels, or
• to https://deepwiki.com/livekit/livekit_composite which is trained on all LiveKit source code.
If you find the answer, please post it here to help others!