# ask-ai
k
With OpenAI models, I can do `await agent.say(agent.llm.chat(chat_ctx=agent.chat_ctx))` and it will speak, but when I do it with Google models it doesn't say anything.
g
It seems likely that the issue with Google models in your setup is related to the configuration of the Text-to-Speech (TTS) component. With OpenAI models, `agent.say()` works because the TTS is likely configured correctly to handle the LLM output; with Google models there may be a mismatch or misconfiguration in the TTS setup. Ensure that your TTS is properly configured to work with Google models. According to the LiveKit documentation, you should use the Google TTS plugin and make sure your environment is set up with the necessary credentials, such as `GOOGLE_APPLICATION_CREDENTIALS`. Additionally, verify that the TTS is correctly initialized and that the voice settings are compatible with the Google model outputs. Would you like more detailed guidance on setting up Google TTS with your agent?

Sources:
- https://livekit-users.slack.com/archives/C07FY8WHGPM/p1732022090.671149
- https://livekit-users.slack.com/archives/C07FY8WHGPM/p1731289165.978149
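As a quick sanity check on the credential setup mentioned above, you can verify that `GOOGLE_APPLICATION_CREDENTIALS` points at a readable JSON service-account file before starting the agent. This is a minimal illustrative sketch: the environment-variable name is the standard Google one, but the helper function itself is hypothetical, not part of any LiveKit or Google API.

```python
import json
import os


def check_google_credentials() -> bool:
    """Return True if GOOGLE_APPLICATION_CREDENTIALS points at a valid JSON file."""
    path = os.environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path or not os.path.isfile(path):
        return False
    try:
        with open(path) as f:
            json.load(f)  # service-account credential files are JSON
    except (OSError, json.JSONDecodeError):
        return False
    return True
```

Running this check at startup turns a silent misconfiguration into an explicit failure you can log.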
k
@millions-hair-24956 I use the same TTS both times (Cartesia). The only thing that changes is the LLM, from gpt-4o to gemini-2.0-flash.
g
It might be possible that the issue arises from how the output of the Gemini model is processed or formatted before being sent to the Cartesia TTS. When switching from OpenAI's GPT-4o to Google's Gemini model, ensure that the output format remains consistent and compatible with what the TTS expects. Check whether the response from the Gemini model is being correctly parsed and whether any additional processing is needed before passing it to the TTS. You might also want to verify that the Cartesia TTS is configured to handle the input from the Gemini model, as the models may differ in how they output text or handle certain characters. Would you like more detailed guidance on troubleshooting this issue?

Sources:
- https://livekit-users.slack.com/archives/C07FVFM5NA1/p1733600283.071749
- https://livekit-users.slack.com/archives/C07FY8WHGPM/p1731289165.978149

Note: I wasn't able to find highly relevant data sources, but above are a few potentially relevant links.
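One way to narrow this down is to drain the LLM stream into a plain string and log it before any TTS is involved: if the collected text is empty with Gemini but non-empty with GPT-4o, the problem is on the LLM side rather than in Cartesia. The sketch below is self-contained and illustrative: `collect_text` works on any async iterable of text chunks, which is roughly the shape of input `agent.say()` accepts, and `fake_llm_stream` is a hypothetical stand-in for what `agent.llm.chat(...)` would yield (real LiveKit stream chunks may be structured objects rather than bare strings).

```python
import asyncio
from typing import AsyncIterable


async def collect_text(stream: AsyncIterable[str]) -> str:
    """Drain an async stream of text chunks into one string for inspection."""
    parts = []
    async for chunk in stream:
        parts.append(chunk)
    return "".join(parts)


# Hypothetical stand-in for an LLM stream; a real one comes from llm.chat().
async def fake_llm_stream():
    for chunk in ["Hello", ", ", "world"]:
        yield chunk


async def main():
    text = await collect_text(fake_llm_stream())
    # With Gemini: if this logs an empty string, the LLM output is the problem,
    # not the TTS. You can then pass the collected string to agent.say(text).
    print(repr(text))


asyncio.run(main())
```

Once you know the stream carries text, you can pass the collected string straight to `agent.say()` to rule out stream-handling differences between the two LLM plugins.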