# ask-ai
w
Gemini LLM call is complaining about max prompt tokens...how to fix this?
t
Thanks for reaching out, Sunil. I'm an AI assistant and I'm here to help. Give me a moment while I research this topic.
The "max prompt token" error occurs when the prompt exceeds the model's context size. To resolve this: • Reduce Prompt Length: Simplify or truncate your prompt to focus on the most important details. • Implement Truncation Strategies: For example, truncate the middle portion of the prompt while keeping the beginning and end to maintain context. • Adjust Model Parameters: If possible, use a model with a larger context window or adjust parameters like temperature for better results. • Monitor Token Usage: Use tools to track token usage in real-time and dynamically adjust prompts. Let me know if you need further assistance or specific guidance on implementing these steps.
w
what does this really mean? - UsageSummary(llm_prompt_tokens=194688, llm_prompt_cached_tokens=71644, llm_completion_tokens=1366, tts_characters_count=2681, tts_audio_duration=200.03999999999974, stt_audio_duration=268.7999999999997)
t
Here's a quick rundown of what the `UsageSummary` fields mean for your session:
• llm_prompt_tokens: the number of tokens in the input prompts sent to the LLM.
• llm_prompt_cached_tokens: the portion of those prompt tokens served from the provider's prompt cache instead of being processed fresh.
• llm_completion_tokens: the tokens the LLM generated in its responses.
• tts_characters_count: the number of characters sent to the Text-to-Speech (TTS) model.
• tts_audio_duration: the length, in seconds, of the audio produced by the TTS model.
• stt_audio_duration: the length, in seconds, of the audio processed by the Speech-to-Text (STT) model.
These metrics give you a convenient way to track how the LLM, TTS, and STT features were used during your session. If anything's unclear or you want to dive deeper, just let me know.
Sources: livekit-plugins/livekit-plugins-aws/livekit/plugins/aws/llm.py
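For illustration only, here is a small sketch that plugs the pasted numbers into an ad hoc dataclass (not the actual LiveKit type) to separate cached from fresh prompt tokens. It assumes cached tokens are reported as a subset of llm_prompt_tokens, which is common but provider-dependent:

```python
from dataclasses import dataclass


@dataclass
class UsageSummaryView:
    """Ad hoc mirror of the fields shown in the pasted summary (illustrative only)."""
    llm_prompt_tokens: int
    llm_prompt_cached_tokens: int
    llm_completion_tokens: int
    tts_characters_count: int
    tts_audio_duration: float
    stt_audio_duration: float


usage = UsageSummaryView(
    llm_prompt_tokens=194_688,
    llm_prompt_cached_tokens=71_644,
    llm_completion_tokens=1_366,
    tts_characters_count=2_681,
    tts_audio_duration=200.04,
    stt_audio_duration=268.80,
)

# If cached tokens are a subset of prompt tokens, the portion the provider
# actually had to process fresh is the difference:
fresh_prompt_tokens = usage.llm_prompt_tokens - usage.llm_prompt_cached_tokens
print(f"fresh prompt tokens: {fresh_prompt_tokens}")          # ~123k
print(f"completion tokens:   {usage.llm_completion_tokens}")  # 1,366
```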
w
the llm prompt tokens seem unusually high...the instructions provided are around 20k English characters
t
Hey there! The high `llm_prompt_tokens` count might be caused by things like tokenization overhead, cached tokens, system prompts, or extra formatting tokens. If you could share a bit more about your input (how it's structured, any special formatting, and whether it includes special characters or templates), that would help narrow down what's going on.
Sources: livekit-plugins/livekit-plugins-anthropic/livekit/plugins/anthropic/llm.py | Capturing metrics | LiveKit Docs | tools/tts/tts.cpp
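A hedged back-of-the-envelope check may help here. It assumes roughly 4 English characters per token and assumes the summary aggregates prompt tokens over every LLM call in the session (so each turn re-sends the growing chat history); both are approximations, not confirmed behavior:

```python
# Rough sanity check on the reported number, not an exact accounting.
chars_in_instructions = 30_000                        # size quoted in the thread
approx_tokens_per_call = chars_in_instructions / 4    # ~7,500 tokens if only the
                                                      # instructions were sent once
reported_prompt_tokens = 194_688                      # value from the UsageSummary

ratio = reported_prompt_tokens / approx_tokens_per_call
print(f"instructions alone are ~{approx_tokens_per_call:.0f} tokens per call")
print(f"the session total is ~{ratio:.0f}x that, which is consistent with the full "
      "chat history being resent on each turn and summed across the session")
```

If that is what is happening, trimming or summarizing older turns before each LLM call is the usual way to keep the per-call prompt under the model's limit.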
w
the actual prompt is no more than 30,000 characters, including the additional JSON that is part of the instructions sent to the agent. Any additional prompt content is the user prompt generated by the STT pipeline. Is there a flush that should be done before starting the agent?
t
I don't have the answer you're looking for. You could also try asking your question:
• in one of the other Slack channels, or
• at https://deepwiki.com/livekit/livekit_composite, which is trained on all LiveKit source code.
If you find the answer, please post it here to help others!