jolly-apartment-95291
08/14/2025, 1:20 PM

tall-airline-16756
08/14/2025, 1:21 PM
async def _run(self) -> None:
    retryable = True
    request_id = utils.shortuuid()
    max_retries = 3  # Feel free to tweak this based on what works best for you
    retry_count = 0
    try:
        while retry_count < max_retries:
            try:
                # Generate content stream
                stream = await self._client.aio.models.generate_content_stream(
                    model=self._model,
                    contents=cast(types.ContentListUnion, turns),
                    config=config,
                )
                async for response in stream:
                    if response.prompt_feedback:
                        raise APIStatusError(
                            response.prompt_feedback.json(),
                            retryable=False,
                            request_id=request_id,
                        )
                    if (
                        not response.candidates
                        or not response.candidates[0].content
                        or not response.candidates[0].content.parts
                    ):
                        logger.warning(f"no candidates in the response: {response}")
                        # Let's pause a bit before retrying
                        await asyncio.sleep(1)
                        retry_count += 1
                        continue

                    # When we get valid responses
                    if len(response.candidates) > 1:
                        logger.warning("Multiple candidates, using the first one.")
                    for part in response.candidates[0].content.parts:
                        chat_chunk = self._parse_part(request_id, part)
                        if chat_chunk is not None:
                            retryable = False
                            self._event_ch.send_nowait(chat_chunk)

                    # If usage metadata is available, process it here
                    if response.usage_metadata is not None:
                        usage = response.usage_metadata
                        self._event_ch.send_nowait(
                            llm.ChatChunk(
                                id=request_id,
                                usage=llm.CompletionUsage(
                                    completion_tokens=usage.candidates_token_count or 0,
                                    prompt_tokens=usage.prompt_token_count or 0,
                                    prompt_cached_tokens=usage.cached_content_token_count or 0,
                                    total_tokens=usage.total_token_count or 0,
                                ),
                            )
                        )

                break  # Success! Exit the retry loop
            except APIStatusError as e:
                if not e.retryable:
                    raise
                retry_count += 1
                if retry_count >= max_retries:
                    raise
                await asyncio.sleep(1)
    except ClientError as e:
        # Handling client errors gracefully
        raise APIStatusError(
            "gemini llm: client error",
            status_code=e.code,
            body=f"{e.message} {e.status}",
            retryable=e.code == 429,  # rate-limit errors are worth retrying
        ) from e
    except ServerError as e:
        # Handling server errors
        raise APIStatusError(
            "gemini llm: server error",
            status_code=e.code,
            body=f"{e.message} {e.status}",
        ) from e
Here are the key takeaways:
• Try up to `max_retries` times if no candidates show up.
• Use `asyncio.sleep(1)` to take a short breather between retries.
• Once valid candidates arrive, break out of the loop and handle the response.
Feel free to adjust `max_retries` to whatever fits your app best.
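If you'd rather space the attempts out instead of always sleeping one second, you could swap in a small exponential-backoff helper. This is just a hypothetical helper, not part of the plugin:

```python
import asyncio

async def backoff_sleep(attempt: int, base: float = 1.0, cap: float = 10.0) -> None:
    # Sleep 1s, 2s, 4s, ... for attempts 0, 1, 2, ..., capped at `cap` seconds.
    await asyncio.sleep(min(base * (2 ** attempt), cap))
```

Then `await backoff_sleep(retry_count)` could replace the fixed `await asyncio.sleep(1)` calls above.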
And hey, if you want to chat more or need any extra help, just let me know. I'm here for you!

jolly-apartment-95291
08/14/2025, 1:21 PM

tall-airline-16756
08/14/2025, 1:22 PM
The `FallbackAdapter` in LiveKit can jump in and switch to backup LLM providers if your main one hits a snag. It uses settings like `attempt_timeout`, `max_retry_per_llm`, and `retry_interval` (though these handle retries across different providers, not retries within the same provider).
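For context, here's a minimal setup sketch. It assumes the livekit-agents `FallbackAdapter` plus the Google and OpenAI plugins are installed; the model names and numeric values are placeholders, and the exact constructor defaults may differ between livekit-agents versions:

```python
from livekit.agents import llm
from livekit.plugins import google, openai

# Providers are tried in order: Gemini first, then OpenAI as the backup.
fallback_llm = llm.FallbackAdapter(
    [
        google.LLM(model="gemini-2.0-flash"),  # primary
        openai.LLM(model="gpt-4o-mini"),       # fallback
    ],
    attempt_timeout=10.0,   # seconds to wait on one provider before moving on
    max_retry_per_llm=1,    # retries against a single provider before falling back
    retry_interval=5.0,     # pause between retry attempts
)

# fallback_llm can then be passed wherever a single LLM instance is expected,
# e.g. when constructing your agent session.
```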
Here are a few tips to help out:
• Try using the `FallbackAdapter` with multiple LLM providers so if one has trouble, it smoothly falls back to another.
• Consider adding some custom logic in your app to catch those "no candidates" responses and retry as you see fit (see the sketch just after this list).
• Keep an eye on your primary LLM provider's health and response quality to reduce these warnings as much as possible.
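If you do roll your own app-level retry (second bullet above), here's a rough, generic sketch in plain Python. It isn't a LiveKit API; `is_empty` is whatever check makes sense for your responses:

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")

async def retry_on_empty(
    fn: Callable[[], Awaitable[T]],
    is_empty: Callable[[T], bool],
    max_retries: int = 3,
    delay: float = 1.0,
) -> T:
    """Call `fn` until it returns a non-empty result or the attempts run out."""
    result = await fn()
    for _ in range(max_retries - 1):
        if not is_empty(result):
            break
        await asyncio.sleep(delay)
        result = await fn()
    # The last result is returned even if still empty, so the caller decides what to do.
    return result
```

For example, `await retry_on_empty(lambda: ask_gemini(prompt), is_empty=lambda r: not r)`, where `ask_gemini` is your own (hypothetical) call into the LLM.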
If you want, I'm here to help you set up the `FallbackAdapter` or brainstorm some custom retry logic. Just let me know!
Sources: LLM Integrations | livekit/livekit_composite | DeepWiki | livekit-plugins/livekit-plugins-google/livekit/plugins/google/beta/gemini_t...