I'm using LiveKit + OpenAI Realtime API for a voic...
# ask-ai
b
I'm using LiveKit + OpenAI Realtime API for a voice agent. I'm getting various errors including 1) timeout, 2) cancelled, and 3) The operation couldn’t be completed(LiveKit.RpcError error 1.). Is there anything I can do to prevent these errors or better handle them when they do happen? They come in waves where one user in one conversation will basically have an unresponsive agent
t
Thanks for reaching out, Eric. I'm an AI assistant and I'm here to help. Give me a moment while I research this topic.
To address the errors you're encountering with LiveKit and the OpenAI Realtime API, here are some strategies:
• Timeouts: These occur when the generate_reply method doesn't complete within the expected time. Implement a timeout mechanism (e.g., using asyncio.get_event_loop().call_later) to handle this. If a timeout happens, catch the exception and decide whether to retry or log the error for debugging.
• Cancelled Errors: These happen when a response is cancelled by the client or server. Use the interrupt method to cancel ongoing responses when needed, and handle the "cancelled" status by cleaning up resources or logging the event.
• RpcError 1: This is a generic error during real-time communication. Catch and log the error details, then decide whether to retry based on the error's nature.
Here's an example of how you can implement these solutions in your code:
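import asyncio

# Assumption: livekit-agents 1.x, where RealtimeError is exported from livekit.agents.llm
from livekit.agents.llm import RealtimeError
from livekit.plugins.openai import RealtimeModel

async def handle_realtime_errors():
    try:
        # Assumption: generate_reply() and interrupt() are methods of the session
        # returned by RealtimeModel.session(), and OPENAI_API_KEY is set.
        model = RealtimeModel()
        session = model.session()
        fut = session.generate_reply()

        # Fail the pending reply if it has not resolved in time.
        # call_later needs a plain function; a coroutine function would never run.
        def on_timeout():
            if not fut.done():
                fut.set_exception(RealtimeError("generate_reply timed out."))

        loop = asyncio.get_running_loop()
        handle = loop.call_later(5.0, on_timeout)

        try:
            response = await fut
            # Handle the response
        except RealtimeError as e:
            # Handle the timeout or other realtime error
            print(f"Realtime error: {e}")
            # Optionally retry the request here
        finally:
            # Stop the timer whether the reply succeeded or failed
            handle.cancel()

        # Cancel any ongoing response
        session.interrupt()

    except Exception as e:
        # Handle any other exceptions
        print(f"Unexpected error: {e}")

asyncio.run(handle_realtime_errors())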
By implementing these measures, you can better manage these errors and improve the reliability of your voice agent. Sources: livekit-plugins/livekit-plugins-resemble/livekit/plugins/resemble/tts.py
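To make the three failure modes above easier to tell apart in logs, here is a minimal sketch that uses only the standard library. It runs one attempt and labels the outcome as ok, timeout, cancelled, or error. The name classify_attempt and the make_call parameter are hypothetical; make_call stands in for whatever coroutine actually asks the agent for a reply, and nothing here is part of the LiveKit API.

```python
import asyncio
from typing import Any, Awaitable, Callable, Tuple


async def classify_attempt(
    make_call: Callable[[], Awaitable[Any]], timeout_s: float
) -> Tuple[str, Any]:
    """Run one attempt and return (outcome, detail).

    outcome is one of "ok", "timeout", "cancelled", "error".
    """
    task = asyncio.ensure_future(make_call())
    # asyncio.wait never raises for the inner task; it only reports what finished.
    done, _pending = await asyncio.wait({task}, timeout=timeout_s)

    if not done:
        # The attempt stalled: cancel it and wait for the cancellation to land.
        task.cancel()
        await asyncio.wait({task})
        return ("timeout", f"no reply within {timeout_s}s")

    if task.cancelled():
        # Must be checked before task.exception(), which raises on cancelled tasks.
        return ("cancelled", "attempt was cancelled before completing")

    exc = task.exception()
    if exc is not None:
        return ("error", repr(exc))

    return ("ok", task.result())
```

Logging the outcome label next to the room and participant identity should show whether a wave of failures is mostly timeouts, cancellations, or RPC errors, which in turn suggests whether retrying is worthwhile.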
b
We have retries (we try up to 3 times), but often the retries fail as well
t
Hey there! If your retries aren’t quite doing the trick, here are some extra tips to help boost error handling and make your system more resilient: • Use FallbackAdapter: Try setting up a fallback with multiple LLM providers so if one goes down, another can step in smoothly. For example:
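from livekit.agents.llm import FallbackAdapter
from livekit.plugins import openai

# Tries each LLM in order until one succeeds. Assumption: this wraps text LLMs
# in an STT > LLM > TTS pipeline, not a realtime model.
llm = FallbackAdapter([
    openai.LLM(model="gpt-4o"),
    openai.LLM.with_azure(model="gpt-4o"),
    # Add other providers as needed
])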
• Handle Recoverable vs Non-Recoverable Errors: Keep an eye on the recoverable flag in error events to figure out if the session can bounce back on its own or if it needs a little human help.
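from livekit.agents import ErrorEvent  # assumption: livekit-agents 1.x

# Register with session.on("error", on_error). Assumption: handlers passed to
# .on() must be plain functions, not coroutines.
def on_error(event: ErrorEvent):
    # event.error wraps the underlying exception and carries the recoverable flag
    if event.error.recoverable:
        print(f"Recoverable error: {event.error}")
    else:
        print(f"Non-recoverable error: {event.error}")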
• Implement Retries with Exponential Backoff: Set up retries that wait a bit longer each time to better handle those pesky temporary errors.
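from livekit.agents import AgentSession, APIConnectOptions

# Assumption: your livekit-agents version accepts conn_options on AgentSession;
# check the constructor signature for the release you have installed.
session = AgentSession(
    llm=llm,
    conn_options=APIConnectOptions(
        max_retry=3,         # maximum number of retries
        retry_interval=1.0,  # seconds between retries
        timeout=30.0,        # connection timeout in seconds
    ),
)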
• Proactive Session Management: Keep an eye on how long sessions run and restart them before they time out to keep things running smoothly.
• Use Circuit Breakers: Stop trying to use providers that are failing until they’re back up and running by adding a circuit breaker.
• Log and Analyze Errors: Track errors and retry attempts to spot patterns and make your system even more reliable.
• Inform Users: Use .say() to gently let users know if something’s up and suggest trying again.
Hopefully, these ideas help make your voice agent more robust! If you want a hand setting any of these up, just give me a shout.
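The circuit breaker idea from the list above can be sketched in plain Python. This is an illustration under stated assumptions, not a LiveKit feature: the CircuitBreaker class, its method names, and the threshold and cooldown values are all hypothetical, and you would call record_success or record_failure yourself around each provider request.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Stop calling a failing provider until a cooldown has passed."""

    def __init__(
        self,
        failure_threshold: int = 3,
        cooldown_s: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.cooldown_s = cooldown_s
        self._clock = clock  # injectable so the breaker can be tested without sleeping
        self._failures = 0
        self._opened_at: Optional[float] = None

    def allow_request(self) -> bool:
        """Return True if a call may be attempted right now."""
        if self._opened_at is None:
            return True  # closed: traffic flows normally
        # Open: only allow a trial call once the cooldown has elapsed.
        return self._clock() - self._opened_at >= self.cooldown_s

    def record_success(self) -> None:
        """Close the breaker and forget earlier failures."""
        self._failures = 0
        self._opened_at = None

    def record_failure(self) -> None:
        """Count a failure and open the breaker once the threshold is reached."""
        self._failures += 1
        if self._failures >= self.failure_threshold:
            self._opened_at = self._clock()
```

When allow_request() returns False, that would be the point to skip the retry and tell the user something is wrong, rather than queueing more calls against a provider that is already struggling.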
b
What options do we have? We are using OpenAI's realtime api vs. a STT > LLM > TTS stack. What errors are recoverable vs. non-recoverable? What might we do in either case? Would starting a new room help? How long might you suggest we wait between retries, and how many retries should we do?
t
Here are some detailed answers to your questions. Hope this helps clear things up!
Recoverable vs. Non-Recoverable Errors
• Recoverable Errors: These are usually temporary hiccups like API connection issues, throttling, or retryable exceptions. The system will automatically try again using an exponential backoff approach (think: waiting 1 second, then 2 seconds, and so on).
• Non-Recoverable Errors: These happen with invalid requests, connection problems that can’t be fixed automatically, or serious exceptions. When these pop up, retries won’t do the trick, so you’ll need to step in manually, like starting a new session or room.
Starting a New Room
If you hit a non-recoverable error, starting a new room is a great way to reset things, especially when the issue is tied to session state or connection glitches. It’s a handy fallback when retries just aren’t enough.
Retry Strategy
• Number of Retries: A good starting point is 3-5 retries, but feel free to tweak that based on what works best for your app.
• Retry Interval: Use exponential backoff for spacing out retries. For example, wait 1 second before the first retry, 2 seconds before the second, and keep doubling up to a max delay (like 30 seconds).
• Jitter: Adding a bit of randomness (jitter) to your retry intervals helps prevent overloading the system, especially in distributed setups.
Example Retry Configuration
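from livekit.agents import APIConnectOptions

# Assumption: APIConnectOptions only accepts max_retry, retry_interval, and timeout.
# _interval_for_retry is a private method, not a constructor argument, so a custom
# backoff curve has to live in your own retry loop.
APIConnectOptions(
    max_retry=5,
    retry_interval=2.0,  # seconds between retries
    timeout=30.0,        # connection timeout in seconds
)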
By using these tips, you’ll be better equipped to handle both recoverable and non-recoverable errors smoothly. If you have any more questions or need a hand with anything else, just let me know—I’m here to help! Sources: livekit-agents/livekit/agents/stt/stt.py | src/room/DefaultReconnectPolicy.ts
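The retry strategy described above (doubling delays, a cap, jitter, and no retries for non-recoverable errors) can be sketched as an application-level loop. This is a minimal sketch using only the standard library. The names backoff_delay, retry_with_backoff, and NonRecoverableError are hypothetical; how you map real LiveKit or OpenAI errors onto NonRecoverableError is up to your own code. The jitter variant shown is "full jitter", where each delay is drawn uniformly between zero and the exponential ceiling.

```python
import asyncio
import random
from typing import Any, Awaitable, Callable


class NonRecoverableError(Exception):
    """Hypothetical marker for errors that retrying cannot fix."""


def backoff_delay(
    attempt: int,
    base_s: float = 1.0,
    cap_s: float = 30.0,
    rng: Callable[[], float] = random.random,
) -> float:
    """Delay before retry number `attempt` (0-based), with full jitter."""
    ceiling = min(cap_s, base_s * (2 ** attempt))  # 1s, 2s, 4s, ... capped at cap_s
    return rng() * ceiling


async def retry_with_backoff(
    operation: Callable[[], Awaitable[Any]],
    max_retries: int = 3,
    base_s: float = 1.0,
    cap_s: float = 30.0,
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> Any:
    """Call `operation` until it succeeds or the retries are used up."""
    for attempt in range(max_retries + 1):
        try:
            return await operation()
        except NonRecoverableError:
            raise  # retrying will not help; let the caller reset the session or room
        except Exception:
            if attempt == max_retries:
                raise  # out of retries: surface the last error
            await sleep(backoff_delay(attempt, base_s, cap_s))
```

With the defaults, the delay ceilings before the three retries are 1, 2, and 4 seconds. If all retries fail, or a NonRecoverableError is raised, the caller is the right place to decide whether to start a new session or room.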