# ask-ai
b
We are currently running into an issue where, when we disconnect a user from a room, the agent disconnects from the room, but we can still see the session as active and the user can't create any new sessions for other rooms.
t
Thanks for reaching out, Chris. I'm an AI assistant and I'm here to help. Give me a moment while I research this topic.
The issue where a user disconnects from a LiveKit room but the session remains active, preventing the user from creating new sessions for other rooms, can stem from several causes. The following troubleshooting steps address them (a sketch follows this list):

1. Check the disconnection reason: use the `RoomDisconnectedEvent` to identify why the disconnection occurred, since reasons can include network issues or specific server errors.
2. Handle exceptions gracefully: some exceptions, like `ThrottlingException` or `ModelTimeoutException`, may require restarting the session in order to clear the stale state.
3. Review session and participant management settings: verify configuration parameters such as `close_on_disconnect` that control session closure, to rule out unexpected session persistence.
4. Verify Redis and session tracking: if Redis is used, make sure it is correctly managing session state so that a disconnect does not block new session creation.

Reviewing the logs associated with the disconnect event alongside these configurations will help you systematically diagnose and resolve the issue.

Sources: RoomDisconnectedEvent constructor - RoomDisconnectedEvent - livekit_client ... | room(_:didDisconnectWithError:) | Documentation | livekit-plugins/livekit-plugins-openai/livekit/plugins/openai/realtime/real... | livekit-plugins/livekit-plugins-openai/livekit/plugins/openai/llm.py | RoomEvent | Disconnected
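For concreteness, here is a minimal sketch in Python of steps 1 and 3, assuming livekit-agents 1.x. `MyAgent` and `run` are placeholder names, not existing APIs, and the exact argument passed to the `"disconnected"` handler is an assumption that may vary between SDK versions:

```python
# Minimal sketch of steps 1 and 3; assumes livekit-agents 1.x.
# MyAgent and run are placeholders, not existing APIs.
import logging

from livekit.agents import Agent, AgentSession, JobContext, RoomInputOptions

logger = logging.getLogger("agent")


class MyAgent(Agent):
    def __init__(self) -> None:
        super().__init__(instructions="placeholder instructions")


async def run(ctx: JobContext, session: AgentSession) -> None:
    # Step 1: log why the room disconnected (network issue, server error,
    # explicit removal, ...). The handler argument is an assumption and
    # may differ between SDK versions.
    @ctx.room.on("disconnected")
    def _on_disconnected(reason) -> None:
        logger.info(f"room {ctx.room.name} disconnected: {reason}")

    # Step 3: close_on_disconnect (default True) closes the agent session
    # when its linked participant leaves; if it were False, the session
    # would deliberately outlive the user's disconnect.
    await session.start(
        agent=MyAgent(),
        room=ctx.room,
        room_input_options=RoomInputOptions(close_on_disconnect=True),
    )
```

If `close_on_disconnect` is already at its default and the session still shows as active, the disconnect reason from step 1 is usually the fastest way to tell a clean leave apart from a server-side failure.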
b
Can you help me find what might cause it in this code?
```python
import logging
import os

# Imports and logger reconstructed from usage below (the original paste
# omitted them); module paths assume livekit-agents 1.x.
from livekit.agents import (
    Agent,
    AgentSession,
    JobContext,
    JobProcess,
    MetricsCollectedEvent,
    WorkerOptions,
    cli,
    metrics,
)
from livekit.plugins import openai, silero
from livekit.plugins.turn_detector.multilingual import MultilingualModel

logger = logging.getLogger("agent")


class AiriaVoiceProxyAgent(Agent):  # base class assumed; not shown in the paste
    # _get_room_metadata_with_retry and the rest of the class are not in the paste

    async def on_enter(self):
        """When the agent enters, get pipeline ID from job context"""

        # Get pipeline_id from job context with retry logic
        metadata = await self._get_room_metadata_with_retry()
        pipeline_id = metadata.get("pipeline_id")

        if not pipeline_id:
            logger.error(f"Pipeline ID not found in room metadata: {metadata}")
            await self.session.say(
                "Sorry, there was a configuration error. Please try again."
            )
            return

        <http://logger.info|logger.info>(f"Agent entered room with pipeline ID: {pipeline_id}")


def prewarm(proc: JobProcess):
    """Prewarm function to initialize models"""
    proc.userdata["vad"] = silero.VAD.load()


async def entrypoint(ctx: JobContext):
    """Main entry point for the agent"""
    # Configure logging once at application entry point
    log_level = os.getenv("LOG_LEVEL", "INFO").upper()
    logging.basicConfig(
        level=getattr(logging, log_level, logging.INFO),
        format="%(asctime)s - %(name)s - %(levelname)s - %(message)s",
        handlers=[logging.StreamHandler()],
    )

    # Enable debug logging for LiveKit components
    logging.getLogger("livekit").setLevel(logging.DEBUG)
    logging.getLogger("livekit.agents").setLevel(logging.DEBUG)
    logging.getLogger("livekit.plugins").setLevel(logging.DEBUG)

    # Reduce noise from third-party libraries
    logging.getLogger("httpcore").setLevel(logging.WARNING)
    logging.getLogger("httpx").setLevel(logging.WARNING)
    logging.getLogger("urllib3").setLevel(logging.WARNING)
    logging.getLogger("openai._base_client").setLevel(logging.WARNING)

    ctx.log_context_fields = {
        "room": ctx.room.name,
    }

    # Validate required environment variables
    required_vars = ["OPENAI_API_KEY"]
    for var in required_vars:
        if not os.getenv(var):
            raise ValueError(f"{var} environment variable is required")

    # Get OpenAI configuration from environment variables (only for STT/TTS, not LLM)
    openai_key = os.getenv("OPENAI_API_KEY")
    stt_model = os.getenv("STT_MODEL", "gpt-4o-transcribe")
    tts_model = os.getenv("TTS_MODEL", "tts-1")
    tts_voice = os.getenv("TTS_VOICE", "shimmer")

    session = AgentSession(
        vad=ctx.proc.userdata["vad"],
        # LLM needed for session but actual processing happens in our custom llm_node
        llm=openai.LLM(model="gpt-4o-mini", api_key=openai_key),
        stt=openai.STT(model=stt_model, api_key=openai_key),
        tts=openai.TTS(model=tts_model, voice=tts_voice, api_key=openai_key),
        turn_detection=MultilingualModel(),
    )

    # Log metrics
    usage_collector = metrics.UsageCollector()

    @session.on("metrics_collected")
    def _on_metrics_collected(ev: MetricsCollectedEvent):
        metrics.log_metrics(ev.metrics)
        usage_collector.collect(ev.metrics)

    async def log_usage():
        summary = usage_collector.get_summary()
        <http://logger.info|logger.info>(f"Usage: {summary}")

    ctx.add_shutdown_callback(log_usage)

    # ERROR HANDLING EVENTS - Track session close events for debugging
    @session.on("close")
    def on_close(event):
        room_name = ctx.room.name
        if event.error:
            logger.error(
                f"=== SESSION CLOSED WITH ERROR === Room: {room_name} | Error: {event.error} | Error type: {type(event.error).__name__} | Reason: {event.reason}"
            )
        else:
            logger.info(
                f"=== SESSION CLOSED NORMALLY === Room: {room_name} | Reason: {event.reason}"
            )

    @session.on("error")
    def on_error(event):
        room_name = ctx.room.name
        recoverable = getattr(event.error, "recoverable", "unknown")
        source = getattr(event, "source", "unknown")
        logger.error(
            f"=== SESSION ERROR === Room: {room_name} | Error: {event.error} | Source: {source} | Recoverable: {recoverable}"
        )

    await session.start(agent=AiriaVoiceProxyAgent(), room=ctx.room)

    await ctx.connect()


if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint, prewarm_fnc=prewarm))
```
t
Hey there! I totally get how frustrating it can be when sessions get stuck and block new connections. Let me break down what might be happening in your code:

• Resource cleanup on session close: errors may be slipping through when a room is deleted. If `r.deleteRoom()` fails, that room can hang around and prevent new sessions from starting.
• Incomplete rollback on errors: if an error pops up midway through setting up a session, the room can end up stuck in the active rooms map, blocking new sessions.
• Race conditions during session creation: even with locking in place, concurrent session creation can still leave behind unexpected state.

Some suggestions to smooth things out (a sketch follows this list):

1. Roll back room and session creation if something goes wrong partway through.
2. Make sure your async cleanup is thorough and catches any errors it can raise.
3. Walk through the full session lifecycle to confirm everything gets cleaned up on every exit path.

Working through these steps should help you track down and resolve those persistent sessions.

Sources: pkg/service/roommanager.go
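If it helps, here is a minimal sketch in Python of suggestion 1 (your worker code above is Python; roommanager.go is LiveKit's server side, which this does not reproduce). It assumes the `livekit-api` package with the standard LIVEKIT_URL / LIVEKIT_API_KEY / LIVEKIT_API_SECRET environment variables, and `start_session_with_rollback` is a hypothetical helper name:

```python
# Hedged sketch of suggestion 1: if session startup fails partway, roll
# back by deleting the room server-side so no stale session blocks the
# user from joining other rooms. Assumes the Python livekit-api package
# and LIVEKIT_URL / LIVEKIT_API_KEY / LIVEKIT_API_SECRET in the env.
import logging

from livekit import api
from livekit.agents import Agent, AgentSession, JobContext

logger = logging.getLogger("agent")


async def start_session_with_rollback(
    ctx: JobContext, session: AgentSession, agent: Agent
) -> None:
    try:
        await session.start(agent=agent, room=ctx.room)
        await ctx.connect()
    except Exception:
        logger.exception("startup failed; deleting room to clear stale state")
        lkapi = api.LiveKitAPI()  # reads the LIVEKIT_* env vars
        try:
            # DeleteRoom force-disconnects all participants and removes
            # the room's server-side state.
            await lkapi.room.delete_room(
                api.DeleteRoomRequest(room=ctx.room.name)
            )
        finally:
            await lkapi.aclose()
        raise
```

Deleting the room on a failed start is a blunt but effective rollback: DeleteRoom force-disconnects every participant, so no half-initialized session can keep the user locked out of other rooms.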