# ask-ai
I'm trying to get speaker diarization to work with Speechmatics, and I'm following all the relevant docs... here's my `AgentSession` setup:
```python
session = AgentSession(
    llm=realtime.RealtimeModel(
        voice="alloy",
        turn_detection=TurnDetection(
            type="semantic_vad",
            eagerness="low",
            interrupt_response=False,
        ),
        modalities=["text"],
    ),
    tts=openai.TTS(voice="alloy"),
    stt=speechmatics.STT(
        transcription_config=speechmatics.types.TranscriptionConfig(
            language="en",
            operating_point="enhanced",
            enable_partials=True,
            max_delay=0.7,
            diarization="speaker",
            speaker_diarization_config={"max_speakers": 2},
        )
    ),
)
```
And I'm trying to read that text inside the `on_user_speech_transcribed` method that I register in my agent code with:
```python
async def on_enter(self) -> None:
    logger.info("[MitraCouplesAgent] Entering room")

    # 🎯 Hybrid transcript handling approach (Scenario 4)
    # Primary: Handle user speech with participant info when available
    self.session.on("user_input_transcribed", self.on_user_speech_transcribed)
```
But I'm not seeing diarization happen or any speaker tags:
```python
def on_user_speech_transcribed(self, event: UserInputTranscribedEvent):
    """🎤 Primary handler for user speech with participant identification"""
    try:
        if not event.is_final:
            logger.debug("🔄 Received interim transcript, skipping...")
            return

        participant = self.participants.determine_participant(event)

        logger.info(f"🎤 User speech transcribed from participant: {participant.identity}")
        logger.info(f"🔑 Auth token available: {participant.ctx is not None}")
        logger.info(f"📝 Transcript: {event.transcript}")
        logger.info(f"user_input_transcribed: \"[{event.speaker_id}]: {event.transcript}\"")
```
The log for `user_input_transcribed` returns `None` for `speaker_id`. Why?
Thanks for reaching out, Nima. I'm an AI assistant and I'm here to help. Give me a moment while I research this topic.
Hey there! If you're seeing `speaker_id` come back as `None` in the `user_input_transcribed` event, it's probably because the Speechmatics STT setup isn't quite aligned, especially around the diarization settings, which might not be properly applied or could be getting overridden. To get speaker diarization working smoothly, try using the modern constructor parameters for the STT configuration like this:
```python
stt = speechmatics.STT(
    enable_diarization=True,                # This part is super important
    diarization_sensitivity=0.5,
    speaker_active_format="<{speaker_id}>{text}",
    # ... other settings
)
```
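(Judging from the placeholders in the format string above, `speaker_active_format` looks like a template for how the speaker label gets embedded in the emitted transcript text, e.g. `<S1>hello there`; the exact parameter names and behavior can vary by plugin version, so double-check them against your installed `livekit-plugins-speechmatics`.)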
Also, a quick heads-up: avoid mixing the older `transcription_config` parameter with the newer `enable_diarization` flag, since that can quietly disable diarization without any warnings. If you want, I'm here to help you troubleshoot further; just let me know!

Sources: livekit-agents/livekit/agents/stt/stt.py | Speechmatics STT integration guide | LiveKit Docs | Events and error handling | LiveKit Docs
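If it helps, here's a minimal end-to-end sketch of how the pieces could fit together, assuming the constructor parameters suggested above (`enable_diarization`, `diarization_sensitivity`, `speaker_active_format`) match your installed version of `livekit-plugins-speechmatics`; the import paths and the standalone handler wiring are illustrative and may differ from how your agent class is structured:

```python
# Sketch only: verify the diarization parameter names against your plugin
# version before relying on this.
import logging

from livekit.agents import AgentSession
from livekit.plugins import openai, speechmatics

logger = logging.getLogger("diarization-demo")

session = AgentSession(
    # Realtime LLM in text-only mode with a separate TTS, as in your original setup
    llm=openai.realtime.RealtimeModel(modalities=["text"]),
    tts=openai.TTS(voice="alloy"),
    stt=speechmatics.STT(
        enable_diarization=True,                       # top-level flag, not nested in transcription_config
        diarization_sensitivity=0.5,                   # name taken from the suggestion above
        speaker_active_format="<{speaker_id}>{text}",  # name taken from the suggestion above
    ),
)

def on_user_input_transcribed(event) -> None:
    # Only final transcripts are expected to carry a stable speaker label
    if event.is_final:
        logger.info(f"[{event.speaker_id}] {event.transcript}")

session.on("user_input_transcribed", on_user_input_transcribed)
```

Once `enable_diarization` is the only diarization setting in play, check the logs again to see whether `speaker_id` starts coming through on final transcripts.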