# ask-ai
I'm trying to get speaker diarization to work with Speechmatics, and I'm following all the relevant docs... here's my `AgentSession` setup:
```python
session = AgentSession(
    llm=realtime.RealtimeModel(
        voice="alloy",
        turn_detection=TurnDetection(
            type="semantic_vad",
            eagerness="low",
            interrupt_response=False,
        ),
        modalities=["text"],
    ),
    tts=openai.TTS(voice="alloy"),
    stt=speechmatics.STT(
        transcription_config=speechmatics.types.TranscriptionConfig(
            language="en",
            operating_point="enhanced",
            enable_partials=True,
            max_delay=0.7,
            diarization="speaker",
            speaker_diarization_config={"max_speakers": 2},
        )
    ),
)
```
And I'm trying to read that text inside the `on_user_speech_transcribed` method that I register in my agent code with:
```python
async def on_enter(self) -> None:
    logger.info("[MitraCouplesAgent] Entering room")

    # 🎯 Hybrid transcript handling approach (Scenario 4)
    # Primary: Handle user speech with participant info when available
    self.session.on("user_input_transcribed", self.on_user_speech_transcribed)
```
But I'm not seeing diarization happen or any speaker tags:
```python
def on_user_speech_transcribed(self, event: UserInputTranscribedEvent):
    """🎤 Primary handler for user speech with participant identification"""
    try:
        if not event.is_final:
            logger.debug("🔄 Received interim transcript, skipping...")
            return

        participant = self.participants.determine_participant(event)

        logger.info(f"🎤 User speech transcribed from participant: {participant.identity}")
        logger.info(f"🔑 Auth token available: {participant.ctx is not None}")
        logger.info(f"📝 Transcript: {event.transcript}")
        logger.info(f"user_input_transcribed: \"[{event.speaker_id}]: {event.transcript}\"")
```
The log for `user_input_transcribed` returns `None` for `speaker_id`. Why?
Thanks for reaching out, Nima. I'm an AI assistant and I'm here to help. Give me a moment while I research this topic.
Hey there! If you're seeing `speaker_id` come back as `None` in the `user_input_transcribed` event, it's probably because the Speechmatics STT setup isn't quite aligned, especially around the diarization settings, which might not be properly applied or could be getting overridden. To get speaker diarization working smoothly, try using the modern constructor parameters for the STT configuration like this:
```python
stt = speechmatics.STT(
    enable_diarization=True,                # This part is super important
    diarization_sensitivity=0.5,
    speaker_active_format="<{speaker_id}>{text}",
    # ... other settings
)
```
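(Judging from the placeholders in the format string above, `speaker_active_format` looks like a template for how the speaker label gets embedded in the emitted transcript text, e.g. `<S1>hello there`; the exact parameter names and behavior can vary by plugin version, so double-check them against your installed `livekit-plugins-speechmatics`.)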
Also, a quick heads-up: avoid mixing the older `transcription_config` parameter with the newer `enable_diarization` flag, since that can quietly disable diarization without any warnings. If you want, I'm here to help you troubleshoot further; just let me know!

Sources: livekit-agents/livekit/agents/stt/stt.py | Speechmatics STT integration guide | LiveKit Docs | Events and error handling | LiveKit Docs
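If it helps, here's a minimal end-to-end sketch of how the pieces could fit together, assuming the constructor parameters suggested above (`enable_diarization`, `diarization_sensitivity`, `speaker_active_format`) match your installed version of `livekit-plugins-speechmatics`; the import paths and the standalone handler wiring are illustrative and may differ from how your agent class is structured:

```python
# Sketch only: verify the diarization parameter names against your plugin
# version before relying on this.
import logging

from livekit.agents import AgentSession
from livekit.plugins import openai, speechmatics

logger = logging.getLogger("diarization-demo")

session = AgentSession(
    # Realtime LLM in text-only mode with a separate TTS, as in your original setup
    llm=openai.realtime.RealtimeModel(modalities=["text"]),
    tts=openai.TTS(voice="alloy"),
    stt=speechmatics.STT(
        enable_diarization=True,                       # top-level flag, not nested in transcription_config
        diarization_sensitivity=0.5,                   # name taken from the suggestion above
        speaker_active_format="<{speaker_id}>{text}",  # name taken from the suggestion above
    ),
)

def on_user_input_transcribed(event) -> None:
    # Only final transcripts are expected to carry a stable speaker label
    if event.is_final:
        logger.info(f"[{event.speaker_id}] {event.transcript}")

session.on("user_input_transcribed", on_user_input_transcribed)
```

Once `enable_diarization` is the only diarization setting in play, check the logs again to see whether `speaker_id` starts coming through on final transcripts.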