How to Modify STT Transcripts Before They Reach the LLM in a Voice Pipeline Agent?
# ask-ai
Context: We're building a "talking stick" mode for couple's therapy where two people take turns speaking using push-to-talk (PTT). We need the LLM to know which person is speaking (e.g., "Albert" vs "Dad").

Current Setup (sketched below):
- `AgentSession` with `turn_detection="manual"` for PTT
- Deepgram STT for transcription
- Claude (Anthropic) as the LLM
- We track the current speaker in metadata (`currentSpeaker: "Albert"`)

Problem: The LLM receives raw transcripts without speaker identification. We can add prefixes in the `conversation_item_added` event for logging, but this happens AFTER the LLM has already processed the message.

What We've Tried:
1. Adding prefixes in `conversation_item_added`: only affects logging, not the LLM input.
2. Modifying the message in `user_speech_committed`: we can't change the message that reaches the LLM.

Question: Is there a way to intercept and modify STT transcripts (to add speaker prefixes like `[Albert]` or `[Dad]`) before they're added to the conversation context that the LLM sees? Ideally, we want the LLM to see:

- [Albert] I think we should talk about...
- [Dad] I disagree because...

Instead of just:

- I think we should talk about...
- I disagree because...

Code Snippet (current approach, doesn't affect the LLM input):

```python
@session.on("conversation_item_added")
def on_conversation_item_added(event):
    item = event.item
    if ptt_state["is_talking_stick"] and item.role == "user" and ptt_state["current_speaker"]:
        # This prefix is only used for logging; it never reaches the LLM
        text_content = f"[{ptt_state['current_speaker']}] {item.text_content}"
```

Is there a recommended approach for preprocessing STT output before it reaches the conversation context?
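For reference, a minimal sketch of the setup described above, assuming the livekit-agents 1.x Python API; the `ptt_state` dict, the default plugin options, and the commented PTT call are illustrative rather than our exact code:

```python
from livekit.agents import AgentSession
from livekit.plugins import anthropic, deepgram

# Shared "talking stick" state; the app updates current_speaker on each handoff.
ptt_state = {"is_talking_stick": True, "current_speaker": "Albert"}

session = AgentSession(
    stt=deepgram.STT(),
    llm=anthropic.LLM(),
    turn_detection="manual",  # push-to-talk: the app decides when a user turn ends
    # tts/vad omitted here; configured as usual in the real agent
)

# On PTT release, the app commits the user's turn so the agent replies, e.g.:
#   session.commit_user_turn()
```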
Thanks for reaching out, Albert. I'm an AI assistant and I'm here to help. Give me a moment while I research this topic.
To modify your STT transcripts and add speaker prefixes before they reach the LLM, override the `stt_node` method in your Agent class. That lets you preprocess the STT output and tag each speaker. Here's the walkthrough:

- Enable Speaker Diarization: make sure your STT configuration supports speaker diarization so each transcript carries a speaker ID.
- Override the `stt_node` Method: intercept each transcript event and prepend the speaker's name or ID to the text.
- Map Speaker IDs to Names: keep a mapping of speaker IDs to display names in `session.userdata` so you can label the speakers however you like.

Here's an example to get you started:
```python
from livekit.agents.stt import SpeechEventType
from livekit.agents.voice import Agent
from livekit.plugins import speechmatics


class YourAgent(Agent):
    def __init__(self):
        super().__init__(
            instructions="You are a helpful voice assistant.",
            stt=speechmatics.STT(
                transcription_config=speechmatics.types.TranscriptionConfig(
                    # Enable speaker diarization so transcript events carry speaker IDs
                    diarization="speaker",
                )
            ),
        )

    async def stt_node(self, audio, model_settings):
        async def process_transcript(transcript_stream):
            async for event in transcript_stream:
                # Only rewrite final transcripts; interim events pass through unchanged
                if event.type == SpeechEventType.FINAL_TRANSCRIPT:
                    if event.alternatives and event.alternatives[0].speaker_id:
                        speaker_id = event.alternatives[0].speaker_id
                        # Resolve the diarized speaker ID to a display name,
                        # falling back to the raw ID if there's no mapping
                        speaker_name = self.session.userdata.speaker_names.get(speaker_id, speaker_id)
                        event.alternatives[0].text = f"[{speaker_name}] {event.alternatives[0].text}"
                yield event

        # Wrap the default STT pipeline so every event is post-processed
        # before it is added to the conversation context
        return process_transcript(super().stt_node(audio, model_settings))
```
With this in place, the LLM will get transcripts that look like this:
```
[Albert] I think we should talk about...
[Dad] I disagree because...
```
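The example above assumes `session.userdata` exposes a `speaker_names` mapping. Here's a minimal sketch of how that userdata could be defined and attached to the session; the `SessionData` name and the example speaker IDs/names are illustrative:

```python
from dataclasses import dataclass, field

from livekit.agents import AgentSession


@dataclass
class SessionData:
    # Map diarized speaker IDs (e.g. "S1", "S2") to display names.
    speaker_names: dict[str, str] = field(
        default_factory=lambda: {"S1": "Albert", "S2": "Dad"}
    )


# Pass the userdata when constructing the session (stt/llm/tts as usual).
session = AgentSession[SessionData](userdata=SessionData())
```

Inside the agent, `self.session.userdata.speaker_names` then resolves each diarized speaker ID to the label you want the LLM to see.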
If you have any questions or want to dive deeper, just let me know; I'm here to help!

Sources: Speechmatics STT integration guide | LiveKit Docs | pipeline-llm/transcription_node.py