# ask-ai
m
I want to process the transcript coming from STT before passing it to the LLM and then pass the processed transcript. I was thinking that I should do this in stt_node(). How can I do this?
t
Thanks for reaching out, Mridul. I'm an AI assistant and I'm here to help. Give me a moment while I research this topic.
To process a transcript in the `stt_node` before passing it to the LLM in LiveKit, you can override the `stt_node` method in your custom Agent class. Here’s an example:
```python
from livekit import rtc
from livekit.agents import ModelSettings, stt, Agent
from typing import AsyncIterable, Optional

async def stt_node(
    self, audio: AsyncIterable[rtc.AudioFrame], model_settings: ModelSettings
) -> Optional[AsyncIterable[stt.SpeechEvent]]:
    async def process_speech_events():
        async for event in Agent.default.stt_node(self, audio, model_settings):
            # Process the transcript here before passing to LLM
            # For example, remove filler words, fix formatting, etc.
            if event.transcript:
                event.transcript = event.transcript.replace("um", "").replace("uh", "")
                # Add any other text processing you need
            yield event

    return process_speech_events()
```
This example demonstrates how to modify the transcript by removing filler words like "um" and "uh" before the text is sent to the LLM. You can implement any additional text processing logic you need in this node. Sources: Voice pipeline nodes | LiveKit Docs
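For context, `stt_node` is meant to be overridden as a method of a custom `Agent` subclass rather than defined as a free function. A minimal sketch of that wiring follows; the class name and the `instructions` string are illustrative assumptions, not part of the docs snippet above:

```python
from livekit import rtc
from livekit.agents import Agent, ModelSettings, stt
from typing import AsyncIterable, Optional

class FillerFilteringAgent(Agent):
    """Illustrative custom agent; the name and instructions are assumptions."""

    def __init__(self) -> None:
        super().__init__(instructions="You are a helpful voice assistant.")

    async def stt_node(
        self, audio: AsyncIterable[rtc.AudioFrame], model_settings: ModelSettings
    ) -> Optional[AsyncIterable[stt.SpeechEvent]]:
        async def process_speech_events():
            # Delegate to the default pipeline, then post-process each event,
            # exactly as in the snippet above
            async for event in Agent.default.stt_node(self, audio, model_settings):
                yield event

        return process_speech_events()
```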
m
So basically I have implemented a function which takes in the stt transcripts and returns the processed transcript. I want the processed transcript to be passed in LLM. How can I do this?
t
To pass your processed transcript to the LLM, you can override the `transcription_node` in your custom Agent class. This allows you to modify or clean up the transcription text before it is sent to the LLM. Here’s an example:
```python
from livekit.agents import ModelSettings
from typing import AsyncIterable

async def transcription_node(self, text: AsyncIterable[str], model_settings: ModelSettings) -> AsyncIterable[str]:
    async for delta in text:
        # Process the transcript here
        processed_text = your_processing_function(delta)
        yield processed_text
```
In this example, replace `your_processing_function` with the function you’ve implemented to process the STT transcripts. This ensures that the processed transcript is passed to the LLM. Sources: Voice pipeline nodes | LiveKit Docs
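`your_processing_function` is just a placeholder for whatever you have written. As a minimal, self-contained illustration (the filler-word list and whitespace handling here are assumptions, not anything prescribed by LiveKit), it could look like:

```python
import re

def your_processing_function(transcript: str) -> str:
    """Illustrative cleanup: drop common filler words and tidy whitespace."""
    fillers = {"um", "uh", "like"}
    # Keep only words that are not standalone fillers (case-insensitive,
    # ignoring trailing punctuation when comparing)
    words = [w for w in transcript.split() if w.lower().strip(",.") not in fillers]
    # Collapse any repeated spaces left behind
    return re.sub(r"\s+", " ", " ".join(words)).strip()

print(your_processing_function("Um, I want to, uh, book a flight"))
# → I want to, book a flight
```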
m
What is delta? Is it the incomplete transcript? I want to pass the final transcript to the processing_function
t
In the context of the `transcription_node`, the term "delta" refers to the incremental text segments that are being processed through the transcription pipeline. These segments may be incomplete transcriptions as they are streamed in real time. If you want to process only the final transcript, you can modify your implementation to collect all the segments and process them once the transcription is complete. Here’s an example:
```python
from livekit.agents import ModelSettings
from typing import AsyncIterable

async def transcription_node(self, text: AsyncIterable[str], model_settings: ModelSettings) -> AsyncIterable[str]:
    final_transcript = ""
    async for delta in text:
        final_transcript += delta

    # Process the final transcript
    processed_text = your_processing_function(final_transcript)
    yield processed_text
```
In this example, the `final_transcript` variable accumulates all the text segments, and once the transcription is complete, it is passed to your processing function. This ensures that only the final transcript is processed. Sources: Voice pipeline nodes | LiveKit Docs
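The delta-versus-final distinction can be seen with plain strings, independent of LiveKit. The chunks below are made up purely for illustration:

```python
import asyncio
from typing import AsyncIterable

async def fake_deltas() -> AsyncIterable[str]:
    # Simulated streaming chunks, standing in for the `text` stream
    for chunk in ["Hel", "lo ", "wor", "ld"]:
        yield chunk

async def collect_final(text: AsyncIterable[str]) -> str:
    final_transcript = ""
    async for delta in text:   # each delta is only a partial fragment
        final_transcript += delta
    return final_transcript    # only now is the transcript complete

final = asyncio.run(collect_final(fake_deltas()))
print(final)  # → Hello world
```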
m
I noticed that `transcription_node` only captures what comes out of the LLM. I want to control what goes into the LLM (basically from the STT)
t
To control what goes into the LLM from the STT, you can override the `stt_node` in your custom Agent class. This allows you to process the STT output before it is sent to the LLM. Here’s an example:
```python
from livekit import rtc
from livekit.agents import ModelSettings, stt, Agent
from typing import AsyncIterable, Optional

async def stt_node(
    self, audio: AsyncIterable[rtc.AudioFrame], model_settings: ModelSettings
) -> Optional[AsyncIterable[stt.SpeechEvent]]:
    async def process_speech_events():
        async for event in Agent.default.stt_node(self, audio, model_settings):
            # Process the transcript here before passing to LLM
            if event.transcript:
                event.transcript = your_processing_function(event.transcript)
            yield event

    return process_speech_events()
```
In this example, replace `your_processing_function` with the function you’ve implemented to process the STT transcripts. This ensures that the processed transcript is passed to the LLM. Additionally, you can use manual turn control to explicitly manage when user input is sent to the LLM. Methods like `session.interrupt()`, `session.clear_user_turn()`, and `session.commit_user_turn()` can help you control the flow of input. Sources: Speech-to-text (STT) integrations | LiveKit Docs; Turn detection and interruptions | LiveKit Docs
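As a rough sketch of the manual turn-control idea (assuming `session` is your running `AgentSession` and the session is configured for manual turn detection; this fragment is illustrative, not a complete program):

```python
# Hedged sketch: explicitly controlling when the user's turn reaches the LLM.
# `session` is assumed to be an existing AgentSession.

# Stop any in-progress agent speech
session.interrupt()

# Discard the buffered user input without sending it to the LLM...
session.clear_user_turn()

# ...or commit the buffered user input, triggering the LLM response
session.commit_user_turn()
```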
m
It gives me this error: AttributeError: 'SpeechEvent' object has no attribute 'transcript'
t
The error occurs because the `SpeechEvent` object does not have a direct `transcript` attribute. Instead, the transcript information is contained within the `alternatives` array of the `SpeechEvent` object. You can access the transcript using the following approach:
```python
from livekit import rtc
from livekit.agents import ModelSettings, stt, Agent
from typing import AsyncIterable, Optional

async def stt_node(
    self, audio: AsyncIterable[rtc.AudioFrame], model_settings: ModelSettings
) -> Optional[AsyncIterable[stt.SpeechEvent]]:
    async def process_speech_events():
        async for event in Agent.default.stt_node(self, audio, model_settings):
            # Access the transcript from the alternatives array
            if event.alternatives and event.alternatives[0].text:
                processed_text = your_processing_function(event.alternatives[0].text)
                event.alternatives[0].text = processed_text
            yield event

    return process_speech_events()
```
In this example, the transcript is accessed via `event.alternatives[0].text`, and you can apply your processing function to modify it before passing it to the LLM. Sources: livekit.agents.pipeline.pipeline_agent API documentation | SpeechEvent | LiveKit Agents
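The shape of the event can be mimicked with plain dataclasses to see why the attribute lookup changes. This mock is purely an illustration, not the real LiveKit types:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class MockSpeechData:
    text: str = ""

@dataclass
class MockSpeechEvent:
    # The transcript lives inside `alternatives`, not on the event itself
    alternatives: List[MockSpeechData] = field(default_factory=list)

event = MockSpeechEvent(alternatives=[MockSpeechData(text="hello um world")])
# event.transcript           # would raise AttributeError, as in the question
cleaned = event.alternatives[0].text.replace("um ", "")
event.alternatives[0].text = cleaned
print(event.alternatives[0].text)  # → hello world
```

In practice you may also want to touch only final results, e.g. by checking the event's type against `stt.SpeechEventType.FINAL_TRANSCRIPT` rather than processing interim events.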