millions-actor-17966
06/04/2025, 9:01 PM
tall-airline-16756
06/04/2025, 9:01 PM
To process the transcript from the stt_node before passing it to the LLM in LiveKit, you can override the stt_node method in your custom Agent class. Here’s an example:
from livekit import rtc
from livekit.agents import ModelSettings, stt, Agent
from typing import AsyncIterable, Optional

async def stt_node(
    self, audio: AsyncIterable[rtc.AudioFrame], model_settings: ModelSettings
) -> Optional[AsyncIterable[stt.SpeechEvent]]:
    async def process_speech_events():
        async for event in Agent.default.stt_node(self, audio, model_settings):
            # Process the transcript here before passing to LLM
            # For example, remove filler words, fix formatting, etc.
            if event.transcript:
                event.transcript = event.transcript.replace("um", "").replace("uh", "")
                # Add any other text processing you need

            yield event

    return process_speech_events()
This example demonstrates how to modify the transcript by removing filler words like "um" and "uh" before the text is sent to the LLM. You can implement any additional text processing logic you need in this node.
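For completeness, here is a minimal sketch of how such an override is typically wired up. The MyAgent name and the session configuration are placeholders, not from the docs:

import asyncio
from livekit import agents
from livekit.agents import Agent, AgentSession

class MyAgent(Agent):
    def __init__(self) -> None:
        super().__init__(instructions="You are a helpful voice assistant.")

    # ... the stt_node override from the example above goes here

async def entrypoint(ctx: agents.JobContext):
    session = AgentSession(
        # stt=..., llm=..., tts=... as in your existing setup
    )
    # Start the session with the custom agent so its overridden
    # stt_node is used instead of the default pipeline node
    await session.start(agent=MyAgent(), room=ctx.room)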
Sources: Voice pipeline nodes | LiveKit Docs
millions-actor-17966
06/04/2025, 9:01 PM
tall-airline-16756
06/04/2025, 9:01 PM
You can also override the transcription_node in your custom Agent class. This allows you to modify or clean up the transcription text before it is sent to the LLM. Here’s an example:
from livekit.agents import ModelSettings
from typing import AsyncIterable

async def transcription_node(
    self, text: AsyncIterable[str], model_settings: ModelSettings
) -> AsyncIterable[str]:
    async for delta in text:
        # Process the transcript here
        processed_text = your_processing_function(delta)
        yield processed_text
In this example, replace your_processing_function with the function you’ve implemented to process the STT transcripts. This ensures that the processed transcript is passed to the LLM.
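For illustration, your_processing_function could be as simple as the following (a hypothetical helper, not part of LiveKit):

import re

def your_processing_function(text: str) -> str:
    # Strip common filler words, then collapse leftover whitespace
    cleaned = re.sub(r"\b(um+|uh+|erm)\b", "", text, flags=re.IGNORECASE)
    return re.sub(r"\s{2,}", " ", cleaned).strip()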
Sources: Voice pipeline nodes | LiveKit Docs
millions-actor-17966
06/04/2025, 9:03 PM
tall-airline-16756
06/04/2025, 9:06 PM
In the transcription_node, the term "delta" refers to the incremental text segments that are being processed through the transcription pipeline. These segments may be incomplete transcriptions as they are streamed in real time.
If you want to process only the final transcript, you can modify your implementation to collect all the segments and process them once the transcription is complete. Here’s an example:
from livekit.agents import ModelSettings
from typing import AsyncIterable

async def transcription_node(
    self, text: AsyncIterable[str], model_settings: ModelSettings
) -> AsyncIterable[str]:
    final_transcript = ""
    async for delta in text:
        final_transcript += delta

    # Process the final transcript once the stream has ended
    processed_text = your_processing_function(final_transcript)
    yield processed_text
In this example, the final_transcript variable accumulates all the text segments, and once the transcription is complete, it is passed to your processing function. This ensures that only the final transcript is processed.
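A quick way to see the difference is to drive the node with a fake delta stream. This standalone sketch (MyAgent and the processing helper are hypothetical) prints a single, fully accumulated transcript:

import asyncio

async def fake_deltas():
    # Simulates STT streaming partial text segments
    for chunk in ["imagine ", "partial ", "transcripts"]:
        yield chunk

async def main():
    agent = MyAgent()  # your Agent subclass with the override above
    # None stands in for a real ModelSettings instance in this sketch
    async for out in agent.transcription_node(fake_deltas(), None):
        print(out)

asyncio.run(main())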
Sources: Voice pipeline nodes | LiveKit Docs
millions-actor-17966
06/04/2025, 9:26 PM
tall-airline-16756
06/04/2025, 9:26 PM
You can override the stt_node in your custom Agent class. This allows you to process the STT output before it is sent to the LLM. Here’s an example:
from livekit import rtc
from livekit.agents import ModelSettings, stt, Agent
from typing import AsyncIterable, Optional

async def stt_node(
    self, audio: AsyncIterable[rtc.AudioFrame], model_settings: ModelSettings
) -> Optional[AsyncIterable[stt.SpeechEvent]]:
    async def process_speech_events():
        async for event in Agent.default.stt_node(self, audio, model_settings):
            # Process the transcript here before passing to LLM
            if event.transcript:
                event.transcript = your_processing_function(event.transcript)
            yield event

    return process_speech_events()
In this example, replace your_processing_function with the function you’ve implemented to process the STT transcripts. This ensures that the processed transcript is passed to the LLM.
Additionally, you can use manual turn control to explicitly manage when user input is sent to the LLM. Methods like session.interrupt(), session.clear_user_turn(), and session.commit_user_turn() can help you control the flow of input.
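As a rough sketch (assuming a session created with turn_detection="manual"; adapt to your setup), manual turn control looks like this:

from livekit.agents import AgentSession

session = AgentSession(
    turn_detection="manual",
    # stt=..., llm=..., tts=... as usual
)

# Stop the agent's current response, e.g. when the user barges in
session.interrupt()

# Discard the pending user input without sending it to the LLM
session.clear_user_turn()

# End the user's turn and send the accumulated input to the LLM
session.commit_user_turn()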
Sources: Speech-to-text (STT) integrations | LiveKit Docs | Turn detection and interruptions | LiveKit Docs
millions-actor-17966
06/04/2025, 9:43 PM
tall-airline-16756
06/04/2025, 9:44 PM
The SpeechEvent object does not have a direct transcript attribute. Instead, the transcript information is contained within the alternatives array of the SpeechEvent object. You can access the transcript using the following approach:
async def stt_node(
    self, audio: AsyncIterable[rtc.AudioFrame], model_settings: ModelSettings
) -> Optional[AsyncIterable[stt.SpeechEvent]]:
    async def process_speech_events():
        async for event in Agent.default.stt_node(self, audio, model_settings):
            # Access the transcript from the alternatives array
            if event.alternatives and event.alternatives[0].text:
                processed_text = your_processing_function(event.alternatives[0].text)
                event.alternatives[0].text = processed_text
            yield event

    return process_speech_events()
In this example, the transcript is accessed via event.alternatives[0].text, and you can apply your processing function to modify it before passing it to the LLM.
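If you only want to transform completed utterances (rather than interim results), you can also check the event type. A sketch, assuming the SpeechEventType enum exposed by livekit.agents.stt:

from livekit.agents import stt

# Inside process_speech_events() from the example above:
async for event in Agent.default.stt_node(self, audio, model_settings):
    if (
        event.type == stt.SpeechEventType.FINAL_TRANSCRIPT
        and event.alternatives
        and event.alternatives[0].text
    ):
        event.alternatives[0].text = your_processing_function(event.alternatives[0].text)
    yield event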
Sources: livekit.agents.pipeline.pipeline_agent API documentation | SpeechEvent | LiveKit Agents