# ask-ai
**Shashwat:** The `llm_node` is not yielding text to the `transcription_node` in LiveKit.
**AI Assistant:** Thanks for reaching out, Shashwat. I'm an AI assistant and I'm here to help. Give me a moment while I research this topic.
I'm here to help you with the issue where the `llm_node` isn't passing text to the `transcription_node` in LiveKit. There could be a few reasons for this, so let's walk through the important points together:

• Make sure the agent is set up correctly with transcription enabled.
• Double-check that the `transcription_node` hasn't been customized in a way that filters or blocks the content.
• Confirm that `RoomOutputOptions.transcription_enabled` is set to `True`.

Just to clarify, the `llm_node` generates text output, which is captured into a text channel and then forwarded to the `transcription_node`. By default, the `transcription_node` simply yields the text it receives. If you've customized it, make sure it still yields the processed text chunks so the flow keeps going (see the sketch below).

If you'd like, share your specific setup or any customizations you've made, and I'd be happy to guide you through the next steps.

Sources: Docs::Agents | livekit-agents/livekit/agents/voice/io.py | agents/src/voice/agent_activity.ts | Docs::Agents
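For reference, a pass-through `transcription_node` override looks roughly like this. It's a minimal sketch assuming the Python agents SDK (v1.x); the import path and exact signature may differ slightly between versions.

```python
from typing import AsyncIterable

from livekit.agents import Agent  # assumed import path for the v1.x Python SDK


class MyAgent(Agent):
    async def transcription_node(
        self, text: AsyncIterable[str], model_settings
    ) -> AsyncIterable[str]:
        # Default-style behavior: forward every chunk that arrives.
        # An override that consumes the stream without yielding (or returns
        # early) will leave the transcription output empty.
        async for chunk in text:
            yield chunk
```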
**Shashwat:** I have `room_output_options=RoomOutputOptions(transcription_enabled=True)`, and I also modify the `llm_node` to give structured output.
**AI Assistant:** Thanks for sharing that you have `RoomOutputOptions(transcription_enabled=True)` set and have adjusted the `llm_node` for structured output. Let's walk through some steps to make sure the structured output flows smoothly to the `transcription_node`:

1. Double-check that your `transcription_node` is overridden to handle structured output. A helpful approach is a function like `process_structured_output` that parses the JSON and pulls out the key fields (such as `response`); a sketch follows after this list.
2. Make sure your LLM is actually returning output in the JSON format you expect. If it sends plain text or something different, it may be ignored. Log the raw output from the `llm_node` so you can see exactly what's coming through.
3. Confirm that transcription output is enabled (`transcription_enabled=True`) and that you have a valid `TextOutput` sink connected to your session output.
4. Add logging inside both the `llm_node` and the `transcription_node` to trace how data flows and to verify that the `transcription_node` is receiving input and producing a non-empty output stream.

If you'd like to dig deeper, share the specific structure of your LLM output and any customizations you've made in your `transcription_node`, and we can troubleshoot together.

Sources: livekit-agents/livekit/agents/voice/generation.py | agents/src/voice/agent_activity.ts
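For illustration, here is one way such an override could look. This is a hypothetical sketch, not the `process_structured_output` helper from the LiveKit example: the `response` field name and the buffer-then-parse approach are assumptions, and a streaming JSON parser would avoid buffering the whole payload.

```python
import json
from typing import AsyncIterable

from livekit.agents import Agent  # assumed import path for the v1.x Python SDK


class StructuredOutputAgent(Agent):
    async def transcription_node(
        self, text: AsyncIterable[str], model_settings
    ) -> AsyncIterable[str]:
        # Collect the structured chunks produced by the customized llm_node.
        buffer = ""
        async for chunk in text:
            buffer += chunk

        try:
            data = json.loads(buffer)
            # "response" is a hypothetical field name; use the key your schema defines.
            yield data.get("response", "") if isinstance(data, dict) else buffer
        except json.JSONDecodeError:
            # Not valid JSON: forward the raw text so nothing is silently dropped.
            yield buffer
```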
**Shashwat:** Do the `tts_node` and `transcription_node` receive their inputs in the same way, or is there a difference?
**AI Assistant:** The `tts_node` and `transcription_node` in LiveKit handle their inputs a bit differently because they serve different purposes:

• TTS node: works with plain text streams (`AsyncIterable[str]`) to create speech. It's focused on turning text into audio frames and doesn't directly manage structured or timestamped data. You can customize it to filter structured output and extract just the parts that can be synthesized.
• Transcription node: deals with the final transcription output, which may be plain strings or `TimedString` objects (`AsyncIterable[str | TimedString]`). It's useful for syncing text with speech playback, especially when TTS-aligned transcription is turned on, and it lets you post-process the text, for example cleaning it up or reformatting it.

Both nodes can handle structured output with helpers like `process_structured_output`, but the `transcription_node` stands out because it supports timing data for synchronization. A side-by-side sketch of the two override points follows below. If you want help customizing these nodes, just let me know!

Sources: Docs::Agents | Text and transcriptions | LiveKit Docs | Text and transcriptions | LiveKit Docs | livekit-agents/livekit/agents/voice/agent_activity.py | Docs::Agents
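To make the difference concrete, here are the two override points side by side. This is a sketch assuming the Python agents SDK (v1.x); the delegation through `Agent.default.tts_node` follows the override style from the pipeline-nodes docs, and the type annotations are simplified (the transcription stream may also carry `TimedString` items).

```python
from typing import AsyncIterable

from livekit import rtc
from livekit.agents import Agent  # assumed import path for the v1.x Python SDK


class MyAgent(Agent):
    # tts_node: consumes plain text chunks and produces audio frames.
    async def tts_node(
        self, text: AsyncIterable[str], model_settings
    ) -> AsyncIterable[rtc.AudioFrame]:
        # Delegate to the default pipeline behavior (text in, audio frames out).
        return Agent.default.tts_node(self, text, model_settings)

    # transcription_node: consumes the text stream headed for the transcription
    # output and yields it onward, optionally post-processed.
    async def transcription_node(
        self, text: AsyncIterable[str], model_settings
    ) -> AsyncIterable[str]:
        async for chunk in text:
            yield chunk
```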
**Shashwat:** But how do they receive it in the flow? Does the `llm_node` send it to both of them using `yield`?
**AI Assistant:** Happy to explain how this works. The `llm_node` doesn't send its output directly to both the `tts_node` and the `transcription_node` using `yield`. Instead, LiveKit splits the stream with `tee` to duplicate the output of the `llm_node`:

1. The `llm_node` generates an asynchronous text stream using `yield`.
2. This stream is split into two separate streams using `tee`:
   ◦ One stream goes to the `tts_node` for speech synthesis.
   ◦ The other stream goes to the `transcription_node` to finalize and forward transcriptions.

This setup lets both nodes work with the same content simultaneously without repeating LLM calls or interfering with each other (see the sketch below). Also, if TTS-aligned transcription is turned on, the `transcription_node` may receive timed transcription data from the TTS engine instead of the original text stream.

Don't hesitate to ask if you'd like me to walk through any part of this flow!

Sources: livekit-agents/livekit/agents/voice/agent_activity.py | agents/src/voice/agent.ts | Docs::Agents | Pipeline nodes and hooks | LiveKit Docs
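The splitting idea is easiest to see in a small, framework-agnostic sketch. This is illustrative only and is not LiveKit's actual `tee` implementation; the names `fake_llm_stream` and `tee_stream` are made up for the example.

```python
import asyncio
from typing import AsyncIterable, AsyncIterator


async def fake_llm_stream() -> AsyncIterator[str]:
    # Stand-in for the llm_node: yields text chunks asynchronously.
    for chunk in ["Hello", ", ", "world", "!"]:
        yield chunk


def tee_stream(source: AsyncIterable[str], n: int = 2) -> list[AsyncIterator[str]]:
    # Duplicate one async stream into n independent branches.
    queues: list[asyncio.Queue] = [asyncio.Queue() for _ in range(n)]

    async def pump() -> None:
        async for chunk in source:
            for q in queues:
                q.put_nowait(chunk)
        for q in queues:
            q.put_nowait(None)  # end-of-stream sentinel

    async def branch(q: asyncio.Queue) -> AsyncIterator[str]:
        while True:
            chunk = await q.get()
            if chunk is None:
                return
            yield chunk

    asyncio.ensure_future(pump())  # requires a running event loop
    return [branch(q) for q in queues]


async def main() -> None:
    tts_branch, transcription_branch = tee_stream(fake_llm_stream())

    async def consume(name: str, stream: AsyncIterator[str]) -> None:
        async for chunk in stream:
            print(f"{name} received {chunk!r}")

    # Both branches must be consumed; abandoning one is how a branch "sees nothing".
    await asyncio.gather(
        consume("tts_node", tts_branch),
        consume("transcription_node", transcription_branch),
    )


if __name__ == "__main__":
    asyncio.run(main())
```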
**Shashwat:** My issue is that the `tts_node` is receiving the text input correctly, but the `transcription_node` is not receiving anything. What can cause this?
**AI Assistant:** It sounds like the `tts_node` is receiving input but the `transcription_node` isn't. Let's walk through some possible causes and troubleshooting steps:

1. Stream-splitting issue: since the LLM output is split into two streams with `tee`, a transcription stream that gets closed or canceled won't receive any data. Double-check that both branches of the split are being consumed.
2. Custom `transcription_node` logic: if you've customized the `transcription_node`, make sure it's yielding output as expected. Adding logging helps confirm that input is being received and processed correctly.
3. TTS-aligned transcription: when `use_tts_aligned_transcript=True` is enabled, the transcription input may rely on TTS-aligned data. Verify that your TTS engine supports this feature, or try disabling it temporarily to see if that makes a difference.
4. Transcription disabled: confirm that `RoomOutputOptions(transcription_enabled=True)` is set and that your frontend is listening on the `lk.transcription` topic.
5. Interruption or cancellation: if the agent's speech gets interrupted, the transcription stream may be canceled. Check your logs for interruption events.

To narrow down the issue, enable logging in your `transcription_node` and try the default implementation as a test (see the sketch below). Feel free to reach out if you want to go over these steps together!

Sources: examples/voice_agents/structured_output.py | Docs::Agents | Text and transcriptions | LiveKit Docs
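As a starting point for that logging, here is a minimal debug sketch, assuming the Python agents SDK (v1.x); adjust the imports and signature to match your setup.

```python
import logging
from typing import AsyncIterable

from livekit.agents import Agent  # assumed import path for the v1.x Python SDK

logger = logging.getLogger("transcription-debug")


class DebugAgent(Agent):
    async def transcription_node(
        self, text: AsyncIterable[str], model_settings
    ) -> AsyncIterable[str]:
        received_any = False
        async for chunk in text:
            received_any = True
            logger.info("transcription_node received chunk: %r", chunk)
            yield chunk

        if not received_any:
            # If this fires while tts_node keeps getting text, the transcription
            # branch of the split stream is likely being closed upstream, e.g.
            # by an interruption, or by TTS-aligned transcripts being enabled
            # without a TTS plugin that supports them.
            logger.warning("transcription_node finished without receiving any input")
```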