# ask-ai
v
How to dynamically update instructions during runtime
t
Thanks for reaching out, Sarthak. I'm an AI assistant and I'm here to help. Give me a moment while I research this topic.
To dynamically update instructions during runtime in LiveKit, you can use the `update_instructions` method available on the `Agent` class. This method allows you to modify the agent's instructions while it is running. Here's how you can do it:
• Using the `update_instructions` method: Call this method with a new set of instructions. It works for both realtime and non-realtime modes.
```python
await agent.update_instructions("New instructions for the agent")
```
• Example Implementation: You can update instructions during processing or based on conditions. For instance:
```python
from livekit.agents import Agent, JobContext
from livekit.agents.voice import AgentSession


class DynamicInstructionsAgent(Agent):
    def __init__(self) -> None:
        super().__init__(instructions="Initial instructions for the agent")

    async def on_enter(self):
        # Runs when the agent becomes active in the session
        await self.update_instructions("New instructions for the agent")

    async def llm_node(self, chat_ctx, tools, model_settings):
        # Adjust instructions right before each LLM call, then delegate to the
        # default node implementation (override signature per livekit-agents 1.x)
        await self.update_instructions("Updated instructions during processing")
        return Agent.default.llm_node(self, chat_ctx, tools, model_settings)


async def entrypoint(ctx: JobContext):
    session = AgentSession()
    await session.start(
        agent=DynamicInstructionsAgent(),
        room=ctx.room,
    )
```
• Realtime Updates: The method automatically updates instructions for the current realtime session.
• Error Handling: A `RealtimeError` is raised if updating instructions fails (see the sketch below).
If you need further assistance or have a specific use case, let me know!
Sources: examples/voice_agents/dynamic_tool_creation.py
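For illustration only (not from the original answer), here is a minimal sketch of guarding an update so a failed call doesn't break the turn. It catches a broad exception because the exact `RealtimeError` import path depends on your livekit-agents version:
```python
import logging

from livekit.agents import Agent


async def apply_instructions(agent: Agent, new_instructions: str) -> bool:
    """Try to update the agent's instructions; report failure instead of raising."""
    try:
        await agent.update_instructions(new_instructions)
        return True
    except Exception as exc:  # the docs name RealtimeError for realtime sessions
        logging.warning("update_instructions failed: %s", exc)
        return False
```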
v
where is the source code for this
t
Hey there! Just wanted to let you know that the source code for the `Agent` class and the `update_instructions` method can be found in the LiveKit repository at this path:
File: `livekit-agents/livekit/agents/voice/agent.py`
You can check it out by visiting the LiveKit Agents repository and browsing to that file. If you have any questions or need a hand with how to use or implement it, just give me a shout; I'm here to help!
v
Here are detailed answers to your questions about LiveKit AgentSession:
1. Turn Detection with Semantic Detector: When `turn_detection` is present, VAD silence still serves as the primary trigger for end-of-turn evaluation. The semantic detector modulates the timing by selecting a delay (`min_endpointing_delay` or `max_endpointing_delay`) based on its confidence level. VAD silence is required to finalize the turn.
2. VAD-Only Mode: In VAD-only mode, turn finalization is determined by VAD silence plus `min_endpointing_delay`. The `max_endpointing_delay` parameter does not apply in this mode.
3. Barge-In Triggers: VAD (audio) triggers first for barge-ins. If VAD detects speech but STT hasn't emitted enough words to meet `min_interruption_words`, the interruption proceeds, and the agent enters a false interruption recovery phase. This ensures responsiveness while managing false positives.
4. Interaction of `min_consecutive_speech_delay` and `min_silence_duration`: The sources do not specify how these parameters interact or which takes precedence. They appear to operate at different levels, with `min_silence_duration` affecting VAD silence detection and `min_consecutive_speech_delay` regulating timing between speech chunks.
5. Defaults for 8 kHz Telephony: Specific defaults for 8 kHz telephony are not provided. To achieve <400 ms perceived endpointing latency, you may need to reduce `min_endpointing_delay` below 500 ms, but this increases the risk of false barge-ins. Resampling to 16 kHz may also be necessary for reliable VAD and turn detection.
6. Preemptive Generation and Barge-Ins: To avoid frequent false interruptions, configure `false_interruption_timeout` (e.g., 1 second) and enable `resume_false_interruption`. Adjust `min_interruption_duration` and `min_interruption_words` to balance responsiveness and stability. Testing in noisy environments is recommended (see the configuration sketch after this list).
7. State Machine Diagram: A diagram or truth table of the internal state machine is not available. However, the flow involves VAD detecting speech, STT providing transcripts, optional turn detection influencing timing, and AgentSession finalizing or interrupting based on these inputs. Let me know if you'd like a reconstructed flowchart or pseudocode representation.
Please feel free to reach out if you need any more details or if there's anything else I can help clarify! I'm here to support you.
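For illustration, here is a minimal configuration sketch that ties together the parameters named above. The values are examples only, and some of the newer options (`false_interruption_timeout`, `resume_false_interruption`) may not exist in older livekit-agents releases:
```python
from livekit.agents import AgentSession

session = AgentSession(
    min_endpointing_delay=0.5,       # shortest wait after VAD silence before finalizing a turn
    max_endpointing_delay=6.0,       # longest wait when the semantic turn detector is unsure
    min_interruption_duration=0.5,   # user speech must last this long to count as a barge-in
    min_interruption_words=0,        # optionally require N transcribed words as well
    false_interruption_timeout=1.0,  # recover if no transcript follows a barge-in
    resume_false_interruption=True,  # resume the agent's reply after a false barge-in
)
```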
How to resample to 16 kHz?
Also, when LiveKit emits transcripts, at what point has it determined that it is EOT?
t
I'm happy to help you with resampling audio to 16 kHz in LiveKit! You can use the `AudioResampler` class available in the LiveKit SDKs. Here's a friendly example using the Python SDK:
```python
from livekit.rtc import AudioResampler, AudioResamplerQuality

# Create a resampler that converts 48 kHz input to 16 kHz output
# (arguments are positional: input rate first, then output rate)
resampler = AudioResampler(48000, 16000, quality=AudioResamplerQuality.MEDIUM)

# Push a chunk of raw PCM audio into the resampler;
# `audio_data` stands in for your own 48 kHz, 16-bit mono samples
output_frames = resampler.push(bytearray(audio_data))

# After all input has been pushed, flush to collect any remaining output
output_frames += resampler.flush()
```
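In a live pipeline the audio usually comes from a subscribed track rather than a buffer. Here is a minimal sketch of that flow, assuming the `AudioStream` event API from the "Processing raw media tracks" docs and 48 kHz input frames; verify the exact event shape against your SDK version:
```python
from livekit import rtc


async def collect_16k_frames(track: rtc.Track) -> list[rtc.AudioFrame]:
    """Resample a subscribed audio track (assumed 48 kHz) down to 16 kHz frames."""
    resampler = rtc.AudioResampler(48000, 16000)
    frames: list[rtc.AudioFrame] = []
    async for event in rtc.AudioStream(track):
        # Each event carries one AudioFrame; push returns zero or more resampled frames
        frames.extend(resampler.push(event.frame))
    # Drain anything the resampler is still buffering
    frames.extend(resampler.flush())
    return frames
```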
If you'd like a bit more customization, you can also try libraries like `pydub`. Here's a simple way to do that:
```python
import io

from pydub import AudioSegment


def resample_audio(audio_bytes: bytes, original_sample_rate: int, target_sample_rate: int) -> bytes:
    # Interpret the raw bytes as 16-bit mono PCM, then resample to the target rate
    resampled_audio = AudioSegment.from_raw(
        io.BytesIO(audio_bytes),
        sample_width=2,
        frame_rate=original_sample_rate,
        channels=1,
    ).set_frame_rate(target_sample_rate)
    return resampled_audio.raw_data
```
Now, about when LiveKit emits transcripts: it happens at two key moments.
• Interim Transcripts: These come through during speech processing whenever interim results are ready.
• Final Transcripts: These are sent when the system detects the end-of-turn (EOT), which is based on things like VAD silence, STT signals, or real-time model detection.
The end-of-turn detection works by considering:
• VAD Silence: It looks for sustained silence to know when to finalize a turn.
• Endpointing Delays: It waits for either `min_endpointing_delay` or `max_endpointing_delay` depending on confidence levels.
• EOU Detection: This happens when final transcripts or end-of-speech events are received.
If anything here feels unclear or if you want to dive deeper into any part, please don't hesitate to ask. I'm here to help you every step of the way!
Sources: Processing raw media tracks | LiveKit docs | livekit-rtc/tests/test_resampler.py | AI Agents Framework | livekit/livekit_composite | DeepWiki | Real-time AI Models & Streaming | livekit/livekit_composite | DeepWiki | livekit-plugins/livekit-plugins-clova/livekit/plugins/clova/common.py
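To observe the interim/final split in code, here is a small sketch that listens for the session's transcription events. The `user_input_transcribed` event name and its `is_final`/`transcript` fields are assumed from the livekit-agents 1.x docs; check your installed version:
```python
from livekit.agents import AgentSession

session = AgentSession()


@session.on("user_input_transcribed")
def on_user_transcript(ev):
    # is_final is False for interim results and True once end-of-turn is decided
    kind = "final" if ev.is_final else "interim"
    print(f"[{kind}] {ev.transcript}")
```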
v
Is it by default available in LiveKit rooms
t
I'm happy to share that transcript emission is enabled by default in LiveKit rooms when you're using `AgentSession` with voice capabilities. The transcriptions are published in real time to the room through the `lk.transcription` text stream topic and are perfectly synchronized with the agent's speech output.
That said, audio resampling to 16 kHz isn't a default feature at the room level; the default audio sample rate is actually 24 kHz. But don't worry! Resampling to 16 kHz happens automatically at the plugin level when it's needed by specific services like Speech-to-Text (STT) integrations. For instance, the Google Realtime API plugin and the Gladia STT plugin take care of this resampling seamlessly for you.
If you have any questions or need a hand with enabling particular features, please feel free to reach out; I'm here to help!
Sources: Text and transcriptions | LiveKit docs | AI Agents Framework | livekit/livekit_composite | DeepWiki | livekit-rtc/livekit/rtc/audio_resampler.py | livekit-agents/livekit/agents/voice/agent_activity.py
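If you want to consume those room transcriptions yourself, a receiver can register a handler for the `lk.transcription` topic. This is a minimal sketch assuming the text-stream receiver API (`register_text_stream_handler`) from the LiveKit Python SDK; see the Text and transcriptions docs for the exact handler signature in your SDK version:
```python
import asyncio

from livekit import rtc


def attach_transcription_listener(room: rtc.Room) -> None:
    def on_transcription(reader, participant_identity: str):
        async def read_all():
            # Collect the full segment text once the sender finishes the stream
            text = await reader.read_all()
            print(f"{participant_identity}: {text}")

        asyncio.create_task(read_all())

    room.register_text_stream_handler("lk.transcription", on_transcription)
```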
v
So we can resample the voice emitted by TTS
t
Absolutely, TTS output can be resampled in LiveKit, and the great news is that this process happens automatically whenever there's a difference between your TTS instance's native sample rate and the output rate you want. For instance, you can easily set up resampling by specifying the `sample_rate` parameter when you initialize the `FallbackAdapter` with multiple TTS instances:
```python
from livekit.agents.tts import FallbackAdapter

tts = FallbackAdapter(
    tts=[tts_instance1, tts_instance2],  # previously constructed TTS plugin instances
    sample_rate=44100,  # triggers resampling if the instances use different native rates
)
```
If you don't specify a `sample_rate`, no worries: the adapter will automatically pick the highest sample rate among your TTS instances. The resampling happens smoothly behind the scenes using the `AudioResampler` class during the audio processing pipeline.
If you have any questions or need a hand with setting up TTS or resampling, just let me know; I'm here to help!
Sources: examples/other/text-to-speech/rime_tts.py
v
But what does Twilio do? Does Twilio resample it to 8 kHz?
t
Thanks for reaching out! I wasn’t able to find specific information about whether Twilio resamples TTS output to 8kHz. If you have any more details or particular needs regarding how Twilio behaves, please feel free to share them. I’m here to help and can guide you through LiveKit configurations or suggest alternatives that might work best for you.