# ask-ai
v
How to dynamically update instructions during runtime
t
Thanks for reaching out, Sarthak. I'm an AI assistant and I'm here to help. Give me a moment while I research this topic.
To dynamically update instructions during runtime in LiveKit, you can use the `update_instructions` method available on the `Agent` class. This method allows you to modify the agent's instructions while it is running. Here's how you can do it:
• Using the `update_instructions` method: Call this method with a new set of instructions. It works for both realtime and non-realtime modes.
```python
await agent.update_instructions("New instructions for the agent")
```
• Example Implementation: You can update instructions during processing or based on conditions. For instance:
```python
from livekit.agents import Agent, JobContext
from livekit.agents.voice import AgentSession


class DynamicInstructionsAgent(Agent):
    def __init__(self) -> None:
        super().__init__(instructions="Initial instructions for the agent")

    async def on_enter(self):
        # Runs when the agent becomes active in the session
        await self.update_instructions("New instructions for the agent")

    async def llm_node(self, chat_ctx, tools, model_settings):
        # Adjust instructions right before each LLM call, then delegate to the
        # default node implementation (override signature per livekit-agents 1.x)
        await self.update_instructions("Updated instructions during processing")
        return Agent.default.llm_node(self, chat_ctx, tools, model_settings)


async def entrypoint(ctx: JobContext):
    session = AgentSession()
    await session.start(
        agent=DynamicInstructionsAgent(),
        room=ctx.room,
    )
```
• Realtime Updates: The method automatically updates instructions for the current realtime session.
• Error Handling: A `RealtimeError` is raised if updating instructions fails (see the sketch below).
If you need further assistance or have a specific use case, let me know!
Sources: examples/voice_agents/dynamic_tool_creation.py
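For illustration only (not from the original answer), here is a minimal sketch of guarding an update so a failed call doesn't break the turn. It catches a broad exception because the exact `RealtimeError` import path depends on your livekit-agents version:
```python
import logging

from livekit.agents import Agent


async def apply_instructions(agent: Agent, new_instructions: str) -> bool:
    """Try to update the agent's instructions; report failure instead of raising."""
    try:
        await agent.update_instructions(new_instructions)
        return True
    except Exception as exc:  # the docs name RealtimeError for realtime sessions
        logging.warning("update_instructions failed: %s", exc)
        return False
```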
v
where is the source code for this
t
Hey there! Just wanted to let you know that the source code for the `Agent` class and the `update_instructions` method can be found in the LiveKit repository at this path:
File: `livekit-agents/livekit/agents/voice/agent.py`
You can check it out by visiting the LiveKit Agents repository and browsing to that file. If you have any questions or need a hand with how to use or implement it, just give me a shout; I'm here to help!
v
Here are detailed answers to your questions about LiveKit AgentSession:
1. Turn Detection with Semantic Detector: When `turn_detection` is present, VAD silence still serves as the primary trigger for end-of-turn evaluation. The semantic detector modulates the timing by selecting a delay (`min_endpointing_delay` or `max_endpointing_delay`) based on its confidence level. VAD silence is required to finalize the turn.
2. VAD-Only Mode: In VAD-only mode, turn finalization is determined by VAD silence plus `min_endpointing_delay`. The `max_endpointing_delay` parameter does not apply in this mode.
3. Barge-In Triggers: VAD (audio) triggers first for barge-ins. If VAD detects speech but STT hasn't emitted enough words to meet `min_interruption_words`, the interruption proceeds, and the agent enters a false interruption recovery phase. This ensures responsiveness while managing false positives.
4. Interaction of `min_consecutive_speech_delay` and `min_silence_duration`: The sources do not specify how these parameters interact or which takes precedence. They appear to operate at different levels, with `min_silence_duration` affecting VAD silence detection and `min_consecutive_speech_delay` regulating timing between speech chunks.
5. Defaults for 8 kHz Telephony: Specific defaults for 8 kHz telephony are not provided. To achieve <400 ms perceived endpointing latency, you may need to reduce `min_endpointing_delay` below 500 ms, but this increases the risk of false barge-ins. Resampling to 16 kHz may also be necessary for reliable VAD and turn detection.
6. Preemptive Generation and Barge-Ins: To avoid frequent false interruptions, configure `false_interruption_timeout` (e.g., 1 second) and enable `resume_false_interruption`. Adjust `min_interruption_duration` and `min_interruption_words` to balance responsiveness and stability. Testing in noisy environments is recommended (see the configuration sketch after this list).
7. State Machine Diagram: A diagram or truth table of the internal state machine is not available. However, the flow involves VAD detecting speech, STT providing transcripts, optional turn detection influencing timing, and AgentSession finalizing or interrupting based on these inputs. Let me know if you'd like a reconstructed flowchart or pseudocode representation.
Please feel free to reach out if you need any more details or if there's anything else I can help clarify! I'm here to support you.
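For illustration, here is a minimal configuration sketch that ties together the parameters named above. The values are examples only, and some of the newer options (`false_interruption_timeout`, `resume_false_interruption`) may not exist in older livekit-agents releases:
```python
from livekit.agents import AgentSession

session = AgentSession(
    min_endpointing_delay=0.5,       # shortest wait after VAD silence before finalizing a turn
    max_endpointing_delay=6.0,       # longest wait when the semantic turn detector is unsure
    min_interruption_duration=0.5,   # user speech must last this long to count as a barge-in
    min_interruption_words=0,        # optionally require N transcribed words as well
    false_interruption_timeout=1.0,  # recover if no transcript follows a barge-in
    resume_false_interruption=True,  # resume the agent's reply after a false barge-in
)
```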
How to resample to 16 kHz?
Also, when LiveKit emits transcripts, at what point has it determined that it is EOT?
t
I'm happy to help you with resampling audio to 16 kHz in LiveKit! You can use the `AudioResampler` class available in the LiveKit SDKs. Here's a friendly example using the Python SDK:
```python
from livekit.rtc import AudioResampler, AudioResamplerQuality

# Create a resampler that converts 48 kHz input to 16 kHz output
# (arguments are positional: input rate first, then output rate)
resampler = AudioResampler(48000, 16000, quality=AudioResamplerQuality.MEDIUM)

# Push a chunk of raw PCM audio into the resampler;
# `audio_data` stands in for your own 48 kHz, 16-bit mono samples
output_frames = resampler.push(bytearray(audio_data))

# After all input has been pushed, flush to collect any remaining output
output_frames += resampler.flush()
```
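In a live pipeline the audio usually comes from a subscribed track rather than a buffer. Here is a minimal sketch of that flow, assuming the `AudioStream` event API from the "Processing raw media tracks" docs and 48 kHz input frames; verify the exact event shape against your SDK version:
```python
from livekit import rtc


async def collect_16k_frames(track: rtc.Track) -> list[rtc.AudioFrame]:
    """Resample a subscribed audio track (assumed 48 kHz) down to 16 kHz frames."""
    resampler = rtc.AudioResampler(48000, 16000)
    frames: list[rtc.AudioFrame] = []
    async for event in rtc.AudioStream(track):
        # Each event carries one AudioFrame; push returns zero or more resampled frames
        frames.extend(resampler.push(event.frame))
    # Drain anything the resampler is still buffering
    frames.extend(resampler.flush())
    return frames
```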
If you'd like a bit more customization, you can also try libraries like `pydub`. Here's a simple way to do that:
```python
import io

from pydub import AudioSegment


def resample_audio(audio_bytes: bytes, original_sample_rate: int, target_sample_rate: int) -> bytes:
    # Interpret the raw bytes as 16-bit mono PCM, then resample to the target rate
    resampled_audio = AudioSegment.from_raw(
        io.BytesIO(audio_bytes),
        sample_width=2,
        frame_rate=original_sample_rate,
        channels=1,
    ).set_frame_rate(target_sample_rate)
    return resampled_audio.raw_data
```
Now, about when LiveKit emits transcripts: it happens at two key moments.
• Interim Transcripts: These come through during speech processing whenever interim results are ready.
• Final Transcripts: These are sent when the system detects the end-of-turn (EOT), which is based on things like VAD silence, STT signals, or real-time model detection.
The end-of-turn detection works by considering:
• VAD Silence: It looks for sustained silence to know when to finalize a turn.
• Endpointing Delays: It waits for either `min_endpointing_delay` or `max_endpointing_delay` depending on confidence levels.
• EOU Detection: This happens when final transcripts or end-of-speech events are received.
If anything here feels unclear or if you want to dive deeper into any part, please don't hesitate to ask. I'm here to help you every step of the way!
Sources: Processing raw media tracks | LiveKit docs | livekit-rtc/tests/test_resampler.py | AI Agents Framework | livekit/livekit_composite | DeepWiki | Real-time AI Models & Streaming | livekit/livekit_composite | DeepWiki | livekit-plugins/livekit-plugins-clova/livekit/plugins/clova/common.py
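To observe the interim/final split in code, here is a small sketch that listens for the session's transcription events. The `user_input_transcribed` event name and its `is_final`/`transcript` fields are assumed from the livekit-agents 1.x docs; check your installed version:
```python
from livekit.agents import AgentSession

session = AgentSession()


@session.on("user_input_transcribed")
def on_user_transcript(ev):
    # is_final is False for interim results and True once end-of-turn is decided
    kind = "final" if ev.is_final else "interim"
    print(f"[{kind}] {ev.transcript}")
```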
v
Is it by default available in LiveKit rooms
t
I'm happy to share that transcript emission is enabled by default in LiveKit rooms when you're using `AgentSession` with voice capabilities. The transcriptions are published in real time to the room through the `lk.transcription` text stream topic and are perfectly synchronized with the agent's speech output.
That said, audio resampling to 16 kHz isn't a default feature at the room level; the default audio sample rate is actually 24 kHz. But don't worry! Resampling to 16 kHz happens automatically at the plugin level when it's needed by specific services like Speech-to-Text (STT) integrations. For instance, the Google Realtime API plugin and the Gladia STT plugin take care of this resampling seamlessly for you.
If you have any questions or need a hand with enabling particular features, please feel free to reach out; I'm here to help!
Sources: Text and transcriptions | LiveKit docs | AI Agents Framework | livekit/livekit_composite | DeepWiki | livekit-rtc/livekit/rtc/audio_resampler.py | livekit-agents/livekit/agents/voice/agent_activity.py
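If you want to consume those room transcriptions yourself, a receiver can register a handler for the `lk.transcription` topic. This is a minimal sketch assuming the text-stream receiver API (`register_text_stream_handler`) from the LiveKit Python SDK; see the Text and transcriptions docs for the exact handler signature in your SDK version:
```python
import asyncio

from livekit import rtc


def attach_transcription_listener(room: rtc.Room) -> None:
    def on_transcription(reader, participant_identity: str):
        async def read_all():
            # Collect the full segment text once the sender finishes the stream
            text = await reader.read_all()
            print(f"{participant_identity}: {text}")

        asyncio.create_task(read_all())

    room.register_text_stream_handler("lk.transcription", on_transcription)
```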
v
So we can resample the voice emitted by TTS
t
Absolutely, TTS output can be resampled in LiveKit, and the great news is that this process happens automatically whenever there's a difference between your TTS instance's native sample rate and the output rate you want. For instance, you can easily set up resampling by specifying the `sample_rate` parameter when you initialize the `FallbackAdapter` with multiple TTS instances:
```python
from livekit.agents.tts import FallbackAdapter

tts = FallbackAdapter(
    tts=[tts_instance1, tts_instance2],  # previously constructed TTS plugin instances
    sample_rate=44100,  # triggers resampling if the instances use different native rates
)
```
If you don't specify a `sample_rate`, no worries: the adapter will automatically pick the highest sample rate among your TTS instances. The resampling happens smoothly behind the scenes using the `AudioResampler` class during the audio processing pipeline.
If you have any questions or need a hand with setting up TTS or resampling, just let me know; I'm here to help!
Sources: examples/other/text-to-speech/rime_tts.py
v
But what does Twilio do? Does Twilio resample it to 8 kHz?
t
Thanks for reaching out! I wasn’t able to find specific information about whether Twilio resamples TTS output to 8kHz. If you have any more details or particular needs regarding how Twilio behaves, please feel free to share them. I’m here to help and can guide you through LiveKit configurations or suggest alternatives that might work best for you.