victorious-cat-76169
08/05/2025, 4:42 AM

tall-airline-16756
08/05/2025, 4:42 AM
You can use the update_instructions method available on the Agent class. This method allows you to modify the agent's instructions while it is running. Here's how you can do it:
• Using the update_instructions method: Call this method with a new set of instructions. It works for both realtime and non-realtime modes.
await agent.update_instructions("New instructions for the agent")
• Example Implementation: You can update instructions during processing or based on conditions. For instance:
from livekit.agents import Agent, JobContext
from livekit.agents.voice import AgentSession

class DynamicInstructionsAgent(Agent):
    def __init__(self):
        super().__init__(instructions="Initial instructions")

    async def on_enter(self):
        # Update instructions as soon as the agent becomes active
        await self.update_instructions("New instructions for the agent")

    async def llm_node(self, chat_ctx, tools, model_settings):
        # Update instructions mid-pipeline, then defer to the default LLM node
        await self.update_instructions("Updated instructions during processing")
        async for chunk in Agent.default.llm_node(self, chat_ctx, tools, model_settings):
            yield chunk

async def entrypoint(ctx: JobContext):
    session = AgentSession()
    await session.start(
        agent=DynamicInstructionsAgent(),
        room=ctx.room,
    )
• Realtime Updates: The method automatically updates instructions for the current realtime session.
• Error Handling: A RealtimeError is raised if updating instructions fails.
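For instance, a minimal sketch of guarding an update against that error could look like the following. Note that safe_update is a hypothetical helper, and the RealtimeError import path is an assumption you should verify against your installed livekit-agents version:

from livekit.agents.llm import RealtimeError  # import path is an assumption; verify it

async def safe_update(agent, new_instructions: str) -> None:
    # Hypothetical helper: try the update and fall back gracefully on failure
    try:
        await agent.update_instructions(new_instructions)
    except RealtimeError:
        # The realtime session rejected the update; keep the previous
        # instructions and log or retry as appropriate.
        pass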
If you need further assistance or have a specific use case, let me know!
Sources: examples/voice_agents/dynamic_tool_creation.py

victorious-cat-76169
08/05/2025, 4:43 AM

tall-airline-16756
08/05/2025, 4:44 AM
The Agent class and the update_instructions method can be found in the LiveKit repository at this path:
File: livekit-agents/livekit/agents/voice/agent.py
You can check it out by visiting the LiveKit Agents repository and browsing to that file.
If you have any questions or need a hand with how to use or implement it, just give me a shout—I’m here to help!

victorious-cat-76169
11/02/2025, 3:49 AM
1. Turn Detector Mode: When turn_detection is present, VAD silence still serves as the primary trigger for end-of-turn evaluation. The semantic detector modulates the timing by selecting a delay (min_endpointing_delay or max_endpointing_delay) based on its confidence level. VAD silence is required to finalize the turn.
2. VAD-Only Mode: In VAD-only mode, turn finalization is determined by VAD silence plus min_endpointing_delay. The max_endpointing_delay parameter does not apply in this mode.
3. Barge-In Triggers: VAD (audio) triggers first for barge-ins. If VAD detects speech but STT hasn't emitted enough words to meet min_interruption_words, the interruption proceeds, and the agent enters a false interruption recovery phase. This ensures responsiveness while managing false positives.
4. Interaction of min_consecutive_speech_delay and min_silence_duration: The sources do not specify how these parameters interact or which takes precedence. They appear to operate at different levels, with min_silence_duration affecting VAD silence detection and min_consecutive_speech_delay regulating timing between speech chunks.
5. Defaults for 8 kHz Telephony: Specific defaults for 8 kHz telephony are not provided. To achieve <400 ms perceived endpointing latency, you may need to reduce min_endpointing_delay below 500 ms, but this increases the risk of false barge-ins. Resampling to 16 kHz may also be necessary for reliable VAD and turn detection.
6. Preemptive Generation and Barge-Ins: To avoid frequent false interruptions, configure false_interruption_timeout (e.g., 1 second) and enable resume_false_interruption. Adjust min_interruption_duration and min_interruption_words to balance responsiveness and stability. Testing in noisy environments is recommended. (See the configuration sketch below.)
7. State Machine Diagram: A diagram or truth table of the internal state machine is not available. However, the flow involves VAD detecting speech, STT providing transcripts, optional turn detection influencing timing, and AgentSession finalizing or interrupting based on these inputs. Let me know if you'd like a reconstructed flowchart or pseudocode representation.
Please feel free to reach out if you need any more details or if there's anything else I can help clarify! I'm here to support you.
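As a configuration sketch for items 2, 3, and 6 above: the keyword arguments below are assumed to match recent livekit-agents releases (check them against your installed version), and the values are illustrative, not recommended defaults.

from livekit.agents import AgentSession
from livekit.plugins import silero

session = AgentSession(
    vad=silero.VAD.load(min_silence_duration=0.4),  # VAD-level silence window
    min_endpointing_delay=0.4,        # wait after VAD silence before finalizing a turn
    max_endpointing_delay=3.0,        # upper bound used when the turn detector is unsure
    min_interruption_duration=0.5,    # speech must last this long to count as a barge-in
    min_interruption_words=2,         # or contain at least this many transcribed words
    false_interruption_timeout=1.0,   # recover if no transcript follows a barge-in
    resume_false_interruption=True,   # resume agent speech after a false barge-in
)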
How to resample to 16 kHz
victorious-cat-76169
11/02/2025, 3:49 AM

tall-airline-16756
11/02/2025, 3:50 AM
To resample audio to 16 kHz, you can use the AudioResampler class available in the LiveKit SDKs. Here's a friendly example using the Python SDK:
from livekit.rtc import AudioResampler, AudioResamplerQuality

# Create a resampler that converts 48 kHz input audio to 16 kHz output
resampler = AudioResampler(
    input_rate=48000,
    output_rate=16000,
    quality=AudioResamplerQuality.MEDIUM,
)

# Push audio into the resampler; it returns any frames that are ready.
# `audio_data` should be a chunk of 16-bit PCM audio (e.g., from a file or stream).
output_frames = resampler.push(bytearray(audio_data))

# After all input is processed, flush the resampler to collect the remaining output
output_frames += resampler.flush()
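If the audio comes from a live track rather than raw bytes, a hedged sketch could look like this. It assumes rtc.AudioStream yields AudioFrameEvent objects with a .frame attribute and that AudioResampler.push accepts AudioFrame input, as in current Python SDK versions:

from livekit import rtc

async def resample_track_to_16k(track: rtc.Track) -> None:
    resampler = rtc.AudioResampler(input_rate=48000, output_rate=16000)
    async for event in rtc.AudioStream(track):
        for frame in resampler.push(event.frame):
            ...  # hand each 16 kHz frame to your VAD/STT pipeline
    for frame in resampler.flush():
        ...  # drain any remaining buffered output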
If you’d like a bit more customization, you can also try libraries like pydub. Here’s a simple way to do that:
import io

from pydub import AudioSegment

def resample_audio(audio_bytes: bytes, original_sample_rate: int, target_sample_rate: int) -> bytes:
    # Interpret the bytes as 16-bit mono PCM, then resample to the target rate
    resampled_audio = AudioSegment.from_raw(
        io.BytesIO(audio_bytes),
        sample_width=2,
        frame_rate=original_sample_rate,
        channels=1,
    ).set_frame_rate(target_sample_rate)
    return resampled_audio.raw_data
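For example, for the 8 kHz telephony case discussed earlier (pcm_8k below is a placeholder for your raw mono 16-bit PCM bytes):

pcm_16k = resample_audio(pcm_8k, original_sample_rate=8000, target_sample_rate=16000)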
Now, about when LiveKit emits transcripts — it happens at two key moments:
• Interim Transcripts: These come through during speech processing whenever interim results are ready.
• Final Transcripts: These are sent when the system detects the end-of-turn (EOT), which is based on things like VAD silence, STT signals, or real-time model detection.
The end-of-turn detection works by considering:
• VAD Silence: It looks for sustained silence to know when to finalize a turn.
• Endpointing Delays: It waits for either min_endpointing_delay or max_endpointing_delay depending on confidence levels.
• EOU Detection: This happens when final transcripts or end-of-speech events are received.
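If it helps, here's a minimal sketch of observing those two moments from code, assuming the user_input_transcribed event and its transcript/is_final fields as exposed by recent livekit-agents releases:

from livekit.agents import AgentSession

session = AgentSession()

@session.on("user_input_transcribed")
def on_transcript(ev):
    if ev.is_final:
        print("final transcript (end of turn):", ev.transcript)
    else:
        print("interim transcript:", ev.transcript)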
If anything here feels unclear or if you want to dive deeper into any part, please don’t hesitate to ask. I’m here to help you every step of the way!
Sources: Processing raw media tracks | LiveKit docs | livekit-rtc/tests/test_resampler.py | AI Agents Framework | livekit/livekit_composite | DeepWiki | Real-time AI Models & Streaming | livekit/livekit_composite | DeepWiki | livekit-plugins/livekit-plugins-clova/livekit/plugins/clova/common.py

victorious-cat-76169
11/02/2025, 3:54 AM

tall-airline-16756
11/02/2025, 3:55 AM
Transcriptions are generated automatically when you use an AgentSession with voice capabilities. The transcriptions are published in real time to the room through the lk.transcription text stream topic and are perfectly synchronized with the agent's speech output.
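On the receiving side, a frontend or another participant can subscribe to that topic. Here's a hedged sketch assuming register_text_stream_handler and TextStreamReader.read_all are available in your livekit Python SDK version:

import asyncio

from livekit import rtc

def setup_transcription_handler(room: rtc.Room) -> None:
    def on_stream(reader: rtc.TextStreamReader, participant_identity: str):
        # Handlers are synchronous; read the full segment in a background task
        asyncio.create_task(print_transcription(reader, participant_identity))

    room.register_text_stream_handler("lk.transcription", on_stream)

async def print_transcription(reader: rtc.TextStreamReader, participant_identity: str) -> None:
    text = await reader.read_all()
    print(f"{participant_identity}: {text}")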
That said, audio resampling to 16kHz isn’t a default feature at the room level—the default audio sample rate is actually 24kHz. But don’t worry! Resampling to 16kHz happens automatically at the plugin level when it’s needed by specific services like Speech-to-Text (STT) integrations. For instance, the Google Realtime API plugin and Gladia STT plugin take care of this resampling seamlessly for you.
If you have any questions or need a hand with enabling particular features, please feel free to reach out—I’m here to help!
Sources: Text and transcriptions | LiveKit docs | AI Agents Framework | livekit/livekit_composite | DeepWiki | livekit-rtc/livekit/rtc/audio_resampler.py | livekit-agents/livekit/agents/voice/agent_activity.py

victorious-cat-76169
11/02/2025, 3:56 AM

tall-airline-16756
11/02/2025, 3:56 AM
You can set the sample_rate parameter when you initialize the FallbackAdapter with multiple TTS instances:
from livekit.agents.tts import FallbackAdapter

tts = FallbackAdapter(
    tts=[tts_instance1, tts_instance2],
    sample_rate=44100,  # triggers resampling if the instances use different rates
)
If you don’t specify a sample_rate, no worries—the adapter will automatically pick the highest sample rate among your TTS instances. The resampling happens smoothly behind the scenes using the AudioResampler class during the audio processing pipeline.
If you have any questions or need a hand with setting up TTS or resampling, just let me know—I’m here to help!
Sources: examples/other/text-to-speech/rime_tts.py

victorious-cat-76169
11/02/2025, 3:57 AM

tall-airline-16756
11/02/2025, 3:58 AM