how do the agent taking voice input using what for...
# ask-ai
d
How does the agent take voice input? What format and methodology are used to process speech?
t
Thanks for reaching out, Arjun. I'm an AI assistant and I'm here to help. Give me a moment while I research this topic.
LiveKit handles voice input through an audio pipeline that includes:
• Voice Activity Detection (VAD): Detects when someone is actually speaking in the audio stream.
• Speech-to-Text (STT): Turns what you say into text using STT models.
• Text-to-Speech (TTS): Converts responses back into audio using TTS models, so you can hear them.
• Real-time Capabilities: Audio is transported over WebRTC, which keeps delay to a minimum.
• Customization: Developers can pick their preferred STT providers, tweak TTS settings, and connect different LLMs to fit their needs.
Depending on the setup, audio formats like PCM, ALAW, and ULAW are used. The process streams audio continuously, uses VAD to detect speech efficiently, and keeps processing latency low for smooth, natural interactions. If you'd like, I’m here to help guide you through implementing any part of these features, just let me know!
Sources: Agents Core Architecture | livekit/livekit_composite | DeepWiki | Voice Assistants | livekit/livekit_composite | DeepWiki | Agent speech | LiveKit Docs
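To make the VAD step above more concrete, here is a minimal sketch of frame-level speech detection on 16-bit PCM. It uses a plain energy threshold, which is an illustrative stand-in and not the VAD model LiveKit actually ships; the sample rate, frame length, threshold, and all function names are assumptions chosen for the example.

```python
import math
import struct

SAMPLE_RATE = 16000  # assumed sample rate in Hz
FRAME_MS = 20        # assumed frame length in milliseconds
SAMPLES_PER_FRAME = SAMPLE_RATE * FRAME_MS // 1000  # 320 samples


def frame_rms(frame: bytes) -> float:
    """Root-mean-square amplitude of a 16-bit little-endian mono PCM frame."""
    count = len(frame) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack(f"<{count}h", frame[: count * 2])
    return math.sqrt(sum(s * s for s in samples) / count)


def detect_speech(pcm: bytes, threshold: float = 500.0) -> list[bool]:
    """Return one flag per full frame: True if its RMS energy exceeds the threshold."""
    frame_bytes = SAMPLES_PER_FRAME * 2
    flags = []
    for start in range(0, len(pcm) - frame_bytes + 1, frame_bytes):
        flags.append(frame_rms(pcm[start:start + frame_bytes]) > threshold)
    return flags


def make_tone(frames: int, amplitude: int) -> bytes:
    """Synthesize a 440 Hz tone as a stand-in for speech (amplitude 0 gives silence)."""
    n = frames * SAMPLES_PER_FRAME
    samples = [
        int(amplitude * math.sin(2 * math.pi * 440 * i / SAMPLE_RATE))
        for i in range(n)
    ]
    return struct.pack(f"<{n}h", *samples)


if __name__ == "__main__":
    # Two silent frames, three loud frames, one silent frame.
    audio = make_tone(2, 0) + make_tone(3, 8000) + make_tone(1, 0)
    print(detect_speech(audio))
```

In a real pipeline, only the frames flagged as speech would be forwarded to the STT stage. Production VADs are usually model-based and smooth their decisions over several frames, so treat the threshold here purely as a teaching device.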
d
Tell me which function captures audio from the microphone.
t
Hey there! Capturing audio from the microphone in LiveKit depends on which SDK you’re working with. Here’s a quick rundown:
• JavaScript: You can use `room.localParticipant.setMicrophoneEnabled(true)` to automatically create and publish a microphone track.
• Python: Try `LocalAudioTrack.create_audio_track()`, passing an `rtc.AudioSource` as the source.
• Unity: Use `LocalAudioTrack.CreateAudioTrack()` along with `RtcAudioSourceType.AudioSourceMicrophone`.
• Swift: Go for `LocalAudioTrack.createTrack()` and set `options.source = .microphone`.
If you let me know which SDK you’re using, I’d be happy to help with more detailed guidance or examples!
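For the Python route, the flow could look roughly like the sketch below. It assumes the `livekit` package's `rtc` module, a `room` that is already connected, and raw 16-bit mono PCM bytes obtained elsewhere (for example from a separate audio capture library). The helper names `split_frames` and `publish_pcm`, the track name `"mic"`, and the 48 kHz / 10 ms framing are illustrative choices rather than anything the SDK requires, so check the calls against the SDK version you have installed.

```python
SAMPLE_RATE = 48000  # assumed sample rate in Hz
NUM_CHANNELS = 1     # assumed mono input
FRAME_MS = 10        # assumed frame length in milliseconds
SAMPLES_PER_CHANNEL = SAMPLE_RATE * FRAME_MS // 1000       # 480 samples
BYTES_PER_FRAME = SAMPLES_PER_CHANNEL * NUM_CHANNELS * 2   # 960 bytes of int16


def split_frames(pcm: bytes, frame_bytes: int = BYTES_PER_FRAME) -> list[bytes]:
    """Cut raw 16-bit PCM into fixed-size frames, dropping a trailing partial frame."""
    return [
        pcm[i:i + frame_bytes]
        for i in range(0, len(pcm) - frame_bytes + 1, frame_bytes)
    ]


async def publish_pcm(room, pcm: bytes) -> None:
    """Publish PCM bytes on an already-connected room as a microphone-type track."""
    # Imported lazily so split_frames can be used without the SDK installed.
    from livekit import rtc

    # The AudioSource is the object that frames get pushed into.
    source = rtc.AudioSource(SAMPLE_RATE, NUM_CHANNELS)
    track = rtc.LocalAudioTrack.create_audio_track("mic", source)

    # Mark the track as microphone audio for subscribers.
    options = rtc.TrackPublishOptions()
    options.source = rtc.TrackSource.SOURCE_MICROPHONE
    await room.local_participant.publish_track(track, options)

    # Feed the audio to the source one frame at a time.
    for chunk in split_frames(pcm):
        frame = rtc.AudioFrame(chunk, SAMPLE_RATE, NUM_CHANNELS, SAMPLES_PER_CHANNEL)
        await source.capture_frame(frame)
```

In this sketch the source is fed manually with `capture_frame`, which is why the PCM bytes have to come from somewhere else, unlike the JavaScript call that opens the microphone for you.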