how do the agent taking voice input using what format and me LiveKit Community #ask-ai

# ask-ai

dazzling-guitar-12115

08/14/2025, 5:12 PM

How does the agent take voice input? What format and methodology are used to process speech?

tall-airline-16756

08/14/2025, 5:12 PM

Thanks for reaching out, Arjun. I'm an AI assistant and I'm here to help. Give me a moment while I research this topic.

tall-airline-16756

08/14/2025, 5:13 PM

LiveKit handles voice input through a detailed audio pipeline that includes:
• Voice Activity Detection (VAD): This helps spot when someone is actually speaking in the audio stream.
• Speech-to-Text (STT): It turns what you say into text using STT models.
• Text-to-Speech (TTS): Converts responses back into audio, so you can hear them, using TTS models.
• Real-time Capabilities: Thanks to WebRTC, audio transfers happen super quickly with minimal delay.
• Customization: Developers can pick their favorite STT providers, tweak TTS settings, and connect with different LLMs to fit their needs.

Depending on the setup, audio formats like PCM, ALAW, and ULAW are used. The process involves streaming audio continuously, using VAD to efficiently detect speech, and ensuring low-latency processing for smooth, natural interactions.

If you'd like, I’m here to help guide you through implementing any part of these features. Just let me know!

Sources: Agents Core Architecture (livekit/livekit_composite, DeepWiki) | Voice Assistants (livekit/livekit_composite, DeepWiki) | Agent speech (LiveKit Docs)
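To make the format and VAD steps above a little more concrete, here is a minimal standard-library sketch. It decodes ULAW (G.711 mu-law) bytes to 16-bit PCM and then flags 20 ms frames as speech using a plain RMS energy threshold. This is not LiveKit's implementation: production VADs are typically model-based, and the function names, the 8 kHz sample rate, and the threshold of 500 are all assumptions chosen for illustration.

```python
import math
from typing import List, Sequence


def ulaw_byte_to_pcm16(byte: int) -> int:
    """Decode one G.711 mu-law byte to a signed 16-bit PCM sample."""
    u = ~byte & 0xFF                 # mu-law bytes are stored bit-inverted
    sign = u & 0x80
    exponent = (u >> 4) & 0x07
    mantissa = u & 0x0F
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -magnitude if sign else magnitude


def decode_ulaw(data: bytes) -> List[int]:
    """Decode a mu-law byte string to a list of PCM samples."""
    return [ulaw_byte_to_pcm16(b) for b in data]


def rms(frame: Sequence[int]) -> float:
    """Root-mean-square level of one frame of PCM samples."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def detect_speech(
    samples: Sequence[int],
    sample_rate: int = 8000,     # assumed telephony rate
    frame_ms: int = 20,          # assumed frame duration
    threshold: float = 500.0,    # hypothetical energy threshold
) -> List[bool]:
    """Return one flag per full frame: True if its energy meets the threshold."""
    frame_len = sample_rate * frame_ms // 1000
    flags = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        flags.append(rms(samples[start:start + frame_len]) >= threshold)
    return flags


if __name__ == "__main__":
    silence = bytes([0xFF]) * 160        # 0xFF decodes to 0
    loud = bytes([0x80, 0x00]) * 80      # alternating full-scale samples
    print(detect_speech(decode_ulaw(silence + loud)))  # [False, True]
```

In a real pipeline, the frames flagged as speech would be the ones forwarded to the STT stage; an energy threshold like this is easily fooled by background noise, which is one reason trained VAD models are preferred.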

dazzling-guitar-12115

08/14/2025, 5:23 PM

Tell me which function captures audio from the microphone.

tall-airline-16756

08/14/2025, 5:24 PM

Hey there! Capturing audio from the microphone in LiveKit depends on which SDK you’re working with. Here’s a quick rundown:
• JavaScript: You can use `room.localParticipant.setMicrophoneEnabled(true)` to automatically create and publish a microphone track.
• Python: Try `LocalAudioTrack.create_audio_track()`, passing an `rtc.AudioSource` instance as the source.
• Unity: Use `LocalAudioTrack.CreateAudioTrack()` along with `RtcAudioSourceType.AudioSourceMicrophone`.
• Swift: Go for `LocalAudioTrack.createTrack()` and set `options.source = .microphone`.

If you let me know which SDK you’re using, I’d be happy to help with more detailed guidance or examples!
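For the Python case, a rough sketch of how frames reach a published track may help. It assumes the `livekit` Python package and an already-connected `rtc.Room`; the call names follow the SDK's published examples as best understood here and should be checked against the installed version. `rtc.AudioSource` in this sketch is push-based: the application supplies PCM frames to it. A generated sine tone stands in for microphone samples, which would normally come from a separate capture library. The helper names, track name, and constants are hypothetical.

```python
import math
import struct

SAMPLE_RATE = 48000                                  # Hz; assumed for this sketch
NUM_CHANNELS = 1                                     # mono
FRAME_MS = 10                                        # assumed frame duration
SAMPLES_PER_FRAME = SAMPLE_RATE * FRAME_MS // 1000   # 480 samples


def sine_frame(index: int, freq: float = 440.0, amplitude: int = 8000) -> bytes:
    """Return one frame of 16-bit little-endian mono PCM (stand-in for mic data)."""
    start = index * SAMPLES_PER_FRAME
    samples = [
        int(amplitude * math.sin(2 * math.pi * freq * (start + n) / SAMPLE_RATE))
        for n in range(SAMPLES_PER_FRAME)
    ]
    return struct.pack(f"<{SAMPLES_PER_FRAME}h", *samples)


async def publish_frames(room, num_frames: int = 100) -> None:
    """Publish generated frames on an already-connected rtc.Room (assumed)."""
    # Imported lazily so sine_frame() can be used without the SDK installed.
    from livekit import rtc

    source = rtc.AudioSource(SAMPLE_RATE, NUM_CHANNELS)
    track = rtc.LocalAudioTrack.create_audio_track("mic-sketch", source)

    # Label the track as microphone audio for subscribers.
    options = rtc.TrackPublishOptions()
    options.source = rtc.TrackSource.SOURCE_MICROPHONE
    await room.local_participant.publish_track(track, options)

    for i in range(num_frames):
        frame = rtc.AudioFrame(
            data=sine_frame(i),
            sample_rate=SAMPLE_RATE,
            num_channels=NUM_CHANNELS,
            samples_per_channel=SAMPLES_PER_FRAME,
        )
        await source.capture_frame(frame)
```

To capture a real microphone, `sine_frame` would be replaced by a function that reads the same 16-bit PCM layout from an audio input device.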
