LiveKit Community

Which part in particular is poor: the latency or the accuracy? One trade-off we made is optimizing for latency over accuracy. We send smaller audio context windows to Google (I can’t remember the exact size, but less than the recommended 100 ms). The goal with KITT is to get as close to real, human-like interaction as possible. Depending on your needs, you may want to tweak the size of the context window sent to STT. We may also have other configuration parameters set on the Google Cloud side that affect performance (latency or accuracy) in some way. <@U022KB72NFJ> knows more than I do here.

From the guideline, the audio window is 20 ms, as far as I remember.
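
To make the window-size trade-off concrete, here is a rough sketch of how raw audio could be split into fixed-duration chunks before being streamed to an STT service. It assumes 16 kHz, 16-bit mono PCM, and the helper names (`frame_bytes`, `chunk_pcm`) are hypothetical; this is not KITT's actual code or any part of the Google client library.

```python
from typing import Iterator


def frame_bytes(sample_rate_hz: int, window_ms: int,
                sample_width_bytes: int = 2, channels: int = 1) -> int:
    """Number of bytes in one audio window of `window_ms` milliseconds."""
    samples_per_window = sample_rate_hz * window_ms // 1000
    return samples_per_window * sample_width_bytes * channels


def chunk_pcm(pcm: bytes, sample_rate_hz: int, window_ms: int,
              sample_width_bytes: int = 2, channels: int = 1) -> Iterator[bytes]:
    """Yield consecutive chunks of raw PCM, each covering `window_ms` of audio.

    The last chunk may be shorter if the buffer does not divide evenly.
    """
    size = frame_bytes(sample_rate_hz, window_ms, sample_width_bytes, channels)
    if size <= 0:
        raise ValueError("window_ms is too small for this sample rate")
    for start in range(0, len(pcm), size):
        yield pcm[start:start + size]


# One second of silence at the assumed format: 16 kHz, 16-bit, mono.
one_second = bytes(16000 * 2)

# Smaller windows mean more, smaller requests (lower latency, less context each).
print(len(list(chunk_pcm(one_second, 16000, 20))))   # 50 chunks of 640 bytes
print(len(list(chunk_pcm(one_second, 16000, 100))))  # 10 chunks of 3200 bytes
```

With these assumptions, a 20 ms window sends five times as many chunks per second as a 100 ms window, each carrying a fifth of the audio. That is the latency-versus-accuracy knob discussed above.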