<@U084APK40NS> I am using gemini realtime voice ag...
# ask-ai
m
@millions-hair-24956 I am using the Gemini realtime voice agent. The voice responses are good, but the transcriptions coming through in the logs are not very accurate. How can I get accurate transcriptions?
g
To improve transcription accuracy with LiveKit's voice agents, consider the following:
1. Audio quality: Make sure the audio input is high quality. Poor audio leads to inaccurate transcriptions, so raise the sample rate or bitrate in your audio settings where possible. A LiveKit Slack discussion reports that adjusting these settings can help.
2. Transcription options: Use `AgentTranscriptionOptions` to fine-tune transcription settings. You can adjust parameters such as `agent_transcription_speed` and choose appropriate tokenizers, as detailed in the LiveKit documentation.
Would you like more detailed guidance on configuring these settings?
Sources: LiveKit Slack Discussion, LiveKit Pipeline Agent Documentation
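As a quick way to check the audio-quality point, here is a minimal sketch that inspects a recorded WAV sample using only the Python standard library. The helper names and the 16 kHz threshold are assumptions for illustration, not LiveKit or Gemini requirements.

```python
import wave


def describe_wav(path):
    """Return (sample_rate_hz, channels, bits_per_sample) for a PCM WAV file."""
    with wave.open(path, "rb") as wf:
        return wf.getframerate(), wf.getnchannels(), wf.getsampwidth() * 8


def looks_low_quality(sample_rate_hz, min_rate_hz=16000):
    """Flag recordings below an assumed minimum sample rate (hypothetical threshold)."""
    return sample_rate_hz < min_rate_hz
```

Running `describe_wav` on a short recording of what the agent actually receives can show whether the input is being captured at a low sample rate before you change any transcription settings.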
m
@millions-hair-24956 Can you give code for this?
g
To improve transcription accuracy using LiveKit's `AgentTranscriptionOptions`, you can configure the transcription settings as follows:
```python
from livekit.agents import tokenize
from livekit.agents.pipeline import AgentTranscriptionOptions, VoicePipelineAgent
from livekit.plugins import cartesia, deepgram, openai, silero

# Define transcription options
transcription_options = AgentTranscriptionOptions(
    user_transcription=True,
    agent_transcription=True,
    agent_transcription_speed=1.0,  # Adjust speed to match natural speech
    sentence_tokenizer=tokenize.basic.SentenceTokenizer(),
    word_tokenizer=tokenize.basic.WordTokenizer(ignore_punctuation=False),
)

# Initialize the VoicePipelineAgent with the transcription options
agent = VoicePipelineAgent(
    vad=silero.VAD.load(),
    stt=deepgram.STT(),
    llm=openai.LLM(),
    tts=cartesia.TTS(),
    transcription=transcription_options,  # options are passed via the `transcription` keyword
)

# Start the agent in a room; `room` and `participant` come from your job entrypoint
agent.start(room, participant)
```
This code sets up a `VoicePipelineAgent` with customized transcription options. Adjust `agent_transcription_speed` and the tokenizers as needed for your use case. Would you like more details on any specific part of this setup?
Sources: LiveKit Pipeline Agent Documentation
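To judge whether a settings change actually makes the logged transcriptions more accurate, one option is to compare a logged transcript against a hand-written reference using word error rate. This is a minimal standalone sketch, not part of the LiveKit API. The function name is hypothetical, and it assumes simple whitespace tokenization with case folding.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance between the two texts, divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from the hypothesis
            insertion = cur[j - 1] + 1   # extra word in the hypothesis
            cur.append(min(substitution, deletion, insertion))
        prev = cur
    return prev[-1] / len(ref)
```

Computing this on the same test utterances before and after a change gives a rough number to compare, rather than relying on reading the logs by eye.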