@millions-hair-24956 I am using gemini realtime voice agent - LiveKit Community #ask-ai


miniature-nail-35141

02/13/2025, 6:54 AM

@millions-hair-24956 I am using the Gemini realtime voice agent. The voice responses are good, but the transcriptions coming into the log are not very accurate. How can I get accurate transcriptions?

gray-monkey-83525

02/13/2025, 6:55 AM

To improve transcription accuracy with LiveKit's voice agents, consider the following:

1. Audio quality: Ensure high-quality audio input. Adjust the audio settings to increase the sample rate or bitrate, since poor audio quality can lead to inaccurate transcriptions. According to a LiveKit Slack discussion, adjusting the audio quality settings can help.
2. Transcription options: Use AgentTranscriptionOptions to fine-tune transcription settings. You can adjust parameters such as agent_transcription_speed (how quickly the agent's transcript is forwarded relative to its speech) and the sentence and word tokenizers (how the transcript is segmented), as detailed in the LiveKit documentation.

Would you like more detailed guidance on configuring these settings?

Sources:
- LiveKit Slack Discussion
- LiveKit Pipeline Agent Documentation

💡 Hint: Mention @runllm in the thread for follow-ups.
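To act on the audio-quality point, one option is to record a short sample of the audio the agent receives (for example, as a WAV file) and inspect it before blaming the transcriber. This is a minimal sketch using only the Python standard library; it is not a LiveKit or Gemini API. The function names (audio_quality_report, looks_ok, make_sine_wav) are hypothetical, and the thresholds (at least 16 kHz, at most 0.1% clipped samples, level above -40 dBFS) are illustrative assumptions rather than documented requirements.

```python
import array
import io
import math
import sys
import wave

FULL_SCALE = 32768.0  # magnitude of the most negative 16-bit sample


def audio_quality_report(wav_bytes: bytes) -> dict:
    """Basic quality indicators for a 16-bit PCM WAV recording."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        rate = wf.getframerate()
        channels = wf.getnchannels()
        raw = wf.readframes(wf.getnframes())

    samples = array.array("h")
    samples.frombytes(raw)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    if not samples:
        raise ValueError("recording contains no samples")

    # Root-mean-square level relative to full scale, and share of clipped samples
    rms = math.sqrt(sum(s * s for s in samples) / len(samples)) / FULL_SCALE
    clipped = sum(1 for s in samples if s >= 32767 or s <= -32768) / len(samples)
    return {
        "sample_rate": rate,
        "channels": channels,
        "rms_dbfs": 20 * math.log10(rms) if rms > 0 else float("-inf"),
        "clipped_fraction": clipped,
    }


def looks_ok(report: dict, min_rate: int = 16000,
             max_clipped: float = 0.001, min_dbfs: float = -40.0) -> bool:
    """Hypothetical thresholds; tune them for your own setup."""
    return (report["sample_rate"] >= min_rate
            and report["clipped_fraction"] <= max_clipped
            and report["rms_dbfs"] >= min_dbfs)


def make_sine_wav(freq: float = 440.0, seconds: float = 1.0,
                  rate: int = 16000, amplitude: float = 0.5) -> bytes:
    """Synthesize a mono test tone; amplitude > 1.0 simulates clipping."""
    n = int(seconds * rate)
    frames = array.array("h", (
        max(-32768, min(32767, int(amplitude * 32767 *
                                   math.sin(2 * math.pi * freq * i / rate))))
        for i in range(n)))
    if sys.byteorder == "big":
        frames.byteswap()
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(rate)
        wf.writeframes(frames.tobytes())
    return buf.getvalue()


if __name__ == "__main__":
    for label, wav in [("16 kHz tone", make_sine_wav()),
                       ("8 kHz tone", make_sine_wav(rate=8000)),
                       ("clipped tone", make_sine_wav(amplitude=2.0))]:
        report = audio_quality_report(wav)
        print(label, report, "OK" if looks_ok(report) else "CHECK")
```

In practice you would pass the bytes of a real recording to audio_quality_report instead of a synthetic tone; a low sample rate, heavy clipping, or a very quiet signal are all worth fixing before tuning anything else.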

miniature-nail-35141

02/13/2025, 6:56 AM

Can you give code for this?

miniature-nail-35141

02/13/2025, 6:56 AM

@millions-hair-24956 Can you give code for this?

gray-monkey-83525

02/13/2025, 6:56 AM

To improve transcription accuracy using LiveKit's AgentTranscriptionOptions, you can configure the transcription settings as follows:

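```python
from livekit.agents import AutoSubscribe, JobContext, WorkerOptions, cli, tokenize
from livekit.agents.pipeline import AgentTranscriptionOptions, VoicePipelineAgent
from livekit.plugins import cartesia, deepgram, openai, silero


async def entrypoint(ctx: JobContext):
    # Connect to the room and wait for a user to join
    await ctx.connect(auto_subscribe=AutoSubscribe.AUDIO_ONLY)
    participant = await ctx.wait_for_participant()

    # Define transcription options
    transcription_options = AgentTranscriptionOptions(
        user_transcription=True,   # forward the user's STT transcript
        agent_transcription=True,  # forward the agent's transcript
        agent_transcription_speed=1.0,  # pace of the agent transcript relative to its speech
        sentence_tokenizer=tokenize.basic.SentenceTokenizer(),
        word_tokenizer=tokenize.basic.WordTokenizer(ignore_punctuation=False),  # keep punctuation
    )

    # Initialize the VoicePipelineAgent; the options are passed via `transcription`
    agent = VoicePipelineAgent(
        vad=silero.VAD.load(),
        stt=deepgram.STT(),
        llm=openai.LLM(),
        tts=cartesia.TTS(),
        transcription=transcription_options,
    )

    # Start the agent in the room, attached to that participant
    agent.start(ctx.room, participant)


if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))
```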

This code sets up a VoicePipelineAgent with customized transcription options. Adjust agent_transcription_speed and the tokenizers as needed for your specific use case; they control how transcripts are paced and segmented, while the accuracy of the user transcript itself depends mainly on the STT plugin (here deepgram.STT()) and the quality of the incoming audio. Would you like more details on any specific part of this setup?

Sources:
- LiveKit Pipeline Agent Documentation
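Whether a change to audio settings, the STT plugin, or transcription options actually helps is easier to judge with a number. A common metric is word error rate (WER): the word-level edit distance between a reference transcript and the logged transcript, divided by the number of reference words. This is a minimal standalone sketch, not part of LiveKit; word_error_rate and normalize are hypothetical names, and the normalization (lowercasing and dropping punctuation) is an assumption you may want to adjust.

```python
import re


def normalize(text: str) -> list[str]:
    """Lowercase and split into words, dropping punctuation (keeps apostrophes)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words."""
    ref = normalize(reference)
    hyp = normalize(hypothesis)
    if not ref:
        raise ValueError("reference transcript has no words")

    # Word-level Levenshtein distance, keeping only the previous row.
    prev = list(range(len(hyp) + 1))  # distance from an empty reference to hyp[:j]
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hyp
                curr[j - 1] + 1,     # insertion: extra word in hyp
                prev[j - 1] + cost,  # substitution (or match when cost is 0)
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    reference = "Please book a table for two at seven."
    logged = "please book a cable for two at seven"
    print(f"WER: {word_error_rate(reference, logged):.3f}")  # prints "WER: 0.125"
```

For example, write down what was actually said in a few test utterances, then compare the WER of the transcripts logged before and after a change; a lower value means fewer word errors.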
