stale-afternoon-67566
02/28/2025, 1:12 AM
gray-monkey-83525
02/28/2025, 1:12 AM
stale-afternoon-67566
02/28/2025, 1:14 AM
gray-monkey-83525
02/28/2025, 1:14 AM
This behavior is likely related to the punctuate option. According to the Deepgram STTOptions documentation, the punctuate option is a boolean that, when enabled, adds punctuation to the transcription output. This could be causing the segmentation at periods.
If you want to adjust this behavior, you might consider modifying the punctuate setting in your Deepgram STT configuration. Here's a basic example of how you might configure the STT options:
const sttOptions = {
  model: "nova-2-general",
  punctuate: false, // Disable punctuation to prevent segmentation at periods
  // other options...
};
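If you're using the Python livekit-agents Deepgram plugin instead, a rough equivalent would be the following sketch, assuming the livekit-plugins-deepgram STT constructor accepts these keyword arguments:
from livekit.plugins import deepgram

# Sketch: disable punctuation so the transcript is not segmented at sentence-ending periods.
stt = deepgram.STT(
    model="nova-2-general",  # model name taken from the example above
    punctuate=False,         # assumption: keyword mirrors Deepgram's punctuate option
)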
Would you like more detailed instructions on how to implement this change or explore other configuration options?
Sources:
- Deepgram STTOptions documentation
stale-afternoon-67566
02/28/2025, 1:22 AM
stt = deepgram.STT(
    smart_format=False,
    punctuate=False,
),
but still it's not working
gray-monkey-83525
02/28/2025, 1:22 AM
The segmentation may not be caused only by the punctuate or smart_format options. According to the Deepgram STT documentation, other factors such as the endpointing_ms setting could influence how segments are created. This setting controls the endpointing behavior, which determines when the STT service considers a segment to be complete.
You might want to try adjusting the endpointing_ms parameter to see if it affects the segmentation behavior. For example, increasing the endpointing_ms value might reduce the frequency of new segments being created:
stt = deepgram.STT(
    smart_format=False,
    punctuate=False,
    endpointing_ms=2000,  # Adjust this value as needed
)
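If you're using VoicePipelineAgent, one way to confirm whether a change actually affects the segmentation is to log each committed user turn. This is only a minimal sketch, assuming the pre-1.0 livekit-agents pipeline API, in which the agent emits a user_speech_committed event with the committed chat message, and where agent, llm, and logger refer to your own agent code:
# Debugging sketch (assumption: VoicePipelineAgent emits "user_speech_committed"
# with an llm.ChatMessage, as in the pre-1.0 livekit-agents pipeline API).
# Logging each committed user turn shows where the segments are being cut.
@agent.on("user_speech_committed")
def on_user_speech_committed(msg: llm.ChatMessage):
    logger.info(f"user turn committed: {msg.content!r}")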
Would you like more detailed guidance on how to adjust this setting or explore other potential solutions?
Sources:
- Deepgram STT documentation
stale-afternoon-67566
02/28/2025, 1:28 AM
gray-monkey-83525
02/28/2025, 1:28 AM
stale-afternoon-67566
02/28/2025, 1:29 AM
gray-monkey-83525
02/28/2025, 1:29 AM
stale-afternoon-67566
02/28/2025, 1:30 AM
import logging
import json
import asyncio

from dotenv import load_dotenv
from livekit.agents import (
    AutoSubscribe,
    JobContext,
    JobProcess,
    WorkerOptions,
    cli,
    llm,
    metrics,
)
from livekit.agents.pipeline import VoicePipelineAgent
from livekit.plugins import cartesia, openai, deepgram, silero, turn_detector
from livekit import rtc

load_dotenv(dotenv_path=".env.local")
logger = logging.getLogger("voice-agent")


def prewarm(proc: JobProcess):
    proc.userdata["vad"] = silero.VAD.load()


async def entrypoint(ctx: JobContext):
    initial_ctx = llm.ChatContext().append(
        role="system",
        text=(
            "You are an oral exam instructor. Your role is to:"
            "1. Ask questions from the predefined list one at a time"
            "2. Listen to the student's response, dig deeper into the answer if needed but just once, don't answer the question yourself and don't repeat student's answer"
            "3. Move to the next question after receiving the response."
        ),
    )

    logger.info(f"connecting to room {ctx.room.name}")
    await ctx.connect(auto_subscribe=AutoSubscribe.AUDIO_ONLY)

    # Wait for the first participant to connect
    participant = await ctx.wait_for_participant()
    logger.info(f"starting voice assistant for participant {participant.identity}")

    # This project is configured to use Deepgram STT, OpenAI LLM and Cartesia TTS plugins
    # Other great providers exist like Cerebras, ElevenLabs, Groq, Play.ht, Rime, and more
    # Learn more and pick the best one for your app:
    # https://docs.livekit.io/agents/plugins
    agent = VoicePipelineAgent(
        vad=ctx.proc.userdata["vad"],
        stt=deepgram.STT(
            punctuate=False,
            endpointing_ms=10000,
        ),
        llm=openai.LLM(model="gpt-4o-mini"),
        tts=deepgram.TTS(),
        turn_detector=turn_detector.EOUModel(),
        # minimum delay for endpointing, used when turn detector believes the user is done with their turn
        min_endpointing_delay=0.5,
        # maximum delay for endpointing, used when turn detector does not believe the user is done with their turn
        max_endpointing_delay=5.0,
        chat_ctx=initial_ctx,
    )

    usage_collector = metrics.UsageCollector()

    @agent.on("metrics_collected")
    def on_metrics_collected(agent_metrics: metrics.AgentMetrics):
        metrics.log_metrics(agent_metrics)
        usage_collector.collect(agent_metrics)

    agent.start(ctx.room, participant)

    @ctx.room.on("data_received")
    def on_data_received(data: rtc.DataPacket):
        if data.data:  # Check if data exists
            try:
                message = data.data.decode("utf-8")
                message_json = json.loads(message)
                if message_json.get("type") == "QUESTIONS":
                    data_obj = message_json.get("data", {})
                    questions = data_obj.get("questions", [])
                    name = data_obj.get("name", "")
                    is_improvized = data_obj.get("isImprovized", False)
                    logger.info(f"Received exam: {name}")
                    logger.info(f"Number of questions: {len(questions)}")
                    logger.info(f"Exam type: {'Personalized' if is_improvized else 'Standard'}")
                    # Create async task for the greeting
                    asyncio.create_task(
                        agent.say(
                            f"Hi there! I'm Coral, your AI instructor for today's oral exam. I'll be asking you several questions about {name}. After each of your responses, I'll move on to the next question. Are you ready to begin?",
                            allow_interruptions=True,
                        )
                    )
                    if not is_improvized:
                        initial_ctx.append(
                            role="system",
                            text=(
                                "Questions: " + str(questions)
                            ),
                        )
            except json.JSONDecodeError as e:
                logger.error(f"Failed to parse JSON data: {e}")
            except Exception as e:
                logger.error(f"Error processing data: {e}")


if __name__ == "__main__":
    cli.run_app(
        WorkerOptions(
            entrypoint_fnc=entrypoint,
            prewarm_fnc=prewarm,
        ),
    )
@millions-hair-24956 here is my code
gray-monkey-83525
02/28/2025, 1:30 AM
stale-afternoon-67566
02/28/2025, 1:31 AM
gray-monkey-83525
02/28/2025, 1:31 AM