# ask-ai

Hey @millions-hair-24956, can you help me with STT? I'm getting a line break or new segment at every period (.).

It seems likely that the line breaks or new segments at every period in your Speech-to-Text (STT) output are related to how the STT service handles sentence segmentation. This behavior can be influenced by the configuration of the STT plugin and by the specific STT provider you are using, so you might want to check your plugin's settings, particularly if you are using a service like Deepgram or OpenAI. According to the LiveKit documentation, different STT providers have different default behaviors for handling punctuation and sentence segmentation, so there may be an option that adjusts how periods are handled. Would you like more detailed guidance on adjusting these settings, or on alternative STT providers that might better suit your needs?

Sources: LiveKit documentation
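
For illustration, with the Deepgram plugin for LiveKit Agents in Python, these options are passed when constructing the STT. A minimal sketch, assuming the `livekit-plugins-deepgram` package (parameter names may vary between plugin versions):

```python
from livekit.plugins import deepgram

# Minimal sketch: construct a Deepgram STT with segmentation-related options.
# Verify parameter names against the livekit-plugins-deepgram version you use.
stt = deepgram.STT(
    model="nova-2-general",  # Deepgram model to use
    punctuate=True,          # add punctuation to the transcript
    smart_format=True,       # apply Deepgram's smart formatting
    interim_results=True,    # stream partial results before a segment is final
)
```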

@millions-hair-24956 can you check for Deepgram? That's what I'm using.

It seems likely that the behavior you're seeing with Deepgram's STT, where a new segment is created at every period, is related to the `punctuate` option. According to the Deepgram STTOptions documentation, `punctuate` is a boolean that, when enabled, adds punctuation to the transcription output, which could be causing the segmentation at periods. To adjust this behavior, you might consider disabling `punctuate` in your Deepgram STT configuration. Here's a basic example of how you might configure the STT options:

```javascript
const sttOptions = {
  model: "nova-2-general",
  punctuate: false, // Disable punctuation to prevent segmentation at periods
  // other options...
};
```

(This example uses the JavaScript plugin's option style; in the Python plugin the equivalent keyword argument is `punctuate=False`.) Would you like more detailed instructions on implementing this change, or on other configuration options?

Sources: Deepgram STTOptions documentation

@millions-hair-24956 I did this:

```python
stt=deepgram.STT(
    smart_format=False,
    punctuate=False,
),
```

but it's still not working.

It's possible that the segmentation you're experiencing isn't caused solely by the `punctuate` or `smart_format` options. According to the Deepgram STT documentation, other factors such as the `endpointing_ms` setting can also influence how segments are created: it controls endpointing behavior, i.e. when the STT service considers a segment complete. You might try increasing the `endpointing_ms` value to reduce how frequently new segments are created:

```python
stt = deepgram.STT(
    smart_format=False,
    punctuate=False,
    endpointing_ms=2000,  # Adjust this value as needed
)
```

Would you like more detailed guidance on adjusting this setting, or on other potential solutions?

Sources: Deepgram STT documentation

@millions-hair-24956 I tried this as well, but it didn't work.

I'm not able to find any data sources to answer your question. You might want to provide details about the specific environment or version of the Deepgram plugin you are using. If this doesn't help, please reach out to the LiveKit support team for further assistance.

@millions-hair-24956 I am using `livekit-plugins-deepgram>=0.6.17`.

I'm not able to find any data sources to answer your question. You might want to provide information about any custom configurations or additional plugins you are using with LiveKit. If this doesn't help, please reach out to the LiveKit support team for further assistance.

```python
import logging
import json
import asyncio

from dotenv import load_dotenv
from livekit.agents import (
    AutoSubscribe,
    JobContext,
    JobProcess,
    WorkerOptions,
    cli,
    llm,
    metrics,
)
from livekit.agents.pipeline import VoicePipelineAgent
from livekit.plugins import cartesia, openai, deepgram, silero, turn_detector
from livekit import rtc

load_dotenv(dotenv_path=".env.local")
logger = logging.getLogger("voice-agent")


def prewarm(proc: JobProcess):
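    # Load the Silero VAD model once per worker process so each job can reuse it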
    proc.userdata["vad"] = silero.VAD.load()


async def entrypoint(ctx: JobContext):
    initial_ctx = llm.ChatContext().append(
        role="system",
        text=(
            "You are an oral exam instructor. Your role is to:\n"
            "1. Ask questions from the predefined list one at a time.\n"
            "2. Listen to the student's response; dig deeper into the answer if needed, but only once. Don't answer the question yourself and don't repeat the student's answer.\n"
            "3. Move to the next question after receiving the response."
        ),
    )

    logger.info(f"connecting to room {ctx.room.name}")
    await ctx.connect(auto_subscribe=AutoSubscribe.AUDIO_ONLY)

    # Wait for the first participant to connect
    participant = await ctx.wait_for_participant()
    logger.info(f"starting voice assistant for participant {participant.identity}")

    # This project is configured to use Deepgram STT, OpenAI LLM and Cartesia TTS plugins
    # Other great providers exist like Cerebras, ElevenLabs, Groq, Play.ht, Rime, and more
    # Learn more and pick the best one for your app:
    # https://docs.livekit.io/agents/plugins
    agent = VoicePipelineAgent(
        vad=ctx.proc.userdata["vad"],
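        # Deepgram STT configured per the suggestions above: punctuation disabled
        # and endpointing extended to 10 s (10000 ms) before a segment is finalized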
        stt=deepgram.STT(
            punctuate=False,
            endpointing_ms=10000
        ),
        llm=openai.LLM(model="gpt-4o-mini"),
        tts=deepgram.TTS(),
        turn_detector=turn_detector.EOUModel(),
        # minimum delay for endpointing, used when turn detector believes the user is done with their turn
        min_endpointing_delay=0.5,
        # maximum delay for endpointing, used when turn detector does not believe the user is done with their turn
        max_endpointing_delay=5.0,
        chat_ctx=initial_ctx,
    )

    usage_collector = metrics.UsageCollector()

    @agent.on("metrics_collected")
    def on_metrics_collected(agent_metrics: metrics.AgentMetrics):
        metrics.log_metrics(agent_metrics)
        usage_collector.collect(agent_metrics)

    agent.start(ctx.room, participant)

    @ctx.room.on("data_received")
    def on_data_received(data: rtc.DataPacket):
        if data.data:  # Check if data exists
            try:
                message = data.data.decode("utf-8")
                message_json = json.loads(message)
                
                if message_json.get("type") == "QUESTIONS":
                    data_obj = message_json.get("data", {})
                    questions = data_obj.get("questions", [])
                    name = data_obj.get("name", "")
                    is_improvized = data_obj.get("isImprovized", False)

                    logger.info(f"Received exam: {name}")
                    logger.info(f"Number of questions: {len(questions)}")
                    logger.info(f"Exam type: {'Personalized' if is_improvized else 'Standard'}")
                    
                    # Create async task for the greeting
                    asyncio.create_task(
                        agent.say(
                            f"Hi there! I'm Coral, your AI instructor for today's oral exam. I'll be asking you several questions about {name}. After each of your responses, I'll move on to the next question. Are you ready to begin?",
                            allow_interruptions=True
                        )
                    )

                    if not is_improvized:
                        initial_ctx.append(
                            role="system",
                            text=(
                                "Questions: " + str(questions)
                            ),
                        )
                    
            except json.JSONDecodeError as e:
                logger.error(f"Failed to parse JSON data: {e}")
            except Exception as e:
                logger.error(f"Error processing data: {e}")

if __name__ == "__main__":
    cli.run_app(
        WorkerOptions(
            entrypoint_fnc=entrypoint,
            prewarm_fnc=prewarm,
        ),
    )
```

@millions-hair-24956 here is my code

I'm here to help with questions related to LiveKit. If you have a specific question or issue related to LiveKit, please let me know and I'll do my best to assist. If your question is about something else, please provide more details or rephrase it.

@millions-hair-24956 can you look at my code shared above and tell me what mistake I made that makes my whole transcript line-break at every period in STT?

I'm not able to find any data sources to answer your question. You might want to provide details about any specific error messages or logs you are encountering. If this doesn't help, please reach out to the LiveKit support team for further assistance.