# ask-ai
m
Speechmatics STT is considering single words as separate speeches:

```
2025-08-07 05:38:54,473 - DEBUG livekit.agents - received user transcript {"user_transcript": "what?", "language": "en"}
2025-08-07 05:38:54,776 - DEBUG livekit.agents - received user transcript {"user_transcript": "Why", "language": "en"}
2025-08-07 05:38:55,228 - DEBUG livekit.agents - received user transcript {"user_transcript": "are", "language": "en"}
2025-08-07 05:38:55,514 - DEBUG livekit.agents - received user transcript {"user_transcript": "you", "language": "en"}
2025-08-07 05:38:56,226 - DEBUG livekit.agents - received user transcript {"user_transcript": "generating", "language": "en"}
2025-08-07 05:38:56,561 - DEBUG livekit.agents - received user transcript {"user_transcript": "so", "language": "en"}
2025-08-07 05:38:56,988 - DEBUG livekit.agents - received user transcript {"user_transcript": "many", "language": "en"}
2025-08-07 05:38:57,665 - DEBUG livekit.agents - received user transcript {"user_transcript": "requests", "language": "en"}
2025-08-07 05:38:59,776 - DEBUG livekit.agents - received user transcript {"user_transcript": "?", "language": "en"}
```

```python
my_stt = speechmatics.STT(
    transcription_config=speechmatics.types.TranscriptionConfig(
        language="en",
        operating_point="enhanced",
        enable_partials=False,
        max_delay=0.7,
        diarization="speaker",
        speaker_diarization_config={"max_speakers": 2, "speaker_sensitivity": 1},  # Adjust as needed
    )
)
```
t
Thanks for reaching out, Usama. I'm an AI assistant and I'm here to help. Give me a moment while I research this topic.
The issue of Speechmatics STT treating single words as separate speeches is likely due to the configuration of partial results and buffer delays. To address this:

1. Ensure that `enable_partials` is set to `False` to receive only final transcripts, preventing interim single-word events.
2. Consider increasing the `max_delay` value to allow more audio to be processed before finalizing transcripts, which can reduce fragmented words.
3. Optionally, modify your application to buffer interim results until a final transcript is received, then process the concatenated text (see the sketch at the end of this message).

For example, you can adjust your configuration as follows:
```python
my_stt = speechmatics.STT(
    transcription_config=speechmatics.types.TranscriptionConfig(
        language="en",
        operating_point="enhanced",
        enable_partials=False,
        max_delay=1.0,
        diarization="speaker",
        speaker_diarization_config={"max_speakers": 2, "speaker_sensitivity": 1},
    )
)
```
These changes should help in receiving complete sentences instead of individual words.
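For item 3, here is a rough sketch of application-level buffering. It assumes a LiveKit `AgentSession` handler where the `user_input_transcribed` event exposes `is_final` and `transcript`, and `handle_user_utterance` is a hypothetical downstream function of yours; if your STT emits cumulative (rather than incremental) partials, keep only the final `event.transcript` instead of concatenating.

```python
# Sketch only: buffer interim transcripts and process the concatenated text
# once the final transcript arrives.
interim_buffer = []

@session.on("user_input_transcribed")
def on_transcript(event):
    if event.is_final:
        # Final transcript: process the concatenated text, then reset the buffer.
        full_text = " ".join(interim_buffer + [event.transcript]).strip()
        interim_buffer.clear()
        handle_user_utterance(full_text)  # hypothetical downstream handler
    else:
        # Interim result: hold on to it instead of treating it as separate speech.
        interim_buffer.append(event.transcript)
```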
m
It did not change anything.
t
If you're still running into issues after turning off partials and tweaking `max_delay`, no worries, here are a few more things you can try:

1. Set `max_delay_mode="fixed"` to make sure the delay you set in `max_delay` is strictly followed. This often helps cut down on single-word segmentation.
2. Double-check that your app correctly distinguishes between interim and final transcripts, and waits for the final one before processing.
3. Turn on the `enable_entities` parameter to get better formatting for numbers, currency, and other special entities.
4. Use `additional_vocab` to add any custom words or phrases that might be causing those single-word hiccups.
5. Play around with `max_speakers` and `speaker_sensitivity` in the `speaker_diarization_config` so it matches how many speakers you expect.

Here's an updated example config you can try out:
```python
my_stt = speechmatics.STT(
    transcription_config=speechmatics.types.TranscriptionConfig(
        language="en",
        operating_point="enhanced",
        enable_partials=False,
        max_delay=0.5,
        max_delay_mode="fixed",
        enable_entities=True,
        additional_vocab=[
            {"content": "example_word"}
        ],
        diarization="speaker",
        speaker_diarization_config={"max_speakers": 1, "speaker_sensitivity": 0.8},
    )
)
```
Also, it’s a good idea to make sure your audio is nice and clear, without background noise, since that can mess with segmentation. If you want to chat more or need a hand with anything else, just let me know—I’m here to help! Sources: Speechmatics integration guide | LiveKit Docs | livekit-plugins/livekit-plugins-aws/livekit/plugins/aws/stt.py | Speechmatics integration guide | LiveKit Docs | Speechmatics STT integration guide | LiveKit Docs
m
I am using this to write the transcript to a file:

```python
@session.on("user_input_transcribed")
def on_transcript(event: UserInputTranscribedEvent):
    if event.is_final:
        transfer_done = getattr(session, 'agent_instance', None)
        if transfer_done is not None:
            is_transfer_done = transfer_done.transfer_done
        else:
            is_transfer_done = False
        timestamp = "warm" if is_transfer_done else "non-transfer"
        # Build the log line
        log_line = f"[{timestamp}]"
        if hasattr(event, "speaker_id") and event.speaker_id:
            log_line += f" (speaker: {event.speaker_id})"
        log_line += f" {event.transcript}\n"
        # Write to file
        with open("user_speech_log.txt", "a", encoding="utf-8") as f:
            f.write(log_line)
```
t
It looks like your approach to logging final transcripts to a file is on the right track! To make sure everything runs smoothly with handling final transcripts from Speechmatics STT, here are a few helpful things to double-check:

1. Make sure you're correctly checking the `event.is_final` property so you only log the final transcripts.
2. Use the `event.transcript` property to grab the transcript text.
3. Adding a timestamp for each log entry is a great idea for context; it looks like you're already doing that with the `timestamp` variable.
4. Double-check that the file path and permissions for `user_speech_log.txt` allow your program to write to it.

Here's a simple example to keep handy:
```python
from datetime import datetime

@session.on("user_input_transcribed")
def on_transcript(event):
    if event.is_final:
        timestamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
        log_line = f"[{timestamp}] {event.transcript}\n"
        with open("user_speech_log.txt", "a", encoding="utf-8") as f:
            f.write(log_line)
```
If you hit any bumps or want to chat more about this, just let me know—I’m here to help!
m
Even after all that, it is still giving one word as a separate speech:

```
2025-08-07 06:30:08,422 - DEBUG livekit.agents - received user transcript {"user_transcript": "Can", "language": "en"}
2025-08-07 06:30:08,822 - DEBUG livekit.agents - received user transcript {"user_transcript": "you?", "language": "en"}
2025-08-07 06:30:09,120 - DEBUG livekit.agents - received user transcript {"user_transcript": "Can I", "language": "en"}
2025-08-07 06:30:09,522 - DEBUG livekit.agents - received user transcript {"user_transcript": "know the", "language": "en"}
2025-08-07 06:30:09,904 - DEBUG livekit.agents - received user transcript {"user_transcript": "reason", "language": "en"}
2025-08-07 06:30:10,204 - DEBUG livekit.agents - received user transcript {"user_transcript": "for", "language": "en"}
2025-08-07 06:30:10,605 - DEBUG livekit.agents - received user transcript {"user_transcript": "that?", "language": "en"}
2025-08-07 06:30:14,216 - DEBUG livekit.agents - received user transcript {"user_transcript": "You", "language": "en"}
2025-08-07 06:30:16,311 - DEBUG livekit.agents - received user transcript {"user_transcript": ".", "language": "en"}
```
t
Hey there! Even after turning off partials and tweaking `max_delay` and `max_delay_mode`, if you're still seeing single words treated as separate speeches, it could be because of a few things:

1. Partial transcripts: Double-check that `enable_partials` is definitely set to `False` in your transcription config to stop those intermediate results from popping up.
2. Max delay: Make sure `max_delay` is set high enough (like 1.0 seconds) so there's enough time to process the audio before finalizing the transcript.
3. Speaker diarization: If this is on, it might be splitting speech into chunks for different speakers. You might want to disable or adjust `speaker_diarization_config` if that's not what you want.
4. Custom vocabulary: Sometimes custom vocab or entity settings can cause unexpected splits, so it's worth checking those too.

Here's a quick example of an updated configuration you can try out:
```python
my_stt = speechmatics.STT(
    transcription_config=speechmatics.types.TranscriptionConfig(
        enable_partials=False,
        max_delay=1.0,
        max_delay_mode="fixed",
        diarization="none",  # Disable speaker diarization if not needed
    )
)
```
If you’re still running into issues, it might be something happening after the transcripts come through. Feel free to share more about your setup or ask if you want me to help troubleshoot further! Sources: Speechmatics integration guide | LiveKit Docs | Speechmatics STT integration guide | LiveKit Docs
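As a concrete example of "something happening after the transcripts come through", here is a hedged application-level sketch that joins fragmented final transcripts into one utterance and only writes it out after a short silence gap. It assumes the same `session` and `user_input_transcribed` event fields (`is_final`, `transcript`) as your handler above; the gap length is just a starting point to tune.

```python
import threading

# Sketch only: collect fragmented *final* transcripts and flush them as one
# utterance once no new fragment has arrived for SILENCE_GAP_S seconds.
SILENCE_GAP_S = 1.5

_fragments = []
_flush_timer = None

def _flush_utterance():
    global _fragments
    if _fragments:
        utterance = " ".join(_fragments).strip()
        _fragments = []
        with open("user_speech_log.txt", "a", encoding="utf-8") as f:
            f.write(utterance + "\n")

@session.on("user_input_transcribed")
def on_transcript(event):
    global _flush_timer
    if not event.is_final:
        return
    _fragments.append(event.transcript)
    # Restart the silence timer every time a new fragment arrives.
    if _flush_timer is not None:
        _flush_timer.cancel()
    _flush_timer = threading.Timer(SILENCE_GAP_S, _flush_utterance)
    _flush_timer.start()
```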
m
Is there any way to apply diarization with other STTs?
t
Hey there! Just wanted to share that diarization can be used with some STT services in LiveKit. Here are a few options you might find helpful:

1. Deepgram: You can get diarization going by using the `diarize` option in the `STTOptions` interface.
2. Speechmatics: They make it easy by letting you set `diarization="speaker"` in the `transcription_config`.
3. Amazon Transcribe: Diarization is supported here too, though you might want to double-check the specific setup details.

Just a heads-up, Google Cloud STT doesn't support diarization right now. If you want a hand setting this up with any of these services, feel free to ask; I'm here to help!

Sources: Speechmatics STT integration guide | LiveKit Docs | Speechmatics STT integration guide | LiveKit Docs | Speech-to-text (STT) integrations | LiveKit Docs | STTOptions | LiveKit Agents
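For Deepgram, a minimal configuration sketch is below. It assumes your installed `livekit-plugins-deepgram` version forwards a `diarize` flag to Deepgram, as the `STTOptions` reference above suggests; please verify the parameter name against your plugin version before relying on it. If the plugin surfaces diarization, you would read the speaker label the same defensive way you already do in your handler (via `hasattr(event, "speaker_id")`).

```python
from livekit.plugins import deepgram

# Sketch only: `diarize=True` is an assumption based on the STTOptions
# reference above; confirm your livekit-plugins-deepgram version exposes it.
my_stt = deepgram.STT(
    model="nova-2",
    language="en",
    diarize=True,  # ask Deepgram to label words with speaker IDs
)
```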
m
Write code for Deepgram STT that applies diarization and writes the transcription with the speaker to a txt file.
t
I don't have the answer you're looking for. You could also try asking your question:
• in one of the other Slack channels, or
• to https://deepwiki.com/livekit/livekit_composite which is trained on all LiveKit source code

If you find the answer, please post it here to help others!
m
How do I use the diarize parameter with Deepgram?