# ask-ai
a
explain how the word tokenizer and sentence tokenizer work in transcription in the voice pipeline
t
Thanks for reaching out, Harshita. I'm an AI assistant and I'm here to help. Give me a moment while I research this topic.
In the LiveKit Agents JS transcription pipeline, the word tokenizer and sentence tokenizer play distinct roles:

- **Sentence tokenizer:** splits speech into sentences and determines when to mark a transcript as final for agent transcription. It is specified in the `AgentTranscriptionOptions` interface as `sentenceTokenizer: tokenize.SentenceTokenizer`.
- **Word tokenizer:** splits speech into words and is used to simulate the "interim results" of the agent transcription. It is specified as `wordTokenizer: tokenize.WordTokenizer`.

Both tokenizers are essential for processing and finalizing transcriptions effectively.

Sources: AgentTranscriptionOptions | LiveKit Agents | livekit.agents.pipeline.pipeline_agent API documentation
a
so is it used before the LLM or after the LLM in the voice pipeline?
t
The word tokenizer and sentence tokenizer configured in `AgentTranscriptionOptions` are applied after the LLM in the voice pipeline: they operate on the agent's generated text (the LLM output being spoken by TTS), not on the user's speech. Their roles are as follows: • Sentence Tokenizer: segments the generated text into sentences so each agent transcript segment can be marked final once a complete sentence is available. • Word Tokenizer: breaks that text into individual words to simulate interim transcription results, synchronized with the TTS audio, before a complete sentence is formed. (User speech is transcribed separately by the STT stage, which runs before the LLM.) Sources: livekit.agents.multimodal API documentation
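The interim-vs-final distinction can be sketched in plain Python. The `emit_transcripts` helper below is hypothetical (it is a simulation of the event flow, not a LiveKit API): it grows an interim transcript one word at a time, then emits a final event per complete sentence.

```python
import re

def emit_transcripts(text: str) -> list[tuple[str, str]]:
    """Simulate agent transcription events: interim results word by word,
    then a final event once a full sentence is available."""
    events = []
    for sentence in re.findall(r"[^.!?]+[.!?]?", text):
        sentence = sentence.strip()
        if not sentence:
            continue
        partial = []
        for word in sentence.split():
            partial.append(word)
            # Interim result: grows one word at a time.
            events.append(("interim", " ".join(partial)))
        # Final result: the complete sentence.
        events.append(("final", sentence))
    return events

events = emit_transcripts("Hello there. How are you?")
# events[0] == ("interim", "Hello"); events[-1] == ("final", "How are you?")
```

In the real pipeline these events are paced to match the TTS audio (see `agent_transcription_speed`), so the transcript appears roughly as the agent speaks.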
I'm sorry, but I don't have the information you need at the moment. If you would like a member of the LiveKit team to respond, please ask your question in one of the other channels.
a
```python
agent = VoicePipelineAgent(
    vad=ctx.proc.userdata["vad"],
    stt=ctx.proc.userdata["stt"],
    llm=ctx.proc.userdata["llm"],
    tts=ctx.proc.userdata["tts"],  # Using the prewarmed OpenAI TTS
    min_endpointing_delay=0.0001,
    interrupt_min_words=0,
    # noise_cancellation=noise_cancellation.NC(),
    # max_endpointing_delay=3,
    # allow_interruptions=True,
    # interrupt_speech_duration=0.5,
    chat_ctx=initial_ctx,
    preemptive_synthesis=True,
    # before_llm_cb=truncate_context,
    transcription=AgentTranscriptionOptions(
        user_transcription=True,  # Enable user STT transcriptions
        agent_transcription=True,  # Enable AI-generated transcriptions
        agent_transcription_speed=1.0,  # Control transcription processing speed
        sentence_tokenizer=ctx.proc.userdata["sentence_tokenizer"],
        word_tokenizer=ctx.proc.userdata["word_tokenizer"],
        hyphenate_word=lambda word: word.replace("-", " "),  # Optional: modify word tokenization rules
    ),
    # before_tts_cb=_before_tts_cb,
    fnc_ctx=CallActions(api=ctx.api, participant=participant, room=ctx.room),
)
```

is this the right way I'm using it?
what are interim results?