mammoth-smartphone-64668
02/28/2025, 3:57 PM

gray-monkey-83525
02/28/2025, 3:57 PM
You can use the before_llm_cb callback in the VoicePipelineAgent to modify the chat context before it is sent to the LLM. This allows you to inject your own transcriptions into the conversation flow.
According to the VoicePipelineAgent documentation, you can set up a before_llm_cb callback to intercept and modify the chat context. This way, you can insert the text from your system into the chat context before it reaches the LLM.
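For illustration, here is a minimal sketch of such a callback. It assumes the livekit-agents 0.x API (ChatContext.append with keyword-only role and text arguments); fetch_external_transcription is a hypothetical stand-in for your own transcription source:

```python
from livekit.agents.llm import ChatContext
from livekit.agents.pipeline import VoicePipelineAgent

def fetch_external_transcription() -> str:
    # Hypothetical stand-in for your own transcription system.
    return "text produced by your external STT"

def before_llm_cb(agent: VoicePipelineAgent, chat_ctx: ChatContext) -> ChatContext:
    # Append the externally produced transcription as a user message
    # before the context is forwarded to the LLM.
    chat_ctx.append(role="user", text=fetch_external_transcription())
    return chat_ctx
```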
Would you like more details on how to implement this?
Sources:
- VoicePipelineAgent documentation

mammoth-smartphone-64668
02/28/2025, 4:00 PM

gray-monkey-83525
02/28/2025, 4:00 PM
To avoid unnecessary transcription and the associated costs, you can modify the VoicePipelineAgent so it does not run its own STT. You could create a custom subclass of the VoicePipelineAgent that bypasses the STT process, or configure the agent to not use STT at all.
According to a Slack discussion, one approach is to remove or replace the TTS component with a no-op (no operation) subclass, which suggests a similar approach could be applied to the STT component. This would allow you to use your own transcription system without incurring costs from LiveKit's STT.
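As a rough, untested sketch of that idea, a no-op STT might look like the following. This assumes the livekit-agents 0.x stt interfaces (stt.STT, STTCapabilities, SpeechEvent, SpeechEventType, SpeechData); the abstract method name and signature, notably _recognize_impl, differ between releases, so verify against your installed version:

```python
from livekit.agents import stt, utils

class NoOpSTT(stt.STT):
    """STT stand-in that never produces transcriptions.
    Assumption: 0.x-era interface; check your version's stt.STT."""

    def __init__(self):
        super().__init__(
            capabilities=stt.STTCapabilities(streaming=False, interim_results=False)
        )

    async def _recognize_impl(self, buffer: utils.AudioBuffer, *, language=None):
        # Return an empty final transcript so the pipeline has nothing to act on.
        return stt.SpeechEvent(
            type=stt.SpeechEventType.FINAL_TRANSCRIPT,
            alternatives=[stt.SpeechData(language=language or "en", text="")],
        )
```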
Would you like more detailed guidance on how to implement this?
Sources:
- Slack discussion

mammoth-smartphone-64668
02/28/2025, 4:06 PM

gray-monkey-83525
02/28/2025, 4:06 PM
To use your own transcription system, you can subclass the VoicePipelineAgent to exclude the STT component. Here's a step-by-step guide on how you might achieve this:
1. Subclass the VoicePipelineAgent: Create a custom subclass of VoicePipelineAgent that omits the STT component. This involves overriding the initialization to exclude STT-related setup.
2. Modify the Initialization: In your subclass, remove or bypass the STT initialization. You can set the STT parameter to None or a no-op equivalent if the framework allows.
3. Inject Custom Transcriptions: Use the before_llm_cb callback to inject your transcriptions into the chat context. This callback allows you to modify the chat context before it is sent to the LLM.
4. Example Code:
```python
from livekit.agents.pipeline import VoicePipelineAgent
from livekit.plugins import cartesia, openai, silero

class CustomVoicePipelineAgent(VoicePipelineAgent):
    def __init__(self, vad, llm, tts, chat_ctx=None, **kwargs):
        # Pass stt=None so the pipeline skips its own transcription step
        # (swap in a no-op STT if your version requires a valid instance).
        super().__init__(vad=vad, stt=None, llm=llm, tts=tts, chat_ctx=chat_ctx, **kwargs)

def custom_before_llm_cb(agent, chat_ctx):
    # Inject your transcription here
    chat_ctx.append(role="user", text="Your custom transcription")
    return chat_ctx

# Usage
agent = CustomVoicePipelineAgent(
    vad=silero.VAD.load(),
    llm=openai.LLM(),
    tts=cartesia.TTS(),
    before_llm_cb=custom_before_llm_cb,
)
```
5. Test and Validate: Ensure that your custom agent correctly processes the transcriptions from your system and interacts with the LLM and TTS as expected.
This approach allows you to leverage your existing transcription system while still utilizing LiveKit's LLM and TTS capabilities. If you encounter any specific issues during implementation, feel free to ask for further assistance.
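If it helps, here is a sketch of how the custom agent from the example above could be wired into a worker entrypoint. This follows the usual livekit-agents 0.x worker pattern (JobContext, agent.start, cli.run_app); adapt it to your deployment:

```python
from livekit.agents import AutoSubscribe, JobContext, WorkerOptions, cli
from livekit.plugins import cartesia, openai, silero

async def entrypoint(ctx: JobContext):
    # Subscribe to audio only; video tracks are not needed here.
    await ctx.connect(auto_subscribe=AutoSubscribe.AUDIO_ONLY)
    participant = await ctx.wait_for_participant()

    # Reuses CustomVoicePipelineAgent and custom_before_llm_cb
    # from the example above.
    agent = CustomVoicePipelineAgent(
        vad=silero.VAD.load(),
        llm=openai.LLM(),
        tts=cartesia.TTS(),
        before_llm_cb=custom_before_llm_cb,
    )
    # With stt=None, the agent relies on the transcriptions injected in
    # before_llm_cb rather than on its own STT.
    agent.start(ctx.room, participant)

if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))
```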
Sources:
- https://livekit-users.slack.com/archives/C07FY8WHGPM/p1729597029.187099
- https://livekit-users.slack.com/archives/C07FY8WHGPM/p1729541882.418149