I want to use a before_tts_cb to only respond with...
# ask-ai
a
I want to use a before_tts_cb to only respond with what is between some xml tags. (e.g. <response></response>). How can I apply that? Is there a tokenizer function to do so?
g
To achieve this, you can use the
before_tts_cb
callback to parse the text and extract content between specific XML tags before sending it to the Text-to-Speech (TTS) system. You can implement a custom function to handle this parsing. Here's a basic example:
Copy code
async def before_tts_cb(assistant, text):
    if isinstance(text, str):
        return extract_response_content(text)
    else:
        async def _process_stream():
            async for chunk in text:
                yield extract_response_content(chunk)
        return _process_stream()

def extract_response_content(text):
    import re
    match = re.search(r'<response>(.*?)</response>', text, re.DOTALL)
    return match.group(1) if match else ''
This code uses a regular expression to find and extract content between
<response>
and
</response>
tags. The
before_tts_cb
is set up to handle both string and asynchronous iterable inputs, which is necessary for handling streamed text. Would you like more details on how to integrate this into your LiveKit setup? Sources: - https://livekit-users.slack.com/archives/C07FY8WHGPM/p1730644646.046579 - https://livekit-users.slack.com/archives/C07FY8WHGPM/p1733426789.731689 👍 1 button 👎 0 button 💡 Hint: Mention @millions-hair-24956 in the thread for followups.
a
I need it to just say what is between response tags. I have other tags like thinking that it is still saying and i need it to not say them. Also, Inside that function i also need to do this to the end result: def *replace_variables*(_content_: str) -> str: return tokenize.utils.replace_words( text=content, replacements={ "monto_deuda": amount_in_words, "nombre_deudor": name, } ) @millions-hair-24956
g
It might be possible to achieve this by combining the extraction of content between
<response>
tags with the replacement of specific variables using the
replace_words
function. Here's how you could implement this:
Copy code
from livekit.agents.tokenize.utils import replace_words

async def before_tts_cb(assistant, text):
    if isinstance(text, str):
        content = extract_response_content(text)
        return replace_variables(content)
    else:
        async def _process_stream():
            async for chunk in text:
                content = extract_response_content(chunk)
                yield replace_variables(content)
        return _process_stream()

def extract_response_content(text):
    import re
    match = re.search(r'<response>(.*?)</response>', text, re.DOTALL)
    return match.group(1) if match else ''

def replace_variables(content):
    return replace_words(
        text=content,
        replacements={
            "monto_deuda": "amount_in_words",
            "nombre_deudor": "name",
        }
    )
This code first extracts the content between
<response>
tags and then replaces specified variables using the
replace_words
function. The
before_tts_cb
function is designed to handle both string and asynchronous iterable inputs. Would you like more detailed guidance on integrating this into your LiveKit setup? Sources: - https://docs.livekit.io/python/livekit/agents/tokenize/utils.html - https://docs.livekit.io/python/livekit/agents/pipeline/pipeline_agent.html#args Note: I wasn’t able to find highly relevant data sources, but above are a few potentially relevant links. 👍 0 button 👎 0 button 💡 Hint: Mention @runllm in the thread for followups.
a
@millions-hair-24956 since the streaming audio does not have the complete tags it is just saying the full text
g
It seems likely that the issue arises because the text is being processed in chunks, and the
<response>
tags might not be fully contained within a single chunk. To handle this, you can buffer the incoming text until a complete
<response>
tag is detected. Here's an approach to achieve this:
Copy code
from livekit.agents.tokenize.utils import replace_words

async def before_tts_cb(assistant, text):
    buffer = ""

    async def _process_stream():
        nonlocal buffer
        async for chunk in text:
            buffer += chunk
            while True:
                start_idx = buffer.find('<response>')
                end_idx = buffer.find('</response>', start_idx)
                if start_idx != -1 and end_idx != -1:
                    response_content = buffer[start_idx + 10:end_idx]
                    buffer = buffer[end_idx + 11:]
                    yield replace_variables(response_content)
                else:
                    break

    return _process_stream()

def replace_variables(content):
    return replace_words(
        text=content,
        replacements={
            "monto_deuda": "amount_in_words",
            "nombre_deudor": "name",
        }
    )
This code buffers the incoming text and processes it only when a complete
<response>
tag is detected. It then extracts the content and applies the variable replacements. This should ensure that only the content within
<response>
tags is spoken. Would you like more details on how to implement this in your LiveKit setup? Sources: - https://docs.livekit.io/python/livekit/agents/pipeline/index.html#args - https://docs.livekit.io/python/livekit/agents/voice_assistant/index.html#args 👍 0 button 👎 0 button 💡 Hint: Mention @runllm in the thread for followups.