I want to use a before tts cb to only respond with what is b LiveKit Community #ask-ai

I want to use a before_tts_cb to only respond with...

ancient-judge-59849

02/24/2025, 5:38 PM

I want to use a before_tts_cb to only respond with what is between some xml tags. (e.g. <response></response>). How can I apply that? Is there a tokenizer function to do so?

gray-monkey-83525

02/24/2025, 5:38 PM

To achieve this, you can use the

before_tts_cb

callback to parse the text and extract content between specific XML tags before sending it to the Text-to-Speech (TTS) system. You can implement a custom function to handle this parsing. Here's a basic example:

Copy code

async def before_tts_cb(assistant, text):
    if isinstance(text, str):
        return extract_response_content(text)
    else:
        async def _process_stream():
            async for chunk in text:
                yield extract_response_content(chunk)
        return _process_stream()

def extract_response_content(text):
    import re
    match = re.search(r'<response>(.*?)</response>', text, re.DOTALL)
    return match.group(1) if match else ''

This code uses a regular expression to find and extract content between

<response>

and

</response>

tags. The

before_tts_cb

is set up to handle both string and asynchronous iterable inputs, which is necessary for handling streamed text. Would you like more details on how to integrate this into your LiveKit setup? Sources: - https://livekit-users.slack.com/archives/C07FY8WHGPM/p1730644646.046579 - https://livekit-users.slack.com/archives/C07FY8WHGPM/p1733426789.731689 👍 1 button 👎 0 button 💡 Hint: Mention @millions-hair-24956 in the thread for followups.

ancient-judge-59849

02/24/2025, 6:18 PM

I need it to just say what is between response tags. I have other tags like thinking that it is still saying and i need it to not say them. Also, Inside that function i also need to do this to the end result: def *replace_variables*(_content_: str) -> str: return tokenize.utils.replace_words( text=content, replacements={ "monto_deuda": amount_in_words, "nombre_deudor": name, } ) @millions-hair-24956

gray-monkey-83525

02/24/2025, 6:18 PM

It might be possible to achieve this by combining the extraction of content between

<response>

tags with the replacement of specific variables using the

replace_words

function. Here's how you could implement this:

Copy code

from livekit.agents.tokenize.utils import replace_words

async def before_tts_cb(assistant, text):
    if isinstance(text, str):
        content = extract_response_content(text)
        return replace_variables(content)
    else:
        async def _process_stream():
            async for chunk in text:
                content = extract_response_content(chunk)
                yield replace_variables(content)
        return _process_stream()

def extract_response_content(text):
    import re
    match = re.search(r'<response>(.*?)</response>', text, re.DOTALL)
    return match.group(1) if match else ''

def replace_variables(content):
    return replace_words(
        text=content,
        replacements={
            "monto_deuda": "amount_in_words",
            "nombre_deudor": "name",
        }
    )

This code first extracts the content between

<response>

tags and then replaces specified variables using the

replace_words

function. The

before_tts_cb

function is designed to handle both string and asynchronous iterable inputs. Would you like more detailed guidance on integrating this into your LiveKit setup? Sources: - https://docs.livekit.io/python/livekit/agents/tokenize/utils.html - https://docs.livekit.io/python/livekit/agents/pipeline/pipeline_agent.html#args Note: I wasn’t able to find highly relevant data sources, but above are a few potentially relevant links. 👍 0 button 👎 0 button 💡 Hint: Mention @runllm in the thread for followups.

ancient-judge-59849

02/24/2025, 7:05 PM

@millions-hair-24956 since the streaming audio does not have the complete tags it is just saying the full text

gray-monkey-83525

02/24/2025, 7:05 PM

It seems likely that the issue arises because the text is being processed in chunks, and the

<response>

tags might not be fully contained within a single chunk. To handle this, you can buffer the incoming text until a complete

<response>

tag is detected. Here's an approach to achieve this:

Copy code

from livekit.agents.tokenize.utils import replace_words

async def before_tts_cb(assistant, text):
    buffer = ""

    async def _process_stream():
        nonlocal buffer
        async for chunk in text:
            buffer += chunk
            while True:
                start_idx = buffer.find('<response>')
                end_idx = buffer.find('</response>', start_idx)
                if start_idx != -1 and end_idx != -1:
                    response_content = buffer[start_idx + 10:end_idx]
                    buffer = buffer[end_idx + 11:]
                    yield replace_variables(response_content)
                else:
                    break

    return _process_stream()

def replace_variables(content):
    return replace_words(
        text=content,
        replacements={
            "monto_deuda": "amount_in_words",
            "nombre_deudor": "name",
        }
    )

This code buffers the incoming text and processes it only when a complete

<response>

tag is detected. It then extracts the content and applies the variable replacements. This should ensure that only the content within

<response>

tags is spoken. Would you like more details on how to implement this in your LiveKit setup? Sources: - https://docs.livekit.io/python/livekit/agents/pipeline/index.html#args - https://docs.livekit.io/python/livekit/agents/voice_assistant/index.html#args 👍 0 button 👎 0 button 💡 Hint: Mention @runllm in the thread for followups.

Open in Slack

Previous Next