# ask-ai
s
Hey I have to toggle between text and voice in the client to allow text only chats with the agent or voice chats. I found an example in livekit's github:
```python
import logging

from dotenv import load_dotenv

from livekit import rtc
from livekit.agents import Agent, AgentSession, JobContext, RoomIO, WorkerOptions, cli
from livekit.plugins import openai

logger = logging.getLogger("toggle-io")
logger.setLevel(logging.INFO)

load_dotenv()

## This example demonstrates a more complex application that allows the user to
## toggle audio and text input/output on the fly.
## The example makes use of LiveKit's RPC system to exchange messages between the
## client and the server.


async def entrypoint(ctx: JobContext):
    await ctx.connect()

    session = AgentSession(llm=openai.realtime.RealtimeModel())
    room_io = RoomIO(session, room=ctx.room)
    await room_io.start()
    await session.start(
        agent=Agent(
            instructions="You are a helpful assistant that interfaces with the user via voice."
        )
    )

    # Links the session's input/output to the given participant
    # (defaults to the RPC caller if no payload is provided).
    @ctx.room.local_participant.register_rpc_method("set_participant")
    async def on_set_participant(data: rtc.RpcInvocationData) -> None:
        target_identity = data.payload or data.caller_identity
        logger.info(
            "set participant called",
            extra={
                "caller_identity": data.caller_identity,
                "payload": data.payload,
                "target_identity": target_identity,
            },
        )
        room_io.set_participant(target_identity)

    @ctx.room.local_participant.register_rpc_method("unset_participant")
    async def on_unset_participant(data: rtc.RpcInvocationData) -> None:
        logger.info(
            "unset participant called",
            extra={"caller_identity": data.caller_identity, "payload": data.payload},
        )
        room_io.unset_participant()

    @ctx.room.local_participant.register_rpc_method("toggle_input")
    async def on_toggle_input(data: rtc.RpcInvocationData) -> None:
        logger.info(
            "toggle input called",
            extra={"caller_identity": data.caller_identity, "payload": data.payload},
        )
        if data.payload == "audio_on":
            session.input.set_audio_enabled(True)
        elif data.payload == "audio_off":
            session.input.set_audio_enabled(False)

    @ctx.room.local_participant.register_rpc_method("toggle_output")
    async def on_toggle_output(data: rtc.RpcInvocationData) -> None:
        logger.info(
            "toggle output called",
            extra={"caller_identity": data.caller_identity, "payload": data.payload},
        )
        if data.payload == "audio_on":
            session.output.set_audio_enabled(True)
        elif data.payload == "audio_off":
            session.output.set_audio_enabled(False)
        elif data.payload == "transcription_on":
            session.output.set_transcription_enabled(True)
        elif data.payload == "transcription_off":
            session.output.set_transcription_enabled(False)


if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))
```
how should I manage this in the client?
t
Thanks for reaching out, Roman. I'm an AI assistant and I'm here to help. Give me a moment while I research this topic.
The client should manage toggling between text and voice input/output by calling the RPC methods registered on the server with the appropriate payloads:
• To toggle voice input, call the "toggle_input" RPC with payload "audio_on" or "audio_off" to enable or disable audio input.
• To toggle voice output, call the "toggle_output" RPC with payload "audio_on" or "audio_off" to enable or disable audio output.
• To toggle transcription (text output), call the "toggle_output" RPC with payload "transcription_on" or "transcription_off" to enable or disable transcription.
This way, the client sends commands via RPC to control audio and text modes dynamically in the LiveKit session. Example: if the client wants to enable audio input, it invokes the "toggle_input" RPC with "audio_on" as the payload. This mechanism follows directly from the server example you shared.
Sources: examples/voice_agents/push_to_talk.py
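For instance, each of those maps to a single RPC call from the client. A minimal Swift sketch, assuming the Swift SDK's performRpc (available in recent client-sdk-swift releases; verify against your version) and a placeholder agent identity:

```swift
import LiveKit

// Minimal sketch: enable the agent's audio input from a connected Room.
// "agent" is a placeholder; use your agent participant's actual identity.
func enableAgentAudioInput(room: Room) async throws {
    _ = try await room.localParticipant.performRpc(
        destinationIdentity: Participant.Identity(from: "agent"),
        method: "toggle_input",
        payload: "audio_on"
    )
}
```

The same pattern covers "toggle_output" with the "audio_*" and "transcription_*" payloads.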
s
what are the methods used in livekit sdk client to do this?
t
Hey there! Let me break down the RPC methods the client uses to control audio and transcription:
• "toggle_input" with payload "audio_on" or "audio_off" to enable or disable audio input.
• "toggle_output" with payload "audio_on" or "audio_off" to enable or disable audio output.
• "toggle_output" with payload "transcription_on" or "transcription_off" to enable or disable transcription output.
Think of these RPC calls as switches for your audio and transcription settings. From the client, each is invoked with the SDK's RPC call (performRpc in the JS/Swift SDKs), as sketched above. Behind the scenes, they translate to the server-side methods session.input.set_audio_enabled(True/False), session.output.set_audio_enabled(True/False), and session.output.set_transcription_enabled(True/False). You can change these settings on the fly during a session, giving you total flexibility.
Sources: examples/voice_agents/push_to_talk.py
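One way to keep those method and payload strings from drifting between client and server is to centralize them on the client side; a small illustrative Swift enum (not part of the SDK, names are made up):

```swift
// Illustrative only: mirrors the method/payload strings the server example matches on.
enum AgentIOCommand {
    case inputAudio(enabled: Bool)
    case outputAudio(enabled: Bool)
    case transcription(enabled: Bool)

    // RPC method name registered by the agent.
    var method: String {
        switch self {
        case .inputAudio: return "toggle_input"
        case .outputAudio, .transcription: return "toggle_output"
        }
    }

    // Payload string the agent's handler switches on.
    var payload: String {
        switch self {
        case .inputAudio(let on), .outputAudio(let on):
            return on ? "audio_on" : "audio_off"
        case .transcription(let on):
            return on ? "transcription_on" : "transcription_off"
        }
    }
}
```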
s
How can I do this in iOS using the LiveKit SDK in Swift. Give me an example.
t
I don't have the answer you're looking for. You could also try asking your question:
• in one of the other Slack channels, or
• to https://deepwiki.com/livekit/livekit_composite which is trained on all LiveKit source code
If you find the answer, please post it here to help others!
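In case it helps others: a minimal iOS sketch, assuming the Swift SDK's performRpc and setMicrophone(enabled:) (verify both against your client-sdk-swift version) and the RPC handlers from the server example above; the agent identity is a placeholder:

```swift
import LiveKit

// Sketch: switch the agent between voice mode and text-only mode.
// Assumes `room` is already connected and the agent registered the
// "toggle_input"/"toggle_output" RPC handlers shown earlier.
func setTextOnlyMode(room: Room, agentIdentity: String, textOnly: Bool) async throws {
    let agent = Participant.Identity(from: agentIdentity)

    // Stop/start the agent consuming audio input.
    _ = try await room.localParticipant.performRpc(
        destinationIdentity: agent,
        method: "toggle_input",
        payload: textOnly ? "audio_off" : "audio_on"
    )

    // Stop/start the agent speaking.
    _ = try await room.localParticipant.performRpc(
        destinationIdentity: agent,
        method: "toggle_output",
        payload: textOnly ? "audio_off" : "audio_on"
    )

    // Keep transcription (text output) on so text chat keeps working.
    _ = try await room.localParticipant.performRpc(
        destinationIdentity: agent,
        method: "toggle_output",
        payload: "transcription_on"
    )

    // Mute/unmute the local microphone so no audio is published in text mode.
    try await room.localParticipant.setMicrophone(enabled: !textOnly)
}
```

Sending the chat messages themselves is a separate concern (e.g. LiveKit's text/chat features); this sketch only covers the mode toggles.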