# ask-ai
r
how do cloud agents work? It's essentially hosted / managed agent deployments right? How do I trigger a job for an outbound call agent that's hosted in this way?
t
Thanks for reaching out, Trevor. I'm an AI assistant and I'm here to help. Give me a moment while I research this topic.
Cloud agents in LiveKit are programmable, stateful processes that connect to a LiveKit server to handle things like voice interactions or telephony. For outbound call agents, you kick off a job by creating a dispatch, which starts the agent's job subprocess so it can join a room and place the call. Here's a simple way to trigger a job for an outbound call agent:

1. Make sure your agent is up and running and registered with the LiveKit server.
2. Use the LiveKit CLI to dispatch the agent, including metadata like the phone number and any other details you want to pass along.
3. The agent then grabs that metadata from the job context and starts the outbound call using the `CreateSIPParticipant` method (see the sketch after the CLI example).

For example, to dispatch the agent via the CLI, you'd run:
```shell
lk dispatch create \
  --new-room \
  --agent-name outbound-caller \
  --metadata '{"phone_number": "+1234567890", "transfer_to": "+9876543210"}'
```

This command creates a new room, dispatches the agent named "outbound-caller", and passes along the phone number to call in the metadata. The agent then uses that info to place the call through your configured SIP trunk.

Sources: Making calls using SIP | LiveKit Docs | agent.py | LiveKit Docs
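On the agent side, here's a rough sketch of that flow; it assumes the Python `livekit-agents` SDK, and the trunk ID (`ST_xxxx`) and participant identity are placeholders you'd replace with your own values:

```python
import json
from livekit import agents, api

async def entrypoint(ctx: agents.JobContext):
    # The dispatch metadata arrives as a JSON string on the job.
    info = json.loads(ctx.job.metadata)
    await ctx.connect()

    # Dial the callee through your configured SIP trunk.
    await ctx.api.sip.create_sip_participant(
        api.CreateSIPParticipantRequest(
            room_name=ctx.room.name,
            sip_trunk_id="ST_xxxx",  # placeholder: your outbound trunk ID
            sip_call_to=info["phone_number"],
            participant_identity="phone-callee",
        )
    )

if __name__ == "__main__":
    # agent_name must match the --agent-name used in the dispatch.
    agents.cli.run_app(
        agents.WorkerOptions(entrypoint_fnc=entrypoint, agent_name="outbound-caller")
    )
```

You'd typically start this worker with `python agent.py dev` (or `start` in production) so it registers with the server and can receive dispatches.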
r
I could do this through the api, right? How do I get the transcript out?
t
Hey there! You can trigger a dispatch for an outbound call agent using the LiveKit API. Here's a quick rundown:

1. Use the `agent_dispatch.create_dispatch` method to get the agent going, including metadata like the phone number you want to call.
2. Once your agent is dispatched, it uses the `create_sip_participant` method to place the outbound call through your configured SIP trunk.

Here's a simple example in Python to illustrate:
```python
from livekit import api

# Assumes LIVEKIT_URL, LIVEKIT_API_KEY, and LIVEKIT_API_SECRET are set in the environment.
lkapi = api.LiveKitAPI()

# 1. Dispatch the agent into a room, passing the phone number as metadata.
dispatch = await lkapi.agent_dispatch.create_dispatch(
    api.CreateAgentDispatchRequest(
        agent_name="outbound-caller",
        room="my-room",
        metadata='{"phone_number": "+1234567890"}'
    )
)

# 2. Place the outbound call by adding a SIP participant to the same room.
# (Typically the dispatched agent makes this call itself after reading the metadata.)
await lkapi.sip.create_sip_participant(
    api.CreateSIPParticipantRequest(
        room_name="my-room",
        sip_trunk_id="ST_xxxx",
        sip_call_to="+1234567890",
        participant_identity="caller"
    )
)

await lkapi.aclose()
```
About transcripts: LiveKit doesn't offer a direct API to grab them after the call ends. Instead, transcription happens live through the agent using a speech-to-text (STT) plugin like Deepgram, OpenAI, or Gladia. The plugin streams transcription data during the call, and you'll get the final transcripts as messages marked with `is_final: true`. Here's how you can get those transcripts:

1. Build some logic in your agent to catch and save the final utterances coming from the STT plugin.
2. Store those transcripts somewhere safe (a database, a file, or any external service) while the call is happening.

For example, you might handle final transcripts in your agent like this:
```python
from livekit.agents import stt

# stt_stream is an open stream from your STT plugin, e.g. stt_instance.stream().
# save_to_storage is your own persistence hook (database, file, etc.).
transcript = ""
async for event in stt_stream:
    if event.type == stt.SpeechEventType.FINAL_TRANSCRIPT:
        # Accumulate the best alternative from each finalized segment.
        transcript += event.alternatives[0].text + " "
    elif event.type == stt.SpeechEventType.END_OF_SPEECH:
        await save_to_storage(transcript)
        break
```
If you want me to walk through any part of this or need more details, just let me know!

Sources: livekit-plugins/livekit-plugins-gladia/livekit/plugins/gladia/stt.py | README.md | telephony/make_call/make_call.py | packages/sip--make-a-call/README.md | Making calls using SIP | LiveKit Docs
r
Does livekit have its own host that runs our agent? and can send the transcript to my api?
t
Just a quick heads-up: LiveKit doesn't host agents for you unless you're using LiveKit Cloud, which takes care of managing and scaling your agent deployments. If you prefer, you can always self-host agents on your own infrastructure. As for sending transcripts to your API: LiveKit supports real-time transcription through speech-to-text (STT) plugins, and you can add custom logic in your agent to forward those transcripts to an external API. Here's a simple way to do it:

• Tap into the `user_input_transcribed` event or the `conversation_item_added` callback to grab transcriptions as they happen (see the sketch below).
• Send those transcripts to your API using HTTP requests or webhooks.

You can also grab the full conversation history via `session.history` and send it over once the session wraps up. If you'd like help getting this set up, just let me know!

Sources: Transcriptions | LiveKit Docs | Transcription Applications | livekit/livekit_composite | DeepWiki | Session recording and transcripts | LiveKit Docs | LiveKit | AI voice agents | LiveKit Docs
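To make the first bullet concrete, here's a hedged sketch of an event handler that forwards final transcriptions as they arrive; the endpoint URL is a placeholder, `session` is the `AgentSession` from your existing agent code, and it assumes `aiohttp` is installed:

```python
import asyncio
import aiohttp

@session.on("user_input_transcribed")
def on_user_input_transcribed(event):
    # Only forward finalized transcriptions, not interim ones.
    if event.is_final:
        # Event handlers are synchronous, so schedule the HTTP call as a task.
        asyncio.create_task(post_transcript(event.transcript))

async def post_transcript(text: str):
    # Placeholder endpoint: replace with your own API.
    async with aiohttp.ClientSession() as http:
        await http.post("https://my-api.example.com/transcripts", json={"text": text})
```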
r
ok got it, so agents hosted on livekit cloud would just need to send the data over to my api
t
That's right! If your agents are hosted on LiveKit Cloud, you can set things up so the agent sends transcription data or any other info straight to your API using HTTP requests or webhooks. This way, everything connects smoothly with your other systems. Feel free to reach out if you'd like some help with setting this up or if you have any other questions—I’m here for you!
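If you'd rather ship everything at once instead of streaming it, one approach (a sketch under the same assumptions as before: placeholder URL, `aiohttp` available) is to register a shutdown callback that POSTs the session history to your service when the job ends:

```python
import aiohttp
from livekit import agents

async def entrypoint(ctx: agents.JobContext):
    # Configure the session with your STT/LLM/TTS as in your existing agent.
    session = agents.AgentSession()

    async def export_history():
        # session.history holds the full conversation as structured data.
        async with aiohttp.ClientSession() as http:
            await http.post(
                "https://my-api.example.com/transcripts",  # placeholder endpoint
                json=session.history.to_dict(),
            )

    # Runs when the job is shutting down, after the call has ended.
    ctx.add_shutdown_callback(export_history)
    await ctx.connect()
    # ... start the session with your Agent as usual
```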
r
so cloud agents is like a wrapper around compute, so i wouldn't need my own compute to run agents
t
Exactly! LiveKit Cloud takes care of all the heavy lifting by providing the infrastructure to host and manage your agents. That means you don’t have to worry about setting up or managing your own compute resources. It’s like having a managed environment where your agents can run smoothly, scale up when needed, and handle tasks like calls or transcriptions without a hitch. If you have any more questions or need a hand getting your agents set up, just let me know—I’m here to help!
r
on average, how many concurrent calls can a single agent have before needing another instance? I guess it depends on what each agent does
t
Just so you know, a single LiveKit agent instance usually handles 10 to 25 concurrent jobs when it's set up with 4 CPU cores and 8GB of memory. That said, the exact number really depends on what the agent is doing:

• Audio-only agents: can often support up to 30 concurrent sessions if the processing load is lighter.
• Video agents: usually handle fewer sessions, since video tasks need more resources, especially with browser automation or video generation involved.
• Agents with advanced features: things like noise cancellation, voice activity detection (VAD), or real-time LLMs will bring concurrency down a bit.

Good news: LiveKit Cloud has auto-scaling, so new instances spin up automatically when the current ones get busy. If you want to plan things out carefully, tools like `lk load-test` can help you benchmark your specific agent setup (example below). Feel free to reach out if you want a hand with load testing or figuring out the best scaling approach!

Sources: Deploying to custom environments | LiveKit Docs | An open source stack for real-time multimodal AI | LiveKit Pricing
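For reference, a benchmark run might look something like this. Flag names can vary across CLI versions, so treat this as illustrative and check `lk load-test --help`:

```shell
lk load-test \
  --room benchmark-room \
  --audio-publishers 10 \
  --subscribers 50 \
  --duration 2m
```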