flaky-hospital-13625
08/14/2025, 5:40 AM
tall-airline-16756
08/14/2025, 5:40 AM
You can run multiple LLM calls in parallel by combining the parallel_tool_calls parameter in your LLM setup with Python's asyncio library for asynchronous execution. Let me walk you through it:
1. Enable Parallel Tool Calls: First up, when you create your LLM instance, set parallel_tool_calls=True. This lets the model request multiple tool calls in a single response, so they can be executed at the same time:
from livekit.plugins import openai

llm = openai.LLM.with_fireworks(
    model="accounts/fireworks/models/llama-v3p3-70b-instruct",
    temperature=0.7,
    parallel_tool_calls=True,  # enables parallel tool calls
)
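Parallel tool calls only come into play when the model has more than one tool it could invoke in a single turn. As a rough sketch of how that might look, assuming the Agent and function_tool helpers from livekit-agents (the tool names and bodies here are illustrative, not from the integration guide):

from livekit.agents import Agent, function_tool

# Hypothetical tools: with parallel_tool_calls=True the model may request
# both of these in one response so they can be executed concurrently.
@function_tool
async def get_weather(city: str) -> str:
    """Look up the current weather for a city."""
    return f"Sunny in {city}"

@function_tool
async def get_local_time(city: str) -> str:
    """Look up the local time for a city."""
    return f"10:00 AM in {city}"

agent = Agent(
    instructions="You are a helpful assistant.",
    llm=llm,  # the Fireworks LLM configured above
    tools=[get_weather, get_local_time],
)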
2. Define Asynchronous Functions for LLM Calls: Next, create async functions to handle each LLM call individually. This keeps things neat and efficient:
from livekit.agents.llm import ChatContext

async def perform_llm_call(prompt: str) -> str:
    # build a one-message chat context for this prompt
    chat_ctx = ChatContext()
    chat_ctx.add_message(role="user", content=prompt)

    response = ""
    # llm.chat() streams the completion; collect the text deltas as they arrive
    async with llm.chat(chat_ctx=chat_ctx) as stream:
        async for chunk in stream:
            if chunk.delta and chunk.delta.content:
                response += chunk.delta.content
    return response
3. Execute Multiple Calls in Parallel: Now, use asyncio.gather()
to run all those async calls together. It’s like having a team of helpers working at once:
import asyncio

async def main():
    prompts = ["Question 1", "Question 2", "Question 3"]
    tasks = [perform_llm_call(prompt) for prompt in prompts]
    results = await asyncio.gather(*tasks)
    return results

if __name__ == "__main__":
    results = asyncio.run(main())
    print(results)  # contains responses from all prompts
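One thing to note: by default asyncio.gather() raises the first exception it hits and you lose the other results. If you'd rather collect whatever succeeds, here's a small sketch that reuses perform_llm_call from above (the 30-second timeout is just an illustrative value):

import asyncio

async def main_with_error_handling():
    prompts = ["Question 1", "Question 2", "Question 3"]
    # give each call a time limit so one slow request can't stall the whole batch
    tasks = [asyncio.wait_for(perform_llm_call(p), timeout=30) for p in prompts]
    # return_exceptions=True keeps the successful results even if one call fails
    results = await asyncio.gather(*tasks, return_exceptions=True)
    for prompt, result in zip(prompts, results):
        if isinstance(result, Exception):
            print(f"{prompt!r} failed: {result}")
        else:
            print(f"{prompt!r} -> {result}")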
And that’s it! This way, you’re making the most of both the LLM’s parallel capabilities and Python’s async features to get things done efficiently. Just make sure your environment supports async execution, and keep an eye on resource usage so everything runs smoothly. If you have any questions or want to chat more about this, I’m here to help!
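On the resource-usage point: if you have a lot of prompts, an asyncio.Semaphore is an easy way to cap how many LLM calls are in flight at once. A minimal sketch (the limit of 5 is arbitrary):

import asyncio

MAX_CONCURRENT_CALLS = 5  # arbitrary cap on simultaneous LLM requests
semaphore = asyncio.Semaphore(MAX_CONCURRENT_CALLS)

async def bounded_llm_call(prompt: str) -> str:
    # only MAX_CONCURRENT_CALLS coroutines can hold the semaphore at a time
    async with semaphore:
        return await perform_llm_call(prompt)

async def run_many(prompts: list[str]) -> list[str]:
    return await asyncio.gather(*(bounded_llm_call(p) for p in prompts))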
Sources: Fireworks AI LLM integration guide | LiveKit Docs | livekit-plugins/livekit-plugins-baseten/livekit/plugins/baseten/llm.py