# ask-ai
how to make a parallel llm call from a livekit python worker sub-process
Thanks for reaching out, Amjad. I'm an AI assistant and I'm here to help. Give me a moment while I research this topic.
Hey there! If you want to make parallel LLM calls from a LiveKit Python worker sub-process, you can totally do that by combining the `parallel_tool_calls` parameter in your LLM setup with Python's handy `asyncio` library for concurrent execution. Let me walk you through it:

1. Enable Parallel Tool Calls: First up, when you create your LLM instance, set `parallel_tool_calls=True`. This lets the model request multiple tool calls in a single response, so they can then be executed at the same time:
```python
from livekit.plugins import openai

llm = openai.LLM.with_fireworks(
    model="accounts/fireworks/models/llama-v3p3-70b-instruct",
    temperature=0.7,
    parallel_tool_calls=True,  # allow the model to request multiple tool calls at once
)
```
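One thing worth noting: `parallel_tool_calls` only pays off when the model has more than one tool it could call in a single turn. Here's a rough sketch of what that might look like; it follows the livekit-agents 1.x `Agent`/`function_tool` pattern, and the tool bodies are just made-up stubs:

```python
from livekit.agents import Agent, RunContext, function_tool


class Assistant(Agent):
    def __init__(self) -> None:
        super().__init__(instructions="You are a helpful assistant.")

    @function_tool()
    async def lookup_weather(self, context: RunContext, city: str) -> str:
        """Called when the user asks about the weather in a city."""
        return f"It is sunny in {city}."  # stub; a real tool would call a weather API

    @function_tool()
    async def lookup_time(self, context: RunContext, city: str) -> str:
        """Called when the user asks for the local time in a city."""
        return f"It is 12:00 in {city}."  # stub; a real tool would resolve the timezone
```

With `parallel_tool_calls=True`, a question like "what's the weather and the time in Tokyo?" can produce both tool calls in one model turn instead of two round trips.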
2. Define an Async Function for Each LLM Call: Next, create an async function that handles a single LLM call from start to finish. This keeps things neat and makes it easy to run many calls at once (heads up: the exact `ChatContext` and chunk fields differ slightly between livekit-agents versions):
```python
from livekit.agents import llm as agents_llm


async def perform_llm_call(prompt: str) -> str:
    # Build a chat context with a single user message (ChatContext's
    # exact methods differ slightly across livekit-agents versions)
    chat_ctx = agents_llm.ChatContext()
    chat_ctx.add_message(role="user", content=prompt)
    response = ""
    async with llm.chat(chat_ctx=chat_ctx) as stream:  # async stream of chunks
        async for chunk in stream:
            if chunk.delta and chunk.delta.content:
                response += chunk.delta.content
    return response
```
3. Execute Multiple Calls in Parallel: Now, use `asyncio.gather()` to run all those async calls together. It’s like having a team of helpers working at once:
```python
import asyncio


async def main():
    prompts = ["Question 1", "Question 2", "Question 3"]
    # Start every LLM call and wait for all of them to finish concurrently
    tasks = [perform_llm_call(prompt) for prompt in prompts]
    results = await asyncio.gather(*tasks)
    return results


if __name__ == "__main__":
    # For a standalone script; inside an already-running event loop
    # (like an agent's async entrypoint) just `await main()` instead
    results = asyncio.run(main())
    print(results)  # Contains responses from all prompts
```
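And since you asked specifically about a worker sub-process: each LiveKit agent job runs in its own sub-process with an event loop already running, so inside the job you'd await `asyncio.gather()` directly from your entrypoint rather than calling `asyncio.run()`. Here's a rough sketch; the `entrypoint`/`WorkerOptions` wiring follows the standard livekit-agents pattern, and `perform_llm_call` is the helper from step 2:

```python
import asyncio

from livekit.agents import JobContext, WorkerOptions, cli


async def entrypoint(ctx: JobContext):
    # This runs inside the job's sub-process, on its own event loop
    await ctx.connect()  # join the room; not strictly required just for LLM calls

    prompts = ["Question 1", "Question 2", "Question 3"]
    # Fire off all LLM calls concurrently and wait for every result
    results = await asyncio.gather(*(perform_llm_call(p) for p in prompts))
    print(results)


if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))
```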
And that’s it! This way, you’re making the most of both the LLM’s parallel tool-calling support and Python’s async features to get things done efficiently. Just make sure this code runs inside an async context, and keep an eye on resource usage so everything runs smoothly. If you have any questions or want to chat more about this, I’m here to help!

Sources: Fireworks AI LLM integration guide | LiveKit Docs | livekit-plugins/livekit-plugins-baseten/livekit/plugins/baseten/llm.py