flaky-hospital-13625
08/14/2025, 5:40 AM
tall-airline-16756
08/14/2025, 5:40 AM
You can run multiple LLM calls in parallel by combining the parallel_tool_calls parameter in your LLM setup with Python's asyncio library for asynchronous execution. Let me walk you through it:
1. Enable Parallel Tool Calls: First up, when you create your LLM instance, set parallel_tool_calls=True. This lets the model request multiple tool calls in a single response, so they can be executed at the same time:
from livekit.plugins import openai

llm = openai.LLM.with_fireworks(
    model="accounts/fireworks/models/llama-v3p3-70b-instruct",
    temperature=0.7,
    parallel_tool_calls=True,  # enables parallel tool calls
)
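Parallel tool calls only come into play when the model has more than one tool it could invoke in a single turn. As a rough sketch of how that might look, assuming the Agent and function_tool helpers from livekit-agents (the tool names and bodies here are illustrative, not from the integration guide):

from livekit.agents import Agent, function_tool

# Hypothetical tools: with parallel_tool_calls=True the model may request
# both of these in one response so they can be executed concurrently.
@function_tool
async def get_weather(city: str) -> str:
    """Look up the current weather for a city."""
    return f"Sunny in {city}"

@function_tool
async def get_local_time(city: str) -> str:
    """Look up the local time for a city."""
    return f"10:00 AM in {city}"

agent = Agent(
    instructions="You are a helpful assistant.",
    llm=llm,  # the Fireworks LLM configured above
    tools=[get_weather, get_local_time],
)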
2. Define Asynchronous Functions for LLM Calls: Next, create async functions to handle each LLM call individually. This keeps things neat and efficient:
from livekit.agents.llm import ChatContext

async def perform_llm_call(prompt: str) -> str:
    # build a one-message chat context for this prompt
    chat_ctx = ChatContext()
    chat_ctx.add_message(role="user", content=prompt)

    response = ""
    # llm.chat() streams the completion; collect the text deltas as they arrive
    async with llm.chat(chat_ctx=chat_ctx) as stream:
        async for chunk in stream:
            if chunk.delta and chunk.delta.content:
                response += chunk.delta.content
    return response
3. Execute Multiple Calls in Parallel: Now, use asyncio.gather()
to run all those async calls together. It’s like having a team of helpers working at once:
import asyncio

async def main():
    prompts = ["Question 1", "Question 2", "Question 3"]
    tasks = [perform_llm_call(prompt) for prompt in prompts]
    results = await asyncio.gather(*tasks)
    return results

if __name__ == "__main__":
    results = asyncio.run(main())
    print(results)  # contains responses from all prompts
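One thing to note: by default asyncio.gather() raises the first exception it hits and you lose the other results. If you'd rather collect whatever succeeds, here's a small sketch that reuses perform_llm_call from above (the 30-second timeout is just an illustrative value):

import asyncio

async def main_with_error_handling():
    prompts = ["Question 1", "Question 2", "Question 3"]
    # give each call a time limit so one slow request can't stall the whole batch
    tasks = [asyncio.wait_for(perform_llm_call(p), timeout=30) for p in prompts]
    # return_exceptions=True keeps the successful results even if one call fails
    results = await asyncio.gather(*tasks, return_exceptions=True)
    for prompt, result in zip(prompts, results):
        if isinstance(result, Exception):
            print(f"{prompt!r} failed: {result}")
        else:
            print(f"{prompt!r} -> {result}")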
And that’s it! This way, you’re making the most of both the LLM’s parallel capabilities and Python’s async features to get things done efficiently. Just make sure your environment supports async execution, and keep an eye on resource usage so everything runs smoothly. If you have any questions or want to chat more about this, I’m here to help!
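On the resource-usage point: if you have a lot of prompts, an asyncio.Semaphore is an easy way to cap how many LLM calls are in flight at once. A minimal sketch (the limit of 5 is arbitrary):

import asyncio

MAX_CONCURRENT_CALLS = 5  # arbitrary cap on simultaneous LLM requests
semaphore = asyncio.Semaphore(MAX_CONCURRENT_CALLS)

async def bounded_llm_call(prompt: str) -> str:
    # only MAX_CONCURRENT_CALLS coroutines can hold the semaphore at a time
    async with semaphore:
        return await perform_llm_call(prompt)

async def run_many(prompts: list[str]) -> list[str]:
    return await asyncio.gather(*(bounded_llm_call(p) for p in prompts))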
Sources: Fireworks AI LLM integration guide | LiveKit Docs | livekit-plugins/livekit-plugins-baseten/livekit/plugins/baseten/llm.py