<@U084APK40NS> How to use a CUSTOM LLM Server for ...
# ask-ai
@millions-hair-24956 How do I use a custom LLM server for my LiveKit agent?
To use a custom LLM server with your LiveKit agent, create a custom LLM class that wraps your server's API and plug it into the agent. This means defining a new class on top of the LLM framework provided by LiveKit and putting your business logic and API calls inside it.

1. Create a custom LLM class: define a class that inherits from llm.LLM and implement the methods needed to talk to your server, including setting up the API client and handling requests and responses.
2. Integrate with LiveKit: pass an instance of this class as the llm parameter to VoicePipelineAgent or any other agent that takes an LLM.

For more detailed guidance, see the LiveKit documentation on custom LLM integrations and the example code shared in the LiveKit community discussions. Would you like more detailed steps or a code example?

Sources: LiveKit documentation, example code
@millions-hair-24956 Yes, I'd like detailed steps and a code example.
To integrate a custom LLM server with your LiveKit agent, follow these steps:

1. Define a custom LLM class: create a new class that inherits from llm.LLM and implement the chat method to handle communication with your LLM server.
2. Set up an API client: use an HTTP client such as aiohttp to send requests to your server, and handle authentication and any required headers.
3. Override the chat method: send the input text to your LLM server, receive the response, wrap it in a ChatChunk, and return it.
4. Integrate with LiveKit: use your custom LLM class in the VoicePipelineAgent or any other agent setup.

Code example. Here's a basic example of how you might implement this:
import aiohttp
from livekit.agents import llm
from livekit.agents.pipeline import VoicePipelineAgent
from livekit.plugins import cartesia, deepgram, silero  # plugins used in the agent setup below

class CustomLLM(llm.LLM):
    def __init__(self, api_url, api_key):
        super().__init__()
        self.api_url = api_url
        self.api_key = api_key
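        # NOTE: aiohttp recommends creating a ClientSession while an event loop is
        # running; consider creating it lazily inside chat() instead of __init__.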
        self.client = aiohttp.ClientSession()

    async def chat(self, *, chat_ctx: llm.ChatContext, fnc_ctx: llm.FunctionContext = None, temperature: float = None, n: int = 1, parallel_tool_calls: bool = None) -> llm.LLMStream:
        # Prepare the request payload
        payload = {
            "input": chat_ctx.messages[-1].text,
            "temperature": temperature or 0.5
        }
        headers = {
            "Authorization": f"Bearer {self.api_key}"
        }
        
        # Send request to the custom LLM server
        async with self.client.post(self.api_url, json=payload, headers=headers) as response:
            if response.status == 200:
                data = await response.json()
                # Create a ChatChunk from the response
                chat_chunk = llm.ChatChunk(request_id="custom_llm", choices=[llm.Choice(delta=llm.ChoiceDelta(role="assistant", content=data['output']))])
                return llm.LLMStream(chat_chunk)
            else:
                raise Exception(f"Failed to get response from LLM server: {response.status}")

    async def aclose(self):
        await self.client.close()

# Usage in a VoicePipelineAgent
custom_llm = CustomLLM(api_url="https://your-llm-server.com/api", api_key="your_api_key")
agent = VoicePipelineAgent(
    vad=silero.VAD.load(),
    stt=deepgram.STT(model="nova-2-general"),
    llm=custom_llm,
    tts=cartesia.TTS(),
    chat_ctx=llm.ChatContext().append(role="system", text="Your initial prompt"),
)
Explanation:
- CustomLLM class: handles the interaction with your custom LLM server, using aiohttp to send requests and receive responses.
- chat method: sends the latest message from the ChatContext to the LLM server and processes the response.
- Integration: the CustomLLM instance is passed to the VoicePipelineAgent, which uses it to generate conversation responses.

This setup lets you integrate any LLM server that exposes an HTTP API, giving you flexibility in how you handle language processing.

Sources: example code
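For completeness, here is a minimal sketch of how such an agent is usually wired into a LiveKit worker entrypoint. It is an illustration based on the 0.x agents API (JobContext, cli.run_app) and reuses the CustomLLM class and imports from the example above; the greeting text is a placeholder, so adapt it to your project:

from livekit.agents import JobContext, WorkerOptions, cli

async def entrypoint(ctx: JobContext):
    # Connect to the room and wait for a participant to talk to
    await ctx.connect()
    participant = await ctx.wait_for_participant()

    custom_llm = CustomLLM(api_url="https://your-llm-server.com/api", api_key="your_api_key")
    agent = VoicePipelineAgent(
        vad=silero.VAD.load(),
        stt=deepgram.STT(model="nova-2-general"),
        llm=custom_llm,
        tts=cartesia.TTS(),
        chat_ctx=llm.ChatContext().append(role="system", text="Your initial prompt"),
    )

    # Start handling audio for this participant and greet them
    agent.start(ctx.room, participant)
    await agent.say("Hello! How can I help you today?", allow_interruptions=True)

if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))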
@millions-hair-24956 I'm getting an error on the line return llm.LLMStream(chat_chunk): it says I can't instantiate an abstract class with an abstract method.
It seems likely that the error comes from trying to instantiate an abstract class directly. LLMStream is abstract, and its abstract methods must be implemented before it can be instantiated, so you need to create a subclass of LLMStream.

Step-by-step solution:
1. Create a subclass of LLMStream: define a new class that inherits from llm.LLMStream and implements the required abstract methods, such as _main_task.
2. Implement the _main_task method: this method should handle the logic for producing the LLM response stream.

Code example. Here's how you can modify the code to include a custom LLMStream subclass:
import aiohttp
from livekit.agents import llm
from livekit.agents.pipeline import VoicePipelineAgent
from livekit.plugins import cartesia, deepgram, silero  # plugins used in the agent setup below

class CustomLLMStream(llm.LLMStream):
    def __init__(self, llm, chat_ctx, fnc_ctx, response_content):
        super().__init__(llm=llm, chat_ctx=chat_ctx, fnc_ctx=fnc_ctx)
        self.response_content = response_content

    async def _main_task(self):
        # Send the response content as a ChatChunk
        chat_chunk = llm.ChatChunk(
            request_id="custom_llm",
            choices=[llm.Choice(delta=llm.ChoiceDelta(role="assistant", content=self.response_content))]
        )
        self._event_ch.send_nowait(chat_chunk)

class CustomLLM(llm.LLM):
    def __init__(self, api_url, api_key):
        super().__init__()
        self.api_url = api_url
        self.api_key = api_key
        self.client = aiohttp.ClientSession()

    async def chat(self, *, chat_ctx: llm.ChatContext, fnc_ctx: llm.FunctionContext = None, temperature: float = None, n: int = 1, parallel_tool_calls: bool = None) -> llm.LLMStream:
        # Prepare the request payload
        payload = {
            "input": chat_ctx.messages[-1].text,
            "temperature": temperature or 0.5
        }
        headers = {
            "Authorization": f"Bearer {self.api_key}"
        }
        
        # Send request to the custom LLM server
        async with self.client.post(self.api_url, json=payload, headers=headers) as response:
            if response.status == 200:
                data = await response.json()
                # Return a CustomLLMStream instance
                return CustomLLMStream(self, chat_ctx, fnc_ctx, data['output'])
            else:
                raise Exception(f"Failed to get response from LLM server: {response.status}")

    async def aclose(self):
        await self.client.close()

# Usage in a VoicePipelineAgent
custom_llm = CustomLLM(api_url="https://your-llm-server.com/api", api_key="your_api_key")
agent = VoicePipelineAgent(
    vad=silero.VAD.load(),
    stt=deepgram.STT(model="nova-2-general"),
    llm=custom_llm,
    tts=cartesia.TTS(),
    chat_ctx=llm.ChatContext().append(role="system", text="Your initial prompt"),
)
Explanation:
- CustomLLMStream class: this subclass of LLMStream implements the _main_task method, which emits the response content as a ChatChunk.
- chat method: the chat method in CustomLLM now returns an instance of CustomLLMStream, so the abstract methods are implemented.

This approach should resolve the error and allow you to use your custom LLM server with LiveKit.

Sources: https://livekit-users.slack.com/archives/C07FY8WHGPM/p1732699668.482419
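If your server can stream partial output, you can also emit more than one chunk from the stream. The sketch below is illustrative only: it reuses the _main_task hook and the _event_ch channel shown above and fakes streaming by splitting a complete response into words; a real implementation would read incremental chunks from the HTTP response instead:

class StreamingCustomLLMStream(llm.LLMStream):
    def __init__(self, llm, chat_ctx, fnc_ctx, response_content):
        super().__init__(llm=llm, chat_ctx=chat_ctx, fnc_ctx=fnc_ctx)
        self.response_content = response_content

    async def _main_task(self):
        # Emit one ChatChunk per word to simulate incremental streaming;
        # replace this loop with reads from your server's streaming response.
        for word in self.response_content.split():
            chunk = llm.ChatChunk(
                request_id="custom_llm",
                choices=[llm.Choice(delta=llm.ChoiceDelta(role="assistant", content=word + " "))]
            )
            self._event_ch.send_nowait(chunk)

With a server that actually streams, this pattern lets the TTS stage start speaking before the full response has been generated.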
@millions-hair-24956 I'm still getting an error on the line return llm.LLMStream(chat_chunk): it says I can't instantiate an abstract class with abstract method _run.
The error message confirms that LLMStream is abstract and cannot be instantiated directly, and it also tells you which hook your installed version of livekit-agents expects: in your version the abstract method is named _run rather than _main_task. The subclassing approach from the previous message still applies; implement _run instead of _main_task, and keep the CustomLLM class and the VoicePipelineAgent setup from the previous example unchanged:

class CustomLLMStream(llm.LLMStream):
    def __init__(self, llm, chat_ctx, fnc_ctx, response_content):
        super().__init__(llm=llm, chat_ctx=chat_ctx, fnc_ctx=fnc_ctx)
        self.response_content = response_content

    async def _run(self):
        # Emit the response content as a single ChatChunk
        chat_chunk = llm.ChatChunk(
            request_id="custom_llm",
            choices=[llm.Choice(delta=llm.ChoiceDelta(role="assistant", content=self.response_content))]
        )
        self._event_ch.send_nowait(chat_chunk)

Note that other details, such as the LLMStream constructor arguments and the ChatChunk fields, can also differ between livekit-agents versions, so if you still see errors, check the llm module of the version you have installed or pin the version the previous example was written against. This should resolve the abstract-class error and allow you to use your custom LLM server with LiveKit.

Sources: LiveKit documentation
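If you want to test the integration end to end before your real model server is ready, a minimal stand-in server can be written with aiohttp. This sketch is purely hypothetical: it assumes the request and response shape used in the examples above (an "input" and "temperature" field in, an "output" field out), ignores the Authorization header, and simply echoes the input:

from aiohttp import web

async def handle(request: web.Request) -> web.Response:
    body = await request.json()
    # Echo-style stub: replace this with a call to your actual model
    reply = f"You said: {body.get('input', '')}"
    return web.json_response({"output": reply})

app = web.Application()
app.add_routes([web.post("/api", handle)])

if __name__ == "__main__":
    web.run_app(app, port=8080)

Pointing CustomLLM at http://localhost:8080/api then exercises the full STT, LLM, and TTS pipeline without a real model behind it.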