<@U084APK40NS> How to use a CUSTOM LLM Server for ...
# ask-ai
@millions-hair-24956 How do I use a custom LLM server for my LiveKit agent?
To use a custom LLM server with your LiveKit agent, create a custom LLM class that wraps your server's API and plug it into the agent. This means defining a new class on top of the LLM framework provided by LiveKit and putting your business logic and API calls inside it.

1. Create a custom LLM class: define a class that inherits from llm.LLM and implement the methods needed to talk to your server, including setting up the API client and handling requests and responses.
2. Integrate with LiveKit: pass an instance of this class as the llm parameter to VoicePipelineAgent or any other agent that takes an LLM.

For more detailed guidance, see the LiveKit documentation on custom LLM integrations and the example code shared in the LiveKit community discussions. Would you like more detailed steps or a code example?

Sources: LiveKit documentation, example code
@millions-hair-24956 Yes, I'd like detailed steps and a code example.
To integrate a custom LLM server with your LiveKit agent, follow these steps:

1. Define a custom LLM class: create a new class that inherits from llm.LLM and implement the chat method to handle communication with your LLM server.
2. Set up an API client: use an HTTP client such as aiohttp to send requests to your server, and handle authentication and any required headers.
3. Override the chat method: send the input text to your LLM server, receive the response, wrap it in a ChatChunk, and return it.
4. Integrate with LiveKit: use your custom LLM class in the VoicePipelineAgent or any other agent setup.

Code example. Here's a basic example of how you might implement this:
import aiohttp
from livekit.agents import llm
from livekit.agents.pipeline import VoicePipelineAgent
from livekit.plugins import cartesia, deepgram, silero  # plugins used in the agent setup below

class CustomLLM(llm.LLM):
    def __init__(self, api_url, api_key):
        super().__init__()
        self.api_url = api_url
        self.api_key = api_key
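        # NOTE: aiohttp recommends creating a ClientSession while an event loop is
        # running; consider creating it lazily inside chat() instead of __init__.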
        self.client = aiohttp.ClientSession()

    async def chat(self, *, chat_ctx: llm.ChatContext, fnc_ctx: llm.FunctionContext = None, temperature: float = None, n: int = 1, parallel_tool_calls: bool = None) -> llm.LLMStream:
        # Prepare the request payload
        payload = {
            "input": chat_ctx.messages[-1].text,
            "temperature": temperature or 0.5
        }
        headers = {
            "Authorization": f"Bearer {self.api_key}"
        }
        
        # Send request to the custom LLM server
        async with self.client.post(self.api_url, json=payload, headers=headers) as response:
            if response.status == 200:
                data = await response.json()
                # Create a ChatChunk from the response
                chat_chunk = llm.ChatChunk(request_id="custom_llm", choices=[llm.Choice(delta=llm.ChoiceDelta(role="assistant", content=data['output']))])
                return llm.LLMStream(chat_chunk)
            else:
                raise Exception(f"Failed to get response from LLM server: {response.status}")

    async def aclose(self):
        await self.client.close()

# Usage in a VoicePipelineAgent
custom_llm = CustomLLM(api_url="https://your-llm-server.com/api", api_key="your_api_key")
agent = VoicePipelineAgent(
    vad=silero.VAD.load(),
    stt=deepgram.STT(model="nova-2-general"),
    llm=custom_llm,
    tts=cartesia.TTS(),
    chat_ctx=llm.ChatContext().append(role="system", text="Your initial prompt"),
)
Explanation:
- CustomLLM class: handles the interaction with your custom LLM server, using aiohttp to send requests and receive responses.
- chat method: sends the latest message from the ChatContext to the LLM server and processes the response.
- Integration: the CustomLLM instance is passed to the VoicePipelineAgent, which uses it to generate conversation responses.

This setup lets you integrate any LLM server that exposes an HTTP API, giving you flexibility in how you handle language processing.

Sources: example code
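For completeness, here is a minimal sketch of how such an agent is usually wired into a LiveKit worker entrypoint. It is an illustration based on the 0.x agents API (JobContext, cli.run_app) and reuses the CustomLLM class and imports from the example above; the greeting text is a placeholder, so adapt it to your project:

from livekit.agents import JobContext, WorkerOptions, cli

async def entrypoint(ctx: JobContext):
    # Connect to the room and wait for a participant to talk to
    await ctx.connect()
    participant = await ctx.wait_for_participant()

    custom_llm = CustomLLM(api_url="https://your-llm-server.com/api", api_key="your_api_key")
    agent = VoicePipelineAgent(
        vad=silero.VAD.load(),
        stt=deepgram.STT(model="nova-2-general"),
        llm=custom_llm,
        tts=cartesia.TTS(),
        chat_ctx=llm.ChatContext().append(role="system", text="Your initial prompt"),
    )

    # Start handling audio for this participant and greet them
    agent.start(ctx.room, participant)
    await agent.say("Hello! How can I help you today?", allow_interruptions=True)

if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))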
@millions-hair-24956 I'm getting an error on the line return llm.LLMStream(chat_chunk): it says I can't instantiate an abstract class with an abstract method.
It seems likely that the error comes from trying to instantiate an abstract class directly. LLMStream is abstract, and its abstract methods must be implemented before it can be instantiated, so you need to create a subclass of LLMStream.

Step-by-step solution:
1. Create a subclass of LLMStream: define a new class that inherits from llm.LLMStream and implements the required abstract methods, such as _main_task.
2. Implement the _main_task method: this method should handle the logic for producing the LLM response stream.

Code example. Here's how you can modify the code to include a custom LLMStream subclass:
import aiohttp
from livekit.agents import llm
from livekit.agents.pipeline import VoicePipelineAgent
from livekit.plugins import cartesia, deepgram, silero  # plugins used in the agent setup below

class CustomLLMStream(llm.LLMStream):
    def __init__(self, llm, chat_ctx, fnc_ctx, response_content):
        super().__init__(llm=llm, chat_ctx=chat_ctx, fnc_ctx=fnc_ctx)
        self.response_content = response_content

    async def _main_task(self):
        # Send the response content as a ChatChunk
        chat_chunk = llm.ChatChunk(
            request_id="custom_llm",
            choices=[llm.Choice(delta=llm.ChoiceDelta(role="assistant", content=self.response_content))]
        )
        self._event_ch.send_nowait(chat_chunk)

class CustomLLM(llm.LLM):
    def __init__(self, api_url, api_key):
        super().__init__()
        self.api_url = api_url
        self.api_key = api_key
        self.client = aiohttp.ClientSession()

    async def chat(self, *, chat_ctx: llm.ChatContext, fnc_ctx: llm.FunctionContext = None, temperature: float = None, n: int = 1, parallel_tool_calls: bool = None) -> llm.LLMStream:
        # Prepare the request payload
        payload = {
            "input": chat_ctx.messages[-1].text,
            "temperature": temperature or 0.5
        }
        headers = {
            "Authorization": f"Bearer {self.api_key}"
        }
        
        # Send request to the custom LLM server
        async with self.client.post(self.api_url, json=payload, headers=headers) as response:
            if response.status == 200:
                data = await response.json()
                # Return a CustomLLMStream instance
                return CustomLLMStream(self, chat_ctx, fnc_ctx, data['output'])
            else:
                raise Exception(f"Failed to get response from LLM server: {response.status}")

    async def aclose(self):
        await self.client.close()

# Usage in a VoicePipelineAgent
custom_llm = CustomLLM(api_url="https://your-llm-server.com/api", api_key="your_api_key")
agent = VoicePipelineAgent(
    vad=silero.VAD.load(),
    stt=deepgram.STT(model="nova-2-general"),
    llm=custom_llm,
    tts=cartesia.TTS(),
    chat_ctx=llm.ChatContext().append(role="system", text="Your initial prompt"),
)
Explanation:
- CustomLLMStream class: this subclass of LLMStream implements the _main_task method, which emits the response content as a ChatChunk.
- chat method: the chat method in CustomLLM now returns an instance of CustomLLMStream, so the abstract methods are implemented.

This approach should resolve the error and allow you to use your custom LLM server with LiveKit.

Sources: https://livekit-users.slack.com/archives/C07FY8WHGPM/p1732699668.482419
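If your server can stream partial output, you can also emit more than one chunk from the stream. The sketch below is illustrative only: it reuses the _main_task hook and the _event_ch channel shown above and fakes streaming by splitting a complete response into words; a real implementation would read incremental chunks from the HTTP response instead:

class StreamingCustomLLMStream(llm.LLMStream):
    def __init__(self, llm, chat_ctx, fnc_ctx, response_content):
        super().__init__(llm=llm, chat_ctx=chat_ctx, fnc_ctx=fnc_ctx)
        self.response_content = response_content

    async def _main_task(self):
        # Emit one ChatChunk per word to simulate incremental streaming;
        # replace this loop with reads from your server's streaming response.
        for word in self.response_content.split():
            chunk = llm.ChatChunk(
                request_id="custom_llm",
                choices=[llm.Choice(delta=llm.ChoiceDelta(role="assistant", content=word + " "))]
            )
            self._event_ch.send_nowait(chunk)

With a server that actually streams, this pattern lets the TTS stage start speaking before the full response has been generated.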
@millions-hair-24956 I'm still getting an error on the line return llm.LLMStream(chat_chunk): it says I can't instantiate an abstract class with abstract method _run.
The error message confirms that LLMStream is abstract and cannot be instantiated directly, and it also tells you which hook your installed version of livekit-agents expects: in your version the abstract method is named _run rather than _main_task. The subclassing approach from the previous message still applies; implement _run instead of _main_task, and keep the CustomLLM class and the VoicePipelineAgent setup from the previous example unchanged:

class CustomLLMStream(llm.LLMStream):
    def __init__(self, llm, chat_ctx, fnc_ctx, response_content):
        super().__init__(llm=llm, chat_ctx=chat_ctx, fnc_ctx=fnc_ctx)
        self.response_content = response_content

    async def _run(self):
        # Emit the response content as a single ChatChunk
        chat_chunk = llm.ChatChunk(
            request_id="custom_llm",
            choices=[llm.Choice(delta=llm.ChoiceDelta(role="assistant", content=self.response_content))]
        )
        self._event_ch.send_nowait(chat_chunk)

Note that other details, such as the LLMStream constructor arguments and the ChatChunk fields, can also differ between livekit-agents versions, so if you still see errors, check the llm module of the version you have installed or pin the version the previous example was written against. This should resolve the abstract-class error and allow you to use your custom LLM server with LiveKit.

Sources: LiveKit documentation
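If you want to test the integration end to end before your real model server is ready, a minimal stand-in server can be written with aiohttp. This sketch is purely hypothetical: it assumes the request and response shape used in the examples above (an "input" and "temperature" field in, an "output" field out), ignores the Authorization header, and simply echoes the input:

from aiohttp import web

async def handle(request: web.Request) -> web.Response:
    body = await request.json()
    # Echo-style stub: replace this with a call to your actual model
    reply = f"You said: {body.get('input', '')}"
    return web.json_response({"output": reply})

app = web.Application()
app.add_routes([web.post("/api", handle)])

if __name__ == "__main__":
    web.run_app(app, port=8080)

Pointing CustomLLM at http://localhost:8080/api then exercises the full STT, LLM, and TTS pipeline without a real model behind it.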