# ask-ai
@millions-hair-24956 How to use my own custom LLM server in LiveKit?
To use your own custom LLM server with LiveKit, you need to create a custom LLM class that integrates with your server's API. According to the LiveKit documentation, you can write a Python or Node.js application to define your agent, including custom plugins for LLMs. You can then configure the `VoicePipelineAgent` to use this custom LLM by passing it as the `llm` parameter. Here's a basic example of how you might structure a custom LLM class in Python:
```python
import aiohttp
from livekit.agents import llm

class CustomLLM(llm.LLM):
    def __init__(self, api_url, api_key):
        super().__init__()
        self.api_url = api_url
        self.api_key = api_key
        self.client = aiohttp.ClientSession()

    async def chat(self, chat_ctx, fnc_ctx=None, temperature=None, n=1, parallel_tool_calls=None):
        # Implement your API call here; the payload shape depends on your server
        async with self.client.post(self.api_url, json={"key": self.api_key, "context": chat_ctx}) as response:
            return await response.json()

    async def aclose(self):
        await self.client.close()
```
Would you like more detailed guidance on integrating this with LiveKit's `VoicePipelineAgent`?

Sources:
- LiveKit documentation
@millions-hair-24956 Yes, I need more guidance.
To integrate your custom LLM server with LiveKit's `VoicePipelineAgent`, follow these steps:

1. Create a custom LLM class: Define a class that inherits from `llm.LLM` and implements the `chat` method to interact with your LLM server. This method should handle sending requests to your server and processing the responses.
2. Initialize the custom LLM: When setting up the `VoicePipelineAgent`, pass an instance of your custom LLM class as the `llm` parameter.
3. Configure the agent: Set up the `VoicePipelineAgent` with the necessary components like VAD, STT, and TTS, along with your custom LLM.

Here's a step-by-step breakdown with code.

Step 1: Create a Custom LLM Class
```python
import aiohttp
from livekit.agents import llm

class CustomLLM(llm.LLM):
    def __init__(self, api_url, api_key):
        super().__init__()
        self.api_url = api_url
        self.api_key = api_key
        self.client = aiohttp.ClientSession()

    async def chat(self, chat_ctx, fnc_ctx=None, temperature=None, n=1, parallel_tool_calls=None):
        # Prepare the request payload
        payload = {
            "api_key": self.api_key,
            "context": chat_ctx.messages
        }
        # Send the request to your LLM server
        async with self.client.post(self.api_url, json=payload) as response:
            if response.status == 200:
                data = await response.json()
                # Process and return the response as an LLMStream
                return llm.LLMStream(data['response'])
            else:
                raise Exception(f"Failed to get response: {response.status}")

    async def aclose(self):
        await self.client.close()
```
Step 2: Initialize the Custom LLM
```python
custom_llm = CustomLLM(api_url="https://your-llm-server.com/api", api_key="your_api_key")
```
Step 3: Configure the VoicePipelineAgent
```python
from livekit.agents import llm
from livekit.agents.pipeline import VoicePipelineAgent
from livekit.plugins import silero, deepgram, cartesia

# Initialize other components
vad = silero.VAD.load()
stt = deepgram.STT(model="nova-2-general")
tts = cartesia.TTS()

# Create the agent
agent = VoicePipelineAgent(
    vad=vad,
    stt=stt,
    llm=custom_llm,
    tts=tts,
    chat_ctx=llm.ChatContext().append(role="system", text="Your initial prompt"),
    allow_interruptions=True,
    interrupt_speech_duration=0.5,
    interrupt_min_words=0,
    min_endpointing_delay=0.5
)

# Start the agent in a room with a participant
# agent.start(room, participant)
```
Explanation:
- Custom LLM class: handles communication with your LLM server. The `chat` method sends the chat context to your server and processes the response.
- VoicePipelineAgent: configured with your custom LLM and the other necessary components (VAD, STT, and TTS).
- Agent initialization: the agent is created with the custom LLM and other components, and can then be started in a LiveKit room; see the entrypoint sketch below.

This setup allows you to use your custom LLM server within the LiveKit framework, leveraging the `VoicePipelineAgent` for voice interactions.

Sources:
- https://livekit-users.slack.com/archives/C07FY8WHGPM/p1732128458.518759
- https://livekit-users.slack.com/archives/C07FY8WHGPM/p1728788979.238089
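For completeness, here is a minimal sketch of starting the configured agent inside a worker entrypoint. It assumes the `livekit-agents` worker API (`JobContext`, `WorkerOptions`, `cli.run_app`); the greeting text is illustrative, and `agent` refers to the `VoicePipelineAgent` built above — adapt to the version you have installed:

```python
from livekit.agents import JobContext, WorkerOptions, cli

async def entrypoint(ctx: JobContext):
    # Connect to the room assigned to this job
    await ctx.connect()
    # Wait for a remote participant before starting the voice pipeline
    participant = await ctx.wait_for_participant()
    # `agent` is the VoicePipelineAgent configured above
    agent.start(ctx.room, participant)
    await agent.say("Hi there, how can I help?", allow_interruptions=True)

if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))
```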
@millions-hair-24956 I'm getting this error: `raise TypeError(f'Object of type {o.__class__.__name__} '`
It seems likely that you're encountering a `TypeError` for an object that cannot be serialized, possibly when trying to convert it to JSON. This error often occurs when using Python's `json` module to serialize objects that are not natively serializable, such as custom classes or complex data types. To resolve this, you can implement a custom JSON encoder by subclassing `json.JSONEncoder` and overriding the `default` method to handle the serialization of your custom objects. Here's a basic example:
```python
import json

class CustomEncoder(json.JSONEncoder):
    def default(self, obj):
        # Check if the object is an instance of a custom class
        if isinstance(obj, YourCustomClass):
            return obj.to_dict()  # Convert the object to a dictionary
        # Call the base class method for other types
        return super().default(obj)

# Usage
data = {
    'key': YourCustomClass()  # Replace with your actual object
}

json_data = json.dumps(data, cls=CustomEncoder)
```
Would you like more detailed guidance on implementing this in your specific context?

Sources:
- https://livekit-users.slack.com/archives/C07FY8WHGPM/p1732128458.518759
- https://livekit-users.slack.com/archives/C07FY8WHGPM/p1731158203.654409
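As a lighter-weight alternative to subclassing, `json.dumps` also accepts a `default` callable that is invoked for any object it can't serialize natively. A minimal sketch, assuming the offending object exposes a `to_dict()` method (the `Greeting` class here is a hypothetical stand-in):

```python
import json

class Greeting:
    # Hypothetical example class; stands in for any custom object
    def __init__(self, text):
        self.text = text

    def to_dict(self):
        return {"text": self.text}

def encode_fallback(obj):
    # Called by json.dumps only for objects it can't serialize natively
    if hasattr(obj, "to_dict"):
        return obj.to_dict()
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")

json_data = json.dumps({"key": Greeting("hello")}, default=encode_fallback)
print(json_data)  # {"key": {"text": "hello"}}
```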
@millions-hair-24956 I'm getting this error:
```
2025-02-25 15:07:27,959 - ERROR livekit.agents.pipeline - Error in _synthesize_answer_task
Traceback (most recent call last):
  File "/home/victor/Sundevs/live-kit-demo/livekit-demo/venv/lib/python3.10/site-packages/livekit/agents/utils/log.py", line 16, in async_fn_logs
    return await fn(*args, **kwargs)
  File "/home/victor/Sundevs/live-kit-demo/livekit-demo/venv/lib/python3.10/site-packages/livekit/agents/pipeline/pipeline_agent.py", line 755, in _synthesize_answer_task
    llm_stream = await llm_stream
  File "/home/victor/Sundevs/live-kit-demo/livekit-demo/agent.py", line 40, in chat
    async with self.client.post(self.api_url, json=payload) as response:
  File "/home/victor/Sundevs/live-kit-demo/livekit-demo/venv/lib/python3.10/site-packages/aiohttp/client.py", line 1425, in __aenter__
    self._resp: _RetType = await self._coro
  File "/home/victor/Sundevs/live-kit-demo/livekit-demo/venv/lib/python3.10/site-packages/aiohttp/client.py", line 521, in _request
    data = payload.JsonPayload(json, dumps=self._json_serialize)
  File "/home/victor/Sundevs/live-kit-demo/livekit-demo/venv/lib/python3.10/site-packages/aiohttp/payload.py", line 448, in __init__
    dumps(value).encode(encoding),
  File "/usr/lib/python3.10/json/__init__.py", line 231, in dumps
    return _default_encoder.encode(obj)
  File "/usr/lib/python3.10/json/encoder.py", line 199, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/usr/lib/python3.10/json/encoder.py", line 257, in iterencode
    return _iterencode(o, 0)
  File "/usr/lib/python3.10/json/encoder.py", line 179, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type ChatMessage is not JSON serializable
```
The error you're encountering indicates that the `ChatMessage` object is not JSON serializable. This typically happens when you try to serialize an object that Python's `json` module doesn't know how to handle. To resolve this, you need to convert the `ChatMessage` object into a format that can be serialized, such as a dictionary. Here's how you can modify your code to handle this:

1. Convert `ChatMessage` to a dictionary: Ensure that the `ChatMessage` class has a method to convert its instances to a dictionary. If it doesn't, you can add a method like `as_dict()` to do this.
2. Use the conversion method: Before sending the `ChatMessage` object in your API request, convert it to a dictionary.

Here's an example of how you might implement this:
```python
# Assuming ChatMessage is a class you have defined
class ChatMessage:
    def __init__(self, message, role):
        self.message = message
        self.role = role

    def as_dict(self):
        return {
            "message": self.message,
            "role": self.role
        }

# When preparing the payload
chat_message = ChatMessage("Hello, how can I help you?", "user")
payload = {
    "api_key": "your_api_key",
    "context": [chat_message.as_dict()]  # Convert to dict
}

# Sending the request
async with self.client.post(self.api_url, json=payload) as response:
    data = await response.json()  # Handle the response
```
This approach ensures that the `ChatMessage` object is converted into a JSON-serializable format before being included in the request payload. Would you like more detailed guidance on implementing this in your specific context?

Sources:
- https://docs.livekit.io/python/livekit/rtc/chat.html#livekit.rtc.chat.ChatMessage
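Applied to the `CustomLLM.chat` method from earlier, the payload can be built from the agent's chat context rather than hand-made objects. A minimal sketch, assuming each message in `chat_ctx.messages` exposes `role` and `content` attributes (check the `ChatMessage` fields in your installed livekit-agents version):

```python
def serialize_chat_ctx(chat_ctx):
    # Flatten the LiveKit chat context into plain dicts for JSON transport.
    # Assumes each message has .role and .content; adjust to your version.
    return [{"role": msg.role, "content": msg.content} for msg in chat_ctx.messages]

# Inside CustomLLM.chat, the payload then becomes:
# payload = {"api_key": self.api_key, "context": serialize_chat_ctx(chat_ctx)}
```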
@millions-hair-24956 Now I'm getting this error: `TypeError: Can't instantiate abstract class LLMStream with abstract method _run`
The error you're encountering indicates that you're trying to instantiate the abstract class `LLMStream` without providing an implementation for its abstract method `_run`. In Python, abstract classes are meant to be subclassed, and any abstract methods must be implemented in the subclass. Here's how you can resolve this:

1. Create a subclass: Define a subclass of `LLMStream` and implement the `_run` method. This method should contain the logic for processing the LLM stream.
2. Implement the `_run` method: Ensure that your implementation of `_run` handles the specific tasks you need, such as processing data from your LLM server.

Here's an example of how you might implement this:
```python
from livekit.agents.llm import LLMStream

class CustomLLMStream(LLMStream):
    def __init__(self, llm, chat_ctx, fnc_ctx=None):
        super().__init__(llm=llm, chat_ctx=chat_ctx, fnc_ctx=fnc_ctx)

    async def _run(self):
        # Implement your logic here
        # For example, process the LLM response and yield results
        async for chunk in self._llm_response():
            # Process each chunk
            yield chunk

    async def _llm_response(self):
        # Simulate an LLM response stream
        # Replace this with actual logic to get data from your LLM server
        yield "response_chunk_1"
        yield "response_chunk_2"

# Usage
custom_stream = CustomLLMStream(llm=your_llm_instance, chat_ctx=your_chat_context)
```
Explanation:
- CustomLLMStream: inherits from `LLMStream` and provides an implementation for the `_run` method.
- `_run` method: contains the logic for handling the LLM stream, such as processing responses from your LLM server.

By implementing the `_run` method, you ensure that the stream class can be instantiated without errors. Would you like more detailed guidance on implementing this in your specific context?

Sources:
- LLMStream Documentation
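When you replace the simulated `_llm_response` with a real call, one option is to stream the HTTP body incrementally with aiohttp instead of awaiting the full response. A minimal sketch, assuming a newline-delimited streaming endpoint (the helper name and payload shape are illustrative):

```python
import aiohttp

async def stream_llm_response(api_url: str, payload: dict):
    # Yield chunks of the reply as the server produces them, instead of
    # buffering the whole body in memory. Assumes a newline-delimited
    # stream; adapt the parsing to your server's actual protocol.
    async with aiohttp.ClientSession() as session:
        async with session.post(api_url, json=payload) as resp:
            resp.raise_for_status()
            async for line in resp.content:
                text = line.decode("utf-8").strip()
                if text:
                    yield text
```

You could call this from `_llm_response`, passing your server's URL and the serialized chat context.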
@millions-hair-24956 How should my custom LLM response be formatted so that it can be processed by LiveKit?
To ensure that your custom LLM response can be processed by LiveKit, you need to format it to match the expected structure of a `ChatChunk`. According to the LiveKit documentation, a `ChatChunk` consists of a `request_id`, a list of `Choice` objects, and optional `CompletionUsage` data. Here's how you can format your LLM response:

1. Structure the response: Your LLM response should be structured as a `ChatChunk` object. This includes a `request_id` to uniquely identify the request, a list of `Choice` objects that contain the response content, and, optionally, usage statistics.
2. Create `Choice` objects: Each `Choice` object should include a `ChoiceDelta` with the role (e.g., "assistant") and the content of the response.
3. Include usage data: If applicable, include `CompletionUsage` data to provide information about token usage.

Here's an example of how you might format the response:
```python
from livekit.agents.llm import ChatChunk, Choice, ChoiceDelta, CompletionUsage

def format_llm_response(request_id, response_content):
    # Create a ChoiceDelta with the response content
    choice_delta = ChoiceDelta(role="assistant", content=response_content)

    # Create a Choice object
    choice = Choice(delta=choice_delta)

    # Optionally, include usage data
    usage = CompletionUsage(completion_tokens=10, prompt_tokens=5, total_tokens=15)

    # Create a ChatChunk with the request_id, choices, and usage
    chat_chunk = ChatChunk(request_id=request_id, choices=[choice], usage=usage)

    return chat_chunk

# Example usage
formatted_response = format_llm_response("unique_request_id", "This is the response content.")
```
Explanation:
- ChatChunk: the main object that encapsulates the response. It includes a `request_id`, a list of `Choice` objects, and optional `CompletionUsage`.
- Choice and ChoiceDelta: represent individual response options. The `ChoiceDelta` contains the role and content of the response.
- CompletionUsage: optional data about the number of tokens used in the response.

By formatting your LLM response in this way, you ensure compatibility with LiveKit's processing requirements. Would you like more detailed guidance on implementing this in your specific context?

Sources:
- LiveKit LLM Documentation
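Tying this back to the earlier `CustomLLMStream`: inside `_run`, each server chunk can be wrapped in a `ChatChunk` before being handed to the pipeline. A sketch under the same assumptions as above (whether `_run` should yield chunks or push them to an internal channel varies by livekit-agents version, so treat this as illustrative):

```python
async def _run(self):
    # Wrap each raw text chunk from the server in a ChatChunk so the
    # pipeline receives the structure it expects.
    async for text in self._llm_response():
        yield format_llm_response("example_request_id", text)
```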
@millions-hair-24956 Generate an example of a response that aligns with ChatChunk
To generate an example of a response that aligns with a `ChatChunk` in LiveKit, structure the response to include a `request_id`, a list of `Choice` objects, and, optionally, `CompletionUsage` data. Here's a complete example:
```python
from livekit.agents.llm import ChatChunk, Choice, ChoiceDelta, CompletionUsage

# Example function to create a ChatChunk response
def create_chat_chunk_example():
    # Unique identifier for the request
    request_id = "example_request_id"

    # Content of the response
    response_content = "This is an example response from the assistant."

    # Create a ChoiceDelta with the response content
    choice_delta = ChoiceDelta(role="assistant", content=response_content)

    # Create a Choice object
    choice = Choice(delta=choice_delta)

    # Optionally, include usage data
    usage = CompletionUsage(completion_tokens=10, prompt_tokens=5, total_tokens=15)

    # Create a ChatChunk with the request_id, choices, and usage
    chat_chunk = ChatChunk(request_id=request_id, choices=[choice], usage=usage)

    return chat_chunk

# Generate the example ChatChunk
example_chat_chunk = create_chat_chunk_example()

# Print the example ChatChunk
print(example_chat_chunk)
```
Explanation:
- ChatChunk: encapsulates the entire response. It includes a `request_id` to uniquely identify the request, a list of `Choice` objects, and optional `CompletionUsage` data.
- Choice and ChoiceDelta: represent the response options. The `ChoiceDelta` contains the role (e.g., "assistant") and the content of the response.
- CompletionUsage: optional data about the number of tokens used in the response, which can be useful for tracking and billing purposes.

This example demonstrates how to structure a response in a way that is compatible with LiveKit's processing requirements. If you need further customization or have specific requirements, feel free to ask for more details.

Sources:
- https://docs.livekit.io/python/livekit/plugins/anthropic/index.html#livekit.plugins.anthropic.LLMStream
@millions-hair-24956 Generate the response in JSON
To generate a JSON representation of a `ChatChunk` response, you need to convert the `ChatChunk` and its components (`Choice`, `ChoiceDelta`, and `CompletionUsage`) into a JSON-serializable format. Here's how you can do it:
```python
import json
from livekit.agents.llm import ChatChunk, Choice, ChoiceDelta, CompletionUsage

# Example function to create a ChatChunk response
def create_chat_chunk_example():
    # Unique identifier for the request
    request_id = "example_request_id"

    # Content of the response
    response_content = "This is an example response from the assistant."

    # Create a ChoiceDelta with the response content
    choice_delta = ChoiceDelta(role="assistant", content=response_content)

    # Create a Choice object
    choice = Choice(delta=choice_delta)

    # Optionally, include usage data
    usage = CompletionUsage(completion_tokens=10, prompt_tokens=5, total_tokens=15)

    # Create a ChatChunk with the request_id, choices, and usage
    chat_chunk = ChatChunk(request_id=request_id, choices=[choice], usage=usage)

    return chat_chunk

# Convert the ChatChunk to a JSON-serializable dictionary
def chat_chunk_to_dict(chat_chunk):
    return {
        "request_id": chat_chunk.request_id,
        "choices": [
            {
                "delta": {
                    "role": choice.delta.role,
                    "content": choice.delta.content
                }
            } for choice in chat_chunk.choices
        ],
        "usage": {
            "completion_tokens": chat_chunk.usage.completion_tokens,
            "prompt_tokens": chat_chunk.usage.prompt_tokens,
            "total_tokens": chat_chunk.usage.total_tokens
        } if chat_chunk.usage else None
    }

# Generate the example ChatChunk
example_chat_chunk = create_chat_chunk_example()

# Convert to JSON
example_chat_chunk_json = json.dumps(chat_chunk_to_dict(example_chat_chunk), indent=4)

# Print the JSON representation
print(example_chat_chunk_json)
```
Explanation:
- chat_chunk_to_dict: converts a `ChatChunk` object into a dictionary that can be serialized to JSON, extracting the `request_id`, `choices`, and `usage` data.
- JSON serialization: `json.dumps` converts the dictionary into a JSON string, with `indent=4` for pretty-printing.

This JSON representation can be used to send the response over a network or store it in a database. If you need further customization, feel free to ask for more details.

Sources:
- LiveKit LLM Documentation
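For reference, running the script above should print output along these lines (the values come from the hard-coded example):

```
{
    "request_id": "example_request_id",
    "choices": [
        {
            "delta": {
                "role": "assistant",
                "content": "This is an example response from the assistant."
            }
        }
    ],
    "usage": {
        "completion_tokens": 10,
        "prompt_tokens": 5,
        "total_tokens": 15
    }
}
```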