# ask-ai
@millions-hair-24956 How to use my own custom LLM server in LiveKit?
To use your own custom LLM server with LiveKit, you need to create a custom LLM class that integrates with your server's API. According to the LiveKit documentation, you can write a Python or Node.js application to define your agent, including custom plugins for LLMs. You can then configure the `VoicePipelineAgent` to use this custom LLM by passing it as the `llm` parameter. Here's a basic example of how you might structure a custom LLM class in Python:
```python
import aiohttp
from livekit.agents import llm

class CustomLLM(llm.LLM):
    def __init__(self, api_url, api_key):
        super().__init__()
        self.api_url = api_url
        self.api_key = api_key
        self.client = aiohttp.ClientSession()

    async def chat(self, chat_ctx, fnc_ctx=None, temperature=None, n=1, parallel_tool_calls=None):
        # Implement your API call here; the payload shape depends on your server
        async with self.client.post(self.api_url, json={"key": self.api_key, "context": chat_ctx}) as response:
            return await response.json()

    async def aclose(self):
        await self.client.close()
```
Would you like more detailed guidance on integrating this with LiveKit's `VoicePipelineAgent`?

Sources:
- LiveKit documentation
@millions-hair-24956 Yes, I need more guidance.
To integrate your custom LLM server with LiveKit's `VoicePipelineAgent`, follow these steps:

1. Create a custom LLM class: Define a class that inherits from `llm.LLM` and implements the `chat` method to interact with your LLM server. This method should handle sending requests to your server and processing the responses.
2. Initialize the custom LLM: When setting up the `VoicePipelineAgent`, pass an instance of your custom LLM class as the `llm` parameter.
3. Configure the agent: Set up the `VoicePipelineAgent` with the necessary components like VAD, STT, and TTS, along with your custom LLM.

Here's a step-by-step breakdown with code.

Step 1: Create a Custom LLM Class
```python
import aiohttp
from livekit.agents import llm

class CustomLLM(llm.LLM):
    def __init__(self, api_url, api_key):
        super().__init__()
        self.api_url = api_url
        self.api_key = api_key
        self.client = aiohttp.ClientSession()

    async def chat(self, chat_ctx, fnc_ctx=None, temperature=None, n=1, parallel_tool_calls=None):
        # Prepare the request payload
        payload = {
            "api_key": self.api_key,
            "context": chat_ctx.messages
        }
        # Send the request to your LLM server
        async with self.client.post(self.api_url, json=payload) as response:
            if response.status == 200:
                data = await response.json()
                # Process and return the response as an LLMStream
                return llm.LLMStream(data['response'])
            else:
                raise Exception(f"Failed to get response: {response.status}")

    async def aclose(self):
        await self.client.close()
```
Step 2: Initialize the Custom LLM
```python
custom_llm = CustomLLM(api_url="https://your-llm-server.com/api", api_key="your_api_key")
```
Step 3: Configure the VoicePipelineAgent
```python
from livekit.agents import llm
from livekit.agents.pipeline import VoicePipelineAgent
from livekit.plugins import silero, deepgram, cartesia

# Initialize other components
vad = silero.VAD.load()
stt = deepgram.STT(model="nova-2-general")
tts = cartesia.TTS()

# Create the agent
agent = VoicePipelineAgent(
    vad=vad,
    stt=stt,
    llm=custom_llm,
    tts=tts,
    chat_ctx=llm.ChatContext().append(role="system", text="Your initial prompt"),
    allow_interruptions=True,
    interrupt_speech_duration=0.5,
    interrupt_min_words=0,
    min_endpointing_delay=0.5
)

# Start the agent in a room with a participant
# agent.start(room, participant)
```
Explanation:
- Custom LLM class: handles communication with your LLM server. The `chat` method sends the chat context to your server and processes the response.
- VoicePipelineAgent: configured with your custom LLM and the other necessary components (VAD, STT, and TTS).
- Agent initialization: the agent is created with the custom LLM and other components, and can then be started in a LiveKit room; see the entrypoint sketch below.

This setup allows you to use your custom LLM server within the LiveKit framework, leveraging the `VoicePipelineAgent` for voice interactions.

Sources:
- https://livekit-users.slack.com/archives/C07FY8WHGPM/p1732128458.518759
- https://livekit-users.slack.com/archives/C07FY8WHGPM/p1728788979.238089
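For completeness, here is a minimal sketch of starting the configured agent inside a worker entrypoint. It assumes the `livekit-agents` worker API (`JobContext`, `WorkerOptions`, `cli.run_app`); the greeting text is illustrative, and `agent` refers to the `VoicePipelineAgent` built above — adapt to the version you have installed:

```python
from livekit.agents import JobContext, WorkerOptions, cli

async def entrypoint(ctx: JobContext):
    # Connect to the room assigned to this job
    await ctx.connect()
    # Wait for a remote participant before starting the voice pipeline
    participant = await ctx.wait_for_participant()
    # `agent` is the VoicePipelineAgent configured above
    agent.start(ctx.room, participant)
    await agent.say("Hi there, how can I help?", allow_interruptions=True)

if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))
```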
@millions-hair-24956 I'm getting this error: `raise TypeError(f'Object of type {o.__class__.__name__} '`
It seems likely that you're encountering a `TypeError` for an object that cannot be serialized, possibly when trying to convert it to JSON. This error often occurs when using Python's `json` module to serialize objects that are not natively serializable, such as custom classes or complex data types. To resolve this, you can implement a custom JSON encoder by subclassing `json.JSONEncoder` and overriding the `default` method to handle the serialization of your custom objects. Here's a basic example:
```python
import json

class CustomEncoder(json.JSONEncoder):
    def default(self, obj):
        # Check if the object is an instance of a custom class
        if isinstance(obj, YourCustomClass):
            return obj.to_dict()  # Convert the object to a dictionary
        # Call the base class method for other types
        return super().default(obj)

# Usage
data = {
    'key': YourCustomClass()  # Replace with your actual object
}

json_data = json.dumps(data, cls=CustomEncoder)
```
Would you like more detailed guidance on implementing this in your specific context?

Sources:
- https://livekit-users.slack.com/archives/C07FY8WHGPM/p1732128458.518759
- https://livekit-users.slack.com/archives/C07FY8WHGPM/p1731158203.654409
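As a lighter-weight alternative to subclassing, `json.dumps` also accepts a `default` callable that is invoked for any object it can't serialize natively. A minimal sketch, assuming the offending object exposes a `to_dict()` method (the `Greeting` class here is a hypothetical stand-in):

```python
import json

class Greeting:
    # Hypothetical example class; stands in for any custom object
    def __init__(self, text):
        self.text = text

    def to_dict(self):
        return {"text": self.text}

def encode_fallback(obj):
    # Called by json.dumps only for objects it can't serialize natively
    if hasattr(obj, "to_dict"):
        return obj.to_dict()
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")

json_data = json.dumps({"key": Greeting("hello")}, default=encode_fallback)
print(json_data)  # {"key": {"text": "hello"}}
```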
@millions-hair-24956 I'm getting this error:
```
2025-02-25 15:07:27,959 - ERROR livekit.agents.pipeline - Error in _synthesize_answer_task
Traceback (most recent call last):
  File "/home/victor/Sundevs/live-kit-demo/livekit-demo/venv/lib/python3.10/site-packages/livekit/agents/utils/log.py", line 16, in async_fn_logs
    return await fn(*args, **kwargs)
  File "/home/victor/Sundevs/live-kit-demo/livekit-demo/venv/lib/python3.10/site-packages/livekit/agents/pipeline/pipeline_agent.py", line 755, in _synthesize_answer_task
    llm_stream = await llm_stream
  File "/home/victor/Sundevs/live-kit-demo/livekit-demo/agent.py", line 40, in chat
    async with self.client.post(self.api_url, json=payload) as response:
  File "/home/victor/Sundevs/live-kit-demo/livekit-demo/venv/lib/python3.10/site-packages/aiohttp/client.py", line 1425, in __aenter__
    self._resp: _RetType = await self._coro
  File "/home/victor/Sundevs/live-kit-demo/livekit-demo/venv/lib/python3.10/site-packages/aiohttp/client.py", line 521, in _request
    data = payload.JsonPayload(json, dumps=self._json_serialize)
  File "/home/victor/Sundevs/live-kit-demo/livekit-demo/venv/lib/python3.10/site-packages/aiohttp/payload.py", line 448, in __init__
    dumps(value).encode(encoding),
  File "/usr/lib/python3.10/json/__init__.py", line 231, in dumps
    return _default_encoder.encode(obj)
  File "/usr/lib/python3.10/json/encoder.py", line 199, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/usr/lib/python3.10/json/encoder.py", line 257, in iterencode
    return _iterencode(o, 0)
  File "/usr/lib/python3.10/json/encoder.py", line 179, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type ChatMessage is not JSON serializable
```
The error you're encountering indicates that the `ChatMessage` object is not JSON serializable. This typically happens when you try to serialize an object that Python's `json` module doesn't know how to handle. To resolve this, you need to convert the `ChatMessage` object into a format that can be serialized, such as a dictionary. Here's how you can modify your code to handle this:

1. Convert `ChatMessage` to a dictionary: Ensure that the `ChatMessage` class has a method to convert its instances to a dictionary. If it doesn't, you can add a method like `as_dict()` to do this.
2. Use the conversion method: Before sending the `ChatMessage` object in your API request, convert it to a dictionary.

Here's an example of how you might implement this:
```python
# Assuming ChatMessage is a class you have defined
class ChatMessage:
    def __init__(self, message, role):
        self.message = message
        self.role = role

    def as_dict(self):
        return {
            "message": self.message,
            "role": self.role
        }

# When preparing the payload
chat_message = ChatMessage("Hello, how can I help you?", "user")
payload = {
    "api_key": "your_api_key",
    "context": [chat_message.as_dict()]  # Convert to dict
}

# Sending the request
async with self.client.post(self.api_url, json=payload) as response:
    data = await response.json()  # Handle the response
```
This approach ensures that the `ChatMessage` object is converted into a JSON-serializable format before being included in the request payload. Would you like more detailed guidance on implementing this in your specific context?

Sources:
- https://docs.livekit.io/python/livekit/rtc/chat.html#livekit.rtc.chat.ChatMessage
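Applied to the `CustomLLM.chat` method from earlier, the payload can be built from the agent's chat context rather than hand-made objects. A minimal sketch, assuming each message in `chat_ctx.messages` exposes `role` and `content` attributes (check the `ChatMessage` fields in your installed livekit-agents version):

```python
def serialize_chat_ctx(chat_ctx):
    # Flatten the LiveKit chat context into plain dicts for JSON transport.
    # Assumes each message has .role and .content; adjust to your version.
    return [{"role": msg.role, "content": msg.content} for msg in chat_ctx.messages]

# Inside CustomLLM.chat, the payload then becomes:
# payload = {"api_key": self.api_key, "context": serialize_chat_ctx(chat_ctx)}
```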
@millions-hair-24956 Now I'm getting this error: `TypeError: Can't instantiate abstract class LLMStream with abstract method _run`
The error you're encountering indicates that you're trying to instantiate the abstract class `LLMStream` without providing an implementation for its abstract method `_run`. In Python, abstract classes are meant to be subclassed, and any abstract methods must be implemented in the subclass. Here's how you can resolve this:

1. Create a subclass: Define a subclass of `LLMStream` and implement the `_run` method. This method should contain the logic for processing the LLM stream.
2. Implement the `_run` method: Ensure that your implementation of `_run` handles the specific tasks you need, such as processing data from your LLM server.

Here's an example of how you might implement this:
```python
from livekit.agents.llm import LLMStream

class CustomLLMStream(LLMStream):
    def __init__(self, llm, chat_ctx, fnc_ctx=None):
        super().__init__(llm=llm, chat_ctx=chat_ctx, fnc_ctx=fnc_ctx)

    async def _run(self):
        # Implement your logic here
        # For example, process the LLM response and yield results
        async for chunk in self._llm_response():
            # Process each chunk
            yield chunk

    async def _llm_response(self):
        # Simulate an LLM response stream
        # Replace this with actual logic to get data from your LLM server
        yield "response_chunk_1"
        yield "response_chunk_2"

# Usage
custom_stream = CustomLLMStream(llm=your_llm_instance, chat_ctx=your_chat_context)
```
Explanation:
- CustomLLMStream: inherits from `LLMStream` and provides an implementation for the `_run` method.
- `_run` method: contains the logic for handling the LLM stream, such as processing responses from your LLM server.

By implementing the `_run` method, you ensure that the stream class can be instantiated without errors. Would you like more detailed guidance on implementing this in your specific context?

Sources:
- LLMStream Documentation
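When you replace the simulated `_llm_response` with a real call, one option is to stream the HTTP body incrementally with aiohttp instead of awaiting the full response. A minimal sketch, assuming a newline-delimited streaming endpoint (the helper name and payload shape are illustrative):

```python
import aiohttp

async def stream_llm_response(api_url: str, payload: dict):
    # Yield chunks of the reply as the server produces them, instead of
    # buffering the whole body in memory. Assumes a newline-delimited
    # stream; adapt the parsing to your server's actual protocol.
    async with aiohttp.ClientSession() as session:
        async with session.post(api_url, json=payload) as resp:
            resp.raise_for_status()
            async for line in resp.content:
                text = line.decode("utf-8").strip()
                if text:
                    yield text
```

You could call this from `_llm_response`, passing your server's URL and the serialized chat context.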
@millions-hair-24956 How should my custom LLM response be formatted so that it can be processed by LiveKit?
To ensure that your custom LLM response can be processed by LiveKit, you need to format it to match the expected structure of a `ChatChunk`. According to the LiveKit documentation, a `ChatChunk` consists of a `request_id`, a list of `Choice` objects, and optional `CompletionUsage` data. Here's how you can format your LLM response:

1. Structure the response: Your LLM response should be structured as a `ChatChunk` object. This includes a `request_id` to uniquely identify the request, a list of `Choice` objects that contain the response content, and, optionally, usage statistics.
2. Create `Choice` objects: Each `Choice` object should include a `ChoiceDelta` with the role (e.g., "assistant") and the content of the response.
3. Include usage data: If applicable, include `CompletionUsage` data to provide information about token usage.

Here's an example of how you might format the response:
```python
from livekit.agents.llm import ChatChunk, Choice, ChoiceDelta, CompletionUsage

def format_llm_response(request_id, response_content):
    # Create a ChoiceDelta with the response content
    choice_delta = ChoiceDelta(role="assistant", content=response_content)

    # Create a Choice object
    choice = Choice(delta=choice_delta)

    # Optionally, include usage data
    usage = CompletionUsage(completion_tokens=10, prompt_tokens=5, total_tokens=15)

    # Create a ChatChunk with the request_id, choices, and usage
    chat_chunk = ChatChunk(request_id=request_id, choices=[choice], usage=usage)

    return chat_chunk

# Example usage
formatted_response = format_llm_response("unique_request_id", "This is the response content.")
```
Explanation:
- ChatChunk: the main object that encapsulates the response. It includes a `request_id`, a list of `Choice` objects, and optional `CompletionUsage`.
- Choice and ChoiceDelta: represent individual response options. The `ChoiceDelta` contains the role and content of the response.
- CompletionUsage: optional data about the number of tokens used in the response.

By formatting your LLM response in this way, you ensure compatibility with LiveKit's processing requirements. Would you like more detailed guidance on implementing this in your specific context?

Sources:
- LiveKit LLM Documentation
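Tying this back to the earlier `CustomLLMStream`: inside `_run`, each server chunk can be wrapped in a `ChatChunk` before being handed to the pipeline. A sketch under the same assumptions as above (whether `_run` should yield chunks or push them to an internal channel varies by livekit-agents version, so treat this as illustrative):

```python
async def _run(self):
    # Wrap each raw text chunk from the server in a ChatChunk so the
    # pipeline receives the structure it expects.
    async for text in self._llm_response():
        yield format_llm_response("example_request_id", text)
```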
@millions-hair-24956 Generate an example of a response that aligns with ChatChunk
To generate an example of a response that aligns with a `ChatChunk` in LiveKit, structure the response to include a `request_id`, a list of `Choice` objects, and, optionally, `CompletionUsage` data. Here's a complete example:
```python
from livekit.agents.llm import ChatChunk, Choice, ChoiceDelta, CompletionUsage

# Example function to create a ChatChunk response
def create_chat_chunk_example():
    # Unique identifier for the request
    request_id = "example_request_id"

    # Content of the response
    response_content = "This is an example response from the assistant."

    # Create a ChoiceDelta with the response content
    choice_delta = ChoiceDelta(role="assistant", content=response_content)

    # Create a Choice object
    choice = Choice(delta=choice_delta)

    # Optionally, include usage data
    usage = CompletionUsage(completion_tokens=10, prompt_tokens=5, total_tokens=15)

    # Create a ChatChunk with the request_id, choices, and usage
    chat_chunk = ChatChunk(request_id=request_id, choices=[choice], usage=usage)

    return chat_chunk

# Generate the example ChatChunk
example_chat_chunk = create_chat_chunk_example()

# Print the example ChatChunk
print(example_chat_chunk)
```
Explanation:
- ChatChunk: encapsulates the entire response. It includes a `request_id` to uniquely identify the request, a list of `Choice` objects, and optional `CompletionUsage` data.
- Choice and ChoiceDelta: represent the response options. The `ChoiceDelta` contains the role (e.g., "assistant") and the content of the response.
- CompletionUsage: optional data about the number of tokens used in the response, which can be useful for tracking and billing purposes.

This example demonstrates how to structure a response in a way that is compatible with LiveKit's processing requirements. If you need further customization or have specific requirements, feel free to ask for more details.

Sources:
- https://docs.livekit.io/python/livekit/plugins/anthropic/index.html#livekit.plugins.anthropic.LLMStream
@millions-hair-24956 Generate the response in JSON
To generate a JSON representation of a `ChatChunk` response, you need to convert the `ChatChunk` and its components (`Choice`, `ChoiceDelta`, and `CompletionUsage`) into a JSON-serializable format. Here's how you can do it:
```python
import json
from livekit.agents.llm import ChatChunk, Choice, ChoiceDelta, CompletionUsage

# Example function to create a ChatChunk response
def create_chat_chunk_example():
    # Unique identifier for the request
    request_id = "example_request_id"

    # Content of the response
    response_content = "This is an example response from the assistant."

    # Create a ChoiceDelta with the response content
    choice_delta = ChoiceDelta(role="assistant", content=response_content)

    # Create a Choice object
    choice = Choice(delta=choice_delta)

    # Optionally, include usage data
    usage = CompletionUsage(completion_tokens=10, prompt_tokens=5, total_tokens=15)

    # Create a ChatChunk with the request_id, choices, and usage
    chat_chunk = ChatChunk(request_id=request_id, choices=[choice], usage=usage)

    return chat_chunk

# Convert the ChatChunk to a JSON-serializable dictionary
def chat_chunk_to_dict(chat_chunk):
    return {
        "request_id": chat_chunk.request_id,
        "choices": [
            {
                "delta": {
                    "role": choice.delta.role,
                    "content": choice.delta.content
                }
            } for choice in chat_chunk.choices
        ],
        "usage": {
            "completion_tokens": chat_chunk.usage.completion_tokens,
            "prompt_tokens": chat_chunk.usage.prompt_tokens,
            "total_tokens": chat_chunk.usage.total_tokens
        } if chat_chunk.usage else None
    }

# Generate the example ChatChunk
example_chat_chunk = create_chat_chunk_example()

# Convert to JSON
example_chat_chunk_json = json.dumps(chat_chunk_to_dict(example_chat_chunk), indent=4)

# Print the JSON representation
print(example_chat_chunk_json)
```
Explanation:
- chat_chunk_to_dict: converts a `ChatChunk` object into a dictionary that can be serialized to JSON, extracting the `request_id`, `choices`, and `usage` data.
- JSON serialization: `json.dumps` converts the dictionary into a JSON string, with `indent=4` for pretty-printing.

This JSON representation can be used to send the response over a network or store it in a database. If you need further customization, feel free to ask for more details.

Sources:
- LiveKit LLM Documentation
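For reference, running the script above should print output along these lines (the values come from the hard-coded example):

```
{
    "request_id": "example_request_id",
    "choices": [
        {
            "delta": {
                "role": "assistant",
                "content": "This is an example response from the assistant."
            }
        }
    ],
    "usage": {
        "completion_tokens": 10,
        "prompt_tokens": 5,
        "total_tokens": 15
    }
}
```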