fierce-telephone-74426
02/15/2025, 7:54 PM

gray-monkey-83525
02/15/2025, 7:54 PM
Integrating D-ID's text-to-video API with LiveKit's VoiceAssistant would involve replacing the TTS component with D-ID's API for generating video clips. According to the D-ID documentation, you would need to create a peer connection, generate an SDP answer, and manage ICE candidates to establish a WebRTC connection for streaming video.
To integrate this with LiveKit, you might need to modify the VoiceAssistant setup to include a mechanism for sending text to D-ID's API and receiving the video stream. This could involve creating a custom plugin or modifying the existing agent to handle the video stream as a track in the LiveKit room. You would also need to ensure that the video stream stays synchronized with any audio or other media the agent handles.
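For illustration, the first call to D-ID might look like the sketch below: per D-ID's streams API, the response to creating a clip stream is expected to carry the SDP offer and ICE servers used to open that WebRTC connection. The exact endpoint, response fields, and placeholder credentials here are assumptions to verify against the current D-ID docs:

import aiohttp

DID_API_KEY = 'YOUR_DID_API_KEY'  # placeholder credential

async def create_did_stream():
    # Create a clip stream; the response is assumed to include the
    # stream id, session_id, SDP offer, and ICE servers (see D-ID docs)
    async with aiohttp.ClientSession() as session:
        resp = await session.post(
            'https://api.d-id.com/clips/streams',
            headers={'Authorization': f'Basic {DID_API_KEY}'},
            json={'presenter_id': 'YOUR_PRESENTER_ID', 'driver_id': 'YOUR_DRIVER_ID'},
        )
        return await resp.json()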
Would you like more detailed guidance on how to implement these changes?
Sources:
- D-ID documentation

fierce-telephone-74426
02/15/2025, 7:56 PM

gray-monkey-83525
02/15/2025, 7:56 PM
Integrating D-ID's text-to-video generation with the VoiceAssistant would require setting up a WebRTC connection to stream the video clips generated by D-ID. Here's a step-by-step breakdown of how you might implement this in Python:
1. Set Up the WebRTC Connection: Use the D-ID API to create a peer connection and handle the SDP offer/answer exchange. You'll need to manage ICE candidates to establish a stable connection (see the sketch after this list).
2. Modify VoiceAssistant: Replace the TTS component with a function that sends text to D-ID's API and receives a video stream. This involves creating a custom function within the VoiceAssistant class.
3. Stream Video: Once the video is generated, publish it as a video track in the LiveKit room. This requires handling the video stream as a track and ensuring it is synchronized with any audio or other media.
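As a sketch of step 1, the answer side of that exchange could be handled with the aiortc library. The /sdp endpoint path and request fields below are assumptions based on D-ID's streams API and should be double-checked against the docs:

import aiohttp
from aiortc import RTCPeerConnection, RTCSessionDescription

async def answer_did_offer(stream_id, session_id, offer):
    # Accept D-ID's SDP offer locally, then send our answer back so
    # D-ID can start streaming video over the peer connection
    pc = RTCPeerConnection()
    await pc.setRemoteDescription(
        RTCSessionDescription(sdp=offer['sdp'], type=offer['type'])
    )
    await pc.setLocalDescription(await pc.createAnswer())
    async with aiohttp.ClientSession() as session:
        await session.post(
            f'https://api.d-id.com/clips/streams/{stream_id}/sdp',  # assumed endpoint
            headers={'Authorization': 'Basic YOUR_DID_API_KEY'},
            json={
                'answer': {
                    'type': pc.localDescription.type,
                    'sdp': pc.localDescription.sdp,
                },
                'session_id': session_id,
            },
        )
    return pc  # subscribe to pc's video track to receive frames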
Here's a basic outline of how you might start implementing this:
import asyncio

from livekit import rtc
from livekit.agents import VoiceAssistant, WorkerOptions, cli
from livekit.plugins import openai


async def send_text_to_did(text):
    # Implement the API call to D-ID to generate video:
    # handle the SDP offer/answer exchange and ICE candidates,
    # then return the resulting video stream URL or object.
    pass


async def entrypoint(ctx):
    await ctx.connect()

    assistant = VoiceAssistant(
        llm=openai.LLM(model="gpt-4o"),
        tts=None,  # no TTS; D-ID generates the video instead
    )

    async def _answer(text):
        video_stream = await send_text_to_did(text)
        # Publish the video stream to the LiveKit room,
        # e.g. by creating a LocalVideoTrack.

    @assistant.on("message_received")
    def on_message_received(msg):
        # Event callbacks are synchronous, so schedule the async work
        if msg.message:
            asyncio.create_task(_answer(msg.message))

    assistant.start(ctx.room)


if __name__ == "__main__":
    # Run the agent worker
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))
This is a high-level overview and would need to be fleshed out with specific API calls and error handling. Would you like more detailed guidance on any specific part of this process?
Sources:
- D-ID documentation

fierce-telephone-74426
02/16/2025, 12:19 AM

gray-monkey-83525
02/16/2025, 12:19 AM
To implement the _answer() function that integrates D-ID's text-to-video API with LiveKit's VoiceAssistant, you need to handle the API call to D-ID, manage the WebRTC connection, and publish the video stream to the LiveKit room. Here's a more detailed implementation:
import asyncio

import aiohttp
from livekit import rtc
from livekit.agents import VoiceAssistant, WorkerOptions, cli
from livekit.plugins import openai


async def send_text_to_did(text):
    # Example API call to D-ID to generate a video clip from text
    async with aiohttp.ClientSession() as session:
        response = await session.post(
            'https://api.d-id.com/clips/streams',
            headers={
                'Authorization': 'Basic YOUR_DID_API_KEY',
                'Content-Type': 'application/json',
            },
            json={
                'presenter_id': 'YOUR_PRESENTER_ID',
                'driver_id': 'YOUR_DRIVER_ID',
                'text': text,
            },
        )
        data = await response.json()
        return data['stream_id'], data['session_id']


async def handle_webrtc_connection(stream_id, session_id):
    # Placeholder for the actual WebRTC connection setup:
    # handle the SDP offer/answer exchange and ICE candidates here.
    pass


async def _answer(text, ctx):
    stream_id, session_id = await send_text_to_did(text)
    await handle_webrtc_connection(stream_id, session_id)

    # Create a LocalVideoTrack and publish it to the room.
    # VideoSource requires the frame dimensions up front.
    video_source = rtc.VideoSource(1280, 720)
    video_track = rtc.LocalVideoTrack.create_video_track("video", video_source)
    await ctx.room.local_participant.publish_track(video_track)

    # Simulate video streaming
    while True:
        # Fetch video frames and feed them to the video source.
        # This is a placeholder for actual video frame handling.
        await asyncio.sleep(1)


async def entrypoint(ctx):
    await ctx.connect()

    assistant = VoiceAssistant(
        llm=openai.LLM(model="gpt-4o"),
        tts=None,  # no TTS; D-ID generates the video instead
    )

    @assistant.on("message_received")
    def on_message_received(msg):
        # Event callbacks are synchronous, so schedule the async work
        if msg.message:
            asyncio.create_task(_answer(msg.message, ctx))

    assistant.start(ctx.room)


if __name__ == "__main__":
    # Run the agent worker
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))
gray-monkey-83525
02/16/2025, 12:19 AM
1. send_text_to_did: This function sends the text to D-ID's API and returns the stream_id and session_id needed for the WebRTC connection.
2. handle_webrtc_connection: This function is a placeholder where you would handle the WebRTC connection setup, including the SDP offer/answer exchange and ICE candidates.
3. _answer: This function ties the above components together, sets up the video stream, and publishes it to the LiveKit room.
This implementation is a starting point and would need to be adapted to your specific requirements, including handling video frames and managing the WebRTC connection. Would you like more details on any specific part of this implementation?
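As a rough sketch of the frame handling mentioned above, and assuming the WebRTC connection is managed with aiortc so that D-ID's video arrives as an aiortc track, the placeholder loop in _answer could be replaced by a bridge like this (VideoFrame and capture_frame are from the livekit Python SDK; the rest is illustrative):

from livekit import rtc

async def forward_frames(remote_track, video_source: rtc.VideoSource):
    # Pull decoded frames from the aiortc video track that D-ID streams
    # to us and push them into the VideoSource published in _answer()
    while True:
        av_frame = await remote_track.recv()           # av.VideoFrame (PyAV)
        rgba = av_frame.to_ndarray(format="rgba")      # raw RGBA pixels
        lk_frame = rtc.VideoFrame(
            av_frame.width,
            av_frame.height,
            rtc.VideoBufferType.RGBA,
            rgba.tobytes(),
        )
        video_source.capture_frame(lk_frame)

Once handle_webrtc_connection yields the remote video track, _answer would run forward_frames(track, video_source) in place of the asyncio.sleep(1) loop.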
Sources:
- D-ID documentation