# ask-ai
p
I’ve retrieved the conversation history (from session logs) containing all the user and agent messages with timestamps (created_at). What’s the best way to sync this session history with the audio recording?
t
Thanks for reaching out, harish. I'm an AI assistant and I'm here to help. Give me a moment while I research this topic.
I'm happy to guide you through syncing session history with audio recordings. The steps are:
1. Record audio: Use Room Composite Egress to capture the session's audio. The full interaction is recorded and stored in your cloud storage.
2. Enable TTS-aligned transcription: Turn on the `use_tts_aligned_transcript` option on your `AgentSession`. This gives you word- or sentence-level timestamps for the agent's speech.
3. Access timed transcripts: Use the `transcription_node` method to get the timing of each word or sentence, which lets you map the session history onto the audio timeline.
4. Export session history: Save the session history with `session.history.to_dict()`. This captures all messages and, when timestamps are included in the export, their `created_at` values.
5. Align timestamps: Match the `created_at` timestamps from the session history and the `start_time` values from `TimedString` against the audio recording's timeline to keep everything synchronized.
For real-time updates, you can also listen to events such as `conversation_item_added` or `user_input_transcribed`, which include timestamps for each message. If you have any questions or need a hand with any part of this, just let me know.
Sources: Text and transcriptions | LiveKit Docs; Session recording and transcripts | LiveKit Docs
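To make the alignment step concrete, here is a minimal sketch of the conversion it relies on: turning an absolute `created_at` value into an offset on the recording's timeline. It assumes both `created_at` and the recording start are Unix epoch seconds; `offset_in_recording`, `format_offset`, and `recording_start` are hypothetical names for illustration, not part of the LiveKit SDK.

```python
from datetime import datetime, timezone


def offset_in_recording(created_at: float, recording_start: float) -> float:
    """Seconds from the start of the recording to a message's created_at.

    Both arguments are assumed to be Unix epoch seconds (hypothetical inputs).
    """
    return created_at - recording_start


def format_offset(seconds: float) -> str:
    """Render an offset as MM:SS.mmm, clamping negative values to zero."""
    millis = max(0, round(seconds * 1000))
    minutes, rem = divmod(millis, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{minutes:02d}:{secs:02d}.{ms:03d}"


# Example values only: a recording that started at noon UTC and a message
# created 83.25 seconds later.
recording_start = datetime(2025, 1, 1, 12, 0, 0, tzinfo=timezone.utc).timestamp()
created_at = recording_start + 83.25
print(format_offset(offset_in_recording(created_at, recording_start)))
```

You would need to capture the recording's actual start time yourself (for example, when the egress begins) and keep it next to the exported history.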
p
can you give an example code
t
Here's an example that records audio and enables TTS-aligned transcription, so you can sync your session history with the audio recording. Let's walk through it:
1. Record Audio via Room Composite Egress
from livekit import api

async def start_audio_recording():
    req = api.RoomCompositeEgressRequest(
        room_name="example-room",
        audio_only=True,
        file_outputs=[api.EncodedFileOutput(
            file_type=api.EncodedFileType.OGG,
            filepath="recordings/session.ogg",
            s3=api.S3Upload(
                bucket="my-bucket",
                region="us-west-2",
                access_key="my-access-key",
                secret="my-secret-key",
            ),
        )],
    )
    lkapi = api.LiveKitAPI()
    await lkapi.egress.start_room_composite_egress(req)
    await lkapi.aclose()
2. Enable TTS-Aligned Transcription
from livekit.agents import AgentSession

session = AgentSession(
    stt=my_stt,
    llm=my_llm,
    tts=my_tts,
    use_tts_aligned_transcript=True,  # This enables word/sentence-level timestamps
)
3. Access Timed Transcripts
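# Import path as used in LiveKit's examples; verify it for your SDK version
from livekit.agents.voice.io import TimedString

# Override this method on your Agent subclass
async def transcription_node(self, text, model_settings):
    async for chunk in text:
        if isinstance(chunk, TimedString):
            # TimedString is a str subclass, so format the chunk itself
            print(f"[{chunk.start_time:.2f}s - {chunk.end_time:.2f}s] {chunk}")
        yield chunk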
4. Export Session History
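import json
from datetime import datetime

# ctx is the JobContext passed to your entrypoint; session is the AgentSession
async def save_session_history():
    filename = f"/tmp/transcript_{ctx.room.name}_{datetime.now().strftime('%Y%m%d_%H%M%S')}.json"
    with open(filename, 'w') as f:
        # If created_at is missing from the output, check to_dict()'s timestamp options in your SDK version
        json.dump(session.history.to_dict(), f, indent=2)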
Please feel free to reach out if you'd like me to clarify any part or provide more examples—I’m here to help!
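Since the timed chunks in step 3 arrive word by word, it can help to collapse each utterance into one span before comparing it with a history message. The following is a rough sketch of that grouping. `TimedChunk` is a hypothetical stand-in (a `str` carrying `start_time` and `end_time`) so the example runs without the SDK, and `Span` and `merge_chunks` are illustrative names, not LiveKit APIs.

```python
from dataclasses import dataclass


class TimedChunk(str):
    """Hypothetical stand-in for a timed text chunk: a str with timing attributes."""

    def __new__(cls, text: str, start_time: float, end_time: float):
        obj = super().__new__(cls, text)
        obj.start_time = start_time
        obj.end_time = end_time
        return obj


@dataclass
class Span:
    text: str
    start_time: float
    end_time: float


def merge_chunks(chunks: list[TimedChunk]) -> Span:
    """Collapse the word-level chunks of one utterance into a single span."""
    if not chunks:
        raise ValueError("need at least one chunk")
    return Span(
        text="".join(chunks),
        start_time=min(c.start_time for c in chunks),
        end_time=max(c.end_time for c in chunks),
    )


# Example values only: two word-level chunks from one agent utterance.
words = [TimedChunk("Hello ", 0.00, 0.42), TimedChunk("there", 0.42, 0.80)]
span = merge_chunks(words)
print(span)
```

In a real agent you would append chunks to a list inside `transcription_node` and call something like `merge_chunks` once the utterance is complete.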
p
how to map the TimedString to session history?
t
To map `TimedString` objects to session history, use the timing fields on `TimedString` (`start_time` and `end_time`) and align them with the `created_at` timestamps in the session history. Here's how you can do it:
1. Extract TimedString data: Use the `transcription_node` method to capture `TimedString` objects along with their timing information:
async def transcription_node(self, text, model_settings):
    async for chunk in text:
        if isinstance(chunk, TimedString):
            print(f"[{chunk.start_time:.2f}s - {chunk.end_time:.2f}s] {chunk}")
            # Store or process the timing data here
        yield chunk
2. Retrieve session history: Access the session history with `session.history.to_dict()`. This returns the messages and, when timestamps are included in the export, their `created_at` values:
history = session.history.to_dict()
for item in history['items']:
    # Only message items carry a role; the message text is under 'content'
    if item.get('type') == 'message':
        print(f"[{item.get('created_at')}] {item['role']}: {item['content']}")
3. Align TimedString with session history: `start_time` and `end_time` are relative offsets in seconds, while `created_at` is an absolute timestamp, so the two cannot be compared directly. Convert `created_at` to an offset from the start of the recording first, then match each message to the closest `TimedString`. A simple version, assuming `recording_start` holds the recording's start time in the same units as `created_at` and the `TimedString` times are on the recording's timeline:
for item in history['items']:
    # Offset of this message from the start of the recording, in seconds
    item_offset = item['created_at'] - recording_start
    for timed_string in timed_strings:  # List of TimedString objects
        if abs(timed_string.start_time - item_offset) < threshold:
            print(f"Matched: {item['content']} with {timed_string}")
If you have any questions or want me to walk you through any part of this, please don't hesitate to ask.
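For longer sessions, the nested loop can be replaced with a nearest-neighbour lookup. Below is a minimal sketch under stated assumptions: messages are plain dicts with `id` and `created_at` (epoch seconds), spans are `(start_offset_seconds, text)` tuples sorted by start and already expressed on the recording's timeline, and `match_spans_to_messages`, `recording_start`, and `threshold` are hypothetical names chosen for illustration.

```python
from bisect import bisect_left


def match_spans_to_messages(messages, spans, recording_start, threshold=1.0):
    """Pair each message with the span whose start offset is nearest.

    messages: dicts with 'id' and 'created_at' (epoch seconds, assumed).
    spans: (start_offset_seconds, text) tuples sorted by start offset and
           already expressed on the recording's timeline (assumed).
    Returns a dict of message id -> matched span text.
    """
    starts = [start for start, _ in spans]
    matches = {}
    for msg in messages:
        offset = msg["created_at"] - recording_start
        i = bisect_left(starts, offset)
        # Only the neighbours on either side of the insertion point can be nearest
        candidates = [j for j in (i - 1, i) if 0 <= j < len(spans)]
        if not candidates:
            continue
        best = min(candidates, key=lambda j: abs(starts[j] - offset))
        # Keep the match only if it falls within the tolerance window
        if abs(starts[best] - offset) <= threshold:
            matches[msg["id"]] = spans[best][1]
    return matches


# Example values only.
messages = [
    {"id": "m1", "created_at": 1000.4},
    {"id": "m2", "created_at": 1012.0},
]
spans = [(0.5, "Hello there"), (5.0, "How can I help?")]
result = match_spans_to_messages(messages, spans, recording_start=1000.0)
print(result)
```

The tolerance matters because a message's `created_at` may not coincide exactly with the moment its audio starts playing, so you may need to tune `threshold` against a few known utterances.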