boundless-battery-65494
08/12/2025, 4:16 PM
tall-airline-16756
08/12/2025, 4:17 PM
1. For Metrics Logging:
logMetrics(metrics)
Just use the logMetrics function from the metrics module to log your metrics easily.
2. For General Logging:
setLogExtension((level: LogLevel, msg: string, context: object) => {
  // Use your log provider's API to send this log message and context
});
This setLogExtension function lets you plug into the logging system effortlessly.
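For instance, here's a minimal sketch of forwarding logs to an external provider, assuming livekit-client's setLogExtension and LogLevel exports; sendToProvider is a hypothetical wrapper around whatever SDK your provider gives you:
import { LogLevel, setLogExtension } from 'livekit-client';

// Hypothetical wrapper around your provider's SDK (Datadog, etc.)
const sendToProvider = (payload: Record<string, unknown>) => {
  // e.g. call your provider's HTTP intake or client library here
};

setLogExtension((level: LogLevel, msg: string, context: object) => {
  // Forward only warnings and errors to keep provider volume manageable
  if (level >= LogLevel.warn) {
    sendToProvider({ level: LogLevel[level], message: msg, ...context });
  }
});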
The metrics system can track all sorts of useful info, like:
• Speech-to-text (STT) metrics: audio duration and streaming status
• LLM metrics: time to first token, input/output tokens
• Text-to-speech (TTS) metrics: time to first byte, audio duration
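As a rough sketch of wiring those up for logging (assuming a Node.js VoicePipelineAgent named agent that emits a metrics_collected event, plus the logMetrics helper from the metrics module; event and payload names may vary by version):
import { metrics } from '@livekit/agents';

// `agent` is your running VoicePipelineAgent (see the pipeline example later in this thread)
agent.on('metrics_collected', (collected) => {
  // Pretty-print the STT/LLM/TTS metrics described above
  metrics.logMetrics(collected);
});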
If you have any questions or want a hand with anything else, just let me know—I’m here to help!
Sources: Capturing metrics | LiveKit Docs | logMetrics | LiveKit Agents | src/logger.ts | How to Send Logs from the JavaScript SDK to Providers like Datadog
boundless-battery-65494
08/12/2025, 5:04 PM
tall-airline-16756
08/12/2025, 5:05 PM
1. Frontend Integration:
Use the RoomEvent.TranscriptionReceived event to catch transcription segments. Just peek at the final property on each segment to know when it’s all wrapped up.
useEffect(() => {
  if (!room) {
    return;
  }
  // The handler receives an array of TranscriptionSegment objects
  // (TranscriptionSegment comes from livekit-client)
  const handleTranscription = (segments: TranscriptionSegment[]) => {
    for (const segment of segments) {
      if (segment.final) {
        // Here’s where you can handle the final transcript
        console.log('Final transcript:', segment.text);
      }
    }
  };
  room.on(RoomEvent.TranscriptionReceived, handleTranscription);
  return () => {
    room.off(RoomEvent.TranscriptionReceived, handleTranscription);
  };
}, [room]);
2. Agent Integration:
You can call the flush() method on the transcription output to send off the final transcription. This makes sure the transcription is published with final=True.
async def flush(self) -> None:
    if self._participant_identity is None or self._track_id is None or not self._capturing:
        return
    self._flush_task = asyncio.create_task(
        self._publish_transcription(self._current_id, self._pushed_text, final=True)
    )
    self._reset_state()
Just a few quick tips:
• Always check that final property so you’re working with the complete transcript.
• Use flush() to wrap things up and send the transcription once you’ve got the text.
• Make sure your backend is set up to forward transcriptions over to the frontend (see the sketch below).
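For that last point, here's a rough sketch of the backend forwarding step, assuming a connected @livekit/rtc-node Room called room, a known audio trackSid, and a finalText string from your pipeline; the segment shape follows the publishTranscription calls later in this thread, and the timing values are placeholders:
import { randomUUID } from 'node:crypto';

await room.localParticipant?.publishTranscription({
  participantIdentity: 'agent',
  trackSid,
  segments: [
    {
      id: randomUUID(), // unique id for this segment
      text: finalText,  // the completed transcript text
      final: true,      // tells the frontend this segment is done
      startTime: 0,     // placeholder timing values
      endTime: 0,
      language: 'en-US',
    },
  ],
});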
If you want me to walk you through more examples or have any questions, just give me a shout—I’m here to help!
boundless-battery-65494
08/12/2025, 5:05 PM
tall-airline-16756
08/12/2025, 5:06 PM
import {
  Room,
  RoomEvent,
  TranscriptionSegment,
  Participant,
  TrackPublication,
} from '@livekit/rtc-node';
// Initialize the LiveKit room
const room = new Room();

// Listen for transcription events (register before connecting so nothing is missed)
room.on(RoomEvent.TranscriptionReceived, (segments, participant, publication) => {
  segments.forEach((segment) => {
    if (segment.final) {
      console.log(`Final transcription from ${participant?.identity}: ${segment.text}`);
      // Here’s where you can handle the final segment (like saving or displaying it)
    } else {
      console.log(`Interim transcription from ${participant?.identity}: ${segment.text}`);
      // You can also handle interim segments if you want
    }
  });
});

// Connect to the room (just swap in your server URL and an actual token)
const url = 'wss://your-livekit-server';
const token = 'your_room_token';
room
  .connect(url, token)
  .then(() => console.log('Connected to room'))
  // Catch any connection errors
  .catch((err) => console.error('Room connection error:', err));

// Disconnect from the room when you’re done
room.disconnect().then(() => {
  console.log('Disconnected from room');
});
Quick rundown:
• Initialization: Sets up a LiveKit room connection using your server URL and token.
• Event Handling: Listens for RoomEvent.TranscriptionReceived to get transcription segments.
• Segment Handling: Logs the transcription segments, giving special attention to the final ones.
• Error Handling: Includes simple error handling for connection hiccups.
If you want me to walk you through anything else or need more examples, just give me a shout!
boundless-battery-65494
08/12/2025, 5:09 PM
tall-airline-16756
08/12/2025, 5:10 PM
1. STT Plugin Integration:
You can use the STTSegmentsForwarder class with various STT plugins (like OpenAI, Google, Silero, Amazon Transcribe) to get transcriptions in real time. This way, you can pick the STT service that works best for you.
2. LiveKit's Built-in Transcription Capabilities:
LiveKit has built-in transcription support in its VoicePipelineAgent and MultimodalAgent classes. These agents can automatically send transcriptions to clients if you set it up on the frontend.
3. Transcription through Pipeline Nodes:
You can also use LiveKit's pipeline nodes, like the transcription_node, to capture and process transcripts as they happen.
4. Session History:
After the session ends, you can access the session history programmatically. The session.history property holds the whole conversation history, including transcripts.
from datetime import datetime
import json

# Inside your agent entrypoint, where `ctx` is the JobContext and
# `session` is your AgentSession:
async def write_transcript():
    with open(f"transcript_{datetime.now()}.json", "w") as f:
        json.dump(session.history.to_dict(), f)

ctx.add_shutdown_callback(write_transcript)
5. Event-Based Transcription:
You can listen to events like conversation_item_added or user_input_transcribed to grab transcripts as they come in.
@session.on("conversation_item_added")
def on_conversation_item_added(ev: agents.ConversationItemAddedEvent):
    if ev.item.role == "user":
        # Capture the transcribed user input if available
        ...
If you want, I’d be happy to walk you through any of these options or help you get started!
boundless-battery-65494
08/12/2025, 5:12 PM
tall-airline-16756
08/12/2025, 5:13 PM
1. Deepgram STT Plugin:
import { STT } from '@livekit/agents-plugin-deepgram';
// Initialize STT with your API key
const stt = new STT({
  apiKey: 'your_deepgram_api_key',
  interimResults: true,
  sampleRate: 48000,
  endpointing: 25,
});

// Create a stream to process audio frames
const sttStream = stt.stream();

// Handle transcription events (SpeechEventType comes from the LiveKit Agents STT module)
sttStream.on(SpeechEventType.INTERIM_TRANSCRIPT, (event) => {
  console.log('Interim transcript:', event.alternatives[0].text);
});
sttStream.on(SpeechEventType.FINAL_TRANSCRIPT, (event) => {
  console.log('Final transcript:', event.alternatives[0].text);
});

// Start processing audio frames (`track` is the subscribed audio track;
// AudioStream comes from @livekit/rtc-node)
const audioStream = new AudioStream(track);
for await (const frame of audioStream) {
  sttStream.pushFrame(frame);
}
2. AssemblyAI STT Plugin:
This one’s pretty similar to Deepgram, offering real-time transcription with both interim and final results. You can tweak it to fit your needs.
3. Azure Speech Services:
Another great option! Azure Speech Services supports Node.js and provides real-time transcription as well.
4. Custom STT Integration:
If you want something more tailored, you can extend the STT class from the LiveKit Agents framework to plug in any STT service you prefer.
Frontend Integration:
To catch transcription events on your frontend, just listen for RoomEvent.TranscriptionReceived and update your UI with the segments you get (a quick React sketch follows below).
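For example, a small React sketch (assuming livekit-client and a room passed in from your app) that keeps segments in state keyed by their id:
import { useEffect, useState } from 'react';
import { Room, RoomEvent, TranscriptionSegment } from 'livekit-client';

export function useTranscriptions(room?: Room) {
  const [segments, setSegments] = useState<Record<string, TranscriptionSegment>>({});

  useEffect(() => {
    if (!room) return;
    const onTranscription = (received: TranscriptionSegment[]) => {
      setSegments((prev) => {
        const next = { ...prev };
        // Later updates (including the final one) overwrite earlier interim
        // segments that share the same id
        for (const segment of received) {
          next[segment.id] = segment;
        }
        return next;
      });
    };
    room.on(RoomEvent.TranscriptionReceived, onTranscription);
    return () => {
      room.off(RoomEvent.TranscriptionReceived, onTranscription);
    };
  }, [room]);

  return segments;
}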
If you want me to walk you through any of these or need a specific example, just let me know—I’m happy to help!
Sources: LiveKit integration guides | LiveKit Docs | agents/src/multimodal/multimodal_agent.ts
boundless-battery-65494
08/12/2025, 5:46 PM
tall-airline-16756
08/12/2025, 5:46 PM
If sttStream.on isn’t available, no worries! You can work directly with the SpeechStream iterable using a for-await loop to handle those transcription events. Here’s a quick example to get you started:
const transcribeTrack = async (track) => {
  // AudioStream comes from @livekit/rtc-node; STT and SpeechEventType
  // come from the same imports as the earlier Deepgram example
  const audioStream = new AudioStream(track);
  const sttStream = new STT({ sampleRate: 48000 }).stream();

  const sendTask = async () => {
    for await (const frame of audioStream) {
      sttStream.pushFrame(frame);
    }
  };

  const recvTask = async () => {
    for await (const event of sttStream) {
      if (event.type === SpeechEventType.FINAL_TRANSCRIPT) {
        console.log(event.alternatives![0].text);
      }
    }
  };

  await Promise.all([sendTask(), recvTask()]);
};
Explanation:
• Iterable Usage: Instead of sttStream.on, we use a for-await loop to go through the SpeechStream.
• Event Handling: This loop listens for transcription events like FINAL_TRANSCRIPT and handles them as they come in.
• Cleanup: Don’t forget to close the stream when you’re done to keep things tidy and free up resources.
If anything’s unclear or you want to dive deeper, just let me know—I’m here to help!
Sources: agents/src/multimodal/multimodal_agent.ts | SpeechEventType | LiveKit Agents
boundless-battery-65494
08/12/2025, 5:48 PM
tall-airline-16756
08/12/2025, 5:49 PM
You'd typically call the transcribeTrack method when an audio track gets subscribed. Usually, this happens inside a LiveKit room where you're managing participant tracks.
Here’s a friendly example of where to call it:
room.on(RoomEvent.TrackSubscribed, (track, publication, participant) => {
  if (track.kind === Track.Kind.Audio) {
    // Give transcribeTrack a call for the audio track
    transcribeTrack(track).catch((err) => {
      console.error('Oops, something went wrong transcribing the track:', err);
    });
  }
});
Quick rundown:
• Track Subscribed Event: This event fires when you subscribe to a participant’s track in the room.
• Audio Track Check: Make sure the track is an Audio track before you try to transcribe it.
• Error Handling: Wrapping it in a catch block helps you catch any hiccups during transcription.
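One extra safeguard you might add, as a rough sketch reusing the room and transcribeTrack from the snippets above: skip tracks that are already being transcribed.
const transcribing = new Set<string>();

room.on(RoomEvent.TrackSubscribed, (track, publication, participant) => {
  if (track.kind !== Track.Kind.Audio || transcribing.has(publication.trackSid)) {
    return;
  }
  transcribing.add(publication.trackSid);
  transcribeTrack(track)
    .catch((err) => console.error('Transcription failed:', err))
    // Allow a fresh transcription if the same track is resubscribed later
    .finally(() => transcribing.delete(publication.trackSid));
});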
Feel free to reach out if you want me to walk you through anything else!
boundless-battery-65494
08/12/2025, 5:49 PM
tall-airline-16756
08/12/2025, 5:50 PM
You can use the VoicePipelineAgent along with the TextAudioSynchronizer to process and forward your transcription data smoothly. Here’s a friendly example to get you started:
import {
  defineAgent,
  pipeline,
} from '@livekit/agents';
import { SipClient } from 'livekit-server-sdk';
import { TextAudioSynchronizer } from './transcription';
import * as deepgram from '@livekit/agents-plugin-deepgram';
import * as openai from '@livekit/agents-plugin-openai';
import * as elevenlabs from '@livekit/agents-plugin-elevenlabs';

export default defineAgent({
  async entry(ctx) {
    await ctx.connect();

    const sipClient = new SipClient(
      process.env.LIVEKIT_URL ?? '',
      process.env.LIVEKIT_API_KEY,
      process.env.LIVEKIT_API_SECRET,
    );

    // Initialize STT, LLM, and TTS
    const stt = new deepgram.STT();
    const llm = new openai.LLM();
    const tts = new elevenlabs.TTS();

    // Initialize the voice pipeline agent
    const agent = new pipeline.VoicePipelineAgent(stt, llm, tts);

    // Create a transcription forwarder
    const forwarder = new TextAudioSynchronizer({
      language: 'en-US',
      speed: 1,
      newSentenceDelay: 400,
    });

    // `participant` is the remote participant the agent should listen to
    // (for an outbound SIP call, the callee once they join the room)
    await agent.start(ctx.room, participant);

    // Handle transcription events
    forwarder.on('textUpdated', (segment) => {
      // Publish transcription to client
      ctx.room.localParticipant?.publishTranscription({
        participantIdentity: 'agent',
        trackSid: 'main-audio-track',
        segments: [
          {
            text: segment.text,
            final: segment.final,
            id: segment.id,
            startTime: segment.startTime,
            endTime: segment.endTime,
            language: 'en-US',
          },
        ],
      });
    });

    // Start the call (trunkId, phoneNumber, roomName, and options come from
    // your SIP trunk configuration)
    await sipClient.createSipParticipant(trunkId, phoneNumber, roomName, options);
  },
});
Just a few key things to keep in mind:
• The VoicePipelineAgent is your go-to for handling transcription, TTS, and STT all in one place.
• TextAudioSynchronizer helps keep your audio frames perfectly in sync with the transcription text.
• Make sure transcription is enabled in your agent’s configuration so everything runs smoothly.
If you have any questions or want me to walk you through any part of this setup, just let me know—I’m here to help!
Sources: Transcriptions | LiveKit Docs | agents/src/multimodal/multimodal_agent.ts | TranscriptionForwarder | LiveKit Agents | BasicTranscriptionForwarder | LiveKit Agents
boundless-battery-65494
08/12/2025, 6:12 PM
tall-airline-16756
08/12/2025, 6:13 PM
The transcripts come from the TextAudioSynchronizer and the forwarder.on('textUpdated') event. You can grab the transcription text from the segment.text property inside the event handler.
Here’s the key part of the code:
forwarder.on('textUpdated', (segment) => {
  // Access the transcription text here
  console.log('Transcription:', segment.text);

  // Publish transcription to client
  ctx.room.localParticipant?.publishTranscription({
    participantIdentity: 'agent',
    trackSid: 'main-audio-track',
    segments: [
      {
        text: segment.text,
        final: segment.final,
        id: segment.id,
        startTime: segment.startTime,
        endTime: segment.endTime,
        language: 'en-US',
      },
    ],
  });
});
Here’s a quick rundown:
• Accessing Transcripts: The transcription text lives in segment.text.
• Final vs Interim: You can check segment.final to see if the transcription is complete or still in progress.
• Publishing: The transcription gets sent to the client using publishTranscription.
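As a quick head start on the storing part, here's a rough sketch that collects final segments from the forwarder above and writes them out with Node's fs module; the file path is a placeholder:
import { writeFileSync } from 'node:fs';

const finalSegments: { id: string; text: string }[] = [];

forwarder.on('textUpdated', (segment) => {
  if (segment.final) {
    // Keep only completed segments; interim updates are skipped
    finalSegments.push({ id: segment.id, text: segment.text });
  }
});

// Later, for example when the call ends, dump everything to disk
writeFileSync('transcript.json', JSON.stringify(finalSegments, null, 2));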
If you want me to walk you through storing or showing the transcripts, just let me know—I’m happy to help!
boundless-battery-65494
08/12/2025, 6:15 PM
tall-airline-16756
08/12/2025, 6:15 PM
The TextAudioSynchronizer class is part of LiveKit's agents library, but if you don’t see it in your setup, no worries—you can create your own custom solution to sync transcription with audio. Here’s a simple way to get started:
1. Implement a Custom Synchronizer:
class CustomTextSynchronizer {
  constructor() {
    this.audioQueue = [];
    this.textQueue = [];
  }

  pushText(text) {
    this.textQueue.push(text);
  }

  pushAudio(frame) {
    this.audioQueue.push(frame);
  }

  async play() {
    // Pair up queued text with queued audio, one segment at a time
    while (this.textQueue.length > 0 && this.audioQueue.length > 0) {
      const textSegment = this.textQueue.shift();
      const audioSegment = this.audioQueue.shift();
      console.log('Playing audio segment');
      console.log('Text:', textSegment);
    }
  }
}
2. Use the Custom Synchronizer:
const synchronizer = new CustomTextSynchronizer();
synchronizer.pushText('Hello, this is a test transcription.');
synchronizer.pushAudio(audioFrame); // Replace with actual audio frame
synchronizer.play();
3. A Few Things to Keep in Mind:
• Manual Synchronization: You’ll need to line up the text and audio frames yourself.
• Event Handling: It’s a good idea to add custom events to let you know when text updates happen (see the sketch after this list).
• Integration: Make sure this synchronizer fits smoothly into your LiveKit pipeline for outbound calls.
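Here's one rough way to do the event-handling piece, using Node's built-in EventEmitter; the class name and the 'textUpdated' event are illustrative choices, not a LiveKit API:
import { EventEmitter } from 'node:events';

// Hypothetical event-emitting variant of the custom synchronizer above
class EventedTextSynchronizer extends EventEmitter {
  private textQueue: string[] = [];

  pushText(text: string) {
    this.textQueue.push(text);
    // Notify listeners whenever new text arrives
    this.emit('textUpdated', { text, final: false });
  }

  flush() {
    // Emit the combined text as a final segment, then reset
    this.emit('textUpdated', { text: this.textQueue.join(' '), final: true });
    this.textQueue = [];
  }
}

// Usage: forward events the same way as the TextAudioSynchronizer example
const synchronizer = new EventedTextSynchronizer();
synchronizer.on('textUpdated', (segment) => {
  console.log(segment.final ? 'Final:' : 'Interim:', segment.text);
});
synchronizer.pushText('Hello there');
synchronizer.flush();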
Feel free to reach out if you want me to help you build this out in more detail or if you have any questions—I’m here to help!