# ask-ai
b
Hey how to add the livekit metrics & logs in node js
t
Thanks for reaching out, MOHAN. I'm an AI assistant and I'm here to help. Give me a moment while I research this topic.
Hey there! If you want to add LiveKit metrics and logs in Node.js, here’s a simple way to do it:
1. For Metrics Logging:
logMetrics(metrics)
Just use the logMetrics function from the metrics module to log your metrics easily.
2. For General Logging:
setLogExtension((level: LogLevel, msg: string, context: object) => {
    // Use your log provider's API to send this log message and context
});
This setLogExtension function lets you plug into the logging system effortlessly. The metrics system can track all sorts of useful info, like:
• Speech-to-text (STT) metrics: audio duration and streaming status
• LLM metrics: time to first token, input/output tokens
• Text-to-speech (TTS) metrics: time to first byte, audio duration
If you have any questions or want a hand with anything else, just let me know—I’m here to help!
Sources: Capturing metrics | LiveKit Docs | logMetrics | LiveKit Agents | src/logger.ts | How to Send Logs from the JavaScript SDK to Providers like Datadog
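For reference, here is a minimal sketch of what that extension might look like once wired up. It assumes the livekit-client SDK (which exports setLogExtension and LogLevel); sendToProvider is a hypothetical stand-in for your log provider's ingestion call:
import { setLogExtension, LogLevel } from 'livekit-client';

// Hypothetical stand-in for your provider's API (Datadog, Loki, etc.)
const sendToProvider = (payload: Record<string, unknown>) => {
  console.log(JSON.stringify(payload));
};

setLogExtension((level: LogLevel, msg: string, context: object) => {
  // Forward only warnings and errors; skip verbose SDK chatter
  if (level >= LogLevel.warn) {
    sendToProvider({ level: LogLevel[level], msg, ...context });
  }
});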
b
Hey, how can I get the final transcripts from LiveKit? Is there any way to capture the transcripts?
t
Hey there! To grab those final transcripts from LiveKit, here are a couple of friendly ways you can do it:
1. Frontend Integration: Keep an ear out for the RoomEvent.TranscriptionReceived event to catch transcription segments. Just peek at the final property on each segment to know when it’s all wrapped up.
useEffect(() => {
    if (!room) {
        return;
    }
    // livekit-client delivers an array of TranscriptionSegment objects to this handler
    const handleTranscription = (
        segments: TranscriptionSegment[],
        participant?: Participant,
        publication?: TrackPublication,
    ) => {
        for (const segment of segments) {
            if (segment.final) {
                // Here’s where you can handle the final transcript
                console.log('Final transcript:', segment.text);
            }
        }
    };

    room.on(RoomEvent.TranscriptionReceived, handleTranscription);
    return () => {
        room.off(RoomEvent.TranscriptionReceived, handleTranscription);
    };
}, [room]);
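For completeness, here is a sketch of the imports that hook would typically sit alongside, assuming a React frontend that uses @livekit/components-react to get the room handle:
import { useEffect } from 'react';
import { useRoomContext } from '@livekit/components-react';
import {
  RoomEvent,
  TranscriptionSegment,
  Participant,
  TrackPublication,
} from 'livekit-client';

export function TranscriptListener() {
  // Grab the current room from LiveKit's React context
  const room = useRoomContext();

  useEffect(() => {
    // ...the handler from the snippet above goes here...
  }, [room]);

  return null;
}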
2. Agent Integration: You can call the flush() method on the transcription output to send off the final transcription. This makes sure the transcription is published with final=True.
async def flush(self) -> None:
    if self._participant_identity is None or self._track_id is None or not self._capturing:
        return

    self._flush_task = asyncio.create_task(
        self._publish_transcription(self._current_id, self._pushed_text, final=True)
    )
    self._reset_state()
Just a few quick tips:
• Always check that final property so you’re working with the complete transcript.
• Use flush() to wrap things up and send the transcription once you’ve got the text.
• Make sure your backend is set up to forward transcriptions over to the frontend.
If you want me to walk you through more examples or have any questions, just give me a shout—I’m here to help!
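As a concrete sketch of that first tip, one way to assemble the full transcript on the frontend is to key segments by their id (interim updates to the same utterance reuse the id, and the final update replaces them); this only assumes the livekit-client TranscriptionSegment shape shown above:
import type { TranscriptionSegment } from 'livekit-client';

// Keep the latest version of each segment; final updates overwrite interim ones
const segmentsById = new Map<string, TranscriptionSegment>();

function collectSegments(segments: TranscriptionSegment[]) {
  for (const segment of segments) {
    segmentsById.set(segment.id, segment);
  }
}

// Join only the finalized segments into one transcript string
function finalTranscript(): string {
  return [...segmentsById.values()]
    .filter((s) => s.final)
    .map((s) => s.text)
    .join(' ');
}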
b
I am using LiveKit server in Node.js. Please provide me a code snippet to get the transcripts.
t
Here’s a handy Node.js code snippet to help you grab transcripts from a LiveKit server:
import {
  Room,
  RoomEvent,
  TranscriptionSegment,
  Participant,
  TrackPublication,
} from '@livekit/rtc-node';

// Initialize the LiveKit room
const room = new Room();

// Connect to the room (just swap in your actual server URL and token)
const url = 'wss://your-livekit-server-url';
const token = 'your_room_token';
room.connect(url, token).then(() => {
  console.log('Connected to room');
});

// Listen for transcription events
room.on(RoomEvent.TranscriptionReceived, (segments, participant, publication) => {
  segments.forEach((segment) => {
    if (segment.final) {
      console.log(`Final transcription from ${participant?.identity}: ${segment.text}`);
      // Here’s where you can handle the final segment (like saving or displaying it)
    } else {
      console.log(`Interim transcription from ${participant?.identity}: ${segment.text}`);
      // You can also handle interim segments if you want
    }
  });
});

// Handle disconnects (the room emits RoomEvent.Disconnected rather than a generic 'error' event)
room.on(RoomEvent.Disconnected, (reason) => {
  console.error('Disconnected from room:', reason);
});

// Disconnect from the room when you’re done
room.disconnect().then(() => {
  console.log('Disconnected from room');
});
Quick rundown:
• Initialization: Sets up a LiveKit room connection using your server URL and token.
• Event Handling: Listens for RoomEvent.TranscriptionReceived to get transcription segments.
• Segment Handling: Logs the transcription segments, giving special attention to the final ones.
• Error Handling: Listens for RoomEvent.Disconnected so you can react to connection hiccups.
If you want me to walk you through anything else or need more examples, just give me a shout!
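One related note: the 'your_room_token' placeholder above would normally be minted server-side with livekit-server-sdk. Here is a minimal sketch, assuming LIVEKIT_API_KEY and LIVEKIT_API_SECRET are set in your environment:
import { AccessToken } from 'livekit-server-sdk';

// Mint a join token for the participant that will listen for transcriptions
async function createToken(roomName: string, identity: string): Promise<string> {
  const at = new AccessToken(process.env.LIVEKIT_API_KEY, process.env.LIVEKIT_API_SECRET, {
    identity,
  });
  at.addGrant({ room: roomName, roomJoin: true, canSubscribe: true });
  return at.toJwt();
}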
b
Do we have any other alternatives?
t
Here are some alternatives to capture transcripts from a LiveKit server:
1. Using STTSegmentsForwarder with STT Plugins: You can use the STTSegmentsForwarder class with various STT plugins (like OpenAI, Google, or Amazon Transcribe) to get transcriptions in real time. This way, you can pick the STT service that works best for you.
2. LiveKit's Built-in Transcription Capabilities: LiveKit has built-in transcription support in its VoicePipelineAgent and MultimodalAgent classes. These agents can automatically send transcriptions to clients if you set it up on the frontend.
3. Transcription through Pipeline Nodes: You can also use LiveKit's pipeline nodes, like the transcription_node, to capture and process transcripts as they happen.
4. Session History: After the session ends, you can access the session history programmatically. The session.history property holds the whole conversation history, including transcripts.
from datetime import datetime
import json

# Register inside your agent's entrypoint, where `ctx` (JobContext) and
# `session` (AgentSession) are in scope.
async def write_transcript():
    filename = f"transcript_{datetime.now().strftime('%Y%m%d_%H%M%S')}.json"
    with open(filename, "w") as f:
        json.dump(session.history.to_dict(), f, indent=2)

ctx.add_shutdown_callback(write_transcript)
5. Event-Based Transcription: You can listen to events like conversation_item_added or user_input_transcribed to grab transcripts as they come in.
from livekit.agents import ConversationItemAddedEvent

@session.on("conversation_item_added")
def on_conversation_item_added(ev: ConversationItemAddedEvent):
    if ev.item.role == "user":
        # Capture the transcribed user input (e.g. ev.item.text_content)
        ...
If you want, I’d be happy to walk you through any of these options or help you get started!
b
I want it in Node.js.
t
Hey there! Here are some Node.js alternatives you can use to capture transcripts from a LiveKit server:
1. Using Deepgram STT Plugin: Deepgram offers real-time transcription that works really well. Here’s a quick example to get you started:
import { STT } from '@livekit/agents-plugin-deepgram';

// Initialize STT with your API key
const stt = new STT({
  apiKey: 'your_deepgram_api_key',
  interimResults: true,
  sampleRate: 48000,
  endpointing: 25,
});

// Create a stream to process audio frames
const sttStream = stt.stream();

// Handle transcription events
sttStream.on(SpeechEventType.INTERIM_TRANSCRIPT, (event) => {
  console.log('Interim transcript:', event.alternatives[0].text);
});

sttStream.on(SpeechEventType.FINAL_TRANSCRIPT, (event) => {
  console.log('Final transcript:', event.alternatives[0].text);
});

// Start processing audio frames
const audioStream = new AudioStream(track);
for await (const frame of audioStream) {
  sttStream.pushFrame(frame);
}
2. AssemblyAI STT Plugin: This one’s pretty similar to Deepgram, offering real-time transcription with both interim and final results. You can tweak it to fit your needs.
3. Azure Speech Services: Another great option! Azure Speech Services supports Node.js and provides real-time transcription as well.
4. Custom STT Integration: If you want something more tailored, you can extend the STT class from the LiveKit Agents framework to plug in any STT service you prefer.
Frontend Integration: To catch transcription events on your frontend, just listen for RoomEvent.TranscriptionReceived and update your UI with the segments you get.
If you want me to walk you through any of these or need a specific example, just let me know—I’m happy to help!
Sources: LiveKit integration guides | LiveKit Docs | agents/src/multimodal/multimodal_agent.ts
b
sttStream.on not found
t
If sttStream.on isn’t available, no worries! You can work directly with the SpeechStream iterable using a for-await loop to handle those transcription events. Here’s a quick example to get you started:
const transcribeTrack = async (track) => {
  const audioStream = new AudioStream(track);
  const sttStream = new STT({ sampleRate: 48000 }).stream();

  // Push incoming audio frames from the track into the STT stream
  const sendTask = async () => {
    for await (const frame of audioStream) {
      sttStream.pushFrame(frame);
    }
  };

  // Consume speech events from the STT stream as they arrive
  const recvTask = async () => {
    for await (const event of sttStream) {
      if (event.type === SpeechEventType.FINAL_TRANSCRIPT) {
        console.log(event.alternatives![0].text);
      }
    }
  };

  await Promise.all([sendTask(), recvTask()]);
};
Explanation:
• Iterable Usage: Instead of sttStream.on, we use a for-await loop to go through the SpeechStream.
• Event Handling: This loop listens for transcription events like FINAL_TRANSCRIPT and handles them as they come in.
• Cleanup: Don’t forget to close the stream when you’re done to keep things tidy and free up resources.
If anything’s unclear or you want to dive deeper, just let me know—I’m here to help!
Sources: agents/src/multimodal/multimodal_agent.ts | SpeechEventType | LiveKit Agents
b
Where should I call this method?
t
Once you have access to the audio track you want to transcribe, you can call the transcribeTrack method. Usually, this happens inside a LiveKit room where you're managing participant tracks. Here’s a friendly example of where to call it:
// With @livekit/rtc-node, audio tracks are identified via the TrackKind enum
// (import { TrackKind } from '@livekit/rtc-node')
room.on(RoomEvent.TrackSubscribed, (track, publication, participant) => {
  if (track.kind === TrackKind.KIND_AUDIO) {
    // Give transcribeTrack a call for the audio track
    transcribeTrack(track).catch((err) => {
      console.error('Oops, something went wrong transcribing the track:', err);
    });
  }
});
Quick rundown:
• Track Subscribed Event: This event fires when you subscribe to a participant’s track in the room.
• Audio Track Check: Make sure the track is an audio track before you try to transcribe it.
• Error Handling: Wrapping it in a catch block helps you catch any hiccups during transcription.
Feel free to reach out if you want me to walk you through anything else!
b
But I am making outbound calls. Will it work?
t
When it comes to handling transcription in outbound calls using Node.js, a great way to do it is by using the VoicePipelineAgent along with the TextAudioSynchronizer to process and forward your transcription data smoothly. Here’s a friendly example to get you started:
import {
  defineAgent,
  pipeline,
} from '@livekit/agents';
import { SipClient } from 'livekit-server-sdk';
import { TextAudioSynchronizer } from './transcription';
import * as deepgram from '@livekit/agents-plugin-deepgram';
import * as openai from '@livekit/agents-plugin-openai';
import * as elevenlabs from '@livekit/agents-plugin-elevenlabs';
import * as silero from '@livekit/agents-plugin-silero';

export default defineAgent({
  async entry(ctx) {
    await ctx.connect();

    const sipClient = new SipClient(
      process.env.LIVEKIT_URL ?? '',
      process.env.LIVEKIT_API_KEY,
      process.env.LIVEKIT_API_SECRET,
    );

    // Initialize VAD, STT, LLM, and TTS (the pipeline agent expects a VAD as its first argument)
    const vad = await silero.VAD.load();
    const stt = new deepgram.STT();
    const llm = new openai.LLM();
    const tts = new elevenlabs.TTS();

    // Initialize the voice pipeline agent
    const agent = new pipeline.VoicePipelineAgent(vad, stt, llm, tts);

    // Create a transcription forwarder
    const forwarder = new TextAudioSynchronizer({
      language: 'en-US',
      speed: 1,
      newSentenceDelay: 400,
    });

    // Handle transcription events
    forwarder.on('textUpdated', (segment) => {
      // Publish transcription to client
      ctx.room.localParticipant?.publishTranscription({
        participantIdentity: 'agent',
        trackSid: 'main-audio-track',
        segments: [
          {
            text: segment.text,
            final: segment.final,
            id: segment.id,
            startTime: segment.startTime,
            endTime: segment.endTime,
            language: 'en-US',
          },
        ],
      });
    });

    // Start the outbound call (placeholders: supply your own SIP trunk ID and phone number)
    const trunkId = process.env.SIP_TRUNK_ID ?? '';
    const phoneNumber = '+15555550123';
    await sipClient.createSipParticipant(trunkId, phoneNumber, ctx.room.name, {
      participantIdentity: 'phone-callee',
    });

    // Wait for the callee to join, then attach the agent to that participant
    const participant = await ctx.waitForParticipant();
    await agent.start(ctx.room, participant);
  }
});
Just a few key things to keep in mind:
• The VoicePipelineAgent is your go-to for handling transcription, TTS, and STT all in one place.
• TextAudioSynchronizer helps keep your audio frames perfectly in sync with the transcription text.
• Make sure transcription is enabled in your agent’s configuration so everything runs smoothly.
If you have any questions or want me to walk you through any part of this setup, just let me know—I’m here to help!
Sources: Transcriptions | LiveKit Docs | agents/src/multimodal/multimodal_agent.ts | TranscriptionForwarder | LiveKit Agents | BasicTranscriptionForwarder | LiveKit Agents
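To actually run an agent defined with defineAgent, the usual agents-js pattern is to hand the file to the worker CLI; a minimal sketch, assuming it lives in the same file as the defineAgent export above:
import { cli, WorkerOptions } from '@livekit/agents';
import { fileURLToPath } from 'node:url';

// Start the worker and register this file's default-exported agent
cli.runApp(new WorkerOptions({ agent: fileURLToPath(import.meta.url) }));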
b
Where can I get the transcripts in your code snippet?
t
Hey there! In the code snippet above, the transcripts are managed and published through the TextAudioSynchronizer and the forwarder.on('textUpdated') event. You can grab the transcription text from the segment.text property inside the event handler. Here’s the key part of the code:
forwarder.on('textUpdated', (segment) => {
  // Access the transcription text here
  console.log('Transcription:', segment.text);

  // Publish transcription to client
  ctx.room.localParticipant?.publishTranscription({
    participantIdentity: 'agent',
    trackSid: 'main-audio-track',
    segments: [
      {
        text: segment.text,
        final: segment.final,
        id: segment.id,
        startTime: segment.startTime,
        endTime: segment.endTime,
        language: 'en-US',
      },
    ],
  });
});
Here’s a quick rundown:
• Accessing Transcripts: The transcription text lives in segment.text.
• Final vs Interim: You can check segment.final to see if the transcription is complete or still in progress.
• Publishing: The transcription gets sent to the client using publishTranscription.
If you want me to walk you through storing or showing the transcripts, just let me know—I’m happy to help!
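If you do want to store them, one simple sketch is to append each final segment to a local file from that same textUpdated event (plain node:fs, nothing LiveKit-specific; this just adds a second listener alongside the one above):
import { appendFileSync } from 'node:fs';

forwarder.on('textUpdated', (segment) => {
  if (segment.final) {
    // Append one finalized line per segment to a local transcript file
    appendFileSync('transcript.txt', `${segment.text}\n`);
  }
});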
b
TextAudioSynchronizer: I didn't find this package.
t
Hey there! The TextAudioSynchronizer class is part of LiveKit's agents library, but if you don’t see it in your setup, no worries—you can create your own custom solution to sync transcription with audio. Here’s a simple way to get started:
1. Implement a Custom Synchronizer:
class CustomTextSynchronizer {
  constructor() {
    this.audioQueue = [];
    this.textQueue = [];
  }

  pushText(text) {
    this.textQueue.push(text);
  }

  pushAudio(frame) {
    this.audioQueue.push(frame);
  }

  async play() {
    while (this.textQueue.length > 0 && this.audioQueue.length > 0) {
      const textSegment = this.textQueue.shift();
      const audioSegment = this.audioQueue.shift();

      console.log('Playing audio segment');
      console.log('Text:', textSegment);
    }
  }
}
2. Use the Custom Synchronizer:
const synchronizer = new CustomTextSynchronizer();
synchronizer.pushText('Hello, this is a test transcription.');
synchronizer.pushAudio(audioFrame); // Replace with actual audio frame
synchronizer.play();
3. A Few Things to Keep in Mind:
• Manual Synchronization: You’ll need to line up the text and audio frames yourself.
• Event Handling: It’s a good idea to add custom events to let you know when text updates happen (see the sketch below).
• Integration: Make sure this synchronizer fits smoothly into your LiveKit pipeline for outbound calls.
Feel free to reach out if you want me to help you build this out in more detail or if you have any questions—I’m here to help!
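Building on the "custom events" tip above, here is a sketch of the same idea using Node's built-in EventEmitter so consumers can subscribe to text updates (no LiveKit-specific APIs involved):
import { EventEmitter } from 'node:events';

class EventedTextSynchronizer extends EventEmitter {
  private textQueue: string[] = [];

  pushText(text: string) {
    this.textQueue.push(text);
    // Notify listeners the same way the earlier examples used 'textUpdated'
    this.emit('textUpdated', { text, final: true });
  }
}

// Usage
const sync = new EventedTextSynchronizer();
sync.on('textUpdated', (segment) => {
  console.log('Text updated:', segment.text);
});
sync.pushText('Hello, this is a test transcription.');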