# ask-ai
l
how can i use transcription feature with node js?
t
Thanks for reaching out, Odd. I'm an AI assistant and I'm here to help. Give me a moment while I research this topic.
Hey there! If you want to use the transcription feature with Node.js in LiveKit, here's a friendly step-by-step guide to help you get started:

1. Set Up the STT Agent: First, create an agent that listens for audio tracks and connects with a speech-to-text service like Deepgram. You'll want to use the `TextAudioSynchronizer` class to keep your audio and text in sync, and implement the `TranscriptionForwarder` interface to handle the transcription data smoothly.
2. Initialize the Transcription Service: Here's a quick snippet to get your STT service up and running: `const sttStream = new STT({ sampleRate: 48000 }).stream();`
3. Process Audio and Text: Pipe your audio stream into the STT service, and use the `pushAudio` and `pushText` methods from `TextAudioSynchronizer` to feed in the data.
4. Handle Transcription Events: Keep an ear out for transcription events from the STT service and let `TextAudioSynchronizer` handle updating the text in real-time.
5. Publish Transcriptions: Finally, implement the `TranscriptionForwarder` to share those transcription segments with everyone in the room.

Here's an example code snippet to give you a clearer picture:

```javascript
// ESM imports (the file uses `export default` and `import.meta.url`, so CommonJS `require` won't work here)
import { defineAgent, cli, stt, WorkerOptions } from '@livekit/agents';
import { STT } from '@livekit/agents-plugin-deepgram';
import { AudioStream, RoomEvent, TrackKind } from '@livekit/rtc-node';
import { fileURLToPath } from 'node:url';

export default defineAgent({
  entry: async (ctx) => {
    await ctx.connect();
    console.log('starting STT example agent');

    const transcribeTrack = async (track) => {
      const audioStream = new AudioStream(track);
      const sttStream = new STT({ sampleRate: 48000 }).stream();

      const sendTask = async () => {
        for await (const event of audioStream) {
          sttStream.pushFrame(event);
        }
      };

      const recvTask = async () => {
        for await (const event of sttStream) {
          if (event.type === stt.SpeechEventType.FINAL_TRANSCRIPT) {
            console.log(event.alternatives[0].text);
          }
        }
      };

      await Promise.all([sendTask(), recvTask()]);
    };

    ctx.room.on(RoomEvent.TrackSubscribed, async (track) => {
      if (track.kind === TrackKind.KIND_AUDIO) {
        transcribeTrack(track);
      }
    });
  },
});

cli.runApp(new WorkerOptions({ agent: fileURLToPath(import.meta.url) }));
```

And if you're working on the frontend, React hooks like `useTrackTranscription` are super handy for listening to transcription events and showing them on screen:

```jsx
import { useTrackTranscription } from '@livekit/components-react';

// trackRef is a track reference for the audio track you want captions for
// (for example, from useTracks() or useParticipantTracks()).
function TranscriptionDisplay({ trackRef }) {
  // The hook returns the transcription segments received for that track.
  const { segments } = useTrackTranscription(trackRef);

  return (
    <ul>
      {segments.map((segment) => (
        <li key={segment.id}>{segment.text}</li>
      ))}
    </ul>
  );
}
```
Just make sure you’ve got all the right dependencies installed and your environment variables set up with your API keys and service URLs. If you want to dive deeper, the LiveKit Documentation is a fantastic resource. Happy coding!
l
will i get the pure text or the raw audio by doing this?
t
Just a heads-up: the transcription feature in LiveKit gives you pure text, not raw audio. The transcriptions come through in real-time as text streams that sync up nicely with the audio playback. If you're looking for raw audio, you'll want to use LiveKit’s audio streaming features separately, since the transcription system is all about delivering text. Feel free to reach out if you want a hand with managing both audio and transcription streams together—I’m here to help! Sources: Text and transcriptions | LiveKit Docs | agents/src/multimodal/multimodal_agent.ts
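If you ever do need the raw audio on the agent side, that goes through the audio stream itself rather than the transcription events. Here's a minimal sketch, reusing the `AudioStream` iteration pattern from the example above (the frame field names are my assumption; check them against your `@livekit/rtc-node` version):

```typescript
import { AudioStream, type Track } from '@livekit/rtc-node';

// Sketch: consuming raw PCM audio frames from a subscribed track.
// Transcription events never include this data; they only carry text.
async function readRawAudio(track: Track) {
  const audioStream = new AudioStream(track);
  for await (const frame of audioStream) {
    // Assumed frame shape: PCM samples plus sample rate and channel count.
    console.log(`audio frame: ${frame.samplesPerChannel} samples @ ${frame.sampleRate} Hz`);
  }
}
```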
l
provide me the code to receive the transcription of the user's speech
t
Here’s a friendly example code snippet to help you receive user transcriptions in LiveKit using Node.js:
```typescript
// Import necessary modules
import { type JobContext, WorkerOptions, cli, defineAgent, stt } from '@livekit/agents';
import { STT } from '@livekit/agents-plugin-deepgram';
import type { Track } from '@livekit/rtc-node';
import { AudioStream, RoomEvent, TrackKind } from '@livekit/rtc-node';
import { fileURLToPath } from 'node:url';

export default defineAgent({
  entry: async (ctx: JobContext) => {
    await ctx.connect();
    console.log('starting STT example agent');

    const transcribeTrack = async (track: Track) => {
      const audioStream = new AudioStream(track);
      const sttStream = new STT({ sampleRate: 48000 }).stream();

      const sendTask = async () => {
        for await (const event of audioStream) {
          sttStream.pushFrame(event);
        }
      };

      const recvTask = async () => {
        for await (const event of sttStream) {
          if (event.type === stt.SpeechEventType.FINAL_TRANSCRIPT) {
            console.log(event.alternatives![0].text);
          }
        }
      };

      Promise.all([sendTask(), recvTask()]);
    };

    ctx.room.on(RoomEvent.TrackSubscribed, async (track: Track) => {
      if (track.kind === TrackKind.KIND_AUDIO) {
        transcribeTrack(track);
      }
    });
  },
});

cli.runApp(new WorkerOptions({ agent: fileURLToPath(import.meta.url) }));
```

This snippet listens for audio tracks, processes them with the Deepgram STT plugin, and logs the transcriptions as they happen. Just make sure you have all the right dependencies installed and a LiveKit server up and running. If you hit any bumps or want to chat more about it, I'm here to help!
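If you later want partial results while someone is still speaking (handy for live captions), the STT stream can also emit interim events. Here's a small sketch of an alternative receive loop, assuming your installed `@livekit/agents` version exposes `stt.SpeechEventType.INTERIM_TRANSCRIPT` and the `stt.SpeechStream` type:

```typescript
import { stt } from '@livekit/agents';

// Sketch: a receive loop that logs both partial and final transcripts.
// Pass in the stream created with `new STT({ sampleRate: 48000 }).stream()`.
async function logTranscripts(sttStream: stt.SpeechStream) {
  for await (const event of sttStream) {
    if (event.type === stt.SpeechEventType.INTERIM_TRANSCRIPT) {
      // Partial hypothesis while the user is still speaking
      console.log('interim:', event.alternatives?.[0].text);
    } else if (event.type === stt.SpeechEventType.FINAL_TRANSCRIPT) {
      // Stable text once the utterance has ended
      console.log('final:', event.alternatives?.[0].text);
    }
  }
}
```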
l
i am using the livekit cloud server, so provide the code for that case
t
Here’s the updated code example for receiving user transcriptions using the LiveKit Cloud server with Node.js:
```typescript
// Import necessary modules
import { type JobContext, WorkerOptions, cli, defineAgent, stt } from '@livekit/agents';
import { STT } from '@livekit/agents-plugin-deepgram';
import type { Track } from '@livekit/rtc-node';
import { AudioStream, RoomEvent, TrackKind } from '@livekit/rtc-node';
import { fileURLToPath } from 'node:url';

export default defineAgent({
  entry: async (ctx: JobContext) => {
    await ctx.connect();
    console.log('starting STT example agent');

    const transcribeTrack = async (track: Track) => {
      const audioStream = new AudioStream(track);
      const sttStream = new STT({ sampleRate: 48000 }).stream();

      const sendTask = async () => {
        for await (const event of audioStream) {
          sttStream.pushFrame(event);
        }
      };

      const recvTask = async () => {
        for await (const event of sttStream) {
          if (event.type === stt.SpeechEventType.FINAL_TRANSCRIPT) {
            console.log(event.alternatives![0].text);
          }
        }
      };

      Promise.all([sendTask(), recvTask()]);
    };

    ctx.room.on(RoomEvent.TrackSubscribed, async (track: Track) => {
      if (track.kind === TrackKind.KIND_AUDIO) {
        transcribeTrack(track);
      }
    });
  },
});

cli.runApp(new WorkerOptions({ agent: fileURLToPath(import.meta.url) }));
```

This code connects to the LiveKit Cloud server, listens for audio tracks, and uses the Deepgram STT plugin to process them. You'll see the transcription results logged right in the console. Just make sure you've set up your LiveKit Cloud server credentials and environment variables for authentication. If you hit any bumps or want to chat more about it, I'm here to help!
To help you successfully receive transcriptions using `RoomEvent.TranscriptionReceived`, here's an updated way to modify your code:

```tsx
import { useEffect, useState } from "react";
import {
  TranscriptionSegment,
  RoomEvent,
  TrackPublication,
  Participant,
} from "livekit-client";
// useMaybeRoomContext comes from the React components package, not livekit-client
import { useMaybeRoomContext } from "@livekit/components-react";

export default function TranscriptionComponent() {
  const room = useMaybeRoomContext();
  const [transcriptions, setTranscriptions] = useState<
    { [id: string]: TranscriptionSegment }
  >({});

  useEffect(() => {
    if (!room) {
      return;
    }

    const updateTranscriptions = (
      segments: TranscriptionSegment[],
      participant?: Participant,
      publication?: TrackPublication,
    ) => {
      setTranscriptions((prev) => {
        const newTranscriptions = { ...prev };
        for (const segment of segments) {
          newTranscriptions[segment.id] = segment;
        }
        return newTranscriptions;
      });
    };

    room.on(RoomEvent.TranscriptionReceived, updateTranscriptions);
    return () => {
      room.off(RoomEvent.TranscriptionReceived, updateTranscriptions);
    };
  }, [room]);

  return (
    <ul>
      {Object.values(transcriptions)
        .sort((a, b) => a.firstReceivedTime - b.firstReceivedTime)
        .map((segment) => (
          <li key={segment.id}>{segment.text}</li>
        ))}
    </ul>
  );
}
```

Key Changes:
• We added proper state management with `useState` to keep track of transcription segments.
• The event listener now updates the state whenever new transcriptions come in.
• Transcriptions are nicely sorted and displayed in a list by the time they were received.

Just make sure your room is set up and connected before using this component. If you hit any snags or want to chat more about it, I'm here to help!
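One thing to keep in mind: `RoomEvent.TranscriptionReceived` only fires if something in the room actually publishes transcription segments, so the agent needs to forward its STT results instead of just logging them. Here's a rough sketch of that forwarding step, assuming the `publishTranscription` method on the local participant in `@livekit/rtc-node` (the segment field names and the zero timing values are placeholders to verify against your installed version):

```typescript
import { randomUUID } from 'node:crypto';
import type { Room } from '@livekit/rtc-node';

// Sketch: publish a final transcript segment to the room so that connected
// clients receive it through RoomEvent.TranscriptionReceived.
async function forwardTranscript(
  room: Room,
  participantIdentity: string, // identity of the person who was speaking
  trackSid: string,            // SID of the audio track that was transcribed
  text: string,
) {
  await room.localParticipant?.publishTranscription({
    participantIdentity,
    trackSid,
    segments: [
      {
        id: randomUUID(),
        text,
        final: true,
        language: 'en',
        // Timing placeholders; fill these in if you track utterance times.
        startTime: BigInt(0),
        endTime: BigInt(0),
      },
    ],
  });
}
```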
l
provide me the simple code which provides the transcription of all the joined user's speech as i want to use them to build a live caption functionality
t
Here’s a simple React component that helps you provide live captions by transcribing the speech of all users who join:
```tsx
import { useEffect, useState } from "react";
import {
  TranscriptionSegment,
  Participant,
  TrackPublication,
  RoomEvent,
} from "livekit-client";
import { useMaybeRoomContext } from "@livekit/components-react";

export default function LiveCaptions() {
  const room = useMaybeRoomContext();
  const [transcriptions, setTranscriptions] = useState<{
    [id: string]: TranscriptionSegment;
  }>({});

  useEffect(() => {
    if (!room) {
      return;
    }

    const updateTranscriptions = (
      segments: TranscriptionSegment[],
      participant?: Participant,
      publication?: TrackPublication
    ) => {
      setTranscriptions((prev) => {
        const newTranscriptions = { ...prev };
        for (const segment of segments) {
          newTranscriptions[segment.id] = segment;
        }
        return newTranscriptions;
      });
    };

    room.on(RoomEvent.TranscriptionReceived, updateTranscriptions);
    return () => {
      room.off(RoomEvent.TranscriptionReceived, updateTranscriptions);
    };
  }, [room]);

  return (
    <ul>
      {Object.values(transcriptions)
        .sort((a, b) => a.firstReceivedTime - b.firstReceivedTime)
        .map((segment) => (
          <li key={segment.id}>{segment.text}</li>
        ))}
    </ul>
  );
}
```

How it works:
• It listens for `RoomEvent.TranscriptionReceived` to grab transcriptions as they happen.
• Keeps those transcription segments in state and updates them on the fly.
• Shows the transcriptions in a neat, time-sorted list.

Just make sure your LiveKit server is set up with transcription agents to get this running smoothly. If you hit any bumps or want to chat more about it, I'm here to help!
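For completeness, here's a minimal sketch of mounting that component; it assumes a standard `@livekit/components-react` setup where `LiveKitRoom` provides the room context that `useMaybeRoomContext` reads (the server URL and token values are placeholders you'd supply yourself):

```tsx
import { LiveKitRoom } from '@livekit/components-react';
import LiveCaptions from './LiveCaptions'; // the component sketched above

// serverUrl and token below are placeholders: use your LiveKit Cloud project URL
// and a participant access token generated by your backend.
export default function App() {
  return (
    <LiveKitRoom
      serverUrl="wss://your-project.livekit.cloud"
      token="<participant-access-token>"
      connect={true}
      audio={true}
    >
      <LiveCaptions />
    </LiveKitRoom>
  );
}
```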
l
what do you mean by transcription agents
t
Transcription agents in LiveKit are pretty cool tools that work behind the scenes to turn speech into text in real-time. They listen to audio streams from LiveKit rooms and use external speech-to-text (STT) services like Deepgram, Google Speech-to-Text, or Groq to get the job done, then send those transcriptions right back to you as it happens.

Key Features:
• They tune into audio tracks in a LiveKit room and handle the transcription process.
• Transcriptions come through as text segments that you can display or use however you like.
• They're super flexible, working with multiple STT services through plugins, so you can customize them to fit your needs.
• Plus, they support cool extras like synchronized transcription forwarding, multilingual translation, and even AI-powered summarization or analysis.

Use Cases:
• Live captions during meetings or conferences to keep everyone on the same page.
• Making video calls more accessible for everyone.
• Transcribing and translating in multiple languages for a global crowd.
• Teaming up with AI to get even smarter insights from your transcriptions.

Getting started is easy: just make sure transcription agents are set up on your LiveKit server and connected to your room. If you want, I'm here to help you get them up and running or to integrate them into your app!

Sources: Transcription Applications | livekit/livekit_composite | DeepWiki | Agents Overview | LiveKit Docs | Transcriptions | LiveKit Docs | Text and transcriptions | LiveKit Docs
l
yes provide me the code of this type of agent
t
Here’s a handy code example for a transcription agent in LiveKit using Node.js with Deepgram STT integration:
```typescript
import { type JobContext, WorkerOptions, cli, defineAgent, stt } from '@livekit/agents';
import { STT } from '@livekit/agents-plugin-deepgram';
import type { Track } from '@livekit/rtc-node';
import { AudioStream, RoomEvent, TrackKind } from '@livekit/rtc-node';
import { fileURLToPath } from 'node:url';

export default defineAgent({
  entry: async (ctx: JobContext) => {
    await ctx.connect();
    console.log('starting STT example agent');

    const transcribeTrack = async (track: Track) => {
      const audioStream = new AudioStream(track);
      const sttStream = new STT({ sampleRate: 48000 }).stream();

      const sendTask = async () => {
        for await (const event of audioStream) {
          sttStream.pushFrame(event);
        }
      };

      const recvTask = async () => {
        for await (const event of sttStream) {
          if (event.type === stt.SpeechEventType.FINAL_TRANSCRIPT) {
            console.log(event.alternatives![0].text);
          }
        }
      };

      Promise.all([sendTask(), recvTask()]);
    };

    ctx.room.on(RoomEvent.TrackSubscribed, async (track: Track) => {
      if (track.kind === TrackKind.KIND_AUDIO) {
        transcribeTrack(track);
      }
    });
  },
});

cli.runApp(new WorkerOptions({ 
  agent: fileURLToPath(import.meta.url) 
}));
```

How it works:
• Connects to a LiveKit room and listens for audio tracks.
• Uses Deepgram STT to transcribe the audio in real-time.
• Prints the transcription results right to your console.

Setup Instructions:
1. Install the necessary packages: `@livekit/agents`, `@livekit/agents-plugin-deepgram`, and `@livekit/rtc-node`.
2. Make sure your Deepgram API credentials are set up correctly.
3. Create a LiveKit room and start the agent.

If you have any questions or want a hand getting this up and running, just let me know; I'm here to help!

Sources: agents/src/pipeline/pipeline_agent.ts
l
give me the package.json for the installation of the provided packages by you
t
Here's a handy sample `package.json` file to get you started with installing the necessary packages:

```json
{
  "name": "my-livekit-agent",
  "version": "1.0.0",
  "type": "module",
  "scripts": {
    "build": "tsc",
    "start": "node dist/index.js start",
    "dev": "tsx src/index.ts dev"
  },
  "dependencies": {
    "@livekit/agents": "^0.7.9",
    "@livekit/agents-plugin-deepgram": "^0.5.6",
    "@livekit/rtc-node": "^0.13.18"
  },
  "devDependencies": {
    "@types/node": "^22.5.5",
    "tsx": "^4.19.2",
    "typescript": "^5.0.0"
  }
}
```

Installation Instructions:
1. Pop the above content into a `package.json` file.
2. Run `npm install` to get all the dependencies set up.

If you hit any snags or want to chat more about this, just let me know; I'm here to help!

Sources: pnpm-lock.yaml | packages/basics/package.json
l
```text
E:\MeetAI\server>npm run start:agent

server@1.0.0 start:agent
node agents/transcriptionAgent.js

Usage: agents [options] [command]

LiveKit Agents CLI

Options:
  -V, --version          output the version number
  --log-level <level>    Set the logging level (choices: "trace", "debug", "info", "warn", "error", "fatal", default: "info", env: LOG_LEVEL)
  --url <string>         LiveKit server or Cloud project websocket URL (env: LIVEKIT_URL)
  --api-key <string>     LiveKit server or Cloud project's API key (env: LIVEKIT_API_KEY)
  --api-secret <string>  LiveKit server or Cloud project's API secret (env: LIVEKIT_API_SECRET)
  -h, --help             display help for command

Commands:
  start                  Start the worker in production mode
  dev [options]          Start the worker in development mode
  connect [options]      Connect to a specific room
```

got this when tried to start the agent
t
Hey there! It looks like the error is happening because some important details needed to connect to your LiveKit server or Cloud project are missing. No worries, you just need to provide a few things:
• LiveKit URL: This is the WebSocket URL for your LiveKit server or Cloud project.
• API Key: Your API key for the LiveKit server or Cloud project.
• API Secret: The API secret for your LiveKit server or Cloud project.

Also note that the usage text you're seeing is what the agents CLI prints when it's run without a command, so make sure your `start:agent` script passes a subcommand, for example `node agents/transcriptionAgent.js start` (or `dev` during development). You can share the connection details as environment variables or directly in your command. Here's an example:
```bash
LIVEKIT_URL=<wss://your-livekit-server-url> \
LIVEKIT_API_KEY=your-api-key \
LIVEKIT_API_SECRET=your-api-secret \
npm run start:agent
```

Another option is to put these variables in a `.env` file and load them with a package like `dotenv`. If you want, I'm happy to help you set this up; just let me know!
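If you go the `.env` route, here's a small sketch of loading it at the top of your agent's entry file; it assumes you've installed `dotenv`, and that the Deepgram plugin reads `DEEPGRAM_API_KEY` (the LiveKit variable names come from the CLI help above):

```typescript
// Load .env into process.env before anything else reads it (requires `npm install dotenv`).
import 'dotenv/config';

// Fail fast if a required variable is missing.
const required = ['LIVEKIT_URL', 'LIVEKIT_API_KEY', 'LIVEKIT_API_SECRET', 'DEEPGRAM_API_KEY'];
for (const name of required) {
  if (!process.env[name]) {
    throw new Error(`Missing environment variable: ${name}`);
  }
}
```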
l
latest npm version of livekit sdk, livekit agent
t
Here are the latest versions of the LiveKit packages, based on what's available right now:
• LiveKit Client SDK (`livekit-client`): 2.15.5
• LiveKit Server SDK (`livekit-server-sdk`): 2.13.2
• LiveKit Agents Framework (`@livekit/agents`): 0.7.0

If you want to double-check you're using the most recent versions, you can run these commands on npm:

```bash
npm view livekit-client version
npm view livekit-server-sdk version
npm view @livekit/agents version
```

Feel free to reach out if you need any more help with these packages; I'm here for you!

Sources: package-lock.json | frontend/package-lock.json | pnpm-lock.yaml | example/yarn.lock
l
how to disable the e2ee option in agent? using node