# ask-ai
l
how can i use transcription feature with node js?
t
Thanks for reaching out, Odd. I'm an AI assistant and I'm here to help. Give me a moment while I research this topic.
Hey there! If you want to use the transcription feature with Node.js in LiveKit, here's a friendly step-by-step guide to help you get started:

1. Set Up the STT Agent: First, create an agent that listens for audio tracks and connects with a speech-to-text service like Deepgram. You'll want to use the `TextAudioSynchronizer` class to keep your audio and text in sync, and implement the `TranscriptionForwarder` interface to handle the transcription data smoothly.
2. Initialize the Transcription Service: Here's a quick snippet to get your STT service up and running: `const sttStream = new STT({ sampleRate: 48000 }).stream();`
3. Process Audio and Text: Pipe your audio stream into the STT service, and use the `pushAudio` and `pushText` methods from `TextAudioSynchronizer` to feed in the data.
4. Handle Transcription Events: Keep an ear out for transcription events from the STT service and let `TextAudioSynchronizer` handle updating the text in real-time.
5. Publish Transcriptions: Finally, implement the `TranscriptionForwarder` to share those transcription segments with everyone in the room.

Here's an example code snippet to give you a clearer picture:

```javascript
// ESM imports (the file uses `export default` and `import.meta.url`, so CommonJS `require` won't work here)
import { defineAgent, cli, stt, WorkerOptions } from '@livekit/agents';
import { STT } from '@livekit/agents-plugin-deepgram';
import { AudioStream, RoomEvent, TrackKind } from '@livekit/rtc-node';
import { fileURLToPath } from 'node:url';

export default defineAgent({
  entry: async (ctx) => {
    await ctx.connect();
    console.log('starting STT example agent');

    const transcribeTrack = async (track) => {
      const audioStream = new AudioStream(track);
      const sttStream = new STT({ sampleRate: 48000 }).stream();

      const sendTask = async () => {
        for await (const event of audioStream) {
          sttStream.pushFrame(event);
        }
      };

      const recvTask = async () => {
        for await (const event of sttStream) {
          if (event.type === stt.SpeechEventType.FINAL_TRANSCRIPT) {
            console.log(event.alternatives[0].text);
          }
        }
      };

      await Promise.all([sendTask(), recvTask()]);
    };

    ctx.room.on(RoomEvent.TrackSubscribed, async (track) => {
      if (track.kind === TrackKind.KIND_AUDIO) {
        transcribeTrack(track);
      }
    });
  },
});

cli.runApp(new WorkerOptions({ agent: fileURLToPath(import.meta.url) }));
```

And if you're working on the frontend, React hooks like `useTrackTranscription` are super handy for listening to transcription events and showing them on screen:

```jsx
import { useTrackTranscription } from '@livekit/components-react';

// trackRef is a track reference for the audio track you want captions for
// (for example, from useTracks() or useParticipantTracks()).
function TranscriptionDisplay({ trackRef }) {
  // The hook returns the transcription segments received for that track.
  const { segments } = useTrackTranscription(trackRef);

  return (
    <ul>
      {segments.map((segment) => (
        <li key={segment.id}>{segment.text}</li>
      ))}
    </ul>
  );
}
```
Just make sure you’ve got all the right dependencies installed and your environment variables set up with your API keys and service URLs. If you want to dive deeper, the LiveKit Documentation is a fantastic resource. Happy coding!
l
will i get the pure text or the raw audio by doing this?
t
Just a heads-up: the transcription feature in LiveKit gives you pure text, not raw audio. The transcriptions come through in real-time as text streams that sync up nicely with the audio playback. If you're looking for raw audio, you'll want to use LiveKit’s audio streaming features separately, since the transcription system is all about delivering text. Feel free to reach out if you want a hand with managing both audio and transcription streams together—I’m here to help! Sources: Text and transcriptions | LiveKit Docs | agents/src/multimodal/multimodal_agent.ts
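If you ever do need the raw audio on the agent side, that goes through the audio stream itself rather than the transcription events. Here's a minimal sketch, reusing the `AudioStream` iteration pattern from the example above (the frame field names are my assumption; check them against your `@livekit/rtc-node` version):

```typescript
import { AudioStream, type Track } from '@livekit/rtc-node';

// Sketch: consuming raw PCM audio frames from a subscribed track.
// Transcription events never include this data; they only carry text.
async function readRawAudio(track: Track) {
  const audioStream = new AudioStream(track);
  for await (const frame of audioStream) {
    // Assumed frame shape: PCM samples plus sample rate and channel count.
    console.log(`audio frame: ${frame.samplesPerChannel} samples @ ${frame.sampleRate} Hz`);
  }
}
```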
l
provide me the code to receive the transcription of the user's speech
t
Here’s a friendly example code snippet to help you receive user transcriptions in LiveKit using Node.js:
```typescript
// Import necessary modules
import { type JobContext, WorkerOptions, cli, defineAgent, stt } from '@livekit/agents';
import { STT } from '@livekit/agents-plugin-deepgram';
import type { Track } from '@livekit/rtc-node';
import { AudioStream, RoomEvent, TrackKind } from '@livekit/rtc-node';
import { fileURLToPath } from 'node:url';

export default defineAgent({
  entry: async (ctx: JobContext) => {
    await ctx.connect();
    console.log('starting STT example agent');

    const transcribeTrack = async (track: Track) => {
      const audioStream = new AudioStream(track);
      const sttStream = new STT({ sampleRate: 48000 }).stream();

      const sendTask = async () => {
        for await (const event of audioStream) {
          sttStream.pushFrame(event);
        }
      };

      const recvTask = async () => {
        for await (const event of sttStream) {
          if (event.type === stt.SpeechEventType.FINAL_TRANSCRIPT) {
            console.log(event.alternatives![0].text);
          }
        }
      };

      Promise.all([sendTask(), recvTask()]);
    };

    ctx.room.on(RoomEvent.TrackSubscribed, async (track: Track) => {
      if (track.kind === TrackKind.KIND_AUDIO) {
        transcribeTrack(track);
      }
    });
  },
});

cli.runApp(new WorkerOptions({ agent: fileURLToPath(import.meta.url) }));
```

This snippet listens for audio tracks, processes them with the Deepgram STT plugin, and logs the transcriptions as they happen. Just make sure you have all the right dependencies installed and a LiveKit server up and running. If you hit any bumps or want to chat more about it, I'm here to help!
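If you later want partial results while someone is still speaking (handy for live captions), the STT stream can also emit interim events. Here's a small sketch of an alternative receive loop, assuming your installed `@livekit/agents` version exposes `stt.SpeechEventType.INTERIM_TRANSCRIPT` and the `stt.SpeechStream` type:

```typescript
import { stt } from '@livekit/agents';

// Sketch: a receive loop that logs both partial and final transcripts.
// Pass in the stream created with `new STT({ sampleRate: 48000 }).stream()`.
async function logTranscripts(sttStream: stt.SpeechStream) {
  for await (const event of sttStream) {
    if (event.type === stt.SpeechEventType.INTERIM_TRANSCRIPT) {
      // Partial hypothesis while the user is still speaking
      console.log('interim:', event.alternatives?.[0].text);
    } else if (event.type === stt.SpeechEventType.FINAL_TRANSCRIPT) {
      // Stable text once the utterance has ended
      console.log('final:', event.alternatives?.[0].text);
    }
  }
}
```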
l
i am using the livekit cloud server, so provide the code for that case
t
Here’s the updated code example for receiving user transcriptions using the LiveKit Cloud server with Node.js:
```typescript
// Import necessary modules
import { type JobContext, WorkerOptions, cli, defineAgent, stt } from '@livekit/agents';
import { STT } from '@livekit/agents-plugin-deepgram';
import type { Track } from '@livekit/rtc-node';
import { AudioStream, RoomEvent, TrackKind } from '@livekit/rtc-node';
import { fileURLToPath } from 'node:url';

export default defineAgent({
  entry: async (ctx: JobContext) => {
    await ctx.connect();
    console.log('starting STT example agent');

    const transcribeTrack = async (track: Track) => {
      const audioStream = new AudioStream(track);
      const sttStream = new STT({ sampleRate: 48000 }).stream();

      const sendTask = async () => {
        for await (const event of audioStream) {
          sttStream.pushFrame(event);
        }
      };

      const recvTask = async () => {
        for await (const event of sttStream) {
          if (event.type === stt.SpeechEventType.FINAL_TRANSCRIPT) {
            console.log(event.alternatives![0].text);
          }
        }
      };

      Promise.all([sendTask(), recvTask()]);
    };

    ctx.room.on(RoomEvent.TrackSubscribed, async (track: Track) => {
      if (track.kind === TrackKind.KIND_AUDIO) {
        transcribeTrack(track);
      }
    });
  },
});

cli.runApp(new WorkerOptions({ agent: fileURLToPath(import.meta.url) }));
```

This code connects to the LiveKit Cloud server, listens for audio tracks, and uses the Deepgram STT plugin to process them. You'll see the transcription results logged right in the console. Just make sure you've set up your LiveKit Cloud server credentials and environment variables for authentication. If you hit any bumps or want to chat more about it, I'm here to help!
To help you successfully receive transcriptions using `RoomEvent.TranscriptionReceived`, here's an updated way to modify your code:

```tsx
import { useEffect, useState } from "react";
import {
  TranscriptionSegment,
  RoomEvent,
  TrackPublication,
  Participant,
} from "livekit-client";
// useMaybeRoomContext comes from the React components package, not livekit-client
import { useMaybeRoomContext } from "@livekit/components-react";

export default function TranscriptionComponent() {
  const room = useMaybeRoomContext();
  const [transcriptions, setTranscriptions] = useState<
    { [id: string]: TranscriptionSegment }
  >({});

  useEffect(() => {
    if (!room) {
      return;
    }

    const updateTranscriptions = (
      segments: TranscriptionSegment[],
      participant?: Participant,
      publication?: TrackPublication,
    ) => {
      setTranscriptions((prev) => {
        const newTranscriptions = { ...prev };
        for (const segment of segments) {
          newTranscriptions[segment.id] = segment;
        }
        return newTranscriptions;
      });
    };

    room.on(RoomEvent.TranscriptionReceived, updateTranscriptions);
    return () => {
      room.off(RoomEvent.TranscriptionReceived, updateTranscriptions);
    };
  }, [room]);

  return (
    <ul>
      {Object.values(transcriptions)
        .sort((a, b) => a.firstReceivedTime - b.firstReceivedTime)
        .map((segment) => (
          <li key={segment.id}>{segment.text}</li>
        ))}
    </ul>
  );
}
```

Key Changes:
• We added proper state management with `useState` to keep track of transcription segments.
• The event listener now updates the state whenever new transcriptions come in.
• Transcriptions are nicely sorted and displayed in a list by the time they were received.

Just make sure your room is set up and connected before using this component. If you hit any snags or want to chat more about it, I'm here to help!
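One thing to keep in mind: `RoomEvent.TranscriptionReceived` only fires if something in the room actually publishes transcription segments, so the agent needs to forward its STT results instead of just logging them. Here's a rough sketch of that forwarding step, assuming the `publishTranscription` method on the local participant in `@livekit/rtc-node` (the segment field names and the zero timing values are placeholders to verify against your installed version):

```typescript
import { randomUUID } from 'node:crypto';
import type { Room } from '@livekit/rtc-node';

// Sketch: publish a final transcript segment to the room so that connected
// clients receive it through RoomEvent.TranscriptionReceived.
async function forwardTranscript(
  room: Room,
  participantIdentity: string, // identity of the person who was speaking
  trackSid: string,            // SID of the audio track that was transcribed
  text: string,
) {
  await room.localParticipant?.publishTranscription({
    participantIdentity,
    trackSid,
    segments: [
      {
        id: randomUUID(),
        text,
        final: true,
        language: 'en',
        // Timing placeholders; fill these in if you track utterance times.
        startTime: BigInt(0),
        endTime: BigInt(0),
      },
    ],
  });
}
```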
l
provide me the simple code which provides the transcription of all the joined user's speech as i want to use them to build a live caption functionality
t
Here’s a simple React component that helps you provide live captions by transcribing the speech of all users who join:
```tsx
import { useEffect, useState } from "react";
import {
  TranscriptionSegment,
  Participant,
  TrackPublication,
  RoomEvent,
} from "livekit-client";
import { useMaybeRoomContext } from "@livekit/components-react";

export default function LiveCaptions() {
  const room = useMaybeRoomContext();
  const [transcriptions, setTranscriptions] = useState<{
    [id: string]: TranscriptionSegment;
  }>({});

  useEffect(() => {
    if (!room) {
      return;
    }

    const updateTranscriptions = (
      segments: TranscriptionSegment[],
      participant?: Participant,
      publication?: TrackPublication
    ) => {
      setTranscriptions((prev) => {
        const newTranscriptions = { ...prev };
        for (const segment of segments) {
          newTranscriptions[segment.id] = segment;
        }
        return newTranscriptions;
      });
    };

    room.on(RoomEvent.TranscriptionReceived, updateTranscriptions);
    return () => {
      room.off(RoomEvent.TranscriptionReceived, updateTranscriptions);
    };
  }, [room]);

  return (
    <ul>
      {Object.values(transcriptions)
        .sort((a, b) => a.firstReceivedTime - b.firstReceivedTime)
        .map((segment) => (
          <li key={segment.id}>{segment.text}</li>
        ))}
    </ul>
  );
}
```

How it works:
• It listens for `RoomEvent.TranscriptionReceived` to grab transcriptions as they happen.
• Keeps those transcription segments in state and updates them on the fly.
• Shows the transcriptions in a neat, time-sorted list.

Just make sure your LiveKit server is set up with transcription agents to get this running smoothly. If you hit any bumps or want to chat more about it, I'm here to help!
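For completeness, here's a minimal sketch of mounting that component; it assumes a standard `@livekit/components-react` setup where `LiveKitRoom` provides the room context that `useMaybeRoomContext` reads (the server URL and token values are placeholders you'd supply yourself):

```tsx
import { LiveKitRoom } from '@livekit/components-react';
import LiveCaptions from './LiveCaptions'; // the component sketched above

// serverUrl and token below are placeholders: use your LiveKit Cloud project URL
// and a participant access token generated by your backend.
export default function App() {
  return (
    <LiveKitRoom
      serverUrl="wss://your-project.livekit.cloud"
      token="<participant-access-token>"
      connect={true}
      audio={true}
    >
      <LiveCaptions />
    </LiveKitRoom>
  );
}
```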
l
what do you mean by transcription agents
t
Transcription agents in LiveKit are pretty cool tools that work behind the scenes to turn speech into text in real-time. They listen to audio streams from LiveKit rooms and use external speech-to-text (STT) services like Deepgram, Google Speech-to-Text, or Groq to get the job done, then send those transcriptions right back to you as it happens.

Key Features:
• They tune into audio tracks in a LiveKit room and handle the transcription process.
• Transcriptions come through as text segments that you can display or use however you like.
• They're super flexible, working with multiple STT services through plugins, so you can customize them to fit your needs.
• Plus, they support cool extras like synchronized transcription forwarding, multilingual translation, and even AI-powered summarization or analysis.

Use Cases:
• Live captions during meetings or conferences to keep everyone on the same page.
• Making video calls more accessible for everyone.
• Transcribing and translating in multiple languages for a global crowd.
• Teaming up with AI to get even smarter insights from your transcriptions.

Getting started is easy: just make sure transcription agents are set up on your LiveKit server and connected to your room. If you want, I'm here to help you get them up and running or to integrate them into your app!

Sources: Transcription Applications | livekit/livekit_composite | DeepWiki | Agents Overview | LiveKit Docs | Transcriptions | LiveKit Docs | Text and transcriptions | LiveKit Docs
l
yes provide me the code of this type of agent
t
Here’s a handy code example for a transcription agent in LiveKit using Node.js with Deepgram STT integration:
```typescript
import { type JobContext, WorkerOptions, cli, defineAgent, stt } from '@livekit/agents';
import { STT } from '@livekit/agents-plugin-deepgram';
import type { Track } from '@livekit/rtc-node';
import { AudioStream, RoomEvent, TrackKind } from '@livekit/rtc-node';
import { fileURLToPath } from 'node:url';

export default defineAgent({
  entry: async (ctx: JobContext) => {
    await ctx.connect();
    console.log('starting STT example agent');

    const transcribeTrack = async (track: Track) => {
      const audioStream = new AudioStream(track);
      const sttStream = new STT({ sampleRate: 48000 }).stream();

      const sendTask = async () => {
        for await (const event of audioStream) {
          sttStream.pushFrame(event);
        }
      };

      const recvTask = async () => {
        for await (const event of sttStream) {
          if (event.type === stt.SpeechEventType.FINAL_TRANSCRIPT) {
            console.log(event.alternatives![0].text);
          }
        }
      };

      Promise.all([sendTask(), recvTask()]);
    };

    ctx.room.on(RoomEvent.TrackSubscribed, async (track: Track) => {
      if (track.kind === TrackKind.KIND_AUDIO) {
        transcribeTrack(track);
      }
    });
  },
});

cli.runApp(new WorkerOptions({ 
  agent: fileURLToPath(import.meta.url) 
}));
```

How it works:
• Connects to a LiveKit room and listens for audio tracks.
• Uses Deepgram STT to transcribe the audio in real-time.
• Prints the transcription results right to your console.

Setup Instructions:
1. Install the necessary packages: `@livekit/agents`, `@livekit/agents-plugin-deepgram`, and `@livekit/rtc-node`.
2. Make sure your Deepgram API credentials are set up correctly.
3. Create a LiveKit room and start the agent.

If you have any questions or want a hand getting this up and running, just let me know; I'm here to help!

Sources: agents/src/pipeline/pipeline_agent.ts
l
give me the package.json for the installation of the provided packages by you
t
Here's a handy sample `package.json` file to get you started with installing the necessary packages:

```json
{
  "name": "my-livekit-agent",
  "version": "1.0.0",
  "type": "module",
  "scripts": {
    "build": "tsc",
    "start": "node dist/index.js start",
    "dev": "tsx src/index.ts dev"
  },
  "dependencies": {
    "@livekit/agents": "^0.7.9",
    "@livekit/agents-plugin-deepgram": "^0.5.6",
    "@livekit/rtc-node": "^0.13.18"
  },
  "devDependencies": {
    "@types/node": "^22.5.5",
    "tsx": "^4.19.2",
    "typescript": "^5.0.0"
  }
}
```

Installation Instructions:
1. Pop the above content into a `package.json` file.
2. Run `npm install` to get all the dependencies set up.

If you hit any snags or want to chat more about this, just let me know; I'm here to help!

Sources: pnpm-lock.yaml | packages/basics/package.json
l
```text
E:\MeetAI\server>npm run start:agent

server@1.0.0 start:agent
node agents/transcriptionAgent.js

Usage: agents [options] [command]

LiveKit Agents CLI

Options:
  -V, --version          output the version number
  --log-level <level>    Set the logging level (choices: "trace", "debug", "info", "warn", "error", "fatal", default: "info", env: LOG_LEVEL)
  --url <string>         LiveKit server or Cloud project websocket URL (env: LIVEKIT_URL)
  --api-key <string>     LiveKit server or Cloud project's API key (env: LIVEKIT_API_KEY)
  --api-secret <string>  LiveKit server or Cloud project's API secret (env: LIVEKIT_API_SECRET)
  -h, --help             display help for command

Commands:
  start                  Start the worker in production mode
  dev [options]          Start the worker in development mode
  connect [options]      Connect to a specific room
```

got this when tried to start the agent
t
Hey there! It looks like the error is happening because some important details needed to connect to your LiveKit server or Cloud project are missing. No worries, you just need to provide a few things:
• LiveKit URL: This is the WebSocket URL for your LiveKit server or Cloud project.
• API Key: Your API key for the LiveKit server or Cloud project.
• API Secret: The API secret for your LiveKit server or Cloud project.

Also note that the usage text you're seeing is what the agents CLI prints when it's run without a command, so make sure your `start:agent` script passes a subcommand, for example `node agents/transcriptionAgent.js start` (or `dev` during development). You can share the connection details as environment variables or directly in your command. Here's an example:
```bash
LIVEKIT_URL=<wss://your-livekit-server-url> \
LIVEKIT_API_KEY=your-api-key \
LIVEKIT_API_SECRET=your-api-secret \
npm run start:agent
```

Another option is to put these variables in a `.env` file and load them with a package like `dotenv`. If you want, I'm happy to help you set this up; just let me know!
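If you go the `.env` route, here's a small sketch of loading it at the top of your agent's entry file; it assumes you've installed `dotenv`, and that the Deepgram plugin reads `DEEPGRAM_API_KEY` (the LiveKit variable names come from the CLI help above):

```typescript
// Load .env into process.env before anything else reads it (requires `npm install dotenv`).
import 'dotenv/config';

// Fail fast if a required variable is missing.
const required = ['LIVEKIT_URL', 'LIVEKIT_API_KEY', 'LIVEKIT_API_SECRET', 'DEEPGRAM_API_KEY'];
for (const name of required) {
  if (!process.env[name]) {
    throw new Error(`Missing environment variable: ${name}`);
  }
}
```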
l
latest npm version of livekit sdk, livekit agent
t
Here are the latest versions of the LiveKit packages, based on what's available right now:
• LiveKit Client SDK (`livekit-client`): 2.15.5
• LiveKit Server SDK (`livekit-server-sdk`): 2.13.2
• LiveKit Agents Framework (`@livekit/agents`): 0.7.0

If you want to double-check you're using the most recent versions, you can run these commands on npm:

```bash
npm view livekit-client version
npm view livekit-server-sdk version
npm view @livekit/agents version
```

Feel free to reach out if you need any more help with these packages; I'm here for you!

Sources: package-lock.json | frontend/package-lock.json | pnpm-lock.yaml | example/yarn.lock
l
how to disable the e2ee option in agent? using node