# ask-ai
s
how to display stt and tts transcriptions on frontend
t
Thanks for reaching out, Nakshatra. I'm an AI assistant and I'm here to help. Give me a moment while I research this topic.
Hey there! If you want to show STT and TTS transcriptions on the frontend using LiveKit, here's a friendly step-by-step guide to help you out:

• Frontend Integration:
  ◦ First, use the LiveKit SDK to listen for `RoomEvent.TranscriptionReceived` events. These events give you transcription segments as they come in.
  ◦ Next, keep those transcription segments in your state and display them in your UI. Here's a simple example using React to get you started:

```tsx
import { useEffect, useState } from "react";
import {
  TranscriptionSegment,
  Participant,
  TrackPublication,
  RoomEvent,
} from "livekit-client";
import { useMaybeRoomContext } from "@livekit/components-react";

export default function Transcriptions() {
  const room = useMaybeRoomContext();
  const [transcriptions, setTranscriptions] = useState<{ [id: string]: TranscriptionSegment }>({});

  useEffect(() => {
    if (!room) return;

    const updateTranscriptions = (
      segments: TranscriptionSegment[],
      participant?: Participant,
      publication?: TrackPublication,
    ) => {
      setTranscriptions(prev => {
        const newTranscriptions = { ...prev };
        for (const segment of segments) {
          newTranscriptions[segment.id] = segment;
        }
        return newTranscriptions;
      });
    };

    room.on(RoomEvent.TranscriptionReceived, updateTranscriptions);
    return () => {
      room.off(RoomEvent.TranscriptionReceived, updateTranscriptions);
    };
  }, [room]);

  return (
    <ul>
      {Object.values(transcriptions)
        .sort((a, b) => a.firstReceivedTime - b.firstReceivedTime)
        .map(segment => (
          <li key={segment.id}>{segment.text}</li>
        ))}
    </ul>
  );
}
```

• Agent Integration:

```python
from livekit.agents import stt, transcription
from livekit.plugins.deepgram import STT


async def _forward_transcription(
    stt_stream: stt.SpeechStream,
    stt_forwarder: transcription.STTSegmentsForwarder,
):
    """Forward the transcription and log the transcript in the console"""
    async for ev in stt_stream:
        stt_forwarder.update(ev)
        if ev.type == stt.SpeechEventType.INTERIM_TRANSCRIPT:
            print(ev.alternatives[0].text, end="")
        elif ev.type == stt.SpeechEventType.FINAL_TRANSCRIPT:
            print("\n")
            print(" -> ", ev.alternatives[0].text)
```

Just a few things to keep in mind:
• Transcriptions come in segments and might get updated as processing continues.
• You can check the `final` property on a `TranscriptionSegment` to know when a segment is fully done.
• Make sure your agent is set up to forward transcriptions for everything to work smoothly.

If you want to dive deeper or see full examples, don't forget to check out the LiveKit Examples: they have complete transcription frontends and agents ready to explore.

Sources: Transcriptions | LiveKit Docs
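If you want to visually separate in-progress text from finished text, here's a minimal sketch (an illustration, not taken from the docs) that assumes the same `transcriptions` map as the component above and only relies on the `final` and `firstReceivedTime` fields of `TranscriptionSegment`:

```tsx
import { TranscriptionSegment } from "livekit-client";

// Sketch: render finalized segments normally and still-updating (interim)
// segments in a dimmed style.
export function TranscriptList({
  transcriptions,
}: {
  transcriptions: { [id: string]: TranscriptionSegment };
}) {
  const segments = Object.values(transcriptions).sort(
    (a, b) => a.firstReceivedTime - b.firstReceivedTime,
  );

  return (
    <ul>
      {segments.map(segment => (
        <li key={segment.id} style={{ opacity: segment.final ? 1 : 0.6 }}>
          {segment.text}
        </li>
      ))}
    </ul>
  );
}
```

Since segments can be updated in place as processing continues, keying the map by `segment.id` (as in the example above) means the final text simply overwrites the interim one.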
s
and this goes in page.tsx?
t
Absolutely! You can pop the provided React component into your `page.tsx` file or any other spot where you'd like those transcriptions to show up. Just a couple of things to keep in mind:
• Make sure the `useMaybeRoomContext` hook is set up right and linked to your LiveKit room context.
• Don't forget to import all the necessary bits, like `livekit-client` and `@livekit/components-react`.
• And the component should be rendered inside a React tree that has access to the LiveKit room context.

If you want a hand fitting it into your project, just give me a shout, I'm happy to help!
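For a concrete picture, here's a minimal sketch of what that placement could look like in `page.tsx`. The file name `./Transcriptions`, the `NEXT_PUBLIC_LIVEKIT_URL` variable, and the hard-coded token are placeholders of my own, not something from your setup:

```tsx
'use client';

import { LiveKitRoom } from '@livekit/components-react';
import Transcriptions from './Transcriptions'; // assumed: the component above, saved in its own file

export default function Page() {
  // Placeholder values: in a real app these come from your token endpoint.
  const serverUrl = process.env.NEXT_PUBLIC_LIVEKIT_URL!;
  const token = '<participant-token>';

  return (
    <LiveKitRoom serverUrl={serverUrl} token={token} audio>
      {/* Rendered inside LiveKitRoom, so useMaybeRoomContext() can find the room */}
      <Transcriptions />
    </LiveKitRoom>
  );
}
```

In your own app the token would come from your token endpoint rather than a hard-coded placeholder.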
s
i have an existing script, need to add transcriptions to it:

```tsx
'use client';

import { LiveKitRoom, AudioConference, useRoomContext, StartAudio } from '@livekit/components-react';
import { useCallback, useEffect, useState } from 'react';

// Generates a random user ID for storage only (not sent to backend)
function generateRandomUserId() {
  return `user-${Math.floor(Math.random() * 100000)}`;
}

export default function AudioExamplePage() {
  const [mounted, setMounted] = useState(false);
  const [userIdentity, setUserIdentity] = useState<string>('');
  const endpoint = process.env.NEXT_PUBLIC_LK_TOKEN_ENDPOINT!;
  const [isStarting, setIsStarting] = useState(false);
  const [isInRoom, setIsInRoom] = useState(false);
  const [token, setToken] = useState<string | null>(null);
  const [serverUrlState, setServerUrlState] = useState<string | null>(null);
  const [roomNameState, setRoomNameState] = useState<string | null>(null);
  const [error, setError] = useState<string | null>(null);

  // Mounting logic and storing random identity locally
  useEffect(() => {
    setMounted(true);
    const stored = localStorage.getItem('lk-user-id');
    if (stored) {
      setUserIdentity(stored);
    } else {
      const uid = generateRandomUserId();
      localStorage.setItem('lk-user-id', uid);
      setUserIdentity(uid);
    }
  }, []);

  // Start session by calling second token generator endpoint
  const handleStart = useCallback(async () => {
    if (!endpoint) return;
    setError(null);
    setIsStarting(true);
    try {
      const res = await fetch(endpoint, { method: 'GET', cache: 'no-store' });
      if (!res.ok) throw new Error(`Token fetch failed: ${res.status} ${res.statusText}`);
      const json = await res.json();
      const t = json?.participantToken ?? json?.token ?? json?.accessToken;
      const serverUrlFromApi = json?.serverUrl;
      const roomFromApi = json?.roomName;
      if (typeof t !== 'string') throw new Error('Token not found in response.');
      if (typeof serverUrlFromApi !== 'string') throw new Error('serverUrl not found in response.');
      if (typeof roomFromApi !== 'string') throw new Error('roomName not found in response.');
      setToken(t);
      setServerUrlState(serverUrlFromApi);
      setRoomNameState(roomFromApi);
      setIsInRoom(true);
    } catch (e: any) {
      setError(e.message || 'Failed to start.');
      setToken(null);
      setServerUrlState(null);
      setRoomNameState(null);
      setIsInRoom(false);
    } finally {
      setIsStarting(false);
    }
  }, [endpoint]);

  const handleStoppedByChild = useCallback(() => {
    setIsInRoom(false);
    setToken(null);
  }, []);

  if (!mounted || !userIdentity) {
    return (
      <div className="min-h-screen flex items-center justify-center">
        <p role="status" aria-live="polite">Loading…</p>
      </div>
    );
  }

  return (
    <div data-lk-theme="default" style={{ height: '100vh', width: '100vw', display: 'flex', flexDirection: 'column', background: '#1A1A1A', color: 'white' }}>
      <header style={{ padding: '12px 16px', borderBottom: '1px solid #333', display: 'flex', alignItems: 'center', gap: 12 }}>
        <h1 style={{ fontSize: 22, margin: 0, flex: 1, fontWeight: 600 }}>Real-Time Voice Translation</h1>
        {!isInRoom ? (
          <button
            type="button"
            onClick={handleStart}
            disabled={isStarting}
            style={{ padding: '10px 16px', borderRadius: 8, border: 'none', background: '#007AFF', color: 'white', fontWeight: 500, cursor: isStarting ? 'not-allowed' : 'pointer', transition: 'background-color 0.2s' }}
            aria-label="Start"
          >
            {isStarting ? 'Starting…' : 'Start Session'}
          </button>
        ) : null}
      </header>
      {error && (
        <div role="alert" style={{ padding: '8px 16px', color: '#FF4D4D', background: '#442222' }}>
          {error}
        </div>
      )}
      {!isInRoom && (
        <div style={{ flex: 1, display: 'grid', placeItems: 'center', padding: 16, textAlign: 'center' }}>
          <div>
            <h2 style={{ fontSize: 24, fontWeight: 600, marginBottom: 8 }}>Welcome</h2>
            <p style={{ opacity: 0.8, maxWidth: 400 }}>Click "Start Session" to begin real-time translation</p>
          </div>
        </div>
      )}
      {isInRoom && token && serverUrlState && (
        <LiveKitRoom
          audio
          video={false}
          token={token}
          serverUrl={serverUrlState}
          style={{ flex: 1, display: 'flex', flexDirection: 'column' }}
          className="w-full h-full"
          onDisconnected={handleStoppedByChild}
        >
          <StartAudio label="Click to enable audio" />
          <ContentGrid />
          <div role="toolbar" aria-label="Call controls" style={{ padding: '0 16px 16px' }}>
            <div style={{ borderRadius: 12, background: 'rgba(0,0,0,0.2)', padding: 12, display: 'flex', justifyContent: 'center', alignItems: 'center', gap: 16 }}>
              <AudioConference />
              <StopButton onStopped={handleStoppedByChild} />
            </div>
          </div>
        </LiveKitRoom>
      )}
    </div>
  );
}

function StopButton({ onStopped }: { onStopped: () => void }) {
  const room = useRoomContext();
  const [stopping, setStopping] = useState(false);

  const onStop = useCallback(async () => {
    if (!room || stopping) return;
    setStopping(true);
    try {
      await room.disconnect();
    } catch (err) {
      console.error('Error disconnecting:', err);
    } finally {
      setStopping(false);
      onStopped();
    }
  }, [room, stopping, onStopped]);

  return (
    <button
      type="button"
      onClick={onStop}
      disabled={stopping}
      style={{ padding: '10px 16px', borderRadius: 8, border: '1px solid #FF4D4D', background: '#FF4D4D', color: 'white', fontWeight: 500, cursor: stopping ? 'not-allowed' : 'pointer', transition: 'background-color 0.2s' }}
      aria-label="Stop"
    >
      {stopping ? 'Stopping…' : 'Stop Session'}
    </button>
  );
}

function ContentGrid() {
  return (
    <main id="main" role="main" style={{ flex: 1, display: 'grid', gridTemplateColumns: '1fr 1fr', gap: 24, padding: 24, overflowY: 'auto' }}>
      {/* English section */}
      <section aria-labelledby="source-heading" style={{ display: 'flex', flexDirection: 'column', background: '#2C2C2E', borderRadius: 12, padding: 20, overflow: 'hidden' }}>
        <h2 id="source-heading" style={{ fontSize: 20, margin: 0, marginBottom: 16, fontWeight: 600, color: '#E0E0E0' }} lang="en">
          Source Language (English)
        </h2>
        <div className="sr-only" aria-live="polite" aria-atomic="false" />
        <pre style={{ flex: 1, whiteSpace: 'pre-wrap', wordBreak: 'break-word', color: 'white', lineHeight: 1.6, fontFamily: 'monospace', fontSize: 16, background: '#1C1C1E', padding: 16, borderRadius: 8, overflowY: 'auto' }} aria-label="Live English transcript" lang="en">
          Speak to start transcription…
        </pre>
      </section>
      {/* Hindi section */}
      <section aria-labelledby="target-heading" style={{ display: 'flex', flexDirection: 'column', background: '#2C2C2E', borderRadius: 12, padding: 20, overflow: 'hidden' }}>
        <h2 id="target-heading" style={{ fontSize: 20, margin: 0, marginBottom: 16, fontWeight: 600, color: '#E0E0E0' }} lang="hi">
          Target Language (Hindi)
        </h2>
        <div className="sr-only" aria-live="assertive" aria-atomic="false" />
        <pre style={{ flex: 1, whiteSpace: 'pre-wrap', wordBreak: 'break-word', color: 'white', lineHeight: 1.6, fontFamily: 'monospace', fontSize: 16, background: '#1C1C1E', padding: 16, borderRadius: 8, overflowY: 'auto' }} aria-label="Live Hindi translation" lang="hi">
          Awaiting translation…
        </pre>
      </section>
    </main>
  );
}
```
t
Hey there! If you want to add transcriptions to your existing script, you can either weave the transcription logic right into the `ContentGrid` component or create a fresh component just for showing transcriptions. Here's a friendly step-by-step on how you can tweak your script:

1. Import What You Need: First up, make sure you bring in the right classes and hooks to handle transcriptions:

```tsx
import { RoomEvent, TranscriptionSegment } from 'livekit-client';
```

2. Set Up State for Transcriptions: Next, add a state variable in your `AudioExamplePage` component to keep track of transcription segments:

```tsx
const [transcriptions, setTranscriptions] = useState<{ [id: string]: TranscriptionSegment }>({});
```

3. Listen for Transcription Events: Then, use the `RoomEvent.TranscriptionReceived` event to update your transcription state. Grab the room with `useRoomContext` (your script already imports it from `@livekit/components-react`) in a component rendered inside `<LiveKitRoom>`, since that's where the room context lives, and register the listener inside a `useEffect` hook:

```tsx
const room = useRoomContext(); // hooks must be called at the component's top level, not inside useEffect

useEffect(() => {
  if (!room) return;

  const handleTranscription = (segments: TranscriptionSegment[]) => {
    setTranscriptions(prev => {
      const updated = { ...prev };
      segments.forEach(segment => {
        updated[segment.id] = segment;
      });
      return updated;
    });
  };

  room.on(RoomEvent.TranscriptionReceived, handleTranscription);
  return () => room.off(RoomEvent.TranscriptionReceived, handleTranscription);
}, [room]);
```

4. Show Those Transcriptions: Finally, update your `ContentGrid` component so it displays the transcription segments nicely. Something like this:

```tsx
<section>
  <h2>Live Transcriptions</h2>
  <ul>
    {Object.values(transcriptions)
      .sort((a, b) => a.firstReceivedTime - b.firstReceivedTime)
      .map(segment => (
        <li key={segment.id}>{segment.text}</li>
      ))}
  </ul>
</section>
```
Give these changes a go, and you’ll see the transcriptions pop up in real-time. If you get stuck or want to chat more about it, just let me know—I’m here to help!
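One more idea, since your page has separate English and Hindi panes: the event handler also receives the `participant` the segments are attributed to, so you could route segments by comparing that participant against `room.localParticipant`. This is only a sketch, and it assumes your STT segments arrive attributed to your local participant while the agent's translated speech arrives attributed to the agent participant, which is worth verifying with a console log first:

```tsx
import { Participant, RoomEvent, TranscriptionSegment } from 'livekit-client';
import { useRoomContext } from '@livekit/components-react';
import { useEffect, useState } from 'react';

// Sketch: keep two maps, one for the local user's (English) segments and
// one for everyone else's (e.g. the translator agent's Hindi output).
export function useSplitTranscriptions() {
  const room = useRoomContext();
  const [local, setLocal] = useState<{ [id: string]: TranscriptionSegment }>({});
  const [remote, setRemote] = useState<{ [id: string]: TranscriptionSegment }>({});

  useEffect(() => {
    if (!room) return;

    const onTranscription = (segments: TranscriptionSegment[], participant?: Participant) => {
      // Segments attributed to the local participant go to the "local" map.
      const isLocal = participant?.identity === room.localParticipant.identity;
      const setter = isLocal ? setLocal : setRemote;
      setter(prev => {
        const next = { ...prev };
        for (const segment of segments) next[segment.id] = segment;
        return next;
      });
    };

    room.on(RoomEvent.TranscriptionReceived, onTranscription);
    return () => {
      room.off(RoomEvent.TranscriptionReceived, onTranscription);
    };
  }, [room]);

  return { local, remote };
}
```

You could call this hook from a component rendered inside `<LiveKitRoom>` and feed `local` to the English pane and `remote` to the Hindi pane.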
s
i did everything as follows:

```tsx
'use client';

import { LiveKitRoom, AudioConference, useRoomContext, StartAudio } from '@livekit/components-react';
import { useCallback, useEffect, useState } from 'react';
import { RoomEvent, TranscriptionSegment } from 'livekit-client';

// Generates a random user ID for storage only (not sent to backend)
function generateRandomUserId() {
  return `user-${Math.floor(Math.random() * 100000)}`;
}

// New component to listen for transcriptions inside LiveKit context
function TranscriptionListener({ setTranscriptions }: { setTranscriptions: React.Dispatch<React.SetStateAction<{ [id: string]: TranscriptionSegment }>> }) {
  const room = useRoomContext();

  useEffect(() => {
    if (!room) return;

    const handleTranscription = (segments: TranscriptionSegment[]) => {
      setTranscriptions(prev => {
        const updated = { ...prev };
        segments.forEach(segment => {
          updated[segment.id] = segment;
        });
        return updated;
      });
    };

    room.on(RoomEvent.TranscriptionReceived, handleTranscription);
    return () => {
      room.off(RoomEvent.TranscriptionReceived, handleTranscription);
    };
  }, [room, setTranscriptions]);

  return null; // This component only listens; no UI
}

export default function AudioExamplePage() {
  const [mounted, setMounted] = useState(false);
  const [userIdentity, setUserIdentity] = useState<string>('');
  const endpoint = process.env.NEXT_PUBLIC_LK_TOKEN_ENDPOINT!;
  const [isStarting, setIsStarting] = useState(false);
  const [isInRoom, setIsInRoom] = useState(false);
  const [token, setToken] = useState<string | null>(null);
  const [serverUrlState, setServerUrlState] = useState<string | null>(null);
  const [roomNameState, setRoomNameState] = useState<string | null>(null);
  const [error, setError] = useState<string | null>(null);
  const [transcriptions, setTranscriptions] = useState<{ [id: string]: TranscriptionSegment }>({});

  // Mounting logic and storing random identity locally
  useEffect(() => {
    setMounted(true);
    const stored = localStorage.getItem('lk-user-id');
    if (stored) {
      setUserIdentity(stored);
    } else {
      const uid = generateRandomUserId();
      localStorage.setItem('lk-user-id', uid);
      setUserIdentity(uid);
    }
  }, []);

  // Start session by calling second token generator endpoint
  const handleStart = useCallback(async () => {
    if (!endpoint) return;
    setError(null);
    setIsStarting(true);
    try {
      const res = await fetch(endpoint, { method: 'GET', cache: 'no-store' });
      if (!res.ok) throw new Error(`Token fetch failed: ${res.status} ${res.statusText}`);
      const json = await res.json();
      const t = json?.participantToken ?? json?.token ?? json?.accessToken;
      const serverUrlFromApi = json?.serverUrl;
      const roomFromApi = json?.roomName;
      if (typeof t !== 'string') throw new Error('Token not found in response.');
      if (typeof serverUrlFromApi !== 'string') throw new Error('serverUrl not found in response.');
      if (typeof roomFromApi !== 'string') throw new Error('roomName not found in response.');
      setToken(t);
      setServerUrlState(serverUrlFromApi);
      setRoomNameState(roomFromApi);
      setIsInRoom(true);
    } catch (e: any) {
      setError(e.message || 'Failed to start.');
      setToken(null);
      setServerUrlState(null);
      setRoomNameState(null);
      setIsInRoom(false);
    } finally {
      setIsStarting(false);
    }
  }, [endpoint]);

  const handleStoppedByChild = useCallback(() => {
    setIsInRoom(false);
    setToken(null);
  }, []);

  if (!mounted || !userIdentity) {
    return (
      <div className="min-h-screen flex items-center justify-center">
        <p role="status" aria-live="polite">Loading…</p>
      </div>
    );
  }

  return (
    <div data-lk-theme="default" style={{ height: '100vh', width: '100vw', display: 'flex', flexDirection: 'column', background: '#1A1A1A', color: 'white' }}>
      <header style={{ padding: '12px 16px', borderBottom: '1px solid #333', display: 'flex', alignItems: 'center', gap: 12 }}>
        <h1 style={{ fontSize: 22, margin: 0, flex: 1, fontWeight: 600 }}>Real-Time Voice Translation</h1>
        {!isInRoom ? (
          <button
            type="button"
            onClick={handleStart}
            disabled={isStarting}
            style={{ padding: '10px 16px', borderRadius: 8, border: 'none', background: '#007AFF', color: 'white', fontWeight: 500, cursor: isStarting ? 'not-allowed' : 'pointer', transition: 'background-color 0.2s' }}
            aria-label="Start"
          >
            {isStarting ? 'Starting…' : 'Start Session'}
          </button>
        ) : null}
      </header>
      {error && (
        <div role="alert" style={{ padding: '8px 16px', color: '#FF4D4D', background: '#442222' }}>
          {error}
        </div>
      )}
      {!isInRoom && (
        <div style={{ flex: 1, display: 'grid', placeItems: 'center', padding: 16, textAlign: 'center' }}>
          <div>
            <h2 style={{ fontSize: 24, fontWeight: 600, marginBottom: 8 }}>Welcome</h2>
            <p style={{ opacity: 0.8, maxWidth: 400 }}>Click "Start Session" to begin real-time translation</p>
          </div>
        </div>
      )}
      {isInRoom && token && serverUrlState && (
        <LiveKitRoom
          audio
          video={false}
          token={token}
          serverUrl={serverUrlState}
          style={{ flex: 1, display: 'flex', flexDirection: 'column' }}
          className="w-full h-full"
          onDisconnected={handleStoppedByChild}
        >
          <StartAudio label="Click to enable audio" />
          <TranscriptionListener setTranscriptions={setTranscriptions} />
          <ContentGrid transcriptions={transcriptions} />
          <div role="toolbar" aria-label="Call controls" style={{ padding: '0 16px 16px' }}>
            <div style={{ borderRadius: 12, background: 'rgba(0,0,0,0.2)', padding: 12, display: 'flex', justifyContent: 'center', alignItems: 'center', gap: 16 }}>
              <AudioConference />
              <StopButton onStopped={handleStoppedByChild} />
            </div>
          </div>
        </LiveKitRoom>
      )}
    </div>
  );
}

function StopButton({ onStopped }: { onStopped: () => void }) {
  const room = useRoomContext();
  const [stopping, setStopping] = useState(false);

  const onStop = useCallback(async () => {
    if (!room || stopping) return;
    setStopping(true);
    try {
      await room.disconnect();
    } catch (err) {
      console.error('Error disconnecting:', err);
    } finally {
      setStopping(false);
      onStopped();
    }
  }, [room, stopping, onStopped]);

  return (
    <button
      type="button"
      onClick={onStop}
      disabled={stopping}
      style={{ padding: '10px 16px', borderRadius: 8, border: '1px solid #FF4D4D', background: '#FF4D4D', color: 'white', fontWeight: 500, cursor: stopping ? 'not-allowed' : 'pointer', transition: 'background-color 0.2s' }}
      aria-label="Stop"
    >
      {stopping ? 'Stopping…' : 'Stop Session'}
    </button>
  );
}

function ContentGrid({ transcriptions }: { transcriptions: { [id: string]: TranscriptionSegment } }) {
  return (
    <main id="main" role="main" style={{ flex: 1, display: 'grid', gridTemplateColumns: '1fr 1fr', gap: 24, padding: 24, overflowY: 'auto' }}>
      {/* English section */}
      <section aria-labelledby="source-heading" style={{ display: 'flex', flexDirection: 'column', background: '#2C2C2E', borderRadius: 12, padding: 20, overflow: 'hidden' }}>
        <h2 id="source-heading" style={{ fontSize: 20, margin: 0, marginBottom: 16, fontWeight: 600, color: '#E0E0E0' }} lang="en">
          Source Language (English)
        </h2>
        <ul style={{ listStyle: 'none', padding: 0, margin: 0 }}>
          {Object.values(transcriptions)
            .sort((a, b) => a.firstReceivedTime - b.firstReceivedTime)
            .map(segment => (
              <li key={segment.id} style={{ marginBottom: '8px' }}>
                {segment.text}
              </li>
            ))}
        </ul>
      </section>
      {/* Hindi section */}
      <section aria-labelledby="target-heading" style={{ display: 'flex', flexDirection: 'column', background: '#2C2C2E', borderRadius: 12, padding: 20, overflow: 'hidden' }}>
        <h2 id="target-heading" style={{ fontSize: 20, margin: 0, marginBottom: 16, fontWeight: 600, color: '#E0E0E0' }} lang="hi">
          Target Language (Hindi)
        </h2>
        <pre style={{ flex: 1, whiteSpace: 'pre-wrap', wordBreak: 'break-word', color: 'white', lineHeight: 1.6, fontFamily: 'monospace', fontSize: 16, background: '#1C1C1E', padding: 16, borderRadius: 8, overflowY: 'auto' }} aria-label="Live Hindi translation" lang="hi">
          Awaiting translation…
        </pre>
      </section>
    </main>
  );
}
```

still can't see transcriptions on my page
t
Hey there! It seems like you've got the transcription logic set up pretty well, but let's double-check a few things to make sure those transcriptions actually show up:

1. Verify Transcription Events: Make sure the `RoomEvent.TranscriptionReceived` event is firing. You could add a quick console log inside your `handleTranscription` function to see if transcription segments are coming through:

```tsx
const handleTranscription = (segments: TranscriptionSegment[]) => {
  console.log('Transcription segments received:', segments);
  setTranscriptions(prev => {
    const updated = { ...prev };
    segments.forEach(segment => {
      updated[segment.id] = segment;
    });
    return updated;
  });
};
```

2. Check Agent Configuration: Double-check that your LiveKit agent is set up to forward transcriptions. If it's not, those transcription events won't reach your frontend.

3. Inspect the UI: Take a look at the `ContentGrid` component and confirm it's rendering the transcription segments. Adding a temporary log to peek at the `transcriptions` state can help:

```tsx
console.log('Current transcriptions:', transcriptions);
```

4. Room Context: Make sure the `useRoomContext` hook is giving you the right room instance. If the room isn't initialized properly, transcription events won't come through.

5. Backend Support: Lastly, check that your backend supports transcription services and that everything (like your STT provider) is configured correctly.

Give these a try and let me know what you find! If things still aren't working, I'm here to help you dig a little deeper.
s
yes, i haven't added the changes to the python backend:

```python
from dotenv import load_dotenv

from livekit import agents
from livekit.agents import AgentSession, Agent, RoomInputOptions
from livekit.plugins import (
    openai,
    cartesia,
    deepgram,
    noise_cancellation,
    silero,
)
from livekit.plugins.turn_detector.multilingual import MultilingualModel

load_dotenv(dotenv_path=".env.local")


class Translator(Agent):
    def __init__(self) -> None:
        super().__init__(
            instructions="Translate English speech to Hindi only, say nothing else",
        )


async def entrypoint(ctx: agents.JobContext):
    session = AgentSession(
        stt=deepgram.STT(),
        llm=openai.LLM(model="gpt-4o-mini"),
        tts=cartesia.TTS(
            model="sonic-2",
            voice="28ca2041-5dda-42df-8123-f58ea9c3da00",
            language="hi"
        ),
        vad=silero.VAD.load(),
        turn_detection=MultilingualModel(),
        use_tts_aligned_transcription=True,
    )

    await session.start(
        room=ctx.room,
        agent=Translator(),
        room_input_options=RoomInputOptions(
            noise_cancellation=noise_cancellation.BVC(),
        ),
    )

    await ctx.connect()

    await session.generate_reply(
        instructions="Greet 'Hello I'm a translator'"
    )


if __name__ == "__main__":
    agents.cli.run_app(agents.WorkerOptions(entrypoint_fnc=entrypoint))
```
t
Hey there! To get transcriptions working on the frontend, you'll want to make sure your Python backend is set up to forward those transcription events properly. Based on the script you shared, here's a friendly step-by-step guide:

1. Enable Transcription Forwarding: You'll need to update your `AgentSession` to include the `STTSegmentsForwarder`, which takes care of sending transcriptions to the frontend. Here's how you can tweak your session initialization:

```python
from livekit.agents.transcription import STTSegmentsForwarder

async def entrypoint(ctx: agents.JobContext):
    stt_forwarder = STTSegmentsForwarder()
    session = AgentSession(
        stt=deepgram.STT(),
        llm=openai.LLM(model="gpt-4o-mini"),
        tts=cartesia.TTS(
            model="sonic-2",
            voice="28ca2041-5dda-42df-8123-f58ea9c3da00",
            language="hi"
        ),
        vad=silero.VAD.load(),
        turn_detection=MultilingualModel(),
        use_tts_aligned_transcription=True,
    )

    # Forward transcriptions
    session.add_transcription_forwarder(stt_forwarder)

    await session.start(
        room=ctx.room,
        agent=Translator(),
        room_input_options=RoomInputOptions(
            noise_cancellation=noise_cancellation.BVC(),
        ),
    )

    await ctx.connect()

    await session.generate_reply(
        instructions="Greet 'Hello I'm a translator'"
    )
```

2. Check Your Backend Logs: It's super helpful to add some logging so you can see transcription events being processed and forwarded. Here's a quick example:

```python
import logging
logging.basicConfig(level=logging.INFO)

async def entrypoint(ctx: agents.JobContext):
    stt_forwarder = STTSegmentsForwarder()
    session = AgentSession(
        stt=deepgram.STT(),
        llm=openai.LLM(model="gpt-4o-mini"),
        tts=cartesia.TTS(
            model="sonic-2",
            voice="28ca2041-5dda-42df-8123-f58ea9c3da00",
            language="hi"
        ),
        vad=silero.VAD.load(),
        turn_detection=MultilingualModel(),
        use_tts_aligned_transcription=True,
    )

    session.add_transcription_forwarder(stt_forwarder)

    async def log_transcriptions():
        async for event in stt_forwarder.stream():
            logging.info("Transcription event: %s", event)

    await session.start(
        room=ctx.room,
        agent=Translator(),
        room_input_options=RoomInputOptions(
            noise_cancellation=noise_cancellation.BVC(),
        ),
    )

    await ctx.connect()

    await session.generate_reply(
        instructions="Greet 'Hello I'm a translator'"
    )

    await log_transcriptions()
```

3. Restart Your Backend: Once you've made these updates, just restart your Python backend to make sure everything kicks in smoothly. After you've done this, your backend should be forwarding transcription events right to the frontend. If anything feels off or you hit a snag, just give me a shout, I'm here to help!
s
STTSegmentsForwarder is from the 0.x versions, i am using 1.x
t
Hey there! Just a quick update about LiveKit 1.x: the way transcription forwarding works has gotten a bit simpler. You don't need to use `STTSegmentsForwarder` anymore. Now, transcription events get forwarded automatically as long as your backend is set up right. Here's a quick rundown of what you should check:

1. Make Sure Transcription is Enabled: Double-check that your `AgentSession` is set up with an STT provider (like `deepgram.STT()`) and that the `use_tts_aligned_transcription` flag is set to `True`. Looks like you've already got this part covered in your script!

2. Confirm Event Forwarding: In LiveKit 1.x, transcription events are sent to the frontend automatically. Just make sure your frontend is listening for `RoomEvent.TranscriptionReceived` events, which you've already done, so great job!

3. Peek at Backend Logs: It's helpful to add some logging to see that transcription events are flowing through. Here's an example you can use:

```python
import logging
logging.basicConfig(level=logging.INFO)

async def entrypoint(ctx: agents.JobContext):
    session = AgentSession(
        stt=deepgram.STT(),
        llm=openai.LLM(model="gpt-4o-mini"),
        tts=cartesia.TTS(
            model="sonic-2",
            voice="28ca2041-5dda-42df-8123-f58ea9c3da00",
            language="hi"
        ),
        vad=silero.VAD.load(),
        turn_detection=MultilingualModel(),
        use_tts_aligned_transcription=True,
    )

    async def log_transcriptions():
        async for event in session.transcription_stream():
            logging.info("Transcription event: %s", event)

    await session.start(
        room=ctx.room,
        agent=Translator(),
        room_input_options=RoomInputOptions(
            noise_cancellation=noise_cancellation.BVC(),
        ),
    )

    await ctx.connect()

    await session.generate_reply(
        instructions="Greet 'Hello I'm a translator'"
    )

    await log_transcriptions()
```

4. Restart Your Backend: Once you've made these tweaks, just restart your backend so everything kicks in properly. With these steps, transcription events should flow smoothly to your frontend. If you hit any bumps or want to chat more about it, just let me know, I'm here to help!
s
AgentSession.__init__() got an unexpected keyword argument 'use_tts_aligned_transcription'
t
Hey there! It looks like the error is telling us that the `use_tts_aligned_transcription` argument isn't valid for the `AgentSession` in LiveKit 1.x. It might have been removed or swapped out in the newer version. No worries though, here's a friendly guide on what you can do next:

1. Remove the Argument: If `use_tts_aligned_transcription` isn't supported anymore, just take it out of the `AgentSession` initialization. The transcription should still work fine as long as your backend is set up correctly.

2. Check the Docs: Take a peek at the LiveKit 1.x documentation or release notes to see if there are any updates to the `AgentSession` API. You might find new parameters or ways to handle transcription.

3. Give It a Test Run: After removing that argument, restart your backend and see if transcription events make it to the frontend. Also, double-check that your STT provider (like `deepgram.STT()`) is all set up properly.

If you hit any snags or want a hand with anything else, just let me know, I'm here to help!
s
'AgentSession' object has no attribute 'transcription_stream'
t
Hey there! It looks like the error is telling us that the `AgentSession` object in LiveKit 1.x doesn't have a `transcription_stream` attribute anymore. This probably means the way transcription is handled has changed in the newer version. No worries though, here's a friendly guide on how to move forward:

1. Double-check Transcription Handling: In LiveKit 1.x, transcription events usually get forwarded automatically to the frontend if your backend is set up right. So, make sure your STT provider (like `deepgram.STT()`) is correctly configured inside the `AgentSession`.

2. Drop the `transcription_stream`: Since this attribute isn't there anymore, you can safely remove the related code. Instead, just have your frontend listen for `RoomEvent.TranscriptionReceived` events, as you've already got in place.

3. Peek at Backend Logs: It helps to add some logging so you can confirm transcription events are making it through on the backend. Here's a quick example:

```python
import logging
logging.basicConfig(level=logging.INFO)

async def entrypoint(ctx: agents.JobContext):
    session = AgentSession(
        stt=deepgram.STT(),
        llm=openai.LLM(model="gpt-4o-mini"),
        tts=cartesia.TTS(
            model="sonic-2",
            voice="28ca2041-5dda-42df-8123-f58ea9c3da00",
            language="hi"
        ),
        vad=silero.VAD.load(),
        turn_detection=MultilingualModel(),
    )

    await session.start(
        room=ctx.room,
        agent=Translator(),
        room_input_options=RoomInputOptions(
            noise_cancellation=noise_cancellation.BVC(),
        ),
    )

    await ctx.connect()

    logging.info("Session started successfully")
```

4. Give Your Backend a Restart: Once you've made these tweaks, restart your backend and try out the transcription flow again. If you're still not seeing transcription events on the frontend, just give me a shout! I'm here to help you dig deeper and get things working smoothly.
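While you sort out the backend, one frontend-side sanity check may help narrow things down: drop a throwaway listener next to your existing `TranscriptionListener` and watch the browser console. If nothing is ever logged while you speak, the backend isn't publishing transcriptions at all (so the fix is on the Python side); if segments are logged but the page stays empty, the problem is in how the React state is rendered. A minimal sketch, assuming it's rendered inside `<LiveKitRoom>`:

```tsx
import { Participant, RoomEvent, TranscriptionSegment } from 'livekit-client';
import { useRoomContext } from '@livekit/components-react';
import { useEffect } from 'react';

// Temporary debug component: render it inside <LiveKitRoom> and remove it later.
export function TranscriptionDebugLogger() {
  const room = useRoomContext();

  useEffect(() => {
    if (!room) return;

    const log = (segments: TranscriptionSegment[], participant?: Participant) => {
      for (const s of segments) {
        // Log who the segment is attributed to, whether it is final, and the text.
        console.log(`[transcription] from=${participant?.identity ?? 'unknown'} final=${s.final} text=${s.text}`);
      }
    };

    room.on(RoomEvent.TranscriptionReceived, log);
    return () => {
      room.off(RoomEvent.TranscriptionReceived, log);
    };
  }, [room]);

  return null;
}
```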