# ask-ai
s
how to display stt and tts transcriptions on frontend
t
Thanks for reaching out, Nakshatra. I'm an AI assistant and I'm here to help. Give me a moment while I research this topic.
Hey there! If you want to show STT and TTS transcriptions on the frontend using LiveKit, here's a friendly step-by-step guide to help you out:

• Frontend Integration:
  ◦ First, use the LiveKit SDK to listen for `RoomEvent.TranscriptionReceived` events. These events give you transcription segments as they come in.
  ◦ Next, keep those transcription segments in your state and display them in your UI. Here's a simple example using React to get you started:

```tsx
import { useEffect, useState } from "react";
import {
  TranscriptionSegment,
  Participant,
  TrackPublication,
  RoomEvent,
} from "livekit-client";
import { useMaybeRoomContext } from "@livekit/components-react";

export default function Transcriptions() {
  const room = useMaybeRoomContext();
  const [transcriptions, setTranscriptions] = useState<{ [id: string]: TranscriptionSegment }>({});

  useEffect(() => {
    if (!room) return;

    const updateTranscriptions = (
      segments: TranscriptionSegment[],
      participant?: Participant,
      publication?: TrackPublication,
    ) => {
      setTranscriptions(prev => {
        const newTranscriptions = { ...prev };
        for (const segment of segments) {
          newTranscriptions[segment.id] = segment;
        }
        return newTranscriptions;
      });
    };

    room.on(RoomEvent.TranscriptionReceived, updateTranscriptions);
    return () => {
      room.off(RoomEvent.TranscriptionReceived, updateTranscriptions);
    };
  }, [room]);

  return (
    <ul>
      {Object.values(transcriptions)
        .sort((a, b) => a.firstReceivedTime - b.firstReceivedTime)
        .map(segment => (
          <li key={segment.id}>{segment.text}</li>
        ))}
    </ul>
  );
}
```

• Agent Integration:

```python
from livekit.agents import stt, transcription
from livekit.plugins.deepgram import STT


async def _forward_transcription(
    stt_stream: stt.SpeechStream,
    stt_forwarder: transcription.STTSegmentsForwarder,
):
    """Forward the transcription and log the transcript in the console"""
    async for ev in stt_stream:
        stt_forwarder.update(ev)
        if ev.type == stt.SpeechEventType.INTERIM_TRANSCRIPT:
            print(ev.alternatives[0].text, end="")
        elif ev.type == stt.SpeechEventType.FINAL_TRANSCRIPT:
            print("\n")
            print(" -> ", ev.alternatives[0].text)
```

Just a few things to keep in mind:
• Transcriptions come in segments and might get updated as processing continues.
• You can check the `final` property on a `TranscriptionSegment` to know when a segment is fully done.
• Make sure your agent is set up to forward transcriptions for everything to work smoothly.

If you want to dive deeper or see full examples, don't forget to check out the LiveKit Examples: they have complete transcription frontends and agents ready to explore.

Sources: Transcriptions | LiveKit Docs
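If you want to visually separate in-progress text from finished text, here's a minimal sketch (an illustration, not taken from the docs) that assumes the same `transcriptions` map as the component above and only relies on the `final` and `firstReceivedTime` fields of `TranscriptionSegment`:

```tsx
import { TranscriptionSegment } from "livekit-client";

// Sketch: render finalized segments normally and still-updating (interim)
// segments in a dimmed style.
export function TranscriptList({
  transcriptions,
}: {
  transcriptions: { [id: string]: TranscriptionSegment };
}) {
  const segments = Object.values(transcriptions).sort(
    (a, b) => a.firstReceivedTime - b.firstReceivedTime,
  );

  return (
    <ul>
      {segments.map(segment => (
        <li key={segment.id} style={{ opacity: segment.final ? 1 : 0.6 }}>
          {segment.text}
        </li>
      ))}
    </ul>
  );
}
```

Since segments can be updated in place as processing continues, keying the map by `segment.id` (as in the example above) means the final text simply overwrites the interim one.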
s
and this goes in page.tsx?
t
Absolutely! You can pop the provided React component into your `page.tsx` file or any other spot where you'd like those transcriptions to show up. Just a couple of things to keep in mind:
• Make sure the `useMaybeRoomContext` hook is set up right and linked to your LiveKit room context.
• Don't forget to import all the necessary bits, like `livekit-client` and `@livekit/components-react`.
• And the component should be rendered inside a React tree that has access to the LiveKit room context.

If you want a hand fitting it into your project, just give me a shout, I'm happy to help!
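For a concrete picture, here's a minimal sketch of what that placement could look like in `page.tsx`. The file name `./Transcriptions`, the `NEXT_PUBLIC_LIVEKIT_URL` variable, and the hard-coded token are placeholders of my own, not something from your setup:

```tsx
'use client';

import { LiveKitRoom } from '@livekit/components-react';
import Transcriptions from './Transcriptions'; // assumed: the component above, saved in its own file

export default function Page() {
  // Placeholder values: in a real app these come from your token endpoint.
  const serverUrl = process.env.NEXT_PUBLIC_LIVEKIT_URL!;
  const token = '<participant-token>';

  return (
    <LiveKitRoom serverUrl={serverUrl} token={token} audio>
      {/* Rendered inside LiveKitRoom, so useMaybeRoomContext() can find the room */}
      <Transcriptions />
    </LiveKitRoom>
  );
}
```

In your own app the token would come from your token endpoint rather than a hard-coded placeholder.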
s
i have an existing script, need to add transcriptions to it:

```tsx
'use client';

import { LiveKitRoom, AudioConference, useRoomContext, StartAudio } from '@livekit/components-react';
import { useCallback, useEffect, useState } from 'react';

// Generates a random user ID for storage only (not sent to backend)
function generateRandomUserId() {
  return `user-${Math.floor(Math.random() * 100000)}`;
}

export default function AudioExamplePage() {
  const [mounted, setMounted] = useState(false);
  const [userIdentity, setUserIdentity] = useState<string>('');
  const endpoint = process.env.NEXT_PUBLIC_LK_TOKEN_ENDPOINT!;
  const [isStarting, setIsStarting] = useState(false);
  const [isInRoom, setIsInRoom] = useState(false);
  const [token, setToken] = useState<string | null>(null);
  const [serverUrlState, setServerUrlState] = useState<string | null>(null);
  const [roomNameState, setRoomNameState] = useState<string | null>(null);
  const [error, setError] = useState<string | null>(null);

  // Mounting logic and storing random identity locally
  useEffect(() => {
    setMounted(true);
    const stored = localStorage.getItem('lk-user-id');
    if (stored) {
      setUserIdentity(stored);
    } else {
      const uid = generateRandomUserId();
      localStorage.setItem('lk-user-id', uid);
      setUserIdentity(uid);
    }
  }, []);

  // Start session by calling second token generator endpoint
  const handleStart = useCallback(async () => {
    if (!endpoint) return;
    setError(null);
    setIsStarting(true);
    try {
      const res = await fetch(endpoint, { method: 'GET', cache: 'no-store' });
      if (!res.ok) throw new Error(`Token fetch failed: ${res.status} ${res.statusText}`);
      const json = await res.json();
      const t = json?.participantToken ?? json?.token ?? json?.accessToken;
      const serverUrlFromApi = json?.serverUrl;
      const roomFromApi = json?.roomName;
      if (typeof t !== 'string') throw new Error('Token not found in response.');
      if (typeof serverUrlFromApi !== 'string') throw new Error('serverUrl not found in response.');
      if (typeof roomFromApi !== 'string') throw new Error('roomName not found in response.');
      setToken(t);
      setServerUrlState(serverUrlFromApi);
      setRoomNameState(roomFromApi);
      setIsInRoom(true);
    } catch (e: any) {
      setError(e.message || 'Failed to start.');
      setToken(null);
      setServerUrlState(null);
      setRoomNameState(null);
      setIsInRoom(false);
    } finally {
      setIsStarting(false);
    }
  }, [endpoint]);

  const handleStoppedByChild = useCallback(() => {
    setIsInRoom(false);
    setToken(null);
  }, []);

  if (!mounted || !userIdentity) {
    return (
      <div className="min-h-screen flex items-center justify-center">
        <p role="status" aria-live="polite">Loading…</p>
      </div>
    );
  }

  return (
    <div data-lk-theme="default" style={{ height: '100vh', width: '100vw', display: 'flex', flexDirection: 'column', background: '#1A1A1A', color: 'white' }}>
      <header style={{ padding: '12px 16px', borderBottom: '1px solid #333', display: 'flex', alignItems: 'center', gap: 12 }}>
        <h1 style={{ fontSize: 22, margin: 0, flex: 1, fontWeight: 600 }}>Real-Time Voice Translation</h1>
        {!isInRoom ? (
          <button
            type="button"
            onClick={handleStart}
            disabled={isStarting}
            style={{ padding: '10px 16px', borderRadius: 8, border: 'none', background: '#007AFF', color: 'white', fontWeight: 500, cursor: isStarting ? 'not-allowed' : 'pointer', transition: 'background-color 0.2s' }}
            aria-label="Start"
          >
            {isStarting ? 'Starting…' : 'Start Session'}
          </button>
        ) : null}
      </header>
      {error && (
        <div role="alert" style={{ padding: '8px 16px', color: '#FF4D4D', background: '#442222' }}>
          {error}
        </div>
      )}
      {!isInRoom && (
        <div style={{ flex: 1, display: 'grid', placeItems: 'center', padding: 16, textAlign: 'center' }}>
          <div>
            <h2 style={{ fontSize: 24, fontWeight: 600, marginBottom: 8 }}>Welcome</h2>
            <p style={{ opacity: 0.8, maxWidth: 400 }}>Click "Start Session" to begin real-time translation</p>
          </div>
        </div>
      )}
      {isInRoom && token && serverUrlState && (
        <LiveKitRoom
          audio
          video={false}
          token={token}
          serverUrl={serverUrlState}
          style={{ flex: 1, display: 'flex', flexDirection: 'column' }}
          className="w-full h-full"
          onDisconnected={handleStoppedByChild}
        >
          <StartAudio label="Click to enable audio" />
          <ContentGrid />
          <div role="toolbar" aria-label="Call controls" style={{ padding: '0 16px 16px' }}>
            <div style={{ borderRadius: 12, background: 'rgba(0,0,0,0.2)', padding: 12, display: 'flex', justifyContent: 'center', alignItems: 'center', gap: 16 }}>
              <AudioConference />
              <StopButton onStopped={handleStoppedByChild} />
            </div>
          </div>
        </LiveKitRoom>
      )}
    </div>
  );
}

function StopButton({ onStopped }: { onStopped: () => void }) {
  const room = useRoomContext();
  const [stopping, setStopping] = useState(false);

  const onStop = useCallback(async () => {
    if (!room || stopping) return;
    setStopping(true);
    try {
      await room.disconnect();
    } catch (err) {
      console.error('Error disconnecting:', err);
    } finally {
      setStopping(false);
      onStopped();
    }
  }, [room, stopping, onStopped]);

  return (
    <button
      type="button"
      onClick={onStop}
      disabled={stopping}
      style={{ padding: '10px 16px', borderRadius: 8, border: '1px solid #FF4D4D', background: '#FF4D4D', color: 'white', fontWeight: 500, cursor: stopping ? 'not-allowed' : 'pointer', transition: 'background-color 0.2s' }}
      aria-label="Stop"
    >
      {stopping ? 'Stopping…' : 'Stop Session'}
    </button>
  );
}

function ContentGrid() {
  return (
    <main id="main" role="main" style={{ flex: 1, display: 'grid', gridTemplateColumns: '1fr 1fr', gap: 24, padding: 24, overflowY: 'auto' }}>
      {/* English section */}
      <section aria-labelledby="source-heading" style={{ display: 'flex', flexDirection: 'column', background: '#2C2C2E', borderRadius: 12, padding: 20, overflow: 'hidden' }}>
        <h2 id="source-heading" style={{ fontSize: 20, margin: 0, marginBottom: 16, fontWeight: 600, color: '#E0E0E0' }} lang="en">
          Source Language (English)
        </h2>
        <div className="sr-only" aria-live="polite" aria-atomic="false" />
        <pre style={{ flex: 1, whiteSpace: 'pre-wrap', wordBreak: 'break-word', color: 'white', lineHeight: 1.6, fontFamily: 'monospace', fontSize: 16, background: '#1C1C1E', padding: 16, borderRadius: 8, overflowY: 'auto' }} aria-label="Live English transcript" lang="en">
          Speak to start transcription…
        </pre>
      </section>
      {/* Hindi section */}
      <section aria-labelledby="target-heading" style={{ display: 'flex', flexDirection: 'column', background: '#2C2C2E', borderRadius: 12, padding: 20, overflow: 'hidden' }}>
        <h2 id="target-heading" style={{ fontSize: 20, margin: 0, marginBottom: 16, fontWeight: 600, color: '#E0E0E0' }} lang="hi">
          Target Language (Hindi)
        </h2>
        <div className="sr-only" aria-live="assertive" aria-atomic="false" />
        <pre style={{ flex: 1, whiteSpace: 'pre-wrap', wordBreak: 'break-word', color: 'white', lineHeight: 1.6, fontFamily: 'monospace', fontSize: 16, background: '#1C1C1E', padding: 16, borderRadius: 8, overflowY: 'auto' }} aria-label="Live Hindi translation" lang="hi">
          Awaiting translation…
        </pre>
      </section>
    </main>
  );
}
```
t
Hey there! If you want to add transcriptions to your existing script, you can either weave the transcription logic right into the `ContentGrid` component or create a fresh component just for showing transcriptions. Here's a friendly step-by-step on how you can tweak your script:

1. Import What You Need: First up, make sure you bring in the right classes and hooks to handle transcriptions:

```tsx
import { RoomEvent, TranscriptionSegment } from 'livekit-client';
```

2. Set Up State for Transcriptions: Next, add a state variable in your `AudioExamplePage` component to keep track of transcription segments:

```tsx
const [transcriptions, setTranscriptions] = useState<{ [id: string]: TranscriptionSegment }>({});
```

3. Listen for Transcription Events: Then, use the `RoomEvent.TranscriptionReceived` event to update your transcription state. Grab the room with `useRoomContext` (your script already imports it from `@livekit/components-react`) in a component rendered inside `<LiveKitRoom>`, since that's where the room context lives, and register the listener inside a `useEffect` hook:

```tsx
const room = useRoomContext(); // hooks must be called at the component's top level, not inside useEffect

useEffect(() => {
  if (!room) return;

  const handleTranscription = (segments: TranscriptionSegment[]) => {
    setTranscriptions(prev => {
      const updated = { ...prev };
      segments.forEach(segment => {
        updated[segment.id] = segment;
      });
      return updated;
    });
  };

  room.on(RoomEvent.TranscriptionReceived, handleTranscription);
  return () => room.off(RoomEvent.TranscriptionReceived, handleTranscription);
}, [room]);
```

4. Show Those Transcriptions: Finally, update your `ContentGrid` component so it displays the transcription segments nicely. Something like this:

```tsx
<section>
  <h2>Live Transcriptions</h2>
  <ul>
    {Object.values(transcriptions)
      .sort((a, b) => a.firstReceivedTime - b.firstReceivedTime)
      .map(segment => (
        <li key={segment.id}>{segment.text}</li>
      ))}
  </ul>
</section>
```
Give these changes a go, and you’ll see the transcriptions pop up in real-time. If you get stuck or want to chat more about it, just let me know—I’m here to help!
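One more idea, since your page has separate English and Hindi panes: the event handler also receives the `participant` the segments are attributed to, so you could route segments by comparing that participant against `room.localParticipant`. This is only a sketch, and it assumes your STT segments arrive attributed to your local participant while the agent's translated speech arrives attributed to the agent participant, which is worth verifying with a console log first:

```tsx
import { Participant, RoomEvent, TranscriptionSegment } from 'livekit-client';
import { useRoomContext } from '@livekit/components-react';
import { useEffect, useState } from 'react';

// Sketch: keep two maps, one for the local user's (English) segments and
// one for everyone else's (e.g. the translator agent's Hindi output).
export function useSplitTranscriptions() {
  const room = useRoomContext();
  const [local, setLocal] = useState<{ [id: string]: TranscriptionSegment }>({});
  const [remote, setRemote] = useState<{ [id: string]: TranscriptionSegment }>({});

  useEffect(() => {
    if (!room) return;

    const onTranscription = (segments: TranscriptionSegment[], participant?: Participant) => {
      // Segments attributed to the local participant go to the "local" map.
      const isLocal = participant?.identity === room.localParticipant.identity;
      const setter = isLocal ? setLocal : setRemote;
      setter(prev => {
        const next = { ...prev };
        for (const segment of segments) next[segment.id] = segment;
        return next;
      });
    };

    room.on(RoomEvent.TranscriptionReceived, onTranscription);
    return () => {
      room.off(RoomEvent.TranscriptionReceived, onTranscription);
    };
  }, [room]);

  return { local, remote };
}
```

You could call this hook from a component rendered inside `<LiveKitRoom>` and feed `local` to the English pane and `remote` to the Hindi pane.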
s
i did everything as follows:

```tsx
'use client';

import { LiveKitRoom, AudioConference, useRoomContext, StartAudio } from '@livekit/components-react';
import { useCallback, useEffect, useState } from 'react';
import { RoomEvent, TranscriptionSegment } from 'livekit-client';

// Generates a random user ID for storage only (not sent to backend)
function generateRandomUserId() {
  return `user-${Math.floor(Math.random() * 100000)}`;
}

// New component to listen for transcriptions inside LiveKit context
function TranscriptionListener({ setTranscriptions }: { setTranscriptions: React.Dispatch<React.SetStateAction<{ [id: string]: TranscriptionSegment }>> }) {
  const room = useRoomContext();

  useEffect(() => {
    if (!room) return;

    const handleTranscription = (segments: TranscriptionSegment[]) => {
      setTranscriptions(prev => {
        const updated = { ...prev };
        segments.forEach(segment => {
          updated[segment.id] = segment;
        });
        return updated;
      });
    };

    room.on(RoomEvent.TranscriptionReceived, handleTranscription);
    return () => {
      room.off(RoomEvent.TranscriptionReceived, handleTranscription);
    };
  }, [room, setTranscriptions]);

  return null; // This component only listens; no UI
}

export default function AudioExamplePage() {
  const [mounted, setMounted] = useState(false);
  const [userIdentity, setUserIdentity] = useState<string>('');
  const endpoint = process.env.NEXT_PUBLIC_LK_TOKEN_ENDPOINT!;
  const [isStarting, setIsStarting] = useState(false);
  const [isInRoom, setIsInRoom] = useState(false);
  const [token, setToken] = useState<string | null>(null);
  const [serverUrlState, setServerUrlState] = useState<string | null>(null);
  const [roomNameState, setRoomNameState] = useState<string | null>(null);
  const [error, setError] = useState<string | null>(null);
  const [transcriptions, setTranscriptions] = useState<{ [id: string]: TranscriptionSegment }>({});

  // Mounting logic and storing random identity locally
  useEffect(() => {
    setMounted(true);
    const stored = localStorage.getItem('lk-user-id');
    if (stored) {
      setUserIdentity(stored);
    } else {
      const uid = generateRandomUserId();
      localStorage.setItem('lk-user-id', uid);
      setUserIdentity(uid);
    }
  }, []);

  // Start session by calling second token generator endpoint
  const handleStart = useCallback(async () => {
    if (!endpoint) return;
    setError(null);
    setIsStarting(true);
    try {
      const res = await fetch(endpoint, { method: 'GET', cache: 'no-store' });
      if (!res.ok) throw new Error(`Token fetch failed: ${res.status} ${res.statusText}`);
      const json = await res.json();
      const t = json?.participantToken ?? json?.token ?? json?.accessToken;
      const serverUrlFromApi = json?.serverUrl;
      const roomFromApi = json?.roomName;
      if (typeof t !== 'string') throw new Error('Token not found in response.');
      if (typeof serverUrlFromApi !== 'string') throw new Error('serverUrl not found in response.');
      if (typeof roomFromApi !== 'string') throw new Error('roomName not found in response.');
      setToken(t);
      setServerUrlState(serverUrlFromApi);
      setRoomNameState(roomFromApi);
      setIsInRoom(true);
    } catch (e: any) {
      setError(e.message || 'Failed to start.');
      setToken(null);
      setServerUrlState(null);
      setRoomNameState(null);
      setIsInRoom(false);
    } finally {
      setIsStarting(false);
    }
  }, [endpoint]);

  const handleStoppedByChild = useCallback(() => {
    setIsInRoom(false);
    setToken(null);
  }, []);

  if (!mounted || !userIdentity) {
    return (
      <div className="min-h-screen flex items-center justify-center">
        <p role="status" aria-live="polite">Loading…</p>
      </div>
    );
  }

  return (
    <div data-lk-theme="default" style={{ height: '100vh', width: '100vw', display: 'flex', flexDirection: 'column', background: '#1A1A1A', color: 'white' }}>
      <header style={{ padding: '12px 16px', borderBottom: '1px solid #333', display: 'flex', alignItems: 'center', gap: 12 }}>
        <h1 style={{ fontSize: 22, margin: 0, flex: 1, fontWeight: 600 }}>Real-Time Voice Translation</h1>
        {!isInRoom ? (
          <button
            type="button"
            onClick={handleStart}
            disabled={isStarting}
            style={{ padding: '10px 16px', borderRadius: 8, border: 'none', background: '#007AFF', color: 'white', fontWeight: 500, cursor: isStarting ? 'not-allowed' : 'pointer', transition: 'background-color 0.2s' }}
            aria-label="Start"
          >
            {isStarting ? 'Starting…' : 'Start Session'}
          </button>
        ) : null}
      </header>
      {error && (
        <div role="alert" style={{ padding: '8px 16px', color: '#FF4D4D', background: '#442222' }}>
          {error}
        </div>
      )}
      {!isInRoom && (
        <div style={{ flex: 1, display: 'grid', placeItems: 'center', padding: 16, textAlign: 'center' }}>
          <div>
            <h2 style={{ fontSize: 24, fontWeight: 600, marginBottom: 8 }}>Welcome</h2>
            <p style={{ opacity: 0.8, maxWidth: 400 }}>Click "Start Session" to begin real-time translation</p>
          </div>
        </div>
      )}
      {isInRoom && token && serverUrlState && (
        <LiveKitRoom
          audio
          video={false}
          token={token}
          serverUrl={serverUrlState}
          style={{ flex: 1, display: 'flex', flexDirection: 'column' }}
          className="w-full h-full"
          onDisconnected={handleStoppedByChild}
        >
          <StartAudio label="Click to enable audio" />
          <TranscriptionListener setTranscriptions={setTranscriptions} />
          <ContentGrid transcriptions={transcriptions} />
          <div role="toolbar" aria-label="Call controls" style={{ padding: '0 16px 16px' }}>
            <div style={{ borderRadius: 12, background: 'rgba(0,0,0,0.2)', padding: 12, display: 'flex', justifyContent: 'center', alignItems: 'center', gap: 16 }}>
              <AudioConference />
              <StopButton onStopped={handleStoppedByChild} />
            </div>
          </div>
        </LiveKitRoom>
      )}
    </div>
  );
}

function StopButton({ onStopped }: { onStopped: () => void }) {
  const room = useRoomContext();
  const [stopping, setStopping] = useState(false);

  const onStop = useCallback(async () => {
    if (!room || stopping) return;
    setStopping(true);
    try {
      await room.disconnect();
    } catch (err) {
      console.error('Error disconnecting:', err);
    } finally {
      setStopping(false);
      onStopped();
    }
  }, [room, stopping, onStopped]);

  return (
    <button
      type="button"
      onClick={onStop}
      disabled={stopping}
      style={{ padding: '10px 16px', borderRadius: 8, border: '1px solid #FF4D4D', background: '#FF4D4D', color: 'white', fontWeight: 500, cursor: stopping ? 'not-allowed' : 'pointer', transition: 'background-color 0.2s' }}
      aria-label="Stop"
    >
      {stopping ? 'Stopping…' : 'Stop Session'}
    </button>
  );
}

function ContentGrid({ transcriptions }: { transcriptions: { [id: string]: TranscriptionSegment } }) {
  return (
    <main id="main" role="main" style={{ flex: 1, display: 'grid', gridTemplateColumns: '1fr 1fr', gap: 24, padding: 24, overflowY: 'auto' }}>
      {/* English section */}
      <section aria-labelledby="source-heading" style={{ display: 'flex', flexDirection: 'column', background: '#2C2C2E', borderRadius: 12, padding: 20, overflow: 'hidden' }}>
        <h2 id="source-heading" style={{ fontSize: 20, margin: 0, marginBottom: 16, fontWeight: 600, color: '#E0E0E0' }} lang="en">
          Source Language (English)
        </h2>
        <ul style={{ listStyle: 'none', padding: 0, margin: 0 }}>
          {Object.values(transcriptions)
            .sort((a, b) => a.firstReceivedTime - b.firstReceivedTime)
            .map(segment => (
              <li key={segment.id} style={{ marginBottom: '8px' }}>
                {segment.text}
              </li>
            ))}
        </ul>
      </section>
      {/* Hindi section */}
      <section aria-labelledby="target-heading" style={{ display: 'flex', flexDirection: 'column', background: '#2C2C2E', borderRadius: 12, padding: 20, overflow: 'hidden' }}>
        <h2 id="target-heading" style={{ fontSize: 20, margin: 0, marginBottom: 16, fontWeight: 600, color: '#E0E0E0' }} lang="hi">
          Target Language (Hindi)
        </h2>
        <pre style={{ flex: 1, whiteSpace: 'pre-wrap', wordBreak: 'break-word', color: 'white', lineHeight: 1.6, fontFamily: 'monospace', fontSize: 16, background: '#1C1C1E', padding: 16, borderRadius: 8, overflowY: 'auto' }} aria-label="Live Hindi translation" lang="hi">
          Awaiting translation…
        </pre>
      </section>
    </main>
  );
}
```

still can't see transcriptions on my page
t
Hey there! It seems like you've got the transcription logic set up pretty well, but let's double-check a few things to make sure those transcriptions actually show up:

1. Verify Transcription Events: Make sure the `RoomEvent.TranscriptionReceived` event is firing. You could add a quick console log inside your `handleTranscription` function to see if transcription segments are coming through:

```tsx
const handleTranscription = (segments: TranscriptionSegment[]) => {
  console.log('Transcription segments received:', segments);
  setTranscriptions(prev => {
    const updated = { ...prev };
    segments.forEach(segment => {
      updated[segment.id] = segment;
    });
    return updated;
  });
};
```

2. Check Agent Configuration: Double-check that your LiveKit agent is set up to forward transcriptions. If it's not, those transcription events won't reach your frontend.

3. Inspect the UI: Take a look at the `ContentGrid` component and confirm it's rendering the transcription segments. Adding a temporary log to peek at the `transcriptions` state can help:

```tsx
console.log('Current transcriptions:', transcriptions);
```

4. Room Context: Make sure the `useRoomContext` hook is giving you the right room instance. If the room isn't initialized properly, transcription events won't come through.

5. Backend Support: Lastly, check that your backend supports transcription services and that everything (like your STT provider) is configured correctly.

Give these a try and let me know what you find! If things still aren't working, I'm here to help you dig a little deeper.
s
yes, i haven't added the changes to the python backend:

```python
from dotenv import load_dotenv

from livekit import agents
from livekit.agents import AgentSession, Agent, RoomInputOptions
from livekit.plugins import (
    openai,
    cartesia,
    deepgram,
    noise_cancellation,
    silero,
)
from livekit.plugins.turn_detector.multilingual import MultilingualModel

load_dotenv(dotenv_path=".env.local")


class Translator(Agent):
    def __init__(self) -> None:
        super().__init__(
            instructions="Translate English speech to Hindi only, say nothing else",
        )


async def entrypoint(ctx: agents.JobContext):
    session = AgentSession(
        stt=deepgram.STT(),
        llm=openai.LLM(model="gpt-4o-mini"),
        tts=cartesia.TTS(
            model="sonic-2",
            voice="28ca2041-5dda-42df-8123-f58ea9c3da00",
            language="hi"
        ),
        vad=silero.VAD.load(),
        turn_detection=MultilingualModel(),
        use_tts_aligned_transcription=True,
    )

    await session.start(
        room=ctx.room,
        agent=Translator(),
        room_input_options=RoomInputOptions(
            noise_cancellation=noise_cancellation.BVC(),
        ),
    )

    await ctx.connect()

    await session.generate_reply(
        instructions="Greet 'Hello I'm a translator'"
    )


if __name__ == "__main__":
    agents.cli.run_app(agents.WorkerOptions(entrypoint_fnc=entrypoint))
```
t
Hey there! To get transcriptions working on the frontend, you'll want to make sure your Python backend is set up to forward those transcription events properly. Based on the script you shared, here's a friendly step-by-step guide:

1. Enable Transcription Forwarding: You'll need to update your `AgentSession` to include the `STTSegmentsForwarder`, which takes care of sending transcriptions to the frontend. Here's how you can tweak your session initialization:

```python
from livekit.agents.transcription import STTSegmentsForwarder

async def entrypoint(ctx: agents.JobContext):
    stt_forwarder = STTSegmentsForwarder()
    session = AgentSession(
        stt=deepgram.STT(),
        llm=openai.LLM(model="gpt-4o-mini"),
        tts=cartesia.TTS(
            model="sonic-2",
            voice="28ca2041-5dda-42df-8123-f58ea9c3da00",
            language="hi"
        ),
        vad=silero.VAD.load(),
        turn_detection=MultilingualModel(),
        use_tts_aligned_transcription=True,
    )

    # Forward transcriptions
    session.add_transcription_forwarder(stt_forwarder)

    await session.start(
        room=ctx.room,
        agent=Translator(),
        room_input_options=RoomInputOptions(
            noise_cancellation=noise_cancellation.BVC(),
        ),
    )

    await ctx.connect()

    await session.generate_reply(
        instructions="Greet 'Hello I'm a translator'"
    )
```

2. Check Your Backend Logs: It's super helpful to add some logging so you can see transcription events being processed and forwarded. Here's a quick example:

```python
import logging
logging.basicConfig(level=logging.INFO)

async def entrypoint(ctx: agents.JobContext):
    stt_forwarder = STTSegmentsForwarder()
    session = AgentSession(
        stt=deepgram.STT(),
        llm=openai.LLM(model="gpt-4o-mini"),
        tts=cartesia.TTS(
            model="sonic-2",
            voice="28ca2041-5dda-42df-8123-f58ea9c3da00",
            language="hi"
        ),
        vad=silero.VAD.load(),
        turn_detection=MultilingualModel(),
        use_tts_aligned_transcription=True,
    )

    session.add_transcription_forwarder(stt_forwarder)

    async def log_transcriptions():
        async for event in stt_forwarder.stream():
            logging.info("Transcription event: %s", event)

    await session.start(
        room=ctx.room,
        agent=Translator(),
        room_input_options=RoomInputOptions(
            noise_cancellation=noise_cancellation.BVC(),
        ),
    )

    await ctx.connect()

    await session.generate_reply(
        instructions="Greet 'Hello I'm a translator'"
    )

    await log_transcriptions()
```

3. Restart Your Backend: Once you've made these updates, just restart your Python backend to make sure everything kicks in smoothly. After you've done this, your backend should be forwarding transcription events right to the frontend. If anything feels off or you hit a snag, just give me a shout, I'm here to help!
s
STTSegmentsForwarder is from the 0.x versions, i am using 1.x
t
Hey there! Just a quick update about LiveKit 1.x: the way transcription forwarding works has gotten a bit simpler. You don't need to use `STTSegmentsForwarder` anymore. Now, transcription events get forwarded automatically as long as your backend is set up right. Here's a quick rundown of what you should check:

1. Make Sure Transcription is Enabled: Double-check that your `AgentSession` is set up with an STT provider (like `deepgram.STT()`) and that the `use_tts_aligned_transcription` flag is set to `True`. Looks like you've already got this part covered in your script!

2. Confirm Event Forwarding: In LiveKit 1.x, transcription events are sent to the frontend automatically. Just make sure your frontend is listening for `RoomEvent.TranscriptionReceived` events, which you've already done, so great job!

3. Peek at Backend Logs: It's helpful to add some logging to see that transcription events are flowing through. Here's an example you can use:

```python
import logging
logging.basicConfig(level=logging.INFO)

async def entrypoint(ctx: agents.JobContext):
    session = AgentSession(
        stt=deepgram.STT(),
        llm=openai.LLM(model="gpt-4o-mini"),
        tts=cartesia.TTS(
            model="sonic-2",
            voice="28ca2041-5dda-42df-8123-f58ea9c3da00",
            language="hi"
        ),
        vad=silero.VAD.load(),
        turn_detection=MultilingualModel(),
        use_tts_aligned_transcription=True,
    )

    async def log_transcriptions():
        async for event in session.transcription_stream():
            logging.info("Transcription event: %s", event)

    await session.start(
        room=ctx.room,
        agent=Translator(),
        room_input_options=RoomInputOptions(
            noise_cancellation=noise_cancellation.BVC(),
        ),
    )

    await ctx.connect()

    await session.generate_reply(
        instructions="Greet 'Hello I'm a translator'"
    )

    await log_transcriptions()
```

4. Restart Your Backend: Once you've made these tweaks, just restart your backend so everything kicks in properly. With these steps, transcription events should flow smoothly to your frontend. If you hit any bumps or want to chat more about it, just let me know, I'm here to help!
s
AgentSession.__init__() got an unexpected keyword argument 'use_tts_aligned_transcription'
t
Hey there! It looks like the error is telling us that the `use_tts_aligned_transcription` argument isn't valid for the `AgentSession` in LiveKit 1.x. It might have been removed or swapped out in the newer version. No worries though, here's a friendly guide on what you can do next:

1. Remove the Argument: If `use_tts_aligned_transcription` isn't supported anymore, just take it out of the `AgentSession` initialization. The transcription should still work fine as long as your backend is set up correctly.

2. Check the Docs: Take a peek at the LiveKit 1.x documentation or release notes to see if there are any updates to the `AgentSession` API. You might find new parameters or ways to handle transcription.

3. Give It a Test Run: After removing that argument, restart your backend and see if transcription events make it to the frontend. Also, double-check that your STT provider (like `deepgram.STT()`) is all set up properly.

If you hit any snags or want a hand with anything else, just let me know, I'm here to help!
s
'AgentSession' object has no attribute 'transcription_stream'
t
Hey there! It looks like the error is telling us that the `AgentSession` object in LiveKit 1.x doesn't have a `transcription_stream` attribute anymore. This probably means the way transcription is handled has changed in the newer version. No worries though, here's a friendly guide on how to move forward:

1. Double-check Transcription Handling: In LiveKit 1.x, transcription events usually get forwarded automatically to the frontend if your backend is set up right. So, make sure your STT provider (like `deepgram.STT()`) is correctly configured inside the `AgentSession`.

2. Drop the `transcription_stream`: Since this attribute isn't there anymore, you can safely remove the related code. Instead, just have your frontend listen for `RoomEvent.TranscriptionReceived` events, as you've already got in place.

3. Peek at Backend Logs: It helps to add some logging so you can confirm transcription events are making it through on the backend. Here's a quick example:

```python
import logging
logging.basicConfig(level=logging.INFO)

async def entrypoint(ctx: agents.JobContext):
    session = AgentSession(
        stt=deepgram.STT(),
        llm=openai.LLM(model="gpt-4o-mini"),
        tts=cartesia.TTS(
            model="sonic-2",
            voice="28ca2041-5dda-42df-8123-f58ea9c3da00",
            language="hi"
        ),
        vad=silero.VAD.load(),
        turn_detection=MultilingualModel(),
    )

    await session.start(
        room=ctx.room,
        agent=Translator(),
        room_input_options=RoomInputOptions(
            noise_cancellation=noise_cancellation.BVC(),
        ),
    )

    await ctx.connect()

    logging.info("Session started successfully")
```

4. Give Your Backend a Restart: Once you've made these tweaks, restart your backend and try out the transcription flow again. If you're still not seeing transcription events on the frontend, just give me a shout! I'm here to help you dig deeper and get things working smoothly.
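While you sort out the backend, one frontend-side sanity check may help narrow things down: drop a throwaway listener next to your existing `TranscriptionListener` and watch the browser console. If nothing is ever logged while you speak, the backend isn't publishing transcriptions at all (so the fix is on the Python side); if segments are logged but the page stays empty, the problem is in how the React state is rendered. A minimal sketch, assuming it's rendered inside `<LiveKitRoom>`:

```tsx
import { Participant, RoomEvent, TranscriptionSegment } from 'livekit-client';
import { useRoomContext } from '@livekit/components-react';
import { useEffect } from 'react';

// Temporary debug component: render it inside <LiveKitRoom> and remove it later.
export function TranscriptionDebugLogger() {
  const room = useRoomContext();

  useEffect(() => {
    if (!room) return;

    const log = (segments: TranscriptionSegment[], participant?: Participant) => {
      for (const s of segments) {
        // Log who the segment is attributed to, whether it is final, and the text.
        console.log(`[transcription] from=${participant?.identity ?? 'unknown'} final=${s.final} text=${s.text}`);
      }
    };

    room.on(RoomEvent.TranscriptionReceived, log);
    return () => {
      room.off(RoomEvent.TranscriptionReceived, log);
    };
  }, [room]);

  return null;
}
```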