How you send raw input-audio bytes to a Speech-to-Text (STT) service from a LiveKit agent session depends on which STT plugin you are using. Here are examples drawn from several plugins:
• *OpenAI STT:*
```python
# base64-encode the PCM16 frame and append it to the session's input audio buffer
encoded_frame = {
    "type": "input_audio_buffer.append",
    "audio": base64.b64encode(frame.data.tobytes()).decode("utf-8"),
}
await ws.send_json(encoded_frame)
```
• *AWS STT:*
```python
await stream.input_stream.send_audio_event(audio_chunk=frame.data.tobytes())
```
• *Baseten STT:*
```python
int16_array = np.frombuffer(frame.data, dtype=np.int16)
await ws.send_bytes(int16_array.tobytes())
```
• *Gladia STT:*
```python
# base64-encode the chunk and wrap it in Gladia's JSON message format
chunk_b64 = base64.b64encode(chunk).decode("utf-8")
await ws.send_str(json.dumps({"type": "audio_chunk", "data": {"chunk": chunk_b64}}))
```
• *Deepgram STT:*
```python
await ws.send_bytes(frame.data.tobytes())
```
Each example takes an audio frame (or chunk), converts it to the encoding the provider expects (raw 16-bit PCM bytes, or a base64 string wrapped in JSON), and sends it over a WebSocket or a dedicated streaming API. Adapt the sample rate, channel count, and message format to the requirements of your chosen STT service.
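For context, here is a minimal sketch of how such frames are typically obtained and forwarded from an agent process. It assumes an aiohttp WebSocket connection to a provider that accepts raw 16-bit PCM (Deepgram-style); `STT_WS_URL` and `forward_track_audio` are illustrative placeholders, not part of any LiveKit API, and you would substitute your provider's URL, auth, and encoding requirements.
```python
import aiohttp
from livekit import rtc

# Placeholder endpoint; substitute your STT provider's WebSocket URL and auth.
STT_WS_URL = "wss://stt.example.com/v1/stream"

async def forward_track_audio(track: rtc.Track) -> None:
    """Read frames from a subscribed audio track and send the raw bytes to the STT service."""
    async with aiohttp.ClientSession() as session:
        async with session.ws_connect(STT_WS_URL) as ws:
            # rtc.AudioStream yields AudioFrameEvent objects; each event.frame
            # holds interleaved int16 samples in frame.data.
            async for event in rtc.AudioStream(track):
                # Resample or re-encode here if your provider expects something
                # other than the track's native PCM16 format.
                await ws.send_bytes(event.frame.data.tobytes())
```
The plugin snippets above perform this same send step inside their own streaming loops, so hand-rolled code like this is mainly useful when integrating a provider that has no existing plugin.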
Sources:
GitHub::livekit::agents | livekit-plugins/livekit-plugins-deepgram/livekit/...