For a telephony assistant where latency is critical and large data access is required, the choice between a RAG-based approach and a fine-tuned model depends on how large the data is, how often it changes, and how much delay each conversational turn can tolerate:
• RAG-Based Approach: Ideal for accessing large, dynamic datasets. It retrieves data at query time, so answers reflect the current data, but each lookup adds latency. Caching frequent lookups and using an efficient vector search can help mitigate that delay.
• Fine-Tuned Model: Provides faster responses because it doesn't rely on external lookups, making it better suited to low-latency applications. However, it may struggle with very large datasets and must be retrained to incorporate new data.
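The caching idea mentioned for the RAG approach can be sketched as a small in-memory cache placed in front of the retrieval call. This is only an illustration under stated assumptions: `CachedRetriever`, `search_fn`, `max_entries`, and `ttl_seconds` are hypothetical names, not part of LiveKit or any particular vector database, and the right expiry time depends on how quickly the underlying data changes.

```python
import time
from collections import OrderedDict
from typing import Callable, List, Tuple


class CachedRetriever:
    """LRU cache with a time-to-live, wrapping a (hypothetical) retrieval function."""

    def __init__(
        self,
        search_fn: Callable[[str], List[str]],  # assumed: runs the slow vector search
        max_entries: int = 1024,
        ttl_seconds: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._search_fn = search_fn
        self._max_entries = max_entries
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache: "OrderedDict[str, Tuple[float, List[str]]]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _normalize(query: str) -> str:
        # Lowercase and collapse whitespace so trivially different
        # phrasings of the same query share one cache entry.
        return " ".join(query.lower().split())

    def retrieve(self, query: str) -> List[str]:
        key = self._normalize(query)
        now = self._clock()
        entry = self._cache.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            # Fresh hit: skip the external lookup entirely.
            self._cache.move_to_end(key)
            self.hits += 1
            return entry[1]
        # Miss or expired entry: pay the lookup cost once, then store the result.
        self.misses += 1
        docs = self._search_fn(query)
        self._cache[key] = (now, docs)
        self._cache.move_to_end(key)
        if len(self._cache) > self._max_entries:
            self._cache.popitem(last=False)  # evict the least recently used entry
        return docs
```

A caller would construct `CachedRetriever(search_fn=...)` once and call `retrieve(query)` on each turn. Repeated questions then avoid the lookup, while the TTL limits how stale a cached answer can become.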
If the dataset is too large to embed in a fine-tuned model but latency is still critical, a hybrid approach could be effective: use a fine-tuned model for common queries and RAG for less frequent, data-intensive requests.
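One way to picture the hybrid approach is a small router that decides, per query, whether to answer directly or to go through retrieval first. The sketch below is a simplified assumption rather than a LiveKit API: `HybridRouter`, `answer_direct`, `answer_with_rag`, and the keyword table are hypothetical, and a production system would more likely use an intent classifier than keyword matching.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass
class HybridRouter:
    # Hypothetical table: intent name -> keywords that mark a common query.
    common_intents: Dict[str, Tuple[str, ...]]
    # Hypothetical callables: the fine-tuned model path and the RAG path.
    answer_direct: Callable[[str], str]
    answer_with_rag: Callable[[str], str]

    def route(self, query: str) -> str:
        """Return "direct" for common queries and "rag" for everything else."""
        words = set(re.findall(r"[a-z0-9']+", query.lower()))
        for keywords in self.common_intents.values():
            if words & set(keywords):
                return "direct"
        return "rag"

    def answer(self, query: str) -> str:
        # Common queries take the low-latency path; the rest pay for retrieval.
        if self.route(query) == "direct":
            return self.answer_direct(query)
        return self.answer_with_rag(query)


router = HybridRouter(
    common_intents={"hours": ("hours", "open", "closing")},
    answer_direct=lambda q: "direct: " + q,
    answer_with_rag=lambda q: "rag: " + q,
)
```

With this table, a question about opening hours takes the direct path, while a question about a specific invoice falls through to RAG. The routing step itself should stay cheap, since it runs on every turn.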
Sources:
Agent Testing & Evaluation | livekit/livekit_composite | DeepWiki
External data and RAG | LiveKit Docs