I have some questions about Pinot realtime/upsert tables:
1. Retention. We have an upsert table whose retention is set to 10 days. However, I’m seeing “latest” rows where the value is 14 days old. Is the cleanup process “lazy”?
2. Although it's not required, are there any advantages/disadvantages to the time column being the same as the sorted column? Am I correct in understanding that the recommendation is to have only one sorted column?
06/24/2021, 5:21 PM
1. Retention is managed at the segment level: a segment is removed only once every record in it has expired, so a segment that still holds any unexpired record is kept in full, older rows included.
2. Sorting a column speeds up filtering on that column and improves data locality. Pinot does not yet support a secondary sorted column, so it is usually hard to sort on multiple columns.
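To make the segment-level retention point concrete, here is a minimal sketch. It is not Pinot's actual code: `Segment`, `is_purgeable`, and the timestamps are hypothetical names and values chosen for illustration. It assumes a segment becomes eligible for removal only when its newest record is older than the retention period.

```python
from dataclasses import dataclass

DAY_MS = 24 * 60 * 60 * 1000


@dataclass
class Segment:
    """Hypothetical stand-in for a segment's time metadata."""
    name: str
    start_time_ms: int  # timestamp of the oldest record in the segment
    end_time_ms: int    # timestamp of the newest record in the segment


def is_purgeable(segment: Segment, now_ms: int, retention_ms: int) -> bool:
    # The whole segment is kept until even its newest record has expired.
    return now_ms - segment.end_time_ms > retention_ms


now_ms = 100 * DAY_MS
retention_ms = 10 * DAY_MS

# Holds rows from 14 days ago up to 8 days ago: the newest row is still
# inside the 10-day window, so the 14-day-old rows stay queryable too.
seg_mixed = Segment("seg_mixed", now_ms - 14 * DAY_MS, now_ms - 8 * DAY_MS)

# Every row is older than 10 days, so this segment can be removed.
seg_old = Segment("seg_old", now_ms - 20 * DAY_MS, now_ms - 12 * DAY_MS)

keep_mixed = not is_purgeable(seg_mixed, now_ms, retention_ms)
drop_old = is_purgeable(seg_old, now_ms, retention_ms)
```

Under these assumptions, a row older than the retention period can keep showing up as long as it shares a segment with newer rows.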
06/24/2021, 5:22 PM
Okay, thanks. We’ll keep the sorted column the same as the time column.
06/24/2021, 5:23 PM
Based on your use case, you might want to choose the sorted column differently. E.g. if you always filter on a key column, you might want to sort on it to benefit from fast filtering and data locality.
If you always have a range filter on time, then sorting on the time column is also a good option.
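A rough illustration of why a sorted column helps in both cases, using plain Python lists rather than Pinot's actual index structures (`equality_doc_range`, `range_doc_range`, and the sample values are hypothetical): when the values are sorted, both an equality filter and a range filter match one contiguous run of rows, which binary search can locate without scanning the column.

```python
from bisect import bisect_left, bisect_right


def equality_doc_range(sorted_values, key):
    """Return [start, end) positions of rows whose value equals key."""
    return bisect_left(sorted_values, key), bisect_right(sorted_values, key)


def range_doc_range(sorted_values, low, high):
    """Return [start, end) positions of rows with low <= value <= high."""
    return bisect_left(sorted_values, low), bisect_right(sorted_values, high)


# Case 1: segment sorted on a key column, equality filter on that key.
keys = ["a", "a", "b", "b", "b", "c"]
key_range = equality_doc_range(keys, "b")

# Case 2: segment sorted on the time column, range filter on time.
times = [100, 105, 110, 120, 130, 145]
time_range = range_doc_range(times, 105, 130)
```

In both cases the matching rows sit next to each other, which is where the data locality benefit comes from. Since only one column can be the sorted one, the choice depends on which filter dominates the queries.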