# general
w
Hi Pinot team, I am trying to create a realtime Pinot table that ingests data from a Kafka topic.
1. The Kafka stream data has two time columns: processed_at and created_at.
2. The processed_at column is in order within the Kafka stream.
3. The created_at column is out of order within the Kafka stream.
The retention of the realtime Pinot table depends on created_at. If we use created_at as the timeColumnName, a lot of stale segments may be created, since created_at can be very old. If we use processed_at as the timeColumnName, a lot of old orders can live in the realtime table. Do you have any suggestions about which one to choose as the timeColumnName?
s
IIUC, you need all records for which `createdAt > now - R` (where R is the retention). If R is large, then you need those old segments, so why are you worried about a lot of stale segments? Are the values of `createdAt` so random that every record ever ingested could be retained? As long as newer records in the Kafka topic have reasonable values for `createdAt` (i.e. higher than older ones), I would use `createdAt` as the time column. If necessary, you can add a filter when creating the table to drop records whose `createdAt` is earlier than the epoch value `tableCreationTime - R`. On the other hand, if `createdAt` values are all over the map all the time, then maybe what you need is a `REFRESH` table with no time column.
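A minimal Python sketch of that filter, assuming `created_at` holds epoch milliseconds and a hypothetical 30-day retention. It computes the fixed cutoff `tableCreationTime - R` and emits an `ingestionConfig.filterConfig` fragment for the table config. Pinot skips records for which `filterFunction` evaluates to true. The helper names (`cutoff_ms`, `should_drop`, `ingestion_config`) are illustrative only, and Groovy scripting may be restricted in some Pinot deployments, so check the docs for your version.

```python
import json
import time

# Hypothetical retention R of 30 days, in milliseconds.
RETENTION_MS = 30 * 24 * 60 * 60 * 1000


def cutoff_ms(table_creation_time_ms: int, retention_ms: int = RETENTION_MS) -> int:
    """Epoch millis (tableCreationTime - R) below which records are already out of retention."""
    return table_creation_time_ms - retention_ms


def should_drop(created_at_ms: int, cutoff: int) -> bool:
    """Local mirror of the filter: True means the record would be skipped at ingestion."""
    return created_at_ms < cutoff


def ingestion_config(cutoff: int, column: str = "created_at") -> dict:
    """Build a table-config fragment; Pinot drops rows where filterFunction returns true."""
    # Assumes `column` stores epoch milliseconds.
    return {
        "ingestionConfig": {
            "filterConfig": {
                "filterFunction": f"Groovy({{{column} < {cutoff}}}, {column})"
            }
        }
    }


if __name__ == "__main__":
    cutoff = cutoff_ms(int(time.time() * 1000))
    print(json.dumps(ingestion_config(cutoff), indent=2))
```

The cutoff is computed once at table creation, so it does not slide forward. After that, segments that age past R are handled by the table's retention settings on the time column; the filter only keeps records that were already too old from being ingested in the first place.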
w
@User Thanks! I gave the filter a try, and it works.