Alon Burg
06/10/2021, 9:18 AM
Pinot: Realtime OLAP for 530 Million Users
it says
At LinkedIn, business events are published in Kafka streams and
are ETL'ed onto HDFS. Pinot supports near-realtime data ingestion by reading events directly from Kafka [19] as well as data
pushes from offline systems like Hadoop. As such, Pinot follows
the lambda architecture [23], transparently merging streaming data
from Kafka and offline data from Hadoop. As data on Hadoop is a
global view of a single hour or day of data as opposed to a direct
stream of events, it allows for the generation of more optimal segments and aggregation of records across the time window.
Is there a general rule of thumb for when I should keep raw events in Pinot vs. aggregated data?
Ken Krugler
06/10/2021, 1:32 PM