# general
s
Hey guys, how should we be handling the scenario where there are multiple Kafka topics that need to be ingested into Pinot and joined to produce the final result? Should there be a pre-aggregate/lookup streaming job that consolidates the data from multiple topics into one topic that Pinot ingests, or should we use Presto to do the joins?
k
Do all topics receive data in the same format?
@Rong R for joins
s
The topics will have different data; for example, “app_install” can be one Kafka topic and “app_open” can be another. These two need to be joined.
m
You likely need a stream-processing (Flink) job upstream for this, unless all you want to do is a dimension lookup, in which case refer to https://docs.pinot.apache.org/users/user-guide-query/lookup-udf-join
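For reference, the lookup UDF enriches a fact table with columns from a dimension table at query time. A minimal sketch in Pinot SQL, assuming a hypothetical `app_install` fact table and an `appMetadata` dimension table keyed on `appId` (none of these names are from this thread):
```sql
-- Enrich each install event with the app's name from the dimension table.
-- LOOKUP args: dimension table, column to fetch, join key, fact-side value.
SELECT
  appId,
  LOOKUP('appMetadata', 'appName', 'appId', appId) AS appName
FROM app_install
LIMIT 10
```
The dimension table has to be flagged as such in its table config and is replicated to every server, so this only works for small, slowly changing tables.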
s
But what if there are lots of such event topics, where any two of them may need to be joined at query time?
m
In a lookup join, the dimension table is static (with periodic refresh). If you are referring to Flink, then that's exactly what it's made for.
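As a sketch of what that upstream job could look like: a Flink SQL job that interval-joins the two topics and writes one consolidated topic for Pinot to ingest. All table names, schemas, and connector settings here are assumptions for illustration:
```sql
-- Hypothetical Flink SQL job: join app_install and app_open within a
-- time window and emit one consolidated Kafka topic for Pinot.
CREATE TABLE app_install (
  app_id  STRING,
  user_id STRING,
  ts      TIMESTAMP(3),
  WATERMARK FOR ts AS ts - INTERVAL '5' SECOND
) WITH (
  'connector' = 'kafka',
  'topic' = 'app_install',
  'properties.bootstrap.servers' = 'localhost:9092',
  'format' = 'json'
);

CREATE TABLE app_open (
  app_id  STRING,
  user_id STRING,
  ts      TIMESTAMP(3),
  WATERMARK FOR ts AS ts - INTERVAL '5' SECOND
) WITH (
  'connector' = 'kafka',
  'topic' = 'app_open',
  'properties.bootstrap.servers' = 'localhost:9092',
  'format' = 'json'
);

CREATE TABLE app_activity (
  app_id     STRING,
  user_id    STRING,
  install_ts TIMESTAMP(3),
  open_ts    TIMESTAMP(3)
) WITH (
  'connector' = 'kafka',
  'topic' = 'app_activity',
  'properties.bootstrap.servers' = 'localhost:9092',
  'format' = 'json'
);

-- Interval join: pair each install with opens by the same user of the
-- same app within 24 hours, which bounds the state Flink has to keep.
INSERT INTO app_activity
SELECT i.app_id, i.user_id, i.ts, o.ts
FROM app_install AS i
JOIN app_open AS o
  ON i.app_id = o.app_id AND i.user_id = o.user_id
WHERE o.ts BETWEEN i.ts AND i.ts + INTERVAL '24' HOUR;
```
Pinot then ingests just the `app_activity` topic, and no join is needed at query time.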