Apache Pinot

Hey guys, I am planning to do a POC on Pinot for real-time analytics over Kafka. I wanted to understand: does it support joins across multiple Kafka topics, and how efficient is it compared to Flink at this time?

Pinot by itself does not handle joining Kafka streams at ingestion time. It does support defining a dimension table and using a `lookup` UDF to do star-schema style dimension lookups. It cannot do standard SQL-style `JOIN` operations between multiple tables, but PrestoDB can use Pinot as a back-end to accomplish full ANSI-SQL joins: <https://eng.uber.com/engineering-sql-support-on-apache-pinot/>
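To make the dimension-lookup idea concrete, here is a rough sketch in plain Python of what a star-schema lookup does conceptually. This is not Pinot code: the table names, columns, and the `lookup` helper are all made up for illustration, and the query string at the end only shows the general shape I'd expect (as far as I know the argument order is dimension table, column to fetch, join key name, join key value, but check that against the Pinot docs).

```python
# Hypothetical dimension table, keyed by its primary key (team_id).
DIM_TEAMS = {
    "t1": {"team_name": "Falcons"},
    "t2": {"team_name": "Herons"},
}

# Hypothetical fact rows, e.g. events that arrived from a Kafka topic.
FACT_EVENTS = [
    {"player": "ann", "team_id": "t1", "score": 3},
    {"player": "bob", "team_id": "t2", "score": 5},
    {"player": "cy", "team_id": "t9", "score": 1},  # no matching dimension row
]


def lookup(dim_table, dim_column, key):
    """Return dim_column from the row with this primary key, or None if absent."""
    row = dim_table.get(key)
    return None if row is None else row[dim_column]


# Enrich each fact row with one attribute from the dimension table.
enriched = [
    {**event, "team_name": lookup(DIM_TEAMS, "team_name", event["team_id"])}
    for event in FACT_EVENTS
]

# Assumed shape of the equivalent Pinot query; all names are illustrative.
PINOT_QUERY = (
    "SELECT player, score, "
    "lookUp('dimTeams', 'team_name', 'team_id', team_id) AS team_name "
    "FROM events"
)

for row in enriched:
    print(row)
```

The point is that each fact row is resolved against a keyed dimension table one key at a time, which is why this covers star-schema enrichment but not a general join between two large tables.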

Also, you probably will be better off asking questions like this in <#CDRCA57FC|>, as this <#CDRJ5UE21|> channel is intended more for non-technical discussion.

If your use case requires Stream -> Stream joins on unbounded inputs (e.g., two infinite Kafka streams), IMO you would be better off doing that work in Spark, Flink, or Kafka Streams. There you have better control over the time window for data retention, as well as the logic for the join and any subsequent transformations or projections of the join result. You can then ingest the output into Pinot for query-time analysis.
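For a feel of what "control over the time window" means, here is a toy sketch of a windowed join in plain Python. It works on two small finite lists rather than real unbounded streams, and the event fields (`key`, `ts`, `value`) and the `windowed_join` helper are assumptions made for illustration. A real engine such as Flink or Kafka Streams would do this incrementally with managed state and would expire old events instead of holding everything in memory.

```python
from collections import defaultdict


def windowed_join(left, right, window_seconds):
    """Join events sharing a key whose timestamps differ by at most window_seconds."""
    # Index the right-hand events by join key for quick matching.
    right_by_key = defaultdict(list)
    for event in right:
        right_by_key[event["key"]].append(event)

    joined = []
    for l_event in left:
        for r_event in right_by_key.get(l_event["key"], []):
            if abs(l_event["ts"] - r_event["ts"]) <= window_seconds:
                joined.append({
                    "key": l_event["key"],
                    # Stamp the joined row with the later of the two event times.
                    "ts": max(l_event["ts"], r_event["ts"]),
                    "left": l_event["value"],
                    "right": r_event["value"],
                })
    return joined


# Hypothetical events from two topics, timestamps in seconds.
orders = [
    {"key": "o1", "ts": 100, "value": "created"},
    {"key": "o2", "ts": 105, "value": "created"},
]
payments = [
    {"key": "o1", "ts": 130, "value": "paid"},
    {"key": "o2", "ts": 500, "value": "paid"},  # falls outside the window
]

joined = windowed_join(orders, payments, window_seconds=60)
for row in joined:
    print(row)
```

Rows like the ones in `joined` are what you would write to an output topic and have Pinot ingest as a single, already-joined table.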

I’m not affiliated with the Pinot project, so take my opinion with a whole handful of salt - people much smarter than I am might have some better solutions for you :slightly_smiling_face:

<@U01LNAKLRG8> another possible solution is to use Samza to process the data before ingesting it into Pinot

Sorry <@UDSRQR2GP>! I like Samza, I swear! I've never used it in a production use case so I always forget to suggest it :(

No issues. At LinkedIn, many use cases use Samza to process data first before ingesting it into Pinot.