Apache Pinot

Hi folks! I'm trying to set up a Pinot cluster in AWS, but I need to decide exactly which pieces make up this cluster so that the sysops at the company I work for can set it up, and I'd like your opinions on this. I'll explain more in this thread.

Basically, we'll start with around 100M rows. I have this data backed up elsewhere as individually compressed objects, where each object will be inserted as one row, and in total we have ~500GB of these compressed objects.

We don't need crazy fast queries; sub-minute is already great for us. So I'm thinking about this layout:

• 1 node for Kafka, gp2 EBS, 100GB
• 1 node for the Pinot Controller, gp2 as well (although it doesn't seem to need to be fast)
• 2 nodes for the Pinot Brokers (if it's possible to have this replication, for availability)
• 3 nodes for the Pinot Servers, gp2 with 1TB each
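For a rough sanity check on the server disks, here is a back-of-envelope sketch using the numbers above. The replication factor and the expansion ratio (Pinot on-disk size relative to the compressed source) are placeholders I made up, not measured values, so treat the output as a starting point rather than a sizing guarantee.

```python
# Back-of-envelope sizing from the numbers in this thread.
ROWS = 100_000_000           # ~100M rows (from the thread)
COMPRESSED_TOTAL_GB = 500    # ~500GB of compressed objects (from the thread)
SERVERS = 3                  # Pinot server nodes (from the thread)
DISK_PER_SERVER_GB = 1000    # 1TB gp2 each, treating 1TB as 1000GB

# Assumptions, NOT from the thread:
REPLICATION = 2              # hypothetical number of copies of each segment
EXPANSION = 1.0              # hypothetical on-disk size / compressed source size


def avg_object_kb(total_gb, rows):
    """Average compressed object size in KB (decimal units)."""
    return total_gb * 1_000_000 / rows


def required_gb(total_gb, replication, expansion):
    """Total disk needed across all servers for every replica."""
    return total_gb * replication * expansion


def headroom_gb(servers, disk_per_server_gb, needed_gb):
    """Disk left over across the cluster after storing all replicas."""
    return servers * disk_per_server_gb - needed_gb


if __name__ == "__main__":
    needed = required_gb(COMPRESSED_TOTAL_GB, REPLICATION, EXPANSION)
    print("avg object size (KB):", avg_object_kb(COMPRESSED_TOTAL_GB, ROWS))
    print("disk needed (GB):", needed)
    print("headroom (GB):", headroom_gb(SERVERS, DISK_PER_SERVER_GB, needed))
```

If the objects expand a lot once ingested, bumping `EXPANSION` shows quickly whether 3 x 1TB is still enough.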

Do you mean that one object of 100M rows becomes 1 row in Pinot? If so, how are you planning to query it?