Hi everyone, I’m looking to use Pinot for a fairly simple analytics use case: distinct counts, funnel analysis, and anomaly detection over user click events from our app.
At our (somewhat large) company we currently ingest 100–200K events/second (~2 TB/day) across 300 different but well-defined schemas. The largest schema has around 40 columns, but the majority have fewer than 20. In this mix, at least 10% of the events arrive late or are duplicates.
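For scale, here is my quick back-of-envelope on those numbers. The retention window, replication factor, and compression ratio below are placeholder assumptions on my part, not measurements:

```python
# Rough sizing sketch for the volumes described above.
events_per_sec = 150_000          # midpoint of the 100-200K events/sec range
raw_tb_per_day = 2.0              # stated raw ingest volume

retention_days = 30               # assumption: 30-day hot retention
replication = 2                   # assumption: 2 replicas per segment
compression_ratio = 0.5           # assumption: columnar storage ~halves raw size

events_per_day = events_per_sec * 86_400
stored_tb = raw_tb_per_day * retention_days * replication * compression_ratio

print(f"~{events_per_day / 1e9:.1f}B events/day")   # ~13.0B events/day
print(f"~{stored_tb:.0f} TB on disk across the cluster")  # ~60 TB
```

So even with generous compression we’d be holding tens of TB on the servers, which is why I’m asking about cluster sizing.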
We currently have more than 500 users querying this data in an exploratory/interactive fashion through our own front-end. With Pinot we hope to achieve sub-minute latency.
Pinot looks like a perfect fit for this use case, since there is no need to join events. My main doubts are: how big does the infrastructure need to be to support this volume, and how hard will it be to operate a deployment at this scale?
I’m planning on deploying with K8s, using S3 as the segment deep store for Pinot. I also don’t need offline servers or any batch ingestion jobs.
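In case it helps clarify the plan, this is roughly the controller config I had in mind for the S3 deep store. The bucket, path, and region are placeholders, and the key names are from my reading of the Pinot S3 deep-store docs, so please correct me if any are off:

```properties
# Deep store location for completed segments (placeholder bucket/path)
controller.data.dir=s3://my-bucket/pinot/segments

# Register the S3 filesystem plugin
pinot.controller.storage.factory.class.s3=org.apache.pinot.plugin.filesystem.S3PinotFS
pinot.controller.storage.factory.s3.region=us-east-1

# Allow the controller to fetch segments over s3 in addition to file/http
pinot.controller.segment.fetcher.protocols=file,http,s3
pinot.controller.segment.fetcher.s3.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher
```

Given the ~10% duplicates I mentioned, I was also planning to enable Pinot’s dedup support on the realtime tables, if that doesn’t change the sizing picture too much.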