# general
l
I am new to Pinot. From what I've read online, people choose Pinot over Druid because it has better perf. https://leventov.medium.com/comparison-of-the-open-source-olap-systems-for-big-data-clickhouse-druid-and-pinot-8e042a5ed1c7 mostly compares ClickHouse vs. Druid/Pinot rather than Druid vs. Pinot. I know the star-tree data structure is unique to Pinot, but I don't have a sense of how much it helps Pinot win the race. Which tool is better in which scenario? Or is one obviously better than the other in general? Could a Pinot guru shed some light? I'd love to hear Pinot's advantages mapped to key design differences. Thanks very much.
k
Hi @User - it would help a lot to include some details of your use case. E.g. batch vs. real-time vs mixed, data volume & velocity, how Pinot will be used (e.g. backend for dashboard, or something else), etc. More context will mean much better answers.
l
Thanks @User. Since Druid and Pinot are not good at joins (my info may be out of date), it might be good to use Snowflake as the full-fledged data warehouse and tolerate longer query response times. I plan to use Pinot/Druid for streaming: the data coming from Kafka is ingested into Pinot/Druid so that people can directly query/visualize the streaming data with sub-second delay. Data volume is around 10K messages per second, and each message is around 2K bytes. Hope this clarifies.
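For a sense of scale, here is a rough back-of-envelope calculation of that ingestion rate. It assumes "2K bytes" means 2,000 bytes per message and ignores replication, compression, and index overhead, so treat it as an order-of-magnitude sketch only.

```python
# Back-of-envelope ingestion estimate for the numbers quoted in the message.
MESSAGES_PER_SECOND = 10_000  # "around 10K messages per second"
BYTES_PER_MESSAGE = 2_000     # assumption: "2K bytes" read as 2,000 bytes
SECONDS_PER_DAY = 86_400


def ingest_rate(messages_per_second: int, bytes_per_message: int) -> dict:
    """Return raw (uncompressed, unreplicated) ingestion volume."""
    bytes_per_second = messages_per_second * bytes_per_message
    return {
        "mb_per_second": bytes_per_second / 1e6,
        "gb_per_day": bytes_per_second * SECONDS_PER_DAY / 1e9,
    }


rate = ingest_rate(MESSAGES_PER_SECOND, BYTES_PER_MESSAGE)
print(rate)
```

That works out to roughly 20 MB/s of raw data, or about 1.7 TB per day before any compression.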
d
Pinot will shine as you ramp up to high QPS. Its scalability model offers better linearity as you reach high ingestion and QPS rates.
k
Typically you’d use Presto on top of Pinot to support joins, or denormalize (flatten) the data to remove the need to join.
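As an illustration of the denormalize option, a minimal sketch: each event is enriched with the dimension attributes it would otherwise be joined against, before it is written to the stream. The `USERS` lookup table, the field names, and the `denormalize` helper are all hypothetical.

```python
# Hypothetical dimension table that a query would otherwise join against.
USERS = {
    "u1": {"country": "US", "plan": "pro"},
    "u2": {"country": "DE", "plan": "free"},
}


def denormalize(event: dict, users: dict) -> dict:
    """Copy the user's attributes onto the event so no join is needed later."""
    user = users.get(event["user_id"], {})
    return {
        **event,
        "user_country": user.get("country"),  # None if the user is unknown
        "user_plan": user.get("plan"),
    }


flat = denormalize({"user_id": "u1", "action": "click"}, USERS)
print(flat)
```

The trade-off is wider rows and stale attributes if the dimension data changes after ingestion, in exchange for queries that touch a single table.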
l
Thanks guys. This is really helpful.
This covers some of the performance difference
In terms of features, Pinot has powerful indexing techniques that help you achieve low latency at high throughput:
• Inverted Index (most systems have this)
• Sorted Index (similar to BTree)
• Range Index
• Text Index
• JSON Index
• Geo Spatial Index
• StarTree Index
🙏 1
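For a feel of how several of these indexes are declared, here is a hedged sketch of the `tableIndexConfig` portion of a Pinot table config, built as a Python dict and printed as JSON. The key names are as I recall them from the Pinot table config docs, so verify them against your Pinot version. Every column name is made up for illustration, and the text and geospatial indexes (configured separately per field) are left out.

```python
import json

# Assumption: all column names below are hypothetical.
table_index_config = {
    "invertedIndexColumns": ["country", "device"],
    "sortedColumn": ["user_id"],
    "rangeIndexColumns": ["latency_ms"],
    "jsonIndexColumns": ["payload"],
    "starTreeIndexConfigs": [
        {
            # Dimensions the star-tree pre-aggregates over, in split order.
            "dimensionsSplitOrder": ["country", "device"],
            "skipStarNodeCreationForDimensions": [],
            # Aggregations to pre-compute, written as FUNCTION__column.
            "functionColumnPairs": ["SUM__latency_ms", "COUNT__*"],
            "maxLeafRecords": 10000,
        }
    ],
}

config_json = json.dumps({"tableIndexConfig": table_index_config}, indent=2)
print(config_json)
```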
l
@User, https://medium.com/confluera-engineering/real-time-security-insights-apache-pinot-at-confluera-a6e5f401ff02 compared Druid and Pinot with pre-defined queries, which covers the automation scenario. Is there any perf comparison for ad-hoc queries that are not pre-defined? Ken's presentation compared Pinot vs. Elasticsearch for such ad-hoc queries, but I am also interested in Pinot vs. Druid in this case. Thanks for any pointers.
k
LinkedIn did that and published the numbers in the Pinot paper.
My suggestion is always to try it with your own data and do the comparison, especially for performance.
🙏 1
k
^^^ what @User said 🙂 It’s far too easy to manipulate results for “benchmarketing”, so I always recommend trying it with real data for your particular use case.
👍 1
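One way to act on the "try it with your own data" advice is a small timing harness that replays the same queries against each system and compares latency percentiles. This is only a sketch: `run_query` is a hypothetical callable that you would wire to your own Pinot or Druid client, and the stand-in `fake_run_query` exists just so the example runs.

```python
import statistics
import time


def measure_latencies(run_query, queries, repeats=5):
    """Time each query `repeats` times; return latencies in milliseconds."""
    latencies_ms = []
    for query in queries:
        for _ in range(repeats):
            start = time.perf_counter()
            run_query(query)
            latencies_ms.append((time.perf_counter() - start) * 1000)
    return latencies_ms


def summarize(latencies_ms):
    """Median, nearest-rank 95th percentile, and max of the samples."""
    ordered = sorted(latencies_ms)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) in integer math
    return {
        "count": len(ordered),
        "median_ms": statistics.median(ordered),
        "p95_ms": ordered[rank - 1],
        "max_ms": ordered[-1],
    }


def fake_run_query(query):
    """Stand-in for a real client call; replace with your own."""
    return len(query)


samples = measure_latencies(fake_run_query, ["SELECT 1", "SELECT 2"], repeats=3)
print(summarize(samples))
```

Running the same query set, on the same data and comparable hardware, against both systems gives numbers that reflect your workload rather than someone else's benchmark.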