Hi team < UDQU92KBK> < UDRJ7G85T> < UGRJA9TEH> Im doing a PO Apache Pinot #general

Bodu Janardhan

04/13/2022, 2:25 PM

Hi team @User @User @User, I'm doing a POC to use Pinot. We have some 70 tables (in an older database), and mainly 10 to 15 of them are currently queried with joins to get aggregations and analytics of the system. Our data is growing at a fast pace, and I wanted to check whether Pinot satisfies our needs. I want to use Presto with Pinot, keep our existing join queries, and minimize new data modelling for Pinot. I found some benchmarks here (https://www.startree.ai/blogs/real-time-analytics-with-presto-and-apache-pinot-part-ii) for Presto + Pinot, but they were only for a single table (name: complexWebsite, with a billion records). Merging/modelling all columns from our tables into a single table (to avoid joins) is very difficult since we have many field dependencies. Do you have any reference links where I can find similar benchmarking with multiple tables (with joins) queried from Presto to Pinot? Can anyone help me with how to proceed? Thanks in advance.

Mayank

04/13/2022, 2:34 PM

The number of tables shouldn’t really impact join performance. @User for any data points

Bodu Janardhan

04/14/2022, 5:49 AM

We would have similar join queries

SELECT SUM(sum_amount) AS sales, MONTH(orders.date) AS month, customers.city FROM orders JOIN customers ON
orders.customer_id = customers.customer_id WHERE customers.state = 'California' AND customers.gender = 'Female' GROUP BY MONTH(orders.date), customers.city

Bodu Janardhan

04/14/2022, 5:54 AM

@User @User I would like to know how indexes would work when the above is queried from Presto against billions of records in Pinot. Please share any data points or resource links that describe performance/benchmarks for such joins.

Mayank

04/14/2022, 2:01 PM

Presto will push down all the predicates to Pinot to evaluate, and Pinot returns the matching data, which Presto can then join on. In that case the performance is much better than Presto-only: what you save is the scanning of data, thanks to the indexes.

Mayank

04/14/2022, 2:02 PM

Also, the connector can push down aggregations as well. As long as a query, or part of a query, can be executed by Pinot, it is pushed down.
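To make the pushdown idea above concrete, here is a minimal Python sketch. It is only a toy model, not the Presto-Pinot connector: the in-memory lists stand in for the `orders` and `customers` tables, the sample rows are made up, and the helper names (`scan`, `join_and_aggregate`, `is_target`) are hypothetical. It shows that filtering at the source returns fewer rows to the join engine while producing the same result.

```python
from collections import defaultdict

# Made-up sample rows standing in for two Pinot tables (assumption).
orders = [
    {"customer_id": 1, "sum_amount": 100.0, "month": 1},
    {"customer_id": 1, "sum_amount": 50.0, "month": 2},
    {"customer_id": 2, "sum_amount": 70.0, "month": 1},
    {"customer_id": 3, "sum_amount": 30.0, "month": 1},
]
customers = [
    {"customer_id": 1, "state": "California", "gender": "Female", "city": "Fresno"},
    {"customer_id": 2, "state": "California", "gender": "Male", "city": "Oakland"},
    {"customer_id": 3, "state": "Texas", "gender": "Female", "city": "Austin"},
]


def scan(table, predicate=None):
    """Hypothetical table scan; a predicate is applied at the source if given."""
    return [row for row in table if predicate is None or predicate(row)]


def join_and_aggregate(order_rows, customer_rows):
    """Join on customer_id and sum sum_amount per (month, city)."""
    city_by_id = {c["customer_id"]: c["city"] for c in customer_rows}
    sales = defaultdict(float)
    for o in order_rows:
        city = city_by_id.get(o["customer_id"])
        if city is not None:
            sales[(o["month"], city)] += o["sum_amount"]
    return dict(sales)


def is_target(c):
    return c["state"] == "California" and c["gender"] == "Female"


# Without pushdown: every customer row is returned, then filtered by the join engine.
all_customers = scan(customers)
no_pushdown = join_and_aggregate(scan(orders), [c for c in all_customers if is_target(c)])

# With pushdown: the source applies the predicate and returns only matching rows.
pushed_customers = scan(customers, is_target)
with_pushdown = join_and_aggregate(scan(orders), pushed_customers)

print("customer rows returned:", len(all_customers), "vs", len(pushed_customers))
print("same result:", with_pushdown == no_pushdown)
```

In a real deployment the saving comes from Pinot's indexes evaluating the predicate without a full scan; the sketch only models the reduction in rows handed to the join.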
