# troubleshooting
y
Hey team, I am seeing a huge variation in performance of the following queries.
```sql
select distinct DATETIMECONVERT(transaction_date, '1:DAYS:EPOCH', '1:DAYS:SIMPLE_DATE_FORMAT:yyyy-MM-dd', '1:DAYS') from transactions limit 1000 -- 80+ seconds
select distinct transaction_date from transactions limit 1000 -- 3.5 seconds
```
Can you help with how to optimize this? In the meantime, we have added another column in the `yyyy-MM-dd` format as a workaround.
m
How many docs does the filter select, and how many distinct values are there? Also, what are your JVM settings?
y
There is no filter; about 22 billion records, and 700 distinct values.
[Screenshot attached: Screenshot 2021-07-21 at 11.19.50 PM.png]
m
Yeah, so I think DateTimeConvert on 22B records might be slow.
Can you try `count(*)` + group by `DATETIMECONVERT(transaction_date, '1:DAYS:EPOCH', '1:DAYS:SIMPLE_DATE_FORMAT:yyyy-MM-dd', '1:DAYS')` to get the distinct values?
It might be slightly faster, but I'm not sure by how much. Also, adding another column will make it super fast.
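Spelled out, the suggested query would look something like this (a sketch only; latency on 22B rows may still be dominated by the per-record transform):

```sql
-- Group by the transformed value instead of running DISTINCT over it
SELECT DATETIMECONVERT(transaction_date, '1:DAYS:EPOCH', '1:DAYS:SIMPLE_DATE_FORMAT:yyyy-MM-dd', '1:DAYS') AS txn_date, COUNT(*)
FROM transactions
GROUP BY DATETIMECONVERT(transaction_date, '1:DAYS:EPOCH', '1:DAYS:SIMPLE_DATE_FORMAT:yyyy-MM-dd', '1:DAYS')
LIMIT 1000
```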
j
I don't think an aggregation group-by can be faster than distinct, though. The cost is mainly in the transform; as we can see, distinct itself is quite fast (the second query).
Ideally we should support applying the transform on the broker side after getting the distinct results, which would avoid the per-record transform.
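To see why transforming after distinct would help: with only ~700 distinct values, the conversion is trivial once it runs per-value instead of per-record. A minimal Python sketch of the idea (the helper name is mine; `'1:DAYS:EPOCH'` means the raw values are days since epoch):

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def epoch_days_to_str(days: int) -> str:
    """Convert a days-since-epoch value to a 'yyyy-MM-dd' string."""
    return (EPOCH + timedelta(days=days)).strftime("%Y-%m-%d")

# Per-value transform: ~700 conversions instead of 22 billion.
distinct_days = [0, 18628, 18829]  # example raw values
formatted = sorted(epoch_days_to_str(d) for d in distinct_days)
print(formatted)  # ['1970-01-01', '2021-01-01', '2021-07-21']
```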
Currently, one workaround is to create a derived column for this transformation and query the derived column directly.
You can create the derived column by adding an ingestion transform and doing a table reload: https://docs.pinot.apache.org/developers/advanced/ingestion-level-transformations
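For reference, the table config fragment might look roughly like this (the column name `transaction_date_str` and the exact transform expression are illustrative; check the docs linked above for the supported transform functions):

```json
{
  "ingestionConfig": {
    "transformConfigs": [
      {
        "columnName": "transaction_date_str",
        "transformFunction": "toDateTime(transaction_date * 86400000, 'yyyy-MM-dd')"
      }
    ]
  }
}
```

The new column would also need to be added to the schema (e.g. as a STRING dimension) before reloading the segments.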