We have an offline Pinot table with 20 columns experiencing high latency or timeout when query it with large offset: e.g. 1000000. some of the columns has large text: e.g. 50000 chars. Is there some thing we could change to help improving the query latency?
m
Mayank
08/05/2021, 1:04 AM
what does the query look like?
x
Xiang Fu
08/05/2021, 1:06 AM
why query with large offset?
c
Carl
08/05/2021, 1:06 AM
Typically Select * from table where timecolumname < somedate order by id limit 20 offset 1300000.
Carl
08/05/2021, 1:10 AM
Is a pagated query from UI support retrieve data from over 2 million records.
x
Xiang Fu
08/05/2021, 1:11 AM
since it’s order by and offset, I think the broker will retrieve 1.3 million rows then give you 20
Xiang Fu
08/05/2021, 1:11 AM
so it’s expected to be long waiting time
c
Carl
08/05/2021, 1:15 AM
Yeah that make sense. But Is there anything we can optimize other than remove the ordering?
k
Kishore G
08/05/2021, 1:37 AM
You want to use something with time
c
Carl
08/05/2021, 1:39 AM
You mean ordering only by the field configured as timecolumnname?
m
Mayank
08/05/2021, 1:59 AM
Perhaps break query down into smaller queries for time ranges
k
Kishore G
08/05/2021, 2:04 AM
sorry, got distracted and forgot to write the full solution
Kishore G
08/05/2021, 2:05 AM
Think of adding an additional where clause for time in the query
Kishore G
08/05/2021, 2:06 AM
Select * from table where time < some date and time > prev max date fetched limit 10000
Kishore G
08/05/2021, 2:07 AM
First time u run the query you can set prev-Max to 1970 or not have that predicate
Kishore G
08/05/2021, 2:08 AM
After you fetch the result for the first 10000, use the max time from the result