ayush sharma
03/12/2021, 7:18 PM
Segment query returned '50001' rows per split, maximum allowed is '50000' rows. with query "SELECT * FROM pinot_table LIMIT 50001"
Presto cannot even query something like this:
presto:default> select count(*) from pinot.default.pinot_table;
Even if we increase the 50k limit (pinot.max-rows-per-split-for-segment-queries in Presto's pinot.properties) to 1 million, the Presto server crashes, stating heap memory exceeded.
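For reference, that limit lives in the Presto catalog file for the Pinot connector; a minimal sketch (the controller address is a placeholder, and the default value shown is an assumption to verify against your deployment):

# etc/catalog/pinot.properties on the Presto cluster
connector.name=pinot
pinot.controller-urls=pinot-controller:9000
# max rows a single segment query may return per split;
# raising it only moves the bottleneck to Presto heap memory
pinot.max-rows-per-split-for-segment-queries=50000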
To work around it, we learned that we can make Pinot do the aggregations and feed the aggregated result to Presto, which in turn feeds Superset to visualize the charts, by writing the aggregation logic inside a Presto sub-query like:
presto:default> select * from pinot.default."select count(*) from pinot_table"
This returns the expected result.
Problem # 3
We found that, though we can make Pinot do the aggregations, we cannot use the supported transformation functions of Pinot listed here inside the Presto sub-query.
The query
select datetrunc('day', epoch_ms_col, 'milliseconds') from pinot_table limit 10
works fine in Pinot, but when embedded as a sub-query in Presto like below, it does not work:
presto:default> select * from pinot.default."select datetrunc('day', epoch_ms_col, 'milliseconds') from pinot_table limit 10";
Query failed: Column datetrunc('day',epoch_ms_col,'milliseconds') not found in table default.select datetrunc('day', epoch_ms_col, 'milliseconds') from pinot_table limit 10
I do not know if we are doing something wrong while querying/implementing, or if we have missed some useful config setting that could solve our problem.
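One workaround sketch for Problem # 3 is to pull the raw column through the connector and do the transform on the Presto side (assuming epoch_ms_col holds epoch milliseconds; date_trunc and from_unixtime are standard Presto functions):

presto:default> select date_trunc('day', from_unixtime(epoch_ms_col / 1000.0)) from pinot.default.pinot_table limit 10;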
The SQL Lab query that we want to run against Pinot (and eventually use the result of to make a chart) is like:
SELECT
day_of_week(epoch_ms_col),
count(*)
from pinot_table
group by day_of_week(epoch_ms_col)
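A sketch of how this might be pushed down entirely to Pinot, assuming the Pinot build ships the dayOfWeek scalar function and that the connector resolves aliased expressions (both assumptions to verify):

presto:default> select * from pinot.default."select dayOfWeek(epoch_ms_col) as dow, count(*) as cnt from pinot_table group by dayOfWeek(epoch_ms_col)";

With aliases, Presto should see plain column names (dow, cnt) rather than the raw function text that triggered the "Column ... not found" error above.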
Any help is really appreciated !!!
Kishore G
Xiang Fu
Elon
03/12/2021, 9:21 PM
Ron Kitay
03/16/2021, 6:29 PM
Pinot to extract a large amount of data - what are the limitations?
e.g., if I want to do something like:
SELECT * from table where creationTime >= x and creationTime < y
And save that output to a file (or files) - e.g. with the spark connector.
What are the limits here? If the result is 2 TB of data, will that be supported?
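A sketch of what I mean with the connector (the table name, time bounds, and output path are placeholders, and the option names should be checked against the pinot-spark-connector version in use):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder().appName("pinot-export").getOrCreate()

// read the Pinot table through the spark connector and apply the time-range filter
val df = spark.read
  .format("pinot")
  .option("table", "myTable")      // placeholder table name
  .option("tableType", "offline")
  .load()
  .filter(col("creationTime") >= 1600000000000L && col("creationTime") < 1700000000000L)  // placeholder bounds

df.write.parquet("/tmp/pinot_export")  // placeholder output path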
Elon
03/16/2021, 6:31 PM
Xiang Fu