Diogo Baeder
04/22/2022, 9:28 PMMayank
Mayank
Xiang Fu
Diogo Baeder
04/22/2022, 9:35 PMselect distinct weight from weights
inner join brands_metrics on weights.pmxid = brands_metrics.pmxid
inner join filters on weights.pmxid = filters.pmxid
where
brand_id = 1000226 and
filters.name = 'gender' and
contains(filters."values", 2) and
weights.date > 20130501 and
weights.date < 20130515 and
brands_metrics.date > 20130501 and
brands_metrics.date < 20130515 and
filters.date > 20130501 and
filters.date < 20130515
and here's some explanation about what's going on:
• I only want data between 2013-05-01 and 2013-05-15
• I want to join data from weights
, brands_metrics
and filters
where they match on the caseid
column (which is common to all the 3 tables)
• I expected Trino to do the querying in the Pinot Broker, but it seems like it first gets all of the segments existing in those tables and does the queries inside each of them, so for example doing each separate query takes only a few ms in Pinot itself, but running the full query in Trino takes 6s which is unacceptableDiogo Baeder
04/22/2022, 9:36 PMMayank
Xiang Fu
Diogo Baeder
04/22/2022, 9:38 PMXiang Fu
Xiang Fu
Xiang Fu
Diogo Baeder
04/22/2022, 9:40 PMDiogo Baeder
04/22/2022, 9:47 PMselect weight, caseid as weights_caseid from weights where "date" > 20130501 and "date" < 20130515 limit 100
if I run it in Pinot, it runs in 5ms, and in Trino it's taking more than 1s. And then looking at the logs, I see this for example:
trino-server-xavier | 2022-04-22T21:44:38.522Z INFO 20220422_214437_00113_jhvv6.1.0.0-105-102 io.trino.plugin.pinot.PinotSegmentPageSource Query 'SELECT "caseid", "weight" FROM weights_OFFLINE WHERE (("date" > '20130501' AND "date" < '20130515')) LIMIT 100' on host 'Optional[Server_172.30.0.5_8098]' for segment splits: [weights_OFFLINE_20131224_20131224_201]
meaning Trino is trying to query the weights_OFFLINE_20131224_20131224_201
segment, which is totally out of the period I want to get data from. Pinot matches 5 segments but Trino tries to run it over all the 209 segments I have.Xiang Fu
Xiang Fu
Xiang Fu
SELECT * FROM pinot.default."select weight, caseid as weights_caseid from weights where \"date\" > 20130501 and \"date\" < 20130515 limit 100"
Diogo Baeder
04/22/2022, 9:50 PMDiogo Baeder
04/22/2022, 9:50 PMXiang Fu
Xiang Fu
Xiang Fu
Diogo Baeder
04/22/2022, 9:52 PMtrino.exceptions.TrinoUserError: TrinoUserError(type=USER_ERROR, name=SYNTAX_ERROR, message="line 1:94: mismatched input '" > 20130501 and "'. Expecting: '(', ',', 'CROSS', 'EXCEPT', 'FETCH', 'FULL', 'GROUP', 'HAVING', 'INNER', 'INTERSECT', 'JOIN', 'LEFT', 'LIMIT', 'MATCH_RECOGNIZE', 'NATURAL', 'OFFSET', 'ORDER', 'RIGHT', 'TABLESAMPLE', 'UNION', 'WHERE', 'WINDOW', <EOF>", query_id=20220422_215155_00117_jhvv6)
so yeah, I guess Trino or Presto are a no-go for me. 😕Xiang Fu
Xiang Fu
Xiang Fu
Diogo Baeder
04/22/2022, 9:53 PMdate
Xiang Fu
SELECT * FROM pinot.default."select weight, caseid as weights_caseid from weights where date > 20130501 and date < 20130515 limit 100"
Xiang Fu
Xiang Fu
Diogo Baeder
04/22/2022, 9:55 PMtrino.exceptions.TrinoQueryError: TrinoQueryError(type=INTERNAL_ERROR, name=GENERIC_INTERNAL_ERROR, message="Caught exception while parsing query: select weight, caseid as weights_caseid from weights where date > 20130501 and date < 20130515 limit 100", query_id=20220422_215444_00133_jhvv6)
probably because Pinot itself doesn't accept just date
(since it's reserved)Diogo Baeder
04/22/2022, 9:55 PMXiang Fu
Diogo Baeder
04/22/2022, 9:56 PMXiang Fu
Xiang Fu
Xiang Fu
select distinct weight from (SELECT * FROM pinot.default."SELECT weight, pmxid FROM weights WHERE date > 20130501 and date < 20130515 and brand_id = 1000226") weights
inner join (SELECT * FROM pinot.default."SELECT weight, pmxid FROM weights WHERE date > 20130501 and date < 20130515 and brand_id = 1000226") brands_metrics on weights.pmxid = brands_metrics.pmxid
inner join (SELECT * FROM pinot.default."SELECT weight, pmxid FROM weights WHERE date > 20130501 and date < 20130515 and name = 'gender' and contains(filters."values", 2) ") filters on weights.pmxid = filters.pmxid
Xiang Fu
Diogo Baeder
04/22/2022, 9:57 PMXiang Fu
Diogo Baeder
04/22/2022, 9:58 PMXiang Fu
Xiang Fu
Diogo Baeder
04/22/2022, 10:08 PMdate_
and values_
. I'm too lazy 😄Diogo Baeder
04/23/2022, 12:55 AMDiogo Baeder
04/23/2022, 12:57 AMDiogo Baeder
04/23/2022, 1:00 AMselect distinct weight from
pinot.default."select weight, caseid as weights_caseid from weights where date_ > 20150501 and date_ < 20150515" as weights
inner join pinot.default."select brand_id, caseid as brands_metrics_caseid from brands_metrics where brand_id = 1000226 and date_ > 20150501 and date_ < 20150515 limit 100" as brands_metrics
on weights_caseid = brands_metrics_caseid
inner join pinot.default."select name, caseid as filters_caseid from filters where name = 'gender' and values_ = 2 and date_ > 20150501 and date_ < 20150515" as filters
on weights_caseid = filters_caseid
Mayank
Elon
04/24/2022, 11:02 AMElon
04/24/2022, 11:03 AMDiogo Baeder
04/24/2022, 12:41 PM