https://pinot.apache.org/ logo
#troubleshooting
Title
# troubleshooting
a

Aaron Weiss

03/11/2022, 3:59 PM
Hey, just started playing with Trino as I have a need to do subqueries / joins on Pinot tables. I'm having trouble with array fields (singleValueField: false) in Pinot when querying through Trino. From reading through the connector documentation, it seems to support arrays. Here is my Pinot query that works (service is String array field):
Copy code
select service, count(*) from immutable_unified_events group by service limit 10
I've tried this query in Trino using basic and passthrough syntax, but get the same error either way.
Copy code
class java.lang.String cannot be cast to class java.util.List
Trino standard query:
Copy code
select service, count(*) from pinot.default.immutable_unified_events group by service limit 10;
Trino passthrough query:
Copy code
select * from pinot.default."select service, count(*) from immutable_unified_events group by service limit 10";
e

Elon

03/11/2022, 4:44 PM
Hi @Aaron Weiss - we have a fix for this, I will mention the pr for it here shortly.
a

Aaron Weiss

03/11/2022, 5:20 PM
thanks @Elon, so once that PR gets merged, it would be live, or that would be a version update?
e

Elon

03/11/2022, 5:21 PM
You can rebase from master once it's landed - we usually wait until releases to rebase, but you can try it. I'll update this thread so you can see the fix.
a

Aaron Weiss

03/11/2022, 9:25 PM
thanks a lot, appreciate it!
e

Elon

03/14/2022, 9:47 PM
@Aaron Weiss - just to get some more context, what are the datatypes for the columns in the query? Also, which ones are arrays?
There might be a workaround you can do now...
oh, just saw your first comment:) What data type is
immutable_unified_events
?
Can you paste the schema, atleast for the `service`field, i.e. the FieldSpec json?
Fyi, this may be an issue with pinot - I ran a similar query in pinot and it appears to select the 2nd element of the array as the grouping column, instead of selecting the entire array.
cc @Mayank @Jackie @Xiang Fu lmk - is the finding correct? If so I can file a github issue, lmk
@Aaron Weiss - does the result of the query in pinot set the grouping column (service) to the entire array or just a single value from the array?
I'm seeing just one value from the array, the 2nd value if present or
null
if not.
Testing w pinot 0.9.3 ^^
j

Jackie

03/14/2022, 10:15 PM
@Elon I don't follow the issue here. What unexpected behavior do you find in pinot?
e

Elon

03/14/2022, 10:19 PM
ex. let's say you have an array column, string for example:
string_array_col
with the following values
Copy code
['hi', 'hello', 'how are you'],
[null],
['good day'],
['bye', 'good bye', 'take care']
If you do
Copy code
select string_array_col, count(*) from table group by string_array_col
The result will be:
Copy code
'hello', 1
null, 2
'good bye', 1
Is that expected i.e. is it documented and I just didn't know? or should it return the entire array for each grouping set, like:
Copy code
['hi', 'hello', 'how are you'], 1
[null], 1
['good day'], 1
['bye', 'good bye', 'take care'], 1
I would have expected this ^^^ but got the former.
lmk - also would be happy to help work on a fix, if you can verify that it is in fact an issue
j

Jackie

03/15/2022, 12:17 AM
This is expected MV column behavior, where the grouping is applied on each individual value of the MV column
e

Elon

03/15/2022, 12:17 AM
aha! Thanks:)
Question: since trino expects the grouping to be by array column should grouping by array columns either disable pushdown or throw an error? Thinking the first might make sense - i.e. do not push grouping by arrays into pinot as the result would be different. wdyt?
j

Jackie

03/15/2022, 12:19 AM
To get the array grouping semantic, we need to pre-join the values into a single value and do the grouping
Not pushing down is also fine, but that will end up scanning the whole table I imagine
e

Elon

03/15/2022, 12:20 AM
Yep
It seems like pre-joining would have to be done inside pinot, right? i.e. there is no way to tell which grouping set the value came from, ex. if the same value is in multiple arrays/array positions.
j

Jackie

03/15/2022, 12:21 AM
To push down the query, we need to add a transform function to join MV values into a single value
e

Elon

03/15/2022, 12:21 AM
nice, is that something that would be a welcome contribution? Worth filing a github issue?
j

Jackie

03/15/2022, 12:22 AM
Sure, thanks!
a

Aaron Weiss

03/15/2022, 1:38 PM
Hey @Elon, sorry I didn't respond earlier, but I think yall got it straight. An MV field does get broken up when you group by, which is actually the preferred behavior in my case, but I could also see where not breaking it up could be useful in some cases. So I like @Jackie’s idea of having the option of grouping by the entire array through a function. As for the Trino errors, I tried grouping by an MV string column only and then an MV int column only and got the same error for both on the Trino side.
πŸ’‘ 1
e

Elon

03/15/2022, 4:57 PM
Yep, agreed. Sounds good!
Created an issue to describe the feature request, let me know if it makes sense: https://github.com/apache/pinot/issues/8353
a

Aaron Weiss

03/15/2022, 7:17 PM
looks good to me
πŸ‘ 1
hey, you said you had a PR link for pending fix for the array errors through Trino?
e

Elon

03/15/2022, 7:19 PM
Yes, it's being worked on - I will add your query as a unit test πŸ™‚ The fix is to disable pushdown when the grouping column is an array. I'll update it shortly and share the link.
a

Aaron Weiss

03/15/2022, 7:32 PM
thanks, just so I understand, you would still end up with the split array resultset? What implication does disabling pushdown have on the performance?
e

Elon

03/15/2022, 7:33 PM
It would end up with the entire array as the grouping column - this matches the behavior of trino and other query engines, where
select col, ... from table group by col
will not modify or explode the column value.
lmk if that makes sense or if you have any feedback/questions.
a

Aaron Weiss

03/15/2022, 7:34 PM
hmmm, I'd actually prefer the split array resultset, but we're saying that's just not possible through trino?
I mean not throwing an error is better than current state of course πŸ™‚
e

Elon

03/15/2022, 7:34 PM
It is absolutely:) You just need to do it in a passthrough query. But if you did the above query it would return the result that trino expects.
There would be no error thrown with the fix, here is an example where you can get the behavior you want:
Copy code
select * from pinot.default."select array_col, max(...), count(*), agg_function(...) from table group by array_col"
will return the exploded array
but doing
Copy code
select array_col, count(*), max(...) from table group by array_col
a

Aaron Weiss

03/15/2022, 7:36 PM
ah ok, yeah that seems like best of both worlds, nice
e

Elon

03/15/2022, 7:36 PM
will return the result with unmodified mv column, which matches the behavior if the table was in any other type of connector (postgres, hive, iceberg, etc.)
I will add support for that and update.
a

Aaron Weiss

03/15/2022, 7:37 PM
got it, yeah my initial example where service is a string MV column would work now
Copy code
select * from pinot.default."select service, count(*) from immutable_unified_events group by service limit 10";
er after the PR
thanks again, really appreciate it, this is a big use case for us
e

Elon

03/15/2022, 7:37 PM
sure! Great that you're using pinot and trino πŸ™‚
a

Aaron Weiss

03/15/2022, 7:38 PM
yeah it's really been fun to work with so far, and the responsiveness of the community has been great
πŸ™Œ 1
e

Elon

03/15/2022, 11:08 PM
@Aaron Weiss - just added support both the best of both worlds:) Regular queries return the full array, passthrough queries explode the array so you still get the benefit of that feature in pinot
Just adding some tests and will share the link
a

Aaron Weiss

03/15/2022, 11:08 PM
Awesome! Thanks sir
πŸ‘ 1
e

Elon

03/16/2022, 2:18 AM
Here is the pr with the fixes for array support: https://github.com/trinodb/trino/pull/9781
and here is a pr which contains this, mixed case tablename support, time boundary handling and pinot 0.9.3 libraries:
These are all still under review but you can try them.
a

Aaron Weiss

03/16/2022, 2:13 PM
hey @Elon, we just noticed that filters on MV columns also fail in trino. Does this PR fix that as well? browser is an MV string column
Copy code
pinot> SELECT COUNT(1) FROM pinot.default.immutable_unified_events WHERE browser = 'Chrome'
[2022-03-16 10:10:43] [58] Query failed (#20220315_220247_00048_mxd5n): line 1:75: Cannot apply operator: array(varchar) = varchar(6)
[2022-03-16 10:10:43] io.trino.spi.TrinoException: Unexpected parameters (array(varchar), varchar(6)) for function $operator$equal. Expected: $operator$equal(T, T) T:comparable
Tried the same with flexible syntax and while it doesn't error, it just returns NULL if filtering on MV col. If I use SV col, everything works as expected using either syntax.
Copy code
SELECT * FROM pinot.default."SELECT COUNT(1) FROM immutable_unified_events WHERE browser = 'Chrome'"
e

Elon

03/16/2022, 3:05 PM
Is
browser
an MV column and
immutable_unified_events
an SV column?
a

Aaron Weiss

03/16/2022, 3:07 PM
immutable_unified_events is the table
if I try an SV column, filters work
and yes browser is MV column
πŸ‘ 1
e

Elon

03/16/2022, 3:08 PM
What is the corresponding pinot query which works?
a

Aaron Weiss

03/16/2022, 3:08 PM
the same query works in Pinot
Copy code
SELECT COUNT(1) FROM immutable_unified_events WHERE browser = 'Chrome'
e

Elon

03/16/2022, 3:11 PM
Ok, I'll test it and update the pr, thanks for catching this!
a

Aaron Weiss

03/16/2022, 3:11 PM
thank you!
πŸ‘ 1
k

Kamal Chavda

04/29/2023, 2:54 AM
@alon I found this thread while looking for a solution to the same error Aaron shared above (from a year ago) where I'm trying to filter on a Pinot MV column using Trino (414).
Copy code
trino error: line 1:50: Cannot apply operator: array(bigint) = integer
Is query passthrough still the only option? @Mayank, @Xiang Fu. Looks like MV column support was added to prestodb? https://github.com/prestodb/presto/pull/15093
e

Elon

04/29/2023, 2:55 AM
Can you paste the query?
k

Kamal Chavda

04/29/2023, 2:57 AM
Copy code
select * from events where ids = 29536865 limit 10
ids
column is MV in the
events
table in Pinot.
x

Xiang Fu

04/29/2023, 6:39 AM
when you describe the table, what do you see
k

Kamal Chavda

05/01/2023, 2:30 PM
It lists out the columns, type, extra, comment.
x

Xiang Fu

05/01/2023, 4:45 PM
For the data type, it gives you an array of type or just the data type?
e

Elon

05/01/2023, 4:51 PM
What is the type? Iirc we now have array support. Which version of Pinot and Trino do you run?
k

Kamal Chavda

05/01/2023, 5:15 PM
for the column that I'm using in the query above, the type =
array(bigint)
. I tried similar query on
array(varchar)
type column as well and got a failure. I'm running Trino 414 in K8s and Pinot is latest Startree version.
e

Elon

05/01/2023, 5:26 PM
What version for startree? I definitely see support for array types including unit/integration tests which pass (also used it in prod) - support is for Pinot 11.0 and newer. The error message should not be what it was above iirc, can you paste the error you get and I’ll try to repro?
k

Kamal Chavda

05/01/2023, 6:56 PM
So when I try to query Pinot via Trino in Superset I get the error below:
Copy code
Trino Error
trino error: line 1:64: Cannot apply operator: array(bigint) = integer


This may be triggered by:
Issue 1002 - The database returned an unexpected error. 

See less
From Redash:
Copy code
Error running query: TrinoUserError(type=USER_ERROR, name=TYPE_MISMATCH, message="line 1:65: Cannot apply operator: array(bigint) = integer", query_id=20230501_185514_00374_xk5ym)
DBeaver:
Copy code
SQL Error [58]: Query failed (#20230501_171816_00370_xk5ym): line 2:65: Cannot apply operator: array(bigint) = integer
Running,
0.13.0-ST.39
Startree
e

Elon

05/01/2023, 7:10 PM
This doesn’t look like it comes from the Pinot connector - either that or for some reason it thinks the column is a SV integer. Can you paste the output of describe table? Feel free to redact any column names, just need to see the types
k

Kamal Chavda

05/01/2023, 7:29 PM
Untitled spreadsheet - Sheet1.csv
e

Elon

05/01/2023, 7:31 PM
And which field corresponds with the one that you had an issue with?
Also, can you try in trino cli? Want to see if the issue is with superset or trino
Ah I think this may be the issue: in pinot doing array type = value has different semantics, it will explode the array. In trino you can do contains(ids, <value>)
You can also use a dynamic table (query on quotes) - soon you will be able to use table functions and paste the native query in.
For now the contains trino Udf will work: root cause is the difference in the sql parsers and how they interpret arrays. Someone from star tree can add more context: @Xiang Fu) @Jackie @Mayank
k

Kamal Chavda

05/01/2023, 8:13 PM
Ah! facepalm , I was under the impression that Trino supported the same semantics via the pinot connector. Thank you for looking into this and your time! πŸ™
e

Elon

05/01/2023, 8:13 PM
Sure, any time!
x

Xiang Fu

05/02/2023, 4:09 AM
Thanks @Elon !
4 Views