Hey, just started playing with Trino as I have a n...
# troubleshooting
a
Hey, just started playing with Trino as I have a need to do subqueries / joins on Pinot tables. I'm having trouble with array fields (singleValueField: false) in Pinot when querying through Trino. From reading through the connector documentation, it seems to support arrays. Here is my Pinot query that works (service is String array field):
Copy code
select service, count(*) from immutable_unified_events group by service limit 10
I've tried this query in Trino using basic and passthrough syntax, but get the same error either way.
Copy code
class java.lang.String cannot be cast to class java.util.List
Trino standard query:
Copy code
select service, count(*) from pinot.default.immutable_unified_events group by service limit 10;
Trino passthrough query:
Copy code
select * from pinot.default."select service, count(*) from immutable_unified_events group by service limit 10";
e
Hi @Aaron Weiss - we have a fix for this, I will mention the pr for it here shortly.
a
thanks @Elon, so once that PR gets merged, it would be live, or that would be a version update?
e
You can rebase from master once it's landed - we usually wait until releases to rebase, but you can try it. I'll update this thread so you can see the fix.
a
thanks a lot, appreciate it!
e
@Aaron Weiss - just to get some more context, what are the datatypes for the columns in the query? Also, which ones are arrays?
There might be a workaround you can do now...
oh, just saw your first comment:) What data type is
immutable_unified_events
?
Can you paste the schema, atleast for the `service`field, i.e. the FieldSpec json?
Fyi, this may be an issue with pinot - I ran a similar query in pinot and it appears to select the 2nd element of the array as the grouping column, instead of selecting the entire array.
cc @Mayank @Jackie @Xiang Fu lmk - is the finding correct? If so I can file a github issue, lmk
@Aaron Weiss - does the result of the query in pinot set the grouping column (service) to the entire array or just a single value from the array?
I'm seeing just one value from the array, the 2nd value if present or
null
if not.
Testing w pinot 0.9.3 ^^
j
@Elon I don't follow the issue here. What unexpected behavior do you find in pinot?
e
ex. let's say you have an array column, string for example:
string_array_col
with the following values
Copy code
['hi', 'hello', 'how are you'],
[null],
['good day'],
['bye', 'good bye', 'take care']
If you do
Copy code
select string_array_col, count(*) from table group by string_array_col
The result will be:
Copy code
'hello', 1
null, 2
'good bye', 1
Is that expected i.e. is it documented and I just didn't know? or should it return the entire array for each grouping set, like:
Copy code
['hi', 'hello', 'how are you'], 1
[null], 1
['good day'], 1
['bye', 'good bye', 'take care'], 1
I would have expected this ^^^ but got the former.
lmk - also would be happy to help work on a fix, if you can verify that it is in fact an issue
j
This is expected MV column behavior, where the grouping is applied on each individual value of the MV column
e
aha! Thanks:)
Question: since trino expects the grouping to be by array column should grouping by array columns either disable pushdown or throw an error? Thinking the first might make sense - i.e. do not push grouping by arrays into pinot as the result would be different. wdyt?
j
To get the array grouping semantic, we need to pre-join the values into a single value and do the grouping
Not pushing down is also fine, but that will end up scanning the whole table I imagine
e
Yep
It seems like pre-joining would have to be done inside pinot, right? i.e. there is no way to tell which grouping set the value came from, ex. if the same value is in multiple arrays/array positions.
j
To push down the query, we need to add a transform function to join MV values into a single value
e
nice, is that something that would be a welcome contribution? Worth filing a github issue?
j
Sure, thanks!
a
Hey @Elon, sorry I didn't respond earlier, but I think yall got it straight. An MV field does get broken up when you group by, which is actually the preferred behavior in my case, but I could also see where not breaking it up could be useful in some cases. So I like @Jackieโ€™s idea of having the option of grouping by the entire array through a function. As for the Trino errors, I tried grouping by an MV string column only and then an MV int column only and got the same error for both on the Trino side.
๐Ÿ’ก 1
e
Yep, agreed. Sounds good!
Created an issue to describe the feature request, let me know if it makes sense: https://github.com/apache/pinot/issues/8353
a
looks good to me
๐Ÿ‘ 1
hey, you said you had a PR link for pending fix for the array errors through Trino?
e
Yes, it's being worked on - I will add your query as a unit test ๐Ÿ™‚ The fix is to disable pushdown when the grouping column is an array. I'll update it shortly and share the link.
a
thanks, just so I understand, you would still end up with the split array resultset? What implication does disabling pushdown have on the performance?
e
It would end up with the entire array as the grouping column - this matches the behavior of trino and other query engines, where
select col, ... from table group by col
will not modify or explode the column value.
lmk if that makes sense or if you have any feedback/questions.
a
hmmm, I'd actually prefer the split array resultset, but we're saying that's just not possible through trino?
I mean not throwing an error is better than current state of course ๐Ÿ™‚
e
It is absolutely:) You just need to do it in a passthrough query. But if you did the above query it would return the result that trino expects.
There would be no error thrown with the fix, here is an example where you can get the behavior you want:
Copy code
select * from pinot.default."select array_col, max(...), count(*), agg_function(...) from table group by array_col"
will return the exploded array
but doing
Copy code
select array_col, count(*), max(...) from table group by array_col
a
ah ok, yeah that seems like best of both worlds, nice
e
will return the result with unmodified mv column, which matches the behavior if the table was in any other type of connector (postgres, hive, iceberg, etc.)
I will add support for that and update.
a
got it, yeah my initial example where service is a string MV column would work now
Copy code
select * from pinot.default."select service, count(*) from immutable_unified_events group by service limit 10";
er after the PR
thanks again, really appreciate it, this is a big use case for us
e
sure! Great that you're using pinot and trino ๐Ÿ™‚
a
yeah it's really been fun to work with so far, and the responsiveness of the community has been great
๐Ÿ™Œ 1
e
@Aaron Weiss - just added support both the best of both worlds:) Regular queries return the full array, passthrough queries explode the array so you still get the benefit of that feature in pinot
Just adding some tests and will share the link
a
Awesome! Thanks sir
๐Ÿ‘ 1
e
Here is the pr with the fixes for array support: https://github.com/trinodb/trino/pull/9781
and here is a pr which contains this, mixed case tablename support, time boundary handling and pinot 0.9.3 libraries:
These are all still under review but you can try them.
a
hey @Elon, we just noticed that filters on MV columns also fail in trino. Does this PR fix that as well? browser is an MV string column
Copy code
pinot> SELECT COUNT(1) FROM pinot.default.immutable_unified_events WHERE browser = 'Chrome'
[2022-03-16 10:10:43] [58] Query failed (#20220315_220247_00048_mxd5n): line 1:75: Cannot apply operator: array(varchar) = varchar(6)
[2022-03-16 10:10:43] io.trino.spi.TrinoException: Unexpected parameters (array(varchar), varchar(6)) for function $operator$equal. Expected: $operator$equal(T, T) T:comparable
Tried the same with flexible syntax and while it doesn't error, it just returns NULL if filtering on MV col. If I use SV col, everything works as expected using either syntax.
Copy code
SELECT * FROM pinot.default."SELECT COUNT(1) FROM immutable_unified_events WHERE browser = 'Chrome'"
e
Is
browser
an MV column and
immutable_unified_events
an SV column?
a
immutable_unified_events is the table
if I try an SV column, filters work
and yes browser is MV column
๐Ÿ‘ 1
e
What is the corresponding pinot query which works?
a
the same query works in Pinot
Copy code
SELECT COUNT(1) FROM immutable_unified_events WHERE browser = 'Chrome'
e
Ok, I'll test it and update the pr, thanks for catching this!
a
thank you!
๐Ÿ‘ 1
k
@alon I found this thread while looking for a solution to the same error Aaron shared above (from a year ago) where I'm trying to filter on a Pinot MV column using Trino (414).
Copy code
trino error: line 1:50: Cannot apply operator: array(bigint) = integer
Is query passthrough still the only option? @Mayank, @Xiang Fu. Looks like MV column support was added to prestodb? https://github.com/prestodb/presto/pull/15093
e
Can you paste the query?
k
Copy code
select * from events where ids = 29536865 limit 10
ids
column is MV in the
events
table in Pinot.
x
when you describe the table, what do you see
k
It lists out the columns, type, extra, comment.
x
For the data type, it gives you an array of type or just the data type?
e
What is the type? Iirc we now have array support. Which version of Pinot and Trino do you run?
k
for the column that I'm using in the query above, the type =
array(bigint)
. I tried similar query on
array(varchar)
type column as well and got a failure. I'm running Trino 414 in K8s and Pinot is latest Startree version.
e
What version for startree? I definitely see support for array types including unit/integration tests which pass (also used it in prod) - support is for Pinot 11.0 and newer. The error message should not be what it was above iirc, can you paste the error you get and Iโ€™ll try to repro?
k
So when I try to query Pinot via Trino in Superset I get the error below:
Copy code
Trino Error
trino error: line 1:64: Cannot apply operator: array(bigint) = integer


This may be triggered by:
Issue 1002 - The database returned an unexpected error. 

See less
From Redash:
Copy code
Error running query: TrinoUserError(type=USER_ERROR, name=TYPE_MISMATCH, message="line 1:65: Cannot apply operator: array(bigint) = integer", query_id=20230501_185514_00374_xk5ym)
DBeaver:
Copy code
SQL Error [58]: Query failed (#20230501_171816_00370_xk5ym): line 2:65: Cannot apply operator: array(bigint) = integer
Running,
0.13.0-ST.39
Startree
e
This doesnโ€™t look like it comes from the Pinot connector - either that or for some reason it thinks the column is a SV integer. Can you paste the output of describe table? Feel free to redact any column names, just need to see the types
k
Untitled spreadsheet - Sheet1.csv
e
And which field corresponds with the one that you had an issue with?
Also, can you try in trino cli? Want to see if the issue is with superset or trino
Ah I think this may be the issue: in pinot doing array type = value has different semantics, it will explode the array. In trino you can do contains(ids, <value>)
You can also use a dynamic table (query on quotes) - soon you will be able to use table functions and paste the native query in.
For now the contains trino Udf will work: root cause is the difference in the sql parsers and how they interpret arrays. Someone from star tree can add more context: @Xiang Fu) @Jackie @Mayank
k
Ah! facepalm , I was under the impression that Trino supported the same semantics via the pinot connector. Thank you for looking into this and your time! ๐Ÿ™
e
Sure, any time!
x
Thanks @Elon !