# troubleshooting
t
Not sure if anyone is familiar with the Pinot Trino connector, but I'm having a problem querying my Pinot table through Trino after making it hybrid (instead of just realtime). The issue started after ingesting offline segments into the table, i.e. going from a realtime-only table to a hybrid realtime + offline table. I am able to query Pinot directly; however, as soon as segments become available in the offline table, the Trino Pinot connector starts raising SQL exceptions. Example: Pinot passthrough works with Trino:
> SELECT * from pinot.default."SELECT * FROM uplinkpayloadevent WHERE time_string < '2022-01-01T00:00:00' ORDER BY time_string DESC LIMIT 300"
[2022-10-18 13:45:13] 300 rows retrieved starting from 1 in 1 s 339 ms (execution: 1 s 138 ms, fetching: 201 ms)
However, using trino itself fails:
> SELECT * from pinot.default.uplinkpayloadevent WHERE time_string < '2022-01-01T00:00:00' ORDER BY time_string DESC
[2022-10-18 13:46:13] 0 rows retrieved in 398 ms (execution: 333 ms, fetching: 65 ms)
[2022-10-18 13:46:13] [65536] Query failed (#20221018_174613_00013_q2hjp): Caught exception while parsing query: SELECT "app_tok", "gatewayaddress", "message_str", "key_hash", "net_tok", "acctid", "id", "moduleaddress", "time_string", "key_range" FROM uplinkpayloadevent_REALTIME  WHERE time_string >= 2022-10-18T09:12:39.768 AND (("time_string" < '2022-01-01T00:00:00')) LIMIT 2147483647
[2022-10-18 13:46:13] org.apache.calcite.sql.parser.babel.ParseException: Encountered "T09" at line 1, column 200.
... stack trace continues....
Full Stack Trace:
org.apache.pinot.sql.parsers.SqlCompilationException: Caught exception while parsing query: SELECT "app_tok", "gatewayaddress", "message_str", "key_hash", "net_tok", "acctid", "id", "moduleaddress", "time_string", "key_range" FROM uplinkpayloadevent_REALTIME  WHERE time_string >= 2022-10-18T09:12:39.768 AND (("time_string" < '2022-01-01T00:00:00')) LIMIT 2147483647
	at org.apache.pinot.sql.parsers.CalciteSqlParser.compileCalciteSqlToPinotQuery(CalciteSqlParser.java:334)
	at org.apache.pinot.sql.parsers.CalciteSqlParser.compileToPinotQuery(CalciteSqlParser.java:133)
	at org.apache.pinot.sql.parsers.CalciteSqlCompiler.compileToBrokerRequest(CalciteSqlCompiler.java:35)
	at io.trino.plugin.pinot.client.PinotGrpcDataFetcher$PinotGrpcServerQueryClient.queryPinot(PinotGrpcDataFetcher.java:244)
	at io.trino.plugin.pinot.client.PinotGrpcDataFetcher.fetchData(PinotGrpcDataFetcher.java:99)
	at io.trino.plugin.pinot.PinotSegmentPageSource.getNextPage(PinotSegmentPageSource.java:114)
	at io.trino.operator.TableScanOperator.getOutput(TableScanOperator.java:311)
	at io.trino.operator.Driver.processInternal(Driver.java:411)
	at io.trino.operator.Driver.lambda$process$10(Driver.java:314)
	at io.trino.operator.Driver.tryWithLock(Driver.java:706)
	at io.trino.operator.Driver.process(Driver.java:306)
	at io.trino.operator.Driver.processForDuration(Driver.java:277)
	at io.trino.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:736)
	at io.trino.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:164)
	at io.trino.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:515)
	at io.trino.$gen.Trino_399____20221018_171212_2.run(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: org.apache.calcite.sql.parser.SqlParseException: Encountered "T09" at line 1, column 200.
Was expecting one of:
    <EOF> 
    "ORDER" ...
    "LIMIT" ...
    "OFFSET" ...
    "FETCH" ...
    "GROUP" ...
    "HAVING" ...
    "WINDOW" ...
    "UNION" ...
    "INTERSECT" ...
    "EXCEPT" ...
    "MINUS" ...
    "." ...
    "NOT" ...
    "IN" ...
    "<" ...
    "<=" ...
    ">" ...
    ">=" ...
    "=" ...
    "<>" ...
    "!=" ...
    "BETWEEN" ...
    "LIKE" ...
    "ILIKE" ...
    "RLIKE" ...
    "SIMILAR" ...
    "!" ...
    "~" ...
    "::" ...
    "+" ...
    "-" ...
    "*" ...
    "/" ...
    "%" ...
    "||" ...
    "AND" ...
    "OR" ...
    "IS" ...
    "MEMBER" ...
    "SUBMULTISET" ...
    "CONTAINS" ...
    "OVERLAPS" ...
    "EQUALS" ...
    "PRECEDES" ...
    "SUCCEEDS" ...
    "IMMEDIATELY" ...
    "MULTISET" ...
    "[" ...
    "FORMAT" ...
    
	at org.apache.calcite.sql.parser.babel.SqlBabelParserImpl.convertException(SqlBabelParserImpl.java:393)
	at org.apache.calcite.sql.parser.babel.SqlBabelParserImpl.normalizeException(SqlBabelParserImpl.java:157)
	at org.apache.calcite.sql.parser.SqlParser.handleException(SqlParser.java:145)
	at org.apache.calcite.sql.parser.SqlParser.parseQuery(SqlParser.java:160)
	at org.apache.pinot.sql.parsers.CalciteSqlParser.compileCalciteSqlToPinotQuery(CalciteSqlParser.java:332)
	... 18 more
Caused by: org.apache.calcite.sql.parser.babel.ParseException: Encountered "T09" at line 1, column 200.
Was expecting one of:
    <EOF> 
    "ORDER" ...
    "LIMIT" ...
    "OFFSET" ...
    "FETCH" ...
    "GROUP" ...
    "HAVING" ...
    "WINDOW" ...
    "UNION" ...
    "INTERSECT" ...
    "EXCEPT" ...
    "MINUS" ...
    "." ...
    "NOT" ...
    "IN" ...
    "<" ...
    "<=" ...
    ">" ...
    ">=" ...
    "=" ...
    "<>" ...
    "!=" ...
    "BETWEEN" ...
    "LIKE" ...
    "ILIKE" ...
    "RLIKE" ...
    "SIMILAR" ...
    "!" ...
    "~" ...
    "::" ...
    "+" ...
    "-" ...
    "*" ...
    "/" ...
    "%" ...
    "||" ...
    "AND" ...
    "OR" ...
    "IS" ...
    "MEMBER" ...
    "SUBMULTISET" ...
    "CONTAINS" ...
    "OVERLAPS" ...
    "EQUALS" ...
    "PRECEDES" ...
    "SUCCEEDS" ...
    "IMMEDIATELY" ...
    "MULTISET" ...
    "[" ...
    "FORMAT" ...
    
	at org.apache.calcite.sql.parser.babel.SqlBabelParserImpl.generateParseException(SqlBabelParserImpl.java:35553)
	at org.apache.calcite.sql.parser.babel.SqlBabelParserImpl.jj_consume_token(SqlBabelParserImpl.java:35367)
	at org.apache.calcite.sql.parser.babel.SqlBabelParserImpl.SqlStmtEof(SqlBabelParserImpl.java:965)
	at org.apache.calcite.sql.parser.babel.SqlBabelParserImpl.parseSqlStmtEof(SqlBabelParserImpl.java:205)
	at org.apache.calcite.sql.parser.SqlParser.parseQuery(SqlParser.java:158)
	... 19 more
x
cc: @Elon I think the generated time value '2022-10-18T09:12:39.768' should be quoted
btw, I suggest using a numeric epoch value as the main time column, that's more efficient
e
Yes, good catch! What version is this?
t
trino 399 (also tried 400, get a different grpc error there)
e
and what version of pinot do you use?
t
apachepinot/pinot:release-0.11.0
x
the calcite parser needs single quotes around string literals
e
Error appears to be in time boundary handling
What type is the time column? Also, +1 to @Xiang Fu's suggestion - that will work as a workaround, but we'd like to address the root cause.
x
this time value string
2022-10-18T09:12:39.768
has some special characters
e
have to look at the code - iirc it's bcz time column is assumed to be numeric.
x
so
WHERE time_string >= 2022-10-18T09:12:39.768 AND (("time_string" < '2022-01-01T00:00:00'))
is wrong, it should be
WHERE ("time_string" >= '2022-10-18T09:12:39.768') AND (("time_string" < '2022-01-01T00:00:00'))
e
Yep, will modify version for v400 - what's the grpc error you get with v400? That's supposed to be compatible with pinot 0.11.0
x
basically double quotes on identifiers and single quotes on literal values, especially string values
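A minimal Python sketch of the quoting rule described here - double quotes for identifiers, single quotes for string literals (the helper names are hypothetical, not part of any Pinot or Trino API):

```python
def quote_identifier(name: str) -> str:
    # Identifiers are wrapped in double quotes; embedded double quotes are doubled.
    return '"' + name.replace('"', '""') + '"'

def quote_literal(value: str) -> str:
    # String literals are wrapped in single quotes; embedded single quotes are doubled.
    return "'" + value.replace("'", "''") + "'"

# Rebuild the predicate that the connector generated without quoting:
predicate = "{} >= {}".format(
    quote_identifier("time_string"),
    quote_literal("2022-10-18T09:12:39.768"),
)
print(predicate)  # "time_string" >= '2022-10-18T09:12:39.768'
```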
🙏 1
e
yep, looking at the code, will push a fix - also want to know about the grpc issue - hope it's not a bug:)
x
if it's the same query, I think it will be a similar error; grpc may swallow some of the stacktrace on the server side
💡 1
e
sgtm:)
x
yeah, because the server side will still parse the query with the same compiler
e
yep - iirc this is bcz the time boundary code assumes numeric, should follow your suggestion above.
t
This is the Trino 400 error for the same query:
SELECT * from pinot.default.uplinkpayloadevent WHERE time_string < '2022-01-01T00:00:00.000' ORDER BY time_string DESC

io.grpc.StatusRuntimeException: INVALID_ARGUMENT: Bad request
	at io.grpc.Status.asRuntimeException(Status.java:535)
	at io.grpc.stub.ClientCalls$BlockingResponseStream.hasNext(ClientCalls.java:648)
	at io.trino.plugin.pinot.client.PinotGrpcDataFetcher$PinotGrpcServerQueryClient$ResponseIterator.computeNext(PinotGrpcDataFetcher.java:274)
	at io.trino.plugin.pinot.client.PinotGrpcDataFetcher$PinotGrpcServerQueryClient$ResponseIterator.computeNext(PinotGrpcDataFetcher.java:261)
	at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:146)
	at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:141)
	at io.trino.plugin.pinot.client.PinotGrpcDataFetcher.endOfData(PinotGrpcDataFetcher.java:85)
	at io.trino.plugin.pinot.PinotSegmentPageSource.getNextPage(PinotSegmentPageSource.java:117)
	at io.trino.operator.TableScanOperator.getOutput(TableScanOperator.java:311)
	at io.trino.operator.Driver.processInternal(Driver.java:411)
	at io.trino.operator.Driver.lambda$process$10(Driver.java:314)
	at io.trino.operator.Driver.tryWithLock(Driver.java:706)
	at io.trino.operator.Driver.process(Driver.java:306)
	at io.trino.operator.Driver.processForDuration(Driver.java:277)
	at io.trino.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:736)
	at io.trino.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:164)
	at io.trino.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:515)
	at io.trino.$gen.Trino_400____20221018_180612_2.run(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	at java.base/java.lang.Thread.run(Thread.java:833)
e
ah, ok, that's in line with what @Xiang Fu mentioned above
👍 1
@Xiang Fu is the "debug/..." endpoint still the recommended way to retrieve the time boundary for a hybrid table? i.e. broker debug/timeBoundary/{tablename}?
x
Also, I suggest using an epoch value or TIMESTAMP type for the time column, and keeping a human-readable (string) one for reference and debugging.
Otherwise you will see a performance hit later on, and you'll need to add it again anyway
t
will this query work with the date-time column using an epoch Long value instead? I can add a new column and define a transform for that.
x
yes in pinot, but I don't know the behavior from the trino side. If you want to use trino, I would suggest going with a long-type epoch timestamp value, which has been tested for years
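To illustrate the recommended long epoch time column, here is a small sketch converting one of the ISO-8601 `time_string` values from this thread into epoch milliseconds (UTC is an assumption; the source never states the zone):

```python
from datetime import datetime, timezone

def to_epoch_millis(time_string: str) -> int:
    # Parse "yyyy-MM-dd'T'HH:mm:ss.SSS" (no zone; UTC assumed here)
    # into epoch milliseconds, suitable for a LONG time column.
    dt = datetime.strptime(time_string, "%Y-%m-%dT%H:%M:%S.%f")
    return int(dt.replace(tzinfo=timezone.utc).timestamp() * 1000)

print(to_epoch_millis("2022-01-01T00:00:00.000"))  # 1640995200000
```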
t
okay, thank you
@Xiang Fu I have gone ahead and:
• Added a new timestamp column
{
  "name": "message_timestamp",
  "dataType": "LONG",
  "format": "1:MILLISECONDS:EPOCH",
  "granularity": "1:MILLISECONDS"
}
• Added a transform function to populate the new column
"transformFunction": "Groovy({
    def t_ingest = new groovy.json.JsonSlurper().parseText(message_str).get('__prop.t_ingest');
    !time_string || time_string < '2014-01-01 00:00:00.000000' ?
        (!t_ingest ? 1388552400 : Long.valueOf(t_ingest)) :
        Date.parse(\"yyyy-MM-dd'T'HH:mm:ss.SSS\", time_string).getTime();
}, time_string, message_str)"
• Updated both the realtime and offline tables to use the new message_timestamp column as the time column
"timeColumnName": "message_timestamp",
• Reloaded all segments and generated data for the new time column
However, the Pinot connector is still failing in Trino with the same error message
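For readability, the Groovy transform above can be mirrored in Python roughly as follows (a sketch, UTC assumed; note that the original fallback value 1388552400 looks like epoch seconds while the parse branch produces epoch milliseconds, which may be worth double-checking):

```python
import json
from datetime import datetime, timezone

def derive_message_timestamp(time_string, message_str):
    # Mirrors the Groovy transform: if time_string is missing or pre-2014,
    # fall back to __prop.t_ingest from the JSON payload (or a fixed default);
    # otherwise parse time_string into epoch milliseconds.
    if not time_string or time_string < "2014-01-01 00:00:00.000000":
        t_ingest = json.loads(message_str).get("__prop.t_ingest")
        return int(t_ingest) if t_ingest else 1388552400  # fallback from the original
    dt = datetime.strptime(time_string, "%Y-%m-%dT%H:%M:%S.%f")
    return int(dt.replace(tzinfo=timezone.utc).timestamp() * 1000)
```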
e
Could be due to cache - to eliminate that possibility can you restart? If that works we will need to change the cache semantics.
Or wait a few more mins?
t
which pods should be restarted? trino’s? or pinot’s?
e
trino - I suspect it is caching older information.
oh wait
trino 🙂 if you updated the table configs you will need to restart the cluster for ingesting segments to pick them up. @Xiang Fu can you confirm if that's true?
t
restarted and downgraded back to 399, still seeing the same error
all of my realtime and realtime-to-offline segments have been reloaded and updated with the transform (I was getting a warning before, and now I see all epoch timestamp columns are populated)
will try restarting pinot as well
restarting pinot seemed to fix the trino queries
🚀 1
x
Trino has a metadata cache for about 2-5 mins
on the pinot side, once you update the table/schema config, you need to reload the table so those configs are reflected in already-loaded segments/tables
And of course, restarting pinot will do the trick
👍 1
e
Is there a separate reload API call? Or is it just the reload param in TableConfigsRestletResource?
t
I pushed the "reload all segments" button, which seemed to apply the new transformations - it appears to call
segments/uplinkpayloadevent_OFFLINE/reload?type=OFFLINE
x
yes, this is the reload api
💡 1
e
Does that API also apply when the schema is changed? i.e. adding a column to a table?
x
yes
🎉 1
e
nice!