# general
p
Hello, has anyone successfully configured Pinot to work with Trino in a Kubernetes environment? Following their documentation, they mention that
The Pinot broker and server must be accessible via DNS as Pinot returns hostnames and not IP addresses.
Does this mean the actual pods or the services? Can someone share what their configurations look like? I've tried the Trino Slack, unsuccessfully...
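For context, a minimal sketch of what a headless Service could look like here (all names in this snippet are illustrative, not taken from the thread); `clusterIP: None` is what gives each pod a stable, cluster-wide DNS record of the form `<pod>.<service>.<namespace>.svc.cluster.local`, matching the hostnames Pinot advertises:

```yaml
# Hypothetical headless Service for Pinot brokers (names illustrative).
# With clusterIP: None, each broker pod becomes resolvable as e.g.
#   pinot-broker-0.pinot-broker-headless.dc-pinot.svc.cluster.local
apiVersion: v1
kind: Service
metadata:
  name: pinot-broker-headless
  namespace: dc-pinot
spec:
  clusterIP: None          # headless: DNS records point at the pods themselves
  selector:
    app: pinot-broker
  ports:
    - name: broker
      port: 8099
```

Since these names resolve anywhere in the cluster, Trino pods in a separate namespace can reach the brokers and servers by the hostnames Pinot returns.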
m
@User ^^
p
FYI, I've been looking at the pinot connector source code; these are my preliminary findings. When querying trino for table data in pinot I get the following message:
Query 20210615_110517_00024_m2a8t failed: No valid brokers found for hitexecutionview
where `hitexecutionview` is the name of my table, but lowercased. In the connector source code, this message occurs when the connector is unable to find the brokers for a given table. This lookup is done via a cache which is populated using the REST `tables/<table name>/instances` GET endpoint from the Table API. So far I see two distinct points of concern: 1) the table name being lowercased means the API is unable to find the table; 2) the REST call is hardcoded to use the `http` scheme. In my pinot setup, I can't use the swagger API with the `http` scheme, only `https`.
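For reference, the lookup the connector performs can be reproduced by hand; a sketch assuming a hypothetical controller address (adjust host, port, and scheme for your cluster):

```shell
# Hypothetical controller address; -k skips TLS verification for self-signed certs.
# The lowercased name the connector sends:
curl -k "https://pinot-controller.dc-pinot.svc.cluster.local:9000/tables/hitexecutionview/instances"
# Compare against the mixed-case name as defined in Pinot:
curl -k "https://pinot-controller.dc-pinot.svc.cluster.local:9000/tables/hitExecutionView/instances"
```

If only the mixed-case variant returns broker instances, that would confirm concern 1; if the call only succeeds over `https`, that would confirm concern 2.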
e
Is this a mixed case table?
p
Real-time
e
I mean, in pinot is the table mixed case? i.e. hitExecutionView or something similar?
m
I think so
Does that matter @User?
e
Yes, for the connector. We have an unmerged pull request that handles this. Currently trino lowercases all sql identifiers. We do handle mixed case columns, but not tables yet.
p
Yes it is Elon
I am now trying a lowercase table
Do columns fields also have to be lowercase?
e
Nope, that should be handled
p
Alright 👍
m
Thanks @User.
Good candidate to be added into FAQ? #C023BNDT0N8 🙂
😁 1
e
Sure, lmk if you have any more issues.
p
Further updates: after lowercasing the table, when I try to query it the requests time out.
Copy code
trino:default> select * from hitexecutionview limit 1;

Query 20210615_164649_00006_ag23g, FAILED, 1 node
Splits: 49 total, 0 done (0.00%)
1:00 [0 rows, 0B] [0 rows/s, 0B/s]

Query 20210615_164649_00006_ag23g failed: null value in entry: Server_pinot-server-1.pinot-server-headless.dc-pinot.svc.cluster.local_8098=null
In the pinot sql UI this query takes 5ms
e
Is `pinot.set.instance.id.to.hostname` set to true in the server config?
p
In pinot or trino?
e
Good point:) In pinot
p
Edit, actually it is, yes
e
Try setting that in pinot, and just to be safe (if you can) restart trino before retrying the select.
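For reference, a sketch of where that setting lives (file name and surrounding layout are illustrative; only the key itself is from the thread):

```properties
# pinot-server.conf (illustrative)
# Registers the server in the cluster by hostname instead of IP address,
# which is what lets Trino resolve the returned addresses via DNS.
pinot.set.instance.id.to.hostname=true
```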
oh
Can you try
Copy code
select * from "select * from hitexecutionview limit 1"
in trino?
p
Copy code
trino:default> select * from "select * from hitexecutionview limit 1"
            -> ;

Query 20210615_165418_00007_ag23g, FAILED, 1 node
Splits: 17 total, 0 done (0.00%)
0.83 [0 rows, 0B] [0 rows/s, 0B/s]

Query 20210615_165418_00007_ag23g failed: Internal error
e
Maybe try restarting trino (just in case it's something with cached information from the other table)?
Also, enable `--debug` in the trino cli, so we can see the stack trace
p
restart the trino cli or the pods?
e
the trino pods
p
Copy code
$ kubectl -n dc-trino exec -it trino-coordinator-7ddccf54fc-cbq5m -- /bin/bash
[trino@trino-coordinator-7ddccf54fc-cbq5m /]$ trino
trino> exit
[trino@trino-coordinator-7ddccf54fc-cbq5m /]$ trino --debug --catalog pinot --schema default
trino:default> select * from hitexecutionview limit 1;

Query 20210615_170048_00002_ieyau, FAILED, 1 node
<http://localhost:8080/ui/query.html?20210615_170048_00002_ieyau>
Splits: 97 total, 0 done (0.00%)
CPU Time: 0.0s total,     0 rows/s,     0B/s, 100% active
Per Node: 0.0 parallelism,     0 rows/s,     0B/s
Parallelism: 0.0
Peak Memory: 0B
1:01 [0 rows, 0B] [0 rows/s, 0B/s]

Query 20210615_170048_00002_ieyau failed: null value in entry: Server_pinot-server-0.pinot-server-headless.dc-pinot.svc.cluster.local_8098=null
java.lang.NullPointerException: null value in entry: Server_pinot-server-0.pinot-server-headless.dc-pinot.svc.cluster.local_8098=null
	at com.google.common.collect.CollectPreconditions.checkEntryNotNull(CollectPreconditions.java:32)
	at com.google.common.collect.SingletonImmutableBiMap.<init>(SingletonImmutableBiMap.java:42)
	at com.google.common.collect.ImmutableBiMap.of(ImmutableBiMap.java:72)
	at com.google.common.collect.ImmutableMap.of(ImmutableMap.java:119)
	at com.google.common.collect.ImmutableMap.copyOf(ImmutableMap.java:454)
	at com.google.common.collect.ImmutableMap.copyOf(ImmutableMap.java:433)
	at io.trino.plugin.pinot.PinotSegmentPageSource.queryPinot(PinotSegmentPageSource.java:221)
	at io.trino.plugin.pinot.PinotSegmentPageSource.fetchPinotData(PinotSegmentPageSource.java:182)
	at io.trino.plugin.pinot.PinotSegmentPageSource.getNextPage(PinotSegmentPageSource.java:150)
	at io.trino.operator.TableScanOperator.getOutput(TableScanOperator.java:301)
	at io.trino.operator.Driver.processInternal(Driver.java:387)
	at io.trino.operator.Driver.lambda$processFor$9(Driver.java:291)
	at io.trino.operator.Driver.tryWithLock(Driver.java:683)
	at io.trino.operator.Driver.processFor(Driver.java:284)
	at io.trino.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:1075)
	at io.trino.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:163)
	at io.trino.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:484)
	at io.trino.$gen.Trino_358____20210615_165930_2.run(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:829)
e
hmmm - is pinot running in kube but trino running locally?
p
No, both are running in k8s in separate namespaces
e
But same k8s cluster?
p
yes
e
hmm really strange - we use pinot in k8s (version 0.6.0) (and latest trino connector). What version of pinot are you using?
I might try to repro locally...
Can you run `desc hitexecutionview` from the trino cli?
p
Pinot 0.7.1
👍 1
Copy code
trino:default> desc hitexecutionview
            -> ;
           Column           |  Type   | Extra | Comment 
----------------------------+---------+-------+---------
 jobstatus                  | bigint  |       |         
 gender                     | varchar |       |         
 rightclickcount            | bigint  |       |         
 internalcrowdmemberid      | bigint  |       |         
 externalcrowdmemberid      | varchar |       |         
 internalhitexecutionid     | bigint  |       |         
 accuracy                   | varchar |       |         
 inputforuicontrols         | varchar |       |         
 audiolength                | bigint  |       |         
 hitstatus                  | bigint  |       |         
 internaljobid              | bigint  |       |         
 internaljobmemberid        | bigint  |       |         
 priceperunit               | double  |       |         
 result                     | varchar |       |         
 dateofbirthms              | bigint  |       |         
 baseinternalhitid          | bigint  |       |         
 dateofevent                | bigint  |       |         
 beforeinputdelay           | bigint  |       |         
 jobtypecode                | varchar |       |         
 externalhitexecutionid     | varchar |       |         
 cancelreason               | bigint  |       |         
 keyscount                  | bigint  |       |         
 externaljobmemberid        | varchar |       |         
 isjobmembervendor          | varchar |       |         
 activetime                 | bigint  |       |         
 isrta                      | varchar |       |         
 fluency                    | varchar |       |         
 dateofbirth                | varchar |       |         
 languagecode               | varchar |       |         
 leftclickcount             | bigint  |       |         
 baseinternalhitexecutionid | bigint  |       |         
 source-intelligible        | varchar |       |         
 externalprojectid          | varchar |       |         
 externaljobid              | varchar |       |         
 totaltasktime              | bigint  |       |         
 translation                | varchar |       |         
 isqualification            | varchar |       |         
 skipreason                 | varchar |       |         
 messagetimestamp           | varchar |       |         
 externalhitid              | varchar |       |         
 hitexecutionstatus         | bigint  |       |         
(41 rows)

Query 20210615_171235_00003_ieyau, FINISHED, 2 nodes
<http://localhost:8080/ui/query.html?20210615_171235_00003_ieyau>
Splits: 19 total, 19 done (100.00%)
CPU Time: 0.0s total, 1.32K rows/s,  106KB/s, 55% active
Per Node: 0.0 parallelism,    64 rows/s, 5.18KB/s
Parallelism: 0.1
Peak Memory: 0B
0.32 [41 rows, 3.3KB] [128 rows/s, 10.4KB/s]
e
Nice, so that works.
Trying to see if there's any changes between 0.6.0 and 0.7.1 that would affect the routing table.
p
metadata queries all seem to work; it's just data queries that do not
e
Yep, looks like it tried to run the query and got back a null response
We are still poc'ing the upgrade, still using pinot 0.6.0
p
how does the connector work when running the query?
Does it serialize the query and send it through a REST endpoint?
e
Yep, I just read from the beginning - your pinot cluster is only accessible via https?
That query uses the PinotQueryClient
p
Yes, http does not work, can't tell you why though. @User was the one that told me to use the https scheme in the swagger client
👍 1
e
Trying to see if this is due to changes in 0.7.1 - would it be possible to spin up a test pinot 0.6.0 cluster and try it?
p
I think so, let me try...
Unfortunately I'm not able to @User. I have a hard dependency on the following image: pinot:0.8.0-SNAPSHOT-f15225f9c-20210523-jdk8 because of some fixes I need.
Is there any way I can help the upgrade of the connector to a more recent pinot version?
e
We're working on that too, should have a pr ready (there are many pr's in flight right now). If I use the pinot 0.7.1 image should that be compatible with the snapshot you mentioned above?
Want to try this locally to debug it.
p
Can't confirm but I'm hopeful that yes. The image I'm using is from 22nd of May 2021 (https://github.com/apache/incubator-pinot/commit/f15225f9c8abe8d9efa52c31c00f0d7418b368eb), 0.7.1 was released on April 15th, that's ~37 days of potential changes.
e
Ok, we were working on updating the pinot connector to use 0.7.1 (latest stable) but 0.8.0 may be coming out very soon, first trying 0.7.1 and will see what happens:)
Will update shortly...
p
Thank you very much for all the assistance Elon, if there is anything I can do to help please let me know.
e
Sure:)
p
Status update, I managed to try a downgraded 0.7.1 cluster, the connector seems to work!
Copy code
trino:default> select externalHitExecutionId from hitexecutionview limit 1;
        externalHitExecutionId        
--------------------------------------
 18dc8417-346b-4ff8-a440-0d0b8b6e0e72 
(1 row)

Query 20210616_085001_00007_unjsj, FINISHED, 1 node
Splits: 33 total, 33 done (100.00%)
0.44 [16 rows, 576B] [36 rows/s, 1.29KB/s]
It would seem that between 0.7.1 and the image I'm using in production f15225f9c8abe8d9efa52c31c00f0d7418b368eb there is some incompatible change with the current connector.
Curiously, it seems star selects do not work:
Copy code
trino:default> select * from hitexecutionview;
 jobstatus | gender | rightclickcount | internalcrowdmemberid | externalcrowdmemberid | internalhitexecutionid | accuracy | inputforuicontrols | audiolength | hitstatus | internaljobid | internaljobmembe
-----------+--------+-----------------+-----------------------+-----------------------+------------------------+----------+--------------------+-------------+-----------+---------------+-----------------
(0 rows)

Query 20210616_104504_00063_unjsj, FINISHED, 1 node
Splits: 32 total, 32 done (100.00%)
0.23 [0 rows, 0B] [0 rows/s, 0B/s]
e
That's great! Also verified that the latest image (java 8 and java 11 versions) does not work with the current connector. We are working on updating it.
👍 1
Looks like changes to the DataTable interface, getting "Unsupported data table version 3"....
p
Is there an easy fix?
e
Looking into it - not sure how much 0.8.0-SNAPSHOT will change before 0.8.0 is released. We were working on 0.7.1, might just jump to 0.8.0 - I think it will be released soon.
0.7.1 is the latest published release.
m
@User yes, DataTable has an incompatible change in one of the PRs
👍 1
e
Is there a way to upgrade formats? Or does upgrading pinot also upgrade all the tables?
Or should we support both the older and newer formats?
Also, verified that 0.7.1 "passthrough queries" do not work with real and double arrays (updating with a fix). I am actively working on 0.7.1 and think we should support both versions once 0.8.0 is released - wdyt @User, @User?
p
Please forgive my lack of knowledge. What are passthrough queries?
m
Yeah, if we can support both somehow that would be good. Might need some api in Pinot to get release version?
e
that would be great!
iirc wasn't there one?
@User - a "passthrough" query allows you to "pass" a query directly into pinot:
Copy code
select * from "select ... from pinot_table where ..."
👍 1
p
TIL 🙂
😁 1
e
Useful if you want to use pinot udf's
or guarantee filters and aggregations are pushed down (until aggregation pushdown pr is merged)
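A sketch of what such a pushdown looks like (table and column names reused from the `desc` output earlier in this thread; the inner string is ordinary Pinot SQL quoted as a Trino identifier):

```sql
-- The quoted inner query is sent to Pinot as-is, so Pinot executes the
-- aggregation itself instead of Trino pulling back raw rows to aggregate.
select * from "select languagecode, count(*) from hitexecutionview group by languagecode";
```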
looks like there's a `/version` api in the controller...
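A sketch of checking it (controller address hypothetical, reusing the naming pattern from earlier in the thread):

```shell
# Hypothetical controller host; GET /version reports component versions.
curl "http://pinot-controller.dc-pinot.svc.cluster.local:9000/version"
```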
m
Oh cool
🏄 1
e
do you know when 0.8.0 will be out? Seems like there are so many great features, can't wait:)
m
We should start on that soon. We are currently working on graduation
p
Do you have an idea on a fuzzy ETA?
m
May be in the next month or so (One of the committers would have to pick up this task).
❤️ 2