# troubleshooting
a
Guys, I merged the latest changes into our fork, rebuilt the Docker image, and deployed it. Now when we try to create a table we see this error in the controller:
ClassNotFoundException: org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory
Any suggestions?
d
Are you running a customized PluginDir value?
a
Nope, actually we're using almost everything as it was in the first place
pluginsDir: /opt/pinot/plugins
d
what about
plugins.include
?
a
This is the portion of values.yaml used to deploy in a Kubernetes environment using Helm:
sample.yaml
d
can you list the content of
/opt/pinot/plugins
within your forked image?
Also, can you extract your controller logs using this command?
kubectl exec <controller-pod-name> -- cat pinotController.log > controller.log
The startup logs list all the plugins being loaded
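Both checks can be done with `kubectl exec` (here `<controller-pod-name>` is a placeholder for your actual pod name, and the log path matches the one used above):

```shell
# List the plugins bundled in the image running inside the controller pod.
kubectl exec <controller-pod-name> -- ls /opt/pinot/plugins

# Grep the startup log for the plugin-loading lines.
kubectl exec <controller-pod-name> -- grep -i plugin pinotController.log
```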
a
This is what I see in
/opt/pinot/plugins
d
If you can share your controller logs we'll have a better understanding of what is going on
a
controller.log
d
Somehow, you have provided
pinot-gcs
as
plugins.include
. The plugin is loaded by default, so you don't need to specify it. Overriding plugins.include disables all other plugins, including the Kafka one.
a
d
Yes, the documentation may be misleading, since it doesn't mention that 1) other plugins will be disabled, and 2) pinot-gcs is already part of the Docker image.
I don’t see anything in your
values.yaml
, so I guess it is part of your docker image fork?
a
It's in the
jvmOpts
section of
values.yaml
d
Not for the controller
You only shared the controller values
a
Sorry, wrong file
This is the one
sample.yaml
d
anyways, just remove
-Dplugins.dir=/opt/pinot/plugins -Dplugins.include=pinot-gcs
from all your jvmArgs
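For reference, after removing those flags a values.yaml excerpt might look like this (a sketch only; key names depend on the Helm chart version, and the heap sizes are placeholders):

```yaml
controller:
  jvmOpts: "-Xms256M -Xmx1G"   # no -Dplugins.dir / -Dplugins.include flags
server:
  jvmOpts: "-Xms512M -Xmx2G"   # plugins load from the image defaults
```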
a
Okay, I'll try that
I did that and I just deployed our table in Pinot. For that table I set these properties:
"realtime.segment.flush.threshold.time":"24h"
"realtime.segment.flush.threshold.size":"0"
"realtime.segment.flush.desired.size":"500M"
The thing is, despite having set 24 hours and 500MB as our limits in time and size for segments, I see this behavior in GCS, not sure if that's a good sign or not:
It's been consuming for roughly ten minutes or so and there are already 9 segments. One of them is not showing here yet, but look at those sizes, is that okay?
d
I'm not sure, but I don't think setting threshold.size to 0 is helping. These are the relevant properties:
realtime.segment.flush.threshold.segment.size
realtime.segment.flush.threshold.time
realtime.segment.flush.threshold.rows
So you want something like
"realtime.segment.flush.threshold.time":"24h",
"realtime.segment.flush.threshold.size":"500M",
"realtime.segment.flush.threshold.rows": "500000000",
The default value for rows is 5 million. But my guess is that 500M records will always be bigger than 500MB 😛, so I would leave rows at its default value.
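That guess can be sanity-checked with quick integer arithmetic (the 100-byte average row size below is purely hypothetical; real Pinot segment sizes depend on encoding and column types):

```shell
# Back-of-the-envelope check of which flush threshold trips first.
size_threshold=$((500 * 1024 * 1024))   # 500MB in bytes
row_threshold=500000000                 # the suggested rows threshold

# Bytes per row (x1000, for integer math) at which both thresholds
# would trip at the same time -- roughly 1.05 bytes/row.
break_even_milli=$((size_threshold * 1000 / row_threshold))
echo "break-even: ${break_even_milli} bytes/row x1000"

# With any realistic row size, the size limit trips long before 500M rows.
avg_row_bytes=100                       # hypothetical average row size
rows_at_size_limit=$((size_threshold / avg_row_bytes))
echo "rows when size limit trips: ${rows_at_size_limit}"
```

So unless rows serialize to about one byte each, the 500MB size threshold always wins, which is why the rows setting can stay at its default.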
a
Nice! I'll try those out
Thanks
So, just to confirm, this should be the properties to go on the table config, right?
"realtime.segment.flush.threshold.time":"24h"
"realtime.segment.flush.threshold.segment.size":"500M"
"realtime.segment.flush.threshold.rows": "5000000"
d
Yes, next to your Kafka config
The rows config is not required since 5 million is the default
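Putting it together, the flush settings sit alongside the Kafka stream properties in the table config's streamConfigs block. A sketch, with the topic and broker names as placeholders (rows is omitted since the 5 million default is fine):

```json
"streamConfigs": {
  "streamType": "kafka",
  "stream.kafka.topic.name": "my-topic",
  "stream.kafka.broker.list": "kafka:9092",
  "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
  "realtime.segment.flush.threshold.time": "24h",
  "realtime.segment.flush.threshold.segment.size": "500M"
}
```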
a
Ah, okay, nice
Interesting, now it consumed 5 million messages from kafka and stopped consuming, is that okay?
d
What are your logs saying?
a
It created only one segment. Every time I try to get the number of records in Pinot I get different results; I presume this is because the query reaches a server that doesn't yet hold the whole segment. We have 2 replicas configured.
d
Check the server logs
a
server.log
d
Zookeeper is down.
21/02/12 10:26:44.302 WARN [ClientCnxn] [Start a Pinot [SERVER]-SendThread(mls-zookeeper.production.svc.cluster.local:2181)] Session 0x100722c6f210005 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
Unless that is an old message. Can you share the controller logs too?
a
No, actually it's been quite stable for now
I think that might be because I redeployed everything to use gcs in an attempt to configure it as the deep store
Another thing: it stopped consuming after reaching 5 million messages, but it still hasn't stored the segment in GCS as it was doing previously.
d
Can you share the controller logs?
there’s a high chance it’s simply failing to store the segment
General tip, whenever something is not working as expected, always check all the logs first
a
But I don't see any errors regarding gcs here
d
How many controllers do you have?
a
3
d
get the logs from all of them 🙂
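Something like this loop works for a StatefulSet deployment (the pod names below are hypothetical; check yours with `kubectl get pods`):

```shell
# Pull the controller log out of each controller pod into a local file.
for pod in pinot-controller-0 pinot-controller-1 pinot-controller-2; do
  kubectl exec "$pod" -- cat pinotController.log > "$pod.log"
done
```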
a
Okay, hold on
I see more data in controller 2.
It took quite some time to upload the segment data to GCS, but now I see it there. It didn't reach the 500MB goal, though.
Screenshot 2021-02-12 at 16.20.14.png
I saw a log regarding the minion component, I only started 1 instance with the default config values because I wasn't sure how we were supposed to use it
Now I see pinot started consuming again
Didn't change anything, I was just getting the logs out of every controller instance
d
how many servers do you have?
a
We have 3 servers
d
pinot-server
a
Yep, 3 pinot server instances
d
ok, keep monitoring and tell me if you keep seeing the same behavior. You could also increase the
rows
value to 10,000,000; your segments should roughly double in size
a
Okay, and then regarding the minion instance, should I also provide the same resources to it? How many minion instances should we have with this architecture?
d
Minion is used for operational tasks such as scheduled batch ingestion.
If you only need realtime streaming, you don’t need minion instances
a
Okay, then if I understood correctly, if we were to have offline tables and ingest data from any other source then we should increase the minion resources and instances. Is that correct?
d
Depending on your ingestion needs, yes
Typically, a bunch of workers for an initial load, then just the right amount for the periodic new files to ingest
a
Ah, okay, so it depends on the workload for the batch ingestion
Many thanks!
d
Yes, it scales horizontally pretty well; it's usually CPU-bottlenecked because it's responsible for generating the segments.
a
It works perfectly!
Last question, can we use this https://docs.pinot.apache.org/basics/data-import/pinot-file-system/import-from-gcp#job-spec to import data exported from bigquery?
d
As long as you can export the data with a schema matching the Pinot one, yes you can.
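A sketch of what the GCS ingestion job spec could look like for newline-delimited JSON exported from BigQuery. The bucket paths, table name, and controller URI are placeholders, and the linked docs remain the authoritative format:

```yaml
executionFrameworkSpec:
  name: 'standalone'
  segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner'
  segmentTarPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner'
jobType: SegmentCreationAndTarPush
inputDirURI: 'gs://my-bucket/bigquery-export/'
includeFileNamePattern: 'glob:**/*.json'
outputDirURI: 'gs://my-bucket/pinot-segments/mytable/'
pinotFSSpecs:
  - scheme: gs
    className: org.apache.pinot.plugin.filesystem.GcsPinotFS
recordReaderSpec:
  dataFormat: 'json'
  className: 'org.apache.pinot.plugin.inputformat.json.JSONRecordReader'
tableSpec:
  tableName: 'mytable'
pinotClusterSpecs:
  - controllerURI: 'http://pinot-controller:9000'
```

BigQuery can export newline-delimited JSON, Avro, or CSV, so pick the recordReaderSpec matching whichever format you export.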
a
Thanks