# troubleshooting
g
IIRC the main way this happens is when the client connects to a Broker that is different from the one that originally served the connection
Do you have more than one Broker?
Using the Router would achieve this
Also, there is a driver-side ability to transparently reconnect when this happens. See `transparent_reconnection` on https://calcite.apache.org/avatica/docs/client_reference.html. Generally, for Druid, since it's read-only, it's good to set it to `true`.
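e.g. something like this sketch, with the Avatica driver on the classpath (the `druid-router` host and port 8888 are placeholders for your Router or Broker; `transparent_reconnection` is the connection property described on that page):
```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Sketch only: requires the Avatica JDBC driver (org.apache.calcite.avatica:avatica-core)
// on the classpath. Host and port are placeholders for your setup.
public class DruidJdbcReconnectExample
{
  public static void main(String[] args) throws Exception
  {
    // transparent_reconnection=true lets the driver re-create the server-side
    // connection/statement if the server reports it as missing.
    String url = "jdbc:avatica:remote:url=http://druid-router:8888/druid/v2/sql/avatica/"
                 + ";transparent_reconnection=true";

    try (Connection conn = DriverManager.getConnection(url);
         Statement stmt = conn.createStatement();
         ResultSet rs = stmt.executeQuery("SELECT 1")) {
      while (rs.next()) {
        System.out.println(rs.getLong(1));
      }
    }
  }
}
```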
n
Thanks a lot @Gian Merlino. Will try these and let you know if that helps. Appreciate your help!
> Do you have more than one Broker?
Yes, we have multiple brokers, 3 at this time.
@Gian Merlino in the router config doc, it says:
> If no druid.router.avatica.balancer property is set, the Router will also default to using the Rendezvous Hash Balancer.
Were you referring to this configuration (`druid.router.avatica.balancer`)? If yes, given we are not setting it explicitly, it's defaulting to the Rendezvous Hash Balancer. Won't this be sufficient to take care of stickiness?
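(For reference, if we did want to set it explicitly rather than rely on the default, my understanding from the Router docs is that it would go in the Router's runtime.properties roughly like this; the exact property key is my reading of the docs, not something we have tested:)
```
# Illustrative only: rendezvousHash is documented as the default balancer
druid.router.avatica.balancer.type=rendezvousHash
```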
Also, the documentation for avatica-query-balancing gives an example runtime.properties at the bottom that has the following two properties:
```
# Number of threads used by the Router proxy http client
druid.router.http.numMaxThreads=100

druid.server.http.numThreads=100
```
The doc explains `druid.router.http.numMaxThreads` for the Router. However, I didn't find `druid.server.http.numThreads` explained anywhere for the Router. What's the difference between the two?
> If so, make sure you're getting sticky connections:
@Gian Merlino I added the following configuration to our routers. The number of `org.apache.calcite.avatica.NoSuchConnectionException` exceptions actually increased, and they are happening more consistently now.
g
Hmm, yes, I would think the default settings would be sufficient…
A couple of questions: 1) If you point your JDBC client at a Broker directly, do you still see this? [If not, it's likely something related to stickiness.] 2) To debug stickiness, try enabling DEBUG logging for the `org.apache.druid.server.router` package. You'll see logs like this: "Balancer class [%s] sending request with connectionId [%s] to server: %s"
n
Thanks @Gian Merlino Let me try this and get back.
One thing I forgot to mention: our Druid cluster is deployed in K8s with an Istio service mesh, so Druid nodes communicate with each other via Envoy proxy sidecars. I am now suspecting the issue is related to that. Likely, stickiness of the connections is not being maintained when requests are proxied through the sidecar. Will dig into it more, but let me know if you have any thoughts or ideas that might be useful. Thanks again for all your help with this.
d
I think you are on the right track @NKorade. Make sure every single layer of your HTTP proxying maintains stickiness and the same connection timeout settings.
g
Hi @NKorade, I am getting "org.apache.druid.sql.avatica.DruidMeta - No such connection: guid" quite a lot too.
• Using K8s and Istio
• Connecting to the Router (not the Broker) from the client
• Using JDBC
• Have 2 Brokers and 2 Routers
Hence it seems my issue has a lot of overlap with yours. Can you let me know how you solved this? Trying my luck out here 🤞
a
Seems like you share the stickiness problem. But we had this issue even on single-node installations, and there the cause was Avatica's connectionIdleTimeout vs. Hikari's keepaliveTime and maxLifetime: if Hikari has longer timeouts than Avatica, then on reconnect the Broker couldn't find the connection. I suspect you have a stickiness issue, since we got "No such connection: null" instead of "No such connection: guid", but checking for valid timeouts is a legit move in any case.
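Concretely, a rough sketch of what that pool setup looks like with HikariCP (the values and the `druid-router:8888` URL are illustrative only, assuming the Broker's druid.sql.avatica.connectionIdleTimeout is at its PT5M default; tune them to your own settings):
```java
import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;

// Sketch only: the point is that Hikari pings/retires pooled connections *before*
// the Broker's Avatica idle timeout (druid.sql.avatica.connectionIdleTimeout,
// assumed here to be the PT5M default) can expire them server-side.
public class DruidHikariPool
{
  public static HikariDataSource buildPool()
  {
    HikariConfig config = new HikariConfig();
    config.setDriverClassName("org.apache.calcite.avatica.remote.Driver");
    config.setJdbcUrl("jdbc:avatica:remote:url=http://druid-router:8888/druid/v2/sql/avatica/");
    config.setKeepaliveTime(60_000);   // ping idle connections every 1 min (needs HikariCP 4.x+)
    config.setMaxLifetime(240_000);    // retire connections after 4 min, before Avatica forgets them
    return new HikariDataSource(config);
  }
}
```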
g
I tried the above so that the Avatica timeout is longer than the Hikari timeouts. I also reduced the number of Brokers and Routers to 1, so ideally there should not be any stickiness-related issue. But I am still getting:
ERROR [qtp200377362-194] org.apache.druid.sql.avatica.DruidMeta - No such connection: 1479364d-d077-4ade-bd05-f1277ade7ec1