# troubleshooting
g
IIRC the main way this happens is when the client connects to a Broker that is different from the one that originally served the connection
Do you have more than one Broker?
Using the Router would achieve this
Also, there is a driver-side ability to transparently reconnect when this happens. See `transparent_reconnection` on https://calcite.apache.org/avatica/docs/client_reference.html. Generally, for Druid, since it's read-only, it's good to set it to `true`.
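e.g. something like this sketch, with the Avatica driver on the classpath (the `druid-router` host and port 8888 are placeholders for your Router or Broker; `transparent_reconnection` is the connection property described on that page):
```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Sketch only: requires the Avatica JDBC driver (org.apache.calcite.avatica:avatica-core)
// on the classpath. Host and port are placeholders for your setup.
public class DruidJdbcReconnectExample
{
  public static void main(String[] args) throws Exception
  {
    // transparent_reconnection=true lets the driver re-create the server-side
    // connection/statement if the server reports it as missing.
    String url = "jdbc:avatica:remote:url=http://druid-router:8888/druid/v2/sql/avatica/"
                 + ";transparent_reconnection=true";

    try (Connection conn = DriverManager.getConnection(url);
         Statement stmt = conn.createStatement();
         ResultSet rs = stmt.executeQuery("SELECT 1")) {
      while (rs.next()) {
        System.out.println(rs.getLong(1));
      }
    }
  }
}
```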
n
Thanks a lot @Gian Merlino. Will try these and let you know if that helps. Appreciate your help!
> Do you have more than one Broker?
Yes, we have multiple brokers, 3 at this time.
@Gian Merlino in the router config doc, it says:
> If no druid.router.avatica.balancer property is set, the Router will also default to using the Rendezvous Hash Balancer.
Were you referring to this configuration (`druid.router.avatica.balancer`)? If yes, given we are not setting it explicitly, it's defaulting to the Rendezvous Hash Balancer. Won't this be sufficient to take care of stickiness?
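(For reference, if we did want to set it explicitly rather than rely on the default, my understanding from the Router docs is that it would go in the Router's runtime.properties roughly like this; the exact property key is my reading of the docs, not something we have tested:)
```
# Illustrative only: rendezvousHash is documented as the default balancer
druid.router.avatica.balancer.type=rendezvousHash
```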
Also, the documentation for avatica-query-balancing gives an example runtime.properties at the bottom that has the following two properties:
```
# Number of threads used by the Router proxy http client
druid.router.http.numMaxThreads=100

druid.server.http.numThreads=100
```
The doc explains `druid.router.http.numMaxThreads` for the Router. However, I didn't find `druid.server.http.numThreads` explained anywhere for the Router. What's the difference between the two?
> If so, make sure you're getting sticky connections:
@Gian Merlino I added the following configuration to our routers. The number of `org.apache.calcite.avatica.NoSuchConnectionException` exceptions actually increased, and they are happening more consistently now.
g
Hmm, yes, I would think the default settings would be sufficient…
A couple of questions: 1) If you point your JDBC client at a Broker directly, do you still see this? [If not, it's likely something related to stickiness.] 2) To debug stickiness, try enabling DEBUG logging for the `org.apache.druid.server.router` package. You'll see logs like this: "Balancer class [%s] sending request with connectionId [%s] to server: %s"
n
Thanks @Gian Merlino Let me try this and get back.
One thing I forgot to mention: our Druid cluster is deployed in K8s with an Istio service mesh, so Druid nodes communicate with each other via Envoy proxy sidecars. I am now suspecting the issue is related to that. Likely, stickiness of the connections is not being maintained when requests are proxied through the sidecar. Will dig into it more, but let me know if you have any thoughts or ideas that might be useful. Thanks again for all your help with this.
d
I think you are on the right track @NKorade. Make sure every single layer of your HTTP proxying maintains stickiness and the same connection timeout settings.
g
Hi @NKorade, I am getting "org.apache.druid.sql.avatica.DruidMeta - No such connection: guid" quite a lot too.
• Using K8s and Istio
• Connecting to the Router (not the Broker) from the client
• Using JDBC
• Have 2 Brokers and 2 Routers
Hence it seems my issue has a lot of overlap with yours. Can you let me know how you solved this? Trying my luck out here 🤞
a
Seems like you share the stickiness problem. But we had this issue even on single-node installations, and there the cause was Avatica's connectionIdleTimeout vs. Hikari's keepaliveTime and maxLifetime: if Hikari has longer timeouts than Avatica, then on reconnect the Broker couldn't find the connection. I suspect you have a stickiness issue, since we got "No such connection: null" instead of "No such connection: guid", but checking for valid timeouts is a legit move in any case.
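Concretely, a rough sketch of what that pool setup looks like with HikariCP (the values and the `druid-router:8888` URL are illustrative only, assuming the Broker's druid.sql.avatica.connectionIdleTimeout is at its PT5M default; tune them to your own settings):
```java
import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;

// Sketch only: the point is that Hikari pings/retires pooled connections *before*
// the Broker's Avatica idle timeout (druid.sql.avatica.connectionIdleTimeout,
// assumed here to be the PT5M default) can expire them server-side.
public class DruidHikariPool
{
  public static HikariDataSource buildPool()
  {
    HikariConfig config = new HikariConfig();
    config.setDriverClassName("org.apache.calcite.avatica.remote.Driver");
    config.setJdbcUrl("jdbc:avatica:remote:url=http://druid-router:8888/druid/v2/sql/avatica/");
    config.setKeepaliveTime(60_000);   // ping idle connections every 1 min (needs HikariCP 4.x+)
    config.setMaxLifetime(240_000);    // retire connections after 4 min, before Avatica forgets them
    return new HikariDataSource(config);
  }
}
```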
g
I tried the above so that the Avatica timeout is longer than the Hikari timeouts. I also reduced the number of Brokers and Routers to 1, so ideally there should not be any stickiness-related issue. But I am still getting:
ERROR [qtp200377362-194] org.apache.druid.sql.avatica.DruidMeta - No such connection: 1479364d-d077-4ade-bd05-f1277ade7ec1