This message was deleted Apache Druid #dev

# dev

Slackbot

10/03/2022, 10:46 PM

This message was deleted.

Cory Johannsen

10/03/2022, 10:46 PM

if (httpResponseStatus.getCode() == 400) {
  // don't bother retrying if it's a bad request
  throw new IAE(exceptionMessage.toString());
} else {
  throw new IOE(exceptionMessage.toString());
}

Cory Johannsen

10/03/2022, 10:50 PM

When using HTTP discovery, the status code is 400 when the request fails with "no route to host" after a pod has been removed.
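
For readers following along, the retry decision in the snippet above can be sketched with JDK exceptions only. This is a minimal illustration, not Druid code: it assumes IAE behaves like an IllegalArgumentException and IOE like an IOException, and the class and method names (StatusHandlingSketch, failForStatus, shouldRetry) are hypothetical.

```java
import java.io.IOException;

final class StatusHandlingSketch {
    // Mirrors the branch above: a 400 becomes an unchecked "bad request" error,
    // any other status becomes an IOException.
    static void failForStatus(int statusCode, String message) throws IOException {
        if (statusCode == 400) {
            // Stand-in for IAE (assumption): the request itself is bad, so retrying cannot help.
            throw new IllegalArgumentException(message);
        }
        // Stand-in for IOE (assumption): possibly transient, so the caller may retry.
        throw new IOException(message);
    }

    // Returns true when a caller that retries on IOException would retry this status.
    static boolean shouldRetry(int statusCode) {
        try {
            failForStatus(statusCode, "status " + statusCode);
            return false; // not reached: failForStatus always throws
        } catch (IllegalArgumentException e) {
            return false;
        } catch (IOException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("retry on 400? " + shouldRetry(400));
        System.out.println("retry on 503? " + shouldRetry(503));
    }
}
```

Under these assumptions, a failure reported as 400 is never retried, which is why the status attached to a lost pod matters.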

Cory Johannsen

10/03/2022, 10:59 PM

Since this situation can always occur if a pod is spontaneously lost, this becomes a normal condition. I'm considering handling this as an exception I can propagate that retains the information about the failure. Two questions:

1. Is this the correct way to handle this?
2. Would it make sense to have another exception in addition to NoTaskLocationException and TaskNotRunnableException that represents this situation? Or does TaskNotRunnableException already cover this? I'm not clear on the exact intent of TaskNotRunnableException.

Cory Johannsen

10/03/2022, 11:16 PM

Oh, it looks like NoRouteToHost bubbles up as an IOException that I can trap in my task client. Perhaps this will work as-is.

Cory Johannsen

10/03/2022, 11:38 PM

Confirmed. I can handle the NoRouteToHost in my supervisor.
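
One way to detect this condition in a client or supervisor is to walk the exception's cause chain and look for java.net.NoRouteToHostException. The sketch below is a hedged illustration rather than the code used in the thread: the class and method names (UnreachableHostSketch, isNoRouteToHost) are hypothetical, and the depth limit is an arbitrary safeguard.

```java
import java.io.IOException;
import java.net.NoRouteToHostException;

final class UnreachableHostSketch {
    // Arbitrary bound so a cyclic cause chain cannot loop forever.
    private static final int MAX_DEPTH = 32;

    // Returns true if the throwable or any of its causes is a NoRouteToHostException.
    static boolean isNoRouteToHost(Throwable t) {
        int depth = 0;
        while (t != null && depth < MAX_DEPTH) {
            if (t instanceof NoRouteToHostException) {
                return true;
            }
            t = t.getCause();
            depth++;
        }
        return false;
    }

    public static void main(String[] args) {
        // A wrapper exception whose root cause is an unreachable host.
        Throwable wrapped = new RuntimeException(
            "Faulty channel in resource pool",
            new NoRouteToHostException("Host is unreachable"));
        System.out.println(isNoRouteToHost(wrapped));
        // An unrelated I/O failure with no such cause.
        System.out.println(isNoRouteToHost(new IOException("connection reset")));
    }
}
```

A caller could use such a check to treat a lost pod as an expected condition while still propagating other I/O failures unchanged.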

Gian Merlino

10/04/2022, 5:22 AM

Makes sense that it can be detected this way: it's not really an HTTP protocol-level error, it's something lower level.

Abhishek Agarwal

10/04/2022, 6:04 AM

curious - which component here is producing 400 for "No route to host" since that isn't technically a 400.

Cory Johannsen

10/04/2022, 4:20 PM

Here is the full exception from the logs

{
  "level": "ERROR",
  "@timestamp": "2022-10-04T16:18:56.598Z",
  "thread": "HttpServerInventoryView-3",
  "message": "failed to get sync response from [http://10.4.154.126:8091/_1664898749884]. Return code [0], Reason: [null]",
  "exception": {
    "exception_class": "io.netty.channel.ChannelException",
    "exception_message": "Faulty channel in resource pool",
    "stacktrace": "io.netty.channel.ChannelException: Faulty channel in resource pool\n\tat org.apache.druid.java.util.http.client.NettyHttpClient.go(NettyHttpClient.java:125)\n\tat org.apache.druid.server.coordination.ChangeRequestHttpSyncer.sync(ChangeRequestHttpSyncer.java:218)\n\tat java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)\n\tat java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)\n\tat java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)\n\tat java.base/java.lang.Thread.run(Thread.java:829)\nCaused by: io.netty.channel.AbstractChannel$AnnotatedNoRouteToHostException: Host is unreachable: /10.4.154.126:8091\nCaused by: java.net.NoRouteToHostException: Host is unreachable\n\tat java.base/sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)\n\tat java.base/sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:777)\n\tat io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:337)\n\tat io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:334)\n\tat io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:710)\n\tat io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:658)\n\tat io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:584)\n\tat io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:496)\n\tat io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:995)\n\tat io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)\n\tat java.base/java.lang.Thread.run(Thread.java:829)\n"
  },
  "hostName": "storage--druid-coordinator-589db54d45-x4ndp"
}

Cory Johannsen

10/04/2022, 4:21 PM

ChangeRequestHttpSyncer is the Druid class making the call.
