# dev
c
The NPE that is produced
java.lang.NullPointerException: null
        at org.apache.druid.client.HttpServerInventoryView$2.toDruidServer(HttpServerInventoryView.java:181) 
        at org.apache.druid.client.HttpServerInventoryView$2.lambda$nodesRemoved$1(HttpServerInventoryView.java:163)
comes from
private DruidServer toDruidServer(DiscoveryDruidNode node)
{
  return new DruidServer(
      node.getDruidNode().getHostAndPortToUse(),
      node.getDruidNode().getHostAndPort(),
      node.getDruidNode().getHostAndTlsPort(),
      ((DataNodeService) node.getServices().get(DataNodeService.DISCOVERY_SERVICE_KEY)).getMaxSize(),
      ((DataNodeService) node.getServices().get(DataNodeService.DISCOVERY_SERVICE_KEY)).getServerType(),
      ((DataNodeService) node.getServices().get(DataNodeService.DISCOVERY_SERVICE_KEY)).getTier(),
      ((DataNodeService) node.getServices().get(DataNodeService.DISCOVERY_SERVICE_KEY)).getPriority()
  );
}
Specifically, `((DataNodeService) node.getServices().get(DataNodeService.DISCOVERY_SERVICE_KEY))` returns null for the `PEON`, as its `services` map is empty.
I am now stuck trying to determine why the `PEON` event is causing the problem.
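For illustration, the failure mode above can be avoided by looking the service up once and treating a missing entry as "no data node service" instead of casting and dereferencing it. This is only a minimal sketch with hypothetical stand-in types (`DataNodeInfo`, `NullSafeServiceLookup`, and the `dataNodeService` key string are assumptions made for the example); it is not the Druid code or the eventual fix.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for illustration; this is not the real Druid DataNodeService.
final class DataNodeInfo {
    final String tier;
    final long maxSize;

    DataNodeInfo(String tier, long maxSize) {
        this.tier = tier;
        this.maxSize = maxSize;
    }
}

final class NullSafeServiceLookup {
    // Assumed key name, chosen for the example.
    static final String DATA_NODE_SERVICE_KEY = "dataNodeService";

    // Looks the service up once and returns empty when it is absent,
    // rather than casting a null and calling a getter on it.
    static Optional<DataNodeInfo> findDataNodeInfo(Map<String, Object> services) {
        Object service = services.get(DATA_NODE_SERVICE_KEY);
        if (service instanceof DataNodeInfo) {
            return Optional.of((DataNodeInfo) service);
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Map<String, Object> indexerServices = new HashMap<>();
        indexerServices.put(DATA_NODE_SERVICE_KEY, new DataNodeInfo("_default_tier", 0L));
        // Mirrors a node announced with an empty services map.
        Map<String, Object> peonServices = Collections.emptyMap();

        System.out.println(findDataNodeInfo(indexerServices).isPresent()); // true
        System.out.println(findDataNodeInfo(peonServices).isPresent());    // false
    }
}
```

A caller could then skip or log a node whose lookup comes back empty, rather than failing with an NPE partway through constructing the server object.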
I have determined that the underlying issue with the k8s extension is that the `PEON` and `INDEXER` services that are announced by an indexer pod when it starts up are both on the same host/port. This results in a collision in the `DruidNodeDiscoveryProvider` `serviceDiscoveryMap`, which uses the host and port string as a key. I patched the k8s extension to prevent the notification of the downstream listeners if a pod deletion arrives for a Druid node with no services. That prevents the issue, but makes me wonder if this `PEON` should even be getting announced. It has no services declared, so it can't be assigned any work. If this `PEON` announcement is the issue, then the real fix is to prevent it. Is it expected that the indexer announces a peon with no services at startup? If yes, what is the purpose of this peon? If no, then I can determine why the k8s extension is making this announcement.
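To make the collision concrete, here is a rough model of what is described above: a map keyed only by the host and port string, so two roles announced from the same pod share one entry, plus a guard that skips listener notification when a removed node has no services. All names (`AnnouncedNode`, `DiscoveryCollisionSketch`, `removalNotifications`) are hypothetical and the behaviour is simplified; it is a sketch of the idea, not the k8s extension's actual code or the patch itself.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model for illustration; not a real Druid class.
final class AnnouncedNode {
    final String hostAndPort;
    final String role;
    final Map<String, String> services;

    AnnouncedNode(String hostAndPort, String role, Map<String, String> services) {
        this.hostAndPort = hostAndPort;
        this.role = role;
        this.services = services;
    }
}

final class DiscoveryCollisionSketch {
    // Keyed by host:port only, so two roles announced from one pod share an entry.
    private final Map<String, AnnouncedNode> nodesByHostAndPort = new HashMap<>();
    // Stands in for the downstream listeners that would be notified of removals.
    final List<String> removalNotifications = new ArrayList<>();

    void nodeAdded(AnnouncedNode node) {
        // A later announcement with the same key silently replaces the earlier one.
        nodesByHostAndPort.put(node.hostAndPort, node);
    }

    void nodeRemoved(AnnouncedNode node) {
        // Guard in the spirit of the described patch: a node with no services
        // is never reported to listeners.
        if (node.services.isEmpty()) {
            return;
        }
        nodesByHostAndPort.remove(node.hostAndPort);
        removalNotifications.add(node.role + "@" + node.hostAndPort);
    }

    int size() {
        return nodesByHostAndPort.size();
    }

    String roleAt(String hostAndPort) {
        AnnouncedNode node = nodesByHostAndPort.get(hostAndPort);
        return node == null ? null : node.role;
    }

    public static void main(String[] args) {
        DiscoveryCollisionSketch sketch = new DiscoveryCollisionSketch();
        AnnouncedNode indexer = new AnnouncedNode(
            "10.0.96.19:8091", "INDEXER",
            Collections.singletonMap("dataNodeService", "indexer-executor"));
        AnnouncedNode peon = new AnnouncedNode(
            "10.0.96.19:8091", "PEON", Collections.<String, String>emptyMap());

        sketch.nodeAdded(indexer);
        sketch.nodeAdded(peon);
        System.out.println(sketch.size());                    // 1
        System.out.println(sketch.roleAt("10.0.96.19:8091")); // PEON

        sketch.nodeRemoved(peon);
        System.out.println(sketch.removalNotifications.size()); // 0
    }
}
```

Keying on role plus host and port would avoid the overwrite in this model, but whether that is appropriate for the real provider is exactly the open question in this thread.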
At indexer startup the coordinator sees the following:
2023-01-11T22:02:02.433753502Z [K8sDruidNodeDiscoveryProvider-ListenerExecutor] Node[DiscoveryDruidNode{druidNode=DruidNode{serviceName='druid/indexer', host='10.0.183.191', bindOnHost=false, port=-1, plaintextPort=8091, enablePlaintextPort=true, tlsPort=-1, enableTlsPort=false}, nodeRole='PEON', services={}}] discovered but doesn't have service[lookupNodeService]. Ignored.
2023-01-11T22:02:02.434313196Z [K8sDruidNodeDiscoveryProvider-ListenerExecutor] Node[DiscoveryDruidNode{druidNode=DruidNode{serviceName='druid/indexer', host='10.4.151.172', bindOnHost=false, port=-1, plaintextPort=8091, enablePlaintextPort=true, tlsPort=-1, enableTlsPort=false}, nodeRole='PEON', services={}}] discovered but doesn't have service[lookupNodeService]. Ignored.
This indicates that the peon that has been announced is essentially invalid, which leads me to believe that either the peon announcement should not occur or it is missing information.
[main] Announcing DiscoveryDruidNode[DiscoveryDruidNode{druidNode=DruidNode{serviceName='druid/indexer', host='10.0.96.19', bindOnHost=false, port=-1, plaintextPort=8091, enablePlaintextPort=true, tlsPort=-1, enableTlsPort=false}, nodeRole='INDEXER', services={dataNodeService=DataNodeService{tier='_default_tier', maxSize=0, serverType=indexer-executor, priority=0}, workerNodeService=WorkerNodeService{ip='storage--druid-indexer-577957494-zhrhs', capacity=128, version='0', category='_default_worker_category'}, lookupNodeService=LookupNodeService{lookupTier='__default'}}}]
[main] Announced DiscoveryDruidNode[DiscoveryDruidNode{druidNode=DruidNode{serviceName='druid/indexer', host='10.0.96.19', bindOnHost=false, port=-1, plaintextPort=8091, enablePlaintextPort=true, tlsPort=-1, enableTlsPort=false}, nodeRole='INDEXER', services={dataNodeService=DataNodeService{tier='_default_tier', maxSize=0, serverType=indexer-executor, priority=0}, workerNodeService=WorkerNodeService{ip='storage--druid-indexer-577957494-zhrhs', capacity=128, version='0', category='_default_worker_category'}, lookupNodeService=LookupNodeService{lookupTier='__default'}}}]
[main] Announcing DiscoveryDruidNode[DiscoveryDruidNode{druidNode=DruidNode{serviceName='druid/indexer', host='10.0.96.19', bindOnHost=false, port=-1, plaintextPort=8091, enablePlaintextPort=true, tlsPort=-1, enableTlsPort=false}, nodeRole='PEON', services={}}]
[main] Announced DiscoveryDruidNode[DiscoveryDruidNode{druidNode=DruidNode{serviceName='druid/indexer', host='10.0.96.19', bindOnHost=false, port=-1, plaintextPort=8091, enablePlaintextPort=true, tlsPort=-1, enableTlsPort=false}, nodeRole='PEON', services={}}]
Above are the announcements inside the indexer. Note that the `PEON` announcements have `services={}`.
s
Definitely good to fix this. My understanding is limited, but aren't Peons supposed to communicate on different ports, configured with `druid.indexer.runner.startPort` and `druid.indexer.runner.endPort`? I'm not sure what the Peon announcement is for. On a side note, have you looked at the new MM-less K8s extension `druid-kubernetes-overlord-extensions` to avoid the need for MMs completely? It's new and experimental, but sounds like it would fit your deployment well.
c
I tried setting the port range for the runners explicitly to `8100-8103`, but the announced peon still uses `8091`, which is why I'm really suspicious of it.
I am super interested in this extension, I'm definitely going to look into running it
I have created a PR to upstream master to patch the bug in the kubernetes extensions and would greatly appreciate a review https://github.com/apache/druid/pull/13667
a
@Nick Lippis - FYI
"I have determined that the underlying issue with the k8s extension is that the `PEON` and `INDEXER` services that are announced by an indexer pod when it starts up are both on the same host/port."
why does this happen, though? two services can't start on the same host and port
c
I would love to understand that!
This is the announcement sequence inside the indexer
[main] Announcing DiscoveryDruidNode[DiscoveryDruidNode{druidNode=DruidNode{serviceName='druid/indexer', host='10.0.206.35', bindOnHost=false, port=-1, plaintextPort=8091, enablePlaintextPort=true, tlsPort=-1, enableTlsPort=false}, nodeRole='INDEXER', services={dataNodeService=DataNodeService{tier='_default_tier', maxSize=0, serverType=indexer-executor, priority=0}, workerNodeService=WorkerNodeService{ip='storage--druid-indexer-76c796f796-q4mlf', capacity=128, version='0', category='_default_worker_category'}, lookupNodeService=LookupNodeService{lookupTier='__default'}}}]

[main] Announced DiscoveryDruidNode[DiscoveryDruidNode{druidNode=DruidNode{serviceName='druid/indexer', host='10.0.206.35', bindOnHost=false, port=-1, plaintextPort=8091, enablePlaintextPort=true, tlsPort=-1, enableTlsPort=false}, nodeRole='INDEXER', services={dataNodeService=DataNodeService{tier='_default_tier', maxSize=0, serverType=indexer-executor, priority=0}, workerNodeService=WorkerNodeService{ip='storage--druid-indexer-76c796f796-q4mlf', capacity=128, version='0', category='_default_worker_category'}, lookupNodeService=LookupNodeService{lookupTier='__default'}}}]

[main] Announcing DiscoveryDruidNode[DiscoveryDruidNode{druidNode=DruidNode{serviceName='druid/indexer', host='10.0.206.35', bindOnHost=false, port=-1, plaintextPort=8091, enablePlaintextPort=true, tlsPort=-1, enableTlsPort=false}, nodeRole='PEON', services={}}]

[main] Announced DiscoveryDruidNode[DiscoveryDruidNode{druidNode=DruidNode{serviceName='druid/indexer', host='10.0.206.35', bindOnHost=false, port=-1, plaintextPort=8091, enablePlaintextPort=true, tlsPort=-1, enableTlsPort=false}, nodeRole='PEON', services={}}]
Note that I have the peon port explicitly set to `8100`:
druid_indexer_runner_startPort: "8100"
druid_indexer_runner_endPort: "8100"
I'm trying to determine where in the source the indexer process is getting started so I can debug why the port is incorrect for the peon.
a
can you attach the full logs of indexer?
I tried this locally and I don't see an announcement for `peon`. I tried with curator node discovery but that shouldn't matter. What kind of indexing task are you running?
c
There are no tasks running, this occurs at startup.
Above are the full logs, including the announcement of the peon with no services.
a
@Cory Johannsen - FYI I recently learned that the problem you encountered is fixed by https://github.com/apache/druid/pull/12640 (24.0 release)
@Clint Wylie also added a safety check anyway here in this PR - https://github.com/apache/druid/pull/13930