This message was deleted Apache Druid #troubleshooting



# troubleshooting

Slackbot

06/13/2023, 10:31 AM

This message was deleted.

Tejas Parbat

06/13/2023, 11:14 AM

This alert means that the Coordinator has been wanting to add a replica of a segment for at least `replicantLifetime` runs (by default: 15 runs) and has not been able to. By default, Coordinator runs occur once a minute, so that's 15 minutes. However, this can be overridden by setting `druid.coordinator.period` to something else. If an environment does happen to have a more aggressive setting for `druid.coordinator.period`, then I would suggest either restoring the default setting or increasing `replicantLifetime` to make the alerting less sensitive. Search this page for `replicantLifetime` and `druid.coordinator.period` for more details: https://druid.apache.org/docs/latest/configuration/index.html

Also check whether there are any issues with the data nodes: are they going down often?
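The alert window described here is just the number of runs multiplied by the run period. A minimal sketch, assuming the Coordinator keeps to its configured period (a run that takes longer than the period would stretch the real window); the function name `replicant_alert_window` is hypothetical:

```python
from datetime import timedelta


def replicant_alert_window(replicant_lifetime: int, coordinator_period: timedelta) -> timedelta:
    """Rough wall-clock time a missing replica can wait before the alert fires.

    The Coordinator counts runs, not minutes, so the window is
    replicantLifetime runs multiplied by one druid.coordinator.period.
    """
    if replicant_lifetime < 1:
        raise ValueError("replicantLifetime must be a positive number of runs")
    return coordinator_period * replicant_lifetime


# Defaults: 15 runs, one run per minute.
print(replicant_alert_window(15, timedelta(seconds=60)))  # 0:15:00

# A more aggressive period (e.g. 30 seconds) halves the window...
print(replicant_alert_window(15, timedelta(seconds=30)))  # 0:07:30
# ...unless replicantLifetime is raised to compensate.
print(replicant_alert_window(30, timedelta(seconds=30)))  # 0:15:00
```

Keeping the window constant means scaling `replicantLifetime` inversely with the period, which is why either restoring the default period or raising the run count works.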

victor regalado

06/13/2023, 7:53 PM

The data nodes are not going down. We are using the default for `druid.coordinator.period`, which is 1 minute. I decreased the replicant lifetime to 3 to see if this would decrease the number of errors. We are also experiencing this issue: https://apachedruidworkspace.slack.com/archives/C0309C9L90D/p1686513388513769 This is the distribution of errors.
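`replicantLifetime` is a Coordinator dynamic configuration property, so it can be changed at runtime through the Coordinator's dynamic config endpoint (`/druid/coordinator/v1/config`) instead of editing properties files. The sketch below reads the current config, changes only `replicantLifetime`, and POSTs the result back. The address in `COORDINATOR_URL` and both helper names are hypothetical; adjust them for your cluster and check the dynamic configuration docs for your Druid version. Note that at a one-minute period, 3 runs is a window of roughly 3 minutes, so lowering the value makes the alert fire sooner; raising it is what makes alerting less sensitive.

```python
import json
import urllib.request

# Hypothetical address: point this at your own Coordinator.
COORDINATOR_URL = "http://localhost:8081"
DYNAMIC_CONFIG_PATH = "/druid/coordinator/v1/config"


def with_replicant_lifetime(config: dict, runs: int) -> dict:
    """Return a copy of a dynamic config dict with replicantLifetime changed."""
    if runs < 1:
        raise ValueError("replicantLifetime must be at least 1 run")
    updated = dict(config)  # shallow copy; the caller's dict is left untouched
    updated["replicantLifetime"] = runs
    return updated


def set_replicant_lifetime(runs: int, base_url: str = COORDINATOR_URL) -> int:
    """GET the current dynamic config, change one field, POST it back.

    Returns the HTTP status code of the POST.
    """
    url = base_url + DYNAMIC_CONFIG_PATH
    with urllib.request.urlopen(url) as resp:
        current = json.load(resp)
    body = json.dumps(with_replicant_lifetime(current, runs)).encode("utf-8")
    req = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

For example, `set_replicant_lifetime(30)` would give roughly a 30-minute window at the default one-minute period. Reading the full config first and sending it back avoids depending on whether a partial POST is merged with the existing settings.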

victor regalado

06/13/2023, 7:59 PM

I just noticed that my Coordinator heap is much smaller. I'm going to increase its heap to see if it helps.


You can set the Coordinator heap to the same size as your Broker heap, or slightly smaller: both services have to process cluster-wide state and answer API requests about this state.
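To compare the two heaps quickly, one option is to read `-Xmx` from each service's `jvm.config` (Druid's example configs list one JVM flag per line in that file). A minimal sketch; the helper names and the 0.75 ratio standing in for "slightly smaller" are assumptions, not Druid guidance:

```python
import re

# Multipliers for the size suffixes accepted in -Xmx (none, k, m, g, t).
_UNITS = {"": 1, "k": 1024, "m": 1024**2, "g": 1024**3, "t": 1024**4}
_XMX = re.compile(r"^-Xmx(\d+)([kKmMgGtT]?)$")


def max_heap_bytes(jvm_config: str) -> int:
    """Return the -Xmx value (in bytes) from jvm.config-style text."""
    heap = None
    for line in jvm_config.splitlines():
        match = _XMX.match(line.strip())
        if match:
            # Keep the last occurrence if the flag is repeated.
            heap = int(match.group(1)) * _UNITS[match.group(2).lower()]
    if heap is None:
        raise ValueError("no -Xmx flag found")
    return heap


def coordinator_heap_ok(coordinator_cfg: str, broker_cfg: str, min_ratio: float = 0.75) -> bool:
    """True if the Coordinator heap is at least min_ratio of the Broker heap.

    min_ratio = 0.75 is an assumed reading of "slightly smaller".
    """
    return max_heap_bytes(coordinator_cfg) >= min_ratio * max_heap_bytes(broker_cfg)


coordinator = "-server\n-Xms2g\n-Xmx2g\n"
broker = "-server\n-Xms12g\n-Xmx12g\n"
print(coordinator_heap_ok(coordinator, broker))  # False
```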

victor regalado

06/13/2023, 11:19 PM

Well, I don't think that worked 🙂
