# troubleshooting
s
Hi team, I'm experiencing some issues in pinot `0.10.0` trying to rebalance an offline table after some offline-servers reached `dead` state (and have been replaced with new, healthy nodes). Was hoping to get some extra 👀 on it.
I started by dropping tags on the dead servers. I can see the ideal state of the table still contains the dead servers. I attempted to rebalance servers w/ reassign instances + no downtime, and can see the new proposed `targetAssignment` removes the dead nodes. The controller logs seem initially healthy before getting blocked in this wait and eventually timing out:
```
WARN [TableRebalancer] [jersey-server-managed-async-executor-6] Caught exception while waiting for ExternalView to converge for table: daily_user_metrics_by_channel_enterprise_bucketed_OFFLINE, aborting the rebalance
java.util.concurrent.TimeoutException: Timeout while waiting for ExternalView to converge
```
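(For reference, a rough sketch of the "drop tags, then inspect the ideal state" steps described above, against the controller REST API. The controller URL and server instance names are placeholders, and how tags are dropped may vary by Pinot version.)

```python
import requests

CONTROLLER = "http://pinot-controller:9000"  # placeholder controller address
TABLE = "daily_user_metrics_by_channel_enterprise_bucketed"
DEAD_SERVERS = ["Server_dead-node-0_8098", "Server_dead-node-1_8098"]  # placeholder instance names

# Drop the tenant tags on the dead servers so they are no longer eligible for assignment.
for instance in DEAD_SERVERS:
    resp = requests.put(
        f"{CONTROLLER}/instances/{instance}/updateTags",
        params={"tags": ""},  # assumption: an empty tag list drops the existing tags
    )
    resp.raise_for_status()

# The ideal state will still reference the dead servers until a rebalance
# succeeds and persists the new assignment.
ideal_state = requests.get(f"{CONTROLLER}/tables/{TABLE}/idealstate").json()
print(ideal_state)
```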
Any tips on further debugging why `ExternalView` is not converging on `IdealState`?
Also, at what point is the updated Ideal State persisted in Zookeeper? I still see Zookeeper holding the Ideal State that contains the dead nodes after running rebalance. Wondering if `ExternalView` is failing to converge on `IdealState` due to `IdealState` failing to persist its update and still referencing the dead nodes? 🤔
m
Has EV converged with IS now @Scott deRegt?
s
It has not. Due to the Timeout exception on the rebalance operation's `waitForExternalViewToConverge`, IS is not getting updated. Therefore, the current IS still includes the dead nodes, so it's impossible for EV to converge on it.
m
What’s the replication?
s
I have tested with 2 tables, one with replication of 2 and the other with 3.
m
And both have this issue?
s
Yes, that's correct. The logs above are for the table with rep factor of 3. I also confirmed for this table that the 2 dead offline servers in the cluster are in different logical instance assignment groups, meaning every segment of this table has at least 2 of its 3 replicas available.
For Rebalance Servers, I'm using Minimum Available Replicas = 1.
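(Roughly how that rebalance can be triggered against the controller REST API with reassign instances, no downtime, and min available replicas of 1. The controller URL is a placeholder, and parameter names may differ slightly between Pinot versions.)

```python
import requests

CONTROLLER = "http://pinot-controller:9000"  # placeholder controller address
TABLE = "daily_user_metrics_by_channel_enterprise_bucketed"

# Rebalance the OFFLINE table, keeping at least 1 replica of every segment
# available while segments are moved.
resp = requests.post(
    f"{CONTROLLER}/tables/{TABLE}/rebalance",
    params={
        "type": "OFFLINE",
        "dryRun": "false",            # run with "true" first to inspect the proposed assignment
        "reassignInstances": "true",
        "downtime": "false",
        "minAvailableReplicas": "1",
    },
)
resp.raise_for_status()
print(resp.json())  # rebalance status plus the target assignment
```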
m
The code you pointed to is not for writing the ideal state during rebalance; it is for detecting a change in the ideal state while the rebalance is happening. So the IS should have been updated with the dead nodes removed.
In the dry run do the dead nodes appear in the new IS returned?
Also, just want to ensure you are following this guide for removing dead nodes: https://docs.pinot.apache.org/operators/operating-pinot/rebalance/rebalance-servers
s
In dry run, the dead nodes do not appear in the new IS returned. Similarly, in the controller logs, the `targetAssignment` does not include the dead nodes in this log message when running w/ dry run = false.
m
What's the current IS and EV, and do they differ (if so, what's the difference)? Wondering if they differ even before the rebalance (for some reason) and are unable to converge, which causes the timeout even before the new rebalanced IS could be applied.
s
The only difference is the IS still contains the dead nodes and the EV does not. I do not see any segments in `ERROR` status.
Will DM you a sample diff between IS and EV.
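(A minimal sketch of pulling and diffing the IS and EV from the controller's view endpoints, assuming the responses are keyed by table type, then by segment, with server-to-state maps underneath. The controller URL is a placeholder.)

```python
import requests

CONTROLLER = "http://pinot-controller:9000"  # placeholder controller address
TABLE = "daily_user_metrics_by_channel_enterprise_bucketed"

ideal_state = requests.get(f"{CONTROLLER}/tables/{TABLE}/idealstate").json()
external_view = requests.get(f"{CONTROLLER}/tables/{TABLE}/externalview").json()

# Assumption: each response looks like {"OFFLINE": {segment: {server: state}}}.
is_map = ideal_state.get("OFFLINE", {})
ev_map = external_view.get("OFFLINE", {})

# Print only the segments whose server/state maps disagree between IS and EV.
for segment, is_servers in sorted(is_map.items()):
    ev_servers = ev_map.get(segment, {})
    if is_servers != ev_servers:
        print(f"{segment}:\n  IS: {is_servers}\n  EV: {ev_servers}")
```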
m
sounds good
In the meantime, if this is for testing, could you try with downtime? I am wondering if the code thinks there is no way to move forward without downtime.
s
Yes, could test with downtime in this case. We do want to have a good handle on how to recover from a lost server node (expecting this to be a fairly common occurrence) w/o downtime as we get ready to go live in a production env.
m
After offline sync, here's what's happening:
• In 0.10, the rebalance code first waits for the existing EV to converge with the existing (pre-rebalance) IS to drain any prior changes, before starting the rebalance.
• If a node goes down, IS and EV won't converge and the code times out.
In 0.11, this has been addressed as follows:
• When rebalancing, choose "downtime" as well as the newly added option "minimizeDataMovement".
• If the dead servers have been replaced with healthy ones and tagged appropriately before doing the above, this will do the rebalance without downtime (even though you had to choose it).
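(Under those 0.11+ recommendations, the rebalance call would look roughly like this; the controller URL is a placeholder, and the minimizeDataMovement parameter only exists on 0.11 and later.)

```python
import requests

CONTROLLER = "http://pinot-controller:9000"  # placeholder controller address
TABLE = "daily_user_metrics_by_channel_enterprise_bucketed"

# On Pinot 0.11+: rebalance with downtime=true and minimizeDataMovement=true.
# If the dead servers were already replaced with healthy, correctly tagged nodes,
# this completes without actual downtime despite the flag.
resp = requests.post(
    f"{CONTROLLER}/tables/{TABLE}/rebalance",
    params={
        "type": "OFFLINE",
        "dryRun": "false",
        "reassignInstances": "true",
        "downtime": "true",
        "minimizeDataMovement": "true",
    },
)
resp.raise_for_status()
print(resp.json())
```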
s
So in 0.10, there is no way to recover and re-achieve good table status after a lost server node w/ no downtime?
Does this also require bootstrap to be toggled so that we start with an empty instance assignment and ignore the current pre-rebalance IS?
m
There might be a way, let me dm.
s
Leaving this here for reference: the other best practice / proposed solution by @Kishore is to use consistent logical server names, such that if there is a hardware failure and a node is replaced, the new node gets the same logical server name. That way the existing Ideal State remains valid and the External View is able to converge on it.
l
Hi @Scott deRegt, did you find a way to do this? I'm facing the same issue now.
s
In our case, upgrading to Pinot `0.11.0`+ and doing a table rebalance with `downtime=true` and `minimizeDataMovement=true` worked (as mentioned here). Without more information on your cluster setup, I cannot confidently say whether or not that will solve your problem though.