# troubleshooting

Elon

02/23/2022, 5:04 AM
Hi, we noticed that the download URL for all segments in our upsert tables is null, but the segments exist in deepstore. Has anyone ever seen that issue before? I'm thinking of manually updating all the segment metadata in ZK, since they are realtime segments. lmk if that is how you resolve this.

Mayank

02/23/2022, 5:05 AM
Hmm, this seems quite odd. Personally, I haven’t seen this. Are there any errors in the logs during segment commit?

Elon

02/23/2022, 5:06 AM
I will search for it. Since we're using peer download, should we look in the server logs or the controller logs?
Also, we rebalanced the upsert tables after scaling up a tenant, not sure if that had something to do with it.
thanks for responding @Mayank! I will search the logs and see what I can find
Seeing messages like this:
```
2022/02/20 02:28:50.884 WARN [PinotFSSegmentUploader] [enriched_station_orders_v1_16_2_upsert__4__81__20220219T0227Z] Failed to upload file /var/pinot/server/data/index/enriched_station_orders_v1_16_2_upsert_REALTIME/enriched_station_orders_v1_16_2_upsert__4__81__20220219T0227Z.tar.gz of segment enriched_station_orders_v1_16_2_upsert__4__81__20220219T0227Z for table java.util.concurrent.TimeoutException
```
But it seems to have uploaded the files
I will download a few just to check them
It seems that all the upsert table segments have that error, but we have realtime tables that do not have the error
Is there a way to force Pinot to retry uploading the segments from the servers on the upsert tenant?
I see the timeout is hardcoded. I can try to increase it, but the same cluster has servers which upload all segments within 10 seconds:
```
PinotFSSegmentUploader.DEFAULT_SEGMENT_UPLOAD_TIMEOUT_MILLIS
```

Mayank

02/23/2022, 5:45 AM
@Yupeng Fu any ideas ^^

Elon

02/23/2022, 5:45 AM
I extrapolated the segment upload time from some of our metrics. Maybe GCS is slow for us; it's only 225 MB per segment, so it shouldn't take 10 seconds 🤷‍♂️
I actually see the file there on GCS

Yupeng Fu

02/23/2022, 6:25 AM
If it times out and fails to upload, then isn't a missing URL expected?

Mayank

02/23/2022, 2:57 PM
But @Yupeng Fu, Elon mentioned that the segment was indeed pushed to deepstore (or at least exists there).

Yupeng Fu

02/23/2022, 4:15 PM
Well, it's possible that the file was uploaded but the client side failed to receive the response in time.
Also, I think the segment upload flow is the same for non-upsert tables, so I don't see anything special with upsert.
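
For readers following along, here is a minimal sketch of how a file can land in the store while the caller still ends up with no URL. It assumes the uploader waits on a `Future` with a timeout and does not cancel the task when the wait expires; that is consistent with the `TimeoutException` in the log above, but it is an illustration rather than Pinot's actual code. The class name, method name, and `gs://example-bucket/...` URI are all hypothetical.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical illustration only; none of these names come from Pinot.
class UploadTimeoutSketch {

    // Simulates a copy that takes copyMillis while the caller waits at most timeoutMillis.
    // Returns the (made-up) segment URI, or null if the caller gave up waiting.
    static String uploadWithTimeout(long copyMillis, long timeoutMillis, AtomicBoolean fileLanded)
            throws InterruptedException {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Future<String> future = executor.submit(() -> {
                Thread.sleep(copyMillis);   // stand-in for the slow copy to deep store
                fileLanded.set(true);       // the file now exists in the store
                return "gs://example-bucket/segment.tar.gz";
            });
            try {
                return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            } catch (TimeoutException | ExecutionException e) {
                // The caller stops waiting and records no URL.
                // The task is not cancelled, so the copy keeps running.
                return null;
            }
        } finally {
            // shutdown() lets the in-flight task finish; wait for it so the demo is deterministic.
            executor.shutdown();
            executor.awaitTermination(5, TimeUnit.SECONDS);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicBoolean landed = new AtomicBoolean(false);
        String url = uploadWithTimeout(300, 50, landed);
        System.out.println("url=" + url + ", fileLanded=" + landed.get());
    }
}
```

With a copy time longer than the timeout, the caller gets `null` even though the simulated file does land, which matches the symptom of a null download URL with the segment present in deepstore.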

Elon

02/23/2022, 4:28 PM
Thanks! I'm trying with an increased timeout
FYI, @Yupeng Fu @Mayank @Kishore G - I increased the timeout in the Pinot FS segment uploader and verified that we no longer get those timeouts or null segment URLs
Would it be worth it to add a config? I think running in the cloud adds some latency for us.
lmk, I would be happy to do it if no one is already working on it
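
As a rough idea of what such a config could look like, here is a sketch that reads the timeout from a property and falls back to a default. The property key is hypothetical, the 10 second default is an assumption based on the figure mentioned earlier in this thread, and plain `java.util.Properties` stands in for whatever configuration class Pinot actually uses.

```java
import java.util.Properties;

// Hypothetical sketch; the key name and default below are assumptions, not Pinot's config.
class UploadTimeoutConfigSketch {
    static final String TIMEOUT_KEY = "segment.upload.timeout.millis"; // hypothetical key
    static final long DEFAULT_TIMEOUT_MILLIS = 10_000L;                // assumed default (10 s)

    // Returns the configured timeout, or the default when the value is missing,
    // blank, not a number, or not positive.
    static long resolveTimeoutMillis(Properties props) {
        String raw = props.getProperty(TIMEOUT_KEY);
        if (raw == null || raw.trim().isEmpty()) {
            return DEFAULT_TIMEOUT_MILLIS;
        }
        try {
            long value = Long.parseLong(raw.trim());
            return value > 0 ? value : DEFAULT_TIMEOUT_MILLIS;
        } catch (NumberFormatException e) {
            return DEFAULT_TIMEOUT_MILLIS;
        }
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty(TIMEOUT_KEY, "60000");
        System.out.println(resolveTimeoutMillis(props));
    }
}
```

Falling back to the default on bad input keeps existing clusters on their current behavior unless the operator sets a valid value.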

Mayank

02/24/2022, 9:50 PM
Hey @Elon, yeah, go ahead and file the PR (or issue). If anyone has concerns (hope not), we can discuss there.

Elon

02/24/2022, 9:51 PM
sure, sounds good :)
sorry to bug you about something else: I posted about setting batch message mode, whenever you have some time
just posted in main thread