
Harold Lim

02/02/2021, 6:07 PM
I have a Pinot setup using Helm charts and have set up a realtime table. The server ran out of disk space and got stuck at the last segment (not consuming any new data). I increased the disk size and restarted the Pinot server, and in the web UI the last segment's status was shown as BAD. So I deleted the segment from the UI, but it still hasn't started consuming new data. Is there a way around this?

Will Briggs

02/02/2021, 6:36 PM
Is it the server or the controller that ran out of space? I have seen this behavior when the server cannot push completed segments to the controller.

Harold Lim

02/02/2021, 7:03 PM
It's the server.

Xiang Fu

02/02/2021, 7:39 PM
After you restarted the Pinot server, did you exec into it and run df to check whether the new disk size is reflected?
If there is enough disk, restarting the Pinot server should resume consumption.

Harold Lim

02/02/2021, 9:25 PM
I increased it to 20G and df shows the increase: /dev/sdb ext4 19G 3.4G 16G 18% /var/pinot/server/data
But I don't see new segments getting created or data being consumed.
Originally, after the restart, the UI showed segment 0__12 as BAD. I tried reloading the segment; it didn't help. Then I tried deleting it, so segment 0__12 is now gone, but consumption doesn't seem to have recovered.
Segments 0__0 to 0__11 show status DONE.

Xiang Fu

02/02/2021, 11:14 PM
hmmm, will it recover if the consuming segment got deleted? @Neha Pawar

Neha Pawar

02/02/2021, 11:18 PM
Yes, the validation manager will create the new CONSUMING segment.
The validation manager runs every 15 minutes, I think, so it should've recovered already.

Harold Lim

02/02/2021, 11:41 PM
It seems that it hasn't recovered. Any steps that I need to verify?

Neha Pawar

02/02/2021, 11:44 PM
what does this API return for that partition:
curl -i -X GET "http://<controller host>:<controller port>/tables/<your table name>/consumingSegmentsInfo"

Harold Lim

02/02/2021, 11:46 PM
seems to be empty
Pinot-Controller-Host: pinot-controller-0.pinot-controller-headless.pinot-quickstart.svc.cluster.local
Pinot-Controller-Version: Unknown
Access-Control-Allow-Origin: *
Content-Type: application/json
Content-Length: 33

{"_segmentToConsumingInfoMap":{}}

Neha Pawar

02/02/2021, 11:46 PM
and can you share the ideal state for the table?

Harold Lim

02/02/2021, 11:47 PM
I'm still new to Pinot. What do you mean by ideal state?

Neha Pawar

02/02/2021, 11:48 PM
try this
curl -i -X GET "http://<controller host>:<controller port>/tables/<your table name>/idealstate"

Harold Lim

02/02/2021, 11:49 PM
got it.
HTTP/1.1 200 OK
Pinot-Controller-Host: pinot-controller-0.pinot-controller-headless.pinot-quickstart.svc.cluster.local
Pinot-Controller-Version: Unknown
Access-Control-Allow-Origin: *
Content-Type: application/json
Content-Length: 1615

{"OFFLINE":null,"REALTIME":{"prometheus__0__0__20210130T1836Z":{"Server_pinot-server-0.pinot-server-headless.pinot-quickstart.svc.cluster.local_8098":"ONLINE"},"prometheus__0__10__20210201T1730Z":{"Server_pinot-server-0.pinot-server-headless.pinot-quickstart.svc.cluster.local_8098":"ONLINE"},"prometheus__0__11__20210201T1751Z":{"Server_pinot-server-0.pinot-server-headless.pinot-quickstart.svc.cluster.local_8098":"ONLINE"},"prometheus__0__1__20210131T0716Z":{"Server_pinot-server-0.pinot-server-headless.pinot-quickstart.svc.cluster.local_8098":"ONLINE"},"prometheus__0__2__20210131T0737Z":{"Server_pinot-server-0.pinot-server-headless.pinot-quickstart.svc.cluster.local_8098":"ONLINE"},"prometheus__0__3__20210131T0759Z":{"Server_pinot-server-0.pinot-server-headless.pinot-quickstart.svc.cluster.local_8098":"ONLINE"},"prometheus__0__4__20210131T0821Z":{"Server_pinot-server-0.pinot-server-headless.pinot-quickstart.svc.cluster.local_8098":"ONLINE"},"prometheus__0__5__20210131T0842Z":{"Server_pinot-server-0.pinot-server-headless.pinot-quickstart.svc.cluster.local_8098":"ONLINE"},"prometheus__0__6__20210131T0904Z":{"Server_pinot-server-0.pinot-server-headless.pinot-quickstart.svc.cluster.local_8098":"ONLINE"},"prometheus__0__7__20210131T0926Z":{"Server_pinot-server-0.pinot-server-headless.pinot-quickstart.svc.cluster.local_8098":"ONLINE"},"prometheus__0__8__20210131T0948Z":{"Server_pinot-server-0.pinot-server-headless.pinot-quickstart.svc.cluster.local_8098":"ONLINE"},"prometheus__0__9__20210131T1009Z":{"Server_pinot-server-0.pinot-server-headless.pinot-quickstart.svc.cluster.local_8098":"ONLINE"}}}
It was at 0__12 before, but then the server ran out of disk space. After resizing the PVC and restarting Pinot, the segment's state went BAD, so I deleted it and restarted the server again.

Neha Pawar

02/02/2021, 11:51 PM
Do you have access to the cluster manager UI? Can you go to the Zookeeper browser tab and check whether you see segment 0__12 under PROPERTY_STORE/SEGMENTS/<tableName>?
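If the cluster manager UI isn't handy, a similar check can be made from the command line. This is a sketch only: the /segments/.../metadata path is assumed from the controller's standard REST interface, and the placeholders follow the curl style used elsewhere in this thread.

```shell
# Returns the segment's ZK metadata as JSON if it exists; a 404 or empty
# result corresponds to the segment being missing from the property store.
curl -i -X GET "http://<controller host>:<controller port>/segments/<your table name>_REALTIME/<segment name>/metadata"
```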

Harold Lim

02/02/2021, 11:54 PM
I do see 0__12 on the local disk of the server though, in /var/pinot/server/data/index/<table name>/_tmp
The property store doesn't have it.

Neha Pawar

02/03/2021, 12:00 AM
Hmm, deleting the segment removed the metadata, so it got into a state where the partition is unable to recover. I don't know if we can do anything other than manually adding the metadata back. @Subbu Subramaniam any ideas how to recover from this?

Subbu Subramaniam

02/03/2021, 12:14 AM
Best to let the validation manager do its job. Check to see if you have disabled it. The controller logs should indicate something about the realtime validation manager.

Neha Pawar

02/03/2021, 12:14 AM
this case cannot be solved by validation manager

Subbu Subramaniam

02/03/2021, 12:16 AM
why not?

Neha Pawar

02/03/2021, 12:17 AM
just traced it. we will log
"Got unexpected status: {} in segment ZK metadata for segment: {}"
and do nothing

Subbu Subramaniam

02/03/2021, 12:17 AM
The segment has disappeared from idealstate as well as metadata, right?
Is that for segment 12 or 11?

Neha Pawar

02/03/2021, 12:18 AM
segment 0 to 11 have status DONE and ONLINE
segment 12 is nowhere

Subbu Subramaniam

02/03/2021, 12:19 AM
The log message should indicate the segment name. What is it saying?
Also, it may be easier and faster to just drop the table and recreate it.
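Since drop-and-recreate came up, here is a sketch of that path over the controller REST API. The endpoints are assumed from the controller's standard interface, and the placeholders follow the other commands in this thread; save the table config before the DELETE.

```shell
# Drop the realtime table -- its segments and ZK metadata go with it.
curl -X DELETE "http://<controller host>:<controller port>/tables/<your table name>?type=realtime"

# Recreate it from a saved copy of the table config.
curl -X POST -H "Content-Type: application/json" -d @table-config.json \
  "http://<controller host>:<controller port>/tables"
```

After recreation, consumption restarts according to the table's stream offset setting rather than resuming from the lost segment.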

Neha Pawar

02/03/2021, 12:20 AM
I don't know what it is saying, but I'm sure we are not handling this. Check it 🙂
Line 1024 of PinotLLCRealtimeSegmentManager.

Subbu Subramaniam

02/03/2021, 12:21 AM
You mean to say we print {} and not the actual segment name in the log message?

Neha Pawar

02/03/2021, 12:21 AM
That is not what I'm saying.
What I'm saying is: we don't handle this case, and we will not recover.

Subbu Subramaniam

02/03/2021, 12:22 AM
Are you saying we don't handle the case where the last segment is missing from idealstate and metadata?
That definitely used to work, so it is a recently introduced bug.
We do have test cases to cover that, afaik.

Neha Pawar

02/03/2021, 12:26 AM
Looks like it was always like this.
Anyway, @Matt is it possible to recreate the table? We can fix this on our end.
@Harold Lim can you restart the controller? Maybe you have disabled the RealtimeValidationManager?
controller.realtime.segment.validation.frequencyInSeconds
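For reference, that knob lives in the controller config. A sketch of the relevant entry follows; the 900-second value only illustrates the roughly-15-minute cadence mentioned above and is not a verified default.

```properties
# Controller config: how often the realtime segment validation manager runs.
# Setting this very high (or disabling the task) would prevent automatic
# recreation of a missing CONSUMING segment.
controller.realtime.segment.validation.frequencyInSeconds=900
```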

Harold Lim

02/03/2021, 12:32 AM
I already tried restarting both controller and server earlier. It didn't work.
I just stepped out. Let me check that when I get back

Neha Pawar

02/03/2021, 12:35 AM
It is not the same case, @Subbu Subramaniam. That test deletes the CONSUMING segment and also makes the previous segment CONSUMING; it simulates a failure in step 2 exactly.
But in this case the previous segment is ONLINE.
Why don't you trace it in the validator?

KY

02/03/2021, 12:35 AM
Apart from the problem at hand with the BAD segment, how is space management done for realtime segments backed by a persistent volume? Or is it the ingesting consumers' responsibility?
n

Neha Pawar

02/03/2021, 12:42 AM
Could you elaborate on what you want to know about space management?

KY

02/03/2021, 12:48 AM
Are there any checks/balances to make sure there is enough space on the volume before accepting the next segment?

Neha Pawar

02/03/2021, 12:54 AM
no such checks afaik. The ingestion assumes sufficient resources. @Xiang Fu anything you’d like to add here to address this concern?
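Since there are no built-in checks, one operational stopgap is a periodic disk-usage watch on the server's data volume. This is purely illustrative and not a Pinot feature; the path and threshold defaults are hypothetical.

```shell
#!/bin/sh
# Alert when the Pinot server's data volume crosses a usage threshold.
# DATA_DIR and THRESHOLD are hypothetical defaults -- adjust per deployment.
DATA_DIR="${DATA_DIR:-/var/pinot/server/data}"
THRESHOLD="${THRESHOLD:-80}"   # percent

# df -P gives a stable, parseable format; column 5 is Use%.
USED=$(df -P "$DATA_DIR" 2>/dev/null | awk 'NR==2 { gsub("%", "", $5); print $5 }')

if [ -n "$USED" ] && [ "$USED" -ge "$THRESHOLD" ]; then
  echo "WARNING: $DATA_DIR is ${USED}% full (threshold ${THRESHOLD}%)"
fi
```

Run from cron or a Kubernetes liveness-style sidecar, this would have surfaced the disk-full condition before consumption stalled.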

Matt

02/03/2021, 1:19 AM
@Neha Pawar, I was planning to use the quota.storage part of tableConfig to specify a fixed size per Pinot server to protect the disk. Do you think this will work?

Neha Pawar

02/03/2021, 1:38 AM
I believe the storage quota only applies to offline segment pushes.
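For context, this is roughly where that quota sits in the table config. A sketch with hypothetical values; the quota block is a real part of the table config, but as noted it is enforced for offline segment pushes, not realtime ingestion.

```json
{
  "tableName": "prometheus",
  "tableType": "REALTIME",
  "quota": {
    "storage": "20G"
  }
}
```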

Matt

02/03/2021, 1:39 AM
ok Thanks for confirming

Neha Pawar

02/03/2021, 1:39 AM
@Mayank what happens in LI production environments? Any practices followed to prevent realtime tables from going over the desired storage, or just monitoring?

Subbu Subramaniam

02/03/2021, 1:56 AM
@KY you can use the RealtimeProvisioningHelper to decide on storage requirements.
@Neha Pawar we use RealtimeProvisioningHelper for memory requirements. We never exceed disk requirements, since the realtime tables we have are small in retention.
Ah correct, we don't expect idealstate to NOT have a segment in one of CONSUMING or OFFLINE state, since we update the idealstate in one shot. Here is an idea. It involves manual steps, and has never been tried:
1. Cook up a segment name for __12. Add metadata for that segment by copying the metadata of the previous segment, and set the endoffset of segment 11 to the start offset of the new segment.
2. Add segment 12 in OFFLINE state in idealstate.
3. Wait for the realtime segment fixer to kick in.
Before you update the metadata, you may want to post the prev metadata so that either Neha or I can help create the new metadata. DISCLAIMER: This has never been tried before.

KY

02/03/2021, 2:37 AM
@Subbu Subramaniam thanks for the pointer. That would still be a static config, though. If the rate of ingestion changes due to variations in the workload, we might still need dynamic checks before we pack the next segment. That would have prevented the BAD segment scenario for us.

Subbu Subramaniam

02/03/2021, 4:19 AM
@KY the helper allows you to choose a segment size, and Pinot automatically adjusts to that segment size.
If the rate of ingestion is low, a segment consumes for a longer time; if it is high, it consumes for a shorter time.
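To make that flexing concrete, here is a toy calculation; the row target and ingestion rates are made-up numbers, not Pinot defaults.

```shell
#!/bin/sh
# Toy illustration: the target rows per segment is fixed, so the time a
# CONSUMING segment stays open shrinks as the ingestion rate grows.
TARGET_ROWS=3000000   # hypothetical rows-per-segment target

for RATE in 500 1000 2000; do   # rows per second (made-up workloads)
  SECS=$((TARGET_ROWS / RATE))
  echo "rate=${RATE} rows/s -> segment completes in ~$((SECS / 60)) min"
done
```

This prints roughly 100, 50, and 25 minutes for the three rates, which is why segment size (rather than wall-clock time) is the quantity the helper asks you to choose.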

Harold Lim

02/03/2021, 4:29 AM
@Subbu Subramaniam But that doesn't take into account the currently available disk space? In this test setup, the PVC ran out of disk space, so it had issues with the last segment.

Matt

02/03/2021, 4:09 PM
From my limited experience, what works for me is to have multiple partitions and match them with the number of Pinot servers, so one server consumes one partition. Also, provide 50% more disk space than the estimated size per server, and label all servers properly so the segments are spread evenly. Let me know of any better way.

Mayank

02/03/2021, 4:20 PM
I think this is more about graceful handling of error situations like disk-full. From the thread, it seems we get into an error state that requires multiple steps to resolve. If so, I'd recommend filing an issue. Perhaps we can stop consumption (and emit metrics) when no resources are left, and either auto-recover or have a simple recovery mechanism to get out of the error state.

Will Briggs

02/03/2021, 4:23 PM
With respect to this error state, I think that with no manual intervention (other than increasing the size of the disks), this would have recovered if the user had waited for the validation manager to recover the segment. Perhaps better logging / notification / documentation on idealstate and how the validation manager works could prevent user errors like this in the future?

KY

02/03/2021, 4:28 PM
How long does the validation take ?

Harold Lim

02/03/2021, 4:30 PM
Right. I think there are a couple of things here. 1. There is no notification or any indication in the UI. This started when I noticed that no new data was getting consumed (i.e., it was stuck at segment 0__12) for a long time. Then I checked the logs on the server and discovered that the disk was full. 2. After resizing the PVC and restarting the server, segment 0__12 went into BAD state in the UI. I waited a while and it was still in a bad state; I restarted a few more times and it was still in a bad state. Finally, I tried deleting segment 0__12 from the UI (which possibly just made it worse).

Subbu Subramaniam

02/03/2021, 4:52 PM
@Harold Lim once you choose the segment size and the table's retention, you can compute the max disk size needed (the tool computes this for you for various configurations, and you can select the one you like most).
That being said, we could look for specific exceptions that JVMs may return on specific operating systems. While consuming rows, the memory used is all memory-mapped. Since the JVM offers no parameters to limit this, and some operating systems (Linux, for example) overcommit memory and only wedge themselves later, there is no control over this.
The other place disk is used is during segment build. That failure is handled gracefully by stopping consumption.
Other than introducing another config variable for max memory mapped, I am not sure how the consuming-segment issue can be resolved. I am open to ideas.
@Harold Lim do you have text indexing turned on? If so, your disk-full condition could be due to a file leak that we observed in our systems.

Matt

02/05/2021, 5:06 PM
@Subbu Subramaniam what do you mean by a file leak for the text index? Is there a PR in GitHub so that I can read more about the issue?

Harold Lim

02/05/2021, 5:08 PM
I don't have text indexing turned on. I do have multi-valued columns with inverted indexes on them. The disk-full issue is due to the streaming data being ingested vs. the available disk space on the server to store the completed segments. It's a test setup that only used the default config specified in the Helm charts in the repo, so only a 4G PVC was configured. We didn't configure any deep storage. Q: Do completed segments continue to be served from the same server, or does Pinot have a policy to move them to other servers?

Subbu Subramaniam

02/05/2021, 5:26 PM
@Matt we are still investigating. Will file an issue when we know better.

Matt

02/05/2021, 5:27 PM
@Subbu Subramaniam Thanks, do you have a workaround for the time being? I'm asking because I rely heavily on the text index.

Subbu Subramaniam

02/09/2021, 4:50 PM
@Sajjad Moradi has some details here, I will let him follow up. Afaik, there is no work-around. It is a small leak, so it is unlikely that an installation runs out of disk space due to this. The number of files could grow, however. @Matt ^^

Sajjad Moradi

02/09/2021, 5:21 PM
@Subbu Subramaniam I just created an issue for this:
Based on the above analysis, the effect should be minimal as there will be one file left undeleted for each consuming segment and the size of that file is zero.

Subbu Subramaniam

02/09/2021, 5:28 PM
If neglected, we could grow to too many files in the system. I think these files are not opened -- if they are, then we could also run out of file handles. Thanks for looking into this, @Sajjad Moradi

Sajjad Moradi

02/09/2021, 6:04 PM
Yeah, I too believe they're not open, as the releaseSegment function subsequently calls .close() on the Lucene index, which releases the lock files.