Hi folks! I'd like to ask some questions about the...
# general
d
Hi folks! I'd like to ask some questions about the next release (0.9.0):
• Is it going to include any sort of table truncation, even if rudimentary?
• Is there a rough estimate for when it will be released?
m
Hi @User, there's a "delete all segments" API you could use. As for the release, we have already started the discussions, so it should be a few more weeks. Is there anything in particular you are looking for in the next release?
d
Hey man! I tried using the "delete all segments" endpoint but that didn't seem to work well for me - after deleting them Pinot didn't automatically create one afterwards, leaving me with no way of ingesting new data. So I found this method to not be very reliable, and instead having some sort of "truncation" mechanism that is just a straightforward way of cleaning up a whole table would be the ticket for me. I was pointed to the code to try to look at how to create new segments manually but I found that to be very confusing, especially since my project is in Python and not Java, so it's harder to translate what has to be done.
RDBMSs in general have this truncation feature which is beautiful for when we screw up with not-so-relevant data or are just testing stuff, and this is exactly what I need, but I don't really want to have to dig deep into Pinot's internals to be able to accomplish that.
k
@User ^^
@User can you please upvote the truncate issue? We discussed this multiple times but did not decide on a solution because there were too many error conditions, especially with real-time tables. Maybe we can start off by supporting truncate for offline tables.
s
It seems to me (from "after deleting them Pinot didn't automatically create one afterwards") that @User is trying to truncate a realtime table
One way to do this is to drop the table and re-create it.
But yes, please upvote the issue on table truncation.
k
I think that one was also non-deterministic because deletion is asynchronous and segments might still exist
d
I tried dropping the table and then recreating it, but this didn't work reliably either - after recreating the table I was getting multiple segments in it, which was unexpected. And this was in an integration test, so it was done in a very controlled way, including checking when the table was deleted and when it was already created so that I could add data again.
But I'll upvote, yes. Thanks, guys! 🙂
s
So, dropping a table takes a while. You need to make sure that all the segments are gone from the externalview. I agree this is not the best way to describe it. A better way would be for the "drop" command to wait until segments have disappeared from externalview before returning to the user. Once the table is fully dropped, you should be able to recreate the table.
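Until the "drop" command waits on its own, a test can do the waiting itself. Below is a minimal Python sketch of that idea, not an official client. It assumes the controller serves `DELETE /tables/{tableName}` and `GET /tables/{tableName}/externalview`, that the external view comes back as a JSON object keyed by table type (`OFFLINE`, `REALTIME`) mapping segment names to server states, and that a 404 means the table is gone. The function names, the timeout values, and the response shape are all assumptions to check against your Pinot version.

```python
import json
import time
import urllib.error
import urllib.request


def request_drop(controller_url, table_name):
    """Ask the controller to drop the table (assumed endpoint: DELETE /tables/{name})."""
    req = urllib.request.Request(f"{controller_url}/tables/{table_name}", method="DELETE")
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status


def fetch_external_view(controller_url, table_name):
    """Return the external view as a dict, or None if the controller answers 404."""
    url = f"{controller_url}/tables/{table_name}/externalview"
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return json.load(resp)
    except urllib.error.HTTPError as err:
        if err.code == 404:  # assumption: 404 means the table no longer exists
            return None
        raise


def count_segments(view):
    """Count segments across all table types; a missing or empty view counts as zero."""
    if not view:
        return 0
    return sum(len(segments or {}) for segments in view.values())


def wait_until_dropped(fetch, timeout_s=60.0, poll_s=1.0,
                       sleep=time.sleep, clock=time.monotonic):
    """Poll fetch() until no segments remain; return False if the timeout expires first."""
    deadline = clock() + timeout_s
    while True:
        if count_segments(fetch()) == 0:
            return True
        if clock() >= deadline:
            return False
        sleep(poll_s)
```

In an integration test this could be used as `request_drop(url, "myTable")` followed by `wait_until_dropped(lambda: fetch_external_view(url, "myTable"))`, recreating the table only when the wait returns `True`. Here `url` would be the controller address (for example `http://localhost:9000`, an assumed default) and `myTable` is a placeholder name.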
d
Hmmm, got it. Well, that would be a great improvement, I like that. Because then the behavior would be more deterministic, IMO. If that gets implemented, it could be a "poor man's truncation" initially - at least from a testing perspective.
The part that was most surprising for me was that I dropped a table, recreated it and it still had segments, as if the table was never deleted, so this is the part I don't like. Therefore, making the table delete endpoint actually wait until it's all clear for that table sounds like a good idea to me. (Hopefully it doesn't time out though.)
s
That is a good thing to have, but it must be designed carefully. Maintaining state inside the API means that the API should be idempotent. If the controller (for example) got restarted while it was in the middle of executing the API, then a second invocation of the same API should be able to pick up where it left off. I think table drop is doable in this fashion; I am not sure about the `truncate` API (see the notes in the issue).
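To illustrate what "idempotent" means for a drop, here is a toy Python sketch. It runs against a hypothetical in-memory `FakeCluster`, not Pinot's actual metadata model or controller code. `fail_after` is an invented knob that simulates a controller restart partway through.

```python
class FakeCluster:
    """Hypothetical stand-in for cluster metadata: table name -> set of segment names."""

    def __init__(self, tables):
        self.tables = tables


def drop_table(cluster, table, fail_after=None):
    """Drop `table`; a repeated call finishes whatever an earlier call left undone."""
    segments = cluster.tables.get(table)
    if segments is None:
        # Nothing left to do: a previous invocation already completed.
        return "already dropped"
    removed = 0
    for name in sorted(segments):  # sorted() copies, so mutating the set is safe
        if fail_after is not None and removed >= fail_after:
            raise RuntimeError("simulated controller restart")
        segments.discard(name)  # deleting an absent segment is a no-op
        removed += 1
    # Remove the table entry only after every segment is gone.
    del cluster.tables[table]
    return "dropped"
```

Calling `drop_table` again after a simulated failure removes only the remaining segments and then the table entry, and a further call reports `"already dropped"`. The design choice is that every step checks current state before acting, so no record of the earlier call's progress is needed.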
d
Ah, got it. Makes sense to me, good point.