hi <@UDSRQR2GP> just following up on this pull req...
# general
j
hi @User just following up on this pull request (https://github.com/apache/incubator-pinot/pull/3849) Ting submitted back in Apr. Would you mind share your main concern/blocking issue over this diff? if there wasn't any major concern can we merge this diff in soon as it is bug for anyone relies on external deep storage?
t
Subbu and I talked in another thread about this issue. This bug should not affect Low Level Consumer if we let the server directly downloads it from deep storage. But both of us agreed the bug is still an issue for Pinot controller using deep storage. We need a fix this download REST API.
s
I will get back on this in the next couple of days. In general, my view is that the server should download from the external storage if the segment is there. We should change the server to commit to the external storage, and there is a first step PR towards that. https://github.com/apache/incubator-pinot/pull/4713
If we extend this to inject a segment committer, then you can write your own segment committer that commits a segment to an external storage directly.
The only concern I have with the PR is that every segment download in the local storage case will take one additional disk copy, something I would like to avoid.
@Neha Pawar any other ideas?
j
for the concerns over the extra storage for local storage, I think ting has explained in the pull requests the files are supposed to be temporary and short-lived. At most there will N files on disk where N is the number of concurrent download requests. If we need to further clarify about that part or harden that part of logics we could definitely work on that.
my thinking is if there is no significant negative impact (assuming extra disk storage is not an issue), should we prioritize the bug fix over optimal solutions?
I have been thinking about the recent article I read about google code review standard: https://google.github.io/eng-practices/review/reviewer/standard.html and I wonder if we should move closer to that. Would love to hear your thoughts on that
n
have you tried setting the right URI in the segment metadata ? Subbu and I were just discussing, and this line seems to be the culprit? https://github.com/apache/incubator-pinot/blob/master/pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/realtime/PinotLLCRealtimeSegmentManager.java#L483 If this is fixed, everything else is wired already. The download will use the
PinotFSSegmentFetcher
.
t
Thanks Neha. I think it is the right place to change and fix LLC download. We can give it a test. I think PR #3849 is still needed to fix the public API for segment download from controller. Since the extra disk copy for local FS is the only concern, I propose to add an interface method to PinotFS isLocalFS(). For localFS, we can skip to file copy step. Let me know why you think. @Neha Pawar @Subbu Subramaniam
s
Not sure why the download fix is needed if the URI is fixed.
Ah, ok if someone is downloading using curl?
Better than adding isLocal, can we add a method to return a File handle of the segment? The localFs can just translate URI to a path and return it, but the other FS can copy to local and return the temp. Not sure how to remove the temp file, though....
t
yes. PinotSegmentUploadRestletResource is a public controller API.
The current PR already gets the handle of the segment file as a URI: final URI segmentFileURI = ControllerConf.getUriFromPath(StringUtil.join("/", provider.getBaseDataDirURI().toString(), tableName, URLEncoder.encode(segmentName, URL_ENCODING_SCHEME)));
We can use it to test if it is local file or not and then apply different return strategy. Right?
s
we can wait on this PR, the other one is more important (fixing the download URL). We dont' have any use case of public download of realtime segments. Do you?
t
@Subbu Subramaniam At this time, we are running on an internal fork for Pinot with patches I filed as in various PRs. My main focus now is to resolve these patches one way or another in the Apache master so that we do not need to carry them in our next release later this quarter or early next year.
So far, these are the PRs I want to merge in master (we talked about them in our last meeting):
1. Download URL fix as @Neha Pawar suggested. This actually requires some testing as well.
3. The controller download segment API fix in https://github.com/apache/incubator-pinot/pull/3849 because we actually use them in purpose like offline segment download or debugging.
So in summary, we want to see resolution of all the above PRs before our next release.
To your earlier question, " We dont' have any use case of public download of realtime segments. Do you?" I did find a few use cases on our end to use this API beside LLC. One is our offline segment download. While you can argue that we can migrate that as well, the fact this API is available on controller and now already used in various place makes fixing it necessary.
s
I approved your PR on offset not found being treated as a transient error.
Are you expecting this in the next open source release? We will be cutting then next open source release from f8c2cf3d8 plus one hotfix. as in branch https://github.com/apache/incubator-pinot/tree/selection_comparator_hotfix
t
Thanks for the review. For us, if that is in the next release, it would be great.
But that is not necessary if hotfix now is too troublesome for you guys.
s
Let me talk to my colleagues here and make a call. We generally do not like to put out something that has NOT been run in our production environment, but these are minor changes.
t
thanks. Again no urgency here from our side.
I have kick off the test now.
s
@Ting Chen I merged it
t
great. thank you. Will work on the other PRs mentioned above.
s
@Neha Pawar and I will come to a resolution on the download PR.
n
@Ting Chen I just commented on your PR. Subbu and I discussed that the
isLocal
check seems to be an okay approach for now, amongst all the options
t
@Neha Pawar thanks for looking into this. I will modify the PR for this. The current PR is too old and Github seems to have a bug and lost linkage to it. I will file a new PR with the above change.
n
sure, thanks for working on this!
@Neha Pawar @Subbu Subramaniam Please take a look of this issue when free.
👍 1