https://pinot.apache.org/
#general

Ken Krugler

09/23/2021, 9:34 PM
We have about 1500 segments in our HDFS deep store directory. We push these to our Pinot cluster via a metadata push, so only the URI is sent to the controller, which works well. But when we add a single new segment, our push job still has to download/untar all 1500 segments, because we can’t specify a pattern to filter the output directory files to only the new file. We could add per-month subdirectories in HDFS to restrict the number of files being processed this way, but is there a better approach? Note that the files in HDFS can’t be moved around, as their deep store URIs are part of the Zookeeper state.
Mayank

09/23/2021, 9:43 PM
Perhaps we can support file name / pattern to select?
Ken Krugler

09/23/2021, 9:43 PM
That was one thought I had, yes.
Essentially there’s an implicit pattern currently, as the filename has to end with “.tar.gz”
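A minimal sketch of the kind of filename filter being discussed, extending the implicit ".tar.gz" check to a user-supplied glob. This is an illustration, not Pinot's actual API: the class and method names are hypothetical, and it uses the JDK's `PathMatcher` to match segment names before any download happens.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;

public class SegmentFilter {
    // Hypothetical helper: keep only segment file names matching a glob,
    // e.g. "*2021-09*.tar.gz" to select the newly added September segment
    // instead of processing all 1500 files in the deep-store directory.
    static List<String> filter(List<String> fileNames, String glob) {
        PathMatcher matcher =
                FileSystems.getDefault().getPathMatcher("glob:" + glob);
        return fileNames.stream()
                .filter(name -> matcher.matches(Paths.get(name).getFileName()))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> segments = List.of(
                "events_2021-08-31.tar.gz",
                "events_2021-09-23.tar.gz",
                "events_2021-09-23.json");
        // Only the new September tarball matches the pattern.
        System.out.println(filter(segments, "*2021-09*.tar.gz"));
    }
}
```

With no pattern supplied, a job like this could fall back to the current behavior of matching every `*.tar.gz` in the input directory.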
Mayank

09/23/2021, 9:47 PM
Yeah, should be easy to enhance.
Kishore G

09/23/2021, 11:09 PM
Per-month and per-day sub-directories are not a bad idea.