# pinot-dev
k
Hi all - we’ve noticed when doing a big metadata push (1200 segments) that a lot of the time is spent downloading/expanding the segment from deep store (HDFS) to the local machine, so that the metadata file from the segment can be extracted and used to build the request to the controller. It should be possible to open a stream to the file and extract only that bit of the segment. Does this make sense as a reasonable enhancement?
s
I don't understand the "build the request to the controller" part. You are doing a data push to pinot by placing data in hdfs and then a metadata push to the controller, right? Are you concerned about the servers loading the segments from deepstore? that cannot be avoided, right? Or am I getting the problem wrong?
r
I think what Ken's getting at is do we need to read (or even download) the entire segment file? Can we start streaming the file, read the metadata and then abort once it's been read. Did I get you right @User?
k
Hi @User by “build the request”, I mean that the metadata file in the segment has to be read and converted into the HTTP request to the controller (thus “building the request”). And yes, @User is right about what I’m proposing. I tried it with this snippet of code
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.util.zip.GZIPInputStream;

import org.apache.commons.compress.archivers.tar.TarArchiveEntry;
import org.apache.commons.compress.archivers.tar.TarArchiveInputStream;
import org.apache.commons.io.IOUtils;

File inputFile = new File("/Users/kenkrugler/Downloads/adbeat/pinot-segments-ads/ads_us_2020-11_00.tar.gz");
File outputFile = new File("./build/metadata.properties");

try (TarArchiveInputStream tis =
        new TarArchiveInputStream(new GZIPInputStream(new FileInputStream(inputFile)))) {
    TarArchiveEntry tarEntry;
    while ((tarEntry = tis.getNextTarEntry()) != null) {
        if (tarEntry.isFile() && tarEntry.getName().endsWith("/metadata.properties")) {
            try (FileOutputStream fos = new FileOutputStream(outputFile)) {
                IOUtils.copy(tis, fos);
            }
            break;
        }
    }
}
In the segments I’ve seen, metadata.properties occurs before the big pieces (columns.psf and star_tree_index), so it’s ~15K of reading/decompression vs. 200-300MB for my segments.
Hmm, we’d also need the creation.meta file, which is 16 bytes and located just before metadata.properties. See SegmentPushUtils.generateSegmentMetadataFile() for where it downloads and unpacks/untars the entire segment to get a file used by sendSegmentUriAndMetadata().
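To sketch what that could look like: a single streaming pass that grabs both creation.meta and metadata.properties and stops before the big entries. This is just an illustration, not Pinot's actual API; the class and method names here are hypothetical, and the entry names/layout are assumptions based on the segments described above.

```java
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.GZIPInputStream;

import org.apache.commons.compress.archivers.tar.TarArchiveEntry;
import org.apache.commons.compress.archivers.tar.TarArchiveInputStream;
import org.apache.commons.io.IOUtils;

public class SegmentMetadataExtractor {

    /**
     * Streams a segment tar.gz and extracts only creation.meta and
     * metadata.properties, stopping as soon as both have been seen,
     * so the large column/index entries are never decompressed.
     */
    public static void extractMetadataFiles(File segmentTarGz, File outputDir) throws IOException {
        boolean gotCreationMeta = false;
        boolean gotMetadataProps = false;

        try (TarArchiveInputStream tis = new TarArchiveInputStream(
                new GZIPInputStream(new BufferedInputStream(new FileInputStream(segmentTarGz))))) {
            TarArchiveEntry entry;
            while ((entry = tis.getNextTarEntry()) != null) {
                if (!entry.isFile()) {
                    continue;
                }
                if (entry.getName().endsWith("/creation.meta")) {
                    copyEntry(tis, new File(outputDir, "creation.meta"));
                    gotCreationMeta = true;
                } else if (entry.getName().endsWith("/metadata.properties")) {
                    copyEntry(tis, new File(outputDir, "metadata.properties"));
                    gotMetadataProps = true;
                }
                if (gotCreationMeta && gotMetadataProps) {
                    break; // skip the large columns.psf / star_tree_index entries
                }
            }
        }
    }

    private static void copyEntry(InputStream in, File outFile) throws IOException {
        try (OutputStream out = new FileOutputStream(outFile)) {
            IOUtils.copy(in, out);
        }
    }
}
```

The early break only pays off if the two small files really do precede the big ones in the tarball, which is what the rest of this thread is about.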
s
"read" from where? Which component is the reader we are talking about? Controller? Server? Segment pusher?
k
We’re talking about the metadata push job runner
See above for the call path to SegmentPushUtils.generateSegmentMetadataFile()
s
ah, so the segment pusher. Hopefully the same entity as the segment generator. What we could do is, during segment generation, copy the metadata file into another area and then read only that one (and perhaps creation.meta). This has to be done before compressing the segment, of course
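That side-copy idea is just a couple of stdlib calls before tarring; a minimal sketch (class, method, and directory names here are hypothetical, not anything that exists in Pinot):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class MetadataSideCopy {

    /**
     * Before tarring/compressing a freshly generated segment directory,
     * copy the two small metadata files to a sibling directory so the
     * push job can read them without ever touching the tarball.
     */
    public static void copyMetadataAside(Path segmentDir, Path metadataDir) throws IOException {
        Files.createDirectories(metadataDir);
        for (String name : new String[] {"creation.meta", "metadata.properties"}) {
            Files.copy(segmentDir.resolve(name), metadataDir.resolve(name),
                    StandardCopyOption.REPLACE_EXISTING);
        }
    }
}
```

The trade-off, as discussed below, is that the copies have to be kept in sync with what actually goes into the segment.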
k
I think performance would be fine to just do what I did above, since those two (small) files occur before the big files in the tarball.
s
Do we want to rely on this being the case all the time?
k
I’d have to look at the writer code to see how hard it would be to enforce that constraint
s
Also, the file is compressed into a tar.gz, so we may have to download the file and uncompress it anyway?
k
No, streaming it works - the snippet of code above was successful
s
we don't use this at LinkedIn, so if it works for you, great...
r
separate metadata files feels cleaner to me
k
Then you have the issue of duplicated data that has to be in sync
r
but that's hardly insurmountable
k
No, but a man with one watch always knows what time it is 🙂 My POV is that this is a very simple change that will dramatically improve performance for at least our use case (batch metadata push of many segments), without changing anything about the data layout.
r
if this works, it's a good workaround, just need to be sure there are tests which break if the tarball gets reordered inadvertently
s
exactly
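A test along those lines could list the entry names in on-disk order without extracting any data, then assert the small files come first. This is only a sketch (hypothetical class and method names); a real test would run it against the tarball that Pinot's segment writer actually produces rather than a hand-built one.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;

import org.apache.commons.compress.archivers.tar.TarArchiveEntry;
import org.apache.commons.compress.archivers.tar.TarArchiveInputStream;

public class TarOrderingCheck {

    /** Returns entry names in on-disk order without extracting any entry data. */
    public static List<String> entryNames(File tarGz) throws IOException {
        List<String> names = new ArrayList<>();
        try (TarArchiveInputStream tis = new TarArchiveInputStream(
                new GZIPInputStream(new FileInputStream(tarGz)))) {
            TarArchiveEntry entry;
            while ((entry = tis.getNextTarEntry()) != null) {
                names.add(entry.getName());
            }
        }
        return names;
    }

    /** True if creation.meta and metadata.properties precede the first columns.psf entry. */
    public static boolean metadataFirst(List<String> names) {
        int firstBig = indexOfSuffix(names, "columns.psf");
        int meta = indexOfSuffix(names, "metadata.properties");
        int creation = indexOfSuffix(names, "creation.meta");
        return meta >= 0 && creation >= 0
                && (firstBig < 0 || (meta < firstBig && creation < firstBig));
    }

    private static int indexOfSuffix(List<String> names, String suffix) {
        for (int i = 0; i < names.size(); i++) {
            if (names.get(i).endsWith(suffix)) {
                return i;
            }
        }
        return -1;
    }
}
```

If the segment writer ever reorders entries, an assertion on metadataFirst() would fail and flag the regression before the streaming optimization silently degrades into a full download.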