#general

Noah Prince

10/28/2020, 2:02 PM
Does a segment always consist of columns.psf, creation.meta, index_map, and metadata.properties? I’m thinking that for the S3 lazy loading, it might make sense to have separate caching settings for the metadata vs. columns.psf. For example, you may want to eagerly load all or most of the metadata, since it’s small and lets segments be eliminated quickly.
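
To illustrate the idea above, here is a minimal sketch of a store that keeps all segment metadata in memory and fetches column data only on demand. It is not Pinot code: `SegmentMeta`, `TieredSegmentStore`, the time-range fields, and the `fetch_columns` callback are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class SegmentMeta:
    # Hypothetical subset of what a segment's metadata might hold.
    name: str
    start_time: int
    end_time: int


class TieredSegmentStore:
    """Keeps all metadata in memory; fetches column data only on demand."""

    def __init__(self, metas: List[SegmentMeta],
                 fetch_columns: Callable[[str], bytes]):
        # Eager tier: metadata is small, so hold all of it.
        self._metas = {m.name: m for m in metas}
        # Lazy tier: column data is large (e.g. pulled from S3 when needed).
        self._fetch_columns = fetch_columns
        self._columns: Dict[str, bytes] = {}

    def candidates(self, query_start: int, query_end: int) -> List[str]:
        # Prune using metadata only; no column data is touched here.
        return [m.name for m in self._metas.values()
                if m.start_time <= query_end and m.end_time >= query_start]

    def columns(self, name: str) -> bytes:
        # Fetch column data the first time a surviving segment is queried.
        if name not in self._columns:
            self._columns[name] = self._fetch_columns(name)
        return self._columns[name]
```

A query would call `candidates(...)` first and then `columns(...)` only for the segments that survive pruning, so eliminated segments never trigger a download.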

Mayank

10/28/2020, 2:06 PM
Yes, all segments have these files, but they are not exposed as individual files. One issue I can think of with this approach is that when a segment is refreshed, the cached metadata can get out of sync and would need some sort of invalidation/reload.
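
One possible way to handle the out-of-sync concern is to key each cached entry on a version marker and reload when it changes. The sketch below assumes the caller can cheaply learn a segment's current checksum (called `current_crc` here); `MetadataCache` and its `load` callback are hypothetical and not part of any Pinot interface.

```python
from typing import Callable, Dict, Tuple


class MetadataCache:
    """Caches segment metadata and reloads it when the checksum changes."""

    def __init__(self, load: Callable[[str], dict]):
        self._load = load  # hypothetical loader, e.g. reads from deep store
        self._entries: Dict[str, Tuple[str, dict]] = {}

    def get(self, segment: str, current_crc: str) -> dict:
        entry = self._entries.get(segment)
        # Reload on a miss, or when the segment was refreshed (CRC differs).
        if entry is None or entry[0] != current_crc:
            entry = (current_crc, self._load(segment))
            self._entries[segment] = entry
        return entry[1]

    def invalidate(self, segment: str) -> None:
        # Explicit invalidation, e.g. on a refresh notification.
        self._entries.pop(segment, None)
```

Whether a checksum comparison or an explicit refresh notification is the better trigger depends on how refreshes are signalled, which this sketch does not assume.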

Noah Prince

10/28/2020, 2:08 PM
How does a segment get refreshed? I thought the idea was that data is immutable?
And what do you mean they aren’t exposed as individual files? Do they get compressed at some point?

Mayank

10/28/2020, 2:09 PM
Having said that, I do see some merit in eager loading of metadata. Perhaps it would make sense to write down the idea and check it against the cases we need to handle.
As in, the interface doesn’t allow you to query a file from a segment.

Noah Prince

10/28/2020, 2:09 PM
Oh. The interface expects the full segment to be there?

Mayank

10/28/2020, 2:12 PM
I mean there is no API like getColumnPsfFile()

Noah Prince

10/28/2020, 2:13 PM
Added it as a comment on the lazy loading issue. I think first we do lazy loading of the whole segment. Then add this as an optimization later.
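
As a rough picture of the first step (lazy loading of the whole segment), the sketch below downloads a segment on first access and keeps a bounded number of segments locally, evicting the least recently used one. `LazySegmentLoader`, `download`, and `max_segments` are hypothetical names for illustration, not the design from the issue.

```python
from collections import OrderedDict
from typing import Callable


class LazySegmentLoader:
    """Downloads whole segments on first access, with LRU eviction."""

    def __init__(self, download: Callable[[str], bytes], max_segments: int):
        self._download = download  # hypothetical, e.g. fetches from S3
        self._max = max_segments
        self._local: "OrderedDict[str, bytes]" = OrderedDict()

    def get(self, name: str) -> bytes:
        if name in self._local:
            # Mark as most recently used.
            self._local.move_to_end(name)
            return self._local[name]
        data = self._download(name)
        self._local[name] = data
        if len(self._local) > self._max:
            # Evict the least recently used segment.
            self._local.popitem(last=False)
        return data

    def is_local(self, name: str) -> bool:
        return name in self._local
```

The metadata optimization discussed earlier could later sit in front of this loader, so that pruned segments are never downloaded at all.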

Mayank

10/28/2020, 2:13 PM
There is getSegmentMetadata() though
Yeah, I think your idea is good. I’m just saying we need to think it through to design the right APIs and ensure all cases are handled.