Does a segment always consist of `columns.psf, cre...
# general
n
Does a segment always consist of `columns.psf`, `creation.meta`, `index_map`, `metadata.properties`? I’m thinking for the S3 lazy loading, it might make sense to have separate caching settings for metadata vs `columns.psf`. Like you may want to eagerly load all or most of the metadata, since it’s small and lets segments be eliminated quickly.
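A rough sketch of the split being proposed here, in Python for brevity. Every name in it (`SegmentStore`, `fetch_metadata`, `fetch_columns`, the `min_time`/`max_time` fields) is hypothetical and not an existing segment or S3 API; the two fetch callables stand in for remote reads.

```python
class SegmentStore:
    """Hypothetical two-tier cache: metadata is eager, column data is lazy."""

    def __init__(self, segment_names, fetch_metadata, fetch_columns):
        self._fetch_columns = fetch_columns
        # Eager tier: metadata is small, so load it for every segment up front.
        self._metadata = {name: fetch_metadata(name) for name in segment_names}
        # Lazy tier: column data is large, so load it only on first use.
        self._columns = {}

    def prune(self, query_start, query_end):
        """Return segments whose time range overlaps the query (metadata only)."""
        return [
            name
            for name, meta in self._metadata.items()
            if meta["min_time"] <= query_end and meta["max_time"] >= query_start
        ]

    def columns(self, name):
        """Fetch column data on first access, then serve it from the cache."""
        if name not in self._columns:
            self._columns[name] = self._fetch_columns(name)
        return self._columns[name]


# Stand-in data; a real deployment would read these from remote storage.
FAKE_METADATA = {
    "seg_0": {"min_time": 0, "max_time": 99},
    "seg_1": {"min_time": 100, "max_time": 199},
}
fetched = []  # records which segments had their column data downloaded


def load_columns(name):
    fetched.append(name)
    return f"columns for {name}".encode()


store = SegmentStore(list(FAKE_METADATA), FAKE_METADATA.__getitem__, load_columns)
candidates = store.prune(120, 150)
for name in candidates:
    store.columns(name)
```

In this toy run only the segment that survives pruning has its column data fetched, which is the benefit being argued for: the cheap metadata decides which expensive downloads happen at all.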
m
Yes, all segments have these files. But they are not exposed as individual files. One issue I can think of with this approach: when a segment is refreshed, the cached metadata can get out of sync, so it would need some sort of invalidation/reload.
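One possible shape for that invalidation, sketched in Python. It assumes each segment has some version token (a checksum or refresh timestamp, say) that changes whenever the segment is refreshed; `MetadataCache`, `current_token`, and `fetch_metadata` are hypothetical names, not an existing API.

```python
class MetadataCache:
    """Hypothetical cache that reloads metadata when a segment's token changes."""

    def __init__(self, current_token, fetch_metadata):
        self._current_token = current_token    # callable: name -> version token
        self._fetch_metadata = fetch_metadata  # callable: name -> metadata dict
        self._entries = {}                     # name -> (token, metadata)

    def get(self, name):
        token = self._current_token(name)
        cached = self._entries.get(name)
        # Reload on a cold cache or when the segment was refreshed.
        if cached is None or cached[0] != token:
            cached = (token, self._fetch_metadata(name))
            self._entries[name] = cached
        return cached[1]


# Stand-in state for the source of truth.
tokens = {"seg_0": "v1"}
remote = {"seg_0": {"total_docs": 10}}
loads = []  # records every metadata fetch


def fetch(name):
    loads.append(name)
    return dict(remote[name])


cache = MetadataCache(tokens.__getitem__, fetch)
before = cache.get("seg_0")
cache.get("seg_0")  # served from cache, no second fetch

# Simulate a segment refresh: new contents and a new token.
remote["seg_0"] = {"total_docs": 25}
tokens["seg_0"] = "v2"
after = cache.get("seg_0")
```

The token check is itself a lookup against whatever tracks segment versions, so this only pays off if that check is much cheaper than re-reading the metadata.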
n
How does a segment get refreshed? I thought the idea was that data is immutable?
And what do you mean they aren’t exposed as individual files? Do they get compressed at some point?
m
Having said that, I do see some merit in eager loading of metadata. Perhaps it would make sense to write down the idea and check it against the cases we need to handle.
As in, the interface doesn’t allow you to query a file from a segment.
n
Oh. The interface expects the full segment to be there?
m
I mean there is no API like getColumnPsfFile()
n
Added it as a comment on the lazy loading issue. I think first we do lazy loading of the whole segment. Then add this as an optimization later.
m
There is getSegmentMetadata() though
Yeah, I think your idea is good. Just saying we need to think it through to design the right APIs, and ensure all cases are handled.