# general
a
I'm trying to understand the difference between Segment URI Push and Segment Metadata Push. I was using Segment URI Push and filled up the disk on the controller. That makes sense to me, since the controller had to download all the segments. A couple of related questions:
1. If I use metadata push, my understanding is that the controller will direct one of the servers to download the segment instead. Is that right?
2. Does that mean the controller will use less disk in that case?
3. Is the final state after URI push and metadata push different? I'd assume that in both cases you end up with segments distributed across servers. Is that right?
So I'm curious why the controller's disk filled up. Is it supposed to clean up and isn't doing so, or is this behavior expected?
m
1. Yes. This is true for all push types, though: the controller always directs servers to download segments from deep store (or whatever PinotFS is configured).
2. Yes. The controller only needs to look at the segment metadata; it doesn't need the entire segment. Metadata push is an optimization that achieves this.
3. Yes, the final state is the same. If the controller disk filled up, check what is filling it. If it is the segments, then perhaps the controller is not using deep store (dataDir not configured)?
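To see what is actually filling the controller disk, running `du -sh` on the local data and temp dirs is the usual first step. As a self-contained alternative, here is a minimal Java sketch (the `DiskUsage` class is hypothetical, not a Pinot tool) that totals the size of regular files under each immediate child of a directory, for example the directory set as `controller.local.temp.dir`, and prints the largest entries first.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Stream;

// Hypothetical helper, not part of Pinot: per-child disk usage of a directory.
public class DiskUsage {

    // Returns total bytes of regular files under each immediate child of root.
    public static Map<String, Long> usageByChild(Path root) throws IOException {
        Map<String, Long> usage = new TreeMap<>();
        try (Stream<Path> children = Files.list(root)) {
            for (Path child : (Iterable<Path>) children::iterator) {
                usage.put(child.getFileName().toString(), totalSize(child));
            }
        }
        return usage;
    }

    // Sums the sizes of all regular files at or below the given path.
    private static long totalSize(Path path) throws IOException {
        try (Stream<Path> walk = Files.walk(path)) {
            return walk.filter(Files::isRegularFile)
                    .mapToLong(p -> {
                        try {
                            return Files.size(p);
                        } catch (IOException e) {
                            throw new UncheckedIOException(e);
                        }
                    })
                    .sum();
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java DiskUsage /path/to/controller/local/dir
        Path root = Path.of(args.length > 0 ? args[0] : ".");
        usageByChild(root).entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                .forEach(e -> System.out.printf("%,15d  %s%n", e.getValue(), e.getKey()));
    }
}
```

If the biggest entries turn out to be segment tarballs, that points at the controller storing segments locally rather than in deep store.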
a
Thank you! I'll check whether I configured dataDir correctly.
Hmm, so from (1), maybe I don't understand the difference between URI and metadata push.
Is it just that the controller doesn't download the segment from deep store?
I do have dataDir configured; here's the relevant part of the controller conf:
controller.data.dir=s3://pinot/
controller.local.temp.dir=/usr/scratch/pinot
controller.enable.split.commit=true
pinot.controller.storage.factory.class.s3=org.apache.pinot.plugin.filesystem.S3PinotFS
pinot.controller.segment.fetcher.protocols=file,http,s3
pinot.controller.segment.fetcher.s3.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher

# This is required
pinot.controller.storage.factory.s3.region=xxx
pinot.controller.storage.factory.s3.accessKey=xxx
pinot.controller.storage.factory.s3.secretKey=xxx
pinot.controller.storage.factory.s3.endpoint=xxx
Oh, interesting!
I screwed up and set controller.data.dir twice in the config:
first to a path on local disk, then to S3.
I'm assuming it used the first data dir?
m
Seems so.
a
Works now, thanks for the help
FWIW, it seems like it concatenated the two data dirs 🙂
So it was writing to something like /usr/.../s3:/pinot on local disk.
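A cheap guard against this kind of mistake is to lint the config for repeated keys before starting the controller. The Java sketch below is a hypothetical helper (`DuplicateKeyCheck` is not part of Pinot). It assumes a simple one-entry-per-line `key=value` layout, skips `#`/`!` comment lines, and deliberately ignores backslash line continuations and escaped separators.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical config lint, not part of Pinot: reports keys set more than once.
public class DuplicateKeyCheck {

    // Returns each repeated key once, in the order its first repeat was seen.
    // Simplified: ignores line continuations and escaped '=', ':' or spaces.
    public static List<String> findDuplicateKeys(List<String> lines) {
        Set<String> seen = new HashSet<>();
        Set<String> duplicates = new LinkedHashSet<>();
        for (String raw : lines) {
            String line = raw.strip();
            // Skip blank lines and comment lines.
            if (line.isEmpty() || line.startsWith("#") || line.startsWith("!")) {
                continue;
            }
            // The key ends at the first '=', ':' or whitespace character.
            int end = 0;
            while (end < line.length()) {
                char c = line.charAt(end);
                if (c == '=' || c == ':' || Character.isWhitespace(c)) {
                    break;
                }
                end++;
            }
            String key = line.substring(0, end);
            if (!seen.add(key)) {
                duplicates.add(key);
            }
        }
        return new ArrayList<>(duplicates);
    }

    public static void main(String[] args) throws IOException {
        // Usage: java DuplicateKeyCheck path/to/controller.conf
        List<String> dups = findDuplicateKeys(Files.readAllLines(Path.of(args[0])));
        if (dups.isEmpty()) {
            System.out.println("No duplicate keys found.");
        } else {
            dups.forEach(k -> System.out.println("Duplicate key: " + k));
            System.exit(1);
        }
    }
}
```

Exiting non-zero on a duplicate makes it easy to run as a pre-start check in a deployment script.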
m
I see. Would you mind adding this to the FAQ for the rest of the community? #C023BNDT0N8