# general
s
Hi team, when should I set
exclude.sequence.id
? Is it used just for naming the segment? If I create 3 segments, each with a unique name but having the same time-range, can I set
exclude.sequence.id
to true all the time?
x
It is typically used when your input directory is a root directory that contains multiple days of data and you want each day to get the same segment name when you re-run the job.
s
I see. Does the name mean anything else to the query engine? Does the engine look at the name of the segment for filtering?
x
e.g. we bootstrap a root directory
/my/data/
and it contains
/my/data/yyyy=2020/mm=1/dd=1/20200101.avro
and
/my/data/yyyy=2020/mm=1/dd=2/20200102.avro
with
exclude.sequence.id
set to true, you will see two segments named
myTable_20200101_20200101
and
myTable_20200102_20200102
No, it doesn't do anything on the query path.
Pinot uses the segment name for data replacement.
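A minimal sketch of the replace-by-name behavior described in this message, using a plain dict as a stand-in for a table's segment list. `push_segment` and the payload strings are hypothetical names for illustration only, not Pinot APIs.

```python
def push_segment(table_segments: dict, name: str, payload: str) -> str:
    """Add or replace a segment in a toy segment store keyed by segment name."""
    action = "replaced" if name in table_segments else "added"
    table_segments[name] = payload
    return action


segments = {}
# First push of a given name adds a new segment.
first = push_segment(segments, "myTable_20200101_20200101", "v1")
# Pushing the same name again replaces the existing segment.
second = push_segment(segments, "myTable_20200101_20200101", "v2")
# A different name (here with a trailing sequence id) is treated as a new segment.
third = push_segment(segments, "myTable_20200101_20200101_0", "v2")
```

In this toy model the store ends up with two entries: the same name overwrote the old data, while the differently named segment was added next to it.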
s
Got it. If I'm going to replace 3 segments with 1 big one (to compact small segments into a big one), is it possible to do this seamlessly?
x
which means that if you generate segment names with
exclude.sequence.id=false
in the above example, you will see the segment name
myTable_20200102_20200102_1
and then, if you just replay segment creation for 2020-01-02, it will generate the segment name
myTable_20200102_20200102_0
which won’t replace the old segment
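The replay problem can be sketched roughly as follows. This is an illustration, not Pinot's actual name generator: it assumes the sequence id is the file's position in the sorted input listing and that the time-range parts come from the file name. `segment_name` and `names_for_run` are hypothetical helpers.

```python
def segment_name(table, min_time, max_time, sequence_id, exclude_sequence_id):
    """Build <table>_<min>_<max>[_<sequence id>]."""
    parts = [table, min_time, max_time]
    if not exclude_sequence_id:
        parts.append(str(sequence_id))
    return "_".join(parts)


def names_for_run(table, input_files, exclude_sequence_id):
    """Name one segment per input file (assumed: sequence id = index in sorted listing)."""
    names = {}
    for seq, path in enumerate(sorted(input_files)):
        day = path.rsplit("/", 1)[-1].split(".")[0]  # "20200102.avro" -> "20200102"
        names[path] = segment_name(table, day, day, seq, exclude_sequence_id)
    return names


day1 = "/my/data/yyyy=2020/mm=1/dd=1/20200101.avro"
day2 = "/my/data/yyyy=2020/mm=1/dd=2/20200102.avro"

full_run = names_for_run("myTable", [day1, day2], exclude_sequence_id=False)
replay = names_for_run("myTable", [day2], exclude_sequence_id=False)
stable = names_for_run("myTable", [day2], exclude_sequence_id=True)
```

Here `full_run[day2]` ends in `_1` while `replay[day2]` ends in `_0`, so the replayed segment would be added next to the old one instead of replacing it. With the sequence id excluded, `stable[day2]` is the same name no matter which files are in the run.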
Hmm, that is a transactional segment replacement. I don't see a way to do it seamlessly right now. @Seunghyun is adding support for group segment replacement; how is it going?