Slackbot
08/25/2023, 12:03 AM
kfaraz
08/28/2023, 2:56 AM
`dynamic` partitioning is when you are okay with best-effort rollup and want ingestion to be relatively faster. This is typically true for streaming ingestion, where you also don't have the entire data for a time interval yet, so you can't decide the right distribution for `hash` and `range` partitioning anyway.
But if you have already made up your mind to use `hash` or `range`, there is no reason why you would want to have best-effort rollup. You might as well use perfect rollup. The check here should probably be reversed, i.e. we throw an error only if you have selected `dynamic` and are looking for perfect rollup. If you have selected `hash` or `range`, Druid can just go ahead and use perfect rollup (and probably spit out a warning message to this effect).
Michael Schiff
08/28/2023, 7:17 PM
> The check here should probably be reversed, i.e. we throw an error only if you have selected `dynamic` and are looking for perfect rollup.
I agree, the condition that needs to be checked/prevented is asking for dynamic partitioning + perfect rollup.
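The reversed check agreed on above could be sketched roughly as follows. This is a hypothetical Python illustration, not Druid's actual validation code: the function name `validate_partitioning` and both of its parameters are invented for the example.

```python
def validate_partitioning(partitions_type: str, perfect_rollup_requested: bool) -> None:
    """Reject only the one invalid combination: dynamic partitioning + perfect rollup.

    Hypothetical helper; the names and the error message are assumptions.
    """
    if partitions_type == "dynamic" and perfect_rollup_requested:
        raise ValueError(
            "dynamic partitioning cannot guarantee perfect rollup; "
            "use hash or range partitioning instead"
        )
    # Every other combination is allowed, including hash/range without
    # perfect rollup being explicitly requested.
```

With this shape, `validate_partitioning("hash", False)` passes silently, while `validate_partitioning("dynamic", True)` raises.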
Michael Schiff
08/28/2023, 7:18 PM
> If you have selected `hash` or `range`, Druid can just go ahead and use perfect rollup (and probably spit out a warning message to this effect).
I don't completely agree here -- Druid has the option to disable rollup (use-cases might desire not rolling up data). I think that partitioning is orthogonal to rollup, except when the user has requested guaranteed perfect rollup, in which case Dynamic partitioning is not a valid choice.
Michael Schiff
08/28/2023, 7:30 PM
kfaraz
08/29/2023, 2:41 AM
> I don't completely agree here -- Druid has the option to disable rollup (use-cases might desire not rolling up data).
Yeah, you are right. I guess users might want to retain the multiple events even if they are for the same timestamp and dimensions.
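To make the "same timestamp and dimensions" point concrete, here is a toy Python sketch of what rollup does to such events. The event tuples, the `page` dimension, and the `roll_up` helper are all made up for illustration; this is not how Druid implements rollup internally.

```python
from collections import defaultdict

def roll_up(events):
    """Sum the metric of events sharing the same (timestamp, dimension) key."""
    totals = defaultdict(int)
    for timestamp, page, count in events:
        totals[(timestamp, page)] += count
    # Sort by key so the output order is deterministic.
    return [(ts, page, total) for (ts, page), total in sorted(totals.items())]

# Hypothetical input: two events are identical in timestamp and dimension.
events = [
    ("2023-08-28T00:00", "home", 1),
    ("2023-08-28T00:00", "home", 1),
    ("2023-08-28T00:00", "docs", 1),
]

rolled = roll_up(events)  # 2 rows: the two "home" events are merged
```

With rollup disabled, all three rows in `events` would be stored as they are, which is the case where the individual events are retained.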
kfaraz
08/29/2023, 2:46 AM
> I think that partitioning is orthogonal to rollup, except when the user has requested guaranteed perfect rollup, in which case Dynamic partitioning is not a valid choice
I think this is almost true, in that, the choice of rolling up is indeed orthogonal to partitioning, but the type of rollup isn't. So `range` and `hashed` will always perform perfect rollup when `rollup` is enabled.
And `dynamic` will always perform best-effort rollup when `rollup` is enabled.
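The rule described above (whether to roll up is independent of partitioning, but the kind of rollup is not) might be summarized by a small lookup like the one below. This is a hypothetical Python sketch: `effective_rollup` and its return labels are invented for illustration and do not correspond to any Druid API.

```python
def effective_rollup(partitions_type: str, rollup_enabled: bool) -> str:
    """Return the kind of rollup implied by the partitioning type.

    Hypothetical helper; the return labels are assumptions for this sketch.
    """
    if not rollup_enabled:
        # Rollup on/off is orthogonal to the partitioning choice.
        return "none"
    if partitions_type in ("range", "hashed"):
        return "perfect"
    if partitions_type == "dynamic":
        return "best-effort"
    raise ValueError(f"unknown partitioning type: {partitions_type!r}")
```

For example, `effective_rollup("hashed", True)` gives `"perfect"`, and `effective_rollup("dynamic", True)` gives `"best-effort"`.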