# troubleshooting
n
Hey team 👋 We are currently setting up partition-based segment pruning for an OFFLINE table (in addition to the existing time-based pruning). We already have Replica-Group Instance Assignment set up and are planning to move to Partitioned Replica-Group Instance Assignment to make our partition-based segment pruning more effective. Given the current `replicaGroupPartitionConfig`, any recommendation on how to distribute the 50 servers across 16 partitions and 2 replica groups, i.e. a recommended value for `numInstancesPerPartition`?
```json
"segmentPartitionConfig": {
  "columnPartitionMap": {
    "team_id": {
      "functionName": "Murmur",
      "numPartitions": 16
    }
  }
}
```
```json
"replicaGroupPartitionConfig": {
  "replicaGroupBased": true,
  "numInstances": 0,
  "numReplicaGroups": 2,
  "numInstancesPerReplicaGroup": 25,
  "numPartitions": 0,
  "numInstancesPerPartition": 0
}
```
m
One best practice is to evenly distribute the partitions across servers (such that each server has equal number of partitions).
What’s the total data size? 50 servers seems like a lot for 16 partitions (unless you have a lot of data per partition).
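The "equal number of partitions per server" rule above can be checked with simple arithmetic. The sketch below is only that arithmetic, not Pinot's assignment algorithm: it assumes each partition is placed on exactly `k` servers within one replica group and asks for which `k` the resulting partition slots divide evenly across the servers. The function names are made up for illustration.

```python
def partitions_per_server(num_servers, num_partitions, instances_per_partition):
    """Partitions hosted by each server if the load divides evenly, else None."""
    slots = num_partitions * instances_per_partition  # total (partition, server) placements
    if slots % num_servers != 0:
        return None  # some servers would hold more partitions than others
    return slots // num_servers


def even_options(num_servers, num_partitions):
    """All (instances_per_partition, partitions_per_server) pairs that balance exactly."""
    options = []
    for k in range(1, num_servers + 1):
        load = partitions_per_server(num_servers, num_partitions, k)
        if load is not None:
            options.append((k, load))
    return options


# 25 servers per replica group, 16 partitions (the setup in this thread)
print(even_options(25, 16))      # [(25, 16)]
# Hypothetical alternatives: 24 or 16 servers per replica group
print(even_options(24, 16)[:2])  # [(3, 2), (6, 4)]
print(even_options(16, 16)[:2])  # [(1, 1), (2, 2)]
```

Under these assumptions, 25 servers per replica group only balance exactly when every partition is spread over all 25 servers, whereas a server count that shares a factor with 16 (such as 16 or 24) allows a small per-partition server set.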
n
The total size of the table is ~12.25 TB. We are partitioning it by day (a folder for each ds) and then bucketing by team_id using murmur2 (16 files within each ds folder). We followed your recommendation from this thread.
We are not sure how to arrive at `numInstancesPerPartition` for this scenario: 50 servers assigned to 16 partitions across 2 replica groups. The 16 comes from numPartitions in columnPartitionMap. Also, is Partitioned Replica-Group Instance Assignment mandatory for partition-based segment pruning to take effect?
m
Replica-group and partitioning are independent concepts.
If the data is partitioned (as seen by Pinot), partition-based pruning will happen.
It will also reduce the fanout for all queries where the partition key has an equality predicate.
Replica groups can be used to limit fanout when you don’t have partitioning.
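To make the pruning point concrete, here is a minimal sketch of how an equality predicate on the partition key narrows the segment list, independent of how instances are assigned. It is an illustration only: the hash is a CRC32 stand-in (not Pinot's Murmur function), and the segment names, dates, and `prune` helper are hypothetical.

```python
import zlib

NUM_PARTITIONS = 16  # matches numPartitions in columnPartitionMap


def partition_of(team_id: str) -> int:
    # Stand-in hash for illustration; Pinot's "Murmur" function will give different IDs.
    return zlib.crc32(team_id.encode("utf-8")) % NUM_PARTITIONS


# Hypothetical segment metadata: one segment per (day, partition) bucket.
segments = [
    {"name": f"seg_{day}_{p}", "day": day, "partition": p}
    for day in ("2021-01-01", "2021-01-02", "2021-01-03")
    for p in range(NUM_PARTITIONS)
]


def prune(segments, team_id, day=None):
    """Keep only segments that can contain rows for team_id (and day, if given)."""
    target = partition_of(team_id)
    return [
        s for s in segments
        if s["partition"] == target and (day is None or s["day"] == day)
    ]


by_partition = prune(segments, "team-42")
by_both = prune(segments, "team-42", day="2021-01-02")
print(len(segments), len(by_partition), len(by_both))  # 48 3 1
```

With both time and partition filters in the query, only one of the 48 example segments needs to be scanned, whichever servers happen to host it.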
n
Thanks @Mayank! We are implementing both time-based and partition-based (murmur) segment pruning. Does that mean we can skip Partitioned Replica-Group Instance Assignment, since we already have partition pruning enabled? Our thinking was that having Partitioned Replica-Group Instance Assignment in addition to partition-based pruning would help reduce latency and improve QPS. It looks like that's not really the case, since they do more or less the same thing according to this doc?