Manish Soni
01/27/2022, 12:31 PM
Mayank
Manish Soni
01/27/2022, 2:57 PM
Mayank
Manish Soni
01/28/2022, 4:37 AM
Mayank
Mohemmad Zaid Khan
02/03/2022, 8:12 AM
Manish Soni
02/03/2022, 11:24 AM
Mohemmad Zaid Khan
02/04/2022, 10:45 AM
ColumnValuePartitioner which can be used. Here is how I have done it:
• Added the two new config properties below to RealtimeToOfflineSegmentsTask (subject is a column in the transcript example table):
"RealtimeToOfflineSegmentsTask": {
  "partitionerType": "COLUMN_VALUE",
  "partitionColumn": "subject"
}
• Generate a PartitionerConfig of PartitionerType COLUMN_VALUE based on the above configs and set it in SegmentProcessorConfig.
• Add the partitionId to custom.map in the segment metadata inside the SegmentProcessorFramework.build method.
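A rough sketch of the three steps above, written in plain Python rather than against Pinot's Java classes. The function names, the row dictionaries, and the `partitionId` key are hypothetical and only illustrate the flow; none of them are Pinot APIs.

```python
from collections import defaultdict

# Task config as proposed in the message above.
TASK_CONFIG = {
    "RealtimeToOfflineSegmentsTask": {
        "partitionerType": "COLUMN_VALUE",
        "partitionColumn": "subject",
    }
}


def partition_rows(rows, task_config):
    """Group rows by the value of the configured partition column."""
    cfg = task_config["RealtimeToOfflineSegmentsTask"]
    if cfg.get("partitionerType") != "COLUMN_VALUE":
        raise ValueError("this sketch only covers COLUMN_VALUE")
    column = cfg["partitionColumn"]
    partitions = defaultdict(list)
    for row in rows:
        # The column value itself acts as the partition key.
        partitions[str(row[column])].append(row)
    return dict(partitions)


def build_segment_metadata(partition_id, rows):
    """Record the partition id in a custom map (hypothetical layout)."""
    return {"totalDocs": len(rows), "custom.map": {"partitionId": partition_id}}


rows = [
    {"subject": "Maths", "score": 3.8},
    {"subject": "English", "score": 3.5},
    {"subject": "Maths", "score": 3.2},
]
segments = [
    build_segment_metadata(partition_id, part)
    for partition_id, part in partition_rows(rows, TASK_CONFIG).items()
]
```

Each distinct value of the partition column ends up in its own segment, and the value is carried along in that segment's metadata.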
What do you think @User?
Mohemmad Zaid Khan
02/04/2022, 10:49 AM
tableIndexConfig.segmentPartitionConfig as config only support TABLE_PARTITION_CONFIG partitioner type implicitly.
Mohemmad Zaid Khan
02/08/2022, 7:12 AM
Mohemmad Zaid Khan
02/10/2022, 10:55 AM
Mohemmad Zaid Khan
02/10/2022, 10:55 AM
Mayank
Mohemmad Zaid Khan
02/10/2022, 2:24 PM
RealtimeToOfflineSegmentsTask minion task process with modified code that I have locally.
Mohemmad Zaid Khan
02/10/2022, 2:25 PM
Mohemmad Zaid Khan
02/10/2022, 2:41 PM
Mayank
Mohemmad Zaid Khan
02/11/2022, 8:43 AM
Mohemmad Zaid
02/11/2022, 11:43 AM
Mayank
Jackie
02/11/2022, 5:31 PM
Xiaobing
02/12/2022, 1:32 AM
Mohemmad Zaid Khan
02/14/2022, 3:06 AM
Vaibhav Mittal
02/14/2022, 11:36 AM
Jackie
02/14/2022, 7:08 PM
Jackie
02/14/2022, 7:10 PM
Mohemmad Zaid Khan
02/15/2022, 12:05 PM
Mohemmad Zaid Khan
02/15/2022, 12:09 PM
Mohemmad Zaid Khan
02/15/2022, 12:14 PM
PartitionFunction.getPartition(Object value) can return String, and we implement a NoOpPartitionFunction that simply returns the column value as the partitionId (a column value can be of any type).
Jackie
02/15/2022, 6:06 PM
String as the partition id (the partition id should be from 0 to numPartitions - 1). Any specific reason you want to directly use the column value as the partition id? Does hash-based or modulo-based partitioning work for your case?
Mohemmad Zaid Khan
02/16/2022, 4:20 AM
Mohemmad Zaid Khan
02/16/2022, 4:22 AM
Mohemmad Zaid Khan
02/17/2022, 12:06 PM
BoundedColumnValue which is enum based. One can configure the values of the partition column on which they want to partition segments. The partitionId would remain an integer value. The broker can also use this partition function to prune segments. An example config would look like the one below.
Here, the user wants to partition segments on the three subjects given in columnValues. The partitionId would be 1 for Maths, 2 for English, and so on. PartitionId 0 is reserved for any other subject that is not present in the given config but may occur as a value for the column.
"tableIndexConfig": {
  "segmentPartitionConfig": {
    "columnPartitionMap": {
      "subject": {
        "functionName": "BoundedColumnValue",
        "functionConfig": {
          "columnValues": "Maths|English|Chemistry"
        }
      }
    }
  }
}
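The value-to-id mapping described in this message can be sketched as follows. This is a minimal Python illustration under the assumptions stated in the message (ids start at 1 in the order the values are listed, 0 is the catch-all); the class and method names are hypothetical and do not reflect the Java code in the PR.

```python
class BoundedColumnValueSketch:
    """Maps a fixed set of column values to integer partition ids."""

    def __init__(self, function_config):
        # "columnValues" is a pipe-separated list, as in the config above.
        values = function_config["columnValues"].split("|")
        # Configured values get ids 1..N in the order they are listed.
        self._ids = {value: i + 1 for i, value in enumerate(values)}

    @property
    def num_partitions(self):
        # N configured values plus partition 0 for everything else.
        return len(self._ids) + 1

    def get_partition(self, value):
        # Unknown values fall into the reserved partition 0.
        return self._ids.get(str(value), 0)


fn = BoundedColumnValueSketch({"columnValues": "Maths|English|Chemistry"})
print(fn.get_partition("Maths"))    # 1
print(fn.get_partition("English"))  # 2
print(fn.get_partition("Physics"))  # 0
```

Because the ids stay integers in the range 0 to num_partitions - 1, a broker could compute the id for a filter value such as subject = 'Maths' and skip segments recorded with a different partition id.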
The functionConfig is persisted along with functionName into metadata.properties as well as in the segment metadata in ZooKeeper.
In addition to this, I have also looked into multiple-column partitioning for offline tables.
Please have a look at this PR and provide your feedback on the design.
https://github.com/kmozaid/pinot/pull/1/files
Please have a look at the following PR: https://github.com/kmozaid/pinot/pull/1
Jackie
02/17/2022, 6:06 PM
BoundedColumnValue partition function idea looks good to me. I would suggest just adding the new partition function and trying to support multiple partition columns in a separate PR. Currently Pinot asserts in multiple places that there is only one partition column, and we need to revisit all of them to ensure multiple partition columns work properly.
Mohemmad Zaid Khan
02/18/2022, 5:33 AM
Mohemmad Zaid Khan
02/18/2022, 6:59 AM