Tanmay Krishna
04/01/2022, 2:16 PM
Tanmay Krishna
04/01/2022, 2:17 PM
Tanmay Krishna
04/01/2022, 2:20 PM
Tanmay Krishna
04/01/2022, 2:22 PM
Richard Startin
04/01/2022, 2:34 PM
Richard Startin
04/01/2022, 2:34 PM
Richard Startin
04/01/2022, 2:35 PM
Richard Startin
04/01/2022, 2:35 PM
Tanmay Krishna
04/01/2022, 2:36 PM
"so having a dictionary would be wasteful"
Yes, dictionaries are not needed; these columns won’t be used in filters or groupBy in queries.
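If the two string columns really do not need dictionaries, one way to express that is through the table config. The snippet below is a minimal sketch, assuming the `noDictionaryColumns` list under `tableIndexConfig` is the setting that applies here; the thread itself does not name it. The helper `with_no_dictionary` and the `base` config are hypothetical and only illustrate the shape of the change.

```python
import json

# Column names as they appear in the schemas shared in this thread.
STRING_COLUMNS = ["column1", "column2"]


def with_no_dictionary(table_index_config, columns):
    """Return a copy of the config with the given columns listed as raw (no dictionary)."""
    updated = dict(table_index_config)
    existing = updated.get("noDictionaryColumns", [])
    # Merge with any columns already listed and keep the order stable.
    updated["noDictionaryColumns"] = sorted(set(existing) | set(columns))
    return updated


# Hypothetical starting point; a real tableIndexConfig would carry more keys.
base = {"loadMode": "MMAP", "nullHandlingEnabled": False}
config = with_no_dictionary(base, STRING_COLUMNS)

print(json.dumps({"tableIndexConfig": config}, indent=2))
```

The helper copies the config instead of mutating it, so the original dictionary can still be compared against the updated one before anything is applied to a table.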
Tanmay Krishna
04/01/2022, 2:47 PM
{
  "schemaName": "ztest_schema_max",
  "dimensionFieldSpecs": [
    {
      "name": "column1",
      "dataType": "STRING",
      "maxLength": 1000000
    },
    {
      "name": "column2",
      "dataType": "STRING",
      "maxLength": 1000000
    }
  ],
  "dateTimeFieldSpecs": [
    {
      "name": "producer_timestamp",
      "dataType": "LONG",
      "format": "1:SECONDS:EPOCH",
      "granularity": "1:SECONDS"
    }
  ]
}
Schema2
{
  "schemaName": "ztest_schema",
  "dimensionFieldSpecs": [
    {
      "name": "column1",
      "dataType": "STRING"
    },
    {
      "name": "column2",
      "dataType": "STRING"
    }
  ],
  "dateTimeFieldSpecs": [
    {
      "name": "producer_timestamp",
      "dataType": "LONG",
      "format": "1:SECONDS:EPOCH",
      "granularity": "1:SECONDS"
    }
  ]
}
I inserted the same data into both of these tables. In the Pinot UI, the size is reported as 730 KB for both tables.
Tanmay Krishna
04/01/2022, 2:48 PM
"tableIndexConfig": {
  "enableDefaultStarTree": false,
  "enableDynamicStarTreeCreation": false,
  "aggregateMetrics": false,
  "nullHandlingEnabled": false,
  "rangeIndexVersion": 1,
  "autoGeneratedInvertedIndex": false,
  "createInvertedIndexDuringSegmentGeneration": false,
  "streamConfigs": {
    "streamType": "kafka",
    "stream.kafka.consumer.type": "LowLevel",
    "stream.kafka.topic.name": "events.router.v2.live",
    "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.stream.kafka.KafkaJSONMessageDecoder",
    "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
    "stream.kafka.broker.list": "kafka-kafka-bootstrap.kafka.svc.cluster.local:9092",
    "stream.kafka.consumer.prop.auto.offset.reset": "smallest",
    "realtime.segment.flush.threshold.rows": "1",
    "realtime.segment.flush.threshold.time": "1d",
    "realtime.segment.flush.threshold.segment.size": "1m"
  },
  "loadMode": "MMAP"
}
Total number of rows ingested = 206. Each row is a duplicate of this one:
column1                            column2  producer_timestamp
router.optimizer_visibility_event  router   1648643989
Tanmay Krishna
04/01/2022, 2:49 PM
"dictionarized strings will be padded to that length"
Going by this, I would expect the table with the 1 MB columns to be significantly larger than the other one. But that doesn’t seem to be the case here. Can you please help me understand why?
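The expectation in this message can be checked with a back-of-the-envelope estimate. The sketch below uses only the figures quoted in the thread (206 identical rows, `maxLength` of 1000000, a reported size of 730 KB) and compares the unpadded payload with what two dictionaries would need if every entry were padded to `maxLength`. The helper names are hypothetical, "730kb" is assumed to mean 730,000 bytes, and the model ignores segment metadata and index overhead, so it is a rough bound and not a description of Pinot's actual on-disk layout.

```python
# Figures quoted in this thread.
ROWS = 206
REPORTED_BYTES = 730 * 1000   # "730kb" from the UI; assuming 1 KB = 1000 bytes
MAX_LENGTH = 1_000_000        # maxLength in the first schema
SAMPLE_ROW = {
    "column1": "router.optimizer_visibility_event",
    "column2": "router",
    "producer_timestamp": 1648643989,
}


def raw_row_bytes(row):
    """Unpadded payload: UTF-8 bytes per string, 8 bytes per LONG."""
    total = 0
    for value in row.values():
        if isinstance(value, str):
            total += len(value.encode("utf-8"))
        else:
            total += 8
    return total


def padded_dictionary_bytes(string_columns, distinct_values, pad_to):
    """Hypothetical dictionary size if every entry were padded to pad_to bytes."""
    return string_columns * distinct_values * pad_to


raw_total = ROWS * raw_row_bytes(SAMPLE_ROW)
# Every row is identical, so each string column has a single distinct value.
padded_total = padded_dictionary_bytes(string_columns=2, distinct_values=1, pad_to=MAX_LENGTH)

print(f"raw payload for all rows: {raw_total} bytes")
print(f"dictionaries padded to maxLength: {padded_total} bytes")
print(f"reported table size: {REPORTED_BYTES} bytes")
print("padded estimate exceeds reported size:", padded_total > REPORTED_BYTES)
```

Under these assumptions, padding even one dictionary entry per column to `maxLength` would already need about 2 MB, more than the 730 KB reported for the whole table. The identical sizes are therefore hard to reconcile with padding to the schema `maxLength`. One possible reading, not confirmed in this thread, is that padding is applied up to the longest value actually stored and not up to the schema limit.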
Tanmay Krishna
04/01/2022, 2:51 PM