Kishore G
Bruce Ritchie (07/13/2021, 4:26 PM)
Matt Landers
Luis Fernandez (09/01/2021, 5:01 PM)
xtrntr (09/05/2021, 6:27 PM)
Kishore G
xtrntr (09/07/2021, 10:31 PM):
ConnectionFactory.fromHostList(brokerUrl)? I'm not all that familiar with ZK, and I don't see a way in the zookeeper category of APIs exposed by the controller to retrieve broker addresses. https://docs.pinot.apache.org/users/clients/java
Xiang Fu
RZ (09/16/2021, 10:44 AM)
arun muralidharan (09/21/2021, 3:47 PM)
Kamal Chavda (10/08/2021, 8:11 PM)
Priyank Bagrecha (11/02/2021, 6:09 AM):
"stream.kafka.decoder.class.name": "org.apache.pinot.plugin.inputformat.avro.KafkaAvroMessageDecoder",
"stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory"
Table status says BAD in the cluster manager, and I am trying to figure out what I am missing. Looking at the code on GitHub, it seems I need to provide a schema for parsing; however, there is a comment saying not to use the schema, as it will be dropped in a future release. Any pointers will be greatly appreciated. Thanks in advance!
Priyank Bagrecha (11/02/2021, 6:18 AM):
SimpleAvroMessageDecoder? Even that one has the same comment: "Do not use schema in the implementation, as schema will be removed from the params."
Priyank Bagrecha (11/02/2021, 6:34 AM)
Niteesh Hegde (11/02/2021, 10:35 AM)
Priyank Bagrecha (11/02/2021, 6:07 PM)
Neha Pawar:
"stream.kafka.decoder.prop.schema": "<your avro schema here>"
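In the table config, that property goes in streamConfigs next to the decoder class, as the configs later in this thread show:

```
"stream.kafka.decoder.class.name": "org.apache.pinot.plugin.inputformat.avro.SimpleAvroMessageDecoder",
"stream.kafka.decoder.prop.schema": "<your avro schema here>"
```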
Priyank Bagrecha (11/02/2021, 6:24 PM)
Orbit (11/08/2021, 9:44 PM)
Priyank Bagrecha (11/09/2021, 1:14 AM):
What happens to the index and the segments when adding a new dimension to dimensionsSplitOrder, or even removing one? Same for functionColumnPairs. I am thinking of treating an edit as adding the new one and dropping the old one.
Priyank Bagrecha (11/09/2021, 7:27 PM):
SELECT col1, col2, col3, DISTINCTCOUNT(col4) AS distinct_col4
FROM table
GROUP BY col1, col2, col3
the star-tree index looks like
"starTreeIndexConfigs": [
{
"dimensionsSplitOrder": [
"col1",
"col2",
"col3"
],
"skipStarNodeCreationForDimensions": [],
"functionColumnPairs": [
"DISTINCTCOUNT__col4"
],
"maxLeafRecords": 1
}
],
Can I also add DistinctCountHLL__col4 and DistinctCountThetaSketch__col4 to functionColumnPairs and evaluate the performance of all three for this query?
Jackie (11/09/2021, 9:05 PM):
distinctcounthll, because its intermediate result size is bounded
Jackie (11/09/2021, 9:05 PM):
Add a limit to the query, or it defaults to 10
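For reference, the extension asked about above would just append the extra pairs to the index config already posted in this thread (a sketch using the spellings from the question; whether all three pairs should share one star-tree is exactly the open question here):

```
"functionColumnPairs": [
  "DISTINCTCOUNT__col4",
  "DistinctCountHLL__col4",
  "DistinctCountThetaSketch__col4"
],
```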
Priyank Bagrecha (11/09/2021, 9:56 PM)
Priyank Bagrecha (11/15/2021, 9:46 AM):
totalDocs is 2x/3x for the table with an inverted index in comparison to the table with a star-tree index. If it matters, I started querying the tables ~5-10 minutes after creating them. I also confirmed this by running
select count(*) from <table_name>
Is this expected?
Priyank Bagrecha (11/15/2021, 10:11 AM):
group.id = (basically empty), so maybe both Pinot tables are using the same group id.
Priyank Bagrecha (11/15/2021, 12:05 PM):
"streamConfigs": {
"streamType": "kafka",
"stream.kafka.consumer.type": "lowLevel",
"stream.kafka.topic.name": <topic_name>,
"stream.kafka.decoder.class.name": "org.apache.pinot.plugin.inputformat.avro.SimpleAvroMessageDecoder",
"stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
"stream.kafka.broker.list": <broker_list>,
"realtime.segment.flush.threshold.size": "0",
"realtime.segment.flush.threshold.time": "24h",
"realtime.segment.flush.desired.size": "50M",
"stream.kafka.consumer.prop.auto.offset.reset": "largest",
"stream.kafka.consumer.prop.group.id": <group_id>,
"stream.kafka.decoder.prop.schema": <schema>
}
and
"streamConfigs": {
"streamType": "kafka",
"stream.kafka.consumer.type": "highLevel",
"stream.kafka.topic.name": <topic_name>,
"stream.kafka.decoder.class.name": "org.apache.pinot.plugin.inputformat.avro.SimpleAvroMessageDecoder",
"stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
"stream.kafka.hlc.bootstrap.server": <broker_list>,
"realtime.segment.flush.threshold.size": "0",
"realtime.segment.flush.threshold.time": "24h",
"realtime.segment.flush.desired.size": "50M",
"stream.kafka.consumer.prop.auto.offset.reset": "largest",
"stream.kafka.consumer.prop.group.id": <group_id>,
"stream.kafka.decoder.prop.schema": <schema>
}
and
"streamConfigs": {
"streamType": "kafka",
"stream.kafka.consumer.type": "highLevel",
"stream.kafka.topic.name": <topic_name>,
"stream.kafka.decoder.class.name": "org.apache.pinot.plugin.inputformat.avro.SimpleAvroMessageDecoder",
"stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
"stream.kafka.hlc.bootstrap.server": <broker_list>,
"realtime.segment.flush.threshold.size": "0",
"realtime.segment.flush.threshold.time": "24h",
"realtime.segment.flush.desired.size": "50M",
"stream.kafka.consumer.prop.auto.offset.reset": "largest",
"stream.kafka.consumer.prop.hlc.group.id": <group_id>,
"stream.kafka.decoder.prop.schema": <schema>
}
and none of those worked. Finally, after looking at the code, I tried
"streamConfigs": {
"streamType": "kafka",
"stream.kafka.consumer.type": "lowLevel",
"stream.kafka.topic.name": <topic_name>,
"stream.kafka.decoder.class.name": "org.apache.pinot.plugin.inputformat.avro.SimpleAvroMessageDecoder",
"stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
"stream.kafka.broker.list": <broker_list>,
"stream.kafka.consumer.prop.auto.offset.reset": "largest",
"stream.kafka.group.id": <group_id>,
"stream.kafka.decoder.prop.schema": <schema>,
"realtime.segment.flush.threshold.size": "0",
"realtime.segment.flush.threshold.time": "24h",
"realtime.segment.flush.desired.size": "50M"
},
and that was able to consume from Kafka, but I don't see it in the list of Kafka consumer groups. Logs still say group.id is empty. Any help / pointers are appreciated.
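For reference, the low-level config that finally consumed can be templated as a plain dict; this is a sketch, and build_llc_stream_configs is an illustrative helper name, not part of any Pinot API:

```python
# Sketch: template of the LLC streamConfigs that finally consumed from Kafka.
# build_llc_stream_configs is an illustrative helper, not a Pinot API.
def build_llc_stream_configs(topic: str, brokers: str, group_id: str, schema: str) -> dict:
    return {
        "streamType": "kafka",
        "stream.kafka.consumer.type": "lowLevel",
        "stream.kafka.topic.name": topic,
        "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.inputformat.avro.SimpleAvroMessageDecoder",
        "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
        "stream.kafka.broker.list": brokers,
        "stream.kafka.consumer.prop.auto.offset.reset": "largest",
        "stream.kafka.group.id": group_id,
        "stream.kafka.decoder.prop.schema": schema,
        "realtime.segment.flush.threshold.size": "0",
        "realtime.segment.flush.threshold.time": "24h",
        "realtime.segment.flush.desired.size": "50M",
    }

cfg = build_llc_stream_configs("my_topic", "broker1:9092", "my_group", "<avro schema json>")
print(cfg["stream.kafka.group.id"])  # -> my_group
```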
Priyank Bagrecha (11/15/2021, 12:24 PM):
"streamConfigs": {
"streamType": "kafka",
"stream.kafka.consumer.type": "highLevel",
"stream.kafka.topic.name": <topic_name>,
"stream.kafka.decoder.class.name": "org.apache.pinot.plugin.inputformat.avro.SimpleAvroMessageDecoder",
"stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
"stream.kafka.hlc.bootstrap.server": <broker_list>,
"stream.kafka.consumer.prop.auto.offset.reset": "smallest",
"stream.kafka.hlc.group.id": <group_id>,
"stream.kafka.decoder.prop.schema": <schema>,
"realtime.segment.flush.threshold.size": "0",
"realtime.segment.flush.threshold.time": "24h",
"realtime.segment.flush.desired.size": "50M"
},
but it doesn't consume any events from Kafka at all.
Neha Pawar
Caesar Yao (06/06/2023, 2:29 AM)