# getting-started
n
You don't need the group id or any of the properties that say "hlc". Your tables might be out of sync because you've set the offset criteria to "largest": each table will start consuming from the latest message in the topic, so if your event rate is high, the second table will miss the events that were emitted between the creation of the first table and the second.
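(For context: the offset criteria is the `stream.kafka.consumer.prop.auto.offset.reset` property in the table's `streamConfigs`. A minimal sketch, with placeholder topic and broker values; setting it to `smallest` makes every new table start from the beginning of the topic instead of the latest message:)

```json
{
  "streamConfigs": {
    "streamType": "kafka",
    "stream.kafka.topic.name": "my-events-topic",
    "stream.kafka.broker.list": "kafka-broker:9092",
    "stream.kafka.consumer.prop.auto.offset.reset": "smallest"
  }
}
```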
p
I tried with smallest instead of largest first, and that's where I was seeing the difference; then I started using largest after that. I did see in the code that Pinot uses <table_name>_<timestamp> as a default group id. I am still confused why I don't see it in the list of consumer groups. I'll try again today.
n
The concept of consumer group is not used in low level consumer
p
And I am creating tables in both clusters at the same time using the same topic. If anything, I would expect the difference to be smaller, not 2-3x apart, since the event rate is low.
I see. Also, I forgot to mention that I am using Pinot 0.7.1 and Kafka 2.x.
How do I use a consumer group with the high-level consumer? Clearly I am missing something when configuring that as well.
Do I need to use `stream.kafka.hlc.zk.connect.string` and `stream.kafka.zk.broker.url`? I see those in the example table configs in the GitHub repo for the high-level consumer. The Kafka cluster has its own ZooKeeper, and each Pinot cluster has its own ZooKeeper as well.
n
you shouldn’t be using the high-level consumer, and hence shouldn’t have to worry about consumer groups
p
I see. Could you please go into a little bit about why you recommend that?
n
we’ve stopped actively developing the high-level consumer and will likely deprecate it soon. All the properties you need are listed here: https://docs.pinot.apache.org/basics/data-import/pinot-stream-ingestion/import-from-apache-kafka
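(For reference, a minimal low-level `streamConfigs` block along the lines of that page; this is a sketch only, with placeholder topic and broker values. The consumer factory class shown is the Kafka 2.x plugin that ships with Pinot 0.7.1, and the decoder assumes JSON messages:)

```json
{
  "streamConfigs": {
    "streamType": "kafka",
    "stream.kafka.consumer.type": "lowlevel",
    "stream.kafka.topic.name": "my-events-topic",
    "stream.kafka.broker.list": "kafka-broker:9092",
    "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
    "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.stream.kafka.KafkaJSONMessageDecoder",
    "stream.kafka.consumer.prop.auto.offset.reset": "smallest"
  }
}
```

Note there are no `hlc` or ZooKeeper properties here; the low-level consumer talks to the Kafka brokers directly.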
p
Got it. Thank you so much once again for your help and time.
n
still doesn’t solve your missing events issue though.. is there a way for you to run some queries (like min/max timestamps, or a `count(*) group by timestamp`) to verify that you’re indeed seeing events being missed?
p
Yeah let me try those queries and share results with you.
I am setting up everything to be able to run those queries. In the meantime I have a few more questions: does the low-level consumer use a group id by itself, or am I wrong in understanding that it uses a default group id based on table name and timestamp? If it is doing that, would merely using a different table name help? If it is using a group id internally, I don't understand why kafka-consumer-groups doesn't show it; I do see an empty space as one of the consumer groups. If it is not using a group id, then wouldn't the two tables compete with each other to consume from the same topic in the same Kafka cluster?
output for `select min(upload_time), max(upload_time) from table` for the table with the inverted index: [results attached]
output for `select min(upload_time), max(upload_time) from table` for the table with the star-tree index: [results attached]
looks like the one with the star-tree index is lagging behind.
used `largest` instead of `smallest` and they tend to be doing more or less similarly well. I think it also helped that I used different table names for the table with the inverted index vs. the table with the star-tree index. I don't have any proof other than what I am seeing 😂. Thank you Neha for all the help, your time, and patience. Much appreciated!
n
oh cool..
regarding `does the low-level consumer use a group id by itself, or am I wrong in understanding that it uses a default group id based on table name and timestamp? If it is doing that, would merely using a different table name help? If it is using a group id internally, I don't understand why kafka-consumer-groups doesn't show it; I do see an empty space as one of the consumer groups.` - we don't use a group id even internally.
`if it is not using a group id, then wouldn't the two tables compete with each other to consume from the same topic in the same Kafka cluster?` - Not sure what you mean by the two tables competing with each other. If you’re saying that the two tables will split the topic between them, such that the messages each one receives are not seen by the other - then no, that is not what happens. We consume directly from offsets inside the pinot-server, maintaining our own checkpointing.
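(For illustration: each consuming segment's metadata in Pinot's own ZooKeeper records the range of Kafka offsets it covers, which is how checkpointing works without any Kafka consumer group. A rough sketch of such realtime segment ZK metadata; the field names follow Pinot's RealtimeSegmentZKMetadata, and the offset values are made up:)

```json
{
  "segment.realtime.startOffset": "1234500",
  "segment.realtime.endOffset": "1237800",
  "segment.realtime.status": "DONE"
}
```

When a segment completes at `segment.realtime.endOffset`, the next consuming segment for that partition starts from exactly that offset, so each table tracks its own position in the topic independently.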
this might help: https://www.confluent.io/resources/kafka-summit-2020/apache-pinot-case-study-building-distributed-analytics-systems-using-apache-kafka/ It talks about how and why we moved away from the high-level consumer to the low-level one, and how it works internally.
p
Thank you. I do have questions around consuming from Kafka and offset management; I'll go through this case study first.