Sadim Nadeem

05/18/2021, 8:15 AM
@here How can we increase Pinot ingestion throughput when consuming messages from a Kafka topic? Say the topic has only 1 partition. Will increasing the partitions of the Kafka topic that Pinot consumes from automatically increase Pinot throughput? That is, is Pinot smart enough to increase the number of consumers reading from that topic and storing into tables/segments?

Mohamed Sultan

05/18/2021, 8:18 AM
Hi @Mayank, glad to say he is also my teammate, working on data engineering.

Xiang Fu

05/18/2021, 8:24 AM
Pinot will handle Kafka topic expansion
👍 1
When you use the low-level Kafka consumer, Pinot creates one consuming segment per Kafka topic partition
👍 1
So you can scale your ingestion throughput with the number of partitions
👍 1
I think Pinot currently handles only scale-up, not scale-down, meaning you can only increase Kafka topic partitions
👍 1
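A minimal sketch of the stream settings the advice above refers to, written as a Python helper that emits the `streamConfigs` map of a realtime table config. The topic name and broker address are placeholders, and the property keys and class names follow the Pinot documentation of that period as best recalled, so check them against the Pinot version you actually run.

```python
import json


def build_stream_configs(topic: str, broker_list: str) -> dict:
    """Build a streamConfigs map that selects the low-level Kafka consumer.

    With the low-level consumer, Pinot consumes each Kafka partition
    independently, so adding partitions adds parallel consumers.
    """
    return {
        "streamType": "kafka",
        # "lowlevel" selects the per-partition consumer discussed above.
        "stream.kafka.consumer.type": "lowlevel",
        "stream.kafka.topic.name": topic,              # placeholder value
        "stream.kafka.broker.list": broker_list,       # placeholder value
        # Class names assumed from the Pinot docs; verify for your release.
        "stream.kafka.consumer.factory.class.name":
            "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
        "stream.kafka.decoder.class.name":
            "org.apache.pinot.plugin.stream.kafka.KafkaJSONMessageDecoder",
        "stream.kafka.consumer.prop.auto.offset.reset": "smallest",
    }


if __name__ == "__main__":
    configs = build_stream_configs("events", "kafka:9092")
    print(json.dumps(configs, indent=2))
```

The resulting map would sit under `tableIndexConfig.streamConfigs` in the realtime table config; nothing in it names a partition count, which is why adding partitions on the Kafka side needs no table change.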

Sadim Nadeem

05/18/2021, 8:27 AM
@Mayank regarding the discussion you had with @Mohamed Sultan: why do we need to create a new Pinot cluster and restore the backup there? The use case is this: suppose my GCP service account changes and I need to migrate to a new GCP VPC. How can we then restore the Pinot backup from one Pinot cluster to another? @Mohamed Sultan @Pugal @Mohamed Kashifuddin @Shailesh Jha please post the blocker we faced in the restore.
Thanks a lot @Xiang Fu, that helps.
cc: @Mohamed Hussain

Mayank

05/18/2021, 12:58 PM
@Sadim Nadeem if you can copy the data from the old VPC to the new one, then you can simply point the new cluster to the new VPC.
👍 1
Note though, please don’t have two clusters point to the same VPC and share the same tables
👍 1

Sadim Nadeem

05/18/2021, 2:00 PM
Sure Mayank, of course pointing two clusters to the same tables will cause issues, but restoring should not be a blocker. @Mohamed Sultan

Mayank

05/18/2021, 4:03 PM
Yes, restore works.

Sadim Nadeem

05/24/2021, 7:10 AM
@Xiang Fu @Mayank where does Pinot store the Kafka topic offsets (checkpointing) while consuming messages from Kafka? If Pinot restarts, it should resume from the last processed message, to ensure at-least-once processing of all the messages published on the Kafka topic that the Pinot table consumes from.
That is, are the offsets stored on disk, in some other Kafka topic, in a DB table, etc.?

Xiang Fu

05/24/2021, 8:05 AM
In segment metadata
The consuming segment has the start offset for the Kafka topic
Segment metadata is stored in ZK (ZooKeeper).
It’s at-least-once
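A toy model of why this gives at-least-once behaviour, not Pinot's actual implementation: only the start offset of the consuming segment is persisted, so after a restart any rows consumed but not yet committed are read from Kafka again. The class and method names here are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class PartitionConsumer:
    """Toy consumer for one Kafka partition (hypothetical, not Pinot code)."""
    committed_offset: int = 0                       # persisted start offset (ZK in Pinot)
    committed_rows: list = field(default_factory=list)
    in_memory: list = field(default_factory=list)   # rows of the consuming segment
    reads: int = 0                                  # total messages fetched from the log

    def consume(self, log: list, upto: int) -> None:
        # Continue from the persisted offset plus whatever is already buffered.
        start = self.committed_offset + len(self.in_memory)
        for offset in range(start, upto):
            self.in_memory.append(log[offset])
            self.reads += 1

    def commit(self) -> None:
        # Completing a segment persists its rows and advances the start offset.
        self.committed_rows.extend(self.in_memory)
        self.committed_offset += len(self.in_memory)
        self.in_memory.clear()

    def restart(self) -> None:
        # A crash loses the uncommitted in-memory rows, not the saved offset.
        self.in_memory.clear()


def demo():
    log = ["a", "b", "c", "d", "e"]
    consumer = PartitionConsumer()
    consumer.consume(log, 3)
    consumer.commit()           # offsets 0..2 are now durable
    consumer.consume(log, 5)    # "d" and "e" read but not committed
    consumer.restart()          # simulated crash
    consumer.consume(log, 5)    # "d" and "e" are read a second time
    consumer.commit()
    return consumer.committed_rows, consumer.reads


if __name__ == "__main__":
    rows, reads = demo()
    print(rows, reads)  # ['a', 'b', 'c', 'd', 'e'] 7
```

No message is lost, but two of the five are fetched twice (7 reads for 5 messages), which is the at-least-once guarantee described above.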

Sadim Nadeem

05/24/2021, 8:20 AM
Thanks @Xiang Fu. The actual problem is that until now we were using only one partition on the Kafka topic, with the low-level Kafka consumer. In a few tables, Pinot's built-in consumers seem to get stuck occasionally and stop consuming messages from the topic even though new messages/events are being published to it. So we are thinking of increasing the partitions on the Kafka topic for reliability, resiliency and higher throughput, since, as you said earlier, Pinot will also increase the number of consumers as the number of partitions increases with the low-level Kafka consumer.
@Pugal FYI

Xiang Fu

05/24/2021, 8:23 AM
this table conf is good
Just curious, is it possible that you occasionally publish a null message into the Kafka topic?

Sadim Nadeem

05/24/2021, 8:25 AM
It might be that some garbage data gets published

Xiang Fu

05/24/2021, 8:27 AM
I see. We recently fixed a bug in the Kafka consumer: a null message could cause the Kafka consumer to hang. https://github.com/apache/incubator-pinot/pull/6950

Sadim Nadeem

05/24/2021, 8:28 AM
OK, so we need to upgrade Pinot via Helm.
Have the Helm charts for Pinot been updated with this latest fix?
I mean, which release has this fix?
Also, can you please review the values.yaml file of the Pinot chart and check whether the ZooKeeper URL is given correctly? I see the URL is empty there, as mentioned by @Mohamed Sultan (see the attached screenshot). Is any correction needed here, @Xiang Fu? The values.yaml file is also attached for review.

Xiang Fu

05/24/2021, 8:43 AM
You can try the latest Docker image

Sadim Nadeem

05/24/2021, 8:43 AM
We are using a Google Kubernetes Engine deployment of Pinot

Xiang Fu

05/24/2021, 8:44 AM
Change pullPolicy to Always
👍 1

Sadim Nadeem

05/24/2021, 8:44 AM
Running Pinot in Kubernetes - Apache Pinot Docs https://docs.pinot.apache.org › kubernetes-quickstart 1. Start Pinot with Helm — Pinot repo has pre-packaged HelmCharts for Pinot and Presto. Helm Repo index file is here.

Xiang Fu

05/24/2021, 8:45 AM
Then restart all Pinot pods

Sadim Nadeem

05/24/2021, 8:45 AM
OK, so even if the ZooKeeper URL is empty, it is not incorrect.
Change pullPolicy to Always -> Sure, will change that
and then restart all the Pinot pods, including broker, controller, minion and server
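The two steps above (set pullPolicy to Always, then restart every Pinot pod) could be sketched as the commands below, built here by a small Python helper that only prints them. Everything specific is an assumption not stated in the thread: the release name `pinot`, the namespace `pinot-quickstart`, the chart path `./pinot`, the value key `image.pullPolicy`, and that each component runs as a StatefulSet named `pinot-<component>`. Check `helm list` and `kubectl get statefulsets,deployments` for the real names first.

```python
import shlex


def upgrade_and_restart_commands(release, namespace, chart, workloads):
    """Return shell command strings; nothing is executed here."""
    commands = [
        # Keep existing values and override only the image pull policy.
        ["helm", "upgrade", release, chart,
         "--namespace", namespace,
         "--reuse-values",
         "--set", "image.pullPolicy=Always"],   # value key is an assumption
    ]
    for name in workloads:
        # Rolling restart so each pod re-pulls the image on start.
        commands.append(["kubectl", "rollout", "restart",
                         f"statefulset/{name}", "--namespace", namespace])
    return [shlex.join(cmd) for cmd in commands]


if __name__ == "__main__":
    # All names below are placeholders for illustration.
    for line in upgrade_and_restart_commands(
        release="pinot",
        namespace="pinot-quickstart",
        chart="./pinot",
        workloads=["pinot-controller", "pinot-broker",
                   "pinot-server", "pinot-minion"],
    ):
        print(line)
```

With pullPolicy set to Always, a restarted pod pulls the image tag again instead of reusing the node's cached copy, which is how a moving tag such as `latest` picks up the fix.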

Mohamed Sultan

05/24/2021, 8:47 AM
Seems like Pinot already comes with ZooKeeper by default. @Xiang Fu can you guide us on how the default ZooKeeper connects with Pinot?

Xiang Fu

05/24/2021, 8:59 AM
The ZK URL will be auto-filled when you deploy the Helm chart
👍 1
It’s generated dynamically based on the Helm chart name
👍 1
Along with Pinot, Helm also deploys a ZooKeeper
👍 1

Sadim Nadeem

05/24/2021, 9:06 AM
OK, so the checkpoints/offsets are stored in this ZooKeeper only. Thanks.
Also, will change this to Always

Xiang Fu

05/24/2021, 9:13 AM
Yes

Sadim Nadeem

05/24/2021, 9:14 AM
Thanks a lot @Xiang Fu, very grateful for that.

Mohamed Sultan

05/24/2021, 3:56 PM
CC: @Shailesh Jha

Sadim Nadeem

06/01/2021, 10:37 AM
@Mohamed Sultan