# troubleshooting
s
<!here> How can we increase Pinot ingestion throughput when consuming messages from a Kafka topic? Say the topic has only 1 partition. Will increasing the partition count of the Kafka topic Pinot consumes from automatically increase Pinot throughput? That is, is Pinot smart enough to increase the number of consumers reading from that topic and storing into tables/segments?
m
Hi @Mayank, glad to say he is also my teammate, working on data engineering.
x
pinot will handle Kafka topic expansion
👍 1
when you use low-level kafka consumer, pinot will create one segment per kafka topic partition
👍 1
so you can scale your ingestion throughput accordingly
👍 1
I think current Pinot only handles scale-up, not scale-down, meaning you can only increase Kafka topic partitions
👍 1
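The partition-to-segment relationship described above can be sketched with a toy model (plain Python, not Pinot code; the helper `consuming_segments` and the simplified segment naming are invented for illustration):

```python
# Toy model (not Pinot internals): with the low-level Kafka consumer, Pinot
# creates one CONSUMING segment per Kafka partition, so the topic's partition
# count is the ceiling on ingestion parallelism.

def consuming_segments(table: str, num_partitions: int) -> list:
    """One consuming segment per partition; names loosely follow Pinot's
    <table>__<partition>__<sequence> convention (heavily simplified)."""
    return [f"{table}__{p}__0" for p in range(num_partitions)]

# With 1 partition there is a single consumer/segment:
print(consuming_segments("events", 1))       # ['events__0__0']
# After expanding the topic to 4 partitions, Pinot scales up to 4 consumers:
print(len(consuming_segments("events", 4)))  # 4
```

This is why expanding the topic increases throughput, and why (per the note above) shrinking the partition count is not handled: existing segments are already keyed by partition id.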
s
@Mayank regarding the discussion you had with @Mohamed Sultan: why do we need to create a new Pinot cluster and restore the backup there? The use case is this: suppose my GCP service account changes and I need to migrate to a new GCP VPC. How can we restore the Pinot backup from one cluster to another Pinot cluster? @Mohamed Sultan @Pugal @Mohamed Kashifuddin @Shailesh Jha please post the blocker we faced during restore.
Thanks a lot @Xiang Fu, that helps.
cc: @Mohamed Hussain
m
@Sadim Nadeem if you can copy data from the old to the new VPC, then you can simply point the new cluster to the new VPC.
👍 1
Note though: please don’t have two clusters point to the same VPC and host the same tables
👍 1
s
Sure Mayank. Of course pointing two clusters at the same tables will cause issues, but restoring should not be a blocker @Mohamed Sultan
m
Yes, restore works.
s
@Xiang Fu @Mayank where does Pinot store the Kafka topic offsets (checkpointing) while consuming messages from Kafka? If Pinot restarts, it should resume consuming from the last processed message, to ensure at-least-once processing of all messages published on the Kafka topic the Pinot table consumes from.
I mean, are the offsets stored on disk, in another Kafka topic, in a DB table, etc.?
x
in segment metadata
the consuming segment has the start offset of kafka topic
segment metadata is stored in zk.
it’s at least once
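The checkpointing story above can be sketched with a minimal toy (a stand-in, not Pinot's actual ZooKeeper schema; `zk_metadata`, `commit_segment`, and `resume_offset` are invented names for illustration):

```python
# Toy sketch (not Pinot internals): the consuming segment's metadata records
# the Kafka start offset, and that metadata lives in ZooKeeper, so a restarted
# server resumes from the last committed offset -> at-least-once delivery.

zk_metadata = {}  # stands in for ZooKeeper segment metadata

def commit_segment(table, partition, next_offset):
    """A segment completes: persist where the next consuming segment starts."""
    zk_metadata[(table, partition)] = {"start_offset": next_offset}

def resume_offset(table, partition):
    """On restart, read the start offset back from 'ZK' (0 for a new table)."""
    return zk_metadata.get((table, partition), {"start_offset": 0})["start_offset"]

commit_segment("events", 0, 1000)  # segment sealed covering offsets < 1000
# If the server crashes now, it re-reads metadata and resumes at offset 1000.
# Messages consumed after 999 but not yet sealed get replayed, which is why
# the guarantee is at-least-once rather than exactly-once.
print(resume_offset("events", 0))  # 1000
```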
s
Thanks @Xiang Fu. Actually, the problem was that we were using only one partition on the Kafka topic with the low-level Kafka consumer, but for a few tables Pinot's built-in consumers occasionally seemed to get stuck and were unable to consume messages even though new messages/events were being published on the topic. So we are thinking of increasing the partitions on the Kafka topic for reliability, resiliency, and higher throughput, since as you said earlier, Pinot will increase the number of consumers as the number of partitions increases for the low-level Kafka consumer.
@Pugal FYI
x
this table conf is good
Just curious: is it possible that you occasionally publish a null message into the Kafka topic?
s
Possibly; some garbage data may get published.
x
I see. We recently fixed a bug in the Kafka consumer where a null message could cause the consumer to hang: https://github.com/apache/incubator-pinot/pull/6950
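The defensive pattern behind that kind of fix can be sketched like this (a toy consume loop, not the actual patch in the linked PR; `process_batch` is an invented name):

```python
# Toy consume loop (not the real fix in apache/incubator-pinot#6950): the
# defensive pattern is to skip null/empty payloads instead of letting them
# stall the consumer.

def process_batch(messages):
    """Return decoded rows, skipping null payloads instead of blocking on them."""
    rows = []
    for payload in messages:
        if payload is None:   # tombstone / garbage record
            continue          # skip rather than hang or raise
        rows.append(payload.decode("utf-8"))
    return rows

print(process_batch([b"a", None, b"b"]))  # ['a', 'b']
```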
s
OK, so we need to upgrade Pinot via Helm.
Are the Helm charts for Pinot updated with this latest fix?
I mean, which release has this fix?
Also, can you please review this file, values.yaml from the Pinot chart, and check whether the ZooKeeper URL is given correctly? I see the URL is empty here, as mentioned by @Mohamed Sultan (check the attached screenshot). Is there any correction needed here @Xiang Fu? Also attaching the values.yaml file for review.
x
You can try latest docker image
s
we are using Google Kubernetes Engine deployment of Pinot
x
Change pullPolicy to Always
👍 1
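A sketch of the suggested change in the chart's values.yaml (the exact key paths are an assumption and may differ by Pinot Helm chart version):

```yaml
# Hypothetical values.yaml fragment: pullPolicy Always makes Kubernetes
# re-pull the image on every pod (re)start, so restarted pods pick up the
# patched Pinot image.
image:
  repository: apachepinot/pinot
  tag: latest
  pullPolicy: Always
```

After changing this, a `helm upgrade` plus a restart of the Pinot pods applies the new image.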
s
[Link preview] Running Pinot in Kubernetes - Apache Pinot Docs: https://docs.pinot.apache.org › kubernetes-quickstart
x
Then restart all Pinot pods
s
OK, so even if the ZooKeeper URL is empty, it's not incorrect.
"Change pullPolicy to Always": sure, will change that.
And then restart all the Pinot pods, including broker, controller, minion, and server?
m
Seems like Pinot already comes with ZooKeeper by default. @Xiang Fu can you guide us on how the default ZooKeeper connects with Pinot?
x
zk url will be auto filled when deploy the helm
👍 1
It’s generated dynamically based on the helmChart name
👍 1
Along with Pinot, helm also deploys a zookeeper
👍 1
s
OK, so checkpoints/offsets are stored in this ZooKeeper. Thanks.
also will change this to always
x
Yes
s
Thanks a lot @Xiang Fu... very grateful for that
m
CC: @Shailesh Jha
s
@Mohamed Sultan