Eugene Ramirez

06/16/2021, 3:53 AM
Hi, I am evaluating Pinot for possible production use at my company. I am running into a problem with the backup/restore feature and would appreciate any help. Here is my setup:
Kubernetes: EKS 1.20.4
Pinot version: 0.7.1
I enabled S3 as deep storage based on this link, then ingested Parquet data from S3 following this instruction. The data loaded fine and I can query the expected data from Pinot. Next I simulated replacing the cluster by uninstalling all pods and their related volumes (therefore losing all state), but kept the segment files in the S3 segment location (so the backup in the deep store is intact). Then I reinstalled the cluster and reconfigured the tables. I was expecting the servers to automatically fetch the segments from the deep store, as mentioned in a previous post, but that does not seem to be happening. Am I missing a step? Thanks in advance. https://apache-pinot.slack.com/archives/C011C9JHN7R/p1623336369017900?thread_ts=1623327667.015000&cid=C011C9JHN7R

Kishore G

06/16/2021, 4:20 AM
You cannot undeploy zookeeper
Zookeeper stores the metadata/list of segments

Eugene Ramirez

06/16/2021, 4:23 AM
Thanks for the reply. What should the steps be if I have to replace the cluster? Should I keep a backup of ZooKeeper and restore it to the new cluster?

Kishore G

06/16/2021, 4:23 AM
Yes
Or upload all the segments again to the new cluster using the upload API call
It can be a simple script over the segments in S3

Eugene Ramirez

06/16/2021, 4:31 AM
Got it. Looking at the UploadSegment command, the segmentDir parameter requires a local path. This means I have to download the segments first in order to upload them. Is there a way to use the previous cluster's S3 segment path as the source location for the upload to the new cluster's S3 segment path?

Kishore G

06/16/2021, 4:38 AM
Use URI-based or metadata-based push
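Editor's note: the "simple script over the segments in S3" combined with a URI-based push could look roughly like the sketch below. It is a minimal illustration, not a confirmed procedure from this thread. It assumes the controller accepts a segment push as a POST to `/v2/segments` with `UPLOAD_TYPE: URI` and `DOWNLOAD_URI` headers plus a `tableName` query parameter (check the controller's Swagger UI for your Pinot version), and that the controller is configured with S3 access so it can fetch each URI itself. The function names, bucket, and keys are hypothetical; the object keys would come from your own S3 listing (for example via boto3's `list_objects_v2`).

```python
import urllib.request
from urllib.parse import urlencode


def segment_uris(bucket, keys, suffix=None):
    """Turn S3 object keys into s3:// URIs.

    Keys ending in "/" (folder placeholders) are skipped. If `suffix` is
    given (e.g. ".tar.gz"), only keys with that suffix are kept; how segment
    files are named depends on how they were written to the deep store.
    """
    uris = []
    for key in keys:
        if key.endswith("/"):
            continue
        if suffix is not None and not key.endswith(suffix):
            continue
        uris.append(f"s3://{bucket}/{key}")
    return uris


def build_push_request(controller_url, table_name, download_uri):
    """Describe one URI-based push as a plain dict (no network access).

    ASSUMPTION: endpoint path, query parameter, and header names follow the
    Pinot controller segment-upload API; verify against your version.
    """
    query = urlencode({"tableName": table_name})
    return {
        "url": f"{controller_url.rstrip('/')}/v2/segments?{query}",
        "headers": {"UPLOAD_TYPE": "URI", "DOWNLOAD_URI": download_uri},
    }


def push(request, timeout=60):
    """Send one push request to the controller and return the HTTP status."""
    req = urllib.request.Request(
        request["url"], data=b"", method="POST", headers=request["headers"]
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status
```

To use it, build the requests first and inspect them as a dry run, then call `push` on each one once the table and schema exist on the new cluster. Keeping request construction separate from sending makes it easy to review exactly which segments will be registered.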

Eugene Ramirez

06/16/2021, 4:39 AM
Awesome, thanks. I will try this.

Mayank

06/16/2021, 6:11 AM
Do you have a realtime component as well?

Eugene Ramirez

06/16/2021, 6:18 AM
I can think of several use cases where Pinot might be useful to us:
• As the main backend of our analytics dashboard. Currently we are using Druid, Greenplum, TiDB, etc., but each one has drawbacks.
• As one of the data sources for our machine learning jobs. Currently we are using Athena or direct files from S3, but Athena has an upper bound on throughput, while S3 files are too limited.
• As a backend sink for Kafka to complement our real-time prediction in production serving.

Mayank

06/16/2021, 6:27 AM
Yeah, these sound like great use cases for Pinot. We are here to help you use Pinot successfully for these.
❤️ 1

Sadim Nadeem

07/01/2021, 9:16 AM
cc: @Shailesh Jha @Mohamed Sultan @Manju Priyadharshini @Mohamed Hussain @Mohamed Kashifuddin @Pugal