# general
Hey Everyone 🖐️ I just started using Pinot and am loving it already! I am using stream ingestion with Apache Kafka to import data, but I can't quite get a grasp on the deep storage concept. I don't have HDFS or cloud storage set up currently.
• Do segments still get flushed to local disk periodically in the absence of a deep storage?
• If I restart my Pinot Servers, would they recover old segments in the absence of a deep storage?
• What is the main purpose of a deep storage in a Pinot cluster setup?
• Is deep storage a must in a production Pinot cluster setup?
I would appreciate it if you could share your knowledge with me.
Welcome @User:
You can think of the deep store as the persistent store that keeps a backup copy of the data ingested into Pinot.

- Serving nodes do flush data to "local" disk periodically, but that is only their local copy. A segment goes through a "commit" protocol that involves saving a copy of it in the deep store before the data is considered committed into Pinot.

- The local disk attached to Pinot servers is not viewed as a persistent store. However, it is where a Pinot server first looks when loading the data the Controller asks it to load (via IdealState). Only if it doesn't find a segment locally will it download it from the deep store.

- As mentioned above, the deep store serves as the persistent copy of the data ingested into Pinot, for servers to download (note that new servers may join the cluster), for disaster recovery, etc.

- Yes, in a production setup it is recommended to have storage that is shared across the controllers. It could be something like NFS, or an object store such as S3, ADLS, or GCS (see the config sketch below).
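For anyone following along, here is a minimal sketch of what pointing a cluster at S3 deep storage can look like, based on the Pinot S3 filesystem plugin docs. The bucket name, region, and local paths are placeholders, and the exact property names may vary slightly by Pinot version, so treat this as a starting point rather than a definitive config.
```
# controller.conf -- committed segments get persisted under this S3 prefix (hypothetical bucket/paths)
controller.data.dir=s3://my-pinot-bucket/pinot/controller-data
controller.local.temp.dir=/tmp/pinot/controller-tmp
pinot.controller.storage.factory.class.s3=org.apache.pinot.plugin.filesystem.S3PinotFS
pinot.controller.storage.factory.s3.region=us-west-2
pinot.controller.segment.fetcher.protocols=file,http,s3
pinot.controller.segment.fetcher.s3.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher

# server.conf -- local dirs are only a cache; servers fall back to downloading from S3
pinot.server.instance.dataDir=/var/pinot/server/data/index
pinot.server.instance.segmentTarDir=/var/pinot/server/data/segmentTar
pinot.server.storage.factory.class.s3=org.apache.pinot.plugin.filesystem.S3PinotFS
pinot.server.storage.factory.s3.region=us-west-2
pinot.server.segment.fetcher.protocols=file,http,s3
pinot.server.segment.fetcher.s3.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher
```
With something like this in place, a restarted or newly added server that doesn't find a segment on its local disk will fetch it from the S3 deep store instead.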
👍 1
Thanks for your detailed answer @User. The matter is clear to me now 😊
👍 1