# general
h
I have some high-level questions. I'm looking at the architecture page in the docs: https://docs.pinot.apache.org/basics/architecture 1. Regarding terminology, is the segment store the same as the deep store? 2. I assume that if we didn't explicitly configure any storage (e.g., S3, HDFS), it will use the node's local disk? 3. What does "Load Segment" mean in the figure (specifically, what does the Server do)? Does it copy the segment to the server's local disk? Does it load it into memory? 4. Let's say we have a real-time table ingesting from Kafka, and the Pinot cluster is not configured with any specific storage (i.e., it just uses local disk). Once the current consuming segment completes, the server commits it and writes it to the segment store (i.e., local disk). Will this completed segment be loaded and served by another server, or will it continue to be served by the same server? Are there any policies/settings/configs related to this?
n
1. Yes 2. It will use the Controller's local disk as the segment store 3. Loading a segment entails downloading it from the segment store onto the server's local disk, and then memory-mapping it 4. Here, "local disk" means the Controller's disk. The completed segment is persisted to the segment store and is also loaded by the same server that completed it (in this case there is no download, just a load into memory). If replication is configured, the other replicas download the segment from the segment store (unless they were also able to complete the exact same segment). By default, the segment just stays on these original servers. We have a table setting that moves completed segments to another set of servers, allowing you to separate the consuming and completed parts of the table. We also have a table setting for tiered storage, which can move completed segments to another set of servers based on the age of the segments.
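For point 4, the relocation of completed segments is driven by server tags in the table config. A rough, illustrative sketch of what that fragment looks like (the tenant and tag names here are made-up examples, not defaults to copy):

```json
{
  "tenants": {
    "broker": "DefaultTenant",
    "server": "DefaultTenant",
    "tagOverrideConfig": {
      "realtimeConsuming": "DefaultTenant_REALTIME",
      "realtimeCompleted": "completedServers_OFFLINE"
    }
  }
}
```

With something like this, consuming segments stay on servers tagged `DefaultTenant_REALTIME`, while completed segments get relocated to the servers tagged `completedServers_OFFLINE`.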
h
Thanks for the reply. A few more questions: Are all segments in the segment store loaded onto the servers at all times? If yes, does this mean that the total local disk (across all servers) must be large enough to hold all the segments? Is there a doc that describes the table settings you mentioned above?
a
Thanks @Neha Pawar for the quick answers. A few follow-ups: 1) How is deep storage accessed? For example, when S3 is configured as the deep store, does Pinot use the S3 API or mount it as a filesystem? What about HDFS?
2) Are all segments always loaded in memory (based on partitioning, etc.) by the servers responsible for serving them? Or are they loaded lazily, only when needed as queries are served? If the latter, how do you determine which segments will be needed for a particular query, given that the indices are kept in the segment itself? Do you use the segment metadata in ZooKeeper to decide which segments to load into memory?
x
@Ashish, 1) it's using the native S3 API; there's no mounting from Pinot
2) not all segments are held in memory; all segments are memory-mapped.
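To make the memory-mapping point concrete, here is a minimal sketch (the class and method names are mine, not Pinot's) of what "load segment" boils down to: copy the segment file to a server-local directory, then mmap it read-only. Pages are faulted in from disk on access, so the data does not live on the JVM heap:

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class SegmentLoadSketch {
    // Illustrative only: "loading" a segment = copy it to local disk, then mmap it.
    static MappedByteBuffer loadSegment(Path deepStoreCopy, Path localDir) throws IOException {
        Path local = localDir.resolve(deepStoreCopy.getFileName());
        // Step 1: download (a local copy stands in for a deep-store download here)
        Files.copy(deepStoreCopy, local, StandardCopyOption.REPLACE_EXISTING);
        // Step 2: memory-map the local file read-only; the mapping stays valid
        // after the channel is closed, and pages are faulted in on access
        try (RandomAccessFile raf = new RandomAccessFile(local.toFile(), "r");
             FileChannel ch = raf.getChannel()) {
            return ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
        }
    }

    public static void main(String[] args) throws IOException {
        Path remote = Files.createTempDirectory("deepstore").resolve("segment_0");
        Files.write(remote, "columnar segment bytes".getBytes());
        Path localDir = Files.createTempDirectory("server-local");
        MappedByteBuffer buf = loadSegment(remote, localDir);
        System.out.println(buf.capacity()); // 22
    }
}
```

This is also why mmap requires a local file: `FileChannel.map` only works on a file the OS can page in, which is what the download step provides.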
n
We have PinotFS implementations for each of these deep store options, if you're curious about 1: https://docs.pinot.apache.org/developers/plugin-architecture/write-custom-plugins/pluggable-storage
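The shape of that pluggable-storage contract can be sketched roughly as follows. This is a simplified, hypothetical interface in the spirit of PinotFS, not its exact signatures (see the linked docs for the real API); the point is that an S3 or HDFS plugin implements the same contract with the native client SDK, with no filesystem mounting:

```java
import java.io.File;
import java.io.IOException;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

public class FsSketch {
    // Simplified contract in the spirit of a pluggable filesystem SPI.
    // Method names are illustrative, not the exact PinotFS signatures.
    interface SimpleFS {
        boolean exists(URI uri) throws IOException;
        void copyToLocalFile(URI src, File dst) throws IOException;
    }

    // Local-disk implementation; an S3/HDFS plugin would implement the same
    // contract using the native client API behind the scenes.
    static class LocalFS implements SimpleFS {
        public boolean exists(URI uri) {
            return Files.exists(Paths.get(uri));
        }
        public void copyToLocalFile(URI src, File dst) throws IOException {
            Files.copy(Paths.get(src), dst.toPath(), StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        java.nio.file.Path src = Files.createTempFile("segment", ".bin");
        Files.write(src, new byte[] {1, 2, 3});
        File dst = File.createTempFile("local-copy", ".bin");
        SimpleFS fs = new LocalFS();
        fs.copyToLocalFile(src.toUri(), dst);
        System.out.println(fs.exists(src.toUri()) + " " + dst.length()); // true 3
    }
}
```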
Lazy loading is on the roadmap
a
I see - but PinotFS cannot be used for mmap, because mmap in Java requires a RandomAccessFile. Right?
So that's why segments need to be copied to the local filesystem for loading. Right?
x
right, for the query serving path, all segments are downloaded from PinotFS to the Pinot server's local disk
the deep store is used for backup purposes and is not on the query path right now. Lazy loading is on the roadmap
a
Thanks @Xiang Fu and @Neha Pawar. Could you please point to the design document for Lazy loading if available?