# general
p
Hello, I've been reading the Pinot documentation and I'm a bit confused about what data the Controller and Server are each responsible for. My understanding is that Server instances store the actual data segments/partitions of a table, while Controllers store only a mapping of which servers hold which segments for a given table. If that's the case, what does it mean when a segment is uploaded to a Controller? As mentioned in the docs: "Controller - When a segment is uploaded to controller, the controller saves it in the DFS configured."
My use-case: I want to create a realtime table with upsert capabilities that consumes from a partitioned Kafka topic. As I understand it:
• The Pinot Server instances are responsible for consuming from the Kafka partitions into local segments.
• The Pinot Controller instances contain a mapping of which servers hold which segments.
• Once the segments are completed (whether by size or time thresholds), they are uploaded to a distributed file system.
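For reference, a realtime upsert table along these lines is defined via a table config. Below is a minimal sketch; the table name, topic, broker address, and flush thresholds are hypothetical placeholders, and an accompanying schema with `primaryKeyColumns` set is also required. Upsert tables need the low-level (partition-based) Kafka consumer and `strictReplicaGroup` routing:

```json
{
  "tableName": "orders",
  "tableType": "REALTIME",
  "segmentsConfig": {
    "schemaName": "orders",
    "timeColumnName": "updated_at",
    "replication": "2"
  },
  "tableIndexConfig": {
    "loadMode": "MMAP",
    "streamConfigs": {
      "streamType": "kafka",
      "stream.kafka.topic.name": "orders",
      "stream.kafka.broker.list": "kafka:9092",
      "stream.kafka.consumer.type": "lowlevel",
      "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
      "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.stream.kafka.KafkaJSONMessageDecoder",
      "realtime.segment.flush.threshold.rows": "1000000",
      "realtime.segment.flush.threshold.time": "6h"
    }
  },
  "routing": {
    "instanceSelectorType": "strictReplicaGroup"
  },
  "upsertConfig": {
    "mode": "FULL"
  },
  "metadata": {}
}
```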
m
Servers store a local copy of the data for faster query serving. The controller maintains the segment-to-server mapping, but a golden copy of each segment is also stored in the plugged-in deep storage (HDFS/S3/etc.).
The distributed file system you are referring to is hooked up to the controller
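To illustrate how that hookup looks, here is a sketch of the relevant controller config for an S3 deep store, based on Pinot's pluggable filesystem mechanism; the bucket name, paths, and region are hypothetical placeholders:

```properties
# Deep store location where the controller keeps the golden copy of segments
controller.data.dir=s3://my-pinot-bucket/pinot-data
controller.local.temp.dir=/tmp/pinot-tmp-data

# Plug in the S3 filesystem implementation for the s3:// scheme
pinot.controller.storage.factory.class.s3=org.apache.pinot.plugin.filesystem.S3PinotFS
pinot.controller.storage.factory.s3.region=us-east-1

# Allow segments to be fetched over the s3 protocol as well
pinot.controller.segment.fetcher.protocols=file,http,s3
pinot.controller.segment.fetcher.s3.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher
```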
p
What is the golden copy for?
Don't the servers upload their segments to the plugged storage?
m
Yes, that is correct. The copy in the plugged-in storage is the one I referred to as the golden copy.