# general
p
Hello, I've been reading the Pinot documentation and I'm a bit confused about what data the Controller and Server are each responsible for. My understanding is that Server instances store the actual data segments/partitions of a table, while Controllers store only a mapping of which servers hold which segments for a given table. If that's the case, what does it mean when a segment is uploaded to a Controller? As mentioned in the docs: "Controller - When a segment is uploaded to controller, the controller saves it in the DFS configured."
My use-case: I want to create a realtime table with upsert capabilities that consumes from a partitioned Kafka topic. As I understand it:
• The Pinot Server instances are responsible for consuming from the Kafka partitions into local segments.
• The Pinot Controller instances contain a mapping of which servers hold which segments.
• Once the segments are completed (whether by size or time thresholds), they are uploaded to a distributed file system.
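For reference, a realtime upsert table along these lines is defined via a table config. Below is a minimal sketch; the table name, topic, broker address, and flush thresholds are hypothetical placeholders, and an accompanying schema with `primaryKeyColumns` set is also required. Upsert tables need the low-level (partition-based) Kafka consumer and `strictReplicaGroup` routing:

```json
{
  "tableName": "orders",
  "tableType": "REALTIME",
  "segmentsConfig": {
    "schemaName": "orders",
    "timeColumnName": "updated_at",
    "replication": "2"
  },
  "tableIndexConfig": {
    "loadMode": "MMAP",
    "streamConfigs": {
      "streamType": "kafka",
      "stream.kafka.topic.name": "orders",
      "stream.kafka.broker.list": "kafka:9092",
      "stream.kafka.consumer.type": "lowlevel",
      "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
      "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.stream.kafka.KafkaJSONMessageDecoder",
      "realtime.segment.flush.threshold.rows": "1000000",
      "realtime.segment.flush.threshold.time": "6h"
    }
  },
  "routing": {
    "instanceSelectorType": "strictReplicaGroup"
  },
  "upsertConfig": {
    "mode": "FULL"
  },
  "metadata": {}
}
```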
m
Servers store a local copy of the data for faster query serving. The controller maintains the segment-to-server mapping, but a golden copy of each segment is also stored in the plugged-in deep storage (HDFS/S3/etc.).
The distributed file system you are referring to is hooked up to the controller
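To illustrate how that hookup looks, here is a sketch of the relevant controller config for an S3 deep store, based on Pinot's pluggable filesystem mechanism; the bucket name, paths, and region are hypothetical placeholders:

```properties
# Deep store location where the controller keeps the golden copy of segments
controller.data.dir=s3://my-pinot-bucket/pinot-data
controller.local.temp.dir=/tmp/pinot-tmp-data

# Plug in the S3 filesystem implementation for the s3:// scheme
pinot.controller.storage.factory.class.s3=org.apache.pinot.plugin.filesystem.S3PinotFS
pinot.controller.storage.factory.s3.region=us-east-1

# Allow segments to be fetched over the s3 protocol as well
pinot.controller.segment.fetcher.protocols=file,http,s3
pinot.controller.segment.fetcher.s3.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher
```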
p
What is the golden copy for?
Don't the servers upload their segments to the plugged storage?
m
Yes, that is correct. The copy in the plugged-in storage is the one I referred to as the golden copy.