https://pinot.apache.org/ logo
Join Slack
Powered by
# fix_llc_segment_upload
  • s

    Subbu Subramaniam

    05/05/2021, 5:43 PM
    We can have a per table cache, so no need to deduce table names from segment names
  • t

    Ting Chen

    05/05/2021, 5:48 PM
    I do not get why it is not the case when cache is empty. When a controller periodic job starts and it finds the segment name cache is empty, shouldn't it try to scan the realtime tables to check if there is segment required to be fixed?
  • c

    Chang Liu

    05/05/2021, 5:56 PM
    I think it comes to the case when the table with all segments in deep store. The cache is empty for that table all the time. So we should avoid the scan for this kind of tables just because of the cache is empty?
  • c

    Chang Liu

    05/05/2021, 6:03 PM
    But what about the tables which are created after the controller restart or leadership change?
  • c

    Chang Liu

    05/05/2021, 6:06 PM
    If we only do ZK scan during (1) a controller starts, (2) leadership changes , these tables will be ignored.
  • c

    Chang Liu

    05/05/2021, 6:12 PM
    That’s why I think probably we could do this: https://apache-pinot.slack.com/archives/C02130N2128/p1620167285000800
    Copy code
    The above implementation might be over complicated. I think probably we can just build a counter map to count how many rounds it passed since the the last ZK access per table level? i.e.  if the threshold is 10, get the segment list from ZK in round 1 and try to fix, the 9 remaining rounds just ignore this table.
  • s

    Subbu Subramaniam

    05/05/2021, 8:51 PM
    I am not sure I follow. I proposed that we should keep a list of esgments in mem -- segments for which deep store was not available during commit. In case of restart or contrller mastership change. re-fetch the list.
  • c

    Chang Liu

    06/01/2021, 9:38 PM
    Hi @User, @User, I refactored the ZK access when we get the list of segments for upload retry. Would you mind taking another look? 1. During commit phase, enqueue the segment without download url. 2.
    uploadToSegmentStoreIfMissing
     reads in-memory segments list to fix, only removes the segment from the in-memory list after successful upload. 3. Only
    prefetchLLCSegmentsWithoutDeepStoreCopy
    has access to ZK, which is triggered when setup the periodic jobs. Leadership is also checked in this step. And to further alleviate the ZK access, the filter based on segment creation time is added. https://github.com/apache/incubator-pinot/pull/6778
  • c

    Chang Liu

    06/01/2021, 9:39 PM
    @User your concern about shared constant for time limit check is addressed. I added another constant for it:
    MIN_TIME_BEFORE_FIXING_SEGMENT_STORE_COPY_MILLIS
  • s

    Subbu Subramaniam

    06/01/2021, 9:44 PM
    I will look at it in the next couple of days. thanks for addressing allcomments
  • c

    Chang Liu

    06/01/2021, 9:44 PM
    Thanks Subbu
  • s

    Subbu Subramaniam

    06/02/2021, 7:01 PM
    I think you have one major part unimplemented as yet. You should not be fetching the segments of a table when the periodic task starts. I am not sure if by that time, the controller leadership has been decided. Ideally, you should fetch this when the leadership is decided. Please chat with @User to understand this better and see if a callback can be registered with the lead controller manager.
  • c

    Chang Liu

    06/02/2021, 7:05 PM
    OK. I think a callback func will be a right solution
  • s

    Subbu Subramaniam

    06/02/2021, 7:19 PM
    It may be a bit tricky to set up, etc. You may need to introduce a registration and callback mechanism, perhaps scheduled in a thread (like helix does)
  • c

    Chang Liu

    06/02/2021, 7:21 PM
    I think if that is the case, I may need to open up a new pr for this. For this pr, do you think if it’s OK just to fix the segment cached from committing phase?
  • c

    Chang Liu

    06/02/2021, 7:21 PM
    After we add this call back registration, we can add the ZK access part to LLCRealtimeManager
  • c

    Chang Liu

    06/02/2021, 7:22 PM
    What do you think?
  • s

    Subbu Subramaniam

    06/02/2021, 8:58 PM
    That may be fine, but then on a controller restart, we will lose the cache, right?
  • c

    Chang Liu

    06/02/2021, 8:58 PM
    That’s right
  • c

    Chang Liu

    06/02/2021, 8:58 PM
    So we need a ZK scan
  • c

    Chang Liu

    06/02/2021, 8:59 PM
    But ZK scan logic part depends on the controller leadership change, i.e. registration/ callback
  • c

    Chang Liu

    06/02/2021, 8:59 PM
    So I want to separate this two first
  • s

    Subbu Subramaniam

    06/02/2021, 9:03 PM
    If you are ok with that in the short run, then you can check in as it is (after addrsesing some of the other comments) and put a TODO in front of the 
    setupTask
     method that there is a race condition there in that the controller leadership may not be decided by the time the method is called. In the next PR, you can fix it. Before that, you can also check with jack, how to get notified. [2:02 PM] Oh, another solution to this (without introducing callbacks) is to keep a boolean whether it is needed to download the segment names. If the boolean is true, then download it (when the table is being processed), and initialize the queue. Otherwise, use the queue. [2:03 PM] I think this solution may work a little better since we don't download all the table at the same time. We process a table, and then download the next one.
  • c

    Chang Liu

    06/02/2021, 10:03 PM
    👌
  • c

    Chang Liu

    06/02/2021, 11:21 PM
    Hi @User, I just talked with @User about the leadership change callback. We can use
    addPartitionLeader
     and 
    removePartitionLeader
     . But since they are partitioned based, controller can receive multiple state transition within a short period of time.
  • j

    Jack

    06/02/2021, 11:28 PM
    one workaround is to add a sleep time and count the zk access request only once
  • s

    Subbu Subramaniam

    06/02/2021, 11:34 PM
    @User the need here is to get notified on mastership changes, and invalidate the cache (of bad segments).
  • s

    Subbu Subramaniam

    06/02/2021, 11:35 PM
    I am not awre of addPartitinLeader or removePartitionLeader. Are these callbacks already offered?
  • j

    Jack

    06/02/2021, 11:36 PM
    for a single pinot controller, the mastership changes when a helix state transition is received, that’s where
    addPartitinLeader
    removePartitionLeader
    gets called
  • s

    Subbu Subramaniam

    01/06/2022, 12:36 AM
    @User has left the channel