https://pinot.apache.org/ logo
Join Slack
Powered by
# time-based-segment-pruner
  • s

    Sidd

    11/14/2020, 12:04 AM
    segment tree is not your traditional balanced BST physically.. it is an array
  • s

    Sidd

    11/14/2020, 12:04 AM
    so not sure how we can use it to find overlapping intervals
  • j

    jiatao

    11/14/2020, 12:06 AM

    https://youtu.be/q0QOYtSsTg4▾

    , the interval search tree is introduced in this video.
  • j

    jiatao

    11/14/2020, 12:06 AM
    I think there's some interval trees named quite similar, so may have confusions.
  • j

    Jackie

    11/14/2020, 12:07 AM
    Let me learn about it, based on the wiki, it can handle our requirement: https://en.wikipedia.org/wiki/Interval_tree
  • s

    Sidd

    11/14/2020, 12:07 AM
    both segment and interval tree store intervals... I think the difference is what you want to query... in this case we want to query overlapping intervals.. which segment tree can't answer
  • j

    Jackie

    11/14/2020, 12:07 AM
    Before this, I want to discuss if we want to handle the corner case of ZK failures
  • j

    Jackie

    11/14/2020, 12:08 AM
    Currently, if some segment ZK metadata is missing due to temporary ZK failures, we still route the segments. But that is not possible with this approach
  • j

    jiatao

    11/14/2020, 12:11 AM
    Why?
  • j

    jiatao

    11/14/2020, 12:11 AM
    If the ZK metadata is missing, we can set the interval as default [0, LONG.MAX_VALUE]
  • j

    jiatao

    11/14/2020, 12:12 AM
    So it won't get filtered out.
  • j

    Jackie

    11/14/2020, 12:13 AM
    If the ZK metadata is missing, you won't know the existence of the segment, thus not able to put this default
  • j

    Jackie

    11/14/2020, 12:14 AM
    Oh, I took it back, you initialize the ZK metadata based on the online segments
  • j

    Jackie

    11/14/2020, 12:22 AM
    After reading the wiki, now I understand the algorithm. We definitely needs more comments explaining the algorithm, or put a pointer to the wiki
  • j

    Jackie

    11/14/2020, 12:22 AM
    Do you have the perf number comparing this approach with the naive one?
  • j

    jiatao

    11/14/2020, 12:24 AM
    Yeah, i'm running it to print the result, maybe 3 more minutes
  • n

    Noah Prince

    11/14/2020, 12:24 AM
    We should not get into a state where we need to manage millions of (logical) segments in one table
    What else what break down at millions of segments?
  • j

    Jackie

    11/14/2020, 12:24 AM
    ZK, cluster management, segment size check etc
  • j

    Jackie

    11/14/2020, 12:25 AM
    I think the solution should be grouping multiple physical segments into one logical segment, and only store ZK metadata for logical segment, as @User suggested
    n
    k
    • 3
    • 17
  • j

    Jackie

    11/14/2020, 12:27 AM
    Broker will prune based on logical segments, and server can map logical segments into physical segments
  • j

    jiatao

    11/14/2020, 12:35 AM
    The intervalMP is the naive method, performance ratio here means how much better search tree method compared with naive. First experiment I fixed the the percentage of resulted segments, and increase the # of segments/table.
    Copy code
    100 segments/table: intervalMP-> 0.0374 milli seconds, intervalST-> 0.00539 milli seconds, 6.9 performance ratio, average segments/table: 100.0, average result size: 8.1, result size percentage 8.1
    1000 segments/table: intervalMP-> 0.0767 milli seconds, intervalST-> 0.0093 milli seconds, 8.23 performance ratio, average segments/table: 1000.0, average result size: 52.0, result size percentage 5.2
    10000 segments/table: intervalMP-> 0.358 milli seconds, intervalST-> 0.056 milli seconds, 6.3 performance ratio, average segments/table: 10000.0, average result size: 501.3, result size percentage 5
    100000 segments/table: intervalMP-> 6.16 milli seconds, intervalST-> 0.86 milli seconds, 7.15 performance ratio, average segments/table: 100000.0, average result size: 5001.0, result size percentage 5
    10000000 segments/table: intervalMP-> 636.1 milli seconds, intervalST-> 230.5 milli seconds, 2.75 performance ratio, average segments/table: 1.0E7, average result size: 500001.6, result size percentage 5
    💯 1
  • j

    jiatao

    11/14/2020, 12:36 AM
    The second experiment, I fixed the # segments/table, and decreases the percentages of resulting segments
    Copy code
    10000000 segments/table: intervalMP-> 1237.3 milli seconds, intervalST-> 432.7 milli seconds, 2.86performance ratio, average segments/table: 1.0E7, average result size: 5000001.1, result size percentage 50.0
    10000000 segments/table: intervalMP-> 697.3 milli seconds, intervalST-> 224.6 milli seconds, 3.1 performance ratio, average segments/table: 1.0E7, average result size: 500001.3, result size percentage 5.0
    10000000 segments/table: intervalMP-> 593.4 milli seconds, intervalST-> 18.97 milli seconds, 31.27 performance ratio, average segments/table: 1.0E7, average result size: 50001.0, result size percentage 0.5
    10000000 segments/table: intervalMP-> 585.3 milli seconds, intervalST-> 3.18 milli seconds, 183.97 performance ratio, average segments/table: 1.0E7, average result size: 5002.1, result size percentage 0.05
    10000000 segments/table: intervalMP-> 582.2 milli seconds, intervalST-> 0.48 milli seconds, 1203.15 performance ratio, average segments/table: 1.0E7, average result size: 501.3, result size percentage 0.0050
    💯 1
  • j

    jiatao

    11/14/2020, 12:38 AM
    If the resulted # of segments are much smaller than # of all segments, interval search tree performs much better.
  • k

    Kishore G

    11/14/2020, 12:38 AM
    great to see this level of details 👏👏
    👍 1
    thankyou 1
  • j

    Jackie

    11/14/2020, 12:39 AM
    Copy code
    10000000 segments/table: intervalMP-> 1237.3 milli seconds, intervalST-> 432.7 milli seconds, 2.86performance ratio, average segments/table: 1.0E7, average result size: 5000001.1, result size percentage 50.0
    Really? Selecting 50% of the segments, the naive one is 2.86 times the tree solution?
  • j

    jiatao

    11/14/2020, 12:39 AM
    I used Concurrent hash map for naive solution
  • j

    jiatao

    11/14/2020, 12:40 AM
    So it may not perform as good as a normal HashMap.
  • j

    Jackie

    11/14/2020, 12:40 AM
    Anyway, I think if we select less segments, the tree solution is definitely much faster
  • j

    Jackie

    11/14/2020, 12:40 AM
    Good job
    👍 2
    thankyou 2
  • j

    Jackie

    11/14/2020, 12:50 AM
    Based on this experiment, in order to handle large number of segments, we might also want to optimize the partition based pruner to pre-group the segments under each partition