Apache Pinot #time-based-segment-pruner

Join Slack

Sidd

11/14/2020, 12:04 AM

segment tree is not your traditional balanced BST physically.. it is an array

Sidd

11/14/2020, 12:04 AM

so not sure how we can use it to find overlapping intervals

jiatao

11/14/2020, 12:06 AM

https://youtu.be/q0QOYtSsTg4▾

, the interval search tree is introduced in this video.

jiatao

11/14/2020, 12:06 AM

I think there's some interval trees named quite similar, so may have confusions.

Jackie

11/14/2020, 12:07 AM

Let me learn about it, based on the wiki, it can handle our requirement: https://en.wikipedia.org/wiki/Interval_tree

Sidd

11/14/2020, 12:07 AM

both segment and interval tree store intervals... I think the difference is what you want to query... in this case we want to query overlapping intervals.. which segment tree can't answer

Jackie

11/14/2020, 12:07 AM

Before this, I want to discuss if we want to handle the corner case of ZK failures

Jackie

11/14/2020, 12:08 AM

Currently, if some segment ZK metadata is missing due to temporary ZK failures, we still route the segments. But that is not possible with this approach

jiatao

11/14/2020, 12:11 AM

Why?

jiatao

11/14/2020, 12:11 AM

If the ZK metadata is missing, we can set the interval as default [0, LONG.MAX_VALUE]

jiatao

11/14/2020, 12:12 AM

So it won't get filtered out.

Jackie

11/14/2020, 12:13 AM

If the ZK metadata is missing, you won't know the existence of the segment, thus not able to put this default

Jackie

11/14/2020, 12:14 AM

Oh, I took it back, you initialize the ZK metadata based on the online segments

Jackie

11/14/2020, 12:22 AM

After reading the wiki, now I understand the algorithm. We definitely needs more comments explaining the algorithm, or put a pointer to the wiki

Jackie

11/14/2020, 12:22 AM

Do you have the perf number comparing this approach with the naive one?

jiatao

11/14/2020, 12:24 AM

Yeah, i'm running it to print the result, maybe 3 more minutes

Noah Prince

11/14/2020, 12:24 AM

We should not get into a state where we need to manage millions of (logical) segments in one table

What else what break down at millions of segments?

Jackie

11/14/2020, 12:24 AM

ZK, cluster management, segment size check etc

Jackie

11/14/2020, 12:25 AM

I think the solution should be grouping multiple physical segments into one logical segment, and only store ZK metadata for logical segment, as @User suggested

Jackie

11/14/2020, 12:27 AM

Broker will prune based on logical segments, and server can map logical segments into physical segments

jiatao

11/14/2020, 12:35 AM

The intervalMP is the naive method, performance ratio here means how much better search tree method compared with naive. First experiment I fixed the the percentage of resulted segments, and increase the # of segments/table.

Copy code

100 segments/table: intervalMP-> 0.0374 milli seconds, intervalST-> 0.00539 milli seconds, 6.9 performance ratio, average segments/table: 100.0, average result size: 8.1, result size percentage 8.1
1000 segments/table: intervalMP-> 0.0767 milli seconds, intervalST-> 0.0093 milli seconds, 8.23 performance ratio, average segments/table: 1000.0, average result size: 52.0, result size percentage 5.2
10000 segments/table: intervalMP-> 0.358 milli seconds, intervalST-> 0.056 milli seconds, 6.3 performance ratio, average segments/table: 10000.0, average result size: 501.3, result size percentage 5
100000 segments/table: intervalMP-> 6.16 milli seconds, intervalST-> 0.86 milli seconds, 7.15 performance ratio, average segments/table: 100000.0, average result size: 5001.0, result size percentage 5
10000000 segments/table: intervalMP-> 636.1 milli seconds, intervalST-> 230.5 milli seconds, 2.75 performance ratio, average segments/table: 1.0E7, average result size: 500001.6, result size percentage 5

💯 1

jiatao

11/14/2020, 12:36 AM

The second experiment, I fixed the # segments/table, and decreases the percentages of resulting segments

Copy code

10000000 segments/table: intervalMP-> 1237.3 milli seconds, intervalST-> 432.7 milli seconds, 2.86performance ratio, average segments/table: 1.0E7, average result size: 5000001.1, result size percentage 50.0
10000000 segments/table: intervalMP-> 697.3 milli seconds, intervalST-> 224.6 milli seconds, 3.1 performance ratio, average segments/table: 1.0E7, average result size: 500001.3, result size percentage 5.0
10000000 segments/table: intervalMP-> 593.4 milli seconds, intervalST-> 18.97 milli seconds, 31.27 performance ratio, average segments/table: 1.0E7, average result size: 50001.0, result size percentage 0.5
10000000 segments/table: intervalMP-> 585.3 milli seconds, intervalST-> 3.18 milli seconds, 183.97 performance ratio, average segments/table: 1.0E7, average result size: 5002.1, result size percentage 0.05
10000000 segments/table: intervalMP-> 582.2 milli seconds, intervalST-> 0.48 milli seconds, 1203.15 performance ratio, average segments/table: 1.0E7, average result size: 501.3, result size percentage 0.0050

💯 1

jiatao

11/14/2020, 12:38 AM

If the resulted # of segments are much smaller than # of all segments, interval search tree performs much better.

Kishore G

11/14/2020, 12:38 AM

great to see this level of details 👏👏

👍 1

thankyou 1

Jackie

11/14/2020, 12:39 AM

Copy code

10000000 segments/table: intervalMP-> 1237.3 milli seconds, intervalST-> 432.7 milli seconds, 2.86performance ratio, average segments/table: 1.0E7, average result size: 5000001.1, result size percentage 50.0

Really? Selecting 50% of the segments, the naive one is 2.86 times the tree solution?

jiatao

11/14/2020, 12:39 AM

I used Concurrent hash map for naive solution

jiatao

11/14/2020, 12:40 AM

So it may not perform as good as a normal HashMap.

Jackie

11/14/2020, 12:40 AM

Anyway, I think if we select less segments, the tree solution is definitely much faster

Jackie

11/14/2020, 12:40 AM

Good job

👍 2

thankyou 2

Jackie

11/14/2020, 12:50 AM

Based on this experiment, in order to handle large number of segments, we might also want to optimize the partition based pruner to pre-group the segments under each partition