# group-by-refactor
  • m

    Mayank

    07/22/2019, 4:54 PM
    @User @User I am looking at the GroupKeyGenerator code, and there are way too many variations (SV/MV/Dict/NoDict/ArrayBased/MapBased/etc). For SQL-based order-by we need a non-concatenated group key, so we can sort on individual group-by columns.
  • m

    Mayank

    07/22/2019, 4:55 PM
    From a code perspective, it might be easier to see if we can modify the existing GroupKeyGenerator to use an array of objects instead of a single concatenated String key. What do you guys think?
  • m

    Mayank

    07/22/2019, 4:55 PM
    Otherwise, we will have to implement all the variations for the new GroupKeyGenerator
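
To make the two key representations concrete, here is a minimal sketch (not Pinot's actual GroupKeyGenerator). `GroupKeySketch`, `ArrayKey`, `concat`, and the tab delimiter are all hypothetical names and choices made for illustration. It shows an array-of-objects key that still works as a hash map key, next to the single concatenated String form.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class GroupKeySketch {
    /** Hypothetical key holding one value per group-by column (no concatenation). */
    public static final class ArrayKey {
        public final Object[] values;

        public ArrayKey(Object... values) {
            this.values = values;
        }

        @Override
        public boolean equals(Object o) {
            // Element-wise comparison, so two keys with equal columns are equal.
            return o instanceof ArrayKey && Arrays.equals(values, ((ArrayKey) o).values);
        }

        @Override
        public int hashCode() {
            return Arrays.hashCode(values);
        }
    }

    /** Concatenated form; the '\t' delimiter is an assumption for this sketch. */
    public static String concat(Object... values) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < values.length; i++) {
            if (i > 0) {
                sb.append('\t');
            }
            sb.append(values[i]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<ArrayKey, Long> counts = new HashMap<>();
        counts.merge(new ArrayKey("US", 2019), 1L, Long::sum);
        counts.merge(new ArrayKey("US", 2019), 1L, Long::sum);
        // Individual columns stay typed and accessible, e.g. values[1] is still an Integer.
        System.out.println(counts.get(new ArrayKey("US", 2019)));
        System.out.println(concat("US", 2019));
    }
}
```

With the array form, sorting on a single group-by column needs no split step; with the concatenated form, the String has to be split and parsed back into typed values first.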
  • k

    Kishore G

    07/22/2019, 4:58 PM
    we need both, right?
  • k

    Kishore G

    07/22/2019, 4:59 PM
    we need concatenated group key for fast look up
  • m

    Mayank

    07/22/2019, 4:59 PM
    We need both behaviors of group-by yes. But that is independent of whether group key is single string or individual objects
  • k

    Kishore G

    07/22/2019, 4:59 PM
    no, i meant both concatenated and separate
  • m

    Mayank

    07/22/2019, 4:59 PM
    The only benefit of concatenated I see is performance
  • m

    Mayank

    07/22/2019, 4:59 PM
    Is there any other benefit?
  • k

    Kishore G

    07/22/2019, 5:00 PM
    that's a big one, right?
  • m

    Mayank

    07/22/2019, 5:00 PM
    I am not sure, we can benchmark how much the perf gain is
  • m

    Mayank

    07/22/2019, 5:00 PM
    We pay the price of concatenate and split (cpu + garbage)
  • m

    Mayank

    07/22/2019, 5:01 PM
    But you are right, we can try to keep both
  • m

    Mayank

    07/22/2019, 5:01 PM
    Without having to duplicate all the code
  • k

    Kishore G

    07/22/2019, 5:01 PM
    what's the alternative?
  • k

    Kishore G

    07/22/2019, 5:01 PM
    right, that's my feeling as well.
  • m

    Mayank

    07/22/2019, 5:01 PM
    Yeah, let me investigate that further
  • k

    Kishore G

    07/22/2019, 5:02 PM
    Think of a List<Comparable[]>
  • k

    Kishore G

    07/22/2019, 5:02 PM
    this is the table
  • m

    Mayank

    07/22/2019, 5:02 PM
    Yes
  • m

    Mayank

    07/22/2019, 5:02 PM
    The table will be in IntermediateResult
  • k

    Kishore G

    07/22/2019, 5:03 PM
    we need a hashmap which will map hashCode(group key) --> list of rows
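
One way to read the idea above, as a rough sketch only: the table is a `List<Comparable[]>`, and a hash map from the hash of the key columns to a bucket of rows gives fast lookup. The class `GroupByTableSketch`, the method `upsertSum`, and the row layout (key columns first, then a single SUM column) are assumptions for illustration, not the design that was eventually implemented. Because distinct keys can share a hash code, each bucket is scanned and the key columns are compared explicitly.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

@SuppressWarnings("rawtypes")
public class GroupByTableSketch {
    private final int numKeyColumns;
    // "The table": one row per group, key columns first, then the aggregate.
    private final List<Comparable[]> rows = new ArrayList<>();
    // hashCode(group key) -> rows whose key columns produce that hash.
    private final Map<Integer, List<Comparable[]>> index = new HashMap<>();

    public GroupByTableSketch(int numKeyColumns) {
        this.numKeyColumns = numKeyColumns;
    }

    private int keyHash(Comparable[] row) {
        int h = 1;
        for (int i = 0; i < numKeyColumns; i++) {
            h = 31 * h + row[i].hashCode();
        }
        return h;
    }

    private boolean sameKey(Comparable[] a, Comparable[] b) {
        for (int i = 0; i < numKeyColumns; i++) {
            if (!a[i].equals(b[i])) {
                return false;
            }
        }
        return true;
    }

    /** Adds value to the SUM column of the matching group, or inserts a new row. */
    public void upsertSum(Comparable[] key, long value) {
        List<Comparable[]> bucket = index.computeIfAbsent(keyHash(key), h -> new ArrayList<>());
        for (Comparable[] row : bucket) {
            if (sameKey(row, key)) {
                row[numKeyColumns] = ((Long) row[numKeyColumns]) + value;
                return;
            }
        }
        Comparable[] row = new Comparable[numKeyColumns + 1];
        System.arraycopy(key, 0, row, 0, numKeyColumns);
        row[numKeyColumns] = value;
        bucket.add(row);
        rows.add(row);
    }

    public List<Comparable[]> rows() {
        return rows;
    }
}
```

The bucket and the table share the same row arrays, so an update through the index is visible in the table without a second lookup.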
  • k

    Kishore G

    07/22/2019, 5:08 PM
    can you list out the various scenarios
  • k

    Kishore G

    07/22/2019, 5:08 PM
    no order by
  • k

    Kishore G

    07/22/2019, 5:08 PM
    order by group key
  • k

    Kishore G

    07/22/2019, 5:08 PM
    order by aggregation function
  • m

    Mayank

    07/22/2019, 5:09 PM
    order by some expressions of projected columns (mix of group keys / aggr functions) -> most general form
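
The scenarios listed above can be sketched as comparators over a `List<Comparable[]>` table. This is an illustrative sketch under stated assumptions: `OrderBySketch`, `byColumn`, `byExpression`, and the row layout `[country, year, SUM(clicks), COUNT(*)]` are hypothetical. The no-order-by case needs no comparator at all; ordering by a group key and ordering by an aggregation function both reduce to comparing one column; the general case evaluates an expression per row.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.ToDoubleFunction;

public class OrderBySketch {
    /** Orders rows by one column, which may be a group key or an aggregation result. */
    @SuppressWarnings({"unchecked", "rawtypes"})
    public static Comparator<Comparable[]> byColumn(int col, boolean ascending) {
        Comparator<Comparable[]> cmp = (a, b) -> a[col].compareTo(b[col]);
        return ascending ? cmp : cmp.reversed();
    }

    /** Orders rows by a numeric expression over the whole row (the most general form). */
    @SuppressWarnings("rawtypes")
    public static Comparator<Comparable[]> byExpression(ToDoubleFunction<Comparable[]> expr) {
        return Comparator.comparingDouble(expr);
    }

    @SuppressWarnings("rawtypes")
    public static void main(String[] args) {
        // Assumed row layout: [country, year, SUM(clicks), COUNT(*)]
        List<Comparable[]> table = new ArrayList<>();
        table.add(new Comparable[] {"US", 2019, 10L, 2L});
        table.add(new Comparable[] {"IN", 2019, 30L, 10L});
        table.add(new Comparable[] {"IN", 2018, 20L, 4L});

        // ORDER BY country ASC, SUM(clicks) DESC: group key, then aggregation.
        table.sort(byColumn(0, true).thenComparing(byColumn(2, false)));

        // ORDER BY SUM(clicks) / COUNT(*) DESC: expression mixing aggregation results.
        table.sort(byExpression(r -> ((Long) r[2]).doubleValue() / (Long) r[3]).reversed());
    }
}
```

None of these comparators would be possible on a concatenated String key without first splitting it back into typed column values.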