# general
  • d

    Damiano

    05/03/2020, 5:46 PM
    Hello everybody, I am creating a custom aggregator. I know that Pinot creates blocks for each segment it scans. At the moment I can determine the order of the values because I am using another column, a timestamp column. (Note: for now it is not a real timestamp, just an auto-increment id.) A simple example: suppose a table holds a simple array of values from 0 to 9, and Pinot creates two blocks, [0,1,2,3,4] and [5,6,7,8,9]. I would like to understand whether I should deal with randomness inside each block or "globally" (over the whole segment). As you told me, Pinot does the sorting at the very end, so in what order do the segments arrive inside the aggregators? Given the example I just wrote, could the blocks contain mixed data, something like [7,5,6,3,2] [4,1,9,8,0]? If yes, sorting a single block could leave holes in the timestamps: they might not be consecutive because the missing ones are mixed into other blocks. Right?
  • m

    Mayank

    05/03/2020, 5:49 PM
    Need to check, but blocks coming into the aggregator are in the same sequence as the records in the input segment. So if the input segment is sorted on time, the block values will be too.
  • m

    Mayank

    05/03/2020, 5:49 PM
    Again, need to check if this is indeed the case, and/or whether it is good to rely on it.
  • k

    Kishore G

    05/03/2020, 5:59 PM
    It is in the order, but better to not rely on that. It’s not part of the contract.
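    One way to avoid relying on block order is to make the aggregation itself order-independent. The sketch below is only an illustration under stated assumptions: `LastWithTime` is a hypothetical holder, not a Pinot class, and it keeps the value seen at the largest timestamp so far. It reuses the shuffled blocks from the question above, with each value assumed to be ten times its timestamp.

```java
/**
 * Hypothetical holder (NOT a Pinot class): remembers the value seen at the
 * largest timestamp, regardless of the order in which rows or blocks arrive.
 */
final class LastWithTime {
    private boolean empty = true;
    private long maxTimestamp;
    private double value;

    /** Folds one block into the running result; rows may be in any order. */
    void aggregate(long[] timestamps, double[] values) {
        if (timestamps.length != values.length) {
            throw new IllegalArgumentException("columns must have the same length");
        }
        for (int i = 0; i < timestamps.length; i++) {
            if (empty || timestamps[i] > maxTimestamp) {
                maxTimestamp = timestamps[i];
                value = values[i];
                empty = false;
            }
        }
    }

    /** Merges a partial result from another block or segment into this one. */
    LastWithTime merge(LastWithTime other) {
        if (!other.empty && (empty || other.maxTimestamp > maxTimestamp)) {
            maxTimestamp = other.maxTimestamp;
            value = other.value;
            empty = false;
        }
        return this;
    }

    long timestamp() {
        if (empty) throw new IllegalStateException("no rows aggregated yet");
        return maxTimestamp;
    }

    double value() {
        if (empty) throw new IllegalStateException("no rows aggregated yet");
        return value;
    }

    public static void main(String[] args) {
        // The shuffled blocks from the question; values are timestamp * 10.
        LastWithTime shuffled = new LastWithTime();
        shuffled.aggregate(new long[] {7, 5, 6, 3, 2}, new double[] {70, 50, 60, 30, 20});
        shuffled.aggregate(new long[] {4, 1, 9, 8, 0}, new double[] {40, 10, 90, 80, 0});
        System.out.println(shuffled.timestamp() + " -> " + shuffled.value()); // prints "9 -> 90.0"
    }
}
```

    Because each step only compares against the current maximum, sorted and shuffled blocks give the same answer, and no per-block sort is needed.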
  • d

    Damiano

    05/03/2020, 6:02 PM
    ok
  • d

    Damiano

    05/03/2020, 6:03 PM
    another question
  • d

    Damiano

    05/03/2020, 6:03 PM
    I am working on the aggregateGroupBySV method. Could anyone explain whether I need the serialized part inside that method? I always see an IF statement that checks DataType.BYTES. Is it mandatory? Thanks
  • d

    Damiano

    05/03/2020, 9:35 PM
    Hmm, I am getting a NullPointerException in the aggregate() method when I try to read more than one column.
  • d

    Damiano

    05/03/2020, 9:35 PM
    long[] timestamps = blockValSet.getLongValuesSV();
    double[] values = blockValSet.getDoubleValuesSV();
    I can only read the first one.
  • d

    Damiano

    05/03/2020, 9:36 PM
    I call myAggr(column1, column2)
  • d

    Damiano

    05/03/2020, 9:37 PM
    I pass the second column name in AggregationFunctionFactory.
  • d

    Damiano

    05/03/2020, 9:44 PM
    I think I have to touch DefaultAggregationExecutor, because I have seen that Pinot passes only ONE column; only for distinct() does it pass everything.
  • k

    Kishore G

    05/03/2020, 9:58 PM
    The latest code allows multiple expressions. @User ^^
  • d

    Damiano

    05/03/2020, 10:00 PM
    @User it seems not. I am reading the DefaultAggregationExecutor class; basically only distinct can have more than one column.
  • m

    Mayank

    05/03/2020, 10:01 PM
    @User do you have the latest code?
  • m

    Mayank

    05/03/2020, 10:01 PM
    Are you creating a new function, or hacking an existing one?
  • d

    Damiano

    05/03/2020, 10:02 PM
    Hmm, I think it is the latest.
  • d

    Damiano

    05/03/2020, 10:02 PM
    I followed this page: https://docs.pinot.apache.org/basics/getting-started/running-pinot-locally
  • d

    Damiano

    05/03/2020, 10:02 PM
    git clone https://github.com/apache/incubator-pinot.git
  • d

    Damiano

    05/03/2020, 10:02 PM
    that's it
  • m

    Mayank

    05/03/2020, 10:02 PM
    function.aggregate(length, resultHolder, blockValSetMap);
  • m

    Mayank

    05/03/2020, 10:02 PM
    In DefaultAggregationExecutor you will see ^^
  • d

    Damiano

    05/03/2020, 10:03 PM
    yes
  • m

    Mayank

    05/03/2020, 10:03 PM
    Here blockValSetMap can have multiple entries, one for each column you want to provide to the aggregation function
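    A minimal sketch of what reading two columns from such a map could look like. Everything here is an assumption for illustration: `ColumnBlock` is a hypothetical stand-in for Pinot's BlockValSet, the map is keyed by plain column name, and `TwoColumnAggregation` with its sum-of-products logic is invented. Only the `getLongValuesSV()` / `getDoubleValuesSV()` getter names and the `aggregate(length, ..., blockValSetMap)` shape come from the messages above. The explicit lookup check turns a missing column into a clear error instead of a NullPointerException.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical two-column aggregation: sums timestamp * value over all rows. */
final class TwoColumnAggregation {
    private final String timeColumn;
    private final String valueColumn;
    private double result;

    TwoColumnAggregation(String timeColumn, String valueColumn) {
        this.timeColumn = timeColumn;
        this.valueColumn = valueColumn;
    }

    /** Reads each column from its own map entry instead of one shared block. */
    void aggregate(int length, Map<String, ColumnBlock> blockValSetMap) {
        long[] timestamps = require(blockValSetMap, timeColumn).getLongValuesSV();
        double[] values = require(blockValSetMap, valueColumn).getDoubleValuesSV();
        for (int i = 0; i < length; i++) {
            result += timestamps[i] * values[i];
        }
    }

    /** Fails with a descriptive message when a column was not passed in. */
    private static ColumnBlock require(Map<String, ColumnBlock> map, String column) {
        ColumnBlock block = map.get(column);
        if (block == null) {
            throw new IllegalStateException("No block for column '" + column + "'");
        }
        return block;
    }

    double result() {
        return result;
    }

    public static void main(String[] args) {
        Map<String, ColumnBlock> blocks = new HashMap<>();
        blocks.put("column1", new ColumnBlock(1, 2, 3));
        blocks.put("column2", new ColumnBlock(10, 20, 30));
        TwoColumnAggregation myAggr = new TwoColumnAggregation("column1", "column2");
        myAggr.aggregate(3, blocks);
        System.out.println(myAggr.result()); // prints "140.0"
    }
}

/** Hypothetical stand-in for one column's block of values (NOT Pinot's BlockValSet). */
final class ColumnBlock {
    private final double[] values;

    ColumnBlock(double... values) {
        this.values = values.clone();
    }

    double[] getDoubleValuesSV() {
        return values.clone();
    }

    long[] getLongValuesSV() {
        long[] out = new long[values.length];
        for (int i = 0; i < values.length; i++) {
            out[i] = (long) values[i];
        }
        return out;
    }
}
```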
  • d

    Damiano

    05/03/2020, 10:03 PM
    inside
    } else if (function.getType() == AggregationFunctionType.DISTINCT) { HERE }
  • d

    Damiano

    05/03/2020, 10:03 PM
    i need myAgg(column1, column2)
  • m

    Mayank

    05/03/2020, 10:04 PM
    Oh, I think this part is in my next PR
  • m

    Mayank

    05/03/2020, 10:04 PM
    Not merged yet
  • d

    Damiano

    05/03/2020, 10:04 PM
    and then i need to hack
    if (AggregationFunctionUtils.isDistinct(functionContexts)) { THIS PART TOO }
  • d

    Damiano

    05/03/2020, 10:04 PM
    because it is always dedicated to DISTINCT