Apache Pinot #general

Damiano

05/03/2020, 5:46 PM

Hello everybody, I am creating a custom aggregator. I know that Pinot creates blocks for each segment it scans. At the moment I can determine the order of the values because I am using another column, a timestamp column. (Note: at the moment it is not a real timestamp, just an auto-increment id.) Here is a simple example: suppose a table holds the values 0 to 9, and Pinot creates two blocks, [0,1,2,3,4] and [5,6,7,8,9]. I would like to understand whether I should deal with randomness inside each block or "globally" (over the whole segment). As you told me, Pinot does the sorting at the very end, so in what order do the segments arrive at the aggregators? Using the example above, could the blocks contain mixed data, something like [7,5,6,3,2] [4,1,9,8,0]? If so, sorting a single block could leave holes in the timestamps: they might not be consecutive because the missing ones are in other blocks. Right?

Mayank

05/03/2020, 5:49 PM

Need to check, but blocks coming into the aggregator are in the same sequence as the records in the input segment. So if the input segment is sorted on time, the block values will be too.

Mayank

05/03/2020, 5:49 PM

Again, need to check if this is indeed the case, and/or whether it is good to rely on it.

Kishore G

05/03/2020, 5:59 PM

It is in the order, but better to not rely on that. It’s not part of the contract.
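Since ordering is not part of the contract, one option is to make the aggregation itself order-insensitive. The sketch below is only an illustration, not Pinot code: the class `OrderInsensitiveLast`, the holder `TimedValue`, and the methods `aggregateBlock` and `merge` are hypothetical names. It keeps the value seen at the largest timestamp, so it does not matter how the rows are distributed across blocks. It assumes timestamps are unique, as with the auto-increment id described above.

```java
// Hypothetical, self-contained sketch: none of these types come from Pinot.
final class OrderInsensitiveLast {

    /** Intermediate result: the value observed at the largest timestamp so far. */
    static final class TimedValue {
        final long timestamp;
        final double value;

        TimedValue(long timestamp, double value) {
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    /** Folds one block into the running result without assuming any order inside the block. */
    static TimedValue aggregateBlock(long[] timestamps, double[] values, int length, TimedValue current) {
        TimedValue best = current;
        for (int i = 0; i < length; i++) {
            if (best == null || timestamps[i] > best.timestamp) {
                best = new TimedValue(timestamps[i], values[i]);
            }
        }
        return best;
    }

    /** Combines two partial results; with unique timestamps the argument order does not matter. */
    static TimedValue merge(TimedValue a, TimedValue b) {
        if (a == null) {
            return b;
        }
        if (b == null) {
            return a;
        }
        return a.timestamp >= b.timestamp ? a : b;
    }

    public static void main(String[] args) {
        // The "mixed" blocks from the question, with value = timestamp * 10.
        TimedValue first = aggregateBlock(
            new long[] {7, 5, 6, 3, 2}, new double[] {70, 50, 60, 30, 20}, 5, null);
        TimedValue second = aggregateBlock(
            new long[] {4, 1, 9, 8, 0}, new double[] {40, 10, 90, 80, 0}, 5, null);
        TimedValue result = merge(first, second);
        System.out.println(result.timestamp + " -> " + result.value); // prints: 9 -> 90.0
    }
}
```

Sorting a single block is never needed here, so the "holes" between blocks do not affect the result.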

Damiano

05/03/2020, 6:03 PM

another question

Damiano

05/03/2020, 6:03 PM

I am working on the aggregateGroupBySV method. Could anyone explain whether I need the serialized part inside that method? I always see the if statement that checks DataType.BYTES. Is it mandatory? Thanks

Damiano

05/03/2020, 9:35 PM

Hmm, I am getting a NullPointerException in the aggregate() method when I try to read more than one column.

Damiano

05/03/2020, 9:35 PM

long[] timestamps = blockValSet.getLongValuesSV();
double[] values = blockValSet.getDoubleValuesSV();

I can only read the first one.

Damiano

05/03/2020, 9:36 PM

I call myAggr(column1, column2)

Damiano

05/03/2020, 9:37 PM

I pass the second column name in AggregationFunctionFactory.

Damiano

05/03/2020, 9:44 PM

I think I have to touch DefaultAggregationExecutor, because I have seen that Pinot only passes ONE column; only for distinct() does it pass everything.

Kishore G

05/03/2020, 9:58 PM

The latest code allows multiple expressions. @User ^^

Damiano

05/03/2020, 10:00 PM

@User seems not. I am reading the DefaultAggregationExecutor class; basically only distinct can have more columns.

Mayank

05/03/2020, 10:01 PM

@User do you have the latest code?

Mayank

05/03/2020, 10:01 PM

Are you creating a new function, or hacking an existing one?

Damiano

05/03/2020, 10:02 PM

Hmm, I think it is the latest.

Damiano

05/03/2020, 10:02 PM

I followed this page: https://docs.pinot.apache.org/basics/getting-started/running-pinot-locally

Damiano

05/03/2020, 10:02 PM

git clone https://github.com/apache/incubator-pinot.git

Damiano

05/03/2020, 10:02 PM

that's it

Mayank

05/03/2020, 10:02 PM

function.aggregate(length, resultHolder, blockValSetMap);

Mayank

05/03/2020, 10:02 PM

In DefaultAggregationExecutor you will see ^^

Damiano

05/03/2020, 10:03 PM

yes

Mayank

05/03/2020, 10:03 PM

Here blockValSetMap can have multiple entries, one for each column you want to provide to the aggregation function
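As a rough sketch of what reading two columns from such a map could look like: the code below does not use Pinot's classes. `BlockValSet` here is a hypothetical stand-in interface exposing only the two getters quoted earlier in the thread, the `String` column-name key is an assumption, and `TwoColumnAggregateSketch`, `longColumn`, `doubleColumn`, and `latestValue` are made-up names. It also shows how a column that was never added to the map turns into a null lookup, which is one plausible source of the NullPointerException mentioned above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, self-contained sketch: these types only imitate the shape discussed in the thread.
final class TwoColumnAggregateSketch {

    /** Stand-in for a per-column block of single-value entries (not Pinot's real interface). */
    interface BlockValSet {
        long[] getLongValuesSV();

        double[] getDoubleValuesSV();
    }

    /** Builds a stand-in block backed by long values. */
    static BlockValSet longColumn(long... v) {
        return new BlockValSet() {
            public long[] getLongValuesSV() {
                return v;
            }

            public double[] getDoubleValuesSV() {
                double[] d = new double[v.length];
                for (int i = 0; i < v.length; i++) {
                    d[i] = v[i];
                }
                return d;
            }
        };
    }

    /** Builds a stand-in block backed by double values. */
    static BlockValSet doubleColumn(double... v) {
        return new BlockValSet() {
            public long[] getLongValuesSV() {
                long[] l = new long[v.length];
                for (int i = 0; i < v.length; i++) {
                    l[i] = (long) v[i];
                }
                return l;
            }

            public double[] getDoubleValuesSV() {
                return v;
            }
        };
    }

    /** Returns the value at the largest timestamp in the block, reading one entry per column. */
    static double latestValue(int length, Map<String, BlockValSet> blockValSetMap,
                              String timeColumn, String valueColumn) {
        BlockValSet timeSet = blockValSetMap.get(timeColumn);
        BlockValSet valueSet = blockValSetMap.get(valueColumn);
        // If the executor only put one column into the map, the second lookup is null.
        if (timeSet == null || valueSet == null) {
            throw new IllegalStateException("Column missing from blockValSetMap: "
                + (timeSet == null ? timeColumn : valueColumn));
        }
        if (length <= 0) {
            throw new IllegalArgumentException("length must be positive");
        }
        long[] timestamps = timeSet.getLongValuesSV();
        double[] values = valueSet.getDoubleValuesSV();
        int best = 0;
        for (int i = 1; i < length; i++) {
            if (timestamps[i] > timestamps[best]) {
                best = i;
            }
        }
        return values[best];
    }

    public static void main(String[] args) {
        Map<String, BlockValSet> blockValSetMap = new HashMap<>();
        blockValSetMap.put("column1", longColumn(3, 1, 2));
        blockValSetMap.put("column2", doubleColumn(30.0, 10.0, 20.0));
        System.out.println(latestValue(3, blockValSetMap, "column1", "column2")); // prints: 30.0
    }
}
```

Failing fast with a descriptive exception makes it obvious when only one of the requested columns reached the function.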

Damiano

05/03/2020, 10:03 PM

inside

} else if (function.getType() == AggregationFunctionType.DISTINCT) { HERE }

Damiano

05/03/2020, 10:03 PM

I need myAgg(column1, column2)

Mayank

05/03/2020, 10:04 PM

Oh, I think this part is in my next PR

Mayank

05/03/2020, 10:04 PM

Not merged yet

Damiano

05/03/2020, 10:04 PM

and then I need to hack

if (AggregationFunctionUtils.isDistinct(functionContexts)) { THIS PART TOO }

Damiano

05/03/2020, 10:04 PM

because it is always dedicated to DISTINCT