Hi, in the GroupByKeyGenerator, under some circums...
# pinot-dev
a
Hi, in the GroupByKeyGenerator, under some circumstances, raw keys are used (by invoking getInternal method on the dictionary) instead of dictId. This does not go through the datafetcher and hence causes increased latency. Is there a way around this?
m
@Richard Startin may have some ideas here
r
hi @Ashish - I'm not sure I understand your concern. Let me clarify how group bys are evaluated. Assuming there is a dictionary in the first place (best case), the group by implementation has roughly 3 phases: 1. read a block of `dictId`s from each of the grouping columns (this is done via
ProjectionBlockValSet.getDictionaryIds*V
, which indirectly uses
DataFetcher
via the
DataBlockCache
) and compute `groupId`s from their combinations, this step produces an array with a
groupId
for each
docId
(encoded by the position in the array). A side effect of this is knowing how many distinct `groupId`s exist for each block. These values are stored in a
GroupKeyGenerator
. 2. segment local aggregation by
groupId
- this is safe to do because the `dictId`s the `groupId`s derive from are defined within the scope of a segment. An aggregation function is invoked for the aggregated value at each
docId
whenever the
groupId
at that
docId
is repeated. The results are stored in a
GroupByResultHolder
. 3. Server level aggregation - `dictId`s (and therefore `groupId`s) are not well defined beyond the scope of a segment so they can't be used beyond the scope of a segment. The `GroupKeyGenerator`s (which know how to map the `groupId`s to the actual group keys) and the
GroupByResultHolder
s (which map
groupId
s to partial aggregates) are combined in an
AggregationGroupByResult
which is then post-processed, the `groupId`s are translated to
groupKeys
(this is where
Dictionary.getInternal
is called), and then the partial aggregates are aggregated and upserted into an
IndexedTable
so I have a few comments, given the above: • "under some circumstances, raw keys are used (by invoking getInternal method on the dictionary) instead of dictId" - `dictId`s are always mapped to the corresponding raw values once segment boundaries are crossed, not just "under some circumstances". • "This does not go through the datafetcher" - the
DataFetcher
retrieves blocks of
dictId
s and this does happen earlier in the evaluation when there is a dictionary present, but the
DataFetcher
can't produce group keys valid across segments • "This does not go through the datafetcher and hence causes increased latency." - this is the kind of statement I would only make having done an experiment with some numbers in hand. • "Is there a way around this?" - seek solutions once it's confirmed there is a problem. 😄
a
Thanks for taking a look. Will attach the profile/stacktrace soon. The retrieval of dictId is ok - it's the call to getInternal that does not seem to go through the cache and hence we see the same latency when the same query is repeated. Usually, second run of the query runs faster but not in the case we observed
r
note that I'm not telling you that
getInternal
is fast, but that building group keys is completely unrelated to the purpose of the
DataFetcher
you can try this out, it might help https://github.com/apache/pinot/pull/8376
for 1D group bys (1 group by expression) there's no point in doing any caching because each extraction will only happen once
a
Will try it out. Not sure why retrieving dict values through getInternal shouldn't go through the data fetcher and cached - it would have done that if the same column was present in projection only. No?
the
DataFetcher
retrieves blocks of values indexed by docId not by groupId
more tangibly, if you have values
Copy code
| dim1  | dim2  | metric |
|-------|-------|--------|
| "a1"  |  "b1" | 10     | 
| "a1"  |  "b2" | 11     | 
| "a2"  |  "b1" | 12     | 
| "a2"  |  "b4" | 9      |
| "a1"  |  "b1" | 8      |
then a call to the
DataFetcher.getStringValuesSV(dim1)
would retrieve
Copy code
"a1", "a1", "a2", "a2", "a1"
and ``DataFetcher.getStringValuesSV(dim2)` would retrieve
Copy code
"b1", "b2", "b1", "b4", "b1"
but there are only 4 groups:
Copy code
("a1", "b1"), ("a1", "b2"), ("a2", "b1"), ("a2, "b4")
the calls to
Dictionary.getInternal
take place when mapping dim1:0 to "a1", dim1:1 to "a2", dim2:0 to "b1", dim2:1 to "b2", dim2:3 to "b4" (no group has "b3"), basically it's a completely different operation to extracting the values for each docId. One thing that could be improved is if dim1 has cardinality x, and dim2 has cardinality y, you will extract each dictionary value from dim1 up to y times, and each from dim2 up to x times, so if the cardinalities are low enough it makes sense to cache the raw values indexed by dictionary id (not docId)
a
Yes, that makes sense.
r
my measurements suggest that this change should result in up to 20% performance improvement, and up to 50% reduction in allocation when there is a large number of group by keys
so we will probably merge this, it would be good to know if you see any improvement and how much
a
Thanks will try it out