# general
m
• Hi Team, I have seen that in 0.4.0, Pinot implemented the initial version of a theta-sketch based distinct count aggregation function, using the Apache DataSketches library. The latest Druid release also includes a DataSketches extension (Theta sketch, Tuple sketch, Quantiles sketch, HLL sketch). Does Pinot have any plan to implement sketches other than the Theta sketch? Thanks.
m
Pinot already supports HLL-based distinct count and TDigest-based percentiles. If there's a specific case where you would find DataSketches-based implementations more useful, we can definitely explore that. If so, I would recommend filing an issue for it.
👍 2
For HLL we use
com.clearspring.analytics.stream.cardinality.HyperLogLog
🙌 1
And for TDigest, we use
com.tdunning.math.stats.TDigest
🙌 1
m
Thanks for the quick reply!
m
👍
m
@Mayank we may need to pay attention to the KLL sketch vs. t-digest (the Pinot implementation); see the following comparison by DataSketches: https://datasketches.apache.org/docs/Quantiles/KllSketchVsTDigest.html
m
Thanks for sharing @Mark.Tang. We can definitely explore adding these if needed.
Also noting that DataSketches includes the newer CPC sketch, which estimates stream cardinalities more efficiently than the well-known HLL sketch; see https://arxiv.org/pdf/1708.06839.pdf
m
If you could open an issue and add all this there, it would help us track this request @Mark.Tang
m
I will try to open an issue to discuss the sketch family @Mayank
m
Thanks @Mark.Tang.