# pinot-sketches
  • m

    Mayank

    06/01/2023, 8:49 PM
    has renamed the channel from "data-sketches" to "pinot-sketches"
  • m

    Mayank

    06/01/2023, 8:49 PM
    set the channel topic: Discussions for Sketches support in Pinot.
  • m

    Mayank

    06/01/2023, 8:50 PM
    Starting this channel for general discussions around sketches support in Pinot. Please feel free to invite more folks.
    👍 2
  • d

    David Cromberge

    06/09/2023, 12:14 PM
    Hi @balci / @Mayank I’ve created a small PR to upgrade Datasketches within Pinot to the 4.0.0 release. One of the improvements that we are interested in is the delta compression in Theta sketches, which can reduce storage overheads by approximately 20% for sketches at capacity. I’m hesitant to enable compression by default because it would require a client upgrade for everyone making use of the raw sketches when querying Pinot. Ideally, this could be configured via Helix variables. What do you think?
    🙏 1
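For readers unfamiliar with the idea, the following is a toy illustration of delta compression applied to sorted hash values. It is not the DataSketches serialization format, whose details may differ: the function names `pack_deltas` and `unpack_deltas` are hypothetical, and the 63-bit random values only stand in for the hashes a Theta sketch retains. The point is that the gaps between sorted hashes need fewer bits than the hashes themselves, so packing the gaps at a fixed, smaller bit width shrinks the payload.

```python
import random


def pack_deltas(sorted_hashes: list[int]) -> tuple[int, int, bytes]:
    """Bit-pack the gaps between sorted, non-negative integers.

    Returns (bit width per gap, number of gaps, packed bytes).
    """
    if not sorted_hashes:
        return 0, 0, b""
    # Gap between each value and its predecessor (the first gap is from 0).
    deltas = [b - a for a, b in zip([0] + sorted_hashes[:-1], sorted_hashes)]
    # Every gap is stored with the width of the largest gap.
    width = max(1, max(d.bit_length() for d in deltas))
    packed = 0
    for d in deltas:
        packed = (packed << width) | d
    n_bytes = (width * len(deltas) + 7) // 8
    return width, len(deltas), packed.to_bytes(n_bytes, "big")


def unpack_deltas(width: int, count: int, data: bytes) -> list[int]:
    """Invert pack_deltas: extract the gaps and rebuild the values by prefix sum."""
    packed = int.from_bytes(data, "big")
    mask = (1 << width) - 1
    hashes, running = [], 0
    for i in range(count):
        shift = width * (count - 1 - i)
        running += (packed >> shift) & mask
        hashes.append(running)
    return hashes


# Hypothetical input: 4096 random 63-bit values standing in for retained hashes.
rng = random.Random(0)
hashes = sorted({rng.getrandbits(63) for _ in range(4096)})
width, count, data = pack_deltas(hashes)
print(f"raw: {8 * len(hashes)} bytes, packed: {len(data)} bytes at {width} bits per gap")
```

A reader of the packed form needs to know about the encoding to decode it, which is the compatibility concern raised in the message above.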
  • m

    Mayank

    06/20/2023, 5:54 AM
    @David Cromberge @Andi Miller am I reading this right, ST index works with raw version of the function but not with the original version? If so, what’s the reasoning for that (as both share code until the final return)? https://github.com/apache/pinot/pull/10288#issuecomment-1433895884
  • b

    balci

    07/13/2023, 11:59 PM
    Created a PR to introduce two new datasketch based aggregation functions: FrequentLongsSketch and FrequentStringsSketch. Feel free to add comments. https://github.com/apache/pinot/pull/11098 cc @David Cromberge @Mayank
  • b

    balci

    07/14/2023, 12:03 AM
    Also added documentation for PercentileKLL function[s] here: https://github.com/pinot-contrib/pinot-docs/pull/206/. @Mayank please take a look.
  • d

    David Cromberge

    10/13/2023, 3:18 PM
    I have now marked my contribution as ready for review and would appreciate any further comments. Thanks Caner for your initial review.
    🚀 1
    🙌 3
  • a

    Andi Miller

    10/19/2023, 4:33 PM
    I've added this one too as an alternative https://github.com/apache/pinot/pull/11835
    🙌 1
  • b

    balci

    10/25/2023, 6:10 PM
    Hi Folks, adding some documentation for the recently introduced FrequentLongsSketches and FrequentStringsSketches. Please take a look, and feel free to merge if it looks good. https://github.com/pinot-contrib/pinot-docs/pull/253 cc @David Cromberge @Mayank
    🙏 1
    👍 1
  • d

    David Cromberge

    10/27/2023, 10:25 PM
    Here is the PR for the other sketches that we have added recently as well: https://github.com/pinot-contrib/pinot-docs/pull/254 cc @balci @Mayank
  • b

    balci

    04/26/2024, 9:04 PM
    Hi Folks, We got Pinot featured on Apache Datasketches website, showcasing the integration and supported functions. 🎉 Thanks for all the contributions which made Pinot one of the best platforms to use Datasketches with. Feel free to review and propose changes if you have other things you’d like to add. cc @Mayank @David Cromberge @Andi Miller
    🎉 5
  • r

    raghav

    06/12/2024, 6:19 AM
    Hi folks, is there support for UDD Sketch in Pinot?
  • r

    raghav

    06/19/2024, 12:04 PM
    Hey team, we have a use case where we need to ingest KLL sketches in Pinot. Currently we are ingesting raw data into Pinot, from which we will create the sketches. Is it possible/recommended to create sketches internally within Pinot as a minion task, or should we ingest prebuilt sketches into Pinot? Thanks!
  • s

    Saif Ali Khan

    07/11/2024, 11:00 AM
    Hello everyone, is it alright to do distinctCountThetaSketch with SET_INTERSECT along with GROUP BY on a dimension? For instance, this query gives a count of 0 with GROUP BY but gives the correct result without the grouping:
    select
      seg_id,
      distinctCountThetaSketch(
        uid_sketch,
        'nominalEntries=1024',
        'seg_id IN (1, 2)',
        'seg_id = 3',
        'SET_INTERSECT($1, $2)'
      ) as users
    from analytics
    where seg_id IN (1, 2, 3)
    group by 1
    outputs this result-
    seg_id	users
    1	    0
    2	    0
    3	    0
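One plausible reading of this result, assuming the filter predicates passed to distinctCountThetaSketch are evaluated only over the rows of each group: once rows are grouped by seg_id, every group holds a single seg_id value, so at least one of the two filtered sets is empty and the intersection is empty too. The snippet below is a minimal sketch of that reasoning and not Pinot's implementation. Plain Python sets stand in for Theta sketches, and the rows and the helper names `intersect_count` and `grouped_intersect_counts` are hypothetical.

```python
from collections import defaultdict

# Hypothetical (seg_id, uid) rows; plain sets stand in for Theta sketches.
rows = [(1, "u1"), (1, "u2"), (2, "u2"), (2, "u3"), (3, "u2"), (3, "u4")]


def intersect_count(rows):
    """Mimic SET_INTERSECT($1, $2) with $1 = seg_id IN (1, 2) and $2 = seg_id = 3."""
    first = {uid for seg_id, uid in rows if seg_id in (1, 2)}
    second = {uid for seg_id, uid in rows if seg_id == 3}
    return len(first & second)


def grouped_intersect_counts(rows):
    """Apply the same intersection separately to the rows of each seg_id group."""
    groups = defaultdict(list)
    for seg_id, uid in rows:
        groups[seg_id].append((seg_id, uid))
    return {seg_id: intersect_count(group) for seg_id, group in sorted(groups.items())}


ungrouped = intersect_count(rows)         # both filters see all rows
grouped = grouped_intersect_counts(rows)  # each group sees one seg_id only
print(ungrouped, grouped)
```

Under this assumption, grouping on the same dimension that the filters split on always yields empty intersections; grouping on a different dimension would not have this effect.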
  • r

    raghav

    12/23/2024, 9:04 AM
    Hey Everyone, I need help with a PR review. I have added support for KLL sketch merge in minion jobs. https://github.com/apache/pinot/pull/14702
  • r

    raghav

    02/11/2025, 7:21 AM
    Hey Team, does Pinot support creation of CPC/Theta/Tuple sketches during ingestion? From the docs it seems like only HLL sketches are supported. https://docs.pinot.apache.org/developers/advanced/ingestion-level-aggregations
  • r

    Raunak Binani

    03/03/2025, 3:42 AM
    Hey Team, while using percentileKLL I am getting: Caused by: org.apache.calcite.sql.validate.SqlValidatorException: Invalid number of arguments to function 'PERCENTILEKLL'. Was expecting 2 arguments. The docs (https://docs.pinot.apache.org/release-1.0.0/configuration-reference/functions/percentilekll) say it can accept 3 values. Using Pinot version 1.0.0.