# general
  • r

    Raj

    03/10/2023, 2:36 PM
    Hi, any thoughts on this OLAP benchmark? I'm not sure how reliable/verified it is: https://benchmark.clickhouse.com/ Pinot is in the middle of the pack.
  • s

    Saubhagya Awaneesh

    03/11/2023, 2:31 AM
    Hi team, how can we 1) limit user access to the Pinot REST APIs, so that only selected user IDs can query via the API? 2) set up an API token per user ID, so that users who can access the API each have a dedicated API token? 3) (optional) apply ACLs at the column/row level?
  • a

    Ashish Kumar

    03/14/2023, 8:23 AM
    Hi Team, is it possible to recover deleted realtime segments back into a REALTIME table? Context: we had a REALTIME table, DRIVER_METRIC, and we accidentally deleted it from the Pinot console, which used the default 7d retention for segments. So we can still see the deleted segments in S3, but the table is not available for query anymore. We have now created the DRIVER_METRIC REALTIME table again; it is available for query and is consuming data from Kafka, but it only has the last 1 day of data (Kafka topic retention). We now want to push the old data into this table from the deleted segments folder in S3. Is that possible? If so, how? cc: @Lee Wei Hern Jason
  • y

    Yarden Rokach

    03/14/2023, 12:42 PM
    Apache Pinot Roadmap 2023 meetup | March 23. For the Community, by the Community 🍷 • In this meetup we’ll be featuring the Apache Pinot roadmap for 2023; • Get to hear from LinkedIn, Uber, StarTree and more about what they have in store this year for Pinot. • Explore what other community members are working on and hear what the community wants to see in Pinot. RSVP here >> https://www.meetup.com/apache-pinot/events/291954166/?isFirstPublish=true
  • w

    Weixiang Sun

    03/15/2023, 1:39 AM
    Quick question: how is the `EXPLAIN PLAN FOR` output generated for a hybrid table? In my testing, `EXPLAIN PLAN FOR` for a hybrid table gives the same result as for the realtime table, which is different from the offline table. Is that expected?
  • r

    Rohit Yadav

    03/16/2023, 10:16 AM
    Hi community, I am trying to use the Lucene text index for a hybrid table setup. I was able to set it up for the realtime table without much effort. For the offline table part, we rely on a Spark job to generate and upload the segment URIs. Do I need to create the segments with the text index and then upload them, or does Pinot create the text indexes automatically after segments without indexes are uploaded?
  • p

    piby

    03/16/2023, 2:34 PM
    Hi community, is there any way to specify table and column descriptions in the schema JSON? Ideally we want to store some table metadata right within the schema and not use an external solution for it.
  • a

    abhinav wagle

    03/16/2023, 9:40 PM
    Hello, are there docs available on how to make UDFs work with a Helm setup?
  • s

    Sameer Awasekar

    03/17/2023, 4:20 AM
    Hi Community, I am exploring the RealtimeToOffline Minion task. I wanted to confirm whether the upload of the converted offline segments and the update of the `watermark metadata` are atomic. I do see the segment replacement protocol, but I think it only comes into the picture for the Merge task, not for RealtimeToOfflineTask.
  • v

    vishal

    03/17/2023, 7:14 AM
    Can we download segments and put the data into CSV? @saurabh dubey
  • a

    Ashish Kumar

    03/17/2023, 3:28 PM
    Hi Team, what's the difference between `LaunchSparkDataIngestionJobCommand` and `LaunchDataIngestionJobCommand`? When using a batch ingestion job (https://docs.pinot.apache.org/basics/data-import/batch-ingestion/spark), which one should be the main class?
  • n

    Nizar Hejazi

    03/17/2023, 11:52 PM
    I have a Kafka topic with AVRO encoding. The time column is of type (long) and its logical type is (timestamp-micros). Is there any way to convert it to milliseconds and define a datetime field spec like the following (without having to create a new column)?
    {
      "name": "event_time_ms",
      "dataType": "TIMESTAMP",
      "format": "1:MILLISECONDS:TIMESTAMP",
      "granularity": "1:MILLISECONDS"
    }
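For reference, the unit conversion asked about here is plain integer arithmetic. The sketch below is not Pinot configuration and does not answer whether the conversion can be applied in place at ingestion time; it only illustrates what a `timestamp-micros` long looks like once reduced to epoch milliseconds. The function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

MICROS_PER_MILLI = 1_000
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def micros_to_millis(epoch_micros: int) -> int:
    """Convert epoch microseconds to epoch milliseconds.

    Floor division drops the sub-millisecond part instead of rounding.
    """
    return epoch_micros // MICROS_PER_MILLI


def millis_to_utc(epoch_millis: int) -> datetime:
    """Render epoch milliseconds as a timezone-aware UTC datetime."""
    return EPOCH + timedelta(milliseconds=epoch_millis)


if __name__ == "__main__":
    sample_micros = 1_679_097_120_123_456  # made-up sample value
    sample_millis = micros_to_millis(sample_micros)
    print(sample_millis, millis_to_utc(sample_millis).isoformat())
```

The last three digits of the microsecond value are discarded, so two events less than a millisecond apart can end up with the same converted timestamp.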
  • j

    Jason MacLulich

    03/18/2023, 6:18 AM
    Hi guys, where is the best place to ask about the Pinot SQL query engine, and specifically how it behaves when a very long array expression is used with the `IN` operator?
  • d

    Deena Dhayalan

    03/20/2023, 7:48 AM
    Hi, can anyone put together and share a complete doc on how to start Pinot with Docker with an HDFS setup?
  • p

    Pratik Tibrewal

    03/20/2023, 5:29 PM
    Hey, recently we saw very high disk usage on some of our hosts. On investigating, we found directories like this on our servers for a table:
    _tmp/tmp-<segment_name>-<timestamp>/tmp-<uuid>
    The segment named in this path no longer exists for that table (it was deleted by retention). The contents of the directory look like this:
    0	col1.sv.sorted.fwd
    0	col2.mv.fwd
    0	col3.sv.sorted.fwd
    0	col4.sv.sorted.fwd
    0	col5.sv.sorted.fwd
    0	col6.sv.sorted.fwd
    4.0K	col1.dict
    4.0K	col2.dict
    4.0K	col3.dict
    26G	col4.dict
    132G	col5.dict
    148G	col6.dict
    Any idea what this `_tmp` folder signifies and why these directories are getting created?
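A listing like the one above can be gathered across a whole host with a small script. This is a minimal sketch, assuming only that the leftover directories sit directly under folders named `_tmp` somewhere below a server data directory; the function names and the example path are hypothetical, and it reports apparent file sizes rather than allocated disk blocks, so the numbers can differ from `du`.

```python
import os
from pathlib import Path


def dir_size_bytes(path):
    """Sum the apparent sizes of all regular files below `path`."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            if not os.path.islink(file_path):  # skip symlinks
                total += os.path.getsize(file_path)
    return total


def largest_tmp_dirs(data_dir, top_n=10):
    """Return (size_bytes, path) for directories directly under any `_tmp`
    folder below `data_dir`, largest first."""
    results = []
    for tmp_dir in Path(data_dir).rglob("_tmp"):
        if not tmp_dir.is_dir():
            continue
        for child in tmp_dir.iterdir():
            if child.is_dir():
                results.append((dir_size_bytes(child), str(child)))
    results.sort(reverse=True)
    return results[:top_n]


if __name__ == "__main__":
    # Hypothetical data directory; replace with the server's actual one.
    for size, path in largest_tmp_dirs("/var/pinot/server/data"):
        print(f"{size / 2**30:8.1f} GiB  {path}")
```

Sorting by size makes the few directories holding very large `.dict` files stand out from the many near-empty ones.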
  • a

    Andi Miller

    03/20/2023, 5:37 PM
    Is there a recommended way to apply rollups to an offline data import that came in via `SegmentGenerationAndPushTask`? Do I need to trigger a `MergeRollupTask` and hope it does it?
  • a

    abhinav wagle

    03/20/2023, 8:00 PM
    Hello, any ideas on how folks are providing `ssl.truststore.location`, as mentioned here, as part of a Pinot deployment using Helm? Is it a local `jks` file packaged into the Docker image, or is it added after cluster deployment? Any ideas/best practices around this? Thanks!
  • b

    Bobby Richard

    03/20/2023, 8:44 PM
    Is there any way to backfill segments in a realtime-only table?
  • m

    Mingmin Xu

    03/20/2023, 9:51 PM
    Hello team, I'm looking for suggestions on how to set up graceful shutdown properly on brokers and servers, similar to how Trino/Presto works. Our Pinot cluster is deployed in K8s; to avoid downtime when a pod is restarted: • a server node needs to commit any consuming segments and be marked as inactive so that no new queries come in; • a broker needs to be marked as inactive so that it receives no new queries, and then wait until active queries are finished. cc @Grace Lu
  • t

    Tim Berglund

    03/21/2023, 5:27 PM
    If you haven’t seen the silly parody videos my team has been making, today is your day: https://www.linkedin.com/posts/startreedata_pinot-s-kafka-hes-a-friend-from-work-activity-7043973647237042177-tzKb/
    🍷 2
  • t

    Tim Berglund

    03/21/2023, 5:30 PM
    All of this madness is to remind you of rtasummit.com. Go there and check out the details, look at the program, and register. There’s a lot of Pinot content, and I’d love to see this community there in force.
  • t

    Tim Berglund

    03/21/2023, 5:31 PM
    PM me if you want a discount code. 💥
  • g

    Grace Lu

    03/21/2023, 9:06 PM
    Hi team, I want to consult about one of our high-cardinality use cases and see how to set it up properly with Pinot, or whether it is a proper use case for Pinot at all. We have a metrics table that contains hundreds of daily-level metrics columns associated with uuids. The data is updated daily, adding more than 50 million unique rows every day (one row for each uuid every day, and there are millions of uuids). A simplified table schema looks like:
    uuid    date    group    metrics_1    metrics_2    …    metrics_xxxx
    A typical simplified query we want to run on this table selects a bunch of metric aggregations for certain groups of uuids across days, and then aggregates them again by group, e.g.:
    select 
       group,
       avg(m1),
       sum(m2),
       ...
       avg(mxxx)
    from 
    (
        select 
            uuid,
            group,
            avg(metrics_1) as m1,
            sum(metrics_2) as m2,
            …
            avg(metrics_xxx) as mxxx
        from metrics_table where group in (xxx) and date between aa and bb
        group by 1, 2
    ) group by 1
    In earlier preliminary testing, we ran into issues where a simple aggregation query on uuid took very long to return, or queries returned inaccurate approximations due to the high cardinality. We would like some suggestions on whether this is a good use case for Pinot and, if it is, how to model it with the proper cluster config and index config. Thank you! cc @Mingmin Xu
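To make the shape of the query concrete, here is a minimal pure-Python sketch of the same two-level aggregation on a handful of made-up rows. It says nothing about how Pinot would execute or index this; the row values and the function name are hypothetical and only two metric columns are modeled.

```python
from collections import defaultdict
from statistics import mean

# Toy rows shaped like (uuid, date, group, metrics_1, metrics_2); values are made up.
ROWS = [
    ("u1", "2023-03-01", "g1", 10.0, 1.0),
    ("u1", "2023-03-02", "g1", 20.0, 2.0),
    ("u2", "2023-03-01", "g1", 30.0, 3.0),
    ("u3", "2023-03-01", "g2", 40.0, 4.0),
    ("u3", "2023-03-02", "g2", 60.0, 6.0),
]


def two_level_aggregate(rows, groups, date_from, date_to):
    # Inner query: filter on group and date, then GROUP BY (uuid, group).
    per_uuid = defaultdict(lambda: {"m1": [], "m2": []})
    for uuid, date, group, metrics_1, metrics_2 in rows:
        if group in groups and date_from <= date <= date_to:
            per_uuid[(uuid, group)]["m1"].append(metrics_1)
            per_uuid[(uuid, group)]["m2"].append(metrics_2)

    # Outer query: GROUP BY group over the per-uuid results.
    per_group = defaultdict(lambda: {"m1": [], "m2": []})
    for (_uuid, group), vals in per_uuid.items():
        per_group[group]["m1"].append(mean(vals["m1"]))  # avg(metrics_1) as m1
        per_group[group]["m2"].append(sum(vals["m2"]))   # sum(metrics_2) as m2

    return {
        group: {"avg_m1": mean(vals["m1"]), "sum_m2": sum(vals["m2"])}
        for group, vals in per_group.items()
    }


if __name__ == "__main__":
    print(two_level_aggregate(ROWS, {"g1", "g2"}, "2023-03-01", "2023-03-02"))
```

Note that the outer `avg(m1)` is an average of per-uuid averages, which generally differs from a flat average over all rows (22.5 versus 20 for `g1` in the toy data). That is why the inner step has to keep one intermediate result per (uuid, group) pair, and with millions of uuids that intermediate state is where the high cardinality shows up.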
  • a

    Ashish Kumar

    03/22/2023, 2:04 PM
    Hi, 1. What's the difference between building pinot-0.12.0 from source with `-Pbuild-shaded-jar` and without it? 2. Is it possible to shade `org.apache.hadoop`, which is used in the main pom.xml of pinot-0.12.0? It seems to use a different version than the Hadoop used in our team's cluster. I believe that if we can shade it and build Pinot from source, it should be fine.
  • y

    Yarden Rokach

    03/22/2023, 3:54 PM
    Join us TOMORROW: Apache Pinot Roadmap 2023 meetup 💥 🍷 In this meetup we’ll be featuring the Apache Pinot roadmap for 2023; get to hear from LinkedIn, Uber, StarTree and more about what they have in store this year for Pinot. Explore what other community members are working on and hear what the community wants to see in Pinot. Meet, chat, and deepen your knowledge in Real-Time Analytics with Pinot. RSVP here: https://www.meetup.com/apache-pinot/events/291954166/?isFirstPublish=true
  • y

    Yarden Rokach

    03/22/2023, 4:15 PM
    https://www.linkedin.com/posts/startreedata_pinot-s-kafka-hes-a-friend-from-work-acti[…]647237042177-tzKb?utm_source=share&utm_medium=member_desktop Just making sure you all saw the latest release… 🤣 Lord of Pinot is in the house @Tim Berglund
  • k

    Ken Krugler

    03/22/2023, 9:54 PM
    So, way less fun than a Thor remix - https://www.thenile.dev/blog/things-dbs-dont-do. Interesting input for the Pinot roadmap…
    🔥 1
    👀 1
  • d

    David G. Simmons

    03/23/2023, 11:48 AM
    Speaking of less interesting than a Thor parody, I thought I'd point y'all to my first blog post for StarTree... Enjoy!
    💥 2
  • d

    David G. Simmons

    03/23/2023, 11:51 AM
    I also wrote a thing for DZone on Pinot and IoT, if you're at all interested. 🙂 https://dzone.com/articles/real-time-analytics-for-iot
  • t

    Tim Berglund

    03/23/2023, 4:03 PM
    Time for the Apache Pinot Roadmap Meetup! https://www.meetup.com/apache-pinot/events/291954166/