# general
  • l

    Lars-Kristian Svenøy

    01/13/2022, 6:38 PM
    Hey everyone 👋 When querying Pinot through Presto, is there any way to call the Pinot UDF functions? For example, I would like to use the LASTWITHTIME function, but I am not sure how to do this through Presto.
    SELECT dimension FROM table WHERE some_id = 'identifier' GROUP BY LASTWITHTIME(dimension, timestamp, 'String')
  • b

    bharath holla

    01/14/2022, 12:56 PM
    Hi everyone! I’m Bharath, and I lead data engineering at Zynga. Pinot seems like a very interesting technology. Currently reading docs to see if it can serve any use cases in our infrastructure. Gonna sit on the sidelines and watch for a bit at the moment.
    👋 3
  • s

    Sandeep R

    01/18/2022, 2:24 AM
    Hi, I would like to define a consumer group in the Apache Pinot table config so that I can monitor consumer group lag on the Kafka topic Pinot is consuming. Currently I am running 3K TPS with 3 instances, and I noticed that even after Kafka ingestion stopped I still see huge lag, which only drains after 20 mins. I would like to monitor this consumer group. Is there a way to define the consumer group here? Also, if we don't define a consumer group in the consumer application, Kafka assigns a random consumer group name like the ones below, which makes it very hard to track consumer info from the Kafka side, so I would like to define it from Pinot.
    console-consumer-32555
    console-consumer-37046
    console-consumer-3568
    console-consumer-11198
    table config:
    {
      "REALTIME": {
        "tableName": "uapi-testing_REALTIME",
        "tableType": "REALTIME",
        "segmentsConfig": {
          "schemaName": "uapi-testing",
          "timeColumnName": "timestamp",
          "replication": "2",
          "replicasPerPartition": "2"
        },
        "tenants": {
          "broker": "DefaultTenant",
          "server": "DefaultTenant",
          "tagOverrideConfig": {}
        },
        "tableIndexConfig": {
          "streamConfigs": {
            "streamType": "kafka",
            "stream.kafka.topic.name": "uapitranlog2",
            "stream.kafka.broker.list": "localhost:6667",
            "stream.kafka.consumer.type": "lowlevel",
            "stream.kafka.consumer.prop.auto.offset.reset": "smallest",
            "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
            "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.stream.kafka.KafkaJSONMessageDecoder",
            "realtime.segment.flush.threshold.rows": "0",
            "realtime.segment.flush.threshold.time": "240h",
            "realtime.segment.flush.segment.size": "100M"
          },
          "createInvertedIndexDuringSegmentGeneration": false,
          "invertedIndexColumns": [],
          "rangeIndexColumns": [],
          "autoGeneratedInvertedIndex": false,
          "sortedColumn": [],
          "bloomFilterColumns": [],
          "loadMode": "MMAP",
          "noDictionaryColumns": [],
          "onHeapDictionaryColumns": [],
          "varLengthDictionaryColumns": [],
          "enableDefaultStarTree": false,
          "enableDynamicStarTreeCreation": false,
          "aggregateMetrics": false,
          "nullHandlingEnabled": false
        },
        "metadata": {},
        "quota": {},
        "routing": {},
        "query": {},
        "ingestionConfig": {},
        "isDimTable": false
      }
    }
  • p

    Prashant Pandey

    01/19/2022, 12:07 PM
    Hi folks, we want to redact a part of a string column that contains text as CSV. For example:
    val1,val2,token,val3
    to
    val1,val2,[redacted],val3
    My Google-fu hasn’t been of much help. How can I do this through the query console?
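To make the requested transformation concrete, here is a minimal sketch in Python that applies it outside Pinot, for example on rows already fetched by a client. It is not a Pinot function: `redact_field` and its parameters are hypothetical names, and the naive split assumes no field contains a quoted comma.

```python
def redact_field(csv_value: str, index: int, placeholder: str = "[redacted]") -> str:
    """Replace the field at position `index` of a simple comma-separated string.

    Assumes plain comma-separated values with no quoting or escaped commas.
    """
    fields = csv_value.split(",")
    # Leave the value untouched when the index is out of range.
    if 0 <= index < len(fields):
        fields[index] = placeholder
    return ",".join(fields)


if __name__ == "__main__":
    print(redact_field("val1,val2,token,val3", 2))  # val1,val2,[redacted],val3
```

The field position (2, zero-based) is taken from the example in the message; a real column may need a proper CSV parser if values can contain commas.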
  • p

    Prashant Pandey

    01/19/2022, 2:33 PM
    Hi folks, it’s me again 😄 What’s the command for deleting rows from Pinot tables?
    DELETE from tableName where predicate1
    gives me a:
    org.apache.pinot.sql.parsers.SqlCompilationException: Caught exception while parsing query: DELETE FROM from baseballStats where baseOnBalls=0
  • z

    Zsolt Takacs

    01/19/2022, 4:01 PM
    We are looking for a way to limit the aggregation result dimensions, i.e. to select the count of something for every day, and break it down by two dimensions, then get results for all the days, but only the top x dimension pairs for every day. Is there a feature like this planned?
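As an illustration of the result shape being asked for, the sketch below does the "top x pairs per day" trimming on the client side in Python. It assumes the full per-day, per-pair counts have already been fetched (for instance from a query grouped by day and the two dimensions). It does not show a Pinot feature, and `top_pairs_per_day` and the sample rows are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# One aggregated row: (day, dimension_a, dimension_b, count)
Row = Tuple[str, str, str, int]


def top_pairs_per_day(rows: List[Row], x: int) -> Dict[str, List[Tuple[str, str, int]]]:
    """Keep only the x highest-count dimension pairs for every day."""
    by_day: Dict[str, List[Tuple[str, str, int]]] = defaultdict(list)
    for day, dim_a, dim_b, count in rows:
        by_day[day].append((dim_a, dim_b, count))
    # Sort each day's pairs by count, descending, and truncate to x entries.
    return {
        day: sorted(pairs, key=lambda p: -p[2])[:x]
        for day, pairs in by_day.items()
    }


if __name__ == "__main__":
    rows = [
        ("2022-01-18", "US", "web", 10),
        ("2022-01-18", "US", "app", 7),
        ("2022-01-18", "DE", "web", 3),
        ("2022-01-19", "DE", "app", 5),
        ("2022-01-19", "US", "web", 9),
    ]
    for day, pairs in top_pairs_per_day(rows, 2).items():
        print(day, pairs)
```

The trade-off is that every pair for every day still has to be returned by the query before trimming, which is the cost a server-side feature would avoid.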
  • m

    Mesut Özen

    01/19/2022, 10:08 PM
    Hi Team, is there any way to create a schema template in Pinot? I have dozens of tables which share the same schema. For example, I want to create a schema named "my_schema_*", so that the schema applies to every table whose name starts with the my_schema_ prefix.
  • s

    Sahar

    01/20/2022, 2:39 PM
    Hi 👋 is there a way to provide more than one Kafka topic for ingestion in a table definition ("stream.kafka.topic.name")? I provided a comma-separated list but am getting a 500 error saying my Kafka topic is not valid. I have two identical topics, each from a different DB shard, and I would like both to land in the same table.
  • l

    Laxman Ch

    01/20/2022, 7:45 PM
    Hi Folks, we are exploring the upsert feature in Pinot and have a few questions about it. Please help me understand the feature.
    1. We are using the managed offline flow with 2 days as the buffer time, which means REALTIME segments get converted to OFFLINE segments after 2 days. However, our REALTIME segments roll up every 1 hour/partition. Can upsert handle any update within this 2-day period?
    2. How is this handled in the managed offline flow? Do the multiple update records for the same row get merged into a single row?
    3. I'm going through the related design documents available here, but access is closed for one document. Can you please provide access?
  • s

    suraj kamath

    01/21/2022, 10:19 AM
    Hi Team, we have a dimension table and want to replace its data on a regular basis, regarding which we have the following questions:
    1. From the documentation we understand that we can give a fixed name to a particular segment, e.g. SEGMENT_NAME. How can we achieve the same in the case of multiple segments, e.g. SEGMENT_NAME_1, SEGMENT_NAME_2?
    2. For OFFLINE tables, is there a way to specify the number of records per segment, similar to realtime.segment.threshold.rows in a REALTIME table?
  • a

    Abhishek Tomar

    01/21/2022, 10:25 AM
    Question: Does Pinot provide a real-time API for consumption in mobile apps and web apps?
  • l

    Lars-Kristian Svenøy

    01/21/2022, 11:09 AM
    Hello everyone 👋 Looking for some guidance on something I’d like to do, and if it is feasible to do so with the features available in Pinot. I have a table which sees a lot of traffic, and this table is also one of the most popular tables for time-based aggregations. However, all the aggregation queries want to get a “snapshot” for a customer during certain days, not necessarily only what has happened on that day. So for example, if I wanted to summarise the state of that customer every week, I would want to have a snapshot of the most recent state for that organisation every Monday, meaning all changes since then. Will I need to create a separate table to snapshot state to accomplish this, or does someone have an idea for how I could accomplish this?
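To pin down what "a snapshot every Monday" means here, the sketch below computes, for each Monday in a range, the most recent state recorded on or before that day. It is plain Python over an in-memory list of change events, not a Pinot query or feature. The `(date, state)` event shape and all function names are assumptions made for illustration.

```python
from bisect import bisect_right
from datetime import date, timedelta
from typing import Dict, List, Optional, Tuple

# One change event for a single customer: (date of change, new state)
Event = Tuple[date, str]


def mondays_between(start: date, end: date) -> List[date]:
    """All Mondays d with start <= d <= end."""
    # date.weekday() is 0 for Monday, so this jumps to the first Monday >= start.
    current = start + timedelta(days=(7 - start.weekday()) % 7)
    result = []
    while current <= end:
        result.append(current)
        current += timedelta(days=7)
    return result


def state_as_of(events: List[Event], day: date) -> Optional[str]:
    """Latest state whose change date is <= day; events must be sorted by date."""
    dates = [d for d, _ in events]
    i = bisect_right(dates, day)
    return events[i - 1][1] if i else None


def weekly_snapshots(events: List[Event], start: date, end: date) -> Dict[date, Optional[str]]:
    """Snapshot of the customer's state for every Monday in [start, end]."""
    return {monday: state_as_of(events, monday) for monday in mondays_between(start, end)}


if __name__ == "__main__":
    events = [
        (date(2022, 1, 4), "trial"),
        (date(2022, 1, 12), "active"),
        (date(2022, 1, 20), "suspended"),
    ]
    for monday, state in weekly_snapshots(events, date(2022, 1, 1), date(2022, 1, 31)).items():
        print(monday, state)
```

Whether this logic belongs in a separate pre-computed snapshot table or in query-time processing is exactly the open question in the message; the sketch only fixes the semantics.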
  • s

    Sahar

    01/21/2022, 3:25 PM
    Hi 👋 How much storage is on average required for Pinot relative to source data size? For instance if my source database is 1TB, how much EBS do I need on AWS for the Pinot cluster? I'm imagining it'd be more than the source size due to replication and indexing?
  • a

    abhinav wagle

    01/21/2022, 11:10 PM
    Is there documentation/quick start guide around how to use Pinot-Admin utilities inside airflow (https://airflow.apache.org/docs/apache-airflow-providers-apache-pinot/stable/installing-providers-from-sources.html)?
  • p

    Prateek Singhal

    01/22/2022, 12:34 AM
    Hey Everyone, I have data in Postgres that I need to send to Pinot dimension table. Can this be done using Apache Spark batch job? What would be the other/better ways of doing it?
  • d

    Dan DC

    01/24/2022, 11:12 AM
    Hi, is there an easy way to get the version of Pinot that each node of the cluster is running? I.e. I want to know the version for each controller, broker, server and minion via a REST endpoint. I know there is pinot-admin.sh -v, but that gives me the version of each jar on the node where it runs, which is not exactly what I'm looking for.
  • k

    Karin Wolok

    01/24/2022, 12:08 PM
    👋 Welcome to all the new Apache Pinot 🍷 slack members! 👋 Please take a minute and tell us who you are and what brought you here! 😃 @User @User @User @User @User @User @User @User @User @User @User @User @User @User @User @User @User @User @User @User @User @User @User @User @User @User @User @User @User @User @User @User @User @User @User @User @User @User @User @User
    🖐️ 8
  • h

    Hassane Moustapha

    01/25/2022, 9:01 AM
    Hello @User. Curiosity... i am comparing Pinot & Druid ...
  • e

    Evan Galpin

    01/26/2022, 3:55 PM
    Hey folks, I found a little online about the topic but I’m curious if Pinot is well suited for Observability use cases i.e. time-series data representing server metrics like CPU %, mem %, custom metrics. It seems on the surface that Pinot would be very well suited for this shape of data but I thought I’d ask the experts 🙂
  • d

    Diogo Baeder

    01/26/2022, 6:56 PM
    Hi folks! Just to confirm something: I've been inserting test data in my Pinot tables, and now I want to clean them up, then start inserting production data. The simplest way to achieve this is to drop the tables and then recreate them, right?
  • m

    Manish Soni

    01/27/2022, 12:31 PM
    Hi Team, we have REALTIME and OFFLINE User tables. We are pushing data to the User table and have set up a minion to move data from the REALTIME to the OFFLINE table. When data is moved to the OFFLINE table, is there a way to create segments for the OFFLINE table based on a column value? For example: our User table is populated from 3 different sources. One of the columns of this table is Source, which tells us which source a particular user record came from. So, is there a way to create the segments for this table based on the Source column value?
    • The reason we are looking for this is that if the user data from one of the sources is wrong, we can backfill only that source's segments and not all the others.
  • s

    Sahar

    01/27/2022, 2:54 PM
    Hi 👋 we are thinking of using Pinot for our user-facing reporting data store. Our plan so far had been to take data from the source and dump it into Pinot (1:1 mapping between source and Pinot tables). This means we'd need PrestoDB/Trino on top of Pinot to handle complex JOINs. We are now considering remodeling the data: denormalizing and maybe aggregating before pushing it to Pinot. To denormalize the data before pushing it to Pinot, we'd need a stream processing framework such as Flink sitting between Kafka and Pinot, right?
  • s

    Shawn Peng

    01/27/2022, 6:45 PM
    Hi, I see there is this issue on GitHub; the last comment is from 2016 and it said "Pinot does not support pagination for aggregation/group". Is it still the case now? Is there any plan to support it?
    ➕ 1
  • c

    Chris Prokopiak

    01/27/2022, 9:01 PM
    Hi! Are there any rough sizing guidelines for a Pinot cluster, based on anything like data size, number of documents, etc. to be loaded in via offline data? I'm trying to create a cluster in Kubernetes using the Helm chart and just making wild guesses right now.
  • s

    Sahar

    01/28/2022, 4:35 PM
    Which component in Pinot includes the Kafka sink connector? The reason for asking is that I'm trying to figure out what action is required from our side if Pinot fails to consume messages from a Kafka topic, and how we get notified. Is there a metric to watch for?
  • s

    Sowmiya

    01/31/2022, 9:02 AM
    Could you please suggest a frontend tool (PowerBI, Snowflake, etc.) for report generation that supports Pinot? With documentation, please.
  • s

    Shaun Sawyer

    02/02/2022, 3:34 PM
    We read recently that Groovy may be insecure,
    Be careful when you’re using Groovy, with the recent log4J security is definitely important, make sure you know what you’re doing with that feature. I would suggest turning it off if you don’t really need that feature, although it’s very useful because it helps in terms of just writing your own custom function without having to understand Pinot internal details. And once you get the functionality, then you end up going and writing your own UDF.
    (https://www.startree.ai/blogs/apache-pinot-2021-recap-and-2022-roadmap) Is it now recommended to avoid Groovy for ingestion transforms, or perhaps there are certain patterns one should avoid for security reasons?
  • i

    Ian Chen

    02/02/2022, 7:50 PM
    Hi I am wondering if there's a download link pointing to the latest version?
    👋 1
  • k

    Karin Wolok

    02/02/2022, 8:21 PM
    Hey everyone! Please help us welcome 👋 all the new Pinot 🍷 slack members! 😃 We're so happy you're here! Would love for you to tell us about who you are and what brought you here! ✏️ @User @User @User @User @User @User @User @User @User @User @User @User @User @User @User @User @User @User @User @User @User @User @User
    🍷 4
  • d

    Dan DC

    02/03/2022, 11:09 AM
    Hello again :) does anyone have any information/insight about the most cost-effective EC2 instance type to deploy each type of pinot node on AWS?