# general

    RK

    05/31/2021, 5:45 AM
    Hello everyone, I am creating a hybrid table which can ingest data from a Kafka topic (streaming) as well as from an HDFS location (batch ingestion). I am aware of the stream ingestion process for Kafka topics and have created multiple realtime tables. Now I am creating a hybrid table, since for one of the Kafka topics the data is also available at an HDFS location. I am going through the documents, but in offline-config-table.json I couldn't find any property where we pass the HDFS source location. Kindly suggest the process to ingest from HDFS into the same table.
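(Editor's note: in Pinot, the batch source location does not go in the offline table config at all; it goes in the ingestion job spec passed to the ingestion job. A trimmed, illustrative sketch, assuming Avro files on HDFS; paths, table name, and controller address are placeholders, and a real spec needs the full executionFrameworkSpec:)

```yaml
executionFrameworkSpec:
  name: 'standalone'
jobType: SegmentCreationAndTarPush
inputDirURI: 'hdfs://namenode:8020/data/my_topic/'        # batch source location
includeFileNamePattern: 'glob:**/*.avro'
outputDirURI: 'hdfs://namenode:8020/pinot/segments/my_topic/'
overwriteOutput: true
pinotFSSpecs:
  - scheme: hdfs
    className: org.apache.pinot.plugin.filesystem.HadoopPinotFS
recordReaderSpec:
  dataFormat: 'avro'
  className: org.apache.pinot.plugin.inputformat.avro.AvroRecordReader
tableSpec:
  tableName: 'myTable'
pinotClusterSpecs:
  - controllerURI: 'http://pinot-controller:9000'
```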

    Sávio Salvarino Teles de Oliveira

    06/01/2021, 12:24 AM
    Hi! I have two dimensions (customers and sellers) with a fact table of order data. We would like to aggregate the order data by customers and sellers, e.g. aggregate order amount. We would like to use the star-tree index, but the customer can change at any time (name, address, etc.), and the Pinot documentation says that upsert is not supported with the star-tree index (https://docs.pinot.apache.org/basics/data-import/upsert#limitations…). What would be the best solution using Pinot?

    Kaustabh Ganguly

    06/01/2021, 2:24 PM
    I'm a fresh CS grad and just exploring things. I am new to streaming data, Kafka, and Pinot. I want to merge batched data and streaming data and use Pinot on top of it. My solution is to use Kafka Connect, as it's an ideal solution for merging batched and streaming data into topics & partitions. So my pipeline basically uses Kafka for merging and then Pinot for streaming from Kafka. Is there a better solution that comes to anyone's mind? Please correct me if there's any fallacy in my logic.

    Mayank

    06/01/2021, 3:27 PM
    The incoming null gets translated into the default null value and stored in Pinot. So in your example, “default” will be stored
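(Editor's note: for reference, the default null value is configured per field in the table schema; a minimal fragment, with an illustrative field name:)

```json
{
  "dimensionFieldSpecs": [
    {
      "name": "status",
      "dataType": "STRING",
      "defaultNullValue": "default"
    }
  ]
}
```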

    Ken Krugler

    06/02/2021, 12:27 AM
    My ops guy is setting up Docker containers, and wants to know why the base Pinot Dockerfile has
    VOLUME ["${PINOT_HOME}/configs", "${PINOT_HOME}/data"]
    since he sees that there’s nothing being stored in the /data directory. Any input?

    troywinter

    06/02/2021, 3:17 AM
    I’m getting slow regexp_like performance: for 0.3 billion rows, it is costing nearly 2 secs to match a prefix for a column, but in Druid the same data using the like operator returned instantly. Is there any config I can apply to speed up this kind of query?
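(Editor's note: two formulations worth benchmarking for a prefix match; table and column names are placeholders. Pinot also supports the LIKE operator, which may be cheaper than a general regex:)

```sql
-- regexp_like with an anchored prefix pattern
SELECT COUNT(*) FROM myTable WHERE regexp_like(myCol, '^abc');

-- LIKE with a trailing wildcard
SELECT COUNT(*) FROM myTable WHERE myCol LIKE 'abc%';
```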

    Lakshmanan Velusamy

    06/03/2021, 5:53 AM
    Hello Community, We are trying to add user defined scalar functions. Can these functions be used in star tree index for pre-aggregation ?

    Jonathan Meyer

    06/03/2021, 11:51 AM
    Hello 🙂 Is there a whirlwind tour of Pinot's code base available somewhere ? Some pointers on where to start ?

    Sávio Salvarino Teles de Oliveira

    06/03/2021, 3:16 PM
    Hello. What happens during upsert in real-time ingestion when the primary key and event time are both equal? The documentation says: "When two records of the same primary key are ingested, the record with the greater event time (as defined by the time column) is used." But when there is a tie, what happens?

    Pankaj Thakkar

    06/05/2021, 9:56 PM
    Thanks for the links @User; @User awesome job on the segment lifecycle videos!

    Santhosh CT

    06/08/2021, 3:59 AM
    Hi. We have a use case to store incoming user events, with multiple dimensions that we want to query on. We want to use S3 as deep storage. We also have requirements like: the last half hour of data will be queried frequently, like a hot shard. Can we use Pinot for this use case? How can we model the data optimally for this kind of use case? Do we have data retention support, where data older than some threshold can be removed?

    Jai Patel

    06/08/2021, 11:44 PM
    For an upsert table I have the order columns: timeColumnName set to my updated_at timestamp. It used to be created_at when I was using an offline-only table. I believe this is the correct change. My question is about the sorted-column index: do I need to change it too? For my use case I generally still want to sort on created_at. But does upsert require the sorted column to be the same as the time column?
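(Editor's note: for reference, the sorted column is configured independently of the time column, under tableIndexConfig; a fragment, with the field name taken from the question:)

```json
{
  "tableIndexConfig": {
    "sortedColumn": ["created_at"]
  }
}
```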

    Alon Burg

    06/09/2021, 11:00 AM
    Is there a way to query the result of the star-tree index for time periods? I guess this type of query is probably executed by ThirdEye?

    RK

    06/09/2021, 4:06 PM
    Is there any way to increase server memory when starting a Pinot server in cluster mode? I have my servers on 2 different nodes; whenever I refresh my Superset dashboard it fires some queries to Pinot and fetches data from the servers. One of my servers automatically shows a dead state; when I checked the log it says there is insufficient memory for the Java Runtime Environment to continue, and the server stopped working there. Is there any way to resolve this issue? @User @User @User

    Pedro Silva

    06/09/2021, 4:43 PM
    Hello, what is the difference between segmentsConfig.replication and segmentsConfig.replicasPerPartition for a realtime table?
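(Editor's note: for reference, both settings live side by side under segmentsConfig in the table config; per the docs of this era, replication applies to offline/completed segments while replicasPerPartition governs realtime consuming segments. A hypothetical fragment:)

```json
{
  "segmentsConfig": {
    "timeColumnName": "ts",
    "replication": "3",
    "replicasPerPartition": "3"
  }
}
```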

    Map

    06/09/2021, 9:13 PM
    For stream ingestion with Kafka, only JSON format is currently supported, right? The input formats listed here https://docs.pinot.apache.org/basics/data-import/pinot-input-formats are only for batch ingestion?

    Alon Burg

    06/10/2021, 9:18 AM
    In the article "Pinot: Realtime OLAP for 530 Million Users" it says:
    "At Linkedin, business events are published in Kafka streams and are ETL'ed onto HDFS. Pinot supports near-realtime data ingestion by reading events directly from Kafka [19] as well as data pushes from offline systems like Hadoop. As such, Pinot follows the lambda architecture [23], transparently merging streaming data from Kafka and offline data from Hadoop. As data on Hadoop is a global view of a single hour or day of data as opposed to a direct stream of events, it allows for the generation of more optimal segments and aggregation of records across the time window."
    Is there a general rule of thumb of when should I keep raw events in Pinot vs aggregated data?

    Carl

    06/10/2021, 11:39 AM
    Hi, does current Pinot python client support basic auth for querying Pinot? Is there an example showing how to pass the auth header with python client? Thanks.
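(Editor's note: even without client-library support, the broker's HTTP query endpoint can be called directly with a standard Basic auth header. A minimal sketch using only the Python standard library; the broker URL and credentials are placeholders:)

```python
import base64
import json
import urllib.request

# Hypothetical broker query endpoint -- substitute your own.
BROKER = "http://localhost:8099/query/sql"

def basic_auth_header(username: str, password: str) -> str:
    """Build the value of an HTTP Basic Authorization header."""
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return f"Basic {token}"

def query_pinot(sql: str, username: str, password: str) -> dict:
    """POST a SQL query to the Pinot broker with basic auth and return the JSON response."""
    req = urllib.request.Request(
        BROKER,
        data=json.dumps({"sql": sql}).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": basic_auth_header(username, password),
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```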

    RK

    06/10/2021, 6:07 PM
    Is there any way to query a Pinot table directly from Superset without using Presto as middleware? I.e., to access a Pinot table through Superset I am using the Pinot Presto connector, then in Superset I use this catalog to connect to the table, so whenever I fire queries from Superset they go to Pinot via Presto. Since I am not using any joins in the queries, I believe I can also connect Superset and Pinot directly without Presto as middleware, and I think queries would be faster that way. @User kindly suggest.

    Carl

    06/10/2021, 9:13 PM
    By default, Pinot returns 10 rows for a select * query. Is there a way to change or remove this default limit?
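(Editor's note: the usual workaround is an explicit LIMIT clause; table name is a placeholder:)

```sql
SELECT * FROM myTable LIMIT 1000;
```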

    RK

    06/11/2021, 8:17 AM
    Hi everyone, how can I convert a string into a double in Pinot for use with the sum function? I have tried these 2 queries but am getting this error. In my Pinot schema file the data type for this field is string, so I am not able to take the sum. While writing the transformation in the config file I assigned the default value null for this field, so it has both non-null values and nulls where data is not available. I guess it is not able to cast the null values into double/decimal. Is there any way to ignore nulls? I have tried "where gross_amount is not null" but it didn't work. Kindly suggest.
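(Editor's note: a hedged sketch of the two pieces usually involved; the column name is taken from the question, and the sentinel value to exclude depends on what the transform actually writes for missing data:)

```sql
-- cast the string column to double, excluding the sentinel 'null' values
SELECT SUM(CAST(gross_amount AS DOUBLE))
FROM myTable
WHERE gross_amount <> 'null';
```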

    Gagandeep Singh

    06/11/2021, 2:47 PM
    Hello guys 👋 For my studies, I am talking about Pinot and its architecture. I created an activity diagram demonstrating the query process within the cluster, but I think some things are missing. I read the original paper and based the diagram on the query-process section, but unfortunately I was not fully able to illustrate it. I would appreciate it if some experts could give me some feedback. Thank you very much!

    Aaron Wishnick

    06/11/2021, 3:45 PM
    I'm trying to understand the difference between Segment URI Push and Segment Metadata Push. I was using Segment URI Push and I filled up the disk on the controller. That seems to make sense to me since the controller had to download all the segments. A couple related questions: 1. If I use metadata push, my understanding is that the controller will direct one of the servers to download the segment instead, is that right? 2. Does that mean the controller will use less disk in that case? 3. Is the final state after URI Push and Metadata Push different? I'd assume in both cases, you should end up with segments distributed across servers, is that right? So I'm just curious why the controller's disk filled up, is it supposed to clean up and isn't doing that, or is this behavior expected?

    Punish Garg

    06/11/2021, 5:07 PM
    Hello team, I wanted to understand one thing: does Pinot provide the capability to overwrite segment data (like we overwrite partitioned data in a Hive table)?

    Ashish

    06/13/2021, 2:53 AM
    Is this issue resolved - https://github.com/apache/incubator-pinot/issues/5261?

    Hamza

    06/14/2021, 3:01 PM
    Hello, I'm running Pinot in K8s and I have one job that creates my table and schema, and another job that does the ingestion from GCS storage. This last job creates segments and stores them in a GCS bucket. Is there a way for later runs to load these segments directly from the folder without recreating them?

    Jai Patel

    06/14/2021, 10:55 PM
    What’s the option called to disable upsert?

    Chundong Wang

    06/14/2021, 11:01 PM
    Is there any document on how theta-sketch columns should be generated? In the Pinot doc of DistinctCountThetaSketch it mentions thetaSketchColumn. Is that column supposed to be serialized binary (hex string, I suppose) of the Theta Sketch framework?
    UpdateSketch sketch2 = UpdateSketch.builder().build();
    for (int key = 50000; key < 150000; key++) sketch2.update(key);
    FileOutputStream out2 = new FileOutputStream("ThetaSketch2.bin");
    out2.write(sketch2.compact().toByteArray()); // or hexString()

    Pedro Silva

    06/15/2021, 11:23 AM
    Hello, has anyone successfully configured Pinot to work with Trino in a Kubernetes environment? Following their documentation, they mention that "The Pinot broker and server must be accessible via DNS as Pinot returns hostnames and not IP addresses." Does this mean the actual pods or the services? Can someone share what their configurations look like? I've tried the Trino slack unsuccessfully...

    Milan Bracke

    06/16/2021, 7:42 AM
    Hi! Is there a way to write a where clause to match entries that do not match a given regular expression? Using "not" just results in an error message.
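(Editor's note: since Pinot's regexp_like uses Java regex syntax, one workaround worth trying is expressing the negation inside the pattern itself via a negative lookahead; the pattern and names below are illustrative:)

```sql
-- match rows whose column does NOT contain 'forbidden'
SELECT COUNT(*) FROM myTable
WHERE regexp_like(myCol, '^(?!.*forbidden).*$');
```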