# getting-started
  • v

    Vibhor Jain

    12/08/2021, 4:40 PM
    Hi Team, is there any support for UNION/UNION ALL queries in Pinot? I tried a few but had no luck. Sample query: select 'Poor' as grade union all select 'Good' as grade
  • l

    Luis Fernandez

    12/09/2021, 9:24 PM
    does anyone have any best-practice advice when it comes to maintaining changes to schemas/tables, like version control and whatnot?
  • s

    srikanth rangan

    12/25/2021, 5:39 PM
    Hi team, I have JSON tags and metrics. The problem is that the metrics in my data are JSON, and Pinot's add-schema step does not let me declare a metric field as JSON. How should I define my schema?
    { 
            "tags": { "real_name": "marvelmohinish99", "team": "TN4AF0V5W" },
            "metrics": { "salary": 8520, "performance": 22.5 },
            "mtime" : 1456249661342
    }
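One possible way to think about this record, sketched in Python: flatten the nested objects so that each metric becomes its own numeric column and each tag its own dimension. The "." separator and the resulting column names (`metrics.salary`, `tags.team`, and so on) are assumptions made for illustration; the thread does not confirm how Pinot itself would name or ingest these fields.

```python
import json

# The sample record from the question above.
record = json.loads("""
{
    "tags": {"real_name": "marvelmohinish99", "team": "TN4AF0V5W"},
    "metrics": {"salary": 8520, "performance": 22.5},
    "mtime": 1456249661342
}
""")


def flatten(obj, prefix="", sep="."):
    """Flatten nested dicts into a single-level dict.

    Keys are joined with `sep`, which is a hypothetical naming choice.
    """
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat


flat_record = flatten(record)
for column, value in flat_record.items():
    print(column, type(value).__name__, value)
```

After flattening, the values under `metrics` are plain numbers, so they could be declared as ordinary metric columns, while the `tags` values are strings that fit dimension columns.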
  • t

    Trust Okoroego

    01/10/2022, 6:01 PM
    Hi, does Pinot support SASL_SSL with the SCRAM-SHA-512 authentication mechanism for Kafka broker connections?
    security.protocol=SASL_SSL
    sasl.mechanism=SCRAM-SHA-512
    sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule required username="XXXXXX" password="XXXXX";
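A minimal Python sketch of how these three properties might be merged into a realtime table's stream configuration. It assumes that Pinot forwards Kafka consumer properties placed in `streamConfigs` to the Kafka client unchanged, which the thread does not confirm. The topic name and broker address are hypothetical placeholders.

```python
def parse_properties(text):
    """Parse simple key=value lines, splitting on the first '=' only."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


# The security settings from the question above, credentials masked as posted.
kafka_security = parse_properties("""
security.protocol=SASL_SSL
sasl.mechanism=SCRAM-SHA-512
sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule required username="XXXXXX" password="XXXXX";
""")

# Hypothetical base stream settings; replace with your own values.
stream_configs = {
    "streamType": "kafka",
    "stream.kafka.topic.name": "my-topic",         # hypothetical topic
    "stream.kafka.broker.list": "my-broker:9093",  # hypothetical address
}

# Assumption: the Kafka client keys are passed through as-is.
stream_configs.update(kafka_security)

for key, value in stream_configs.items():
    print(f"{key} = {value}")
```

Splitting on the first `=` only matters here because the `sasl.jaas.config` value itself contains `=` characters.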
  • d

    Dash Debabrata

    01/10/2022, 9:31 PM
    Is there a tool to load data and schema from postgres? I would like to get some data into pinot and see how it works for our data, but creating the schema/index manually is a bit of a pain as the tables have large number of columns.
  • s

    Sahar

    01/18/2022, 5:48 PM
    Hi, I'm new to Pinot and experimenting with it. I have a Pinot container running, as well as Kafka and Zookeeper. I have created a schema and a table to ingest from my Kafka topic in realtime, but no data is making it through from Kafka to Pinot. I've tested the Pinot container and it can see the Kafka container (no connectivity issues). Not sure how to troubleshoot. There are no logs in PinotBroker.log, PinotServer.log, etc. Is there a step I'm missing, or is there something wrong with my config? I have copied these two files to /opt/pinot and the table is created successfully, but it just doesn't ingest anything.
    def-table.json
    def-schema.json
  • j

    James Mnatzaganian

    02/02/2022, 9:00 PM
    👋 Hi all, I have a use case that I wanted to see if Pinot would potentially be a good fit for. Note that I haven't yet used Pinot, I'm merely testing the waters before I dive in deep 🙂
    Context:
    • Very high cardinality:
      ◦ 100s of millions / hour
      ◦ Billions / day
      ◦ Expected to be 10s to 100s of billions / month
      ◦ The cardinality eventually becomes asymptotic
      ◦ These big numbers are the "simple" case and it's expected that a single field could be one or two orders of magnitude larger than this case
    • Low dimensionality: <10 fields
    • The primary query pattern is as follows:
      ◦ Perform aggregations (count, count distinct, sum, etc...) across one or more fields
      ◦ Exact-match filtering on aggregated fields, e.g. where foo = 'something'
    • The secondary query pattern is the same as the primary, but with inexact matching, which could potentially include regex
    • Data will be treated as immutable (exceptions for data deletions for GDPR)
    • Data will be bulk loaded and already pre-aggregated
    Questions:
    1. Can Pinot efficiently handle data with cardinalities of this magnitude?
    2. Are sub-second response times feasible for exact matching?
    3. Can Pinot efficiently handle aggregates across multiple fields with high cardinality, or is it better to split them into smaller subsets?
    4. Any gut-feeling estimate as to how large of a cluster I would need? Even if Pinot can work, the next question is how much it will cost 🙂
  • l

    Luis Fernandez

    02/02/2022, 9:46 PM
    hey friends, I was reading this doc about Pinot managed flows, and I see the general recommendation for Pinot is to have a hybrid table. For our use case we are planning to have a retention of 2 years, and so far we have only run Pinot with a realtime setup. I definitely see why we should use offline tables as well as realtime ones and move data from realtime to offline once a certain time threshold has been met. With that in mind I have several questions:
    1. Once in production, how can we move all the completed segments from realtime tables to offline ones (given that our app has been ingesting data for some time already)?
    2. Is a 2-year retention for a realtime table just too much?
    3. Should we expect a performance hit from queries that hit the offline tables?
    4. Are indexing and partitioning also available on an offline setup?
    5. Do you see benefits in terms of storage once you move data to offline tables?
  • r

    Raluca Lazar

    02/03/2022, 10:12 PM
    Hi all, reading the documentation on querying Pinot as I'm looking for an example of a UNION query. I see a link to the calcite parser that seems to indicate that UNION is supported, but I'm getting an error when I run an actual query
    java.lang.ClassCastException: class org.apache.calcite.sql.SqlBasicCall cannot be cast to class org.apache.calcite.sql.SqlSelect
    . Can someone please confirm whether UNION is supported or not?
  • l

    Luis Fernandez

    02/09/2022, 8:33 PM
    hey pinot friends, my team and i are looking into how to deploy our schema/table changes into the pinot cluster. we are doing this with version control in github and PRs. one thing we were wondering is what we can do at the PR level to ensure that a change is not going to break pinot or the current table. are there any validations or dry runs that we can perform with the API that may help us in this regard? what has been your experience with this? thank you!
  • t

    Trust Okoroego

    02/21/2022, 9:50 AM
    Hello! I need to connect Presto to Pinot with basic auth. Could anyone point me to how I can set this in the pinot.properties Presto catalog configuration?
  • f

    francoisa

    02/21/2022, 2:28 PM
    Hi 🙂 Even with the docs I cannot find a proper answer... Can a dimension table be a hybrid table?
  • p

    Pavel Stejskal

    02/28/2022, 4:46 PM
    Hello, how do I set up TimeoutPerPartition (RealtimeToOfflineSegmentsTask) for a minion job? It’s 1 hour by default and I’m unable to override it.
  • b

    Bobby Richard

    03/03/2022, 2:49 PM
    How does segment size impact offline tables? Does the offline segment ingestion job always create one segment regardless of the number of records, or is it smart enough to create multiple segments of optimal segment size?
  • l

    Luis Fernandez

    03/03/2022, 5:23 PM
    hey friends, one question, how can i modify the default limit to be all records on a query, or for a given query return all records, since the default is 10 based on this https://docs.pinot.apache.org/users/user-guide-query/querying-pinot#selection
  • m

    Mayank

    03/03/2022, 11:56 PM
    Can you point me to the part of doc that says:
    "The new segments will have star-tree indexes generated after applying the star-tree index configs to the table config. Currently, Pinot does not support adding star-tree indexes to the existing segments."
  • f

    francoisa

    03/11/2022, 3:52 PM
    Hi. I’m still working on a PoC with Trino / Pinot. I’m able to do pretty much all I want on the Pinot side, and I’m looking for a way to do JSON filtering. On the Pinot side I’ve found JSON indexing pretty efficient with JSON_MATCH. But when filtering on the Trino side I lose all the power of the JSON indexing. Is there any way to filter on JSON from Trino using the power of the JSON indexing? 😕
  • a

    Aaron Weiss

    03/14/2022, 5:15 PM
    Is there any way to query a specific cell in a multi-value column? For instance, in the example below, I know that the first values in each column go together, but I can't seem to query them like "SELECT external_recipient_identifier[0], external_recipient_ipaddress[0]". I did try the map_value function for that email, and it did return me the correct ipaddress, but this seems pretty limited because you have to specify an exact filter for the 2nd parameter.
    map_value(external_recipient_identifier, 'Théoden@gmail.com', external_recipient_ipaddress)
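To make the intent of the question concrete, here is a client-side Python sketch of the positional pairing being asked for. It does not use any Pinot function; it only shows the desired semantics on a row that has already been fetched. The second identifier and both IP addresses are made-up sample values.

```python
# A hypothetical result row with two multi-value columns whose entries
# line up by position (only the first identifier comes from the question).
row = {
    "external_recipient_identifier": ["Théoden@gmail.com", "eowyn@example.com"],
    "external_recipient_ipaddress": ["10.0.0.1", "10.0.0.2"],
}

# Pair the values by array position, i.e. the equivalent of column[0], column[1], ...
pairs = list(zip(
    row["external_recipient_identifier"],
    row["external_recipient_ipaddress"],
))
first_identifier, first_ip = pairs[0]


def position_of(values, target):
    """Return the zero-based position of target in values, or -1 if absent."""
    return values.index(target) if target in values else -1


print(first_identifier, first_ip)
print(position_of(row["external_recipient_identifier"], "eowyn@example.com"))
```

This only works after the full arrays have been returned to the client, so it is a workaround sketch and not a substitute for server-side indexing into a multi-value column.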
  • a

    Aaron Weiss

    03/14/2022, 7:37 PM
    follow-up question: is there a fn that will return what array position a string is in? so for a multi-value string column, if I search for a string that ends up being the third value in the array, I would like it to return 3 (or 2 if starting from 0)
  • c

    coco

    03/24/2022, 8:00 AM
    Hi team. https://docs.pinot.apache.org/basics/components/table#pre-aggregation I'm looking at this article to check the pre-aggregation feature. How do I verify that metrics are pre-aggregated?
  • b

    Bhaarat Sharma

    03/31/2022, 11:06 AM
    Hi - If I have a large set of files in an S3 object store, what is a good way to run query analysis on them? Can Pinot be used to solve this use case? The S3 files aren't fully static... they can change from time to time. Would love to be pointed at something.
  • s

    Siddhartha Varma

    04/06/2022, 6:06 AM
    hey! was trying to run pinot + thirdeye, but the thirdeye docker image does not run. I get ERR_EMPTY_RESPONSE every time… has it not been updated?
  • l

    Luis Fernandez

    04/07/2022, 6:33 PM
    i’m trying to import at least 2 years' worth of data and was hoping to get some guidance on how to go about this. I have been taking a look at the ingestion job framework, is this the way to go? what are some of the considerations we have to make when doing these backfills? I see that data is divided into folders which are the days, and each of these days will be a segment in pinot, is that right? how do we ensure that queries over the data we are ingesting will still perform well? and what are some tips you could give for moving a lot of data?
  • a

    Arkadiusz Chmura

    04/10/2022, 7:48 AM
    1. Can StarTree index be used only with offline tables?
  • a

    Arkadiusz Chmura

    04/10/2022, 7:48 AM
    2. If I have multiple segments on different nodes that belong to a single table, does every segment contain a separate StarTree index? Or is there always a single StarTree index for every table, no matter how many segments it has?
  • a

    Arkadiusz Chmura

    04/10/2022, 7:48 AM
    3. If there is a separate StarTree index for every segment, how do Brokers aggregate the results from them? For example, I have a table containing movie ratings. One segment holds ratings from August and the second one from September. I configured my StarTree to calculate the average for the rating column. The first segment has a pre-computed value for August. Let's assume it's 4.7. For the second segment, it's 4.95. Now imagine that I execute a query to get the average rating from those two months (August and September). How can the Broker merge the results from these two segments?
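The arithmetic behind this question can be illustrated with a small Python sketch. A common way to make averages mergeable is to keep a (sum, count) pair per segment and divide only at the end; whether Pinot's brokers do exactly this is not stated in the thread, so treat it as an assumption. The rating counts (1000 and 400) are invented so that the per-segment averages match the 4.7 and 4.95 in the example.

```python
# Hypothetical per-segment partial results: sum and count of ratings.
# The counts are made up; only the averages (4.7 and 4.95) come from the question.
segments = [
    {"month": "August", "sum": 4700.0, "count": 1000},    # average 4.7
    {"month": "September", "sum": 1980.0, "count": 400},  # average 4.95
]

# Averaging the two averages ignores how many ratings each segment holds.
per_segment_avgs = [s["sum"] / s["count"] for s in segments]
naive_avg = sum(per_segment_avgs) / len(per_segment_avgs)

# Merging the partial sums and counts first gives the true overall average.
total_sum = sum(s["sum"] for s in segments)
total_count = sum(s["count"] for s in segments)
merged_avg = total_sum / total_count

print("average of averages:", naive_avg)
print("merged sum / count: ", merged_avg)
```

The two results differ whenever the segments hold different numbers of ratings, which is why a single pre-computed average per segment would not be enough on its own to merge correctly.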
  • a

    Arkadiusz Chmura

    04/10/2022, 7:49 AM
    4. Does the order of dimensions that we provide when configuring the StarTree index matter when it comes to the performance or storage? If so, are there some general heuristics to help us to choose which attributes should go first or last?
  • a

    Alice

    04/11/2022, 11:47 AM
    Hi team, if I customize a new plugin and want to deploy it in Kubernetes, what should I do if I don’t want to build a new image?
  • a

    Alice

    04/11/2022, 11:53 AM
    When I set retentionTimeValue to 1 day for a table, segments older than 1 day will be deleted and moved to the Deleted Segments folder. How long will the data stay there? Is there a default duration?
  • k

    kaushal aggarwal

    04/11/2022, 12:15 PM
    which column is used for primary key in pinot?