# getting-started
  • p

    Prakhar Pande

    09/22/2022, 7:18 PM
    Hi everyone! I have very recently started exploring Pinot. I am facing a few problems while ingesting data from a Kafka topic. 1. Is there a way I can delete data from the Pinot cluster but keep it in deep store? 2. If my cluster suddenly goes down, from which state will deep store restore my data?
  • p

    Prakhar Pande

    09/22/2022, 7:18 PM
    Thanks in advance.
  • m

    Mamlesh

    09/27/2022, 5:53 AM
    Hi everyone, I'm facing an issue with stream data ingestion. I've created a realtime table where I set segment.flush.threshold.rows to 0 and the segment size to 200M. But when querying the data I only see totalDocs of 200000, although there are more records in the Kafka topic. I've tried setting both the threshold size and rows, but I face the same issue on other realtime tables as well.
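For readers following along, the thresholds mentioned in this message live in the `streamConfigs` section of a realtime table config. The sketch below is only an illustration: the key names are assumed from the Pinot table configuration reference and should be checked against your Pinot version, and the `6h` time value and the helper `uses_size_based_flush` are hypothetical.

```python
import json

# Hypothetical fragment of a REALTIME table's streamConfigs.
# Key names are assumptions taken from the table config reference;
# verify them against the Pinot version you run.
stream_configs = {
    "realtime.segment.flush.threshold.rows": "0",             # 0 = do not flush by row count
    "realtime.segment.flush.threshold.segment.size": "200M",  # desired completed segment size
    "realtime.segment.flush.threshold.time": "6h",            # illustrative value only
}


def uses_size_based_flush(cfg):
    """Return True when the row threshold is 0 and a segment size is set."""
    rows = cfg.get("realtime.segment.flush.threshold.rows")
    size = cfg.get("realtime.segment.flush.threshold.segment.size")
    return rows == "0" and bool(size)


print(json.dumps(stream_configs, indent=2))
print(uses_size_based_flush(stream_configs))
```

This only shows where the settings go; it does not by itself explain why fewer documents are visible than exist in the topic.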
  • e

    Edgaras Kryževičius

    09/27/2022, 10:04 AM
    Hey, my team and I are looking to adopt Pinot in our environment and I have some questions before that: 1. Do you support the Delta input type? We have data in Delta format, partitioned by date. I can see that you support .parquet files; would that work for Delta storage as well? 2. Do you have any benchmarks or performance tests for Pinot that you could share with us? Thanks!
  • m

    Mayur Sharma

    09/27/2022, 12:56 PM
    Hello, are there any thoughts on using Pinot as an online feature store?
  • w

    Wojciech Wasik

    09/28/2022, 5:08 PM
    Hey, I’m working on a POC of Pinot for our team, but I have a problem deploying it to AWS. I set up the k8s cluster on EKS by following the instructions at https://docs.pinot.apache.org/basics/getting-started/public-cloud-examples/aws-quickstart, then I moved on to starting Pinot with Helm (https://docs.pinot.apache.org/basics/getting-started/kubernetes-quickstart), but the broker keeps crashing and everything else is in the Pending state. The broker logs show NullPointerException; more context is in the screenshots. I cannot run it locally either, as I have an M1 MacBook. I tried to compile it from source, but that failed with InvocationTargetException.
  • p

    Prakhar Pande

    09/30/2022, 7:05 AM
    Hi! I just saw that the V2 query engine has support for joins. When is the team planning to release it as stable?
  • p

    Prakhar Pande

    10/06/2022, 9:43 AM
    Hi, Is there a way to backup zookeeper metadata on RDS (or S3) for disaster recovery? Thanks in advance.
  • m

    Mamlesh

    10/06/2022, 6:52 PM
    Hi everyone, I'm facing an issue with segment retention. I used the RealtimeTable.json example from the Pinot 0.10.0 docs with "retentionTimeUnit": "DAYS", "retentionTimeValue": "5", but after 5 days the segments are still there in the controller data path and also in the server data path. Did I miss something in my table JSON?
  • m

    Madhukar S

    10/06/2022, 8:40 PM
    Hi Team - I am working on a Pinot POC with Helm on AWS EKS (https://docs.pinot.apache.org/basics/getting-started/kubernetes-quickstart). I was able to deploy on EKS 1.21 last week, but now I am trying to deploy on EKS 1.22. The broker keeps crashing; only ZooKeeper is running and everything else is in the Pending state.
  • m

    Madhukar S

    10/06/2022, 8:41 PM
    test                pinot-broker-0                                   0/1     CrashLoopBackOff   5 (10s ago)   3m46s
    test                pinot-controller-0                               0/1     Pending            0             3m46s
    test                pinot-server-0                                   0/1     Pending            0             3m46s
    test                pinot-zookeeper-0                                1/1     Running            0             3m46s
  • a

    Abhishek Dubey

    10/08/2022, 9:42 AM
    Hi Team, we need a Pinot data model for a real-time dashboard (Superset) use case. There are multiple transactional tables in the data lake for different product lines (facts), and there are some dimensions. The dashboard needs to source all transactions (all fact table data in a single view, segregated by product). Two questions: 1. Which is the preferred way: model the data with all product lines in a single schema (merging all columns of N tables into one big table with 150+ columns), or keep each product table separate? For the first option, since the sourcing would be via different streams, is "one-table-fed-by-multiple-source-stream" supported in Pinot? With the second option, an additional level of joins would be required; in that case, are joins preferred on the application side (Superset) or within Pinot? 2. I could see from the docs that joins are not supported (nor subqueries) - https://docs.pinot.apache.org/users/user-guide-query/querying-pinot - but in one of the channels I saw that broadcast joins are supported in a newer version. However, both fact-to-fact joins and broadcast joins are required for different use cases.
  • m

    Mayank

    10/08/2022, 12:39 PM
    1. Do all product lines share the same table schema, overlap, or are they mutually exclusive? Also, is there a way to segregate tables that reduces the number of dimensions and does not require joins? If not, then a single table would also work. 2. Joins are being worked on right now; the v2 engine is available as alpha.
  • l

    Larry Meadors

    10/11/2022, 2:07 PM
    i am very new to pinot, but have some high level questions and am happy to RTFM if they are answered somewhere and this is the wrong place to ask them; for example, is pinot suitable for use in high volume read situations? if i have many REST or graphql services hitting it for several thousand users, is that an appropriate use for it, or am i "holding it wrong"?
  • f

    francoisa

    10/12/2022, 8:56 AM
    Hi. Just a quick UI question: in v0.11.0 many things have been paginated. Is there a way to set the default number of values per page? The default of 10 elements is a bit annoying 😄
  • m

    Mamlesh

    10/12/2022, 9:19 AM
    Hi all, can anyone tell me the difference between 'retentionTimeValue' in the segments config and 'realtime.segment.flush.threshold.time' in the stream config section of the table config? I am a bit confused now. Doc: https://docs.pinot.apache.org/configuration-reference/table
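The two settings asked about here act at different points in a segment's life: the flush threshold time bounds how long a consuming segment keeps consuming before it is committed, while the retention value bounds how long a segment's data is kept before it becomes eligible for deletion. The sketch below is a conceptual model only, not Pinot's implementation; the `6h` flush time and the function names are assumptions for illustration, and it ignores the row and size thresholds that can also trigger a flush.

```python
from datetime import datetime, timedelta, timezone

# Values as they might appear in a table config (both are example values).
RETENTION = timedelta(days=5)    # segmentsConfig: retentionTimeUnit=DAYS, retentionTimeValue=5
FLUSH_TIME = timedelta(hours=6)  # streamConfigs: realtime.segment.flush.threshold.time (assumed "6h")


def should_commit(consuming_since, now):
    """A consuming segment is due for commit once it has consumed for FLUSH_TIME."""
    return now - consuming_since >= FLUSH_TIME


def is_expired(segment_end_time, now):
    """A segment is eligible for deletion once its data is older than RETENTION."""
    return now - segment_end_time > RETENTION


now = datetime(2022, 10, 12, tzinfo=timezone.utc)
print(should_commit(now - timedelta(hours=7), now))  # segment consuming for 7 hours
print(is_expired(now - timedelta(days=2), now))      # segment whose data ended 2 days ago
```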
  • m

    Machhindra

    10/12/2022, 3:37 PM
    Hello everyone, Did anybody try histogram function?
    SELECT HISTOGRAM(value, 0, 200, 10) as hist from metrics
    I am getting the following error - “message”: “QueryExecutionError\norg.apache.pinot.spi.exception.BadQueryRequestException Unsupported function: histogram not found. I installed Pinot using Helm on k8s.
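If the HISTOGRAM function is not available on a given cluster, the same bucketing can be done client-side on raw values. Below is a minimal sketch of equal-width binning using the arguments from the query above (lower 0, upper 200, 10 bins). Treating the upper bound as inclusive in the last bin and dropping out-of-range values are assumptions of this sketch; Pinot's own edge handling may differ.

```python
def histogram(values, lower, upper, num_bins):
    """Count values into num_bins equal-width bins over [lower, upper]."""
    width = (upper - lower) / num_bins
    counts = [0] * num_bins
    for v in values:
        if v < lower or v > upper:
            continue  # out-of-range values are ignored in this sketch
        # Clamp so that v == upper lands in the last bin.
        idx = min(int((v - lower) / width), num_bins - 1)
        counts[idx] += 1
    return counts


sample = [0, 5, 19.9, 20, 199, 200, 250]
print(histogram(sample, 0, 200, 10))
```

Pulling raw values to the client only makes sense for modest result sizes; for large tables a server-side aggregation is preferable.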
  • s

    Steven Hall

    10/12/2022, 10:35 PM
    Hi Team, Section 3.1 of the getting started guide https://docs.pinot.apache.org/basics/getting-started/kubernetes-quickstart shows setting up Kafka. It looks like the /incubator chart is deprecated:
    helm search repo incubator | grep kafka
    incubator/kafka 0.21.5 5.0.1 DEPRECATED Apache Kafka is publish-subscribe me...
    I have spent some time trying to work around this, but I find myself learning about K8s and not Pinot. Does anyone have a workaround for this?
  • a

    Abdelhakim Bendjabeur

    10/13/2022, 12:44 PM
    Hello 👋 Does anyone have feedback on Pinot performance when retrieving a large amount of rows? For example, one year of data for a given statistic for 5000+ users (day granularity, or hourly in the worst case). Does Pinot perform decently there, so that the main problems can only come from the network?
  • m

    Mamlesh

    10/13/2022, 4:33 AM
    Hi everyone, I am using v0.10.0. Is there any way to query only committed segments? In my case my consuming segments have somehow ended up in the Error state, and because of that I am unable to query.
  • r

    Rohit Anilkumar

    10/14/2022, 11:22 AM
    Hey quick question, when the data is moved to deep storage (after the retention period) is it still queryable or do we need to reload that back like a batch ingestion job?
  • g

    Gerrit van Doorn

    10/14/2022, 8:20 PM
    Hi folks, in a JobSpec, the outputDirURI is generally pointing to …where? Deep Store? Will Pinot download the segments from this place?
  • a

    Abhishek Dubey

    10/17/2022, 3:02 AM
    Hi Team, we're planning to have multiple fact tables as part of a Pinot data model for a streaming use case, with timestamp-based incremental segments of 30 min. There is a requirement in the final dashboard to join the data from these fact tables before presenting it. Does Pinot support segment-based lookup? E.g. a lookup on fact2 from fact1 on key columns, restricted to selected fact2 segments (2 days)?
  • m

    Mamlesh

    10/17/2022, 4:35 AM
    Hi all, can anyone explain how to use 'RealtimeProvisioningHelper'? I am using it with a completed segment and get the error 'java.lang.OutOfMemoryError: Direct buffer memory' every time. The command I used: 'sh pinot-admin.sh sh pinot-admin.sh RealtimeProvisioningHelper -tableConfigFile=/home/mamlesh/pinot/ingestData/massRealtimeTable.json -numPartitions=1 -pushFrequency=Append -numHosts=1,2,3 -numHours=1,2,3,4,5,6,7,8,9,10,11,12 -sampleCompletedSegmentDir=/disk1/pinotData/server/index/MassDataTableIR_REALTIME/MassDataTableIR__8__3__20221012T1842Z -ingestionRate=500 -maxUsableHostMemory=18G -retentionHours=168 ' Is there something wrong with the command? I was not able to find documentation for 'RealtimeProvisioningHelper'.
  • m

    Mamlesh

    10/18/2022, 6:37 AM
    Hi all, is there any officially released benchmarking for Pinot? I also have some questions that someone may be able to answer. 1. I have an issue with ingestion rate: I get around 3500 records/sec on a single segment from the Kafka stream. Ideally, adding servers to the Pinot cluster should increase the ingestion rate, but I get the same ingestion rate on all servers. 2. Can we cap the number of segments on a specific Pinot server? E.g. not more than 15 segments on any server for a specific table.
  • a

    Abhishek Dubey

    10/18/2022, 11:18 AM
    Hi Team, I'm looking to use a custom timed ID (e.g. a time-sorted UUID) as the partition key for Pinot tables, i.e. a sortable timed ID with leading bits denoting time in epoch. I think it will require predefined range values to be provided to Pinot for automatic segment creation. Is such range-based segmenting supported?
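To make the kind of ID described here concrete, the sketch below builds a 128-bit identifier whose leading 48 bits are the epoch time in milliseconds, followed by 80 random bits, so that the hex strings sort in time order. The layout and the function names `timed_id` and `id_timestamp_ms` are assumptions for illustration (similar in spirit to ULID-style IDs), not something Pinot provides or requires.

```python
import os
import time


def timed_id(ts_ms=None):
    """Return a 32-char hex ID: 48-bit epoch millis followed by 80 random bits."""
    if ts_ms is None:
        ts_ms = time.time_ns() // 1_000_000
    # Big-endian bytes keep lexicographic order consistent with numeric order.
    return (ts_ms.to_bytes(6, "big") + os.urandom(10)).hex()


def id_timestamp_ms(id_hex):
    """Recover the epoch-millis prefix from an ID produced by timed_id."""
    return int(id_hex[:12], 16)


earlier = timed_id(1_666_000_000_000)
later = timed_id(1_666_000_000_001)
print(earlier < later)
print(id_timestamp_ms(earlier))
```

Because the time is recoverable from the prefix, a range over IDs corresponds to a range over time, which is the property the question relies on.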
  • m

    Mamlesh

    10/19/2022, 6:15 PM
    Hi all, I am getting output from the 'RealtimeProvisioningHelper' tool like this:
    ===============================================
    Memory used per host (Active/Mapped)
    numHosts --> 3 |
    numHours 1 --------> 17.72G/17.72G |
    Optimal segment size
    numHosts --> 3 |
    numHours 1 --------> 67.79M |
    Consuming memory
    numHosts --> 3 |
    numHours 1 --------> 7.06G |
    Total number of segments queried per host (for all partitions)
    numHosts --> 3 |
    numHours 1 --------> 168 |
    ==============================================
    Can anyone explain the exact difference between 'Memory used per host (Active/Mapped)' (17.72G/17.72G) and 'Consuming memory' (7.06G)?
  • g

    Gerrit van Doorn

    10/20/2022, 7:31 PM
    Just to double check (I find this confusing in the documentation). When I create a real time table FOO, and an offline table FOO (same tenants), that basically is a hybrid table right?
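The setup described in this question can be sketched as two table configs that share one `tableName` and differ only in `tableType`. This is a minimal illustration, assuming a hybrid table is an OFFLINE and a REALTIME table with the same name; the configs are heavily abbreviated, and `make_table_config`, `hybrid_tables`, and the tenant name are hypothetical.

```python
def make_table_config(name, table_type):
    """Hypothetical, heavily abbreviated table config (real configs need more sections)."""
    return {
        "tableName": name,
        "tableType": table_type,
        "tenants": {"broker": "DefaultTenant", "server": "DefaultTenant"},
    }


def hybrid_tables(configs):
    """Return the names that have both an OFFLINE and a REALTIME config."""
    types_by_name = {}
    for cfg in configs:
        types_by_name.setdefault(cfg["tableName"], set()).add(cfg["tableType"])
    return {name for name, types in types_by_name.items()
            if types == {"OFFLINE", "REALTIME"}}


configs = [
    make_table_config("FOO", "OFFLINE"),
    make_table_config("FOO", "REALTIME"),
    make_table_config("BAR", "OFFLINE"),
]
print(hybrid_tables(configs))
```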
  • r

    Rohit Anilkumar

    10/23/2022, 8:26 PM
    I have a ZooKeeper quorum on EC2 instances. Just wanted to check whether I can provide the ZooKeeper quorum addresses as comma-separated values for controller.zk.str and the similar params for broker and server. I just want to try this out on EC2 instances instead of EKS.
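A ZooKeeper connection string is a comma-separated list of `host:port` pairs, optionally followed by a chroot path. The sketch below builds such a string for the `controller.zk.str` property named in the question; the host names, the helper `zk_connect_string`, and the default port 2181 are illustrative assumptions.

```python
def zk_connect_string(hosts, port=2181, chroot=""):
    """Join hosts into a ZooKeeper connect string such as 'a:2181,b:2181/chroot'."""
    return ",".join(f"{host}:{port}" for host in hosts) + chroot


# Hypothetical EC2 host names for a three-node quorum.
quorum = ["zk1.example.internal", "zk2.example.internal", "zk3.example.internal"]

controller_conf = {"controller.zk.str": zk_connect_string(quorum)}
print(controller_conf["controller.zk.str"])
```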
  • g

    Gerrit van Doorn

    10/24/2022, 10:22 PM
    Hi folks, I’m running a standalone batch import job. What is the reason for this job never exiting? It imports my files and then just sits there. Is there a way to have it exit?