# troubleshooting
  • Map
    09/24/2021, 4:17 PM
    When we run Trino with Pinot 0.8.0, we get the error "unsupported data table version: 3". It seems Trino still uses Pinot client 0.6.0. Are there plans to upgrade it to 0.8.0? If it is as simple as bumping the version, I can help raise a PR.
  • Bowen Wan
    09/24/2021, 11:59 PM
    Hi, I am trying to do batch ingestion via Spark to an OFFLINE table. It often crashes the broker when the data is large enough. From what I can tell, allocating more memory to the broker helps. But is there any documentation on how to calculate the right memory allocation, disk size, etc. for the different node types? For example, for 1 TB of data?
  • xtrntr
    09/27/2021, 6:09 AM
    is there an API to update the helix config and verify that the changes have taken place? i’ve looked through the swagger API and
    POST /cluster/configs
    seems to be the most likely candidate, but it’s not working for me
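    A hedged sketch of exercising that endpoint with curl (the controller host/port and the config key are illustrative assumptions; a GET on the same path can verify the change):
    # update one cluster-level config via the controller
    curl -X POST "http://localhost:9000/cluster/configs" \
      -H "Content-Type: application/json" \
      -d '{"pinot.broker.enable.query.limit.override": "true"}'
    # read the cluster configs back to confirm the change took effect
    curl "http://localhost:9000/cluster/configs"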
  • Arpita Bajpai
    09/27/2021, 6:44 AM
    Hi Everyone, I am trying deduplication in Apache Pinot 0.8.0. I have enabled upsert in my REALTIME table and it is working as expected. But when data moves from the realtime to the offline table, duplicate data appears in the offline table. Is there a way to enable deduplication in the RealtimeToOffline flow, so that my offline table contains only distinct values? Can anyone help me with this?
  • xtrntr
    09/27/2021, 9:28 AM
    using a lookup table in a SQL query with group by + having clause is extremely slow, it looks like this:
    -- old (<1s)
    SELECT user, count(*) FROM events 
    WHERE time BETWEEN 0 AND 31 AND location BETWEEN 1000 AND 1005 
    GROUP BY user HAVING count(*) > 10

    -- new (>10s)
    SELECT user, count(*) FROM events 
    WHERE time BETWEEN 0 AND 31 AND location BETWEEN 1000 AND 1005 AND lookUp(...)=0
    GROUP BY user HAVING count(*) > 10
  • eywek
    09/27/2021, 11:49 AM
    Hello, I’m using RealtimeToOfflineSegmentsTask to move segments from REALTIME tables to OFFLINE ones. It works pretty well on some of my tables, but 2 of them are ignored, and I can’t find anything in the logs. If I check the value of
    <cluster name>/PROPERTYSTORE/MINION_TASK_METADATA/RealtimeToOfflineSegmentsTask/<table name>
    in Zookeeper, I see that the document hasn’t been updated since August 2nd. And I can’t find any task related to this table in
    <cluster name>/CONFIGS/RESOURCE/TaskQueue_RealtimeToOfflineSegmentsTask_Task_RealtimeToOfflineSegmentsTask_1632701552537
    How is this possible? How can I reset this task? Is there somewhere I can look for info to debug this and prevent it from happening again? I’m using Pinot 0.7.1 (I plan to upgrade soon but I can’t right now)
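    A hedged sketch of poking at this from the controller's task API (exact endpoints vary by Pinot version; the host/port and table name here are illustrative assumptions):
    # list task states for this task type
    curl "http://localhost:9000/tasks/RealtimeToOfflineSegmentsTask/taskstates"
    # manually trigger a scheduling run for one table
    curl -X POST "http://localhost:9000/tasks/schedule?taskType=RealtimeToOfflineSegmentsTask&tableName=mytable_REALTIME"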
  • Saoirse Amarteifio
    09/27/2021, 12:18 PM
    Hello... I am trying to get started with a batch/offline ingestion
    ../apache-pinot-0.8.0-bin/bin/pinot-ingestion-job.sh -jobSpecFile ./pinot_ingest_samples/batch_ingestion_no_comment.yaml
    Using this file
    executionFrameworkSpec:
      name: 'standalone'
      segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner'
      segmentTarPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner'
      segmentUriPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentUriPushJobRunner'
      segmentMetadataPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentMetadataPushJobRunner'
    jobType: SegmentCreationAndUriPush
    inputDirURI: 's3://data-platform/samples/data/'
    includeFileNamePattern: 'glob:**/*.parquet'
    outputDirURI: 's3://pinot-development/sample/segments'
    overwriteOutput: true
    pinotFSSpecs:
      - scheme: s3
        className: org.apache.pinot.plugin.filesystem.S3PinotFS
        configs:
          region: 'us-east-1'
    recordReaderSpec:
      dataFormat: 'parquet'
      className: 'org.apache.pinot.plugin.inputformat.parquet.ParquetRecordReader'
    tableSpec:
      tableName: 'sample'
      schemaURI: 'http://pinot-controller.pinot.cluster.svc.local:9000/tables/sample/schema'
      tableConfigURI: 'http://pinot-controller.pinot.cluster.svc.local:9000/tables/sample'
    pinotClusterSpecs:
      - controllerURI: 'http://pinot-controller.pinot.cluster.svc.local:9000'
    pushJobSpec:
      pushAttempts: 2
      pushRetryIntervalMillis: 1000
      segmentUriPrefix: 's3://pinot-development'
      segmentUriSuffix: ''
    But I get a class-not-found exception:
    java.lang.RuntimeException: Failed to create IngestionJobRunner instance for class - org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner
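    A hedged sketch of a workaround, assuming the standard 0.8.0 binary distribution layout: the batch runner classes live under plugins/, so point the JVM at that directory (or use the pinot-admin.sh launcher). Paths mirror the command above and are assumptions:
    # make the plugin jars visible to the ingestion job
    export PINOT_DIR=../apache-pinot-0.8.0-bin
    JAVA_OPTS="-Dplugins.dir=${PINOT_DIR}/plugins" \
      ${PINOT_DIR}/bin/pinot-admin.sh LaunchDataIngestionJob \
      -jobSpecFile ./pinot_ingest_samples/batch_ingestion_no_comment.yaml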
  • Will Gan
    09/28/2021, 12:12 AM
    Hi, I'm trying to query json like done here, i.e. without using functions. I have a JSON column with a json index on it, and I ingest it using a jsonFormat ingestion transform. When I select the column it shows the json as a string, but when I try to select things like json_column[0].name it returns empty. Anyone know the issue? Thanks in advance!
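    A hedged alternative while debugging, using Pinot's jsonpath extraction function instead of the dotted/bracket syntax (the table name is an illustrative assumption; the column and path come from the question):
    -- pull the same value out explicitly via a jsonpath
    SELECT JSONEXTRACTSCALAR(json_column, '$[0].name', 'STRING')
    FROM mytable
    LIMIT 10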
  • Yash Agarwal
    09/28/2021, 4:03 PM
    I am trying to set up a new pinot cluster. I have a zookeeper cluster up. When I try to get the first pinot controller up, it starts, and then fails with an error
    Pinot Controller instance [Controller_piclx1001.hq.target.com_9000] is Started...
    Started Pinot [CONTROLLER] instance [Controller_piclx1001.hq.target.com_9000] at 13.884s since launch
    Shutting down Pinot Service Manager with all running Pinot instances...
    Trying to stop Pinot [CONTROLLER] Instance [Controller_piclx1001.hq.target.com_9000] ...
    Stopping controller periodic tasks
    Stopping periodic task scheduler
    .
    .
    Instance piclx1001.hq.target.com_9000 is not leader of cluster PinotCluster due to exception happen when session check
    org.I0Itec.zkclient.exception.ZkInterruptedException: java.lang.InterruptedException
    	at org.apache.helix.manager.zk.zookeeper.ZkClient.retryUntilConnected(ZkClient.java:1192) ~[pinot-all-0.9.0-SNAPSHOT-jar-with-dependencies.jar:0.9.0-SNAPSHOT-ffcf9b991431067c834bd4fb56fd7641c7fec172]
    	at org.apache.helix.manager.zk.zookeeper.ZkClient.readData(ZkClient.java:1326) ~[pinot-all-0.9.0-SNAPSHOT-jar-with-dependencies.jar:0.9.0-SNAPSHOT-ffcf9b991431067c834bd4fb56fd7641c7fec172]
    	at org.apache.helix.manager.zk.zookeeper.ZkClient.readData(ZkClient.java:1318) ~[pinot-all-0.9.0-SNAPSHOT-jar-with-dependencies.jar:0.9.0-SNAPSHOT-ffcf9b991431067c834bd4fb56fd7641c7fec172]
    	at org.apache.helix.manager.zk.ZkBaseDataAccessor.get(ZkBaseDataAccessor.java:320) ~[pinot-all-0.9.0-SNAPSHOT-jar-with-dependencies.jar:0.9.0-SNAPSHOT-ffcf9b991431067c834bd4fb56fd7641c7fec172]
    .
    .
    Closing zkclient: State:CONNECTED Timeout:30000 sessionid:0x1006d21ecf40000 local:/10.59.116.133:53916 remoteserver:pizlx1002.hq.target.com/10.59.116.124:2181 lastZxid:17179869383 xid:1154 sent:1157 recv:1199 queuedpkts:0 pendingresp:0 queuedevents:0
    Session: 0x1006d21ecf40000 closed
  • RZ
    09/28/2021, 6:08 PM
    Executing command: AddTable -tableConfigFile /tmp/pinot-quick-start/tools-table-offline.json -schemaFile /tmp/pinot-quick-start/tools-schema.json -controllerProtocol http -controllerHost 192.168.1.105 -controllerPort 9000 -user null -password [hidden] -exec
    Sending request: http://192.168.1.105:9000/schemas to controller: YD-5CG1182FLG, version: Unknown
    {"code":400,"error":null}
  • Carl
    09/29/2021, 12:59 AM
    Hi team, we are seeing some Pinot queries with the avg function return -Infinity when the where clause matches no records. Is there a way to modify the query to return null in this case?
  • Mayank
    09/29/2021, 4:17 AM
    Any errors in server logs? Also check the debug endpoint for any errors
  • Mayank
    09/29/2021, 4:18 AM
    Another thing to check is whether the events conform to the schema in Pinot
  • Sandeep Das S
    09/29/2021, 5:13 AM
    Hi guys, I am trying to fetch data from Pinot using a golang application. After fetching the data from the table, I have to iterate over it (for loop) in order to print the data, and the efficiency of the application goes down considerably with this method. Is there an alternate method to get the data without iterating? I am following this link for reference - https://docs.pinot.apache.org/users/clients/golang cc: @Arun Prasath
  • RZ
    09/29/2021, 10:34 AM
    Hello, I'm trying to load the data from my csv file, so I generated the .yml file, but this error shows up in the terminal. Please help me if you have any idea about this problem!
  • Ali
    09/29/2021, 5:03 PM
    Hi, I'd like to run bin/quick-start-hybrid.sh with
    pinot.broker.enable.query.limit.override=true
    pinot.broker.query.response.limit=10000
    How can I do this? I have tried bin/quick-start-hybrid.sh -configFileName conf/pinot-broker.conf but -configFileName is not accepted as a valid argument.
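    A hedged sketch of a workaround: the quickstart script takes no config file, but the per-component launchers do, so the broker can be started on its own with those properties (the zkAddress and paths are illustrative assumptions):
    # start just the broker against an existing cluster, with a custom config file
    bin/pinot-admin.sh StartBroker \
      -zkAddress localhost:2181 \
      -configFileName conf/pinot-broker.conf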
  • Gabriel Lucano
    09/29/2021, 9:53 PM
    Hello, can I use the same zookeeper service that I use with kafka for pinot? Or is it recommended to have 2 zk services, one for each (pinot and kafka)?
  • Sadim Nadeem
    09/30/2021, 6:46 AM
    @Mayank @Xiang Fu @Jackie pinot-server RAM usage keeps increasing over time without garbage collection params in the jvmOpts in the pinot/values.yaml helm chart. Before, we were using:
    jvmOpts: "-Xms256M -Xmx1G -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCApplicationConcurrentTime -Xloggc:/opt/pinot/gc-pinot-controller.log"
    but after migrating to JDK 11 with these jvmOpts the pods started crashing, so we had to remove them and are now only using (for the server):
    jvmOpts: "-Xms2M -Xmx8G -Xloggc:/opt/pinot/gc-pinot-controller.log -javaagent:/opt/pinot/etc/jmx_prometheus_javaagent/jmx_prometheus_javaagent-0.12.0.jar=8008:/opt/pinot/etc/jmx_prometheus_javaagent/configs/pinot.yml"
    But without garbage collection params we are seeing pinot-server RAM usage grow over time, and RAM gets exhausted by more than half a GB every day. What jvmOpts should we provide in helm for JDK 11 so that the pods don't crash, GC happens properly, and the heap is freed? Also, for 16 GB of server RAM, what should the Xmx value be for the server pod?
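    A hedged aside on the crash: several of the old -XX:+PrintGC* flags were removed in JDK 9+, so a JDK 11 JVM refuses to start when given them; unified logging (-Xlog:gc*) is the replacement. A sketch of JDK 11-compatible opts (heap sizes are illustrative assumptions; the agent path mirrors the message above):
    jvmOpts: "-Xms4G -Xmx8G -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -Xlog:gc*:file=/opt/pinot/gc-pinot-server.log:time,uptime -javaagent:/opt/pinot/etc/jmx_prometheus_javaagent/jmx_prometheus_javaagent-0.12.0.jar=8008:/opt/pinot/etc/jmx_prometheus_javaagent/configs/pinot.yml"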
  • Trust Okoroego
    09/30/2021, 8:47 AM
    Config setting:
    # Pinot Cluster name
    pinot.cluster.name=pinot-qua
    
    # Use hostname as Pinot Instance ID other than IP
    pinot.set.instance.id.to.hostname=true
    
    # Pinot Broker Query Port
    pinot.broker.client.queryPort=8099
    
    # Pinot Routing table builder class
    pinot.broker.routing.table.builder.class=random
  • Arpita Bajpai
    09/30/2021, 12:31 PM
    Hi All, I am trying to enable "UPSERT" mode in a REALTIME table config in Pinot 0.8.0, and the table is not able to read the records sent to the kafka topic. No results are displayed in the Pinot UI at all; it shows 0 records. Below is the config I added for upsert:
    "routing": {
      "instanceSelectorType": "strictReplicaGroup"
    },
    "upsertConfig": {
      "mode": "FULL"
    },
    I could not find anything significant in the controller logs either. But when I remove the upsert config and try again, my realtime table is able to read the records and they are displayed in the Pinot UI. Any idea why this is happening?
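    A hedged aside: upsert also requires a primary key declared in the table's schema and a Kafka topic partitioned by that key. A minimal schema fragment (the column name is an illustrative assumption):
    "primaryKeyColumns": ["event_id"]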
  • Amol Jain
    10/01/2021, 7:47 AM
    Hi Pinot Team, my pinot cluster is running inside a docker container. I want to monitor the Pinot cluster with Prometheus, and for that I have tried to configure the Prometheus JMX Exporter inside pinot-controller.conf, pinot-broker.conf and pinot-server.conf respectively, like:
    controller.jvmOpts="-javaagent:/opt/pinot/etc/jmx_prometheus_javaagent/jmx_prometheus_javaagent-0.12.0.jar=8008:/opt/pinot/etc/jmx_prometheus_javaagent/configs/pinot.yml -Xms256M -Xmx1G"
    broker.jvmOpts="-javaagent:/opt/pinot/etc/jmx_prometheus_javaagent/jmx_prometheus_javaagent-0.12.0.jar=8008:/opt/pinot/etc/jmx_prometheus_javaagent/configs/pinot.yml -Xms256M -Xmx1G"
    server.jvmOpts="-javaagent:/opt/pinot/etc/jmx_prometheus_javaagent/jmx_prometheus_javaagent-0.12.0.jar=8008:/opt/pinot/etc/jmx_prometheus_javaagent/configs/pinot.yml -Xms256M -Xmx1G"
    But I am unable to get the metrics. What should I do? Kindly help. @Mayank
  • Gabriel Lucano
    10/01/2021, 4:21 PM
    Attachments: schema table config.txt, upsert_log.txt
  • Qianbo Wang
    10/02/2021, 12:16 AM
    Hi pinot team, I’m getting this error
    Catalog 'pinot' does not support table property 'time_field'
    when creating a table with this query:
    CREATE TABLE IF NOT EXISTS ...
    WITH (
    pinot_table_name = 'enriched_invoices',
    time_field = 'created_at',
    offline_replication = 3,
    offline_retention = 365,
    index_inverted = ARRAY['licensee_id','facility_id'],
    index_bloom_filter = ARRAY['licensee_id','facility_id'],
    index_sorted = 'created_at',
    index_aggregate_metrics = true,
    index_create_during_segment_generation = true,
    index_auto_generated_inverted = false,
    index_enable_default_star_tree = false);
  • Manish Soni
    10/04/2021, 3:47 AM
    Hi All, I am running a hybrid table setup. If I delete a table for some reason and recreate a table with the same name, should the Minion be restarted to make the realtime-to-offline flow work for the newly created table with the same name?
  • Vibhor Jain
    10/04/2021, 4:06 AM
    Hi Team, we are planning to add a de-duplication flow for our solution. As part of that, we are doing 2 things:
    1. Enable UPSERT in the realtime table. Flink has the key set, the primary key is defined in the schema and the realtime table uses it, and UPSERT is working fine.
    2. For the realtime-to-offline flow via minion, we found that duplicates were appearing in the OFFLINE table, so we tried mergeType: dedup (earlier it was concat).
    Now the realtime-to-offline flow has stopped working (no data in the OFFLINE table; minion is up and running). Queries:
    1. Is our dedup flow proper? UPSERT for realtime and mergeType: dedup for the realtime-to-offline flow?
    2. Any pointers on why this realtime-to-offline flow stopped working after adding these configs?
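    For reference, a hedged sketch of the REALTIME table's task config for this flow, as commonly documented (the bucket/buffer values are illustrative assumptions):
    "task": {
      "taskTypeConfigsMap": {
        "RealtimeToOfflineSegmentsTask": {
          "bucketTimePeriod": "1d",
          "bufferTimePeriod": "1d",
          "mergeType": "dedup"
        }
      }
    }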
  • eywek
    10/04/2021, 9:15 AM
    Hello, I’m looking to enable segment replication on my cluster and I found 2 ways of doing it in the documentation:
    • https://docs.pinot.apache.org/configuration-reference/table#segments-config with the replication key
    • https://docs.pinot.apache.org/operators/operating-pinot/tuning/routing#replica-group-segment-assignment-and-query-routing with replica groups
    Which is the better way to do it? Maybe there is something I misunderstood? Thank you
  • Arpita Bajpai
    10/04/2021, 11:08 AM
    Hi All, I am trying to move data from a REALTIME to an OFFLINE table through the minion flow but I am getting the below error:
    Starting PinotTaskManager with running frequency of 1200 seconds.
    Start running task: PinotTaskManager
    Trying to schedule task type: RealtimeToOfflineSegmentsTask, isLeader: true
    Start generating task configs for table: dataanalytics_REALTIME for task: RealtimeToOfflineSegmentsTask
    Caught exception while running task: PinotTaskManager
    java.lang.IllegalStateException: null
        at shaded.com.google.common.base.Preconditions.checkState(Preconditions.java:429) ~[pinot-all-0.8.0-SNAPSHOT-jar-with-dependencies.jar:0.8.0-SNAPSHOT-52272667e51acdf082b90766673dfcb77f6f9cc0]
        at org.apache.pinot.plugin.minion.tasks.realtime_to_offline_segments.RealtimeToOfflineSegmentsTaskGenerator.getWatermarkMs(RealtimeToOfflineSegmentsTaskGenerator.java:300) ~[pinot-all-0.8.0-SNAPSHOT-jar-with-dependencies.jar:0.8.0-SNAPSHOT-52272667e51acdf082b90766673dfcb77f6f9cc0]
        at org.apache.pinot.plugin.minion.tasks.realtime_to_offline_segments.RealtimeToOfflineSegmentsTaskGenerator.generateTasks(RealtimeToOfflineSegmentsTaskGenerator.java:161) ~[pinot-all-0.8.0-SNAPSHOT-jar-with-dependencies.jar:0.8.0-SNAPSHOT-52272667e51acdf082b90766673dfcb77f6f9cc0]
        at org.apache.pinot.controller.helix.core.minion.PinotTaskManager.scheduleTask(PinotTaskManager.java:405) ~[pinot-all-0.8.0-SNAPSHOT-jar-with-dependencies.jar:0.8.0-SNAPSHOT-52272667e51acdf082b90766673dfcb77f6f9cc0]
        at org.apache.pinot.controller.helix.core.minion.PinotTaskManager.scheduleTasks(PinotTaskManager.java:383) ~[pinot-all-0.8.0-SNAPSHOT-jar-with-dependencies.jar:0.8.0-SNAPSHOT-52272667e51acdf082b90766673dfcb77f6f9cc0]
        at org.apache.pinot.controller.helix.core.minion.PinotTaskManager.processTables(PinotTaskManager.java:477) ~[pinot-all-0.8.0-SNAPSHOT-jar-with-dependencies.jar:0.8.0-SNAPSHOT-52272667e51acdf082b90766673dfcb77f6f9cc0]
        at org.apache.pinot.controller.helix.core.periodictask.ControllerPeriodicTask.runTask(ControllerPeriodicTask.java:68) ~[pinot-all-0.8.0-SNAPSHOT-jar-with-dependencies.jar:0.8.0-SNAPSHOT-52272667e51acdf082b90766673dfcb77f6f9cc0]
        at org.apache.pinot.core.periodictask.BasePeriodicTask.run(BasePeriodicTask.java:120) ~[pinot-all-0.8.0-SNAPSHOT-jar-with-dependencies.jar:0.8.0-SNAPSHOT-52272667e51acdf082b90766673dfcb77f6f9cc0]
        at org.apache.pinot.core.periodictask.PeriodicTaskScheduler.lambda$start$0(PeriodicTaskScheduler.java:85) ~[pinot-all-0.8.0-SNAPSHOT-jar-with-dependencies.jar:0.8.0-SNAPSHOT-52272667e51acdf082b90766673dfcb77f6f9cc0]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305) [?:?]
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305) [?:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
        at java.lang.Thread.run(Thread.java:834) [?:?]
    Finish running task: PinotTaskManager in 126ms
    Creating new stream consumer, reason: Idle for too long
    Can anyone help me understand the above error?
  • beerus
    10/04/2021, 2:07 PM
    How do I query Pinot's broker with an increased timeout?
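    A hedged sketch of the two usual levers: a per-query override via the trailing option() clause, or raising the broker-level default via the pinot.broker.timeoutMs config (the table name and values are illustrative assumptions):
    -- per-query timeout override
    SELECT COUNT(*) FROM mytable option(timeoutMs=60000)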
  • Gabriel Lucano
    10/04/2021, 7:38 PM
    Hello guys, how do I create a multi-tenant cluster for my production environment? It gives me the following error:
    Executing command: AddTenant -controllerProtocol http -controllerHost localhost -controllerPort 9001 -name krealoBrokerTenant -role BROKER -instanceCount 3 -offlineInstanceCount 0 -realTimeInstanceCount 3 -exec
    {"_code":500,"_error":"Failed to create tenant"}
    {"_code":500,"_error":"Failed to create tenant"}
  • Jozef Cechovsky
    10/05/2021, 11:07 AM
    Hi there, any help please? I’m struggling to connect an external Kafka to Pinot. I have Kafka deployed in one Kubernetes cluster and Pinot in a separate one. I’m 100% sure that the communication between these two clusters is correct. Deployment of Pinot was done by this tutorial https://docs.pinot.apache.org/basics/getting-started/kubernetes-quickstart I created Kafka topics and the Pinot table and schema according to https://docs.pinot.apache.org/basics/getting-started/pushing-your-streaming-data-to-pinot and just changed the config to point to our Kafka brokers:
    "tableIndexConfig": {
      "loadMode": "MMAP",
      "streamConfigs": {
        "streamType": "kafka",
        "stream.kafka.consumer.type": "lowlevel",
        "stream.kafka.topic.name": "transcript-topic",
        "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.stream.kafka.KafkaJSONMessageDecoder",
        "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
        "stream.kafka.broker.list": "our_kafka_url:9092",
        "realtime.segment.flush.threshold.size": "0",
        "realtime.segment.flush.threshold.time": "24h",
        "realtime.segment.flush.desired.size": "50M",
        "stream.kafka.consumer.prop.auto.offset.reset": "smallest"
      }
    }
    But I still do not see any data in my tables. I’m using Kafka version 2.6.2. Or does Pinot only work with a Kafka deployed together with it, using the same Zookeeper? I tried to set stream.kafka.zk.broker.url to our Zookeeper but still without success. Thanks a lot