# troubleshooting
  • a

    Abhishek Tanwade

    07/08/2022, 4:23 AM
    Hello everyone, can anyone share some documentation on loading data into a Pinot table? Apache Pinot is deployed on Azure Kubernetes Service.
    m
    • 2
    • 1
  • a

    Alexander Vivas

    07/08/2022, 10:01 AM
    Good day everyone, does anyone know if Pinot has segment backwards compatibility? I’ve been running Pinot 0.6.0 and I am thinking of upgrading to the latest available version, which seems to be 0.10.0. Do you have a migration guide for that?
    m
    • 2
    • 3
  • a

    Abdullah Jaffer

    07/08/2022, 10:24 AM
    I have this table config that needs to ingest data from ORC files saved in S3, but it's not ingesting any data.
    {
      "OFFLINE": {
        "tableName": "sales_by_order_OFFLINE",
        "tableType": "OFFLINE",
        "segmentsConfig": {
          "schemaName": "sales_by_order",
          "retentionTimeUnit": "DAYS",
          "retentionTimeValue": "10000",
          "replication": "2",
          "segmentPushFrequency": "HOURLY",
          "segmentPushType": "REFRESH",
          "replicasPerPartition": "1"
        },
        "tenants": {
          "broker": "DefaultTenant",
          "server": "DefaultTenant"
        },
        "tableIndexConfig": {
          "invertedIndexColumns": [],
          "noDictionaryColumns": [],
          "rangeIndexVersion": 2,
          "autoGeneratedInvertedIndex": false,
          "createInvertedIndexDuringSegmentGeneration": false,
          "sortedColumn": [],
          "bloomFilterColumns": [],
          "loadMode": "MMAP",
          "onHeapDictionaryColumns": [],
          "varLengthDictionaryColumns": [],
          "enableDefaultStarTree": false,
          "enableDynamicStarTreeCreation": false,
          "aggregateMetrics": false,
          "nullHandlingEnabled": false,
          "rangeIndexColumns": []
        },
        "metadata": {},
        "quota": {},
        "task": {
          "taskTypeConfigsMap": {
            "SegmentGenerationAndPushTask": {
              "schedule": "0 * * * * ?",
              "tableMaxNumTasks": "28"
            }
          }
        },
        "routing": {},
        "query": {},
        "ingestionConfig": {
          "batchIngestionConfig": {
            "batchConfigMaps": [
              {
                "input.fs.className": "org.apache.pinot.plugin.filesystem.S3PinotFS",
                "input.fs.prop.region": "ap-southeast-1",
                "inputDirURI": "s3 link",
                "includeFileNamePattern": "glob:**/*.orc",
                "excludeFileNamePattern": "glob:**/*.tmp",
                "inputFormat": "orc"
              }
            ],
            "segmentIngestionType": "REFRESH",
            "segmentIngestionFrequency": "HOURLY"
          }
        },
        "isDimTable": false
      }
    }
    n
    • 2
    • 4
  • k

    Kevin Liu

    07/08/2022, 5:22 PM
    Hi folks. I have two questions. 1. Why is RealtimeToOfflineSegmentsTask executed in a single thread? It easily times out with a large amount of data. Are there any restrictions? 2. Is there any API for converting a segment to records (GenericRow) directly from S3?
    m
    x
    • 3
    • 6
  • a

    Alice

    07/10/2022, 2:44 PM
    Hi team, I’m using the lookup function according to this doc https://docs.pinot.apache.org/users/user-guide-query/lookup-udf-join. But the query result shows No Record(s) found. I’ve set isDimTable=true and primaryKeyColumns (error_category) in my offline table config. Here is my query: select error_category, lookUp(dim_table_name, insight_id, error_category, error_category) insight_id from fact_table_name. I think I’m not using the lookUp function correctly, because a query without lookUp, like select error_category from fact_table_name, does return some records. Does anybody know how to configure lookUp?
    r
    • 2
    • 3
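A minimal sketch of what the lookUp call in the question is meant to compute, written in plain Python rather than Pinot SQL. It assumes the dimension table is keyed by error_category and holds an insight_id column, as the message describes; the sample rows and the `look_up` helper are hypothetical and are not Pinot APIs. One thing that may be worth checking against the linked doc: as far as I recall, its examples pass the dimension table name, the column to look up, and the join key name as quoted string literals, while the query in the message passes them as bare identifiers.

```python
from typing import Optional

# Hypothetical dimension table, keyed by its primary key (error_category).
DIM_TABLE = {
    "timeout": {"insight_id": "INS-1"},
    "auth_failure": {"insight_id": "INS-2"},
}


def look_up(dim_table: dict, dim_col: str, join_key_value: str) -> Optional[str]:
    """Return dim_col from the row whose primary key equals join_key_value."""
    row = dim_table.get(join_key_value)
    return None if row is None else row[dim_col]


# Hypothetical fact rows; only error_category matters for the join.
fact_rows = [{"error_category": "timeout"}, {"error_category": "unknown"}]

result = [
    (r["error_category"], look_up(DIM_TABLE, "insight_id", r["error_category"]))
    for r in fact_rows
]
print(result)
```

A fact row whose key has no match in the dimension table still appears in the result, just without a looked-up value, so an entirely empty result set would point at something other than missing keys.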
  • m

    Marlon Félix

    07/11/2022, 4:43 PM
    Hello everyone! I'm writing an article for Medium about a real-case implementation of Apache Pinot as a way of studying. For that, I'm using a Strimzi cluster and the Twitter API Kafka Connector (which you can find at https://www.confluent.io/hub/jcustenborder/kafka-connect-twitter) running on Minikube, to get data from Twitter's API and ingest it into Pinot. I followed the steps explained in this video

    https://www.youtube.com/watch?v=Jc03u8rXc2w

    making some adaptations to run on Kubernetes. That way I was able to infer the schema of the "twitter-sample.json" file attached to this message, generating the schema file "twitter-old-schema.json". After that I had to remove some fields ("schema.type", "schema.fields", "schema.optional", "schema.name") and remove the prefix "payload." from every column to generate the file "twitter-schema.json". Then, with this schema file and the table config file "twitter-config.json", I created the REALTIME table "twitter-status-events" (using the column "CreatedAt" as the datetime column) using pinot-admin.sh inside the Pinot controller's pod. But for some reason I'm not getting any records in this table. Do you have any idea what I'm doing wrong? (More information in the replies to this comment.)
    twitter-schema.json, twitter-sample.json, twitter-old-schema.json, twitter-config.json
    m
    • 2
    • 11
  • h

    harnoor

    07/11/2022, 8:10 PM
    Hi, do we have this feature https://github.com/apache/pinot/pull/6120#issue-717507183 documented in the Pinot docs? (I couldn’t find it.) Is it recommended for speeding up regexp queries?
    x
    a
    • 3
    • 4
  • t

    troywinter

    07/12/2022, 8:15 AM
    Hi all, I got this exception using trino with version 389 and pinot version 0.9.3, how should I resolve this?
    io.grpc.StatusRuntimeException: UNKNOWN
    	at io.grpc.Status.asRuntimeException(Status.java:535)
    	at io.grpc.stub.ClientCalls$BlockingResponseStream.hasNext(ClientCalls.java:648)
    	at io.trino.plugin.pinot.client.PinotGrpcDataFetcher$PinotGrpcServerQueryClient$ResponseIterator.computeNext(PinotGrpcDataFetcher.java:266)
    	at io.trino.plugin.pinot.client.PinotGrpcDataFetcher$PinotGrpcServerQueryClient$ResponseIterator.computeNext(PinotGrpcDataFetcher.java:253)
    	at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:146)
    	at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:141)
    	at io.trino.plugin.pinot.client.PinotGrpcDataFetcher.endOfData(PinotGrpcDataFetcher.java:85)
    	at io.trino.plugin.pinot.PinotSegmentPageSource.getNextPage(PinotSegmentPageSource.java:114)
    	at io.trino.operator.TableScanOperator.getOutput(TableScanOperator.java:311)
    	at io.trino.operator.Driver.processInternal(Driver.java:410)
    	at io.trino.operator.Driver.lambda$process$10(Driver.java:313)
    	at io.trino.operator.Driver.tryWithLock(Driver.java:698)
    	at io.trino.operator.Driver.process(Driver.java:305)
    	at io.trino.operator.Driver.processForDuration(Driver.java:276)
    	at io.trino.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:740)
    	at io.trino.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:163)
    	at io.trino.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:488)
    	at io.trino.$gen.Trino_389____20220712_080400_2.run(Unknown Source)
    	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    	at java.base/java.lang.Thread.run(Thread.java:829)
    x
    • 2
    • 22
  • h

    Harish Bohara

    07/12/2022, 9:35 AM
    Anyone know how to extract data from nested json: Not sure how to extract “data.device” and put it in “device” column.
    Event coming in kafka:
    {
      "user_id": "1234",
      "data": {
        "device": "abcd"
      }
    }
    
    
    Schema I need for table:
    [
      {
        "name": "user_id",
        "dataType": "STRING"
      },
      {
        "name": "device",
        "dataType": "STRING"
      }
    ]
    🟢 1
    f
    • 2
    • 5
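A small sketch of the flattening the question asks for, in plain Python: pull data.device out of the Kafka event and store it under a top-level device column. The `extract_path` helper is hypothetical and only handles simple dotted paths. In Pinot itself this kind of mapping is usually expressed as an ingestion transform in the table config (for example with a JSON-path function such as jsonPathString), which this sketch does not reproduce.

```python
import json
from typing import Any


def extract_path(record: dict, dotted_path: str) -> Any:
    """Walk a dotted path such as 'data.device'; return None if a key is missing."""
    current: Any = record
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


# The event from the message, as it would arrive from Kafka.
event = json.loads('{"user_id": "1234", "data": {"device": "abcd"}}')

# Target row matching the two-column schema from the message.
row = {
    "user_id": extract_path(event, "user_id"),
    "device": extract_path(event, "data.device"),
}
print(row)
```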
  • a

    Alice

    07/12/2022, 11:14 AM
    Hi team, could you help me see what’s going on here? I set “replicasPerPartition”: “2” in my table config and assigned this table to tenant_a (server-6, server-7). Due to limited resources, I migrated this table to tenant_b (server-9). Then one segment ended up with the following status. Based on previous experience, data migration needs some time and the segment status recovers soon after. But this time it seems to be stuck. Is there anything I can do to fix it?
    m
    • 2
    • 4
  • a

    Alice

    07/12/2022, 11:44 AM
    Hi, some segments of one Pinot table are in bad status. When I call the /tables/{realtimeTableName}/consumingSegmentsInfo API to see segment consuming info, I find that these segments have no consuming info. What are the possible reasons for this error?
    m
    • 2
    • 9
  • e

    Eaugene Thomas

    07/13/2022, 11:30 AM
    Hi, I was referring to

    https://youtu.be/cNnwMF0pOJ8

    for playing around with Pinot setup and data ingestion. When querying, the result of querying OFFLINE plus REALTIME separately differs from the result of the combined select query on the table. Can someone help me with some insights into the reasons for this?
    j
    • 2
    • 2
  • s

    Stuart Millholland

    07/13/2022, 4:22 PM
    Has anyone had trouble configuring ingestion aggregations via the UI? I add the section and it saves, but it doesn't stick.
    x
    m
    • 3
    • 24
  • a

    Ashish

    07/13/2022, 7:22 PM
    Is there a behavior change between 0.9.0 and 0.10.0 related to the Kafka client? In 0.9.0, we could see Pinot committing the offset, but in 0.10.0 Pinot never commits the offset to Kafka and consumer lag keeps growing.
    m
    l
    • 3
    • 8
  • h

    Harish Bohara

    07/13/2022, 8:18 PM
    Does anyone know how to store only a single row per day (or per hour) if all the columns are the same for a given row? I get 30-50M rows per day where the unique row combinations are < 1000. I want to store one unique combination for each hour. Yes, the same row can repeat, but only in the next hour.
    ---------------------------------------
    e.g. if there are 3 rows in 1 hour:
    col_1, col_2, col_3, hour_1
    col_1, col_2, col_3, hour_1
    col_1, col_200, col_3, hour_1
    Rows in DB should be, for each hour:
    col_1, col_2, col_3, hour_1
    col_1, col_200, col_3, hour_1
    e.g. if there are 4 rows across 2 hours:
    col_1, col_2, col_3, hour_1
    col_1, col_2, col_3, hour_1
    col_1, col_200, col_3, hour_1
    col_1, col_2, col_3, hour_2
    Rows in DB should be, for each hour:
    col_1, col_2, col_3, hour_1
    col_1, col_200, col_3, hour_1
    col_1, col_2, col_3, hour_2
    m
    j
    s
    • 4
    • 9
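The behaviour described in the question (keep one copy of each column combination per hour) can be sketched in plain Python as a pre-ingestion step. This is only an illustration of the logic under assumed inputs: the `dedupe_per_hour` function, the millisecond timestamps, and the tuple row layout are all hypothetical, and it does not show how to configure the same thing inside Pinot.

```python
MILLIS_PER_HOUR = 3_600_000


def dedupe_per_hour(rows):
    """Keep the first occurrence of each (column values, hour bucket) pair."""
    seen = set()
    kept = []
    for cols, ts_millis in rows:
        hour_bucket = ts_millis // MILLIS_PER_HOUR  # truncate the timestamp to its hour
        key = (cols, hour_bucket)
        if key not in seen:
            seen.add(key)
            kept.append((cols, hour_bucket))
    return kept


# Second example from the message: three rows in hour 0, one row in hour 1.
rows = [
    (("col_1", "col_2", "col_3"), 10),
    (("col_1", "col_2", "col_3"), 20),
    (("col_1", "col_200", "col_3"), 30),
    (("col_1", "col_2", "col_3"), MILLIS_PER_HOUR + 5),
]
kept = dedupe_per_hour(rows)
for cols, hour in kept:
    print(cols, hour)
```

The in-memory set is fine for fewer than 1000 unique combinations per hour, as in the message, but it would need to be reset or bounded per hour in a long-running job.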
  • a

    André Siefken

    07/14/2022, 8:19 AM
    Hi folks, quick question: using the pinot-java-client with broker-list JsonAsyncHttpPinotClientTransport, am I supposed to reuse a single Connection across all query requests, or create a new Connection from the ConnectionFactory for each query? Or in other words, is the http connection pool held by the Connection instance, or the ConnectionFactory?
    s
    • 2
    • 2
  • d

    Deepika Eswar

    07/14/2022, 11:13 AM
    does pinot support connecting to Tableau for reporting?
    l
    • 2
    • 1
  • e

    Ethan Yu

    07/14/2022, 8:33 PM
    Hi, I'm trying to run Pinot on a Kubernetes cluster and ingest realtime data from Kafka. However, when I try to ingest data, Pinot seems to fail and stop ingesting at a specific point. I tested this by running two different Pinot Kubernetes clusters at the same time and having both ingest from Kafka simultaneously, yet they both seemed to stop at almost exactly the same time. If I run Pinot on an individual machine it seems to work, but for some reason it does not on Kubernetes. The config I'm running for Pinot is 3 controllers, 30 servers, 1 minion, and 3 ZooKeepers.
    k
    h
    m
    • 4
    • 5
  • a

    Alice

    07/15/2022, 3:57 AM
    Hi team, I have a question about the star-tree index. If I add more columns to dimensionsSplitOrder in the star-tree index config and restart the servers, will the existing segments rebuild the star-tree index based on the new config? Or will only new segments create the star-tree index based on the new config?
    m
    k
    • 3
    • 14
  • j

    Jacob M

    07/17/2022, 3:50 PM
    hi! in the past, i've always had a primary key where i'm doing some equality filtering in a where clause and have used segmentPartitionConfig and bloomFilterColumns to make sure i'm really only querying a single segment & single server. i'm trying to configure a table to support queries that don't necessarily have any equality clause in the where but will always have a time clause, like where created > X. i've noticed all my queries hit all the servers and all the segments. am i doing something wrong? i thought time columns had some special handling maybe! (if it helps, this is an offline table)
    m
    • 2
    • 3
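To illustrate why a time-only filter can in principle avoid touching every segment, here is a sketch of interval-based pruning in plain Python: each segment carries a [min, max] time range, and a query with created > X only needs segments whose max exceeds X. The segment names and ranges are made up, and this is not Pinot's implementation. Whether pruning like this happens for a given table depends on its routing configuration and on how the data is laid out across segments by time; if every segment spans the full time range, nothing can be skipped.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    name: str
    min_time: int  # smallest 'created' value in the segment
    max_time: int  # largest 'created' value in the segment


def prune_for_greater_than(segments, lower_bound):
    """Return the segments that may contain rows with created > lower_bound."""
    return [s for s in segments if s.max_time > lower_bound]


# Hypothetical segments with non-overlapping time ranges.
segments = [
    Segment("seg_0", 0, 99),
    Segment("seg_1", 100, 199),
    Segment("seg_2", 200, 299),
]
selected = prune_for_greater_than(segments, 150)
print([s.name for s in selected])
```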
  • c

    chandarasekaran m

    07/18/2022, 4:02 AM
    Hi team, how can I parse a Kafka header (in bytes) and filter based on a specific field? Any code samples?
    p
    m
    • 3
    • 12
  • k

    Kevin Liu

    07/18/2022, 8:11 AM
    Hi folks,
    GenericRowFileWriter class:
    
      /**
       * Writes the given row into the files.
       */
      public void write(GenericRow genericRow)
          throws IOException {
        _offsetStream.writeLong(_nextOffset);
        byte[] bytes = _serializer.serialize(genericRow);
        _dataStream.write(bytes);
        _nextOffset += bytes.length;
      }
    I use GenericRowFileWriter to write GenericRow to the record.data file. It takes several hours to write more than 20 million rows. Why is writing so slow?
    m
    • 2
    • 4
  • s

    shivam

    07/18/2022, 12:37 PM
    We are getting this error on one of our brokers,
    {
      "id": "10008ad94fa0022__brokerResource",
      "simpleFields": {},
      "mapFields": {
        "HELIX_ERROR     20220718-101026.000050 STATE_TRANSITION 0dc81776-5d32-452b-8c82-ae66fd33a5e6": {
          "AdditionalInfo": "Exception while executing a state transition task span_event_view_REALTIMEjava.lang.reflect.InvocationTargetException\n\tat java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)\n\tat java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)\n\tat java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)\n\tat java.base/java.lang.reflect.Method.invoke(Method.java:566)\n\tat org.apache.helix.messaging.handling.HelixStateTransitionHandler.invoke(HelixStateTransitionHandler.java:404)\n\tat org.apache.helix.messaging.handling.HelixStateTransitionHandler.handleMessage(HelixStateTransitionHandler.java:331)\n\tat org.apache.helix.messaging.handling.HelixTask.call(HelixTask.java:97)\n\tat org.apache.helix.messaging.handling.HelixTask.call(HelixTask.java:49)\n\tat java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)\n\tat java.base/java.lang.Thread.run(Thread.java:829)\nCaused by: java.lang.IllegalStateException: Failed to find table config for table: span_event_view_REALTIME\n\tat shaded.com.google.common.base.Preconditions.checkState(Preconditions.java:518)\n\tat org.apache.pinot.broker.routing.RoutingManager.buildRouting(RoutingManager.java:304)\n\tat org.apache.pinot.broker.broker.helix.BrokerResourceOnlineOfflineStateModelFactory$BrokerResourceOnlineOfflineStateModel.onBecomeOnlineFromOffline(BrokerResourceOnlineOfflineStateModelFactory.java:80)\n\t... 12 more\n",
          "Class": "class org.apache.helix.messaging.handling.HelixStateTransitionHandler",
          "MSG_ID": "dfc7f986-0406-49fc-b6f4-5101630efb17",
          "Message state": "READ"
        },
        "HELIX_ERROR     20220718-101026.000081 STATE_TRANSITION e65c8151-76d4-4267-83ad-48dabdd66eae": {
          "AdditionalInfo": "Message execution failed. msgId: dfc7f986-0406-49fc-b6f4-5101630efb17, errorMsg: java.lang.reflect.InvocationTargetException",
          "Class": "class org.apache.helix.messaging.handling.HelixStateTransitionHandler",
          "MSG_ID": "dfc7f986-0406-49fc-b6f4-5101630efb17",
          "Message state": "READ"
        }
      },
      "listFields": {}
    }
    Quick fix: we restarted our brokers, but it's still not clear what went wrong. Need help! //@harnoor
    m
    • 2
    • 1
  • s

    Stuart Coleman

    07/18/2022, 1:40 PM
    hi - we have two pinot tables both consuming from the same Kafka topic. Both are using the low level consumer. One is a hybrid table and one is a realtime only table. We have an issue where one record is missing from the realtime table but is present in the hybrid table. We have looked in the logs and can see no Warn or Error messages at the time the record was lost. The only log of interest is that we have an idle consumer at that time and the stream is recreated. Are there any known scenarios in which message loss is possible?
    d
    m
    +2
    • 5
    • 32
  • p

    Priyank Bagrecha

    07/18/2022, 5:25 PM
    Hello, we have a use case for an offline table but we don't have a time column for segment config. What is the suggested route if there is one? Thank you!
    l
    m
    • 3
    • 9
  • a

    Abhijeet Kushe

    07/18/2022, 6:30 PM
    My hope was to see existing segments being repartitioned via accountId onto different instances and the number of replicasPerPartition increase from 1 to 3. However, I did not see any changes in the existing segments, nor did I see a change in replicas, nor did I see any new props added to the segments. https://docs.pinot.apache.org/operators/operating-pinot/tuning/routing
    column.accountId.partitionFunction = Module
    column.accountId.numPartitions = 4
    column.accountId.partitionValues = 1
  • m

    Mayank

    07/18/2022, 6:31 PM
    You have replication of 1, and you are also requesting 1 replica to be up at all times, so the rebalancer won’t be able to work.
  • m

    Mayank

    07/18/2022, 6:32 PM
    Also, I recommend using Murmur rather than Modulo.
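A rough illustration of a modulo partition function in plain Python, using numPartitions = 4 as in the snippet from the thread. The account IDs are made up. It shows one reason a hash-based function can be preferable: keys that share a common stride pile up in a single partition under plain modulo, while evenly spread keys do not. This sketch does not implement Murmur and is not Pinot code.

```python
from collections import Counter

NUM_PARTITIONS = 4  # matches column.accountId.numPartitions in the thread


def modulo_partition(account_id: int, num_partitions: int = NUM_PARTITIONS) -> int:
    """Assign a partition by taking the key modulo the partition count."""
    return account_id % num_partitions


# Hypothetical account IDs that happen to be allocated in steps of 4.
strided_ids = range(1, 401, 4)  # 1, 5, 9, ...
# Hypothetical account IDs allocated sequentially.
sequential_ids = range(1, 101)  # 1, 2, 3, ...

strided_counts = Counter(modulo_partition(a) for a in strided_ids)
sequential_counts = Counter(modulo_partition(a) for a in sequential_ids)
print(dict(strided_counts))
print(dict(sequential_counts))
```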
  • a

    Abhijeet Kushe

    07/18/2022, 6:34 PM
    I see, sure, I can use Murmur. I also tried to call the rebalance endpoint with minAvailableReplicas: 0 but I now get the following message
    Instance reassigned, table is already balanced
    m
    • 2
    • 9