# troubleshooting

    Pankaj Thakkar

    12/15/2021, 8:01 PM
    hey guys, we are ingesting data from Kafka; it seems Pinot is stuck in a loop; it complains about not finding the right offset and does not consume the data that is sitting in Kafka.

    Tiger Zhao

    12/15/2021, 8:15 PM
    Will pinot automatically delete segments from S3 when a table is deleted? Based on what I've seen, it looks like deleting a realtime table will eventually delete the segments associated with that table. But I'm not sure if this behavior is also true for offline tables where the segments are generated and uploaded with SegmentCreationAndMetadataPush?

    Priyank Bagrecha

    12/15/2021, 10:05 PM
    i started using v0.9.1 and keep running into these errors every couple of hours
    2021/12/15 19:09:59.854 ERROR [GroupCommit] [HelixTaskExecutor-message_handle_STATE_TRANSITION] Interrupted while committing change, key: /pinot-poc/INSTANCES/Server_10.220.12.85_8098/CURRENTSTATES/100000abfb404c3/km_mp_play_startree_REALTIME, record: km_mp_play_startree_REALTIME, {}{}{}
    java.lang.InterruptedException: null
            at java.lang.Object.wait(Native Method) ~[?:?]
            at org.apache.helix.GroupCommit.commit(GroupCommit.java:163) [pinot-all-0.9.1-jar-with-dependencies.jar:0.9.1-f8ec6f6f8eead03488d3f4d0b9501fc3c4232961]
            at org.apache.helix.manager.zk.ZKHelixDataAccessor.updateProperty(ZKHelixDataAccessor.java:189) [pinot-all-0.9.1-jar-with-dependencies.jar:0.9.1-f8ec6f6f8eead03488d3f4d0b9501fc3c4232961]
            at org.apache.helix.manager.zk.ZKHelixDataAccessor.updateProperty(ZKHelixDataAccessor.java:177) [pinot-all-0.9.1-jar-with-dependencies.jar:0.9.1-f8ec6f6f8eead03488d3f4d0b9501fc3c4232961]
            at org.apache.helix.messaging.handling.HelixStateTransitionHandler.preHandleMessage(HelixStateTransitionHandler.java:164) [pinot-all-0.9.1-jar-with-dependencies.jar:0.9.1-f8ec6f6f8eead03488d3f4d0b9501fc3c4232961]
            at org.apache.helix.messaging.handling.HelixStateTransitionHandler.handleMessage(HelixStateTransitionHandler.java:330) [pinot-all-0.9.1-jar-with-dependencies.jar:0.9.1-f8ec6f6f8eead03488d3f4d0b9501fc3c4232961]
            at org.apache.helix.messaging.handling.HelixTask.call(HelixTask.java:97) [pinot-all-0.9.1-jar-with-dependencies.jar:0.9.1-f8ec6f6f8eead03488d3f4d0b9501fc3c4232961]
            at org.apache.helix.messaging.handling.HelixTask.call(HelixTask.java:49) [pinot-all-0.9.1-jar-with-dependencies.jar:0.9.1-f8ec6f6f8eead03488d3f4d0b9501fc3c4232961]
            at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
            at java.lang.Thread.run(Thread.java:829) [?:?]

    Nicholas Yu

    12/16/2021, 2:17 AM
    i’m trying to run a batch ingestion job using AWS EMR. i’m running a spark-submit step like so
    spark-submit \
      --master yarn \
      --class org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand \
      s3://<bucket_name>/lib/pinot-all-0.9.0-jar-with-dependencies.jar -jobSpecFile s3://<bucket_name>/jobs/<table_name>/job.yaml
    but i’m getting a java.lang.NoSuchMethodException: org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand.main
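    For reference, the spark-submit form shown in Pinot's batch-ingestion docs passes the Pinot fat jar on the driver classpath and points plugins.dir at a local Pinot distribution. The sketch below follows that shape; /opt/pinot is an assumed location of an unpacked distribution on the EMR nodes and the master/deploy-mode would need adjusting to your setup:
    # Sketch only, following the documented spark-submit shape for Pinot batch ingestion.
    # /opt/pinot is an assumption; adjust paths and the Pinot version to your cluster.
    export PINOT_VERSION=0.9.0
    export PINOT_DIST=/opt/pinot
    spark-submit \
      --class org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand \
      --master yarn \
      --deploy-mode client \
      --conf "spark.driver.extraJavaOptions=-Dplugins.dir=${PINOT_DIST}/plugins" \
      --conf "spark.driver.extraClassPath=${PINOT_DIST}/lib/pinot-all-${PINOT_VERSION}-jar-with-dependencies.jar" \
      ${PINOT_DIST}/lib/pinot-all-${PINOT_VERSION}-jar-with-dependencies.jar \
      -jobSpecFile s3://<bucket_name>/jobs/<table_name>/job.yaml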

    Anish Nair

    12/16/2021, 8:02 AM
    Hi team, I have set the following flush config for a realtime table: "realtime.segment.flush.threshold.size": "0", "realtime.segment.flush.threshold.segment.size": "500M", "realtime.segment.flush.threshold.rows": "0", "realtime.segment.flush.threshold.time": "24h". But the consuming segment still shows threshold size = 100000. Is this expected?
    {
      "segment.creation.time": "1639641347711",
      "segment.flush.threshold.size": "100000",
      "segment.name": "max_reporting_aggregations__0__0__20211216T0755Z",
      "segment.realtime.numReplicas": "2",
      "segment.realtime.startOffset": "598962485",
      "segment.realtime.status": "IN_PROGRESS",
      "segment.table.name": "max_reporting_aggregations",
      "segment.type": "REALTIME"
    }
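    For context, these flush settings live under streamConfigs in the realtime table config, and when the row threshold is 0 the size-based flush logic picks its own per-segment row target, so the 100000 in the segment metadata looks like an auto-tuned starting value rather than your table config. A rough sketch of the relevant block; the autotune key name is an assumption to verify against the stream-configs docs:
    "streamConfigs": {
      "realtime.segment.flush.threshold.rows": "0",
      "realtime.segment.flush.threshold.time": "24h",
      "realtime.segment.flush.threshold.segment.size": "500M",
      "realtime.segment.flush.autotune.initialRows": "500000"
    }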

    Ali Atıl

    12/16/2021, 10:59 AM
    Hi everyone, is there a way to indicate the actual data field for records read from Kafka? For example, if I had records in Kafka like the one below, would it be possible for Pinot to extract the actual records from the nested "employee" field?
    {
      "data": {
        "employee": {
          "name": "ali",
          "salary": 56000,
          "married": true,
          "messageTime": 1639652167
        }
      }
    }
    Example schema:
    {
      "schemaName": "employee",
      "dimensionFieldSpecs": [
        {
          "name": "name",
          "dataType": "STRING"
        },
        {
          "name": "salary",
          "dataType": "DOUBLE"
        },
        {
          "name": "married",
          "dataType": "BOOLEAN"
        }
      ],
      "metricFieldSpecs": [],
      "dateTimeFieldSpecs": [
        {
          "name": "messageTime",
          "dataType": "LONG",
          "format": "1:MILLISECONDS:EPOCH",
          "granularity": "1:MILLISECONDS"
        }
      ]
    }
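    For what it's worth, one common way to handle this shape is to leave the schema flat (as above) and pull the nested values out with JSON-path ingestion transforms in the table config. A minimal sketch, assuming the raw payload arrives as a field named data; the extraction for the boolean column in particular may need adjusting:
    "ingestionConfig": {
      "transformConfigs": [
        { "columnName": "name",        "transformFunction": "jsonPathString(data, '$.employee.name')" },
        { "columnName": "salary",      "transformFunction": "jsonPathDouble(data, '$.employee.salary')" },
        { "columnName": "married",     "transformFunction": "jsonPathString(data, '$.employee.married')" },
        { "columnName": "messageTime", "transformFunction": "jsonPathLong(data, '$.employee.messageTime')" }
      ]
    }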

    Map

    12/16/2021, 12:27 PM
    has this setting pinot.set.instance.id.to.hostname changed in recent versions? Doesn't seem to be working any more and IPs are used instead

    Jonathan Meyer

    12/16/2021, 4:33 PM
    Hello 👋 Pinot's documentation for upgrading (https://docs.pinot.apache.org/operators/operating-pinot/upgrading-pinot-cluster) indicates that the recommended upgrade order of components is: 1. Controller 2. Broker 3. Server 4. Minion. Using the official Pinot Helm chart, is there any way to enforce this order? It seems like all components try to upgrade at the same time, since there is only a single configuration tag / version in the values.
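    As a point of comparison while waiting for a chart-level answer: outside of Helm, a sequential roll can be done by hand with kubectl, one component at a time. This is only a sketch; the StatefulSet names below are assumptions based on the default chart naming and should be checked with kubectl get sts:
    # Roll components one at a time, waiting for each rollout to finish.
    NEW_IMAGE=apachepinot/pinot:0.9.1
    for component in controller broker server minion; do
      kubectl set image statefulset/pinot-${component} "*=${NEW_IMAGE}"
      kubectl rollout status statefulset/pinot-${component} --timeout=15m
    done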

    Tiger Zhao

    12/16/2021, 4:53 PM
    Is it possible to rename an existing table?

    Luis Fernandez

    12/16/2021, 9:44 PM
    I'm running into this issue where the servers are starting to ingest more and more records (a spike in ingestion), but when I look at the throughput from a Kafka Streams perspective I don't see any spikes. Then, weirdly enough, I see the servers restarting for some reason. While checking the logs I do see this:
    Caught exception while processing query: QueryContext{_tableName='ads_metrics_REALTIME', _selectExpressions=[listing_id, sum(click_count), sum(impression_count), sum(cost), sum(order_count), sum(revenue)], _aliasList=[null, null, null, null, null, null], _filter=(shop_id = '25746445' AND serve_time BETWEEN '1637125200' AND '1639717199'), _groupByExpressions=[listing_id], _havingFilter=null, _orderByExpressions=null, _limit=6000, _offset=0, _queryOptions={responseFormat=sql, groupByMode=sql, timeoutMs=9999}, _debugOptions=null, _brokerRequest=BrokerRequest(querySource:QuerySource(tableName:ads_metrics_REALTIME), pinotQuery:PinotQuery(dataSource:DataSource(tableName:ads_metrics_REALTIME), selectList:[Expression(type:IDENTIFIER, identifier:Identifier(name:listing_id)), Expression(type:FUNCTION, functionCall:Function(operator:SUM, operands:[Expression(type:IDENTIFIER, identifier:Identifier(name:click_count))])), Expression(type:FUNCTION, functionCall:Function(operator:SUM, operands:[Expression(type:IDENTIFIER, identifier:Identifier(name:impression_count))])), Expression(type:FUNCTION, functionCall:Function(operator:SUM, operands:[Expression(type:IDENTIFIER, identifier:Identifier(name:cost))])), Expression(type:FUNCTION, functionCall:Function(operator:SUM, operands:[Expression(type:IDENTIFIER, identifier:Identifier(name:order_count))])), Expression(type:FUNCTION, functionCall:Function(operator:SUM, operands:[Expression(type:IDENTIFIER, identifier:Identifier(name:revenue))]))], filterExpression:Expression(type:FUNCTION, functionCall:Function(operator:AND, operands:[Expression(type:FUNCTION, functionCall:Function(operator:EQUALS, operands:[Expression(type:IDENTIFIER, identifier:Identifier(name:shop_id)), Expression(type:LITERAL, literal:<Literal longValue:25746445>)])), Expression(type:FUNCTION, functionCall:Function(operator:BETWEEN, operands:[Expression(type:IDENTIFIER, identifier:Identifier(name:serve_time)), Expression(type:LITERAL, literal:<Literal longValue:1637125200>), Expression(type:LITERAL, literal:<Literal longValue:1639717199>)]))])), groupByList:[Expression(type:IDENTIFIER, identifier:Identifier(name:listing_id))], orderByList:[], limit:6000, queryOptions:{responseFormat=sql, groupByMode=sql, timeoutMs=9999}))}
    java.lang.ArrayIndexOutOfBoundsException: null
    this started happening out of nowhere so I'm unsure as to what's going on. Has anyone gotten this kind of error? Sadly I also don't have the stack trace; I don't know why it isn't being logged 😞 Also, in general, our p99 response times have been impacted.

    Elon

    12/16/2021, 10:04 PM
    Does anyone here use pool based routing? Is there any advantage to using 3 pools with 4 replica groups vs using 2 pools with 6 replica groups? i.e. does it make sense to have more than 1 pool?

    Elon

    12/16/2021, 11:15 PM
    Question about split commit: we enabled the config on the controller, server and table (i.e. peerSegmentDownloadScheme = 'http'), and see "isSplitCommitType":true log messages in the server, but noticed that the controller still contains those segments in the temp directory /var/pinot/controller/data/untarredFileTemp - does that indicate we have something configured incorrectly?

    Zsolt Takacs

    12/17/2021, 2:26 PM
    We had to rush the upgrade to 0.9.1 because of the log4shell issue and ran into this change afterwards: https://github.com/apache/pinot/pull/7523 We are using the realtimeToOffline task with an upsert based table and using a rollup based on our business rules to get the same results in the offline segments. This change makes it impossible to keep the same config if we want to change the table config.

    Priyank Bagrecha

    12/17/2021, 6:41 PM
    exception_server_logs.txt

    Tao Hu

    12/17/2021, 6:44 PM
    Hi team, I tried to create a text index on the teamName column in the sample table dimBaseballTeams, but when I use a text match query, I get:
    {
      "message": "QueryExecutionError:\njava.lang.NullPointerException\n\tat org.apache.pinot.core.operator.filter.TextMatchFilterOperator.getNextBlock(TextMatchFilterOperator.java:45)\n\tat org.apache.pinot.core.operator.filter.TextMatchFilterOperator.getNextBlock(TextMatchFilterOperator.java:30)\n\tat org.apache.pinot.core.operator.BaseOperator.nextBlock(BaseOperator.java:49)\n\tat org.apache.pinot.core.operator.DocIdSetOperator.getNextBlock(DocIdSetOperator.java:62)",
      "errorCode": 200
    }
    Here is my table config
    {
      "tableName": "dimBaseballTeams",
      "tableType": "OFFLINE",
      "isDimTable": true,
      "segmentsConfig": {
        "segmentPushType": "REFRESH",
        "replication": "1"
      },
      "tableIndexConfig": {
        "noDictionaryColumns": [
          "teamName"
        ]
      },
      "fieldConfigList": [
        {
          "name": "teamName",
          "encodingType": "RAW",
          "indexType": "TEXT"
        }
      ],
      "tenants": {},
      "metadata": {
        "customConfigs": {}
      }
    }
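    One hedged thing worth checking (not a confirmed fix for the NPE): if the text index was added to a table whose segments already existed, the segments usually need to be reloaded before TEXT_MATCH can use the index. A sketch of the reload call; the controller address is an assumption:
    curl -X POST "http://localhost:9000/segments/dimBaseballTeams_OFFLINE/reload"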

    Luis Fernandez

    12/17/2021, 7:15 PM
    hey just bumping this message in case anyone has any idea as to what’s happening https://apache-pinot.slack.com/archives/C011C9JHN7R/p1639691077070700

    Nicholas Yu

    12/20/2021, 12:40 AM
    hi team, has anyone successfully run a batch ingestion job from S3 into Pinot using AWS EMR (Spark)? looking for sample configurations/job specs/general help please and thanks 🙂

    eywek

    12/20/2021, 12:55 PM
    Hello, I'm trying to use the new LIKE operator, but it seems that NOT LIKE doesn't work, do you know why? i.e. I'm getting results with
    select * from datasource_61c064dc1c9900030074e5f3 where JSONEXTRACTSCALAR("labels", '$.locale', 'STRING') LIKE '%fr_FR%' limit 10
    but 0 results with
    select * from datasource_61c064dc1c9900030074e5f3 where JSONEXTRACTSCALAR("labels", '$.locale', 'STRING') NOT LIKE '%en_US%' limit 10
    where labels is
    {
      "locale": "fr_FR",
      "brand": "undiz"
    }
    Thank you

    Weixiang Sun

    12/20/2021, 7:14 PM
    I am trying to upload the realtime table schema with the following dateTimeFieldSpecs
    "dateTimeFieldSpecs": [
        {
          "name": "timestamp",
          "dataType": "LONG",
          "defaultNullValue": 0,
          "format": "1:MILLISECONDS:EPOCH",
          "granularity": "1:MILLISECONDS"
        },
        {
          "name": "timestamp_seconds",
          "dataType": "LONG",
          "defaultNullValue": 0,
          "transformFunction": "toEpochSecondsRounded(timestamp, 1)",
          "format": "1:SECONDS:EPOCH",
          "granularity": "1:SECONDS"
        }
      ]
    I got the following error:
    {
      "code": 400,
      "error": "Cannot add invalid schema: schema_name. Reason: Exception in getting arguments for transform function 'toEpochSecondsRounded(timestamp, 1)' for column 'timestamp_seconds'"
    }
    What is wrong?
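    A hedged sketch of an alternative placement that is sometimes easier to debug: declare timestamp_seconds without a transformFunction in the schema and define the transform in the table config's ingestionConfig instead. Note that timestamp is a reserved SQL keyword, so the quoting shown here (or renaming the source column) is an assumption that may need adjusting:
    "ingestionConfig": {
      "transformConfigs": [
        {
          "columnName": "timestamp_seconds",
          "transformFunction": "toEpochSecondsRounded(\"timestamp\", 1)"
        }
      ]
    }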

    Ayush Kumar Jha

    12/21/2021, 5:11 AM
    Hi everyone, I am trying to set up monitoring and for that I am using this:
    ALL_JAVA_OPTS="-javaagent:jmx_prometheus_javaagent-0.12.0.jar=8088:pinot.yml -Xms4G -Xmx4G -XX:MaxDirectMemorySize=30g -Dlog4j2.configurationFile=conf/pinot-admin-log4j2.xml -Dplugins.dir=$BASEDIR/plugins"
    sudo bin/pinot-admin.sh StartController -configFileName /home/centos/controller.conf
    But I could not access the metrics on port 8088.
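    Two hedged things the sketch below accounts for: environment variables are not passed through sudo by default, and relative paths for the agent jar and pinot.yml resolve against the working directory. Whether pinot-admin.sh picks ALL_JAVA_OPTS up from the environment is an assumption to verify for your version; the /opt/pinot paths are placeholders:
    export ALL_JAVA_OPTS="-javaagent:/opt/pinot/jmx_prometheus_javaagent-0.12.0.jar=8088:/opt/pinot/pinot.yml -Xms4G -Xmx4G -XX:MaxDirectMemorySize=30g -Dlog4j2.configurationFile=/opt/pinot/conf/pinot-admin-log4j2.xml -Dplugins.dir=/opt/pinot/plugins"
    sudo -E bin/pinot-admin.sh StartController -configFileName /home/centos/controller.conf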

    Diana Arnos

    12/21/2021, 3:23 PM
    👋 hello there. How can I set up the consumer group id when I have a realtime table ingesting from a Kafka topic? It creates the group with id 0. We would need a group id for monitoring purposes. I tried setting it up through the table config as stream.kafka.consumer.group.id and stream.kafka.group.id as per the Kafka docs, and it still creates the consumer group with id 0 😞

    Stav Gayer

    12/21/2021, 3:37 PM
    Hey, I've been testing Pinot with Trino for the last few days and I see something weird in the metrics (I'm running Pinot 0.9.1 and Trino 366; Trino is configured with
    pinot.max-rows-per-split-for-segment-queries=1000000
    pinot.request-timeout=1m
    ). As you can see, the "broker jvm used" graph continues to rise and fall even when there are no requests to the server at all, and this did not stop until I deleted the pods. The same thing happened to the server. In addition, Trino is almost unusable. A simple query like
    select * from bi_test_table where page_url like '%google%' and epoch_ts >= 1639440000
      AND epoch_ts < 1640044800
    gets a timeout after a minute and sometimes crashes the servers. Where am I going wrong?
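    For the query-latency part, one thing that may be worth trying (a sketch, not a confirmed fix) is the Trino Pinot connector's dynamic-table passthrough form, which hands the whole inner query to the Pinot broker instead of pulling segments split by split. The catalog and schema names (pinot.default) are assumptions:
    SELECT *
    FROM pinot.default."SELECT page_url, epoch_ts FROM bi_test_table WHERE page_url LIKE '%google%' AND epoch_ts >= 1639440000 AND epoch_ts < 1640044800 LIMIT 100"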

    Weixiang Sun

    12/21/2021, 6:11 PM
    I have a quick question about hybrid tables. Can we configure the red line, such as moving it from the start of Mar 25 to the start of Mar 24?

    Weixiang Sun

    12/22/2021, 3:04 AM
    Is there any document about how to create a hybrid table? I see that the offline table should have the same name as the realtime table. Why make such an assumption? What about the hybrid table name and the child table names? Should they be the same as well?

    Luis Fernandez

    12/22/2021, 2:30 PM
    hey friends, just reaching out with this error again as I haven’t been able to come up with a reason as to why it’s happening,
    Caught exception while processing query: QueryContext{_tableName='ads_metrics_REALTIME', _selectExpressions=[listing_id, sum(click_count), sum(impression_count), sum(cost), sum(order_count), sum(revenue)], _aliasList=[null, null, null, null, null, null], _filter=(shop_id = '25746445' AND serve_time BETWEEN '1637125200' AND '1639717199'), _groupByExpressions=[listing_id], _havingFilter=null, _orderByExpressions=null, _limit=6000, _offset=0, _queryOptions={responseFormat=sql, groupByMode=sql, timeoutMs=9999}, _debugOptions=null, _brokerRequest=BrokerRequest(querySource:QuerySource(tableName:ads_metrics_REALTIME), pinotQuery:PinotQuery(dataSource:DataSource(tableName:ads_metrics_REALTIME), selectList:[Expression(type:IDENTIFIER, identifier:Identifier(name:listing_id)), Expression(type:FUNCTION, functionCall:Function(operator:SUM, operands:[Expression(type:IDENTIFIER, identifier:Identifier(name:click_count))])), Expression(type:FUNCTION, functionCall:Function(operator:SUM, operands:[Expression(type:IDENTIFIER, identifier:Identifier(name:impression_count))])), Expression(type:FUNCTION, functionCall:Function(operator:SUM, operands:[Expression(type:IDENTIFIER, identifier:Identifier(name:cost))])), Expression(type:FUNCTION, functionCall:Function(operator:SUM, operands:[Expression(type:IDENTIFIER, identifier:Identifier(name:order_count))])), Expression(type:FUNCTION, functionCall:Function(operator:SUM, operands:[Expression(type:IDENTIFIER, identifier:Identifier(name:revenue))]))], filterExpression:Expression(type:FUNCTION, functionCall:Function(operator:AND, operands:[Expression(type:FUNCTION, functionCall:Function(operator:EQUALS, operands:[Expression(type:IDENTIFIER, identifier:Identifier(name:shop_id)), Expression(type:LITERAL, literal:<Literal longValue:25746445>)])), Expression(type:FUNCTION, functionCall:Function(operator:BETWEEN, operands:[Expression(type:IDENTIFIER, identifier:Identifier(name:serve_time)), Expression(type:LITERAL, literal:<Literal longValue:1637125200>), Expression(type:LITERAL, literal:<Literal longValue:1639717199>)]))])), groupByList:[Expression(type:IDENTIFIER, identifier:Identifier(name:listing_id))], orderByList:[], limit:6000, queryOptions:{responseFormat=sql, groupByMode=sql, timeoutMs=9999}))}
    java.lang.ArrayIndexOutOfBoundsException: null
    More information: it seems to have something to do with my serve_time filter; the longer the time range I query (more than 3 days), the more likely I am to get this error.

    Diogo Baeder

    12/22/2021, 7:40 PM
    Hi folks! I seem to have found a bug, but I'd like to confirm here first, and if it does look like a bug I'll open a ticket on GitHub - I just want to check whether I'm doing anything wrong.

    Vibhor Jaiswal

    12/28/2021, 10:26 PM
    Hi all. We are a couple of new users from an NA-based investment bank. We are doing a heavy PoC on Pinot, hitting some roadblocks, and need some guidance.

    Priyank Bagrecha

    12/28/2021, 10:55 PM
    i added a sorted inverted index on one field, and now i am not able to add an inverted index on other fields. is that expected? i am using a derived field event_ts_5_min which represents the start of a 5-minute time bucket. would it even make sense to add an inverted index on event_ts_5_min? test_id is a filter in all queries. I did read "A sorted index performs much better than an inverted index, but it can only be applied to one column per table." in the documentation, but didn't think that meant i can't apply an inverted index on other fields.
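    For reference, the sorted column and the inverted-index columns are separate lists under tableIndexConfig, so a sorted column on one field should not by itself block inverted indexes on others. A minimal sketch using the field names from the question (which column is actually sorted is an assumption):
    "tableIndexConfig": {
      "sortedColumn": ["test_id"],
      "invertedIndexColumns": ["event_ts_5_min"]
    }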

    Tao Hu

    12/29/2021, 12:41 AM
    Hi team, it looks like the text_match query doesn't work as described in the docs when searching for multiple words. Based on the docs, the query should return documents that contain the entire phrase. In my use case, phrases such as "Boston Doves" and "Boston Rustlers" do not match "Boston red" but still show up in the results. Can someone take a look at it? I'm using 0.9.2. Thanks!
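    For comparison, this is the phrase-versus-term distinction as the TEXT_MATCH docs describe it, sketched against the earlier dimBaseballTeams example: double quotes inside the search string ask for the exact phrase, while bare terms combined with AND match the terms anywhere in the text.
    SELECT teamName FROM dimBaseballTeams WHERE TEXT_MATCH(teamName, '"Boston red"')
    SELECT teamName FROM dimBaseballTeams WHERE TEXT_MATCH(teamName, 'Boston AND red')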

    yelim yu

    12/30/2021, 2:10 AM
    Hi, this is Yelim from Korea. There are two questions I want to ask. 1. When there is a null value in a certain row and we want to overwrite it with a not-null value, why is it not possible? We ran into the limitation that "only not-null values can be overwritten in this version". 2. If we want to delete a certain row in a Pinot table, is there any solution to do this (just like ALTER TABLE or dropping certain rows in a normal DB)? Many thanks in advance!