# troubleshooting

    Priyank Bagrecha

    12/02/2021, 9:58 PM
    Hello, I see these messages in the logs a lot:
    2021/12/02 21:32:12.132 ERROR [SegmentBuildTimeLeaseExtender] [pool-4-thread-1] Failed to send lease extension for km_mp_play_startree__63__21__20211202T2127Z
    2021/12/02 21:32:18.330 ERROR [SegmentBuildTimeLeaseExtender] [pool-4-thread-1] Failed to send lease extension for km_mp_play_startree__103__21__20211202T2127Z
    2021/12/02 21:32:24.354 ERROR [SegmentBuildTimeLeaseExtender] [pool-4-thread-1] Failed to send lease extension for km_mp_play_startree__111__21__20211202T2127Z
    and i see that the server is marked as dead in the cluster manager. how can i get around this? thanks in advance.

    Priyank Bagrecha

    12/02/2021, 10:47 PM
    This is another thing I am seeing in the logs:
    2021/12/02 21:34:27.105 ERROR [LLRealtimeSegmentDataManager_km_mp_play_startree__48__1__20211202T1919Z] [km_mp_play_startree__48__1__20211202T1919Z] Holding after response from Controller: {"offset":-1,"streamPartitionMsgOffset":null,"buildTimeSec":-1,"isSplitCommitType":false,"status":"NOT_SENT"}
    Killed
    i don't have the part before this right now but can update once it happens again.

    Ali Atıl

    12/03/2021, 8:48 AM
    Hello, is there any way to run a Pinot cluster inside IntelliJ for debugging purposes?

    Diana Arnos

    12/03/2021, 4:55 PM
    So, I'm trying to configure a table with partial upsert on
    Pinot 0.9.0
    that consumes from a Kafka topic, and I'm experiencing a weird behaviour. Once the second message gets consumed, Pinot does a full upsert instead of a partial one: every field present in the second message gets updated and all the others are set to null (I believe because they are not present in the second message, so the full upsert uses the default null values). Here are the table and schema configs. Schema:
    {
      "schemaName": "responseCount",
      "dimensionFieldSpecs": [
        {
          "name": "responseId",
          "dataType": "STRING"
        },
        {
          "name": "formId",
          "dataType": "STRING"
        },
        {
          "name": "channelId",
          "dataType": "STRING"
        },
        {
          "name": "channelPlatform",
          "dataType": "STRING"
        },
        {
          "name": "companyId",
          "dataType": "STRING"
        },
        {
          "name": "submitted",
          "dataType": "BOOLEAN"
        },
        {
          "name": "deleted",
          "dataType": "BOOLEAN"
        }
      ],
      "dateTimeFieldSpecs": [
        {
          "name": "operationDate",
          "dataType": "STRING",
          "format": "1:MILLISECONDS:SIMPLE_DATE_FORMAT:yyyy-MM-dd'T'HH:mm:ss.SSSZ",
          "granularity": "1:MILLISECONDS"
        },
        {
          "name": "createdAt",
          "dataType": "STRING",
          "format": "1:MILLISECONDS:SIMPLE_DATE_FORMAT:yyyy-MM-dd'T'HH:mm:ss.SSSZ",
          "granularity": "1:MILLISECONDS"
        },
        {
          "name": "deletedAt",
          "dataType": "STRING",
          "format": "1:MILLISECONDS:SIMPLE_DATE_FORMAT:yyyy-MM-dd'T'HH:mm:ss.SSSZ",
          "granularity": "1:MILLISECONDS"
        }
      ],
      "primaryKeyColumns": [
        "responseId"
      ]
    }
    Table:
    {
      "REALTIME": {
        "tableName": "responseCount_REALTIME",
        "tableType": "REALTIME",
        "segmentsConfig": {
          "allowNullTimeValue": false,
          "replication": "1",
          "replicasPerPartition": "1",
          "timeColumnName": "operationDate",
          "schemaName": "responseCount"
        },
        "tenants": {
          "broker": "DefaultTenant",
          "server": "DefaultTenant"
        },
        "tableIndexConfig": {
          "rangeIndexVersion": 1,
          "autoGeneratedInvertedIndex": false,
          "createInvertedIndexDuringSegmentGeneration": false,
          "loadMode": "MMAP",
          "streamConfigs": {
            "streamType": "kafka",
            "stream.kafka.topic.name": "response-count.aggregation.source",
            "stream.kafka.broker.list": "kafka:9092",
            "stream.kafka.consumer.type": "lowlevel",
            "stream.kafka.consumer.prop.auto.offset.reset": "smallest",
            "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
            "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.stream.kafka.KafkaJSONMessageDecoder",
            "realtime.segment.flush.threshold.rows": "0",
            "realtime.segment.flush.threshold.time": "24h",
            "realtime.segment.flush.segment.size": "100M"
          },
          "enableDefaultStarTree": false,
          "enableDynamicStarTreeCreation": false,
          "aggregateMetrics": false,
          "nullHandlingEnabled": true
        },
        "metadata": {},
        "routing": {
          "instanceSelectorType": "strictReplicaGroup"
        },
        "upsertConfig": {
          "mode": "PARTIAL",
          "partialUpsertStrategies": {
            "deleted": "OVERWRITE",
            "deletedAt": "OVERWRITE"
          },
          "hashFunction": "NONE"
        },
        "isDimTable": false
      }
    }
    Here's the first message consumed:
    Key: {"responseId": "52d96a0d-92ea-4103-9ea9-536252324481"}
    Value:
    {
      "responseId": "52d96a0d-92ea-4103-9ea9-536252324481",
      "formId": "7bd28941-f9e4-45f1-a801-5c7d647cc6cd",
      "channelId": "60d11312-0e01-48d8-acce-4871b8d2365b",
      "channelPlatform": "app",
      "companyId": "00ca0142-5634-57e6-8d44-61427ea4b13d",
      "submitted": true,
      "deleted": "false",
      "createdAt": "2021-05-21T12:55:54.000+0000",
      "operationDate": "2021-05-21T12:55:54.000+0000"
    }
    Here's the second message consumed:
    Key: {"responseId": "52d96a0d-92ea-4103-9ea9-536252324481"}
    Value:
    {
      "responseId": "52d96a0d-92ea-4103-9ea9-536252324481",
      "deleted": "true",
      "deletedAt": "2021-10-21T12:55:54.000+0000",
      "operationDate": "2021-05-21T12:55:54.000+0000"
    }
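    A debugging aside, not from the thread: if this Pinot build supports the skipUpsert query option (assumed here, not confirmed for 0.9.0), comparing the merged view against the raw stored records for the key shows what each consumed message actually produced:

    -- merged (upserted) view for the key
    select * from responseCount where responseId = '52d96a0d-92ea-4103-9ea9-536252324481'

    -- raw records, bypassing the upsert merge (skipUpsert assumed available)
    select * from responseCount where responseId = '52d96a0d-92ea-4103-9ea9-536252324481' option(skipUpsert=true)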

    Anish Nair

    12/06/2021, 5:00 PM
    Hi Team, we have configured HDFS as deep storage for the Pinot cluster. Upon checking segment.realtime.download.url, it points to the controller's storage path. URL: http://c81:9000/segments/max_reporting_aggregations/max_reporting_aggregations__0__150__20211127T0641Z returned path: /tmp/data/PinotController/max_reporting_aggregations/max_reporting_aggregations__0__150__20211127T0641Z Shouldn't this segment be stored at the HDFS location? Below are the controller's and server's configs. Controller Config:
    # Pinot Role
    pinot.service.role=CONTROLLER
    
    # Pinot Cluster name
    pinot.cluster.name=MAX-Pinot
    
    # Pinot Zookeeper Server
    pinot.zk.server=c81:2181
    
    # Use hostname as Pinot Instance ID other than IP
    pinot.set.instance.id.to.hostname=true
    
    # Pinot Controller Port
    controller.port=9000
    
    # Pinot Controller VIP Host
    controller.vip.host=c81
    
    # Pinot Controller VIP Port
    controller.vip.port=9000
    
    # Location to store Pinot Segments pushed from clients
    controller.data.dir=hdfs://nameservice1/data/max/poc/hdfs/controller/
    
    controller.task.frequencyPeriod=3600
    controller.local.temp.dir=/opt/pinot/host/
    controller.enable.split.commit=true
    controller.access.protocols.http.port=9000
    controller.helix.cluster.name=MAX-Pinot
    pinot.controller.segment.fetcher.protocols=file,http,hdfs
    pinot.controller.storage.factory.class.hdfs=org.apache.pinot.plugin.filesystem.HadoopPinotFS
    pinot.controller.storage.factory.hdfs.hadoop.conf.path=/opt/pinot/hadoop/etc/hadoop
    pinot.controller.segment.fetcher.hdfs.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher
    pinot.server.grpc.enable=true
    Server Config:
    # Pinot Role
    pinot.service.role=SERVER
    
    # Pinot Cluster name
    pinot.cluster.name=MAX-Pinot
    
    # Pinot Zookeeper Server
    pinot.zk.server=c81:2181
    
    # Use hostname as Pinot Instance ID other than IP
    pinot.set.instance.id.to.hostname=true
    
    # Pinot Server Netty Port for queries
    pinot.server.netty.port=8098
    
    # Pinot Server Admin API port
    pinot.server.adminapi.port=8097
    
    # Pinot Server Data Directory
    pinot.server.instance.dataDir=/opt/pinot/host/data/server/index
    
    # Pinot Server Temporary Segment Tar Directory
    pinot.server.instance.segmentTarDir=/opt/pinot/host/data/server/segmentTar
    
    pinot.server.consumerDir=/opt/pinot/host/data/server/consumer
    pinot.server.instance.enable.split.commit=true
    pinot.server.instance.reload.consumingSegment=true
    pinot.server.segment.fetcher.protocols=file,http,hdfs
    pinot.server.segment.fetcher.hdfs.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher
    pinot.server.storage.factory.class.hdfs=org.apache.pinot.plugin.filesystem.HadoopPinotFS
    pinot.server.storage.factory.hdfs.hadoop.conf.path=/opt/pinot/hadoop/etc/hadoop
    pinot.server.grpc.enable=true
    pinot.server.grpc.port=8090
    pinot.server.query.executor.timeout=100000
    pinot.server.instance.realtime.alloc.offheap=true
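    A hedged way to cross-check where that segment actually lives (not from the thread; assumes the deep store lays segments out under <dataDir>/<tableName>/):

    # ask the controller for the segment's metadata, including the download URL it hands out
    curl "http://c81:9000/segments/max_reporting_aggregations/max_reporting_aggregations__0__150__20211127T0641Z/metadata"

    # check whether the completed segment was copied into the HDFS deep-store path from controller.data.dir
    hdfs dfs -ls hdfs://nameservice1/data/max/poc/hdfs/controller/max_reporting_aggregations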

    Tiger Zhao

    12/06/2021, 10:43 PM
    Hi, I'm trying to use the Trino/Pinot connector but I'm seeing some strange behavior. Queries such as
    select * from pinot.default.table
    seem to just run forever without returning any results. But queries like
    select max(col) from pinot.default.table
    seem to run fine. It looks like I have to do some sort of aggregation or group by in order for the query to run. I can't seem to just select rows. Is this behavior expected?

    Vishal Garg

    12/07/2021, 5:44 AM
    How can I download the table data as CSV? From the UI only 10 records are exported.

    Ahmed Shehata

    12/07/2021, 8:58 AM
    Hello guys, it's a basic question. I know that Pinot currently does not handle null values natively. Reading this article, what I understand is that there is a workaround to at least filter by NULL in a select statement (https://docs.pinot.apache.org/developers/advanced/null-value-support), i.e. where column IS NOT NULL. I used to get errors while ingesting data from Kafka into a table because of this null issue, so I modified my table config to have default values for fields that might contain null to avoid such errors (not sure if that's an OK practice). I did so both in the ingest transform functions and in the schema default values, as follows:
    {
      "columnName": "special_reference",
      "transformFunction": "JSONPATHSTRING(json_format(payload),'$.after.special_reference','null')"
    },
    {
      "columnName": "customer_id",
      "transformFunction": "JSONPATHLONG(json_format(payload),'$.after.customer_id',-2147483648)"
    },
    Schema:
    {
      "name": "customer_id",
      "dataType": "INT",
      "defaultNullValue": -2147483648
    },
    {
      "name": "user_id",
      "dataType": "INT"
    },
    {
      "name": "special_reference",
      "dataType": "STRING",
      "defaultNullValue": "null"
    },
    I also set nullHandlingEnabled to true:
    "tableIndexConfig": {
      "loadMode": "MMAP",
      "nullHandlingEnabled": true
    What I was expecting is that when I filter by IS NULL in the query editor, it would map the 'null' values originating from actual nulls and return the right result. Am I wrong? Please help me. For example:
    select * from next_intentions_trial4 where special_reference IS NOT NULL limit 10
    returns records where special_reference is null, or rather Pinot's 'null' default I should say. I'm using a recent version of Pinot on Kubernetes.
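    Given the defaultNullValue settings above, one workaround (a hedged sketch, not from the thread) is to filter on those placeholder values directly; they are real stored values, so this does not depend on nullHandlingEnabled:

    -- exclude rows where the configured default placeholders were substituted for nulls
    select * from next_intentions_trial4
    where special_reference <> 'null' and customer_id <> -2147483648
    limit 10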

    Jonathan Meyer

    12/07/2021, 12:52 PM
    Hello 👋 Is there any way to know / get notified (e.g. via a Kafka message) when a message is done being ingested and available for query?

    Elon

    12/07/2021, 6:00 PM
    We have an upsert table and are trying to use pool based instance assignment. Only 1 instance from each pool contains all the consuming segments and we get duplicate rows if we set
    COMPLETED
    segments to have
    numInstancesPerPartition
    = 0 (so it can use all instances). Is upsert compatible with pool based instance assignment?

    xtrntr

    12/08/2021, 7:06 AM
    There seems to be a hard limit of 1 million rows returned by Pinot, even when using
    LIMIT
    way beyond that; is there any way to remove this? Currently using the
    latest-jdk11
    image for Pinot on Kubernetes. Unfortunately, I can't seem to use
    IN_SUBQUERY
    to represent the userid set, so on the client side I break my Pinot queries into: 1. a fetch-userids query (using a GROUP BY + HAVING query); sometimes I may get more than a million user ids. 2. the final query.
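    A minimal sketch of that two-step pattern (table and column names here are hypothetical, not from the thread):

    -- step 1: fetch the qualifying user ids
    SELECT userId FROM events GROUP BY userId HAVING COUNT(*) > 100 LIMIT 2000000

    -- step 2: substitute the ids from step 1 on the client side and run the final query
    SELECT userId, SUM(amount) FROM events WHERE userId IN (...) GROUP BY userId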

    Ahmed Shehata

    12/08/2021, 10:29 AM
    Guys, do I need to add any S3-related configuration to the realtime table config? I made it work before but lost my change. I did most of the config related to the controller and server, but I'm not sure if I still need to add something to let my table use S3 as a deep store (so that realtime.segment.download.url is an s3 path).

    troywinter

    12/08/2021, 3:07 PM
    Hi all, are there any docs explaining how to upgrade to a new release? When I upgrade to the new version, the new instance fails to start; it complains that an instance with the same name exists. Currently, I can delete that node from ZooKeeper's LiveInstance section, then drop it, and the new version starts normally. Is there a better way of doing this?

    Priyank Bagrecha

    12/08/2021, 7:53 PM
    Are segments uploaded from all servers to the controller? How can I disable it, assuming it is OK to do that?

    Ali Atıl

    12/09/2021, 1:06 PM
    Hi everyone, is there a character limit for the STRING data type? It seems like the value is truncated to the first 512 characters. Is there any way to configure the string length?
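    For reference, the schema field spec accepts a maxLength attribute (the default for STRING is 512 characters); a minimal sketch with a hypothetical column name:

    {
      "name": "description",
      "dataType": "STRING",
      "maxLength": 2048
    }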

    Tiger Zhao

    12/09/2021, 6:13 PM
    It looks like the MAX function truncates large longs and loses precision. For example,
    select MAX(1639054811930692679) from table
    returns
    1.63905481193069261E18
    . Is this behavior expected?
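    For context on those numbers: MAX aggregates as a 64-bit double, whose 53-bit significand cannot represent every long of this magnitude, so some rounding is expected:

    2^53           =     9,007,199,254,740,992   (integers above this are not all representable as doubles)
    input value    = 1,639,054,811,930,692,679
    nearest double = 1,639,054,811,930,692,608   (double spacing near 1.6e18 is 2^8 = 256)

    which is consistent with the reported 1.63905481193069261E18.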

    Priyank Bagrecha

    12/09/2021, 9:45 PM
    how do i remove the dead controller entries?

    Priyank Bagrecha

    12/09/2021, 10:24 PM
    not really a troubleshooting question.
    The deep store stores a compressed version of the segment files and it typically won't include any indexes.
    Will the index always be in memory? Is the index re-computed when a server loads a segment from the deep store? Is there a way to view the size of the index?

    Tanmay Movva

    12/10/2021, 3:39 AM
    Hello. I am trying out the Pinot Connector in Trino and I am facing an error on a simple select query like
    select * from pinot.default.table limit 10
    This is the stacktrace of the error. Can anyone please help? Did anyone face a similar issue before?
    java.lang.NullPointerException: null value in entry: Server_server-2.server-headless.pinot.svc.cluster.local_8098=null
    	at com.google.common.collect.CollectPreconditions.checkEntryNotNull(CollectPreconditions.java:32)
    	at com.google.common.collect.SingletonImmutableBiMap.<init>(SingletonImmutableBiMap.java:42)
    	at com.google.common.collect.ImmutableBiMap.of(ImmutableBiMap.java:72)
    	at com.google.common.collect.ImmutableMap.of(ImmutableMap.java:119)
    	at com.google.common.collect.ImmutableMap.copyOf(ImmutableMap.java:454)
    	at com.google.common.collect.ImmutableMap.copyOf(ImmutableMap.java:433)
    	at io.trino.plugin.pinot.PinotSegmentPageSource.queryPinot(PinotSegmentPageSource.java:221)
    	at io.trino.plugin.pinot.PinotSegmentPageSource.fetchPinotData(PinotSegmentPageSource.java:182)
    	at io.trino.plugin.pinot.PinotSegmentPageSource.getNextPage(PinotSegmentPageSource.java:150)
    	at io.trino.operator.TableScanOperator.getOutput(TableScanOperator.java:311)
    	at io.trino.operator.Driver.processInternal(Driver.java:387)
    	at io.trino.operator.Driver.lambda$processFor$9(Driver.java:291)
    	at io.trino.operator.Driver.tryWithLock(Driver.java:683)
    	at io.trino.operator.Driver.processFor(Driver.java:284)
    	at io.trino.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:1076)
    	at io.trino.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:163)
    	at io.trino.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:484)
    	at io.trino.$gen.Trino_362____20211126_004329_2.run(Unknown Source)
    	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    	at java.base/java.lang.Thread.run(Thread.java:829)

    Alexander Vivas

    12/10/2021, 10:09 AM
    <!here> Good morning guys. We set up a Pinot cluster a while ago, back when 0.6.0 was the latest version, and recently spawned a new cluster with version 0.8.0 to test it before using it in production. Before we start streaming data into this new cluster, I'd like to know whether having two clusters with low-level Kafka consumers streaming data from the same topic would be an issue. I ask this because I see the current cluster doesn't rely on Kafka consumer groups to keep track of the offsets; on the other hand, in our Kafka provider I see there is a consumer group with an empty name consuming data from the topics, and it seems that one belongs to Pinot.

    Jeff Moszuti

    12/10/2021, 3:44 PM
    Hello, I am kicking the tyres of Pinot (v 0.9.0) by doing the following tutorial https://github.com/npawar/pinot-tutorial. I load 4 records from a CSV file into an offline table named transcript and I get 4 rows returned when executing the following statement
    select * from transcript limit 10
    . As soon as I upload a realtime table config and schema (https://github.com/npawar/pinot-tutorial/tree/master/transcript#upload-realtime-table-config-and-schema) only 3 rows are returned when running the same SQL statement. I do however see 4 rows if I query the offline table e.g.
    select * from transcript_OFFLINE limit 10
    . What could be the reason?

    Weixiang Sun

    12/10/2021, 10:15 PM
    Currently a hybrid table is a combination of an offline table and a realtime table. Is it possible to combine an offline table with an upsert table?

    Sergey Bondarev

    12/13/2021, 11:49 AM
    Hello, I got an error
    zookeeper.request.timeout value is 0. feature enabled=
    Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
    Socket error occurred: localhost/127.0.0.1:2181: Connection refused
    while running the command
    ./pinot-admin.sh StartController
    Any idea what could go wrong?
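    The "Connection refused" to localhost:2181 usually just means nothing is listening there yet. A minimal sketch (not from the thread) using the bundled admin commands:

    # start a local ZooKeeper first
    ./pinot-admin.sh StartZookeeper -zkPort 2181

    # then point the controller at it
    ./pinot-admin.sh StartController -zkAddress localhost:2181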

    Luis Fernandez

    12/13/2021, 4:46 PM
    has anyone gotten this issue before? I’m getting the following exception in one of the brokers:
    2021-12-13 11:31:40	
    java.lang.OutOfMemoryError: Direct buffer memory
    2021-12-13 11:31:40	
    Caught exception while handling response from server: pinot-server-1_R
    We currently have 2 brokers, which are currently doing a lot of garbage collection and I'm unaware as to why. Latency from broker to server has degraded by a lot, but I'm not sure what happened, as we haven't been touching the Pinot cluster lately. We did stop one of our apps from streaming, but that doesn't line up with the spikes in response times.
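    A generic JVM note, not a diagnosis of the root cause here: the direct buffer pool this error refers to is capped by -XX:MaxDirectMemorySize, which can be raised for the broker, e.g. via JAVA_OPTS if it is launched through pinot-admin.sh (sizes below are placeholders):

    export JAVA_OPTS="-Xms4G -Xmx4G -XX:MaxDirectMemorySize=8G"
    ./pinot-admin.sh StartBroker -zkAddress localhost:2181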

    Mahesh babu

    12/14/2021, 11:12 AM
    Hi team, I'm getting an error while running join queries on Pinot data from Presto: ERROR: Query 20211214_102018_00035_4f5rn failed: null value in entry: Server_172.19.0.5_7000=null

    Prashant Pandey

    12/15/2021, 9:30 AM
    Hi team, we recently upgraded Pinot from 0.7.1 -> 0.9.1. We have been noticing an increase in the controllers' CPU usage. Here's the trend for one of the controllers (deployment at 12/15 00:17). Need some help troubleshooting this:

    Mark Needham

    12/15/2021, 10:30 AM
    I think you might be missing the
    controller.zk.str
    property in the conf file
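    For reference, a minimal controller conf sketch with that property (all values are placeholders):

    controller.zk.str=localhost:2181
    controller.host=localhost
    controller.port=9000
    controller.data.dir=/tmp/pinot/controller-data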

    Syed Akram

    12/15/2021, 12:04 PM
    What exactly do we need to do to restore controller segments to a new Pinot cluster?

    Vedran Krtalić

    12/15/2021, 12:10 PM
    Hello 👏, we are having an issue with derived columns. Concretely, we're trying to create derived columns daysTs and hoursTs, but on the consuming segments of the realtime table the derived columns have the min Long value (-2^63). In other segments of the realtime table the derived columns have an OK value. We have also set up a RealtimeToOffline task, and when segments are transferred to offline, daysTs and hoursTs have the same min Long value as on the consuming segments. Ingestion config part for the realtime table:
    "ingestionConfig": {
    "transformConfigs": [
    {
    "columnName": "id",
    "transformFunction": "JSONPATHSTRING(transaction, '$.id', 'null')"
    },
    .
    .
    .,
    {
    "columnName": "ts",
    "transformFunction": "JSONPATHLONG(transaction, '$.ts.date', 0)"
    }{
    "columnName": "daysTs",
    "transformFunction": "toEpochDays(ts)"
    },
    {
    "columnName": "hoursTs",
    "transformFunction": "toEpochHours(ts)"
    }
    Derived columns are in schema (bottom two):
    "dateTimeFieldSpecs": [
    {
    "name": "ts",
    "dataType": "LONG",
    "format": "1:MILLISECONDS:EPOCH",
    "granularity": "1:MILLISECONDS"
    },
    {
    "name": "cat",
    "dataType": "LONG",
    "format": "1:MILLISECONDS:EPOCH",
    "granularity": "1:MILLISECONDS"
    },
    {
    "name": "agn_o",
    "dataType": "LONG",
    "format": "1:MILLISECONDS:EPOCH",
    "granularity": "1:MILLISECONDS"
    },
    {
    "name": "_sourceTimestamp",
    "dataType": "LONG",
    "format": "1:MILLISECONDS:EPOCH",
    "granularity": "1:MILLISECONDS"
    },
    {
    "name": "daysTs",
    "dataType": "LONG",
    "format": "1:DAYS:EPOCH",
    "granularity": "1:DAYS"
    },
    {
    "name": "hoursTs",
    "dataType": "LONG",
    "format": "1:HOURS:EPOCH",
    "granularity": "1:HOURS"
    }
    hoursTs is an aggregate dimension in the star-tree index as follows:
    "starTreeIndexConfigs": [
    {
    "dimensionsSplitOrder": [
    "hoursTs",
    "c",
    "ty",
    "cu",
    "dp",
    "ag"
    ],
    "skipStarNodeCreationForDimensions": [],
    "functionColumnPairs": [
    "SUM__a",
    "COUNT__id",
    "SUM__ra",
    "SUM__rp",
    "COUNT__*"
    ],
    "maxLeafRecords": 10000
    }
    ]
    daysTs is a bloom filter index column as follows:
    "bloomFilterColumns": [
    "id",
    "daysTs"
    ]
    Are we missing something?

    Jonathan Meyer

    12/15/2021, 2:17 PM
    Hello. Was there any change to the
    ingestFromFile
    endpoint between 0.8.0 and 0.9.1? We're now getting HTTP 500 with the error attached below. Maybe an issue related to null support?
    Untitled.txt