# troubleshooting
  • v

    Venkat Boina(VB)

    05/13/2023, 7:12 PM
    @Elon Does passthrough support GROUP BY ROLLUP? I am using version 0.12 and getting an exception because it does not recognise the field declared inside the rollup.
  • c

    Chris Han

    05/15/2023, 7:47 PM
    I'm trying to update an `IDEAL STATE` for a table in Zookeeper. The `IDEAL STATE` JSON I need to update is over 769,000 characters long (there are over 8000 segments), and when I try to update it I'm receiving a `Bad Request` response, presumably because the request data to Zookeeper is too long. I need to manually replace the `DEAD` server IPs with `ALIVE` server IPs. I have over 8000 of these entries:
    ...
    "table_OFFLINE_8697": {
      "Server_10.193.7.135_8098": "ONLINE"
    },
    "table_OFFLINE_8698": {
      "Server_10.193.7.135_8098": "ONLINE"
    },
    ...
    Is there a way I can iteratively update the `IDEAL STATE` that doesn't require me to upload the entire document? Is there another way I can "migrate" the segments from one server to another within the Zookeeper configs?
  • e

    Ethan Huang

    05/16/2023, 3:27 AM
    Hello team, I am trying to add a range index on an existing column that has no dictionary index, and I got the exception shown in the image. After reading the code, I found that Pinot allows creating a range index for no-dictionary columns (`DefaultIndexCreatorProvider#newRangeIndexCreator`, `RangeIndexHandler#handleNonDictionaryBasedColumn`). However, `BitSlicedRangeIndexCreator` relies on the min and max values of the indexed column, but `minValue` and `maxValue` are both `null` in `ColumnMetadata` when the column has no dictionary. Is this a bug, or is additional configuration needed to avoid the exception? BTW, the version is the 0.12.1 release. Thanks.
  • v

    Venkat Boina(VB)

    05/16/2023, 7:44 AM
    @channel Does passthrough support GROUP BY ROLLUP? I am using version 0.12 and getting an exception because it does not recognise the field declared inside the rollup. @Elon or @Mayank
  • l

    Lee Wei Hern Jason

    05/16/2023, 8:44 AM
    Hi Team, is it possible to get the auth token from environment variables in the extra configs? I tried using ${} but the value didn't get assigned in the pod.
    envFrom:
        - secretRef:
            name: pinot-secrets
    
        extra:
          configs: |-
            pinot.set.instance.id.to.hostname=true
            pinot.minion.storage.factory.class.s3=org.apache.pinot.plugin.filesystem.S3PinotFS
            pinot.minion.storage.factory.s3.region=ap-southeast-1
            pinot.minion.segment.fetcher.protocols=file,http,s3
            pinot.minion.segment.fetcher.s3.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher
            segment.fetcher.auth.token=${PINOT_SEGMENT_FETCHER_AUTH_TOKEN}
            task.auth.token=${PINOT_SEGMENT_FETCHER_AUTH_TOKEN}
  • l

    Lvszn Peng

    05/16/2023, 12:19 PM
    Hi team, when I upgraded Pinot from 0.9.3 to 0.12.1, the pinot-server showed me the error `Exception in thread "main" java.lang.NoSuchFieldError: JAVA_11`. Is the Java version too low?
  • e

    Ehsan Irshad

    05/16/2023, 1:21 PM
    Hi Team. What are the general guidelines for tuning queries? Is my method below correct? (I am not considering the underlying resources here, such as the number of brokers and servers or node sizes.)
    1. Reduce `numSegmentsProcessed` through segment pruning on the broker.
    2. Reduce `numEntriesScannedPostFilter` by adding more filters to the query.
    3. Because of 2, `numEntriesScannedInFilter` will increase, so bring it to 0 by adding indexes.
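To compare queries against this checklist, it can help to pull the three counters out of each query response and look at them side by side. The sketch below assumes the broker's JSON response has already been parsed into a Python dict and carries the counters as top-level fields under the names used above. The function name and the hint wording are illustrative only, not official tuning guidance.

```python
def summarize_query_stats(response):
    """Extract the three counters discussed above and attach rough hints."""
    keys = (
        "numSegmentsProcessed",
        "numEntriesScannedInFilter",
        "numEntriesScannedPostFilter",
    )
    # Missing counters default to 0 so partial responses do not raise.
    stats = {key: int(response.get(key, 0)) for key in keys}

    hints = []
    if stats["numEntriesScannedInFilter"] > 0:
        hints.append("filter scans entries: check indexes on the filtered columns")
    if stats["numEntriesScannedPostFilter"] > 0:
        hints.append("entries are read after filtering: check filters and selected columns")
    return stats, hints


if __name__ == "__main__":
    # Made-up numbers standing in for a real broker response.
    stats, hints = summarize_query_stats(
        {
            "numSegmentsProcessed": 12,
            "numEntriesScannedInFilter": 50000,
            "numEntriesScannedPostFilter": 300,
        }
    )
    print(stats)
    for hint in hints:
        print("-", hint)
```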
  • d

    Deepak Arumugham

    05/16/2023, 1:43 PM
    Hi all, we are evaluating a use case of running full-text queries on our Parquet files (TBs) in GCS buckets. Is Pinot the right solution for this use case? Can we use GCS as our deep storage in Pinot?
  • c

    Chris Han

    05/17/2023, 3:35 PM
    I run out of Java heap space when executing queries via the Query Console using v2. Is there guidance on how to appropriately size the heap space? This error is from my server logs
    Exception in thread "idle-connection-reaper" java.lang.OutOfMemoryError: Java heap space
  • d

    Deepak Arumugham

    05/17/2023, 10:54 PM
    We are trying to ingest Parquet files from GCS buckets. And we are planning to use GCS as our deepstore. We've installed Pinot via helm charts. Our Configmap would look like this
    controller.data.dir=gs://pinot-data-dir
    pinot.controller.storage.factory.class.gs=org.apache.pinot.plugin.filesystem.GcsPinotFS
    pinot.controller.segment.fetcher.protocols=file,http,gs
    pinot.controller.segment.fetcher.gs.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher
    Even though we have provided the correct GCS data directory for the controller, the segments are getting created locally on the Pinot cluster's disk, and soon we run into java.lang.OutOfMemoryError: Java heap space. Our Parquet files are in the 50-500 MB range. We were under the impression that on ingestion the data would be processed and the segments created in the GCS bucket. Am I missing something here? How can we solve this? Any pointers would be helpful.
  • m

    Michael Roman Wengle

    05/18/2023, 5:34 AM
    We face the following issues with the `RealtimeToOfflineSegmentsTask`:
    no native library is found for os.name=Linux and os.arch=aarch64
    null
    java.lang.NullPointerException
    	at xerial.larray.impl.LArrayLoader$NativeLib.extractLibraryFile(LArrayLoader.java:182)
    [...]
    java.lang.UnsatisfiedLinkError: 'long xerial.larray.impl.LArrayNative.mmap(long, int, long, long)'
    	at xerial.larray.impl.LArrayNative.mmap(Native Method) ~[pinot-all-0.13.0-SNAPSHOT-jar-with-dependencies.jar:0.13.0-SNAPSHOT-12d86902a84d4bc78b6f2f7bc8bd002659ee61cb]
    The minions are deployed on Graviton nodes in k8s (official Pinot Helm chart). Did anyone experience the same problem? Is there a way to solve the issue or do we need to switch to x86 k8s nodes?
    no native library is found for os.name Linux and os.arch aarch64.log
  • e

    Eaugene Thomas

    05/18/2023, 7:48 AM
    Hi team, do we have any API in the Pinot controller to get the table size before replication? The current table size API gives the total size of the table including replication.
  • d

    Deena Dhayalan

    05/18/2023, 8:31 AM
    Hi team, I have a question about how memory is distributed during segment creation (batch ingestion, done the Hadoop way). How much heap memory (on 32 GB RAM) is needed for a given number of threads specified in segmentCreationJobParallelism, with files of approx. 500 MB each? Let's say I have 10 ORC files in my rawdata folder to ingest.
  • t

    Tommaso Peresson

    05/18/2023, 11:48 AM
    Hello there, is there a way to set `ConcurrentTasksPerWorker` in the minion config at runtime for a `SegmentGenerationAndPushTask` task? Thanks
  • t

    Tanmay Varun

    05/18/2023, 5:16 PM
    One small bug in command documentation, page https://docs.pinot.apache.org/basics/getting-started/kubernetes-quickstart
    helm install -n pinot-quickstart kafka kafka/kafka --set replicas=1,zookeeper.image.tag=latest
    replicas --> replicaCount
  • d

    Deepak Arumugham

    05/19/2023, 5:51 AM
    Team, I used Spark to ingest data and found a strange case of a segment turning to BAD state after ingestion.
    Caught Exception in state transtition from OFFLINE -> ONLINE for resource
    Can you please provide any insights on this? Once the ingestion is complete, the segment goes to BAD state.
    And on trying to query, we are getting
    {
    "errorCode": 305,
    "message": "null:\n1 segments unavailable: [xyz_OFFLINE_2021-11-16-17_2022-09-21-00_0]"
    }
  • s

    Sanjay

    05/19/2023, 1:07 PM
    Hi, I am running a `standalone` ingestion and it tries to copy the input files into the `/tmp` directory, which eventually causes a disk space issue. Is there any parameter to change this to some other `mount` path?
  • t

    Tommaso Peresson

    05/19/2023, 4:04 PM
    Hello, I’m trying to optimise Minion ingestion with GCS as deep store. Currently, scheduling `SegmentGenerationAndPushTask` tasks takes minutes and I don’t know how to debug or optimise it. I thought wildcards in the input format were triggering a long scan on GCS (as it is a flat FS), but removing them doesn’t help. Can someone please help me with a checklist of things to look at to optimise this process?
  • j

    J Vossler

    05/19/2023, 5:16 PM
    We are using Prometheus/Promtail and storing metrics data in Mimir to be displayed in Grafana. Our Pinot javaagent is set up to use port 8008 instead of 8888. Are there any standards for which port to use for metrics scrapes? Or do we just pick anything that is not in use?
  • r

    Raveendra Yerraguntla

    05/20/2023, 6:59 PM
    Hello. This question is about performance. The query below takes almost 10 seconds on GCP with 3 n2-standard-2 machines. All the fields are string fields except timestamp, which is timestamp indexed. What kind of indexes do I need to build for better performance? I have many more time series queries to display from Superset, but all are timing out. I am looking for guidance on index creation and performance improvement. - SELECT query,product_name, COUNT(*) FROM "default"."clicksTable" WHERE product_name != 'null' GROUP BY product_name, query ORDER BY COUNT(*) DESC LIMIT 10000;
  • t

    Tanmay Varun

    05/20/2023, 10:07 PM
    Hi team, one query: will setting this function name in the table config work correctly with Apache Kafka's partitioning logic, assuming the same number of partitions? (Assuming the Apache Kafka default partitioner: MurmurHash(key) % numPartitions.)
    "segmentPartitionConfig": {
      "columnPartitionMap": {
        "merchantId": {
          "functionName": "Murmur",
          "numPartitions": 36
        }
      }
    },
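One way to check this empirically is to compute, outside both systems, the partition the Kafka side would pick for a few sample keys and compare it with the partition recorded for the corresponding rows. The sketch below is a Python transcription of the 32-bit MurmurHash2 and the "mask to positive, then modulo" step as I understand Kafka's default partitioner for keyed records. It is an assumption-laden sketch: it presumes keys are serialized as UTF-8 strings, the function names are mine, and it says nothing authoritative about what Pinot's `Murmur` function computes.

```python
MASK32 = 0xFFFFFFFF


def murmur2(data: bytes) -> int:
    """32-bit MurmurHash2 (seed 0x9747b28c), returned as an unsigned int."""
    m = 0x5BD1E995
    r = 24
    length = len(data)
    h = (0x9747B28C ^ length) & MASK32

    # Mix the input four bytes at a time, little-endian.
    body_end = length - length % 4
    for i in range(0, body_end, 4):
        k = int.from_bytes(data[i:i + 4], "little")
        k = (k * m) & MASK32
        k ^= k >> r
        k = (k * m) & MASK32
        h = (h * m) & MASK32
        h ^= k

    # Fold in the remaining 0-3 bytes.
    tail = data[body_end:]
    if len(tail) == 3:
        h ^= tail[2] << 16
    if len(tail) >= 2:
        h ^= tail[1] << 8
    if len(tail) >= 1:
        h ^= tail[0]
        h = (h * m) & MASK32

    h ^= h >> 13
    h = (h * m) & MASK32
    h ^= h >> 15
    return h


def kafka_style_partition(key: str, num_partitions: int) -> int:
    """Partition for a string key: drop the sign bit, then take the modulo."""
    # Assumption: the producer serializes the key as a UTF-8 string.
    return (murmur2(key.encode("utf-8")) & 0x7FFFFFFF) % num_partitions


if __name__ == "__main__":
    for merchant_id in ("merchant-1", "merchant-2", "merchant-3"):  # made-up keys
        print(merchant_id, kafka_style_partition(merchant_id, 36))
```

If the producer uses a different key serializer (for example, a numeric or Avro key), the bytes being hashed differ and the comparison would need to be adjusted accordingly.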
  • a

    Ayush Chauhan (Tech)

    05/21/2023, 6:09 AM
    Can we please add support for PreparedStatement in the go client as we have for the Java client?
  • a

    Abhijeet Kushe

    05/21/2023, 1:35 PM
    We are using a realtime table with Kinesis. Yesterday we increased our shards from 2 to 4, but we are not seeing 4 consumers, only 2 at a time. This is our configuration:
  • a

    Abhijeet Kushe

    05/21/2023, 1:35 PM
    {
      "tableName": "workflowEvents",
      "tableType": "REALTIME",
      "segmentsConfig": {
        "timeColumnName": "eventTimestamp",
        "timeType": "MILLISECONDS",
        "schemaName": "workflowEvents",
        "replicasPerPartition": "4",
        "retentionTimeUnit": "DAYS",
        "retentionTimeValue": "1826",
        "segmentPushType": "APPEND"
      },
      "tenants": {
        "broker": "DefaultTenant",
        "server": "DefaultTenant"
      },
      "tableIndexConfig": {
        "loadMode": "MMAP",
        "streamConfigs": {
          "streamType": "kinesis",
          "stream.kinesis.topic.name": "prod-rel-cdp-dl-workflow-metrics-stream",
          "region": "us-east-1",
          "shardIteratorType": "LATEST",
          "stream.kinesis.consumer.type": "lowlevel",
          "stream.kinesis.fetch.timeout.millis": "30000",
          "stream.kinesis.decoder.class.name": "org.apache.pinot.plugin.stream.kafka.KafkaJSONMessageDecoder",
          "stream.kinesis.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kinesis.KinesisConsumerFactory",
          "realtime.segment.flush.threshold.size": "1000000",
          "realtime.segment.flush.threshold.time": "1h"
        }
      },
      "upsertConfig": {
        "mode": "FULL"
      },
      "routing": {
        "instanceSelectorType": "strictReplicaGroup"
      },
      "metadata": {
        "customConfigs": {}
      }
    }
  • a

    Abhijeet Kushe

    05/21/2023, 1:36 PM
    We also made a server properties change. The default is 1 (https://docs.pinot.apache.org/configuration-reference/server). Is that related?
    pinot.server.instance.max.parallel.refresh.threads=3
  • s

    Sid

    05/22/2023, 10:20 AM
    Hi Team, I have been trying to implement this Groovy function to transform the timestamp column in events, but it is not getting saved in the table config and throws an error: "transformConfigs": [ { "columnName": "event_time", "transformFunction": "groovy('{\"returnType\":\"TIMESTAMP\", \"isSingleValue\":true}','def truncated = event_timestamp.substring(0, event_timestamp.lastIndexOf('.') + 3);return FromDateTime(truncated, 'yyyy-MM-dd''T''HHmmss.SSS')', event_timestamp)" } ] I updated the Groovy settings on the broker, controller, and server and restarted all of them, yet the error still says "Groovy Transform function has been disabled". I would appreciate any insight.
  • t

    Tanmay Varun

    05/22/2023, 9:58 PM
    Hi team, I switched to a new Kafka cluster midway, and now my Pinot servers are not able to read from it because they are looking for a higher offset. How do I reset them?
  • s

    Sonit Rathi

    05/23/2023, 4:02 AM
    Please help. I added new columns, paused consumption, reloaded the segments, and resumed consumption, but segments are not getting created. I'm getting the error below.
  • e

    Ehsan Irshad

    05/23/2023, 7:07 AM
    Hi Team. I am trying to understand the sortedIndex; it seems it could add a lot of value, but I didn't manage to get any performance benefit. I have a few questions.
    1. Does it work for realtime tables? For both consuming and committed segments? Will it be created for all the segments when I reload the table after modifying the config?
    2. Does it only work for offline tables?
    3. Will it work automatically for the online to offline flow, or do we need to sort the data first?
    4. I want to sort the data based on the city column in my table, which is realtime but can be converted to hybrid.
  • j

    Jatin

    05/23/2023, 9:56 AM
    Hi Team, I have a column update_date which contains 'null'. I want to get the day from it using day(FromDateTime(updated_date, 'yyyy-MM-dd')), but it shows this error --> [ { "message": "QueryExecutionError:\nProcessingException(errorCode:450, messageInternalError\njava.lang.NullPointerException)\n\tat org.apache.pinot.common.response.ProcessingException.deepCopy(ProcessingException.java:146)\n\tat org.apache.pinot.common.exception.QueryException.getException(QueryException.java:172)\n\tat org.apache.pinot.common.exception.QueryException.getException(QueryException.java:167)", "errorCode": 200 } ]