# troubleshooting
  • p

    Pratik Bhadane

    01/04/2023, 6:53 AM
    Hello Team, could you please answer the following questions: 1. How can we enable full support for SQL joins and subqueries in Apache Pinot? We are currently on Pinot 0.12.0 and cannot execute subqueries/joins. 2. Does using Presto give us support for SQL joins, subqueries, and nested queries against Pinot's realtime and offline tables?
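    For illustration, a hedged sketch: in recent Pinot releases joins and subqueries are handled by the multi-stage (V2) query engine rather than the classic engine, so a query like the one below would only work once that engine is enabled and the query is routed to it. The option name and whether SET statements are honored should be verified for 0.12.0, and the table and column names here are hypothetical:
    Copy code
    -- Sketch only: assumes the multi-stage (V2) engine is enabled on the cluster.
    SET useMultistageEngine = true;
    SELECT o.order_id, c.customer_name
    FROM orders o
    JOIN customers c ON o.customer_id = c.customer_id
    LIMIT 10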
  • m

    Mostafa Ghadimi

    01/04/2023, 12:45 PM
    We are experiencing two issues with Apache Pinot at the same time. We set the segment max size to 500M as follows:
    Copy code
    "realtime.segment.flush.threshold.rows": "0",
    "realtime.segment.flush.threshold.time": "1h",
    "realtime.segment.flush.segment.size": "500M"
    Issue 1: After segments change state from CONSUMING to COMPLETED, their size is about 200M, not 500M (the segment creation duration is definitely less than 1 hour). Issue 2: The segments are stored at
    /var/pinot/server/data/index
    and not at
    /var/pinot/server/data/segment
    . Here is the map of volumes in docker-compose file:
    Copy code
    - ./data/server_data/segment:/var/pinot/server/data/segment
    - ./data/server_data/index:/var/pinot/server/data/index
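    For illustration, a hedged note: the Pinot server's local segment directory is controlled by server instance configs rather than the Docker volume mapping, and the default data directory path ends in index, which may explain the observed location. Newer docs also spell the size threshold as realtime.segment.flush.threshold.segment.size, which is worth double-checking for your version. A sketch of the relevant server properties (the paths are examples):
    Copy code
    # Sketch only: server instance configs (e.g. pinot-server.conf); paths are examples.
    pinot.server.instance.dataDir=/var/pinot/server/data/segment
    pinot.server.instance.segmentTarDir=/var/pinot/server/data/segmentTar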
  • h

    Huaqiang He

    01/04/2023, 12:46 PM
    Hi team, I get an execution error when running a query that uses the function
    lastwithtime
    Copy code
    select str11 as job_id, 
    lastwithtime(str14,event_timestamp,'STRING') as query
    from telemetry_events 
    where epoch_minute between toEpochMinutes(now()-60000*24*7) and toEpochMinutes(now())  
    group by str11
    limit 10000
    
    execute query error: QueryExecutionError: java.lang.RuntimeException: Caught exception while building data table. at org.apache.pinot.core.operator.blocks.InstanceResponseBlock.<init>(InstanceResponseBlock.java:46) at org.apache.pinot.core.operator.InstanceResponseOperator.getNextBlock(InstanceResponseOperator.java:118) at org.apache.pinot.core.operator.InstanceResponseOperator.getNextBlock(InstanceResponseOperator.java:39) at org.apache.pinot.core.operator.BaseOperator.nextBlock(BaseOperator.java:39) ... Caused by: java.nio.BufferOverflowException at java.base/java.nio.HeapByteBuffer.put(HeapByteBuffer.java:221) at java.base/java.nio.ByteBuffer.put(ByteBuffer.java:914) at org.apache.pinot.segment.local.customobject.StringLongPair.toBytes(StringLongPair.java:46) at org.apache.pinot.core.common.ObjectSerDeUtils$11.serialize(ObjectSerDeUtils.java:438)
    where str14 (query) is a SQL-like string. It looks like a character-encoding issue. I can work around it with
    Copy code
    decodeUrl(lastwithtime(encodeUrl(str14),event_timestamp,'STRING')) as query
  • s

    Sevvy Yusuf

    01/04/2023, 1:59 PM
    Hi team, is anyone here using spot instances for their Pinot infrastructure? I would be interested in hearing how you manage the trade-off between availability and cost.
  • c

    chandarasekaran m

    01/05/2023, 3:29 PM
    @here I have cloned Pinot from master and am running it locally. I want to enable the V2 (multi-stage) query engine to explore the latest features. In which config files (and which directory) should I add the properties below?
    Copy code
    "pinot.multistage.engine.enabled": "true",
    "pinot.server.instance.currentDataTableVersion": "4",
    "pinot.query.server.port": "8421",
    "pinot.query.runner.port": "8442"
  • c

    chandarasekaran m

    01/05/2023, 3:30 PM
    @Neha Pawar
  • t

    Thomas Steinholz

    01/05/2023, 10:47 PM
    Hello Team, I have a Pinot cluster with over 30 servers, but no matter what, the segments always overload 2 or so servers and keep the rest far under-utilized (close to 20% of the PVC), while the over-used servers bring down the cluster by trying to go beyond 100% PVC usage. Are there recommendations for improving the balancing of these segments across servers, and for the recovery process for Pinot servers at 100% PVC utilization? Untagging and rebalancing only does so much and takes a very, very long time to make any progress.
  • s

    Shreeram Goyal

    01/06/2023, 10:44 AM
    Hi, we have been trying to do batch ingestion via Spark using the Parquet file format. We found that the time columns are converted to UTC rather than kept in the actual timezone. Is there any workaround for this?
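    For illustration, a hedged sketch of one possible workaround: an ingestion transform that renders the epoch value in a specific timezone at ingestion time. The column names are hypothetical and the exact ToDateTime signature should be verified for your version:
    Copy code
    "ingestionConfig": {
      "transformConfigs": [
        {
          "columnName": "event_time_local",
          "transformFunction": "ToDateTime(event_time_ms, 'yyyy-MM-dd HH:mm:ss', 'Asia/Kolkata')"
        }
      ]
    }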
  • m

    Mostafa Ghadimi

    01/07/2023, 1:11 PM
    Problem description: after testing Pinot on our development nodes, where it worked properly, we wanted to switch the production nodes to the new Pinot we had been working on. The legacy Pinot (on the production nodes) already had access to the Kafka nodes. After a short downtime and deploying the new version of Pinot on production, we faced this error during table creation:
    Copy code
    org.apache.pinot.spi.stream.TransientConsumerException: org.apache.pinot.shaded.org.apache.kafka.common.errors.TimeoutException: Failed to get offsets by times in 5001ms"
    Would someone help us fix this issue? What has been done: • The connection to the Kafka nodes has been checked. • The legacy version of Pinot had the same table description for Kafka data ingestion. P.S.: We are using Ansible to deploy Pinot, which has been open-sourced at this link.
  • m

    Mostafa Ghadimi

    01/07/2023, 1:14 PM
    @channel
  • c

    Caleb Shei

    01/08/2023, 5:21 PM
    A bug? Trying to do a batch data import via Spark on our secured Hadoop cluster (i.e., Kerberos enabled) and getting the following error:
    Copy code
    23/01/08 16:52:57 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0) (ph-jp98v52.infra.adtechlabs.com executor 2): java.lang.RuntimeException: java.lang.RuntimeException: Failed to authenticate user principal [i-cshei@INFRA.ADTECHLABS.COM] with keytab [/home/i-cshei/.keytab]
            at org.apache.pinot.spi.filesystem.PinotFSFactory.register(PinotFSFactory.java:77)
            at org.apache.pinot.plugin.ingestion.batch.spark3.SparkSegmentGenerationJobRunner$1.call(SparkSegmentGenerationJobRunner.java:349)
            at org.apache.pinot.plugin.ingestion.batch.spark3.SparkSegmentGenerationJobRunner$1.call(SparkSegmentGenerationJobRunner.java:342)
            at org.apache.spark.api.java.JavaRDDLike.$anonfun$foreach$1(JavaRDDLike.scala:352)
            at org.apache.spark.api.java.JavaRDDLike.$anonfun$foreach$1$adapted(JavaRDDLike.scala:352)
            at scala.collection.IterableOnceOps.foreach(IterableOnce.scala:575)
            at scala.collection.IterableOnceOps.foreach$(IterableOnce.scala:573)
            at org.apache.spark.InterruptibleIterator.foreach(InterruptibleIterator.scala:28)
            at org.apache.spark.rdd.RDD.$anonfun$foreach$2(RDD.scala:1003)
            at org.apache.spark.rdd.RDD.$anonfun$foreach$2$adapted(RDD.scala:1003)
            at org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2268)
            at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
            at org.apache.spark.scheduler.Task.run(Task.scala:136)
            at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:548)
            at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1504)
            at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:551)
            at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
            at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
            at java.base/java.lang.Thread.run(Thread.java:829)
    Caused by: java.lang.RuntimeException: Failed to authenticate user principal [i-cshei@INFRA.ADTECHLABS.COM] with keytab [/home/i-cshei/.keytab]
            at org.apache.pinot.plugin.filesystem.HadoopPinotFS.authenticate(HadoopPinotFS.java:288)
            at org.apache.pinot.plugin.filesystem.HadoopPinotFS.init(HadoopPinotFS.java:72)
            at com.valassis.plugin.filesystem.HadoopValassisFS.init(HadoopValassisFS.java:48)
            at org.apache.pinot.plugin.filesystem.HadoopPinotFS.init(HadoopPinotFS.java:65)
            ... 18 more
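    For illustration, a hedged observation: the failure happens inside a Spark executor, so one common cause is that the keytab only exists on the submitting host. Shipping it with the job is one thing to try; whether the configured keytab path then resolves on the executors still needs checking. A sketch (paths mirror the log and are examples):
    Copy code
    # Sketch only: distributes the keytab to executor working directories.
    spark-submit \
      --files /home/i-cshei/.keytab \
      --class org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand \
      ...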
  • r

    Raluca Lazar

    01/09/2023, 6:20 PM
    Hi all, I need to scale down the number of replicas for one of our servers from 6 to 3, and I'm having a hard time with the segment rebalancing command after scaling down. Here is some info on the environment: • 3 Kafka partitions ingesting into this table / Pinot replication factor is 2 (
    "replicasPerPartition": "2"
    ) • this is the REALTIME portion of a hybrid table, and we have a realtime-to-offline job set up to run every day. I've read through this and followed all the steps, yet I still end up in a situation where Pinot thinks my segments are distributed across 6 server instances. The CURL command to rebalance returns this message:
    "description": "Instance reassigned, table is already balanced"
    and the
    segmentAssignment
    section shows segments distributed on 6 servers. Am I missing anything?
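    For illustration, a hedged sketch of a rebalance call that also reassigns instances, which is typically needed after changing the set of servers (the controller address, table name, and parameter defaults are placeholders to verify for your version):
    Copy code
    # Sketch only: controller address and table name are placeholders.
    curl -X POST "http://CONTROLLER_HOST:9000/tables/myTable/rebalance?type=REALTIME&dryRun=false&reassignInstances=true&includeConsuming=true&downtime=false&minAvailableReplicas=1"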
  • t

    Thomas Steinholz

    01/09/2023, 6:52 PM
    Hello all, I have some Pinot servers that I untagged after they hit 100% PVC utilization, and usage is not dropping. I have already rebalanced and moved as many tables as I could off of those two 100% servers, but I think they are still being used for queries by the broker. What is the best way to recover servers that have maxed-out volumes? Separately, I needed to add a new column to some of the bigger tables and have reloaded all the segments, yet no matter what, these segments never generate the new column. I just see large negative values for the LONG column, and the tables report something like:
    There are <many thousands of> invalid segment/s. This usually means that they were created with an older schema. Please reload the table in order to refresh these segments to the new schema.
    The smaller tables seem to have generated valid values for all segments, but the bigger tables can't seem to reload any more segments. I assume this is related to the two filled-up servers not being able to reload the segments while also not moving them to other servers (which are mostly under 50% utilization).
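    For illustration, a hedged sketch of the reload call for a whole table (the controller address and table name are placeholders); whether a reload can succeed on servers with full volumes is a separate question:
    Copy code
    # Sketch only: placeholders for controller address and table name.
    curl -X POST "http://CONTROLLER_HOST:9000/segments/myTable_REALTIME/reload"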
  • v

    Vincent Vu

    01/09/2023, 10:46 PM
    Hello all, I'm trying to ingest data from a Kerberos-secured Kafka cluster into a Pinot realtime table, but I'm having a lot of issues. Can anyone with experience help, or point me to some kind of guide to follow?
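    For illustration, a hedged sketch of the Kafka client security settings that have to reach Pinot's consumer; whether they go directly into streamConfigs or behind a stream.kafka.consumer.prop. prefix depends on the Pinot version, and the topic, broker, service name, keytab path, and principal below are placeholders:
    Copy code
    "streamConfigs": {
      "streamType": "kafka",
      "stream.kafka.topic.name": "my-topic",
      "stream.kafka.broker.list": "broker1:9092",
      "security.protocol": "SASL_PLAINTEXT",
      "sasl.mechanism": "GSSAPI",
      "sasl.kerberos.service.name": "kafka",
      "sasl.jaas.config": "com.sun.security.auth.module.Krb5LoginModule required useKeyTab=true keyTab=\"/etc/security/pinot.keytab\" principal=\"pinot@EXAMPLE.COM\";"
    }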
  • p

    Pyry Kovanen

    01/10/2023, 10:34 AM
    TLS/SSL settings and the Helm chart. Hi all, I've followed the Configuring TLS/SSL (TLS-only) guide with the Helm chart. I have a couple of questions regarding this: • When e.g.
    pinot-server
    starts it prints a line:
    Starting server admin application on: http://0.0.0.0:8097, https://0.0.0.0:7443
    . Why is that, given that the settings are explicitly disabling the
    http
    ? ◦ pinot.server.adminapi.access.protocols=https ◦ pinot.server.adminapi.access.protocols.https.port=7443 ◦ pinot.server.netty.enabled=false ◦ pinot.server.nettytls.enabled=true ◦ pinot.server.nettytls.port=8098 ◦ This happens with other components as well. • Helm chart
    values.yaml
    does not support setting TLS/SSL related ports on kubernetes services, it's hardwired to the default non-secure ports, like for
    pinot-controller
    the port is
    9000
    instead of
    9443
    used in the settings. To change these I must either delete the unnecessary services or use
    kubectl patch
    to change the ports right after
    helm install
    , as a quick workaround. Is there something I missed here? • There is no built-in way to secure Zookeeper traffic with the chart, it seems. Is this due to the recommendation to use the Zookeeper Operator instead? In general: • Is the Helm chart suitable for production usage? Possibly if I replace Zookeeper with a Zookeeper Operator-managed installation? The Pinot version I'm using is
    apachepinot/pinot:0.11.0-SNAPSHOT-a6f5e89-20221207
    Thanks already in advance!
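    For illustration, a hedged sketch of the kubectl patch workaround mentioned above (the service name, namespace, and port index depend on the release and are assumptions):
    Copy code
    # Sketch only: service name, namespace and port index depend on your Helm release.
    kubectl patch svc pinot-controller -n pinot --type=json \
      -p='[{"op":"replace","path":"/spec/ports/0/port","value":9443}]'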
  • e

    Elon

    01/11/2023, 1:07 AM
    Hi, we wanted to know if there are any issues with enabling groovy: there is a config
    controller.disable.ingestion.groovy
    which defaults to false. Some users here want to use Groovy transforms, and we wanted to know if there are any risks or recommendations (i.e. do not use Groovy; transform via Flink or a built-in function where applicable). Thanks!
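    For illustration, a hedged sketch of what a Groovy ingestion transform looks like (the column names are hypothetical); the usual caveat is that Groovy scripts run arbitrary code at ingestion time, which is why some deployments disable them:
    Copy code
    "ingestionConfig": {
      "transformConfigs": [
        {
          "columnName": "fullName",
          "transformFunction": "Groovy({firstName + ' ' + lastName}, firstName, lastName)"
        }
      ]
    }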
  • r

    Rohit Anilkumar

    01/11/2023, 9:29 AM
    [SOLVED] Hey, I am trying to push metrics from Pinot using JMX. I exported the following to ALL_JAVA_OPTS:
    -javaagent:/home/ec2-user/pinot/jmx/jmx_prometheus_javaagent-0.17.2.jar=8008:/home/ec2-user/pinot/jmx/pinot.yml
    but when I check
    public_IP:8008/metrics
    , it gives me a refused-to-connect error. I tried
    curl localhost:8008/metrics
    from the EC2 node and its giving me
    curl: (7) Failed to connect to localhost port 8008 after 0 ms: Connection refused
    -> Does this mean nothing is being pushed to that port? Am I missing something here?
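    For illustration, a hedged way to check whether the agent actually attached and is listening (this assumes the environment variable really ended up in the JVM arguments of the running Pinot process):
    Copy code
    # Sketch only: verify the exporter port is bound and the agent is in the JVM args.
    ss -tlnp | grep 8008
    ps aux | grep jmx_prometheus_javaagent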
  • s

    Sachin Mittal Consultant

    01/12/2023, 5:17 AM
    Hello folks, I am facing a particular problem reading from a Kinesis stream, and it has to do with de-aggregation. The records which I get from
    KinesisConsumer
    are aggregated records which were published by some other KPL, so the consumer needs to do de-aggregation in order to process them further. Refer: https://docs.aws.amazon.com/streams/latest/dev/kinesis-kpl-consumer-deaggregation.html Now I don't see this happening in Pinot's
    KinesisConsumer
    and it looks like we are using AWS SDK v2 for the Kinesis client, so I am not sure how we can de-aggregate them. Any thoughts?
  • e

    Ehsan Irshad

    01/12/2023, 11:21 AM
    Hi Team, we are trying to add users using the Swagger API. We can see they are updated in the Zookeeper property store, but we cannot use them to authenticate. Are we doing something wrong here? We don't want to maintain the users in the controller config, etc. Referring to this issue: https://github.com/apache/pinot/pull/8314
  • p

    Pratik Bhadane

    01/13/2023, 11:54 AM
    Hello Team, Pinot version: 0.12. I have 3 time columns in MILLISECONDS epoch, of which one column (Updated_Date) has null values. When I ingest data with the following schemas, records with a null Updated_Date are not inserted into the Pinot realtime table. I am only able to ingest records with a null Updated_Date when the field is a DimensionField of type STRING. Is this expected behavior? I have tried adding "allowNullTimeValue": true in the table config, but this field is ignored when the table config file is added and is not shown in the Pinot web UI TABLE CONFIG. So does it mean records with NULL values of any type other than STRING will not get inserted?
    1. =================== { "schemaName": "GoNoGoCustomerApplication_v3_2", "dimensionFieldSpecs": [ { "name": "Application_ID", "dataType": "STRING" }, { "name": "Application_Stage", "dataType": "STRING" }, { "name": "op", "dataType": "STRING" }, { "name": "ord", "dataType": "STRING" } ], "metricFieldSpecs": [ { "name": "Requested_Amount", "dataType": "DOUBLE" } ], "dateTimeFieldSpecs": [ { "name": "ts_ms", "dataType": "LONG", "format" : "1MILLISECONDSEPOCH", "granularity": "1:MILLISECONDS" }, { "name": "Creation_Date", "dataType": "LONG", "format" : "1MILLISECONDSEPOCH", "granularity": "1:MILLISECONDS" }, { "name": "Updated_Date", "dataType": "LONG", "format" : "1MILLISECONDSEPOCH", "granularity": "1:MILLISECONDS" }, { "name": "source_ts_ms", "dataType": "LONG", "format" : "1MILLISECONDSEPOCH", "granularity": "1:MILLISECONDS" } ] }
    2. =================== { "schemaName": "GoNoGoCustomerApplication_v3_5", "dimensionFieldSpecs": [ { "name": "Application_ID", "dataType": "STRING" }, { "name": "Application_Stage", "dataType": "STRING" }, { "name": "op", "dataType": "STRING" }, { "name": "ord", "dataType": "STRING" }, { "name": "Updated_Date", "dataType": "LONG" }, { "name": "source_ts_ms", "dataType": "LONG" }, { "name": "Creation_Date", "dataType": "LONG" } ], "metricFieldSpecs": [ { "name": "Requested_Amount", "dataType": "DOUBLE" } ], "dateTimeFieldSpecs": [ { "name": "ts_ms", "dataType": "LONG", "format" : "1MILLISECONDSEPOCH", "granularity": "1:MILLISECONDS" } ] }
    3. =================== { "schemaName": "GoNoGoCustomerApplication_v3_3", "dimensionFieldSpecs": [ { "name": "Application_ID", "dataType": "STRING" }, { "name": "Application_Stage", "dataType": "STRING" }, { "name": "op", "dataType": "STRING" }, { "name": "ord", "dataType": "STRING" } ], "metricFieldSpecs": [ { "name": "Requested_Amount", "dataType": "DOUBLE" } ], "dateTimeFieldSpecs": [ { "name": "ts_ms", "dataType": "LONG", "format" : "1MILLISECONDSEPOCH", "granularity": "1:MILLISECONDS" }, { "name": "Creation_Date", "dataType": "LONG", "format" : "1MILLISECONDSEPOCH", "granularity": "1:MILLISECONDS" }, { "name": "Updated_Date", "dataType": "LONG", "format" : "1MILLISECONDSEPOCH", "granularity": "1:MILLISECONDS" }, { "name": "source_ts_ms", "dataType": "LONG", "format" : "1MILLISECONDSEPOCH", "granularity": "1:MILLISECONDS" } ] }
    4. =================== { "schemaName": "GoNoGoCustomerApplication_v3_4", "dimensionFieldSpecs": [ { "name": "Application_ID", "dataType": "STRING" }, { "name": "Application_Stage", "dataType": "STRING" }, { "name": "op", "dataType": "STRING" }, { "name": "ord", "dataType": "STRING" }, { "name": "Updated_Date", "dataType": "TIMESTAMP" }, { "name": "source_ts_ms", "dataType": "TIMESTAMP" }, { "name": "Creation_Date", "dataType": "TIMESTAMP" } ], "metricFieldSpecs": [ { "name": "Requested_Amount", "dataType": "DOUBLE" } ], "dateTimeFieldSpecs": [ { "name": "ts_ms", "dataType": "LONG", "format" : "1MILLISECONDSEPOCH", "granularity": "1:MILLISECONDS" } ] }
    cat table-config.json
    { "tableName":"GoNoGoCustomerApplication_v3_5", "tableType":"REALTIME", "segmentsConfig":{ "timeColumnName":"ts_ms", "timeType":"MILLISECONDS", "allowNullTimeValue": true, "allowNullTimeValue": "true", "retentionTimeUnit":"DAYS", "retentionTimeValue":"7000", "segmentPushType":"APPEND", "segmentAssignmentStrategy":"BalanceNumSegmentAssignmentStrategy", "schemaName":"GoNoGoCustomerApplication_v3_5", "replicasPerPartition":"1" }, "tenants":{ }, "tableIndexConfig":{ "loadMode":"MMAP", "nullHandlingEnabled": false, "streamConfigs":{ "streamType": "kafka", "stream.kafka.consumer.type": "lowlevel", "stream.kafka.topic.name": "CustomerApplication", "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.stream.kafka.KafkaJSONMessageDecoder", "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory", "stream.kafka.broker.list": "<IP>:9092", "realtime.segment.flush.threshold.time": "3600000", "realtime.segment.flush.threshold.rows": "50000", "stream.kafka.consumer.prop.auto.offset.reset": "smallest" } }, "ingestionConfig":{ "transformConfigs":[ { "columnName":"Application_ID", "transformFunction":"JSONPATHSTRING(after, '$.id')" }, { "columnName":"Requested_Amount", "transformFunction":"JSONPATHSTRING(after, '$.applicationRequest.application.loanAmount')" }, { "columnName":"Application_Stage", "transformFunction":"JSONPATHSTRING(after, '$.applicationRequest.currentStageId')" }, { "columnName":"Creation_Date", "transformFunction":"JSONPATHSTRING(after, '$.dateTime.$date')" }, { "columnName":"Updated_Date", "transformFunction":"JSONPATHSTRING(after, '$.updatedDate.$date')" }, { "columnName":"source_ts_ms", "transformFunction":"JSONPATHSTRING(source, '$.ts_ms')" }, { "columnName":"ord", "transformFunction":"JSONPATHSTRING(source, '$.ord')" } ] }, "metadata":{ "customConfigs":{ } } }
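    For illustration, a hedged sketch of the null-handling knobs that usually matter here: enabling null handling in tableIndexConfig and giving the nullable time column an explicit defaultNullValue in the schema (the values shown are examples, not recommendations):
    Copy code
    // Schema: dateTimeFieldSpec with an explicit default for nulls (example value).
    { "name": "Updated_Date", "dataType": "LONG", "format": "1:MILLISECONDS:EPOCH", "granularity": "1:MILLISECONDS", "defaultNullValue": 0 }

    // Table config: enable null handling at ingestion.
    "tableIndexConfig": { "nullHandlingEnabled": true, ... }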
  • p

    Phil Sarkis

    01/13/2023, 7:38 PM
    On one of my machines, when I try to run ./pinot-admin.sh QuickStart -type Stream, it works out of the box; on another machine, out of the box, upon attempting to start Kafka it waits for about 5 seconds and then gives me:
  • s

    Sidharth Sawhney

    01/13/2023, 11:14 PM
    Hi everyone, I'm running into a bug while uploading a batch CSV file into Pinot. The first screenshot is a picture of my CSV file, the second is my table config, and the third is my ingestion job spec. I have a CSV file of my data on my Docker container that I am trying to upload to a cluster I host on localhost. The command I used is below:
    pinot/bin/pinot-admin.sh LaunchDataIngestionJob -jobSpecFile ingestionJobEVSpec.yml
    When uploaded, my CSV data shows up as 0s and nulls; the values from the CSV are missing, yet there is no error when I execute the above command. Does anyone know what could be the issue?
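    For illustration, a hedged sketch of the CSV-related part of an ingestion job spec; a mismatch between the header/delimiter settings and the actual file, or CSV column names that don't match the schema, commonly produces all-default (0/null) values. The class names are the standard CSV plugin classes; the configs are examples:
    Copy code
    # Sketch only: CSV reader portion of a job spec; adjust delimiter/header to the file.
    recordReaderSpec:
      dataFormat: 'csv'
      className: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReader'
      configClassName: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReaderConfig'
      configs:
        delimiter: ','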
  • s

    Shubham Kumar

    01/16/2023, 7:08 AM
    Hi team, we are consuming huge clickstream data (multiline JSON event data) in Pinot, and since the clickstream events have multiple schemas, most events are not getting parsed and are being logged. Each line (with one key) is sent as a new event to Elasticsearch, which gives the logs an even higher ingestion rate than our biggest cluster. Is there a way to limit this kind of Pinot server log using some configuration? A sample log shows the whole multiline JSON:
    Copy code
    },
    "device_id" : "****WAD*A*D*AS",
    "os" : "Android",
    "session" : null,
    "advertising_id" : "null",
    "source" : "SyncTimer",
    "manufacturer" : "OnePlus",
    "event_ts_mins" : null,
    "app_name" : null,
    "event_ts" : [ 1673719133085 ],
    "event_date" : null,
    "event_ts_hr" : null,
    "event_name" : null,
    "event_ts_days" : null,
    "customer_id" : null,
    "device" : {
    "device_id" : "*****",
    "os" : "Android",
    "os_version" : "33",
    "model" : "CPH2411",
    "manufacturer" : "OnePlus"
    },
    "user" : {
    "customer_id" : "DFH*#U*$*#Q93(***8"
    },
    "events" : [ {
    "event_name" : "pulse_sdk_init",
    "timestamp" : 1673719133085
    } ],
    "timestamp" : null
    }
    }
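    For illustration, a hedged sketch of raising a specific logger's level in Pinot's log4j2 configuration to quiet noisy classes; the logger and appender names below are assumptions that would need to be confirmed from the actual log lines and conf/log4j2.xml:
    Copy code
    <!-- Sketch only: logger and appender names are assumptions to verify. -->
    <Loggers>
      <Logger name="org.apache.pinot.plugin.inputformat.json" level="error" additivity="false">
        <AppenderRef ref="console"/>
      </Logger>
    </Loggers>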
  • m

    mahmoud elhalwany

    01/16/2023, 10:56 AM
    Hello, is there any way to subscribe to a table, or to export table data from Pinot to Kafka?
  • r

    Rohit Anilkumar

    01/16/2023, 11:23 AM
    Hey, quick question. I have the following in my monitoring YAML file for brokers:
    Copy code
    - pattern: "\"org.apache.pinot.common.metrics\"<type=\"BrokerMetrics\", name=\"pinot.broker.(\\w+)\\.queries\"><>(\\w+)"
      name: "pinot_broker_queries_$2"
    But when I check Prometheus, I don't see anything that starts with pinot_broker_queries. Is it not scraping the metrics correctly? I am getting some of the other metrics: the ones with _exceptions, _nettyConnection, _healthChecks, etc. I'm using the config provided in the documentation at https://docs.pinot.apache.org/operators/operating-pinot/monitoring for the broker:
    Copy code
    rules:
    # Pinot Broker
    - pattern: "\"org.apache.pinot.common.metrics\"<type=\"BrokerMetrics\", name=\"pinot.broker.(\\w+).authorization\"><>(\\w+)"
      name: "pinot_broker_authorization_$2"
      labels:
        table: "$1"
    - pattern: "\"org.apache.pinot.common.metrics\"<type=\"BrokerMetrics\", name=\"pinot.broker.(\\w+)\\.documentsScanned\"><>(\\w+)"
      name: "pinot_broker_documentsScanned_$2"
      labels:
        table: "$1"
    - pattern: "\"org.apache.pinot.common.metrics\"<type=\"BrokerMetrics\", name=\"pinot.broker.(\\w+)\\.entriesScannedInFilter\"><>(\\w+)"
      name: "pinot_broker_entriesScannedInFilter_$2"
      labels:
        table: "$1"
    - pattern: "\"org.apache.pinot.common.metrics\"<type=\"BrokerMetrics\", name=\"pinot.broker.(\\w+)\\.entriesScannedPostFilter\"><>(\\w+)"
      name: "pinot_broker_entriesScannedPostFilter_$2"
      labels:
        table: "$1"
    - pattern: "\"org.apache.pinot.common.metrics\"<type=\"BrokerMetrics\", name=\"pinot.broker.(\\w+)\\.freshnessLagMs\"><>(\\w+)"
      name: "pinot_broker_freshnessLagMs_$2"
      labels:
        table: "$1"
    - pattern: "\"org.apache.pinot.common.metrics\"<type=\"BrokerMetrics\", name=\"pinot.broker.(\\w+)\\.queries\"><>(\\w+)"
      name: "pinot_broker_queries_$2"
      labels:
        table: "$1"
    - pattern: "\"org.apache.pinot.common.metrics\"<type=\"BrokerMetrics\", name=\"pinot.broker.(\\w+)\\.queryExecution\"><>(\\w+)"
      name: "pinot_broker_queryExecution_$2"
      labels:
        table: "$1"
    - pattern: "\"org.apache.pinot.common.metrics\"<type=\"BrokerMetrics\", name=\"pinot.broker.(\\w+)\\.queryRouting\"><>(\\w+)"
      name: "pinot_broker_queryRouting_$2"
      labels:
        table: "$1"
    - pattern: "\"org.apache.pinot.common.metrics\"<type=\"BrokerMetrics\", name=\"pinot.broker.(\\w+)\\.reduce\"><>(\\w+)"
      name: "pinot_broker_reduce_$2"
      labels:
        table: "$1"
    - pattern: "\"org.apache.pinot.common.metrics\"<type=\"BrokerMetrics\", name=\"pinot.broker.(\\w+)\\.requestCompilation\"><>(\\w+)"
      name: "pinot_broker_requestCompilation_$2"
      labels:
        table: "$1"
    - pattern: "\"org.apache.pinot.common.metrics\"<type=\"BrokerMetrics\", name=\"pinot.broker.(\\w+)\\.scatterGather\"><>(\\w+)"
      name: "pinot_broker_scatterGather_$2"
      labels:
        table: "$1"
    - pattern: "\"org.apache.pinot.common.metrics\"<type=\"BrokerMetrics\", name=\"pinot.broker.(\\w+)\\.totalServerResponseSize\"><>(\\w+)"
      name: "pinot_broker_totalServerResponseSize_$2"
      labels:
        table: "$1"
    - pattern: "\"org.apache.pinot.common.metrics\"<type=\"BrokerMetrics\", name=\"pinot.broker.(\\w+)_(\\w+).groupBySize\"><>(\\w+)"
      name: "pinot_broker_groupBySize_$3"
      labels:
        table: "$1"
        tableType: "$2"
    - pattern: "\"org.apache.pinot.common.metrics\"<type=\"BrokerMetrics\", name=\"pinot.broker.(\\w+)_(\\w+).noServingHostForSegment\"><>(\\w+)"
      name: "pinot_broker_noServingHostForSegment_$3"
      labels:
        table: "$1"
        tableType: "$2"
    - pattern: "\"org.apache.pinot.common.metrics\"<type=\"BrokerMetrics\", name=\"pinot.broker.healthcheck(\\w+)\"><>(\\w+)"
      name: "pinot_broker_healthcheck_$1_$2"
    - pattern: "\"org.apache.pinot.common.metrics\"<type=\"BrokerMetrics\", name=\"pinot.broker.helix.(\\w+)\"><>(\\w+)"
      name: "pinot_broker_helix_$1_$2"
    - pattern: "\"org.apache.pinot.common.metrics\"<type=\"BrokerMetrics\", name=\"pinot.broker.helixZookeeper(\\w+)\"><>(\\w+)"
      name: "pinot_broker_helix_zookeeper_$1_$2"
    - pattern: "\"org.apache.pinot.common.metrics\"<type=\"BrokerMetrics\", name=\"pinot.broker.nettyConnection(\\w+)\"><>(\\w+)"
      name: "pinot_broker_nettyConnection_$1_$2"
    - pattern: "\"org.apache.pinot.common.metrics\"<type=\"BrokerMetrics\", name=\"pinot.broker.clusterChangeCheck\"\"><>(\\w+)"
      name: "pinot_broker_clusterChangeCheck_$1"
    - pattern: "\"org.apache.pinot.common.metrics\"<type=\"BrokerMetrics\", name=\"pinot.broker.proactiveClusterChangeCheck\"><>(\\w+)"
      name: "pinot_broker_proactiveClusterChangeCheck_$1"
    - pattern: "\"org.apache.pinot.common.metrics\"<type=\"BrokerMetrics\", name=\"pinot.broker.(\\w+)Exceptions\"><>(\\w+)"
      name: "pinot_broker_exceptions_$1_$2"
    - pattern: "\"org.apache.pinot.common.metrics\"<type=\"BrokerMetrics\", name=\"pinot.broker.routingTableUpdateTime\"><>(\\w+)"
      name: "pinot_broker_routingTableUpdateTime_$1"
    These are the metric names I see in Prometheus. Some of the metrics aren't getting scraped, I guess?
    Copy code
    pinot_broker_exceptions_requestCompilation_Count
    pinot_broker_exceptions_requestCompilation_FifteenMinuteRate
    pinot_broker_exceptions_requestCompilation_FiveMinuteRate
    pinot_broker_exceptions_requestCompilation_MeanRate
    pinot_broker_exceptions_requestCompilation_OneMinuteRate
    pinot_broker_exceptions_resourceMissing_Count
    pinot_broker_exceptions_resourceMissing_FifteenMinuteRate
    pinot_broker_exceptions_resourceMissing_FiveMinuteRate
    pinot_broker_exceptions_resourceMissing_MeanRate
    pinot_broker_exceptions_resourceMissing_OneMinuteRate
    pinot_broker_exceptions_uncaughtGet_Count
    pinot_broker_exceptions_uncaughtGet_FifteenMinuteRate
    pinot_broker_exceptions_uncaughtGet_FiveMinuteRate
    pinot_broker_exceptions_uncaughtGet_MeanRate
    pinot_broker_exceptions_uncaughtGet_OneMinuteRate
    pinot_broker_exceptions_uncaughtPost_Count
    pinot_broker_exceptions_uncaughtPost_FifteenMinuteRate
    pinot_broker_exceptions_uncaughtPost_FiveMinuteRate
    pinot_broker_exceptions_uncaughtPost_MeanRate
    pinot_broker_exceptions_uncaughtPost_OneMinuteRate
    pinot_broker_healthcheck_BadCalls_Count
    pinot_broker_healthcheck_BadCalls_FifteenMinuteRate
    pinot_broker_healthcheck_BadCalls_FiveMinuteRate
    pinot_broker_healthcheck_BadCalls_MeanRate
    pinot_broker_healthcheck_BadCalls_OneMinuteRate
    pinot_broker_healthcheck_OkCalls_Count
    pinot_broker_healthcheck_OkCalls_FifteenMinuteRate
    pinot_broker_healthcheck_OkCalls_FiveMinuteRate
    pinot_broker_healthcheck_OkCalls_MeanRate
    pinot_broker_healthcheck_OkCalls_OneMinuteRate
    pinot_broker_helix_connected_Value
    pinot_broker_helix_ookeeperReconnects_Count
    pinot_broker_helix_ookeeperReconnects_FifteenMinuteRate
    pinot_broker_helix_ookeeperReconnects_FiveMinuteRate
    pinot_broker_helix_ookeeperReconnects_MeanRate
    pinot_broker_helix_ookeeperReconnects_OneMinuteRate
    pinot_broker_nettyConnection_BytesReceived_Count
    pinot_broker_nettyConnection_BytesReceived_FifteenMinuteRate
    pinot_broker_nettyConnection_BytesReceived_FiveMinuteRate
    pinot_broker_nettyConnection_BytesReceived_MeanRate
    pinot_broker_nettyConnection_BytesReceived_OneMinuteRate
    pinot_broker_nettyConnection_BytesSent_Count
    pinot_broker_nettyConnection_BytesSent_FifteenMinuteRate
    pinot_broker_nettyConnection_BytesSent_FiveMinuteRate
    pinot_broker_nettyConnection_BytesSent_MeanRate
    pinot_broker_nettyConnection_BytesSent_OneMinuteRate
    pinot_broker_nettyConnection_ConnectTimeMs_Value
    pinot_broker_nettyConnection_RequestsSent_Count
    pinot_broker_nettyConnection_RequestsSent_FifteenMinuteRate
    pinot_broker_nettyConnection_RequestsSent_FiveMinuteRate
    pinot_broker_nettyConnection_RequestsSent_MeanRate
    pinot_broker_nettyConnection_RequestsSent_OneMinuteRate
    pinot_broker_proactiveClusterChangeCheck_Count
    pinot_broker_proactiveClusterChangeCheck_FifteenMinuteRate
    pinot_broker_proactiveClusterChangeCheck_FiveMinuteRate
    pinot_broker_proactiveClusterChangeCheck_MeanRate
    pinot_broker_proactiveClusterChangeCheck_OneMinuteRate
    I don't see the missing metrics on the JMX port either. I'm seeing the same issue with the controller as well.
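    For illustration, a hedged debugging step: run the agent temporarily with a catch-all rule so every MBean is exported under a default name, then search the /metrics output for "queries" to see the exact attribute name before writing a specific pattern (a temporary config, not something for production):
    Copy code
    # Sketch only: temporary catch-all config to inspect raw MBean names.
    rules:
    - pattern: ".*"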
  • p

    Pratik Bhadane

    01/16/2023, 1:21 PM
    Hello Team, how can we get a count() output of "0" (zero) in Pinot? For now I am getting "No Record(s) found" as the output.
  • p

    Prashanth Rao

    01/17/2023, 7:45 AM
    Hi, greetings everyone. I restarted the Pinot server/controller and found the tables have vanished along with the data. I had specified -dataDir, and I see metadata in that folder. What needs to be done to pull back the data that is seemingly gone now?
  • e

    Ehsan Irshad

    01/17/2023, 10:15 AM
    Hi Team. We have set up the Pinot deep store in S3, and we also tried to decouple the controller from the data path, but some of the segments in the realtime table are still in ERROR state with the error below:
    Copy code
    [TABLENAME_REALTIME-RealtimeTableDataManager] [HelixTaskExecutor-message_handle_thread_10] Download and move segment TABLENAME__1__169__20230104T0711Z from peer with scheme http failed.
    Do the servers and controller need to point to the same URI? Our config is as follows. Controller: controller.data.dir=s3://prd-pinot-archive/prd-mimic-pinot/controller-data Server:
    pinot.server.instance.segment.store.uri=s3://prd-pinot-archive/prd-mimic-pinot/server-data
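    For illustration, a hedged sketch of the configs usually involved when the controller is taken out of the segment download path; the exact key names, and whether controller and servers should point at the same deep-store location, should be verified against the deep-store documentation for your version:
    Copy code
    # Sketch only: peer-download / deep-store related configs (names to verify per version).
    # Controller
    controller.data.dir=s3://prd-pinot-archive/prd-mimic-pinot/controller-data
    controller.enable.split.commit=true
    # Server
    pinot.server.instance.segment.store.uri=s3://prd-pinot-archive/prd-mimic-pinot/server-data
    pinot.server.instance.enable.split.commit=true
    # Table config (segmentsConfig)
    # "peerSegmentDownloadScheme": "http"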
  • e

    eywek

    01/17/2023, 10:22 AM
    Hello, I’m trying to use the Apache Pulsar stream ingestion plugin but when creating a REALTIME table with the following config:
    Copy code
    "streamConfigs": {
            "streamType": "pulsar",
            "topic.consumption.rate.limit": "1500",
            "stream.pulsar.fetch.timeout.millis": "30000",
            "stream.pulsar.consumer.type": "lowlevel",
            "stream.pulsar.topic.name": "<persistent://public/default/worker_datasource_60e5a1ab40480001009289b7_6258e7b21993b1000737be40_28>",
            "stream.pulsar.decoder.class.name": "org.apache.pinot.plugin.inputformat.json.JSONMessageDecoder",
            "stream.pulsar.consumer.factory.class.name": "org.apache.pinot.plugin.stream.pulsar.PulsarConsumerFactory",
            "stream.pulsar.bootstrap.servers": "<pulsar://pulsar.production.internal.reelevant.io>.:6650",
            "stream.pulsar.consumer.prop.auto.offset.reset": "smallest",
            "realtime.segment.flush.threshold.rows": "0",
            "realtime.segment.flush.threshold.segment.size": "200M"
          }
    I get an
    UNHEALTHY
    ingestion status for the table:
    Copy code
    {
      "ingestionStatus": {
        "ingestionState": "UNHEALTHY",
        "errorMessage": "Did not get any response from servers for segment: worker_datasource_60e5a1ab40480001009289b7_6258e7b21993b1000737be40_28__0__1__20230117T1012Z"
      }
    }
    And if I use the
    /consumingSegmentsInfo
    I’m having the following result:
    Copy code
    {
      "_segmentToConsumingInfoMap": {
        "worker_datasource_60e5a1ab40480001009289b7_6258e7b21993b1000737be40_28__0__1__20230117T1012Z": []
      }
    }
    From what I see, the plugin is consuming data from Pulsar and I’m able to query it but I was wondering why I get an
    UNHEALTHY
    ingestion status since I was using it for monitoring purposes. Thank you
  • a

    abhinav wagle

    01/17/2023, 11:17 PM
    Hello, I currently have a tenant in Pinot which has a REALTIME table. Which APIs should I call, and in what order, to add a new OFFLINE table to the same tenant? When I created a new OFFLINE table, it does show up in the expected tenant, but my ingestion job fails to find it. Which API should I invoke to add test_OFFLINE to the following server instance config? Even after table creation, the server is not updated with
    test_OFFLINE
    .
    Copy code
    "listFields": {
        "TAG_LIST": [
          "test_REALTIME"
        ]
      }
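    For illustration, a hedged sketch of the instance API that updates a server's tag list so it serves both table types (the controller address and instance name are placeholders):
    Copy code
    # Sketch only: controller address and instance name are placeholders.
    curl -X PUT "http://CONTROLLER_HOST:9000/instances/Server_pinot-server-0_8098/updateTags?tags=test_OFFLINE,test_REALTIME"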