# troubleshooting

    Anish Nair

    11/09/2021, 7:37 AM
Hey team, I have a few questions, can someone help?
1. Queries are not returning results most of the time. Upon checking the broker logs, I found the following:
```
Failed to find servers hosting segment: mytable_0_8_20211029T2056Z for table: mytable_REALTIME (all ONLINE/CONSUMING instances: [] are disabled, but find enabled OFFLINE instance: Server_ip_8098 from OFFLINE instances: [Server_ip_8098], not counting the segment as unavailable)
```
Is this a query timeout case?
2. I have set flush.threshold.size to 10M, but segments are getting created with fewer rows (Total docs: 3.4M). Is this expected? (See the sketch below.)
3. What type of index is recommended on a realtime table with upsert mode on?
4. In upsert mode, is there any limitation on the comparison time column, i.e. timestamp format or granularity? My table's date column is in yyyyMMddHH format; the comparison time column will be a timestamp in yyyy-MM-dd HHmmss format.
```
{
  "upsertConfig": {
    "mode": "FULL",
    "comparisonColumn": "anotherTimeColumn"
  }
}
```
5. Queries are timing out at 10s, even after changing the values at the broker and server level (pinot.broker.timeoutMs, pinot.server.query.executor.timeout). Do any other configs need to be changed?
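On question 2: a consuming segment is sealed when any flush threshold fires, not just the row-count one, so a time-based threshold can cut segments off below flush.threshold.size. A minimal sketch of the interacting streamConfigs knobs (values illustrative, not taken from the table above):
```
"streamConfigs": {
  "realtime.segment.flush.threshold.size": "10000000",
  "realtime.segment.flush.threshold.time": "6h",
  ...
}
```
Whichever threshold is reached first seals the segment, so a 6h time limit (the default) can plausibly explain segments completing at ~3.4M rows.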

    Ali Atıl

    11/09/2021, 8:04 AM
Hello everyone, I am using version 0.7.1. I am trying to create a hybrid table. Do I have to put controller.task.frequencyInSeconds in my controller config file? It says it is deprecated in the configuration reference.
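A hedged sketch of the controller-side keys involved (names from the configuration reference; values illustrative). Even where controller.task.frequencyInSeconds is marked deprecated, the periodic task scheduler itself still needs to be enabled:
```
controller.task.scheduler.enabled=true
controller.task.frequencyInSeconds=3600
```
In later versions, a per-task Quartz cron can be set instead via a "schedule" key in the table's taskTypeConfigsMap, e.g. "schedule": "0 */10 * * * ?".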

    Dan DC

    11/09/2021, 2:17 PM
Hey team, I've got an Avro schema which contains an array of records in a child field. I want to convert this to JSON during ingestion, so I've added a transformation for this column to my realtime table. I've specified "$" as my complex type delimiter because I've got some Groovy transformations that I need to apply to other columns, and it is the only delimiter I can use to make my field names compatible with Groovy identifiers. My config looks like:
```
...
"complexTypeConfig": {
  "delimiter": "$",
  ...
},
"transformConfigs": [
  ...
  {
    "columnName": "some_field",
    "transformFunction": "json_format(parent_field$some_field)"
  },
  ...
],
...
```

    Vibhor Jain

    11/09/2021, 4:22 PM
Hi team, we have a hybrid table for our analytics use case and were using UPSERT on the REALTIME table. It was working perfectly fine in 0.8: when the minion moved data to OFFLINE, we used mergeType "dedup" and duplicates were also eliminated in the OFFLINE flow. When we upgraded to 0.9, UPSERT is no longer supported for hybrid tables, and this validator is blocking our table deployment. We understand UPSERT cannot work for an OFFLINE table, but why is it blocked for hybrid tables? Can someone clarify if we are missing something here?

    Luis Fernandez

    11/09/2021, 5:16 PM
How do I know if a segment is too big?

    Luis Fernandez

    11/09/2021, 6:08 PM
In the logs I'm observing:
```
    2021-11-09 12:53:00	
    Slow query: request handler processing time: 441, send response latency: 1, total time to handle request: 442
    2021-11-09 12:53:00	
    Processed requestId=1975257,table=etsyads_metrics_REALTIME,segments(queried/processed/matched/consuming)=46/46/46/1,schedulerWaitMs=0,reqDeserMs=0,totalExecMs=441,resSerMs=0,totalTimeMs=441,minConsumingFreshnessMs=1636480380211,broker=Broker_pinot-broker-1.pinot-broker-headless.pinot.svc.cluster.local_8099,numDocsScanned=20584,scanInFilter=0,scanPostFilter=123504,sched=fcfs,threadCpuTimeNs=0
```
I was able to then find the request ID in the broker and got some more info:
```
    requestId=1976569,table=ads_metrics_REALTIME,timeMs=234,docs=17731/290711208,entries=0/106386,segments(queried/processed/matched/consuming/unavailable):46/46/46/1/0,consumingFreshnessTimeMs=1636480906334,servers=1/1,groupLimitReached=false,brokerReduceTimeMs=0,exceptions=0,serverStats=(Server=SubmitDelayMs,ResponseDelayMs,ResponseSize,DeserializationTimeMs,RequestSentDelayMs);pinot-server-1_R=0,233,7479,0,-1,offlineThreadCpuTimeNs=0,realtimeThreadCpuTimeNs=0,query=SELECT product_id, SUM(click_count), SUM(impression_count), SUM(cost), SUM(order_count), SUM(revenue) FROM ads_metrics WHERE user_id = 13133627 AND serve_time BETWEEN 1633924800 AND 1636520399 GROUP BY product_id LIMIT 6000
```
Is there any way I could tell from these logs why this is slow? The only thing I can see is scanPostFilter=123504, which may happen because of the GROUP BY. I believe we currently do not have any indexes on that product_id column; would adding one speed things up in any way?
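Worth noting: scanPostFilter here equals numDocsScanned times the number of projected columns (20584 x 6 = 123504), so it measures the post-filter projection/aggregation phase; an index on product_id would not shrink it, and scanInFilter=0 suggests the user_id filter is already served by an index or a sorted column. If the filter column ever lacked one, a hedged tableIndexConfig sketch would be:
```
"tableIndexConfig": {
  "invertedIndexColumns": ["user_id"],
  ...
}
```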

    Ali Atıl

    11/10/2021, 7:27 AM
Hey everyone, I have managed to create a hybrid table, and I have a few questions on the subject:
• Since segments are transferred to the offline table periodically, is it a correct assumption that I don't need those transferred realtime segments to be hosted on servers?
• If that is the case, is it recommended to clean up those transferred segments, and what is the correct way to do so? What comes to mind is setting the retentionTimeUnit and retentionTimeValue properties in the realtime table configuration (see the sketch below). Does Pinot have a built-in clean-up mechanism for hybrid tables?
Thanks in advance.
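For reference, Pinot's controller does run a built-in retention manager that periodically deletes segments older than their retention, so the approach described above is the usual one. A hedged sketch for the realtime table (values illustrative):
```
"segmentsConfig": {
  "retentionTimeUnit": "DAYS",
  "retentionTimeValue": "3",
  ...
}
```
Realtime retention is typically kept somewhat longer than the RealtimeToOfflineSegmentsTask bufferTimePeriod, so segments are only dropped after they have been copied to the offline table.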

    Tony Requist

    11/10/2021, 3:26 PM
    We have a Pinot / Kubernetes deployment with 6 controller pods. We are seeing high CPU on one controller, very low on the others. Restarting pods does not change this behavior. Our Pinot is now primarily ingesting one fairly high volume Kafka stream with 128 partitions. Is this expected?

    Carl

    11/11/2021, 4:56 PM
Hi team, we are observing a daily pattern of latency increase in Pinot queries, e.g. p95 increases from <100ms to 400ms, and the increase lasts for less than an hour each day. Are there some system metrics we could look at to identify the root cause?

    Diogo Baeder

    11/11/2021, 5:47 PM
Hi again, folks! Hey, I got a question about timestamps in datetime columns: I'm trying to use 1:MILLISECONDS:EPOCH, and I'm publishing Kafka events containing timestamps that are basically int(time_in_seconds_as_float * 1000) from a Python-based app, but when I use the incubator to query the table I'm getting back negative values. I'm probably doing something wrong, but isn't the idea to publish the time, in milliseconds, since the Epoch (1970-01-01 00:00:00)?
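A minimal sketch of the producer side, assuming a dateTimeFieldSpec of 1:MILLISECONDS:EPOCH. One plausible cause of negative values (an assumption, not confirmed in this thread) is declaring the column as INT in the schema: current epoch millis (~1.6e12) overflow a 32-bit integer, so the column needs to be LONG:
```
import time

# Epoch milliseconds; must map to a LONG column in the Pinot schema,
# since ~1.6e12 does not fit in a 32-bit INT.
timestamp_ms = int(time.time() * 1000)
```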

    Kamal Chavda

    11/12/2021, 6:41 PM
Hi all, I have a realtime table which completes loading all data from the source (using Debezium > Kafka). I compared the Kafka Connect logs, and the total records from the snapshot match the total records in the Pinot table; however, a few minutes later there are fewer records in Pinot. There is nothing in the pinot-controller/server/broker logs. Has anyone else experienced this?

    Sandeep R

    11/14/2021, 12:35 AM
Hi team, what is the best way to run Pinot services in the background, so that when a server gets rebooted the Pinot services (controller, broker, ZK) start automatically?
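One common answer is a systemd unit per service so everything restarts on reboot. A hedged sketch with hypothetical paths (controller shown; broker, server, and ZooKeeper would follow the same pattern):
```
[Unit]
Description=Apache Pinot controller
After=network.target

[Service]
Type=simple
# Hypothetical install path and ZK address; adjust to your layout.
ExecStart=/opt/pinot/bin/pinot-admin.sh StartController -zkAddress localhost:2181
Restart=on-failure

[Install]
WantedBy=multi-user.target
```
Place it at /etc/systemd/system/pinot-controller.service, then enable it with systemctl enable pinot-controller.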

    Map

    11/14/2021, 11:13 PM
When querying Pinot via Trino, it seems aggregate pushdown won't work for count(*) if Trino functions are in the predicate. For example, the query below doesn't work and returns an error due to the max-rows-per-split setting:
```
select count(*) from table0 where from_unixtime(col0) > current_timestamp
```
but the following query works:
```
select count(*) from table0 where col0 > 0
```
I suspect it has something to do with the order of evaluation. Perhaps the Trino functions should be evaluated before determining whether pushdown should happen?

    Yash Agarwal

    11/15/2021, 6:49 AM
We have a Pinot cluster, and some of our users are running very heavy queries, which results in:
```
java.lang.OutOfMemoryError: Java heap space
```
This is fine, but as a result the server instance becomes unhealthy, i.e. the live instance config becomes:
```
{
  "_code": 404,
  "_error": "ZKPath /PinotCluster/LIVEINSTANCES/Server_node_8098 does not exist:"
}
```
How can we solve this?

    Ali Atıl

    11/15/2021, 11:26 AM
Hello everyone, I am using version 0.8.0. When I run the RealtimeProvisioningHelper command below, it gives me an exception. Any idea why this happens? I have put one realtime table segment in the sampleCompletedSegmentDir directory. Command:
```
root@pinot-controller-0:/opt/pinot# bin/pinot-admin.sh RealtimeProvisioningHelper -tableConfigFile /opt/pinot/denizTableConfig.json -numPartitions 1 -numHosts 2 -numHours 6,12,18,24 -sampleCompletedSegmentDir /opt/pinot/samplesegment/realtime/ -ingestionRate 100
```
Exception:
```
    Executing command: RealtimeProvisioningHelper -tableConfigFile /opt/pinot/denizTableConfig.json -numPartitions 1 -pushFrequency null -numHosts 2 -numHours 6,12,18,24 -sampleCompletedSegmentDir /opt/pinot/samplesegment/realtime/ -ingestionRate 100 -maxUsableHostMemory 48G -retentionHours 0
    Exception caught:
    java.lang.RuntimeException: Caught exception when reading segment index dir
            at org.apache.pinot.controller.recommender.realtime.provisioning.MemoryEstimator.<init>(MemoryEstimator.java:117) ~[pinot-all-0.9.0-SNAPSHOT-jar-with-dependencies.jar:0.9.0-SNAPSHOT-517a0dcea48a7dcb8616addc403c20e0fc23484a]
            at org.apache.pinot.tools.admin.command.RealtimeProvisioningHelperCommand.execute(RealtimeProvisioningHelperCommand.java:268) ~[pinot-all-0.9.0-SNAPSHOT-jar-with-dependencies.jar:0.9.0-SNAPSHOT-517a0dcea48a7dcb8616addc403c20e0fc23484a]
            at org.apache.pinot.tools.admin.PinotAdministrator.execute(PinotAdministrator.java:169) [pinot-all-0.9.0-SNAPSHOT-jar-with-dependencies.jar:0.9.0-SNAPSHOT-517a0dcea48a7dcb8616addc403c20e0fc23484a]
            at org.apache.pinot.tools.admin.PinotAdministrator.main(PinotAdministrator.java:189) [pinot-all-0.9.0-SNAPSHOT-jar-with-dependencies.jar:0.9.0-SNAPSHOT-517a0dcea48a7dcb8616addc403c20e0fc23484a]
    Caused by: java.lang.NullPointerException: Cannot find segment metadata file under directory: /opt/pinot/samplesegment/realtime
            at shaded.com.google.common.base.Preconditions.checkNotNull(Preconditions.java:864) ~[pinot-all-0.9.0-SNAPSHOT-jar-with-dependencies.jar:0.9.0-SNAPSHOT-517a0dcea48a7dcb8616addc403c20e0fc23484a]
            at org.apache.pinot.segment.spi.index.metadata.SegmentMetadataImpl.getPropertiesConfiguration(SegmentMetadataImpl.java:144) ~[pinot-all-0.9.0-SNAPSHOT-jar-with-dependencies.jar:0.9.0-SNAPSHOT-517a0dcea48a7dcb8616addc403c20e0fc23484a]
            at org.apache.pinot.segment.spi.index.metadata.SegmentMetadataImpl.<init>(SegmentMetadataImpl.java:117) ~[pinot-all-0.9.0-SNAPSHOT-jar-with-dependencies.jar:0.9.0-SNAPSHOT-517a0dcea48a7dcb8616addc403c20e0fc23484a]
            at org.apache.pinot.controller.recommender.realtime.provisioning.MemoryEstimator.<init>(MemoryEstimator.java:115) ~[pinot-all-0.9.0-SNAPSHOT-jar-with-dependencies.jar:0.9.0-SNAPSHOT-517a0dcea48a7dcb8616addc403c20e0fc23484a]
            ... 3 more
```
Realtime table config file (-tableConfigFile /opt/pinot/denizTableConfig.json):
```
    {
      "tableName": "denizhybrid",
      "tableType": "REALTIME",
      "segmentsConfig": {
        "timeColumnName": "messageTime",
        "timeType": "MILLISECONDS",
        "schemaName": "deniz",
        "replicasPerPartition": "1",
        "retentionTimeUnit": "DAYS",
        "retentionTimeValue": "2"
      },
      "tenants": {},
      "fieldConfigList": [
        {
          "name": "location_st_point",
          "encodingType": "RAW",
          "indexType": "H3",
          "properties": {
            "resolutions": "5"
          }
        }
      ],
      "tableIndexConfig": {
        "loadMode": "MMAP",
        "rangeIndexColumns": [
          "latitude",
          "longitude"
        ],
        "noDictionaryColumns": [
          "location_st_point"
        ],
        "streamConfigs": {
          "streamType": "kafka",
          "stream.kafka.consumer.type": "lowlevel",
          "stream.kafka.topic.name": "kafkadeniztest2",
          "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.stream.kafka.KafkaJSONMessageDecoder",
          "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
          "stream.kafka.broker.list": "kafka:9092",
          "realtime.segment.flush.threshold.size": "0",
          "realtime.segment.flush.threshold.time": "24h",
          "realtime.segment.flush.desired.size": "50M",
          "stream.kafka.consumer.prop.auto.offset.reset": "smallest"
        }
      },
      "query": {
        "timeoutMs": 60000
      },
      "metadata": {
        "customConfigs": {}
      },
      "task": {
        "taskTypeConfigsMap": {
          "RealtimeToOfflineSegmentsTask": {
            "bucketTimePeriod": "6h",
            "bufferTimePeriod": "9h",
            "maxNumRecordsPerSegment": "1000000"
          }
        }
      }
    }
```
Thanks in advance.
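Reading the cause ("Cannot find segment metadata file under directory"), the tool appears to expect a directory that directly contains a segment's metadata.properties, i.e. an untarred completed segment rather than its parent folder. A hedged sketch with a hypothetical segment name:
```
# Untar a completed segment (name is hypothetical) and point the flag at the
# resulting segment directory, which contains metadata.properties.
tar -xzf denizhybrid__0__0__20211110T0000Z.tar.gz -C /opt/pinot/samplesegment/realtime/
bin/pinot-admin.sh RealtimeProvisioningHelper \
  -tableConfigFile /opt/pinot/denizTableConfig.json \
  -numPartitions 1 -numHosts 2 -numHours 6,12,18,24 \
  -sampleCompletedSegmentDir /opt/pinot/samplesegment/realtime/denizhybrid__0__0__20211110T0000Z \
  -ingestionRate 100
```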

    Kamal Chavda

    11/15/2021, 4:42 PM
Hi all, I had a few questions about using Pinot managed offline flows. Any help would be greatly appreciated!
1. Does the OFFLINE table config need to have a RealtimeToOfflineSegmentsTask config matching the one added to the REALTIME table config?
2. I'm seeing this TASK_ERROR to DROPPED transition in the minion log. What does this signify?
```
    20 START:INVOKE /PinotCluster/INSTANCES/Minion_172.19.0.6_9514/MESSAGES listener:org.apache.helix.messaging.handling.HelixTaskExecutor@157c6932 type: CALLBACK
    Resubscribe change listener to path: /PinotCluster/INSTANCES/Minion_172.19.0.6_9514/MESSAGES, for listener: org.apache.helix.messaging.handling.HelixTaskExecutor@157c6932, watchChild: false
    Subscribing changes listener to path: /PinotCluster/INSTANCES/Minion_172.19.0.6_9514/MESSAGES, type: CALLBACK, listener: org.apache.helix.messaging.handling.HelixTaskExecutor@157c6932
    Subscribing child change listener to path:/PinotCluster/INSTANCES/Minion_172.19.0.6_9514/MESSAGES
    Subscribing to path:/PinotCluster/INSTANCES/Minion_172.19.0.6_9514/MESSAGES took:0
    The latency of message 6a8ac921-3913-43e8-a777-b15c16185245 is 7 ms
    Scheduling message 6a8ac921-3913-43e8-a777-b15c16185245: TaskQueue_RealtimeToOfflineSegmentsTask_Task_RealtimeToOfflineSegmentsTask_1636993325945:TaskQueue_RealtimeToOfflineSegmentsTask_Task_RealtimeToOfflineSegmentsTask_1636993325945_0, TASK_ERROR->DROPPED
    Submit task: 6a8ac921-3913-43e8-a777-b15c16185245 to pool: java.util.concurrent.ThreadPoolExecutor@67024f54[Running, pool size = 40, active threads = 0, queued tasks = 0, completed tasks = 221]
    Message: 6a8ac921-3913-43e8-a777-b15c16185245 handling task scheduled
    20 END:INVOKE /PinotCluster/INSTANCES/Minion_172.19.0.6_9514/MESSAGES listener:org.apache.helix.messaging.handling.HelixTaskExecutor@157c6932 type: CALLBACK Took: 8ms
    handling task: 6a8ac921-3913-43e8-a777-b15c16185245 begin, at: 1636993355435
    handling message: 6a8ac921-3913-43e8-a777-b15c16185245 transit TaskQueue_RealtimeToOfflineSegmentsTask_Task_RealtimeToOfflineSegmentsTask_1636993325945.TaskQueue_RealtimeToOfflineSegmentsTask_Task_RealtimeToOfflineSegmentsTask_1636993325945_0|[] from:TASK_ERROR to:DROPPED, relayedFrom: null
    Merging with delta list, recordId = TaskQueue_RealtimeToOfflineSegmentsTask_Task_RealtimeToOfflineSegmentsTask_1636993325945 other:TaskQueue_RealtimeToOfflineSegmentsTask_Task_RealtimeToOfflineSegmentsTask_1636993325945
    Instance Minion_172.19.0.6_9514, partition TaskQueue_RealtimeToOfflineSegmentsTask_Task_RealtimeToOfflineSegmentsTask_1636993325945_0 received state transition from TASK_ERROR to DROPPED on session 1005c465f540008, message id: 6a8ac921-3913-43e8-a777-b15c16185245
    Merging with delta list, recordId = TaskQueue_RealtimeToOfflineSegmentsTask_Task_RealtimeToOfflineSegmentsTask_1636993325945 other:TaskQueue_RealtimeToOfflineSegmentsTask_Task_RealtimeToOfflineSegmentsTask_1636993325945
    Removed /PinotCluster/INSTANCES/Minion_172.19.0.6_9514/CURRENTSTATES/1005c465f540008/TaskQueue_RealtimeToOfflineSegmentsTask_Task_RealtimeToOfflineSegmentsTask_1636993325945
    Message 6a8ac921-3913-43e8-a777-b15c16185245 completed.
    Delete message 6a8ac921-3913-43e8-a777-b15c16185245 from zk!
    message finished: 6a8ac921-3913-43e8-a777-b15c16185245, took 14
    Message: 6a8ac921-3913-43e8-a777-b15c16185245 (parent: null) handling task for TaskQueue_RealtimeToOfflineSegmentsTask_Task_RealtimeToOfflineSegmentsTask_1636993325945:TaskQueue_RealtimeToOfflineSegmentsTask_Task_RealtimeToOfflineSegmentsTask_1636993325945_0 completed at: 1636993355449, results: true. FrameworkTime: 1 ms; HandlerTime: 13 ms.
    Subscribing changes listener to path: /PinotCluster/INSTANCES/Minion_172.19.0.6_9514/MESSAGES, type: CALLBACK, listener: org.apache.helix.messaging.handling.HelixTaskExecutor@157c6932
    Subscribing child change listener to path:/PinotCluster/INSTANCES/Minion_172.19.0.6_9514/MESSAGES
    Subscribing to path:/PinotCluster/INSTANCES/Minion_172.19.0.6_9514/MESSAGES took:0
```
3. The tasks/scheduler/information API endpoint returns "Task scheduler is disabled". I've added this entry to the controller config:
```
"controller.task.frequencyInSeconds": 3600
```
Is there some other setting I need to configure? (See the sketch below.)
4. tasks/task/taskname/state gives a 500 "Index 1 out of bounds for length 1", but tasks/tasktype/taskstates shows completed. I'm not seeing any segments added to my OFFLINE table though. Any idea what's missing?
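On question 3, a hedged guess: "Task scheduler is disabled" points at the scheduler's own enable flag rather than the frequency setting. In the controller config:
```
controller.task.scheduler.enabled=true
controller.task.frequencyInSeconds=3600
```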

    Tony Requist

    11/15/2021, 6:53 PM
Based on a thread from a few days ago, I changed our Pinot deployment from 6 controllers to 3. Now I am seeing three controllers as "dead" in Cluster Manager, and I am getting "segments ... unavailable" errors (though I am not sure these two issues are related).
1. How do I get rid of "dead" controllers when I reduce the number of controllers? (See the sketch below.)
2. Could this cause "segment ... unavailable"?
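On question 1, a hedged sketch using the controller REST API (instance names are hypothetical; the real ones appear in Cluster Manager or in the GET response):
```
# List instances, then drop the controllers that no longer exist.
curl -X GET "http://<controller-host>:9000/instances"
curl -X DELETE "http://<controller-host>:9000/instances/Controller_<old-pod-name>_9000"
```
Helix normally only allows dropping an instance that is no longer live, which should already be true for the removed pods.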

    Elon

    11/15/2021, 8:19 PM
Hi, we observed that increasing the ZooKeeper client timeout in the Pinot ZooKeeper config does not prevent a ZK client timeout from Helix, which is hardcoded. We see these errors when the brokers are under heavy GC pressure, GC pauses, etc.

    Sandeep R

    11/15/2021, 8:43 PM
Hi team, can we join two tables in a query?

    Tony Requist

    11/16/2021, 4:36 AM
Backfill question: we have a large REALTIME table (~900GB/day). Due to a configuration error (ZK heap size too low), we lost some data because the Kafka retention was shorter than the time to fix the bug. This has me thinking about ways to fill in missing data in the future for disaster recovery. We have all the raw data sitting in Parquet files in our data lake, so my initial thought was to regenerate the segments with missing data (they are easy to identify). Is it possible to upload (refresh) REALTIME segments, assuming the event time range is correct (there would be more events in the replacement segment)? Or do I have to use a HYBRID table and either populate the OFFLINE segments myself or use Pinot managed offline flows?

    Anish Nair

    11/16/2021, 6:47 AM
Hi team, this is regarding batch ingestion from HDFS to an offline table. After running the following command:
```
bin/pinot-ingestion-job.sh -jobSpecFile /root/hdfsBatchIngestionSpec1.yaml
```
I'm getting the following logs, and segments are not getting created:
```
    Trying to create instance for class org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner
    Initializing PinotFS for scheme hdfs, classname org.apache.pinot.plugin.filesystem.HadoopPinotFS
    Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    log4j:WARN No appenders could be found for logger (org.apache.htrace.core.Tracer).
    log4j:WARN Please initialize the log4j system properly.
    log4j:WARN See <http://logging.apache.org/log4j/1.2/faq.html#noconfig> for more info.
    No unit for dfs.client.datanode-restart.timeout(30) assuming SECONDS
    No unit for dfs.client.datanode-restart.timeout(30) assuming SECONDS
    The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
    successfully initialized HadoopPinotFS
    Creating an executor service with 1 threads(Job parallelism: 0, available cores: 24.)
    Submitting one Segment Generation Task for <hdfs://nameservice1/data/poc/pinot-ingestion/part-00000-a75dbdce-f8f4-469f-8f70-d412b02b59cb-c000.gz.parquet>
    Using class: org.apache.pinot.plugin.inputformat.parquet.ParquetRecordReader to read segment, ignoring configured file format: AVRO
    Trying to create instance for class org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner
    Initializing PinotFS for scheme hdfs, classname org.apache.pinot.plugin.filesystem.HadoopPinotFS
    successfully initialized HadoopPinotFS
Start pushing segments: []... to locations: [org.apache.pinot.spi.ingestion.batch.spec.PinotClusterSpec@5d28bcd5] for table poc_test_table
```
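Two hedged observations from the log: "ignoring configured file format: AVRO" suggests the job spec's recordReaderSpec still declares Avro while the input is Parquet, and "Start pushing segments: []" means segment generation produced nothing to push. A sketch of the jobSpec fragment this points at (keys from the standard ingestion job spec; values assume the Parquet reader the log already picked):
```
recordReaderSpec:
  dataFormat: 'parquet'
  className: 'org.apache.pinot.plugin.inputformat.parquet.ParquetRecordReader'
includeFileNamePattern: 'glob:**/*.parquet'
```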

    Lars-Kristian Svenøy

    11/16/2021, 12:37 PM
Hey everyone, quick question: when querying for a specific time range in Pinot, is it more efficient to use the primary time column defined in the segmentsConfig, or is it equivalent to using any other time column? The docs seem to indicate that the primary time column is only used for retention purposes, meaning that querying on another timestamp should be fine too. In my case, I am creating a copy of the primary timestamp, reducing its granularity, and calling it daysSinceEpoch, as I want to query for entities within certain days.
```
    "ingestionConfig": {
        "transformConfigs": [
          {
            "columnName": "daysSinceEpoch",
            "transformFunction": "toEpochDays(documentTimestamp)"
          }
        ],
    ...
```
Additionally, for the RealtimeToOfflineSegmentsTask, I am using this value for deduplication purposes. In the schema:
```
    "primaryKeyColumns": ["customerId", "machineId", "daysSinceEpoch"]
    ...
```
This is because for each event, I only want to keep the latest in a day. Here's the RealtimeToOfflineSegmentsTask:
```
    "RealtimeToOfflineSegmentsTask": {
            "bucketTimePeriod": "1d",
            "bufferTimePeriod": "2d",
            "mergeType": "dedup",
            "maxNumRecordsPerSegment": 10000000,
            "roundBucketTimePeriod": "1h"
          }
```
In the realtime table, I am also filtering out any events older than 14 days (where documentTimestamp is the actual primary timeColumnName):
```
    "filterConfig": {
      "filterFunction": "Groovy({documentTimestamp < (new Date() - 14).getTime()}, documentTimestamp)"
    },
```
Does that make sense?

    II

    11/16/2021, 5:30 PM
Hi team, any insight on this SQL issue? I am trying to use the distinctCount aggregation function to count under different conditions:
```
    select distinctCount(case when condition1 then colA else null end) as condition1Count,
        distinctCount(case when condition2 then colA else null end) as condition2Count,
        distinctCount(case when condition3 then colA else null end) as condition3Count
    from tableA
```
colA is of type INT or STRING, but it looks like this isn't supported in Pinot because null is not supported in selection queries. Will there be future support for this?

    Jonathan Meyer

    11/17/2021, 11:39 AM
Hello 🙂 Quick question regarding ingestionConfig on REALTIME tables: is there any way to apply jsonPathString and then further process the result with Groovy in a transformConfig?
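One workaround sketch, assuming the JSON extraction can happen inside the Groovy script itself rather than chaining two transform functions (column names are hypothetical; groovy.json is part of Groovy's standard library, though its availability here is an assumption):
```
"transformConfigs": [
  {
    "columnName": "derivedField",
    "transformFunction": "Groovy({new groovy.json.JsonSlurper().parseText(rawJson).details.name.toUpperCase()}, rawJson)"
  }
]
```
The script receives the raw column value as its argument, so whatever jsonPathString would have extracted can be pulled out and post-processed in one step.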

    Trust Okoroego

    11/17/2021, 1:14 PM
Hello ✋, I deleted a realtime table from my Pinot cluster, but I can still see the consumer group with an "empty name", created by Pinot on the topic, still keeping track of the consumer lag. See image below. Since Pinot uses low-level consumers there is actually no real concept of a consumer group, and since the consumer group name is blank I am not able to delete it. While this may not affect any new realtime table created to consume this topic, is there a way to ensure the consumer group is removed from the topic when the realtime table is removed?

    Arpit

    11/17/2021, 4:31 PM
Hi, I am executing an inner join query in Presto but getting the error below:
```
Error when hitting host with Pinot query "select validfrom, Id, InsertTimsttamp from trade_realtime where (id = '1234') limit 2147483647"
```
My original query is like this:
```
select a.Id, max(a.ValidFrom) as MaxValidFrom, a.InsertTimeStamp
from mypinotcluster.default.trade a
inner join (
    select Id, Max(InsertTimeStamp) as MaxInsertTime
    from mypinotcluster.default.trade
    group by Id
) b
on a.Id = b.Id
and a.InsertTimeStamp = b.MaxInsertTime
AND a.Id = '4-467125-467125 -0-50'
group by a.Id, a.InsertTimeStamp
LIMIT 20;
```
It looks like Presto is computing the result in memory instead of executing it in Pinot. Any ideas how I can make it work?

    Ayush Kumar Jha

    11/18/2021, 11:08 AM
Hi all, the download links for 0.8.0 and 0.7.1 are not working; they are giving a 404 error.

    Ali

    11/18/2021, 11:09 AM
Hi, I'm trying to improve the performance of a select count(distinct col_a) query; it's taking several minutes at the moment before failing with an out-of-memory error (the box has 64GB RAM). There are about 50 million unique values across about 700 million rows. The DistinctCountHLL and DistinctCountThetaSketch estimates are fast enough but not accurate enough. What can I do to improve the performance of the count(distinct col_a) query?

    Diogo Baeder

    11/18/2021, 12:52 PM
    BTW it would be cool if there was an official docker-compose file, maintained by the Pinot dev team, for testing purposes...

    Mark Needham

    11/18/2021, 1:14 PM
If so, there was some code added to the PinotAdministrator that has it do a System.exit(0) as soon as commands have been executed.