Anish Nair
11/09/2021, 7:37 AM
1) Getting this error:
Failed to find servers hosting segment: mytable_0_8_20211029T2056Z for table: mytable_REALTIME (all ONLINE/CONSUMING instances: [] are disabled, but find enabled OFFLINE instance: Server_ip_8098 from OFFLINE instances: [Server_ip_8098], not counting the segment as unavailable)
Is this a query timeout case?
2) I have set flush.threshold.size to 10 million, but segments are getting created with fewer rows (Total docs: 3.4 million). Is this expected?
3) What type of index is recommended on a realtime table with upsert mode on?
4) In upsert mode, is there any limitation on the "comparison time column", i.e. timestamp format or granularity? My table's date column is in yyyyMMddHH format; the comparison time column will be in timestamp format yyyy-MM-dd HH:mm:ss (see the schema sketch below).
{
  "upsertConfig": {
    "mode": "FULL",
    "comparisonColumn": "anotherTimeColumn"
  }
}
5) Queries are timing out at 10 secs, even after changing the values at broker and server level. Do any other configs need to be changed? (See the table-config sketch below.)
pinot.broker.timeoutMs
pinot.server.query.executor.timeout
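On question 4: the comparison column behaves most predictably when it can be compared numerically. A minimal schema sketch, assuming the column can be emitted as epoch millis; the column name mirrors the config above, the other values are illustrative:

"dateTimeFieldSpecs": [
  {
    "name": "anotherTimeColumn",
    "dataType": "LONG",
    "format": "1:MILLISECONDS:EPOCH",
    "granularity": "1:MILLISECONDS"
  }
]

On question 5: the table config can also cap query time regardless of the broker/server settings; a minimal sketch, matching the query config that appears in a table config later in this thread:

"query": {
  "timeoutMs": 60000
}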
Ali Atıl
11/09/2021, 8:04 AM
Dan DC
11/09/2021, 2:17 PM
I'm using $ as my complex type delimiter because I've got some Groovy transformations that I need to apply to other columns, and it's the only delimiter I can use to make my field names compatible with Groovy identifiers. My config looks like:
...
"complexTypeConfig": {
"delimiter": "$",
...
},
"transformConfigs": [
...
"columnName": "some_field",
"transformFunction": "json_format(parent_field$some_field)"
...
],
...
Vibhor Jain
11/09/2021, 4:22 PM
Luis Fernandez
11/09/2021, 5:16 PM
Luis Fernandez
11/09/2021, 6:08 PM
2021-11-09 12:53:00
Slow query: request handler processing time: 441, send response latency: 1, total time to handle request: 442
2021-11-09 12:53:00
Processed requestId=1975257,table=etsyads_metrics_REALTIME,segments(queried/processed/matched/consuming)=46/46/46/1,schedulerWaitMs=0,reqDeserMs=0,totalExecMs=441,resSerMs=0,totalTimeMs=441,minConsumingFreshnessMs=1636480380211,broker=Broker_pinot-broker-1.pinot-broker-headless.pinot.svc.cluster.local_8099,numDocsScanned=20584,scanInFilter=0,scanPostFilter=123504,sched=fcfs,threadCpuTimeNs=0
i was able to then find the request id in the broker and got some more info:
requestId=1976569,table=ads_metrics_REALTIME,timeMs=234,docs=17731/290711208,entries=0/106386,segments(queried/processed/matched/consuming/unavailable):46/46/46/1/0,consumingFreshnessTimeMs=1636480906334,servers=1/1,groupLimitReached=false,brokerReduceTimeMs=0,exceptions=0,serverStats=(Server=SubmitDelayMs,ResponseDelayMs,ResponseSize,DeserializationTimeMs,RequestSentDelayMs);pinot-server-1_R=0,233,7479,0,-1,offlineThreadCpuTimeNs=0,realtimeThreadCpuTimeNs=0,query=SELECT product_id, SUM(click_count), SUM(impression_count), SUM(cost), SUM(order_count), SUM(revenue) FROM ads_metrics WHERE user_id = 13133627 AND serve_time BETWEEN 1633924800 AND 1636520399 GROUP BY product_id LIMIT 6000
Is there any way I could tell from these logs why this is being slow? The only thing I can see is scanPostFilter=123504, which may happen because of the group by. I believe we currently don't have any indexes on that product_id column; would adding one speed things up in any way?
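If an index on those columns is worth a try, a hedged sketch of the tableIndexConfig change (column names taken from the query above); note that an inverted index speeds up the filter phase, while scanPostFilter counts entries read for the aggregation, so the gain is not guaranteed here:

"tableIndexConfig": {
  "invertedIndexColumns": [
    "user_id",
    "product_id"
  ]
}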
Ali Atıl
11/10/2021, 7:27 AM
Tony Requist
11/10/2021, 3:26 PM
Carl
11/11/2021, 4:56 PM
Diogo Baeder
11/11/2021, 5:47 PM
My time column format is 1:MILLISECONDS:EPOCH, and I'm publishing Kafka events containing timestamps that are basically int(time_in_seconds_as_float * 1000) from a Python-based app, but when I use the incubator to query the table I'm getting back negative values. I'm probably doing something wrong, but isn't the idea to publish the time, in milliseconds, since Epoch (1970-01-01 00:00:00)?
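Negative values after ingesting epoch millis are often a sign that the time column was declared as an INT, which epoch-millis values overflow; a minimal sketch of a field spec that keeps it a LONG, with a hypothetical column name:

{
  "name": "eventTimestamp",
  "dataType": "LONG",
  "format": "1:MILLISECONDS:EPOCH",
  "granularity": "1:MILLISECONDS"
}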
Kamal Chavda
11/12/2021, 6:41 PM
Sandeep R
11/14/2021, 12:35 AM
Map
11/14/2021, 11:13 PM
count(*) queries fail if Trino functions are in the predicate. For example, the query below doesn't work and returns an error due to the max rows per split setting:
select count(*) from table0 where from_unixtime(col0) > current_timestamp
but the following query works:
select count(*) from table0 where col0 > 0
I suspect it has something to do with the order of evaluation. Perhaps the Trino functions should be evaluated before determining whether pushdowns should happen?
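One possible workaround in the meantime, sketched under the assumption that col0 holds epoch seconds: apply the Trino function to the literal side so the predicate stays pushable (to_unixtime is a standard Trino function):

select count(*) from table0 where col0 > to_unixtime(current_timestamp)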
Yash Agarwal
11/15/2021, 6:49 AM
One of our server instances has run into
java.lang.OutOfMemoryError: Java heap space
This is fine, but as a result the server instance is becoming unhealthy, i.e. the Live Instance Config becomes:
{
"_code": 404,
"_error": "ZKPath /PinotCluster/LIVEINSTANCES/Server_node_8098 does not exist:"
}
How can we solve this?
Ali Atıl
11/15/2021, 11:26 AM
root@pinot-controller-0:/opt/pinot# bin/pinot-admin.sh RealtimeProvisioningHelper -tableConfigFile /opt/pinot/denizTableConfig.json -numPartitions 1 -numHosts 2 -numHours 6,12,18,24 -sampleCompletedSegmentDir /opt/pinot/samplesegment/realtime/ -ingestionRate 100
Exception:
Executing command: RealtimeProvisioningHelper -tableConfigFile /opt/pinot/denizTableConfig.json -numPartitions 1 -pushFrequency null -numHosts 2 -numHours 6,12,18,24 -sampleCompletedSegmentDir /opt/pinot/samplesegment/realtime/ -ingestionRate 100 -maxUsableHostMemory 48G -retentionHours 0
Exception caught:
java.lang.RuntimeException: Caught exception when reading segment index dir
at org.apache.pinot.controller.recommender.realtime.provisioning.MemoryEstimator.<init>(MemoryEstimator.java:117) ~[pinot-all-0.9.0-SNAPSHOT-jar-with-dependencies.jar:0.9.0-SNAPSHOT-517a0dcea48a7dcb8616addc403c20e0fc23484a]
at org.apache.pinot.tools.admin.command.RealtimeProvisioningHelperCommand.execute(RealtimeProvisioningHelperCommand.java:268) ~[pinot-all-0.9.0-SNAPSHOT-jar-with-dependencies.jar:0.9.0-SNAPSHOT-517a0dcea48a7dcb8616addc403c20e0fc23484a]
at org.apache.pinot.tools.admin.PinotAdministrator.execute(PinotAdministrator.java:169) [pinot-all-0.9.0-SNAPSHOT-jar-with-dependencies.jar:0.9.0-SNAPSHOT-517a0dcea48a7dcb8616addc403c20e0fc23484a]
at org.apache.pinot.tools.admin.PinotAdministrator.main(PinotAdministrator.java:189) [pinot-all-0.9.0-SNAPSHOT-jar-with-dependencies.jar:0.9.0-SNAPSHOT-517a0dcea48a7dcb8616addc403c20e0fc23484a]
Caused by: java.lang.NullPointerException: Cannot find segment metadata file under directory: /opt/pinot/samplesegment/realtime
at shaded.com.google.common.base.Preconditions.checkNotNull(Preconditions.java:864) ~[pinot-all-0.9.0-SNAPSHOT-jar-with-dependencies.jar:0.9.0-SNAPSHOT-517a0dcea48a7dcb8616addc403c20e0fc23484a]
at org.apache.pinot.segment.spi.index.metadata.SegmentMetadataImpl.getPropertiesConfiguration(SegmentMetadataImpl.java:144) ~[pinot-all-0.9.0-SNAPSHOT-jar-with-dependencies.jar:0.9.0-SNAPSHOT-517a0dcea48a7dcb8616addc403c20e0fc23484a]
at org.apache.pinot.segment.spi.index.metadata.SegmentMetadataImpl.<init>(SegmentMetadataImpl.java:117) ~[pinot-all-0.9.0-SNAPSHOT-jar-with-dependencies.jar:0.9.0-SNAPSHOT-517a0dcea48a7dcb8616addc403c20e0fc23484a]
at org.apache.pinot.controller.recommender.realtime.provisioning.MemoryEstimator.<init>(MemoryEstimator.java:115) ~[pinot-all-0.9.0-SNAPSHOT-jar-with-dependencies.jar:0.9.0-SNAPSHOT-517a0dcea48a7dcb8616addc403c20e0fc23484a]
... 3 more
Realtime table config file [-tableConfigFile /opt/pinot/denizTableConfig.json]:
{
"tableName": "denizhybrid",
"tableType": "REALTIME",
"segmentsConfig": {
"timeColumnName": "messageTime",
"timeType": "MILLISECONDS",
"schemaName": "deniz",
"replicasPerPartition": "1",
"retentionTimeUnit": "DAYS",
"retentionTimeValue": "2"
},
"tenants": {},
"fieldConfigList": [
{
"name": "location_st_point",
"encodingType": "RAW",
"indexType": "H3",
"properties": {
"resolutions": "5"
}
}
],
"tableIndexConfig": {
"loadMode": "MMAP",
"rangeIndexColumns": [
"latitude",
"longitude"
],
"noDictionaryColumns": [
"location_st_point"
],
"streamConfigs": {
"streamType": "kafka",
"stream.kafka.consumer.type": "lowlevel",
"stream.kafka.topic.name": "kafkadeniztest2",
"stream.kafka.decoder.class.name": "org.apache.pinot.plugin.stream.kafka.KafkaJSONMessageDecoder",
"stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
"stream.kafka.broker.list": "kafka:9092",
"realtime.segment.flush.threshold.size": "0",
"realtime.segment.flush.threshold.time": "24h",
"realtime.segment.flush.desired.size": "50M",
"stream.kafka.consumer.prop.auto.offset.reset": "smallest"
}
},
"query": {
"timeoutMs": 60000
},
"metadata": {
"customConfigs": {}
},
"task": {
"taskTypeConfigsMap": {
"RealtimeToOfflineSegmentsTask": {
"bucketTimePeriod": "6h",
"bufferTimePeriod": "9h",
"maxNumRecordsPerSegment": "1000000"
}
}
}
}
Thanks in advance.
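Regarding the NullPointerException above: it suggests -sampleCompletedSegmentDir has to point at an extracted (untarred) completed segment directory, i.e. one containing the segment's metadata.properties, either directly or under a v3/ subdirectory depending on the segment format version. A rough sketch of the expected layout, with hypothetical file placement:

/opt/pinot/samplesegment/realtime/
└── v3/
    ├── metadata.properties
    ├── creation.meta
    └── columns.psf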
Kamal Chavda
11/15/2021, 4:42 PM
I have a few questions about Pinot managed offline flows. Any help would be greatly appreciated!
1. Does the OFFLINE table config need to have a RealtimeToOfflineSegmentsTask config that matches the one added to the REALTIME table config?
2. I'm seeing this TASK_ERROR to DROPPED transition in the minion log. What does this signify?
20 START:INVOKE /PinotCluster/INSTANCES/Minion_172.19.0.6_9514/MESSAGES listener:org.apache.helix.messaging.handling.HelixTaskExecutor@157c6932 type: CALLBACK
Resubscribe change listener to path: /PinotCluster/INSTANCES/Minion_172.19.0.6_9514/MESSAGES, for listener: org.apache.helix.messaging.handling.HelixTaskExecutor@157c6932, watchChild: false
Subscribing changes listener to path: /PinotCluster/INSTANCES/Minion_172.19.0.6_9514/MESSAGES, type: CALLBACK, listener: org.apache.helix.messaging.handling.HelixTaskExecutor@157c6932
Subscribing child change listener to path:/PinotCluster/INSTANCES/Minion_172.19.0.6_9514/MESSAGES
Subscribing to path:/PinotCluster/INSTANCES/Minion_172.19.0.6_9514/MESSAGES took:0
The latency of message 6a8ac921-3913-43e8-a777-b15c16185245 is 7 ms
Scheduling message 6a8ac921-3913-43e8-a777-b15c16185245: TaskQueue_RealtimeToOfflineSegmentsTask_Task_RealtimeToOfflineSegmentsTask_1636993325945:TaskQueue_RealtimeToOfflineSegmentsTask_Task_RealtimeToOfflineSegmentsTask_1636993325945_0, TASK_ERROR->DROPPED
Submit task: 6a8ac921-3913-43e8-a777-b15c16185245 to pool: java.util.concurrent.ThreadPoolExecutor@67024f54[Running, pool size = 40, active threads = 0, queued tasks = 0, completed tasks = 221]
Message: 6a8ac921-3913-43e8-a777-b15c16185245 handling task scheduled
20 END:INVOKE /PinotCluster/INSTANCES/Minion_172.19.0.6_9514/MESSAGES listener:org.apache.helix.messaging.handling.HelixTaskExecutor@157c6932 type: CALLBACK Took: 8ms
handling task: 6a8ac921-3913-43e8-a777-b15c16185245 begin, at: 1636993355435
handling message: 6a8ac921-3913-43e8-a777-b15c16185245 transit TaskQueue_RealtimeToOfflineSegmentsTask_Task_RealtimeToOfflineSegmentsTask_1636993325945.TaskQueue_RealtimeToOfflineSegmentsTask_Task_RealtimeToOfflineSegmentsTask_1636993325945_0|[] from:TASK_ERROR to:DROPPED, relayedFrom: null
Merging with delta list, recordId = TaskQueue_RealtimeToOfflineSegmentsTask_Task_RealtimeToOfflineSegmentsTask_1636993325945 other:TaskQueue_RealtimeToOfflineSegmentsTask_Task_RealtimeToOfflineSegmentsTask_1636993325945
Instance Minion_172.19.0.6_9514, partition TaskQueue_RealtimeToOfflineSegmentsTask_Task_RealtimeToOfflineSegmentsTask_1636993325945_0 received state transition from TASK_ERROR to DROPPED on session 1005c465f540008, message id: 6a8ac921-3913-43e8-a777-b15c16185245
Merging with delta list, recordId = TaskQueue_RealtimeToOfflineSegmentsTask_Task_RealtimeToOfflineSegmentsTask_1636993325945 other:TaskQueue_RealtimeToOfflineSegmentsTask_Task_RealtimeToOfflineSegmentsTask_1636993325945
Removed /PinotCluster/INSTANCES/Minion_172.19.0.6_9514/CURRENTSTATES/1005c465f540008/TaskQueue_RealtimeToOfflineSegmentsTask_Task_RealtimeToOfflineSegmentsTask_1636993325945
Message 6a8ac921-3913-43e8-a777-b15c16185245 completed.
Delete message 6a8ac921-3913-43e8-a777-b15c16185245 from zk!
message finished: 6a8ac921-3913-43e8-a777-b15c16185245, took 14
Message: 6a8ac921-3913-43e8-a777-b15c16185245 (parent: null) handling task for TaskQueue_RealtimeToOfflineSegmentsTask_Task_RealtimeToOfflineSegmentsTask_1636993325945:TaskQueue_RealtimeToOfflineSegmentsTask_Task_RealtimeToOfflineSegmentsTask_1636993325945_0 completed at: 1636993355449, results: true. FrameworkTime: 1 ms; HandlerTime: 13 ms.
Subscribing changes listener to path: /PinotCluster/INSTANCES/Minion_172.19.0.6_9514/MESSAGES, type: CALLBACK, listener: org.apache.helix.messaging.handling.HelixTaskExecutor@157c6932
Subscribing child change listener to path:/PinotCluster/INSTANCES/Minion_172.19.0.6_9514/MESSAGES
Subscribing to path:/PinotCluster/INSTANCES/Minion_172.19.0.6_9514/MESSAGES took:0
3. The tasks/scheduler/information API endpoint returns "Task scheduler is disabled". I've added the entry "controller.task.frequencyInSeconds": 3600 to the controller config; is there some other setting I need to configure? (See the sketch below this list.)
4. The tasks/task/taskname/state endpoint is giving a 500 "Index 1 out of bounds for length 1", but tasks/tasktype/taskstates shows completed. I'm not seeing any segments added to my OFFLINE table though. Any idea what's missing?
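On question 3: the frequency setting alone may not turn the scheduler on. A hedged sketch of controller config entries sometimes used together; controller.task.scheduler.enabled is an assumption on my part, so verify it against your Pinot version's docs:

controller.task.scheduler.enabled=true
controller.task.frequencyInSeconds=3600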
Tony Requist
11/15/2021, 6:53 PM
We are seeing segments ... unavailable errors (though I am not sure these two issues are related):
1. How do I get rid of "dead" controllers when I reduce the number of controllers?
2. Could this cause segment ... unavailable?
Elon
11/15/2021, 8:19 PM
Sandeep R
11/15/2021, 8:43 PM
Tony Requist
11/16/2021, 4:36 AM
Anish Nair
11/16/2021, 6:47 AM
Trying to create instance for class org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner
Initializing PinotFS for scheme hdfs, classname org.apache.pinot.plugin.filesystem.HadoopPinotFS
Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
log4j:WARN No appenders could be found for logger (org.apache.htrace.core.Tracer).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
No unit for dfs.client.datanode-restart.timeout(30) assuming SECONDS
No unit for dfs.client.datanode-restart.timeout(30) assuming SECONDS
The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
successfully initialized HadoopPinotFS
Creating an executor service with 1 threads(Job parallelism: 0, available cores: 24.)
Submitting one Segment Generation Task for hdfs://nameservice1/data/poc/pinot-ingestion/part-00000-a75dbdce-f8f4-469f-8f70-d412b02b59cb-c000.gz.parquet
Using class: org.apache.pinot.plugin.inputformat.parquet.ParquetRecordReader to read segment, ignoring configured file format: AVRO
Trying to create instance for class org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner
Initializing PinotFS for scheme hdfs, classname org.apache.pinot.plugin.filesystem.HadoopPinotFS
successfully initialized HadoopPinotFS
Start pushing segments: []... to locations: [org.apache.pinot.spi.ingestion.batch.spec.PinotClusterSpec@5d28bcd5] for table poc_test_table
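The push step found no segment tars ("Start pushing segments: []"), which usually means the push stage scanned a location other than the one the generated tars were written to. A hedged sketch of the standalone jobSpec fields that control this, with hypothetical URIs; worth double-checking that outputDirURI actually received the tars from the generation step:

jobType: SegmentCreationAndTarPush
inputDirURI: 'hdfs://nameservice1/data/poc/pinot-ingestion/'
includeFileNamePattern: 'glob:**/*.parquet'
outputDirURI: 'hdfs://nameservice1/data/poc/pinot-segments/'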
Lars-Kristian Svenøy
11/16/2021, 12:37 PM
I'm adding a derived column daysSinceEpoch, as I want to query for entities within certain days:
"ingestionConfig": {
"transformConfigs": [
{
"columnName": "daysSinceEpoch",
"transformFunction": "toEpochDays(documentTimestamp)"
}
],
...
Additionally, for the RealtimeToOfflineSegmentsTask, I am using this value for deduplication purposes.
In the schema:
"primaryKeyColumns": ["customerId", "machineId", "daysSinceEpoch"]
...
This is because for each event, I only want to keep the latest in a day.
Here's the RealtimeToOfflineSegmentsTask config:
"RealtimeToOfflineSegmentsTask": {
"bucketTimePeriod": "1d",
"bufferTimePeriod": "2d",
"mergeType": "dedup",
"maxNumRecordsPerSegment": 10000000,
"roundBucketTimePeriod": "1h"
}
In the realtime table, I am also filtering out any events older than 14 days (where documentTimestamp is the actual primary timeColumnName):
"filterConfig": {
"filterFunction": "Groovy({documentTimestamp < (new Date() - 14).getTime()}, documentTimestamp)"
},
Does that make sense?
II
11/16/2021, 5:30 PM
I'm trying to use the distinctCount aggregation function to count under different conditions:
select distinctCount(case when condition1 then colA else null end) as condition1Count,
distinctCount(case when condition2 then colA else null end) as condition2Count,
distinctCount(case when condition3 then colA else null end) as condition3Count
from tableA
colA is of type int or String.
But it looks like this isn't supported in Pinot, because null is not supported in the selection query. Will there be future support for this?
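For what it's worth, recent Pinot releases added a FILTER clause on aggregations, which may sidestep the null limitation; a hedged sketch, assuming your version supports it:

select distinctCount(colA) filter (where condition1) as condition1Count,
  distinctCount(colA) filter (where condition2) as condition2Count,
  distinctCount(colA) filter (where condition3) as condition3Count
from tableA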
Jonathan Meyer
11/17/2021, 11:39 AM
A question about ingestionConfig on REALTIME tables: is there any way to use jsonPathString and then further process the result with Groovy in a transformConfig?
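One pattern that may work is doing the JSON extraction inside the Groovy script itself rather than chaining two transform functions; a hedged sketch using Groovy's built-in JsonSlurper (the column names are hypothetical, and whether arbitrary Groovy classes are permitted depends on your deployment):

"transformConfigs": [
  {
    "columnName": "derived_field",
    "transformFunction": "Groovy({new groovy.json.JsonSlurper().parseText(payload).someField.toString().toUpperCase()}, payload)"
  }
]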
Trust Okoroego
11/17/2021, 1:14 PM
Arpit
11/17/2021, 4:31 PM
Ayush Kumar Jha
11/18/2021, 11:08 AM
Ali
11/18/2021, 11:09 AM
Diogo Baeder
11/18/2021, 12:52 PM
Mark Needham
The admin commands call System.exit(0) as soon as commands have been executed.