Anish Nair
11/09/2021, 7:37 AM
1) Getting this error:
Failed to find servers hosting segment: mytable_0_8_20211029T2056Z for table: mytable_REALTIME (all ONLINE/CONSUMING instances: [] are disabled, but find enabled OFFLINE instance: Server_ip_8098 from OFFLINE instances: [Server_ip_8098], not counting the segment as unavailable)
Is this a query timeout case?
2) I have set flush.threshold.size to 10 million, but segments are getting created with fewer rows (Total docs: 3.4 million). Is this expected?
3) What type of index is recommended on a realtime table with upsert mode on?
4) In upsert mode, is there any limitation on the "comparison time column", i.e. timestamp format or granularity? My table's date column is in yyyyMMddHH format; the comparison time column will be in timestamp format yyyy-MM-dd HH:mm:ss (see the schema sketch below).
{
  "upsertConfig": {
    "mode": "FULL",
    "comparisonColumn": "anotherTimeColumn"
  }
}
5) Queries are timing out at 10 secs, even after changing the values at broker and server level. Do any other configs need to be changed? (See the table-config sketch below.)
pinot.broker.timeoutMs
pinot.server.query.executor.timeout
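On question 4: the comparison column behaves most predictably when it can be compared numerically. A minimal schema sketch, assuming the column can be emitted as epoch millis; the column name mirrors the config above, the other values are illustrative:

"dateTimeFieldSpecs": [
  {
    "name": "anotherTimeColumn",
    "dataType": "LONG",
    "format": "1:MILLISECONDS:EPOCH",
    "granularity": "1:MILLISECONDS"
  }
]

On question 5: the table config can also cap query time regardless of the broker/server settings; a minimal sketch, matching the query config that appears in a table config later in this thread:

"query": {
  "timeoutMs": 60000
}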
Ali Atıl
11/09/2021, 8:04 AM
Dan DC
11/09/2021, 2:17 PM
I'm using $ as my complex type delimiter because I've got some Groovy transformations that I need to apply to other columns, and it's the only delimiter I can use to make my field names compatible with Groovy identifiers. My config looks like:
...
"complexTypeConfig": {
"delimiter": "$",
...
},
"transformConfigs": [
...
"columnName": "some_field",
"transformFunction": "json_format(parent_field$some_field)"
...
],
...
Vibhor Jain
11/09/2021, 4:22 PM
Luis Fernandez
11/09/2021, 5:16 PM
Luis Fernandez
11/09/2021, 6:08 PM
2021-11-09 12:53:00
Slow query: request handler processing time: 441, send response latency: 1, total time to handle request: 442
2021-11-09 12:53:00
Processed requestId=1975257,table=etsyads_metrics_REALTIME,segments(queried/processed/matched/consuming)=46/46/46/1,schedulerWaitMs=0,reqDeserMs=0,totalExecMs=441,resSerMs=0,totalTimeMs=441,minConsumingFreshnessMs=1636480380211,broker=Broker_pinot-broker-1.pinot-broker-headless.pinot.svc.cluster.local_8099,numDocsScanned=20584,scanInFilter=0,scanPostFilter=123504,sched=fcfs,threadCpuTimeNs=0
i was able to then find the request id in the broker and got some more info:
requestId=1976569,table=ads_metrics_REALTIME,timeMs=234,docs=17731/290711208,entries=0/106386,segments(queried/processed/matched/consuming/unavailable):46/46/46/1/0,consumingFreshnessTimeMs=1636480906334,servers=1/1,groupLimitReached=false,brokerReduceTimeMs=0,exceptions=0,serverStats=(Server=SubmitDelayMs,ResponseDelayMs,ResponseSize,DeserializationTimeMs,RequestSentDelayMs);pinot-server-1_R=0,233,7479,0,-1,offlineThreadCpuTimeNs=0,realtimeThreadCpuTimeNs=0,query=SELECT product_id, SUM(click_count), SUM(impression_count), SUM(cost), SUM(order_count), SUM(revenue) FROM ads_metrics WHERE user_id = 13133627 AND serve_time BETWEEN 1633924800 AND 1636520399 GROUP BY product_id LIMIT 6000
Is there any way I could tell from these logs why this is being slow? The only thing I can see is scanPostFilter=123504, which may happen because of the group by. I believe we currently don't have any indexes on that product_id column; would adding one speed things up in any way?
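If an index on those columns is worth a try, a hedged sketch of the tableIndexConfig change (column names taken from the query above); note that an inverted index speeds up the filter phase, while scanPostFilter counts entries read for the aggregation, so the gain is not guaranteed here:

"tableIndexConfig": {
  "invertedIndexColumns": [
    "user_id",
    "product_id"
  ]
}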
Ali Atıl
11/10/2021, 7:27 AM
Tony Requist
11/10/2021, 3:26 PM
Carl
11/11/2021, 4:56 PM
Diogo Baeder
11/11/2021, 5:47 PM
My time column format is 1:MILLISECONDS:EPOCH, and I'm publishing Kafka events containing timestamps that are basically int(time_in_seconds_as_float * 1000) from a Python-based app, but when I use the incubator to query the table I'm getting back negative values. I'm probably doing something wrong, but isn't the idea to publish the time, in milliseconds, since Epoch (1970-01-01 00:00:00)?
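Negative values after ingesting epoch millis are often a sign that the time column was declared as an INT, which epoch-millis values overflow; a minimal sketch of a field spec that keeps it a LONG, with a hypothetical column name:

{
  "name": "eventTimestamp",
  "dataType": "LONG",
  "format": "1:MILLISECONDS:EPOCH",
  "granularity": "1:MILLISECONDS"
}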
Kamal Chavda
11/12/2021, 6:41 PM
Sandeep R
11/14/2021, 12:35 AM
Map
11/14/2021, 11:13 PM
count(*) queries fail if Trino functions are in the predicate. For example, the query below doesn't work and returns an error due to the max rows per split setting:
select count(*) from table0 where from_unixtime(col0) > current_timestamp
but the following query works:
select count(*) from table0 where col0 > 0
I suspect it has something to do with the order of evaluation. Perhaps the Trino functions should be evaluated before determining whether pushdowns should happen?
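One possible workaround in the meantime, sketched under the assumption that col0 holds epoch seconds: apply the Trino function to the literal side so the predicate stays pushable (to_unixtime is a standard Trino function):

select count(*) from table0 where col0 > to_unixtime(current_timestamp)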
Yash Agarwal
11/15/2021, 6:49 AM
One of our server instances has run into
java.lang.OutOfMemoryError: Java heap space
This is fine, but as a result the server instance is becoming unhealthy, i.e. the Live Instance Config becomes:
{
"_code": 404,
"_error": "ZKPath /PinotCluster/LIVEINSTANCES/Server_node_8098 does not exist:"
}
How can we solve this?
Ali Atıl
11/15/2021, 11:26 AM
root@pinot-controller-0:/opt/pinot# bin/pinot-admin.sh RealtimeProvisioningHelper -tableConfigFile /opt/pinot/denizTableConfig.json -numPartitions 1 -numHosts 2 -numHours 6,12,18,24 -sampleCompletedSegmentDir /opt/pinot/samplesegment/realtime/ -ingestionRate 100
Exception:
Executing command: RealtimeProvisioningHelper -tableConfigFile /opt/pinot/denizTableConfig.json -numPartitions 1 -pushFrequency null -numHosts 2 -numHours 6,12,18,24 -sampleCompletedSegmentDir /opt/pinot/samplesegment/realtime/ -ingestionRate 100 -maxUsableHostMemory 48G -retentionHours 0
Exception caught:
java.lang.RuntimeException: Caught exception when reading segment index dir
at org.apache.pinot.controller.recommender.realtime.provisioning.MemoryEstimator.<init>(MemoryEstimator.java:117) ~[pinot-all-0.9.0-SNAPSHOT-jar-with-dependencies.jar:0.9.0-SNAPSHOT-517a0dcea48a7dcb8616addc403c20e0fc23484a]
at org.apache.pinot.tools.admin.command.RealtimeProvisioningHelperCommand.execute(RealtimeProvisioningHelperCommand.java:268) ~[pinot-all-0.9.0-SNAPSHOT-jar-with-dependencies.jar:0.9.0-SNAPSHOT-517a0dcea48a7dcb8616addc403c20e0fc23484a]
at org.apache.pinot.tools.admin.PinotAdministrator.execute(PinotAdministrator.java:169) [pinot-all-0.9.0-SNAPSHOT-jar-with-dependencies.jar:0.9.0-SNAPSHOT-517a0dcea48a7dcb8616addc403c20e0fc23484a]
at org.apache.pinot.tools.admin.PinotAdministrator.main(PinotAdministrator.java:189) [pinot-all-0.9.0-SNAPSHOT-jar-with-dependencies.jar:0.9.0-SNAPSHOT-517a0dcea48a7dcb8616addc403c20e0fc23484a]
Caused by: java.lang.NullPointerException: Cannot find segment metadata file under directory: /opt/pinot/samplesegment/realtime
at shaded.com.google.common.base.Preconditions.checkNotNull(Preconditions.java:864) ~[pinot-all-0.9.0-SNAPSHOT-jar-with-dependencies.jar:0.9.0-SNAPSHOT-517a0dcea48a7dcb8616addc403c20e0fc23484a]
at org.apache.pinot.segment.spi.index.metadata.SegmentMetadataImpl.getPropertiesConfiguration(SegmentMetadataImpl.java:144) ~[pinot-all-0.9.0-SNAPSHOT-jar-with-dependencies.jar:0.9.0-SNAPSHOT-517a0dcea48a7dcb8616addc403c20e0fc23484a]
at org.apache.pinot.segment.spi.index.metadata.SegmentMetadataImpl.<init>(SegmentMetadataImpl.java:117) ~[pinot-all-0.9.0-SNAPSHOT-jar-with-dependencies.jar:0.9.0-SNAPSHOT-517a0dcea48a7dcb8616addc403c20e0fc23484a]
at org.apache.pinot.controller.recommender.realtime.provisioning.MemoryEstimator.<init>(MemoryEstimator.java:115) ~[pinot-all-0.9.0-SNAPSHOT-jar-with-dependencies.jar:0.9.0-SNAPSHOT-517a0dcea48a7dcb8616addc403c20e0fc23484a]
... 3 more
Realtime table config file [-tableConfigFile /opt/pinot/denizTableConfig.json]:
{
"tableName": "denizhybrid",
"tableType": "REALTIME",
"segmentsConfig": {
"timeColumnName": "messageTime",
"timeType": "MILLISECONDS",
"schemaName": "deniz",
"replicasPerPartition": "1",
"retentionTimeUnit": "DAYS",
"retentionTimeValue": "2"
},
"tenants": {},
"fieldConfigList": [
{
"name": "location_st_point",
"encodingType": "RAW",
"indexType": "H3",
"properties": {
"resolutions": "5"
}
}
],
"tableIndexConfig": {
"loadMode": "MMAP",
"rangeIndexColumns": [
"latitude",
"longitude"
],
"noDictionaryColumns": [
"location_st_point"
],
"streamConfigs": {
"streamType": "kafka",
"stream.kafka.consumer.type": "lowlevel",
"stream.kafka.topic.name": "kafkadeniztest2",
"stream.kafka.decoder.class.name": "org.apache.pinot.plugin.stream.kafka.KafkaJSONMessageDecoder",
"stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
"stream.kafka.broker.list": "kafka:9092",
"realtime.segment.flush.threshold.size": "0",
"realtime.segment.flush.threshold.time": "24h",
"realtime.segment.flush.desired.size": "50M",
"stream.kafka.consumer.prop.auto.offset.reset": "smallest"
}
},
"query": {
"timeoutMs": 60000
},
"metadata": {
"customConfigs": {}
},
"task": {
"taskTypeConfigsMap": {
"RealtimeToOfflineSegmentsTask": {
"bucketTimePeriod": "6h",
"bufferTimePeriod": "9h",
"maxNumRecordsPerSegment": "1000000"
}
}
}
}
Thanks in advance.
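Regarding the NullPointerException above: it suggests -sampleCompletedSegmentDir has to point at an extracted (untarred) completed segment directory, i.e. one containing the segment's metadata.properties, either directly or under a v3/ subdirectory depending on the segment format version. A rough sketch of the expected layout, with hypothetical file placement:

/opt/pinot/samplesegment/realtime/
└── v3/
    ├── metadata.properties
    ├── creation.meta
    └── columns.psf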
Kamal Chavda
11/15/2021, 4:42 PM
I have a few questions about Pinot managed offline flows. Any help would be greatly appreciated!
1. Does the OFFLINE table config need to have a RealtimeToOfflineSegmentsTask config that matches the one added to the REALTIME table config?
2. I'm seeing this TASK_ERROR to DROPPED transition in the minion log. What does this signify?
20 START:INVOKE /PinotCluster/INSTANCES/Minion_172.19.0.6_9514/MESSAGES listener:org.apache.helix.messaging.handling.HelixTaskExecutor@157c6932 type: CALLBACK
Resubscribe change listener to path: /PinotCluster/INSTANCES/Minion_172.19.0.6_9514/MESSAGES, for listener: org.apache.helix.messaging.handling.HelixTaskExecutor@157c6932, watchChild: false
Subscribing changes listener to path: /PinotCluster/INSTANCES/Minion_172.19.0.6_9514/MESSAGES, type: CALLBACK, listener: org.apache.helix.messaging.handling.HelixTaskExecutor@157c6932
Subscribing child change listener to path:/PinotCluster/INSTANCES/Minion_172.19.0.6_9514/MESSAGES
Subscribing to path:/PinotCluster/INSTANCES/Minion_172.19.0.6_9514/MESSAGES took:0
The latency of message 6a8ac921-3913-43e8-a777-b15c16185245 is 7 ms
Scheduling message 6a8ac921-3913-43e8-a777-b15c16185245: TaskQueue_RealtimeToOfflineSegmentsTask_Task_RealtimeToOfflineSegmentsTask_1636993325945:TaskQueue_RealtimeToOfflineSegmentsTask_Task_RealtimeToOfflineSegmentsTask_1636993325945_0, TASK_ERROR->DROPPED
Submit task: 6a8ac921-3913-43e8-a777-b15c16185245 to pool: java.util.concurrent.ThreadPoolExecutor@67024f54[Running, pool size = 40, active threads = 0, queued tasks = 0, completed tasks = 221]
Message: 6a8ac921-3913-43e8-a777-b15c16185245 handling task scheduled
20 END:INVOKE /PinotCluster/INSTANCES/Minion_172.19.0.6_9514/MESSAGES listener:org.apache.helix.messaging.handling.HelixTaskExecutor@157c6932 type: CALLBACK Took: 8ms
handling task: 6a8ac921-3913-43e8-a777-b15c16185245 begin, at: 1636993355435
handling message: 6a8ac921-3913-43e8-a777-b15c16185245 transit TaskQueue_RealtimeToOfflineSegmentsTask_Task_RealtimeToOfflineSegmentsTask_1636993325945.TaskQueue_RealtimeToOfflineSegmentsTask_Task_RealtimeToOfflineSegmentsTask_1636993325945_0|[] from:TASK_ERROR to:DROPPED, relayedFrom: null
Merging with delta list, recordId = TaskQueue_RealtimeToOfflineSegmentsTask_Task_RealtimeToOfflineSegmentsTask_1636993325945 other:TaskQueue_RealtimeToOfflineSegmentsTask_Task_RealtimeToOfflineSegmentsTask_1636993325945
Instance Minion_172.19.0.6_9514, partition TaskQueue_RealtimeToOfflineSegmentsTask_Task_RealtimeToOfflineSegmentsTask_1636993325945_0 received state transition from TASK_ERROR to DROPPED on session 1005c465f540008, message id: 6a8ac921-3913-43e8-a777-b15c16185245
Merging with delta list, recordId = TaskQueue_RealtimeToOfflineSegmentsTask_Task_RealtimeToOfflineSegmentsTask_1636993325945 other:TaskQueue_RealtimeToOfflineSegmentsTask_Task_RealtimeToOfflineSegmentsTask_1636993325945
Removed /PinotCluster/INSTANCES/Minion_172.19.0.6_9514/CURRENTSTATES/1005c465f540008/TaskQueue_RealtimeToOfflineSegmentsTask_Task_RealtimeToOfflineSegmentsTask_1636993325945
Message 6a8ac921-3913-43e8-a777-b15c16185245 completed.
Delete message 6a8ac921-3913-43e8-a777-b15c16185245 from zk!
message finished: 6a8ac921-3913-43e8-a777-b15c16185245, took 14
Message: 6a8ac921-3913-43e8-a777-b15c16185245 (parent: null) handling task for TaskQueue_RealtimeToOfflineSegmentsTask_Task_RealtimeToOfflineSegmentsTask_1636993325945:TaskQueue_RealtimeToOfflineSegmentsTask_Task_RealtimeToOfflineSegmentsTask_1636993325945_0 completed at: 1636993355449, results: true. FrameworkTime: 1 ms; HandlerTime: 13 ms.
Subscribing changes listener to path: /PinotCluster/INSTANCES/Minion_172.19.0.6_9514/MESSAGES, type: CALLBACK, listener: org.apache.helix.messaging.handling.HelixTaskExecutor@157c6932
Subscribing child change listener to path:/PinotCluster/INSTANCES/Minion_172.19.0.6_9514/MESSAGES
Subscribing to path:/PinotCluster/INSTANCES/Minion_172.19.0.6_9514/MESSAGES took:0
3. The tasks/scheduler/information API endpoint returns "Task scheduler is disabled". I've added the entry "controller.task.frequencyInSeconds": 3600 to the controller config; is there some other setting I need to configure? (See the sketch below this list.)
4. The tasks/task/taskname/state endpoint is giving a 500 "Index 1 out of bounds for length 1", but tasks/tasktype/taskstates shows completed. I'm not seeing any segments added to my OFFLINE table though. Any idea what's missing?
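On question 3: the frequency setting alone may not turn the scheduler on. A hedged sketch of controller config entries sometimes used together; controller.task.scheduler.enabled is an assumption on my part, so verify it against your Pinot version's docs:

controller.task.scheduler.enabled=true
controller.task.frequencyInSeconds=3600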
Tony Requist
11/15/2021, 6:53 PM
We are seeing segments ... unavailable errors (though I am not sure these two issues are related):
1. How do I get rid of "dead" controllers when I reduce the number of controllers?
2. Could this cause segment ... unavailable?
Elon
11/15/2021, 8:19 PM
Sandeep R
11/15/2021, 8:43 PM
Tony Requist
11/16/2021, 4:36 AM
Anish Nair
11/16/2021, 6:47 AM
Trying to create instance for class org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner
Initializing PinotFS for scheme hdfs, classname org.apache.pinot.plugin.filesystem.HadoopPinotFS
Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
log4j:WARN No appenders could be found for logger (org.apache.htrace.core.Tracer).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
No unit for dfs.client.datanode-restart.timeout(30) assuming SECONDS
No unit for dfs.client.datanode-restart.timeout(30) assuming SECONDS
The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
successfully initialized HadoopPinotFS
Creating an executor service with 1 threads(Job parallelism: 0, available cores: 24.)
Submitting one Segment Generation Task for hdfs://nameservice1/data/poc/pinot-ingestion/part-00000-a75dbdce-f8f4-469f-8f70-d412b02b59cb-c000.gz.parquet
Using class: org.apache.pinot.plugin.inputformat.parquet.ParquetRecordReader to read segment, ignoring configured file format: AVRO
Trying to create instance for class org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner
Initializing PinotFS for scheme hdfs, classname org.apache.pinot.plugin.filesystem.HadoopPinotFS
successfully initialized HadoopPinotFS
Start pushing segments: []... to locations: [org.apache.pinot.spi.ingestion.batch.spec.PinotClusterSpec@5d28bcd5] for table poc_test_table
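The push step found no segment tars ("Start pushing segments: []"), which usually means the push stage scanned a location other than the one the generated tars were written to. A hedged sketch of the standalone jobSpec fields that control this, with hypothetical URIs; worth double-checking that outputDirURI actually received the tars from the generation step:

jobType: SegmentCreationAndTarPush
inputDirURI: 'hdfs://nameservice1/data/poc/pinot-ingestion/'
includeFileNamePattern: 'glob:**/*.parquet'
outputDirURI: 'hdfs://nameservice1/data/poc/pinot-segments/'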
Lars-Kristian Svenøy
11/16/2021, 12:37 PM
I'm adding a derived column daysSinceEpoch, as I want to query for entities within certain days:
"ingestionConfig": {
"transformConfigs": [
{
"columnName": "daysSinceEpoch",
"transformFunction": "toEpochDays(documentTimestamp)"
}
],
...
Additionally, for the RealtimeToOfflineSegmentsTask, I am using this value for deduplication purposes.
In the schema:
"primaryKeyColumns": ["customerId", "machineId", "daysSinceEpoch"]
...
This is because for each event, I only want to keep the latest in a day.
Here's the RealtimeToOfflineSegmentsTask config:
"RealtimeToOfflineSegmentsTask": {
"bucketTimePeriod": "1d",
"bufferTimePeriod": "2d",
"mergeType": "dedup",
"maxNumRecordsPerSegment": 10000000,
"roundBucketTimePeriod": "1h"
}
In the realtime table, I am also filtering out any events older than 14 days (where documentTimestamp is the actual primary timeColumnName):
"filterConfig": {
"filterFunction": "Groovy({documentTimestamp < (new Date() - 14).getTime()}, documentTimestamp)"
},
Does that make sense?
II
11/16/2021, 5:30 PM
I'm trying to use the distinctCount aggregation function to count under different conditions:
select distinctCount(case when condition1 then colA else null end) as condition1Count,
distinctCount(case when condition2 then colA else null end) as condition2Count,
distinctCount(case when condition3 then colA else null end) as condition3Count
from tableA
colA is of type int or String.
But it looks like this isn't supported in Pinot, because null is not supported in the selection query. Will there be future support for this?
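For what it's worth, recent Pinot releases added a FILTER clause on aggregations, which may sidestep the null limitation; a hedged sketch, assuming your version supports it:

select distinctCount(colA) filter (where condition1) as condition1Count,
  distinctCount(colA) filter (where condition2) as condition2Count,
  distinctCount(colA) filter (where condition3) as condition3Count
from tableA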
Jonathan Meyer
11/17/2021, 11:39 AM
A question about ingestionConfig on REALTIME tables: is there any way to use jsonPathString and then further process the result with Groovy in a transformConfig?
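One pattern that may work is doing the JSON extraction inside the Groovy script itself rather than chaining two transform functions; a hedged sketch using Groovy's built-in JsonSlurper (the column names are hypothetical, and whether arbitrary Groovy classes are permitted depends on your deployment):

"transformConfigs": [
  {
    "columnName": "derived_field",
    "transformFunction": "Groovy({new groovy.json.JsonSlurper().parseText(payload).someField.toString().toUpperCase()}, payload)"
  }
]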
Trust Okoroego
11/17/2021, 1:14 PM
Arpit
11/17/2021, 4:31 PM
Ayush Kumar Jha
11/18/2021, 11:08 AM
Ali
11/18/2021, 11:09 AM
Diogo Baeder
11/18/2021, 12:52 PM
Mark Needham
The admin commands call System.exit(0) as soon as commands have been executed.