Priyank Bagrecha
12/02/2021, 9:58 PM
2021/12/02 21:32:12.132 ERROR [SegmentBuildTimeLeaseExtender] [pool-4-thread-1] Failed to send lease extension for km_mp_play_startree__63__21__20211202T2127Z
2021/12/02 21:32:18.330 ERROR [SegmentBuildTimeLeaseExtender] [pool-4-thread-1] Failed to send lease extension for km_mp_play_startree__103__21__20211202T2127Z
2021/12/02 21:32:24.354 ERROR [SegmentBuildTimeLeaseExtender] [pool-4-thread-1] Failed to send lease extension for km_mp_play_startree__111__21__20211202T2127Z
and i see that the server is marked as dead in the cluster manager. how can i get around this? thanks in advance.
Priyank Bagrecha
12/02/2021, 10:47 PM
2021/12/02 21:34:27.105 ERROR [LLRealtimeSegmentDataManager_km_mp_play_startree__48__1__20211202T1919Z] [km_mp_play_startree__48__1__20211202T1919Z] Holding after response from Controller: {"offset":-1,"streamPartitionMsgOffset":null,"buildTimeSec":-1,"isSplitCommitType":false,"status":"NOT_SENT"}
Killed
i don't have the part before this right now but can update once it happens again.
Ali Atıl
12/03/2021, 8:48 AM
Diana Arnos
12/03/2021, 4:55 PM
Pinot 0.9.0, consuming from a kafka topic, and I'm experiencing a weird behaviour.
Once the second message gets consumed, Pinot does a full upsert instead of a partial one: every field present in the second message gets updated, and all the others are set to null (I believe because they are not present in the second message and the full upsert uses the default null values).
Here's the table and schema configs:
Schema:
{
"schemaName": "responseCount",
"dimensionFieldSpecs": [
{
"name": "responseId",
"dataType": "STRING"
},
{
"name": "formId",
"dataType": "STRING"
},
{
"name": "channelId",
"dataType": "STRING"
},
{
"name": "channelPlatform",
"dataType": "STRING"
},
{
"name": "companyId",
"dataType": "STRING"
},
{
"name": "submitted",
"dataType": "BOOLEAN"
},
{
"name": "deleted",
"dataType": "BOOLEAN"
}
],
"dateTimeFieldSpecs": [
{
"name": "operationDate",
"dataType": "STRING",
"format": "1:MILLISECONDS:SIMPLE_DATE_FORMAT:yyyy-MM-dd'T'HH:mm:ss.SSSZ",
"granularity": "1:MILLISECONDS"
},
{
"name": "createdAt",
"dataType": "STRING",
"format": "1:MILLISECONDS:SIMPLE_DATE_FORMAT:yyyy-MM-dd'T'HH:mm:ss.SSSZ",
"granularity": "1:MILLISECONDS"
},
{
"name": "deletedAt",
"dataType": "STRING",
"format": "1:MILLISECONDS:SIMPLE_DATE_FORMAT:yyyy-MM-dd'T'HH:mm:ss.SSSZ",
"granularity": "1:MILLISECONDS"
}
],
"primaryKeyColumns": [
"responseId"
]
}
Table:
{
"REALTIME": {
"tableName": "responseCount_REALTIME",
"tableType": "REALTIME",
"segmentsConfig": {
"allowNullTimeValue": false,
"replication": "1",
"replicasPerPartition": "1",
"timeColumnName": "operationDate",
"schemaName": "responseCount"
},
"tenants": {
"broker": "DefaultTenant",
"server": "DefaultTenant"
},
"tableIndexConfig": {
"rangeIndexVersion": 1,
"autoGeneratedInvertedIndex": false,
"createInvertedIndexDuringSegmentGeneration": false,
"loadMode": "MMAP",
"streamConfigs": {
"streamType": "kafka",
"stream.kafka.topic.name": "response-count.aggregation.source",
"stream.kafka.broker.list": "kafka:9092",
"stream.kafka.consumer.type": "lowlevel",
"stream.kafka.consumer.prop.auto.offset.reset": "smallest",
"stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
"stream.kafka.decoder.class.name": "org.apache.pinot.plugin.stream.kafka.KafkaJSONMessageDecoder",
"realtime.segment.flush.threshold.rows": "0",
"realtime.segment.flush.threshold.time": "24h",
"realtime.segment.flush.segment.size": "100M"
},
"enableDefaultStarTree": false,
"enableDynamicStarTreeCreation": false,
"aggregateMetrics": false,
"nullHandlingEnabled": true
},
"metadata": {},
"routing": {
"instanceSelectorType": "strictReplicaGroup"
},
"upsertConfig": {
"mode": "PARTIAL",
"partialUpsertStrategies": {
"deleted": "OVERWRITE",
"deletedAt": "OVERWRITE"
},
"hashFunction": "NONE"
},
"isDimTable": false
}
}
Here's the first message consumed:
Key: {"responseId": "52d96a0d-92ea-4103-9ea9-536252324481"}
Value:
{
"responseId": "52d96a0d-92ea-4103-9ea9-536252324481",
"formId": "7bd28941-f9e4-45f1-a801-5c7d647cc6cd",
"channelId": "60d11312-0e01-48d8-acce-4871b8d2365b",
"channelPlatform": "app",
"companyId": "00ca0142-5634-57e6-8d44-61427ea4b13d",
"submitted": true,
"deleted": "false",
"createdAt": "2021-05-21T12:55:54.000+0000",
"operationDate": "2021-05-21T12:55:54.000+0000"
}
Here's the second message consumed:
Key: {"responseId": "52d96a0d-92ea-4103-9ea9-536252324481"}
Value:
{
"responseId": "52d96a0d-92ea-4103-9ea9-536252324481",
"deleted": "true",
"deletedAt": "2021-10-21T12:55:54.000+0000",
"operationDate": "2021-05-21T12:55:54.000+0000"
}
Anish Nair
12/06/2021, 5:00 PM
Controller Config:
# Pinot Role
pinot.service.role=CONTROLLER
# Pinot Cluster name
pinot.cluster.name=MAX-Pinot
# Pinot Zookeeper Server
pinot.zk.server=c81:2181
# Use hostname as Pinot Instance ID other than IP
pinot.set.instance.id.to.hostname=true
# Pinot Controller Port
controller.port=9000
# Pinot Controller VIP Host
controller.vip.host=c81
# Pinot Controller VIP Port
controller.vip.port=9000
# Location to store Pinot Segments pushed from clients
controller.data.dir=hdfs://nameservice1/data/max/poc/hdfs/controller/
controller.task.frequencyPeriod=3600
controller.local.temp.dir=/opt/pinot/host/
controller.enable.split.commit=true
controller.access.protocols.http.port=9000
controller.helix.cluster.name=MAX-Pinot
pinot.controller.segment.fetcher.protocols=file,http,hdfs
pinot.controller.storage.factory.class.hdfs=org.apache.pinot.plugin.filesystem.HadoopPinotFS
pinot.controller.storage.factory.hdfs.hadoop.conf.path=/opt/pinot/hadoop/etc/hadoop
pinot.controller.segment.fetcher.hdfs.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher
pinot.server.grpc.enable=true
Server Config:
# Pinot Role
pinot.service.role=SERVER
# Pinot Cluster name
pinot.cluster.name=MAX-Pinot
# Pinot Zookeeper Server
pinot.zk.server=c81:2181
# Use hostname as Pinot Instance ID other than IP
pinot.set.instance.id.to.hostname=true
# Pinot Server Netty Port for queries
pinot.server.netty.port=8098
# Pinot Server Admin API port
pinot.server.adminapi.port=8097
# Pinot Server Data Directory
pinot.server.instance.dataDir=/opt/pinot/host/data/server/index
# Pinot Server Temporary Segment Tar Directory
pinot.server.instance.segmentTarDir=/opt/pinot/host/data/server/segmentTar
pinot.server.consumerDir=/opt/pinot/host/data/server/consumer
pinot.server.instance.enable.split.commit=true
pinot.server.instance.reload.consumingSegment=true
pinot.server.segment.fetcher.protocols=file,http,hdfs
pinot.server.segment.fetcher.hdfs.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher
pinot.server.storage.factory.class.hdfs=org.apache.pinot.plugin.filesystem.HadoopPinotFS
pinot.server.storage.factory.hdfs.hadoop.conf.path=/opt/pinot/hadoop/etc/hadoop
pinot.server.grpc.enable=true
pinot.server.grpc.port=8090
pinot.server.query.executor.timeout=100000
pinot.server.instance.realtime.alloc.offheap=true
Tiger Zhao
12/06/2021, 10:43 PM
Queries like select * from pinot.default.table seem to just run forever without returning any results. But queries like select max(col) from pinot.default.table seem to run fine. It looks like I have to do some sort of aggregation or group by in order for the query to run. I can't seem to just select rows. Is this behavior expected?
Vishal Garg
12/07/2021, 5:44 AM
Ahmed Shehata
12/07/2021, 8:58 AM
Jonathan Meyer
12/07/2021, 12:52 PM
Elon
12/07/2021, 6:00 PM
COMPLETED segments to have numInstancesPerPartition = 0 (so it can use all instances). Is upsert compatible with pool based instance assignment?
xtrntr
12/08/2021, 7:06 AM
LIMIT way beyond that; any way to remove this? currently using the latest-jdk11 image for pinot on kubernetes.
unfortunately, i can't seem to use IN_SUBQUERY to represent the userid set, so on the client side i break my pinot queries into:
1. fetch userids query (using a GROUP BY + HAVING query). sometimes i may get more than a million user ids
2. do final query
Ahmed Shehata
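The two-step client-side approach above (fetch the ids, then run the final query in batches) can be sketched as follows; the table and column names are hypothetical, and the batch size would need tuning against broker limits:

```python
# Hypothetical sketch of the client-side two-step workaround described above.
# Table/column names (events, userid, amount) are made up for illustration.

def build_batched_queries(user_ids, batch_size=1000):
    """Split a large id set into IN (...) batches, one final query each."""
    for i in range(0, len(user_ids), batch_size):
        batch = user_ids[i:i + batch_size]
        in_list = ", ".join("'%s'" % uid for uid in batch)
        yield ("SELECT userid, SUM(amount) FROM events "
               "WHERE userid IN (%s) GROUP BY userid" % in_list)

# Step 1 would be the GROUP BY + HAVING query that returns the ids;
# step 2 runs one query per batch and merges results client-side.
ids = ["u%d" % n for n in range(2500)]
queries = list(build_batched_queries(ids, batch_size=1000))
print(len(queries))  # 3
```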
12/08/2021, 10:29 AM
troywinter
12/08/2021, 3:07 PM
Priyank Bagrecha
12/08/2021, 7:53 PM
Ali Atıl
12/09/2021, 1:06 PM
Tiger Zhao
12/09/2021, 6:13 PM
select MAX(1639054811930692679) from table returns 1.63905481193069261E18. Is this behavior expected?
Priyank Bagrecha
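The result above is consistent with MAX being computed in double precision: an IEEE-754 double has a 53-bit significand, so integers above 2^53 cannot all be represented exactly. Python floats are the same doubles, so the precision loss is easy to check:

```python
# Python floats are IEEE-754 doubles, so they show the same precision loss.
x = 1639054811930692679            # the literal from the query above
print(int(float(x)) == x)          # False: the value no longer round-trips
# At this magnitude (between 2**60 and 2**61) adjacent doubles are 256 apart,
# so the rounding error is at most half that spacing.
print(abs(int(float(x)) - x) <= 128)  # True
```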
12/09/2021, 9:45 PM
Priyank Bagrecha
12/09/2021, 10:24 PM
"The deep store stores a compressed version of the segment files and it typically won't include any indexes."
will the index always be in memory? is the index re-computed when a server loads a segment from the deep store? is there a way to view the size of the index?
Tanmay Movva
12/10/2021, 3:39 AM
select * from pinot.default.table limit 10
This is the stack trace of the error. Can anyone please help? Did anyone face a similar issue before?
java.lang.NullPointerException: null value in entry: Server_server-2.server-headless.pinot.svc.cluster.local_8098=null
at com.google.common.collect.CollectPreconditions.checkEntryNotNull(CollectPreconditions.java:32)
at com.google.common.collect.SingletonImmutableBiMap.<init>(SingletonImmutableBiMap.java:42)
at com.google.common.collect.ImmutableBiMap.of(ImmutableBiMap.java:72)
at com.google.common.collect.ImmutableMap.of(ImmutableMap.java:119)
at com.google.common.collect.ImmutableMap.copyOf(ImmutableMap.java:454)
at com.google.common.collect.ImmutableMap.copyOf(ImmutableMap.java:433)
at io.trino.plugin.pinot.PinotSegmentPageSource.queryPinot(PinotSegmentPageSource.java:221)
at io.trino.plugin.pinot.PinotSegmentPageSource.fetchPinotData(PinotSegmentPageSource.java:182)
at io.trino.plugin.pinot.PinotSegmentPageSource.getNextPage(PinotSegmentPageSource.java:150)
at io.trino.operator.TableScanOperator.getOutput(TableScanOperator.java:311)
at io.trino.operator.Driver.processInternal(Driver.java:387)
at io.trino.operator.Driver.lambda$processFor$9(Driver.java:291)
at io.trino.operator.Driver.tryWithLock(Driver.java:683)
at io.trino.operator.Driver.processFor(Driver.java:284)
at io.trino.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:1076)
at io.trino.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:163)
at io.trino.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:484)
at io.trino.$gen.Trino_362____20211126_004329_2.run(Unknown Source)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
Alexander Vivas
12/10/2021, 10:09 AM
Jeff Moszuti
12/10/2021, 3:44 PM
select * from transcript limit 10. As soon as I upload a realtime table config and schema (https://github.com/npawar/pinot-tutorial/tree/master/transcript#upload-realtime-table-config-and-schema) only 3 rows are returned when running the same SQL statement. I do however see 4 rows if I query the offline table, e.g. select * from transcript_OFFLINE limit 10. What could be the reason?
Weixiang Sun
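A possible explanation (a guess, not a diagnosis): for a hybrid table the broker splits each query at a time boundary derived from the offline segments, serving older records from the OFFLINE table and newer ones from REALTIME. If the realtime side no longer has the rows past the boundary, they vanish from the hybrid view even though they exist offline. A toy illustration:

```python
# Toy model of hybrid-table query routing (NOT Pinot's actual code).
# Assumption: boundary = max offline timestamp minus one push granularity;
# OFFLINE serves ts <= boundary, REALTIME serves ts > boundary.

ONE_DAY = 24 * 3600 * 1000  # assumed daily push granularity, in millis

def hybrid_view(offline_rows, realtime_rows):
    boundary = max(ts for ts, _ in offline_rows) - ONE_DAY
    offline_part = [r for r in offline_rows if r[0] <= boundary]
    realtime_part = [r for r in realtime_rows if r[0] > boundary]
    return offline_part + realtime_part

day = ONE_DAY
offline = [(1 * day, "a"), (1 * day, "b"), (1 * day, "c"), (2 * day, "d")]

# Realtime side empty: the newest offline row falls past the boundary
print(len(hybrid_view(offline, [])))                # 3
# Once realtime has consumed that row, all 4 come back
print(len(hybrid_view(offline, [(2 * day, "d")])))  # 4
```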
12/10/2021, 10:15 PM
Sergey Bondarev
12/13/2021, 11:49 AM
zookeeper.request.timeout value is 0. feature enabled=
Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
Socket error occurred: localhost/127.0.0.1:2181: Connection refused
while running the command
./pinot-admin.sh StartController
Any idea what could go wrong?
Luis Fernandez
12/13/2021, 4:46 PM
2021-12-13 11:31:40
java.lang.OutOfMemoryError: Direct buffer memory
2021-12-13 11:31:40
Caught exception while handling response from server: pinot-server-1_R
we currently have 2 brokers that are doing a lot of garbage collection and i'm unaware as to why. latency from broker to server has suffered a lot, but I'm not sure what happened, as we haven't been touching the pinot cluster lately. we did stop one of our apps from streaming, but that doesn't line up with the spikes on response times.
Mahesh babu
12/14/2021, 11:12 AM
Prashant Pandey
12/15/2021, 9:30 AM
Mark Needham
controller.zk.str property in the conf file
Syed Akram
12/15/2021, 12:04 PM
Vedran Krtalić
12/15/2021, 12:10 PM
"ingestionConfig": {
"transformConfigs": [
{
"columnName": "id",
"transformFunction": "JSONPATHSTRING(transaction, '$.id', 'null')"
},
...
{
"columnName": "ts",
"transformFunction": "JSONPATHLONG(transaction, '$.ts.date', 0)"
},
{
"columnName": "daysTs",
"transformFunction": "toEpochDays(ts)"
},
{
"columnName": "hoursTs",
"transformFunction": "toEpochHours(ts)"
}
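If the derived columns come out empty, one thing to try (an assumption, not a confirmed fix): some Pinot versions do not let a transform function reference the output of another transformConfig, so daysTs and hoursTs could be derived from the source field directly instead of from ts:

```json
{
  "columnName": "daysTs",
  "transformFunction": "toEpochDays(JSONPATHLONG(transaction, '$.ts.date', 0))"
},
{
  "columnName": "hoursTs",
  "transformFunction": "toEpochHours(JSONPATHLONG(transaction, '$.ts.date', 0))"
}
```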
Derived columns are in schema (bottom two):
"dateTimeFieldSpecs": [
{
"name": "ts",
"dataType": "LONG",
"format": "1:MILLISECONDS:EPOCH",
"granularity": "1:MILLISECONDS"
},
{
"name": "cat",
"dataType": "LONG",
"format": "1:MILLISECONDS:EPOCH",
"granularity": "1:MILLISECONDS"
},
{
"name": "agn_o",
"dataType": "LONG",
"format": "1:MILLISECONDS:EPOCH",
"granularity": "1:MILLISECONDS"
},
{
"name": "_sourceTimestamp",
"dataType": "LONG",
"format": "1:MILLISECONDS:EPOCH",
"granularity": "1:MILLISECONDS"
},
{
"name": "daysTs",
"dataType": "LONG",
"format": "1:DAYS:EPOCH",
"granularity": "1:DAYS"
},
{
"name": "hoursTs",
"dataType": "LONG",
"format": "1:HOURS:EPOCH",
"granularity": "1:HOURS"
}
hoursTs is an aggregation dimension in the star-tree index, as follows:
"starTreeIndexConfigs": [
{
"dimensionsSplitOrder": [
"hoursTs",
"c",
"ty",
"cu",
"dp",
"ag"
],
"skipStarNodeCreationForDimensions": [],
"functionColumnPairs": [
"SUM__a",
"COUNT__id",
"SUM__ra",
"SUM__rp",
"COUNT__*"
],
"maxLeafRecords": 10000
}
]
daysTs has a bloom filter index, as follows:
"bloomFilterColumns": [
"id",
"daysTs"
]
Are we missing something?
Jonathan Meyer
12/15/2021, 2:17 PM
ingestFromFile endpoint between 0.8.0 and 0.9.1?
We're now getting HTTP 500 with:
Maybe an issue related to null support?