Pankaj Thakkar
12/15/2021, 8:01 PM
Tiger Zhao
12/15/2021, 8:15 PM
Priyank Bagrecha
12/15/2021, 10:05 PM
2021/12/15 19:09:59.854 ERROR [GroupCommit] [HelixTaskExecutor-message_handle_STATE_TRANSITION] Interrupted while committing change, key: /pinot-poc/INSTANCES/Server_10.220.12.85_8098/CURRENTSTATES/100000abfb404c3/km_mp_play_startree_REALTIME, record: km_mp_play_startree_REALTIME, {}{}{}
java.lang.InterruptedException: null
at java.lang.Object.wait(Native Method) ~[?:?]
at org.apache.helix.GroupCommit.commit(GroupCommit.java:163) [pinot-all-0.9.1-jar-with-dependencies.jar:0.9.1-f8ec6f6f8eead03488d3f4d0b9501fc3c4232961]
at org.apache.helix.manager.zk.ZKHelixDataAccessor.updateProperty(ZKHelixDataAccessor.java:189) [pinot-all-0.9.1-jar-with-dependencies.jar:0.9.1-f8ec6f6f8eead03488d3f4d0b9501fc3c4232961]
at org.apache.helix.manager.zk.ZKHelixDataAccessor.updateProperty(ZKHelixDataAccessor.java:177) [pinot-all-0.9.1-jar-with-dependencies.jar:0.9.1-f8ec6f6f8eead03488d3f4d0b9501fc3c4232961]
at org.apache.helix.messaging.handling.HelixStateTransitionHandler.preHandleMessage(HelixStateTransitionHandler.java:164) [pinot-all-0.9.1-jar-with-dependencies.jar:0.9.1-f8ec6f6f8eead03488d3f4d0b9501fc3c4232961]
at org.apache.helix.messaging.handling.HelixStateTransitionHandler.handleMessage(HelixStateTransitionHandler.java:330) [pinot-all-0.9.1-jar-with-dependencies.jar:0.9.1-f8ec6f6f8eead03488d3f4d0b9501fc3c4232961]
at org.apache.helix.messaging.handling.HelixTask.call(HelixTask.java:97) [pinot-all-0.9.1-jar-with-dependencies.jar:0.9.1-f8ec6f6f8eead03488d3f4d0b9501fc3c4232961]
at org.apache.helix.messaging.handling.HelixTask.call(HelixTask.java:49) [pinot-all-0.9.1-jar-with-dependencies.jar:0.9.1-f8ec6f6f8eead03488d3f4d0b9501fc3c4232961]
at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
at java.lang.Thread.run(Thread.java:829) [?:?]
Nicholas Yu
12/16/2021, 2:17 AM
spark-submit
--master yarn
--class org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand
s3://<bucket_name>/lib/pinot-all-0.9.0-jar-with-dependencies.jar -jobSpecFile s3://<bucket_name>/jobs/<table_name>/job.yaml
but I’m getting a java.lang.NoSuchMethodException: org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand.main
Anish Nair
12/16/2021, 8:02 AM
{
"segment.creation.time": "1639641347711",
"segment.flush.threshold.size": "100000",
"segment.name": "max_reporting_aggregations__0__0__20211216T0755Z",
"segment.realtime.numReplicas": "2",
"segment.realtime.startOffset": "598962485",
"segment.realtime.status": "IN_PROGRESS",
"segment.table.name": "max_reporting_aggregations",
"segment.type": "REALTIME"
}
Ali Atıl
12/16/2021, 10:59 AM
{
"data": {
"employee": {
"name": "ali",
"salary": 56000,
"married": true,
"messageTime": 1639652167
}
}
}
Example schema:
{
"schemaName": "employee",
"dimensionFieldSpecs": [
{
"name": "name",
"dataType": "STRING"
},
{
"name": "salary",
"dataType": "DOUBLE"
},
{
"name": "married",
"dataType": "BOOLEAN"
}
],
"metricFieldSpecs": [],
"dateTimeFieldSpecs": [
{
"name": "messageTime",
"dataType": "LONG",
"format": "1:MILLISECONDS:EPOCH",
"granularity": "1:MILLISECONDS"
}
]
}
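The sample message nests its fields under data.employee, and its messageTime (1639652167) has ten digits, which reads as epoch seconds, whereas the schema declares 1:MILLISECONDS:EPOCH. A minimal Python sketch of reshaping the payload into a flat row that matches the schema, done outside Pinot purely for illustration; the function name `flatten_employee` is hypothetical, and treating messageTime as seconds is an assumption:

```python
import json

# Sample payload copied from the message above.
SAMPLE = (
    '{"data": {"employee": {"name": "ali", "salary": 56000, '
    '"married": true, "messageTime": 1639652167}}}'
)


def flatten_employee(message: str) -> dict:
    """Lift the fields under data.employee to the top level (hypothetical helper)."""
    record = json.loads(message)["data"]["employee"]
    row = dict(record)
    # Assumption: the 10-digit value is epoch seconds, so scale it to the
    # milliseconds that the schema's 1:MILLISECONDS:EPOCH format declares.
    row["messageTime"] = record["messageTime"] * 1000
    return row


row = flatten_employee(SAMPLE)
print(row)
```

Whether to reshape upstream or through the table's ingestion transforms is a separate choice; the sketch only shows the target row shape.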
Map
12/16/2021, 12:27 PM
pinot.set.instance.id.to.hostname changed in recent versions? Doesn’t seem to be working any more and IPs are used instead
Jonathan Meyer
12/16/2021, 4:33 PM
Tiger Zhao
12/16/2021, 4:53 PM
Luis Fernandez
12/16/2021, 9:44 PM
Caught exception while processing query: QueryContext{_tableName='ads_metrics_REALTIME', _selectExpressions=[listing_id, sum(click_count), sum(impression_count), sum(cost), sum(order_count), sum(revenue)], _aliasList=[null, null, null, null, null, null], _filter=(shop_id = '25746445' AND serve_time BETWEEN '1637125200' AND '1639717199'), _groupByExpressions=[listing_id], _havingFilter=null, _orderByExpressions=null, _limit=6000, _offset=0, _queryOptions={responseFormat=sql, groupByMode=sql, timeoutMs=9999}, _debugOptions=null, _brokerRequest=BrokerRequest(querySource:QuerySource(tableName:ads_metrics_REALTIME), pinotQuery:PinotQuery(dataSource:DataSource(tableName:ads_metrics_REALTIME), selectList:[Expression(type:IDENTIFIER, identifier:Identifier(name:listing_id)), Expression(type:FUNCTION, functionCall:Function(operator:SUM, operands:[Expression(type:IDENTIFIER, identifier:Identifier(name:click_count))])), Expression(type:FUNCTION, functionCall:Function(operator:SUM, operands:[Expression(type:IDENTIFIER, identifier:Identifier(name:impression_count))])), Expression(type:FUNCTION, functionCall:Function(operator:SUM, operands:[Expression(type:IDENTIFIER, identifier:Identifier(name:cost))])), Expression(type:FUNCTION, functionCall:Function(operator:SUM, operands:[Expression(type:IDENTIFIER, identifier:Identifier(name:order_count))])), Expression(type:FUNCTION, functionCall:Function(operator:SUM, operands:[Expression(type:IDENTIFIER, identifier:Identifier(name:revenue))]))], filterExpression:Expression(type:FUNCTION, functionCall:Function(operator:AND, operands:[Expression(type:FUNCTION, functionCall:Function(operator:EQUALS, operands:[Expression(type:IDENTIFIER, identifier:Identifier(name:shop_id)), Expression(type:LITERAL, literal:<Literal longValue:25746445>)])), Expression(type:FUNCTION, functionCall:Function(operator:BETWEEN, operands:[Expression(type:IDENTIFIER, identifier:Identifier(name:serve_time)), Expression(type:LITERAL, literal:<Literal 
longValue:1637125200>), Expression(type:LITERAL, literal:<Literal longValue:1639717199>)]))])), groupByList:[Expression(type:IDENTIFIER, identifier:Identifier(name:listing_id))], orderByList:[], limit:6000, queryOptions:{responseFormat=sql, groupByMode=sql, timeoutMs=9999}))}
java.lang.ArrayIndexOutOfBoundsException: null
This started happening out of nowhere, so I’m unsure what’s happening. Has anyone gotten a similar error? I also don’t have the stack trace, sadly; I don’t know why it isn’t being logged 😞 Also, in general our p99 response times have been impacted.
Elon
12/16/2021, 10:04 PM
Elon
12/16/2021, 11:15 PM
peerSegmentDownloadScheme = 'http'), and see "isSplitCommitType":true log messages in the server, but noticed the controller still contains those segments in the temp directory /var/pinot/controller/data/untarredFileTemp - does that indicate we have something configured incorrectly?
Zsolt Takacs
12/17/2021, 2:26 PM
Priyank Bagrecha
12/17/2021, 6:41 PM
Tao Hu
12/17/2021, 6:44 PM
{
"message": "QueryExecutionError:\njava.lang.NullPointerException\n\tat org.apache.pinot.core.operator.filter.TextMatchFilterOperator.getNextBlock(TextMatchFilterOperator.java:45)\n\tat org.apache.pinot.core.operator.filter.TextMatchFilterOperator.getNextBlock(TextMatchFilterOperator.java:30)\n\tat org.apache.pinot.core.operator.BaseOperator.nextBlock(BaseOperator.java:49)\n\tat org.apache.pinot.core.operator.DocIdSetOperator.getNextBlock(DocIdSetOperator.java:62)",
"errorCode": 200
}
Here is my table config
{
"tableName": "dimBaseballTeams",
"tableType": "OFFLINE",
"isDimTable": true,
"segmentsConfig": {
"segmentPushType": "REFRESH",
"replication": "1"
},
"tableIndexConfig": {
"noDictionaryColumns": [
"teamName"
]
},
"fieldConfigList": [
{
"name": "teamName",
"encodingType": "RAW",
"indexType": "TEXT"
}
],
"tenants": {},
"metadata": {
"customConfigs": {}
}
}
Luis Fernandez
12/17/2021, 7:15 PM
Nicholas Yu
12/20/2021, 12:40 AM
eywek
12/20/2021, 12:55 PM
NOT LIKE doesn’t work, do you know why?
I.e., I’m getting results with
select * from datasource_61c064dc1c9900030074e5f3 where JSONEXTRACTSCALAR("labels", '$.locale', 'STRING') LIKE '%fr_FR%' limit 10
but 0 results with
select * from datasource_61c064dc1c9900030074e5f3 where JSONEXTRACTSCALAR("labels", '$.locale', 'STRING') NOT LIKE '%en_US%' limit 10
where labels is
{
"locale": "fr_FR",
"brand": "undiz"
}
Thank you
Weixiang Sun
12/20/2021, 7:14 PM
"dateTimeFieldSpecs": [
{
"name": "timestamp",
"dataType": "LONG",
"defaultNullValue": 0,
"format": "1:MILLISECONDS:EPOCH",
"granularity": "1:MILLISECONDS"
},
{
"name": "timestamp_seconds",
"dataType": "LONG",
"defaultNullValue": 0,
"transformFunction": "toEpochSecondsRounded(timestamp, 1)",
"format": "1:SECONDS:EPOCH",
"granularity": "1:SECONDS"
}
]
I got the following error:
{
"code": 400,
"error": "Cannot add invalid schema: schema_name. Reason: Exception in getting arguments for transform function 'toEpochSecondsRounded(timestamp, 1)' for column 'timestamp_seconds'"
}
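For reference, a small sketch of the arithmetic the transform is presumably meant to perform: convert epoch milliseconds to epoch seconds, then round down to a multiple of the second argument. This is plain Python under that assumption about `toEpochSecondsRounded`, not Pinot's implementation; the helper name and the sample input are made up for illustration, and the sketch does not address the schema validation error itself.

```python
def to_epoch_seconds_rounded(millis: int, round_to: int) -> int:
    """Convert epoch millis to epoch seconds, rounded down to a multiple of round_to.

    Assumes a non-negative timestamp and a positive round_to.
    """
    seconds = millis // 1000
    return (seconds // round_to) * round_to


# Arbitrary sample timestamp in epoch milliseconds.
sample_millis = 1639641347711

# With a rounding unit of 1, the result is simply the whole seconds.
one_second = to_epoch_seconds_rounded(sample_millis, 1)

# With a rounding unit of 300, the result is the start of a 5-minute bucket.
five_minutes = to_epoch_seconds_rounded(sample_millis, 300)

print(one_second, five_minutes)
```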
What is wrong?
Ayush Kumar Jha
12/21/2021, 5:11 AM
ALL_JAVA_OPTS="-javaagent:jmx_prometheus_javaagent-0.12.0.jar=8088:pinot.yml -Xms4G -Xmx4G -XX:MaxDirectMemorySize=30g -Dlog4j2.configurationFile=conf/pinot-admin-log4j2.xml -Dplugins.dir=$BASEDIR/plugins"
sudo bin/pinot-admin.sh StartController -configFileName /home/centos/controller.conf
But I could not access the metrics on port 8088
Diana Arnos
12/21/2021, 3:23 PM
stream.kafka.consumer.group.id and `stream.kafka.group.id` as per the Kafka docs and it still creates the consumer group with id as 0 😞
Stav Gayer
12/21/2021, 3:37 PM
pinot.max-rows-per-split-for-segment-queries=1000000
pinot.request-timeout=1m
)
As you can see, the “broker jvm used” graph continues to rise and fall even when there are no requests to the server at all, and this did not stop until I deleted the pods.
The same thing happened to the server.
In addition, Trino is almost unusable.
A simple query like
select * from bi_test_table where page_url like '%google%' and epoch_ts >= 1639440000
AND epoch_ts < 1640044800
gets a timeout after a minute and sometimes crashes the servers.
Where am I going wrong?
Weixiang Sun
12/21/2021, 6:11 PM
Weixiang Sun
12/22/2021, 3:04 AM
Luis Fernandez
12/22/2021, 2:30 PM
Caught exception while processing query: QueryContext{_tableName='ads_metrics_REALTIME', _selectExpressions=[listing_id, sum(click_count), sum(impression_count), sum(cost), sum(order_count), sum(revenue)], _aliasList=[null, null, null, null, null, null], _filter=(shop_id = '25746445' AND serve_time BETWEEN '1637125200' AND '1639717199'), _groupByExpressions=[listing_id], _havingFilter=null, _orderByExpressions=null, _limit=6000, _offset=0, _queryOptions={responseFormat=sql, groupByMode=sql, timeoutMs=9999}, _debugOptions=null, _brokerRequest=BrokerRequest(querySource:QuerySource(tableName:ads_metrics_REALTIME), pinotQuery:PinotQuery(dataSource:DataSource(tableName:ads_metrics_REALTIME), selectList:[Expression(type:IDENTIFIER, identifier:Identifier(name:listing_id)), Expression(type:FUNCTION, functionCall:Function(operator:SUM, operands:[Expression(type:IDENTIFIER, identifier:Identifier(name:click_count))])), Expression(type:FUNCTION, functionCall:Function(operator:SUM, operands:[Expression(type:IDENTIFIER, identifier:Identifier(name:impression_count))])), Expression(type:FUNCTION, functionCall:Function(operator:SUM, operands:[Expression(type:IDENTIFIER, identifier:Identifier(name:cost))])), Expression(type:FUNCTION, functionCall:Function(operator:SUM, operands:[Expression(type:IDENTIFIER, identifier:Identifier(name:order_count))])), Expression(type:FUNCTION, functionCall:Function(operator:SUM, operands:[Expression(type:IDENTIFIER, identifier:Identifier(name:revenue))]))], filterExpression:Expression(type:FUNCTION, functionCall:Function(operator:AND, operands:[Expression(type:FUNCTION, functionCall:Function(operator:EQUALS, operands:[Expression(type:IDENTIFIER, identifier:Identifier(name:shop_id)), Expression(type:LITERAL, literal:<Literal longValue:25746445>)])), Expression(type:FUNCTION, functionCall:Function(operator:BETWEEN, operands:[Expression(type:IDENTIFIER, identifier:Identifier(name:serve_time)), Expression(type:LITERAL, literal:<Literal 
longValue:1637125200>), Expression(type:LITERAL, literal:<Literal longValue:1639717199>)]))])), groupByList:[Expression(type:IDENTIFIER, identifier:Identifier(name:listing_id))], orderByList:[], limit:6000, queryOptions:{responseFormat=sql, groupByMode=sql, timeoutMs=9999}))}
java.lang.ArrayIndexOutOfBoundsException: null
More information: it seems to have something to do with my serve_time filter; the longer the time range I query (more than 3 days), the more likely I am to get this error.
Diogo Baeder
12/22/2021, 7:40 PM
Vibhor Jaiswal
12/28/2021, 10:26 PM
Priyank Bagrecha
12/28/2021, 10:55 PM
event_ts_5_min which represents the start of a 5 minute time bucket. Would it even make sense to add an inverted index on event_ts_5_min? test_id is a filter in all queries. I did read "A sorted index performs much better than an inverted index, but it can only be applied to one column per table." in the documentation, but didn't take that to mean I can't apply an inverted index on other fields.
Tao Hu
12/29/2021, 12:41 AM
yelim yu
12/30/2021, 2:10 AM