# troubleshooting
  • Elon (12/04/2020, 1:54 AM)
    Hi, we had a server go into a gc loop where it wasn't reducing the heap (only 1 server, the other 5 are fine). Then we noticed that 3 out of 6 of our servers had 2x the amount of data for a table (i.e. 300gb vs 150gb). I am running a rebalance now. Is there anything we can do to even out the disk space among all the servers? We also have replicas per partition set to 3, but we have 6 servers, should we increase replicas to 6, or reduce replicas per partition to 2?
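    For anyone hitting the same thing: a rebalance can also be run through the controller REST API; a minimal sketch, assuming a controller at localhost:9000 and a hypothetical table name:
    ```bash
    # Dry run first to see the proposed segment assignment without moving anything
    curl -X POST "http://localhost:9000/tables/myTable/rebalance?type=REALTIME&dryRun=true"

    # Then run it for real; downtime=false keeps the table serving while segments move
    curl -X POST "http://localhost:9000/tables/myTable/rebalance?type=REALTIME&downtime=false"
    ```
    With 6 servers and replication 3, evening out the segment assignment (rather than changing the replica count) is usually what balances disk usage.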
  • João Comini (12/04/2020, 3:10 PM)
    Hello guys! I'm having some trouble running a hybrid table; can someone help me, please? I'm receiving these warnings in the broker when pushing offline segments to Pinot:
    ```
    [BaseBrokerRequestHandler] [jersey-server-managed-async-executor-1] Failed to find time boundary info for hybrid table: transaction
    ```
    When I try to run a query, I get a timeout. Server log:
    ```
    Timed out while polling results block, numBlocksMerged: 0 (query: QueryContext{_tableName='transaction_REALTIME', _selectExpressions=[count(*)], _aliasMap={}, _filter=transactionDate > '1606971455132', _groupByExpressions=null, _havingFilter=null, _orderByExpressions=null, _limit=10, _offset=0, _queryOptions={responseFormat=sql, groupByMode=sql, timeoutMs=9999}, _debugOptions=null, _brokerRequest=BrokerRequest(querySource:QuerySource(tableName:transaction_REALTIME), filterQuery:FilterQuery(id:0, column:transactionDate, value:[(1606971455132		*)], operator:RANGE, nestedFilterQueryIds:[]), aggregationsInfo:[AggregationInfo(aggregationType:COUNT, aggregationParams:{column=*}, isInSelectList:true, expressions:[*])], filterSubQueryMap:FilterQueryMap(filterQueryMap:{0=FilterQuery(id:0, column:transactionDate, value:[(1606971455132		*)], operator:RANGE, nestedFilterQueryIds:[])}), queryOptions:{responseFormat=sql, groupByMode=sql, timeoutMs=9999}, pinotQuery:PinotQuery(dataSource:DataSource(tableName:transaction_REALTIME), selectList:[Expression(type:FUNCTION, functionCall:Function(operator:COUNT, operands:[Expression(type:IDENTIFIER, identifier:Identifier(name:*))]))], filterExpression:Expression(type:FUNCTION, functionCall:Function(operator:GREATER_THAN, operands:[Expression(type:IDENTIFIER, identifier:Identifier(name:transactionDate)), Expression(type:LITERAL, literal:<Literal longValue:1606971455132>)]))), limit:10)})
    ```
    If I try to use Tracing, I get an NPE on the offline servers:
    ```
    ERROR [QueryScheduler] [pqr-0] Encountered exception while processing requestId 83 from broker Broker_pinot-broker-0.pinot-broker-headless.pinot.svc.cluster.local_8099
    java.lang.NullPointerException: null
    	at org.apache.pinot.core.util.trace.TraceContext.getTraceInfo(TraceContext.java:188) ~[pinot-all-0.6.0-jar-with-dependencies.jar:0.6.0-bb646baceafcd9b849a1ecdec7a11203c7027e21]
    	at org.apache.pinot.core.query.executor.ServerQueryExecutorV1Impl.processQuery(ServerQueryExecutorV1Impl.java:235) ~[pinot-all-0.6.0-jar-with-dependencies.jar:0.6.0-bb646baceafcd9b849a1ecdec7a11203c7027e21]
    	at org.apache.pinot.core.query.executor.QueryExecutor.processQuery(QueryExecutor.java:60) ~[pinot-all-0.6.0-jar-with-dependencies.jar:0.6.0-bb646baceafcd9b849a1ecdec7a11203c7027e21]
    	at org.apache.pinot.core.query.scheduler.QueryScheduler.processQueryAndSerialize(QueryScheduler.java:155) ~[pinot-all-0.6.0-jar-with-dependencies.jar:0.6.0-bb646baceafcd9b849a1ecdec7a11203c7027e21]
    ```
    I'm running Pinot 0.6.0, btw.
  • Tanmay Movva (12/07/2020, 5:06 PM)
    Hello, I am facing issues with setting the Kafka consumer configs in the table config. I am using the image with the `latest` tag. I tried using both `stream.kafka` and `stream.kafka.consumer.prop` as prefixes; neither worked.
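    For reference, consumer-level properties normally go under `streamConfigs` with the `stream.kafka.consumer.prop.` prefix; a minimal sketch, with the topic/broker names as placeholders and the Kafka 2.x factory assumed:
    ```json
    "streamConfigs": {
      "streamType": "kafka",
      "stream.kafka.consumer.type": "lowlevel",
      "stream.kafka.topic.name": "myTopic",
      "stream.kafka.broker.list": "kafka-broker:9092",
      "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.stream.kafka.KafkaJSONMessageDecoder",
      "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
      "stream.kafka.consumer.prop.auto.offset.reset": "smallest"
    }
    ```
    Whether arbitrary consumer properties beyond the well-known ones are actually passed through to the Kafka client in 0.6 is the open question in this thread (see also the 12/21 message below).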
  • Elon (12/07/2020, 5:33 PM)
    Hi all, we are still seeing spikes in broker query latency with the new G1 settings... after taking heap dumps, histograms, pmaps, etc., it looks like it happens when the soft references to direct buffers are cleared out. Can we create a channel to talk about this, so I can post my findings there? Or just a Google doc? lmk. I feel like we are close to solving this 🙂
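    For anyone following along: soft-reference clearing is tunable on HotSpot. A sketch of the relevant JVM flags, offered as an assumption about what might apply here rather than a confirmed fix:
    ```bash
    # -XX:SoftRefLRUPolicyMSPerMB controls how long soft references survive per MB of
    # free heap (0 clears them aggressively; larger values keep them around longer).
    # -XX:MaxDirectMemorySize caps direct ByteBuffer allocations.
    JAVA_OPTS="$JAVA_OPTS -XX:SoftRefLRUPolicyMSPerMB=1000 -XX:MaxDirectMemorySize=10g"
    ```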
  • Xiang Fu (12/08/2020, 5:30 AM)
    which means you don't need to specify PLUGINS_DIR in JAVA_OPTS
  • lâm nguyễn hoàng (12/08/2020, 5:46 PM)
    please help me
  • Derek (12/09/2020, 8:55 PM)
    Does Pinot handle Kafka transactions and ignore uncommitted messages? We're setting `"stream.kafka.consumer.prop.auto.isolation.level": "read_committed"` in our realtime table config, but it seems like it is still processing uncommitted messages.
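    For what it's worth, the Kafka consumer property itself is named `isolation.level` (there is no `auto.` in it), so under Pinot's pass-through prefix the key would presumably be the following; a sketch, assuming the prefix is honored by the consumer factory in use:
    ```json
    "streamConfigs": {
      "stream.kafka.consumer.prop.isolation.level": "read_committed"
    }
    ```
    If the property never reaches the Kafka client, the consumer silently falls back to the default `read_uncommitted`, which would match the behavior described.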
  • Ken Krugler (12/09/2020, 11:07 PM)
    If I see the table status as “bad” in the pinot ui (hostname:9000/#/tables), what’s the right way to figure out what’s wrong?
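    One common approach is to compare the table's ideal state with its external view through the controller API; segments whose external-view state differs from the ideal state (e.g. ERROR or OFFLINE instead of ONLINE) are usually what flips the status to bad. A sketch, assuming a controller on localhost:9000 and a hypothetical table name:
    ```bash
    # What Helix wants the segment assignment to be
    curl "http://localhost:9000/tables/myTable/idealstate"

    # What the servers actually report; diff the two to find unhappy segments
    curl "http://localhost:9000/tables/myTable/externalview"
    ```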
  • Ken Krugler (12/10/2020, 7:34 PM)
    It looks like there’s a max length for string fields of 512…or is data being truncated in the query response?
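    It may help that the default max length for string dimension columns is indeed 512, and it is settable per column in the schema via `maxLength`; a sketch, with the column name as a placeholder:
    ```json
    {
      "dimensionFieldSpecs": [
        {
          "name": "myStringColumn",
          "dataType": "STRING",
          "maxLength": 2048
        }
      ]
    }
    ```
    Values longer than `maxLength` are truncated at ingestion time, not in the query response.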
  • Tanmay Movva (12/11/2020, 5:13 AM)
    Hello, how frequently are the (JMX) metrics emitted by Pinot? And is this configurable by the user?
  • Ken Krugler (12/11/2020, 8:29 PM)
    When running a data ingestion job where the table spec includes a star-tree index, I see output lines like:
    ```
    Generated 1623374 star-tree records from 3291903 segment records
    Finished creating aggregated documents, got -1824996 aggregated records
    ```
    Wondering why it’s reporting a negative number of aggregated records…
  • Playsted (12/12/2020, 5:04 PM)
    I've noticed if I delete segments from the UI it only removes them from ZK but not deep storage. The next time I run an ingestion job for the table unrelated to the deleted segments it re-adds them to the table. Is this expected? Am I missing something?
  • Tanmay Movva (12/14/2020, 2:41 PM)
    Hello, I’ve set replicas per partition to 1 for LLC streaming ingestion. Whenever Pinot fails to ingest records from Kafka (in our case, schema registry restarts), it throws errors and sets the segment state to OFFLINE. Even after the issue is resolved, I don’t see consumption being resumed/retried. I tried triggering a reload of the offline segments, but it did not have any effect. What else can I do to resume consumption?
  • Ken Krugler (12/15/2020, 1:32 AM)
    Hey all, I’m now running a segment generation/push that’s using HDFS for input/output. The relevant bits in the job file for input/output dir are:
    ```
    inputDirURI: 'hdfs://<clustername>/user/hadoop/pinot-input/'
    includeFileNamePattern: 'glob:**/us_*.gz'
    outputDirURI: 'hdfs://<clustername>/user/hadoop/pinot-segments/'
    ```
    When I run the job, segments are generated, but then each segment fails with something like:
    ```
    Failed to generate Pinot segment for file - hdfs:/user/hadoop/pinot-input/us_2020-03_03.gz
    java.lang.IllegalStateException: Unable to extract out the relative path based on base input path: hdfs://<clustername>/user/hadoop/pinot-input/
    ```
    So it looks like the input file URI is getting the authority (`<clustername>`) stripped out, which is why the `baseInputDir.relativize(inputFile)` call fails to generate appropriate results in `SegmentGenerationUtils.getRelativeOutputPath`. Or is there something else I need to be doing here to get this to work properly? I’m able to read the files, so the `inputDirURI` is set up properly (along with HDFS jars).
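    For reference, the filesystem wiring in the job spec looks roughly like this; a sketch, assuming the standard HDFS plugin and a placeholder Hadoop config path:
    ```yaml
    pinotFSSpecs:
      - scheme: hdfs
        className: org.apache.pinot.plugin.filesystem.HadoopPinotFS
        configs:
          hadoop.conf.path: '/etc/hadoop/conf/'
    ```
    If the spec is in place and reads work, the authority being dropped from the file URI (but not from `inputDirURI`) does look like a path-handling bug rather than a config problem.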
  • Elon (12/17/2020, 8:13 PM)
    Hi, we wanted to know if changing kafka consumer properties in the realtime config requires restarting the servers so that the consumer can pick up the new properties. Anyone familiar with this?
  • Taran Rishit (12/18/2020, 4:46 PM)
    { "schema": {"type":"struct","fields":[{"type":"string","optional":true,"field":"name"},{"type":"string","optional":true,"field":"vhnumber"},{"type":"string","optional":true,"field":"phnnumber"},{"type":"int32","optional":false,"field":"id"},{"type":"string","optional":true,"field":"password"},{"type":"string","optional":true,"field":"vehicleType"},{"type":"int32","optional":true,"field":"status_id"}] ,"optional":false,"name":"driver"}, "payload":"name":"ss","vhnumber":"123","phnnumber":"123","id":17,"password":"2060","vehicleType":"ppol","status_id":10}} } this is the kafka event in consumer how do i convert this to a pinot schema the data that is needed is only the "payload" attribute how do i write custom decoder for it?
  • dhurandar (12/18/2020, 5:51 PM)
    Our data has more than 400 different dimensions. The cube only has 25 of them, but we are planning to increase that. We are aware that adding a new dimension would increase volume by the cardinality of the new dimension (in the worst case). Is there a recommendation on the number of dimensions too? As in, how many dimensions can I add around the "group by"?
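    Since star-tree size is driven by the split dimensions, it can help to keep `dimensionsSplitOrder` limited to the columns actually used in group-bys and to cap tree growth with `maxLeafRecords`. A sketch, with column names as placeholders:
    ```json
    "starTreeIndexConfigs": [
      {
        "dimensionsSplitOrder": ["country", "browser", "locale"],
        "skipStarNodeCreationForDimensions": [],
        "functionColumnPairs": ["SUM__clicks", "COUNT__*"],
        "maxLeafRecords": 10000
      }
    ]
    ```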
  • balci (12/18/2020, 9:12 PM)
    Hi folks. I’m trying to debug this github-actions failure in my PR. It seems like mvn test-compile is failing when compiling one of my tests because it cannot find a symbol from a dependency package (pinot-spi):
    ```
    Error:  COMPILATION ERROR : 
    [INFO] -------------------------------------------------------------
    Error:  /home/runner/work/incubator-pinot/incubator-pinot/pinot-core/src/test/java/org/apache/pinot/core/util/TableConfigUtilsTest.java:[450,45] cannot find symbol
      symbol:   variable BATCH_TYPE
      location: class org.apache.pinot.spi.ingestion.batch.BatchConfigProperties
    Error:  /home/runner/work/incubator-pinot/incubator-pinot/pinot-core/src/test/java/org/apache/pinot/core/util/TableConfigUtilsTest.java:[452,35] cannot find symbol
      symbol:   method constructBatchProperty(java.lang.String,java.lang.String)
      location: class org.apache.pinot.spi.ingestion.batch.BatchConfigProperties
    ...
    ```
    It is interesting because the test I added is almost a copy of an existing test case using the same symbols (ingestionBatchConfigTest). Does anyone have any insight into what might have gone wrong? @Neha Pawar I noticed you added the test ‘ingestionBatchConfigsTest’ recently; curious if you had a similar issue. Thanks.
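    One thing worth ruling out: if the branch predates the commit that added those symbols to pinot-spi, the CI build would fail exactly this way. A sketch of the usual recovery, assuming the branch is simply stale against master:
    ```bash
    git fetch upstream && git rebase upstream/master
    # Rebuild the dependency module locally, then re-run the failing compile step
    mvn -q -pl pinot-spi -am install -DskipTests
    mvn -q -pl pinot-core test-compile
    ```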
  • Punish Garg (12/21/2020, 11:28 AM)
    Hi team, I am using the official Docker image of Apache Pinot, referring to this doc: https://github.com/apache/incubator-pinot/tree/master/docker/images/pinot. My broker and server containers are going down without any exception. Can someone help me look into this issue? The error log on the ZK side is:
    ```
    EndOfStreamException: Unable to read additional data from client sessionid 0x17685043a2e001d, likely client has closed socket
    ```
  • Elon (12/21/2020, 9:40 PM)
    Hi, anyone familiar with setting generic Kafka properties, e.g. `stream.kafka.consumer.prop.isolation.level` or `group_id`, `client_id`, etc.? It looks like only a specific list of properties is honored, like `stream.kafka.topic.name`, `stream.kafka.decoder.class.name`... I can create a GitHub issue, lmk.
  • Yash Agarwal (12/22/2020, 1:40 PM)
    Hi, how would I go about creating a custom filesystem plugin?
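    At a high level, it means extending `org.apache.pinot.spi.filesystem.PinotFS`, packaging it as a plugin, and registering the class for your URI scheme in the component configs. A sketch of the registration side, with the scheme and class name being hypothetical:
    ```properties
    # server config (the controller has equivalent pinot.controller.storage.factory.* keys)
    pinot.server.storage.factory.class.myfs=com.example.MyPinotFS
    # keys under this prefix are passed to the filesystem's init()
    pinot.server.storage.factory.myfs.someOption=someValue
    ```
    The existing S3/GCS/HDFS plugins under `pinot-plugins/pinot-file-system` are good reference implementations.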
  • Laxman Ch (12/23/2020, 1:49 PM)
    Hi, is anyone facing issues with segment purging with GCS as the deep store?
  • Elon (12/24/2020, 6:01 AM)
    Hi, our brokers are taking 25s (max time) to return queries, but direct server queries return instantly. I took heap dumps, pmaps, etc. and the one thing that stands out is jstack output. Looks like HelixTaskExecutor threads are all waiting on the same object. Anyone ever see this behavior?
  • lâm nguyễn hoàng (12/28/2020, 7:09 PM)
    Hi team... now I see a problem with the realtime table: every time I count the number of rows of the table, the count changes continuously. Do you know what's wrong?
  • lâm nguyễn hoàng (12/28/2020, 7:09 PM)
    { "REALTIME": { "tableName": "ERP_ERP_PM_INPUTVOUCHERDETAIL_REALTIME", "tableType": "REALTIME", "segmentsConfig": { "schemaName": "ERP_ERP_PM_INPUTVOUCHERDETAIL", "timeType": "MILLISECONDS", "retentionTimeUnit": "DAYS", "retentionTimeValue": "2", "segmentPushFrequency": "DAILY", "segmentPushType": "APPEND", "timeColumnName": "WARRANTYDATE", "replication": "4", "replicasPerPartition": "4" }, "tenants": { "broker": "DefaultTenant", "server": "inventory" }, "tableIndexConfig": { "streamConfigs": { "streamType": "kafka", "stream.kafka.consumer.type": "lowlevel", "stream.kafka.topic.name": "PINOT.ERP.ERP.PM_INPUTVOUCHERDETAIL", "stream.kafka.table.tablename": "ERP.PM_INPUTVOUCHERDETAIL", "stream.kafka.table.part.pattern": "_[0-9]{2}_[0-9]{4}", "stream.kafka.cdc.format": "CDC", "stream.kafka.decoder.class.name": "com.mwg.pinot.realtime.KafkaCDCMessageDecoder", "stream.kafka.consumer.factory.class.name": "com.mwg.pinot.realtime.KafkaCDCConsumerFactory", "notify.line.token": "aEZ1nmvqGjhDkuKMO0ghZaAFAVyvoszUjFYJG4Vobc9", "stream.kafka.broker.list": "datastore-broker01-kafka-ovm-6-769092,datastore broker02 kafka ovm 6 779093,datastore-broker03-kafka-ovm-6-789094,datastore broker04 kafka ovm 6 1349095,datastore-broker05-kafka-ovm-6-1359096,datastore broker06 kafka ovm 6 1369097,datastore-broker07-kafka-ovm-6-1209098,datastore broker08 kafka ovm 6 1219099,datastore-broker09-kafka-ovm-6-1229101,datastore broker10 kafka ovm 6 1239102", "stream.kafka.consumer.prop.auto.offset.reset": "smallest", "realtime.segment.flush.threshold.rows": "0", "realtime.segment.flush.threshold.time": "900000", "realtime.segment.flush.threshold.segment.size": "50M", "group.id": "ERP.PM_INPUTVOUCHERDETAIL2-PINOT_INGESTION", "max.partition.fetch.bytes": "167772160", "receive.buffer.bytes": "67108864", "isolation.level": "read_committed", "max.poll.records": "5000" }, "autoGeneratedInvertedIndex": false, "createInvertedIndexDuringSegmentGeneration": false, "sortedColumn": [ "INPUTVOUCHERDETAILID" ], "loadMode": "MMAP", "enableDefaultStarTree": false, "enableDynamicStarTreeCreation": false, "aggregateMetrics": false, "nullHandlingEnabled": false }, "metadata": { "customConfigs": {} }, "routing": { "instanceSelectorType": "strictReplicaGroup" }, "upsertConfig": { "mode": "FULL" } } }
  • Jackie (12/28/2020, 7:16 PM)
    Also, it seems the table is not partitioned. @Yupeng Fu I remember there is a check on the table config to enforce the partitioning?
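    For reference, partitioning is declared under `tableIndexConfig`; for an upsert table it has to agree with how the Kafka topic is keyed. A sketch, with the column name and partition count as placeholders:
    ```json
    "segmentPartitionConfig": {
      "columnPartitionMap": {
        "myPrimaryKeyColumn": {
          "functionName": "Murmur",
          "numPartitions": 4
        }
      }
    }
    ```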
  • Daniel Lavoie (12/30/2020, 4:51 PM)
    Mind running a `ps aux` so we get confirmation of the exact arguments that were provided to the JVM process?
  • Elon (01/04/2021, 10:48 PM)
    Happy new year everyone! We are experiencing a server that seems to be "stuck": it can process raw server queries, but in QueryScheduler it appears unable to get a permit. We have a rate of 10000 queries/second, and it never enters this block:
    ```
    if (queryLogRateLimiter.tryAcquire() || forceLog(schedulerWaitMs, numDocsScanned)) {
          LOGGER.info("Processed requestId={},table={},segments(queried/processed/matched/consuming)={}/{}/{}/{},"
                  + "schedulerWaitMs={},reqDeserMs={},totalExecMs={},resSerMs={},totalTimeMs={},minConsumingFreshnessMs={},broker={},"
                  + "numDocsScanned={},scanInFilter={},scanPostFilter={},sched={}", requestId, tableNameWithType,
    ```
  • Ken Krugler (01/08/2021, 1:24 AM)
    If I do a query with a `where mvfield in ('a', 'b') group by mvfield`, and `mvfield` is a multi-valued field, I get a result with groups for values from `mvfield` that aren’t in my where clause. I assume I’m getting groups for every value found in `mvfield` from rows where `mvfield` contains a match for my filter, but it seems wrong… am I missing something?
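    That behavior is expected for multi-value columns: the filter matches rows, and the group-by then expands every value in the matched rows. The `VALUEIN` transform is the usual way to restrict the emitted group keys to the filtered values; a sketch, with table and column names as placeholders:
    ```sql
    SELECT VALUEIN(mvfield, 'a', 'b'), COUNT(*)
    FROM mytable
    WHERE mvfield IN ('a', 'b')
    GROUP BY VALUEIN(mvfield, 'a', 'b')
    ```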
  • Yash Agarwal (01/08/2021, 5:04 AM)
    Is there a way we can do a mode calculation in Pinot?
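    At the time of this thread there is no built-in mode aggregation, but the usual workaround is a group-by ordered by count; a sketch, with table and column names as placeholders:
    ```sql
    SELECT mycolumn, COUNT(*) AS cnt
    FROM mytable
    GROUP BY mycolumn
    ORDER BY cnt DESC
    LIMIT 1
    ```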