Alexander Vivas
03/12/2021, 2:20 PM
Alexander Vivas
03/12/2021, 2:20 PM
Deepak Kumar Mishra
03/16/2021, 4:22 AM
Deepak Kumar Mishra
03/16/2021, 9:54 AM
Ravikumar Maddi
03/16/2021, 4:17 PM
Ali LeClerc
03/18/2021, 3:06 AM
Ravikumar Maddi
03/19/2021, 12:43 PM
Harshvardhan Surolia
03/22/2021, 12:12 PM
Phúc Huỳnh
03/24/2021, 6:39 AM
2021/03/24 06:13:18.879 WARN [ConsumerCoordinator] [RuleLogsQC__2__1__20210319T0728Z] [Consumer clientId=consumer-12735, groupId=] Synchronous auto-commit of offsets {c1.elk.db-gamification-consumer-log.qc-2=OffsetAndMetadata{offset=17073, metadata=''}} failed: Not authorized to access group:
Ravikumar Maddi
03/24/2021, 8:09 AM
Ravikumar Maddi
03/24/2021, 9:41 AM
{
"personId": "9878",
"addresses": [
{
"doorNum": "45456",
"Street": "Washington Road",
"area": "sector-1"
},
{
"doorNum": "676756",
"Street": "Washington Road",
"area": "sector-2"
},
{
"doorNum": "768768",
"Street": "Washington Road",
"area": "sector-4"
}
]
}
{
"personId": "68768",
"addresses": [
{
"doorNum": "45456",
"Street": "Washington Road",
"area": "sector-1"
},
{
"doorNum": "676756",
"Street": "Washington Road",
"area": "sector-2"
},
{
"doorNum": "768768",
"Street": "Washington Road",
"area": "sector-4"
}
]
}
In the schema config file I defined it like this:
{
"name": "addresses",
"dataType": "STRING",
"maxLength": 2147483647,
"singleValueField": false
},
In the table config file I defined it like this:
"jsonIndexColumns": [
"addresses"
],
But I am not able to find the data in the Pinot Query Console, and I cannot find any error in any log.
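(A hedged sketch of how a JSON-indexed STRING column is usually queried; the table name mytable is an assumption, and the exact JSON_MATCH predicate syntax varies across Pinot versions.)
-- hypothetical table name; predicate syntax depends on the Pinot version
SELECT personId FROM mytable WHERE JSON_MATCH(addresses, '"$[*].area"=''sector-1''')
-- pull one value out of the JSON array to confirm the column is ingested
SELECT jsonExtractScalar(addresses, '$[0].area', 'STRING') FROM mytable LIMIT 10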
Need Help
Prashant Kumar
03/24/2021, 3:06 PM
Mohamed Sultan
03/25/2021, 7:27 AM
Charles
03/26/2021, 3:25 AM
Charles
03/31/2021, 12:29 AM
Charles
03/31/2021, 12:29 AM
Charles
03/31/2021, 12:30 AM
Ravikumar Maddi
03/31/2021, 1:23 PM
Kishore G
Alexander Vivas
03/31/2021, 2:25 PM
org.apache.kafka.common.errors.SerializationException: Error deserializing Avro message for id 514
Caused by: javax.net.ssl.SSLHandshakeException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
at sun.security.ssl.Alert.createSSLException(Alert.java:131) ~[?:1.8.0_282]
( . . . )
at io.confluent.kafka.schemaregistry.client.rest.RestService.sendHttpRequest(RestService.java:212) ~[pinot-confluent-avro-0.7.0-SNAPSHOT-shaded.jar:0.7.0-SNAPSHOT-a44d0b1bb64d00d851ea6f2d8bc46ff0ab080d3e]
at io.confluent.kafka.schemaregistry.client.rest.RestService.httpRequest(RestService.java:256) ~[pinot-confluent-avro-0.7.0-SNAPSHOT-shaded.jar:0.7.0-SNAPSHOT-a44d0b1bb64d00d851ea6f2d8bc46ff0ab080d3e]
at io.confluent.kafka.schemaregistry.client.rest.RestService.getId(RestService.java:486) ~[pinot-confluent-avro-0.7.0-SNAPSHOT-shaded.jar:0.7.0-SNAPSHOT-a44d0b1bb64d00d851ea6f2d8bc46ff0ab080d3e]
at io.confluent.kafka.schemaregistry.client.rest.RestService.getId(RestService.java:479) ~[pinot-confluent-avro-0.7.0-SNAPSHOT-shaded.jar:0.7.0-SNAPSHOT-a44d0b1bb64d00d851ea6f2d8bc46ff0ab080d3e]
at io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient.getSchemaByIdFromRegistry(CachedSchemaRegistryClient.java:177) ~[pinot-confluent-avro-0.7.0-SNAPSHOT-shaded.jar:0.7.0-SNAPSHOT-a44d0b1bb64d00d851ea6f2d8bc46ff0ab080d3e]
at io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient.getBySubjectAndId(CachedSchemaRegistryClient.java:256) ~[pinot-confluent-avro-0.7.0-SNAPSHOT-shaded.jar:0.7.0-SNAPSHOT-a44d0b1bb64d00d851ea6f2d8bc46ff0ab080d3e]
at io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient.getById(CachedSchemaRegistryClient.java:235) ~[pinot-confluent-avro-0.7.0-SNAPSHOT-shaded.jar:0.7.0-SNAPSHOT-a44d0b1bb64d00d851ea6f2d8bc46ff0ab080d3e]
at io.confluent.kafka.serializers.AbstractKafkaAvroDeserializer.deserialize(AbstractKafkaAvroDeserializer.java:107) ~[pinot-confluent-avro-0.7.0-SNAPSHOT-shaded.jar:0.7.0-SNAPSHOT-a44d0b1bb64d00d851ea6f2d8bc46ff0ab080d3e]
at io.confluent.kafka.serializers.AbstractKafkaAvroDeserializer.deserialize(AbstractKafkaAvroDeserializer.java:79) ~[pinot-confluent-avro-0.7.0-SNAPSHOT-shaded.jar:0.7.0-SNAPSHOT-a44d0b1bb64d00d851ea6f2d8bc46ff0ab080d3e]
at io.confluent.kafka.serializers.KafkaAvroDeserializer.deserialize(KafkaAvroDeserializer.java:55) ~[pinot-confluent-avro-0.7.0-SNAPSHOT-shaded.jar:0.7.0-SNAPSHOT-a44d0b1bb64d00d851ea6f2d8bc46ff0ab080d3e]
at org.apache.pinot.plugin.inputformat.avro.confluent.KafkaConfluentSchemaRegistryAvroMessageDecoder.decode(KafkaConfluentSchemaRegistryAvroMessageDecoder.java:114) ~[pinot-confluent-avro-0.7.0-SNAPSHOT-shaded.jar:0.7.0-SNAPSHOT-a44d0b1bb64d00d851ea6f2d8bc46ff0ab080d3e]
at org.apache.pinot.plugin.inputformat.avro.confluent.KafkaConfluentSchemaRegistryAvroMessageDecoder.decode(KafkaConfluentSchemaRegistryAvroMessageDecoder.java:120) ~[pinot-confluent-avro-0.7.0-SNAPSHOT-shaded.jar:0.7.0-SNAPSHOT-a44d0b1bb64d00d851ea6f2d8bc46ff0ab080d3e]
at org.apache.pinot.plugin.inputformat.avro.confluent.KafkaConfluentSchemaRegistryAvroMessageDecoder.decode(KafkaConfluentSchemaRegistryAvroMessageDecoder.java:53) ~[pinot-confluent-avro-0.7.0-SNAPSHOT-shaded.jar:0.7.0-SNAPSHOT-a44d0b1bb64d00d851ea6f2d8bc46ff0ab080d3e]
at org.apache.pinot.core.data.manager.realtime.LLRealtimeSegmentDataManager.processStreamEvents(LLRealtimeSegmentDataManager.java:471) ~[pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-a44d0b1bb64d00d851ea6f2d8bc46ff0ab080d3e]
at org.apache.pinot.core.data.manager.realtime.LLRealtimeSegmentDataManager.consumeLoop(LLRealtimeSegmentDataManager.java:402) ~[pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-a44d0b1bb64d00d851ea6f2d8bc46ff0ab080d3e]
at org.apache.pinot.core.data.manager.realtime.LLRealtimeSegmentDataManager$PartitionConsumer.run(LLRealtimeSegmentDataManager.java:538) [pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-a44d0b1bb64d00d851ea6f2d8bc46ff0ab080d3e]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_282]
Caused by: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
I had a look at the pods in our server instances and all of them had their certs in place; all the table configs have the right path in their properties, and even so the second table (the one we need to consume from a different Kafka cluster) doesn't work.
Elon
04/01/2021, 6:54 AM
select count(*) from mytable where (( DATETRUNC( 'hour', created_at_seconds, 'seconds')) - ( DATETRUNC( 'hour', CAST( 1.610354466173E9 as long), 'seconds'))) >= 0
does not work, but if you take the E9 away it works. It looks like the grammar only recognizes
FLOATING_POINT_LITERAL : SIGN? DIGIT+ '.' DIGIT* | SIGN? DIGIT* '.' DIGIT+;
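(A hedged workaround sketch, assuming the literal is meant to be epoch seconds: spell the value out without the exponent so the floating-point grammar above can parse it.)
-- same query with the scientific-notation literal expanded to a plain decimal (1.610354466173E9 = 1610354466.173)
select count(*) from mytable where (( DATETRUNC( 'hour', created_at_seconds, 'seconds')) - ( DATETRUNC( 'hour', CAST( 1610354466.173 as long), 'seconds'))) >= 0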
This is for Pinot 0.6.0; did this change in 0.7.0?
Brian Olsen
04/10/2021, 1:17 AM
"dateTimeFieldSpecs": [
{
"name": "cdc_case_earliest_dt",
"dataType": "STRING",
"format": "1:DAYS:SIMPLE_DATE_FORMAT:yyyy/MM/dd",
"granularity": "1:DAYS"
},
{
"name": "cdc_report_dt",
"dataType": "STRING",
"format": "1:DAYS:SIMPLE_DATE_FORMAT:yyyy/MM/dd",
"granularity": "1:DAYS"
},
{
"name": "pos_spec_dt",
"dataType": "STRING",
"format": "1:DAYS:SIMPLE_DATE_FORMAT:yyyy/MM/dd",
"granularity": "1:DAYS"
},
{
"name": "onset_dt",
"dataType": "STRING",
"format": "1:DAYS:SIMPLE_DATE_FORMAT:yyyy/MM/dd",
"granularity": "1:DAYS"
}
]
With a CSV that has various dates missing:
[ec2-user@aws ~]$ head /tmp/pinot-quick-start/covid-cases.csv
cdc_case_earliest_dt ,cdc_report_dt,pos_spec_dt,onset_dt,current_status,sex,age_group,race_ethnicity_combined,hosp_yn,icu_yn,death_yn,medcond_yn
2020/10/23,2021/01/28,2020/10/23,,Laboratory-confirmed case,Female,0 - 9 Years,"Black, Non-Hispanic",Missing,Missing,No,Missing
2020/10/23,2020/10/23,2020/10/23,,Laboratory-confirmed case,Female,0 - 9 Years,"Black, Non-Hispanic",No,Unknown,No,No
2020/10/23,2020/10/25,2020/10/23,2020/10/23,Laboratory-confirmed case,Female,0 - 9 Years,"Black, Non-Hispanic",No,Missing,Missing,Missing
2020/10/23,2020/10/25,2020/10/23,,Laboratory-confirmed case,Female,0 - 9 Years,"Black, Non-Hispanic",Missing,Missing,Missing,Missing
It looks like when parsing rows with null dates, the parser gets fed a null value. I'm tempted to update defaultNullValue in the dateTimeFieldSpec to a default date of 1970/01/01, but I'd like to just keep those values null if possible. Is there anything I'm doing wrong, or any way around this? (A sketch of the defaultNullValue idea is shown after the error below.)
Failed to generate Pinot segment for file - file:/tmp/pinot-quick-start/covid-cases.csv
java.lang.IllegalArgumentException: Invalid format: "null"
at org.joda.time.format.DateTimeParserBucket.doParseMillis(DateTimeParserBucket.java:187) ~[pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-50a4531b33475327bc9fe3c0199e7003f0a4c882]
at org.joda.time.format.DateTimeFormatter.parseMillis(DateTimeFormatter.java:826) ~[pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-50a4531b33475327bc9fe3c0199e7003f0a4c882]
at org.apache.pinot.core.segment.creator.impl.SegmentColumnarIndexCreator.writeMetadata(SegmentColumnarIndexCreator.java:555) ~[pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-50a4531b33475327bc9fe3c0199e7003f0a4c882]
at org.apache.pinot.core.segment.creator.impl.SegmentColumnarIndexCreator.seal(SegmentColumnarIndexCreator.java:514) ~[pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-50a4531b33475327bc9fe3c0199e7003f0a4c882]
at org.apache.pinot.core.segment.creator.impl.SegmentIndexCreationDriverImpl.handlePostCreation(SegmentIndexCreationDriverImpl.java:273) ~[pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-50a4531b33475327bc9fe3c0199e7003f0a4c882]
at org.apache.pinot.core.segment.creator.impl.SegmentIndexCreationDriverImpl.build(SegmentIndexCreationDriverImpl.java:246) ~[pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-50a4531b33475327bc9fe3c0199e7003f0a4c882]
at org.apache.pinot.plugin.ingestion.batch.common.SegmentGenerationTaskRunner.run(SegmentGenerationTaskRunner.java:111) ~[pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-50a4531b33475327bc9fe3c0199e7003f0a4c882]
at org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner.lambda$submitSegmentGenTask$1(SegmentGenerationJobRunner.java:261) ~[pinot-batch-ingestion-standalone-0.7.0-SNAPSHOT-shaded.jar:0.7.0-SNAPSHOT-50a4531b33475327bc9fe3c0199e7003f0a4c882]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_282]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_282]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_282]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_282]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_282]
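(A hedged sketch of the defaultNullValue idea mentioned above; the 1970/01/01 value is only an illustration matching the yyyy/MM/dd format, not a recommendation.)
{
  "name": "onset_dt",
  "dataType": "STRING",
  "format": "1:DAYS:SIMPLE_DATE_FORMAT:yyyy/MM/dd",
  "granularity": "1:DAYS",
  "defaultNullValue": "1970/01/01"
}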
Elon
04/10/2021, 1:30 AM
Try setting "nullHandlingEnabled": true in the "tableIndexConfig" section of the table config (not the schema above).
Not sure if time columns can be null though; if not, then defaultNullValue = 0 would work, otherwise it will be set to Long.MIN_VALUE, which is also not a valid value.
Aaron Wishnick
04/13/2021, 3:52 PM
Jonathan Meyer
04/16/2021, 10:45 AM
events.something.* ) ?
• Is it possible to filter on a column not part of the schema (ex: when filtering on event.this_event_type.only) ?
Thanks 😄
Surendra
04/16/2021, 11:38 PM
{
"id": "<>__0__1000__20210312T1413Z",
"simpleFields": {
"segment.crc": "1640078893",
"segment.creation.time": "1615558396792",
"segment.end.time": "-9223372036854775808",
"segment.flush.threshold.size": "100000",
"segment.flush.threshold.time": null,
"segment.index.version": "v3",
"segment.name": "<>__0__1000__20210312T1413Z",
"segment.realtime.download.url": "s3://<>/pinot/<>/<>__0__1000__20210312T1413Z",
"segment.realtime.endOffset": "62577410",
"segment.realtime.numReplicas": "1",
"segment.realtime.startOffset": "62565619",
"segment.realtime.status": "DONE",
"segment.start.time": "-9223372036854775808",
"segment.table.name": "<>_REALTIME",
"segment.time.unit": "MILLISECONDS",
"segment.total.docs": "11791",
"segment.type": "REALTIME"
},
"mapFields": {},
"listFields": {}
}
Ravikumar Maddi
04/19/2021, 1:11 PM
Ravikumar Maddi
04/20/2021, 2:44 PM