Apache Pinot #troubleshooting

Kishore G

07/08/2020, 10:52 PM

Kishore G

07/09/2020, 12:44 AM

found the issue: https://github.com/apache/incubator-pinot/pull/5669

Pradeep

07/09/2020, 12:48 AM

tested the fix, it’s working now thanks @Kishore G

Mayank

07/09/2020, 12:49 AM

@Kishore G Checked the callers of `init`, going by variable names, some are passing `config` and others are passing `fsConfig`

Daniel Lavoie

07/09/2020, 1:08 AM

I'm back home, I'll review the findings and provide more context if needed.

Mayank

07/09/2020, 1:13 AM

Thanks @Daniel Lavoie. One thing this failure brings up is that we lack test coverage here. Perhaps we should also use this opportunity to improve on that.

💯 1

Suraj

07/21/2020, 1:10 AM

Hello - we are noticing slow queries and wanted to check if there is a way to log the execution plan of the queries?

Kishore G

07/21/2020, 1:35 AM

@Suraj you should know what’s taking time by looking at the response stats
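
For reference, a minimal sketch of pulling those stats out of a broker response. It assumes the response is the JSON body returned by the broker and that it carries fields such as timeUsedMs, numDocsScanned, numEntriesScannedInFilter, numEntriesScannedPostFilter and numSegmentsQueried; the summarize_stats helper and the sample values are hypothetical, so check the field names against your Pinot version.

```python
import json

# Stats fields assumed to be present in a Pinot broker response.
STAT_FIELDS = [
    "timeUsedMs",
    "totalDocs",
    "numDocsScanned",
    "numEntriesScannedInFilter",
    "numEntriesScannedPostFilter",
    "numSegmentsQueried",
    "numSegmentsProcessed",
    "numSegmentsMatched",
]


def summarize_stats(response_json: str) -> dict:
    """Return the execution stats found in a broker response, skipping missing ones."""
    response = json.loads(response_json)
    return {name: response[name] for name in STAT_FIELDS if name in response}


# Illustrative response body (values are made up, not from a real cluster).
sample = json.dumps({
    "timeUsedMs": 850,
    "totalDocs": 1000000,
    "numDocsScanned": 400000,
    "numEntriesScannedInFilter": 2500000,
    "numEntriesScannedPostFilter": 800000,
    "numSegmentsQueried": 120,
})

stats = summarize_stats(sample)
for name, value in stats.items():
    print(f"{name}: {value}")
```

As a rough reading guide, a numEntriesScannedInFilter value that is large relative to numDocsScanned may suggest the filter is scanning values instead of using an index, and a high numSegmentsQueried may point at too many small segments.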

Elon

07/21/2020, 8:10 PM

We are about to upgrade to pinot-0.4.0 - do you recommend going to head or just cutting it at the 0.4.0 release commit?

Elon

07/21/2020, 8:11 PM

Any notable config changes, or k8s changes we should be aware of? We're on pinot-0.3.0 now

Damiano

07/21/2020, 8:30 PM

Nooooo I have just upgraded my custom aggregation function 😄 did you change the API?

Damiano

07/21/2020, 8:30 PM

😂

Kishore G

07/21/2020, 8:49 PM

@Elon I would go with 0.4.0 unless you need any feature in master

👍 1

Dan Hill

07/22/2020, 12:19 AM

Sorry, I think I've asked before (I lost my slack history). Is there an easy way to have Pinot take the realtime inputs and automatically run data ingestion jobs to populate the offline tables? Mostly checking to see if I can shortcut some work for a v1 deliverable. I assume there is probably a simple setup to output the kafka topic for 1 day, split the data and run batch ingestion jobs.

Kishore G

07/22/2020, 12:31 AM

Yes, it’s doable but there is no such tool

Kishore G

07/22/2020, 12:33 AM

You can download the real-time segments, use the Pinot segment reader to read multiple segments, generate a new offline segment from them, and push it.
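
The "split the data" step of that flow could be sketched as below. This is only an illustration of bucketing records by day before building one offline segment per bucket; it assumes the rows have already been read out of the downloaded real-time segments into dicts (that step depends on Pinot's segment reader and is not shown), and the timestampMs field name and bucket_by_day helper are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timezone


def bucket_by_day(records, time_field="timestampMs"):
    """Group records by UTC calendar day using an epoch-millis time field."""
    buckets = defaultdict(list)
    for record in records:
        day = datetime.fromtimestamp(record[time_field] / 1000, tz=timezone.utc).date()
        buckets[day.isoformat()].append(record)
    return dict(buckets)


# Illustrative rows; in practice these would come from the real-time segments.
records = [
    {"id": 1, "timestampMs": 1595376000000},  # 2020-07-22T00:00:00Z
    {"id": 2, "timestampMs": 1595379600000},  # 2020-07-22T01:00:00Z
    {"id": 3, "timestampMs": 1595462400000},  # 2020-07-23T00:00:00Z
]

buckets = bucket_by_day(records)
for day, rows in sorted(buckets.items()):
    print(day, len(rows))
```

Each bucket would then be written out and fed to a batch ingestion job to produce that day's offline segment.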

Mayank

07/22/2020, 5:11 AM

@Buchi Reddy are there specific types of queries that are failing and passing?

Buchi Reddy

07/22/2020, 5:13 AM

We’re seeing slowness of some random queries with Pinot. So far here are our observations:
• We didn’t tune the segment sizes, so we have smaller segments; some are around 100MB, though in one table they reached 770MB per segment.
• On one of the tables, we noticed 10K segments. Queries to this table sometimes fail with the exception that I posted in the #CDRCA57FC channel.
• If we try the queries from the Pinot console, the response times are always better than what our service, which uses the Java Pinot client, sees.

Kishore G

07/22/2020, 4:23 PM

@Yash Agarwal were you able to rebuild the jar?

Yash Agarwal

07/22/2020, 4:50 PM

Yes. I built the jar with the updated spark and scala versions.

Kishore G

07/22/2020, 4:55 PM

i see

Kishore G

07/22/2020, 4:56 PM

@Xiang Fu you had another command to build the spark job jar directly?

Xiang Fu

07/22/2020, 4:56 PM

pinot-spark jar?

Xiang Fu

07/22/2020, 4:57 PM

I’m also using that pinot-all jar

Yash Agarwal

07/24/2020, 5:00 AM

I am getting

java.lang.IllegalStateException: Unable to extract out the relative path based on base input path: hdfs://bigredns/apps/hive/warehouse/dev_phx_chargers.db/guest_sdr_gst_data_sgl
	at shaded.com.google.common.base.Preconditions.checkState(Preconditions.java:444)
	at org.apache.pinot.plugin.ingestion.batch.common.SegmentGenerationUtils.getRelativeOutputPath(SegmentGenerationUtils.java:144)
	at org.apache.pinot.plugin.ingestion.batch.spark.SparkSegmentGenerationJobRunner$1.call(SparkSegmentGenerationJobRunner.java:292)
	at org.apache.pinot.plugin.ingestion.batch.spark.SparkSegmentGenerationJobRunner$1.call(SparkSegmentGenerationJobRunner.java:214)

the job config is

inputDirURI: 'hdfs://bigredns/apps/hive/warehouse/dev_phx_chargers.db/guest_sdr_gst_data_sgl'
outputDirURI: 'hdfs://bigredns/apps/hive/warehouse/dev_phx_chargers.db/guest_sdr_gst_data_sgl_segments'

Xiang Fu

07/24/2020, 5:16 AM

we get the input file like

hdfs://bigredns/apps/hive/warehouse/dev_phx_chargers.db/guest_sdr_gst_data_sgl/a/b/c.avro

Xiang Fu

07/24/2020, 5:16 AM

then output segment path should be

hdfs://bigredns/apps/hive/warehouse/dev_phx_chargers.db/guest_sdr_gst_data_sgl_segments/a/b/c.tar.gz

Xiang Fu

07/24/2020, 5:17 AM

we try to extract the relative path of

a/b/c.avro
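
The mapping described here can be sketched roughly as follows. This is a Python illustration of the idea, not Pinot's actual SegmentGenerationUtils code; the relative_output_path helper and its exact checks are assumptions. It takes the part of the input file path below the input directory and re-roots it under the output directory with a .tar.gz suffix, failing when the file does not sit under the base URI.

```python
import posixpath
from urllib.parse import urlparse


def relative_output_path(input_dir: str, input_file: str, output_dir: str) -> str:
    """Map an input file under input_dir to a .tar.gz path under output_dir."""
    base = urlparse(input_dir)
    file = urlparse(input_file)
    base_path = base.path.rstrip("/") + "/"
    # The file must share scheme and authority with the base and sit below it.
    same_root = (file.scheme, file.netloc) == (base.scheme, base.netloc)
    if not same_root or not file.path.startswith(base_path):
        raise ValueError(
            f"Unable to extract relative path based on base input path: {input_dir}"
        )
    relative = file.path[len(base_path):]  # e.g. a/b/c.avro
    if not relative:
        raise ValueError(f"Input file is the base directory itself: {input_file}")
    stem, _extension = posixpath.splitext(relative)  # drop .avro if present
    return output_dir.rstrip("/") + "/" + stem + ".tar.gz"


base = "hdfs://bigredns/apps/hive/warehouse/dev_phx_chargers.db/guest_sdr_gst_data_sgl"
out = base + "_segments"
print(relative_output_path(base, base + "/a/b/c.avro", out))
```

In this sketch the check fails when the file URI and the base URI differ in scheme or authority, or when the file is not below the base directory, which is why the actual input file path matters.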

Xiang Fu

07/24/2020, 5:17 AM

so wanna check the input file path

Yash Agarwal

07/24/2020, 5:22 AM

the input path is something like

hdfs://bigredns/apps/hive/warehouse/dev_phx_chargers.db/guest_sdr_gst_data_sgl/partition_d=2020-05-17/00000_0