Hmm, what's the recommended structure for calling LaunchDataI... | Apache Pinot #troubleshooting


Dan Hill

06/02/2020, 12:13 AM

Hmm, what's the recommended structure for calling LaunchDataIngestionJob? I'm trying to run it with 300 million rows and I'm hitting a tar size issue. I can split the DataIngestion calls but I'm curious as to what is recommended.


java.lang.RuntimeException: entry size '14879990781' is too big ( > 8589934591 )
	at org.apache.commons.compress.archivers.tar.TarArchiveOutputStream.failForBigNumber(TarArchiveOutputStream.java:623) ~[pinot-all-0.4.0-SNAPSHOT-jar-with-dependencies.jar:0.4.0-SNAPSHOT-ed26e8589fe5f91d2876d417aebf23575010cc76]

Logs
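For readers of this thread: the limit in the error message, 8589934591, equals 8^11 - 1. That matches the classic tar header, which stores an entry's size as 11 octal digits. The sketch below only reproduces that arithmetic for the size reported in the stack trace (roughly 13.9 GiB); the constant and function names are hypothetical and not part of Pinot or Commons Compress.

```python
# Classic tar headers store an entry's size as 11 octal digits,
# so the largest size they can represent is 8**11 - 1 bytes.
TAR_MAX_ENTRY_BYTES = 8**11 - 1  # 8589934591, the limit quoted in the error


def fits_in_classic_tar(size_bytes: int) -> bool:
    """Return True if an entry of this size fits in a classic tar header."""
    return 0 <= size_bytes <= TAR_MAX_ENTRY_BYTES


# Entry size taken from the stack trace above.
reported_size = 14879990781

print(TAR_MAX_ENTRY_BYTES)                 # the limit from the error message
print(fits_in_classic_tar(reported_size))  # False: the entry is too large
```

Since the oversized entry here is a single file inside the segment archive, the practical fix discussed in the replies is to build smaller segments rather than to work around the archive limit.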

Xiang Fu

06/02/2020, 12:41 AM

Typically we aim for a few hundred MB per segment. The hard limit here is that the index size of any single column should not exceed 2 GB.

kish

06/02/2020, 2:34 AM

Hi, does that mean the input file can be larger than 2 GB if there are N (> 1) columns, as long as no single column's index exceeds 2 GB?

Xiang Fu

06/02/2020, 3:01 AM

Yes

Dan Hill

06/02/2020, 4:27 PM

Hmm, even if I try to load a small number of rows (10 million rows), I hit this issue if I have too many star tree indices.

Dan Hill

06/02/2020, 4:28 PM

What sort of limits should I have for the star tree schemas?

Xiang Fu

06/02/2020, 5:43 PM

@Jackie ^^

Jackie

06/02/2020, 5:47 PM

@Dan Hill How many columns do you have in the star-tree?

Dan Hill

06/02/2020, 5:49 PM

Around 7-8 dimensions and 30 metrics per star tree. All of them are current numbers. There are 18 star trees.

Jackie

06/02/2020, 6:10 PM

In that case, I would recommend ~1M records per segment
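As a rough illustration of what ~1M records per segment means for the row counts mentioned in this thread, here is a minimal sketch of the arithmetic. The function name and the default of 1,000,000 rows are assumptions for illustration only; the figure is the suggestion given for this particular star-tree setup, not a general Pinot rule.

```python
def segments_needed(total_rows: int, rows_per_segment: int = 1_000_000) -> int:
    """Number of segments required so that none exceeds rows_per_segment."""
    if total_rows < 0 or rows_per_segment <= 0:
        raise ValueError("total_rows must be >= 0 and rows_per_segment > 0")
    # Integer ceiling division avoids floating-point rounding on large counts.
    return (total_rows + rows_per_segment - 1) // rows_per_segment


print(segments_needed(300_000_000))  # 300 segments for the full data set
print(segments_needed(10_000_000))   # 10 segments for the smaller test load
```

One way to apply this, assuming each input file is turned into its own segment, is to split the source data into files of about that many rows before running the ingestion job.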

Jackie

06/02/2020, 6:11 PM

Also, why do you need 18 star-trees?

Dan Hill

06/02/2020, 6:15 PM

Gotcha. Are there any limits on the number of segments? I have roughly 40 dimensions. I can't rely on the built-in time support, so I have 4 date dimensions (depending on what scope is being used). I have an entity hierarchy that's about 5 levels deep. Then I have some separate star trees depending on combinations of the remaining dimensions.

Dan Hill

06/02/2020, 6:16 PM

I'll send you the current one in a direct message.

Dan Hill

06/02/2020, 6:29 PM

Hmm, how does this scale with realtime ingestion? I can shard better in the offline case.
