Wojciech Wasik
10/05/2022, 8:56 AM
OutOfMemoryException Java Heap Space during batch ingestion. I have the same configs as in the previous thread. The only difference is that I use an 18GB CSV file. What is the best strategy to investigate that? Would any table configuration help?
Wojciech Wasik
10/05/2022, 9:08 AM
java.lang.RuntimeException: Caught exception during running - org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner
at org.apache.pinot.spi.ingestion.batch.IngestionJobLauncher.kickoffIngestionJob(IngestionJobLauncher.java:152) ~[pinot-all-0.12.0-SNAPSHOT-jar-with-dependencies.jar:0.12.0-SNAPSHOT-ae239cd10056330c008d23b1faaec158d660446c]
at org.apache.pinot.spi.ingestion.batch.IngestionJobLauncher.runIngestionJob(IngestionJobLauncher.java:121) ~[pinot-all-0.12.0-SNAPSHOT-jar-with-dependencies.jar:0.12.0-SNAPSHOT-ae239cd10056330c008d23b1faaec158d660446c]
at org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand.execute(LaunchDataIngestionJobCommand.java:130) [pinot-all-0.12.0-SNAPSHOT-jar-with-dependencies.jar:0.12.0-SNAPSHOT-ae239cd10056330c008d23b1faaec158d660446c]
at org.apache.pinot.tools.Command.call(Command.java:33) [pinot-all-0.12.0-SNAPSHOT-jar-with-dependencies.jar:0.12.0-SNAPSHOT-ae239cd10056330c008d23b1faaec158d660446c]
at org.apache.pinot.tools.Command.call(Command.java:29) [pinot-all-0.12.0-SNAPSHOT-jar-with-dependencies.jar:0.12.0-SNAPSHOT-ae239cd10056330c008d23b1faaec158d660446c]
at picocli.CommandLine.executeUserObject(CommandLine.java:1953) [pinot-all-0.12.0-SNAPSHOT-jar-with-dependencies.jar:0.12.0-SNAPSHOT-ae239cd10056330c008d23b1faaec158d660446c]
at picocli.CommandLine.access$1300(CommandLine.java:145) [pinot-all-0.12.0-SNAPSHOT-jar-with-dependencies.jar:0.12.0-SNAPSHOT-ae239cd10056330c008d23b1faaec158d660446c]
at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2352) [pinot-all-0.12.0-SNAPSHOT-jar-with-dependencies.jar:0.12.0-SNAPSHOT-ae239cd10056330c008d23b1faaec158d660446c]
at picocli.CommandLine$RunLast.handle(CommandLine.java:2346) [pinot-all-0.12.0-SNAPSHOT-jar-with-dependencies.jar:0.12.0-SNAPSHOT-ae239cd10056330c008d23b1faaec158d660446c]
at picocli.CommandLine$RunLast.handle(CommandLine.java:2311) [pinot-all-0.12.0-SNAPSHOT-jar-with-dependencies.jar:0.12.0-SNAPSHOT-ae239cd10056330c008d23b1faaec158d660446c]
at picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:2179) [pinot-all-0.12.0-SNAPSHOT-jar-with-dependencies.jar:0.12.0-SNAPSHOT-ae239cd10056330c008d23b1faaec158d660446c]
at picocli.CommandLine.execute(CommandLine.java:2078) [pinot-all-0.12.0-SNAPSHOT-jar-with-dependencies.jar:0.12.0-SNAPSHOT-ae239cd10056330c008d23b1faaec158d660446c]
at org.apache.pinot.tools.admin.PinotAdministrator.execute(PinotAdministrator.java:167) [pinot-all-0.12.0-SNAPSHOT-jar-with-dependencies.jar:0.12.0-SNAPSHOT-ae239cd10056330c008d23b1faaec158d660446c]
at org.apache.pinot.tools.admin.PinotAdministrator.main(PinotAdministrator.java:198) [pinot-all-0.12.0-SNAPSHOT-jar-with-dependencies.jar:0.12.0-SNAPSHOT-ae239cd10056330c008d23b1faaec158d660446c]
Caused by: java.lang.RuntimeException: Failed to generate Pinot segment for file - <s3://0x-wojciech/dump.csv>
at org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner.lambda$submitSegmentGenTask$1(SegmentGenerationJobRunner.java:286) ~[pinot-batch-ingestion-standalone-0.12.0-SNAPSHOT-shaded.jar:0.12.0-SNAPSHOT-ae239cd10056330c008d23b1faaec158d660446c]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) ~[?:?]
at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
at java.lang.Thread.run(Thread.java:829) ~[?:?]
Caused by: java.lang.OutOfMemoryError: Java heap space
at it.unimi.dsi.fastutil.doubles.DoubleOpenHashSet.rehash(DoubleOpenHashSet.java:606) ~[pinot-all-0.12.0-SNAPSHOT-jar-with-dependencies.jar:0.12.0-SNAPSHOT-ae239cd10056330c008d23b1faaec158d660446c]
at it.unimi.dsi.fastutil.doubles.DoubleOpenHashSet.add(DoubleOpenHashSet.java:310) ~[pinot-all-0.12.0-SNAPSHOT-jar-with-dependencies.jar:0.12.0-SNAPSHOT-ae239cd10056330c008d23b1faaec158d660446c]
at org.apache.pinot.segment.local.segment.creator.impl.stats.DoubleColumnPreIndexStatsCollector.collect(DoubleColumnPreIndexStatsCollector.java:53) ~[pinot-all-0.12.0-SNAPSHOT-jar-with-dependencies.jar:0.12.0-SNAPSHOT-ae239cd10056330c008d23b1faaec158d660446c]
at org.apache.pinot.segment.local.segment.creator.impl.stats.SegmentPreIndexStatsCollectorImpl.collectRow(SegmentPreIndexStatsCollectorImpl.java:100) ~[pinot-all-0.12.0-SNAPSHOT-jar-with-dependencies.jar:0.12.0-SNAPSHOT-ae239cd10056330c008d23b1faaec158d660446c]
at org.apache.pinot.segment.local.segment.creator.RecordReaderSegmentCreationDataSource.gatherStats(RecordReaderSegmentCreationDataSource.java:69) ~[pinot-all-0.12.0-SNAPSHOT-jar-with-dependencies.jar:0.12.0-SNAPSHOT-ae239cd10056330c008d23b1faaec158d660446c]
at org.apache.pinot.segment.local.segment.creator.RecordReaderSegmentCreationDataSource.gatherStats(RecordReaderSegmentCreationDataSource.java:37) ~[pinot-all-0.12.0-SNAPSHOT-jar-with-dependencies.jar:0.12.0-SNAPSHOT-ae239cd10056330c008d23b1faaec158d660446c]
at org.apache.pinot.segment.local.segment.creator.impl.SegmentIndexCreationDriverImpl.init(SegmentIndexCreationDriverImpl.java:181) ~[pinot-all-0.12.0-SNAPSHOT-jar-with-dependencies.jar:0.12.0-SNAPSHOT-ae239cd10056330c008d23b1faaec158d660446c]
at org.apache.pinot.segment.local.segment.creator.impl.SegmentIndexCreationDriverImpl.init(SegmentIndexCreationDriverImpl.java:153) ~[pinot-all-0.12.0-SNAPSHOT-jar-with-dependencies.jar:0.12.0-SNAPSHOT-ae239cd10056330c008d23b1faaec158d660446c]
at org.apache.pinot.segment.local.segment.creator.impl.SegmentIndexCreationDriverImpl.init(SegmentIndexCreationDriverImpl.java:102) ~[pinot-all-0.12.0-SNAPSHOT-jar-with-dependencies.jar:0.12.0-SNAPSHOT-ae239cd10056330c008d23b1faaec158d660446c]
at org.apache.pinot.plugin.ingestion.batch.common.SegmentGenerationTaskRunner.run(SegmentGenerationTaskRunner.java:118) ~[pinot-all-0.12.0-SNAPSHOT-jar-with-dependencies.jar:0.12.0-SNAPSHOT-ae239cd10056330c008d23b1faaec158d660446c]
at org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner.lambda$submitSegmentGenTask$1(SegmentGenerationJobRunner.java:264) ~[pinot-batch-ingestion-standalone-0.12.0-SNAPSHOT-shaded.jar:0.12.0-SNAPSHOT-ae239cd10056330c008d23b1faaec158d660446c]
at org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner$$Lambda$613/0x0000000840557840.run(Unknown Source) ~[?:?]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) ~[?:?]
at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
at java.lang.Thread.run(Thread.java:829) ~[?:?]
java.lang.RuntimeException: Caught exception during running - org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner
at org.apache.pinot.spi.ingestion.batch.IngestionJobLauncher.kickoffIngestionJob(IngestionJobLauncher.java:152)
at org.apache.pinot.spi.ingestion.batch.IngestionJobLauncher.runIngestionJob(IngestionJobLauncher.java:121)
at org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand.execute(LaunchDataIngestionJobCommand.java:130)
at org.apache.pinot.tools.Command.call(Command.java:33)
at org.apache.pinot.tools.Command.call(Command.java:29)
at picocli.CommandLine.executeUserObject(CommandLine.java:1953)
at picocli.CommandLine.access$1300(CommandLine.java:145)
at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2352)
at picocli.CommandLine$RunLast.handle(CommandLine.java:2346)
at picocli.CommandLine$RunLast.handle(CommandLine.java:2311)
at picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:2179)
at picocli.CommandLine.execute(CommandLine.java:2078)
at org.apache.pinot.tools.admin.PinotAdministrator.execute(PinotAdministrator.java:167)
at org.apache.pinot.tools.admin.PinotAdministrator.main(PinotAdministrator.java:198)
Caused by: java.lang.RuntimeException: Failed to generate Pinot segment for file - <s3://0x-wojciech/dump.csv>
at org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner.lambda$submitSegmentGenTask$1(SegmentGenerationJobRunner.java:286)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.lang.OutOfMemoryError: Java heap space
at it.unimi.dsi.fastutil.doubles.DoubleOpenHashSet.rehash(DoubleOpenHashSet.java:606)
at it.unimi.dsi.fastutil.doubles.DoubleOpenHashSet.add(DoubleOpenHashSet.java:310)
at org.apache.pinot.segment.local.segment.creator.impl.stats.DoubleColumnPreIndexStatsCollector.collect(DoubleColumnPreIndexStatsCollector.java:53)
at org.apache.pinot.segment.local.segment.creator.impl.stats.SegmentPreIndexStatsCollectorImpl.collectRow(SegmentPreIndexStatsCollectorImpl.java:100)
at org.apache.pinot.segment.local.segment.creator.RecordReaderSegmentCreationDataSource.gatherStats(RecordReaderSegmentCreationDataSource.java:69)
at org.apache.pinot.segment.local.segment.creator.RecordReaderSegmentCreationDataSource.gatherStats(RecordReaderSegmentCreationDataSource.java:37)
at org.apache.pinot.segment.local.segment.creator.impl.SegmentIndexCreationDriverImpl.init(SegmentIndexCreationDriverImpl.java:181)
at org.apache.pinot.segment.local.segment.creator.impl.SegmentIndexCreationDriverImpl.init(SegmentIndexCreationDriverImpl.java:153)
at org.apache.pinot.segment.local.segment.creator.impl.SegmentIndexCreationDriverImpl.init(SegmentIndexCreationDriverImpl.java:102)
at org.apache.pinot.plugin.ingestion.batch.common.SegmentGenerationTaskRunner.run(SegmentGenerationTaskRunner.java:118)
at org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner.lambda$submitSegmentGenTask$1(SegmentGenerationJobRunner.java:264)
at org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner$$Lambda$613/0x0000000840557840.run(Unknown Source)
Kishore G
Wojciech Wasik
10/05/2022, 12:47 PM
Kishore G
Ken Krugler
10/08/2022, 11:36 PM
bin/pinot-admin.sh script). By default it’s using -Xms4G, but we’ve done some builds (during testing) with it set much higher. As @Kishore G noted, you’ll want smaller segments in real-world usage. Also, you can set your segmentCreationJobParallelism to 1 in the job file if you’re processing multiple files.
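
One way to get the smaller segments mentioned above is to split the large CSV into smaller files before ingestion. The stack trace reports a failure "to generate Pinot segment for file", which suggests the standalone runner builds one segment per input file. The sketch below is only an illustration under assumptions not stated in the thread: the file has a single header row, a local copy of the CSV is available (the original sits on S3), and the function name, output file names, and row limit are all hypothetical.

```python
import csv
from pathlib import Path


def split_csv(src, out_dir, rows_per_file):
    """Split src into numbered CSV parts, repeating the header in each part.

    Rows are streamed one at a time, so the whole file is never held in memory.
    Returns the list of part paths, in order. (Hypothetical helper, not a Pinot API.)
    """
    if rows_per_file < 1:
        raise ValueError("rows_per_file must be >= 1")
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    parts = []
    out = None
    writer = None
    rows_in_part = 0
    with open(src, newline="") as f:
        reader = csv.reader(f)
        header = next(reader, None)
        if header is None:
            return parts  # empty input: nothing to write
        try:
            for row in reader:
                # Start a new part on the first row or when the current part is full.
                if writer is None or rows_in_part >= rows_per_file:
                    if out is not None:
                        out.close()
                    path = out_dir / f"part-{len(parts):05d}.csv"
                    out = open(path, "w", newline="")
                    writer = csv.writer(out)
                    writer.writerow(header)
                    parts.append(path)
                    rows_in_part = 0
                writer.writerow(row)
                rows_in_part += 1
        finally:
            if out is not None:
                out.close()
    return parts
```

For example, `split_csv("dump.csv", "parts", 5_000_000)` would write `parts/part-00000.csv`, `parts/part-00001.csv`, and so on; the row limit here is an arbitrary placeholder to tune against the available heap. Each part can then be given to the ingestion job as a separate input file, with segmentCreationJobParallelism set to 1 so only one part is processed at a time.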