Hey, I’m getting `OutOfMemoryException Java Heap S...
# troubleshooting
w
Hey, I’m getting `OutOfMemoryException Java Heap Space` during batch ingestion. I have the same configs as in the previous thread. The only difference is that I use an 18 GB CSV file. What is the best strategy to investigate that? Is there any table configuration that might help?
java.lang.RuntimeException: Caught exception during running - org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner
	at org.apache.pinot.spi.ingestion.batch.IngestionJobLauncher.kickoffIngestionJob(IngestionJobLauncher.java:152) ~[pinot-all-0.12.0-SNAPSHOT-jar-with-dependencies.jar:0.12.0-SNAPSHOT-ae239cd10056330c008d23b1faaec158d660446c]
	at org.apache.pinot.spi.ingestion.batch.IngestionJobLauncher.runIngestionJob(IngestionJobLauncher.java:121) ~[pinot-all-0.12.0-SNAPSHOT-jar-with-dependencies.jar:0.12.0-SNAPSHOT-ae239cd10056330c008d23b1faaec158d660446c]
	at org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand.execute(LaunchDataIngestionJobCommand.java:130) [pinot-all-0.12.0-SNAPSHOT-jar-with-dependencies.jar:0.12.0-SNAPSHOT-ae239cd10056330c008d23b1faaec158d660446c]
	at org.apache.pinot.tools.Command.call(Command.java:33) [pinot-all-0.12.0-SNAPSHOT-jar-with-dependencies.jar:0.12.0-SNAPSHOT-ae239cd10056330c008d23b1faaec158d660446c]
	at org.apache.pinot.tools.Command.call(Command.java:29) [pinot-all-0.12.0-SNAPSHOT-jar-with-dependencies.jar:0.12.0-SNAPSHOT-ae239cd10056330c008d23b1faaec158d660446c]
	at picocli.CommandLine.executeUserObject(CommandLine.java:1953) [pinot-all-0.12.0-SNAPSHOT-jar-with-dependencies.jar:0.12.0-SNAPSHOT-ae239cd10056330c008d23b1faaec158d660446c]
	at picocli.CommandLine.access$1300(CommandLine.java:145) [pinot-all-0.12.0-SNAPSHOT-jar-with-dependencies.jar:0.12.0-SNAPSHOT-ae239cd10056330c008d23b1faaec158d660446c]
	at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2352) [pinot-all-0.12.0-SNAPSHOT-jar-with-dependencies.jar:0.12.0-SNAPSHOT-ae239cd10056330c008d23b1faaec158d660446c]
	at picocli.CommandLine$RunLast.handle(CommandLine.java:2346) [pinot-all-0.12.0-SNAPSHOT-jar-with-dependencies.jar:0.12.0-SNAPSHOT-ae239cd10056330c008d23b1faaec158d660446c]
	at picocli.CommandLine$RunLast.handle(CommandLine.java:2311) [pinot-all-0.12.0-SNAPSHOT-jar-with-dependencies.jar:0.12.0-SNAPSHOT-ae239cd10056330c008d23b1faaec158d660446c]
	at picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:2179) [pinot-all-0.12.0-SNAPSHOT-jar-with-dependencies.jar:0.12.0-SNAPSHOT-ae239cd10056330c008d23b1faaec158d660446c]
	at picocli.CommandLine.execute(CommandLine.java:2078) [pinot-all-0.12.0-SNAPSHOT-jar-with-dependencies.jar:0.12.0-SNAPSHOT-ae239cd10056330c008d23b1faaec158d660446c]
	at org.apache.pinot.tools.admin.PinotAdministrator.execute(PinotAdministrator.java:167) [pinot-all-0.12.0-SNAPSHOT-jar-with-dependencies.jar:0.12.0-SNAPSHOT-ae239cd10056330c008d23b1faaec158d660446c]
	at org.apache.pinot.tools.admin.PinotAdministrator.main(PinotAdministrator.java:198) [pinot-all-0.12.0-SNAPSHOT-jar-with-dependencies.jar:0.12.0-SNAPSHOT-ae239cd10056330c008d23b1faaec158d660446c]
Caused by: java.lang.RuntimeException: Failed to generate Pinot segment for file - <s3://0x-wojciech/dump.csv>
	at org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner.lambda$submitSegmentGenTask$1(SegmentGenerationJobRunner.java:286) ~[pinot-batch-ingestion-standalone-0.12.0-SNAPSHOT-shaded.jar:0.12.0-SNAPSHOT-ae239cd10056330c008d23b1faaec158d660446c]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) ~[?:?]
	at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
	at java.lang.Thread.run(Thread.java:829) ~[?:?]
Caused by: java.lang.OutOfMemoryError: Java heap space
	at it.unimi.dsi.fastutil.doubles.DoubleOpenHashSet.rehash(DoubleOpenHashSet.java:606) ~[pinot-all-0.12.0-SNAPSHOT-jar-with-dependencies.jar:0.12.0-SNAPSHOT-ae239cd10056330c008d23b1faaec158d660446c]
	at it.unimi.dsi.fastutil.doubles.DoubleOpenHashSet.add(DoubleOpenHashSet.java:310) ~[pinot-all-0.12.0-SNAPSHOT-jar-with-dependencies.jar:0.12.0-SNAPSHOT-ae239cd10056330c008d23b1faaec158d660446c]
	at org.apache.pinot.segment.local.segment.creator.impl.stats.DoubleColumnPreIndexStatsCollector.collect(DoubleColumnPreIndexStatsCollector.java:53) ~[pinot-all-0.12.0-SNAPSHOT-jar-with-dependencies.jar:0.12.0-SNAPSHOT-ae239cd10056330c008d23b1faaec158d660446c]
	at org.apache.pinot.segment.local.segment.creator.impl.stats.SegmentPreIndexStatsCollectorImpl.collectRow(SegmentPreIndexStatsCollectorImpl.java:100) ~[pinot-all-0.12.0-SNAPSHOT-jar-with-dependencies.jar:0.12.0-SNAPSHOT-ae239cd10056330c008d23b1faaec158d660446c]
	at org.apache.pinot.segment.local.segment.creator.RecordReaderSegmentCreationDataSource.gatherStats(RecordReaderSegmentCreationDataSource.java:69) ~[pinot-all-0.12.0-SNAPSHOT-jar-with-dependencies.jar:0.12.0-SNAPSHOT-ae239cd10056330c008d23b1faaec158d660446c]
	at org.apache.pinot.segment.local.segment.creator.RecordReaderSegmentCreationDataSource.gatherStats(RecordReaderSegmentCreationDataSource.java:37) ~[pinot-all-0.12.0-SNAPSHOT-jar-with-dependencies.jar:0.12.0-SNAPSHOT-ae239cd10056330c008d23b1faaec158d660446c]
	at org.apache.pinot.segment.local.segment.creator.impl.SegmentIndexCreationDriverImpl.init(SegmentIndexCreationDriverImpl.java:181) ~[pinot-all-0.12.0-SNAPSHOT-jar-with-dependencies.jar:0.12.0-SNAPSHOT-ae239cd10056330c008d23b1faaec158d660446c]
	at org.apache.pinot.segment.local.segment.creator.impl.SegmentIndexCreationDriverImpl.init(SegmentIndexCreationDriverImpl.java:153) ~[pinot-all-0.12.0-SNAPSHOT-jar-with-dependencies.jar:0.12.0-SNAPSHOT-ae239cd10056330c008d23b1faaec158d660446c]
	at org.apache.pinot.segment.local.segment.creator.impl.SegmentIndexCreationDriverImpl.init(SegmentIndexCreationDriverImpl.java:102) ~[pinot-all-0.12.0-SNAPSHOT-jar-with-dependencies.jar:0.12.0-SNAPSHOT-ae239cd10056330c008d23b1faaec158d660446c]
	at org.apache.pinot.plugin.ingestion.batch.common.SegmentGenerationTaskRunner.run(SegmentGenerationTaskRunner.java:118) ~[pinot-all-0.12.0-SNAPSHOT-jar-with-dependencies.jar:0.12.0-SNAPSHOT-ae239cd10056330c008d23b1faaec158d660446c]
	at org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner.lambda$submitSegmentGenTask$1(SegmentGenerationJobRunner.java:264) ~[pinot-batch-ingestion-standalone-0.12.0-SNAPSHOT-shaded.jar:0.12.0-SNAPSHOT-ae239cd10056330c008d23b1faaec158d660446c]
	at org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner$$Lambda$613/0x0000000840557840.run(Unknown Source) ~[?:?]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) ~[?:?]
	at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
	at java.lang.Thread.run(Thread.java:829) ~[?:?]
java.lang.RuntimeException: Caught exception during running - org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner
	at org.apache.pinot.spi.ingestion.batch.IngestionJobLauncher.kickoffIngestionJob(IngestionJobLauncher.java:152)
	at org.apache.pinot.spi.ingestion.batch.IngestionJobLauncher.runIngestionJob(IngestionJobLauncher.java:121)
	at org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand.execute(LaunchDataIngestionJobCommand.java:130)
	at org.apache.pinot.tools.Command.call(Command.java:33)
	at org.apache.pinot.tools.Command.call(Command.java:29)
	at picocli.CommandLine.executeUserObject(CommandLine.java:1953)
	at picocli.CommandLine.access$1300(CommandLine.java:145)
	at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2352)
	at picocli.CommandLine$RunLast.handle(CommandLine.java:2346)
	at picocli.CommandLine$RunLast.handle(CommandLine.java:2311)
	at picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:2179)
	at picocli.CommandLine.execute(CommandLine.java:2078)
	at org.apache.pinot.tools.admin.PinotAdministrator.execute(PinotAdministrator.java:167)
	at org.apache.pinot.tools.admin.PinotAdministrator.main(PinotAdministrator.java:198)
Caused by: java.lang.RuntimeException: Failed to generate Pinot segment for file - <s3://0x-wojciech/dump.csv>
	at org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner.lambda$submitSegmentGenTask$1(SegmentGenerationJobRunner.java:286)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.lang.OutOfMemoryError: Java heap space
	at it.unimi.dsi.fastutil.doubles.DoubleOpenHashSet.rehash(DoubleOpenHashSet.java:606)
	at it.unimi.dsi.fastutil.doubles.DoubleOpenHashSet.add(DoubleOpenHashSet.java:310)
	at org.apache.pinot.segment.local.segment.creator.impl.stats.DoubleColumnPreIndexStatsCollector.collect(DoubleColumnPreIndexStatsCollector.java:53)
	at org.apache.pinot.segment.local.segment.creator.impl.stats.SegmentPreIndexStatsCollectorImpl.collectRow(SegmentPreIndexStatsCollectorImpl.java:100)
	at org.apache.pinot.segment.local.segment.creator.RecordReaderSegmentCreationDataSource.gatherStats(RecordReaderSegmentCreationDataSource.java:69)
	at org.apache.pinot.segment.local.segment.creator.RecordReaderSegmentCreationDataSource.gatherStats(RecordReaderSegmentCreationDataSource.java:37)
	at org.apache.pinot.segment.local.segment.creator.impl.SegmentIndexCreationDriverImpl.init(SegmentIndexCreationDriverImpl.java:181)
	at org.apache.pinot.segment.local.segment.creator.impl.SegmentIndexCreationDriverImpl.init(SegmentIndexCreationDriverImpl.java:153)
	at org.apache.pinot.segment.local.segment.creator.impl.SegmentIndexCreationDriverImpl.init(SegmentIndexCreationDriverImpl.java:102)
	at org.apache.pinot.plugin.ingestion.batch.common.SegmentGenerationTaskRunner.run(SegmentGenerationTaskRunner.java:118)
	at org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner.lambda$submitSegmentGenTask$1(SegmentGenerationJobRunner.java:264)
	at org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner$$Lambda$613/0x0000000840557840.run(Unknown Source)
k
Can you break it up into smaller files? Pinot creates one segment per input file.
w
What would be the recommended max size for a single file?
k
1 GB
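A minimal sketch of splitting a large CSV into smaller files, each repeating the header row, so that every input file produces its own, smaller segment. It assumes a local copy of the file, a single header line, and no quoted fields that contain line breaks. The function name `split_csv` and the output file naming are hypothetical; the 1 GB default follows the suggestion above.

```python
from pathlib import Path


def split_csv(src, out_dir, max_bytes=1_000_000_000):
    """Split src into CSV parts of at most roughly max_bytes each.

    Every part starts with the header row of the source file.
    Assumes one record per physical line (no embedded newlines).
    """
    src = Path(src)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    parts = []
    out = None
    written = 0
    try:
        with src.open("rb") as f:
            header = f.readline()
            for line in f:
                # Start a new part when the current one would exceed the limit.
                if out is None or written + len(line) > max_bytes:
                    if out is not None:
                        out.close()
                    part = out_dir / f"{src.stem}_{len(parts):05d}.csv"
                    parts.append(part)
                    out = part.open("wb")
                    out.write(header)
                    written = len(header)
                out.write(line)
                written += len(line)
    finally:
        if out is not None:
            out.close()
    return parts
```

Something like `split_csv("dump.csv", "parts/")` would then produce `parts/dump_00000.csv`, `parts/dump_00001.csv`, and so on, which can be uploaded and used as the job's input directory.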
k
You can also bump the amount of memory used by the admin tool (I’m assuming you’re running the `bin/pinot-admin.sh` script). By default it’s using `-Xms4G`, but we’ve done some builds (during testing) with it set much higher. As @Kishore G noted, you’ll want smaller segments in real-world usage. Also, you can set `segmentCreationJobParallelism` to 1 in the job file if you’re processing multiple files.
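A rough sketch of both suggestions as a small launcher, not a definitive recipe. It assumes that `bin/pinot-admin.sh` uses the `JAVA_OPTS` environment variable when it is set (check your copy of the script, since setting it may replace the built-in defaults entirely) and that `segmentCreationJobParallelism` is a top-level key in the job spec YAML. The helper names and the `16G` heap value are hypothetical.

```python
import os


def build_ingestion_command(job_spec_path, max_heap="16G", pinot_home="."):
    """Return (argv, env) for launching a batch ingestion job with a larger heap."""
    env = dict(os.environ)
    # Assumption: pinot-admin.sh picks up JAVA_OPTS instead of its default -Xms4G.
    env["JAVA_OPTS"] = f"-Xms4G -Xmx{max_heap}"
    argv = [
        os.path.join(pinot_home, "bin", "pinot-admin.sh"),
        "LaunchDataIngestionJob",
        "-jobSpecFile",
        job_spec_path,
    ]
    return argv, env


def with_parallelism(job_spec_text, parallelism=1):
    """Return job spec YAML text with top-level segmentCreationJobParallelism set."""
    key = "segmentCreationJobParallelism"
    # Drop any existing top-level setting, then append the new value.
    lines = [l for l in job_spec_text.splitlines() if not l.startswith(key + ":")]
    lines.append(f"{key}: {parallelism}")
    return "\n".join(lines) + "\n"
```

The returned pair can be passed to `subprocess.run(argv, env=env, check=True)`. With parallelism set to 1, only one segment is built at a time, so only one file's statistics are held on the heap at once.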