# troubleshooting
  • Dan Hill · 06/07/2020, 9:52 PM
    When running LaunchDataIngestionJob using Docker, how can I increase Xmx? I'm hitting an OOM issue. I tried a few ways. I see the value set in pom.xml (to 1GB).
    sudo docker run --rm -ti -v /home/ec2-user/metrics:/home/ec2-user/metrics   -v /home/ec2-user/events:/home/ec2-user/events   --name pinot-data-ingestion-job   apachepinot/pinot:latest LaunchDataIngestionJob   -jobSpecFile ~/metrics/pinot/loadtest/kubernetes/aws-dev-batch-job-spec.yaml
    Generated 1000000 star-tree records from 1000000 segment records
    Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    	at java.util.Arrays.copyOf(Arrays.java:3181)
    	at java.util.ArrayList.grow(ArrayList.java:265)
    	at java.util.ArrayList.ensureExplicitCapacity(ArrayList.java:239)
    	at java.util.ArrayList.ensureCapacityInternal(ArrayList.java:231)
    	at java.util.ArrayList.add(ArrayList.java:462)
    	at org.apache.pinot.core.startree.v2.builder.OffHeapSingleTreeBuilder.appendRecord(OffHeapSingleTreeBuilder.java:162)
    	at org.apache.pinot.core.startree.v2.builder.BaseSingleTreeBuilder.appendToStarTree(BaseSingleTreeBuilder.java:336)
    	at org.apache.pinot.core.startree.v2.builder.BaseSingleTreeBuilder.constructStarNode(BaseSingleTreeBuilder.java:406)
    	at org.apache.pinot.core.startree.v2.builder.BaseSingleTreeBuilder.constructStarTree(BaseSingleTreeBuilder.java:359)
    	at org.apache.pinot.core.startree.v2.builder.BaseSingleTreeBuilder.constructStarTree(BaseSingleTreeBuilder.java:365)
    	at org.apache.pinot.core.startree.v2.builder.BaseSingleTreeBuilder.constructStarTree(BaseSingleTreeBuilder.java:365)
    	at org.apache.pinot.core.startree.v2.builder.BaseSingleTreeBuilder.constructStarTree(BaseSingleTreeBuilder.java:365)
    	at org.apache.pinot.core.startree.v2.builder.BaseSingleTreeBuilder.constructStarTree(BaseSingleTreeBuilder.java:365)
    	at org.apache.pinot.core.startree.v2.builder.BaseSingleTreeBuilder.constructStarTree(BaseSingleTreeBuilder.java:365)
    	at org.apache.pinot.core.startree.v2.builder.BaseSingleTreeBuilder.constructStarTree(BaseSingleTreeBuilder.java:365)
    	at org.apache.pinot.core.startree.v2.builder.BaseSingleTreeBuilder.constructStarTree(BaseSingleTreeBuilder.java:365)
    	at org.apache.pinot.core.startree.v2.builder.BaseSingleTreeBuilder.constructStarTree(BaseSingleTreeBuilder.java:365)
    	at org.apache.pinot.core.startree.v2.builder.BaseSingleTreeBuilder.constructStarTree(BaseSingleTreeBuilder.java:365)
    	at org.apache.pinot.core.startree.v2.builder.BaseSingleTreeBuilder.constructStarTree(BaseSingleTreeBuilder.java:365)
    	at org.apache.pinot.core.startree.v2.builder.BaseSingleTreeBuilder.constructStarTree(BaseSingleTreeBuilder.java:365)
    	at org.apache.pinot.core.startree.v2.builder.BaseSingleTreeBuilder.constructStarTree(BaseSingleTreeBuilder.java:365)
    	at org.apache.pinot.core.startree.v2.builder.BaseSingleTreeBuilder.constructStarTree(BaseSingleTreeBuilder.java:365)
    	at org.apache.pinot.core.startree.v2.builder.BaseSingleTreeBuilder.constructStarTree(BaseSingleTreeBuilder.java:365)
    	at org.apache.pinot.core.startree.v2.builder.BaseSingleTreeBuilder.constructStarTree(BaseSingleTreeBuilder.java:365)
    	at org.apache.pinot.core.startree.v2.builder.BaseSingleTreeBuilder.constructStarTree(BaseSingleTreeBuilder.java:365)
    	at org.apache.pinot.core.startree.v2.builder.BaseSingleTreeBuilder.build(BaseSingleTreeBuilder.java:317)
    	at org.apache.pinot.core.startree.v2.builder.OffHeapSingleTreeBuilder.build(OffHeapSingleTreeBuilder.java:43)
    	at org.apache.pinot.core.startree.v2.builder.MultipleTreesBuilder.build(MultipleTreesBuilder.java:120)
  • Kishore G · 06/08/2020, 12:40 AM
    @Xiang Fu ^^
  • Xiang Fu · 06/08/2020, 12:45 AM
    You can set an environment variable to override the default JVM options in pinot-admin.
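    For example, an override along these lines should work, assuming the image's pinot-admin.sh honors the JAVA_OPTS environment variable (a sketch of the command above with the override added; the sizes are illustrative):

```sh
# Hypothetical override: pass a larger heap (and a direct-memory cap) into the container.
# JAVA_OPTS is assumed to replace the default JVM options used by pinot-admin.
sudo docker run --rm -ti \
  -v /home/ec2-user/metrics:/home/ec2-user/metrics \
  -v /home/ec2-user/events:/home/ec2-user/events \
  -e JAVA_OPTS="-Xms4G -Xmx8G -XX:MaxDirectMemorySize=4G" \
  --name pinot-data-ingestion-job \
  apachepinot/pinot:latest LaunchDataIngestionJob \
  -jobSpecFile ~/metrics/pinot/loadtest/kubernetes/aws-dev-batch-job-spec.yaml
```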
  • Elon · 06/08/2020, 7:50 PM
    We noticed our ZooKeeper disks are full in production. We had allocated 20GB. Should we expect that it will fill up and increase, or does this indicate an issue?
  • Xiang Fu · 06/08/2020, 7:51 PM
    Did you do any cleanup?
  • Xiang Fu · 06/08/2020, 7:52 PM
    Usually you need to configure autopurge.
  • Xiang Fu · 06/08/2020, 7:52 PM
    autopurge.snapRetainCount=3
    autopurge.purgeInterval=12
  • Xiang Fu · 06/08/2020, 7:52 PM
    something like this
  • Elon · 06/08/2020, 8:07 PM
    Thanks! Will check
  • Elon · 06/08/2020, 10:56 PM
    So that worked, but now we see that 1 of 2 controllers is in a crash loop. Is it OK to have more than one controller? Could it be stale info in ZooKeeper? Here's the log from startup:
  • Elon · 06/08/2020, 10:56 PM
    (attached: controller startup log snippets)
  • Xiang Fu · 06/08/2020, 11:13 PM
    we usually recommend 3 controllers
  • Xiang Fu · 06/08/2020, 11:13 PM
    Is this the log from the controller that died?
  • Xiang Fu · 06/08/2020, 11:14 PM
    Silent failure?
  • Elon · 06/09/2020, 12:36 AM
    It was a helm issue, resolved. That's good to know
  • Elon · 06/09/2020, 12:44 AM
    BTW, setting ZOO_AUTOPURGE_PURGEINTERVAL seems to have worked; we now see purge tasks being completed in the logs.
    👍 1
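    For reference, the zoo.cfg settings shown above map directly to environment variables on the official ZooKeeper Docker image, so a containerized deployment can be configured roughly like this (a sketch; a Helm-based setup would set the equivalent values in the chart's configuration):

```sh
# Official zookeeper image: enable autopurge via environment variables.
# Keeps the 3 most recent snapshots and purges old snapshots/transaction logs every 12 hours.
docker run -d --name zookeeper \
  -e ZOO_AUTOPURGE_SNAPRETAINCOUNT=3 \
  -e ZOO_AUTOPURGE_PURGEINTERVAL=12 \
  zookeeper:3.5
```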
  • srisudha · 06/09/2020, 11:24 AM
    Hi, I was wondering: when load mode = MMAP, why does the direct buffer size show an increase during segment creation? Shouldn't only the mapped buffer size increase? And why would this cause an out-of-memory error for direct buffers?
  • Mayank · 06/11/2020, 4:57 AM
    @Subbu Subramaniam @Jackie: @srisudha is running into an OOM for direct memory when consuming RT segments.
    1. 26GB RAM, 4GB heap, and 10GB direct memory.
    2. Each VM is consuming 3 partitions, and each consuming segment ends up being a 100MB segment.
    3. loadMode in tableConfig is MMAP.
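    For context, limits like these are normally applied as JVM flags when the server process is launched, roughly as follows (a sketch; it assumes pinot-admin.sh honors JAVA_OPTS, and the -zkAddress/-clusterName values are placeholders):

```sh
# 4GB heap plus a 10GB ceiling on direct (off-heap) ByteBuffer allocations for the Pinot server.
JAVA_OPTS="-Xms4G -Xmx4G -XX:MaxDirectMemorySize=10G" \
  bin/pinot-admin.sh StartServer -zkAddress localhost:2181 -clusterName PinotCluster
```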
  • Mayank · 06/11/2020, 4:57 AM
    From the stack trace below, it seems that pre-allocating for fwd index might be OOM'ing.
    java.lang.Thread.run(Thread.java:748) [?:1.8.0_252]
    Caused by: java.lang.OutOfMemoryError: Direct buffer memory
    	at java.nio.Bits.reserveMemory(Bits.java:694) ~[?:1.8.0_252]
    	at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123) ~[?:1.8.0_252]
    	at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311) ~[?:1.8.0_252]
    	at org.apache.pinot.core.segment.memory.PinotByteBuffer.allocateDirect(PinotByteBuffer.java:41) ~[pinot-all-0.4.0-SNAPSHOT-jar-with-dependencies.jar:0.4.0-SNAPSHOT-ec03154343df4831e33092a247505ef0af3d9daf]
    	at org.apache.pinot.core.segment.memory.PinotDataBuffer.allocateDirect(PinotDataBuffer.java:116) ~[pinot-all-0.4.0-SNAPSHOT-jar-with-dependencies.jar:0.4.0-SNAPSHOT-ec03154343df4831e33092a247505ef0af3d9daf]
    	at org.apache.pinot.core.io.writer.impl.DirectMemoryManager.allocateInternal(DirectMemoryManager.java:53) ~[pinot-all-0.4.0-SNAPSHOT-jar-with-dependencies.jar:0.4.0-SNAPSHOT-ec03154343df4831e33092a247505ef0af3d9daf]
    	at org.apache.pinot.core.io.readerwriter.RealtimeIndexOffHeapMemoryManager.allocate(RealtimeIndexOffHeapMemoryManager.java:79) ~[pinot-all-0.4.0-SNAPSHOT-jar-with-dependencies.jar:0.4.0-SNAPSHOT-ec03154343df4831e33092a247505ef0af3d9daf]
    	at org.apache.pinot.core.io.readerwriter.impl.FixedByteSingleColumnSingleValueReaderWriter.addBuffer(FixedByteSingleColumnSingleValueReaderWriter.java:179) ~[pinot-all-0.4.0-SNAPSHOT-jar-with-dependencies.jar:0.4.0-SNAPSHOT-ec03154343df4831e33092a247505ef0af3d9daf]
    	at org.apache.pinot.core.io.readerwriter.impl.FixedByteSingleColumnSingleValueReaderWriter.<init>(FixedByteSingleColumnSingleValueReaderWriter.java:71) ~[pinot-all-0.4.0-SNAPSHOT-jar-with-dependencies.jar:0.4.0-SNAPSHOT-ec03154343df4831e33092a247505ef0af3d9daf]
    	at org.apache.pinot.core.indexsegment.mutable.MutableSegmentImpl.<init>(MutableSegmentImpl.java:273) ~[pinot-all-0.4.0-SNAPSHOT-jar-with-dependencies.jar:0.4.0-SNAPSHOT-ec03154343df4831e33092a247505ef0af3d9daf]
    	at org.apache.pinot.core.data.manager.realtime.LLRealtimeSegmentDataManager.<init>(LLRealtimeSegmentDataManager.java:1206) ~[pinot-all-0.4.0-SNAPSHOT-jar-with-dependencies.jar:0.4.0-SNAPSHOT-ec03154343df4831e33092a247505ef0af3d9daf]
  • Mayank · 06/11/2020, 4:58 AM
    @srisudha could you please add more details on when you run into OOM, and when you don't?
  • Mayank · 06/11/2020, 5:05 AM
    Also, looking at the code, it seems that RT always uses direct memory for consuming segments, unless specified as MMAP in the instanceDataManagerConfig.
    _memoryManager = getMemoryManager(realtimeTableDataManager.getConsumerDir(), _segmentNameStr,
            indexLoadingConfig.isRealtimeOffheapAllocation(), indexLoadingConfig.isDirectRealtimeOffheapAllocation(),
            serverMetrics);
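    In other words, whether consuming segments allocate direct buffers or memory-mapped buffers is decided by server instance config, not by the table's loadMode. The server-level keys corresponding to the two flags above should look roughly like this (key names inferred from the code path, and the file path is illustrative; verify against your Pinot version's docs):

```sh
# Make consuming-segment allocations go to mmap'd files under the consumer dir
# instead of direct ByteBuffers (assumed key names; check your server config reference).
cat >> conf/pinot-server.conf <<'EOF'
pinot.server.instance.realtime.alloc.offheap=true
pinot.server.instance.realtime.alloc.offheap.direct=false
EOF
```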
  • srisudha · 06/11/2020, 5:37 AM
    We initially ran into the OOM when we tried a 500MB segment size and the direct memory setting wasn't configured.
  • srisudha · 06/11/2020, 5:37 AM
    Later we figured out that 100MB works well for our use case.
  • srisudha · 06/11/2020, 5:49 AM
    So we ran a performance test with 3 partitions, 3 servers, and a 100MB segment size. The first set of segments got created without issue; the direct-memory OOM came up when segment creation happened the second time.
  • srisudha · 06/11/2020, 5:50 AM
    And it happened only on one server. Off-heap usage shot up beyond 9.5GB, and direct buffers showed the same rise.
  • srisudha · 06/11/2020, 5:50 AM
    And we configured the JVM parameter for direct memory as 10GB.
  • srisudha · 06/11/2020, 5:51 AM
    Our memory configuration on the servers is 26GB RAM, 10GB direct buffers, and 4GB heap.
  • Subbu Subramaniam · 06/11/2020, 3:25 PM
    @srisudha did you try using the RealtimeProvisioningHelper tool, as mentioned in https://engineering.linkedin.com/blog/2019/auto-tuning-pinot? It will help you decide the right segment size for your use case instead of guessing the size.
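    A typical invocation looks roughly like the one below; flag names vary across Pinot versions, so check `pinot-admin.sh RealtimeProvisioningHelper -help` first (the paths and numbers here are placeholders for this thread's setup):

```sh
# Estimates per-host memory use and suggests segment-size / host-count combinations
# based on the realtime table config, partition count, and a sample completed segment.
bin/pinot-admin.sh RealtimeProvisioningHelper \
  -tableConfigFile /path/to/realtime-table-config.json \
  -numPartitions 3 \
  -numHosts 3,6 \
  -numHours 6,12,24 \
  -sampleCompletedSegmentDir /path/to/a/completed/segment
```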
  • Mayank · 06/11/2020, 3:26 PM
    @Subbu Subramaniam yes, I believe that was tried and it was generating 10MB segments.
  • Mayank · 06/11/2020, 3:27 PM
    Given the segment size is 100MB, what would cause it to OOM in direct memory?