# Troubleshooting
l
Hello, we are on 0.6 and we are facing an issue with realtime table consumption from Kafka. Consumption from Kafka stops all of a sudden. There are no errors/exceptions in the logs, and from thread dumps we don't see the consumption thread. From this I assume we hit some uncaught exception in the consumption thread. We are using the Kafka low-level consumer. At Pinot bootstrap, should we register a default uncaught exception handler which just logs such exceptions with the full stack trace? This would help in debugging several unhandled corner cases.
j
The consuming threads have a global catch which will log the exception if anything goes wrong. Can you please check the external view of the table and see if the segment runs into ERROR state?
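(For reference, one way to check the external view programmatically is the controller REST API. Below is a rough sketch using the Java 11 HTTP client; it assumes a controller reachable at localhost:9000 and the /tables/{tableName}/externalview endpoint, so adjust the host, port, and table name for your cluster.)

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ExternalViewFetcher {
  public static void main(String[] args) throws Exception {
    // Assumed controller address and table name; adjust for your cluster.
    String controllerUrl = "http://localhost:9000";
    String tableName = "spanEventView";

    HttpClient client = HttpClient.newHttpClient();
    HttpRequest request = HttpRequest.newBuilder(
            URI.create(controllerUrl + "/tables/" + tableName + "/externalview?tableType=realtime"))
        .GET()
        .build();

    // The response is a JSON map of segment -> {server -> state}; look for segments in ERROR
    // state or segments missing the expected server entirely.
    HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
    System.out.println(response.body());
  }
}
```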
l
yes. checking
j
Also search for the ERROR log with logger name LLRealtimeSegmentDataManager_<segmentName>.
l
Nope. There are no error logs for this, either in the controller or the server.
x
hmm, have you checked the log file inside the server container?
l
yes
x
hmm, can you try setting the log level to info or debug on the server so it may print out the stack trace? In any case, there should be some output. Also, for testing purposes, you can make the logger synchronous.
l
Log level is already info. We can't really afford to set it to debug as this is happening in a cluster with significant load. And this issue is not occurring in other clusters (which have less load).
At Pinot bootstrap, should we register a default uncaught exception handler which just logs them with the full stack trace?
@Xiang Fu: Should we have one?
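(For illustration, a minimal sketch of such a default handler; this is not existing Pinot code. It assumes an SLF4J logger and would be registered once early in server startup.)

```java
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public final class DefaultUncaughtExceptionHandlerRegistrar {
  private static final Logger LOGGER =
      LoggerFactory.getLogger(DefaultUncaughtExceptionHandlerRegistrar.class);

  private DefaultUncaughtExceptionHandlerRegistrar() {
  }

  // Call this once during server bootstrap so that any thread dying from an uncaught
  // exception leaves a full stack trace in the logs instead of disappearing silently.
  public static void register() {
    Thread.setDefaultUncaughtExceptionHandler((thread, throwable) ->
        LOGGER.error("Uncaught exception in thread: {}", thread.getName(), throwable));
  }
}
```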
This issue occurred again in our system and here are my observations:
• Consumer threads for a few partitions are not seen in thread dumps. This means the consumption threads are not just hung, they are dead.
• The issue happens when an old segment is moving from CONSUMING to ONLINE state and a new segment is being created.
• No error logs and no stack traces.
• The last log we see on the server before the consumption thread dies quietly:
Copy code
2021/03/31 07:31:39.825 INFO [LLRealtimeSegmentDataManager_spanEventView__27__2251__20210331T0731Z] [HelixTaskExecutor-message_handle_thread] RealtimeDataResourceZKMetadata contains no information about sorted column for segment spanEventView__27__2251__20210331T0731Z
• External view of the table also doesn't show this server for this specific segment.
• The issue gets fixed after a restart of the problematic pinot-server.
Controller logs
Copy code
2021/03/31 07:31:37.786 INFO [PinotLLCRealtimeSegmentManager] [grizzly-http-server-1] Creating segment ZK metadata for new CONSUMING segment: spanEventView__27__2251__20210331T0731Z with start offset: 166204784 and creation time: 1617175897785
2021/03/31 07:31:37.786 INFO [SegmentSizeBasedFlushThresholdUpdater] [grizzly-http-server-1] spanEventView__27__2251__20210331T0731Z: Data from committing segment: Time 1h32m3s  numRows 182331 threshold 182331 segmentSize(bytes) 326256819
2021/03/31 07:31:37.786 INFO [SegmentSizeBasedFlushThresholdUpdater] [grizzly-http-server-1] Committing segment size 326256819, current ratio 4.443249882066186E-4, setting threshold for spanEventView__27__2251__20210331T0731Z as 186363
2021/03/31 07:31:37.786 INFO [PinotLLCRealtimeSegmentManager] [grizzly-http-server-1] Persisting segment ZK metadata for segment: spanEventView__27__2251__20210331T0731Z
2021/03/31 07:31:37.854 INFO [RealtimeSegmentAssignment] [grizzly-http-server-1] Assigning segment: spanEventView__27__2251__20210331T0731Z with instance partitions: {"instancePartitionsName":"spanEventView_CONSUMING","partitionToInstancesMap":{"0_0":["Server_pinot-server-3.pinot-server.traceable.svc.cluster.local_8098","Server_pinot-server-0.pinot-server.traceable.svc.cluster.local_8098","Server_pinot-server-1.pinot-server.traceable.svc.cluster.local_8098","Server_pinot-server-2.pinot-server.traceable.svc.cluster.local_8098"]}} for table: spanEventView_REALTIME
2021/03/31 07:31:37.854 INFO [RealtimeSegmentAssignment] [grizzly-http-server-1] Assigned segment: spanEventView__27__2251__20210331T0731Z to instances: [Server_pinot-server-1.pinot-server.traceable.svc.cluster.local_8098, Server_pinot-server-2.pinot-server.traceable.svc.cluster.local_8098] for table: spanEventView_REALTIME
2021/03/31 07:31:37.854 INFO [PinotLLCRealtimeSegmentManager] [grizzly-http-server-1] Adding new CONSUMING segment: spanEventView__27__2251__20210331T0731Z to instances: [Server_pinot-server-1.pinot-server.traceable.svc.cluster.local_8098, Server_pinot-server-2.pinot-server.traceable.svc.cluster.local_8098]
2021/03/31 07:31:39.689 INFO [MessageDispatchStage] [HelixController-pipeline-default-traceable-views-(d7a6546b_DEFAULT)] Event d7a6546b_DEFAULT : Sending Message 6d73e84e-fa4d-4987-b026-7cb93397b7a9 to Server_pinot-server-2.pinot-server.traceable.svc.cluster.local_8098 transit spanEventView_REALTIME.spanEventView__27__2251__20210331T0731Z|[] from:OFFLINE to:CONSUMING, relayMessages: 0
2021/03/31 07:31:39.689 INFO [MessageDispatchStage] [HelixController-pipeline-default-traceable-views-(d7a6546b_DEFAULT)] Event d7a6546b_DEFAULT : Sending Message 803e30a5-3c3b-4cb2-9f2d-065ee9a78404 to Server_pinot-server-1.pinot-server.traceable.svc.cluster.local_8098 transit spanEventView_REALTIME.spanEventView__27__2251__20210331T0731Z|[] from:OFFLINE to:CONSUMING, relayMessages: 0
2021/03/31 07:31:40.937 INFO [MessageGenerationPhase] [HelixController-pipeline-default-traceable-views-(f69e3778_DEFAULT)] Event f69e3778_DEFAULT : Message already exists for Server_pinot-server-2.pinot-server.traceable.svc.cluster.local_8098 to transit spanEventView_REALTIME.spanEventView__27__2251__20210331T0731Z from OFFLINE to CONSUMING, isRelay: false
2021/03/31 07:31:40.937 INFO [MessageGenerationPhase] [HelixController-pipeline-default-traceable-views-(f69e3778_DEFAULT)] Event f69e3778_DEFAULT : Message already exists for Server_pinot-server-1.pinot-server.traceable.svc.cluster.local_8098 to transit spanEventView_REALTIME.spanEventView__27__2251__20210331T0731Z from OFFLINE to CONSUMING, isRelay: false
2021/03/31 07:31:41.892 INFO [MessageGenerationPhase] [HelixController-pipeline-default-traceable-views-(1de6a604_DEFAULT)] Event 1de6a604_DEFAULT : Message already exists for Server_pinot-server-2.pinot-server.traceable.svc.cluster.local_8098 to transit spanEventView_REALTIME.spanEventView__27__2251__20210331T0731Z from OFFLINE to CONSUMING, isRelay: false
Pinot server logs
Copy code
2021/03/31 07:31:39.730 INFO [HelixTaskExecutor] [ZkClient-EventThread-22-zookeeper-0.zookeeper-headless.traceable.svc.cluster.local:2181,zookeeper-1.zookeeper-headless.traceable.svc.cluster.local:2181,zookeeper-2.zookeeper-headless.traceable.svc.cluster.local:2181/pinot] Scheduling message 6d73e84e-fa4d-4987-b026-7cb93397b7a9: spanEventView_REALTIME:spanEventView__27__2251__20210331T0731Z, OFFLINE->CONSUMING
2021/03/31 07:31:39.730 INFO [HelixStateTransitionHandler] [HelixTaskExecutor-message_handle_thread] handling message: 6d73e84e-fa4d-4987-b026-7cb93397b7a9 transit spanEventView_REALTIME.spanEventView__27__2251__20210331T0731Z|[] from:OFFLINE to:CONSUMING, relayedFrom: null
2021/03/31 07:31:39.811 INFO [HelixStateTransitionHandler] [HelixTaskExecutor-message_handle_thread] Instance Server_pinot-server-2.pinot-server.traceable.svc.cluster.local_8098, partition spanEventView__27__2251__20210331T0731Z received state transition from OFFLINE to CONSUMING on session 377be7c4e47028e, message id: 6d73e84e-fa4d-4987-b026-7cb93397b7a9
2021/03/31 07:31:39.811 INFO [SegmentOnlineOfflineStateModelFactory$SegmentOnlineOfflineStateModel] [HelixTaskExecutor-message_handle_thread] SegmentOnlineOfflineStateModel.onBecomeConsumingFromOffline() : ZnRecord=6d73e84e-fa4d-4987-b026-7cb93397b7a9, {CREATE_TIMESTAMP=1617175899403, ClusterEventName=MessageChange, EXECUTE_START_TIMESTAMP=1617175899730, EXE_SESSION_ID=377be7c4e47028e, FROM_STATE=OFFLINE, MSG_ID=6d73e84e-fa4d-4987-b026-7cb93397b7a9, MSG_STATE=read, MSG_TYPE=STATE_TRANSITION, PARTITION_NAME=spanEventView__27__2251__20210331T0731Z, READ_TIMESTAMP=1617175899722, RESOURCE_NAME=spanEventView_REALTIME, RESOURCE_TAG=spanEventView_REALTIME, SRC_NAME=pinot-controller-1.pinot-controller.traceable.svc.cluster.local_9000, SRC_SESSION_ID=177be7a05650257, STATE_MODEL_DEF=SegmentOnlineOfflineStateModel, STATE_MODEL_FACTORY_NAME=DEFAULT, TGT_NAME=Server_pinot-server-2.pinot-server.traceable.svc.cluster.local_8098, TGT_SESSION_ID=377be7c4e47028e, TO_STATE=CONSUMING}{}{}, Stat=Stat {_version=0, _creationTime=1617175899694, _modifiedTime=1617175899694, _ephemeralOwner=0}
2021/03/31 07:31:39.811 INFO [SegmentOnlineOfflineStateModelFactory$SegmentOnlineOfflineStateModel] [HelixTaskExecutor-message_handle_thread] SegmentOnlineOfflineStateModel.onBecomeOnlineFromOffline() : ZnRecord=6d73e84e-fa4d-4987-b026-7cb93397b7a9, {CREATE_TIMESTAMP=1617175899403, ClusterEventName=MessageChange, EXECUTE_START_TIMESTAMP=1617175899730, EXE_SESSION_ID=377be7c4e47028e, FROM_STATE=OFFLINE, MSG_ID=6d73e84e-fa4d-4987-b026-7cb93397b7a9, MSG_STATE=read, MSG_TYPE=STATE_TRANSITION, PARTITION_NAME=spanEventView__27__2251__20210331T0731Z, READ_TIMESTAMP=1617175899722, RESOURCE_NAME=spanEventView_REALTIME, RESOURCE_TAG=spanEventView_REALTIME, SRC_NAME=pinot-controller-1.pinot-controller.traceable.svc.cluster.local_9000, SRC_SESSION_ID=177be7a05650257, STATE_MODEL_DEF=SegmentOnlineOfflineStateModel, STATE_MODEL_FACTORY_NAME=DEFAULT, TGT_NAME=Server_pinot-server-2.pinot-server.traceable.svc.cluster.local_8098, TGT_SESSION_ID=377be7c4e47028e, TO_STATE=CONSUMING}{}{}, Stat=Stat {_version=0, _creationTime=1617175899694, _modifiedTime=1617175899694, _ephemeralOwner=0}
2021/03/31 07:31:39.811 INFO [HelixInstanceDataManager] [HelixTaskExecutor-message_handle_thread] Adding segment: spanEventView__27__2251__20210331T0731Z to table: spanEventView_REALTIME
2021/03/31 07:31:39.825 INFO [LLRealtimeSegmentDataManager_spanEventView__27__2251__20210331T0731Z] [HelixTaskExecutor-message_handle_thread] RealtimeDataResourceZKMetadata contains no information about sorted column for segment spanEventView__27__2251__20210331T0731Z
The last log on the server looks like the problem to me. Does this mean a premature notification from the controller to the server, even before the actual metadata is committed to ZooKeeper? cc: @Xiang Fu
Updated this thread with more info. Any thoughts?
x
@Mayank this seems to be an issue that many people are facing
@Laxman Ch so is the segment committed and the new segment not created, or did the consuming segment's thread just get dropped while the segment is still in CONSUMING status in the ideal state?
l
From the above logs, the segment thread died while creating a new segment.
segment is still in consuming status in idealstates
I checked the external view and the server is missing in the list. I didn't check the ideal state. It's happening every few hours for us. Will check next time and provide details.
@Xiang Fu @Mayank: Please zoom into this log. I feel this should not happen.
Copy code
2021/03/31 07:31:39.825 INFO [LLRealtimeSegmentDataManager_spanEventView__27__2251__20210331T0731Z] [HelixTaskExecutor-message_handle_thread] RealtimeDataResourceZKMetadata contains no information about sorted column for segment spanEventView__27__2251__20210331T0731Z
This is the last log line from the segment consumption thread before it dies.
m
Reading the thread
Are there other log files created (e.g. hs_err.log) in the log directory?
l
No @Mayank. The JVM didn't crash. Only a few consumption threads stopped.
m
Is this a fair summary:
Copy code
1. Once in a while some servers stop consuming.
2. Segment is in consuming in ideal-state
3. Server is missing from external view.
4. Logs seem to suggest that an attempt to commit the segment may have caused this?
l
Made it a bit verbose to be clear. Sorry.
Copy code
1. Once in a while, some servers stop consuming some of the assigned partitions (say 2 out of 16 partitions of a table) from Kafka.
2. Segment is in consuming in ideal-state (DON'T KNOW. WILL PROVIDE THESE DETAILS WHEN IT HAPPENS AGAIN)
3. Server is missing from external view ONLY for the segments where the consumption thread dies.
4. Logs seem to suggest that an attempt to commit an OLD segment and start a NEW segment may have caused this.
External view of the segment where the consumption thread died on pinot-server-2:
Copy code
"spanEventView__27__2251__20210331T0731Z" : {
         "Server_pinot-server-1.pinot-server.traceable.svc.cluster.local_8098" : "CONSUMING"
}
External view of a segment where the consumption thread is running fine on pinot-server-2:
Copy code
"spanEventView__13__2255__20210331T0620Z" : {
         "Server_pinot-server-1.pinot-server.traceable.svc.cluster.local_8098" : "CONSUMING",
         "Server_pinot-server-2.pinot-server.traceable.svc.cluster.local_8098" : "CONSUMING"
}
x
oh, so it's not that all segments failed
Then when the segment on server-1 is sealed and goes online, server-2 should download it from the controller
l
Nope. That does not happen. This gets fixed only after a restart of pinot-server-2. If we do not restart, we slowly lose partitions and eventually the consuming-partition count on all servers comes down to zero.
2. Segment is in consuming in ideal-state (DON’T KNOW. WILL PROVIDE THESE DETAILS WHEN IT HAPPENS AGAIN)
Yes. All segments are in CONSUMING in the ideal state, but they are not in the external view. cc: @Mayank @Xiang Fu
j
@Laxman Ch How about the EV of the committed segment? E.g. if spanEventView__13__2255__* is the new consuming segment, what's the state of spanEventView__13__2254__*?
Also, how much resource is allocated to each Pinot server? I feel the server might be stuck at the segment creation step.
l
Found the root cause. This is possibly due to a bug in Groovy. From the thread dumps, we see all message handler threads slowly going into the following state and getting stuck there.
Copy code
"HelixTaskExecutor-message_handle_thread" #51 daemon prio=5 os_prio=0 cpu=70457.28ms elapsed=4885.80s tid=0x00007fe4e43d6000 nid=0x6e waiting for monitor entry  [0x00007fe4aa6e5000]
   java.lang.Thread.State: BLOCKED (on object monitor)
	at org.codehaus.groovy.reflection.ClassInfo$GlobalClassSet.add(ClassInfo.java:477)
	- waiting to lock <0x0000000702ed2218> (a org.codehaus.groovy.util.ManagedLinkedList)
	at org.codehaus.groovy.reflection.ClassInfo$1.computeValue(ClassInfo.java:83)
	at org.codehaus.groovy.reflection.ClassInfo$1.computeValue(ClassInfo.java:79)
	at org.codehaus.groovy.reflection.GroovyClassValuePreJava7$EntryWithValue.<init>(GroovyClassValuePreJava7.java:37)
	at org.codehaus.groovy.reflection.GroovyClassValuePreJava7$GroovyClassValuePreJava7Segment.createEntry(GroovyClassValuePreJava7.java:64)
	at org.codehaus.groovy.reflection.GroovyClassValuePreJava7$GroovyClassValuePreJava7Segment.createEntry(GroovyClassValuePreJava7.java:55)
	at org.codehaus.groovy.util.AbstractConcurrentMap$Segment.put(AbstractConcurrentMap.java:120)
	at org.codehaus.groovy.util.AbstractConcurrentMap$Segment.getOrPut(AbstractConcurrentMap.java:100)
	at org.codehaus.groovy.util.AbstractConcurrentMap.getOrPut(AbstractConcurrentMap.java:38)
	at org.codehaus.groovy.reflection.GroovyClassValuePreJava7.get(GroovyClassValuePreJava7.java:94)
	at org.codehaus.groovy.reflection.ClassInfo.getClassInfo(ClassInfo.java:144)
	at org.codehaus.groovy.runtime.metaclass.MetaClassRegistryImpl.getMetaClass(MetaClassRegistryImpl.java:258)
	at org.codehaus.groovy.runtime.InvokerHelper.getMetaClass(InvokerHelper.java:883)
	at groovy.lang.GroovyObjectSupport.<init>(GroovyObjectSupport.java:34)
	at groovy.lang.Script.<init>(Script.java:42)
	at Script1.<init>(Script1.groovy)
	at jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(java.base@11.0.10/Native Method)
	at jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(java.base@11.0.10/NativeConstructorAccessorImpl.java:62)
	at jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(java.base@11.0.10/DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(java.base@11.0.10/Constructor.java:490)
	at org.codehaus.groovy.runtime.InvokerHelper.createScript(InvokerHelper.java:431)
	at groovy.lang.GroovyShell.parse(GroovyShell.java:700)
	at groovy.lang.GroovyShell.parse(GroovyShell.java:736)
	at groovy.lang.GroovyShell.parse(GroovyShell.java:727)
	at org.apache.pinot.core.data.function.GroovyFunctionEvaluator.<init>(GroovyFunctionEvaluator.java:73)
	at org.apache.pinot.core.data.function.FunctionEvaluatorFactory.getExpressionEvaluator(FunctionEvaluatorFactory.java:91)
	at org.apache.pinot.core.data.function.FunctionEvaluatorFactory.getExpressionEvaluator(FunctionEvaluatorFactory.java:78)
	at org.apache.pinot.core.util.IngestionUtils.extractFieldsFromSchema(IngestionUtils.java:61)
	at org.apache.pinot.core.util.IngestionUtils.getFieldsForRecordExtractor(IngestionUtils.java:50)
	at org.apache.pinot.core.data.manager.realtime.LLRealtimeSegmentDataManager.<init>(LLRealtimeSegmentDataManager.java:1186)
	at org.apache.pinot.core.data.manager.realtime.RealtimeTableDataManager.addSegment(RealtimeTableDataManager.java:314)
	at org.apache.pinot.server.starter.helix.HelixInstanceDataManager.addRealtimeSegment(HelixInstanceDataManager.java:133)
	at org.apache.pinot.server.starter.helix.SegmentOnlineOfflineStateModelFactory$SegmentOnlineOfflineStateModel.onBecomeOnlineFromOffline(SegmentOnlineOfflineStateModelFactory.java:164)
	at org.apache.pinot.server.starter.helix.SegmentOnlineOfflineStateModelFactory$SegmentOnlineOfflineStateModel.onBecomeConsumingFromOffline(SegmentOnlineOfflineStateModelFactory.java:88)
	at jdk.internal.reflect.GeneratedMethodAccessor723.invoke(Unknown Source)
	at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(java.base@11.0.10/DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(java.base@11.0.10/Method.java:566)
	at org.apache.helix.messaging.handling.HelixStateTransitionHandler.invoke(HelixStateTransitionHandler.java:404)
	at org.apache.helix.messaging.handling.HelixStateTransitionHandler.handleMessage(HelixStateTransitionHandler.java:331)
	- locked <0x000000072defa200> (a org.apache.pinot.server.starter.helix.SegmentOnlineOfflineStateModelFactory$SegmentOnlineOfflineStateModel)
	at org.apache.helix.messaging.handling.HelixTask.call(HelixTask.java:97)
	at org.apache.helix.messaging.handling.HelixTask.call(HelixTask.java:49)
	at java.util.concurrent.FutureTask.run(java.base@11.0.10/FutureTask.java:264)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11.0.10/ThreadPoolExecutor.java:1128)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11.0.10/ThreadPoolExecutor.java:628)
	at java.lang.Thread.run(java.base@11.0.10/Thread.java:834)
Looks like this is the bug we are hitting in Pinot 0.6.0 (Groovy 2.4.8). We are trying the following:
• The workaround mentioned in GROOVY-8067
• Upgrading the Groovy library to 2.4.9
Will provide an update soon after today's test.
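(For illustration, a minimal reproduction sketch, assuming Groovy 2.4.8 on the classpath as shipped with Pinot 0.6.0. It exercises the same GroovyShell.parse path that the Helix message handlers go through via GroovyFunctionEvaluator; it will not necessarily block on every JDK/Groovy combination, but on affected ones threads pile up on the global lock in org.codehaus.groovy.reflection.ClassInfo, as in the thread dump above.)

```java
import groovy.lang.GroovyShell;

public class GroovyParseContention {
  public static void main(String[] args) throws InterruptedException {
    int numThreads = 16;
    Thread[] threads = new Thread[numThreads];
    for (int i = 0; i < numThreads; i++) {
      threads[i] = new Thread(() -> {
        for (int iteration = 0; iteration < 10_000; iteration++) {
          // Same call path as GroovyFunctionEvaluator: parse a tiny script into a Script class.
          new GroovyShell().parse("arg0 + arg1");
        }
      }, "fake-message_handle_thread-" + i);
      threads[i].start();
    }
    for (Thread thread : threads) {
      thread.join();
    }
    System.out.println("All parse loops finished without blocking indefinitely.");
  }
}
```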
x
thanks!
please file an issue or submit a PR 😛
l
Sure @Xiang Fu. I see the Groovy lib is already updated in 0.7. Will validate this on 0.7.1 as well.
For anyone else experiencing the same problem, watch out for the following combination in your setup:
• Pinot server thread dumps show threads stuck in the Groovy library
• You are using the map type in your schema (columns with __KEYS and __VALUES)
• You are using Pinot 0.6
We are running with the -Dgroovy.use.classvalue=true JVM argument as suggested in GROOVY-8067. The workaround seems to be working fine; no occurrence in the last 12 hours (the incident used to occur every 2-3 hours earlier). We will continue to use this as a workaround and upgrade to 0.7.1 as needed. @Xiang Fu @Mayank: fyi
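(For completeness, a small hypothetical sanity check, not part of Pinot, that can be run in the same JVM to confirm the GROOVY-8067 workaround flag was actually passed to the process.)

```java
public final class GroovyClassValueFlagCheck {
  public static void main(String[] args) {
    // Reads the system property set via -Dgroovy.use.classvalue=true on the server JVM.
    String value = System.getProperty("groovy.use.classvalue");
    System.out.println("groovy.use.classvalue = " + value);
    if (!"true".equals(value)) {
      System.out.println("WARNING: expected -Dgroovy.use.classvalue=true to be set on this JVM");
    }
  }
}
```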
m
Thanks @Laxman Ch for confirming
@Xiang Fu Is there a way to retroactively update the 0.6 release notes, or document it in a way that other users on that release also get notified?
x
we can update the release notes by adding a section on known issues and workarounds
s
@Laxman Ch this article suggests that the workaround is not recommended as it could lead to memory leaks - https://github.com/fabric8-services/fabric8-tenant-jenkins/issues/77
l
@Suraj: Verified the details of the above bug.
workaround is not recommended as it could lead to memory leaks
• GROOVY-7591 is fixed in 2.4.5. We are on 2.4.8.
• This is actually due to a JDK bug that got fixed in JDK 9.
x
hmm, have you tried the JDK 11 image?
r
l
• We use JDK 11 as Ravi mentioned.
• Verified the heap usage. There is no leak observed in the last 20 hours under production load with the above Groovy flag enabled.
Pinot server heap usage looks stable for 1 day after enabling the above-mentioned groovy.use.classvalue flag. We don't see any signs of a memory leak.
x
I see, so is there anything we need to fix from the Pinot side?
l
No fix is needed in 0.7 as the Groovy version is already upgraded there. If there are plans to maintain the 0.6 branch, we can fix it by upgrading Groovy. Not sure if we maintain/support old releases in the Pinot project.
m
We should at least document this on the 0.6 page, @Xiang Fu?
x
Or maybe FAQ
m
@Laxman Ch Would you be able to add it (since you have the most details)?
l
I can do that.
m
thank you