# troubleshoot
steep-alligator-93593
Hi everyone, my pod for datahub-gms is constantly getting full; I get an error in my kube pod saying:
```
The node was low on resource: ephemeral-storage. Container datahub-gms was using 8220940Ki, which exceeds its request of 0.
Pod The node had condition: [DiskPressure].
```
Any ideas on where gms stores its data? And why this could be happening?
@astonishing-answer-96712 @brainy-tent-14503
brainy-tent-14503
`ephemeral-storage` is not something that is being set by our helm charts by default, so it is likely inheriting a default from your namespace or something like that. You could populate values for your environment for this resource here. It depends on the activity, but 1-2Gi should probably be enough.
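For reference, a minimal sketch of how that could look in the values file, assuming the chart passes the datahub-gms `resources` block straight through to the container spec (the sizes here are illustrative only):
```yaml
datahub-gms:
  resources:
    requests:
      # reserve some scratch space so the pod is not evicted with "exceeds its request of 0"
      ephemeral-storage: 1Gi
    limits:
      ephemeral-storage: 2Gi
```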
steep-alligator-93593
@brainy-tent-14503 it is running out of space in both the `/` and `/etc/hosts` mount paths, and both of those are 20GB, so by the nature of the application, what is being stored there?
astonishing-answer-96712
Definitely not writing to a file called `/etc/hosts`; the only place I can think of is the logs in `/tmp/datahub/logs/gms`.
steep-alligator-93593
Hmm, let me log into the pod now and check.
Also, there is no `gms-log-cleaner`?
Looks like the logs only account for 6.2G; not sure where the rest is coming from.
astonishing-answer-96712
The logs rotate and should not grow indefinitely, so there is generally no need to clean them. The various logging is defined here.
steep-alligator-93593
@brainy-tent-14503 Hmm, do you know what else could potentially be taking up storage?
astonishing-answer-96712
It should be possible to create your own logging configuration, mount it in your image, and use the JVM options to pass in the parameter to use your version:
```
-Dlogback.configurationFile=file:/path/to/file
```
That's it, just the logs.
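A hedged sketch of how that could be wired up through the chart: ship your own `logback.xml` in a ConfigMap and pass the system property via `JAVA_TOOL_OPTIONS`, which the JVM itself picks up. This assumes the chart exposes `extraEnvs`, `extraVolumes`, and `extraVolumeMounts` for datahub-gms; the ConfigMap name and mount path are illustrative:
```yaml
datahub-gms:
  extraEnvs:
    # standard JVM env var, picked up without touching the image's start script
    - name: JAVA_TOOL_OPTIONS
      value: "-Dlogback.configurationFile=file:/custom-logback/logback.xml"
  extraVolumes:
    - name: gms-logback
      configMap:
        name: gms-logback   # hypothetical ConfigMap containing your logback.xml
  extraVolumeMounts:
    - name: gms-logback
      mountPath: /custom-logback
```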
steep-alligator-93593
Thank you!
@brainy-tent-14503 would you suggest adding another volume and mounting it on the same path?
brainy-tent-14503
A PVC/PV mounted at `/tmp` in the pod would work. Alternatively, increasing the *node*'s root volume and not using a PVC/PV at all is also an option.
steep-alligator-93593
How would I add that to the values file? Would it be extraVolumeMounts and then adding the path as "/tmp"?
astonishing-answer-96712
First make sure your cluster supports persistent volumes and has an appropriate CSI driver. Assuming you want to use dynamic provisioning, you would create a PersistentVolumeClaim (docs). Then point the PVC to the pod's filesystem location (docs), configured in helm here and here.
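For example, a minimal dynamically provisioned claim could look like the sketch below; the claim name matches the one used later in this thread, while the size and the commented-out storage class are placeholders for whatever your cluster provides:
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: edm-datahub-gms
spec:
  accessModes:
    - ReadWriteOnce
  # storageClassName: gp3   # set to a class backed by your CSI driver if not using the default
  resources:
    requests:
      storage: 20Gi
```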
steep-alligator-93593
@brainy-tent-14503 this is an error I see from my pod once I checked the logs:
```
2023-06-07 01:18:25.008:WARN:oejw.WebAppContext:main: Failed startup of context o.e.j.w.WebAppContext@4f80542f{/,null,STOPPED}{file:///datahub/datahub-gms/bin/war.war}
java.lang.IllegalStateException: Parent for temp dir not configured correctly: writeable=false
```
I elected to add a PVC.
astonishing-answer-96712
Try setting the fsGroup in this setting to the datahub group ID of 101.
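In the values file that would look roughly like the following, assuming `podSecurityContext` is passed through to the pod spec; 101 is the datahub group ID mentioned above:
```yaml
datahub-gms:
  podSecurityContext:
    fsGroup: 101   # makes the mounted volume writable by the datahub group
```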
steep-alligator-93593
@brainy-tent-14503 we are throwing a lot at the gms pod itself. Besides a PV, what would be your suggestion for managing the logs and ensuring they don't kill the pod each time? We ran our stuff overnight for about 14 hrs and the logs have already taken up about 16GB; is there any way to mitigate this?
Is there any way to tune the logs?
@brainy-tent-14503 sorry for the multiple messages, but I am not 100% sure the disk space issue is attributable to the logs in the datahub-gms pod. This is the current mounts; however, in the other screenshot, when I get the space used in the same directory as the logs, it only says it is utilizing 2.8G, so I just want to clarify that it is indeed the logs.
brainy-tent-14503
GMS is only writing logs to disk, and those logs are rotated so as not to exceed 10GB.
The `overlay` is not specific to the pod; it is specific to the host.
You should be able to execute `du -sh /` from inside the pod to see what is actually used by the running pod at that mount point `/`, I believe.
steep-alligator-93593
Okay, I ran that command and this is what I got.
brainy-tent-14503
I do not believe that the GMS pod, at least, is the source of your disk space issues. Perhaps one of the other pods, like the actions pod?
steep-alligator-93593
Hmm.
This is from the actions pod.
Is it possible to mount the PV to the `/` path just so I can give all the pods more storage?
Also, if the gms pod is not the source of the issues, then why does that pod always get evicted?
astonishing-answer-96712
I am not sure what the rules are for eviction; it might be based on the most recent writes or something. The root volume could be a PVC; however, you would have to add an init container to mount the PVC early and copy all the files over. Otherwise the empty PVC would be mounted as the root filesystem and then nothing would work. This is not really commonly done, though, see SO. The typical solution is to increase the root volume on the nodes in your node group or node group's template.
flaky-painting-93216
Hi @steep-alligator-93593, even though this is a bit of an old thread, out of curiosity, did you get the disk pressure issue solved? We are having the same problem now, and it is making our DataHub quite unstable. It would be interesting to hear how it turned out for you.
steep-alligator-93593
Hey there @flaky-painting-93216, I was able to get the disk pressure issue solved by mounting an additional volume via a PersistentVolumeClaim. More specifically, I mounted it on the `/tmp` directory via the `values.yaml`. Here's a snippet of my `values.yaml` below:
```yaml
datahub-gms:
  enabled: true
  image:
    repository: linkedin/datahub-gms
    # tag: "v0.10.0"  # defaults to .global.datahub.version
  resources:
    limits:
      memory: 3Gi
    requests:
      cpu: 100m
      memory: 2Gi
  service:
    type: ClusterIP
  extraVolumes:
    - name: data
      persistentVolumeClaim:
        claimName: edm-datahub-gms
  extraVolumeMounts:
    - mountPath: /tmp
      name: data
  podSecurityContext:
    fsGroup: 101
```
flaky-painting-93216
Thanks @steep-alligator-93593! We will probably go with the same if we are not able to find another way to reduce disk usage 👍