# troubleshoot
steep-alligator-93593
Hi everyone, my pod for datahub-gms is constantly getting full; I get an error in my kube pod saying:
```
The node was low on resource: ephemeral-storage. Container datahub-gms was using 8220940Ki, which exceeds its request of 0.
Pod The node had condition: [DiskPressure].
```
Any ideas on where gms stores its data? And why this could be happening?
@astonishing-answer-96712 @brainy-tent-14503
brainy-tent-14503
`ephemeral-storage` is not something that is being set by our helm charts by default, so it is likely inheriting a default from your namespace or something like that. You could populate values for your environment for this resource here. It depends on the activity, but 1-2Gi should probably be enough.
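For reference, a minimal sketch of how that could look in the values file, assuming the chart passes the datahub-gms `resources` block straight through to the container spec (the sizes here are illustrative only):
```yaml
datahub-gms:
  resources:
    requests:
      # reserve some scratch space so the pod is not evicted with "exceeds its request of 0"
      ephemeral-storage: 1Gi
    limits:
      ephemeral-storage: 2Gi
```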
steep-alligator-93593
@brainy-tent-14503 it is running out of space in both the `/` and `/etc/hosts` mount paths, and both of those are 20GB, so by the nature of the application, what is being stored there?
astonishing-answer-96712
Definitely not writing to a file called `/etc/hosts`; the only place I can think of is the logs in `/tmp/datahub/logs/gms`.
steep-alligator-93593
Hmm, let me log into the pod now and check.
Also, there is no `gms-log-cleaner`?
Looks like the logs only account for 6.2G; not sure where the rest is coming from.
astonishing-answer-96712
The logs rotate and should not grow indefinitely, so there is generally no need to clean them. The various logging is defined here.
steep-alligator-93593
@brainy-tent-14503 Hmm, do you know what else could potentially be taking up storage?
astonishing-answer-96712
It should be possible to create your own logging configuration, mount it in your image, and use the JVM options to pass in the parameter to use your version:
```
-Dlogback.configurationFile=file:/path/to/file
```
That's it, just the logs.
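A hedged sketch of how that could be wired up through the chart: ship your own `logback.xml` in a ConfigMap and pass the system property via `JAVA_TOOL_OPTIONS`, which the JVM itself picks up. This assumes the chart exposes `extraEnvs`, `extraVolumes`, and `extraVolumeMounts` for datahub-gms; the ConfigMap name and mount path are illustrative:
```yaml
datahub-gms:
  extraEnvs:
    # standard JVM env var, picked up without touching the image's start script
    - name: JAVA_TOOL_OPTIONS
      value: "-Dlogback.configurationFile=file:/custom-logback/logback.xml"
  extraVolumes:
    - name: gms-logback
      configMap:
        name: gms-logback   # hypothetical ConfigMap containing your logback.xml
  extraVolumeMounts:
    - name: gms-logback
      mountPath: /custom-logback
```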
steep-alligator-93593
Thank you!
@brainy-tent-14503 would you suggest adding another volume and mounting it on the same path?
brainy-tent-14503
A PVC/PV mounted at `/tmp` in the pod would work. Alternatively, increasing the *node*'s root volume and not using a PVC/PV at all is also an option.
steep-alligator-93593
How would I add that to the values file? Would it be extraVolumeMounts and then adding the path as "/tmp"?
astonishing-answer-96712
First make sure your cluster supports persistent volumes and has an appropriate CSI driver. Assuming you want to use dynamic provisioning, you would create a PersistentVolumeClaim (docs). Then point the PVC to the pod's filesystem location (docs), configured in helm here and here.
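For example, a minimal dynamically provisioned claim could look like the sketch below; the claim name matches the one used later in this thread, while the size and the commented-out storage class are placeholders for whatever your cluster provides:
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: edm-datahub-gms
spec:
  accessModes:
    - ReadWriteOnce
  # storageClassName: gp3   # set to a class backed by your CSI driver if not using the default
  resources:
    requests:
      storage: 20Gi
```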
steep-alligator-93593
@brainy-tent-14503 this is an error I see from my pod once I checked the logs:
```
2023-06-07 01:18:25.008:WARN:oejw.WebAppContext:main: Failed startup of context o.e.j.w.WebAppContext@4f80542f{/,null,STOPPED}{file:///datahub/datahub-gms/bin/war.war}
java.lang.IllegalStateException: Parent for temp dir not configured correctly: writeable=false
```
I elected to add a PVC.
astonishing-answer-96712
Try setting the fsGroup in this setting to the datahub group ID of 101.
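In the values file that would look roughly like the following, assuming `podSecurityContext` is passed through to the pod spec; 101 is the datahub group ID mentioned above:
```yaml
datahub-gms:
  podSecurityContext:
    fsGroup: 101   # makes the mounted volume writable by the datahub group
```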
steep-alligator-93593
@brainy-tent-14503 we are throwing a lot at the gms pod itself. Besides a PV, what would be your suggestion for managing the logs and ensuring they don't kill the pod each time? We ran our stuff overnight for about 14 hrs and the logs have already taken up about 16GB; is there any way to mitigate this?
Is there any way to tune the logs?
@brainy-tent-14503 sorry for the multiple messages, but I am not 100% sure the disk space issue is attributable to the logs in the datahub-gms pod. This is the current mounts; however, in the other screenshot, when I get the space used in the same directory as the logs, it only says it is utilizing 2.8G, so I just want to clarify that it is indeed the logs.
brainy-tent-14503
GMS is only writing logs to disk, and those logs are rotated so as not to exceed 10GB.
The `overlay` is not specific to the pod; it is specific to the host.
You should be able to execute `du -sh /` from inside the pod to see what is actually used by the running pod at that mount point `/`, I believe.
steep-alligator-93593
Okay, I ran that command and this is what I got.
brainy-tent-14503
I do not believe that the GMS pod, at least, is the source of your disk space issues. Perhaps one of the other pods, like the actions pod?
steep-alligator-93593
Hmm.
This is from the actions pod.
Is it possible to mount the PV to the `/` path just so I can give all the pods more storage?
Also, if the gms pod is not the source of the issues, then why does that pod always get evicted?
astonishing-answer-96712
I am not sure what the rules are for eviction; it might be based on the most recent writes or something. The root volume could be a PVC; however, you would have to add an init container to mount the PVC early and copy all the files over. Otherwise the empty PVC would be mounted as the root filesystem and then nothing would work. This is not really commonly done, though, see SO. The typical solution is to increase the root volume on the nodes in your node group or node group's template.
flaky-painting-93216
Hi @steep-alligator-93593, even though this is a bit of an old thread, out of curiosity, did you get the disk pressure issue solved? We are having the same problem now, and it is making our DataHub quite unstable. It would be interesting to hear how it turned out for you.
steep-alligator-93593
Hey there @flaky-painting-93216, I was able to get the disk pressure issue solved by mounting an additional volume via a PersistentVolumeClaim. More specifically, I mounted it on the `/tmp` directory via the `values.yaml`. Here's a snippet of my `values.yaml` below:
```yaml
datahub-gms:
  enabled: true
  image:
    repository: linkedin/datahub-gms
    # tag: "v0.10.0"  # defaults to .global.datahub.version
  resources:
    limits:
      memory: 3Gi
    requests:
      cpu: 100m
      memory: 2Gi
  service:
    type: ClusterIP
  extraVolumes:
    - name: data
      persistentVolumeClaim:
        claimName: edm-datahub-gms
  extraVolumeMounts:
    - mountPath: /tmp
      name: data
  podSecurityContext:
    fsGroup: 101
```
flaky-painting-93216
Thanks @steep-alligator-93593! We will probably go with the same if we are not able to find another way to reduce disk usage 👍