# hpc
  • t

    timbuk2

    01/28/2025, 3:46 PM
    Any other advice about how to utilize this machine to its full extent? I suppose I can try running on 72, 96, etc. cores and see if that shows any improvement over 64.
  • t

    timbuk2

    01/28/2025, 3:50 PM
    Also wondering whether there are ways to optimize the decomposition method, the NUMA nodes, or anything else.
  • j

    Jeratan

    01/28/2025, 3:52 PM
    i haven't spent much effort attempting, but did you try
    --bind-to numa
    ?
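A minimal sketch of what that launch line could look like, assuming Open MPI (other MPI implementations use different binding flags). The solver name is a placeholder and the rank count is the 64 mentioned above; `--map-by numa` and `--report-bindings` are additions for illustration, not something suggested in the thread.

```shell
#!/bin/sh
# Sketch only: Open MPI launch line with NUMA binding.
NP=64                 # rank count mentioned in the thread
SOLVER=simpleFoam     # hypothetical; substitute the solver actually in use

# --bind-to numa      confines each rank to the cores of one NUMA domain
# --map-by numa       distributes ranks round-robin across the NUMA domains
# --report-bindings   prints the resulting placement at startup
CMD="mpirun -np $NP --map-by numa --bind-to numa --report-bindings $SOLVER -parallel"

echo "$CMD"           # inspect the command first
# eval "$CMD"         # uncomment to launch from a decomposed case directory
```

Comparing wall-clock time per step with and without the binding flags, at the same rank count, is the simplest way to tell whether placement matters on this machine.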
  • t

    timbuk2

    01/28/2025, 4:01 PM
    Was just looking at the 4th gen release dates. Seems like those processors started coming out in Nov 2022, and we got the final bid on this machine around Dec 2023. We were requesting they optimize it for OpenFOAM... we should have included some better language about benchmarking.
  • t

    timbuk2

    01/28/2025, 4:02 PM
    I tried something similar, but I'll try that too, thanks
  • j

    Jeratan

    01/28/2025, 4:18 PM
    yep, even if they were limited to 3rd gen, the cheaper 7573X processors would have been a better bet
  • j

    Jeratan

    01/28/2025, 4:23 PM
    you can also try turning off SMT but in my experience it didn't change much and made other applications slower
  • u

    ⵣAryazⵣ

    02/02/2025, 12:25 PM
    We have an overset simulation that runs perfectly fine on a local machine in parallel with 6 CPUs. However, on the cluster with more CPUs (e.g., 64), it runs fine for the first few time steps but then gets stuck. Slurm reports that the job is still running, but the simulation output stops. I checked the output before the last line and found nothing suspicious. I also tried activating multiple debug flags, but to no avail. I tried to increase the RAM for each CPU
    --mem-per-cpu=3GB
    but that doesn't help. Any ideas on how to tackle this (without compiling OpenFOAM in debug mode, etc.)?
  • t

    tkeskita

    02/02/2025, 12:46 PM
    Write after every step to get the fields from just before the forever-loop, and see what's wrong in there?
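One way to switch a case to per-step output is sketched below, assuming the `foamDictionary` utility from a sourced OpenFOAM environment is available and the script runs from the case directory. The `purgeWrite` value of 5 is an arbitrary assumption to keep disk usage bounded; it is not from the thread.

```shell
#!/bin/sh
# Sketch: write fields every time step and keep only the most recent ones.
DICT=system/controlDict

if command -v foamDictionary >/dev/null 2>&1 && [ -f "$DICT" ]; then
    foamDictionary "$DICT" -entry writeControl  -set timeStep
    foamDictionary "$DICT" -entry writeInterval -set 1
    foamDictionary "$DICT" -entry purgeWrite    -set 5   # assumed: keep the last 5 written times
else
    echo "foamDictionary or $DICT not found; run from a case directory with OpenFOAM sourced" >&2
fi
```

The same three entries can of course be edited by hand in `system/controlDict`.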
  • t

    tkeskita

    02/02/2025, 12:48 PM
    How could it be caught in a forever-loop.. iterations have a max limit typically..?
  • u

    ⵣAryazⵣ

    02/02/2025, 12:48 PM
    the output is similar to this post: https://www.cfd-online.com/Forums/openfoam-community-contributions/228713-solved-wavedymfoam-stuck-after-calculating-motion-body.html
  • u

    ⵣAryazⵣ

    02/02/2025, 12:50 PM
    I have these debug switches in system/controlDict:
    DebugSwitches
    {
        sixDoFRigidBodyMotionConstraint 3;
        overset                         3;
        lduMatrix                       3;
        SolverPerformance               3;
        inverseDistance                 3;
        inverseFaceDistance             3;
        inversePointDistance            3;
        inverseVolume                   3;
        vanLeer                         3;
    }
  • u

    ⵣAryazⵣ

    02/02/2025, 12:54 PM
    @tkeskita: FWIW the case isn't mine, so I have limited access to it. So far I've recommended activating those flags, but finding the appropriate one (if any) is tricky.
  • u

    ⵣAryazⵣ

    02/02/2025, 12:55 PM
    I would like to know about a tool/utility to see what's happening exactly on the cluster.
  • u

    ⵣAryazⵣ

    02/02/2025, 1:00 PM
    But I agree with your recommendation
  • u

    ⵣAryazⵣ

    02/02/2025, 1:00 PM
    It seems that there is some infinite loop involved here
  • t

    tkeskita

    02/02/2025, 1:03 PM
    If you can't ssh to the compute node, then maybe some kind of post-mortem analysis. Maybe limit job run time in Slurm and try to catch error messages about what it was doing when it was killed? 🤔
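A sketch of such a time-limited reproduction job follows. It only writes the job script to disk; the file name, the 30-minute limit, and the solver name `mySolver` are all assumptions for illustration, and `srun` as the launcher assumes the MPI library was built with Slurm support (otherwise `mpirun` would take its place).

```shell
#!/bin/sh
# Sketch: generate a short Slurm job script that is killed soon after the hang appears.
cat > hang_repro.sbatch <<'EOF'
#!/bin/bash
#SBATCH --job-name=hang_repro
#SBATCH --ntasks=64
#SBATCH --mem-per-cpu=3G
#SBATCH --time=00:30:00        # assumed: just long enough to reach the stuck time step
#SBATCH --output=log.%j.out    # stdout, %j expands to the job ID
#SBATCH --error=log.%j.err     # stderr, kept separate so late messages are easy to spot

srun mySolver -parallel        # mySolver is a placeholder for the overset solver in use
EOF

echo "wrote hang_repro.sbatch; submit with: sbatch hang_repro.sbatch"
```

Keeping stderr in its own file makes it easier to see whether anything at all is printed when Slurm terminates the job at the limit.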
  • u

    ⵣAryazⵣ

    02/02/2025, 1:03 PM
    It is stuck at this output (copied from the link above because it is similar):
    6-DoF rigid body motion
        Centre of rotation: (0 0 -0.000748691)
        Centre of mass: (0 0 -0.000748691)
        Orientation: (1 0 0 0 1 0 0 0 1)
        Linear velocity: (0 0 -0.0381259)
        Angular velocity: (0 0 0)
  • t

    tkeskita

    02/02/2025, 1:04 PM
    No I mean if it would print some kind of stack trace.. but maybe it won't
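If the compute node can be reached while the job hangs, a stack trace can also be pulled from the running processes. The sketch below uses `gdb` and `pgrep`, which is a common approach but not one proposed in the thread; it assumes both tools are installed on the node and that the system allows attaching to your own processes. The function name `dump_stacks` is made up for this example.

```shell
#!/bin/sh
# Sketch: save a backtrace of every process of the current user whose
# command line matches a pattern. Run on the compute node during the hang.
dump_stacks() {
    pattern=$1
    # pgrep -f matches the full command line, so the launcher (mpirun/srun)
    # may be listed too; its trace is harmless but rarely interesting.
    for pid in $(pgrep -u "$(id -un)" -f "$pattern"); do
        [ "$pid" -eq "$$" ] && continue      # skip this script itself
        echo "=== PID $pid ==="
        gdb -p "$pid" -batch -ex "thread apply all bt" > "bt.$pid.txt" 2>&1
    done
}

# Usage: sh dump_stacks.sh <solver-name>
if [ $# -gt 0 ]; then
    dump_stacks "$1"
fi
```

In a deadlock, most ranks typically sit in the same MPI call while one or a few are somewhere else; the odd ones out are where to look first.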
  • u

    ⵣAryazⵣ

    02/02/2025, 1:05 PM
    the stderr is empty
  • t

    tkeskita

    02/02/2025, 1:08 PM
    No help then. Maybe try another decomposition method or number of processors? This is going to be like stabbing in the dark though...
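Changing the decomposition can be scripted roughly as follows, again assuming `foamDictionary` and `decomposePar` from a sourced OpenFOAM environment. The method `scotch` and the count of 32 are arbitrary choices for the experiment; the thread does not say which method the failing case uses.

```shell
#!/bin/sh
# Sketch: re-decompose the case with a different method and rank count.
DICT=system/decomposeParDict
METHOD=scotch     # assumed alternative; simple and hierarchical also exist but need coefficient entries
NPROCS=32         # assumed; any count other than the failing 64

if command -v foamDictionary >/dev/null 2>&1 && [ -f "$DICT" ]; then
    foamDictionary "$DICT" -entry method             -set "$METHOD"
    foamDictionary "$DICT" -entry numberOfSubdomains -set "$NPROCS"
    decomposePar -force   # -force removes the existing processor* directories first
else
    echo "foamDictionary or $DICT not found; run from a case directory with OpenFOAM sourced" >&2
fi
```

The rank count in the job script has to be changed to match `numberOfSubdomains` before resubmitting.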
  • u

    ⵣAryazⵣ

    02/02/2025, 1:09 PM
    Yeah, let's see. I will report any progress.
  • u

    ⵣAryazⵣ

    02/02/2025, 7:22 PM
    On the cluster, when the simulation is run with only 6 CPUs with
    --mem-per-cpu=5GB
    it works just fine, like the local machine.
  • g

    Gry Llida 🦗

    02/04/2025, 6:06 AM
    i like linux simply because of better multiuser support
  • s

    silentspirit

    04/10/2025, 5:36 PM
    I ran a simulation of a wind turbine on HPC with different numbers of processors, and the results I am getting are different for every processor count. Any idea why this is occurring?
  • s

    schrummy

    04/16/2025, 3:26 PM
    @silentspirit, what is the magnitude of this difference? You will get different results because you are changing the underlying linear equations of your system as you change the number of decompositions.
  • s

    silentspirit

    04/17/2025, 10:52 AM
    Almost 30-40% difference in values
  • s

    schrummy

    04/20/2025, 12:25 AM
    Is the geometry fairly complex? I have seen simulations that run stable on one core configuration but explode when using a different one.
  • t

    Tha_Hobbist

    04/29/2025, 10:33 AM
    I have some queries about how to install OpenFOAM on a cluster that is built using OpenHPC
  • t

    Tha_Hobbist

    04/29/2025, 10:33 AM
    Can anyone guide me to some documents that I can refer to?