# hpc
  • u

    ⵣAryazⵣ

    02/02/2025, 12:54 PM
    @tkeskita: FWIW the case isn't mine, so I have limited access to it. So far I have recommended activating those flags, but finding the appropriate one (if any) is tricky.
  • u

    ⵣAryazⵣ

    02/02/2025, 12:55 PM
    I would like to know about a tool/utility to see exactly what's happening on the cluster.
  • u

    ⵣAryazⵣ

    02/02/2025, 1:00 PM
    But I agree with your recommendation
  • u

    ⵣAryazⵣ

    02/02/2025, 1:00 PM
    It seems that there is some infinite loop involved here
  • t

    tkeskita

    02/02/2025, 1:03 PM
    If you can't ssh to the compute node, then maybe some kind of post-mortem analysis. Maybe limit job run time in Slurm and try to catch error messages about what it was doing when it was killed? 🤔
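    A minimal sbatch sketch of that idea; --time, --signal, --output and --error are standard Slurm options, while the solver and log-file names (interFoam, log.interFoam) are only placeholders for whatever the case actually runs:

    #!/bin/bash
    #SBATCH --time=00:30:00          # hard wall-time limit so a hung job gets killed
    #SBATCH --signal=B:USR1@120      # signal the batch script 2 minutes before the limit
    #SBATCH --output=foam-%j.out     # stdout per job ID
    #SBATCH --error=foam-%j.err      # stderr per job ID

    # When the warning signal arrives, dump the last lines the solver printed.
    trap 'echo "Approaching time limit, last solver output:"; tail -n 50 log.interFoam' USR1

    srun interFoam -parallel > log.interFoam 2>&1 &
    wait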
  • u

    ⵣAryazⵣ

    02/02/2025, 1:03 PM
    It is stuck at this output (copied from the link above because it is similar):
    6-DoF rigid body motion
        Centre of rotation: (0 0 -0.000748691)
        Centre of mass: (0 0 -0.000748691)
        Orientation: (1 0 0 0 1 0 0 0 1)
        Linear velocity: (0 0 -0.0381259)
        Angular velocity: (0 0 0)
  • t

    tkeskita

    02/02/2025, 1:04 PM
    No, I mean whether it would print some kind of stack trace... but maybe it won't
  • u

    ⵣAryazⵣ

    02/02/2025, 1:05 PM
    the stderr is empty
  • t

    tkeskita

    02/02/2025, 1:08 PM
    No help then. Maybe try another decomposition method or number of processors? This is going to be like stabbing in the dark though...
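    A quick way to test a different decomposition, assuming the standard OpenFOAM utilities foamDictionary and decomposePar are available in the environment; the case path and subdomain count below are placeholders:

    # Switch the decomposition method and processor count, then re-decompose the case.
    cd /path/to/case
    foamDictionary system/decomposeParDict -entry numberOfSubdomains -set 8
    foamDictionary system/decomposeParDict -entry method -set scotch
    decomposePar -force      # -force overwrites existing processor* directories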
  • u

    ⵣAryazⵣ

    02/02/2025, 1:09 PM
    Yeah, let's see. I will report any progress in the future.
  • u

    ⵣAryazⵣ

    02/02/2025, 7:22 PM
    On the cluster, when the simulation is run with only 6 CPUs with --mem-per-cpu=5GB, it works just fine, like on the local machine.
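    For reference, the working configuration would look roughly like this in a job script; the solver name and wall time are placeholders, only the task count and per-CPU memory come from the message above:

    #!/bin/bash
    #SBATCH --ntasks=6               # the 6-CPU configuration that ran cleanly
    #SBATCH --mem-per-cpu=5GB        # per-rank memory as quoted above
    #SBATCH --time=24:00:00          # placeholder wall time

    srun interFoam -parallel > log.interFoam 2>&1    # hypothetical solver name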
  • g

    Gry Llida 🦗

    02/04/2025, 6:06 AM
    I like Linux simply because of its better multi-user support
  • s

    silentspirit

    04/10/2025, 5:36 PM
    I ran a simulation of a wind turbine on HPC for different numbers of processors, and the results I am getting are different for every selected number of processors. Any idea why this is happening?
  • s

    schrummy

    04/16/2025, 3:26 PM
    @silentspirit, what is the magnitude of this difference? You will get different results because changing the number of decompositions changes the underlying linear systems being solved.
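    One rough check of whether that spread is purely numerical is to tighten the linear-solver tolerances and see if results from different decompositions move closer together; this is only a sketch, the p and U entries are the usual field names and may differ in the actual fvSolution:

    # Tighten convergence tolerances (values are only examples).
    foamDictionary system/fvSolution -entry solvers/p/tolerance -set 1e-08
    foamDictionary system/fvSolution -entry solvers/U/tolerance -set 1e-08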
  • s

    silentspirit

    04/17/2025, 10:52 AM
    Almost 30-40% difference in values
  • s

    schrummy

    04/20/2025, 12:25 AM
    Is the geometry fairly complex? I have seen simulations that run stable on one core configuration but explode when using a different one.
  • t

    Tha_Hobbist

    04/29/2025, 10:33 AM
    I have some queries about how to install OpenFOAM on a cluster that is built using OpenHPC
  • t

    Tha_Hobbist

    04/29/2025, 10:33 AM
    Can anyone guide me to some documents that I can refer to?
  • i

    ilovekiruna

    05/25/2025, 4:15 PM
    @Tha_Hobbist we generally use the spack package manager to install our software (spack.io)
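    A minimal Spack workflow along those lines, assuming Spack still needs to be bootstrapped on the cluster; the package names openfoam (ESI/OpenCFD) and openfoam-org (Foundation) exist in Spack, while the MPI provider below is only an example:

    # Bootstrap Spack and install OpenFOAM against a chosen MPI.
    git clone https://github.com/spack/spack.git
    . spack/share/spack/setup-env.sh
    spack install openfoam ^openmpi      # or: spack install openfoam-org
    spack load openfoam                  # puts the solvers on PATH for the current shell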
  • m

    muehhlllerr

    05/28/2025, 3:06 PM
    I have a simulation with an overset mesh. A simplified version of it, with only one overset zone, runs in these configurations: non-parallel, and parallel on a single node. If I expand it by adding another overset zone, it no longer runs in parallel at all, only non-parallel. It crashes with a malloc_consolidate(): unaligned fastbin chunk detected error. I guess it is either an MPI problem or a problem with the decomposition, for which I use hierarchical, as recommended for overset cases. OpenFOAM v2406 is compiled with the system MPI and not the third-party one. Does anybody have any tips/insights? I have not found any information about this malloc error in connection with OpenFOAM.
  • m

    muehhlllerr

    05/28/2025, 3:23 PM
    It runs in parallel on my local machine, so I am quite sure it is an MPI problem
  • s

    slopezcastano

    06/03/2025, 11:41 AM
    It is not an MPI error, but a METIS error
  • s

    slopezcastano

    06/03/2025, 11:42 AM
    Have you re-compiled METIS or updated libc.so lately?
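    A quick way to check what the METIS decomposition library is actually linked against; the library name and the FOAM_LIBBIN variable are typical for OpenFOAM installs but may differ on a given cluster:

    # Show which METIS and libc the decomposition plugin resolves to at run time.
    ldd "$FOAM_LIBBIN/libmetisDecomp.so" | grep -Ei 'metis|libc'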
  • s

    slopezcastano

    06/03/2025, 11:53 AM
    Try using another partitioning algorithm, like Scotch or just hierarchical, to test
  • m

    muehhlllerr

    06/05/2025, 7:45 AM
    Thank you for the advice, I will look into it
  • m

    muehhlllerr

    06/05/2025, 7:46 AM
    No not that I know of
  • s

    slopezcastano

    06/05/2025, 1:19 PM
    Or has the HPC system migrated to newer (gcc > 11) compilers?
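    A quick check of the toolchain the case is actually built and run with; WM_COMPILER and WM_MPLIB are OpenFOAM's own environment variables, the rest is generic:

    gcc --version | head -n 1            # compiler currently on PATH
    mpirun --version | head -n 1         # MPI currently on PATH
    echo "$WM_COMPILER $WM_MPLIB"        # compiler and MPI OpenFOAM was configured for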
  • m

    muehhlllerr

    06/05/2025, 1:58 PM
    Nothing of that sort, the setup hasn't been changed for quite a while, and the problem only occurs with an overset mesh
  • m

    muehhlllerr

    06/05/2025, 2:00 PM
    And not with every setup either, just with some, which is weird. It tends to appear more with the leastSquares mapping instead of interpolation, so maybe it occurs when certain cells are "linked" together across processor boundaries
  • m

    muehhlllerr

    06/05/2025, 2:00 PM
    I'll investigate it when I get the next slot