Avoiding Out of Memory (OOM) collateral damage in Torque

Author: L.S.Lowe. File: oomtorque. This update: 20120101. Part of Guide to the Local System.

Out-of-memory conditions are an occasional problem when running user code on worker machines, such as in Torque batch jobs. Difficulties arise if the kernel OOM killer chooses to kill processes other than those that caused the out-of-memory condition: for example the sshd, syslog, acpid, or crond daemons, or even the main pbs_mom process itself! In that case, you can end up with an apparently unresponsive node, for which power-cycling is the only remedy.

This problem can be controlled to some extent by applying Torque memory resource limits, but this doesn't solve all such problems in my experience.

There are various ways of protecting system processes from the OOM killer, but my preferred administrator solution is to add a couple of lines to the Torque prologue shell script. These lines adjust the OOM setting of the prologue's parent process, so that dependent user processes become more susceptible to being killed than system processes. The prologue is run (under the root ID) when each job starts. The parent of the prologue is a daughter of the main pbs_mom process, and it goes on to invoke the user job, so all user job processes inherit the adjusted OOM setting.

  # Make this user job more likely to be chosen by oom-killer than system processes
  oom=/proc/$PPID/oom_score_adj
  [ -w $oom ] && echo 500 > $oom
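
One way to check that the adjustment really reaches user processes is to read the value back from inside a running job. The following is only a sketch: show_oom is a made-up helper name, not a Torque or kernel command, and it does nothing more than read the standard /proc/<pid>/oom_score_adj and /proc/<pid>/oom_score files.

```shell
#!/bin/sh
# show_oom [PID]: print the OOM adjustment and current OOM score of a process.
# With no argument it reports on the shell running this script.
# (show_oom is a hypothetical helper name, used here for illustration only.)
show_oom() {
    pid=${1:-$$}
    for f in oom_score_adj oom_score; do
        if [ -r "/proc/$pid/$f" ]; then
            printf '%s=%s\n' "$f" "$(cat "/proc/$pid/$f")"
        else
            # File absent, e.g. an older kernel without this interface
            printf '%s=unavailable\n' "$f"
        fi
    done
}

# Inside a job script, this reports the values inherited by the job's shell
show_oom
```

With the prologue lines above in place, a job would be expected to report oom_score_adj=500, assuming nothing else in the job changes the setting afterwards.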

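For older kernels that lack /proc/1/oom_score_adj but do have the now-deprecated /proc/1/oom_adj, the equivalent is shown below. The value is smaller because oom_adj uses a scale of -17 to +15, rather than the -1000 to +1000 of oom_score_adj.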

  # Make this user job more likely to be chosen by oom-killer than system processes
  oom=/proc/$PPID/oom_adj
  [ -w $oom ] && echo 7 > $oom
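
If the same prologue has to serve nodes with both kinds of kernel, the two variants above could be folded into one function that prefers the newer file and falls back to the older one. This is a sketch under assumptions rather than a tested production prologue: the name raise_oom and its optional second argument are inventions for this example, the latter existing only so that the logic can be tried against a scratch directory instead of the real /proc.

```shell
#!/bin/sh
# raise_oom PID [PROCROOT]: make process PID a preferred OOM-killer victim.
# PROCROOT defaults to /proc. Both the function name and the PROCROOT
# parameter are hypothetical, introduced only for this illustration.
raise_oom() {
    pid=$1
    root=${2:-/proc}
    if [ -w "$root/$pid/oom_score_adj" ]; then
        # Newer interface, range -1000..1000
        echo 500 > "$root/$pid/oom_score_adj"
    elif [ -w "$root/$pid/oom_adj" ]; then
        # Older, deprecated interface, range -17..15
        echo 7 > "$root/$pid/oom_adj"
    fi
    # Always return success, so a missing file is not reported as an error
    return 0
}

# In a prologue this would be called as:  raise_oom "$PPID"
```

Returning success unconditionally is a deliberate choice here, on the assumption that a node without either file should still run the job rather than have the prologue report a failure.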
