TORQUE Resource Manager
7.1 MPI (Message Passing Interface) Support

7.1.1 MPI (Message Passing Interface) Overview

Parallel jobs use a message passing library to handle communication between tasks distributed across the nodes of a cluster. TORQUE can run with any message passing library and provides limited integration with some MPI libraries.

7.1.2 MPICH

One of the most popular MPI libraries is MPICH, available from Argonne National Laboratory. If using MPICH, you may want to consider also using the mpiexec tool for launching MPI applications; support for mpiexec has been integrated into TORQUE.

MPIExec Overview

mpiexec is a replacement program for the script mpirun, which is part of the mpich package. It is used to initialize a parallel job from within a PBS batch or interactive environment. mpiexec uses the task manager library of PBS to spawn copies of the executable on the nodes in a PBS allocation.
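
For example, a simple PBS batch script that launches an MPI program through mpiexec might look like the following (a minimal sketch; the executable name ./my_mpi_app is a placeholder, and the resource requests should be adjusted for your site):

  #!/bin/sh
  #PBS -N mpi_test
  #PBS -l nodes=4:ppn=2
  #PBS -l walltime=00:10:00

  cd $PBS_O_WORKDIR
  # mpiexec obtains the node allocation from PBS through the TM
  # interface, so no -np argument or machinefile is required
  mpiexec ./my_mpi_app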

Reasons to use mpiexec rather than a script (mpirun) or an external daemon (mpd):

  • Starting tasks with the task manager (TM) interface is much faster than invoking a separate rsh(1) once for each process.
  • Resources used by the spawned processes are accounted correctly with mpiexec, and reported in the PBS logs, because all the processes of a parallel job remain under the control of PBS, unlike when using mpirun-like scripts.
  • Tasks that exceed their assigned limits of CPU time, wallclock time, memory usage, or disk space are killed cleanly by PBS. It is quite hard for processes to escape control of the resource manager when using mpiexec.
  • You can use mpiexec to enforce a security policy. If all jobs are forced to spawn using mpiexec and the PBS execution environment, it is not necessary to enable rsh or ssh access to the compute nodes in the cluster.

See the mpiexec homepage for more information.

MPIExec Troubleshooting

Although problems with mpiexec are rare, if issues do occur, the following steps may be useful:

  • Determine the current version using mpiexec --version and review the change log available on the MPI homepage to see if the reported issue has already been corrected.
  • Send email to the mpiexec mailing list at mpiexec@osc.edu.
  • Browse the mpiexec user list archives for similar problems and resolutions.
  • Read the FAQ contained in the README file and the mpiexec man pages contained within the mpiexec distribution.
  • Increase the logging of mpiexec operation with mpiexec --verbose (reports messages to stderr).
  • Increase logging of the master and slave resource manager execution daemons associated with the job (with TORQUE, set $loglevel to 5 or higher in $TORQUEROOT/mom_priv/config and look for 'tm' messages after associated join job messages); see the example after this list.
  • Use tracejob (included with TORQUE) or qtracejob (included with OSC's pbstools package) to isolate failures within the cluster.
  • If the message 'exec: Error: get_hosts: pbs_connect: Access from host not allowed, or unknown host' appears, it indicates that mpiexec cannot communicate with the pbs_server daemon. In most cases, this means the $TORQUEROOT/server_name file points to the wrong server or the node cannot resolve the server's name. Run the qstat command on the node to test this, as shown in the example after this list.
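
For instance, to raise the MOM log level and then verify that a compute node can reach the server, something like the following can be used (a sketch assuming TORQUE's default spool directory of /var/spool/torque):

  # in /var/spool/torque/mom_priv/config
  $loglevel 7

Restart pbs_mom so the new level takes effect, then run qstat on the node:

  > qstat

If qstat fails with a connection or authorization error, check that /var/spool/torque/server_name contains the correct server name and that the node can resolve it.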

General MPI Troubleshooting

When using MPICH, some sites have issues with orphaned MPI child processes remaining on the system after the master MPI process has been terminated. To address this, TORQUE epilogue scripts can be created that properly clean up the orphaned processes.
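
The following is a minimal sketch of such an epilogue script. It assumes the node runs jobs for only one user at a time, since it kills every remaining process owned by the job's user; pbs_mom passes the job id and the job owner's user name as the first two arguments. Epilogue scripts are typically installed as $TORQUEROOT/mom_priv/epilogue, owned by root and executable.

  #!/bin/sh
  # $1 = job id, $2 = job owner's user name (as passed by pbs_mom)
  JOBUSER=$2

  # kill any processes still owned by the job's user on this node
  # (assumes only one job per user runs on the node at a time)
  pkill -9 -u "$JOBUSER"

  exit 0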

7.1.3 MPICH-VMI

MPICH-VMI is a highly optimized, open-source message passing layer available from NCSA. Additional information can be found in the VMI tutorial.

7.1.4 Open MPI

Open MPI is an MPI implementation that combines technologies from several earlier MPI projects into a single library. It supports the TM interface for integration with TORQUE. More information is available in the Open MPI FAQ.
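
For example, to build Open MPI with TM support enabled (a sketch; the TORQUE installation prefix /usr/local and the install prefix /opt/openmpi are assumptions and should match your site):

  > ./configure --with-tm=/usr/local --prefix=/opt/openmpi
  > make
  > make install

Built this way, mpirun invoked from within a TORQUE job discovers the allocated nodes through the TM interface, so no hostfile or -machinefile argument is needed.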