Moab Adaptive HPC
4.4 Resource Manager Configuration

4.4.1 Linux Resource Manager Configuration

All resource managers must support the same classes so that Moab knows which classes are available on which nodes. By default, MSMHPC reports the following queues:

  • HIGHEST
  • ABOVENORMAL
  • NORMAL
  • BELOWNORMAL
  • LOWEST

Verify that all resource managers have these queues configured.

Note Queue names are case sensitive.
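
The verification can be scripted. The following is a minimal sketch, assuming a POSIX shell; missing_queues is a hypothetical helper name, not part of Moab or of any resource manager. It reads the configured queue names on standard input, one per line, and prints every expected queue that is absent. The comparison is exact, so a queue whose name differs only in case is reported as missing.

```shell
#!/bin/sh
# Expected queue names, taken from the list above (case sensitive).
EXPECTED_QUEUES="HIGHEST ABOVENORMAL NORMAL BELOWNORMAL LOWEST"

# missing_queues: read configured queue names on stdin (one per line)
# and print each expected queue that is not among them.
missing_queues() {
    configured=$(cat)
    for q in $EXPECTED_QUEUES; do
        # -x matches whole lines only, -F treats the name as a fixed string.
        printf '%s\n' "$configured" | grep -qxF "$q" || printf '%s\n' "$q"
    done
}
```

How the queue names are collected depends on the resource manager. With SGE, for instance, the output of qconf -sql with the .q suffix stripped could be piped into missing_queues; empty output means all five queues are present.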

4.4.1.1 TORQUE Configuration

For instructions on how to install TORQUE, point your browser to the following URL:

http://www.adaptivecomputing.com/resources/docs/torque/a.ltorquequickstart.php

Ensure that the resource managers on both operating systems are set to start on bootup. For example, make sure the pbs_mom init script is installed and that it has been added to the default run level. It also helps to set a short polling interval on polling resource managers: the more responsive the resource managers are, the more responsive Moab can be.

Moab, not TORQUE, must control walltime. To hand control to Moab, add the following directive to /var/spool/torque/mom_priv/config on every compute node:

$ignwalltime 1

The following additional queues must be configured for TORQUE to integrate with Moab Adaptive HPC Suite:

create queue HIGHEST
set queue HIGHEST queue_type = Execution
set queue HIGHEST resources_default.walltime = 01:00:00
set queue HIGHEST enabled = True
set queue HIGHEST started = True

create queue ABOVENORMAL
set queue ABOVENORMAL queue_type = Execution
set queue ABOVENORMAL resources_default.walltime = 01:00:00
set queue ABOVENORMAL enabled = True
set queue ABOVENORMAL started = True

create queue NORMAL
set queue NORMAL queue_type = Execution
set queue NORMAL resources_default.walltime = 01:00:00
set queue NORMAL enabled = True
set queue NORMAL started = True

create queue BELOWNORMAL
set queue BELOWNORMAL queue_type = Execution
set queue BELOWNORMAL resources_default.walltime = 01:00:00
set queue BELOWNORMAL enabled = True
set queue BELOWNORMAL started = True

create queue LOWEST
set queue LOWEST queue_type = Execution
set queue LOWEST resources_default.walltime = 01:00:00
set queue LOWEST enabled = True
set queue LOWEST started = True

set server default_queue = NORMAL
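
Because the five queue definitions differ only in the queue name, the directives can be generated with a loop rather than typed by hand. This is a minimal sketch, assuming a POSIX shell; torque_queue_commands is a hypothetical helper name. It only prints the directives listed above and does not contact the TORQUE server itself.

```shell
#!/bin/sh
# torque_queue_commands: print the qmgr directives for the five queues,
# followed by the default_queue setting.
torque_queue_commands() {
    for q in HIGHEST ABOVENORMAL NORMAL BELOWNORMAL LOWEST; do
        printf 'create queue %s\n' "$q"
        printf 'set queue %s queue_type = Execution\n' "$q"
        printf 'set queue %s resources_default.walltime = 01:00:00\n' "$q"
        printf 'set queue %s enabled = True\n' "$q"
        printf 'set queue %s started = True\n' "$q"
    done
    printf 'set server default_queue = NORMAL\n'
}
```

Review the output first, then feed it to qmgr on the TORQUE server host, for example with torque_queue_commands | qmgr.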

To submit jobs to TORQUE that will translate nodes to cores, ensure that TORQUE is aware it has the necessary resources by running the following:

qmgr -c 'set server resources_available.nodect = X'

Set X to a number greater than or equal to the total number of cores in your system. Otherwise, job submission fails with the following output:

qsub: Job exceeds queue resource limits MSG=cannot locate feasible nodes.
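
One way to arrive at a value for X is to add up the np= attributes in the TORQUE nodes file. The following is a minimal sketch under several assumptions: total_cores is a hypothetical helper name, the nodes file uses the usual "hostname np=N ..." layout, np reflects the real core count of each node, and a node without an np= attribute counts as one core.

```shell
#!/bin/sh
# total_cores FILE: sum the np=<count> attributes in a TORQUE nodes file.
# Assumption: a node line without np= contributes a single core.
total_cores() {
    awk '
        /^[ \t]*(#|$)/ { next }    # skip comments and blank lines
        {
            n = 1
            for (i = 2; i <= NF; i++)
                if ($i ~ /^np=[0-9]+$/) n = substr($i, 4) + 0
            total += n
        }
        END { print total + 0 }
    ' "$1"
}
```

The result can then be substituted for X, for example by storing the output of total_cores /var/spool/torque/server_priv/nodes in a variable and passing it to the qmgr command shown above. The nodes file path is an assumption based on the /var/spool/torque prefix used elsewhere in this section; adjust it to your installation.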

4.4.1.2 Sun Grid Engine Configuration

Refer to the SGE integration instructions for details on integrating SGE with Moab. The following are additional instructions specific to integrating with Moab Adaptive HPC Suite.

Normal Moab/SGE installs require adding complex variables to SGE. Run qconf -mc, which opens the configured editor, and add the following lines:

nodelist        nodelist   RESTRING  ==  YES       NO       NONE 0
opsys           os         RESTRING  ==  YES       NO       NONE 0

The second step is similar to example 5 in the SGE integration documentation, but needs to reflect the additional complex variable:

for i in $(qconf -sel | sed 's/\..*//')
do
  echo "$i"
  qconf -rattr exechost complex_values "nodelist=$i,opsys=linux" "$i"
done

Queues must be configured in SGE. To do so, use the following commands:

qconf -aq HIGHEST.q
qconf -aq ABOVENORMAL.q
qconf -aq NORMAL.q
qconf -aq BELOWNORMAL.q
qconf -aq LOWEST.q

4.4.2 Windows Resource Manager Configuration

In addition to the default priorities and queues mentioned, optional queues may be configured using job templates. Job templates are configured using the HPC Cluster Manager. Additionally, if you create queues in other resource managers, such as TORQUE or SGE, you must also configure them as job templates in Windows.

To do so, right-click in the HPC Cluster Manager > Configuration > Job Templates screen. The Job Template Wizard opens, and you can create the queue there. You can restrict the user options when creating the new template, but because Moab schedules the resources, set any specific policies in Moab; the template's default values can safely be left unchanged.

To associate a job with a specific queue:

  • If you are submitting the job from Windows, select the desired job template during job submission.
  • If you are submitting the job from Linux, specify the queue name during job submission.
    echo ping -n 100 localhost | msub -l os=windows,walltime=100 -q Department1

Note Job template names in Windows must not contain spaces.

Note The nodes must be recached after a job template is created in order for MSMHPC to pick up the new template.

Note You may still use the five static queues from previous versions (HIGHEST, ABOVENORMAL, NORMAL, BELOWNORMAL and LOWEST) if the default job template is selected.

The following lines define the interface to the HPC resource manager and name the Perl scripts that Moab calls to perform actions on the HPC cluster. Add them to the moab.cfg file, adjusting the paths to reflect your directory structure:

RMCFG[HPC]            TYPE=NATIVE:MSMHPC
RMCFG[HPC]            PARTITION=local
RMCFG[HPC]            NODESTATEPOLICY=OPTIMISTIC
RMCFG[HPC]            DEFOS=windows
RMCFG[HPC]            FLAGS=USERSPACEISSEPARATE
RMCFG[HPC]            ADMINEXEC=jobsubmit
RMCFG[HPC]            ENV=OSSTRING=windows;RMNAME=MSMHPC;PUBKEY=mypubkey;DOMAIN=yourdomain;PROXY=http://winhead:5343/MSMHPC

RMCFG[HPC]            CLUSTERQUERYURL=exec://$TOOLSDIR/cluster.query.hpc.pl
RMCFG[HPC]            WORKLOADQUERYURL=exec://$TOOLSDIR/workload.query.hpc.pl
RMCFG[HPC]            JOBSUBMITURL=exec://$TOOLSDIR/job.submit.hpc.pl
RMCFG[HPC]            JOBSTARTURL=exec://$TOOLSDIR/job.start.hpc.pl
RMCFG[HPC]            JOBCANCELURL=exec://$TOOLSDIR/job.cancel.hpc.pl
RMCFG[HPC]            JOBREQUEUEURL=exec://$TOOLSDIR/job.requeue.hpc.pl

Note Setting the OSSTRING variable allows MSMHPC tools to report a custom operating system, which enables you to run multiple HPC resource managers. Set each resource manager's DEFOS parameter to the same string as its OSSTRING variable.
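
The DEFOS and OSSTRING values are easy to let drift apart when several resource managers are defined. The following is a minimal sketch of a consistency check, assuming a POSIX shell and a moab.cfg laid out like the lines above, with one RMCFG attribute per line; check_os_strings is a hypothetical helper name, not a Moab tool.

```shell
#!/bin/sh
# check_os_strings FILE RMNAME
# Succeeds when RMCFG[RMNAME] sets DEFOS to the same value as the
# OSSTRING entry in its ENV list; otherwise prints a message and fails.
check_os_strings() {
    cfg=$1
    rm_name=$2
    # Value of DEFOS=... on the resource manager's RMCFG line.
    defos=$(sed -n "s/^RMCFG\[$rm_name\][[:space:]]*DEFOS=\([^[:space:]]*\).*/\1/p" "$cfg" | head -n 1)
    # Value of OSSTRING=... inside the semicolon-separated ENV list.
    osstring=$(sed -n "s/^RMCFG\[$rm_name\][[:space:]]*ENV=.*OSSTRING=\([^;[:space:]]*\).*/\1/p" "$cfg" | head -n 1)
    if [ -n "$defos" ] && [ "$defos" = "$osstring" ]; then
        return 0
    fi
    printf 'RMCFG[%s]: DEFOS="%s" but OSSTRING="%s"\n' "$rm_name" "$defos" "$osstring" >&2
    return 1
}
```

Run against the configuration above, check_os_strings moab.cfg HPC would succeed, because both values are windows. Call it once per resource manager name when more than one HPC resource manager is configured.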