4.4 Resource Manager Configuration

4.4.1 Linux Resource Manager Configuration

It is required that all resource managers support the same classes so that Moab knows which classes are supported on which nodes. By default, MSMHPC reports the following queues:

HIGHEST
ABOVENORMAL
NORMAL
BELOWNORMAL
LOWEST

Verify that all resource managers have these queues configured.

Note: Queue names are case sensitive.

4.4.1.1 TORQUE Configuration

For instructions on installing TORQUE, see the TORQUE Quick Start Guide at the following URL:

http://www.adaptivecomputing.com/resources/docs/torque/a.ltorquequickstart.html

Ensure that the resource managers on both operating systems are set to start on bootup. For example, make sure the pbs_mom init script is installed and that it has been added to the default run level. It is also helpful to set the polling interval on polling resource managers fairly low. The more responsive the resource managers are, the more responsive Moab can be.

Moab must control walltime instead of TORQUE. For Moab to control the walltime, add a configuration directive to /var/spool/torque/mom_priv/config on all the compute nodes with the following:

> $ignwalltime true
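
The directive can be pushed to every compute node's MOM config with a small helper. The sketch below defines an idempotent function and demonstrates it on a scratch file; on a real compute node the target file is /var/spool/torque/mom_priv/config (edit as root, then restart pbs_mom):

```shell
# ensure_ignwalltime FILE: append "$ignwalltime true" to FILE unless it is
# already present, so re-running the script is harmless (idempotent).
ensure_ignwalltime() {
    grep -q '^\$ignwalltime' "$1" 2>/dev/null || echo '$ignwalltime true' >> "$1"
}

# Demonstrate on a scratch file; on a real node point it at
# /var/spool/torque/mom_priv/config and restart pbs_mom afterward.
cfg=$(mktemp)
ensure_ignwalltime "$cfg"
ensure_ignwalltime "$cfg"   # second call adds nothing
cat "$cfg"
```

Running the function twice leaves only one copy of the directive in the file, so the same script can safely be re-run across all compute nodes.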

The following additional queues must be configured (via qmgr) for TORQUE to integrate with Moab Adaptive HPC Suite:

create queue HIGHEST
set queue HIGHEST queue_type = Execution
set queue HIGHEST resources_default.walltime = 01:00:00
set queue HIGHEST enabled = True
set queue HIGHEST started = True
create queue ABOVENORMAL
set queue ABOVENORMAL queue_type = Execution
set queue ABOVENORMAL resources_default.walltime = 01:00:00
set queue ABOVENORMAL enabled = True
set queue ABOVENORMAL started = True
create queue NORMAL
set queue NORMAL queue_type = Execution
set queue NORMAL resources_default.walltime = 01:00:00
set queue NORMAL enabled = True
set queue NORMAL started = True
create queue BELOWNORMAL
set queue BELOWNORMAL queue_type = Execution
set queue BELOWNORMAL resources_default.walltime = 01:00:00
set queue BELOWNORMAL enabled = True
set queue BELOWNORMAL started = True
create queue LOWEST
set queue LOWEST queue_type = Execution
set queue LOWEST resources_default.walltime = 01:00:00
set queue LOWEST enabled = True
set queue LOWEST started = True
set server default_queue = NORMAL
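
Rather than typing each directive interactively, the batch above can be generated for all five priority queues and fed to qmgr in one pass. In this sketch the pipe to qmgr is left commented out so the generated file can be inspected first (assumes qmgr is on the PATH of the TORQUE server host):

```shell
# Generate the queue definitions above for all five priority queues.
{
  for q in HIGHEST ABOVENORMAL NORMAL BELOWNORMAL LOWEST; do
      echo "create queue $q"
      echo "set queue $q queue_type = Execution"
      echo "set queue $q resources_default.walltime = 01:00:00"
      echo "set queue $q enabled = True"
      echo "set queue $q started = True"
  done
  echo "set server default_queue = NORMAL"
} > queues.qmgr
# Apply on the TORQUE server once the file looks right:
# qmgr < queues.qmgr
```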

To submit jobs to TORQUE that translate nodes to cores, ensure that TORQUE knows it has the necessary resources by running the following:

qmgr -c 'set server resources_available.nodect = X'

Set X to a number greater than or equal to the total number of cores in your system. Failing to do so causes jobs to fail during submission and produces the following output:

qsub: Job exceeds queue resource limits MSG=cannot locate feasible nodes.
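
One way to compute X is to sum the np (processors-per-node) values that pbsnodes reports. A captured sample stands in for a live `pbsnodes -a` call here so the arithmetic is visible; on a running system use the commented lines instead:

```shell
# Sum the "np" fields from pbsnodes-style output to get the total core count.
sample='node01
     state = free
     np = 8
node02
     state = free
     np = 16'
total=$(echo "$sample" | awk '$1 == "np" { sum += $3 } END { print sum }')
# On a live TORQUE server:
# total=$(pbsnodes -a | awk '$1 == "np" { sum += $3 } END { print sum }')
echo "total cores: $total"
# qmgr -c "set server resources_available.nodect = $total"
```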

4.4.1.2 Sun Grid Engine Configuration

Refer to the SGE integration instructions for details on integrating SGE with Moab. The following are additional instructions specific to integrating with Moab Adaptive HPC Suite.

A standard Moab/SGE installation requires adding complex variables to SGE. The qconf -mc command opens the complex configuration in the assigned editor; add the following lines:

nodelist        nodelist   RESTRING  ==  YES       NO       NONE 0
opsys           os         RESTRING  ==  YES       NO       NONE 0

The second step is similar to example 5 in the SGE integration documentation, but needs to reflect the additional complex variable:

for i in `qconf -sel | sed 's/\..*//'`
do
  echo $i
  qconf -rattr exechost complex_values nodelist=$i,opsys=linux $i
done

Queues must be configured in SGE. To do so, use the following commands:

qconf -aq HIGHEST.q
qconf -aq ABOVENORMAL.q
qconf -aq NORMAL.q
qconf -aq BELOWNORMAL.q
qconf -aq LOWEST.q
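
After the queues are added, their presence can be checked against the cluster queue list. The sketch below wraps the check in a function so it can be shown with sample input; on a live system pass it the output of `qconf -sql`:

```shell
# check_queues LIST: print any of the five required queues missing from the
# newline-separated queue list (normally the output of `qconf -sql`).
check_queues() {
    for q in HIGHEST ABOVENORMAL NORMAL BELOWNORMAL LOWEST; do
        echo "$1" | grep -q "^$q\.q$" || echo "missing: $q.q"
    done
}

# On a live system:  check_queues "$(qconf -sql)"
check_queues 'HIGHEST.q
NORMAL.q'
```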

4.4.2 Windows Resource Manager Configuration

In addition to the default priorities and queues mentioned, optional queues may be configured using job templates. Job templates are configured using the HPC Cluster Manager. Additionally, if you create queues in other resource managers, such as TORQUE or SGE, you must also configure them as job templates in Windows.

To do so, right-click in the Job Templates screen under Configuration in HPC Cluster Manager. The Job Template Wizard opens, and you can create the queue there. It is possible to limit user options when creating the new template, but because Moab schedules the resources, any specific policies should be set in Moab; it is therefore safe to leave the default values.

To associate a job with a specific queue, specify that queue when submitting the job (for example, msub -q HIGHEST).

Note: Job templates in Windows must not contain spaces.

Note: The nodes must be recached after a job template is created in order for MSMHPC to pick up the new template.

Note: You may still use the five static queues from previous versions (HIGHEST, ABOVENORMAL, NORMAL, BELOWNORMAL, and LOWEST) if the default job template is selected.

The following lines of code define the interface to the HPC resource manager and call the specified Perl scripts to perform any action on the HPC cluster. You must edit the moab.cfg file by adding the following lines, adjusting the paths to reflect your directory structure:

RMCFG[HPC]            TYPE=NATIVE:MSMHPC
RMCFG[HPC]            PARTITION=local
RMCFG[HPC]            NODESTATEPOLICY=OPTIMISTIC
RMCFG[HPC]            DEFOS=windows
RMCFG[HPC]            FLAGS=USERSPACEISSEPARATE
RMCFG[HPC]            ADMINEXEC=jobsubmit
RMCFG[HPC]            ENV=OSSTRING=windows;RMNAME=MSMHPC;PUBKEY=mypubkey;DOMAIN=yourdomain;PROXY=http://winhead:5343/MSMHPC
RMCFG[HPC]            CLUSTERQUERYURL=exec://$TOOLSDIR/cluster.query.hpc.pl
RMCFG[HPC]            WORKLOADQUERYURL=exec://$TOOLSDIR/workload.query.hpc.pl
RMCFG[HPC]            JOBSUBMITURL=exec://$TOOLSDIR/job.submit.hpc.pl
RMCFG[HPC]            JOBSTARTURL=exec://$TOOLSDIR/job.start.hpc.pl
RMCFG[HPC]            JOBCANCELURL=exec://$TOOLSDIR/job.cancel.hpc.pl
RMCFG[HPC]            JOBREQUEUEURL=exec://$TOOLSDIR/job.requeue.hpc.pl
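
A quick way to catch typos is to verify that moab.cfg defines each of the RMCFG[HPC] URLs listed above. The helper below is a sketch; point it at your moab.cfg (a stub fragment written to a temp file stands in here for illustration):

```shell
# check_rmcfg FILE: report any RMCFG[HPC] URL parameter missing from FILE.
check_rmcfg() {
    for key in CLUSTERQUERYURL WORKLOADQUERYURL JOBSUBMITURL \
               JOBSTARTURL JOBCANCELURL JOBREQUEUEURL; do
        grep -q "RMCFG\[HPC\].*${key}=" "$1" || echo "missing: $key"
    done
}

# Stub fragment for illustration; normally run: check_rmcfg /opt/moab/moab.cfg
stub=$(mktemp)
cat > "$stub" <<'EOF'
RMCFG[HPC] TYPE=NATIVE:MSMHPC
RMCFG[HPC] CLUSTERQUERYURL=exec://$TOOLSDIR/cluster.query.hpc.pl
EOF
check_rmcfg "$stub"
```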

Note: Setting the OSSTRING variable allows MSMHPC tools to report a custom operating system. This enables you to run multiple HPC resource managers. It is recommended to set each resource manager's DEFOS parameter to the same string set in the OSSTRING variable.

Copyright © 2011 Adaptive Computing Enterprises, Inc.®