Copyright © 2012 Adaptive Computing Enterprises, Inc.
This document provides information on the steps to install Moab 7.2.0 and TORQUE 4.1.0 on a Cray XT system.
Moab and TORQUE can be used to manage the batch system for Cray. This document describes how to configure Moab and TORQUE to bring Moab's unmatched scheduling capabilities to the Cray.
New in TORQUE 4.1, TORQUE itself (specifically the pbs_mom) handles all communication with ALPS. Previously, communication with ALPS was handled by a combination of Moab, scripts, and TORQUE. In the new model, Moab treats TORQUE as a regular TORQUE cluster without any special configuration. TORQUE uses an extra MOM, called the alps_reporter MOM, to communicate with ALPS regarding configured and available resources. From the information reported by the alps_reporter MOM, TORQUE creates a virtual node for each Cray compute node. Previously, TORQUE reported only the login nodes.
Note: For clarity, this document assumes that your SDB node mounts a persistent /var file system from the boot node. If you have chosen not to use persistent /var file systems, the instructions below must be modified for your situation.
When upgrading to TORQUE 4.1.0 and using the new Cray model as described in this document, there should be no running jobs. Jobs may be queued but not running.
Perform the following steps from the boot node as root:
Many of the following examples reflect a specific setup and must be modified to fit your unique configuration.
Download the latest TORQUE release.
Example 1. Download TORQUE
# cd /rr/current/software
# wget http://www.adaptivecomputing.com/resources/downloads/torque/torque-4.1.0.tar.gz
Unpack the TORQUE tarball in an xtopview session
Using xtopview, unpack the TORQUE tarball into the software directory in the shared root.
Example 2. Unpack TORQUE
# xtopview
default/:/ # cd /software
default/:/software # tar -zxvf torque-4.1.0.tar.gz
While still in xtopview, run configure with the options set appropriately for your installation. Run ./configure --help to see a list of configure options. Adaptive Computing recommends installing the TORQUE binaries into /opt/torque/$version and establishing a symbolic link to it from /opt/torque/default. At a minimum, you will need to specify the host name where the TORQUE server will run (--with-default-server) if it is different from the host on which TORQUE is being compiled. The TORQUE server typically runs on the SDB node of your Cray system.
Example 3. Run configure
default/:/software # cd torque-4.1.0
default/:/software/torque-4.1.0 # ./configure --prefix=/opt/torque/4.1.0 --with-server-home=/var/spool/torque --with-default-server=sdb --enable-syslog --disable-gcc-warnings --with-debug --with-modulefiles=/opt/modulefiles --with-job-create CFLAGS="-DCRAY_MOAB_PASSTHRU"
Note: The --with-job-create option applies to TORQUE 2.5.9 and later; it is not necessary on 2.4.16. Sites running TORQUE 2.5.x should upgrade to 2.5.9 or later.
Note: The -DCRAY_MOAB_PASSTHRU option tells TORQUE to not validate the qsub -l nodes syntax. For more information, see Submitting Jobs.
You must unload the current module:
# module unload moab torque
Because xtopview may also load the old versions of Moab and TORQUE, it is good practice to unload and reload the modules after the install so that you have the correct binaries in your path.
While still in xtopview, compile and install TORQUE into the shared root. Create a link to the installed TORQUE. Exit xtopview.
Example 4. Make and Make Install
default/:/software/torque-4.1.0 # make
default/:/software/torque-4.1.0 # make packages
default/:/software/torque-4.1.0 # make install
default/:/software/torque-4.1.0 # ln -sf /opt/torque/4.1.0/ /opt/torque/default
default/:/software/torque-4.1.0 # exit
After installing, run module list to see which versions are loaded. If the versions are incorrect, unload and reload the modules to confirm you are using the correct versions.
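For example, a minimal check might look like the following sketch (this assumes the modules are named moab and torque, as in the unload command above):
# module list
# module unload moab torque
# module load moab torque
# module list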
Copy your TORQUE server directory to your moab server host
Example 5. On the boot node, copy the TORQUE home directory to the SDB node's persistent /var file system (as exported from the boot node). This example assumes that the SDB is NID 3 and that you are installing the TORQUE server on the SDB. These instructions must be modified if the Moab and TORQUE servers are being installed on a different node.
# cd /rr/current/var/spool
# cp -pr torque /snv/3/var/spool
Set up pbs_server to be Cray compatible
Customize the nodes file located in <TORQUE HOME>/server_priv/nodes.
sdb alps_reporter
We recommend that you set up the SDB node as the ALPS reporter. Setting the NP for this node isn't important because this node will not appear in pbsnodes output and, therefore, will not be scheduled to run jobs.
Identify all login nodes using the reserved feature alps_login.
login1 alps_login np=X <other features>
login2 alps_login np=Y <other features>
login3 alps_login np=Z <other features>
...
Identifying these moms as login nodes allows pbs_server to verify that each job has a login node as its mother superior. It also tells pbs_server to place size=0 jobs on one of these login nodes.
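Putting the pieces together, a complete server_priv/nodes file for a system with three login nodes might look like the following sketch (the np values and host names are placeholders for your site's values):
sdb alps_reporter
login1 alps_login np=12
login2 alps_login np=12
login3 alps_login np=12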
$ qmgr -c 'set server acl_host_enable=true'
$ qmgr -c 'set server acl_hosts+=login1'
$ qmgr -c 'set server acl_hosts+=login2'
$ qmgr -c 'set server acl_hosts+=login3'
$ qmgr -c 'set server submit_hosts+=login1'
$ qmgr -c 'set server submit_hosts+=login2'
$ qmgr -c 'set server submit_hosts+=login3'
$ qmgr -c 'set server scheduling = true'
This parameter tells pbs_server to notify Moab when pertinent events have happened. If this isn't set, Moab will automatically set it on startup.
$ qmgr -c 'set server keep_completed = 300'
This tells TORQUE to keep information about completed jobs for 300 seconds (5 minutes) after they have completed. You can customize this number to meet your site's needs.
$ qmgr -c 'set server cray_enabled = true'
After using qmgr to set this parameter, you will need to restart pbs_server so that when it parses the nodes file, it knows it is Cray-enabled.
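A minimal restart sequence, assuming the install locations used throughout this document, might look like this:
# /opt/torque/default/bin/qterm
# /opt/torque/default/sbin/pbs_server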
Install the pbs_server init.d script on the server (Optional)
TORQUE provides an init.d script for starting pbs_server as a service.
Example 6. Copy in init.d script
# xtopview -n <sdb nid>
node/<sdb nid>:/ # cp /software/torque-4.1.0/contrib/init.d/suse.pbs_server /etc/init.d/pbs_server
node/<sdb nid>:/ # chmod +x /etc/init.d/pbs_server
node/<sdb nid>:/ # chkconfig --add pbs_server
Edit the init.d file as necessary -- i.e., change PBS_DAEMON and PBS_HOME as appropriate.
# vi /etc/init.d/pbs_server
PBS_DAEMON=/opt/torque/default/sbin/pbs_server
PBS_HOME=/var/spool/torque
Install the pbs_mom init.d script on the login nodes (Optional)
TORQUE provides an init.d script for starting pbs_mom as a service.
Example 7. Copy in init.d script
# xtopview
default/:/ # cp /software/torque-4.1.0/contrib/init.d/suse.pbs_mom /etc/init.d/pbs_mom
default/:/ # chmod +x /etc/init.d/pbs_mom
default/:/ # chkconfig --add pbs_mom
Edit the init.d file as necessary -- i.e. change PBS_DAEMON and PBS_HOME as appropriate, retain core files, etc.
# vi /etc/init.d/pbs_mom
PBS_DAEMON=/opt/torque/default/sbin/pbs_mom
PBS_HOME=/var/spool/torque
Uncomment the following line to retain core dump files:
ulimit -c unlimited # Uncomment this to preserve core files
Install the trqauthd init.d script on all TORQUE nodes and the SDB (Optional)
TORQUE provides an init.d script for starting trqauthd as a service.
Example 8. Copy in init.d script
# xtopview
default/:/ # cp /software/torque-4.1.0/contrib/init.d/suse.trqauthd /etc/init.d/trqauthd
default/:/ # chmod +x /etc/init.d/trqauthd
default/:/ # chkconfig --add trqauthd
Edit the init.d file as necessary -- i.e. change PBS_DAEMON and PBS_HOME as appropriate.
# vi /etc/init.d/trqauthd
PBS_DAEMON=/opt/torque/default/sbin/trqauthd
PBS_HOME=/var/spool/torque
Stage out MOM dirs to login nodes
Stage out the MOM dirs and client server info on all login nodes. This example assumes you are using persistent /var file systems mounted from /snv on the boot node. Alternatively, if you use a RAM /var file system, it is populated from a skeleton tarball on the boot node (/rr/current/.shared/var-skel.tgz), and these files must be added to that tarball. The example below assumes that you have three login nodes with NIDs of 4, 64, and 68. Place the host name of the SDB node in the server_name file.
Example 9. Copy out MOM dirs and client server info
# cd /rr/current/software/torque-4.1.0/tpackages/mom/var/spool
# for i in 4 64 68; \
do cp -pr torque /snv/$i/var/spool; \
echo sdb > /snv/$i/var/spool/torque/server_name; \
done
Note: It is possible that the host name for the SDB node is not set to sdb on your system. Run ssh sdb hostname to determine the host name in use. If the command returns, for example, sdb-p1, modify the "for loop" above to echo sdb-p1 into the server_name file.
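For example, the check might look like this (sdb-p1 is simply the illustrative host name from the note above):
# ssh sdb hostname
sdb-p1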
Update the TORQUE MOM config file for the ALPS reporter mom
In the steps above, we identified the ALPS reporter MOM to pbs_server. We now need to configure that MOM, which is installed on the SDB, to act as the ALPS reporter. To do so, set the following in the pbs_mom config file on the SDB:
# vi var/spool/torque/mom_priv/config
$reporter_mom true # defaults to false
You may also wish to set these variables:
$apbasil_path <path_to_apbasil> # defaults to /usr/bin/apbasil if not set
$apbasil_protocol <protocol> # defaults to 1.0 if not set
As of CLE 5.0, apbasil is located at /opt/cray/alps/default/bin/apbasil, not /usr/bin/apbasil. Supported apbasil protocols are 1.0, 1.1, and 1.2.
Cray systems do not support GPUs until ALPS version 1.2. Setting $apbasil_protocol 1.2 in mom_priv/config causes the GPU status to appear in the pbsnodes output.
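Putting these settings together, a mom_priv/config for the reporter MOM on a CLE 5.0 system might look like the following sketch; adjust the apbasil path and protocol for your system:
$reporter_mom true
$apbasil_path /opt/cray/alps/default/bin/apbasil
$apbasil_protocol 1.2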
Update the TORQUE MOM config file on each login node
Login nodes are service nodes that run pbs_mom and are used to submit and launch job scripts. Login nodes are responsible for creating and confirming ALPS reservations so that the script launched on a login node can access the compute nodes with the aprun command.
Edit the MOM config file so job output is copied to locally mounted directories.
Example 10. Edit the MOM config file
# vi var/spool/torque/mom_priv/config
$usecp *:/home/users /home/users
$usecp *:/scratch /scratch
$login_node true
$login_node specifies that this node will create and confirm ALPS reservations.
Note: It may be acceptable to use $usecp *:/ / in place of the sample above. Consult with the site administrators.
You may also wish to set these variables:
$apbasil_path <path_to_apbasil> # defaults to /usr/bin/apbasil if not set
$apbasil_protocol <protocol> # defaults to 1.0 if not set
As of CLE 5.0, apbasil is located at /opt/cray/alps/default/bin/apbasil, not /usr/bin/apbasil. Supported apbasil protocols are 1.0, 1.1, and 1.2.
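As with the reporter MOM, a consolidated mom_priv/config for a login node might look like this sketch (the $usecp mappings, apbasil path, and protocol are the example values used above; adjust them for your site):
$usecp *:/home/users /home/users
$usecp *:/scratch /scratch
$login_node true
$apbasil_path /opt/cray/alps/default/bin/apbasil
$apbasil_protocol 1.2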
Start up the TORQUE MOM Daemons
On the boot node as root:
Example 11. Start up the pbs_moms on the SDB and login nodes.
# pdsh -w sdb,login[1-3] /opt/torque/default/sbin/pbs_mom
# pdsh -w login[1-3] trqauthd
Alternatively, if you installed the init.d script, you may run:
# pdsh -w sdb,login[1-3] /sbin/service pbs_mom start
# pdsh -w login[1-3] service trqauthd start
On the TORQUE server host as root:
Example 12. Start pbs_server
# /opt/torque/default/sbin/pbs_server
# /opt/torque/default/sbin/trqauthd
Alternatively, if you installed the init.d script, you may run:
# service pbs_server start
# service trqauthd start
Perform the following steps from the boot node as root:
Download the latest Moab release
Download the latest Moab release from Adaptive Computing Enterprises, Inc.
The correct tarball to install is the plain Moab + TORQUE build. The XT4 builds are for releases prior to TORQUE 4.1.0 and MWM 7.2.0.
Example 13. Download Moab to the boot node
# cd /rr/current/software
# wget --post-data="username=<username>&password=<password>&submit=submit&url=/download/mwm/moab-7.2.0-linux-x86_64-torque.tar.gz" https://www.adaptivecomputing.com/myaccount/login.php
Using xtopview, unpack the Moab tarball into the software directory in the shared root.
Example 14. Unpack Moab
# xtopview
default/:/ # cd /software
default/:/software # tar -zxvf moab-7.2.0-linux-x86_64-torque.tar.gz
While still in xtopview, run configure with the options set appropriately for your installation. Run ./configure --help to see a list of configure options. Adaptive Computing recommends installing the Moab binaries into /opt/moab/$version and establishing a symbolic link to it from /opt/moab/default. Since the Moab home directory must be read-write by root, Adaptive Computing recommends you specify the homedir in a location such as /var/spool/moab.
Example 15. Run configure
default/:/software # cd moab-7.2.0
default/:/software/moab-7.2.0 # ./configure --prefix=/opt/moab/7.2.0 --with-homedir=/var/spool/moab --with-torque=/opt/torque/default --with-modulefiles=/opt/modulefiles
While still in xtopview, install Moab into the shared root. You may also need to link /opt/moab/default to this installation.
Example 16. Make Install
default/:/software/moab-7.2.0 # make install
default/:/software/moab-7.2.0 # ln -sf /opt/moab/7.2.0/ /opt/moab/default
Customize the Moab configuration file for your Moab server host
The moab.cfg file should be customized for your scheduling environment. We will use /rr/current/var/spool/moab as a temporary staging area before copying the files out to their final destinations. See the Moab Admin Guide for more details about Moab configuration parameters.
Example 17. Edit the Moab configuration file
# cd /rr/current/var/spool/moab
# vi moab.cfg
SCHEDCFG[moab] SERVER=sdb:42559
RMCFG[<clustername>] TYPE=TORQUE
NODEACCESSPOLICY SINGLEJOB
NODEALLOCATIONPOLICY PRIORITY
NODECFG[DEFAULT] PRIORITYF=-NODEINDEX
JOBMAXTASKCOUNT <total number of processors>
MAXNODE <total number of nodes>
By default, ALPS reports the compute nodes in a serialized topology order. TORQUE preserves this ordering by reporting a node_index on each compute node that represents the compute node's placement in the ALPS topology ordering. Moab then uses this information to allocate nodes close to each other in the network. The downside is that the nodes can become fragmented. The NODEALLOCATIONPOLICY PRIORITY parameter, used with NODECFG[DEFAULT] PRIORITYF=-NODEINDEX, tells Moab to allocate nodes based on the node_index reported by TORQUE, beginning with the first nodes in the list (-1 x node_index of 1).
It is also possible to use the same node indexes reported by TORQUE to allocate strict contiguous sets of nodes. This is configured by specifying a NODEALLOCATIONPOLICY of CONTIGUOUS. In this mode, a job won't run until it can get a strict set of contiguous nodes.
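A minimal sketch of that alternative in moab.cfg:
NODEALLOCATIONPOLICY CONTIGUOUS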
When Moab allocates nodes on the Cray, it must get only compute nodes. The purpose of the login nodes is to create and confirm ALPS reservations so that the job script can access the allocated compute nodes. Moab shifts the responsibility of selecting a login node for the allocated compute nodes to TORQUE. Because Moab doesn't allocate a login node along with compute nodes, the login nodes must be kept separate from the compute nodes so that Moab doesn't allocate login nodes and compute nodes for the same job. This is accomplished by putting the login nodes in a separate partition; Moab does not allocate jobs across partitions. By default, Moab creates a partition for each RMCFG with the name given in the RMCFG parameter and places all nodes reported by that resource manager in that partition.
With the login and compute nodes now separated, configure all jobs to request the compute partition by default.
Place the login nodes in a separate partition called login. For example:
NODECFG[login1] Partition=login
NODECFG[login2] Partition=login
NODECFG[login3] Partition=login
NODECFG[login4] Partition=login
Configure all jobs submitted through msub to request the compute node partition by default.
CLIENTCFG[DEFAULT] DEFAULTSUBMITPARTITION=<clustername>
Configure all jobs submitted through qsub to request the compute node partition by default.
qmgr -c "set server resources_default.partition=<clustername>"
Login nodes can be requested to run jobs that don't require Cray compute nodes (for example, compile jobs or data transfer jobs). These jobs can be submitted to the login partition (for example, qsub -l partition=login).
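Taken together, the partition-related portion of moab.cfg might look like the following sketch (the cluster name and login node names are placeholders):
SCHEDCFG[moab] SERVER=sdb:42559
RMCFG[<clustername>] TYPE=TORQUE
NODECFG[login1] Partition=login
NODECFG[login2] Partition=login
NODECFG[login3] Partition=login
CLIENTCFG[DEFAULT] DEFAULTSUBMITPARTITION=<clustername>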
Copy your Moab home directory to your Moab server host
In this example, we assume the Moab server will run on the SDB node. If you are installing Moab with its server home in /var (as in this example), and your /var file system is served from the boot node under /snv, you will need to log in to the SDB and determine its NID with cat /proc/cray_xt/nid.
Example 18. Copy out Moab home directory. This example assumes that the SDB is NID 3.
# cd /rr/current/var/spool
# cp -pr moab /snv/3/var/spool
Copy the Moab configuration file to all of the login nodes
The Moab configuration file (moab.cfg) must be copied out to the /var file system on the login nodes. The essential parameters that must be in the moab.cfg on the login nodes are the SCHEDCFG line (so the clients can find the server) and any client-specific parameters, such as CLIENTCFG.
Example 19. Copy out the configuration files.
# cd /rr/current/var/spool/moab
# for i in 4 64 68; do mkdir -p /snv/$i/var/spool/moab/etc /snv/$i/var/spool/moab/log; cp moab.cfg /snv/$i/var/spool/moab; done
Install the Moab init.d script (Optional)
Moab provides an init.d script for starting Moab as a service. Using xtopview into the SDB node, copy the init script into /etc/init.d.
Example 20. Copy in init.d script to the SDB node from the shared root.
# xtopview -n <sdb nid>
node/<sdb nid>:/ # cp /software/moab-7.2.0/contrib/init.d/moab_sles_init /etc/init.d/moab
node/<sdb nid>:/ # chkconfig --add /etc/init.d/moab
Edit the init.d file as necessary -- i.e. retain core files, etc.
Uncomment the following line to retain core dump files
ulimit -c unlimited # Uncomment to preserve core files
Perform the following steps from the Moab server node (sdb) as root:
The MOABHOMEDIR environment variable must be set in your environment when starting Moab or using Moab commands. If you are on a system with a large number of nodes (thousands), you will need to increase your stack limit to unlimited. You will also want to adjust your path to include the Moab and TORQUE bin and sbin directories. The proper environment can be established by loading the appropriate Moab module, by sourcing properly edited login files, or by directly modifying your environment variables.
Example 21. Loading the Moab module
# module load moab
Example 22. Exporting the environment variables by hand (in bash)
# export MOABHOMEDIR=/var/spool/moab
# export PATH=$PATH:/opt/moab/default/bin:/opt/moab/default/sbin:/opt/torque/default/bin:/opt/torque/default/sbin
Example 23. Setting the stack limit to unlimited
If you are running on a system with a large number of nodes (thousands), you may need to increase the stack size user limit to unlimited. This should be set in the shell from which Moab is launched. If you start Moab via an init script, set the limit in the script; otherwise, put it in the appropriate shell startup file for root.
# ulimit -s unlimited
Start up the Moab Workload Manager
Start up the Moab daemon.
Example 24. Start Moab
# /opt/moab/default/sbin/moab
Alternatively, if you installed the init.d script, you may run:
# service moab start
Previously, Moab and TORQUE had to run inside the Cray network. With the new model, it is now possible to run Moab and pbs_server outside of the Cray. This allows Moab and pbs_server to run on more capable hardware than the service nodes provide and enables the use of Moab and TORQUE's high-availability features. Jobs can also be submitted and queued while the Cray is down. In order to set up Moab and TORQUE on an external node, pbs_server must be able to communicate with the login nodes inside the Cray on ports 15002 and 15003, and the login nodes must be able to communicate with pbs_server on port 15001.
[root@ext-server /]# telnet login1 15002
Trying XXX.XXX.XXX.XXX...
Connected to login1
Escape character is '^]'.

[root@ext-server /]# telnet login1 15003
Trying XXX.XXX.XXX.XXX...
Connected to login1
Escape character is '^]'.

[root@login1 /]# telnet ext-server 15001
Trying XXX.XXX.XXX.XXX...
Connected to ext-server
Escape character is '^]'.
System reservations can be set up in several ways. 1) Only the compute nodes can be reserved, leaving the login nodes available for executing non-compute jobs.
SRCFG[PM] TASKCOUNT=7832
SRCFG[PM] HOSTLIST=!login
SRCFG[PM] PERIOD=DAY DAYS=TUE
SRCFG[PM] FLAGS=OWNERPREEMPT
SRCFG[PM] STARTTIME=8:00:00 ENDTIME=14:00:00
SRCFG[PM] JOBATTRLIST=PREEMPTEE
SRCFG[PM] TRIGGER=EType=start,Offset=300,AType=internal,Action="rsv::modify:acl:jattr-=PREEMPTEE"
SRCFG[PM] TRIGGER=EType=start,Offset=-60,AType=jobpreempt,Action="cancel"
2) Only the login nodes can be reserved, leaving the compute nodes available for execution.
SRCFG[PM] TASKCOUNT=192
SRCFG[PM] HOSTLIST=login
SRCFG[PM] PERIOD=DAY DAYS=TUE
SRCFG[PM] FLAGS=OWNERPREEMPT
SRCFG[PM] STARTTIME=8:00:00 ENDTIME=14:00:00
SRCFG[PM] JOBATTRLIST=PREEMPTEE
SRCFG[PM] TRIGGER=EType=start,Offset=300,AType=internal,Action="rsv::modify:acl:jattr-=PREEMPTEE"
SRCFG[PM] TRIGGER=EType=start,Offset=-60,AType=jobpreempt,Action="cancel"
3) The whole system can be reserved, preventing any kind of job from starting.
SRCFG[PM] HOSTLIST=ALL
SRCFG[PM] PERIOD=DAY DAYS=TUE
SRCFG[PM] FLAGS=OWNERPREEMPT
SRCFG[PM] STARTTIME=8:00:00 ENDTIME=14:00:00
SRCFG[PM] JOBATTRLIST=PREEMPTEE
SRCFG[PM] TRIGGER=EType=start,Offset=300,AType=internal,Action="rsv::modify:acl:jattr-=PREEMPTEE"
SRCFG[PM] TRIGGER=EType=start,Offset=-60,AType=jobpreempt,Action="cancel"
This feature works with TORQUE 4.1.2 or later.
Set up the Cray as described previously. In Moab, do not place the login nodes in a separate partition. Add the external nodes to the nodes file (<TORQUEHOME>/server_priv/nodes) with a feature named external.
<hostname> np=X external
In pbsnodes or mdiag -n -v output the login nodes have the feature alps_login and the Cray compute nodes have the cray_compute feature. These are automatically added by pbs_server.
To request a heterogeneous job:
> qsub job_script.sh -l nodes=X:cray_compute+Y:external
In the example above, Moab assigns X number of Cray compute nodes and Y number of external nodes and passes the information appropriately to pbs_server.
To request a cray only job:
> qsub job_script.sh -l nodes=X:cray_compute
To request an external only job:
> qsub job_script.sh -l nodes=X:external
To request a login-only job (such as a job that compiles the code to be run on Cray compute nodes):
> qsub job_script.sh -l nodes=X:alps_login
In this setup, all jobs must request node features. If they do not, Moab might try to schedule a job onto the login nodes, or other issues might occur. This should be enforced by setting defaults or by using a submit filter (see the Applying the msub Submit Filter section for more information).
Once configured as above, the same job script is launched both on the external nodes and on the Cray. For this job to function properly, the job script must detect whether it is running on the Cray and then execute the appropriate commands. We recommend that the script inspect the file whose path is contained in the variable $PBS_NODEFILE; it lists the host names on which the job is executing, one per line, with the first line being the host on which the script itself is executing. The script should read this line, decide whether that host is external to or inside the Cray, and then execute accordingly.
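A minimal sketch of such a check is shown below. The assumption here is that external node host names can be distinguished by a prefix such as ext (substitute whatever naming convention your site uses); my_external_workload and my_cray_workload are hypothetical commands.
#!/bin/bash
# Read the first host name in the node file; this is the host running the script.
first_host=$(head -n 1 "$PBS_NODEFILE")
case "$first_host" in
  ext*)   # assumed naming convention for nodes external to the Cray
    echo "Running on external node $first_host"
    ./my_external_workload    # hypothetical command
    ;;
  *)      # otherwise we are on a Cray login node
    echo "Running inside the Cray from login node $first_host"
    aprun -n $(($PBS_NUM_NODES * $PBS_NUM_PPN)) -N $PBS_NUM_PPN ./my_cray_workload    # hypothetical binary
    ;;
esac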
There are three different ways to submit jobs to the Cray. Each works differently and shouldn't be mixed with the others.
-l nodes= is the standard way of submitting jobs to Moab. It is the recommended way because it is the most supported and standard way of submitting jobs across all types of systems managed by Moab. One benefit of the -l nodes= syntax is that you can submit multi-req jobs (for example, -l nodes=2:ppn=3+3:ppn=2). When using the -l nodes= syntax, TORQUE should be compiled with the -DCRAY_MOAB_PASSTHRU option. By default, -l nodes= requests the number of processors, not nodes. If you want -l nodes= to request nodes, add JOBNODEMATCHPOLICY EXACTNODE to your moab.cfg.
-l size= was created to be a very simple interface for submitting to the Cray. It requests the number of one-proc tasks to submit on the Cray. Customers that use this option usually have a submit filter (see the sketch below) that verifies that the number of tasks requested is a multiple of the number of processors per node and rejects the submission if it isn't.
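A minimal sketch of such a submit filter follows. It assumes 32 processors per node and only inspects #PBS directives in the submitted script; TORQUE submit filters receive the job script on stdin, must echo it to stdout, and reject the job by exiting non-zero.
#!/bin/bash
# Hypothetical submit filter: reject -l size= requests that are not a multiple of the per-node core count.
CORES_PER_NODE=32   # assumed value; set this to your system's processors per node
while IFS= read -r line; do
  echo "$line"      # pass the script through unchanged
  if [[ "$line" =~ ^#PBS.*size=([0-9]+) ]]; then
    size=${BASH_REMATCH[1]}
    if (( size % CORES_PER_NODE != 0 )); then
      echo "Rejected: size=$size is not a multiple of $CORES_PER_NODE" >&2
      exit 1        # non-zero exit causes qsub to reject the job
    fi
  fi
done
exit 0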
-l mpp*= is standard among Cray systems and is recognized by Moab/TORQUE and PBS-run systems. Most of the mpp options have an equivalent -l nodes= option.
| mppwidth | |
|---|---|
| Format | <INTEGER> |
| Default | --- |
| Description | Number of tasks. |
| Example | qsub -l mppwidth=48 (Requests 48 tasks of 1 processor each.) |
| Equivalent | qsub -l nodes=4 |

| mppnppn | |
|---|---|
| Format | <INTEGER> |
| Default | --- |
| Description | Number of tasks per node. |
| Example | qsub -l mppwidth=48,mppnppn=8 (Requests 48 tasks with 8 tasks per node.) |
| Equivalent | qsub -l nodes=48:ppn=8 |

| mppdepth | |
|---|---|
| Format | <INTEGER> |
| Default | --- |
| Description | Number of processors per task. |
| Example | qsub -l mppwidth=24,mppdepth=2,mppnppn=8 (Requests 24 tasks with 2 processors per task and 8 tasks per node.) |

| mpparch | |
|---|---|
| Format | <STRING> |
| Default | --- |
| Description | Required architecture. |
| Example | qsub -l mpparch=xt5 (Specifies that the job must run on xt5 architecture nodes.) |
| Equivalent | qsub -l arch=xt5 |

| mppmem | |
|---|---|
| Format | <INTEGER>[kb\|mb\|gb] |
| Default | --- |
| Description | Dedicated memory per task in bytes. |
| Example | qsub -l mppmem=200mb (Specifies that the job requires 200mb per task.) |
| Equivalent | qsub -l mem=200mb |
The mppnppn and mppwidth parameters work with CLASSCFG MIN.NODE and MAX.NODE to filter classes based on how many nodes the job will use. The other parameters have not been tested.
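For illustration, a class that only accepts jobs using between 1 and 64 nodes might be configured in moab.cfg as follows (the class name and limits are placeholders):
CLASSCFG[small] MIN.NODE=1 MAX.NODE=64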
It is possible to direct a job to launch from a specific login node. This is done by assigning node features to specific login nodes and requesting these features at submission time with the -W login_property option. The login_property has no influence on which compute nodes are allocated to the job.
Example 25. Declaring MOM features
For example, if login2 had the himem feature, a job could request that its job launch from login2 rather than login1.
# vi /var/spool/torque/server_priv/nodes
login1 alps_login np=200
login2 alps_login np=200 himem
qsub -W login_property=himem
Point Moab to the qsub binary on the server where Moab is running (ex. sdb).
RMCFG[] SUBMITCMD=/opt/torque/default/bin/qsub
Set up Moab to schedule nodes when -l nodes is requested.
JOBNODEMATCHPOLICY EXACTNODE
Because Moab uses qsub to submit msub'd jobs, qsub must be configured not to validate the path of the working directory on the sdb, as it may not exist there (for example, msub -d /users/jdoe/tmpdir). Add VALIDATEPATH FALSE to the torque.cfg on the sdb.
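For example, assuming the TORQUE server home used throughout this document:
# vi /var/spool/torque/torque.cfg
VALIDATEPATH FALSE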
As of TORQUE 2.4.11, the node count and the processors per node count can be obtained in the job's environment through $PBS_NUM_NODES and $PBS_NUM_PPN respectively. This aids in mapping the requested nodes to aprun calls. For example, the general format for calling aprun within a job script is:
aprun -n $(($PBS_NUM_NODES * $PBS_NUM_PPN)) -N $PBS_NUM_PPN
Example submissions:
#PBS -l nodes=1:ppn=16
aprun -n 16 -N 16 hostname

#PBS -l nodes=20
aprun -n 20 hostname

#PBS -l nodes=20:ppn=16
aprun -n 320 -N 16 hostname

#PBS -l nodes=2:ppn=16
#PBS -l hostlist=35+36
aprun -n 32 -N 16 hostname

#PBS -l procs=64
aprun -n 64 hostname

#run on login nodes only
#PBS -l procs=0
By default, interactive jobs run from the login node where they were submitted, which can be useful if you need that node's particular features. You can change this default behavior with the TORQUE server parameter interactive_jobs_can_roam. When set to TRUE, this parameter allows interactive jobs to run on login nodes other than the one where they were submitted. For a node to run interactive jobs submitted from other nodes, it must have the alps_login property set in the nodes file.
qmgr -c 'set server interactive_jobs_can_roam = True'
You can override interactive_jobs_can_roam = True for an individual interactive job by submitting it with the option -W login_property=<nodeId>. The job will then run on node <nodeId> and will not roam.
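For example, the following sketch pins an interactive job to a particular login node (login1 is a placeholder node ID):
$ qsub -I -W login_property=login1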
In TORQUE 4.1, Cray compute nodes are treated as TORQUE nodes. This also applies to job allocations. For example, a job can know which nodes were allocated to the job by cat'ing $PBS_NODEFILE.
$ qsub -I -l size=64,walltime=5:00
qsub: waiting for job 3043.cray to start
qsub: job 3043.cray ready
$ cat $PBS_NODEFILE | wc -l
64
$ cat $PBS_NODEFILE | uniq
2876
2877
You can also view where your job is running by looking at exec_host in qstat -f output.
$ qstat -f 3043 | grep -A 7 exec_host
    exec_host = 2876/31+2876/30+2876/29+2876/28+2876/27+2876/26+2876/25+2876/2
        4+2876/23+2876/22+2876/21+2876/20+2876/19+2876/18+2876/17+2876/16+2876
        /15+2876/14+2876/13+2876/12+2876/11+2876/10+2876/9+2876/8+2876/7+2876/
        6+2876/5+2876/4+2876/3+2876/2+2876/1+2876/0+2877/31+2877/30+2877/29+28
        77/28+2877/27+2877/26+2877/25+2877/24+2877/23+2877/22+2877/21+2877/20+
        2877/19+2877/18+2877/17+2877/16+2877/15+2877/14+2877/13+2877/12+2877/1
        1+2877/10+2877/9+2877/8+2877/7+2877/6+2877/5+2877/4+2877/3+2877/2+2877
        /1+2877/0
The login node is not included in $PBS_NODEFILE or exec_host. It can be viewed through qstat -f on the login_node_id field.
$ qstat -f 3043 | grep login_node_id login_node_id = login6/0
pbsnodes reports the job in the jobs = field of the compute nodes on which it runs.
$ pbsnodes 2878
    state = job-exclusive,busy
    np = 32
    ntype = cluster
    jobs = 0/3043.cray, 1/3043.cray, 2/3043.cray, 3/3043.cray, 4/3043.cray, 5/3043.cray, 6/3043.cray, 7/3043.cray, 8/3043.cray, 9/3043.cray, 10/3043.cray, 11/3043.cray, 12/3043.cray, 13/3043.cray, 14/3043.cray, 15/3043.cray, 16/3043.cray, 17/3043.cray, 18/3043.cray, 19/3043.cray, 20/3043.cray, 21/3043.cray, 22/3043.cray, 23/3043.cray, 24/3043.cray, 25/3043.cray, 26/3043.cray, 27/3043.cray, 28/3043.cray, 29/3043.cray, 30/3043.cray, 31/3043.cray
    status = rectime=1340836785,node_index=2814,state=BUSY,totmem=33554432kb,CMEMORY=32768,APROC=0,CPROC=32,name=c1-0c2s0n2,ARCH=XT
    mom_service_port = 15002
    mom_manager_port = 15003
    gpus = 0
pbsnodes does not list Cray compute jobs in the login node's jobs = field (login-only jobs do appear there). Cray compute jobs can be seen in the jobs= attribute within the status= field.
$ pbsnodes login6
    state = free
    np = 24
    properties = fa,fb
    ntype = cluster
    jobs =
    status = rectime=1340836830,varattr=,jobs=3043.cray,state=free,netload=3044161434,gres=,loadave=0.04,ncpus=6,physmem=7934560kb,availmem=12820180kb,totmem=15934044kb,idletime=1754,nusers=3,nsessions=29,sessions=1506 1923 1179 2127 2163 2176 2187 2195 2262 2275 2308 5809 8123 8277 8675 9547 10515 10551 12769 32351 14430 14518 22380 24082 24321 24849 30918 32371 32718,uname=Linux cray 2.6.35-32-generic #67-Ubuntu SMP Mon Mar 5 19:39:49 UTC 2012 x86_64,opsys=linux
    mom_service_port = 34001
    mom_manager_port = 34002
    gpus = 0