Copyright © 2012 Adaptive Computing Enterprises, Inc.
This document provides information on the steps to install Moab 7.2.0 and TORQUE 4.1.0 on a Cray XT system.
Moab and TORQUE can be used to manage the batch system for Cray. This document describes how to configure Moab and TORQUE to bring Moab's unmatched scheduling capabilities to the Cray.
New in TORQUE 4.1, TORQUE itself (specifically the pbs_mom) handles all communication with ALPS. Previously, communication with ALPS was handled by a combination of Moab, scripts, and TORQUE. In the new model, Moab treats TORQUE as a regular TORQUE cluster without any special configuration. TORQUE uses an extra MOM, called the alps_reporter MOM, to communicate with ALPS regarding configured and available resources. From the information reported by the alps_reporter MOM, TORQUE creates a virtual node for each Cray compute node. Previously, TORQUE reported only the login nodes.
Note: For clarity, this document assumes that your SDB node mounts a persistent /var file system from the boot node. If you have chosen not to use persistent /var file systems, the instructions below must be modified for your situation.
When upgrading to TORQUE 4.1.0 and using the new Cray model as described in this document, there should be no running jobs. Jobs may be queued but not running.
Perform the following steps from the boot node as root:
Many of the following examples reflect a specific setup and must be modified to fit your unique configuration.
Download the latest TORQUE release.
Example 1. Download TORQUE
# cd /rr/current/software
# wget http://www.adaptivecomputing.com/resources/downloads/torque/torque-4.1.0.tar.gz
Unpack the TORQUE tarball in an xtopview session
Using xtopview, unpack the TORQUE tarball into the software directory in the shared root.
Example 2. Unpack TORQUE
# xtopview
default/:/ # cd /software
default/:/software # tar -zxvf torque-4.1.0.tar.gz
While still in xtopview, run configure with the options set appropriately for your installation. Run ./configure --help to see a list of configure options. Adaptive Computing recommends installing the TORQUE binaries into /opt/torque/$version and establishing a symbolic link to it from /opt/torque/default. At a minimum, you will need to specify the host name where the TORQUE server will run (--with-default-server) if it is different from the host on which TORQUE is being compiled. The TORQUE server typically runs on the SDB node of your Cray system.
Example 3. Run configure
default/:/software # cd torque-4.1.0
default/:/software/torque-4.1.0 # ./configure --prefix=/opt/torque/4.1.0 --with-server-home=/var/spool/torque --with-default-server=sdb --enable-syslog --disable-gcc-warnings --with-debug --with-modulefiles=/opt/modulefiles --with-job-create CFLAGS="-DCRAY_MOAB_PASSTHRU"
Note: The --with-job-create option applies to TORQUE 2.5.9 and later; it is not necessary on 2.4.16. Sites running TORQUE 2.5.x should upgrade to 2.5.9 or later.
Note: The -DCRAY_MOAB_PASSTHRU option tells TORQUE to not validate the qsub -l nodes syntax. For more information, see Submitting Jobs.
You must unload the current module:
# module unload moab torque
Because xtopview may also load the old versions of Moab and TORQUE, it is good practice to unload and reload the modules after the install so that you have the correct binaries in your path.
While still in xtopview, compile and install TORQUE into the shared root. Create a link to the installed TORQUE. Exit xtopview.
Example 4. Make and Make Install
default/:/software/torque-4.1.0 # make
default/:/software/torque-4.1.0 # make packages
default/:/software/torque-4.1.0 # make install
default/:/software/torque-4.1.0 # ln -sf /opt/torque/4.1.0/ /opt/torque/default
default/:/software/torque-4.1.0 # exit
After installing, run module list to see which versions are loaded. If the versions are incorrect, unload and reload the modules to confirm you are using the correct versions.
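For example, a minimal check might look like the following sketch (this assumes the modules are named moab and torque, as in the unload command above):
# module list
# module unload moab torque
# module load moab torque
# module list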
Copy your TORQUE server directory to your moab server host
Example 5. On the boot node, copy the TORQUE home directory to the SDB node's persistent /var file system (as exported from the boot node). This example assumes that the SDB is NID 3 and that you are installing the TORQUE server on the SDB. These instructions must be modified if the Moab and TORQUE servers are being installed on a different node.
# cd /rr/current/var/spool
# cp -pr torque /snv/3/var/spool
Set up pbs_server to be Cray compatible
Customize the nodes file located in <TORQUE HOME>/server_priv/nodes.
sdb alps_reporter
We recommend that you set up the SDB node as the ALPS reporter. Setting the NP for this node isn't important because this node will not appear in pbsnodes output and, therefore, will not be scheduled to run jobs.
Identify all login nodes using the reserved feature alps_login.
login1 alps_login np=X <other features>
login2 alps_login np=Y <other features>
login3 alps_login np=Z <other features>
...
Identifying these moms as login nodes allows pbs_server to verify that each job has a login node as its mother superior. It also tells pbs_server to place size=0 jobs on one of these login nodes.
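Putting the pieces together, a complete server_priv/nodes file for a system with three login nodes might look like the following sketch (the np values and host names are placeholders for your site's values):
sdb alps_reporter
login1 alps_login np=12
login2 alps_login np=12
login3 alps_login np=12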
$ qmgr -c 'set server acl_host_enable=true'
$ qmgr -c 'set server acl_hosts+=login1'
$ qmgr -c 'set server acl_hosts+=login2'
$ qmgr -c 'set server acl_hosts+=login3'
$ qmgr -c 'set server submit_hosts+=login1'
$ qmgr -c 'set server submit_hosts+=login2'
$ qmgr -c 'set server submit_hosts+=login3'
$ qmgr -c 'set server scheduling = true'
This parameter tells pbs_server to notify Moab when pertinent events have happened. If this isn't set, Moab will automatically set it on startup.
$ qmgr -c 'set server keep_completed = 300'
This tells TORQUE to keep information about completed jobs for 300 seconds (5 minutes) after they have completed. You can customize this number to meet your site's needs.
$ qmgr -c 'set server cray_enabled = true'
After using qmgr to set this parameter, you will need to restart pbs_server so that when it parses the nodes file, it knows it is Cray-enabled.
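A minimal restart sequence, assuming the install locations used throughout this document, might look like this:
# /opt/torque/default/bin/qterm
# /opt/torque/default/sbin/pbs_server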
Install the pbs_server init.d script on the server (Optional)
TORQUE provides an init.d script for starting pbs_server as a service.
Example 6. Copy in init.d script
# xtopview -n <sdb nid>
node/<sdb nid>:/ # cp /software/torque-4.1.0/contrib/init.d/suse.pbs_server /etc/init.d/pbs_server
node/<sdb nid>:/ # chmod +x /etc/init.d/pbs_server
node/<sdb nid>:/ # chkconfig --add pbs_server
Edit the init.d file as necessary -- i.e., change PBS_DAEMON and PBS_HOME as appropriate.
# vi /etc/init.d/pbs_server
PBS_DAEMON=/opt/torque/default/sbin/pbs_server
PBS_HOME=/var/spool/torque
Install the pbs_mom init.d script on the login nodes (Optional)
TORQUE provides an init.d script for starting pbs_mom as a service.
Example 7. Copy in init.d script
# xtopview
default/:/ # cp /software/torque-4.1.0/contrib/init.d/suse.pbs_mom /etc/init.d/pbs_mom
default/:/ # chmod +x /etc/init.d/pbs_mom
default/:/ # chkconfig --add pbs_mom
Edit the init.d file as necessary -- i.e. change PBS_DAEMON and PBS_HOME as appropriate, retain core files, etc.
# vi /etc/init.d/pbs_mom
PBS_DAEMON=/opt/torque/default/sbin/pbs_mom
PBS_HOME=/var/spool/torque
Uncomment the following line to retain core dump files:
ulimit -c unlimited # Uncomment this to preserve core files
Install the trqauthd init.d script on all TORQUE nodes and the SDB (Optional)
TORQUE provides an init.d script for starting trqauthd as a service.
Example 8. Copy in init.d script
# xtopview
default/:/ # cp /software/torque-4.1.0/contrib/init.d/suse.trqauthd /etc/init.d/trqauthd
default/:/ # chmod +x /etc/init.d/trqauthd
default/:/ # chkconfig --add trqauthd
Edit the init.d file as necessary -- i.e. change PBS_DAEMON and PBS_HOME as appropriate.
# vi /etc/init.d/trqauthd
PBS_DAEMON=/opt/torque/default/sbin/trqauthd
PBS_HOME=/var/spool/torque
Stage out MOM dirs to login nodes
Stage out the MOM dirs and client server info on all login nodes. This example assumes you are using persistent /var file systems mounted from /snv on the boot node. Alternatively, if you use a RAM /var file system, it is populated from a skeleton tarball on the boot node (/rr/current/.shared/var-skel.tgz), and these files must be added to that tarball. The example below assumes that you have three login nodes with NIDs of 4, 64, and 68. Place the host name of the SDB node in the server_name file.
Example 9. Copy out MOM dirs and client server info
# cd /rr/current/software/torque-4.1.0/tpackages/mom/var/spool
# for i in 4 64 68; \
do cp -pr torque /snv/$i/var/spool; \
echo sdb > /snv/$i/var/spool/torque/server_name; \
done
Note: It is possible that the host name for the SDB node is not set to sdb on your system. Run ssh sdb hostname to determine the host name in use. If the command returns, for example, sdb-p1, modify the "for loop" above to echo sdb-p1 into the server_name file.
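For example, the check might look like this (sdb-p1 is simply the illustrative host name from the note above):
# ssh sdb hostname
sdb-p1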
Update the TORQUE MOM config file for the ALPS reporter mom
In the steps above, we identified the ALPS reporter MOM to pbs_server. We now need to configure that MOM, which is installed on the SDB, to act as the ALPS reporter. To do so, set the following in the pbs_mom config file on the SDB:
# vi var/spool/torque/mom_priv/config
$reporter_mom true # defaults to false
You may also wish to set these variables:
$apbasil_path <path_to_apbasil> # defaults to /usr/bin/apbasil if not set
$apbasil_protocol <protocol> # defaults to 1.0 if not set
As of CLE 5.0, apbasil is located at /opt/cray/alps/default/bin/apbasil, not /usr/bin/apbasil. Supported apbasil protocols are 1.0, 1.1, and 1.2.
Cray systems do not support GPUs until ALPS version 1.2. Setting $apbasil_protocol 1.2 in mom_priv/config causes the GPU status to appear in the pbsnodes output.
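Putting these settings together, a mom_priv/config for the reporter MOM on a CLE 5.0 system might look like the following sketch; adjust the apbasil path and protocol for your system:
$reporter_mom true
$apbasil_path /opt/cray/alps/default/bin/apbasil
$apbasil_protocol 1.2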
Update the TORQUE MOM config file on each login node
Login nodes are service nodes that run pbs_mom and are used to submit and launch job scripts. Login nodes are responsible for creating and confirming ALPS reservations so that the script launched on a login node can access the compute nodes with the aprun command.
Edit the MOM config file so job output is copied to locally mounted directories.
Example 10. Edit the MOM config file
# vi var/spool/torque/mom_priv/config
$usecp *:/home/users /home/users
$usecp *:/scratch /scratch
$login_node true
$login_node specifies that this node will create and confirm ALPS reservations.
Note: It may be acceptable to use $usecp *:/ / in place of the sample above. Consult with the site administrators.
You may also wish to set these variables:
$apbasil_path <path_to_apbasil> # defaults to /usr/bin/apbasil if not set
$apbasil_protocol <protocol> # defaults to 1.0 if not set
As of CLE 5.0, apbasil is located at /opt/cray/alps/default/bin/apbasil, not /usr/bin/apbasil. Supported apbasil protocols are 1.0, 1.1, and 1.2.
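As with the reporter MOM, a consolidated mom_priv/config for a login node might look like this sketch (the $usecp mappings, apbasil path, and protocol are the example values used above; adjust them for your site):
$usecp *:/home/users /home/users
$usecp *:/scratch /scratch
$login_node true
$apbasil_path /opt/cray/alps/default/bin/apbasil
$apbasil_protocol 1.2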
Start up the TORQUE MOM Daemons
On the boot node as root:
Example 11. Start up the pbs_moms on the SDB and login nodes.
# pdsh -w sdb,login[1-3] /opt/torque/default/sbin/pbs_mom
# pdsh -w login[1-3] trqauthd
Alternatively, if you installed the init.d script, you may run:
# pdsh -w sdb,login[1-3] /sbin/service pbs_mom start
# pdsh -w login[1-3] service trqauthd start
On the TORQUE server host as root:
Example 12. Start pbs_server
# /opt/torque/default/sbin/pbs_server
# /opt/torque/default/sbin/trqauthd
Alternatively, if you installed the init.d script, you may run:
# service pbs_server start
# service trqauthd start
Perform the following steps from the boot node as root:
Download the latest Moab release
Download the latest Moab release from Adaptive Computing Enterprises, Inc.
The correct tarball to install is the plain Moab + TORQUE build. The XT4 builds are for releases prior to TORQUE 4.1.0 and MWM 7.2.0.
Example 13. Download Moab to the boot node
# cd /rr/current/software
# wget --post-data="username=<username>&password=<password>&submit=submit&url=/download/mwm/moab-7.2.0-linux-x86_64-torque.tar.gz" https://www.adaptivecomputing.com/myaccount/login.php
Using xtopview, unpack the Moab tarball into the software directory in the shared root.
Example 14. Unpack Moab
# xtopview
default/:/ # cd /software
default/:/software # tar -zxvf moab-7.2.0-linux-x86_64-torque.tar.gz
While still in xtopview, run configure with the options set appropriately for your installation. Run ./configure --help to see a list of configure options. Adaptive Computing recommends installing the Moab binaries into /opt/moab/$version and establishing a symbolic link to it from /opt/moab/default. Since the Moab home directory must be read-write by root, Adaptive Computing recommends you specify the homedir in a location such as /var/spool/moab.
Example 15. Run configure
default/:/software # cd moab-7.2.0
default/:/software/moab-7.2.0 # ./configure --prefix=/opt/moab/7.2.0 --with-homedir=/var/spool/moab --with-torque=/opt/torque/default --with-modulefiles=/opt/modulefiles
While still in xtopview, install Moab into the shared root. You may also need to link /opt/moab/default to this installation.
Example 16. Make Install
default/:/software/moab-7.2.0 # make install
default/:/software/moab-7.2.0 # ln -sf /opt/moab/7.2.0/ /opt/moab/default
Customize the Moab configuration file for your Moab server host
The moab.cfg file should be customized for your scheduling environment. We will use /rr/current/var/spool/moab as a temporary staging area before copying the files out to their final destinations. See the Moab Admin Guide for more details about Moab configuration parameters.
Example 17. Edit the Moab configuration file
# cd /rr/current/var/spool/moab
# vi moab.cfg
SCHEDCFG[moab] SERVER=sdb:42559
RMCFG[<clustername>] TYPE=TORQUE
NODEACCESSPOLICY SINGLEJOB
NODEALLOCATIONPOLICY PRIORITY
NODECFG[DEFAULT] PRIORITYF=-NODEINDEX
JOBMAXTASKCOUNT <total number of processors>
MAXNODE <total number of nodes>
By default, ALPS reports the compute nodes in a serialized topology order. TORQUE preserves this ordering by reporting a node_index on each compute node that represents the compute node's placement in the ALPS topology ordering. Moab then uses this information to allocate nodes close to each other in the network. The downside is that the nodes can become fragmented. The NODEALLOCATIONPOLICY PRIORITY parameter, used with NODECFG[DEFAULT] PRIORITYF=-NODEINDEX, tells Moab to allocate nodes based on the node_index reported by TORQUE, beginning with the first nodes in the list (-1 x node_index of 1).
It is also possible to use the same node indexes reported by TORQUE to allocate strict contiguous sets of nodes. This is configured by specifying a NODEALLOCATIONPOLICY of CONTIGUOUS. In this mode, a job won't run until it can get a strict set of contiguous nodes.
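A minimal sketch of that alternative in moab.cfg:
NODEALLOCATIONPOLICY CONTIGUOUS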
When Moab allocates nodes on the Cray, it must get only compute nodes. The purpose of the login nodes is to create and confirm ALPS reservations so that the job script can access the allocated compute nodes. Moab shifts the responsibility of selecting a login node for the allocated compute nodes to TORQUE. Because Moab doesn't allocate a login node along with compute nodes, the login nodes must be kept separate from the compute nodes so that Moab doesn't allocate login nodes and compute nodes for the same job. This is accomplished by putting the login nodes in a separate partition; Moab does not allocate jobs across partitions. By default, Moab creates a partition for each RMCFG with the name given in the RMCFG parameter and places all nodes reported by that resource manager in that partition.
With the login and compute nodes now separated, configure all jobs to request the compute partition by default.
Place the login nodes in a separate partition called login. For example:
NODECFG[login1] Partition=login
NODECFG[login2] Partition=login
NODECFG[login3] Partition=login
NODECFG[login4] Partition=login
Configure all jobs submitted through msub to request the compute node partition by default.
CLIENTCFG[DEFAULT] DEFAULTSUBMITPARTITION=<clustername>
Configure all jobs submitted through qsub to request the compute node partition by default.
qmgr -c "set server resources_default.partition=<clustername>"
Login nodes can be requested to run jobs that don't require Cray compute nodes (for example, compile jobs or data transfer jobs). These jobs can be submitted to the login partition (for example, qsub -l partition=login).
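Taken together, the partition-related portion of moab.cfg might look like the following sketch (the cluster name and login node names are placeholders):
SCHEDCFG[moab] SERVER=sdb:42559
RMCFG[<clustername>] TYPE=TORQUE
NODECFG[login1] Partition=login
NODECFG[login2] Partition=login
NODECFG[login3] Partition=login
CLIENTCFG[DEFAULT] DEFAULTSUBMITPARTITION=<clustername>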
Copy your Moab home directory to your Moab server host
In this example, we assume the Moab server will run on the SDB node. If you are installing Moab with its server home in /var (as in this example), and your /var file system is served from the boot node under /snv, you will need to log in to the SDB and determine its NID with cat /proc/cray_xt/nid.
Example 18. Copy out Moab home directory. This example assumes that the SDB is NID 3.
# cd /rr/current/var/spool
# cp -pr moab /snv/3/var/spool
Copy the Moab configuration file to all of the login nodes
The Moab configuration file (moab.cfg) must be copied out to the /var file system on the login nodes. The essential parameters that must be in the moab.cfg on the login nodes are the SCHEDCFG line (so the clients can find the server) and any client-specific parameters, such as CLIENTCFG.
Example 19. Copy out the configuration files.
# cd /rr/current/var/spool/moab
# for i in 4 64 68; do mkdir -p /snv/$i/var/spool/moab/etc /snv/$i/var/spool/moab/log; cp moab.cfg /snv/$i/var/spool/moab; done
Install the Moab init.d script (Optional)
Moab provides an init.d script for starting Moab as a service. Using xtopview into the SDB node, copy the init script into /etc/init.d.
Example 20. Copy in init.d script to the SDB node from the shared root.
# xtopview -n <sdb nid>
node/<sdb nid>:/ # cp /software/moab-7.2.0/contrib/init.d/moab_sles_init /etc/init.d/moab
node/<sdb nid>:/ # chkconfig --add /etc/init.d/moab
Edit the init.d file as necessary -- i.e. retain core files, etc.
Uncomment the following line to retain core dump files
ulimit -c unlimited # Uncomment to preserve core files
Perform the following steps from the Moab server node (sdb) as root:
The MOABHOMEDIR environment variable must be set in your environment when starting Moab or using Moab commands. If you are on a system with a large number of nodes (thousands), you will need to increase your stack limit to unlimited. You will also want to adjust your path to include the Moab and TORQUE bin and sbin directories. The proper environment can be established by loading the appropriate Moab module, by sourcing properly edited login files, or by directly modifying your environment variables.
Example 21. Loading the Moab module
# module load moab
Example 22. Exporting the environment variables by hand (in bash)
# export MOABHOMEDIR=/var/spool/moab
# export PATH=$PATH:/opt/moab/default/bin:/opt/moab/default/sbin:/opt/torque/default/bin:/opt/torque/default/sbin
Example 23. Setting the stack limit to unlimited
If you are running on a system with a large number of nodes (thousands), you may need to increase the stack size user limit to unlimited. This should be set in the shell from which Moab is launched. If you start Moab via an init script, set the limit in the script; otherwise, put it in the appropriate shell startup file for root.
# ulimit -s unlimited
Start up the Moab Workload Manager
Start up the Moab daemon.
Example 24. Start Moab
# /opt/moab/default/sbin/moab
Alternatively, if you installed the init.d script, you may run:
# service moab start
Previously, Moab and TORQUE had to run inside the Cray network. With the new model, it is now possible to run Moab and pbs_server outside of the Cray. This allows Moab and pbs_server to run on more capable hardware than the service nodes provide and enables the use of Moab and TORQUE's high-availability features. Jobs can also be submitted and queued while the Cray is down. In order to set up Moab and TORQUE on an external node, pbs_server must be able to communicate with the login nodes inside the Cray on ports 15002 and 15003, and the login nodes must be able to communicate with pbs_server on port 15001.
[root@ext-server /]# telnet login1 15002
Trying XXX.XXX.XXX.XXX...
Connected to login1
Escape character is '^]'.

[root@ext-server /]# telnet login1 15003
Trying XXX.XXX.XXX.XXX...
Connected to login1
Escape character is '^]'.

[root@login1 /]# telnet ext-server 15001
Trying XXX.XXX.XXX.XXX...
Connected to ext-server
Escape character is '^]'.
System reservations can be set up in several ways. 1) Only the compute nodes can be reserved, leaving the login nodes available for executing non-compute jobs.
SRCFG[PM] TASKCOUNT=7832
SRCFG[PM] HOSTLIST=!login
SRCFG[PM] PERIOD=DAY DAYS=TUE
SRCFG[PM] FLAGS=OWNERPREEMPT
SRCFG[PM] STARTTIME=8:00:00 ENDTIME=14:00:00
SRCFG[PM] JOBATTRLIST=PREEMPTEE
SRCFG[PM] TRIGGER=EType=start,Offset=300,AType=internal,Action="rsv::modify:acl:jattr-=PREEMPTEE"
SRCFG[PM] TRIGGER=EType=start,Offset=-60,AType=jobpreempt,Action="cancel"
2) Only the login nodes can be reserved, leaving the compute nodes available for execution.
SRCFG[PM] TASKCOUNT=192
SRCFG[PM] HOSTLIST=login
SRCFG[PM] PERIOD=DAY DAYS=TUE
SRCFG[PM] FLAGS=OWNERPREEMPT
SRCFG[PM] STARTTIME=8:00:00 ENDTIME=14:00:00
SRCFG[PM] JOBATTRLIST=PREEMPTEE
SRCFG[PM] TRIGGER=EType=start,Offset=300,AType=internal,Action="rsv::modify:acl:jattr-=PREEMPTEE"
SRCFG[PM] TRIGGER=EType=start,Offset=-60,AType=jobpreempt,Action="cancel"
3) The whole system can be reserved, preventing any kind of job from starting.
SRCFG[PM] HOSTLIST=ALL
SRCFG[PM] PERIOD=DAY DAYS=TUE
SRCFG[PM] FLAGS=OWNERPREEMPT
SRCFG[PM] STARTTIME=8:00:00 ENDTIME=14:00:00
SRCFG[PM] JOBATTRLIST=PREEMPTEE
SRCFG[PM] TRIGGER=EType=start,Offset=300,AType=internal,Action="rsv::modify:acl:jattr-=PREEMPTEE"
SRCFG[PM] TRIGGER=EType=start,Offset=-60,AType=jobpreempt,Action="cancel"
This feature works with TORQUE 4.1.2 or later.
Set up the Cray as described previously. In Moab, do not place the login nodes in a separate partition. Add the external nodes to the nodes file (<TORQUEHOME>/server_priv/nodes) with a feature named external.
<hostname> np=X external
In pbsnodes or mdiag -n -v output the login nodes have the feature alps_login and the Cray compute nodes have the cray_compute feature. These are automatically added by pbs_server.
To request a heterogeneous job:
> qsub job_script.sh -l nodes=X:cray_compute+Y:external
In the example above, Moab assigns X number of Cray compute nodes and Y number of external nodes and passes the information appropriately to pbs_server.
To request a cray only job:
> qsub job_script.sh -l nodes=X:cray_compute
To request an external only job:
> qsub job_script.sh -l nodes=X:external
To request a login-only job (such as a job that compiles the code to be run on Cray compute nodes):
> qsub job_script.sh -l nodes=X:alps_login
In this setup, all jobs must request node features. If they do not, Moab might try to schedule a job onto the login nodes, or other issues might occur. This should be enforced by setting defaults or by using a submit filter (see the Applying the msub Submit Filter section for more information).
Once configured as above, the same job script is launched both on the external nodes and on the Cray. For this job to function properly, the job script must detect whether it is running on the Cray and then execute the appropriate commands. We recommend that the script inspect the file whose path is contained in the variable $PBS_NODEFILE; it lists the host names on which the job is executing, one per line, with the first line being the host on which the script itself is executing. The script should read this line, decide whether that host is external to or inside the Cray, and then execute accordingly.
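A minimal sketch of such a check is shown below. The assumption here is that external node host names can be distinguished by a prefix such as ext (substitute whatever naming convention your site uses); my_external_workload and my_cray_workload are hypothetical commands.
#!/bin/bash
# Read the first host name in the node file; this is the host running the script.
first_host=$(head -n 1 "$PBS_NODEFILE")
case "$first_host" in
  ext*)   # assumed naming convention for nodes external to the Cray
    echo "Running on external node $first_host"
    ./my_external_workload    # hypothetical command
    ;;
  *)      # otherwise we are on a Cray login node
    echo "Running inside the Cray from login node $first_host"
    aprun -n $(($PBS_NUM_NODES * $PBS_NUM_PPN)) -N $PBS_NUM_PPN ./my_cray_workload    # hypothetical binary
    ;;
esac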
There are three different ways to submit jobs to the Cray. Each works differently and shouldn't be mixed with the others.
-l nodes= is the standard way of submitting jobs to Moab. It is the recommended way because it is the most supported and standard way of submitting jobs across all types of systems managed by Moab. One benefit of the -l nodes= syntax is that you can submit multi-req jobs (for example, -l nodes=2:ppn=3+3:ppn=2). When using the -l nodes= syntax, TORQUE should be compiled with the -DCRAY_MOAB_PASSTHRU option. By default, -l nodes= requests the number of processors, not nodes. If you want -l nodes= to request nodes, add JOBNODEMATCHPOLICY EXACTNODE to your moab.cfg.
-l size= was created to be a very simple interface for submitting to the Cray. It requests the number of one-proc tasks to submit on the Cray. Customers that use this option usually have a submit filter (see the sketch below) that verifies that the number of tasks requested is a multiple of the number of processors per node and rejects the submission if it isn't.
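A minimal sketch of such a submit filter follows. It assumes 32 processors per node and only inspects #PBS directives in the submitted script; TORQUE submit filters receive the job script on stdin, must echo it to stdout, and reject the job by exiting non-zero.
#!/bin/bash
# Hypothetical submit filter: reject -l size= requests that are not a multiple of the per-node core count.
CORES_PER_NODE=32   # assumed value; set this to your system's processors per node
while IFS= read -r line; do
  echo "$line"      # pass the script through unchanged
  if [[ "$line" =~ ^#PBS.*size=([0-9]+) ]]; then
    size=${BASH_REMATCH[1]}
    if (( size % CORES_PER_NODE != 0 )); then
      echo "Rejected: size=$size is not a multiple of $CORES_PER_NODE" >&2
      exit 1        # non-zero exit causes qsub to reject the job
    fi
  fi
done
exit 0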
-l mpp*= is standard among Cray systems and is recognized by Moab/TORQUE and PBS-run systems. Most of the mpp options have an equivalent -l nodes= option.
| mppwidth | |
|---|---|
| Format | <INTEGER> |
| Default | --- |
| Description | Number of tasks. |
| Example | qsub -l mppwidth=48 (Requests 48 tasks of 1 processor each.) |
| Equivalent | qsub -l nodes=4 |

| mppnppn | |
|---|---|
| Format | <INTEGER> |
| Default | --- |
| Description | Number of tasks per node. |
| Example | qsub -l mppwidth=48,mppnppn=8 (Requests 48 tasks with 8 tasks per node.) |
| Equivalent | qsub -l nodes=48:ppn=8 |

| mppdepth | |
|---|---|
| Format | <INTEGER> |
| Default | --- |
| Description | Number of processors per task. |
| Example | qsub -l mppwidth=24,mppdepth=2,mppnppn=8 (Requests 24 tasks with 2 processors per task and 8 tasks per node.) |

| mpparch | |
|---|---|
| Format | <STRING> |
| Default | --- |
| Description | Required architecture. |
| Example | qsub -l mpparch=xt5 (Specifies that the job must run on xt5 architecture nodes.) |
| Equivalent | qsub -l arch=xt5 |

| mppmem | |
|---|---|
| Format | <INTEGER>[kb\|mb\|gb] |
| Default | --- |
| Description | Dedicated memory per task in bytes. |
| Example | qsub -l mppmem=200mb (Specifies that the job requires 200mb per task.) |
| Equivalent | qsub -l mem=200mb |
The mppnppn and mppwidth parameters work with CLASSCFG MIN.NODE and MAX.NODE to filter classes based on how many nodes the job will use. The other parameters have not been tested.
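For illustration, a class that only accepts jobs using between 1 and 64 nodes might be configured in moab.cfg as follows (the class name and limits are placeholders):
CLASSCFG[small] MIN.NODE=1 MAX.NODE=64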
It is possible to direct a job to launch from a specific login node. This is done by assigning node features to specific login nodes and requesting these features at submission time with the -W login_property option. The login_property has no influence on which compute nodes are allocated to the job.
Example 25. Declaring MOM features
For example, if login2 had the himem feature, a job could request that its job launch from login2 rather than login1.
# vi /var/spool/torque/server_priv/nodes
login1 alps_login np=200
login2 alps_login np=200 himem
qsub -W login_property=himem
Point Moab to the qsub binary on the server where Moab is running (ex. sdb).
RMCFG[] SUBMITCMD=/opt/torque/default/bin/qsub
Set up Moab to schedule nodes when -l nodes is requested.
JOBNODEMATCHPOLICY EXACTNODE
Because Moab uses qsub to submit msub'd jobs, qsub must be configured not to validate the path of the working directory on the sdb, as it may not exist there (for example, msub -d /users/jdoe/tmpdir). Add VALIDATEPATH FALSE to the torque.cfg on the sdb.
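For example, assuming the TORQUE server home used throughout this document:
# vi /var/spool/torque/torque.cfg
VALIDATEPATH FALSE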
As of TORQUE 2.4.11, the node count and the processors per node count can be obtained in the job's environment through $PBS_NUM_NODES and $PBS_NUM_PPN respectively. This aids in mapping the requested nodes to aprun calls. For example, the general format for calling aprun within a job script is:
aprun -n $(($PBS_NUM_NODES * $PBS_NUM_PPN)) -N $PBS_NUM_PPN
Example submissions:
#PBS -l nodes=1:ppn=16
aprun -n 16 -N 16 hostname

#PBS -l nodes=20
aprun -n 20 hostname

#PBS -l nodes=20:ppn=16
aprun -n 320 -N 16 hostname

#PBS -l nodes=2:ppn=16
#PBS -l hostlist=35+36
aprun -n 32 -N 16 hostname

#PBS -l procs=64
aprun -n 64 hostname

#run on login nodes only
#PBS -l procs=0
By default, interactive jobs run from the login node where they were submitted, which can be useful if you need that node's particular features. You can change this default behavior with the TORQUE server parameter interactive_jobs_can_roam. When set to TRUE, this parameter allows interactive jobs to run on login nodes other than the one where they were submitted. For a node to run interactive jobs submitted from other nodes, it must have the alps_login property set in the nodes file.
qmgr -c 'set server interactive_jobs_can_roam = True'
You can override interactive_jobs_can_roam = True for an individual interactive job by submitting it with the option -W login_property=<nodeId>. The job will then run on node <nodeId> and will not roam.
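For example, the following sketch pins an interactive job to a particular login node (login1 is a placeholder node ID):
$ qsub -I -W login_property=login1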
In TORQUE 4.1, Cray compute nodes are treated as TORQUE nodes. This also applies to job allocations. For example, a job can know which nodes were allocated to the job by cat'ing $PBS_NODEFILE.
$ qsub -I -l size=64,walltime=5:00
qsub: waiting for job 3043.cray to start
qsub: job 3043.cray ready
$ cat $PBS_NODEFILE | wc -l
64
$ cat $PBS_NODEFILE | uniq
2876
2877
You can also view where your job is running by looking at exec_host in qstat -f output.
$ qstat -f 3043 | grep -A 7 exec_host
    exec_host = 2876/31+2876/30+2876/29+2876/28+2876/27+2876/26+2876/25+2876/2
        4+2876/23+2876/22+2876/21+2876/20+2876/19+2876/18+2876/17+2876/16+2876
        /15+2876/14+2876/13+2876/12+2876/11+2876/10+2876/9+2876/8+2876/7+2876/
        6+2876/5+2876/4+2876/3+2876/2+2876/1+2876/0+2877/31+2877/30+2877/29+28
        77/28+2877/27+2877/26+2877/25+2877/24+2877/23+2877/22+2877/21+2877/20+
        2877/19+2877/18+2877/17+2877/16+2877/15+2877/14+2877/13+2877/12+2877/1
        1+2877/10+2877/9+2877/8+2877/7+2877/6+2877/5+2877/4+2877/3+2877/2+2877
        /1+2877/0
The login node is not included in $PBS_NODEFILE or exec_host. It can be viewed through qstat -f on the login_node_id field.
$ qstat -f 3043 | grep login_node_id login_node_id = login6/0
pbsnodes reports the job in the jobs = field of the compute nodes on which it runs.
$ pbsnodes 2878
    state = job-exclusive,busy
    np = 32
    ntype = cluster
    jobs = 0/3043.cray, 1/3043.cray, 2/3043.cray, 3/3043.cray, 4/3043.cray, 5/3043.cray, 6/3043.cray, 7/3043.cray, 8/3043.cray, 9/3043.cray, 10/3043.cray, 11/3043.cray, 12/3043.cray, 13/3043.cray, 14/3043.cray, 15/3043.cray, 16/3043.cray, 17/3043.cray, 18/3043.cray, 19/3043.cray, 20/3043.cray, 21/3043.cray, 22/3043.cray, 23/3043.cray, 24/3043.cray, 25/3043.cray, 26/3043.cray, 27/3043.cray, 28/3043.cray, 29/3043.cray, 30/3043.cray, 31/3043.cray
    status = rectime=1340836785,node_index=2814,state=BUSY,totmem=33554432kb,CMEMORY=32768,APROC=0,CPROC=32,name=c1-0c2s0n2,ARCH=XT
    mom_service_port = 15002
    mom_manager_port = 15003
    gpus = 0
pbsnodes does not list Cray compute jobs in the login node's jobs = field (login-only jobs do appear there). Cray compute jobs can be seen in the jobs= attribute within the status= field.
$ pbsnodes login6
    state = free
    np = 24
    properties = fa,fb
    ntype = cluster
    jobs =
    status = rectime=1340836830,varattr=,jobs=3043.cray,state=free,netload=3044161434,gres=,loadave=0.04,ncpus=6,physmem=7934560kb,availmem=12820180kb,totmem=15934044kb,idletime=1754,nusers=3,nsessions=29,sessions=1506 1923 1179 2127 2163 2176 2187 2195 2262 2275 2308 5809 8123 8277 8675 9547 10515 10551 12769 32351 14430 14518 22380 24082 24321 24849 30918 32371 32718,uname=Linux cray 2.6.35-32-generic #67-Ubuntu SMP Mon Mar 5 19:39:49 UTC 2012 x86_64,opsys=linux
    mom_service_port = 34001
    mom_manager_port = 34002
    gpus = 0