Appendix L: TORQUE Quick Start Guide
L.1 Initial Installation
- Download the TORQUE distribution file from http://clusterresources.com/downloads/torque
- Extract and build the distribution on the machine that will act as the "TORQUE server" - the machine that will monitor and control all compute nodes by running the pbs_server daemon. See the example below:
> tar -xzvf torque.tar.gz
> cd torque
> ./configure
> make
> make install
OSX 10.4 users need to change the #define __TDARWIN in src/include/pbs_config.h to #define __TDARWIN_8.
After installation, verify that your PATH environment variable includes /usr/local/bin/ and /usr/local/sbin/. Client commands are installed to /usr/local/bin and server binaries are installed to /usr/local/sbin.
In this document, TORQUE_HOME corresponds to where TORQUE stores its configuration files. The default is /var/spool/torque.
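As a minimal sketch of the PATH check described above, the following POSIX shell snippet reports whether each default install directory is on the search path. The helper names path_has and check_torque_path are illustrative and not part of TORQUE; adjust the directories if TORQUE was installed somewhere other than the default location.

```shell
#!/bin/sh
# path_has DIR LIST: succeed if DIR (with or without a trailing slash)
# appears in the colon-separated LIST.
path_has() {
    case ":$2:" in
        *":$1:"* | *":$1/:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# check_torque_path LIST: print one status line per default TORQUE directory.
check_torque_path() {
    for dir in /usr/local/bin /usr/local/sbin; do
        if path_has "$dir" "$1"; then
            echo "ok: $dir"
        else
            echo "missing: $dir"
        fi
    done
}

check_torque_path "$PATH"
```

Any directory reported as missing can be added in your shell startup file before continuing.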
L.2 Initialize/Configure TORQUE on the Server (pbs_server)
- Once installation on the TORQUE server is complete, configure the pbs_server daemon by running torque.setup <USER>, a script packaged with the distribution source code, where <USER> is the username that will act as the TORQUE admin. This script sets up a basic batch queue to get you started. If you experience problems, make sure that the most recent TORQUE executables are the ones being run and that they are in your current PATH.
- If doing this step manually, be certain to run the command 'pbs_server -t create' to create the new batch database. If this step is not taken, the pbs_server daemon will be unable to start.
- Proper server configuration can be verified by following the steps listed in Section 1.4 Testing.
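For the manual route, the sketch below prints qmgr directives for a single execution queue, roughly the kind of basic setup the text attributes to torque.setup. The function name, the queue name batch, and the specific attributes are assumptions for illustration, not a copy of the script; review them before applying.

```shell
#!/bin/sh
# basic_queue_directives USER HOST: print qmgr directives that grant USER@HOST
# admin rights and define one execution queue. The queue name "batch" and the
# attribute choices are illustrative assumptions.
basic_queue_directives() {
    cat <<EOF
set server managers += $1@$2
set server operators += $1@$2
create queue batch
set queue batch queue_type = Execution
set queue batch enabled = True
set queue batch started = True
set server default_queue = batch
set server scheduling = True
EOF
}

# Usage on the TORQUE server (<USER> is the TORQUE admin):
#   pbs_server -t create
#   basic_queue_directives <USER> "$(hostname)" | qmgr
```

Printing the directives first, rather than applying them directly, makes it easy to inspect or save them before they reach the server.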
L.3 Install TORQUE on the Compute Nodes
To configure a compute node do the following on each machine (see page 19, Section 3.2.1 of PBS Administrator's Manual for full details):
- Create the self-extracting, distributable packages with make packages (see the INSTALL file for additional options and features of the distributable packages) and use the parallel shell command from your cluster management suite to copy and execute the package on all nodes (e.g., xCAT users might do prcp torque-package-linux-i686.sh main:/tmp/; psh main /tmp/torque-package-linux-i686.sh --install). Optionally, distribute and install the clients package.
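Where no parallel shell is available, a plain loop over scp and ssh can do the same job. This is a sketch only: install_on_nodes and the DRY_RUN switch are hypothetical helpers, and it assumes passwordless SSH access with install privileges on every node.

```shell
#!/bin/sh
# install_on_nodes PACKAGE NODE...: copy a self-extracting package to each
# node and run it with --install. When DRY_RUN is set to a non-empty value,
# the commands are printed instead of executed.
install_on_nodes() {
    pkg="$1"; shift
    remote="/tmp/$(basename "$pkg")"
    for node in "$@"; do
        ${DRY_RUN:+echo} scp "$pkg" "$node:$remote" || return 1
        ${DRY_RUN:+echo} ssh "$node" "$remote" --install || return 1
    done
}

# Usage, with the package name from the xCAT example (node names are examples):
#   install_on_nodes torque-package-linux-i686.sh node01 node02
```

Running once with DRY_RUN=1 shows exactly which commands would be issued before anything is copied.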
L.4 Configure TORQUE on the Compute Nodes
- For each compute host, the MOM daemon must be configured to trust the pbs_server daemon. In TORQUE 2.0.0p4 and earlier, this is done by creating the TORQUE_HOME/mom_priv/config file and setting the $pbsserver parameter. In TORQUE 2.0.0p5 and later, this can also be done by creating the TORQUE_HOME/server_name file and placing the server hostname inside.
- Additional config parameters may be added to TORQUE_HOME/mom_priv/config (See the MOM Config page for details.)
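The two trust mechanisms above each come down to one line in one file. The sketch below writes both on a compute node; configure_mom is an illustrative helper rather than a TORQUE tool, and the server hostname is whatever your pbs_server machine is called.

```shell
#!/bin/sh
# configure_mom SERVER [TORQUE_HOME]: point a compute node at its pbs_server.
# TORQUE_HOME defaults to /var/spool/torque, the default named in L.1.
configure_mom() {
    server="$1"
    home="${2:-/var/spool/torque}"
    mkdir -p "$home/mom_priv" || return 1

    # mom_priv/config: add a $pbsserver line unless one is already present.
    if ! grep -q '^\$pbsserver' "$home/mom_priv/config" 2>/dev/null; then
        printf '$pbsserver %s\n' "$server" >> "$home/mom_priv/config"
    fi

    # TORQUE 2.0.0p5 and later: the server_name file holds just the hostname.
    printf '%s\n' "$server" > "$home/server_name"
}

# Usage on each compute node (hypothetical hostname):
#   configure_mom headnode.cluster.org
```

The config file is appended to rather than overwritten so that any parameters already set there survive.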
L.5 Configure Data Management on the Compute Nodes
Data management allows job data to be staged in and out, that is, copied between the server and the compute nodes.
- For shared filesystems (e.g., NFS, DFS, AFS), use the $usecp parameter in the mom_priv/config files to specify how to map a user's home directory.
(Example: $usecp gridmaster.tmx.com:/home /home)
- For local, non-shared filesystems, rcp or scp must be configured to allow direct copy without prompting for passwords (key authentication, etc.)
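To confirm that copies will not stall on a password prompt, one option is to probe each host with SSH in batch mode, which fails instead of prompting. This is a sketch: check_passwordless is a hypothetical helper, and it assumes scp (not rcp) is the copy mechanism and that it is run as the user who will own the jobs, from the machine that performs the copy.

```shell
#!/bin/sh
# check_passwordless HOST...: try a no-op SSH command on each host with
# BatchMode=yes, so ssh exits with an error instead of asking for a password.
check_passwordless() {
    for host in "$@"; do
        if ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true </dev/null 2>/dev/null; then
            echo "ok: $host"
        else
            echo "needs key setup: $host"
        fi
    done
}

# Usage, as the job-owning user (hostname taken from the $usecp example above):
#   check_passwordless gridmaster.tmx.com
```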
L.6 Update TORQUE Server Configuration
- On the TORQUE server, append the list of newly configured compute nodes to the TORQUE_HOME/server_priv/nodes file:
server_priv/nodes
computenode001.cluster.org
computenode002.cluster.org
computenode003.cluster.org
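Appending by hand is fine for a few hosts; for longer lists, a small helper can add names while skipping any already present. This is an illustrative sketch: add_nodes is not a TORQUE command, and it assumes each line of the nodes file is a bare hostname as in the listing above.

```shell
#!/bin/sh
# add_nodes FILE NODE...: append each NODE to FILE unless an identical line
# already exists (grep -x matches whole lines, -F disables regex matching).
add_nodes() {
    nodes_file="$1"; shift
    touch "$nodes_file" || return 1
    for node in "$@"; do
        grep -qxF "$node" "$nodes_file" || echo "$node" >> "$nodes_file"
    done
}

# Usage on the TORQUE server, with TORQUE_HOME at its default location:
#   add_nodes /var/spool/torque/server_priv/nodes \
#       computenode001.cluster.org computenode002.cluster.org
```

Because duplicates are skipped, the helper can be re-run safely as more nodes are brought online.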
L.7 Start the pbs_mom Daemons on Compute Nodes
- Next, start the pbs_mom daemon on each compute node by running the pbs_mom executable.
L.8 Verifying Correct TORQUE Installation
The pbs_server daemon was started on the TORQUE server when the torque.setup file was executed or when it was manually configured. It must now be restarted so it can reload the updated configuration changes.
# shutdown server
> qterm
# start server
> pbs_server
# verify all queues are properly configured
> qstat -q
# view additional server configuration
> qmgr -c 'p s'
# verify all nodes are correctly reporting
> pbsnodes -a
# submit a basic job
> echo "sleep 30" | qsub
# verify jobs display
> qstat
At this point, the job will not start because there is no scheduler running. The scheduler is enabled in the next step.
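When the cluster has many nodes, scanning the pbsnodes -a output by eye gets tedious. The filter below is a sketch that assumes each node record starts with the node name on an unindented line, followed by indented "attribute = value" lines that include a state attribute; nodes_not_free is a hypothetical helper name.

```shell
#!/bin/sh
# nodes_not_free: read pbsnodes -a style output on stdin and print
# "NODE STATE" for every node whose state is anything other than "free".
nodes_not_free() {
    awk '
        /^[^ \t]/ { node = $1 }                       # unindented line: node name
        $1 == "state" && $2 == "=" && $3 != "free" {  # indented state attribute
            print node, $3
        }
    '
}

# Usage on the TORQUE server:
#   pbsnodes -a | nodes_not_free
```

On a freshly installed cluster with no running jobs, empty output suggests every node is reporting as free.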
L.9 Enabling the Scheduler
Selecting the cluster scheduler is an important decision and significantly affects cluster utilization, responsiveness, availability, and intelligence. The default TORQUE scheduler, pbs_sched, is very basic and will provide poor utilization of your cluster's resources. Other options, such as Maui Scheduler or Moab Workload Manager, are highly recommended. If using Maui/Moab, refer to the Moab-PBS Integration Guide. If using pbs_sched, start this daemon now.
If you are installing ClusterSuite, TORQUE and Moab were configured at installation for interoperability and no further action is required.
L.10 Startup/Shutdown Service Script for TORQUE/Moab (OPTIONAL)
Optional startup/shutdown service scripts are provided as an example of how to run TORQUE as an OS service that starts at bootup. The scripts are located in the contrib/init.d/ directory of the TORQUE tarball you downloaded. In order to use the script you must:
- Determine which init.d script suits your platform the best.
- Modify the script to point to TORQUE's install location. This should only be necessary if you used a non-default install location for TORQUE (by using the --prefix option of ./configure).
- Place the script in the /etc/init.d/ directory.
- Use a tool like chkconfig to activate the start-up scripts, or make symbolic links (S99moab and K15moab, for example) in the desired runlevel directories (/etc/rc.d/rc3.d/ on Red Hat, etc.).
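On systems without chkconfig, the symbolic links can be created directly. The sketch below follows the S99/K15 numbering from the example link names above; link_service is a hypothetical helper, and the script path and runlevel directory are whatever applies on your platform.

```shell
#!/bin/sh
# link_service SCRIPT RCDIR: create start (S99) and kill (K15) symlinks for
# an installed init script in one runlevel directory.
link_service() {
    script="$1"
    rcdir="$2"
    name=$(basename "$script")
    ln -sf "$script" "$rcdir/S99$name" || return 1
    ln -sf "$script" "$rcdir/K15$name" || return 1
}

# Usage as root on a Red Hat style layout (script name is an example):
#   link_service /etc/init.d/pbs_mom /etc/rc.d/rc3.d
```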