Torque 6.1 binaries are backward compatible with Torque 5.0 or later. However, they are not backward compatible with Torque versions prior to 5.0. When you upgrade to Torque 6.1.0 from a version prior to 5.0, all MOM and server daemons must be upgraded at the same time.
The job format is compatible between 6.1 and previous versions of Torque, and any queued jobs will upgrade to the new version. Upgrading Torque while jobs are running is not recommended.
This topic contains instructions on how to upgrade and start Torque Resource Manager (Torque).
If you need to upgrade a Torque version prior to 4.0, contact Adaptive Computing.
See 5.631 Considerations Before Upgrading in the Torque Resource Manager Administrator Guide for additional important information, including about how to handle running jobs during an upgrade, mixed server/MOM versions, and the possibility of upgrading the MOMs without having to take compute nodes offline.
This section contains information of which you should be aware before upgrading.
Before upgrading the system, all running jobs must complete. To prevent queued jobs from starting, nodes can be set to offline or all queues can be disabled (using the "started" queue attribute). See pbsnodes or Queue Attributes in the Torque Resource Manager Administrator Guide for more information.
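Either approach can be scripted. The following is a minimal sketch rather than an official procedure: it only prints the qmgr and pbsnodes commands so that they can be reviewed before being run. The function names are illustrative, and the queue and node names passed to them are placeholders for your own.

```shell
#!/bin/sh
# Print (do not run) the commands that keep queued jobs from starting.
# Queue and node names passed in are placeholders for your own.

# Emit one qmgr command per queue that sets its "started" attribute to false.
print_stop_queue_cmds() {
    for queue in "$@"; do
        printf 'qmgr -c "set queue %s started = false"\n' "$queue"
    done
}

# Emit one pbsnodes command per node that marks the node offline.
print_offline_node_cmds() {
    for node in "$@"; do
        printf 'pbsnodes -o %s\n' "$node"
    done
}
```

For example, "print_stop_queue_cmds batch" prints a single qmgr command for a queue named batch. After reviewing the output, it can be piped to sh on the Torque Server Host.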
For upgrading Torque to 6.1.0 on a Cray system, refer to the Installation Notes for Moab and Torque for Cray in Appendix G of the Moab Workload Manager Administrator Guide.
Using
When cgroups are enabled (recommended), hwloc version 1.9.1 or later is required. NVIDIA K80 requires libhwloc 1.11.0.
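To decide whether hwloc needs to be installed or upgraded, the version requirement can be checked with a small helper. This is a sketch under assumptions: the function names are illustrative, the comparison relies on the -V (version sort) option of GNU sort, and the installed version string has to be supplied by you.

```shell
#!/bin/sh
# Return success (0) if version $1 is greater than or equal to version $2.
# Relies on the -V (version sort) option of GNU sort.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Report whether a given hwloc version satisfies the requirement.
# $1: installed hwloc version; $2: minimum version (defaults to 1.9.1).
check_hwloc_version() {
    installed=$1
    required=${2:-1.9.1}
    if version_ge "$installed" "$required"; then
        echo "hwloc $installed satisfies >= $required"
    else
        echo "hwloc $installed is older than $required"
        return 1
    fi
}
```

One possible source for the installed version is "pkg-config --modversion hwloc", assuming pkg-config and the hwloc development files are present. For a host with an NVIDIA K80, pass 1.11.0 as the second argument.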
The following instructions are for installing version 1.9.1.
Do the following:
[root]# yum install gcc make
[root]# tar -xzvf hwloc-1.9.1.tar.gz
[root]# cd hwloc-1.9.1
[root]# ./configure
[root]# make
[root]# make install
[root]# echo /usr/local/lib >/etc/ld.so.conf.d/hwloc.conf
[root]# ldconfig
Do the following:
[root]# service pbs_server stop
Confirm that all jobs have completed before stopping pbs_mom. To check, run "momctl -d3"; if no jobs are running, the message "NOTE: no local jobs detected" appears toward the bottom of the output. If jobs are still running when the MOM is shut down, you will only be able to track when each job completes; you will not be able to get completion codes or statistics.
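The check for the "no local jobs" message can be automated. The helper below is a minimal sketch with an illustrative name; it assumes only that the momctl output contains the message text exactly as quoted above.

```shell
#!/bin/sh
# Read "momctl -d3" output on stdin and succeed only if it contains the
# message the MOM reports when no jobs are running locally.
mom_is_idle() {
    grep -q 'NOTE: no local jobs detected'
}
```

A possible use on a MOM host is "momctl -d3 | mom_is_idle && service pbs_mom stop", which stops the MOM only when the message is present.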
[root]# service pbs_mom stop
[root]# service trqauthd stop
2.20.3 Upgrade the Torque Server
You must complete all the previous upgrade steps in this topic before upgrading Torque server. See the list of steps at the beginning of this topic.
On the Torque Server Host, do the following:
[root]# tar -czvf backup.tar.gz TORQUE_HOME/server_priv
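If you keep more than one backup, a date-stamped archive name avoids overwriting an earlier one. The following is a sketch, not part of the documented procedure: the function name and the archive naming scheme are assumptions, and the Torque home directory must be supplied explicitly.

```shell
#!/bin/sh
# Create a gzip-compressed, date-stamped archive of server_priv.
# $1: Torque home directory (often /var/spool/torque; verify on your system)
# $2: directory in which to write the archive
backup_server_priv() {
    torque_home=$1
    dest_dir=$2
    archive="$dest_dir/server_priv-$(date +%Y%m%d-%H%M%S).tar.gz"
    # -C keeps paths in the archive relative to the Torque home directory.
    tar -czf "$archive" -C "$torque_home" server_priv || return 1
    echo "$archive"
}
```

The function prints the path of the archive it created, so the result can be confirmed with "tar -tzf" before continuing with the upgrade.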
[root]# yum install boost-devel
Depending on your system configuration, you may need to add ./configure command options.
At a minimum, add --enable-cgroups and --with-hwloc-path=/usr/local, as used in the ./configure command in this step.
These instructions assume you are using cgroups. When cgroups are supported, cpusets are handled by the cgroup cpuset subsystem. If you are not using cgroups, use --enable-cpusets instead.
See
[root]# cd /tmp
[root]# tar xzvf torque-6.1.0.tar.gz
[root]# cd torque-6.1.0
[root]# ./configure --enable-cgroups --with-hwloc-path=/usr/local # add any other specified options
[root]# make
[root]# make install
Do the following:
[root]# make packages
Building ./torque-package-clients-linux-x86_64.sh ...
Building ./torque-package-mom-linux-x86_64.sh ...
Building ./torque-package-server-linux-x86_64.sh ...
Building ./torque-package-gui-linux-x86_64.sh ...
Building ./torque-package-devel-linux-x86_64.sh ...
Done.
The package files are self-extracting packages that can be copied and executed on your production machines. Use --help for options.
Adaptive Computing recommends that you use a remote shell, such as SSH, to install packages on remote systems. Set up shared SSH keys if you do not want to supply a password for each Torque MOM Host.
[root]# scp torque-package-mom-linux-x86_64.sh <torque-mom-host>:
ns
perf_event
net_prio
cpuset /cgroup/cpuset
cpu /cgroup/cpu
cpuacct /cgroup/cpuacct
memory /cgroup/memory
devices /cgroup/devices
freezer /cgroup/freezer
net_cls /cgroup/net_cls
blkio /cgroup/blkio
[root]# yum install libcgroup
[root]# service cgconfig start
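A subsystem listing in the format shown above (a subsystem name, followed by a mount point when the subsystem is mounted) can be checked mechanically. The helper below is a sketch with an illustrative name. It assumes one subsystem per line, so it does not handle listings in which several subsystems share a comma-separated entry. Which subsystems your configuration needs is for you to decide and is not asserted here.

```shell
#!/bin/sh
# Read a subsystem listing on stdin (one "name [mount-point]" per line) and
# print each requested subsystem that has no mount point in the listing.
# Arguments: the subsystem names to check.
unmounted_subsystems() {
    awk -v wanted="$*" '
        BEGIN { n = split(wanted, names, " ") }
        NF >= 2 { mounted[$1] = 1 }
        END {
            for (i = 1; i <= n; i++)
                if (!(names[i] in mounted)) print names[i]
        }
    '
}
```

One possible source of the listing is "lssubsys -am" from the libcgroup tools, assuming that is the command available on your system. Empty output from the helper means every requested subsystem has a mount point.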
[root]# ./torque-package-mom-linux-x86_64.sh --install
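With many Torque MOM Hosts, the copy and install commands can be generated per host. This is a minimal sketch that only prints the commands for review; the function name and host names are placeholders, and it assumes root SSH access with the package copied to the remote home directory, as in the scp command above.

```shell
#!/bin/sh
# Print the copy-and-install commands for each host.
# $1: package file name; remaining arguments: host names (placeholders).
print_install_cmds() {
    package=$1
    shift
    for host in "$@"; do
        printf 'scp %s %s:\n' "$package" "$host"
        printf 'ssh %s ./%s --install\n' "$host" "$package"
    done
}
```

For example, "print_install_cmds torque-package-mom-linux-x86_64.sh node01 node02" prints four commands, two per host. Printing instead of executing keeps the sketch safe to try before shared SSH keys are in place.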
2.20.5 Update the Torque Clients
This section contains instructions on updating the Torque clients on the Torque Client Hosts (including the Moab Server Host and Torque MOM Hosts, if applicable).
Adaptive Computing recommends that you use a remote shell, such as SSH, to install packages on remote systems. Set up shared SSH keys if you do not want to supply a password for each Torque MOM Host.
[root]# scp torque-package-clients-linux-x86_64.sh <torque-client-host>:
[root]# scp torque-package-devel-linux-x86_64.sh <moab-server-host>:
This step can be done from the Torque server from a remote shell, such as SSH. Set up shared SSH keys if you do not want to supply a password for each Torque Client Host.
[root]# ./torque-package-clients-linux-x86_64.sh --install
[root]# ./torque-package-devel-linux-x86_64.sh --install
Do the following:
[root]# service trqauthd start
[root]# service pbs_mom start
[root]# service pbs_server start
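The start order here is the reverse of the stop order used earlier in this topic. The sketch below captures both orders in one place. The function name is illustrative, and it prints all three service commands even though a given host may run only some of the daemons, so drop the lines that do not apply.

```shell
#!/bin/sh
# Print the service commands in the order this topic uses:
#   stop:  pbs_server, pbs_mom, trqauthd
#   start: trqauthd, pbs_mom, pbs_server
print_service_cmds() {
    case $1 in
        stop)  order="pbs_server pbs_mom trqauthd" ;;
        start) order="trqauthd pbs_mom pbs_server" ;;
        *)     echo "usage: print_service_cmds start|stop" >&2; return 2 ;;
    esac
    for svc in $order; do
        echo "service $svc $1"
    done
}
```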
2.20.7 Perform Status and Error Checks
On the Torque Server Host, do the following:
Verify that the status of the nodes and jobs is as expected.
[root]# pbsnodes
[root]# qstat
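On a large cluster, scanning the full pbsnodes output by eye is error-prone. The helper below is a sketch with an illustrative name. It assumes the usual pbsnodes layout, in which each node name starts in the first column and its attributes (such as "state = free") are indented beneath it.

```shell
#!/bin/sh
# Read "pbsnodes" output on stdin and print the names of nodes whose state
# includes "down" or "offline". Node names start in the first column;
# attributes such as "state = free" are indented beneath them.
list_unavailable_nodes() {
    awk '
        /^[^ \t]/ { node = $1 }
        $1 == "state" && $2 == "=" && $3 ~ /down|offline/ { print node }
    '
}
```

A possible use is "pbsnodes | list_unavailable_nodes". Empty output indicates that no node reported a down or offline state.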