Appendix E: Considerations before upgrading
TORQUE is flexible with regard to how it can be upgraded. In most cases, shutting down TORQUE and then running the configure, make, make install procedure documented in this guide is all that is required (see Installing TORQUE). This process preserves the existing configuration and, in most cases, the existing workload.
A few considerations are included below:
- If upgrading from OpenPBS, PBSPro, or TORQUE 1.0.3 or earlier, queued jobs, whether active or idle, will be lost. In such situations, job queues should be completely drained of all jobs before upgrading.
- If not using the pbs_mom -r or -p flag (see Command-line arguments), running jobs may be lost. In such cases, running jobs should be allowed to complete or should be requeued before upgrading TORQUE.
- pbs_mom and pbs_server daemons of differing versions may be run together. However, not all combinations have been tested and unexpected failures may occur.
- trqauthd is an intermediary between client commands and pbs_server. It is recommended that when you upgrade pbs_server you also upgrade the client utilities and trqauthd to prevent unexpected failures when you execute client commands. Because no direct relationship exists between the MOMs and trqauthd, you can upgrade trqauthd without upgrading the MOMs.
- When upgrading from early versions of TORQUE (pre-4.0) and Moab, you may encounter a problem where Moab core files are regularly created in /opt/moab. This can be caused by old TORQUE library files used by Moab that try to authorize with the old TORQUE pbs_iff authorization daemon. You can resolve the problem by removing the old version library files from /usr/local/lib.
To upgrade
- Build new release (do not install).
- Stop all TORQUE daemons (see qterm and momctl -s).
- Install new TORQUE (use make install).
- Start all TORQUE daemons.
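The four steps above can be sketched as a small shell script. This is a minimal sketch, not a tested procedure: the `run` helper, the `DRY_RUN` variable, and the `upgrade_torque` function are hypothetical names introduced here, the source directory argument is a placeholder, and the `./configure` call assumes you add the same options that were used for the original installation. By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Hypothetical helper: with DRY_RUN=1 (the default) commands are printed, not run.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# upgrade_torque <src_dir>: src_dir is the unpacked source of the new release.
upgrade_torque() {
    src_dir="$1"

    # Step 1: build the new release, but do not install it yet.
    # A failed build aborts before any daemon is stopped.
    run cd "$src_dir" || return 1
    run ./configure || return 1
    run make || return 1

    # Step 2: stop all TORQUE daemons (MOMs first, then the server).
    run momctl -s -h :ALL
    run qterm -t quick

    # Step 3: install the new release on this host.
    run make install || return 1

    # Step 4: start the daemons again. pbs_mom must be started on every
    # compute node; only the local host is shown here. trqauthd, if it was
    # upgraded as well, also needs a restart (omitted from this sketch).
    run pbs_server
    run pbs_mom
}
```

After sourcing the file, `upgrade_torque /path/to/new/release` prints the sequence for review. Setting `DRY_RUN=0` before the call executes the commands instead.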
Rolling upgrade
If you are upgrading to a new point release within your current version (for example, from 4.2.2 to 4.2.3) rather than to a new major release (for example, from 4.1 to 4.2), you can use the following procedure to upgrade TORQUE without taking your nodes offline.
Because TORQUE version 4.1.4 changed the way that pbs_server communicates with the MOMs, it is not recommended that you perform a rolling upgrade of TORQUE from version 4.1.3 to 4.1.4.
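The eligibility rule above can be expressed as a small check. This is a sketch under stated assumptions: the function name `rolling_upgrade_ok` is hypothetical, version strings are assumed to be dot-separated numbers, and "same release" is taken to mean that the first two components match. Only the single exception named above (4.1.3 to 4.1.4) is encoded; other version pairs are not covered by this guide.

```shell
#!/bin/sh
# Return 0 if moving from version $1 to version $2 looks eligible for a
# rolling upgrade under the rule described above, 1 otherwise.
rolling_upgrade_ok() {
    from="$1"
    to="$2"

    # Assumption: the first two components identify the release (4.2.2 -> 4.2).
    from_series="$(printf '%s\n' "$from" | cut -d. -f1-2)"
    to_series="$(printf '%s\n' "$to" | cut -d. -f1-2)"

    # A change of release (for example 4.1 -> 4.2) is not a point release.
    [ "$from_series" = "$to_series" ] || return 1

    # Documented exception: pbs_server/MOM communication changed in 4.1.4.
    if [ "$from" = "4.1.3" ] && [ "$to" = "4.1.4" ]; then
        return 1
    fi

    return 0
}
```

For example, `rolling_upgrade_ok 4.2.2 4.2.3 && echo "rolling upgrade possible"` prints the message, while the same call with `4.1 4.2` or `4.1.3 4.1.4` prints nothing.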
To perform a rolling upgrade in TORQUE
- Enable the enablemomrestart flag on the MOMs you want to upgrade. The enablemomrestart option causes a MOM to check if its binary has been updated and restart itself at a safe point when no jobs are running. You can enable this in the MOM configuration file, but it is recommended that you use momctl instead.
> momctl -q enablemomrestart=1 -h :ALL
The enablemomrestart flag is enabled on all nodes.
- Replace the pbs_mom binary, located in /usr/local/bin by default. pbs_mom will continue to run uninterrupted because the pbs_mom binary has already been loaded in RAM.
> torque-package-mom-linux-x86_64.sh --install
The next time pbs_mom is in an idle state, it will check for changes in the binary. If pbs_mom detects that the binary on disk has changed, it will restart automatically, causing the new pbs_mom version to load.
After the pbs_mom restarts on each node, the enablemomrestart parameter will be set back to false (0) for that node.
If you have a cluster with high utilization, you may find that the nodes never enter an idle state, so pbs_mom never restarts. When this occurs, you must manually take the nodes offline and wait for the running jobs to complete before restarting pbs_mom. To set a node to an offline state, which allows running jobs to complete but prevents new jobs from being scheduled on that node, use pbsnodes -o <nodeName>. After the new MOM has started, you must make the node active again by running pbsnodes -c <nodeName>.
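The manual procedure for a busy node can be sketched as follows. Several parts are assumptions rather than documented behavior: the helper names (`run`, `node_is_idle`, `drain_and_restart_node`) are hypothetical, the idle check assumes that `pbsnodes <nodeName>` prints an indented `jobs = ...` line only while jobs are assigned to the node, and starting pbs_mom over ssh is one site-specific possibility among many. With `DRY_RUN=1` (the default) the commands are only printed and the wait is skipped.

```shell
#!/bin/sh
# Hypothetical helper: with DRY_RUN=1 (the default) commands are printed, not run.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Read `pbsnodes <nodeName>` output on stdin; succeed if no jobs are listed.
# Assumption: a node with jobs shows an indented "jobs = ..." attribute line.
node_is_idle() {
    if grep -q '^[[:space:]]*jobs = '; then
        return 1
    fi
    return 0
}

drain_and_restart_node() {
    node="$1"

    # Mark the node offline: running jobs continue, no new jobs are scheduled.
    run pbsnodes -o "$node"

    # Wait for the running jobs to complete (skipped in dry-run mode).
    if [ "$DRY_RUN" != "1" ]; then
        until pbsnodes "$node" | node_is_idle; do
            sleep 60
        done
    fi

    # Restart pbs_mom so that the new binary is loaded.
    run momctl -s -h "$node"
    run ssh "$node" pbs_mom   # assumption: pbs_mom is started over ssh

    # Clear the offline state so the node accepts jobs again.
    run pbsnodes -c "$node"
}
```

Calling `drain_and_restart_node node01` in dry-run mode prints the four commands in order, which makes it easy to review the sequence before running it with `DRY_RUN=0`.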