5.632 Considerations Before Upgrading

Torque is flexible with regard to how it can be upgraded. In most cases, all that is required is a Torque shutdown followed by the configure, make, make install procedure documented in this guide (see Installing). This process preserves the existing configuration and, in most cases, the existing workload.

A few considerations are included below:

To upgrade

  1. Build the new release (do not install it yet).
  2. Stop all Torque daemons (see qterm and momctl -s).
  3. Install the new Torque release (use make install).
  4. Start all Torque daemons.
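The four steps above can be sketched as a small shell script run from the unpacked source tree of the new release. This is a minimal sketch, not the guide's own procedure. It assumes that the default ./configure options are acceptable, that pbs_server and pbs_mom run on the local host and are in the PATH, and that the script runs as root. Other daemons, such as trqauthd or a scheduler, are not handled. The run and upgrade_torque function names and the DRY_RUN variable are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of the four-step upgrade, run from the unpacked source tree of the
# NEW release. ASSUMPTIONS: default ./configure options, pbs_server and
# pbs_mom on this host and in PATH, run as root. trqauthd and any scheduler
# are not handled. Set DRY_RUN=1 to print commands instead of running them.

# Run a command, or only print it when DRY_RUN=1 (hypothetical helper).
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

upgrade_torque() {
  # 1. Build the new release without installing it.
  run ./configure || return 1
  run make || return 1
  # 2. Stop the daemons: qterm shuts down pbs_server, momctl -s the local pbs_mom.
  run qterm || return 1
  run momctl -s || return 1
  # 3. Install the new binaries.
  run make install || return 1
  # 4. Start the daemons again, now using the new binaries.
  run pbs_server || return 1
  run pbs_mom
}
```

Running DRY_RUN=1 upgrade_torque prints the seven commands in order without executing them, which is a cheap way to review the sequence before performing a real upgrade.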

5.632.1 Rolling Upgrade

If you are upgrading to a new point release of your current version (for example, from 4.2.2 to 4.2.3) rather than to a new major release (for example, from 4.1 to 4.2), you can use the following procedure to upgrade Torque without taking your nodes offline.

Because Torque version 4.1.4 changed the way that pbs_server communicates with the MOMs, it is not recommended that you perform a rolling upgrade of Torque from version 4.1.3 to 4.1.4.
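The eligibility rule above can be expressed as a small check. The can_rolling_upgrade function is hypothetical and assumes versions are plain X.Y.Z strings. It treats a rolling upgrade as acceptable only when the major and minor numbers match, the target point release is newer, and the transition is not 4.1.3 to 4.1.4.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints "yes" if a rolling upgrade from $1 to $2 fits
# the rule above, otherwise "no". ASSUMPTION: versions are X.Y.Z strings.
can_rolling_upgrade() {
  local from="$1" to="$2"
  local f_major f_minor f_patch t_major t_minor t_patch
  IFS=. read -r f_major f_minor f_patch <<< "$from"
  IFS=. read -r t_major t_minor t_patch <<< "$to"

  # 4.1.4 changed pbs_server/MOM communication, so exclude this transition.
  if [ "$from" = "4.1.3" ] && [ "$to" = "4.1.4" ]; then
    echo no
    return
  fi

  # Same major.minor release and a newer point release are required.
  # A missing or non-numeric point number falls through to "no".
  if [ -n "$f_patch" ] && [ -n "$t_patch" ] \
     && [ "$f_major" = "$t_major" ] && [ "$f_minor" = "$t_minor" ] \
     && [ "$t_patch" -gt "$f_patch" ] 2>/dev/null; then
    echo yes
  else
    echo no
  fi
}
```

For example, can_rolling_upgrade 4.2.2 4.2.3 prints yes, while can_rolling_upgrade 4.1 4.2 prints no.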

To perform a rolling upgrade in Torque

  1. Enable the pbs_mom enablemomrestart flag on the MOMs you want to upgrade. When this flag is set, a MOM checks whether its binary has been updated and, if it has, restarts itself at a safe point when no jobs are running. You can enable the flag in the MOM configuration file, but it is recommended that you use momctl instead:

    > momctl -q enablemomrestart=1 -h :ALL

    The enablemomrestart flag is enabled on all nodes.

  2. Replace the pbs_mom binary, located in /usr/local/bin by default. pbs_mom continues to run uninterrupted because the old pbs_mom binary is already loaded in RAM.

    > torque-package-mom-linux-x86_64.sh --install

    The next time pbs_mom is idle, it checks whether its binary has changed. If pbs_mom detects that the binary on disk has changed, it restarts automatically and loads the new pbs_mom version.

    After pbs_mom restarts on a node, the enablemomrestart parameter is reset to false (0) for that node.
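Steps 1 and 2 can be combined into one helper run from the pbs_server host. This is a sketch under assumptions the guide does not state: the new torque-package-mom-linux-x86_64.sh sits at a path visible from every compute node (for example, on a shared filesystem), the nodes are reachable over passwordless ssh as root, and the node names are passed as arguments. The rolling_upgrade_moms function, its arguments, and DRY_RUN are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of the rolling-upgrade steps. ASSUMPTIONS: $1 is the path to the
# new torque-package-mom-linux-x86_64.sh on a filesystem shared by all
# nodes; remaining arguments are node names reachable over passwordless
# ssh as root. Set DRY_RUN=1 to print commands instead of running them.

# Run a command, or only print it when DRY_RUN=1 (hypothetical helper).
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

rolling_upgrade_moms() {
  local pkg="$1"
  shift
  # Step 1: ask every MOM to restart itself once it is idle and its binary changed.
  run momctl -q enablemomrestart=1 -h :ALL || return 1
  # Step 2: replace pbs_mom on each node; running MOMs keep the old image in RAM.
  local node
  for node in "$@"; do
    run ssh "$node" "$pkg" --install || return 1
  done
}
```

With a hypothetical shared path, DRY_RUN=1 rolling_upgrade_moms /shared/torque-package-mom-linux-x86_64.sh node01 node02 prints the momctl command followed by one ssh install command per node. Each MOM then restarts on its own when it next becomes idle.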

If you have a cluster with high utilization, you may find that the nodes never enter an idle state, so pbs_mom never restarts. When this occurs, you must manually take the nodes offline and wait for the running jobs to complete before restarting pbs_mom. To set a node offline, which allows running jobs to complete but prevents new jobs from being scheduled on that node, use pbsnodes -o <nodeName>. After the new MOM has started, make the node active again by running pbsnodes -c <nodeName>.
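The manual fallback for a busy node can be sketched as follows. It relies on assumptions the guide does not state: that pbsnodes <nodeName> lists a "jobs = ..." attribute line only while jobs are assigned to the node, that momctl -s accepts -h for a remote host in the same way momctl -q does above, and that the node is reachable over passwordless ssh with pbs_mom in its PATH. The function names, POLL_SECONDS, and the short pause after shutdown are hypothetical choices.

```shell
#!/usr/bin/env bash
# Manual fallback for one busy node (sketch; assumptions listed above).

# Reads `pbsnodes <node>` output on stdin and succeeds if it contains a
# non-empty "jobs = ..." attribute line. ASSUMPTION: this is how pbsnodes
# reports jobs assigned to a node.
node_has_jobs() {
  grep -Eq '^[[:space:]]*jobs[[:space:]]*=[[:space:]]*[^[:space:]]'
}

# Hypothetical helper: drain a node, restart its MOM, and bring it back.
drain_and_restart_mom() {
  local node="$1" status
  # Mark the node offline: running jobs finish, no new jobs are scheduled.
  pbsnodes -o "$node" || return 1
  # Poll until no jobs remain; stop if pbsnodes itself fails.
  while :; do
    status="$(pbsnodes "$node")" || return 1
    printf '%s\n' "$status" | node_has_jobs || break
    sleep "${POLL_SECONDS:-60}"
  done
  # Stop the old MOM (ASSUMPTION: momctl -s honors -h for a remote host).
  momctl -s -h "$node" || return 1
  # Arbitrary pause so the old process can exit before the new one starts.
  sleep 5
  # Start the new pbs_mom binary on the node (ASSUMPTION: passwordless ssh).
  ssh "$node" pbs_mom || return 1
  # Clear the offline state so the node accepts new jobs again.
  pbsnodes -c "$node"
}
```

The parsing step is kept in its own function so it can be checked against saved pbsnodes output before the helper is run against a live node.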

© 2016 Adaptive Computing