N.57 Managing Node State

Moab can operate under more than one model for node state: it can honor the node state set by an external service, or it can determine and set the node state locally.

N.57.1 Node State Definitions

State     Definition
Down      Node is either not reporting status, is reporting status but failures are detected, or is reporting status but has been marked down by an administrator.
Idle      Node is reporting status, is not currently executing any workload, and is ready to accept additional workload.
Busy      Node is reporting status, is currently executing workload, and cannot accept additional workload due to load.
Running   Node is reporting status, is currently executing workload, and can accept additional workload.
Drained   Node is reporting status, is not currently executing workload, and cannot accept additional workload due to administrative action.
Draining  Node is reporting status, is currently executing workload, and cannot accept additional workload due to administrative action.
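
The definitions above amount to a small decision table. The following Python sketch is one way to read them; it is an illustration only and not how Moab computes state internally. The function name and all of its parameters are hypothetical, and the definitions do not say how a node that is idle but limited by load is classified, so the sketch treats it as Idle.

```python
from enum import Enum


class NodeState(Enum):
    DOWN = "Down"
    IDLE = "Idle"
    BUSY = "Busy"
    RUNNING = "Running"
    DRAINED = "Drained"
    DRAINING = "Draining"


def classify_node(reporting, failures_detected=False, admin_marked_down=False,
                  executing_workload=False, admin_blocked=False,
                  load_limited=False):
    """Map the conditions named in the state definitions to a NodeState.

    All parameter names are hypothetical; they mirror the wording of the
    definitions, not any Moab interface.
    """
    # Down: not reporting, reporting with failures, or marked down by an admin.
    if not reporting or failures_detected or admin_marked_down:
        return NodeState.DOWN
    # Administrative action prevents new workload: Draining or Drained.
    if admin_blocked:
        return NodeState.DRAINING if executing_workload else NodeState.DRAINED
    # Executing workload: Busy if load prevents more work, otherwise Running.
    if executing_workload:
        return NodeState.BUSY if load_limited else NodeState.RUNNING
    # Reporting, not executing, nothing blocking (assumption: load is ignored).
    return NodeState.IDLE
```

For example, `classify_node(True, executing_workload=True, admin_blocked=True)` yields `NodeState.DRAINING`, matching the last row of the table.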

N.57.2 Specifying Node States within Native Resource Managers

Native resource managers can report node state implicitly and explicitly, using NODESTATE, LOAD, and other attributes. See Managing Resources Directly with the Native Interface for more information.

N.57.3 Moab Based Node State Adjustment

Node state can be adjusted based on reported processor, memory, or other load factors. It can also be adjusted based on reports from one or more resource managers in a multi-resource manager configuration. In addition, both generic events and generic metrics can be used to adjust node state.

N.57.4 Adjusting Scheduling Behavior Based on Reported Node State

Based on reported node state, Moab can support various policies to make better use of available resources. For more information, see the Green Computing Overview.

N.57.4.A Down State

N.57.5 Adding or Removing Nodes

When a node has been deleted by a resource manager and the resource manager no longer reports data for the node, the node continues to exist in Moab until the next restart.

As a best practice, Adaptive Computing recommends adding or removing nodes only during cluster maintenance, rather than during periods of production activity. Restart Moab after adding or removing nodes; this ensures that Moab handles the nodes in a reliable, predictable way. If you want to remove nodes from service but cannot restart Moab immediately afterward, we recommend marking the nodes offline (for example, with pbsnodes -o <nodeID> or mnodectl -m state=down <nodeID>) and/or placing an administrative reservation over the nodes until you can follow the recommended removal procedure during a planned maintenance window.
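
When several nodes must be taken out of service before a restart is possible, the two commands quoted above can be scripted. The Python sketch below is a minimal illustration: it only uses the command forms given in this section (pbsnodes -o <nodeID> and mnodectl -m state=down <nodeID>), the function names and the node IDs in the usage note are hypothetical, and it defaults to a dry run that prints the commands instead of executing them.

```python
import subprocess


def offline_commands(node_ids, tool="mnodectl"):
    """Return one argument list per node for marking that node offline.

    Only the two command forms quoted in this section are generated.
    """
    commands = []
    for node_id in node_ids:
        if tool == "pbsnodes":
            commands.append(["pbsnodes", "-o", node_id])
        elif tool == "mnodectl":
            commands.append(["mnodectl", "-m", "state=down", node_id])
        else:
            raise ValueError(f"unsupported tool: {tool}")
    return commands


def mark_offline(node_ids, tool="mnodectl", dry_run=True):
    """Print (dry run) or execute the offline command for each node."""
    commands = offline_commands(node_ids, tool)
    for argv in commands:
        if dry_run:
            # Show what would be run without touching the cluster.
            print(" ".join(argv))
        else:
            # check=True raises CalledProcessError if the command fails.
            subprocess.run(argv, check=True)
    return commands
```

Calling `mark_offline(["node01", "node02"])` prints the two mnodectl commands for review; pass `dry_run=False` only on a host where the chosen command is installed and you have administrative rights. Placing an administrative reservation is not covered by this sketch.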


© 2017 Adaptive Computing