4.1   Dynamic Nodes

Dynamic nodes are nodes that can be added to and removed from Torque at any time. Specifically, any node that has a TTL (time to live) is considered a dynamic node. The following sections explain how to add and delete nodes via qmgr.

As of Moab version 9.1.2, dynamic node procs are no longer counted against the total procs listed in the Moab license. This allows you to burst as many times as you like without exceeding the total procs used for on-premises nodes. If your version of Moab is earlier than 9.1.2, please contact your Adaptive Computing sales representative.

4.1.1 Dynamic Node Parameters

The table below describes the parameters that are used while adding and removing dynamic nodes.

Parameter Name | Required/Optional | Data Format | Description
TTL | Optional | yyyy-mm-ddThh:mm:ss±hh OR yyyy-mm-ddThh:mm:ss±hhmm OR yyyy-mm-ddThh:mm:ssZ | Time at which the node is to be removed. The time is given as UTC (Greenwich Mean Time) with either an offset or a Z to indicate zero offset.
requestid | Optional | Any sequence of non-white-space characters | Identifier used by Moab to identify a group of nodes. See requestid Parameter (Adding or Removing Nodes) for more information.
acl | Optional | user==user1:user2,host==host1 | List of credentials that can run jobs on this dynamic node.
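
The three TTL data formats listed above can be checked before a value is handed to qmgr. The following is a minimal Python sketch, not part of Torque or Moab. The function name parse_ttl is hypothetical, and the sketch assumes the offset behaves like a standard ISO 8601 UTC offset.

```python
import re
from datetime import datetime, timezone

# Matches yyyy-mm-ddThh:mm:ss followed by Z, +hh/-hh, or +hhmm/-hhmm.
_TTL_RE = re.compile(
    r"(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(Z|[+-]\d{2}(?:\d{2})?)",
    re.ASCII,
)


def parse_ttl(text):
    """Return the TTL as a UTC datetime, or raise ValueError (hypothetical helper)."""
    match = _TTL_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a recognized TTL format: {text!r}")
    stamp, offset = match.groups()
    if offset == "Z":
        offset = "+0000"          # Z means zero offset
    elif len(offset) == 3:
        offset += "00"            # expand +hh to +hhmm for strptime
    parsed = datetime.strptime(stamp + offset, "%Y-%m-%dT%H:%M:%S%z")
    return parsed.astimezone(timezone.utc)
```

For example, parse_ttl("2015-05-16T05:26:30Z") yields a timezone-aware datetime whose timestamp() can be compared with the epoch-style TTL values that appear in Moab event records.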

4.1.2 Dynamic Node Events

You can record dynamic node activity by adding one or both of the following events to RECORDEVENTLIST in moab.cfg:

4.1.2.A  NODEADD

The NODEADD event is generated when the RM first reports a new node to Moab.

The following is an example from the event_xxx file in the $MOAB_HOME/stats directory:

16:22:32 1412202152:359437 node     nuc2         NODEADD      nuc2 STATE=Idle PARTITION=bdaw ADISK=1 AMEMORY=15193 APROC=4 ASWAP=16717 CDISK=1 CMEMORY=15918 CPROC=4 CSWAP=17442 OS=linux RM=bdaw NODEACCESSPOLICY=SHARED CCLASS=[DevQ][batch] MSG='Node 'nuc2' was newly reported in the last cluster query.  RequestID = 1234, TTL = 1420070400'

4.1.2.B  NODEREMOVE

The NODEREMOVE event is generated when Moab removes a dynamic node after TTL has expired, or if the node is no longer reported to Moab by the RM.

The following is an example from the event_xxx file in the $MOAB_HOME/stats directory:

16:21:44 1412202104:359401 node     nuc2         NODEREMOVE   nuc2 STATE=Idle PARTITION=bdaw ADISK=1 AMEMORY=15192 APROC=4 ASWAP=16716 CDISK=1 CMEMORY=15918 CPROC=4 CSWAP=17442 OS=linux RM=bdaw NODEACCESSPOLICY=SHARED FEATURE=[DEV] CCLASS=[DevQ][batch] MSG='Dynamic node 'nuc2' is being removed.  RequestID = 1234, TTL = 1420070400, Reason = node removed because the RM did not report it in the cluster query'
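
An external service that watches the event file may want the RequestID and TTL from these records. The sketch below is a hedged Python illustration that infers the layout only from the NODEADD and NODEREMOVE examples above. The function name parse_node_event and the dictionary keys are hypothetical, and SAMPLE is an abridged copy of the NODEADD example.

```python
import re

# Abridged from the NODEADD example record.
SAMPLE = (
    "16:22:32 1412202152:359437 node     nuc2         NODEADD      nuc2 "
    "STATE=Idle PARTITION=bdaw APROC=4 CCLASS=[DevQ][batch] "
    "MSG='Node 'nuc2' was newly reported in the last cluster query.  "
    "RequestID = 1234, TTL = 1420070400'"
)


def parse_node_event(line):
    """Split one event record into its leading fields, KEY=VALUE pairs, and MSG."""
    head, found, message = line.strip().partition(" MSG='")
    fields = head.split()
    if len(fields) < 5:
        raise ValueError("not a recognizable event line")
    if found and message.endswith("'"):
        message = message[:-1]    # drop the quote that closes MSG
    attrs = {}
    for token in fields[5:]:      # tokens without "=" (the repeated node name) are skipped
        key, eq, value = token.partition("=")
        if eq:
            attrs[key] = value
    request = re.search(r"RequestID = ([^,\s']+)", message)
    ttl = re.search(r"TTL = (\d+)", message)
    return {
        "clock": fields[0],
        "stamp": fields[1],
        "object_type": fields[2],
        "object_id": fields[3],
        "event": fields[4],
        "attrs": attrs,
        "message": message,
        "request_id": request.group(1) if request else None,
        "ttl": int(ttl.group(1)) if ttl else None,
    }


event = parse_node_event(SAMPLE)
```

The meaning assigned to each leading field is an assumption based on the sample records; verify it against your own event files before relying on it.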

4.1.3 Configuring Dynamic Nodes

This section contains information on configuration options when adding or removing nodes.

During the creation of a dynamic node, pbs_server attempts to resolve the node name to an IP address. If pbs_server cannot resolve the name, it does not create the node and does not retry the creation later.
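
Because the creation is not retried, a provisioning service may want to confirm that each name resolves before calling qmgr. The following Python sketch shows one way to do that. The function name unresolvable_nodes is hypothetical, and the check is only meaningful when it runs on a host that resolves names the same way the pbs_server host does.

```python
import socket


def unresolvable_nodes(names, resolve=socket.gethostbyname):
    """Return the names that the resolver could not map to an IP address."""
    failed = []
    for name in names:
        try:
            resolve(name)
        except OSError:           # socket.gaierror is a subclass of OSError
            failed.append(name)
    return failed
```

The resolve argument exists so that the lookup can be replaced in tests; in normal use the default resolver is sufficient.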

Immediately after a dynamic node is created, it is assigned a state of "down|MOM-list-not-sent". Once the new node has received the list of all MOMs, it is assigned a state of "free" and becomes available for job scheduling.
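
A service that needs to know when a new node becomes schedulable can poll its state until it reads "free". The sketch below is a generic Python polling loop. The function name wait_until_free and the get_state callable are hypothetical, and how the state is obtained (for example, by parsing pbsnodes output) is left to the caller.

```python
import time


def wait_until_free(get_state, timeout=300.0, interval=5.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Poll get_state() until it returns "free"; return False on timeout."""
    deadline = clock() + timeout
    while True:
        if get_state() == "free":
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

The timeout and interval defaults are arbitrary illustration values, not recommendations from Torque.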

4.1.3.A  TTL Parameter (Creating Nodes)

Dynamic nodes are added to the RM with a TTL parameter, which the RM passes to Moab. Moab does not schedule workload for a node beyond the TTL assigned to it, and it removes a dynamic node when the node reaches the expiration date set by its TTL. A node end trigger then fires to notify the service that the dynamic node has been removed in Moab. The service may then destroy the virtual machine or deprovision the physical nodes at its convenience.

The following is an example of a node being created with a TTL parameter:

qmgr -c 'create node node003[,node004,node005...] [np=n,][TTL=2015-05-16T05:26:30Z,][acl="user==user1:user2:user3",][requestid=n]'

In the above example, square brackets mark optional items. node003 is created with TTL=2015-05-16T05:26:30Z as the TTL parameter, and the dynamic node will be removed when the TTL expires.
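
A service that creates nodes programmatically has to assemble this command string itself. The following Python sketch builds it from the template shown above. The function names format_ttl and create_node_command are hypothetical, and the attribute order and quoting simply mirror the template rather than any documented requirement.

```python
from datetime import datetime, timezone


def format_ttl(when):
    """Render a timezone-aware datetime in the yyyy-mm-ddThh:mm:ssZ form."""
    if when.tzinfo is None:
        raise ValueError("TTL must be timezone-aware")
    return when.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def create_node_command(nodes, np=None, ttl=None, acl=None, requestid=None):
    """Build the text passed to qmgr -c, following the template's layout."""
    attrs = []
    if np is not None:
        attrs.append(f"np={np}")
    if ttl is not None:
        attrs.append(f"TTL={format_ttl(ttl)}")
    if acl is not None:
        attrs.append(f'acl="{acl}"')
    if requestid is not None:
        attrs.append(f"requestid={requestid}")
    command = "create node " + ",".join(nodes)
    if attrs:
        command += " " + ",".join(attrs)
    return command
```

The resulting string is intended to be passed as the single argument that follows qmgr -c.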

4.1.3.B  requestid Parameter (Adding or Removing Nodes)

Dynamic nodes are added to the RM with a requestid parameter, which the RM passes to Moab. Moab reports the requestid along with the node ID in Moab logs, events, and node end triggers. This allows the external service to tag the nodes allocated together in a block. Moab still reports events on a node-by-node basis; the shared requestid is what associates the events of the tagged nodes with one another.

The external service can also use the requestid to de-allocate nodes in the same block in which it created them, for example, when a group of nodes has its node end trigger fired because of node idle purge time or TTL expiration.

The requestid is useful if nodes are dynamically added, removed, and then re-added at some later time with the same node ID. Using a requestid when a node is re-added helps identify each unique instance of a dynamic node’s lifetime in logs, events, and so on.

Moab also uses the requestid with the NODEIDLEPURGETIME parameter. The requestid parameter groups the nodes and then references the NODEIDLEPURGETIME information, if specified, to determine when to remove the group of nodes. When all the nodes associated with the requestid have reached the idle purge time threshold defined by the NODEIDLEPURGETIME parameter, Moab fires the node end trigger for all the nodes with the same requestid.

When requestid is used with NODEIDLEPURGETIME, all of the nodes in the requestid group must be idle before the node end trigger fires.

4.1.3.C  NODEIDLEPURGETIME Parameter (Removing Nodes)

The NODEIDLEPURGETIME parameter instructs Moab to fire a node end trigger when all the nodes in the requestid group have been idle for the time period specified by NODEIDLEPURGETIME.

Setting NODEIDLEPURGETIME to 0 effectively disables it. If NODEIDLEPURGETIME is not configured in the moab.cfg file, it defaults to 0. See "NODEIDLEPURGETIME" in the Moab Workload Manager Administrator Guide for more information.
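
The grouping rule described above can be modeled in a few lines. The Python sketch below is an illustration of the rule only, not Moab's implementation. The function name groups_ready_for_purge and the (name, requestid, idle_seconds) tuple layout are hypothetical, and treating "idle for the time period" as idle time greater than or equal to the threshold is an assumption.

```python
from collections import defaultdict


def groups_ready_for_purge(nodes, purge_seconds):
    """Return {requestid: [node names]} for groups whose nodes are all idle long enough.

    nodes is an iterable of (name, requestid, idle_seconds) tuples;
    a busy node is represented with idle_seconds of 0.
    """
    if purge_seconds <= 0:
        return {}                 # a value of 0 disables idle purging
    groups = defaultdict(list)
    for name, requestid, idle_seconds in nodes:
        groups[requestid].append((name, idle_seconds))
    return {
        requestid: [name for name, _ in members]
        for requestid, members in groups.items()
        if all(idle >= purge_seconds for _, idle in members)
    }
```

A single node that has not yet been idle long enough keeps its whole requestid group out of the result, which matches the requirement that all nodes in the group be idle.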

The following is an example of configuring the node end trigger in moab.cfg:

NODECFG[DEFAULT] TRIGGER=EType=end,TType=elastic,AType=exec,Action="/$HOME/tools/nodeend.sh $OID"

In this example, the nodeend.sh trigger will be called with the name of each node in the requestid group.

The node end trigger notifies the external service that the node (along with all the other nodes with the same requestid) has met the node idle purge time set by the NODEIDLEPURGETIME parameter. The external service may then choose to remove the node from Torque (which in turn removes it from Moab).

The following is an example of the command that a service will run to remove a node from Torque:

qmgr -c 'delete node node003'

If a job is running on a node when it is deleted, the job will be requeued if the job is requeueable or deleted if it is not. If the node has already been shut down, any jobs running on the node will be immediately purged.
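
A node end trigger handler receives the node name and can issue the delete command shown above. The following Python sketch shows the shape such a handler might take. The function names delete_node_argv and delete_node are hypothetical, and the node-name pattern is a conservative assumption intended to keep unexpected input out of the qmgr command.

```python
import re
import subprocess

# Conservative pattern for node names (an assumption, not a Torque rule).
_NODE_NAME = re.compile(r"[A-Za-z0-9][A-Za-z0-9._-]*")


def delete_node_argv(node):
    """Return the argument list that runs qmgr -c 'delete node <node>'."""
    if not _NODE_NAME.fullmatch(node):
        raise ValueError(f"unexpected node name: {node!r}")
    return ["qmgr", "-c", f"delete node {node}"]


def delete_node(node, run=subprocess.run):
    """Run the delete command; raises CalledProcessError if qmgr fails."""
    return run(delete_node_argv(node), check=True)
```

A trigger script could call delete_node with the node name it receives as its first argument. Because a delete requeues or removes any job still running on the node, a service may prefer to confirm that the node is idle first.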

© 2018 Adaptive Computing