
17.19 Node Maintenance Example

Example scenario

An administrator wants to create the following setup in Moab:

When a node's temperature exceeds 34°C, Moab reserves it. If the temperature rises above 40°C, Moab requeues all jobs on the node. If the temperature exceeds 50°C, Moab shuts the node down. When the node cools below 25°C, Moab removes the node's reservation and unsets the trigger variables that track these stages. The administrator wants to receive an email whenever any of these events occurs.

The first trigger reserves the node when its reported temperature exceeds 34°C. The gmetric name in the trigger must match the name of the configured gmetric exactly, including case (see Enabling Generic Metrics for more information).

NODECFG[DEFAULT] TRIGGER=Description="ThresholdA",EType=threshold,Threshold=gmetric[temp]>34,AType=internal,Action="node:-:reserve",RearmTime=30,Offset=2:00,Sets=temp_rsv

The administrator wants the trigger to fire any time a node overheats, so it must be rearmable (RearmTime=30). The Offset=2:00 attribute requires the node to stay above 34°C for at least two minutes before Moab reserves it. If the trigger succeeds, it sets the temp_rsv variable (Sets=temp_rsv). The later triggers require this variable, which makes the triggers fire in sequence.
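The Offset and RearmTime behavior described above can be pictured with a small model. The Python sketch below is a simplified, hypothetical illustration, not Moab's implementation: it assumes the temp gmetric arrives as discrete timestamped samples, that Moab durations use the [[[DD:]HH:]MM:]SS form (so 2:00 is two minutes), and that after firing the trigger needs a fresh sustained run above the threshold. The function names are invented for this example.

```python
from typing import Iterable, List, Optional, Tuple


def parse_moab_duration(text: str) -> int:
    """Convert a [[[DD:]HH:]MM:]SS string such as "2:00" or "30" to seconds (assumed format)."""
    multipliers = [1, 60, 3600, 86400]  # SS, MM, HH, DD
    parts = [int(p) for p in text.split(":")]
    if len(parts) > len(multipliers):
        raise ValueError(f"too many fields in duration: {text!r}")
    return sum(value * mult for value, mult in zip(reversed(parts), multipliers))


def sustained_threshold_fires(
    samples: Iterable[Tuple[int, float]],
    threshold: float,
    offset: int,
    rearm: int,
) -> List[int]:
    """Return the sample times at which a rearmable '>' threshold trigger fires.

    samples: (time_in_seconds, temperature) pairs in increasing time order.
    offset: seconds the value must stay above the threshold before firing.
    rearm: seconds after firing during which the trigger cannot fire again.
    """
    fire_times: List[int] = []
    exceeded_since: Optional[int] = None  # start of the current run above threshold
    armed_at = 0  # earliest time the trigger may fire again
    for t, temp in samples:
        if temp <= threshold:
            exceeded_since = None  # run broken; the offset timer restarts
            continue
        if exceeded_since is None:
            exceeded_since = t
        if t >= armed_at and t - exceeded_since >= offset:
            fire_times.append(t)
            armed_at = t + rearm   # wait out RearmTime before firing again
            exceeded_since = None  # require a fresh sustained run (assumption)
    return fire_times
```

For example, with samples every 30 seconds at 33, 35, 36, 35.5, 35.2 and 35.8°C, `sustained_threshold_fires(samples, 34, parse_moab_duration("2:00"), parse_moab_duration("30"))` fires once, at t=150, the first sample two minutes after the temperature crossed 34°C. A brief dip below 34°C resets the two-minute timer in this model.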

The administrator also wants to know when this trigger has fired, so a second trigger runs a script that sends an email once the temp_rsv variable is set:

NODECFG[DEFAULT] Trigger=Description="Email on Reservation",EType=start,AType=exec,Action="$TOOLSDIR/node_temp_emailReserve.pl $OID",RearmTime=3:00,Requires=temp_rsv
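The contents of node_temp_emailReserve.pl are not shown here. As a hypothetical stand-in, written in Python rather than Perl, a notification script might look like the sketch below. It assumes Moab passes the node's name as the first argument (the expansion of $OID), that a mail transfer agent listens on localhost, and that the sender and recipient addresses are placeholders.

```python
import smtplib
import sys
from email.message import EmailMessage
from typing import List

# Placeholder addresses (hypothetical); replace them with real ones for your site.
SENDER = "moab@cluster.example.com"
RECIPIENT = "admin@cluster.example.com"


def build_message(node: str, event: str) -> EmailMessage:
    """Compose a notification about a temperature event on `node`."""
    msg = EmailMessage()
    msg["From"] = SENDER
    msg["To"] = RECIPIENT
    msg["Subject"] = f"[Moab] {node}: {event}"
    msg.set_content(
        f"Node {node} exceeded its temperature threshold.\n"
        f"Action taken: {event}.\n"
    )
    return msg


def main(argv: List[str]) -> int:
    """Send the reservation email for the node named in argv[1]."""
    if len(argv) != 2:
        print(f"usage: {argv[0]} NODE", file=sys.stderr)
        return 2
    node = argv[1]  # assumed to be the node name that $OID expands to
    msg = build_message(node, "node reserved")
    with smtplib.SMTP("localhost") as smtp:  # assumes a local MTA on port 25
        smtp.send_message(msg)
    return 0
```

To run it as a trigger Action, add an `if __name__ == "__main__": sys.exit(main(sys.argv))` guard and make the file executable. The evacuation and shutdown emails could reuse the same script with a different event string.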

The second threshold trigger runs a script that requeues the node's jobs if the node exceeds 40°C and the temp_rsv variable is set. Its Sets attribute lists both node_evac and !node_evac (the form that is set on failure), so the trigger sets the node_evac variable whenever it fires, regardless of whether the script succeeds or fails.

NODECFG[DEFAULT] Trigger=Description="Threshold B",EType=threshold,Threshold=gmetric[temp]>40,AType=exec,Action="$TOOLSDIR/node_evacuate.pl $OID",RearmTime=3:00,Requires=temp_rsv,Sets=node_evac.!node_evac
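The following Python sketch models how a Sets value that lists node_evac both plain and with a ! prefix might be interpreted. It assumes, as the Unsets list of the final trigger suggests, that multiple variable names are separated by periods, and that a ! prefix marks a variable set only on failure while a plain name is set only on success. The function name is hypothetical.

```python
from typing import Set


def vars_to_set(sets_value: str, succeeded: bool) -> Set[str]:
    """Return the variables a trigger sets, given its Sets= value and outcome.

    Names are separated by '.'; a leading '!' marks a name that is set only
    when the trigger fails, and unprefixed names are set only on success.
    """
    result: Set[str] = set()
    for name in filter(None, sets_value.split(".")):  # skip empty fields
        on_failure = name.startswith("!")
        if on_failure != succeeded:  # '!' names on failure, plain names on success
            result.add(name.lstrip("!"))
    return result
```

With this model, `vars_to_set("node_evac.!node_evac", True)` and `vars_to_set("node_evac.!node_evac", False)` both return `{"node_evac"}`, which is why Threshold B always sets node_evac, while `vars_to_set("temp_rsv", False)` returns an empty set, matching the first trigger, which sets temp_rsv only on success.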

The administrator wants another email to report that the node is still overheating and has been evacuated. This email trigger fires once the node_evac variable is set.

NODECFG[DEFAULT] Trigger=Description="Email on Evacuation",EType=start,AType=exec,Action="$TOOLSDIR/node_temp_emailEvac.pl $OID",RearmTime=3:00,Requires=node_evac

The third threshold trigger runs a script that shuts down the node if the temp gmetric exceeds 50°C and the node_evac variable is set. It sets the node_shutdown variable, which the shutdown notification trigger requires.

NODECFG[DEFAULT] TRIGGER=Description="Threshold C",EType=threshold,Threshold=gmetric[temp]>50,AType=exec,Action="$TOOLSDIR/node_shutdown.pl $OID",RearmTime=3:00,Requires=node_evac,Sets=node_shutdown

A matching email trigger notifies the administrator once the node_shutdown variable is set:

NODECFG[DEFAULT] Trigger=Description="Email on Shutdown",EType=start,AType=exec,Action="$TOOLSDIR/node_temp_emailShutdown.pl $OID",RearmTime=3:00,Requires=node_shutdown

The final trigger removes the node's reservation and unsets all three variables once the node's temp gmetric drops below 25°C.

NODECFG[DEFAULT] Trigger=Description="Remove Reservation",EType=threshold,Threshold=gmetric[temp]<25,AType=exec,Action="/opt/moab/bin/mrsvctl -r r:$OID",RearmTime=3:00,Requires=temp_rsv,Unsets=temp_rsv.node_evac.node_shutdown
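To see how Requires, Sets, and Unsets chain the triggers together, the following Python sketch replays a sequence of temperature readings against a simplified model of this configuration. It is an illustration under stated assumptions, not Moab's scheduler logic: it evaluates all triggers once per reading against the variables that were set at the start of that reading, ignores Offset and RearmTime, assumes every action succeeds, treats a threshold trigger as inactive while its own variables remain set, and models each email trigger as firing when its required variable becomes set. The class, constant, and function names are hypothetical.

```python
import operator
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class ThresholdTrigger:
    """Hypothetical model of one threshold trigger from this example."""
    name: str
    op: str  # ">" or "<", compared against the temp gmetric
    limit: float
    requires: Set[str] = field(default_factory=set)
    sets: Set[str] = field(default_factory=set)
    unsets: Set[str] = field(default_factory=set)


OPS = {">": operator.gt, "<": operator.lt}

TRIGGERS = [
    ThresholdTrigger("ThresholdA", ">", 34, sets={"temp_rsv"}),
    ThresholdTrigger("Threshold B", ">", 40, requires={"temp_rsv"}, sets={"node_evac"}),
    ThresholdTrigger("Threshold C", ">", 50, requires={"node_evac"}, sets={"node_shutdown"}),
    ThresholdTrigger("Remove Reservation", "<", 25, requires={"temp_rsv"},
                     unsets={"temp_rsv", "node_evac", "node_shutdown"}),
]

# Each email trigger requires one variable; modeled as firing when that variable becomes set.
EMAIL_ON = {
    "temp_rsv": "Email on Reservation",
    "node_evac": "Email on Evacuation",
    "node_shutdown": "Email on Shutdown",
}


def simulate(temps: List[float]) -> List[Tuple[int, str]]:
    """Return (reading index, trigger description) for every trigger that fires."""
    variables: Set[str] = set()
    events: List[Tuple[int, str]] = []
    for step, temp in enumerate(temps):
        snapshot = set(variables)  # variables visible at the start of this reading
        for trig in TRIGGERS:
            if not trig.requires <= snapshot:
                continue  # a required variable is not set yet
            if trig.sets and trig.sets <= snapshot:
                continue  # already fired; its variables are still set
            if not OPS[trig.op](temp, trig.limit):
                continue  # threshold not crossed
            events.append((step, trig.name))
            newly_set = trig.sets - variables
            variables |= trig.sets
            variables -= trig.unsets
            for var in sorted(newly_set):
                events.append((step, EMAIL_ON[var]))
    return events
```

For readings of 30, 36, 42, 52, 45, 24, and 36°C, the model reserves the node at the second reading, evacuates it at the third, shuts it down at the fourth, removes the reservation at the sixth, and reserves it again at the seventh, sending an email after each of the first three actions and again after the new reservation.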

© 2016 Adaptive Computing