Moab Workload Manager

19.6 Trigger Examples

19.6.1 Trigger on a Standing Reservation

Create standing reservation Mail2 with a trigger for the script /tmp/email.sh. Launch the script 200 seconds after the start of the reservation.

SRCFG[Mail2] TRIGGER=EType=start,Offset=200,AType=exec,Action="/tmp/email.sh"
...

19.6.2 Job Trigger that Launches Script Prior to Wallclock Limit

Create a trigger on job job46 that launches the /tmp/email.sh script with the command-line argument Hello 150 seconds before the job is scheduled to finish.

> mschedctl -c trigger EType=end,offset=-150,AType=exec,Action="/tmp/email.sh Hello" -o Job:job46
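The contents of /tmp/email.sh are not shown in this section. The following is a minimal sketch of what such a script might look like, assuming a mailx-style mail command is installed; the recipient address, the subject line, and the build_body helper are hypothetical names chosen for illustration.

```shell
#!/bin/sh
# Hypothetical /tmp/email.sh; the real script is not part of this example.
# RECIPIENT is an assumed placeholder address.
RECIPIENT="admin@example.com"

# Build the message body from the arguments the trigger passes in.
build_body() {
    printf 'Moab trigger fired at %s\nArguments: %s\n' "$(date)" "$*"
}

# Send mail only when called with arguments, as the trigger above does.
if [ "$#" -gt 0 ]; then
    build_body "$@" | mail -s "Moab trigger notification" "$RECIPIENT"
fi
```

With the trigger above, the script would be called as /tmp/email.sh Hello, so the body would contain the line "Arguments: Hello".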

19.6.3 Admin Reservation with Two Triggers

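This example shows the command that creates the reservation, the two trigger scripts (trig1.sh, then trig2.sh), and the output the scripts write to /tmp/trigs/trig1 and /tmp/trigs/trig2.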

> mrsvctl -c -h keiko \
-T 'Sets=Var1.Var2,EType=start,AType=exec,Action="/tmp/trigs/trig1.sh ReservationStart NewReservation"' \
-T 'Requires=Var1.Var2,EType=start,AType=exec,Action="/tmp/trigs/trig2.sh $Var1 stuff $Var2"'

#!/bin/sh
echo -e "$1, $2
$*" >> /tmp/trigs/trig1
echo "Var1=1"
echo "Var2=2"

exit 0

#!/bin/sh
echo -e "$1, $2, $3
$*" > /tmp/trigs/trig2

exit 0

The preceding example creates the following output:

ReservationStart, NewReservation
ReservationStart NewReservation

1, stuff, 2
1 stuff 2

19.6.4 Launch a Local Script Each Time a Node Goes Down

This example places a trigger on all nodes that fires when any node changes to a down state.

NODECFG[DEFAULT] TRIGGER=AType=exec,Action="/tmp/nodedown.sh $OID",EType=fail,MultiFire=TRUE,RearmTime=5:00

When any node changes its state to failure (from a non-failure state), Moab executes the script /tmp/nodedown.sh. The MultiFire attribute means this trigger fires every time a node fails, and the RearmTime attribute means Moab waits at least five minutes between successive firings. (This example can be easily modified to trigger for only one node by replacing DEFAULT with the node name.)
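The script itself is site-specific and is not shown here. As a minimal sketch, assuming the only goal is to record which node failed, /tmp/nodedown.sh might look like the following. The log file location and the record_failure helper are hypothetical.

```shell
#!/bin/sh
# Hypothetical /tmp/nodedown.sh; LOGFILE is an assumed location.
LOGFILE="${LOGFILE:-/tmp/nodedown.log}"

# Append one timestamped line naming the failed node.
record_failure() {
    node="$1"
    echo "$(date '+%Y-%m-%d %H:%M:%S') node ${node} entered a failure state" >> "$LOGFILE"
}

# The trigger passes the node name as the first argument (expanded from $OID).
if [ "$#" -gt 0 ]; then
    record_failure "$1"
fi
```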

The next example creates a diagnostic trigger that will fire when a node's state changes to down; then a second trigger fires based on the diagnostic information obtained by the first trigger.

NODECFG[DEFAULT]        TRIGGER=atype=exec,action="/tmp/node_diagnostics.sh $OID",etype=fail,multifire=true,rearmtime=5:00,sets=OUTPUT
NODECFG[DEFAULT]        TRIGGER=atype=exec,action="/tmp/node_recovery.sh $OID $OUTPUT",etype=fail,requires=OUTPUT,multifire=true,rearmtime=5:00,unsets=OUTPUT

In this example the first trigger will run the script "node_diagnostics.sh" and will set a variable called OUTPUT. The second trigger will use this OUTPUT information to decide what action to take.
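Neither script is shown in this section. The sketch below illustrates one way the pair could cooperate, assuming the diagnostic script sets OUTPUT by printing a NAME=value line on standard output, as trig1.sh does for Var1 and Var2 in 19.6.3. The ping check, the status words, and the function names are hypothetical; in practice each function body would live in its own script.

```shell
#!/bin/sh
# Sketch only: both script bodies are shown as functions in one file.

# Body of a hypothetical node_diagnostics.sh, called with the node name.
# Prints OUTPUT=<value> so the trigger can set the OUTPUT variable.
diagnose() {
    node="$1"
    # Assumed check: one ping decides between two made-up status words.
    if ping -c 1 "$node" >/dev/null 2>&1; then
        echo "OUTPUT=reachable"
    else
        echo "OUTPUT=unreachable"
    fi
}

# Body of a hypothetical node_recovery.sh, called with $OID and $OUTPUT.
# Only prints the chosen action; a real script would carry it out.
recover() {
    node="$1"
    status="$2"
    case "$status" in
        reachable)   echo "restart services on ${node}" ;;
        unreachable) echo "power-cycle ${node}" ;;
        *)           echo "no action for ${node}" ;;
    esac
}
```

Because the second trigger also specifies unsets=OUTPUT, the variable is cleared after the recovery step, so the next failure starts again with a fresh diagnostic run.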

19.6.5 Sending Email on Scheduler or Node Failure

This example places a mail trigger on the scheduler that fires whenever a failure is detected. The configuration also places a trigger on each compute node that fires if the node goes down.

SCHEDCFG[MyCluster] TRIGGER=AType=mail,EType=fail,Action="scheduler failure detected on $TIME",MultiFire=TRUE,RearmTime=5:00
NODECFG[DEFAULT]    TRIGGER=AType=mail,EType=fail,Action="node $OID has failed on $TIME",MultiFire=TRUE,RearmTime=15:00
...

19.6.6 Resource Manager Failure Trigger

FAILTIME is set on the resource manager, along with a failure trigger. The trigger fires if the resource manager named base stays down for more than three minutes.

RMCFG[base] TYPE=PBS FAILTIME=3:00
RMCFG[base] TRIGGER=atype=exec,action="/opt/moab/tools/diagnose_rm.pl $OID",etype=failure

19.6.7 Running the Support Script on Job-hold

The Support Diagnostic Script can be used to save a scheduler snapshot based on particular events. To save a system snapshot when the scheduler places a hold on a job, the following trigger can be configured:

CLASSCFG[batch] JOBTRIGGER=atype=exec,etype=hold,action="$HOME/tools/support.diag.pl"

19.6.8 Creating a Periodic Standing Trigger

Standing triggers can be created on the scheduler object using the SCHEDCFG parameter as in the following example:

SCHEDCFG[base] TRIGGER=atype=exec,name=jobsubmit_hour,etype=standing,action="/opt/moab/tools/createjobs_hour.pl",period=hour,offset=05:00
SCHEDCFG[base] TRIGGER=atype=exec,name=jobsubmit_day,etype=standing,action="/opt/moab/tools/createjobs_day.pl",period=day,offset=30:00

This example launches the createjobs_hour.pl script at 5 minutes past every hour and runs the createjobs_day.pl script at 12:30 AM every day.

19.6.9 Fragmenting a Diagnostic System Job Across the Entire Cluster

In this example a system job is submitted requesting a specific set of nodes. The job is submitted with the FRAGMENT flag, which will split the job up and run one job per distinct allocated host. Three triggers are then attached to each of those jobs:

1. Run a diagnostic script.
2. Run a recovery script.
3. If the recovery script was successful, complete the system job.

#!/bin/sh
echo nothing | msub -l nodes=ALL,walltime=10:00,flags=NORMSTART:SYSTEMJOB:FRAGMENT,\
trig=atype=exec\&action=/tmp/diag.sh\ \$HOSTLIST\&etype=start\&sets=DIAG,\
trig=atype=exec\&action=/tmp/step.sh\ \$HOSTLIST\ \$DIAG\&etype=start\&requires=DIAG\&sets=good.\!bad,\
trig=atype=internal\&action=job:-:complete\&etype=start\&requires=good
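The exit status of the recovery script is what separates the good and bad outcomes. The following is a minimal sketch of what /tmp/step.sh might look like, assuming that a variable prefixed with ! in sets is set when the trigger's script fails (non-zero exit) and the unprefixed variable is set when it succeeds. The diagnosis values ok and disk_full and the recover_host helper are invented for illustration.

```shell
#!/bin/sh
# Hypothetical /tmp/step.sh: recovery step called with $HOSTLIST and $DIAG.
# The diagnosis values "ok" and "disk_full" are invented for illustration.
recover_host() {
    host="$1"
    diag="$2"
    case "$diag" in
        ok)
            return 0 ;;   # nothing to repair
        disk_full)
            echo "would clean scratch space on ${host}"
            return 0 ;;
        *)
            echo "unknown diagnosis '${diag}' on ${host}" >&2
            return 1 ;;   # non-zero status marks the step as failed
    esac
}

# Run only when called with both arguments, as the trigger does.
if [ "$#" -ge 2 ]; then
    recover_host "$1" "$2"
    exit $?
fi
```

Under that assumption, a zero exit sets good, which satisfies requires=good on the third trigger and lets it complete the system job; a non-zero exit sets bad and the job is not completed by that trigger.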

19.6.10 Successive Job Failure Trigger

Create a reservation on a node that has more than five successive job failures, blocking the node from running any other jobs until the problem is resolved.

NODECFG[DEFAULT] TRIGGER=atype=internal,action="node:-:reserve",etype=threshold,threshold=statistic[successivejobfailures]>5,multifire=true

To reset the trigger and release the reservation, clear out the metric. This is done by either restarting Moab with mschedctl -R or modifying the reservation to grant yourself access and running a job inside of it.

19.6.11 Specifying an RSVPROFILE on an Internal Node Trigger

You can specify an RSVPROFILE on an internal node trigger reservation action.

NODECFG[DEFAULT] TRIGGER=atype=internal,action="node:-:reserve:rsvprofile=<rsvprofile_name>"

19.6.12 Greater Scheduling Flexibility with Migration Internal Actions

Two new internal actions (for kicking off overcommit and green migrations) can be attached to triggers, which allow for greater scheduling flexibility.

TRIGGER=atype=internal,etype=<event_type>,action=sched:-:vmmigrate:[rsv|green|overcommit]

These three options still need to be enabled in moab.cfg and in moab.lic.