Create standing reservation Mail2 with a trigger for the script /tmp/email.sh. Launch the script 200 seconds after the start of the reservation.
SRCFG[Mail2] TRIGGER=EType=start,Offset=200,AType=exec,Action="/tmp/email.sh" ...
Create a trigger on the job job46 that launches the /tmp/email.sh script with the command line argument Hello 150 seconds before the job is scheduled to finish.
> mschedctl -c trigger EType=end,offset=-150,AType=exec,Action="/tmp/email.sh Hello" -o Job:job46
This example includes the reservation, the two scripts, and the output of the scripts.
> mrsvctl -c -h keiko \
  -T 'Sets=Var1.Var2,EType=start,AType=exec,Action="/tmp/trigs/trig1.sh ReservationStart NewReservation"' \
  -T 'Requires=Var1.Var2,EType=start,AType=exec,Action="/tmp/trigs/trig2.sh $Var1 stuff $Var2"'
#!/bin/sh
echo -e "$1, $2 $*" >> /tmp/trigs/trig1
echo "Var1=1"
echo "Var2=2"
exit 0

#!/bin/sh
echo -e "$1, $2, $3 $*" > /tmp/trigs/trig2
exit 0
The preceding example creates the following output:
ReservationStart, NewReservation ReservationStart NewReservation
1, stuff, 2 1 stuff 2
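The handoff between the two triggers can be pictured with a small stand-alone simulation. It assumes, as trig1.sh suggests, that a trigger's variables are taken from name=value lines on its standard output. The functions trig1 and trig2 are stand-ins for the scripts, and the parsing loop is only an illustration, not Moab's implementation.

```shell
#!/bin/sh
# Hypothetical simulation of the Sets/Requires handoff; this is not Moab code.

# Stand-in for trig1.sh: its stdout carries the name=value pairs.
trig1() {
    echo "Var1=1"
    echo "Var2=2"
}

# Stand-in for trig2.sh: formats its three arguments like the original script.
trig2() {
    echo "$1, $2, $3 $*"
}

# Read the name=value lines and keep only the names listed in Sets=Var1.Var2.
Var1=""
Var2=""
trig1_out=$(trig1)
for line in $trig1_out; do
    name=${line%%=*}
    value=${line#*=}
    case "$name" in
        Var1) Var1=$value ;;
        Var2) Var2=$value ;;
    esac
done

# Requires=Var1.Var2: run the second trigger only once both variables are set.
result=""
if [ -n "$Var1" ] && [ -n "$Var2" ]; then
    result=$(trig2 "$Var1" stuff "$Var2")
    echo "$result"
fi
```

Run as written, the sketch prints the same line that trig2.sh writes to /tmp/trigs/trig2 in the example above.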
This example places a trigger on all nodes and will fire when any node changes to a down state.
NODECFG[DEFAULT] TRIGGER=AType=exec,Action="/tmp/nodedown.sh $OID",EType=fail,MultiFire=TRUE,RearmTime=5:00
When any node changes its state to failure (from a non-failure state), Moab will execute the script /tmp/nodedown.sh. The MultiFire attribute means this trigger will fire every time a node fails, and the RearmTime attribute means Moab will wait at least five minutes between successive firings of the trigger. (This example can easily be modified to trigger for only one node by replacing DEFAULT with the node name.)
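The interaction of MultiFire and RearmTime can be sketched with a short simulation over failure timestamps given in seconds. It assumes that a failure arriving inside the rearm window is simply skipped rather than deferred, which the text above does not state. The function name fires_for and the sample timestamps are hypothetical.

```shell
#!/bin/sh
# Hypothetical simulation of MultiFire=TRUE with RearmTime=5:00; not Moab code.
REARM=300   # five minutes, in seconds

# Print each failure timestamp on which the trigger would fire.
# Assumption: failures inside the rearm window are skipped, not queued.
fires_for() {
    last=""
    for t in "$@"; do
        if [ -z "$last" ] || [ $((t - last)) -ge "$REARM" ]; then
            echo "$t"
            last=$t
        fi
    done
}

# Failures at 0 s, 120 s, 300 s, 450 s, and 900 s.
fired=$(fires_for 0 120 300 450 900)
echo $fired
```

Under that assumption the failures at 120 and 450 seconds fall inside the five-minute window and do not fire the trigger.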
The next example creates a diagnostic trigger that will fire when a node's state changes to down; then a second trigger fires based on the diagnostic information obtained by the first trigger.
NODECFG[DEFAULT] TRIGGER=atype=exec,action="/tmp/node_diagnostics.sh $OID",etype=fail,multifire=true,rearmtime=5:00,sets=OUTPUT
NODECFG[DEFAULT] TRIGGER=atype=exec,action="/tmp/node_recovery.sh $OID $OUTPUT",etype=fail,requires=OUTPUT,multifire=true,rearmtime=5:00,unsets=OUTPUT
In this example the first trigger will run the script "node_diagnostics.sh" and will set a variable called OUTPUT. The second trigger will use this OUTPUT information to decide what action to take.
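The two scripts themselves are not shown. A minimal sketch of what they might contain follows, written as shell functions so it runs on its own. It assumes the variable is set the same way as in the trig1.sh example, by printing a name=value line on standard output. The probe_node check, the PROBE_RESULT variable, and the recovery actions are all hypothetical placeholders.

```shell
#!/bin/sh
# Hypothetical bodies for node_diagnostics.sh and node_recovery.sh.

# Placeholder health check: a real script would inspect the node here.
# PROBE_RESULT is an illustrative stand-in for that check's result.
probe_node() {
    echo "${PROBE_RESULT:-unreachable}"
}

# node_diagnostics: $1 is the node name ($OID). Prints OUTPUT=<value>.
node_diagnostics() {
    echo "OUTPUT=$(probe_node "$1")"
}

# node_recovery: $1 is the node name ($OID), $2 is the value of $OUTPUT.
# Prints the action it would take instead of performing it.
node_recovery() {
    case "$2" in
        unreachable) echo "power-cycle $1" ;;
        diskfull)    echo "clean scratch on $1" ;;
        *)           echo "notify admin about $1" ;;
    esac
}

# Chain the two steps by hand for one node.
diag=$(node_diagnostics node01)
node_recovery node01 "${diag#OUTPUT=}"
```

Keeping the decision in the second script means the diagnostic step only has to report what it found, and the recovery policy can change without touching it.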
This example places a mail trigger on the scheduler that will fire whenever a failure is detected. The configuration also places a trigger on each compute node that will fire if the node goes down.
SCHEDCFG[MyCluster] TRIGGER=AType=mail,EType=fail,Action="scheduler failure detected on $TIME",MultiFire=TRUE,RearmTime=5:00
NODECFG[DEFAULT] TRIGGER=AType=mail,EType=fail,Action="node $OID has failed on $TIME",MultiFire=TRUE,RearmTime=15:00
...
The FAILTIME is set on the resource manager, along with a failure trigger. This trigger will fire if the resource manager base goes down for more than three minutes.
RMCFG[base] TYPE=PBS FAILTIME=3:00
RMCFG[base] TRIGGER=atype=exec,action="/opt/moab/tools/diagnose_rm.pl $OID",etype=failure
The Support Diagnostic Script can be used to save a scheduler snapshot based on particular events. To save a system snapshot when the scheduler places a hold on a job, the following trigger can be configured:
CLASSCFG[batch] JOBTRIGGER=atype=exec,etype=hold,action="$HOME/tools/support.diag.pl"
Standing triggers can be created on the scheduler object using the SCHEDCFG parameter as in the following example:
SCHEDCFG[base] TRIGGER=atype=exec,name=jobsubmit_hour,etype=standing,action="/opt/moab/tools/createjobs_hour.pl",period=hour,offset=05:00
SCHEDCFG[base] TRIGGER=atype=exec,name=jobsubmit_day,etype=standing,action="/opt/moab/tools/createjobs_day.pl",period=day,offset=30:00
This example will launch the createjobs_hour.pl script at 5 minutes past every hour and the createjobs_day.pl script at 12:30 AM every day.
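The arithmetic behind those two times can be checked with a short sketch. It reads each offset as MM:SS counted from the start of the period, which matches the times stated above. The helper names strip_zero, offset_to_seconds, and seconds_to_hhmm are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers for reading MM:SS offsets; not part of Moab.

# Drop one leading zero so values such as "08" are not parsed as octal.
strip_zero() {
    v=${1#0}
    echo "${v:-0}"
}

# Convert an MM:SS offset into seconds from the start of the period.
offset_to_seconds() {
    minutes=$(strip_zero "${1%%:*}")
    seconds=$(strip_zero "${1#*:}")
    echo $((minutes * 60 + seconds))
}

# Render seconds since the start of the period as HH:MM.
seconds_to_hhmm() {
    printf '%02d:%02d\n' $(($1 / 3600)) $(($1 % 3600 / 60))
}

# period=hour,offset=05:00 fires this far into each hour.
seconds_to_hhmm "$(offset_to_seconds 05:00)"
# period=day,offset=30:00 fires this far after midnight.
seconds_to_hhmm "$(offset_to_seconds 30:00)"
```

An offset of 05:00 is 300 seconds, or five minutes into each hour; 30:00 is 1800 seconds, or 00:30 after the start of each day.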
In this example a system job is submitted requesting a specific set of nodes. The job is submitted with the FRAGMENT flag, which will split the job up and run one job per distinct allocated host. Three triggers are then attached to each of those jobs:
#!/bin/sh
echo nothing | msub -l nodes=ALL,walltime=10:00,flags=NORMSTART:SYSTEMJOB:FRAGMENT,\
trig=atype=exec\&action=/tmp/diag.sh\ \$HOSTLIST\&etype=start\&sets=DIAG,\
trig=atype=exec\&action=/tmp/step.sh\ \$HOSTLIST\ \$DIAG\&etype=start\&requires=DIAG\&sets=good.\!bad,\
trig=atype=internal\&action=job:-:complete\&etype=start\&requires=good
Create a reservation on any node that has more than five successive job failures, which blocks the node from running any other jobs until the problem is resolved.
NODECFG[DEFAULT] TRIGGER=atype=internal,action="node:-:reserve",etype=threshold,threshold=statistic[successivejobfailures]>5,multifire=true
To reset the trigger and release the reservation, clear out the metric. This is done by either restarting Moab with mschedctl -R or modifying the reservation to grant yourself access and running a job inside of it.
You can specify an RSVPROFILE on an internal node trigger reservation action.
NODECFG[DEFAULT] TRIGGER=atype=internal,action="node:-:reserve:rsvprofile=<rsvprofile_name>"
Two new internal actions (for kicking off overcommit and green migrations) can be attached to triggers, which allow for greater scheduling flexibility.
TRIGGER=atype=internal,etype=<whatever>,action=sched:-:vmmigrate:[rsv|green|overcommit]
These three options still need to be enabled in moab.cfg and in moab.lic.