Create standing reservation Mail2 with a trigger for the script /tmp/email.sh. Launch the script 200 seconds after the start of the reservation.
SRCFG[Mail2] TRIGGER=EType=start,Offset=200,AType=exec,Action="/tmp/email.sh" ...
Create a trigger associated with job job46 that launches the /tmp/email.sh script with the command-line argument Hello 150 seconds before the job is scheduled to finish.
> mschedctl -c trigger EType=end,offset=-150,AType=exec,Action="/tmp/email.sh Hello" -o Job:job46
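The contents of /tmp/email.sh are not shown in these examples. The following is a minimal sketch of what such an action script might contain, assuming it only needs to build a one-line notice from its command-line arguments. The compose_notice function and the message wording are hypothetical, and actual mail delivery is left out.

```shell
#!/bin/sh
# Hypothetical sketch of an action script such as /tmp/email.sh.
# In the job46 example the script is launched with the argument Hello,
# which arrives here as $1.

# Build the notice text from the arguments ("no message" if none are given).
compose_notice() {
    if [ "$#" -gt 0 ]; then
        printf 'Trigger fired: %s\n' "$*"
    else
        printf 'Trigger fired: no message\n'
    fi
}

# Print the notice; a real script would pipe this to a mailer instead.
compose_notice "$@"
```

Invoked as `/tmp/email.sh Hello`, this sketch prints `Trigger fired: Hello`.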
This example includes the reservation, the two scripts, and the output of the scripts.
> mrsvctl -c -h keiko \
  -T 'Sets=Var1.Var2,EType=start,AType=exec,Action="/tmp/trigs/trig1.sh ReservationStart NewReservation"' \
  -T 'Requires=Var1.Var2,EType=start,AType=exec,Action="/tmp/trigs/trig2.sh $Var1 stuff $Var2"'
#!/bin/sh
echo -e "$1, $2 $*" >> /tmp/trigs/trig1
echo "Var1=1"
echo "Var2=2"
exit 0

#!/bin/sh
echo -e "$1, $2, $3 $*" > /tmp/trigs/trig2
exit 0
The preceding example creates the following output:
ReservationStart, NewReservation ReservationStart NewReservation
1, stuff, 2 1 stuff 2
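The variable hand-off between the two triggers can be illustrated outside Moab. The sketch below is not Moab's implementation. It only mimics the visible behavior, assuming that the name=value lines printed by the first script are what populate Var1 and Var2. The function names trig1, trig2, and get_var are hypothetical, and the file logging of the original scripts is replaced by standard output.

```shell
#!/bin/sh
# Stand-in for trig1.sh: prints the name=value lines that the
# Sets=Var1.Var2 attribute is assumed to pick up.
trig1() {
    echo "Var1=1"
    echo "Var2=2"
}

# Stand-in for trig2.sh: formats its arguments the same way the
# original script does, but writes to stdout instead of a file.
trig2() {
    echo "$1, $2, $3 $*"
}

# Look up one variable in captured trigger output.
# $1 = variable name, $2 = captured stdout of the first trigger
get_var() {
    printf '%s\n' "$2" | sed -n "s/^$1=//p" | head -n 1
}

out=$(trig1)
Var1=$(get_var Var1 "$out")
Var2=$(get_var Var2 "$out")

# Equivalent of the second Action after variable substitution:
# /tmp/trigs/trig2.sh $Var1 stuff $Var2
trig2 "$Var1" stuff "$Var2"
```

The last line prints the same text that the second trigger wrote in the example, `1, stuff, 2 1 stuff 2`.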
This example places a trigger on all nodes and will fire when any node changes to a down state.
NODECFG[DEFAULT] TRIGGER=AType=exec,Action="/tmp/nodedown.sh $OID",EType=fail,MultiFire=TRUE,RearmTime=5:00
When any node changes its state to failure (from a non-failure state), Moab executes the script /tmp/nodedown.sh. The MultiFire attribute means this trigger fires every time a node fails, and the RearmTime attribute means Moab waits at least five minutes between successive firings. (To trigger for only one node, replace DEFAULT with the node name.)
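The interaction of MultiFire and RearmTime can be pictured with a small simulation. This is only a sketch of the rule as described here (fire on each failure, but no sooner than the rearm interval after the previous firing), not a model of Moab's scheduling loop. The fire_times function and the sample event times are hypothetical, and the events are assumed to belong to a single trigger instance.

```shell
#!/bin/sh
# Print the event times (in seconds) at which a MultiFire trigger would
# fire under a rearm interval.
# $1 = rearm interval in seconds, remaining args = event times, ascending
fire_times() {
    rearm=$1
    shift
    last=""
    for t in "$@"; do
        # Fire on the first event, or once the rearm interval has elapsed.
        if [ -z "$last" ] || [ $((t - last)) -ge "$rearm" ]; then
            printf '%s\n' "$t"
            last=$t
        fi
    done
}

# Failures at 0 s, 120 s, 300 s, and 650 s with RearmTime=5:00 (300 s)
fire_times 300 0 120 300 650
```

With these sample values the failure at 120 s is suppressed, and the sketch prints 0, 300, and 650 on separate lines.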
The next example creates a diagnostic trigger that will fire when a node's state changes to down; then a second trigger fires based on the diagnostic information obtained by the first trigger.
NODECFG[DEFAULT] TRIGGER=atype=exec,action="/tmp/node_diagnostics.sh $OID",etype=fail,multifire=true,rearmtime=5:00,sets=OUTPUT
NODECFG[DEFAULT] TRIGGER=atype=exec,action="/tmp/node_recovery.sh $OID $OUTPUT",etype=fail,requires=OUTPUT,multifire=true,rearmtime=5:00,unsets=OUTPUT
In this example the first trigger will run the script "node_diagnostics.sh" and will set a variable called OUTPUT. The second trigger will use this OUTPUT information to decide what action to take.
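Neither script is shown here. As a hedged illustration of how the second script might decide what to do, the sketch below branches on the diagnostic value it receives as its second argument. The choose_action function and the result values disk_full and daemon_down are invented for this example, and a real diagnostics script would define its own vocabulary.

```shell
#!/bin/sh
# Hypothetical decision logic for a script like /tmp/node_recovery.sh.
# $1 = node ID (passed as $OID), $2 = diagnostic result (passed as $OUTPUT)
choose_action() {
    node=$1
    diag=$2
    case "$diag" in
        disk_full)   echo "clean scratch space on $node" ;;
        daemon_down) echo "restart resource manager daemon on $node" ;;
        *)           echo "escalate $node to an administrator" ;;
    esac
}

# Sample call with an assumed node name and diagnostic value
choose_action node10 disk_full
```

The sample call prints `clean scratch space on node10`. Any unrecognized diagnostic value falls through to the escalation branch.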
This example places a mail trigger onto the scheduler that will fire whenever a failure is detected. In addition, the example configuration will also place a trigger on each compute node that will fire if the node goes down.
SCHEDCFG[MyCluster] TRIGGER=AType=mail,EType=fail,Action="scheduler failure detected on $TIME",MultiFire=TRUE,RearmTime=5:00
NODECFG[DEFAULT] TRIGGER=AType=mail,EType=fail,Action="node $OID has failed on $TIME",MultiFire=TRUE,RearmTime=15:00
...
The FAILTIME is set on the resource manager, along with a failure trigger. This trigger will fire if the resource manager base goes down for more than three minutes.
RMCFG[base] TYPE=PBS FAILTIME=3:00
RMCFG[base] TRIGGER=atype=exec,action="/opt/moab/tools/diagnose_rm.pl $OID",etype=failure
The Support Diagnostic Script can be used to save a scheduler snapshot based on particular events. To save a system snapshot when the scheduler places a hold on a job, the following trigger can be configured:
CLASSCFG[batch] JOBTRIGGER=atype=exec,etype=hold,action="$HOME/tools/moab/support.diag.pl"
Standing triggers can be created on the scheduler object using the SCHEDCFG parameter as in the following example:
SCHEDCFG[base] TRIGGER=atype=exec,name=jobsubmit_hour,etype=standing,action="/opt/moab/tools/createjobs_hour.pl",period=hour,offset=05:00
SCHEDCFG[base] TRIGGER=atype=exec,name=jobsubmit_day,etype=standing,action="/opt/moab/tools/createjobs_day.pl",period=day,offset=30:00
This example will launch the createjobs_hour.pl script at 5 minutes past every hour and will run the createjobs_day.pl at 12:30 AM every morning.
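The firing times above follow from simple arithmetic on the period and offset values. The sketch below works through that arithmetic under stated assumptions: offsets are read as MM:SS (as the 05:00 and 30:00 values suggest), periods are aligned to epoch boundaries, and time zones are ignored. The to_seconds and next_fire helpers are hypothetical and are not part of Moab.

```shell
#!/bin/sh
# Convert a colon-separated offset such as 05:00 (MM:SS) to seconds.
# Additional leading fields are assumed to accumulate in base 60.
to_seconds() {
    echo "$1" | awk -F: '{ s = 0; for (i = 1; i <= NF; i++) s = s * 60 + $i; print s }'
}

# Next firing time strictly after "now" for a standing trigger.
# $1 = period length in seconds, $2 = offset in seconds, $3 = now (epoch seconds)
next_fire() {
    period=$1
    offset=$2
    now=$3
    start=$(( now - now % period ))   # start of the current period
    next=$(( start + offset ))
    if [ "$next" -le "$now" ]; then
        next=$(( next + period ))     # offset already passed; use next period
    fi
    echo "$next"
}

# period=hour, offset=05:00, evaluated 7400 s after an epoch boundary
next_fire 3600 "$(to_seconds 05:00)" 7400
```

With these inputs the sketch prints 7500, which is five minutes past the start of the third hour. Passing 86400 and `$(to_seconds 30:00)` gives the daily case.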
In this example, a system job is submitted requesting a specific set of nodes. The job is submitted with the FRAGMENT flag, which will split the job up and run one job per distinct allocated host. Three triggers are then attached to each of those jobs:
#!/bin/sh
echo nothing | msub -l nodes=ALL,walltime=10:00,flags=NORMSTART:SYSTEMJOB:FRAGMENT,\
trig=atype=exec\&action=/tmp/diag.sh\ \$HOSTLIST\&etype=start\&sets=DIAG,\
trig=atype=exec\&action=/tmp/step.sh\ \$HOSTLIST\ \$DIAG\&etype=start\&requires=DIAG\&sets=good.\!bad,\
trig=atype=internal\&action=job:-:complete\&etype=start\&requires=good
You can specify an RSVPROFILE on an internal node trigger reservation action.
NODECFG[DEFAULT] TRIGGER=etype=fail,atype=internal,action="node:-:reserve:rsvprofile=<rsvprofile_name>"
The RSVPROFILE you attach to the reservation can perform useful actions based on the event that just occurred. In the example above, the reservation might be configured so that Moab launches triggers to diagnose or troubleshoot the failed node when the reservation is created.
Two new internal actions (for kicking off overcommit and green migrations) can be attached to triggers, which allow for greater scheduling flexibility.
TRIGGER=atype=internal,etype=<whatever>,action=sched:-:vmmigrate:[rsv|green|overcommit]
These three options still need to be enabled in moab.cfg and in moab.lic.
Threshold triggers attach to virtual container (VC) objects within Moab. Resource managers provide job usage data via the WORKLOADQUERYURL interface; GMETRIC[keyword]=<FLOAT> is returned on each job line that has such tracking enabled. To use a VC trigger, you must supply both the GMETRIC data and the action script named in the trigger-creation command shown later in this section. Moab provides only the mechanism, not the policy, for the trigger action performed.
All jobs within a VC (and all nested VCs and their jobs) have respective keyword values harvested (as threshold values) per iteration, and an average of the jobs is generated. This average usage is then used in the trigger expression to determine if the trigger needs to fire.
To enable Moab to create and fire a VC trigger, do the following:
RMCFG[machine] WORKLOADQUERYURL=file://$HOME/scripts/qworkload.txt
Moab.1 COMMENT="SID=Moab?SJID=Moab.1?SRMJID=Moab.1" EXEC=/opt/moab/spool/moab.job.DEbZod GATTR=PREEMPTEE GNAME=company IWD=/opt/moab QUEUETIME=1317397677 STARTTIME=1317397677 STATE=Running TASKLIST=node10 TASKS=1 UNAME=root WCLIMIT=70000 GMETRIC[load]=22.0
The generic metric is derived from the average load of all jobs in the VC (and all nested VCs and their jobs).
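The averaging step can be reproduced by hand to check what value a threshold expression would see. The sketch below is not Moab's code. It assumes a workload file in the format shown above with one job per line, and it averages the GMETRIC value across those lines. The avg_gmetric function, the second job Moab.2, and its load value are hypothetical.

```shell
#!/bin/sh
# Average the GMETRIC[<name>] values found on the job lines of a workload file.
# $1 = metric name, $2 = path to the workload file
avg_gmetric() {
    awk -v key="GMETRIC[$1]=" '
        {
            for (i = 1; i <= NF; i++) {
                # Match fields that begin with the exact key text
                if (index($i, key) == 1) {
                    sum += substr($i, length(key) + 1)
                    n++
                }
            }
        }
        END { if (n > 0) printf "%.1f\n", sum / n }
    ' "$2"
}

# Two sample job lines, trimmed to a few fields
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
Moab.1 STATE=Running TASKLIST=node10 TASKS=1 GMETRIC[load]=22.0
Moab.2 STATE=Running TASKLIST=node11 TASKS=1 GMETRIC[load]=18.0
EOF
avg_gmetric load "$tmp"
rm -f "$tmp"
```

For the two sample lines the sketch prints 20.0. Job lines without the metric are skipped rather than counted as zero, which is an assumption about how untracked jobs should be treated.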
After performing the necessary configuration steps in the previous section, to create and fire a VC trigger, do the following:
> msub . . .
> mvcctl -a job:<NAME> <VCNAME>
You can submit a job and attach it to a VC in one step via the following command:
> msub -W x=vc=<VCNAME> . . .
> mschedctl -c trigger AType=exec,Action="<ACTION FILE NAME>",etype="threshold,threshold=gmetric[<NAME>]{>|>=|<|<=|==|!=}<VALUE>" -o vc:<NAME>
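To reason about when such a trigger would fire, the threshold expression can be evaluated by hand against an averaged metric value. The sketch below is an illustration only, assuming the six comparison operators listed in the command above carry their usual numeric meaning. The threshold_met function is hypothetical and does not call Moab.

```shell
#!/bin/sh
# Succeed (exit status 0) if "<avg> <op> <value>" holds numerically.
# $1 = averaged metric, $2 = one of > >= < <= == !=, $3 = threshold value
threshold_met() {
    case "$2" in
        '>'|'>='|'<'|'<='|'=='|'!=') ;;
        *) echo "unsupported operator: $2" >&2; return 2 ;;
    esac
    # The operator is validated above, so it is safe to splice into awk.
    awk -v a="$1" -v b="$3" "BEGIN { exit !(a $2 b) }"
}

# Would a trigger with threshold=gmetric[load]>15 fire at an average of 20.0?
if threshold_met 20.0 '>' 15; then
    echo "fire"
else
    echo "hold"
fi
```

The sample check prints `fire`, because 20.0 is greater than 15.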
See Trigger Components for more information on individual trigger components such as AType, Action, and Threshold.
Copyright © 2012 Adaptive Computing Enterprises, Inc.®