Trigger Examples

Trigger on a Standing Reservation
Job Trigger that Launches Script Prior to Wallclock Limit
Admin Reservation with Two Triggers
Launch a Local Script Each Time a Node Goes Down
Sending Email on Scheduler or Node Failure
Resource Manager Failure Trigger
Running the Support Script on Job-hold
Creating a Periodic Standing Trigger
Fragmenting a Diagnostic System Job Across the Entire Cluster
Successive Job Failure Trigger
Specifying an RSVPROFILE on an Internal Node Trigger
Greater scheduling flexibility with migration internal actions
Virtual Container (VC) Trigger

18.6.1 Trigger on a Standing Reservation

Create standing reservation Mail2 with a trigger for the script /tmp/email.sh. Launch the script 200 seconds after the start of the reservation.

SRCFG[Mail2] TRIGGER=EType=start,Offset=200,AType=exec,Action="/tmp/email.sh"
...

18.6.2 Job Trigger that Launches Script Prior to Wallclock Limit

Create a trigger on job job46 that launches the /tmp/email.sh script with the command-line argument Hello 150 seconds before the job is scheduled to finish.

> mschedctl -c trigger EType=end,offset=-150,AType=exec,Action="/tmp/email.sh Hello" -o Job:job46

18.6.3 Admin Reservation with Two Triggers

This example includes the reservation, the two scripts, and the output of the scripts.

> mrsvctl -c -h keiko \
-T 'Sets=Var1.Var2,EType=start,AType=exec,Action="/tmp/trigs/trig1.sh ReservationStart NewReservation"' \
-T 'Requires=Var1.Var2,EType=start,AType=exec,Action="/tmp/trigs/trig2.sh $Var1 stuff $Var2"'

#!/bin/sh
echo -e "$1, $2
$*" >> /tmp/trigs/trig1
echo "Var1=1"
echo "Var2=2"
exit 0

#!/bin/sh
echo -e "$1, $2, $3
$*" > /tmp/trigs/trig2
exit 0

The preceding example creates the following output:

ReservationStart, NewReservation
ReservationStart NewReservation

1, stuff, 2
1 stuff 2
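
The hand-off between the two triggers relies on trig1.sh printing Var1=1 and Var2=2 on standard output. As a rough illustration only (this is not Moab's implementation), the following sketch mimics that hand-off in plain shell. The helper get_var and the variable trig1_stdout are hypothetical names introduced for the example.

```shell
#!/bin/sh
# Hypothetical helper (not part of Moab): print the value assigned to
# variable $1 in NAME=VALUE lines read from standard input.
get_var() {
    sed -n "s/^$1=//p" | tail -n 1
}

# Stand-in for the standard output of trig1.sh from the example above.
trig1_stdout='Var1=1
Var2=2'

Var1=$(printf '%s\n' "$trig1_stdout" | get_var Var1)
Var2=$(printf '%s\n' "$trig1_stdout" | get_var Var2)

# The argument list that trig2.sh would receive after substitution.
echo "$Var1 stuff $Var2"
```

Running the sketch prints the same argument list that appears on the last line of the trig2 output shown above.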

18.6.4 Launch a Local Script Each Time a Node Goes Down

This example places a trigger on all nodes and will fire when any node changes to a down state.

NODECFG[DEFAULT] TRIGGER=AType=exec,Action="/tmp/nodedown.sh $OID",EType=fail,MultiFire=TRUE,RearmTime=5:00

When any node changes from a non-failure state to a failure state, Moab executes the script /tmp/nodedown.sh. The MultiFire attribute makes the trigger fire every time a node fails, and the RearmTime attribute makes Moab wait at least five minutes between successive firings. (To trigger on only one node, replace DEFAULT with the node name.)
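
The contents of /tmp/nodedown.sh are site-specific and are not shown in this guide. A minimal sketch, assuming the script only needs to record the failure, might look like the following. The log path, the NODEDOWN_LOG override, and the function name format_record are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical sketch of /tmp/nodedown.sh. Moab passes the node name
# (the expanded $OID) as the first argument.

# Assumed log location; override by setting NODEDOWN_LOG.
LOG_FILE=${NODEDOWN_LOG:-/tmp/nodedown.log}

# Format one timestamped log record for the failed node.
format_record() {
    printf '%s node %s reported down\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$1"
}

# Only log when a node name was actually supplied.
if [ -n "$1" ]; then
    format_record "$1" >> "$LOG_FILE"
fi
```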

The next example creates a diagnostic trigger that will fire when a node's state changes to down; then a second trigger fires based on the diagnostic information obtained by the first trigger.

NODECFG[DEFAULT]        TRIGGER=atype=exec,action="/tmp/node_diagnostics.sh $OID",etype=fail,multifire=true,rearmtime=5:00,sets=OUTPUT
NODECFG[DEFAULT]        TRIGGER=atype=exec,action="/tmp/node_recovery.sh $OID $OUTPUT",etype=fail,requires=OUTPUT,multifire=true,rearmtime=5:00,unsets=OUTPUT

In this example, the first trigger runs the script node_diagnostics.sh and sets a variable called OUTPUT. The second trigger, which requires OUTPUT, uses this information to decide what action to take and unsets OUTPUT afterward.
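
Neither script is shown in this guide. The following sketch illustrates one way the pair could cooperate, assuming the diagnostic step reports its result by printing OUTPUT=<value> and the recovery step receives that value as its second argument. The ping check, the status values, and the recovery actions are all hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch of the logic in node_diagnostics.sh and
# node_recovery.sh. The check and the actions are illustrative only.

# Diagnostic step: print OUTPUT=<value> so the trigger can set OUTPUT.
diagnose() {
    node=$1
    if ping -c 1 "$node" >/dev/null 2>&1; then
        echo "OUTPUT=reachable"
    else
        echo "OUTPUT=unreachable"
    fi
}

# Recovery step: choose an action from the node name and the OUTPUT value.
# This sketch only prints the decision instead of acting on it.
recover() {
    node=$1
    status=$2
    case $status in
        reachable)   echo "restart resource manager daemon on $node" ;;
        unreachable) echo "request power cycle for $node" ;;
        *)           echo "no action for $node (unknown status: $status)" ;;
    esac
}
```

For example, recover node01 unreachable prints the power-cycle decision for node01.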

18.6.5 Sending Email on Scheduler or Node Failure

This example places a mail trigger on the scheduler that fires whenever a failure is detected. The configuration also places a trigger on each compute node that fires if the node goes down.

SCHEDCFG[MyCluster] TRIGGER=AType=mail,EType=fail,Action="scheduler failure detected on $TIME",MultiFire=TRUE,RearmTime=5:00
NODECFG[DEFAULT]    TRIGGER=AType=mail,EType=fail,Action="node $OID has failed on $TIME",MultiFire=TRUE,RearmTime=15:00
...

18.6.6 Resource Manager Failure Trigger

FAILTIME is set on the resource manager base, along with a failure trigger. The trigger fires if base is down for more than three minutes.

RMCFG[base] TYPE=PBS FAILTIME=3:00
RMCFG[base] TRIGGER=atype=exec,action="/opt/moab/tools/diagnose_rm.pl $OID",etype=failure

18.6.7 Running the Support Script on Job-hold

The Support Diagnostic Script can be used to save a scheduler snapshot based on particular events. To save a system snapshot when the scheduler places a hold on a job, the following trigger can be configured:

CLASSCFG[batch] JOBTRIGGER=atype=exec,etype=hold,action="$HOME/tools/moab/support.diag.pl"

18.6.8 Creating a Periodic Standing Trigger

Standing triggers can be created on the scheduler object using the SCHEDCFG parameter as in the following example:

SCHEDCFG[base] TRIGGER=atype=exec,name=jobsubmit_hour,etype=standing,action="/opt/moab/tools/createjobs_hour.pl",period=hour,offset=05:00
SCHEDCFG[base] TRIGGER=atype=exec,name=jobsubmit_day,etype=standing,action="/opt/moab/tools/createjobs_day.pl",period=day,offset=30:00

This example launches the createjobs_hour.pl script at 5 minutes past every hour and runs the createjobs_day.pl script at 12:30 AM every day.
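
The period and offset arithmetic can be sketched as follows. This is a simplified model, not Moab's scheduling code: it assumes periods are aligned to multiples of the period length counted from the Unix epoch (UTC), whereas Moab's own alignment, for example to local midnight, may differ. The function name next_fire is hypothetical.

```shell
#!/bin/sh
# Next firing time (in epoch seconds) of a standing trigger.
#   $1 = current time, $2 = period length, $3 = offset (all in seconds)
# Assumption: periods start at multiples of the period length since the
# Unix epoch (UTC); Moab's own alignment may differ.
next_fire() {
    now=$1
    period=$2
    offset=$3
    start=$(( now - now % period ))   # start of the current period
    fire=$(( start + offset ))
    if [ "$fire" -le "$now" ]; then   # already passed in this period
        fire=$(( fire + period ))
    fi
    echo "$fire"
}

# period=hour, offset=05:00 is 300 seconds past the hour.
next_fire 7200 3600 300      # prints 7500
# period=day, offset=30:00 is 1800 seconds past the start of the day.
next_fire 90000 86400 1800   # prints 174600
```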

18.6.9 Fragmenting a Diagnostic System Job Across the Entire Cluster

In this example, a system job is submitted requesting a specific set of nodes. The job is submitted with the FRAGMENT flag, which will split the job up and run one job per distinct allocated host. Three triggers are then attached to each of those jobs:

Run a diagnostic script.
Run a recovery script.
If the recovery script succeeds, complete the system job.

#!/bin/sh
echo nothing | msub -l nodes=ALL,walltime=10:00,flags=NORMSTART:SYSTEMJOB:FRAGMENT,\
trig=atype=exec\&action=/tmp/diag.sh\ \$HOSTLIST\&etype=start\&sets=DIAG,\
trig=atype=exec\&action=/tmp/step.sh\ \$HOSTLIST\ \$DIAG\&etype=start\&requires=DIAG\&sets=good.\!bad,\
trig=atype=internal\&action=job:-:complete\&etype=start\&requires=good
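
The backslashes in the msub command protect the ampersands, spaces, and dollar signs from the submitting shell, so the trigger strings reach Moab intact. As a readability aid, the following sketch assembles the same kind of trig= string from separately quoted pieces and prints it. The helper build_trig is hypothetical and is not part of Moab.

```shell
#!/bin/sh
# Hypothetical helper: join attribute=value pairs with '&' and prefix
# the result with "trig=", matching the form used in the msub resource
# list above.
build_trig() {
    out=
    for pair in "$@"; do
        out="${out:+$out&}$pair"
    done
    printf 'trig=%s\n' "$out"
}

# Single quotes keep $HOSTLIST literal so that Moab, not the submitting
# shell, expands it.
build_trig atype=exec 'action=/tmp/diag.sh $HOSTLIST' etype=start sets=DIAG
```

The printed line is the first trigger specification from the example as it looks after the shell has removed the backslash escapes.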

18.6.10 Specifying an RSVPROFILE on an Internal Node Trigger

You can specify an RSVPROFILE on an internal node trigger reservation action.

NODECFG[DEFAULT] TRIGGER=etype=fail,atype=internal,action="node:-:reserve:rsvprofile=<rsvprofile_name>"

The RSVPROFILE you attach to the reservation can perform useful actions based on the event that just occurred. In the example above, the reservation might be configured so that Moab launches triggers to diagnose or troubleshoot the failed node when the reservation is created.

18.6.11 Greater scheduling flexibility with migration internal actions

Two new internal actions (for kicking off overcommit and green migrations) can be attached to triggers, which allow for greater scheduling flexibility.

TRIGGER=atype=internal,etype=<whatever>,action=sched:-:vmmigrate:[rsv|green|overcommit]

The three options (rsv, green, and overcommit) still need to be enabled in moab.cfg and in moab.lic.

18.6.12 Virtual Container (VC) Trigger

Threshold triggers attach to virtual container (VC) objects within Moab. Resource managers provide job usage data via the WORKLOADQUERYURL interface; GMETRIC[keyword]=<FLOAT> is returned on each job line that has such tracking enabled. To use a VC trigger, you must supply both the GMETRIC data and the ACTION_SCRIPT referenced by the create-trigger command example later in this section. Moab provides only the mechanism; the policy carried out by the trigger action is up to you.

On each iteration, Moab harvests the keyword value from every job within a VC (including all nested VCs and their jobs) and averages the values across those jobs. The trigger expression is then evaluated against this average to determine whether the trigger needs to fire.
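
The averaging and comparison can be approximated outside Moab for testing purposes. The sketch below reads workload lines in the style described above, averages one generic metric across them, and compares the average with a threshold. It is a simplified model, not Moab's implementation: it does not handle VC nesting, it supports only the > operator, and the function names gmetric_average and exceeds_threshold are hypothetical. The sample job lines are abbreviated.

```shell
#!/bin/sh
# Average GMETRIC[<name>] over all job lines on standard input.
#   $1 = generic metric name (for example, load)
gmetric_average() {
    awk -v key="GMETRIC[$1]=" '
        {
            for (i = 1; i <= NF; i++)
                if (index($i, key) == 1) {
                    sum += substr($i, length(key) + 1)
                    n++
                }
        }
        END { if (n > 0) printf "%.2f\n", sum / n }
    '
}

# Succeed (exit status 0) when the average of metric $1 exceeds $2.
exceeds_threshold() {
    avg=$(gmetric_average "$1")
    [ -n "$avg" ] && awk -v a="$avg" -v t="$2" 'BEGIN { exit !(a > t) }'
}

# Two abbreviated job lines with loads 22.0 and 18.0; prints 20.00.
printf '%s\n' 'Moab.1 STATE=Running GMETRIC[load]=22.0' \
              'Moab.2 STATE=Running GMETRIC[load]=18.0' | gmetric_average load
```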

18.6.12.1 Configure VC trigger

To enable Moab to create and fire a VC trigger, do the following:

In the moab.cfg file, define the resource manager WORKLOADQUERYURL attribute. For example, you might specify a path (that is treated as an executable) or a text file in the format of file://<PATH>, such as the following:
```
RMCFG[machine]     WORKLOADQUERYURL=file://$HOME/scripts/qworkload.txt
```
Recycle Moab.
```
> mschedctl -R
```
Populate the destination specified in the WORKLOADQUERYURL attribute with job data. Each job listed contains associated metadata; be sure to specify the generic metric name and value. For example, if you specified file:// as the output type, you might create output such as the following:
```
Moab.1 COMMENT="SID=Moab?SJID=Moab.1?SRMJID=Moab.1" EXEC=/opt/moab/spool/moab.job.DEbZod GATTR=PREEMPTEE GNAME=company IWD=/opt/moab QUEUETIME=1317397677 STARTTIME=1317397677 STATE=Running TASKLIST=node10 TASKS=1 UNAME=root WCLIMIT=70000 GMETRIC[load]=22.0
```
Specify the action script to run when the generic metric meets the configured threshold.
The generic metric is derived from the average load of all jobs in the VC (and all nested VCs and their jobs).

18.6.12.2 Create and fire VC trigger

After performing the necessary configuration steps in the previous section, to create and fire a VC trigger, do the following:

Create a VC.
```
> mvcctl -c
```
Submit a job.
```
> msub . . .
```
Attach the job to the VC.
```
> mvcctl -a job:<NAME> <VCNAME>
```

You can submit a job and attach it to a VC in one step via the following command:

> msub -W x=vc=<VCNAME> . . .

Attach the trigger to the VC.

> mschedctl -c trigger AType=exec,Action="<ACTION FILE NAME>",etype="threshold,threshold=gmetric[<NAME>]{>|>=|<|<=|==|!=}<VALUE>" -o vc:<NAME>

See Trigger Components for more information on individual trigger components such as AType, Action, and Threshold.
