When AType is set to mail, Moab sends mail (configurable using MAILPROGRAM) to the primary administrator, and the Action parameter contains the message body of the email. The message body can include certain variables. The following example sends the primary administrator an email containing the name of the trigger's object and the time:
> mrsvctl -c -h node01 -T AType=mail,EType=start \
  Action="rsv $OID started on $TIME on nodes $HOSTLIST"
rsv 'system.1' created
When this trigger launches, it sends an email with the variables (those that begin with a $) filled in. The example above uses $OID (the name of the trigger's object), $TIME (the time), and $HOSTLIST (the nodes).
These mail triggers can be configured to launch for node failures, reservation creation or release, scheduler failures, and even job events. In this way, site administrators can keep track of scheduler events through email.
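To make the substitution concrete, the following Python sketch mimics how the $-prefixed variables in an Action string might be filled in. It is only an illustration built on Python's `string.Template`, not Moab's implementation; the function name `render_action` and the sample values are hypothetical.

```python
from string import Template


def render_action(action: str, values: dict) -> str:
    """Fill $NAME placeholders in a mail Action string.

    safe_substitute leaves unknown variables untouched instead of raising.
    """
    return Template(action).safe_substitute(values)


# Hypothetical values, for illustration only.
message = render_action(
    "rsv $OID started on $TIME on nodes $HOSTLIST",
    {"OID": "system.1", "TIME": "12:00:00", "HOSTLIST": "node01"},
)
print(message)
```

Unknown variables are deliberately left as-is in this sketch, so a typo in a variable name stays visible in the rendered message.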
Triggers can be used to modify object internals using the following format:
action="<OBJECT_TYPE>:<OBJECT_ID>:<ACTION>:<CONTEXT_DATA>"
If the action takes place on the same object to which the trigger is attached, a dash (-) can replace the object ID, as in the following example:
NODECFG[node01] TRIGGER=EType=start,AType=internal,action="node:-:reserve:rsvprofile=<rsvprofile_name>"
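The four-field layout of an internal action can be sketched as a small Python parser. This is an illustrative reading of the format described above, not Moab's own parser; the names `InternalAction` and `parse_internal_action` are hypothetical, and treating the context data as optional is an assumption based on actions that appear without it.

```python
from typing import NamedTuple, Optional


class InternalAction(NamedTuple):
    object_type: str
    object_id: str
    action: str
    context_data: Optional[str]  # None when the action carries no context data

    @property
    def targets_own_object(self) -> bool:
        # A dash stands for the object the trigger is attached to.
        return self.object_id == "-"


def parse_internal_action(text: str) -> InternalAction:
    # Split on the first three colons only, so context data such as
    # "nodes:1" keeps its own colons.
    parts = text.split(":", 3)
    if len(parts) < 3 or not all(parts[:3]):
        raise ValueError(f"expected TYPE:ID:ACTION[:CONTEXT], got {text!r}")
    context = parts[3] if len(parts) == 4 else None
    return InternalAction(parts[0], parts[1], parts[2], context)


# "evac" is a placeholder reservation profile name.
parsed = parse_internal_action("node:-:reserve:rsvprofile=evac")
print(parsed.action, parsed.context_data, parsed.targets_own_object)
```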
Several different actions are valid depending on what type of object the internal trigger is acting upon. The following table shows some of the different available actions:
Action | Object Type | Description |
---|---|---|
cancel | Jobs | Cancels job. |
complete | System Jobs | Causes system job to exit as if it had completed successfully. |
destroy | VPC | Destroys the specified VPC. |
evacvms | VM | Evacuates all VMs from all nodes covered by a reservation. |
modify | All | Allows modification of object internals. |
Examples — Modifying scheduler internals
SRCFG[provision] TRIGGER=atype=internal,etype=start,action="node:$(HOSTLIST):modify:os=rhel4"
$ msub -l qos=triggerok,walltime=60,trig=AType=internal\&Action=vpc:vpc.20:destroy\&EType=end testscript
RSVPROFILE[evac] TRIGGER=EType=start,AType=internal,action=node:$(HOSTLIST):evacvms
Threshold triggers allow sites to configure triggers to launch based on internal scheduler statistics, such as the percentage of available nodes, the backlog of a particular QoS, or the xfactor of a particular group. For example, a site may need to guarantee a particular account a certain level of service on the cluster. If that account's backlog exceeds one hour, the administrators might want to receive an email, create a reservation, or contact a hosting utility to allocate more nodes. All of this can be done using threshold triggers. The following sample configuration file contains a few examples:
RMCFG[TORQUE] TRIGGER=atype=internal,action=RM:ODM:modify:nodes:1,etype=threshold,threshold=availability<300,multifire=true,failoffset=1:00
CLASSCFG[batch] TRIGGER=atype=mail,action="batch has a lot of jobs waiting",etype=threshold,threshold=queuetime>100000,multifire=true
QOSCFG[high] TRIGGER=atype=exec,action=/tmp/reserve_high.sh,etype=threshold,threshold=backlog>1000,multifire=true,failoffset=10:00
USERCFG[jdoe] TRIGGER=atype=mail,action="high xfactor on $OID",etype=threshold,threshold=xfactor>10.0,multifire=true,failoffset=1:00
ACCTCFG[hyper] TRIGGER=atype=exec,action="/tmp/notify.sh $TIME $OID",etype=threshold,threshold=xfactor>0.01,multifire=true,failoffset=1:00
NODECFG[node04] TRIGGER=atype=exec,action="$HOME/hightemp.py $OID",etype=threshold,threshold=gmetric[TEMP]>70
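The threshold expressions in these examples all take the form of a statistic name, a comparison operator, and a numeric limit. The Python sketch below shows one way such an expression could be read and compared against a current value. It is an assumption-laden illustration, not Moab's evaluation logic: only the `<` and `>` operators seen in the examples are handled, and the function name `threshold_crossed` is hypothetical.

```python
import operator
import re

# Only the two operators that appear in the sample configuration.
_OPS = {"<": operator.lt, ">": operator.gt}

# statistic name, operator, numeric limit (for example "gmetric[TEMP]>70")
_PATTERN = re.compile(r"^(?P<stat>[^<>]+)(?P<op>[<>])(?P<limit>-?\d+(?:\.\d+)?)$")


def threshold_crossed(expr: str, current: float) -> bool:
    """Return True when `current` satisfies the threshold expression."""
    match = _PATTERN.match(expr)
    if match is None:
        raise ValueError(f"unrecognized threshold expression: {expr!r}")
    return _OPS[match["op"]](current, float(match["limit"]))


# Hypothetical current values, for illustration only.
print(threshold_crossed("xfactor>10.0", 12.5))
print(threshold_crossed("availability<300", 300))
```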
Exec triggers will launch an external program or script when their dependencies are fulfilled. The following example will submit job.cmd and then execute monitor.pl three minutes after the job is started.
> msub -l trig=atype=exec\&etype=start\&offset=03:00\&action="monitor.pl" job.cmd
By default, Moab considers any non-zero exit code a failure and marks the trigger as having failed. If a trigger is killed by a signal outside of Moab, Moab treats the signal as the exit code and (in almost all cases) marks the trigger as having failed. Only exec triggers that exit with an exit code of 0 are marked as successful.
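The success rule above can be illustrated with a short Python sketch that runs a command and classifies the result the same way: exit code 0 is success, anything else is failure. This only mirrors the rule as described; it is not how Moab launches triggers, and the function name `run_and_classify` is hypothetical.

```python
import subprocess
import sys


def run_and_classify(argv: list) -> str:
    """Run a command and report it as 'successful' or 'failed'."""
    completed = subprocess.run(argv)
    # On POSIX, Python reports a process killed by a signal as a negative
    # return code, which is also non-zero and therefore counted as failed.
    return "successful" if completed.returncode == 0 else "failed"


# A stand-in for a trigger script that exits with code 3.
print(run_and_classify([sys.executable, "-c", "import sys; sys.exit(3)"]))
```

A script used as an exec trigger should therefore exit with 0 explicitly only when its work has actually succeeded.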
Copyright © 2012 Adaptive Computing Enterprises, Inc.®