Generic system jobs are system jobs with a trigger. They are useful for specifying steps in a workflow.
Generic system jobs are specified via a job template. The template can be selectable, and it must set the GENERICSYSJOB attribute so that Moab knows the template describes a generic system job. The trigger is specified as usual, as part of the job template, as shown in the following example:
JOBCFG[gen] GENERICSYSJOB=TRUE
JOBCFG[gen] TRIGGER=EType=start,AType=exec,Action="$HOME/genericTrig.py",Timeout=5:00
The generic system job must have exactly one trigger. This trigger must have a timeout, AType=exec, and EType=start. The trigger begins when the system job begins, and the job is considered completed when the trigger completes. The job has the same completion code as the trigger. The timeout of the trigger is used as the walltime for the job, so any walltime on the job template does not apply.
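The rules above can be checked before a template is deployed. The following is a minimal sketch, not part of Moab: the helper names `parse_trigger` and `check_generic_sysjob_triggers` are hypothetical, and the parser assumes the trigger's Action contains no commas.

```python
def parse_trigger(spec):
    """Split 'EType=start,AType=exec,...' into a dict with lower-cased keys.

    Assumption: no field value (including Action) contains a comma.
    """
    fields = {}
    for part in spec.split(","):
        key, _, value = part.partition("=")
        fields[key.strip().lower()] = value.strip().strip('"')
    return fields


def check_generic_sysjob_triggers(trigger_specs):
    """Return a list of rule violations; an empty list means the rules are met."""
    problems = []
    if len(trigger_specs) != 1:
        problems.append(
            "exactly one trigger is required, found %d" % len(trigger_specs)
        )
    for spec in trigger_specs:
        fields = parse_trigger(spec)
        if fields.get("etype", "").lower() != "start":
            problems.append("EType must be start")
        if fields.get("atype", "").lower() != "exec":
            problems.append("AType must be exec")
        if not fields.get("timeout"):
            problems.append("Timeout is required")
    return problems
```

Passing the TRIGGER value from the example above as the only list element should return an empty list.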
If the trigger fails, an error message will be attached to all of the job's parent VCs. You can view this in the --xml output of the VC query. The message includes the location of STDIN, STDOUT, and STDERR files. For example:
mvcctl -q ALL --xml
<Data>
<vc CREATETIME="1320184350" DESCRIPTION="Moab.1"
FLAGS="DESTROYOBJECTS,DESTROYWHENEMPTY,HASSTARTED,WORKFLOW"
JOBS="Moab.1" NAME="vc1" OWNER="user:frank">
<ACL aff="positive" cmp="%=" name="frank" type="USER"></ACL>
<MESSAGES>
<message COUNT="1" CTIME="1320184362"
DATA="Trigger 10 failed on job Moab.1.setup- STDIN:
/tmp/ByLLl2wv/spool/vm.py.ieWPPS5 STDOUT:
/tmp/ByLLl2wv/spool/vm.py.oDMIXAW STDERR /tmp/ByLLl2wv/spool/vm.py.e2jD5iN"
EXPIRETIME="1322776362" OWNER="frank" PRIORITY="0"
TYPE="other" index="0"></message>
</MESSAGES>
<Variables>
<Variable name="VMID">vm1</Variable>
<Variable name="HV">TRUE</Variable>
</Variables>
</vc>
</Data>
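To locate the trigger's output files without reading the XML by hand, the message can be parsed with a short script. This is a sketch that assumes the message layout shown above (a `DATA` attribute containing `STDIN:`, `STDOUT:`, and `STDERR` followed by a path); the function name `trigger_failure_files` is hypothetical.

```python
import re
import xml.etree.ElementTree as ET


def trigger_failure_files(xml_text):
    """Map each VC name to a list of {stream: path} dicts, one per message.

    Assumes the layout of the sample output: each <message> carries a DATA
    attribute in which STDIN, STDOUT, and STDERR are each followed by a path.
    """
    root = ET.fromstring(xml_text)
    result = {}
    for vc in root.iter("vc"):
        for message in vc.iter("message"):
            data = message.get("DATA", "")
            # The colon is optional because the sample shows "STDERR" without one.
            paths = dict(re.findall(r"(STDIN|STDOUT|STDERR):?\s+(\S+)", data))
            if paths:
                result.setdefault(vc.get("NAME"), []).append(paths)
    return result
```

The function takes the text printed by the VC query as a string; messages that contain none of the three stream names are skipped.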
To create workflows, use the following format:
JOBCFG[gen] TEMPLATEDEPEND=AFTERANY:otherTemplate
This creates a job based on the template "otherTemplate", and the generic job runs after the otherTemplate job has finished. AFTERANY in the example means that the generic job runs after the jobs it depends on have completed, regardless of success.
The INHERITRES flag can be used to cause the same resources in one step of a workflow to be passed to the next step:
JOBCFG[gen] TEMPLATEDEPEND=AFTERANY:otherTemplate
JOBCFG[otherTemplate] INHERITRES=TRUE
This example forces the job based on "otherTemplate" to have the same resource requirements as its parent. When the "otherTemplate" job is finished, the INHERITRES flag will cause the parent to run on the same resources as the child.
The job that finishes first will pass its allocation up.
Any variables on the original job will be passed to the other jobs in the workflow. Variables can be added by other jobs in the workflow via the sets attribute in the generic system job's trigger. Other triggers must then request that variable name in the command line options.
You must prefix the variable name with a caret (^) in order for the variable to be sent up to the job group.
To give the variable a value, the trigger script must write that value to its STDOUT. See the example below:
JOBCFG[W1] TRIGGER=...,action='$HOME/W1.py $ipaddress' TEMPLATEDEPEND=AFTER:W2 GENERICSYSJOB=TRUE
JOBCFG[W2] TRIGGER=...,action='$HOME/W2.py',sets=^ipaddress
If a variable value is not set in STDOUT, it will be set to TRUE.
To set the variable to a specific value, the W2.py script must set the value in its STDOUT:
print("ipaddress=10.10.10.1")  # Moab parses this line and sets it as the value of the "ipaddress" variable
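A slightly fuller version of such a trigger script might look like the following. This is only a sketch: the helper `format_variable` is hypothetical, and the address is hard-coded for illustration where a real script would obtain it from whatever it provisions.

```python
def format_variable(name, value):
    """Build the 'name=value' line that the trigger writes to STDOUT."""
    return "%s=%s" % (name, value)


def main():
    # Hypothetical value; a real script would look up the address it set up.
    ip_address = "10.10.10.1"
    # The name must match the one given in the trigger's sets attribute.
    print(format_variable("ipaddress", ip_address))


if __name__ == "__main__":
    main()
```

Because the job takes the completion code of its trigger, the script should exit with a nonzero status if it cannot determine the value.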
Example:
To create a VM with a workflow using job template dependencies and generic system jobs, use the following format:
#The job template that is the "gate" to the workflow
JOBCFG[CreateVMWithSoftware] TEMPLATEDEPEND=AFTEROK:InstallSoftware SELECT=TRUE
JOBCFG[InstallSoftware] GENERICSYSJOB=TRUE
JOBCFG[InstallSoftware] TRIGGER=EType=start,AType=exec,Action="$HOME/setupSoftware.py $IPAddr",Timeout=30:00
JOBCFG[InstallSoftware] INHERITRES=TRUE
JOBCFG[InstallSoftware] TEMPLATEDEPEND=AFTEROK:CreateVM
JOBCFG[CreateVM] GENERICSYSJOB=TRUE
JOBCFG[CreateVM] INHERITRES=TRUE
JOBCFG[CreateVM] TRIGGER=EType=start,AType=exec,Action="$HOME/installVM.py $HOSTLIST",Timeout=1:00:00,sets=^IPAddr
The user will then submit the job requesting what they need in the VM:
msub -l walltime=2:00:00,template=CreateVMWithSoftware,nodes=1:ppn=4,mem=1024 ActualWorkload.py
The job will have the CreateVMWithSoftware template applied to it and will create the InstallSoftware job. The InstallSoftware job, because of INHERITRES, will have the same resource request (4 procs, 1 GB of memory). This job then has its own template applied to it, which in the same way creates the CreateVM job. The CreateVM job will then run; its trigger script will return the IP address of the new VM, and the job will pass its allocation up to the InstallSoftware job. The InstallSoftware job will use the IPAddr variable to find the VM and install the software. It will then return its resources up to the parent job, which will run the actual workload.
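The order in which the jobs in this example run follows directly from the TEMPLATEDEPEND chain. The sketch below only illustrates that ordering and does not describe how Moab resolves dependencies internally; `run_order` and `DEPENDS_ON` are hypothetical names, and each template is assumed to depend on at most one other template.

```python
def run_order(gate, depends_on):
    """Follow dependency links from the gate template; return execution order.

    depends_on maps a template name to the single template it waits for.
    """
    chain = [gate]
    seen = {gate}
    while chain[-1] in depends_on:
        nxt = depends_on[chain[-1]]
        if nxt in seen:
            raise ValueError("dependency cycle at %s" % nxt)
        chain.append(nxt)
        seen.add(nxt)
    # The template nothing else waits on was found last but runs first.
    return list(reversed(chain))


# TEMPLATEDEPEND links taken from the example configuration.
DEPENDS_ON = {
    "CreateVMWithSoftware": "InstallSoftware",
    "InstallSoftware": "CreateVM",
}
```

Calling `run_order("CreateVMWithSoftware", DEPENDS_ON)` lists CreateVM first, then InstallSoftware, then the submitted job itself, matching the sequence described above.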
Copyright © 2012 Adaptive Computing Enterprises, Inc.®