Moab Workload Manager

mjobctl

(Moab Job Control)

Synopsis

mjobctl -c jobexp
mjobctl -c -w attr=val
mjobctl -C jobexp
mjobctl -h [User|System|Batch|Defer|All] jobexp
mjobctl -m attr{+=|=|-=}val jobexp
mjobctl -N [<SIGNO>] jobexp
mjobctl -n <JOBNAME>
mjobctl -p <PRIORITY> jobexp
mjobctl -q {diag|starttime|hostlist} jobexp
mjobctl -r jobexp
mjobctl -R jobexp
mjobctl -s jobexp 
mjobctl -u [User|System|Batch|Defer|All] jobexp
mjobctl -w attr{+=|=|-=}val jobexp
mjobctl -x [-w flags=val] jobexp

Overview

The mjobctl command controls various aspects of jobs. It is used to submit, cancel, execute, and checkpoint jobs. It can also display diagnostic information about each job. The mjobctl command enables the Moab administrator to control almost all aspects of job behavior. See 11.0 General Job Administration for more details on jobs and their attributes.

Format

-c — Cancel
JOBID
---
Cancel a job.

Note Use -w (following a -c flag) to specify job cancellation according to given credentials or job attributes. See -c -w for more information.

> mjobctl -c job1045
Cancel job job1045.
   
-c -w — Cancel Where
<ATTR>=<VALUE>

where <ATTR>=[ user | account | qos | class | reqreservation (RsvName) | state (JobState) | jobname (JobName, not job ID) | partition ]
---
Cancel a job based on a given credential or job attribute.

Use -w following a -c flag to specify job cancellation according to credentials or job attributes. (See examples.)

You can also cancel jobs from specific partitions using -w partition=<PAR1>[<PAR2>...]; however, you must also either use another -w flag to specify a job or use the standard job expression.

> mjobctl -c -w state=USERHOLD
Cancels all jobs that currently have a USERHOLD on them.

> mjobctl -c -w user=user1 -w acct=acct1
Cancels all jobs assigned to user1 or acct1.
   
-C — Checkpoint
JOBID
---
Checkpoint a job.
> mjobctl -C job1045
Checkpoint job job1045.
   
-h — Hold
<HOLDTYPE> <JOBEXP>
 
<HOLDTYPE> = { user | batch | system | defer | ALL }
Default: user
Set a hold on a job. Holds are released with -u, as in the second example.

See Section 11.1, Job Holds, for more information.
> mjobctl -h user job1045
Set a user hold on job job1045.

> mjobctl -u all job1045
Unset all holds on job job1045.
   
-m — Modify
<ATTR>{ += | = | -= } <VAL>

  <ATTR>={ account | awduration | class | deadline | depend | eeduration | env | features | feature | flags | gres | group | hold | hostlist | jobdisk | jobmem | jobname | jobswap | loglevel | messages | minstarttime | nodecount | notificationaddress | partition | priority | proccount | queue | qos | reqreservation | rmxstring | reqawduration | sysprio | trig | trigvar | userprio | var | wclimit }
---
Modify a specific job attribute.

For priority, use the '-p' flag.

Modification of the job dependency is also communicated to the resource manager in the case of SLURM and PBS/Torque.

To define values for awduration, eeduration, minstarttime, reqawduration, and wclimit, use the time spec format.

A non-active job's partition list can be modified by adding or subtracting partitions. Note, though, that when adding or subtracting multiple partitions, each partition must have its own -m partition{+= | = | -=}name on the command line. (See example for adding multiple partitions.)
> mjobctl -m reqawduration+=600 1664
Add 10 minutes to the job walltime.

> mjobctl -m eeduration=-1 1664
Reset job's effective queue time.

> mjobctl -m var=Flag1=TRUE 1664
Set the job variable Flag1 to TRUE.

> mjobctl -m notificationaddress="name@server.com" 1664
Set the notification e-mail address associated with job 1664 to name@server.com.

> mjobctl -m partition+=p3 -m partition+=p4 Moab.5
Adds multiple partitions (p3 and p4) to job Moab.5.
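As a worked check of the reqawduration example, the sketch below adds 600 seconds to a requested active walltime of 00:10:00 (the ReqAWDuration value that appears in the XML example on this page). It assumes the time spec format is [[[DD:]HH:]MM:]SS, so a bare number such as 600 is read as seconds; parse_timespec and format_hms are hypothetical helpers, not part of Moab.

```python
def parse_timespec(spec: str) -> int:
    """Convert an assumed [[[DD:]HH:]MM:]SS time spec to seconds."""
    parts = [int(p) for p in spec.split(":")]
    if not 1 <= len(parts) <= 4:
        raise ValueError(f"unsupported time spec: {spec!r}")
    # Rightmost field is seconds, then minutes, hours, and days (assumed layout).
    multipliers = (1, 60, 3600, 86400)
    return sum(v * m for v, m in zip(reversed(parts), multipliers))


def format_hms(seconds: int) -> str:
    """Render a duration as HH:MM:SS (hours may exceed 24)."""
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


requested = parse_timespec("00:10:00")         # ReqAWDuration from the XML example
extended = requested + parse_timespec("600")   # effect of reqawduration+=600
print(format_hms(extended))  # 00:20:00
```

Since 600 seconds is 10 minutes, a 10-minute request becomes a 20-minute request.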
   
-N — Notify
[signal=]<SIGID> <JOBID>
---
Send a signal to all jobs matching the job expression.
> mjobctl -N INT 1664
Send an interrupt signal to job 1664.

> mjobctl -N 47 1664
Send signal 47 to job 1664.
   
-n — Name
<JOBNAME>
---
Select jobs by job name.
 
   
-p — Priority
<PRIORITY> <JOBEXP>
---
Modify a job's system priority.
> mjobctl -p +1000 job1045
Add 1000 points to the maximum system priority, ensuring that this job has a higher priority than all normal jobs. The new priority of job1045 is 1000001000.

> mjobctl -p 1000 job1045 --flags=relative
Add 1000 points to the priority the job would have under normal priority calculation. The new priority of job1045 is 1250.
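The two results above are consistent with the following arithmetic, sketched in Python. The base of 1,000,000,000 for the maximum system priority and the normally calculated priority of 250 are inferred from the example outputs (1000001000 and 1250); they are not documented constants, and the helper names are hypothetical.

```python
# Inferred from the example: 1000001000 - 1000 leaves a base of 1,000,000,000.
MAX_SYSTEM_PRIORITY = 1_000_000_000

# Hypothetical normally calculated priority for job1045, inferred from 1250 - 1000.
CALCULATED_PRIORITY = 250


def absolute_priority(points: int) -> int:
    """'-p +N': N points on top of the maximum system priority."""
    return MAX_SYSTEM_PRIORITY + points


def relative_priority(calculated: int, points: int) -> int:
    """'-p N --flags=relative': N points on top of the normally calculated priority."""
    return calculated + points


print(absolute_priority(1000))                       # 1000001000
print(relative_priority(CALCULATED_PRIORITY, 1000))  # 1250
```

The first form lifts the job above every normally prioritized job; the second only nudges the job relative to its usual position.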
   
-q — Query
[ diag | hostlist | starttime ] <JOBEXP>
---
Query a job.
> mjobctl -q diag job1045
Display diagnostic information for job job1045.

> mjobctl -q starttime job1045
Query the start time of job job1045.

Note --flags=completed works only with the diag option.

   
-r — Resume
JOBID
---
Resume a job.
> mjobctl -r job1045
Resume job job1045.
   
-R — Requeue
JOBID
---
Requeue a job.
> mjobctl -R job1045
Requeue job job1045.
   
-s — Suspend
JOBID
---
Suspend a job.
> mjobctl -s job1045
Suspend job job1045.
   
-S — Submit
JOBID
---
Submit a job.
> mjobctl -S job1045
Submit job job1045.
   
-u — Unhold
[<TYPE>[,<TYPE>]] <JOBEXP>
 
<TYPE> = [ user | system | batch | defer | ALL ]
Default: ALL
Release one or more holds on a job.

See Section 11.1, Job Holds for more information.
> mjobctl -u user,system scrib.1045
Release user and system holds on job scrib.1045.
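The comma-separated hold types accepted by -u can be pictured as flags that are cleared independently. The sketch below is a hypothetical Python model, not Moab code: it uses enum.Flag for the hold types listed above and treats ALL, the documented default for -u, as the union of user, system, batch, and defer.

```python
from enum import Flag, auto


class Hold(Flag):
    """Hypothetical model of the hold types accepted by -h and -u."""
    NONE = 0
    USER = auto()
    SYSTEM = auto()
    BATCH = auto()
    DEFER = auto()
    ALL = USER | SYSTEM | BATCH | DEFER


def parse_types(spec: str) -> Hold:
    """Turn a comma-separated list such as 'user,system' into combined flags."""
    result = Hold.NONE
    for name in spec.split(","):
        result |= Hold[name.strip().upper()]
    return result


def unhold(current: Hold, spec: str = "ALL") -> Hold:
    """Clear only the requested hold types; ALL (the default) clears every type."""
    return current & ~parse_types(spec)


holds = Hold.USER | Hold.SYSTEM | Hold.DEFER
print(unhold(holds, "user,system"))  # only the defer hold remains
print(unhold(holds))                 # no holds remain
```

With user, system, and defer holds set, releasing user,system leaves only the defer hold, mirroring the scrib.1045 example.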
   
-w — Where
[CompletionTime | StartTime][<= | = | >=]<EPOCH_TIME>
---
Add a where constraint clause to the current command. When used with CompletionTime or StartTime, the where constraint applies only to completed jobs: CompletionTime filters completed jobs by their completion times, and StartTime filters them by their start times.
> mjobctl -q diag ALL --flags=COMPLETED --format=xml -w "CompletionTime>=1246428000" -w "CompletionTime<=1254376800"
Print all completed jobs still in memory that completed between July 1, 2009 and October 1, 2009. The constraints are quoted so that the shell does not interpret < and > as redirection operators.
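The epoch values in this example correspond to midnight on July 1 and October 1, 2009 at UTC-6 (for instance, US Mountain Daylight Time); at UTC they would be 21600 seconds smaller. The sketch below recomputes them and builds the argument list, using Python's shlex.join so that the < and > in each clause come out quoted for the shell. The helper names are hypothetical, and the UTC-6 offset is an inference from the numbers, not something the example states.

```python
import shlex
from datetime import datetime, timedelta, timezone

# Assumption: the example's bounds are midnight at UTC-6.
UTC_MINUS_6 = timezone(timedelta(hours=-6))


def epoch_midnight(year: int, month: int, day: int, tz: timezone = UTC_MINUS_6) -> int:
    """Return Unix epoch seconds for midnight on the given date in tz."""
    return int(datetime(year, month, day, tzinfo=tz).timestamp())


def completion_window(start: int, end: int) -> list[str]:
    """Build the two where clauses for an inclusive CompletionTime range."""
    return [f"CompletionTime>={start}", f"CompletionTime<={end}"]


start, end = epoch_midnight(2009, 7, 1), epoch_midnight(2009, 10, 1)
args = ["mjobctl", "-q", "diag", "ALL", "--flags=COMPLETED", "--format=xml"]
for clause in completion_window(start, end):
    args += ["-w", clause]

command = shlex.join(args)  # clauses containing < or > are single-quoted
print(start, end)  # 1246428000 1254376800
print(command)
```

Passing args directly to subprocess.run, without a shell, would avoid the redirection problem altogether.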
   
-x — Execute
JOBID
---
Execute a job. The -w option allows flags to be set for the job. Allowable flags are ignorepolicies, ignorenodestate, and ignorersv.

> mjobctl -x job1045
Execute job job1045.

> mjobctl -x -w flags=ignorepolicies job1046
Execute job job1046 and ignore policies, such as MaxJobPerUser.

Parameters

JOB EXPRESSION
<STRING>
---
The name of a job or a regular expression for several jobs.

Note Moab uses regular expressions conforming to the POSIX 1003.2 standard. These differ from the wildcard patterns commonly used for filename matching in Unix® environments (see 'man 7 regex'). To have a job expression interpreted as a regular expression, either include a designated expression or wildcard character (one of '[]*?^$') in it, or set the USEJOBREGEX parameter to TRUE in the Moab configuration file (moab.cfg), taking note of the following caution.

Caution If USEJOBREGEX is set to TRUE, Moab treats all mjobctl job expressions as regular expressions, whether or not they contain wildcards. Use this setting with extreme caution because of the high potential for unintended consequences. For example, specifying mjobctl -c m.1 cancels not only m.1 but also m.11, m.12, m.13, and so on.

Note In most cases, the job expression must be quoted (e.g., "job13[5-9]") to prevent the shell from intercepting and interpreting the special characters.

Note The mjobctl command accepts a comma-delimited list of job expressions, for example mjobctl -c job[1-2],job4 or mjobctl -c job1,job2,job4.
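The rules in the notes above (literal matching unless the expression contains one of '[]*?^$' or USEJOBREGEX is TRUE, plus comma-delimited lists) can be modeled with the hypothetical select_jobs function below. It uses Python's re module as a stand-in for POSIX 1003.2 matching (the two agree for these simple patterns) and assumes unanchored matching, which is what the m.1/m.11 example in the caution implies; Moab's actual matching code may differ.

```python
import re

# Characters that, per the note above, make an expression a regex.
REGEX_CHARS = set("[]*?^$")


def select_jobs(job_exprs: str, job_ids: list[str], use_job_regex: bool = False) -> list[str]:
    """Hypothetical model: return job IDs selected by a comma-delimited expression list."""
    selected: list[str] = []
    # Naive split: a regex that itself contains a comma, such as "x{1,2}", would break here.
    for expr in job_exprs.split(","):
        as_regex = use_job_regex or any(ch in REGEX_CHARS for ch in expr)
        for job in job_ids:
            # Unanchored search mirrors the caution: "m.1" also hits "m.11".
            matched = (re.search(expr, job) is not None) if as_regex else (job == expr)
            if matched and job not in selected:
                selected.append(job)
    return selected


jobs = ["m.1", "m.11", "m.12", "job1", "job2", "job4", "job12"]
print(select_jobs("m.1", jobs))                      # ['m.1']
print(select_jobs("m.1", jobs, use_job_regex=True))  # ['m.1', 'm.11', 'm.12']
print(select_jobs("job[1-2],job4", jobs))            # ['job1', 'job2', 'job12', 'job4']
print(select_jobs("^m\\.1$", jobs))                  # ['m.1']
```

Under these assumptions, anchoring with ^ and $ and escaping the dot is what narrows a regex back down to a single job.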

> mjobctl -c "80.*"

job '802' cancelled
job '803' cancelled
job '804' cancelled
job '805' cancelled
job '806' cancelled
job '807' cancelled
job '808' cancelled
job '809' cancelled
Cancel all jobs starting with '80'.

> mjobctl -m priority+=200 "74[3-5]" 

job '743' system priority modified
job '744' system priority modified
job '745' system priority modified
Increase the system priority of jobs 743, 744, and 745 by 200.

XML Output

mjobctl information can also be reported as XML, using the command "mjobctl -q diag <JOB_ID> --format=xml". In addition to the attributes listed below, the XML children of each job element describe the job's requirements (req XML element) and messages (Messages XML element).

XML Attributes

Name Description
the account assigned to the job
the nodes allocated to the job
the job's executable arguments
the active wall time consumed
the block message index for the reason the job is not eligible
Number of times the job has been bypassed by other jobs
the job's timeframe constraint calendar
the class assigned to the job
the command file path
the return code of the job as extracted from the RM
the time of the job's completion
the cost of executing the job relative to an allocation manager
the CPU limit for the job
any dependencies on the status of other jobs
the master destination RM
the master destination RM job ID
the duration of time the job has been eligible for scheduling
the stderr file
the job's environment variables set for execution
the job's overriding environment variables set for execution
the expected state of the job
the estimated historical start time
the estimated priority start time
the estimated reservation start time
the estimated walltime the job will execute
the excluded host list
Comma-delimited list of Moab flags on the job
the requested generic attributes
the global job ID
the group assigned to the job
the hold list
the time the job was put on hold
the hop count between the job's peers
the requested host list
the internal flags for the job
if set, the job is interactive
if set, the job is restartable
if set, the job is suspendable
the directory where the job is executed
the job's batch ID.
the user-specified name for the job
the job ID relative to its group
the individual log level for the job
the specified host to run primary tasks on
any messages reported by Moab regarding the job
the minimum amount of time the job must run before being eligible for preemption
any events generated to notify the job's user
the stdout file
any messages reported by Moab in the old message style regarding the job
the original wallclock limit
the partition access list relative to the job
the job's queue status as generated this iteration
the QOS assigned to the job
the requested QOS for the job
the requested active walltime duration
the requested latest allowed completion time
the total memory requested/dedicated to the job
the number of requested nodes for the job
the number of requested procs for the job
the required reservation for the job
the required RM type
the requested earliest start time
the master source resource manager
the resource manager extension string
the list of reservations accessible by the job
the reservation start time
the effective job priority
the execution shell's output
the job's system ID (parent cluster)
the job's computational size
the average CPU load tracked across all nodes
the max CPU load tracked across all nodes
the average memory usage tracked across all nodes
the max memory usage tracked across all nodes
the source RM's ID for the job
the number of the times the job has tried to start
the effective job priority
the most recent time the job started executing
the state of the job as reported by Moab
the total number of memory seconds utilized
the total number of processor seconds dedicated to the job
the total number of processor seconds utilized by the job
the path to the stderr file
the path to the stdin file
the path to the stdout file
StepID of the job (used with LoadLeveler systems)
the host where the job was submitted
the RM language in which the submission request was made
the string containing the entire submission request
the time the job was submitted
the amount of time the job has been suspended
the admin specified job priority
the system specified min. start time
the allocation taskmap for the job
the time the job was terminated
the user assigned to the job
the user specified job priority
the utilized memory of the job
the number of utilized processors by the job
the virtual wallclock limit

Example 1

> mjobctl -q diag ALL --format=xml

<Data><job AWDuration="346" Class="batch" CmdFile="jobsleep.sh" EEDuration="0" 
EState="Running" Flags="RESTARTABLE" Group="test" IWD="/home/test" JobID="11578" QOS="high" 
RMJID="11578.lolo.icluster.org" ReqAWDuration="00:10:00" ReqNodes="1" ReqProcs="1" StartCount="1" 
StartPriority="1" StartTime="1083861225" StatMSUtl="903.570" StatPSDed="364.610" StatPSUtl="364.610" 
State="Running" SubmissionTime="1083861225" SuspendDuration="0" SysPrio="0" SysSMinTime="00:00:00" 
User="test"><req AllocNodeList="hana" AllocPartition="access" ReqNodeFeature="[NONE]" 
ReqPartition="access"></req></job><job AWDuration="346" Class="batch" CmdFile="jobsleep.sh" 
EEDuration="0" EState="Running" Flags="RESTARTABLE" Group="test" IWD="/home/test" JobID="11579" 
QOS="high" RMJID="11579.lolo.icluster.org" ReqAWDuration="00:10:00" ReqNodes="1" ReqProcs="1" 
StartCount="1" StartPriority="1" StartTime="1083861225" StatMSUtl="602.380" StatPSDed="364.610" 
StatPSUtl="364.610" State="Running" SubmissionTime="1083861225" SuspendDuration="0" SysPrio="0" 
SysSMinTime="00:00:00" User="test"><req AllocNodeList="lolo" AllocPartition="access" 
ReqNodeFeature="[NONE]" ReqPartition="access"></req></job></Data>
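Because the output is ordinary XML, it can be post-processed with any XML parser. The sketch below uses Python's standard xml.etree.ElementTree on a trimmed copy of the Example 1 output (only a few attributes are kept) to list each job's ID, state, user, and allocated nodes from its req child. The summarize_jobs helper is hypothetical; on a Moab host, the input string could instead come from running the command shown in Example 1, for example via subprocess.run with capture_output=True and text=True.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the Example 1 output; most attributes are omitted for brevity.
XML_OUTPUT = """<Data>
<job JobID="11578" State="Running" ReqAWDuration="00:10:00" User="test">
<req AllocNodeList="hana" AllocPartition="access"></req></job>
<job JobID="11579" State="Running" ReqAWDuration="00:10:00" User="test">
<req AllocNodeList="lolo" AllocPartition="access"></req></job>
</Data>"""


def summarize_jobs(xml_text: str) -> list[dict]:
    """Collect a few job attributes plus the allocated nodes of each req child."""
    root = ET.fromstring(xml_text)
    summary = []
    for job in root.iter("job"):
        nodes = [req.get("AllocNodeList") for req in job.findall("req")]
        summary.append({
            "JobID": job.get("JobID"),
            "State": job.get("State"),
            "User": job.get("User"),
            "AllocNodes": nodes,
        })
    return summary


for row in summarize_jobs(XML_OUTPUT):
    print(row)
```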

See Also