mjobctl

Synopsis

mjobctl -c jobexp

mjobctl -c -w attr=val

mjobctl -C jobexp

mjobctl -e jobid

mjobctl -F jobexp

mjobctl -h [User|System|Batch|Defer|All] jobexp

mjobctl -m attr{+=|=|-=}val jobexp

mjobctl -N [<SIGNO>] jobexp

mjobctl -n <JOBNAME>

mjobctl -p <PRIORITY> jobexp

mjobctl -q {diag|starttime|hostlist} jobexp

mjobctl -r jobexp

mjobctl -R jobexp

mjobctl -s jobexp

mjobctl -w attr{+=|=|-=}val jobexp

mjobctl -x [-w flags=val] jobexp

Overview

The mjobctl command controls various aspects of jobs. It is used to submit, cancel, execute, and checkpoint jobs, and it can display diagnostic information about each job. It enables the Moab administrator to control almost all aspects of job behavior. See 11.0 General Job Administration for more details on jobs and their attributes.

Format

-c - Cancel
Format JOBEXP
Description Cancel a job.

Use -w (following a -c flag) to specify job cancellation according to given credentials or job attributes. See -c -w for more information.

Example:
> mjobctl -c job1045

Cancel job job1045.

-c -w - Cancel Where
Format <ATTR>=<VALUE>

where <ATTR> = [ user | account | qos | class | reqreservation (RsvName) | state (JobState) | jobname (JobName, not job ID) | partition ]
Description

Cancel a job based on a given credential or job attribute.

Use -w following a -c flag to specify job cancellation according to credentials or job attributes. (See examples.)

See Job States for a list of all valid job states.

Also, you can cancel jobs from given partitions using -w partition=<PAR1>[<PAR2>...]; however, you must also either use another -w flag to specify a job or use the standard job expression.

Example
> mjobctl -c -w state=USERHOLD

Cancels all jobs that currently have a USERHOLD on them.

> mjobctl -c -w user=user1 -w acct=acct1

Cancels all jobs assigned to user1 or acct1.

-C - Checkpoint
Format JOBEXP
Description Checkpoint a job. See Checkpoint/Restart Facilities for more information.
Example
> mjobctl -C job1045

Checkpoint job job1045.

-e - Rerun
Format JOBID
Description Rerun a completed TORQUE job. This works only for jobs that have completed and that TORQUE reports as completed. This flag does not work with other resource managers.
Example
> mjobctl -e job1045

Rerun job job1045.

-F - Force Cancel
Format JOBEXP
Description Forces a job to cancel and ignores previous cancellation attempts.
Example
> mjobctl -F job1045

Force cancel job job1045.

-h - Hold
Format <HOLDTYPE> <JOBEXP>

<HOLDTYPE> = { user | batch | system | defer | ALL }
Default user
Description Set or release a job hold.

See Job Holds for more information.
Example
> mjobctl -h user job1045

Set a user hold on job job1045.

> mjobctl -u all job1045

Unset all holds on job job1045.

-m - Modify
Format <ATTR>{ += | = | -= }<VAL>

<ATTR> = { account | arraylimit | awduration | class | deadline | depend | eeduration | env | features | feature | flags | gres | group | hold | hostlist | jobdisk | jobmem | jobname | jobswap | loglevel | messages | minstarttime | nodecount | notificationaddress | partition | priority | queue | qos | reqreservation | rmxstring | reqawduration | sysprio | trig | trigvar | userprio | var | wclimit }
Description

Modify a specific job attribute.

If an mjobctl -m attribute can affect how a job starts, then it generally cannot affect a job that is already running. For example, it is not feasible to change the hostlist of a job that is already running.

The userprio attribute allows you to specify user priority. For job priority, use the '-p' flag.

Modification of the job dependency is also communicated to the resource manager in the case of SLURM and PBS/Torque.

Adding --flags=warnifcompleted causes a warning message to print when a job completes.

To define values for awduration, eeduration, minstarttime, reqawduration, and wclimit, use the time spec format. Note that the minstarttime attribute performs the same function as msub -a.

A non-active job's partition list can be modified by adding or subtracting partitions. Note, though, that when adding or subtracting multiple partitions, each partition must have its own -m partition{+= | = | -=}name on the command line. (See example for adding multiple partitions.)
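When this is scripted, one way to satisfy the one-flag-per-partition rule is to build the argument list programmatically. The following is a minimal sketch only: the helper name partition_modify_args is hypothetical, and it merely constructs the argument list without running mjobctl.

```python
def partition_modify_args(job_id, partitions, op="+="):
    """Build an mjobctl argument list with one -m flag per partition.

    Hypothetical helper for illustration; it does not invoke mjobctl.
    """
    if op not in ("+=", "=", "-="):
        raise ValueError(f"unsupported operator: {op!r}")
    args = ["mjobctl"]
    for name in partitions:
        # Each partition gets its own "-m partition<op><name>" pair.
        args += ["-m", f"partition{op}{name}"]
    args.append(job_id)
    return args


if __name__ == "__main__":
    print(" ".join(partition_modify_args("Moab.5", ["p3", "p4"])))
```

The resulting list can be passed to a process launcher as-is, which avoids shell quoting issues.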

To modify a job's generic resources, use the following format: gres{ += | = | -= } <gresName>[:<count>]. <gresName> is a single resource, not a list. <count> is an integer that, if not specified, is assumed to be 1. Modifying a job's generic resources causes Moab to append the new gres (+=), subtract the specified gres (-=), or clear out all existing generic resources attached to the job and override them with the newly-specified one (=).

Example
> mjobctl -m reqawduration+=600 1664

Add 10 minutes to the job walltime.

> mjobctl -m eeduration=-1 1664

Reset the job's effective queue time to when the job was submitted.

> mjobctl -m var=Flag1=TRUE 1664

Set the job variable Flag1 to TRUE.

> mjobctl -m notificationaddress="[email protected]"

Sets the notification e-mail address associated with a job to [email protected].

> mjobctl -m partition+=p3 -m partition+=p4 Moab.5

Adds multiple partitions (p3 and p4) to job Moab.5.

> mjobctl -m arraylimit=10 sim.25

Changes the concurrently running sub-job limit to 10 for array sim.25.

> mjobctl -m gres=matlab:1 job0201

Overrides all generic resources applied to job job0201 and replaces them with 1 matlab.

> mjobctl -m userprio-=100 Moab.4

Reduces the user priority of Moab.4 by 100.
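The generic resource semantics described for -m gres (+= appends, -= subtracts, = overrides everything) can be modeled with a small sketch. This is an illustration under stated assumptions, not Moab's implementation: the function names are hypothetical, and how Moab merges counts for a resource that is already attached to the job is not specified here, so the additive merge below is an assumption.

```python
def parse_gres(spec):
    """Split '<gresName>[:<count>]' into (name, count); count defaults to 1."""
    name, sep, count = spec.partition(":")
    return name, int(count) if sep else 1


def apply_gres(current, op, spec):
    """Return a new gres mapping after one modification (illustrative model)."""
    name, count = parse_gres(spec)
    result = dict(current)
    if op == "=":
        # Clear all existing generic resources and keep only the new one.
        result = {name: count}
    elif op == "+=":
        # Assumption: counts for an already-present resource are summed.
        result[name] = result.get(name, 0) + count
    elif op == "-=":
        # Assumption: a resource is dropped once its count reaches zero.
        remaining = result.get(name, 0) - count
        if remaining > 0:
            result[name] = remaining
        else:
            result.pop(name, None)
    else:
        raise ValueError(f"unsupported operator: {op!r}")
    return result


if __name__ == "__main__":
    print(apply_gres({"dvd": 2, "tape": 1}, "=", "matlab:1"))
```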

-N - Notify
Format [signal=]<SIGID> JOBEXP
Description Send a signal to all jobs matching the job expression.
Example
> mjobctl -N INT 1664

Send an interrupt signal to job 1664.

> mjobctl -N 47 1664

Send signal 47 to job 1664.

-n - Name
Format
Description Select jobs by job name.
Example
-p - Priority
Format [+|+=|-=]<VAL> <JOBID> [--flags=relative]
Description Modify a job's system priority.
Example

A job's overall priority is the job priority plus the system priority, and each format affects the two differently. The format <VAL> <JOBID> or +<VAL> <JOBID> sets the system priority to the maximum system priority plus the specified value. The format +=<VAL> <JOBID> or <VAL> <JOBID> --flags=relative relatively increases the job's priority and sets the system priority. The format -=<VAL> <JOBID> sets the system priority to 0 and ignores <VAL>; it does not decrease the priority by that amount.

For the following example, job1045 has a priority of 10, which is composed of a job priority of 10 and a system priority of 0.

> mjobctl -p +1000 job1045

The system priority changes to the max system priority plus 1000 points, ensuring that this job will be higher priority than all normal jobs. In this case, the job priority of 10 is not added, so the priority of job1045 is now 1000001000.

> mjobctl -p -=1 job1045

The system priority of job1045 resets to 0. The job priority is still 10, so the overall priority becomes 10.

> mjobctl -p 3 job1045 --flags=relative

Adds 3 points to the relative system priority. The priority for job1045 changes from 10 to 13.
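The three cases above can be reproduced with a small arithmetic model. This is a sketch for checking the numbers in the examples, not Moab's scheduling code: the function name is hypothetical, and the maximum system priority of 1000000000 is inferred from the 1000001000 result above rather than taken from a documented constant.

```python
MAX_SYSTEM_PRIORITY = 1_000_000_000  # assumption: inferred from the example


def apply_priority(job_priority, spec, relative=False):
    """Return (system_priority, overall_priority) for 'mjobctl -p <spec>'."""
    if spec.startswith("-="):
        # System priority resets to 0; the value itself is ignored.
        return 0, job_priority
    if spec.startswith("+=") or relative:
        # Relative form: the value is added on top of the job priority.
        value = int(spec.lstrip("+="))
        return value, job_priority + value
    # Absolute form: max system priority plus the value; the job
    # priority is not added in this case.
    system = MAX_SYSTEM_PRIORITY + int(spec.lstrip("+"))
    return system, system


if __name__ == "__main__":
    for spec, rel in (("+1000", False), ("-=1", False), ("3", True)):
        print(spec, rel, apply_priority(10, spec, relative=rel))
```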

-q - Query
Format [ diag (ALL) | hostlist | starttime | template ] <JOBEXP>
Description Query a job.
Example
> mjobctl -q diag job1045

Query job job1045.

> mjobctl -q diag ALL --format=xml

Query all jobs and return the output in machine-readable XML.

> mjobctl -q starttime job1045

Query starttime of job job1045.

> mjobctl -q template <job>

Query job templates. If the <job> is set to ALL or empty, it will return information for all job templates.

> mjobctl -q wiki <jobName> 

Query a job with the output displayed in a WIKI string. The job's name may be replaced with ALL.

--flags=completed will only work with the diag option.

-r - Resume
Format JOBEXP
Description Resume a job.
Example
> mjobctl -r job1045

Resume job job1045.

-R - Requeue
Format JOBEXP
Description Requeue a job.
Example
> mjobctl -R job1045

Requeue job job1045.

-s - Suspend
Format JOBEXP
Description Suspend a job. For more information, see Suspend/Resume Handling.
Example
> mjobctl -s job1045

Suspend job job1045.

-u - Unhold
Format [<TYPE>[,<TYPE>]] JOBEXP

<TYPE> = [ user | system | batch | defer | ALL ]
Default ALL
Description Release a hold on a job.

See Job Holds for more information.
Example
> mjobctl -u user,system scrib.1045

Release user and system holds on job scrib.1045.

-w - Where
Format [CompletionTime | StartTime][<= | = | >=]<EPOCH_TIME>
Description Add a where constraint clause to the current command. The CompletionTime and StartTime constraints work only for completed jobs: CompletionTime filters by the completed jobs' completion times, and StartTime filters by their start times.
Example
> mjobctl -q diag ALL --flags=COMPLETED --format=xml -w "CompletionTime>=1246428000" -w "CompletionTime<=1254376800"

Prints all completed jobs still in memory that completed between July 1, 2009 and October 1, 2009.
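The epoch values for such a window can be computed rather than looked up. The sketch below is a hedged illustration: the helper names are hypothetical, and the UTC-06:00 offset is inferred from the two numbers in the example (they correspond to midnight at that offset), not stated in this document.

```python
from datetime import datetime, timedelta, timezone


def midnight_epoch(year, month, day, utc_offset_hours):
    """Epoch seconds for local midnight at a fixed UTC offset."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    return int(datetime(year, month, day, tzinfo=tz).timestamp())


def completion_window_args(start_epoch, end_epoch):
    """Build the two -w constraints as separate argument-list entries."""
    return [
        "-w", f"CompletionTime>={start_epoch}",
        "-w", f"CompletionTime<={end_epoch}",
    ]


if __name__ == "__main__":
    start = midnight_epoch(2009, 7, 1, -6)   # assumed offset: UTC-06:00
    end = midnight_epoch(2009, 10, 1, -6)
    print(completion_window_args(start, end))
```

Passing the constraints as list entries to a process launcher keeps the shell from treating `>` and `<` as redirections.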

-x - Execute
Format JOBEXP
Description Execute a job. The -w option allows flags to be set for the job. Allowable flags are ignorepolicies, ignorenodestate, and ignorersv.
Example
> mjobctl -x job1045

Execute job job1045.

> mjobctl -x -w flags=ignorepolicies job1046

Execute job job1046 and ignore policies, such as MaxJobPerUser.

Parameters

JOB EXPRESSION
Format <STRING>
Description The name of a job or a regular expression for several jobs. The flags that support job expressions can use node expression syntax as described in Node Selection. Using x: indicates the following string is to be interpreted as a regular expression, and using r: indicates the following string is to be interpreted as a range. Job expressions do not work for array sub-jobs.

Moab uses regular expressions conforming to the POSIX 1003.2 standard. This standard is somewhat different than the regular expressions commonly used for filename matching in Unix environments (see man 7 regex). To interpret a job expression as a regular expression, use x: or in the Moab configuration file (moab.cfg), set the parameter USEJOBREGEX to TRUE (and take note of the following caution).

If you set USEJOBREGEX to TRUE, Moab treats all mjobctl job expressions as regular expressions regardless of whether wildcards are specified. This should be used with extreme caution since there is high potential for unintended consequences. For example, specifying canceljob m.1 will not only cancel m.1, but also m.11, m.12, m.13, and so on.
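The over-matching described in this caution can be previewed before running a destructive command. The sketch below is illustrative only: it uses Python's re module, which is not a POSIX 1003.2 engine but behaves the same for these simple patterns, and whether Moab matches at the start of the job ID or anywhere in it is not specified here. The helper name and the job IDs are hypothetical.

```python
import re


def matching_jobs(expression, job_ids):
    """Return the job IDs that a regular expression would select."""
    pattern = re.compile(expression)
    return [job for job in job_ids if pattern.search(job)]


if __name__ == "__main__":
    jobs = ["m.1", "m.11", "m.12", "m.13", "m.2", "n.1"]
    # Unescaped: "." matches any character and nothing anchors the end.
    print(matching_jobs("m.1", jobs))
    # Escaped and anchored: selects only the literal job ID m.1.
    print(matching_jobs(r"^m\.1$", jobs))
```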

In most cases, it is necessary to quote the job expression (for example, job13[5-9]) to prevent the shell from intercepting and interpreting the special characters.

The mjobctl command accepts a comma delimited list of job expressions. Example usage might be mjobctl -r job[1-2],job4 or mjobctl -c job1,job2,job4.

Example:
> mjobctl -c "x:80.*"
job '802' cancelled
job '803' cancelled
job '804' cancelled
job '805' cancelled
job '806' cancelled
job '807' cancelled
job '808' cancelled
job '809' cancelled

Cancel all jobs starting with 80.

> mjobctl -m priority+=200 "x:74[3-5]" 
job '743' system priority modified
job '744' system priority modified
job '745' system priority modified

> mjobctl -h "x:17.*"
# This puts a hold on any job that has a 17 followed by any number of any
# characters, such as jobs 1701, 17mjk10, and 17DjN_JW-07.

> mjobctl -h r:1-17
# This puts a hold on jobs 1 through 17.

XML Output

mjobctl information can also be reported as XML. This is done with the command mjobctl -q diag <JOB_ID> --format=xml.

XML Attributes

Name Description
Account The account assigned to the job
AllocNodeList The nodes allocated to the job
Args The job's executable arguments
AWDuration The active wall time consumed
BlockReason The block message index for the reason the job is not eligible
Bypass Number of times the job has been bypassed by other jobs
Calendar The job's timeframe constraint calendar
Class The class assigned to the job
CmdFile The command file path
CompletionCode The return code of the job as extracted from the RM
CompletionTime The time of the job's completion
Cost The cost of executing the job relative to an allocation manager
CPULimit The CPU limit for the job
Depend Any dependencies on the status of other jobs
DRM The master destination RM
DRMJID The master destination RM job ID
EEDuration The duration of time the job has been eligible for scheduling
EFile The stderr file
Env The job's environment variables set for execution
EnvOverride The job's overriding environment variables set for execution
EState The expected state of the job
EstHistStartTime The estimated historical start time
EstPrioStartTime The estimated priority start time
EstRsvStartTime The estimated reservation start time
EstWCTime The estimated walltime the job will execute
ExcHList The excluded host list
Flags Comma-delimited list of Moab flags on the job
GAttr The requested generic attributes
GJID The global job ID
Group The group assigned to the job
Hold The hold list
Holdtime The time the job was put on hold
HopCount The hop count between the job's peers
HostList The requested host list
IFlags The internal flags for the job
IsInteractive If set, the job is interactive
IsRestartable If set, the job is restartable
IsSuspendable If set, the job is suspendable
IWD The directory where the job is executed
JobID The job's batch ID.
JobName The user-specified name for the job
JobGroup The job ID relative to its group
LogLevel The individual log level for the job
MasterHost The specified host to run primary tasks on
Messages Any messages reported by Moab regarding the job
MinPreemptTime The minimum amount of time the job must run before being eligible for preemption
Notification Any events generated to notify the job's user
OFile The stdout file
OldMessages Any messages reported by Moab in the old message style regarding the job
OWCLimit The original wallclock limit
PAL The partition access list relative to the job
QueueStatus The job's queue status as generated this iteration
QOS The QoS assigned to the job
QOSReq The requested QoS for the job
ReqAWDuration The requested active walltime duration
ReqCMaxTime The requested latest allowed completion time
ReqMem The total memory requested/dedicated to the job
ReqNodes The number of requested nodes for the job
ReqProcs The number of requested procs for the job
ReqReservation The required reservation for the job
ReqRMType The required RM type
ReqSMinTime The requested earliest start time
RM The master source resource manager
RMXString The resource manager extension string
RsvAccess The list of reservations accessible by the job
RsvStartTime The reservation start time
RunPriority The effective job priority
Shell The execution shell's output
SID The job's system ID (parent cluster)
Size The job's computational size
STotCPU The average CPU load tracked across all nodes
SMaxCPU The max CPU load tracked across all nodes
STotMem The average memory usage tracked across all nodes
SMaxMem The max memory usage tracked across all nodes
SRMJID The source RM's ID for the job
StartCount The number of the times the job has tried to start
StartPriority The effective job priority
StartTime The most recent time the job started executing
State The state of the job as reported by Moab
StatMSUtl The total number of memory seconds utilized
StatPSDed The total number of processor seconds dedicated to the job
StatPSUtl The total number of processor seconds utilized by the job
StdErr The path to the stderr file
StdIn The path to the stdin file
StdOut The path to the stdout file
StepID StepID of the job (used with LoadLeveler systems)
SubmitHost The host where the job was submitted
SubmitLanguage The RM language in which the submission request was performed
SubmitString The string containing the entire submission request
SubmissionTime The time the job was submitted
SuspendDuration The amount of time the job has been suspended
SysPrio The admin specified job priority
SysSMinTime The system specified min. start time
TaskMap The allocation taskmap for the job
TermTime The time the job was terminated
User The user assigned to the job
UserPrio The user specified job priority
UtlMem The utilized memory of the job
UtlProcs The number of processors utilized by the job
Variable
VWCTime The virtual wallclock limit

Examples

Example 4-25:  

> mjobctl -q diag ALL --format=xml
<Data><job AWDuration="346" Class="batch" CmdFile="jobsleep.sh" EEDuration="0" 
EState="Running" Flags="RESTARTABLE" Group="test" IWD="/home/test" JobID="11578" QOS="high" 
RMJID="11578.lolo.icluster.org" ReqAWDuration="00:10:00" ReqNodes="1" ReqProcs="1" StartCount="1" 
StartPriority="1" StartTime="1083861225" StatMSUtl="903.570" StatPSDed="364.610" StatPSUtl="364.610" 
State="Running" SubmissionTime="1083861225" SuspendDuration="0" SysPrio="0" SysSMinTime="00:00:00" 
User="test"><req AllocNodeList="hana" AllocPartition="access" ReqNodeFeature="[NONE]" 
ReqPartition="access"></req></job><job AWDuration="346" Class="batch" CmdFile="jobsleep.sh" 
EEDuration="0" EState="Running" Flags="RESTARTABLE" Group="test" IWD="/home/test" JobID="11579" 
QOS="high" RMJID="11579.lolo.icluster.org" ReqAWDuration="00:10:00" ReqNodes="1" ReqProcs="1" 
StartCount="1" StartPriority="1" StartTime="1083861225" StatMSUtl="602.380" StatPSDed="364.610" 
StatPSUtl="364.610" State="Running" SubmissionTime="1083861225" SuspendDuration="0" SysPrio="0" 
SysSMinTime="00:00:00" User="test"><req AllocNodeList="lolo" AllocPartition="access" 
ReqNodeFeature="[NONE]" ReqPartition="access"></req></job></Data>
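Output in this shape can be consumed with any XML parser. The following sketch uses Python's standard xml.etree.ElementTree on an abbreviated copy of the output above; the SAMPLE string keeps only a few of the attributes shown, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Abbreviated from the example output; most attributes are omitted.
SAMPLE = (
    '<Data>'
    '<job Class="batch" JobID="11578" State="Running" User="test">'
    '<req AllocNodeList="hana" AllocPartition="access"></req></job>'
    '<job Class="batch" JobID="11579" State="Running" User="test">'
    '<req AllocNodeList="lolo" AllocPartition="access"></req></job>'
    '</Data>'
)


def summarize_jobs(xml_text):
    """Extract a few job attributes from mjobctl-style XML output."""
    root = ET.fromstring(xml_text)
    rows = []
    for job in root.iter("job"):
        req = job.find("req")  # per-job requirement element, if present
        rows.append({
            "JobID": job.get("JobID"),
            "State": job.get("State"),
            "User": job.get("User"),
            "AllocNodeList": req.get("AllocNodeList") if req is not None else None,
        })
    return rows


if __name__ == "__main__":
    for row in summarize_jobs(SAMPLE):
        print(row)
```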
