3.26.1 Synopsis
mjobctl -c -w attr=val
mjobctl -e jobid
mjobctl -h [User|System|Batch|Defer|All] jobexp
mjobctl -m attr{+=|=|-=}val jobexp
mjobctl -N [<SIGNO>] jobexp
mjobctl -n <JOBNAME>
mjobctl -p <PRIORITY> jobexp
mjobctl -q {diag|starttime|hostlist} jobexp
mjobctl -R jobexp [--flags=force]
mjobctl -w attr{+=|=|-=}val jobexp
mjobctl -x [-w flags=val] jobexp
3.26.2 Overview
3.26.3 Format
-c - Cancel | |
---|---|
Format | JOBEXP |
Description |
Cancel a job. Use -w (following a -c flag) to specify job cancellation according to given credentials or job attributes. See -c -w for more information.
You can use mjobctl -c flags=follow-dependency <job_id> to cancel all jobs that the <job_id> depends on. If you wish to cancel all jobs that depend on this <job_id>, add FLAGS=CANCELFAILEDDEPENDENCYJOBS to your SCHEDCFG entry in the moab.cfg file. See CANCELFAILEDDEPENDENCYJOBS for more information. |
Example: |
> mjobctl -c job1045
Cancel job job1045. |
-c -w - Cancel Where | |
---|---|
Format | <ATTR>=<VALUE>
where <ATTR>=[ user | account | qos | class | reqreservation(RsvName) | state(JobState) | jobname(JobName, not job ID) | partition ] |
Description |
Cancel a job based on a given credential or job attribute. See Job States for a list of all valid job states. You can also cancel jobs from given partitions using -w partition=<PAR1>[<PAR2>...]; however, you must also either use another -w flag to specify a job or use the standard job expression. |
Example |
> mjobctl -c -w state=USERHOLD
Cancels all jobs that currently have a USERHOLD on them.
> mjobctl -c -w user=user1 -w acct=acct1
Cancels all jobs assigned to user1 or acct1. |
-C - Checkpoint | |
---|---|
Format | JOBEXP |
Description | Checkpoint a job. See Checkpoint/Restart Facilities for more information. |
Example |
> mjobctl -C job1045
Checkpoint job job1045. |
-F - Force Cancel | |
---|---|
Format | JOBEXP |
Description |
Forces a job to cancel and ignores previous cancellation attempts. Specifying this option tells Moab to purge a job from Torque (equivalent to qdel -p). This only tells pbs_server to remove any knowledge of the job from its internal memory. If the job is actually running, this will not cause pbs_server to tell the nodes with the job to cancel it. Therefore, users and administrators should only use this form of mjobctl when they've confirmed that the job no longer exists on any compute nodes, and want to force Torque to stop tracking the job. |
Example |
> mjobctl -F job1045
Force cancel job job1045. |
-h - Hold | |
---|---|
Format | <HOLDTYPE> <JOBEXP>
<HOLDTYPE> = { user | batch | system | defer | ALL } |
Default | user |
Description | Set or release a job hold. See Job Holds for more information. |
Example |
> mjobctl -h user job1045
Set a user hold on job job1045.
> mjobctl -u all job1045
Unset all holds on job job1045. |
-m - Modify | |
---|---|
Format | <ATTR>{ += | = | -= } <VAL>
<ATTR> = { account | advres | arraylimit | awduration | class | cpuclock | deadline | depend | eeduration | env | features | feature | flags | gres | group | hold | hostlist | jobdisk | jobmem | jobname | jobswap | loglevel | maxmem | messages | minstarttime | nodeaccess | nodecount | notificationaddress | partition | priority | queue | qos | reqreservation | rmxstring | reqattr | reqawduration | sysprio | tpn | trig | trigvar | user | userprio | var | wclimit }
When using mjobctl -m with the hostlist attribute, only "=" is supported. If using Torque and mjobctl -m with the partition attribute, only "=" is supported. "+=", "-=", and "=" are supported with other resource managers (SLURM or Native). |
Description |
Modify a specific job attribute. If an mjobctl -m attribute can affect how a job starts, then it generally cannot affect a job that is already running. For example, it is not feasible to change the hostlist of a job that is already running.
The userprio attribute allows you to specify user priority. For job priority, use the -p flag.
Modification of the job dependency is also communicated to the resource manager in the case of SLURM and PBS/Torque.
Adding --flags=warnifcompleted causes a warning message to print when a job completes.
To define values for awduration, eeduration, minstarttime, reqawduration, and wclimit, use the time spec format. The minstarttime attribute performs the same function as msub -a.
A non-active job's partition list can be modified. If using Torque, only "=" (set) is supported. If using SLURM or a Native resource manager, you can add or subtract partitions, even multiple partitions. When adding or subtracting multiple partitions, each partition must have its own -m partition{+= | = | -=}name on the command line. An example for adding multiple partitions is provided in the list of examples.
To modify a job's generic resources, use the following format: gres{ += | = | -= } <gresName>[:<count>]. <gresName> is a single resource, not a list. <count> is an integer that, if not specified, is assumed to be 1. Modifying a job's generic resources causes Moab to append the new gres (+=), subtract the specified gres (-=), or clear out all existing generic resources attached to the job and override them with the newly specified one (=). If <gresName> is an empty string, all generic resources will be removed from the job.
To modify the node access policy for a queued job, use nodeaccess=[<policy>]. See 4.9 Node Access Policies for a list of supported node access policies. |
Example |
> mjobctl -m messages+="Adding a message" --flags=completed 1664
Set the message on the job, even if the job is completed.
> mjobctl -m reqawduration+=600 1664
Add 10 minutes to the job walltime.
> mjobctl -m eeduration=-1 1664
Reset the job's effective queue time to when the job was submitted.
> mjobctl -m var=Flag1=TRUE 1664
Set the job variable Flag1 to TRUE.
> mjobctl -m notificationaddress="[email protected]"
Sets the notification e-mail address associated with a job to [email protected].
> mjobctl -m partition+=p3 -m partition+=p4 Moab.5
Adds multiple partitions (p3 and p4) to job Moab.5. Torque only supports "=". "+=", "-=", and "=" are supported with other resource managers (SLURM or Native).
> mjobctl -m arraylimit=10 sim.25
Changes the concurrently running sub-job limit to 10 for array sim.25.
> mjobctl -m gres=matlab:1 job0201
Overrides all generic resources applied to job job0201 and replaces them with 1 matlab.
> mjobctl -m user=user.job
Modifies the user of a job that was submitted directly to Moab (msub) and has not yet been migrated.
> mjobctl -m userprio-=100 Moab.4
Reduces the user priority of Moab.4 by 100.
> mjobctl -m tpn=2 Moab.128
Changes the requested "tasks per node" for job Moab.128 to 2.
> mjobctl -m maxmem=80mb 157
Modifies the total job memory of job 157. See MAXMEM for more information. |
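The gres rules in the description above can be illustrated with a small model. This is only a sketch of the documented semantics, not Moab's implementation: the function name `apply_gres` and the dictionary representation are hypothetical, and two behaviors are assumptions (that `+=` on an already-present resource adds to its count, and that `-=` subtracts the count and drops the resource once it reaches zero).

```python
import re

# Hypothetical helper: models the documented gres{ += | = | -= }<gresName>[:<count>] rules.
GRES_EXPR = re.compile(r"^gres(\+=|-=|=)([^:]*)(?::(\d+))?$")


def apply_gres(current, expr):
    """Return a new {name: count} dict after applying one gres expression."""
    match = GRES_EXPR.match(expr)
    if match is None:
        raise ValueError(f"not a gres expression: {expr!r}")
    op, name, count_text = match.groups()
    count = int(count_text) if count_text else 1  # <count> defaults to 1
    if name == "":
        return {}  # an empty <gresName> removes all generic resources
    if op == "=":
        return {name: count}  # clear existing resources, keep only the new one
    updated = dict(current)
    if op == "+=":
        # Assumption: adding an existing resource increases its count.
        updated[name] = updated.get(name, 0) + count
    else:  # "-="
        # Assumption: subtracting reduces the count and removes it at zero.
        remaining = updated.get(name, 0) - count
        if remaining > 0:
            updated[name] = remaining
        else:
            updated.pop(name, None)
    return updated


job_gres = {"matlab": 2, "tape": 1}
print(apply_gres(job_gres, "gres=matlab:1"))  # {'matlab': 1}
print(apply_gres(job_gres, "gres+=dvd"))      # {'matlab': 2, 'tape': 1, 'dvd': 1}
print(apply_gres(job_gres, "gres="))          # {}
```

The first call mirrors the `mjobctl -m gres=matlab:1 job0201` example: every existing resource is replaced by a single matlab.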
-N - Notify | |
---|---|
Format | [signal=]<SIGID> JOBEXP |
Description | Send a signal to all jobs matching the job expression. |
Example |
> mjobctl -N INT 1664
Send an interrupt signal to job 1664.
> mjobctl -N 47 1664
Send signal 47 to job 1664. |
-n - Name | |
---|---|
Format | <JOBNAME> |
Description | Select jobs by job name. |
Example |
-r - Resume | |
---|---|
Format | JOBEXP |
Description | Resume a job. |
Example |
> mjobctl -r job1045
Resume job job1045. |
-R - Requeue | |
---|---|
Format | JOBEXP [--flags=force] |
Description |
Requeue a job. Adding --flags=force forces an asynchronous requeue on Torque systems. Adding --flags=unmigrate causes Moab to pull a grid job back to the central scheduler for further evaluation on all valid partitions. |
Example |
> mjobctl -R job1045
Requeue job job1045. |
-s - Suspend | |
---|---|
Format | JOBEXP |
Description | Suspend a job. For more information, see Suspend/Resume Handling. |
Example |
> mjobctl -s job1045
Suspend job job1045. |
-u - Unhold | |
---|---|
Format | [<TYPE>[,<TYPE>]] JOBEXP
<TYPE> = [ user | system | batch | defer | ALL ] |
Default | ALL |
Description | Release a hold on a job. See Job Holds for more information. |
Example |
> mjobctl -u user,system scrib.1045
Release user and system holds on job scrib.1045. |
-x - Execute | |
---|---|
Format | JOBEXP |
Description | Execute a job. The -w option allows flags to be set for the job. Allowable flags are ignorepolicies, ignorenodestate, and ignorersv. |
Example |
> mjobctl -x job1045
Execute job job1045.
> mjobctl -x -w flags=ignorepolicies job1046
Execute job job1046 and ignore policies, such as MaxJobPerUser. |
3.26.4 Parameters
JOB EXPRESSION | |
---|---|
Format | <STRING> |
Description | The name of a job or a regular expression for several jobs. The flags that support job expressions can use node expression syntax as described in Node Selection. Using x: indicates the following string is to be interpreted as a regular expression, and using r: indicates the following string is to be interpreted as a range.
Job expressions do not work for array sub-jobs.
Moab uses regular expressions conforming to the POSIX 1003.2 standard. This standard differs somewhat from the patterns commonly used for filename matching in Unix environments (see man 7 regex). To interpret a job expression as a regular expression, use x:. In most cases, it is necessary to quote the job expression (for example, "job13[5-9]") to prevent the shell from intercepting and interpreting the special characters. The mjobctl command accepts a comma-delimited list of job expressions, for example mjobctl -r job[1-2],job4 or mjobctl -c job1,job2,job4. |
Example: |
> mjobctl -c "x:80.*"
job '802' cancelled
job '803' cancelled
job '804' cancelled
job '805' cancelled
job '806' cancelled
job '807' cancelled
job '808' cancelled
job '809' cancelled
Cancel all jobs starting with 80.
> mjobctl -m priority+=200 "x:74[3-5]"
job '743' system priority modified
job '744' system priority modified
job '745' system priority modified
> mjobctl -h x:17.*
# This puts a hold on any job that has a 17 that is followed by an unlimited amount of any
# character and includes jobs 1701, 17mjk10, and 17DjN_JW-07
> mjobctl -h r:1-17
# This puts a hold on jobs 1 through 17. |
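To preview which jobs an x: or r: expression would select before running a destructive command such as -c, the selection rules can be approximated offline. The sketch below is not Moab code: `select_jobs` is a hypothetical helper, Python's `re` module stands in for a POSIX 1003.2 matcher (the two agree on simple patterns like these), and it assumes that a pattern must match the whole job ID and that r: ranges are inclusive and apply only to numeric IDs.

```python
import re


def select_jobs(expression, job_ids):
    """Return the job IDs chosen by one x:, r:, or literal job expression."""
    if expression.startswith("x:"):
        pattern = re.compile(expression[2:])
        # Assumption: the regular expression must match the entire job ID.
        return [job for job in job_ids if pattern.fullmatch(job)]
    if expression.startswith("r:"):
        low, high = (int(part) for part in expression[2:].split("-"))
        # Assumption: ranges are inclusive and only numeric IDs can match.
        return [job for job in job_ids
                if job.isdigit() and low <= int(job) <= high]
    # No prefix: treat the expression as a literal job name.
    return [job for job in job_ids if job == expression]


queue = ["17", "743", "744", "745", "746", "802", "809", "1701", "17mjk10"]
print(select_jobs("x:74[3-5]", queue))  # ['743', '744', '745']
print(select_jobs("x:17.*", queue))     # ['17', '1701', '17mjk10']
print(select_jobs("r:1-17", queue))     # ['17']
```

When the real command is run from a shell, the expression still needs quoting so that characters such as `[` and `*` reach mjobctl unchanged.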
mjobctl information can be reported as XML as well. This is done with the command mjobctl -q diag <JOB_ID>.
3.26.5.A XML Attributes
Name | Description |
---|---|
Account | The account assigned to the job |
AllocNodeList | The nodes allocated to the job |
Args | The job's executable arguments |
AWDuration | The active wall time consumed |
BlockReason | The block message index for the reason the job is not eligible |
Bypass | Number of times the job has been bypassed by other jobs |
Calendar | The job's timeframe constraint calendar |
Class | The class assigned to the job |
CmdFile | The command file path |
CompletionCode | The return code of the job as extracted from the RM |
CompletionTime | The time of the job's completion |
Cost | The cost of executing the job relative to an accounting manager |
CPULimit | The CPU limit for the job |
Depend | Any dependencies on the status of other jobs |
DRM | The master destination RM |
DRMJID | The master destination RM job ID |
EEDuration | The duration of time the job has been eligible for scheduling |
EFile | The stderr file |
Env | The job's environment variables set for execution |
EnvOverride | The job's overriding environment variables set for execution |
EState | The expected state of the job |
EstHistStartTime | The estimated historical start time |
EstPrioStartTime | The estimated priority start time |
EstRsvStartTime | The estimated reservation start time |
ExcHList | The excluded host list |
Flags | Comma-delimited list of Moab flags on the job |
GAttr | The requested generic attributes |
GJID | The global job ID |
Group | The group assigned to the job |
Hold | The hold list |
Holdtime | The time the job was put on hold |
HopCount | The hop count between the job's peers |
HostList | The requested host list |
IFlags | The internal flags for the job |
IsInteractive | If set, the job is interactive |
IsRestartable | If set, the job is restartable |
IsSuspendable | If set, the job is suspendable |
IWD | The directory where the job is executed |
JobID | The job's batch ID |
JobName | The user-specified name for the job |
JobGroup | The job ID relative to its group |
LogLevel | The individual log level for the job |
MasterHost | The specified host to run primary tasks on |
Messages | Any messages reported by Moab regarding the job |
MinPreemptTime | The minimum amount of time the job must run before being eligible for preemption |
Notification | Any events generated to notify the job's user |
OFile | The stdout file |
OldMessages | Any messages reported by Moab in the old message style regarding the job |
OWCLimit | The original wallclock limit |
PAL | The partition access list relative to the job |
QueueStatus | The job's queue status as generated this iteration |
QOS | The QoS assigned to the job |
QOSReq | The requested QoS for the job |
ReqAWDuration | The requested active walltime duration |
ReqCMaxTime | The requested latest allowed completion time |
ReqMem | The total memory requested/dedicated to the job |
ReqNodes | The number of requested nodes for the job |
ReqProcs | The number of requested procs for the job |
ReqReservation | The required reservation for the job |
ReqRMType | The required RM type |
ReqSMinTime | The requested earliest start time |
RM | The master source resource manager |
RMXString | The resource manager extension string |
RsvAccess | The list of reservations accessible by the job |
RsvStartTime | The reservation start time |
RunPriority | The effective job priority |
Shell | The execution shell's output |
SID | The job's system ID (parent cluster) |
Size | The job's computational size |
STotCPU | The average CPU load tracked across all nodes |
SMaxCPU | The max CPU load tracked across all nodes |
STotMem | The average memory usage tracked across all nodes |
SMaxMem | The max memory usage tracked across all nodes |
SRMJID | The source RM's ID for the job |
StartCount | The number of times the job has tried to start |
StartPriority | The effective job priority |
StartTime | The most recent time the job started executing |
State | The state of the job as reported by Moab |
StatMSUtl | The total number of memory seconds utilized |
StatPSDed | The total number of processor seconds dedicated to the job |
StatPSUtl | The total number of processor seconds utilized by the job |
StdErr | The path to the stderr file |
StdIn | The path to the stdin file |
StdOut | The path to the stdout file |
StepID | StepID of the job (used with LoadLeveler systems) |
SubmitHost | The host where the job was submitted |
SubmitLanguage | The RM language in which the submission request was made |
SubmitString | The string containing the entire submission request |
SubmissionTime | The time the job was submitted |
SuspendDuration | The amount of time the job has been suspended |
SysPrio | The admin specified job priority |
SysSMinTime | The system specified min. start time |
TaskMap | The allocation taskmap for the job |
TermTime | The time the job was terminated |
User | The user assigned to the job |
UserPrio | The user specified job priority |
UtlMem | The utilized memory of the job |
UtlProcs | The number of processors utilized by the job |
Variable | |
VWCTime | The virtual wallclock limit |
3.26.6 Examples
Example 3-27:
> mjobctl -q diag ALL --format=xml
<Data>
<job AWDuration="346" Class="batch" CmdFile="jobsleep.sh" EEDuration="0" EState="Running" Flags="RESTARTABLE" Group="test" IWD="/home/test" JobID="11578" QOS="high" RMJID="11578.lolo.icluster.org" ReqAWDuration="00:10:00" ReqNodes="1" ReqProcs="1" StartCount="1" StartPriority="1" StartTime="1083861225" StatMSUtl="903.570" StatPSDed="364.610" StatPSUtl="364.610" State="Running" SubmissionTime="1083861225" SuspendDuration="0" SysPrio="0" SysSMinTime="00:00:00" User="test">
<req AllocNodeList="hana" AllocPartition="access" ReqNodeFeature="[NONE]" ReqPartition="access"></req>
</job>
<job AWDuration="346" Class="batch" CmdFile="jobsleep.sh" EEDuration="0" EState="Running" Flags="RESTARTABLE" Group="test" IWD="/home/test" JobID="11579" QOS="high" RMJID="11579.lolo.icluster.org" ReqAWDuration="00:10:00" ReqNodes="1" ReqProcs="1" StartCount="1" StartPriority="1" StartTime="1083861225" StatMSUtl="602.380" StatPSDed="364.610" StatPSUtl="364.610" State="Running" SubmissionTime="1083861225" SuspendDuration="0" SysPrio="0" SysSMinTime="00:00:00" User="test">
<req AllocNodeList="lolo" AllocPartition="access" ReqNodeFeature="[NONE]" ReqPartition="access"></req>
</job>
</Data>
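Because the XML output carries each job's attributes on a job element with a nested req element, it is straightforward to post-process with a standard XML parser. The following is a minimal sketch using Python's standard library: `summarize_jobs` and the keys of the returned dictionaries are hypothetical names chosen for illustration, and `SAMPLE` is a trimmed copy of the first job in the output above rather than live scheduler data.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample output above (one job, a subset of attributes).
SAMPLE = (
    '<Data><job AWDuration="346" Class="batch" EState="Running" '
    'JobID="11578" ReqAWDuration="00:10:00" State="Running" User="test">'
    '<req AllocNodeList="hana" AllocPartition="access"></req></job></Data>'
)


def summarize_jobs(xml_text):
    """Extract a few attributes from every job element in the XML text."""
    root = ET.fromstring(xml_text)
    summaries = []
    for job in root.iter("job"):
        req = job.find("req")  # allocation details live on the nested req
        summaries.append({
            "id": job.get("JobID"),
            "state": job.get("State"),
            "user": job.get("User"),
            "nodes": req.get("AllocNodeList") if req is not None else None,
        })
    return summaries


print(summarize_jobs(SAMPLE))
# [{'id': '11578', 'state': 'Running', 'user': 'test', 'nodes': 'hana'}]
```

On a live system, the text passed to `summarize_jobs` would be the captured standard output of the mjobctl -q diag command shown in the example. Attributes that a job does not report come back as `None` from `get`.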
Related Topics