Synopsis
mjobctl -c jobexp
mjobctl -c -w attr=val
mjobctl -C jobexp
mjobctl -e jobid
mjobctl -h [User|System|Batch|Defer|All] jobexp
mjobctl -m attr{+=|=|-=}val jobexp
mjobctl -N [<SIGNO>] jobexp
mjobctl -n <JOBNAME>
mjobctl -p <PRIORITY> jobexp
mjobctl -q {diag|starttime|hostlist} jobexp
mjobctl -r jobexp
mjobctl -R jobexp
mjobctl -s
mjobctl -w attr{+=|=|-=}val jobexp
mjobctl -x [-w flags=val] jobexp
Overview
The mjobctl command controls various aspects of jobs. It is used to submit, cancel, execute, and checkpoint jobs, and it can display diagnostic information about each job. It enables the Moab administrator to control almost all aspects of job behavior. See 11.0 General Job Administration for more details on jobs and their attributes.
Format
-c - Cancel | |||
Format: | JOBEXP | ||
Default: | --- | ||
Description: | Cancel a job. | ||
Example: | > mjobctl -c job1045 | ||
-c -w - Cancel Where | |||
Format: | <ATTR>=<VALUE> where <ATTR>=[ user | account | qos | class | reqreservation(RsvName) | state(JobState) | jobname(JobName, not job ID) | partition ] | ||
Default: | --- | ||
Description: | Cancel jobs based on a given credential or job attribute. Use -w following a -c flag to specify job cancellation according to credentials or job attributes. (See examples.) You can also cancel jobs from given partitions using -w partition=<PAR1>[<PAR2>...]; however, you must also either use another -w flag to specify a job or use the standard job expression. | ||
Example: | > mjobctl -c -w state=USERHOLD
> mjobctl -c -w user=user1 -w acct=acct1
| ||
-C - Checkpoint | |||
Format: | JOBEXP | ||
Default: | --- | ||
Description: | Checkpoint a job. See Checkpoint/Restart Facilities for more information. | ||
Example: | > mjobctl -C job1045 | ||
-e - Rerun | |||
Format: | JOBID | ||
Default: | --- | ||
Description: | Rerun the completed TORQUE job. This works only for jobs that are completed and show up in TORQUE as completed. This flag does not work with other resource managers. | ||
Example: | > mjobctl -e job1045 | ||
-F - Force Cancel | |||
Format: | JOBEXP | ||
Default: | --- | ||
Description: | Forces a job to cancel and ignores previous cancellation attempts. | ||
Example: | > mjobctl -F job1045 | ||
-h - Hold | |||
Format: | <HOLDTYPE> <JOBEXP> <HOLDTYPE> = { user | batch | system | defer | ALL } | ||
Default: | user | ||
Description: | Set or release a job hold. See Job Holds for more information. | ||
Example: | > mjobctl -h user job1045
> mjobctl -u all job1045
| ||
-m - Modify | |||
Format: | <ATTR>{ += | = | -= } <VAL> <ATTR>={ account | arraylimit | awduration | class | deadline | depend | eeduration | env | features | feature | flags | gres | group | hold | hostlist | jobdisk | jobmem | jobname | jobswap | loglevel | messages | minstarttime | nodecount | notificationaddress | partition | priority | queue | qos | reqreservation | rmxstring | reqawduration | sysprio | trig | trigvar | userprio | var | wclimit } | ||
Default: | --- | ||
Description: | Modify a specific job attribute. Adding --flags=warnifcompleted causes a warning message to print when a job completes. To modify a job's generic resources, use the gres attribute, as in the gres=matlab:1 example. | ||
Example: | > mjobctl -m reqawduration+=600 1664
> mjobctl -m eeduration=-1 1664
> mjobctl -m var=Flag1=TRUE 1664
> mjobctl -m notificationaddress="[email protected]"
> mjobctl -m partition+=p3 -m partition+=p4 Moab.5
Adds multiple partitions (p3 and p4) to job Moab.5.
> mjobctl -m arraylimit=10 sim.25
Changes the concurrently running sub-job limit to 10 for array sim.25.
> mjobctl -m gres=matlab:1 job0201
Overrides all generic resources applied to job job0201.
| ||
-N - Notify | |||
Format: | [signal=]<SIGID> JOBEXP | ||
Default: | --- | ||
Description: | Send a signal to all jobs matching the job expression. | ||
Example: | > mjobctl -N INT 1664
> mjobctl -N 47 1664
| ||
-n - Name | |||
Format: | <JOBNAME> | ||
Default: | --- | ||
Description: | Select jobs by job name. | ||
Example: | |||
-p - Priority | |||
Format: | [+|+=|-=]<VAL> <JOBID> [--flags=relative] | ||
Default: | --- | ||
Description: | Modify a job's system priority. | ||
Example: | Priority is the job priority plus the system priority. Each format affects the job and system priorities differently:
Using '<VAL> <JOBID>' or '+<VAL> <JOBID>' sets the system priority to the maximum system priority plus the specified value.
Using '+=<VAL> <JOBID>' or '<VAL> <JOBID> --flags=relative' relatively increases the job's priority and sets the system priority.
Using '-=<VAL> <JOBID>' sets the system priority to 0 and does not change priority based on <VAL> (it will not decrease priority by that number).
In the following examples, job1045 starts with a job priority of 10.
> mjobctl -p +1000 job1045
The system priority changes to the maximum system priority plus 1000 points, ensuring that this job will be higher priority than all normal jobs. In this case, the job priority of 10 is not added.
> mjobctl -p -=1 job1045
The system priority of job1045 is set to 0.
> mjobctl -p 3 job1045 --flags=relative
The priority of job1045 changes from 10 to 13.
| ||
-q - Query | |||
Format: | [ diag (ALL (--orange)) | hostlist | starttime | template ] <JOBEXP> | ||
Default: | --- | ||
Description: | Query a job. | ||
Example: | > mjobctl -q diag job1045
Query job job1045.
> mjobctl -q diag ALL --format=xml --orange=105001-110000
Query all jobs in range 105001-110000 and return the output in machine-readable XML.
> mjobctl -q starttime job1045
Query the start time of job job1045.
> mjobctl -q template <job>
Query job templates. If <job> is set to ALL or is empty, information for all job templates is returned.
> mjobctl -q wiki <jobName>
Query a job with the output displayed as a WIKI string. The job's name may be replaced with ALL.
| ||
-r - Resume | |||
Format: | JOBEXP | ||
Default: | --- | ||
Description: | Resume a job. | ||
Example: | > mjobctl -r job1045 | ||
-R - Requeue | |||
Format: | JOBEXP | ||
Default: | --- | ||
Description: | Requeue a job. | ||
Example: | > mjobctl -R job1045 | ||
-s - Suspend | |||
Format: | JOBEXP | ||
Default: | --- | ||
Description: | Suspend a job. For more information, see Suspend/Resume Handling. | ||
Example: | > mjobctl -s job1045 | ||
-u - Unhold | |||
Format: | [<TYPE>[,<TYPE>]] JOBEXP <TYPE> = [ user | system | batch | defer | ALL ] | ||
Default: | ALL | ||
Description: | Release a hold on a job. See Section 11.1, Job Holds for more information. | ||
Example: | > mjobctl -u user,system scrib.1045 | ||
-w - Where | |||
Format: | [CompletionTime | StartTime][<= | = | >=]<EPOCH_TIME> | ||
Default: | --- | ||
Description: | Add a where constraint clause to the current command. As it pertains to CompletionTime | StartTime, the where constraint only works for completed jobs. CompletionTime filters according to the completed jobs' completion times; StartTime filters according to the completed jobs' start times. | ||
Example: | > mjobctl -q diag ALL --flags=COMPLETED --format=xml | ||
-x - Execute | |||
Format: | JOBEXP | ||
Default: | --- | ||
Description: | Execute a job. The -w option allows flags to be set for the job. Allowable flags are ignorepolicies, ignorenodestate, and ignorersv. | ||
Example: | > mjobctl -x job1045
> mjobctl -x -w flags=ignorepolicies job1046 |
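The -w constraint on CompletionTime and StartTime takes epoch timestamps, which are awkward to compute by hand. The following is a minimal Python sketch, not part of Moab, that converts calendar dates to epoch seconds and assembles an argument list for a completed-job query. The helper names `to_epoch` and `build_completed_query` are hypothetical, the clause syntax follows the -w Format row above, and combining two -w clauses on a query is an assumption based on the -c -w example.

```python
from datetime import datetime, timezone


def to_epoch(year, month, day):
    """Return the Unix epoch time in seconds for midnight UTC on the given date."""
    return int(datetime(year, month, day, tzinfo=timezone.utc).timestamp())


def build_completed_query(start, end):
    """Build an argv list for a completed-job query bounded by completion time.

    Hypothetical helper: it only assembles strings and does not run mjobctl.
    """
    return [
        "mjobctl", "-q", "diag", "ALL",
        "--flags=COMPLETED", "--format=xml",
        "-w", f"CompletionTime>={start}",
        "-w", f"CompletionTime<={end}",
    ]


# Completed jobs that finished during January 2012 (UTC).
argv = build_completed_query(to_epoch(2012, 1, 1), to_epoch(2012, 2, 1))
print(argv)
```

Passing the list to `subprocess.run(argv)` rather than joining it into a shell string keeps the shell from treating `>=` and `<=` as redirections. When typing the command by hand, quote each -w clause for the same reason.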
Parameters
JOB EXPRESSION | |||||||||
Format: | <STRING> | ||||||||
Default: | --- | ||||||||
Description: | The name of a job or a regular expression for several jobs. The flags that support job expressions can use node expression syntax as described in Node Selection. Using "x:" indicates the following string is to be interpreted as a regular expression, and using "r:" indicates the following string is to be interpreted as a range.
Job expressions do not work for array sub-jobs.
| ||||||||
Example: | > mjobctl -c "x:80.*"
job '802' cancelled
job '803' cancelled
job '804' cancelled
job '805' cancelled
job '806' cancelled
job '807' cancelled
job '808' cancelled
job '809' cancelled
> mjobctl -m priority+=200 "x:74[3-5]"
job '743' system priority modified
job '744' system priority modified
job '745' system priority modified
> mjobctl -h x:17.*
# This puts a hold on any job that has a 17 that is followed by an unlimited amount of any
# character and includes jobs 1701, 17mjk10, and 17DjN_JW-07
> mjobctl -h r:1-17
# This puts a hold on jobs 1 through 17. |
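To preview which jobs an expression might select before running a destructive command such as -c, the "x:" and "r:" prefixes can be approximated offline. The sketch below is an illustration in Python, not Moab's matching code. The function `select_jobs` is hypothetical, and it assumes that regular expressions are matched from the start of the job ID and that ranges apply only to purely numeric IDs. Moab's actual rules may differ.

```python
import re


def select_jobs(expression, job_ids):
    """Return the job IDs that a job expression would select (approximation).

    "x:" treats the rest of the string as a regular expression,
    "r:" treats it as an inclusive numeric range LOW-HIGH,
    and anything else is compared as an exact job name.
    """
    if expression.startswith("x:"):
        pattern = re.compile(expression[2:])
        # Assumption: the pattern is matched from the start of the job ID.
        return [job for job in job_ids if pattern.match(job)]
    if expression.startswith("r:"):
        low, high = (int(part) for part in expression[2:].split("-", 1))
        # Assumption: only purely numeric job IDs can fall inside a range.
        return [job for job in job_ids if job.isdigit() and low <= int(job) <= high]
    return [job for job in job_ids if job == expression]


queue = ["9", "743", "744", "745", "746", "802", "1701", "17mjk10"]
print(select_jobs("x:74[3-5]", queue))
print(select_jobs("x:17.*", queue))
print(select_jobs("r:1-17", queue))
```

Running a read-only query such as `mjobctl -q diag` with the same expression is the more reliable check, because it uses Moab's own matching.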
XML Attributes
Name | Description |
---|---|
Account | the account assigned to the job |
AllocNodeList | the nodes allocated to the job |
Args | the job's executable arguments |
AWDuration | the active wall time consumed |
BlockReason | the block message index for the reason the job is not eligible |
Bypass | the number of times the job has been bypassed by other jobs |
Calendar | the job's timeframe constraint calendar |
Class | the class assigned to the job |
CmdFile | the command file path |
CompletionCode | the return code of the job as extracted from the RM |
CompletionTime | the time of the job's completion |
Cost | the cost of executing the job relative to an allocation manager |
CPULimit | the CPU limit for the job |
Depend | any dependencies on the status of other jobs |
DRM | the master destination RM |
DRMJID | the master destination RM job ID |
EEDuration | the duration of time the job has been eligible for scheduling |
EFile | the stderr file |
Env | the job's environment variables set for execution |
EnvOverride | the job's overriding environment variables set for execution |
EState | the expected state of the job |
EstHistStartTime | the estimated historical start time |
EstPrioStartTime | the estimated priority start time |
EstRsvStartTime | the estimated reservation start time |
EstWCTime | the estimated walltime the job will execute |
ExcHList | the excluded host list |
Flags | the comma-delimited list of Moab flags on the job |
GAttr | the requested generic attributes |
GJID | the global job ID |
Group | the group assigned to the job |
Hold | the hold list |
Holdtime | the time the job was put on hold |
HopCount | the hop count between the job's peers |
HostList | the requested host list |
IFlags | the internal flags for the job |
IsInteractive | if set, the job is interactive |
IsRestartable | if set, the job is restartable |
IsSuspendable | if set, the job is suspendable |
IWD | the directory where the job is executed |
JobID | the job's batch ID |
JobName | the user-specified name for the job |
JobGroup | the job ID relative to its group |
LogLevel | the individual log level for the job |
MasterHost | the specified host to run primary tasks on |
Messages | any messages reported by Moab regarding the job |
MinPreemptTime | the minimum amount of time the job must run before being eligible for preemption |
Notification | any events generated to notify the job's user |
OFile | the stdout file |
OldMessages | any messages reported by Moab in the old message style regarding the job |
OWCLimit | the original wallclock limit |
PAL | the partition access list relative to the job |
QueueStatus | the job's queue status as generated this iteration |
QOS | the QOS assigned to the job |
QOSReq | the requested QOS for the job |
ReqAWDuration | the requested active walltime duration |
ReqCMaxTime | the requested latest allowed completion time |
ReqMem | the total memory requested/dedicated to the job |
ReqNodes | the number of requested nodes for the job |
ReqProcs | the number of requested procs for the job |
ReqReservation | the required reservation for the job |
ReqRMType | the required RM type |
ReqSMinTime | the requested earliest start time |
RM | the master source resource manager |
RMXString | the resource manager extension string |
RsvAccess | the list of reservations accessible by the job |
RsvStartTime | the reservation start time |
RunPriority | the effective job priority |
Shell | the execution shell's output |
SID | the job's system ID (parent cluster) |
Size | the job's computational size |
STotCPU | the average CPU load tracked across all nodes |
SMaxCPU | the max CPU load tracked across all nodes |
STotMem | the average memory usage tracked across all nodes |
SMaxMem | the max memory usage tracked across all nodes |
SRMJID | the source RM's ID for the job |
StartCount | the number of times the job has tried to start |
StartPriority | the effective job priority |
StartTime | the most recent time the job started executing |
State | the state of the job as reported by Moab |
StatMSUtl | the total number of memory seconds utilized |
StatPSDed | the total number of processor seconds dedicated to the job |
StatPSUtl | the total number of processor seconds utilized by the job |
StdErr | the path to the stderr file |
StdIn | the path to the stdin file |
StdOut | the path to the stdout file |
StepID | StepID of the job (used with LoadLeveler systems) |
SubmitHost | the host where the job was submitted |
SubmitLanguage | the RM language in which the submission request was performed |
SubmitString | the string containing the entire submission request |
SubmissionTime | the time the job was submitted |
SuspendDuration | the amount of time the job has been suspended |
SysPrio | the admin-specified job priority |
SysSMinTime | the system-specified minimum start time |
TaskMap | the allocation taskmap for the job |
TermTime | the time the job was terminated |
User | the user assigned to the job |
UserPrio | the user-specified job priority |
UtlMem | the utilized memory of the job |
UtlProcs | the number of processors utilized by the job |
Variable | |
VWCTime | the virtual wallclock limit |
Example 1
> mjobctl -q diag ALL --format=xml
<Data>
  <job AWDuration="346" Class="batch" CmdFile="jobsleep.sh" EEDuration="0" EState="Running" Flags="RESTARTABLE" Group="test" IWD="/home/test" JobID="11578" QOS="high" RMJID="11578.lolo.icluster.org" ReqAWDuration="00:10:00" ReqNodes="1" ReqProcs="1" StartCount="1" StartPriority="1" StartTime="1083861225" StatMSUtl="903.570" StatPSDed="364.610" StatPSUtl="364.610" State="Running" SubmissionTime="1083861225" SuspendDuration="0" SysPrio="0" SysSMinTime="00:00:00" User="test">
    <req AllocNodeList="hana" AllocPartition="access" ReqNodeFeature="[NONE]" ReqPartition="access"></req>
  </job>
  <job AWDuration="346" Class="batch" CmdFile="jobsleep.sh" EEDuration="0" EState="Running" Flags="RESTARTABLE" Group="test" IWD="/home/test" JobID="11579" QOS="high" RMJID="11579.lolo.icluster.org" ReqAWDuration="00:10:00" ReqNodes="1" ReqProcs="1" StartCount="1" StartPriority="1" StartTime="1083861225" StatMSUtl="602.380" StatPSDed="364.610" StatPSUtl="364.610" State="Running" SubmissionTime="1083861225" SuspendDuration="0" SysPrio="0" SysSMinTime="00:00:00" User="test">
    <req AllocNodeList="lolo" AllocPartition="access" ReqNodeFeature="[NONE]" ReqPartition="access"></req>
  </job>
</Data>
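XML output like Example 1 is meant for machine consumption. As a rough illustration, the following Python sketch reads a trimmed copy of that output with the standard library and extracts a few of the attributes listed under XML Attributes. The `summarize` function and the `SAMPLE` string are hypothetical. The sample keeps only a subset of the attributes shown in Example 1, and real output may contain more elements than this sketch handles.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the Example 1 output (only a few attributes retained).
SAMPLE = (
    '<Data>'
    '<job AWDuration="346" JobID="11578" State="Running" User="test">'
    '<req AllocNodeList="hana" AllocPartition="access"></req></job>'
    '<job AWDuration="346" JobID="11579" State="Running" User="test">'
    '<req AllocNodeList="lolo" AllocPartition="access"></req></job>'
    '</Data>'
)


def summarize(xml_text):
    """Return one dict per <job> element with a few selected attributes."""
    root = ET.fromstring(xml_text)
    rows = []
    for job in root.iter("job"):
        req = job.find("req")  # per-requirement details are nested under <job>
        rows.append({
            "JobID": job.get("JobID"),
            "State": job.get("State"),
            "User": job.get("User"),
            "AWDuration": int(job.get("AWDuration", "0")),
            "AllocNodeList": req.get("AllocNodeList") if req is not None else None,
        })
    return rows


jobs = summarize(SAMPLE)
for row in jobs:
    print(row)
```

In practice the text would come from the standard output of `mjobctl -q diag ALL --format=xml` rather than from an embedded string.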
Copyright © 2012 Adaptive Computing Enterprises, Inc.®