3.6 checkjob

3.6.1 Synopsis

checkjob [exact:jobid] [jobname:jobname] [-l policylevel] [-n nodeid] [-q qosid] [-r reservationid] [-v] [--flags=future | complete] [--blocking] jobid

3.6.2 Overview

checkjob displays detailed job state information and diagnostic output for a specified job. Detailed information is available for queued, blocked, active, and recently completed jobs. The checkjob command shows the master job of an array as well as a summary of array sub-jobs, but does not display all sub-jobs. Use checkjob -v to display all job-array sub-jobs.

3.6.3 Access

This command can be run by level 1-3 Moab administrators for any job. Also, end users can use checkjob to view the status of their own jobs.

3.6.4 Options

--blocking
Format --blocking
Description Does not use cached information in the output. The --blocking flag retrieves results exclusively from the scheduler.
Example
> checkjob -v --blocking 1234

Display real-time data about job 1234.

--flags
Format --flags=future | complete
Description
  • future – Evaluates future eligibility of the job (ignores current resource state and usage limitations).
  • complete – Queries details for jobs that have already terminated.
Example
> checkjob -v --flags=future 6235

Display the reasons an idle job is blocked, ignoring node state and current node utilization constraints.

exact
Format exact:<JOBID>
Description Searches for and returns the job with the exact job ID <JOBID>.
Example
> checkjob exact:1.job_dependency1

jobname
Format jobname:<JOBNAME>
Description Searches for and returns the first job with the matching <JOBNAME>.
Example
> checkjob jobname:STEP4

-l (Policy level)
Format <POLICYLEVEL>, one of HARD, SOFT, or OFF
Description Reports job start eligibility subject to specified throttling policy level.
Example
> checkjob -l SOFT 6235
> checkjob -l HARD 6235

-n (NodeID)
Format <NODEID>
Description Checks job access to the specified node and preemption status with regard to jobs located on that node.
Example
> checkjob -n node113 6235

-q (QoS)
Format <QOSID>
Description Checks job access to specified QoS <QOSID>.
Example
> checkjob -q special 6235

-r (Reservation)
Format <RSVID>
Description Checks job access to specified reservation <RSVID>.
Example
> checkjob -r orion.1 6235

-v (Verbose)
Description

Sets verbose mode. If the job is part of an array, the -v option shows pertinent array information before the job-specific information (see Example 3-2 and Example 3-3 for differences between standard output and -v output).

Specifying the double verbose (-v -v) displays additional information about the job. See the Output table for details.

Example
> checkjob -v 6235
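
The options above can be combined on one command line. As a minimal sketch, the following Python helper assembles an argument list using only the flags documented in this section. The function name build_checkjob_cmd and its parameter names are hypothetical and are not part of Moab.

```python
from typing import List, Optional


def build_checkjob_cmd(
    jobid: str,
    verbosity: int = 0,
    policy_level: Optional[str] = None,
    node: Optional[str] = None,
    qos: Optional[str] = None,
    reservation: Optional[str] = None,
    flags: Optional[str] = None,
    blocking: bool = False,
) -> List[str]:
    """Assemble a checkjob argument list (hypothetical helper, not a Moab API)."""
    if policy_level is not None and policy_level not in ("HARD", "SOFT", "OFF"):
        raise ValueError("policy level must be HARD, SOFT, or OFF")
    if flags is not None and flags not in ("future", "complete"):
        raise ValueError("flags must be 'future' or 'complete'")

    cmd = ["checkjob"]
    cmd += ["-v"] * verbosity  # verbosity=2 yields the double verbose form (-v -v)
    if policy_level:
        cmd += ["-l", policy_level]
    if node:
        cmd += ["-n", node]
    if qos:
        cmd += ["-q", qos]
    if reservation:
        cmd += ["-r", reservation]
    if flags:
        cmd.append(f"--flags={flags}")
    if blocking:
        cmd.append("--blocking")
    cmd.append(jobid)  # the job ID goes last, as in the synopsis
    return cmd


print(build_checkjob_cmd("6235", verbosity=1, flags="future"))
# ['checkjob', '-v', '--flags=future', '6235']
```

On a host where checkjob is installed, the resulting list could be passed to subprocess.run(cmd, capture_output=True, text=True) to capture the output for further processing.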

3.6.5 Details

This command allows any Moab administrator to check the detailed status and resource requirements of an active, queued, or recently completed job. Additionally, this command performs numerous diagnostic checks and determines if and where the job could potentially run. Diagnostic checks include policy violations, reservation constraints, preemption status, and job-to-resource mapping. If a job cannot run, a text reason is provided along with a summary of how many nodes are and are not available. If the -v flag is specified, a node-by-node summary of resource availability will be displayed for idle jobs.

3.6.5.A Job Eligibility

For job-level eligibility issues, one of the following reasons will be given:

Reason Description
job has hold in place one or more job holds are currently in place
insufficient idle procs there are currently not adequate processor resources available to start the job
idle procs do not meet requirements adequate idle processors are available but these do not meet job requirements
start date not reached job has specified a minimum start date which is still in the future
expected state is not idle job is in an unexpected state
state is not idle job is not in the idle state
dependency is not met job depends on another job reaching a certain state
rejected by policy job start is prevented by a throttling policy

If a job cannot run on a particular node, one of the following 'per node' reasons will be given:

Reason Description
Class Node does not allow required job class/queue
CPU Node does not possess required processors
Disk Node does not possess required local disk
Features Node does not possess required node features
Memory Node does not possess required real memory
Network Node does not possess required network interface
State Node is not Idle or Running

3.6.5.B Reservation Access

The -r flag can be used to provide detailed information about job access to a specific reservation.

3.6.5.C Preemption Status

If a job is marked as a preemptor and both the -v and -n flags are specified, checkjob performs a job-by-job analysis of all jobs on the specified node to determine whether they can be preempted.

3.6.6 Output

The checkjob command displays the following job attributes:

Attribute Value Description
Account <STRING> Name of account associated with job
Actual Run Time [[[DD:]HH:]MM:]SS Length of time job actually ran. This information is displayed only in simulation mode.
Allocated Nodes Square bracket delimited list of node and processor ids List of nodes and processors allocated to job
Applied Nodeset** <STRING> Nodeset used for job's node allocation
Arch <STRING> Node architecture required by job
Attr Square bracket delimited list of job attributes Job attributes (e.g. [BACKFILL][PREEMPTEE])
Available Memory** <INTEGER> The available memory requested by job. Moab displays the relative or exact value by returning a comparison symbol (>, <, >=, <=, or ==) with the value (e.g. Available Memory <= 2048).
Available Swap** <INTEGER> The available swap requested by job. Moab displays the relative or exact value by returning a comparison symbol (>, <, >=, <=, or ==) with the value (e.g. Available Swap >= 1024).
Average Utilized Procs* <FLOAT> Average load balance for a job
Avg Util Resources Per Task* <FLOAT>
BecameEligible <TIMESTAMP> The date and time when the job moved from Blocked to Eligible.
Bypass <INTEGER> Number of times a lower priority job with a later submit time ran before the job
CheckpointStartTime** [[[DD:]HH:]MM:]SS The time the job was first checkpointed
Class [<CLASS NAME> <CLASS COUNT>] Name of class/queue required by job and number of class initiators required per task.
Dedicated Resources Per Task* Space-delimited list of <STRING>:<INTEGER> Resources dedicated to a job on a per-task basis
Disk <INTEGER> Amount of local disk required by job (in MB)
Estimated Walltime [[[DD:]HH:]MM:]SS The scheduler's estimated walltime. In simulation mode, it is the actual walltime.
EnvVariables** Comma-delimited list of <STRING> List of environment variables assigned to job
Exec Size* <INTEGER> Size of job executable (in MB)
Executable <STRING> Name of command to run
Features Square bracket delimited list of <STRING>s Node features required by job
Flags
Group <STRING> Name of UNIX group associated with job
Holds Zero or more of User, System, and Batch Types of job holds currently applied to job
Image Size <INTEGER> Size of job data (in MB)
IWD (Initial Working Directory) <DIR> Directory to run the executable in
Job Messages** <STRING> Messages attached to a job
Job Submission** <STRING> Job script submitted to RM
Memory <INTEGER> Amount of real memory required per node (in MB)
Max Util Resources Per Task* <FLOAT>
NodeAccess*
Nodecount <INTEGER> Number of nodes required by job
Opsys <STRING> Node operating system required by job
Partition Mask ALL or colon delimited list of partitions List of partitions the job has access to
PE <FLOAT> Number of processor-equivalents requested by job
Per Partition Priority** Tabular Table showing job template priority for each partition
Priority Analysis** Tabular Table showing how job's priority was calculated: Job PRIORITY* Cred(User:Group:Class) Serv(QTime)
QOS <STRING> Quality of Service associated with job
Reservation <RSVID> ( <TIME1> - <TIME2> Duration: <TIME3>) RSVID specifies the reservation ID, TIME1 is the relative start time, TIME2 is the relative end time, and TIME3 is the duration of the reservation
Req [<INTEGER>] TaskCount: <INTEGER> Partition: <partition> A job requirement for a single type of resource followed by the number of task instances required and the appropriate partition
StageIn <SOURCE>%<DESTINATION> The <SOURCE> is the username, hostname, directory and file name of origin for the file(s) that Moab will stage in for this job. The <DESTINATION> is the username, hostname, directory and file name where Moab will place the file during this job. See About Data Staging for more information.
StageInSize <INTEGER><UNIT> The size of the file Moab will stage in for this job. <UNIT> can be KB, MB, GB, or TB. See About Data Staging for more information.
StageOut <SOURCE>%<DESTINATION> The <SOURCE> is the username, hostname, directory and file name of origin for the file(s) that Moab will stage out for this job. The <DESTINATION> is the username, hostname, directory and file name where Moab will place the file during this job. See About Data Staging for more information.
StageOutSize <INTEGER><UNIT> The size of the file Moab will stage out for this job. <UNIT> can be KB, MB, GB, or TB. See About Data Staging for more information.
StartCount <INTEGER> Number of times job has been started by Moab
StartPriority <INTEGER> Start priority of job
StartTime <TIME> Time job was started by the resource management system
State One of Idle, Starting, Running, etc. See Job States for all possible values. Current Job State
SubmitTime <TIME> Time job was submitted to resource management system
Swap <INTEGER> Amount of swap disk required by job (in MB)
Task Distribution* Square bracket delimited list of nodes
Time Queued
Total Requested Nodes** <INTEGER> Number of nodes the job requested
Total Requested Tasks <INTEGER> Number of tasks requested by job
User <STRING> Name of user submitting job
Utilized Resources Per Task* <FLOAT>
WallTime [[[DD:]HH:]MM:]SS of [[[DD:]HH:]MM:]SS Length of time job has been running out of the specified limit

In the above table, fields marked with an asterisk (*) are only displayed when set or when the -v flag is specified. Fields marked with two asterisks (**) are only displayed when set or when the double verbose option (-v -v) is specified.
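
Several attributes in the table use the [[[DD:]HH:]MM:]SS time format. The following Python sketch shows one way to convert such a value to seconds. The function name duration_to_seconds is hypothetical, and the handling of a leading minus sign is an assumption based on the relative reservation times shown in checkjob output.

```python
def duration_to_seconds(text: str) -> int:
    """Convert a [[[DD:]HH:]MM:]SS string to seconds (hypothetical helper)."""
    text = text.strip()
    # Assumption: a leading '-' marks a relative time in the past.
    sign = -1 if text.startswith("-") else 1
    parts = [int(p) for p in text.lstrip("-").split(":")]
    if not 1 <= len(parts) <= 4:
        raise ValueError(f"unexpected duration format: {text!r}")
    # Pad on the left so the list is always [DD, HH, MM, SS].
    days, hours, minutes, seconds = [0] * (4 - len(parts)) + parts
    return sign * (((days * 24 + hours) * 60 + minutes) * 60 + seconds)


print(duration_to_seconds("00:01:40"))    # 100
print(duration_to_seconds("3:12:23:13"))  # 303793
```

The second value is 3 days, 12 hours, 23 minutes, and 13 seconds: 3 x 86400 + 12 x 3600 + 23 x 60 + 13 = 303793 seconds.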

Example 3-1: checkjob 717

> checkjob 717
job 717
State: Idle
Creds:  user:jacksond  group:jacksond  class:batch
WallTime: 00:00:00 of 00:01:40
SubmitTime: Mon Aug 15 20:49:41
  (Time Queued  Total: 3:12:23:13  Eligible: 3:12:23:11)
TerminationDate:   INFINITY  Sat Oct 24 06:26:40
Total Tasks: 1
Req[0]  TaskCount: 1  Partition: ALL
Network: ---  Memory >= 0  Disk >= 0  Swap >= 0
Opsys: ---  Arch: ---  Features: ---

IWD:            /home/jacksond/moab/moab-4.2.3
Executable:     STDIN
Flags:          RESTARTABLE,NORMSTART
StartPriority:  5063
Reservation '717' (  INFINITY ->   INFINITY  Duration: 00:01:40)
Note:  job cannot run in partition base (idle procs do not meet requirements : 0 of 1 procs found)
idle procs:   4  feasible procs:   0
Rejection Reasons: [State        :    3][ReserveTime  :    1]
cannot select job 717 for partition GM (partition GM does not support requested class batch)

The example job cannot be started for two different reasons.

  • It is temporarily blocked from partition base because of node state and node reservation conflicts.
  • It is permanently blocked from partition GM because the requested class batch is not supported in that partition.
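
When checkjob output is post-processed by a script, the Rejection Reasons line can be split into reason names and counts. The following Python sketch assumes the bracketed layout shown in Example 3-1. The function name parse_rejection_reasons is hypothetical, and the layout may differ between Moab versions.

```python
import re
from typing import Dict

# Matches one bracketed entry such as "[State        :    3]".
REASON_RE = re.compile(r"\[\s*(\w+)\s*:\s*(\d+)\s*\]")


def parse_rejection_reasons(line: str) -> Dict[str, int]:
    """Map each rejection reason to its count (hypothetical helper)."""
    return {name: int(count) for name, count in REASON_RE.findall(line)}


line = "Rejection Reasons: [State        :    3][ReserveTime  :    1]"
print(parse_rejection_reasons(line))
# {'State': 3, 'ReserveTime': 1}
```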

Example 3-2: Using checkjob (no -v) on a job array master job

$ checkjob array.1
job array.1

AName: array 
Job Array Info: 
  Name: array.1 

Sub-jobs:         10 
  Active:          6 ( 60.0%)
  Eligible:        2 ( 20.0%)
  Blocked:         2 ( 20.0%) 
  Complete:        0 (  0.0%)

Example 3-3: Using checkjob -v on a job array master job

$ checkjob -v array.1
job array.1

AName: array 
Job Array Info: 
  Name: array.1 
  1 : array.1.1 : Running 
  2 : array.1.2 : Running 
  3 : array.1.3 : Running 
  4 : array.1.4 : Running 
  5 : array.1.5 : Running 
  6 : array.1.6 : Running 
  7 : array.1.7 : Idle 
  8 : array.1.8 : Idle 
  9 : array.1.9 : Blocked
  10 : array.1.10 : Blocked

Sub-jobs:         10 
  Active:          6 ( 60.0%)
  Eligible:        2 ( 20.0%)
  Blocked:         2 ( 20.0%) 
  Complete:        0 (  0.0%)
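
The per-sub-job lines in the -v output can be tallied by a script. The following Python sketch parses the "index : name : state" lines from Example 3-3 and counts the states. The names SUBJOB_RE and parse_subjobs are hypothetical, and the line layout is assumed to match the example.

```python
import re
from collections import Counter
from typing import List, Tuple

# Sub-job lines copied from Example 3-3.
SAMPLE = """\
job array.1

AName: array
Job Array Info:
  Name: array.1
  1 : array.1.1 : Running
  2 : array.1.2 : Running
  3 : array.1.3 : Running
  4 : array.1.4 : Running
  5 : array.1.5 : Running
  6 : array.1.6 : Running
  7 : array.1.7 : Idle
  8 : array.1.8 : Idle
  9 : array.1.9 : Blocked
  10 : array.1.10 : Blocked
"""

# Matches lines of the form "<index> : <sub-job name> : <state>".
SUBJOB_RE = re.compile(r"^\s*(\d+)\s*:\s*(\S+)\s*:\s*(\w+)\s*$")


def parse_subjobs(output: str) -> List[Tuple[int, str, str]]:
    """Return (index, name, state) for each sub-job line (hypothetical helper)."""
    subjobs = []
    for line in output.splitlines():
        match = SUBJOB_RE.match(line)
        if match:
            subjobs.append((int(match.group(1)), match.group(2), match.group(3)))
    return subjobs


counts = Counter(state for _, _, state in parse_subjobs(SAMPLE))
print(dict(counts))
# {'Running': 6, 'Idle': 2, 'Blocked': 2}
```

For this sample, the tallies line up with the Active, Eligible, and Blocked figures in the summary block.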

Example 3-4: Using checkjob -v on a data staging job

$ checkjob -v moab.14.dsin
job moab.14.dsin
 
AName: moab.14.dsin
State: Running
Creds:  user:fred  group:company
WallTime:   00:00:00 of 00:01:01
SubmitTime: Wed Apr 16 10:07:19
 (Time Queued  Total: 00:00:00  Eligible: 00:00:00)
 
StartTime: Wed Apr 16 10:07:19
TemplateSets:  dsin
Triggers: [email protected]:[email protected]/opt/moab/tools/datastaging/ds_move_rsync --stagein:FALSE
Total Requested Tasks: 1
 
Req[0]  TaskCount: 1  Partition: SHARED
Dedicated Resources Per Task: bandwidth: 1
NodeAccess: SHARED
 
Allocated Nodes:
[GLOBAL:1]
 
Job Group:  moab.14
SystemID:   moab
SystemJID:  moab.14.dsin
Task Distribution: GLOBAL
IWD:            $HOME/test/datastaging
SubmitDir:      $HOME/test/datastaging
StartCount:     1
Parent VCs:     vc11
User Specified Partition List:   local
Partition List: local
SrcRM:          internal
Flags:          NORMSTART,GRESONLY,TEMPLATESAPPLIED
Attr:           dsin
StageInSize:    386MB
StageOutSize:   100MB
StageIn:        [email protected]:/home/fred/input1/%[email protected]:/home/fred/input1/
StageIn:        [email protected]:/home/fred/input2/%[email protected]:/home/fred/input2/
StageIn:        [email protected]:/home/fred/input3/%[email protected]:/home/fred/input3/
StageOut:       [email protected]:/home/fred/output/%[email protected]:/home/fred/output/
StartPriority:  1
  SJob Type:             datastaging
  Completion Policy:     datastaging
PE:             0.00
Reservation 'moab.14.dsin' (-00:00:06 -> 00:00:55  Duration: 00:01:01)
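
The data staging attributes follow the <INTEGER><UNIT> and <SOURCE>%<DESTINATION> formats described in the Output table. The following Python sketch splits both. The function names are hypothetical, and the user and host names in the sample StageIn value are invented for illustration.

```python
import re
from typing import Tuple

# <INTEGER><UNIT>, where <UNIT> is KB, MB, GB, or TB.
SIZE_RE = re.compile(r"^(\d+)(KB|MB|GB|TB)$")


def parse_stage_size(value: str) -> Tuple[int, str]:
    """Split a StageInSize/StageOutSize value into (amount, unit)."""
    match = SIZE_RE.match(value.strip())
    if not match:
        raise ValueError(f"unexpected size format: {value!r}")
    return int(match.group(1)), match.group(2)


def split_stage_spec(value: str) -> Tuple[str, str]:
    """Split a StageIn/StageOut value at the first '%' into (source, destination)."""
    source, sep, destination = value.strip().partition("%")
    if not sep:
        raise ValueError(f"missing '%' separator: {value!r}")
    return source, destination


print(parse_stage_size("386MB"))
# (386, 'MB')

# Hypothetical value; the user and host names are placeholders.
print(split_stage_spec("alice@hostA:/data/in/%alice@hostB:/scratch/in/"))
# ('alice@hostA:/data/in/', 'alice@hostB:/scratch/in/')
```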

© 2016 Adaptive Computing