checkjob

Synopsis

checkjob [-l policylevel]
         [-n nodeid]
         [-q qosid]
         [-r reservationid]
         [-v] [--flags=future] jobid

Overview

checkjob displays detailed job state information and diagnostic output for a specified job. Detailed information is available for queued, blocked, active, and recently completed jobs. The checkjob command shows the master job of an array as well as a summary of array sub-jobs, but does not display all sub-jobs. Use checkjob -v to display all job-array sub-jobs.

Access

This command can be run by level 1-3 Moab administrators for any job. End users can also use checkjob to view the status of their own jobs.

Arguments

  
--flags
--flags=future
---
Evaluates future eligibility of the job, ignoring current resource state and usage limitations.
> checkjob -v --flags=future 6235

Displays the reasons why the idle job is blocked, ignoring node state and current node utilization constraints.

  
-l (Policy level)
<POLICYLEVEL>

HARD, SOFT, or OFF
---
Reports job start eligibility subject to specified throttling policy level.
> checkjob -l SOFT 6235
> checkjob -l HARD 6235
  
-n (NodeID)
<NODEID>
---
Checks job access to the specified node and preemption status with regard to jobs located on that node.
> checkjob -n node113 6235
  
-q (QoS)
<QOSID>
---
Checks job access to specified QoS <QOSID>.
> checkjob -q special 6235
  
-r (Reservation)
<RSVID>
---
Checks job access to specified reservation <RSVID>.
> checkjob -r orion.1 6235
  
-v (Verbose)
N/A
Sets verbose mode. If the job is part of an array, the -v option shows pertinent array information before the job-specific information (see Example 2 and Example 3 for differences between standard output and -v output).
> checkjob -v 6235
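
Because -l accepts HARD, SOFT, or OFF, it can be useful to run checkjob once per level and compare the reports to see which throttling level blocks a job. The following is a minimal sketch of that loop, not part of Moab itself: the function name compare_policy_levels is hypothetical, and the sketch assumes checkjob is on the PATH.

```shell
# Hypothetical helper (not shipped with Moab): run checkjob once per
# throttling policy level so the eligibility reports can be compared.
compare_policy_levels() {
    jobid=$1
    for level in HARD SOFT OFF; do
        printf '=== policy level: %s ===\n' "$level"
        checkjob -l "$level" "$jobid"
    done
}

# Usage (job ID 6235 is the one used in the examples above):
#   compare_policy_levels 6235
```

If the job is reported as eligible under OFF but not under SOFT or HARD, the block is likely caused by a throttling policy rather than by resource availability.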

Details

This command allows any Moab administrator to check the detailed status and resource requirements of an active, queued, or recently completed job. It also performs numerous diagnostic checks and determines if and where the job could potentially run. Diagnostic checks include policy violations, reservation constraints, preemption status, and job-to-resource mapping.

Job Eligibility

If a job cannot run, a text reason is provided along with a summary of how many nodes are and are not available. If the -v flag is specified, a node-by-node summary of resource availability is displayed for idle jobs. For job-level eligibility issues, one of the following reasons will be given:

Reason  Description
one or more job holds are currently in place
there are currently not adequate processor resources available to start the job
adequate idle processors are available but these do not meet job requirements
job has specified a minimum start date which is still in the future
job is in an unexpected state
job is not in the idle state
job depends on another job reaching a certain state
job start is prevented by a throttling policy

If a job cannot run on a particular node, one of the following 'per node' reasons will be given:

Node does not allow required job class/queue
Node does not possess required processors
Node does not possess required local disk
Node does not possess required node features
Node does not possess required real memory
Node does not possess required network interface
Node is not Idle or Running

Reservation Access

The -r flag can be used to provide detailed information about job access to a specific reservation.

Preemption Status

If a job is marked as a preemptor and the -v and -n flags are specified, checkjob will perform a job-by-job analysis for all jobs on the specified node to determine if they can be preempted.
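
The preemption analysis covers one node per invocation, so checking several candidate nodes means repeating the command. The sketch below wraps that repetition; the function name preemption_report is hypothetical, and it uses only the -v and -n flags described above.

```shell
# Hypothetical helper (not shipped with Moab): request the preemption
# analysis for one job against each node named on the command line.
preemption_report() {
    jobid=$1
    shift
    for node in "$@"; do
        printf '=== node: %s ===\n' "$node"
        checkjob -v -n "$node" "$jobid"
    done
}

# Usage (job and node IDs taken from the argument examples above):
#   preemption_report 6235 node113
```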

Output

The checkjob command displays the following job attributes:
Attribute  Value  Description
<STRING>  Name of account associated with job
[[[DD:]HH:]MM:]SS  Length of time job actually ran.
Note: This information is only displayed in simulation mode.
Square-bracket-delimited list of node and processor IDs  List of nodes and processors allocated to job
<STRING>  Node architecture required by job
Square-bracket-delimited list of job attributes  Job attributes (for example, [BACKFILL][PREEMPTEE])
<FLOAT>  Average load balance for a job
<FLOAT>
<TIMESTAMP>  The date and time when the job moved from Blocked to Eligible
<INTEGER>  Number of times a lower-priority job with a later submit time ran before the job
[<CLASS NAME> <CLASS COUNT>]  Name of class/queue required by job and number of class initiators required per task
<INTEGER>
<INTEGER>  Amount of local disk required by job (in MB)
[[[DD:]HH:]MM:]SS  The scheduler's estimated walltime.
Note: In simulation mode, it is the actual walltime.
<INTEGER>  Size of job executable (in MB)
<STRING>  Name of command to run
Square-bracket-delimited list of <STRING>s  Node features required by job
<STRING>  Name of UNIX group associated with job
Zero or more of User, System, and Batch  Types of job holds currently applied to job
<INTEGER>  Size of job data (in MB)
<DIR>  Directory to run the executable in
<INTEGER>  Amount of real memory required per node (in MB)
<FLOAT>
<INTEGER>  Number of nodes required by job
<STRING>  Node operating system required by job
ALL or colon-delimited list of partitions  List of partitions the job has access to
<FLOAT>  Number of processor-equivalents requested by job
<STRING>  Quality of Service associated with job
<RSVID> (<TIME1> - <TIME2> Duration: <TIME3>)  RSVID specifies the reservation ID, TIME1 is the relative start time, TIME2 the relative end time, and TIME3 the duration of the reservation
[<INTEGER>] TaskCount: <INTEGER> Partition: <partition>  A job requirement for a single type of resource, followed by the number of task instances required and the appropriate partition
<INTEGER>  Number of times job has been started by Moab
<INTEGER>  Start priority of job
<TIME>  Time job was started by the resource management system
One of Idle, Starting, Running, etc.  Current job state
<TIME>  Time job was submitted to resource management system
<INTEGER>  Amount of swap disk required by job (in MB)
Square-bracket-delimited list of nodes
<INTEGER>  Number of nodes requested by job
<INTEGER>  Number of tasks requested by job
<STRING>  Name of user submitting job
<FLOAT>
[[[DD:]HH:]MM:]SS of [[[DD:]HH:]MM:]SS  Length of time job has been running out of the specified limit
In the above table, fields marked with an asterisk (*) are only displayed when set or when the -v flag is specified.
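
Scripts that consume checkjob output usually need a single attribute rather than the whole report. The following sketch pulls one value out of the output; it assumes the "Name: value" line layout shown in Example 1, which may differ between Moab versions, and the function name checkjob_field is hypothetical.

```shell
# Hypothetical helper (not shipped with Moab): print the value of the
# first "Name: value" line whose name matches the first argument.
# Reads checkjob output on standard input.
checkjob_field() {
    awk -v name="$1" '
        index($0, name ":") == 1 {
            # Drop "Name:" and trim surrounding whitespace.
            value = substr($0, length(name) + 2)
            sub(/^[ \t]+/, "", value)
            sub(/[ \t]+$/, "", value)
            print value
            exit
        }'
}

# Usage:
#   checkjob 717 | checkjob_field State
```

Matching on the start of the line keeps indented or nested lines from being mistaken for top-level attributes.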

Example 1

> checkjob 717
job 717
State: Idle
Creds:  user:jacksond  group:jacksond  class:batch
WallTime: 00:00:00 of 00:01:40
SubmitTime: Mon Aug 15 20:49:41
  (Time Queued  Total: 3:12:23:13  Eligible: 3:12:23:11)
TerminationDate:   INFINITY  Sat Oct 24 06:26:40
Total Tasks: 1
Req[0]  TaskCount: 1  Partition: ALL
Network: ---  Memory >= 0  Disk >= 0  Swap >= 0
Opsys: ---  Arch: ---  Features: ---

IWD:            /home/jacksond/moab/moab-4.2.3
Executable:     STDIN
Flags:          RESTARTABLE,NORMSTART
StartPriority:  5063
Reservation '717' (  INFINITY ->   INFINITY  Duration: 00:01:40)
Note:  job cannot run in partition base (idle procs do not meet requirements : 0 of 1 procs found)
idle procs:   4  feasible procs:   0
Rejection Reasons: [State        :    3][ReserveTime  :    1]
cannot select job 717 for partition GM (partition GM does not support requested class batch)
Note: The example job cannot be started for two different reasons:
  • It is temporarily blocked from partition base because of node state and node reservation conflicts.
  • It is permanently blocked from partition GM because the requested class batch is not supported in that partition.
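
The Rejection Reasons line packs several reason/count pairs into one line, which is awkward to read in scripts. The sketch below splits that line into one pair per output line. It assumes the bracketed "[Reason : count]" layout shown in this example, and the function name rejection_reasons is hypothetical.

```shell
# Hypothetical helper (not shipped with Moab): split the
# "Rejection Reasons:" line of checkjob output (read on standard input)
# into one "reason count" pair per line.
rejection_reasons() {
    awk '
        /^Rejection Reasons:/ {
            line = $0
            sub(/^Rejection Reasons:[ \t]*/, "", line)
            n = split(line, parts, "]")
            for (i = 1; i <= n; i++) {
                entry = parts[i]
                gsub(/[ \t]/, "", entry)   # e.g. "[State:3"
                sub(/^\[/, "", entry)      # e.g. "State:3"
                if (split(entry, kv, ":") == 2)
                    print kv[1], kv[2]
            }
        }'
}

# Usage:
#   checkjob 717 | rejection_reasons
```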

Example 2

Using checkjob (no -v) on a job array master job:

> checkjob array.1
job array.1

AName: array 
Job Array Info: 
  Name: array.1 

Sub-jobs:         10 
  Active:          6 ( 60.0%)
  Eligible:        2 ( 20.0%)
  Blocked:         2 ( 20.0%) 
  Complete:        0 (  0.0%)

Example 3

Using checkjob -v on a job array master job:

> checkjob -v array.1
job array.1

AName: array 
Job Array Info: 
  Name: array.1 
  1 : array.1.1 : Running 
  2 : array.1.2 : Running 
  3 : array.1.3 : Running 
  4 : array.1.4 : Running 
  5 : array.1.5 : Running 
  6 : array.1.6 : Running 
  7 : array.1.7 : Idle 
  8 : array.1.8 : Idle 
  9 : array.1.9 : Blocked
  10 : array.1.10 : Blocked

Sub-jobs:         10 
  Active:          6 ( 60.0%)
  Eligible:        2 ( 20.0%)
  Blocked:         2 ( 20.0%) 
  Complete:        0 (  0.0%)
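
The per-sub-job lines in the -v output can be filtered to find, for example, every sub-job that is still blocked. The following sketch assumes the "index : sub-job ID : state" layout shown in this example; the function name subjobs_in_state is hypothetical.

```shell
# Hypothetical helper (not shipped with Moab): print the IDs of array
# sub-jobs that are in the given state. Reads "checkjob -v" output for
# an array master job on standard input.
subjobs_in_state() {
    awk -v want="$1" -F':' '
        # Sub-job lines have three fields and a numeric index first.
        NF == 3 && $1 ~ /^[ \t]*[0-9]+[ \t]*$/ {
            id = $2
            state = $3
            gsub(/[ \t]/, "", id)
            gsub(/[ \t]/, "", state)
            if (state == want) print id
        }'
}

# Usage:
#   checkjob -v array.1 | subjobs_in_state Blocked
```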

Copyright © 2012 Adaptive Computing Enterprises, Inc.®