Moab Workload Manager

Appendix W: Wiki Interface Specification, version 1.2

W.1.1 Commands
- W.1.1.1 Resource Query
  - W.1.1.1.1 Query Resources Request Format
  - W.1.1.1.2 Query Resources Response Format
  - W.1.1.1.3 Query Resources Example
  - W.1.1.1.4 Query Resources Data Format
- W.1.1.2 Workload Query
  - W.1.1.2.1 Query Workload Request Format
  - W.1.1.2.2 Query Workload Response Format
  - W.1.1.2.3 Query Workload Example
  - W.1.1.2.4 Query Workload Data Format
- W.1.1.3 Start Job
- W.1.1.4 Cancel Job
- W.1.1.5 Suspend Job
- W.1.1.6 Resume Job
- W.1.1.7 Requeue Job
- W.1.1.8 Signal Job
- W.1.1.9 Modify Job
- W.1.1.10 JobAddTask
- W.1.1.11 JobRemoveTask
W.1.2 Rejection Codes

W.1.1 COMMANDS

All commands are requested via a socket interface, one command per socket connection. All fields and values are specified in ASCII text. Moab is configured to communicate via the wiki interface by specifying the following parameters in the moab.cfg file:

moab.cfg

RMCFG[base] TYPE=WIKI SERVER=<HOSTNAME>[:<PORT>]
...
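The one-command-per-connection model can be sketched in Python as shown below. This is a minimal illustration, not Moab's implementation. It assumes the command is sent as bare ASCII text and that the reply is everything received until the server closes the connection. This appendix does not describe message framing or authentication, so confirm both for your resource manager. The host and port would presumably match the SERVER value in RMCFG. The function name `send_wiki_command` is hypothetical.

```python
import socket


def send_wiki_command(host: str, port: int, command: str, timeout: float = 10.0) -> str:
    """Send one wiki command over a fresh TCP connection and return the raw reply.

    Assumption: no framing; the reply ends when the server closes the socket.
    """
    # One command per socket connection, as the interface requires.
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(command.encode("ascii"))
        # Half-close the connection so the server sees the end of the request.
        sock.shutdown(socket.SHUT_WR)
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks).decode("ascii")
```

Each call opens and closes its own connection. Sending a second command therefore means calling the function again.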

Field values must backslash-escape the following characters if they occur:

'#' ';' ':' (e.g., '\#')
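The escaping rule can be sketched in Python as follows. The sketch assumes that only the three listed characters need escaping and that a single backslash goes in front of each. This appendix does not say whether a literal backslash must itself be escaped, so the sketch leaves backslashes unchanged. The helper name `wiki_escape` is hypothetical.

```python
# Characters that must be backslash-escaped inside wiki field values.
WIKI_SPECIAL_CHARS = "#;:"


def wiki_escape(value: str) -> str:
    """Prefix each '#', ';' and ':' in a field value with a backslash."""
    return "".join("\\" + ch if ch in WIKI_SPECIAL_CHARS else ch for ch in value)
```

For example, `wiki_escape("a#b:c")` yields the text `a\#b\:c`.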

Supported commands are:

W.1.1.1 Wiki Query Resources

W.1.1.1.1 Wiki Query Resources Request Format

CMD=GETNODES ARG={<UPDATETIME>:<NODEID>[:<NODEID>]... | <UPDATETIME>:ALL}

Only nodes updated more recently than <UPDATETIME> are returned, where <UPDATETIME> is the epoch time of interest. Setting <UPDATETIME> to '0' returns information for all nodes. To query specific nodes, specify a colon-delimited list of NODEIDs. To receive information for all nodes, use the keyword 'ALL'.
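The Python sketch below assembles a GETNODES request by applying the syntax above. The function name `build_getnodes_request` is hypothetical, and the sketch does not escape node IDs.

```python
from typing import Optional, Sequence


def build_getnodes_request(update_time: int = 0,
                           node_ids: Optional[Sequence[str]] = None) -> str:
    """Build a GETNODES request.

    update_time: epoch time; only nodes updated after it are returned (0 = all).
    node_ids: specific NODEIDs to query, or None/empty to use the ALL keyword.
    """
    if update_time < 0:
        raise ValueError("UPDATETIME must be a non-negative epoch time")
    targets = ":".join(node_ids) if node_ids else "ALL"
    return f"CMD=GETNODES ARG={update_time}:{targets}"
```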

W.1.1.1.2 Query Resources Response Format

The query resources response consists of one or more lines of the following format, separated by a newline ('\n'):

<NODEID> <ATTR>=<VALUE>[;<ATTR>=<VALUE>]...

<ATTR> is one of the names in the table below and the format of <VALUE> is dependent on <ATTR>.

W.1.1.1.3 Wiki Query Resources Example

request:

wiki resource query

CMD=GETNODES ARG=0:node001:node002:node003

response:

wiki resource query response

node001 UPDATETIME=963004212;STATE=Busy;OS=AIX43;ARCH=RS6000...
node002 UPDATETIME=963004213;STATE=Busy;OS=AIX43;ARCH=RS6000...
...
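The Python sketch below splits a response like the one above into per-node attribute dictionaries. It assumes:

- one node per line;
- a single space between <NODEID> and the attribute list;
- any ';' inside a value is backslash-escaped as described under W.1.1.

The helper `parse_node_response` is hypothetical. It leaves values as strings.

```python
import re
from typing import Dict

# Split on ';' only when it is not preceded by a backslash escape.
_UNESCAPED_SEMICOLON = re.compile(r"(?<!\\);")


def _unescape(value: str) -> str:
    """Remove the backslash in front of escaped '#', ';' and ':' characters."""
    return re.sub(r"\\([#;:])", r"\1", value)


def parse_node_response(text: str) -> Dict[str, Dict[str, str]]:
    """Map each NODEID to a dict of its ATTR -> VALUE strings."""
    nodes: Dict[str, Dict[str, str]] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # The NODEID ends at the first space; the rest is the attribute list.
        node_id, _, attr_text = line.partition(" ")
        attrs: Dict[str, str] = {}
        for pair in _UNESCAPED_SEMICOLON.split(attr_text):
            if not pair:
                continue  # tolerate a trailing ';'
            # Split at the first '=', so 'GMETRIC[temp]=103.5' keeps its key.
            name, _, value = pair.partition("=")
            attrs[name] = _unescape(value)
        nodes[node_id] = attrs
    return nodes
```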

W.1.1.1.4 Wiki Query Resources Data Format

NAME	FORMAT	DEFAULT	DESCRIPTION
ACLASS	one or more bracket-enclosed <NAME>:<COUNT> pairs (e.g., [batch:5][sge:3])	---	run classes currently available on node. If not specified, the scheduler will attempt to determine the actual ACLASS value.
ADISK	<INTEGER>	0	available local disk on node (in MB)
AFS	<fs id="X" size="X" io="Y" rcount="X" wcount="X" ocount="X"></fs>[...]	0	available filesystem state
AMEMORY	<INTEGER>	0	available/free RAM on node (in MB)
ANET	one or more colon-delimited <STRING>s (e.g., ETHER:ATM)	---	Available network interfaces on node. Available interfaces are those which are 'up' and not already dedicated to a job.
APROC	<INTEGER>	1	available processors on node
ARCH	<STRING>	---	compute architecture of node
ARES	one or more comma-delimited <NAME>:<VALUE> pairs (e.g., MATLAB:6,COMPILER:100)	---	Arbitrary consumable resources currently available on the node
ASWAP	<INTEGER>	0	available swap on node (in MB)
CCLASS	one or more bracket-enclosed <NAME>:<COUNT> pairs (e.g., [batch:5][sge:3])	---	Run classes supported by node. Typically, one class instance is 'consumed' per task. Thus, an 8-processor node may have 8 instances of each class it supports, e.g., [batch:8][interactive:8]
CDISK	<INTEGER>	0	configured local disk on node (in MB)
CFS	<STRING>	0	configured filesystem state
CMEMORY	<INTEGER>	0	configured RAM on node (in MB)
CNET	one or more colon-delimited <STRING>s (e.g., ETHER:FDDI:ATM)	---	configured network interfaces on node
CPROC	<INTEGER>	1	configured processors on node
CPULOAD	<DOUBLE>	0.0	one minute BSD load average
CRES	one or more comma-delimited <NAME>:<VALUE> pairs (e.g., MATLAB:6,COMPILER:100)	---	Arbitrary consumable resources supported and tracked on the node, e.g., software licenses or tape drives.
CSWAP	<INTEGER>	0	configured swap on node (in MB)
CURRENTTASK	<INTEGER>	0	Number of tasks currently active on the node
EVENT	<STRING>	---	Event or exception which occurred on the node
FEATURE	one or more colon-delimited <STRING>s (e.g., WIDE:HSM)	---	generic attributes, often describing hardware or software features, associated with the node.
GCOUNTER	<INTEGER>	---	current total number of gevent event occurrences since epoch. This value should be monotonically increasing.
GEVENT	GEVENT[<EVENTNAME>]=<STRING>	---	generic event occurrence and context data.
GMETRIC	GMETRIC[<METRICNAME>]=<DOUBLE>	---	current value of generic metric, e.g., 'GMETRIC[temp]=103.5'.
IDLETIME	<INTEGER>	---	number of seconds since last detected keyboard or mouse activity (often used with desktop harvesting)
MAXTASK	<INTEGER>	<CPROC>	Maximum number of tasks allowed on the node at any given time
OS	<STRING>	---	operating system running on node
OSLIST	<STRING>	---	operating systems accepted by node
OTHER	<ATTR>=<VALUE>[,<ATTR>=<VALUE>]...	---	opaque node attributes assigned to node
PARTITION	<STRING>	DEFAULT	partition to which node belongs
RACK	<INTEGER>	0	Rack location of the node
SLOT	<INTEGER>	0	Slot location of the node
SPEED	<DOUBLE>	1.0	Relative processor speed of the node
STATE*	one of the following: Idle, Running, Busy, Unknown, Drained, Draining, or Down	Down	state of the node
UPDATETIME*	<EPOCHTIME>	0	time node information was last updated
VARIABLE*	<ATTR>=<VAL>	---	generic variables to be associated with node

* indicates required field

Note: node states have the following definitions:

Busy: Node is running some jobs and will not accept additional jobs
Down: Resource Manager problems have been detected. Node is incapable of running jobs.
Draining: Node is responding but will not accept new jobs
Idle: Node is ready to run jobs but currently is not running any.
Running: Node is running some jobs and will accept additional jobs
Unknown: Node is capable of running jobs but the scheduler will need to determine if the node state is actually Idle, Running, or Busy.

W.1.1.2 Wiki Query Workload

W.1.1.2.1 Wiki Query Workload Request Format

CMD=GETJOBS ARG={<UPDATETIME>:<JOBID>[:<JOBID>]... | <UPDATETIME>:ALL}

Only jobs updated more recently than <UPDATETIME> are returned, where <UPDATETIME> is the epoch time of interest. Setting <UPDATETIME> to '0' returns information for all jobs. To query specific jobs, specify a colon-delimited list of JOBIDs. To receive information about all jobs, use the keyword 'ALL'.

W.1.1.2.2 Wiki Query Workload Response Format

SC=<STATUSCODE> ARG=<JOBCOUNT>#<JOBID>:<FIELD>=<VALUE>;[<FIELD>=<VALUE>;]...[#<JOBID>:<FIELD>=<VALUE>;[<FIELD>=<VALUE>;]...]...

SC=<STATUSCODE> RESPONSE=<RESPONSE>

FIELD is either the text name listed below or 'A<FIELDNUM>' (e.g., 'UPDATETIME' or 'A2').

STATUSCODE values:

0 SUCCESS
-1 INTERNAL ERROR

RESPONSE is a status-code-dependent message describing error or state details.
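The Python sketch below parses a GETJOBS reply in either of the two forms above. It checks the parsed job count against <JOBCOUNT>. FIELD keys are kept verbatim, because this appendix does not list the A<FIELDNUM> numbering. The sketch assumes delimiters inside values are backslash-escaped. The name `parse_getjobs_response` is hypothetical.

```python
import re
from typing import Dict, Tuple

# Delimiters of the ARG payload, matched only when not backslash-escaped.
_HASH = re.compile(r"(?<!\\)#")
_COLON = re.compile(r"(?<!\\):")
_SEMI = re.compile(r"(?<!\\);")


def parse_getjobs_response(text: str) -> Tuple[int, Dict[str, Dict[str, str]]]:
    """Return (status code, {JOBID: {FIELD: VALUE}}) for a GETJOBS reply."""
    head, _, body = text.strip().partition(" ")
    if not head.startswith("SC="):
        raise ValueError(f"missing SC= prefix: {text!r}")
    status = int(head[len("SC="):])
    if not body.startswith("ARG="):
        # Error/state form: SC=<STATUSCODE> RESPONSE=<RESPONSE>
        raise RuntimeError(f"GETJOBS failed with SC={status}: {body}")

    count_text, *records = _HASH.split(body[len("ARG="):])
    jobs: Dict[str, Dict[str, str]] = {}
    for record in records:
        # <JOBID> ends at the first unescaped ':'.
        parts = _COLON.split(record, maxsplit=1)
        job_id = parts[0]
        field_text = parts[1] if len(parts) > 1 else ""
        fields: Dict[str, str] = {}
        for pair in _SEMI.split(field_text):
            if pair:  # the format allows a trailing ';'
                name, _, value = pair.partition("=")
                fields[name] = re.sub(r"\\([#;:])", r"\1", value)
        jobs[job_id] = fields
    if len(jobs) != int(count_text):
        raise ValueError(f"expected {count_text} jobs, parsed {len(jobs)}")
    return status, jobs
```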

W.1.1.2.3 Wiki Query Workload Example

request syntax

CMD=GETJOBS ARG=0:ALL

response syntax

ARG=2#nebo3001.0:UPDATETIME=9780000320;STATE=Idle;WCLIMIT=3600;...

W.1.1.2.4 Wiki Query Workload Data Format

NAME	FORMAT	DEFAULT	DESCRIPTION
ACCOUNT	<STRING>	---	AccountID associated with job
ALLOCSIZE	<INTEGER>	---	number of application tasks to allocate at each allocation adjustment.
APPBACKLOG	<DOUBLE>	---	backlogged quantity of workload for associated application (units are opaque), value may be compared against TARGETBACKLOG
APPLOAD	<DOUBLE>	---	load of workload for associated application (units are opaque), value may be compared against TARGETLOAD
APPRESPONSETIME	<DOUBLE>	---	response time of workload for associated application (units are opaque), value may be compared against TARGETRESPONSETIME
APPTHROUGHPUT	<DOUBLE>	---	throughput of workload for associated application (units are opaque), value may be compared against TARGETTHROUGHPUT
ARGS	<STRING>	---	job command-line arguments
COMMENT	<STRING>	0	job resource manager extension arguments including qos, dependencies, reservation constraints, etc
COMPLETETIME*	<EPOCHTIME>	0	time job completed execution
DDISK	<INTEGER>	0	quantity of local disk space (in MB) which must be dedicated to each task of the job
DGRES	name:value[,name:value]	---	Dedicated generic resources per task.
DPROCS	<INTEGER>	1	number of processors dedicated per task
DNETWORK	<STRING>	---	network adapter which must be dedicated to job
DSWAP	<INTEGER>	0	quantity of virtual memory (swap, in MB) which must be dedicated to each task of the job
ENDDATE	<EPOCHTIME>	[ANY]	time by which job must complete
ENV	<STRING>	---	job environment variables
EVENT	<EVENT>	---	event or exception experienced by job
ERROR	<STRING>	---	file to contain STDERR
EXEC	<STRING>	---	job executable command
EXITCODE	<INTEGER>	---	job exit code
FLAGS	<STRING>	---	job flags
GEOMETRY	<STRING>	---	String describing task geometry required by job
GNAME*	<STRING>	---	GroupID under which job will run
HOSTLIST	comma- or colon-delimited list of hostnames; suffix the hostlist with a caret (^) to mean superset or with an asterisk (*) to mean subset; otherwise, the hostlist is interpreted as an exact set	[ANY]	list of required hosts on which job must run. (see TASKLIST)
INPUT	<STRING>	---	file containing STDIN
IWD	<STRING>	---	job's initial working directory
NAME	<STRING>	---	User specified name of job
NODERANGE	<INTEGER>[,<INTEGER>]	---	Minimum and maximum nodes allowed to be allocated to job.
NODES	<INTEGER>	1	Number of nodes required by job (See Node Definition for more info)
OUTPUT	<STRING>	---	file to contain STDOUT
PARTITIONMASK	one or more colon delimited <STRING>s	[ANY]	list of partitions in which job can run
PREF	colon delimited list of <STRING>s	---	List of preferred node features or variables. (See PREF for more information.)
PRIORITY	<INTEGER>	---	system priority (absolute or relative - use '+' and '-' to specify relative)
QOS	<INTEGER>	0	quality of service requested
QUEUETIME*	<EPOCHTIME>	0	time job was submitted to resource manager
RARCH	<STRING>	---	architecture required by job
RCLASS	list of bracket-enclosed <STRING>:<INTEGER> pairs	---	list of <CLASSNAME>:<COUNT> pairs indicating type and number of class instances required per task (e.g., '[batch:1]' or '[batch:2][tape:1]')
RDISK	<INTEGER>	0	local disk space (in MB) required to be configured on nodes allocated to the job
RDISKCMP	one of '>=', '>', '==', '<', or '<='	>=	local disk comparison (e.g., node must have > 2048 MB local disk)
REJCODE	<INTEGER>	0	reason job was rejected
REJCOUNT	<INTEGER>	0	number of times job was rejected
REJMESSAGE	<STRING>	---	text description of reason job was rejected
REQRSV	<STRING>	---	Name of reservation in which job must run
RESACCESS	<STRING>	---	List of reservations in which job can run
RFEATURES	colon-delimited list of <STRING>s	---	List of features required on nodes
RMEM	<INTEGER>	0	real memory (RAM, in MB) required to be configured on nodes allocated to the job
RMEMCMP	one of '>=', '>', '==', '<', or '<='	>=	real memory comparison (e.g., node must have >= 512 MB RAM)
RNETWORK	<STRING>	---	network adapter required by job
ROPSYS	<STRING>	---	operating system required by job
RSOFTWARE	<RESTYPE>[{+|:}<COUNT>][@<TIMEFRAME>]	---	software required by job
RSWAP	<INTEGER>	0	virtual memory (swap, in MB) required to be configured on nodes allocated to the job
RSWAPCMP	one of '>=', '>', '==', '<', or '<='	>=	virtual memory comparison (e.g., node must have == 4096 MB virtual memory)
SID	<STRING>	---	system id (global job system owner)
SJID	<STRING>	---	system job id (global job id)
STARTDATE	<EPOCHTIME>	0	earliest time job should be allowed to start
STARTTIME*	<EPOCHTIME>	0	time job was started by the resource manager
STATE*	one of Idle, Running, Hold, Suspended, Completed, or Removed	Idle	State of job
SUSPENDTIME	<INTEGER>	0	Number of seconds job has been suspended
TARGETBACKLOG	<DOUBLE>[,<DOUBLE>]	---	Minimum and maximum backlog for application within job.
TARGETLOAD	<DOUBLE>[,<DOUBLE>]	---	Minimum and maximum load for application within job.
TARGETRESPONSETIME	<DOUBLE>[,<DOUBLE>]	---	Minimum and maximum response time for application within job.
TARGETTHROUGHPUT	<DOUBLE>[,<DOUBLE>]	---	Minimum and maximum throughput for application within job.
TARGETVIOLATIONTIME	<ALLOCATIONTIME>[,<DEALLOCATIONTIME>] where values are specified using the format [[[DD:]HH:]MM:]SS	---	By default, Moab allocates/deallocates resources as soon as a performance target violation is detected.
TASKLIST	one or more comma-delimited <STRING>s	---	list of allocated tasks; in other words, a comma-delimited list of the node IDs associated with each active task of the job (e.g., cl01, cl02, cl01, cl02, cl03). The tasklist is initially selected by the scheduler at the time the StartJob command is issued. The resource manager is then responsible for starting the job on these nodes and maintaining this task distribution information throughout the life of the job. (see HOSTLIST)
TASKS*	<INTEGER>	1	Number of tasks required by job (See Task Definition for more info)
TASKPERNODE	<INTEGER>	0	exact number of tasks required per node
UNAME*	<STRING>	---	UserID under which job will run
UPDATETIME*	<EPOCHTIME>	0	Time job was last updated
WCLIMIT*	[[HH:]MM:]SS	864000	walltime required by job

* indicates required field
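WCLIMIT uses the form [[HH:]MM:]SS, and TARGETVIOLATIONTIME uses [[[DD:]HH:]MM:]SS. The Python sketch below converts either form to seconds. It assumes:

- every component is a non-negative integer;
- a bare value is a count of seconds, as with the WCLIMIT default of 864000.

`wiki_duration_to_seconds` is a hypothetical helper.

```python
def wiki_duration_to_seconds(text: str, max_parts: int = 4) -> int:
    """Convert '[[[DD:]HH:]MM:]SS' to seconds.

    max_parts=3 restricts input to the '[[HH:]MM:]SS' form used by WCLIMIT.
    """
    parts = text.strip().split(":")
    if not 1 <= len(parts) <= max_parts:
        raise ValueError(f"bad duration: {text!r}")
    # Multipliers for SS, MM, HH, DD, applied starting from the right-most field.
    multipliers = (1, 60, 3600, 86400)
    total = 0
    for multiplier, part in zip(multipliers, reversed(parts)):
        if not part.isdigit():
            raise ValueError(f"bad duration component {part!r} in {text!r}")
        total += int(part) * multiplier
    return total
```

A TARGETVIOLATIONTIME value such as `5:00,10:00` would first be split on the comma. Each half is then converted separately.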

Note: Job states have the following definitions:

Completed: Job has completed
Hold: Job is in the queue but is not allowed to run
Idle: Job is ready to run
Removed: Job has been canceled or otherwise terminated externally
Running: Job is currently executing
Suspended: Job has started but execution has temporarily been suspended

Note: Completed and canceled jobs should be maintained by the resource manager for a brief time, perhaps 1 to 5 minutes, before being purged. This provides the scheduler time to obtain all final job state information for scheduler statistics.

W.1.1.3 StartJob

The 'StartJob' command may only be applied to jobs in the 'Idle' state. It causes the job to begin running on the nodes listed in TASKLIST.

send CMD=STARTJOB ARG=<JOBID> TASKLIST=<NODEID>[:<NODEID>]...

receive SC=<STATUSCODE> RESPONSE=<RESPONSE>

           STATUSCODE >= 0 indicates SUCCESS
           STATUSCODE < 0 indicates FAILURE
           RESPONSE   is a text message possibly further describing an error or state

job start example

# Start job nebo.1 on nodes cluster001 and cluster002
send 'CMD=STARTJOB ARG=nebo.1 TASKLIST=cluster001:cluster002'
receive 'SC=0 RESPONSE=job nebo.1 started with 2 tasks'
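Every command in this section replies with SC=<STATUSCODE> RESPONSE=<RESPONSE>, where a non-negative status code means success. The hypothetical Python helper below interprets that reply. As a defensive measure, it also accepts a ';' after the status code in place of the space.

```python
import re
from typing import NamedTuple

# SC=<STATUSCODE>, optionally followed by a space (or ';') and RESPONSE=<RESPONSE>.
_REPLY = re.compile(r"^SC=(-?\d+)(?:[ ;]RESPONSE=(.*))?$", re.DOTALL)


class WikiReply(NamedTuple):
    status: int
    response: str

    @property
    def ok(self) -> bool:
        # STATUSCODE >= 0 indicates SUCCESS; < 0 indicates FAILURE.
        return self.status >= 0


def parse_status_reply(text: str) -> WikiReply:
    match = _REPLY.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized reply: {text!r}")
    return WikiReply(int(match.group(1)), match.group(2) or "")
```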

W.1.1.4 CancelJob

The 'CancelJob' command, if applied to an active job, will terminate its execution. If applied to an idle or active job, the CancelJob command will change the job's state to 'Canceled'.

send CMD=CANCELJOB ARG=<JOBID> TYPE=<CANCELTYPE>

<CANCELTYPE> is one of the following:

ADMIN (command initiated by scheduler administrator)
WALLCLOCK (command initiated by scheduler because job exceeded its specified wallclock limit)

receive SC=<STATUSCODE> RESPONSE=<RESPONSE>

           STATUSCODE >= 0 indicates SUCCESS
           STATUSCODE < 0 indicates FAILURE
           RESPONSE   is a text message further describing an error or state

job cancel example

# Cancel job nebo.2
send 'CMD=CANCELJOB ARG=nebo.2 TYPE=ADMIN'
receive 'SC=0 RESPONSE=job nebo.2 canceled'
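The Python sketch below builds a CANCELJOB request and rejects any <CANCELTYPE> other than the two listed above. The function name and the choice of ADMIN as the default are assumptions of the sketch.

```python
# The only cancel types listed for CMD=CANCELJOB.
CANCEL_TYPES = {"ADMIN", "WALLCLOCK"}


def build_canceljob_request(job_id: str, cancel_type: str = "ADMIN") -> str:
    """Build CMD=CANCELJOB, validating TYPE against the documented values."""
    if cancel_type not in CANCEL_TYPES:
        raise ValueError(f"TYPE must be one of {sorted(CANCEL_TYPES)}, got {cancel_type!r}")
    return f"CMD=CANCELJOB ARG={job_id} TYPE={cancel_type}"
```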

W.1.1.5 SuspendJob

The 'SuspendJob' command can only be issued against a job in the state 'Running'. This command suspends job execution and results in the job changing to the 'Suspended' state.

send CMD=SUSPENDJOB ARG=<JOBID>

receive SC=<STATUSCODE> RESPONSE=<RESPONSE>

           STATUSCODE >= 0 indicates SUCCESS
           STATUSCODE < 0 indicates FAILURE
           RESPONSE   is a text message possibly further describing an error or state

job suspend example

# Suspend job nebo.3
send 'CMD=SUSPENDJOB ARG=nebo.3'
receive 'SC=0 RESPONSE=job nebo.3 suspended'

W.1.1.6 ResumeJob

The 'ResumeJob' command can only be issued against a job in the state 'Suspended'. This command resumes a suspended job, returning it to the 'Running' state.

send CMD=RESUMEJOB ARG=<JOBID>

receive SC=<STATUSCODE> RESPONSE=<RESPONSE>

           STATUSCODE >= 0 indicates SUCCESS
           STATUSCODE < 0 indicates FAILURE
           RESPONSE   is a text message further describing an error or state

job resume example

# Resume job nebo.3
send 'CMD=RESUMEJOB ARG=nebo.3'
receive 'SC=0 RESPONSE=job nebo.3 resumed'

W.1.1.7 RequeueJob

The 'RequeueJob' command can only be issued against an active job in the state 'Starting' or 'Running'. This command requeues the job, stopping execution and returning the job to an idle state in the queue. The requeued job will be eligible for execution the next time resources are available.

send CMD=REQUEUEJOB ARG=<JOBID>

receive SC=<STATUSCODE> RESPONSE=<RESPONSE>

           STATUSCODE >= 0 indicates SUCCESS
           STATUSCODE < 0 indicates FAILURE
           RESPONSE   is a text message further describing an error or state

job requeue example

# Requeue job nebo.3
send 'CMD=REQUEUEJOB ARG=nebo.3'
receive 'SC=0 RESPONSE=job nebo.3 requeued'
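StartJob, SuspendJob, ResumeJob, and RequeueJob each restrict the job states in which they may be issued. A client could check these restrictions before sending a command. The Python sketch below encodes only the restrictions stated in this appendix. 'Starting' appears in the RequeueJob text but not in the job state table; it is included here as written. The resource manager remains the authority and may still reject a command. The names in the sketch are hypothetical.

```python
# Job states in which each command may be issued, as stated in this appendix.
ALLOWED_STATES = {
    "STARTJOB": {"Idle"},
    "SUSPENDJOB": {"Running"},
    "RESUMEJOB": {"Suspended"},
    "REQUEUEJOB": {"Starting", "Running"},
}

# State that each successful command is described as producing.
RESULT_STATE = {
    "STARTJOB": "Running",
    "SUSPENDJOB": "Suspended",
    "RESUMEJOB": "Running",
    "REQUEUEJOB": "Idle",
}


def check_transition(command: str, current_state: str) -> str:
    """Return the expected new state, or raise if the command is not allowed."""
    allowed = ALLOWED_STATES.get(command)
    if allowed is None:
        raise KeyError(f"no state rule recorded for {command}")
    if current_state not in allowed:
        raise ValueError(f"{command} requires state in {sorted(allowed)}, job is {current_state}")
    return RESULT_STATE[command]
```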

W.1.1.8 SignalJob

The 'SignalJob' command can only be issued against an active job in the state 'Starting' or 'Running'. This command sends the specified signal to the job's master process. The signalled job remains in the same state it was in before the signal was issued.

send CMD=SIGNALJOB ARG=<JOBID> ACTION=signal VALUE=<SIGNAL>

receive SC=<STATUSCODE> RESPONSE=<RESPONSE>

           STATUSCODE >= 0 indicates SUCCESS
           STATUSCODE < 0 indicates FAILURE
           RESPONSE   is a text message further describing an error or state

job signal example

# Signal job nebo.3
send 'CMD=SIGNALJOB ARG=nebo.3 ACTION=signal VALUE=13'
receive 'SC=0 RESPONSE=job nebo.3 signalled'
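The VALUE field carries a signal number, which is 13 in the example above. The hypothetical Python builder below accepts either a number or a signal name, resolving names through the standard `signal` module. Signal numbers can differ between platforms. Resolving a name on the scheduler host therefore assumes that host uses the same numbering as the compute nodes.

```python
import signal
from typing import Union


def build_signaljob_request(job_id: str, sig: Union[int, str]) -> str:
    """Build CMD=SIGNALJOB; sig may be a number or a name such as 'SIGTERM'."""
    if isinstance(sig, str):
        # Look the name up in this host's signal table; unknown names raise KeyError.
        number = signal.Signals[sig].value
    else:
        number = int(sig)
    if number <= 0:
        raise ValueError("signal number must be positive")
    return f"CMD=SIGNALJOB ARG={job_id} ACTION=signal VALUE={number}"
```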

W.1.1.9 ModifyJob

The 'ModifyJob' command can be issued against any active or queued job. This command modifies specified attributes of the job.

send CMD=MODIFYJOB ARG=<JOBID> [BANK=name] [NODES=num] [PARTITION=name] [TIMELIMIT=minutes]

receive SC=<STATUSCODE> RESPONSE=<RESPONSE>

           STATUSCODE >= 0 indicates SUCCESS
           STATUSCODE < 0 indicates FAILURE
           RESPONSE   is a text message further describing an error or state

job modify example

# Modify job nebo.3, setting its time limit to 9600 minutes
send 'CMD=MODIFYJOB ARG=nebo.3 TIMELIMIT=9600'
receive 'SC=0 RESPONSE=job nebo.3 modified'
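The optional attributes map naturally onto keyword arguments. The Python sketch below accepts only the four attributes listed above and takes TIMELIMIT in minutes, as the syntax line states. The function name is hypothetical. Requiring at least one attribute is a choice of the sketch, not a documented rule.

```python
from typing import Optional


def build_modifyjob_request(job_id: str, *, bank: Optional[str] = None,
                            nodes: Optional[int] = None,
                            partition: Optional[str] = None,
                            timelimit_minutes: Optional[int] = None) -> str:
    """Build CMD=MODIFYJOB with only the attributes that were supplied."""
    fields = [f"CMD=MODIFYJOB ARG={job_id}"]
    # Keep the attribute order used in the syntax line.
    for name, value in (("BANK", bank), ("NODES", nodes),
                        ("PARTITION", partition), ("TIMELIMIT", timelimit_minutes)):
        if value is not None:
            fields.append(f"{name}={value}")
    if len(fields) == 1:
        raise ValueError("MODIFYJOB needs at least one attribute to change")
    return " ".join(fields)
```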

W.1.1.10 JobAddTask

The 'JobAddTask' command allocates additional tasks to an active job.

send CMD=JOBADDTASK ARG=<JOBID> <NODEID> [<NODEID>]...

receive SC=<STATUSCODE> RESPONSE=<RESPONSE>

           STATUSCODE >= 0 indicates SUCCESS
           STATUSCODE < 0 indicates FAILURE
           RESPONSE   is a text message possibly further describing an error or state

job addtask example

# Add 3 default tasks to job nebo30023.0 using resources located on nodes cluster002, cluster016, and cluster112.
send 'CMD=JOBADDTASK ARG=nebo30023.0 DEFAULT cluster002 cluster016 cluster112'
receive 'SC=0 RESPONSE=3 tasks added'
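The example above places a `DEFAULT` token between the job ID and the node list, which the syntax line does not show. The Python sketch below therefore takes that leading token as an optional parameter. Its meaning is inferred only from the example comment ("3 default tasks") and should be treated as an assumption. The helper name is hypothetical.

```python
from typing import Optional, Sequence


def build_jobaddtask_request(job_id: str, node_ids: Sequence[str],
                             task_spec: Optional[str] = "DEFAULT") -> str:
    """Build CMD=JOBADDTASK with space-separated node IDs."""
    if not node_ids:
        raise ValueError("at least one NODEID is required")
    tokens = [job_id]
    if task_spec:
        # Assumption: optional token before the node list, as in the example.
        tokens.append(task_spec)
    tokens.extend(node_ids)
    return "CMD=JOBADDTASK ARG=" + " ".join(tokens)
```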

W.1.1.11 JobRemoveTask

The 'JobRemoveTask' command removes tasks from an active job.

send CMD=JOBREMOVETASK ARG=<JOBID> <TASKID> [<TASKID>]...

receive SC=<STATUSCODE> RESPONSE=<RESPONSE>

           STATUSCODE >= 0 indicates SUCCESS
           STATUSCODE < 0 indicates FAILURE
           RESPONSE   is a text message further describing an error or state

job removetask example

# Free resources allocated to tasks 14, 15, and 16 of job nebo30023.0
send 'CMD=JOBREMOVETASK ARG=nebo30023.0 14 15 16'
receive 'SC=0 RESPONSE=3 tasks removed'

W.1.2 Rejection Codes

0xx - success - no error

00x - success

000 - success

01x - usage/help reply

010 - usage/help reply

02x - status reply

020 - general status reply

1xx - warning

10x - general warning

100 - general warning

11x - no content

110 - general wire protocol or network warning
112 - redirect
114 - protocol warning

12x - no matching results

120 - general message format warning
122 - incomplete specification (best guess action/response applied)

13x - security warning

130 - general security warning
132 - insecure request
134 - insufficient privileges (response was censored/action reduced in scope)

14x - content or action warning

140 - general content/action warning
142 - no content (server has processed the request but there is no data to be returned)
144 - no action (no object to act upon)
146 - partial content
148 - partial action

15x - component defined
18x - application defined

2xx - wire protocol/network failure

20x - protocol failure

200 - general protocol/network failure

21x - network failure

210 - general network failure
212 - cannot resolve host
214 - cannot resolve port
216 - cannot create socket
218 - cannot bind socket

22x - connection failure

220 - general connection failure
222 - cannot connect to service
224 - cannot send data
226 - cannot receive data

23x - connection rejected

230 - general connection failure
232 - connection timed-out
234 - connection rejected - too busy
236 - connection rejected - message too big

24x - malformed framing

240 - general framing failure
242 - malformed framing protocol
244 - invalid message size
246 - unexpected end of file

25x - component defined
28x - application defined

3xx - messaging format error

30x - general messaging format error

300 - general messaging format error

31x - malformed XML document

310 - general malformed XML error

32x - XML schema validation error

320 - general XML schema validation

33x - general syntax error in request

330 - general request syntax error
332 - object incorrectly specified
334 - action incorrectly specified
336 - option/parameter incorrectly specified

34x - general syntax error in response

340 - general response syntax error
342 - object incorrectly specified
344 - action incorrectly specified
346 - option/parameter incorrectly specified

35x - synchronization failure

350 - general synchronization failure
352 - request identifier is not unique
354 - request id values do not match
356 - request id count does not match

4xx - security error occurred

40x - authentication failure - client signature

400 - general client signature failure
402 - invalid authentication type
404 - cannot generate security token key - inadequate information
406 - cannot canonicalize request
408 - cannot sign request

41x - negotiation failure

410 - general negotiation failure
412 - negotiation request malformed
414 - negotiation request not understood
416 - negotiation request not supported

42x - authentication failure

420 - general authentication failure
422 - client signature failure
424 - server authentication failure
426 - server signature failure
428 - client authentication failure

43x - encryption failure

430 - general encryption failure
432 - client encryption failure
434 - server decryption failure
436 - server encryption failure
438 - client decryption failure

44x - authorization failure

440 - general authorization failure
442 - client authorization failure
444 - server authorization failure

45x - component defined failure
48x - application defined failure

5xx - event management request failure

50x - reserved

500 - reserved

6xx - reserved for future use

60x - reserved

600 - reserved

7xx - server side error occurred

70x - server side error

700 - general server side error

71x - server does not support requested function

710 - server does not support requested function

72x - internal server error

720 - general internal server error

73x - resource unavailable

730 - general resource unavailable error
732 - software resource unavailable error
734 - hardware resource unavailable error

74x - request violates policy

740 - general policy violation

75x - component-defined failure
78x - application-defined failure

8xx - client side error occurred

80x - general client side error

800 - general client side error

81x - request not supported

810 - request not supported

82x - application specific failure

820 - general application specific failure

9xx - miscellaneous

90x - general miscellaneous error

900 - general miscellaneous error

91x - general insufficient resources error

910 - general insufficient resources error

99x - general unknown error

999 - unknown error
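The codes are hierarchical. The hundreds digit gives the broad class and the tens digit a subcategory, so even a code not listed individually can be classified by its prefix. The Python sketch below uses the class names from the list above. The helper `classify_code` is hypothetical.

```python
from typing import Tuple

# Broad classes keyed by the hundreds digit, taken from the list above.
CODE_CLASSES = {
    0: "success",
    1: "warning",
    2: "wire protocol/network failure",
    3: "messaging format error",
    4: "security error",
    5: "event management request failure",
    6: "reserved for future use",
    7: "server side error",
    8: "client side error",
    9: "miscellaneous",
}


def classify_code(code: int) -> Tuple[str, str]:
    """Return (class name, subcategory prefix such as '22x') for a 3-digit code."""
    if not 0 <= code <= 999:
        raise ValueError(f"rejection codes have three digits: {code}")
    hundreds, rest = divmod(code, 100)
    return CODE_CLASSES[hundreds], f"{hundreds}{rest // 10}x"
```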
