COMMANDS:
All commands are requested via a socket interface, one command per socket connection. All fields and values are specified in ASCII text. Maui is configured to communicate via the wiki interface by specifying the following parameters in the maui.cfg file:
RMTYPE[X] WIKI
RMSERVER[X] <HOSTNAME>
RMPORT[X] <PORTNUMBER>
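The one-command-per-connection exchange can be sketched in Python as follows. This section does not say how a message is terminated, so the sketch assumes the client half-closes the socket after sending and then reads until the server closes its side. The names `exchange` and `send_wiki_command`, and the timeout parameter, are illustrative and not part of the interface.

```python
import socket

def exchange(sock: socket.socket, command: str) -> str:
    """Send one ASCII command on a connected socket and read the full reply."""
    sock.sendall(command.encode("ascii"))
    # Assumption: half-closing signals the end of the command to the server.
    sock.shutdown(socket.SHUT_WR)
    chunks = []
    while True:
        data = sock.recv(4096)
        if not data:  # peer closed its side, so the reply is complete
            break
        chunks.append(data)
    return b"".join(chunks).decode("ascii")

def send_wiki_command(host: str, port: int, command: str, timeout: float = 10.0) -> str:
    """Open a fresh connection for a single command, then close it."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        return exchange(sock, command)
```

Here `host` and `port` would be the values configured as RMSERVER[X] and RMPORT[X].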
Field values must backslash escape the following characters if specified:
'#' ';' ':' (ie '\#')
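A minimal Python sketch of this escaping rule is shown below. The function names are hypothetical, and the sketch assumes the backslash itself needs no escaping, since this section does not mention it.

```python
import re

def escape_value(value: str) -> str:
    """Prefix each '#', ';' and ':' in a field value with a backslash."""
    return re.sub(r"([#;:])", r"\\\1", value)

def unescape_value(value: str) -> str:
    """Reverse escape_value by dropping the backslash before '#', ';' or ':'."""
    return re.sub(r"\\([#;:])", r"\1", value)
```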
Supported Commands are:
GETNODES, GETJOBS, STARTJOB, CANCELJOB, SUSPENDJOB, RESUMEJOB, JOBADDTASK, JOBRELEASETASK
send
CMD=GETNODES ARG={<UPDATETIME>:<NODEID>[:<NODEID>]... | <UPDATETIME>:ALL}
Only nodes updated more recently than <UPDATETIME> will be returned where <UPDATETIME> is specified as the epoch time of interest. Setting <UPDATETIME> to '0' will return information for all nodes. Specify a colon delimited list of NODEID's if specific nodes are desired or use the keyword 'ALL' to receive information for all nodes.
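As an illustration, a request string of this form could be assembled as in the sketch below. `build_getnodes` is a hypothetical helper, not part of the interface; it falls back to the 'ALL' keyword when no node IDs are given.

```python
def build_getnodes(update_time: int = 0, node_ids=None) -> str:
    """Return a GETNODES command; an empty or missing node list means 'ALL'."""
    if update_time < 0:
        raise ValueError("update_time must be a non-negative epoch time")
    target = ":".join(node_ids) if node_ids else "ALL"
    return f"CMD=GETNODES ARG={update_time}:{target}"
```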
receive
SC=<STATUSCODE> ARG=<NODECOUNT>#<NODEID>:<FIELD>=<VALUE>;[<FIELD>=<VALUE>;]...[#<NODEID>:<FIELD>=<VALUE>;[<FIELD>=<VALUE>;]...]...
or
SC=<STATUSCODE> RESPONSE=<RESPONSE>
STATUSCODE Values:
0 SUCCESS
-1 INTERNAL ERROR
FIELD is either the text name listed below or 'A<FIELDNUM>' (ie, 'UPDATETIME' or 'A2')
RESPONSE is a statuscode sensitive message describing error or state details
EXAMPLE:
send 'CMD=GETNODES ARG=0:node001:node002:node003'
receive 'SC=0 ARG=3#node001:UPDATETIME=963004212;STATE=Busy;OS=AIX43;ARCH=RS6000...'
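A successful reply in this format can be taken apart with a few splits. The sketch below is one possible approach, not the scheduler's own parser: `parse_records` is a hypothetical name, values are returned still escaped, and only the first unescaped ':' of each record is treated as the ID separator, so colon-delimited values such as ETHER:ATM stay intact.

```python
import re

def parse_records(reply: str):
    """Parse 'SC=<code> ARG=<count>#<id>:<field>=<value>;...' into (count, {id: fields})."""
    match = re.match(r"SC=(-?\d+) ARG=(\d+)(.*)$", reply, re.DOTALL)
    if match is None:
        raise ValueError(f"not a record reply: {reply!r}")
    count = int(match.group(2))
    records = {}
    # Split only on delimiters that are not preceded by a backslash.
    for chunk in re.split(r"(?<!\\)#", match.group(3))[1:]:
        parts = re.split(r"(?<!\\):", chunk, maxsplit=1)
        ident = parts[0]
        body = parts[1] if len(parts) > 1 else ""
        fields = {}
        for item in re.split(r"(?<!\\);", body):
            if item:
                name, _, value = item.partition("=")
                fields[name] = value
        records[ident] = fields
    return count, records
```

The same function applies to any reply that uses this record layout, since only the record IDs and field names differ.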
Field Values
INDEX | NAME | FORMAT | DEFAULT | DESCRIPTION |
1 | UPDATETIME* | <EPOCHTIME> | 0 | time node information was last updated |
2 | STATE* | one of the following: Idle, Running, Busy, Unknown, Draining, or Down | Down | state of the node |
3 | OS | <STRING> | [NONE] | operating system running on node |
4 | ARCH | <STRING> | [NONE] | compute architecture of node |
5 | CMEMORY | <INTEGER> | 0 | configured RAM on node (in MB) |
6 | AMEMORY | <INTEGER> | 0 | available/free RAM on node (in MB) |
7 | CSWAP | <INTEGER> | 0 | configured swap on node (in MB) |
8 | ASWAP | <INTEGER> | 0 | available swap on node (in MB) |
9 | CDISK | <INTEGER> | 0 | configured local disk on node (in MB) |
10 | ADISK | <INTEGER> | 0 | available local disk on node (in MB) |
11 | CPROC | <INTEGER> | 1 | configured processors on node |
12 | APROC | <INTEGER> | 1 | available processors on node |
13 | CNET | one or more colon delimited <STRING>'s (ie, ETHER:FDDI:ATM) | [NONE] | configured network interfaces on node |
14 | ANET | one or more colon delimited <STRING>'s (ie, ETHER:ATM) | [NONE] | Available network interfaces on node. Available interfaces are those which are 'up' and not already dedicated to a job. |
15 | CPULOAD | <DOUBLE> | 0.0 | one minute BSD load average |
16 | CCLASS | one or more bracket enclosed <NAME>:<COUNT> pairs (ie, [batch:5][sge:3]) | [NONE] | Run classes supported by node. Typically, one class is 'consumed' per task. Thus, an 8 processor node may have 8 instances of each class it supports present, ie [batch:8][interactive:8] |
17 | ACLASS | one or more bracket enclosed <NAME>:<COUNT> pairs (ie, [batch:5][sge:3]) | [NONE] | run classes currently available on node. If not specified, scheduler will attempt to determine actual ACLASS value. |
18 | FEATURE | one or more colon delimited <STRING>'s (ie, WIDE:HSM) | [NONE] | generic attributes, often describing hardware or software features, associated with the node. |
19 | PARTITION | <STRING> | DEFAULT | partition to which node belongs |
20 | EVENT | <STRING> | [NONE] | Event or exception which occurred on the node |
21 | CURRENTTASK | <INTEGER> | 0 | Number of tasks currently active on the node |
22 | MAXTASK | <INTEGER> | <CPROC> | Maximum number of tasks allowed on the node at any given time |
23 | SPEED | <DOUBLE> | 1.0 | Relative processor speed of the node |
24 | FRAME | <INTEGER> | 0 | Frame location of the node |
25 | SLOT | <INTEGER> | 0 | Slot location of the node |
26 | CRES | one or more colon delimited <NAME>,<VALUE> pairs (ie, MATLAB,6:COMPILER,100) | [NONE] | Arbitrary consumable resources supported and tracked on the node, ie software licenses or tape drives. |
27 | ARES | one or more colon delimited <NAME>,<VALUE> pairs (ie, MATLAB,6:COMPILER,100) | [NONE] | Arbitrary consumable resources currently available on the node |
* indicates required field
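The two ways of naming a field can be normalized with a lookup over the table above. This sketch assumes that <FIELDNUM> in 'A<FIELDNUM>' is the INDEX column of the table; `canonical_name` and `normalize_record` are hypothetical helpers.

```python
# Node field names in INDEX order (1..27), copied from the table above.
NODE_FIELDS = [
    "UPDATETIME", "STATE", "OS", "ARCH", "CMEMORY", "AMEMORY", "CSWAP",
    "ASWAP", "CDISK", "ADISK", "CPROC", "APROC", "CNET", "ANET", "CPULOAD",
    "CCLASS", "ACLASS", "FEATURE", "PARTITION", "EVENT", "CURRENTTASK",
    "MAXTASK", "SPEED", "FRAME", "SLOT", "CRES", "ARES",
]
REQUIRED_NODE_FIELDS = {"UPDATETIME", "STATE"}

def canonical_name(field: str) -> str:
    """Map 'A<FIELDNUM>' to its text name; pass known text names through."""
    if field.startswith("A") and field[1:].isdigit():
        index = int(field[1:])
        if not 1 <= index <= len(NODE_FIELDS):
            raise KeyError(field)
        return NODE_FIELDS[index - 1]
    if field not in NODE_FIELDS:
        raise KeyError(field)
    return field

def normalize_record(record: dict) -> dict:
    """Rename all keys to text names and check that required fields are present."""
    result = {canonical_name(name): value for name, value in record.items()}
    missing = REQUIRED_NODE_FIELDS - result.keys()
    if missing:
        raise ValueError(f"missing required fields: {sorted(missing)}")
    return result
```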
NOTE 1: node states have the following definitions:
Idle: Node is ready to run jobs but currently is not running any.
Running: Node is running some jobs and will accept additional jobs.
Busy: Node is running some jobs and will not accept additional jobs.
Unknown: Node is capable of running jobs but the scheduler will need to determine if the node state is actually Idle, Running, or Busy.
Draining: Node is responding but will not accept new jobs.
Down: Resource Manager problems have been detected. Node is incapable of running jobs.
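These definitions reduce to a small lookup that answers whether a node will take more work. In this illustrative sketch, `accepts_jobs` is a hypothetical helper and `None` stands for the Unknown case, where the scheduler has to determine the actual state itself.

```python
# True: will accept jobs; False: will not; None: scheduler must determine.
ACCEPTS_JOBS = {
    "Idle": True,
    "Running": True,
    "Busy": False,
    "Unknown": None,
    "Draining": False,
    "Down": False,
}

def accepts_jobs(state: str):
    """Return whether a node in the given state accepts additional jobs."""
    if state not in ACCEPTS_JOBS:
        raise ValueError(f"unknown node state: {state!r}")
    return ACCEPTS_JOBS[state]
```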
send
CMD=GETJOBS ARG={<UPDATETIME>:<JOBID>[:<JOBID>]... | <UPDATETIME>:ALL }
Only jobs updated more recently than <UPDATETIME> will be returned where <UPDATETIME> is specified as the epoch time of interest. Setting <UPDATETIME> to '0' will return information for all jobs. Specify a colon delimited list of JOBID's if information for specific jobs is desired or use the keyword 'ALL' to receive information about all jobs.
receive
SC=<STATUSCODE> ARG=<JOBCOUNT>#<JOBID>:<FIELD>=<VALUE>;[<FIELD>=<VALUE>;]...[#<JOBID>:<FIELD>=<VALUE>;[<FIELD>=<VALUE>;]...]...
or
SC=<STATUSCODE> RESPONSE=<RESPONSE>
FIELD is either the text name listed below or 'A<FIELDNUM>' (ie, 'UPDATETIME' or 'A2')
STATUSCODE values:
0 SUCCESS
-1 INTERNAL ERROR
RESPONSE is a statuscode sensitive message describing error or state details
EXAMPLE:
send 'CMD=GETJOBS ARG=0:ALL'
receive 'SC=0 ARG=2#nebo3001.0:UPDATETIME=9780000320;STATE=Idle;WCLIMIT=3600;...'
Table of Job Field Values
INDEX | NAME | FORMAT | DEFAULT | DESCRIPTION |
1 | UPDATETIME* | <EPOCHTIME> | 0 | Time job was last updated |
2 | STATE* | one of Idle, Running, Hold, Suspended, Completed, or Cancelled | Idle | State of job |
3 | WCLIMIT* | <INTEGER> | 864000 | Seconds of wall time required by job |
4 | TASKS* | <INTEGER> | 1 | Number of tasks required by job |
5 | NODES | <INTEGER> | 1 | Number of nodes required by job |
6 | GEOMETRY | <STRING> | [NONE] | String describing task geometry required by job |
7 | QUEUETIME* | <EPOCHTIME> | 0 | time job was submitted to resource manager |
8 | STARTDATE | <EPOCHTIME> | 0 | earliest time job should be allowed to start |
9 | STARTTIME* | <EPOCHTIME> | 0 | time job was started by the resource manager |
10 | COMPLETIONTIME* | <EPOCHTIME> | 0 | time job completed execution |
11 | UNAME* | <STRING> | [NONE] | UserID under which job will run |
12 | GNAME* | <STRING> | [NONE] | GroupID under which job will run |
13 | ACCOUNT | <STRING> | [NONE] | AccountID associated with job |
14 | RFEATURES | colon delimited list of <STRING>'s | [NONE] | List of features required on nodes |
15 | RNETWORK | <STRING> | [NONE] | network adapter required by job |
16 | DNETWORK | <STRING> | [NONE] | network adapter which must be dedicated to job |
17 | RCLASS | list of bracket enclosed <STRING>:<INTEGER> pairs | [NONE] | list of <CLASSNAME>:<COUNT> pairs indicating type and number of class instances required per task. (ie, '[batch:1]' or '[batch:2][tape:1]') |
18 | ROPSYS | <STRING> | [NONE] | operating system required by job |
19 | RARCH | <STRING> | [NONE] | architecture required by job |
20 | RMEM | <INTEGER> | 0 | real memory (RAM, in MB) required to be configured on nodes allocated to the job |
21 | RMEMCMP | one of '>=', '>', '==', '<', or '<=' | >= | real memory comparison (ie, node must have >= 512MB RAM) |
22 | DMEM | <INTEGER> | 0 | quantity of real memory (RAM, in MB) which must be dedicated to each task of the job |
23 | RDISK | <INTEGER> | 0 | local disk space (in MB) required to be configured on nodes allocated to the job |
24 | RDISKCMP | one of '>=', '>', '==', '<', or '<=' | >= | local disk comparison (ie, node must have > 2048 MB local disk) |
25 | DDISK | <INTEGER> | 0 | quantity of local disk space (in MB) which must be dedicated to each task of the job |
26 | RSWAP | <INTEGER> | 0 | virtual memory (swap, in MB) required to be configured on nodes allocated to the job |
27 | RSWAPCMP | one of '>=', '>', '==', '<', or '<=' | >= | virtual memory comparison (ie, node must have ==4096 MB virtual memory) |
28 | DSWAP | <INTEGER> | 0 | quantity of virtual memory (swap, in MB) which must be dedicated to each task of the job |
29 | PARTITIONMASK | one or more colon delimited <STRING>s | [ANY] | list of partitions in which job can run |
30 | EXEC | <STRING> | [NONE] | job executable command |
31 | IWD | <STRING> | [NONE] | job's initial working directory |
32 | COMMENT | <STRING> | 0 | general job attributes not described by other fields |
33 | REJCOUNT | <INTEGER> | 0 | number of times job was rejected |
34 | REJMESSAGE | <STRING> | [NONE] | text description of reason job was rejected |
35 | REJCODE | <INTEGER> | 0 | reason job was rejected |
36 | EVENT | <EVENT> | [NONE] | event or exception experienced by job |
37 | TASKLIST | one or more colon delimited <STRING>s | [NONE] | nodeid associated with each active task of job (ie, cl01:cl02:cl01:cl02:cl03) |
38 | TASKPERNODE | <INTEGER> | 0 | exact number of tasks required per node |
39 | QOS | <INTEGER> | 0 | quality of service requested |
40 | ENDDATE | <EPOCHTIME> | [ANY] | time by which job must complete |
41 | CBSERVER | <STRING>[:<INTEGER>] | [NONE] | location of server which will handle callback requests in <HOSTNAME>:<PORT> format |
42 | CBTYPE | one or more of the following delimited by colons: CANCEL and START | START:CANCEL | list of callback types requested by job |
43 | DPROCS | <INTEGER> | 1 | number of processors dedicated per task |
44 | SUSPENDTIME | <INTEGER> | 0 | Number of seconds job has been suspended |
45 | RESERVATION | <STRING> | [NONE] | Name of reservation in which job must run |
* indicates required field
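The RMEMCMP, RDISKCMP, and RSWAPCMP fields each pair a comparison operator with a required quantity. One way to evaluate such a requirement against a node's configured value is sketched below; `node_satisfies` is a hypothetical helper, and reading the comparison as "configured value, operator, required value" follows the examples in the table.

```python
import operator

# The five comparison strings allowed by the *CMP fields.
COMPARATORS = {
    ">=": operator.ge,
    ">": operator.gt,
    "==": operator.eq,
    "<": operator.lt,
    "<=": operator.le,
}

def node_satisfies(configured: int, required: int, cmp: str = ">=") -> bool:
    """Check '<configured> <cmp> <required>', e.g. CMEMORY against RMEM/RMEMCMP."""
    if cmp not in COMPARATORS:
        raise ValueError(f"unsupported comparison: {cmp!r}")
    return COMPARATORS[cmp](configured, required)
```

For instance, `node_satisfies(1024, 512)` models a node with 1024 MB of RAM checked against a job with RMEM=512 and the default RMEMCMP of '>='.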
NOTE 1: job states have the following definitions:
Idle: job is ready to run
Running: job is currently executing
Hold: job is in the queue but is not allowed to run
Suspended: job has started but execution has temporarily been suspended
Completed: job has completed
Cancelled: job has been cancelled
NOTE 2: completed and cancelled jobs should be maintained by the resource manager for a brief time, perhaps 1 to 5 minutes, before being purged. This provides the scheduler time to obtain all final job state information for scheduler statistics.
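On the resource manager side, the retention rule in NOTE 2 might look like the following sketch. Measuring the retention period from COMPLETIONTIME and the default of 300 seconds are assumptions chosen for illustration, and `may_purge` is a hypothetical name.

```python
FINAL_STATES = {"Completed", "Cancelled"}

def may_purge(state: str, completion_time: int, now: int, retention: int = 300) -> bool:
    """Return True once a finished job has been kept for at least `retention` seconds."""
    # Assumption: the clock starts at the job's COMPLETIONTIME (epoch seconds).
    return state in FINAL_STATES and now - completion_time >= retention
```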
The 'StartJob' command may only be applied to jobs in the 'Idle' state. It causes the job to begin running using the nodes listed in the TASKLIST argument.
send CMD=STARTJOB ARG=<JOBID> TASKLIST=<NODEID>[:<NODEID>]...
receive SC=<STATUSCODE> RESPONSE=<RESPONSE>
STATUSCODE >= 0 indicates SUCCESS
STATUSCODE < 0 indicates FAILURE
RESPONSE is a text message possibly further describing an error or state
EXAMPLE:
Start job nebo.1 on nodes cluster001 and cluster002
send 'CMD=STARTJOB ARG=nebo.1 TASKLIST=cluster001:cluster002'
receive 'SC=0 RESPONSE=job nebo.1 started with 2 tasks'
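Replies of the form SC=<STATUSCODE> RESPONSE=<RESPONSE> are shared by all job control commands, so one small parser can serve them all. The sketch below uses a hypothetical `parse_status` helper and applies the rule that a status code of zero or above means success.

```python
import re

def parse_status(reply: str):
    """Split 'SC=<code> RESPONSE=<text>' into (code, text, succeeded)."""
    match = re.match(r"SC=(-?\d+) RESPONSE=(.*)$", reply, re.DOTALL)
    if match is None:
        raise ValueError(f"not a status reply: {reply!r}")
    code = int(match.group(1))
    # STATUSCODE >= 0 indicates SUCCESS, < 0 indicates FAILURE.
    return code, match.group(2), code >= 0
```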
The 'CancelJob' command, if applied to an active job, will terminate its execution. If applied to an idle or active job, the CancelJob command will change the job's state to 'Cancelled'.
send CMD=CANCELJOB ARG=<JOBID> TYPE=<CANCELTYPE>
<CANCELTYPE> is one of the following:
ADMIN (command initiated by scheduler administrator)
WALLCLOCK (command initiated by scheduler because job exceeded its specified wallclock limit)
receive SC=<STATUSCODE> RESPONSE=<RESPONSE>
STATUSCODE >= 0 indicates SUCCESS
STATUSCODE < 0 indicates FAILURE
RESPONSE is a text message further describing an error or state
EXAMPLE:
Cancel job nebo.2
send 'CMD=CANCELJOB ARG=nebo.2 TYPE=ADMIN'
receive 'SC=0 RESPONSE=job nebo.2 cancelled'
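A client can reject an invalid <CANCELTYPE> before anything is sent. The following is a minimal sketch with a hypothetical `build_canceljob` helper; defaulting to ADMIN is a choice made for this example only.

```python
CANCEL_TYPES = ("ADMIN", "WALLCLOCK")

def build_canceljob(job_id: str, cancel_type: str = "ADMIN") -> str:
    """Return a CANCELJOB command after validating the cancel type."""
    if cancel_type not in CANCEL_TYPES:
        raise ValueError(f"cancel_type must be one of {CANCEL_TYPES}")
    return f"CMD=CANCELJOB ARG={job_id} TYPE={cancel_type}"
```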
The 'SuspendJob' command can only be issued against a job in the state 'Running'. This command suspends job execution and results in the job changing to the 'Suspended' state.
send CMD=SUSPENDJOB ARG=<JOBID>
receive SC=<STATUSCODE> RESPONSE=<RESPONSE>
STATUSCODE >= 0 indicates SUCCESS
STATUSCODE < 0 indicates FAILURE
RESPONSE is a text message possibly further describing an error or state
EXAMPLE:
Suspend job nebo.3
send 'CMD=SUSPENDJOB ARG=nebo.3'
receive 'SC=0 RESPONSE=job nebo.3 suspended'
The 'ResumeJob' command can only be issued against a job in the state 'Suspended'. This command resumes a suspended job returning it to the 'Running' state.
send CMD=RESUMEJOB ARG=<JOBID>
receive SC=<STATUSCODE> RESPONSE=<RESPONSE>
STATUSCODE >= 0 indicates SUCCESS
STATUSCODE < 0 indicates FAILURE
RESPONSE is a text message further describing an error or state
EXAMPLE:
Resume job nebo.3
send 'CMD=RESUMEJOB ARG=nebo.3'
receive 'SC=0 RESPONSE=job nebo.3 resumed'
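The state preconditions stated for StartJob, SuspendJob, and ResumeJob can be collected into one table and checked before a command is issued. This is an illustrative sketch; `next_state` is a hypothetical helper, and treating a started job as 'Running' is inferred from the StartJob description.

```python
# Job state each command requires, and the state it leaves the job in.
REQUIRED_STATE = {
    "STARTJOB": "Idle",
    "SUSPENDJOB": "Running",
    "RESUMEJOB": "Suspended",
}
RESULT_STATE = {
    "STARTJOB": "Running",
    "SUSPENDJOB": "Suspended",
    "RESUMEJOB": "Running",
}

def next_state(command: str, state: str) -> str:
    """Return the job state after `command`, or raise if it is not allowed."""
    if command not in REQUIRED_STATE:
        raise ValueError(f"no state rule for command {command!r}")
    if state != REQUIRED_STATE[command]:
        raise ValueError(f"{command} requires state {REQUIRED_STATE[command]!r}, got {state!r}")
    return RESULT_STATE[command]
```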
The 'JobAddTask' command allocates additional tasks to an active job.
send
CMD=JOBADDTASK ARG=<JOBID> <NODEID> [<NODEID>]...
receive
SC=<STATUSCODE> RESPONSE=<RESPONSE>
STATUSCODE >= 0 indicates SUCCESS
STATUSCODE < 0 indicates FAILURE
EXAMPLE:
Add 3 default tasks to job nebo30023.0 using resources located on nodes cluster002, cluster016, and cluster112.
send 'CMD=JOBADDTASK ARG=nebo30023.0 DEFAULT cluster002 cluster016 cluster112'
receive 'SC=0 RESPONSE=3 tasks added'
The 'JobReleaseTask' command removes tasks from an active job.
send
CMD=JOBREMOVETASK ARG=<JOBID> <TASKID> [<TASKID>]...
receive
SC=<STATUSCODE> RESPONSE=<RESPONSE>
STATUSCODE >= 0 indicates SUCCESS
STATUSCODE < 0 indicates FAILURE
RESPONSE is a text message further describing an error or state
EXAMPLE:
Free resources allocated to tasks 14, 15, and 16 of job nebo30023.0
send 'CMD=JOBREMOVETASK ARG=nebo30023.0 14 15 16'
receive 'SC=0 RESPONSE=3 tasks removed'