W.2 Managing Resources with SLURM

This section demonstrates how Moab uses the Moab RM language (formerly called WIKI) to communicate with SLURM. For SLURM configuration instructions, see the Moab-SLURM Integration Guide.

W.2.1 Commands

All commands are requested via a socket interface, one command per socket connection. All fields and values are specified in ASCII text.
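
The one-command-per-connection model can be sketched as a small client helper. This is a minimal illustration, not the actual Moab or SLURM client code: the function name `send_command`, the host and port, and the choice to read until the peer closes the connection are all assumptions. This section does not describe message framing, checksums, or authentication, so the sketch omits them.

```python
import socket


def send_command(host: str, port: int, command: str, timeout: float = 10.0) -> str:
    """Open a fresh connection, send one ASCII command, and return the reply.

    Assumption: the peer treats end-of-stream as end-of-request and closes
    the connection after replying. Real deployments may frame messages
    differently.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(command.encode("ascii"))
        sock.shutdown(socket.SHUT_WR)  # no more data from this side
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # peer closed the connection
                break
            chunks.append(data)
    return b"".join(chunks).decode("ascii")
```

Because each call opens and closes its own connection, issuing two commands means calling the helper twice.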

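Supported commands are:

Query Resources
Query Workload
StartJob
CancelJob
SuspendJob
ResumeJob
RequeueJob
SignalJob
ModifyJob
JobAddTask
JobRemoveTask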

W.2.1.1 Moab RM Language Query Resources

W.2.1.1.1 Moab RM Language Query Resources Request Format
CMD=GETNODES ARG={<UPDATETIME>:<NODEID>[:<NODEID>]... | <UPDATETIME>:ALL}

Only nodes updated more recently than <UPDATETIME> are returned, where <UPDATETIME> is an epoch time. Setting <UPDATETIME> to '0' returns information for all nodes. Specify a colon-delimited list of NODEIDs to query specific nodes, or use the keyword 'ALL' to receive information for all nodes.
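
As an illustration of the request format above, the following sketch assembles a GETNODES command string. The helper name `build_getnodes` and its parameters are hypothetical and are not part of Moab or SLURM.

```python
from typing import Iterable, Optional


def build_getnodes(update_time: int = 0,
                   node_ids: Optional[Iterable[str]] = None) -> str:
    """Build a GETNODES request.

    update_time: epoch time; 0 requests information for all nodes.
    node_ids: specific node IDs, or None/empty to use the keyword ALL.
    """
    ids = list(node_ids) if node_ids is not None else []
    target = ":".join(ids) if ids else "ALL"
    return f"CMD=GETNODES ARG={update_time}:{target}"
```

For example, `build_getnodes(0, ["node001", "node002"])` produces `CMD=GETNODES ARG=0:node001:node002`.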

W.2.1.1.2 Moab RM Language Resources Response Format
The query resources response consists of one or more lines in the following format (separated by a newline, "\n"):

<NODEID> <ATTR>=<VALUE>[;<ATTR>=<VALUE>]...

<ATTR> is one of the names in the table below and the format of <VALUE> is dependent on <ATTR>.

W.2.1.1.3 Moab RM Language Query Resources Example
request:
Moab RM language resource query
CMD=GETNODES ARG=0:node001:node002:node003

response:

Moab RM language resource query response
node001 UPDATETIME=963004212;STATE=Busy;OS=AIX43;ARCH=RS6000...
node002 UPDATETIME=963004213;STATE=Busy;OS=AIX43;ARCH=RS6000...
...
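
A response line of the form `<NODEID> <ATTR>=<VALUE>[;<ATTR>=<VALUE>]...` can be split into a node ID and an attribute map. The sketch below is one possible approach; the function name `parse_node_line` is hypothetical, and it assumes attribute values contain no ';' characters.

```python
from typing import Dict, Tuple


def parse_node_line(line: str) -> Tuple[str, Dict[str, str]]:
    """Split one query-resources response line into (node_id, attributes)."""
    node_id, _, attr_text = line.strip().partition(" ")
    attrs: Dict[str, str] = {}
    for pair in attr_text.split(";"):
        name, sep, value = pair.partition("=")
        if sep:  # skip empty or malformed entries that lack '='
            attrs[name] = value
    return node_id, attrs
```

All values are kept as strings, since their format depends on the attribute.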

W.2.1.2 Moab RM Language Query Workload

W.2.1.2.1 Moab RM Language Query Workload Request Format
CMD=GETJOBS ARG={<UPDATETIME>:<JOBID>[:<JOBID>]... | <UPDATETIME>:ALL }

Only jobs updated more recently than <UPDATETIME> are returned, where <UPDATETIME> is an epoch time. Setting <UPDATETIME> to '0' returns information for all jobs. Specify a colon-delimited list of JOBIDs to query specific jobs, or use the keyword 'ALL' to receive information about all jobs.

W.2.1.2.2 Moab RM Language Query Workload Response Format

SC=<STATUSCODE> ARG=<JOBCOUNT>#<JOBID>:<FIELD>=<VALUE>;[<FIELD>=<VALUE>;]...[#<JOBID>:<FIELD>=<VALUE>;[<FIELD>=<VALUE>;]...]...

or

SC=<STATUSCODE> RESPONSE=<RESPONSE>

FIELD is either the text name listed below or 'A<FIELDNUM>' (for example, 'UPDATETIME' or 'A2').

STATUSCODE values:

0 SUCCESS
-1 INTERNAL ERROR

RESPONSE is a message, dependent on the status code, that describes error or state details.

W.2.1.2.3 Moab RM Language Query Workload Example
request:
CMD=GETJOBS ARG=0:ALL
response:
ARG=2#nebo3001.0:UPDATETIME=9780000320;STATE=Idle;WCLIMIT=3600;...
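
The ARG payload of a workload response (`<JOBCOUNT>#<JOBID>:<FIELD>=<VALUE>;...`) can be broken apart as sketched below. The function name `parse_getjobs_arg` is hypothetical, the input is the text that follows `ARG=`, and the sketch assumes that job IDs contain no ':' and that values contain no '#' or ';' characters.

```python
from typing import Dict, Tuple


def parse_getjobs_arg(arg: str) -> Tuple[int, Dict[str, Dict[str, str]]]:
    """Parse '<JOBCOUNT>#<JOBID>:<FIELD>=<VALUE>;...' into (count, jobs)."""
    count_text, *records = arg.split("#")
    jobs: Dict[str, Dict[str, str]] = {}
    for record in records:
        # The first ':' separates the job ID from its field list.
        job_id, _, field_text = record.partition(":")
        fields: Dict[str, str] = {}
        for pair in field_text.split(";"):
            name, sep, value = pair.partition("=")
            if sep:  # ignore the empty entry left by a trailing ';'
                fields[name] = value
        jobs[job_id] = fields
    return int(count_text), jobs
```

A caller might compare the returned count against `len(jobs)` to detect a truncated response.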

W.2.1.3 StartJob

The 'StartJob' command may only be applied to jobs in the 'Idle' state. It causes the job to begin running on the nodes listed in TASKLIST.

send CMD=STARTJOB ARG=<JOBID> TASKLIST=<NODEID>[:<NODEID>]...

receive SC=<STATUSCODE> RESPONSE=<RESPONSE>

STATUSCODE >= 0 indicates SUCCESS
STATUSCODE < 0 indicates FAILURE
RESPONSE is a text message possibly further describing an error or state

job start example

# Start job nebo.1 on nodes cluster001 and cluster002
send 'CMD=STARTJOB ARG=nebo.1 TASKLIST=cluster001:cluster002'
receive 'SC=0 RESPONSE=job nebo.1 started with 2 tasks'
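
Job control commands share the reply format `SC=<STATUSCODE> RESPONSE=<RESPONSE>`, so a single parser can serve all of them. The sketch below is illustrative only; the name `parse_reply` is hypothetical, and accepting either a space or a semicolon between the two fields is a leniency chosen for this sketch, not a documented rule.

```python
import re
from typing import Tuple

# SC=<integer>, optionally followed by RESPONSE=<free text>.
_REPLY = re.compile(r"^SC=(-?\d+)(?:[ ;]+RESPONSE=(.*))?$", re.DOTALL)


def parse_reply(reply: str) -> Tuple[int, str, bool]:
    """Return (status_code, response_text, success) for a command reply."""
    match = _REPLY.match(reply.strip())
    if match is None:
        raise ValueError(f"unrecognized reply: {reply!r}")
    code = int(match.group(1))
    # STATUSCODE >= 0 indicates success; < 0 indicates failure.
    return code, match.group(2) or "", code >= 0
```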

W.2.1.4 CancelJob

The 'CancelJob' command, if applied to an active job, will terminate its execution. If applied to an idle or active job, the CancelJob command will change the job's state to 'Canceled'.

send CMD=CANCELJOB ARG=<JOBID> TYPE=<CANCELTYPE>

<CANCELTYPE> is one of the following:

ADMIN (command initiated by scheduler administrator)
WALLCLOCK (command initiated by scheduler because job exceeded its specified wallclock limit)

receive SC=<STATUSCODE> RESPONSE=<RESPONSE>

STATUSCODE >= 0 indicates SUCCESS
STATUSCODE < 0 indicates FAILURE
RESPONSE is a text message further describing an error or state

job cancel example
# Cancel job nebo.2

send 'CMD=CANCELJOB ARG=nebo.2 TYPE=ADMIN'
receive 'SC=0 RESPONSE=job nebo.2 canceled'
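
A request builder can reject cancel types other than the two listed above before anything is sent. This is a sketch; `build_canceljob` and `CANCEL_TYPES` are hypothetical names, and defaulting to ADMIN is an assumption of the example.

```python
CANCEL_TYPES = ("ADMIN", "WALLCLOCK")


def build_canceljob(job_id: str, cancel_type: str = "ADMIN") -> str:
    """Build a CANCELJOB request, validating the cancel type."""
    cancel_type = cancel_type.upper()
    if cancel_type not in CANCEL_TYPES:
        raise ValueError(f"cancel type must be one of {CANCEL_TYPES}")
    return f"CMD=CANCELJOB ARG={job_id} TYPE={cancel_type}"
```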

W.2.1.5 SuspendJob

The 'SuspendJob' command can only be issued against a job in the state 'Running'. This command suspends job execution and results in the job changing to the 'Suspended' state.

send CMD=SUSPENDJOB ARG=<JOBID>

receive SC=<STATUSCODE> RESPONSE=<RESPONSE>

STATUSCODE >= 0 indicates SUCCESS
STATUSCODE < 0 indicates FAILURE
RESPONSE is a text message possibly further describing an error or state

job suspend example

# Suspend job nebo.3
send 'CMD=SUSPENDJOB ARG=nebo.3'
receive 'SC=0 RESPONSE=job nebo.3 suspended'

W.2.1.6 ResumeJob

The 'ResumeJob' command can only be issued against a job in the state 'Suspended'. This command resumes a suspended job returning it to the 'Running' state.

send CMD=RESUMEJOB ARG=<JOBID>

receive SC=<STATUSCODE> RESPONSE=<RESPONSE>

STATUSCODE >= 0 indicates SUCCESS
STATUSCODE < 0 indicates FAILURE
RESPONSE is a text message further describing an error or state

job resume example

# Resume job nebo.3
send 'CMD=RESUMEJOB ARG=nebo.3'
receive 'SC=0 RESPONSE=job nebo.3 resumed'

W.2.1.7 RequeueJob

The 'RequeueJob' command can only be issued against an active job in the state 'Starting' or 'Running'. This command requeues the job, stopping execution and returning the job to an idle state in the queue. The requeued job will be eligible for execution the next time resources are available.

send CMD=REQUEUEJOB ARG=<JOBID>

receive SC=<STATUSCODE> RESPONSE=<RESPONSE>

STATUSCODE >= 0 indicates SUCCESS
STATUSCODE < 0 indicates FAILURE
RESPONSE is a text message further describing an error or state

job requeue example

# Requeue job nebo.3
send 'CMD=REQUEUEJOB ARG=nebo.3'
receive 'SC=0 RESPONSE=job nebo.3 requeued'
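
The state preconditions stated for StartJob, SuspendJob, ResumeJob, and RequeueJob can be collected into a lookup table so that a client checks a job's state before issuing a command. The names `REQUIRED_STATES` and `can_issue` are hypothetical, and commands absent from the table are simply treated as unrestricted by this sketch.

```python
# Job states in which each command may be issued, per the descriptions above.
REQUIRED_STATES = {
    "STARTJOB": {"Idle"},
    "SUSPENDJOB": {"Running"},
    "RESUMEJOB": {"Suspended"},
    "REQUEUEJOB": {"Starting", "Running"},
}


def can_issue(command: str, job_state: str) -> bool:
    """Return True if the command is permitted for a job in job_state."""
    allowed = REQUIRED_STATES.get(command.upper())
    return allowed is None or job_state in allowed
```

A client-side check like this only avoids obviously invalid requests; the resource manager's reply remains the authority on whether a command succeeded.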

W.2.1.8 SignalJob

The 'SignalJob' command can only be issued against an active job in the state 'Starting' or 'Running'. This command sends the specified signal to the job's master process. The signalled job remains in the state it was in before the signal was issued.

send CMD=SIGNALJOB ARG=<JOBID> ACTION=signal VALUE=<SIGNAL>

receive SC=<STATUSCODE> RESPONSE=<RESPONSE>

STATUSCODE >= 0 indicates SUCCESS
STATUSCODE < 0 indicates FAILURE
RESPONSE is a text message further describing an error or state

job signal example

# Signal job nebo.3
send 'CMD=SIGNALJOB ARG=nebo.3 ACTION=signal VALUE=13'
receive 'SC=0 RESPONSE=job nebo.3 signalled'
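
The following sketch builds a SIGNALJOB request from either a plain signal number or a member of Python's standard `signal` module. The helper name `build_signaljob` is hypothetical, and sending the signal as a decimal number is based only on the example above.

```python
import signal
from typing import Union


def build_signaljob(job_id: str, sig: Union[int, signal.Signals]) -> str:
    """Build a SIGNALJOB request carrying a numeric signal value."""
    number = int(sig)  # works for plain ints and signal.Signals members
    if number <= 0:
        raise ValueError("signal number must be positive")
    return f"CMD=SIGNALJOB ARG={job_id} ACTION=signal VALUE={number}"
```

Signal numbers are platform-dependent, so the number passed should be the one meaningful on the host where the job's master process runs.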

W.2.1.9 ModifyJob

The 'ModifyJob' command can be issued against any active or queued job. This command modifies specified attributes of the job.

send CMD=MODIFYJOB ARG=<JOBID> [BANK=name] [NODES=num] [PARTITION=name] [TIMELIMIT=minutes]

receive SC=<STATUSCODE> RESPONSE=<RESPONSE>

STATUSCODE >= 0 indicates SUCCESS
STATUSCODE < 0 indicates FAILURE
RESPONSE is a text message further describing an error or state

job modify example

# Modify job nebo.3
send 'CMD=MODIFYJOB ARG=nebo.3 TIMELIMIT=9600'
receive 'SC=0 RESPONSE=job nebo.3 modified'
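
Since every MODIFYJOB attribute is optional, a builder can append only the attributes that were supplied. This is a minimal sketch; `build_modifyjob` and its keyword parameters are hypothetical names that mirror the BANK, NODES, PARTITION, and TIMELIMIT fields of the request format.

```python
from typing import Optional


def build_modifyjob(job_id: str,
                    bank: Optional[str] = None,
                    nodes: Optional[int] = None,
                    partition: Optional[str] = None,
                    timelimit: Optional[int] = None) -> str:
    """Build a MODIFYJOB request containing only the supplied attributes."""
    parts = [f"CMD=MODIFYJOB ARG={job_id}"]
    for key, value in (("BANK", bank), ("NODES", nodes),
                       ("PARTITION", partition), ("TIMELIMIT", timelimit)):
        if value is not None:
            parts.append(f"{key}={value}")
    return " ".join(parts)
```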

W.2.1.10 JobAddTask

The 'JobAddTask' command allocates additional tasks to an active job.

send

CMD=JOBADDTASK ARG=<JOBID> <NODEID> [<NODEID>]...

receive

SC=<STATUSCODE> RESPONSE=<RESPONSE>

STATUSCODE >= 0 indicates SUCCESS
STATUSCODE < 0 indicates FAILURE
RESPONSE is a text message possibly further describing an error or state

job addtask example

# Add 3 default tasks to job nebo30023.0 using resources located on nodes cluster002, cluster016, and cluster112.
send 'CMD=JOBADDTASK ARG=nebo30023.0 DEFAULT cluster002 cluster016 cluster112'
receive 'SC=0 RESPONSE=3 tasks added'

W.2.1.11 JobRemoveTask

The 'JobRemoveTask' command removes tasks from an active job.

send

CMD=JOBREMOVETASK ARG=<JOBID> <TASKID> [<TASKID>]...

receive

SC=<STATUSCODE> RESPONSE=<RESPONSE>

STATUSCODE >= 0 indicates SUCCESS
STATUSCODE < 0 indicates FAILURE
RESPONSE is a text message further describing an error or state

job removetask example


# Free resources allocated to tasks 14, 15, and 16 of job nebo30023.0
send 'CMD=JOBREMOVETASK ARG=nebo30023.0 14 15 16'
receive 'SC=0 RESPONSE=3 tasks removed'
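
A JOBREMOVETASK request lists one or more task IDs separated by spaces after the job ID. The sketch below assembles such a request; the helper name `build_jobremovetask` is hypothetical.

```python
from typing import Iterable


def build_jobremovetask(job_id: str, task_ids: Iterable[int]) -> str:
    """Build a JOBREMOVETASK request for the given task IDs."""
    ids = [str(task_id) for task_id in task_ids]
    if not ids:
        # The request format requires at least one <TASKID>.
        raise ValueError("at least one task ID is required")
    return f"CMD=JOBREMOVETASK ARG={job_id} " + " ".join(ids)
```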

W.2.2 Rejection Codes

Copyright © 2012 Adaptive Computing Enterprises, Inc.®