This topic demonstrates how Moab uses the Moab RM language (formerly called WIKI) to communicate with Slurm. For Slurm configuration instructions, see the Moab-Slurm Integration Guide.
All commands are requested via a socket interface, one command per socket connection. All fields and values are specified in ASCII text.
W.2.1.1 Moab RM Language Query Resources
W.2.1.1.1 Moab RM Language Query Resources Request Format
CMD=GETNODES ARG={<UPDATETIME>:<NODEID>[:<NODEID>]... | <UPDATETIME>:ALL}
Only nodes updated more recently than <UPDATETIME> are returned, where <UPDATETIME> is specified as the epoch time of interest. Setting <UPDATETIME> to 0 returns information for all nodes. Specify a colon-delimited list of NODEIDs if specific nodes are desired, or use the keyword ALL to receive information for all nodes.
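As a minimal sketch of the request format above, the following Python helper assembles a GETNODES line. The function name `build_getnodes` and the choice to map a missing or empty node list to ALL are assumptions made for illustration; they are not part of Moab or Slurm.

```python
from typing import Iterable, Optional


def build_getnodes(update_time: int,
                   node_ids: Optional[Iterable[str]] = None) -> str:
    """Return a GETNODES request line (hypothetical helper).

    update_time is an epoch timestamp; 0 asks for every node.
    When node_ids is None or empty, the ALL keyword is used.
    """
    if update_time < 0:
        raise ValueError("update_time must be a non-negative epoch time")
    ids = list(node_ids) if node_ids is not None else []
    # NODEIDs are joined with colons, as in the documented ARG syntax.
    target = ":".join(ids) if ids else "ALL"
    return f"CMD=GETNODES ARG={update_time}:{target}"


print(build_getnodes(0, ["node001", "node002", "node003"]))
print(build_getnodes(963004212))
```

The first call reproduces the request used in the example that follows; the second asks only for nodes updated after the given epoch time.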
W.2.1.1.2 Moab RM Language Resources Response Format
The query resources response format is one or more lines of the following format (separated with a new line):
<NODEID> <ATTR>=<VALUE>[;<ATTR>=<VALUE>]...
<ATTR> is a valid query resource and the format of <VALUE> is dependent on <ATTR>. See W.1.1 Query Resources Data Format for a list of valid query resources.
Example 5-169: Moab RM language resource query and response
Request:
CMD=GETNODES ARG=0:node001:node002:node003
Response:
node001 UPDATETIME=963004212;STATE=Busy;OS=AIX43;ARCH=RS6000...
node002 UPDATETIME=963004213;STATE=Busy;OS=AIX43;ARCH=RS6000...
...
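A response of this shape could be parsed along the following lines. This is a sketch only: it assumes each node occupies its own line, that the node ID is separated from its attributes by whitespace as in the example, and that values contain no semicolons. The name `parse_node_lines` is hypothetical.

```python
from typing import Dict


def parse_node_lines(text: str) -> Dict[str, Dict[str, str]]:
    """Map each NODEID to a dict of its ATTR=VALUE pairs (hypothetical helper)."""
    nodes: Dict[str, Dict[str, str]] = {}
    for line in text.splitlines():
        parts = line.split(None, 1)  # node ID, then the attribute string
        if not parts:
            continue  # skip blank lines
        node_id = parts[0]
        attr_text = parts[1] if len(parts) > 1 else ""
        attrs: Dict[str, str] = {}
        for pair in attr_text.split(";"):
            name, sep, value = pair.partition("=")
            if sep:  # ignore empty or malformed fragments
                attrs[name.strip()] = value.strip()
        nodes[node_id] = attrs
    return nodes


# Sample input uses only the attributes visible in the example above.
sample = (
    "node001 UPDATETIME=963004212;STATE=Busy;OS=AIX43;ARCH=RS6000\n"
    "node002 UPDATETIME=963004213;STATE=Busy;OS=AIX43;ARCH=RS6000\n"
)
nodes = parse_node_lines(sample)
print(nodes["node001"]["STATE"])
```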
W.2.1.2 Moab RM Language Query Workload
W.2.1.2.1 Moab RM Language Query Workload Request Format
CMD=GETJOBS ARG={<UPDATETIME>:<JOBID>[:<JOBID>]... | <UPDATETIME>:ALL }
Only jobs updated more recently than <UPDATETIME> are returned, where <UPDATETIME> is specified as the epoch time of interest. Setting <UPDATETIME> to 0 returns information for all jobs. Specify a colon-delimited list of JOBIDs if information for specific jobs is desired, or use the keyword ALL to receive information about all jobs.
W.2.1.2.2 Moab RM Language Query Workload Response Format
SC=<STATUSCODE> ARG=<JOBCOUNT>#<JOBID>:<FIELD>=<VALUE>;[<FIELD>=<VALUE>;]...[#<JOBID>:<FIELD>=<VALUE>;[<FIELD>=<VALUE>;]...]...
or
SC=<STATUSCODE> RESPONSE=<RESPONSE>
FIELD is either the text name listed below or A<FIELDNUM> (for example, UPDATETIME or A2).
STATUSCODE values:
RESPONSE is a statuscode sensitive message describing error or state details.
W.2.1.2.3 Moab RM Language Query Workload Example
Request:
CMD=GETJOBS ARG=0:ALL
Response:
ARG=2#nebo3001.0:UPDATETIME=9780000320;STATE=Idle;WCLIMIT=3600;...
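The ARG= form of the reply can be split on its delimiters. The sketch below assumes that `#` separates job records, that the first `:` in a record ends the JOBID, and that values contain none of `#`, `;`, or `=`. The function name `parse_getjobs_arg` and the second job in the sample are hypothetical.

```python
from typing import Dict, Tuple


def parse_getjobs_arg(reply: str) -> Tuple[int, Dict[str, Dict[str, str]]]:
    """Return (JOBCOUNT, {JOBID: {FIELD: VALUE}}) from a GETJOBS reply."""
    marker = "ARG="
    start = reply.find(marker)
    if start < 0:
        raise ValueError("reply has no ARG= payload")
    payload = reply[start + len(marker):].strip()
    # The text before the first '#' is the job count.
    count_text, _, records = payload.partition("#")
    job_count = int(count_text)
    jobs: Dict[str, Dict[str, str]] = {}
    for record in records.split("#"):
        if not record:
            continue
        job_id, _, field_text = record.partition(":")
        fields: Dict[str, str] = {}
        for pair in field_text.split(";"):
            name, sep, value = pair.partition("=")
            if sep:  # a trailing ';' leaves an empty fragment, skipped here
                fields[name] = value
        jobs[job_id] = fields
    return job_count, jobs


# The second job record is invented for illustration.
reply = (
    "SC=0 ARG=2#nebo3001.0:UPDATETIME=9780000320;STATE=Idle;WCLIMIT=3600;"
    "#nebo3002.0:UPDATETIME=9780000321;STATE=Running;WCLIMIT=7200;"
)
count, jobs = parse_getjobs_arg(reply)
print(count, sorted(jobs))
```

A caller may want to compare the returned count with `len(jobs)` to detect a truncated reply.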
The StartJob command may only be applied to jobs in the Idle state. It causes the job to begin running using the resources listed in the NodeID list.
send CMD=STARTJOB ARG=<JOBID> TASKLIST=<NODEID>[:<NODEID>]...
receive SC=<STATUSCODE> RESPONSE=<RESPONSE>
STATUSCODE >= 0 indicates SUCCESS
STATUSCODE < 0 indicates FAILURE
RESPONSE is a text message possibly further describing an error or state
Example 5-170: Job start
# Start job nebo.1 on nodes cluster001 and cluster002
# send
CMD=STARTJOB ARG=nebo.1 TASKLIST=cluster001:cluster002
# receive
SC=0 RESPONSE=job nebo.1 started with 2 tasks
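The send and receive halves of this exchange can be sketched as one function that builds the STARTJOB line and one that interprets the status reply. The names `build_startjob`, `parse_reply`, and `Reply` are hypothetical. As an assumption made for leniency, the parser accepts either a space or a semicolon between the SC and RESPONSE fields. The failure text in the usage lines is invented.

```python
import re
from typing import NamedTuple, Sequence


class Reply(NamedTuple):
    code: int
    message: str

    @property
    def ok(self) -> bool:
        # STATUSCODE >= 0 indicates success, < 0 indicates failure.
        return self.code >= 0


_REPLY_RE = re.compile(r"^SC=(-?\d+)[ ;]+RESPONSE=(.*)$", re.DOTALL)


def build_startjob(job_id: str, node_ids: Sequence[str]) -> str:
    """Return a STARTJOB line with a colon-delimited TASKLIST."""
    if not node_ids:
        raise ValueError("STARTJOB needs at least one node")
    return f"CMD=STARTJOB ARG={job_id} TASKLIST={':'.join(node_ids)}"


def parse_reply(line: str) -> Reply:
    """Split a status reply into its numeric code and message text."""
    match = _REPLY_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized reply: {line!r}")
    return Reply(int(match.group(1)), match.group(2))


print(build_startjob("nebo.1", ["cluster001", "cluster002"]))
print(parse_reply("SC=0 RESPONSE=job nebo.1 started with 2 tasks").ok)
print(parse_reply("SC=-1 RESPONSE=job is not idle").ok)  # invented failure text
```

The same reply parser applies to the other job commands in this topic, since they share the SC and RESPONSE reply shape.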
The CancelJob command, if applied to an active job, will terminate its execution. If applied to an idle or active job, the CancelJob command will change the job's state to Canceled.
send CMD=CANCELJOB ARG=<JOBID> TYPE=<CANCELTYPE>
<CANCELTYPE> is one of the following:
ADMIN (command initiated by scheduler administrator)
WALLCLOCK (command initiated by scheduler because job exceeded its specified wallclock limit)
receive SC=<STATUSCODE> RESPONSE=<RESPONSE>
STATUSCODE >= 0 indicates SUCCESS
STATUSCODE < 0 indicates FAILURE
RESPONSE is a text message further describing an error or state
Example 5-171: Job cancel
# Cancel job nebo.2
# send
CMD=CANCELJOB ARG=nebo.2 TYPE=ADMIN
# receive
SC=0 RESPONSE=job nebo.2 canceled
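Because TYPE accepts only the two values listed above, a client may want to validate it before sending. A minimal sketch follows; the name `build_canceljob` and the decision to normalize the type to upper case are assumptions for illustration.

```python
CANCEL_TYPES = ("ADMIN", "WALLCLOCK")


def build_canceljob(job_id: str, cancel_type: str) -> str:
    """Return a CANCELJOB line, rejecting undocumented cancel types."""
    normalized = cancel_type.upper()
    if normalized not in CANCEL_TYPES:
        raise ValueError(
            f"cancel_type must be one of {CANCEL_TYPES}, got {cancel_type!r}"
        )
    return f"CMD=CANCELJOB ARG={job_id} TYPE={normalized}"


print(build_canceljob("nebo.2", "ADMIN"))
```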
The SuspendJob command can only be issued against a job in the state Running. This command suspends job execution and results in the job changing to the Suspended state.
send CMD=SUSPENDJOB ARG=<JOBID>
receive SC=<STATUSCODE> RESPONSE=<RESPONSE>
STATUSCODE >= 0 indicates SUCCESS
STATUSCODE < 0 indicates FAILURE
RESPONSE is a text message possibly further describing an error or state
Example 5-172: Job suspend
# Suspend job nebo.3
# send
CMD=SUSPENDJOB ARG=nebo.3
# receive
SC=0 RESPONSE=job nebo.3 suspended
The ResumeJob command can only be issued against a job in the state Suspended. This command resumes a suspended job returning it to the Running state.
send CMD=RESUMEJOB ARG=<JOBID>
receive SC=<STATUSCODE> RESPONSE=<RESPONSE>
STATUSCODE >= 0 indicates SUCCESS
STATUSCODE < 0 indicates FAILURE
RESPONSE is a text message further describing an error or state
Example 5-173: Job resume
# Resume job nebo.3
# send
CMD=RESUMEJOB ARG=nebo.3
# receive
SC=0 RESPONSE=job nebo.3 resumed
The RequeueJob command can only be issued against an active job in the state Starting or Running. This command requeues the job, stopping execution and returning the job to an idle state in the queue. The requeued job will be eligible for execution the next time resources are available.
send CMD=REQUEUEJOB ARG=<JOBID>
receive SC=<STATUSCODE> RESPONSE=<RESPONSE>
STATUSCODE >= 0 indicates SUCCESS
STATUSCODE < 0 indicates FAILURE
RESPONSE is a text message further describing an error or state
Example 5-174: Job requeue
# Requeue job nebo.3
# send
CMD=REQUEUEJOB ARG=nebo.3
# receive
SC=0 RESPONSE=job nebo.3 requeued
The SignalJob command can only be issued against an active job in the state Starting or Running. This command signals the job, sending the specified signal to the master process. The signaled job remains in the same state it was in before the signal was issued.
send CMD=SIGNALJOB ARG=<JOBID> ACTION=signal VALUE=<SIGNAL>
receive SC=<STATUSCODE> RESPONSE=<RESPONSE>
STATUSCODE >= 0 indicates SUCCESS
STATUSCODE < 0 indicates FAILURE
RESPONSE is a text message further describing an error or state
Example 5-175: Job signal
# Signal job nebo.3
# send
CMD=SIGNALJOB ARG=nebo.3 ACTION=signal VALUE=13
# receive
SC=0 RESPONSE=job nebo.3 signaled
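The example passes the signal as a number. A sketch of a builder that accepts either a plain integer or a member of Python's standard `signal.Signals` enum is shown below. The name `build_signaljob` is hypothetical, and whether the resource manager also accepts signal names in VALUE is not stated here, so the sketch always emits the number.

```python
import signal
from typing import Union


def build_signaljob(job_id: str, sig: Union[int, signal.Signals]) -> str:
    """Return a SIGNALJOB line carrying the signal as a number."""
    number = int(sig)  # signal.Signals members convert to their numeric value
    if number <= 0:
        raise ValueError("signal number must be positive")
    return f"CMD=SIGNALJOB ARG={job_id} ACTION=signal VALUE={number}"


print(build_signaljob("nebo.3", 13))
print(build_signaljob("nebo.3", signal.SIGTERM))
```

Signal numbers can differ between operating systems, so the number sent should match the numbering used on the host where the job's master process runs.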
The ModifyJob command can be issued against any active or queued job. This command modifies specified attributes of the job.
send CMD=MODIFYJOB ARG=<JOBID> [BANK=name] [NODES=num] [PARTITION=name] [TIMELIMIT=minutes]
receive SC=<STATUSCODE> RESPONSE=<RESPONSE>
STATUSCODE >= 0 indicates SUCCESS
STATUSCODE < 0 indicates FAILURE
RESPONSE is a text message further describing an error or state
Example 5-176: Job modify
# Modify job nebo.3
# send
CMD=MODIFYJOB ARG=nebo.3 TIMELIMIT=9600
# receive
SC=0 RESPONSE=job nebo.3 modified
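Since every MODIFYJOB attribute is optional, a builder can emit only the ones the caller supplies. The following is a sketch under that reading of the format; the name `build_modifyjob` and its keyword parameters are hypothetical, and attributes are written in the order the format line lists them.

```python
from typing import Optional


def build_modifyjob(job_id: str,
                    bank: Optional[str] = None,
                    nodes: Optional[int] = None,
                    partition: Optional[str] = None,
                    timelimit: Optional[int] = None) -> str:
    """Return a MODIFYJOB line containing only the attributes that were set."""
    options = [
        ("BANK", bank),
        ("NODES", nodes),
        ("PARTITION", partition),
        ("TIMELIMIT", timelimit),  # minutes, per the format line
    ]
    parts = [f"CMD=MODIFYJOB ARG={job_id}"]
    parts += [f"{name}={value}" for name, value in options if value is not None]
    return " ".join(parts)


print(build_modifyjob("nebo.3", timelimit=9600))
```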
The JobAddTask command allocates additional tasks to an active job.
send CMD=JOBADDTASK ARG=<JOBID> <NODEID> [<NODEID>]...
receive SC=<STATUSCODE> RESPONSE=<RESPONSE>
STATUSCODE >= 0 indicates SUCCESS
STATUSCODE < 0 indicates FAILURE
RESPONSE is a text message possibly further describing an error or state
Example 5-177: Job addtask
# Add 3 default tasks to job nebo30023.0 using resources located on nodes cluster002, cluster016, and cluster112.
# send
CMD=JOBADDTASK ARG=nebo30023.0 DEFAULT cluster002 cluster016 cluster112
# receive
SC=0 RESPONSE=3 tasks added
The JobRemoveTask command removes tasks from an active job.
send CMD=JOBREMOVETASK ARG=<JOBID> <TASKID> [<TASKID>]...
receive SC=<STATUSCODE> RESPONSE=<RESPONSE>
STATUSCODE >= 0 indicates SUCCESS
STATUSCODE < 0 indicates FAILURE
RESPONSE is a text message further describing an error or state
Example 5-178: Job removetask
# Free resources allocated to tasks 14, 15, and 16 of job nebo30023.0
# send
CMD=JOBREMOVETASK ARG=nebo30023.0 14 15 16
# receive
SC=0 RESPONSE=3 tasks removed
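The job-state restrictions stated for StartJob, SuspendJob, ResumeJob, RequeueJob, and SignalJob in this topic can be collected into a lookup table so a client can check a command before sending it. This is a sketch only: the names `REQUIRED_STATES` and `is_allowed` are hypothetical, and commands whose restrictions are given in looser terms (such as "active" or "queued") are deliberately left out of the table and treated as unrestricted.

```python
from typing import Dict, FrozenSet

# Job states each command requires, as stated in the command descriptions.
REQUIRED_STATES: Dict[str, FrozenSet[str]] = {
    "STARTJOB": frozenset({"Idle"}),
    "SUSPENDJOB": frozenset({"Running"}),
    "RESUMEJOB": frozenset({"Suspended"}),
    "REQUEUEJOB": frozenset({"Starting", "Running"}),
    "SIGNALJOB": frozenset({"Starting", "Running"}),
}


def is_allowed(command: str, job_state: str) -> bool:
    """Return True if the command may be sent to a job in job_state.

    Commands absent from the table have no exact state rule recorded
    here, so this sketch does not block them.
    """
    allowed = REQUIRED_STATES.get(command.upper())
    if allowed is None:
        return True
    return job_state in allowed


print(is_allowed("SUSPENDJOB", "Running"))
print(is_allowed("RESUMEJOB", "Running"))
```

A check like this only avoids obviously invalid requests; the status code in the reply remains the authoritative result.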