Moab Workload Manager

11.11 Job Arrays beta

11.11.1 Job Array Overview

You can submit an array of jobs to Moab via the msub command. Array jobs are an easy way to submit many sub-jobs that perform the same work using the same script but operate on different sets of data. Sub-jobs are the jobs created by an array job and are identified by the array job ID and an index; for example, in the identifier 235.1, the number 235 is the job array ID and the number 1 is the sub-job index.

Note: The job array feature, new in Moab 6.0, does not integrate natively with TORQUE support for job arrays. Also, job array usage limits are presently unavailable.

11.11.2 Enabling Job Arrays

To enable job arrays, include the ENABLEJOBARRAYS parameter in the Moab configuration file (moab.cfg).

11.11.3 Sub-job Definitions

Like a normal job, an array job submits a job script, but it additionally has a start index (sidx), an end index (eidx), and an increment (incr). Moab uses these values to create sub-jobs, all executing the same script. The number of sub-jobs created is the end index minus the start index plus the increment, all divided by the increment, with any remainder discarded: (eidx - sidx + incr) / incr.

To illustrate, suppose an array job has a start index of 1, an end index of 100, and an increment of 1. This array job creates (100 - 1 + 1) / 1 = 100 sub-jobs with indexes of 1, 2, 3, ..., 100. An increment of 2 produces (100 - 1 + 2) / 2 = 50 sub-jobs (integer division) with indexes of 1, 3, 5, ..., 99. An increment of 2 with a start index of 2 produces (100 - 2 + 2) / 2 = 50 sub-jobs with indexes of 2, 4, 6, ..., 100. Again, sub-jobs are jobs in their own right; they differ only in their job naming convention (jobID.subJobIndex).
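The arithmetic above can be checked with a short shell sketch. The function names subjob_count and subjob_indexes are hypothetical helpers written for this illustration; they mirror the documented formula and are not part of Moab.

```shell
#!/bin/sh
# Sketch only: mirrors the documented formula, not Moab's internal code.

# Number of sub-jobs: (eidx - sidx + incr) / incr, using integer division.
subjob_count() {
    sidx=$1; eidx=$2; incr=$3
    echo $(( (eidx - sidx + incr) / incr ))
}

# Sub-job indexes: sidx, sidx + incr, ... without exceeding eidx.
subjob_indexes() {
    sidx=$1; eidx=$2; incr=$3
    i=$sidx
    out=""
    while [ "$i" -le "$eidx" ]; do
        out="$out${out:+ }$i"
        i=$(( i + incr ))
    done
    echo "$out"
}

subjob_count 1 100 2      # prints 50
subjob_indexes 2 10 2     # prints 2 4 6 8 10
```

Shell arithmetic discards the remainder, which matches the 50 sub-jobs given for a start index of 1, an end index of 100, and an increment of 2.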

11.11.4 Using Environment Variables to Specify Array Index Values

A job script can use an environment variable to obtain the array index value and form data file or directory names unique to a particular sub-job. Moab supplies the following two environment variables so that a job script can determine its position in the array. To pass the environment parameters to the resource manager, use the msub command with the -V option, or include the parameters in the job script; for example: #PBS -V MOAB_JOBARRAYRANGE.

Environment Parameter: MOAB_JOBARRAYINDEX
Description: Used to create dataset file names, directory names, and so forth, when splitting up a single problem into multiple jobs.

For example, a user may split up a problem into 20 separate jobs, each with its own input and output data files whose names contain the numbers 1-20.

To illustrate, assume a user submits the 20 sub-jobs using two msub commands: one to submit the ten odd-numbered jobs and one to submit the ten even-numbered jobs.

msub -t job1.[1-20:2]
msub -t job2.[2-20:2]

The MOAB_JOBARRAYINDEX environment variable takes the values 1, 3, 5, 7, 9, 11, 13, 15, 17, and 19 in the first array job's ten sub-jobs, and 2, 4, 6, 8, 10, 12, 14, 16, 18, and 20 in the second array job's ten sub-jobs.

Environment Parameter: MOAB_JOBARRAYRANGE
Description: The count of jobs in the array.
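As an illustration of how a job script might consume these variables, the following sketch builds per-sub-job file names from MOAB_JOBARRAYINDEX. The file-name pattern (input.N.dat, output.N.dat) and the helper function names are assumptions made for this example; the fallback value of 1 exists only so the script can be tried outside Moab, where the variables are not set.

```shell
#!/bin/sh
# Sketch of a job script body. The file-name pattern is hypothetical.
# Outside Moab the variables are unset, so fall back to 1 for a dry run.

array_index() { echo "${MOAB_JOBARRAYINDEX:-1}"; }
array_range() { echo "${MOAB_JOBARRAYRANGE:-1}"; }

# Each sub-job derives its own input and output file from its index.
input_file()  { echo "input.$(array_index).dat"; }
output_file() { echo "output.$(array_index).dat"; }

echo "index $(array_index), array count $(array_range): $(input_file) -> $(output_file)"
```

Under this assumed layout, the sub-job with index 7 would read input.7.dat and write output.7.dat, so no two sub-jobs touch the same files.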

11.11.4.1 Control

Users can control individual sub-jobs in the same manner as normal jobs. In addition, an array job represents its group of sub-jobs and any user or administrator commands performed on an array job apply to its sub-jobs; for example, the command canceljob <arrayJobId> cancels all sub-jobs that belong to the array job. For more information about job control, see the documentation for the mjobctl command.

11.11.4.2 Grid

If a user submits an array job to a grid head node, Moab must schedule the array job's sub-jobs to a single cluster; that is, its sub-jobs are not permitted to execute across multiple clusters.

11.11.4.3 Reporting

In the first example below, the fields specific to array sub-jobs include Parent Array ID, Array Index, and Array Range.

$ checkjob -v Moab.1.1  
job Moab.1.1    

State: Running   
Creds:  user:testuser1  group:testgroup1  
WallTime:   00:00:17 of 8:20:00  
SubmitTime: Thu Nov  4 11:50:03    
(Time Queued  Total: 00:00:00  Eligible:   INFINITY)      

StartTime: Thu Nov  4 11:50:03    
Total Requested Tasks: 1      

Req[0]  TaskCount: 1  Partition: base      
Average Utilized Procs: 0.96    
NodeCount:  1      

Allocated Nodes:    
[node010:1]        

Job Group:        Moab.1    
Parent Array ID:  Moab.1    
Array Index:      1    
Array Range:      10    
SystemID:   Moab    
SystemJID:  Moab.1.1    
Task Distribution: node010      

IWD:            /home/jbanks    
UMask:          0000     
Executable:     /usr/test/moab/spool/moab.job.3CvNjl      

StartCount:     1    
Partition List: base    
SrcRM:          internal  DstRM: base  DstRMJID: Moab.1.1    
Flags:          ARRAYJOB,GLOBALQUEUE    
StartPriority:  1    
PE:             1.00    
Reservation 'Moab.1.1' (-00:00:19 -> 8:19:41  Duration: 8:20:00)

If you run checkjob on the array job ID without specifying a sub-job index, the output displays all the jobs in the array.

$ checkjob -v medsec.1

job medsec.1

Job Array Info:
  Name: moab
  1 : medsec.1.1 : Running
  2 : medsec.1.2 : Running
  3 : medsec.1.3 : Running
  4 : medsec.1.4 : Running
  5 : medsec.1.5 : Running
  6 : medsec.1.6 : Running
  7 : medsec.1.7 : Running
  8 : medsec.1.8 : Running
  9 : medsec.1.9 : Running
  10 : medsec.1.10 : Running
  11 : medsec.1.11 : Running
  12 : medsec.1.12 : Running
  13 : medsec.1.13 : Running
  14 : medsec.1.14 : Running
  15 : medsec.1.15 : Running
  16 : medsec.1.16 : Running
  17 : medsec.1.17 : Running
  18 : medsec.1.18 : Running
  19 : medsec.1.19 : Running
  20 : medsec.1.20 : Running

  Totals:
    Active:   20
    Idle:     0
    Complete: 0

You can also use showq. This displays all active, eligible, blocked, and/or recently completed jobs on the system.

$ showq

active jobs------------------------
JOBID              USERNAME      STATE PROCS   REMAINING            STARTTIME

medsec.1.6         testuser   Starting     1 99:23:59:54  Thu Oct  7 16:18:19
medsec.1.13        testuser   Starting     1 99:23:59:54  Thu Oct  7 16:18:19
medsec.1.19        testuser   Starting     1 99:23:59:54  Thu Oct  7 16:18:19
medsec.1.5         testuser   Starting     1 99:23:59:54  Thu Oct  7 16:18:19
medsec.1.8         testuser   Starting     1 99:23:59:54  Thu Oct  7 16:18:19
medsec.1.10        testuser   Starting     1 99:23:59:54  Thu Oct  7 16:18:19
medsec.1.3         testuser   Starting     1 99:23:59:54  Thu Oct  7 16:18:19
medsec.1.4         testuser   Starting     1 99:23:59:54  Thu Oct  7 16:18:19
medsec.1.12        testuser   Starting     1 99:23:59:54  Thu Oct  7 16:18:19
medsec.1.2         testuser   Starting     1 99:23:59:54  Thu Oct  7 16:18:19
medsec.1.1         testuser   Starting     1 99:23:59:54  Thu Oct  7 16:18:19
medsec.1.20        testuser   Starting     1 99:23:59:54  Thu Oct  7 16:18:19
medsec.1.9         testuser   Starting     1 99:23:59:54  Thu Oct  7 16:18:19
medsec.1.14        testuser   Starting     1 99:23:59:54  Thu Oct  7 16:18:19
medsec.1.11        testuser   Starting     1 99:23:59:54  Thu Oct  7 16:18:19
medsec.1.16        testuser   Starting     1 99:23:59:54  Thu Oct  7 16:18:19
medsec.1.7         testuser   Starting     1 99:23:59:54  Thu Oct  7 16:18:19
medsec.1.17        testuser   Starting     1 99:23:59:54  Thu Oct  7 16:18:19
medsec.1.18        testuser   Starting     1 99:23:59:54  Thu Oct  7 16:18:19
medsec.1.15        testuser   Starting     1 99:23:59:54  Thu Oct  7 16:18:19

20 active jobs           20 of 240 processors in use by local jobs (8.33%)
                            1 of 4 nodes active      (25.00%)

eligible jobs----------------------
JOBID              USERNAME      STATE PROCS     WCLIMIT            QUEUETIME


0 eligible jobs   

blocked jobs-----------------------
JOBID              USERNAME      STATE PROCS     WCLIMIT            QUEUETIME


0 blocked jobs   

Total jobs:  21

11.11.5 Examples

Operations can be performed on individual jobs, a selection of jobs in a job array, or on the entire array.

11.11.5.1 Submitting Job Arrays

The syntax for submitting job arrays is: msub -t <alias>.[<indexlist>]%<limit>

The alias and limit are optional. The alias does not override the arrayid Moab assigns to the array. When submitting an array with an alias, Moab returns the arrayid, which is the scheduler name followed by a unique ID.

For example, if the scheduler name in moab.cfg is Moab, submitting an array with an alias responds like this:

### SCHEDCFG line in moab.cfg ###
SCHEDCFG[Moab] SERVER=headmaster

### Submitting an array with an alias ###
> msub -t myarray.[1-10] job.sh
Moab.6

To specify that only a certain number of sub-jobs in the array can run at a time, use the percent sign (%) delimiter. In this example, only five sub-jobs in the array can run at a time:

> msub -t myarray.[1-1000]%5

To submit a specific set of array sub-jobs, use the comma delimiter in the array index list:

> msub -t myarray.[1,2,3,4]
> msub -t myarray.[1-5,7,10]
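To make the index list notation concrete, the sketch below expands a list into the sub-job indexes it names. The function expand_indexlist is a hypothetical helper written for this illustration and is not Moab's parser; it assumes each comma-separated item is a single index (N), a range (A-B), or a range with a step (A-B:STEP), as in the examples in this section. Pass the list without the surrounding brackets.

```shell
#!/bin/sh
# Sketch only: illustrates how an index list reads; not Moab's parser.
# The body runs in a subshell so the IFS change stays local.
expand_indexlist() (
    IFS=,
    set -- $1                  # split the list on commas
    out=""
    for item in "$@"; do
        step=1
        case $item in
            *:*) step=${item##*:}; item=${item%%:*} ;;   # A-B:STEP
        esac
        case $item in
            *-*) first=${item%%-*}; last=${item##*-} ;;  # range A-B
            *)   first=$item; last=$item ;;              # single index
        esac
        i=$first
        while [ "$i" -le "$last" ]; do
            out="$out${out:+ }$i"
            i=$(( i + step ))
        done
    done
    echo "$out"
)

expand_indexlist "1-5,7,10"   # prints 1 2 3 4 5 7 10
expand_indexlist "2-10:2"     # prints 2 4 6 8 10
```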

You can use the checkjob command on either the arrayid or the alias you specified.

> msub -t myarray.[1-2] job.sh

Moab.10

$ checkjob myarray
  job Moab.10

AName: myarray
Job Array Info:
   Name: Moab.1
   1 : Moab.1[1] : Running
   2 : Moab.1[2] : Running

   Sub-jobs:           2
     Active:           2 ( 100.0% )
     Eligible:         0 ( 0.0% )
     Blocked:          0 ( 0.0% )
     Completed:        0 ( 0.0% )

State: Idle
Creds:  user:tuser1  group:tgroup1
WallTime:   00:00:00 of 99:23:59:59
SubmitTime: Thu Jun  2 16:37:17
   (Time Queued  Total: 00:00:33  Eligible: 00:00:00)

Total Requested Tasks: 1

Req[0]  TaskCount: 1  Partition: ALL

To submit a job array with a step size, add a colon after the array range followed by the step between consecutive indexes. In the example below, a step size of 2 is requested. The sub-jobs are numbered according to the step size within the index range. The array master job name follows the same convention as explained above.

$ msub -t myarray.[2-10:2] job.sh

job Moab.15

$ checkjob -v myarray //or you could use 'checkjob -v Moab.15'
job Moab.15

AName: job
Job Array Info:
   Name: Moab.1
   2 : Moab.15[2] : Running
   4 : Moab.15[4] : Running
   6 : Moab.15[6] : Running
   8 : Moab.15[8] : Running   
   10 : Moab.15[10] : Running

   Sub-jobs:           4
     Active:           4 ( 100.0% )
     Eligible:         0 ( 0.0% )
     Blocked:          0 ( 0.0% )
     Completed:        0 ( 0.0% )

State: Idle
Creds:  user:tuser1  group:tgroup1
WallTime:   00:00:00 of 99:23:59:59
SubmitTime: Thu Jun  2 16:37:17
   (Time Queued  Total: 00:00:33  Eligible: 00:00:00)

Total Requested Tasks: 1

Req[0]  TaskCount: 1  Partition: ALL

11.11.5.2 Additional Operations

$ releasehold -a myarray OR myarray.1
$ checkjob -v myarray
$ canceljob myarray.1.1
$ releasehold -a myarray1.2
$ mjobctl -c r:myarray1.[20-30]
$ mjobctl -u r:myarray1.[10-99]
$ canceljob r:myarray1.[475-500]
