In the most common data staging use case, the cluster uses a file system shared among all compute nodes. This type of data staging makes data stored outside of the cluster available to a job that will run on any set of nodes in the cluster. At submission time, you must specify where Moab will obtain the data (a username, host name, and path to a file or directory) and where on the shared file system Moab will store it. After the job runs, you can also copy data from the shared file system back to a remote file system.
Image 24-1: Data staging to or from a shared file system
To stage data to or from a shared file system
Create your job templates for data staging jobs in moab.cfg. The templates in the example below create a compute job that stages data in before it starts and stages data out when it completes. For more information about creating job templates, see About Job Templates.
Add FLAGS=GRESONLY to indicate that this data staging job does not require any compute resources.
If you use the rsync protocol, you can configure your data staging jobs to report the actual number of bytes transferred and the total data size to be transferred. To do so, set the trigger's Sets attribute to ^BYTES_IN.^DATA_SIZE_IN for stage in jobs and ^BYTES_OUT.^DATA_SIZE_OUT for stage out jobs. For example, a stage in trigger would look like the following:
JOBCFG[dsin] TRIGGER=EType=start,AType=exec,Action="/opt/moab/tools/data-staging/ds_move_rsync --stagein",Flags=objectxmlstdin:user:attacherror,Sets=^BYTES_IN.^DATA_SIZE_IN
A stage out trigger would look like the following:
JOBCFG[dsout] TRIGGER=EType=start,AType=exec,Action="/opt/moab/tools/data-staging/ds_move_rsync --stageout",Flags=objectxmlstdin:user:attacherror,Sets=^BYTES_OUT.^DATA_SIZE_OUT
These variables show up as events if you set your WIKIEVENTS parameter to TRUE.
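For example, enabling this in moab.cfg takes a single line (a minimal fragment; the rest of your configuration is unchanged):

```
WIKIEVENTS TRUE
```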
JOBCFG[ds] TEMPLATEDEPEND=AFTEROK:dsin TEMPLATEDEPEND=BEFORE:dsout SELECT=TRUE
JOBCFG[dsin] DATASTAGINGSYSJOB=TRUE
JOBCFG[dsin] GRES=bandwidth:2
JOBCFG[dsin] FLAGS=GRESONLY
JOBCFG[dsin] TRIGGER=EType=start,AType=exec,Action="/opt/moab/tools/data-staging/ds_move_rsync --stagein",Flags=attacherror:objectxmlstdin:user
JOBCFG[dsout] DATASTAGINGSYSJOB=TRUE
JOBCFG[dsout] GRES=bandwidth:1
JOBCFG[dsout] FLAGS=GRESONLY
JOBCFG[dsout] TRIGGER=EType=start,AType=exec,Action="/opt/moab/tools/data-staging/ds_move_rsync --stageout",Flags=attacherror:objectxmlstdin:user
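Because the ds template above sets SELECT=TRUE, users can request the whole workflow at submission time. A hedged sketch, assuming the template names defined above (the job script name is illustrative):

```
> msub -l template=ds myJobScript
```

Moab then creates the dsin and dsout system jobs around the compute job according to the TEMPLATEDEPEND settings.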
If the destination partition is down or does not have configured resources, the data staging workflow submission will fail.
> msub --stagein=annasmith@labs:/patient-022678/%davidharris@university:/davidharris/research/patientrecords <jobScript>
Moab copies the /patient-022678 directory from the hospital's labs server to the university cluster where the job will run prior to job start.
The --stageinfile option lets you specify the <path>/<fileName> of a file that lists the data to stage. The file must contain at least one line in the format <source>%<destination>, where both <source> and <destination> are [<user>@]<host>:/<path>[<fileName>]. See Staging multiple files or directories for more information. If the destination partition is down or does not have configured resources, the data staging workflow submission will fail.
> msub --stageinfile=/davidharris/research/recordlist <jobScript>
Moab copies all files specified in the /davidharris/research/recordlist file to the cluster where the job will run prior to job start.
/davidharris/research/recordlist:
annasmith@labs:/patient-022678/tests/blood02282014%davidharris@university:/davidharris/research/patientrecords/blood02282014
annasmith@labs:/patient-022678/visits/stats02032014%davidharris@university:/davidharris/research/patientrecords/stats02032014
annasmith@labs:/patient-022678/visits/stats02142014%davidharris@university:/davidharris/research/patientrecords/stats02142014
annasmith@labs:/patient-022678/visits/stats02282014%davidharris@university:/davidharris/research/patientrecords/stats02282014
annasmith@labs:/patient-022678/visits/stats03032014%davidharris@university:/davidharris/research/patientrecords/stats03032014
annasmith@labs:/patient-022678/visits/stats03142014%davidharris@university:/davidharris/research/patientrecords/stats03142014
annasmith@labs:/patient-022678/visits/stats03282014%davidharris@university:/davidharris/research/patientrecords/stats03282014
Moab copies the seven patient record files from the hospital's labs server to the university cluster where the job will run prior to job start.
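For long lists like the one above, the recordlist file can be generated rather than written by hand. A minimal shell sketch, assuming the hypothetical hosts and paths from the example (none of these names come from Moab itself):

```shell
#!/bin/sh
# Build a recordlist of <source>%<destination> pairs for msub --stageinfile.
# The users, hosts, and paths below are illustrative only.
src_base="annasmith@labs:/patient-022678/visits"
dst_base="davidharris@university:/davidharris/research/patientrecords"
out=recordlist

: > "$out"   # truncate any previous list
for stamp in stats02032014 stats02142014 stats02282014; do
    # %% in the printf format emits the literal % separator Moab expects
    printf '%s/%s%%%s/%s\n' "$src_base" "$stamp" "$dst_base" "$stamp" >> "$out"
done
```

Each generated line matches the [<user>@]<host>:/<path>%[<user>@]<host>:/<path> format shown above.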
The --stageinsize/--stageoutsize option lets you specify the estimated size of the files and/or directories, which helps Moab calculate the transfer time more quickly and accurately and therefore schedule your job correctly. If you are staging data out, setting --stageoutsize is required. If you provide a bare integer, Moab assumes the number is in megabytes; to use a different unit, append a unit suffix. See Stage in or out file size for more information.
> msub --stageinfile=/davidharris/research/recordlist --stageinsize=100 <jobScript>
Moab copies the files listed in /davidharris/research/recordlist, which total approximately 100 megabytes, from the biology node to the host where the job will run prior to job start.
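Staging data out works the same way, except that --stageoutsize is required. A hedged example, assuming hypothetical paths and a gigabyte unit suffix (adjust the suffix to whatever units your site uses):

```
> msub --stageout=davidharris@university:/davidharris/research/results/%annasmith@labs:/patient-022678/results --stageoutsize=1GB <jobScript>
```

Moab copies the results directory from the university cluster back to the hospital's labs server after the job completes.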
To see the status, errors, and other details associated with your data staging job, run checkjob -v. See "checkjob" for details.
Related Topics