You can stage data to or from a local compute node in an environment where each node in the cluster has its own local storage. This type of data staging makes data stored outside the cluster available to a job that runs on a single node in the cluster. You must specify the user name, host name, and path of the source file or directory, and a location on the compute node where Moab will place the data. You supply the remote data source location at job submission time, but you must use the $JOBHOST placeholder for the name of the compute node. After the job runs, you can also copy data from the local file system back to a remote file system.
Image 13-2: Data staging to or from a local compute node
Before staging data to or from a local compute node, follow the procedure in Configuring data staging.
To stage data to or from a local compute node
Add INHERITRES=TRUE to reserve the compute node for the data staging job. This prevents other compute jobs from using the node at the same time and creating input, output, and disk conflicts with the data staging job.
If you use the rsync protocol, you can configure your data staging jobs to report the actual number of bytes transferred and the total data size to be transferred. To do so, set the Sets attribute to ^BYTES_IN.^DATA_SIZE_IN for stage-in jobs and ^BYTES_OUT.^DATA_SIZE_OUT for stage-out jobs. For example, a stage-in trigger would look like the following:
JOBCFG[dsin] TRIGGER=EType=start,AType=exec,Action="/opt/moab/tools/data-staging/ds_move_rsync --stagein",Flags=objectxmlstdin:user:attacherror,Sets=^BYTES_IN.^DATA_SIZE_IN
A stage-out trigger would look like the following:
JOBCFG[dsout] TRIGGER=EType=start,AType=exec,Action="/opt/moab/tools/data-staging/ds_move_rsync --stageout",Flags=objectxmlstdin:user:attacherror,Sets=^BYTES_OUT.^DATA_SIZE_OUT
These variables appear as events if you set the WIKIEVENTS parameter to TRUE.
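For reference, WIKI-format event logging is controlled by a single parameter in moab.cfg; a minimal sketch:

```
# moab.cfg (sketch): record events, including these variables, in WIKI format
WIKIEVENTS TRUE
```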
JOBCFG[ds] TEMPLATEDEPEND=AFTEROK:dsin TEMPLATEDEPEND=BEFORE:dsout SELECT=TRUE
JOBCFG[dsin] DATASTAGINGSYSJOB=TRUE
JOBCFG[dsin] GRES=bandwidth:2
JOBCFG[dsin] NODEACCESSPOLICY=SINGLEJOB
JOBCFG[dsin] INHERITRES=TRUE
JOBCFG[dsin] TRIGGER=EType=start,AType=exec,Action="/opt/moab/tools/data-staging/ds_move_rsync --stagein",Flags=attacherror:objectxmlstdin:user
JOBCFG[dsout] DATASTAGINGSYSJOB=TRUE
JOBCFG[dsout] GRES=bandwidth:1
JOBCFG[dsout] NODEACCESSPOLICY=SINGLEJOB
JOBCFG[dsout] INHERITRES=TRUE
JOBCFG[dsout] TRIGGER=EType=start,AType=exec,Action="/opt/moab/tools/data-staging/ds_move_rsync --stageout",Flags=attacherror:objectxmlstdin:user
> msub -l naccesspolicy=singlejob... <jobScript>
If the destination partition is down or does not have configured resources, the data staging workflow submission will fail.
If you do not know the host where the job will run but want the data staged to the same location, you can use the $JOBHOST variable in place of a host.
> msub --stagein=annasmith@labs:/patient-022678/%\$JOBHOST:/davidharris/research/patientrecords <jobScript>
Moab copies the /patient-022678 directory from the hospital's labs server to the node where the job will run prior to job start.
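Note the backslash before $JOBHOST in the command above: the placeholder must reach Moab literally rather than being expanded by the submitting shell. A small sketch of two equivalent quoting styles (the paths are the example paths from this topic):

```shell
# Escape $JOBHOST from the shell with a backslash, or single-quote the
# whole argument; both pass the literal placeholder through to msub.
arg1=--stagein=annasmith@labs:/patient-022678/%\$JOBHOST:/davidharris/research/patientrecords
arg2='--stagein=annasmith@labs:/patient-022678/%$JOBHOST:/davidharris/research/patientrecords'
[ "$arg1" = "$arg2" ] && echo "identical after shell quoting"
```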
If you want to stage multiple files or directories in one job, submit the job with the --stageinfile option and specify the <path>/<fileName> of a list file. The file must contain at least one line with this format: <source>%<destination>, where <source> and <destination> are both [<user>@]<host>:/<path>[/<fileName>]. See Staging multiple files or directories for more information.
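Each line of the list file can be checked against this format before submission. A minimal sketch, assuming standard grep with extended regular expressions; the sample lines are the example lines from this topic:

```shell
# Sketch: validate a stage-in list file against the documented line format
# <source>%<destination>, where each side is [<user>@]<host>:/<path>[/<fileName>].
recordlist=$(mktemp)
cat > "$recordlist" <<'EOF'
annasmith@labs:/patient-022678/tests/blood02282014%$JOBHOST:/davidharris/research/patientrecords/blood02282014
annasmith@labs:/patient-022678/visits/stats02032014%$JOBHOST:/davidharris/research/patientrecords/stats02032014
EOF
# One "side" of a line: optional user@, a host (the $JOBHOST placeholder is
# allowed), then :/ and a path containing no % delimiter.
side='([A-Za-z0-9._-]+@)?[A-Za-z0-9._$-]+:/[^%]*'
if grep -Evq "^${side}%${side}\$" "$recordlist"; then
  echo "malformed lines:"; grep -Evn "^${side}%${side}\$" "$recordlist"
else
  echo "all lines OK"
fi
rm -f "$recordlist"
```

For the sample file above, the check prints "all lines OK"; a line missing the % delimiter or a host prefix would be listed with its line number.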
> msub --stageinfile=/davidharris/research/recordlist <jobScript>
Moab copies all files specified in the /davidharris/research/recordlist file to the host where the job will run prior to job start.
/davidharris/research/recordlist:
annasmith@labs:/patient-022678/tests/blood02282014%$JOBHOST:/davidharris/research/patientrecords/blood02282014
annasmith@labs:/patient-022678/visits/stats02032014%$JOBHOST:/davidharris/research/patientrecords/stats02032014
annasmith@labs:/patient-022678/visits/stats02142014%$JOBHOST:/davidharris/research/patientrecords/stats02142014
annasmith@labs:/patient-022678/visits/stats02282014%$JOBHOST:/davidharris/research/patientrecords/stats02282014
annasmith@labs:/patient-022678/visits/stats03032014%$JOBHOST:/davidharris/research/patientrecords/stats03032014
annasmith@labs:/patient-022678/visits/stats03142014%$JOBHOST:/davidharris/research/patientrecords/stats03142014
annasmith@labs:/patient-022678/visits/stats03282014%$JOBHOST:/davidharris/research/patientrecords/stats03282014
Moab copies the seven patient record files from the hospital's labs server to the host where the job will run prior to job start.
> msub --stageinfile=/davidharris/research/recordlist --stageinsize=100 <jobScript>
Moab copies the files specified in the /davidharris/research/recordlist file, which total approximately 100 megabytes, to the host where the job will run prior to job start.
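The --stageinsize value (in megabytes) must be supplied by the user. A minimal sketch of computing it, assuming the source paths are readable from the submit host (a truly remote source would need the du call wrapped in ssh); the list-file format is the one described above:

```shell
# Sketch: sum the sizes of the <source> files named in a stage-in list file
# and round up to whole megabytes for --stageinsize. Assumes the source
# paths are readable locally.
recordlist=$(mktemp)
samplefile=$(mktemp)
dd if=/dev/zero of="$samplefile" bs=1024 count=512 2>/dev/null   # 0.5 MB sample
printf 'user@host:%s%%$JOBHOST:/tmp/dest\n' "$samplefile" > "$recordlist"
total_kb=0
while IFS='%' read -r src _dst; do
  path=${src#*:}                                  # strip [<user>@]<host>:
  kb=$(du -k "$path" 2>/dev/null | awk '{print $1}')
  total_kb=$((total_kb + ${kb:-0}))
done < "$recordlist"
echo "--stageinsize=$(( (total_kb + 1023) / 1024 ))"   # round up to MB
rm -f "$recordlist" "$samplefile"
```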
Your checkjob output may include a warning that says "req 1 RM (internal) does not match job destination RM". You can safely ignore this message.