
24.6 Staging Data to or from a Compute Node

Before staging data to or from a local compute node, please follow the procedure in Configuring Data Staging.

To stage data to or from a local compute node

  1. If you have not already done so, configure your SSH keys and moab.cfg to support data staging. See Configuring the SSH keys for the Data Staging Transfer Script and Configuring Data Staging for more information.
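    A generic sketch of setting up passwordless SSH for the transfer script follows; the user and host names are hypothetical, and Configuring the SSH keys for the Data Staging Transfer Script remains the authoritative procedure:

    > ssh-keygen -t rsa
    > ssh-copy-id datauser@datastore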
  2. Create your job templates for data staging jobs in moab.cfg. The templates in the example below create a compute job that stages data in before it starts and stages data out when it completes. For more information about creating job templates, see About Job Templates.
    1. Create a selectable master template, called ds in the example below, that creates stage in and stage out system jobs. This name should match the DEFAULT_TEMPLATE value in ds_config.py. For more information, see Configuring Data Staging with Advanced Options.
    2. For the stage in job template, called dsin in the example below, specify that it creates a data staging job by setting DATASTAGINGSYSJOB to TRUE. Note that the name of this job template must match the name of the stage in job template referenced in the master template.
    3. Set the staging job template's bandwidth GRES to the amount of bandwidth a single stage in job should use. This value indicates how many units of the bandwidth resource defined with NODECFG[GLOBAL] in Configuring Data Staging each data staging job created from this template consumes (see the reminder following the template listing below).
    4. For local node data staging, it is important that the data staging job has the entire node to itself. To prevent Moab from scheduling another job on the node at the same time as the data staging job, set NODEACCESSPOLICY to SINGLEJOB in the staging job template.
    5. Add INHERITRES=TRUE to reserve the compute node for the data staging job to prevent other compute jobs from using the node at the same time and creating input, output, and disk conflicts with the data staging job.

    6. Create a trigger that executes the ds_move_scp, ds_move_rsync, or ds_move_multiplex script, depending on which file transfer utility you use. Set the attacherror, objectxmlstdin, and user flags so that any trigger stderr is attached as a message to the job, the job XML is passed to the script, and the script runs as the job's user, respectively.

      If you use the rsync protocol, you can configure your data staging jobs to report the actual number of bytes transferred and the total data size to be transferred. To do so, set the trigger's Sets attribute to ^BYTES_IN.^DATA_SIZE_IN for stage in jobs and ^BYTES_OUT.^DATA_SIZE_OUT for stage out jobs. For example, a stage in trigger would look like the following:

      JOBCFG[dsin]   TRIGGER=EType=start,AType=exec,Action="/opt/moab/tools/data-staging/ds_move_rsync --stagein",Flags=objectxmlstdin:user:attacherror,Sets=^BYTES_IN.^DATA_SIZE_IN

      A stage out trigger would look like the following:

      JOBCFG[dsout]   TRIGGER=EType=start,AType=exec,Action="/opt/moab/tools/data-staging/ds_move_rsync --stageout",Flags=objectxmlstdin:user:attacherror,Sets=^BYTES_OUT.^DATA_SIZE_OUT

      These variables show up as events if you set your WIKIEVENTS parameter to TRUE.
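      A minimal sketch of that setting in moab.cfg (WIKIEVENTS is a standard Moab parameter; the rest of your configuration stays the same):

      WIKIEVENTS TRUE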

    7. Create the stage out job template, called dsout in the example below, by repeating steps 2b - 2f in a new template. Note that the name of this job template must match the name of the stage out job template referenced in the data staging master template.
      JOBCFG[ds]     TEMPLATEDEPEND=AFTEROK:dsin TEMPLATEDEPEND=BEFORE:dsout SELECT=TRUE
       
      JOBCFG[dsin]   DATASTAGINGSYSJOB=TRUE
      JOBCFG[dsin]   GRES=bandwidth:2
      JOBCFG[dsin]   NODEACCESSPOLICY=SINGLEJOB
      JOBCFG[dsin]   INHERITRES=TRUE
      JOBCFG[dsin]   TRIGGER=EType=start,AType=exec,Action="/opt/moab/tools/data-staging/ds_move_rsync --stagein",Flags=attacherror:objectxmlstdin:user
       
      JOBCFG[dsout]  DATASTAGINGSYSJOB=TRUE
      JOBCFG[dsout]  GRES=bandwidth:1
      JOBCFG[dsout]  NODEACCESSPOLICY=SINGLEJOB
      JOBCFG[dsout]  INHERITRES=TRUE
      JOBCFG[dsout]  TRIGGER=EType=start,AType=exec,Action="/opt/moab/tools/data-staging/ds_move_rsync --stageout",Flags=attacherror:objectxmlstdin:user
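      As a reminder, the bandwidth units these templates consume are drawn from the cluster-wide pool defined in Configuring Data Staging with a line like the following; the pool size of 10 is an assumed example value, not a recommendation:

      NODECFG[GLOBAL] GRES=bandwidth:10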
  3. Create the job using msub, adding resources and specifying a script as you normally would. Then configure Moab to stage the data for it. To do so:
    1. If the compute job does not use all of the node's processors, Moab could schedule another job on the node at the same time. If you did not set NODEACCESSPOLICY to SINGLEJOB in your moab.cfg, set the policy for this job by adding -l naccesspolicy=singlejob to your msub command.
      > msub -l naccesspolicy=singlejob... <jobScript>
    2. At the end of the command, use the --stagein/--stageout option and/or --stageinfile/--stageoutfile option.
      • The --stagein/--stageout option lets you specify a single file or directory to stage in or out. You must set the option equal to <source>%<destination>, where <source> and <destination> are both of the form [<user>@]<host>:/<path>[/<fileName>]. See Staging a file or directory for format and details.

        If the destination partition is down or does not have configured resources, the data staging workflow submission will fail.

        If you do not know the host where the job will run but want the data staged to the same location, you can use the $JOBHOST variable in place of a host.

        > msub --stagein=annasmith@labs:/patient-022678/%\$JOBHOST:/davidharris/research/patientrecords <jobScript>

        Moab copies the /patient-022678 directory from the hospital's labs server to the node where the job will run prior to job start.
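
        Staging out works the same way with source and destination reversed; note that --stageoutsize (described in step 3c below) is always required when staging data out. A sketch with hypothetical paths:

        > msub --stageout=\$JOBHOST:/davidharris/research/results/%annasmith@labs:/patient-022678/results --stageoutsize=100 <jobScript>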

      • The --stageinfile/--stageoutfile option lets you specify a file that contains the file and directory name(s) to stage in or out. You must set the option equal to the <path>/<fileName> of this list file. The list file must contain at least one line with the format <source>%<destination>, where <source> and <destination> are both of the form [<user>@]<host>:/<path>[/<fileName>]. See Staging multiple files or directories for more information.

        If the destination partition is down or does not have configured resources, the data staging workflow submission will fail.

        > msub --stageinfile=/davidharris/research/recordlist <jobScript>

        Moab copies all files specified in the /davidharris/research/recordlist file to the host where the job will run prior to job start.

        /davidharris/research/recordlist:

        annasmith@labs:/patient-022678/tests/blood02282014%$JOBHOST:/davidharris/research/patientrecords/blood02282014
        annasmith@labs:/patient-022678/visits/stats02032014%$JOBHOST:/davidharris/research/patientrecords/stats02032014
        annasmith@labs:/patient-022678/visits/stats02142014%$JOBHOST:/davidharris/research/patientrecords/stats02142014
        annasmith@labs:/patient-022678/visits/stats02282014%$JOBHOST:/davidharris/research/patientrecords/stats02282014
        annasmith@labs:/patient-022678/visits/stats03032014%$JOBHOST:/davidharris/research/patientrecords/stats03032014
        annasmith@labs:/patient-022678/visits/stats03142014%$JOBHOST:/davidharris/research/patientrecords/stats03142014
        annasmith@labs:/patient-022678/visits/stats03282014%$JOBHOST:/davidharris/research/patientrecords/stats03282014

        Moab copies the seven patient record files from the hospital's labs server to the host where the job will run prior to job start.
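
        A stage out list file uses the same <source>%<destination> format with the direction reversed on each line. A sketch, assuming a hypothetical /davidharris/research/resultlist file and the required --stageoutsize:

        > msub --stageoutfile=/davidharris/research/resultlist --stageoutsize=100 <jobScript>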

    3. The --stageinsize/--stageoutsize option lets you specify the estimated size of the files and/or directories so that Moab can more quickly and accurately calculate how long the data staging will take and therefore schedule your job correctly. If you used the $JOBHOST variable to stage in, setting --stageinsize is required. --stageoutsize is always required for staging data out. If you provide a plain integer, Moab assumes the number is in megabytes; to use a different unit, append a unit suffix to the value. See Stage in or out file size for more information.
      > msub --stageinfile=/davidharris/research/recordlist --stageinsize=100 <jobScript>

      Moab copies the files listed in the /davidharris/research/recordlist file, which total approximately 100 megabytes, from the hospital's labs server to the host where the job will run prior to job start.
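
      To express the size in a different unit, append a suffix to the value; for example, assuming a GB suffix as described in Stage in or out file size:

      > msub --stageinfile=/davidharris/research/recordlist --stageinsize=2GB <jobScript>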

  4. To see the status, errors, and other details associated with your data staging job, run checkjob -v. See "checkjob" for details.
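    For example, where Moab.143 is a hypothetical job ID:

    > checkjob -v Moab.143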

    Your checkjob output may include a warning that says "req 1 RM (internal) does not match job destination RM". You can safely ignore this message.

