24.7 Configuring Data Staging with Advanced Options

24.7.1 Using a Different Default Template Name

When you submit a data staging job, a data staging job template is attached to the job automatically. In the reference script configuration, the default template name is ds. This is the template that will be attached to the compute job by the client msub filter.

To change the name of the default template that is automatically attached, change the value of DEFAULT_TEMPLATE in the ds_config.py file installed on each client submit host. This name must match the master data staging template name specified in the Moab configuration file.

To configure the DEFAULT_TEMPLATE variable

  1. Open the ds_config.py file for modification. It is located in /opt/moab/tools/data-staging/ by default.
    [moab]$ vi /opt/moab/tools/data-staging/ds_config.py
  2. Locate the DEFAULT_TEMPLATE parameter.
    ...
    DEFAULT_TEMPLATE = "ds"
    ...
  3. Replace the template name with the one specified in the Moab configuration file.

    ds_config.py

    ...
    DEFAULT_TEMPLATE = "datastaging"
    ...
     
    moab.cfg
    ...
    JOBCFG[datastaging] TEMPLATEDEPEND=...
  4. Make these changes on all client submit hosts.
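Because the two files are edited separately, a mismatch between DEFAULT_TEMPLATE and the JOBCFG entry is easy to introduce. The following is a hypothetical sanity-check sketch (not part of the shipped tools) that verifies the name configured in ds_config.py has a matching JOBCFG entry in moab.cfg:

```python
import re

# Hypothetical helper: confirm the DEFAULT_TEMPLATE name in ds_config.py
# text matches a JOBCFG[...] master template entry in moab.cfg text.
def template_names_match(ds_config_text, moab_cfg_text):
    m = re.search(r'DEFAULT_TEMPLATE\s*=\s*"([^"]+)"', ds_config_text)
    if not m:
        return False
    name = m.group(1)
    # Look for a JOBCFG[<name>] line referencing the same template name.
    return re.search(r'JOBCFG\[%s\]' % re.escape(name), moab_cfg_text) is not None
```

Run against the example configuration above, `template_names_match` would confirm that "datastaging" appears in both files.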

24.7.2 Supporting Multiple File Transfer Script Utilities in a Grid on a Per-Partition Basis

If you want a different transfer script to run depending on the partition a job is submitted to, you can configure a multiplexer script that dispatches execution to the appropriate transfer script for each partition.

To support multiple file transfer script utilities in a grid on a per-partition basis

  1. Configure the trigger in your job templates in moab.cfg to run ds_move_multiplex instead of ds_move_rsync or ds_move_scp.
  2. Configure the PARTITION_TO_SCRIPT variable in ds_config.py to provide a mapping from each partition to the desired script to run.
    1. Open the ds_config.py file for modification. It is located in /opt/moab/tools/data-staging/ by default.
      [moab]$ vi /opt/moab/tools/data-staging/ds_config.py
    2. Locate the PARTITION_TO_SCRIPT parameter.
      ...
      PARTITION_TO_SCRIPT = {
          "partition_1_name": "/opt/moab/tools/data-staging/ds_move_rsync",
          "partition_2_name": "/opt/moab/tools/data-staging/ds_move_scp",
          "partition_3_name": "/opt/moab/tools/data-staging/ds_move_rsync"}
      ...
    3. Replace the partition_*_name keys with partitions that exist in your configuration. For each partition, specify the script that you want to execute for jobs submitted to that partition.
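The dispatch logic of such a multiplexer can be sketched in a few lines. This is an illustrative example, not the shipped ds_move_multiplex script; the mapping mirrors the PARTITION_TO_SCRIPT entries above, and the fallback behavior is an assumption:

```python
# Illustrative mapping, as it would appear in ds_config.py.
PARTITION_TO_SCRIPT = {
    "partition_1_name": "/opt/moab/tools/data-staging/ds_move_rsync",
    "partition_2_name": "/opt/moab/tools/data-staging/ds_move_scp",
}

def script_for_partition(partition, default=None):
    # Look up the transfer script mapped to this partition; fall back to
    # a caller-supplied default (or raise) when the partition is unmapped.
    try:
        return PARTITION_TO_SCRIPT[partition]
    except KeyError:
        if default is not None:
            return default
        raise ValueError("no transfer script mapped for partition %r" % partition)
```

A multiplexer built this way would then exec the returned path, forwarding its own command-line arguments to the selected transfer script.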

24.7.3 Receiving Notification at the Completion of the Data Staging Job

If you want explicit notification when the stage out job fails, add an additional trigger to the dsout job template that sends an email notification to the job's submitter. For more information, see Using a Trigger to Send Email.

JOBCFG[dsout]  DATASTAGINGSYSJOB=TRUE
JOBCFG[dsout]  GRES=bandwidth:1
JOBCFG[dsout]  FLAGS=GRESONLY
JOBCFG[dsout]  TRIGGER=EType=start,AType=exec,Action="/opt/moab/tools/data-staging/ds_move_rsync --stageout",Flags=attacherror:objectxmlstdin:user
JOBCFG[dsout]  TRIGGER=EType=fail,AType=mail,Action="Your (stageout) data staging job $OID failed."

The exec trigger must be listed first in the template configuration; add the email trigger and any other triggers after it. You can modify the email trigger to run at completion rather than at failure. You can also add this type of trigger to stage in jobs.
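For example, a completion-notification variant of the email trigger might look like the following (this fragment assumes Moab's end trigger event type; adapt the wording and event to your site's needs):

```
JOBCFG[dsout]  TRIGGER=EType=end,AType=mail,Action="Your (stageout) data staging job $OID completed."
```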

24.7.4 Adding a Non-Default Template via msub

You can define multiple data staging template workflows in moab.cfg. The submit filter adds only one of them by default. To use one of the other available templates, pass the -l template=TEMPLATENAME option to the msub command:

Given the following moab.cfg:

#Default data staging template:
 
JOBCFG[ds]     TEMPLATEDEPEND=AFTEROK:dsin TEMPLATEDEPEND=BEFORE:dsout SELECT=TRUE
JOBCFG[dsin]   DATASTAGINGSYSJOB=TRUE
JOBCFG[dsin]   GRES=bandwidth:2
JOBCFG[dsin]   FLAGS=GRESONLY
JOBCFG[dsin]   TRIGGER=EType=start,AType=exec,Action="/opt/moab/tools/data-staging/ds_move_rsync --stagein",Flags=attacherror:objectxmlstdin:user
 
JOBCFG[dsout]  DATASTAGINGSYSJOB=TRUE
JOBCFG[dsout]  GRES=bandwidth:1
JOBCFG[dsout]  FLAGS=GRESONLY
JOBCFG[dsout]  TRIGGER=EType=start,AType=exec,Action="/opt/moab/tools/data-staging/ds_move_rsync --stageout",Flags=attacherror:objectxmlstdin:user
 
#experimental data staging template:
 
JOBCFG[dscustom]     TEMPLATEDEPEND=AFTEROK:dscustomin TEMPLATEDEPEND=BEFORE:dscustomout SELECT=TRUE
JOBCFG[dscustomin]   DATASTAGINGSYSJOB=TRUE
JOBCFG[dscustomin]   GRES=bandwidth:2
JOBCFG[dscustomin]   FLAGS=GRESONLY
JOBCFG[dscustomin]   TRIGGER=EType=start,AType=exec,Action="/opt/moab/tools/data-staging/ds_move_custom --stagein",Flags=attacherror:objectxmlstdin:user
 
JOBCFG[dscustomout]  DATASTAGINGSYSJOB=TRUE
JOBCFG[dscustomout]  GRES=bandwidth:1
JOBCFG[dscustomout]  FLAGS=GRESONLY
JOBCFG[dscustomout]  TRIGGER=EType=start,AType=exec,Action="/opt/moab/tools/data-staging/ds_move_custom --stageout",Flags=attacherror:objectxmlstdin:user

The user could submit a job using the custom data staging template with the following command:

[moab]$ msub -l template=dscustom …

24.7.5 Using msub to Return All the Job IDs in the Workflow at Submission Time

By default, msub prints only the compute job ID to stdout at submission time. To have msub also print the IDs of the jobs created as part of the data staging workflow template, use the --workflowjobids option:

$ echo sleep 60 | msub -l walltime=15 --workflowjobids
 
MoabA.3.dsin MoabA.3 MoabA.3.dsout

This is useful when you are writing your own workflow scripts and need to programmatically capture the stage out job ID for later use.
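As a sketch of that capture step, the snippet below splits the msub output shown above and picks out the stage-out job ID, assuming the reference template naming where stage-out job IDs end in ".dsout":

```python
# Sketch: find the stage-out job ID in `msub --workflowjobids` output,
# assuming the reference templates name it with a ".dsout" suffix.
def find_stageout_job(msub_output):
    for job_id in msub_output.split():
        if job_id.endswith(".dsout"):
            return job_id
    return None  # no stage-out job in this workflow
```

For the example output above, this returns "MoabA.3.dsout".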

© 2016 Adaptive Computing