
24.2 Configuring Data Staging

To configure data staging

  1. Verify that your firewall and network are correctly configured to allow the scripts to operate as designed.

  2. If you have not already done so, install the modules required to run the data staging scripts. python-paramiko is required for data staging, but python-mock is only required if you intend to run the unit test.
    > yum install python-paramiko python-mock
  3. If you have not already done so, follow the instructions in Configuring the SSH keys for the Data Staging Transfer Script.
  4. Ensure that the data staging scripts are installed on your system. To do so, list the contents of the /opt/moab/tools/data-staging directory. You should see the data staging README file, reference scripts, and other related files.
    > ls -l /opt/moab/tools/data-staging

    You can copy and modify the reference scripts and configuration files to meet your specific needs. See the README file packaged in the data-staging directory for information about modifying these files.

  5. Open your moab.cfg file for editing and do each of the following tasks:
    1. Configure the data staging msub filter, located in /opt/moab/tools/data-staging by default, as a client-side filter. See Applying the msub submit filter for more information.
      SUBMITFILTER /opt/moab/tools/data-staging/ds_filter

      The data staging filter checks the msub argument syntax to verify that the arguments make sense and are consistent; attempts a dry run connection via SSH and the file transfer utility to ensure that keys exist for the user on the necessary systems; and attempts to determine the size of the data that will be transferred.

      You can customize the script to meet your specific needs; the file contains detailed comments illustrating its default behavior to facilitate its modification. If you replace or modify the submit filter, it is your responsibility to ensure that the same functionality described in the paragraph above is present in your filter.

      Note that this filter defines a DEFAULT_TEMPLATE name, which should match the name of the master data staging template in moab.cfg. For more information, see Configuring Data Staging with Advanced Options.
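      The filter's checks can be illustrated with a short, hypothetical shell sketch. The paths and host names below are placeholders, not taken from ds_filter itself:

      ```shell
      # Hypothetical sketch of two of the filter's checks (ds_filter
      # implements these differently; names here are illustrative).

      # 1. Dry-run SSH connectivity check, verifying key-based access
      #    (commented out because "remote-host" is a placeholder):
      # ssh -o BatchMode=yes -o ConnectTimeout=5 remote-host true

      # 2. Determine the size of the data that will be transferred:
      stage_file=/tmp/stage-in-demo.dat
      head -c 1048576 /dev/zero > "$stage_file"   # 1 MiB sample file
      stat -c %s "$stage_file"                    # prints size in bytes: 1048576
      ```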

    2. Set the data staging bandwidth gmetric (DATASTAGINGBANDWIDTH_MBITS_PER_SEC) on each partition associated with an RM to the rate, in megabits per second, at which that partition's data staging network transfers data (see Per-Partition Settings for more information). Moab uses the specified rate and the data staging size specified at job submission (see Stage in or out file size for more information) to determine how long staging the data will take and to schedule the job as soon after data staging completes as possible.

      Example 24-1: Non-grid

      RMCFG[torque]  Type=pbs
      PARCFG[torque] GMETRIC[DATASTAGINGBANDWIDTH_MBITS_PER_SEC]=58

      Partition torque has a transfer rate of 58 megabits per second. Moab uses this rate to estimate how long staging the data in will take and to determine when to schedule the job that will use the data.

      Example 24-2: Grid

      RMCFG[m1] Type=Moab
      PARCFG[m1] GMETRIC[DATASTAGINGBANDWIDTH_MBITS_PER_SEC]=100

      Partition m1 has a transfer rate of 100 megabits per second. Moab uses this rate to estimate how long staging the data in will take and to determine when to schedule the job that will use the data.
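      The estimate Moab derives from the rate and the stage-in size is simple arithmetic. The following back-of-the-envelope sketch reproduces it for Example 24-1's 58 Mbit/s partition; the 725 MB stage-in size is a made-up illustration:

      ```shell
      # Rough staging-time estimate: seconds = bytes * 8 / (rate_mbits * 10^6)
      size_bytes=725000000   # hypothetical stage-in size (725 MB)
      rate_mbits=58          # transfer rate from Example 24-1
      echo $(( size_bytes * 8 / (rate_mbits * 1000000) ))   # prints 100 (seconds)
      ```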

    3. Set the bandwidth generic resource on all nodes to limit the total number of concurrent data staging jobs in your system.
      NODECFG[GLOBAL] GRES=bandwidth:10

      Data staging jobs can use up to 10 units of bandwidth on the system. You can specify the number of units consumed by each data staging job when you configure the data staging job templates.
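      As an illustration, a data staging job template could be configured to consume some of those bandwidth units. The template name below is hypothetical, and the exact attributes depend on your template setup:

      ```text
      # Hypothetical template: each data staging job submitted under it
      # consumes 2 of the 10 configured bandwidth units, so at most 5
      # such jobs run concurrently.
      JOBCFG[ds-in] GRES=bandwidth:2
      ```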

    4. Configure Moab with the JUSTINTIME job migration policy:
      JOBMIGRATEPOLICY JUSTINTIME

      Data staging requires JOBMIGRATEPOLICY JUSTINTIME to ensure that workflow job IDs are not altered upon submission.

  6. Install the msub client filter on all client submission hosts.


© 2016 Adaptive Computing