Moab Workload Manager

17.12 Grid Data Management

17.12.1 Grid Data Management Overview

Moab provides a highly generalized data manager interface that allows both simple and advanced data management services to be used to migrate data among peer clusters. Using a flexible script interface, services such as scp, NFS, and GridFTP can be used to address data staging needs. This feature enables a Moab peer to push job data to a destination Moab peer.

17.12.2 Configuring Peer Data Staging

Moab offers a simple, automatic configuration, as well as advanced configuration options. At a high level, configuring data staging across a peer-to-peer relationship consists of configuring one or more storage managers, associating them with the appropriate peer resource managers, and then specifying data requirements at the local level—when the job is submitted.

To use the data staging features, you must specify the --with-grid option at ./configure time. After properly configuring data staging, you can submit a job to the peer as any user who has SSH keys set up, and Moab will automatically (implicitly) stage back the standard output and standard error files created by the job. Files can be explicitly staged in or out before a job runs by using the mstagein or mstageout options of msub.
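
For example, a minimal submission from the source peer might look like the following; myjob.cmd and the resource list are hypothetical, and the standard output and standard error files the job creates are staged back automatically when it completes:

msub -l nodes=1,walltime=00:10:00 myjob.cmd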

Automatic Configuration

Moab performs most of the data staging configuration automatically, based on a simplified set of parameters (the most common defaults) in the configuration file (moab.cfg).

Do the following to configure peer data staging:

  1. Configure at least two Moab clusters to work in a grid. Refer to 17.0 Moab Workload Manager for Grids for help on configuring Moab clusters to work together as peers in a grid.

  2. Set up SSH keys so that users on the source grid peer can SSH to destination peers without the need for a password.

  3. Make the necessary changes to the moab.cfg file of the source grid peer to activate data staging, which involves creating a new data resource manager definition within Moab. The resource manager provides data staging services to existing peers in the grid. When the data resource manager is defined in moab.cfg, Moab automatically sets up all of the necessary data staging auxiliary scripts.

    Use the following syntax for defining a data resource manager:

    RMCFG[<RMName>] TYPE=NATIVE RESOURCETYPE=STORAGE VARIABLES=DATASPACEUSER=<DataSpaceUser>,DATASPACEDIR=<DataSpaceDir> SERVER=<DataServer>
    

    • <RMName>: Name of the RM (defined as a storage RM type by RESOURCETYPE=STORAGE).
    • <DataSpaceUser>: User used to SSH into <DataServer> to determine available space in <DataSpaceDir>. Moab runs a command similar to the following:
        ssh <DataServer> -l <DataSpaceUser> df <DataSpaceDir>
    • <DataSpaceDir>: Directory where staged data is stored.
    • <DataServer>: Name of the server where <DataSpaceDir> is located.
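
    For example, using the same values as the sample configuration in the next step (DATASPACEUSER=datauser, DATASPACEDIR=/tmp, SERVER=clusterhead), Moab would run a space check similar to:

        ssh clusterhead -l datauser df /tmp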

  4. Associate the data resource manager with a peer resource manager. For example:

    RMCFG[remote_data] TYPE=NATIVE RESOURCETYPE=STORAGE VARIABLES=DATASPACEUSER=datauser,DATASPACEDIR=/tmp SERVER=clusterhead
    RMCFG[remote_cluster] TYPE=MOAB SERVER=clusterhead:42559 DATARM=remote_data

  5. Restart Moab to finalize changes. You can use the mschedctl -R command to cause Moab to automatically restart and load the changes.

    When restarting, Moab recognizes the added configuration and runs a Perl script in the Moab tool directory that configures the external scripts (also found in the tools directory) that Moab uses to perform data staging. You can view the data staging configuration by looking at the config.dstage.pl file in $MOABTOOLSDIR; this file is generated from the config.dstage.pl.dist file each time Moab restarts. Moab replaces any strings of the form @...@ with appropriate values.
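
    For example, the following sequence applies the change and lets you confirm that the generated script configuration was filled in (substitute the actual path of your Moab tools directory for $MOABTOOLSDIR):

    > mschedctl -R
    > cat $MOABTOOLSDIR/config.dstage.pl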

Advanced Configuration

If you need a more customized data staging setup, send email to the sales team at sales@adaptivecomputing.com.

17.12.3 Diagnostics

Verify data staging is properly configured by using the following diagnostic commands:

  • mdiag -R -v: Displays the status of the storage manager. Notice that the automatic configuration sets up the necessary *URLs.

    > mdiag -R -v data
    diagnosing resource managers
    
    RM[data]      State: Active  Type: NATIVE:AGFULL  ResourceType: STORAGE
      Server:             keche
      Timeout:            30000.00 ms
      Cluster Query URL:  exec://$TOOLSDIR/cluster.query.dstage.pl
      RM Initialize URL:  exec://$TOOLSDIR/setup.config.pl
      System Modify URL:  exec://$TOOLSDIR/system.modify.dstage.pl
      System Query URL:   exec://$TOOLSDIR/system.query.dstage.pl
      Nodes Reported:     1 (scp://keche//tmp/)
      Partition:          SHARED
      Event Management:   (event interface disabled)
      Variables:          DATASPACEUSER=root,DATASPACEDIR=/tmp
      RM Languages:       NATIVE
      RM Sub-Languages:   -

  • checknode -v: Running this command against the storage node displays the data staging operations associated with the node and its disk usage.

    Note The number of bytes transferred for each file is currently not used.

    > checknode -v scp://keche//tmp/
    node scp://keche//tmp/
    
    State:      Idle  (in current state for 00:00:13)
    Configured Resources: DISK: 578G
    Utilized   Resources: DISK: 316G
    Dedicated  Resources: ---
      MTBF(longterm):   INFINITY  MTBF(24h):   INFINITY
    Active Data Staging Operations:  
      job         native.2  complete (1 bytes transferred)  (/home/brian/stage.txt)
      job         native.3  pending (1 bytes)  (/home/brian/stage.txt)
    
    Dedicated Storage Manager Disk Usage:  0 of 592235 MB
    Cluster Query URL:  exec://$TOOLSDIR/cluster.query.dstage.pl
    Partition:  SHARED  Rack/Slot:  ---
    Flags:      rmdetected
    RM[data]:   TYPE=NATIVE:AGFULL
    EffNodeAccessPolicy: SHARED
    
    Total Time: 00:12:15  Up: 00:12:15 (100.00%)  Active: 00:00:00 (0.00%)
    
    Reservations:  ---

  • mdiag -n: Displays the state of the storage node.

    > mdiag -n
    compute node summary
    Name                    State   Procs      Memory         Opsys
    
    compute1                 Idle    4:4      3006:3006       linux
    compute2                 Down    0:4      3006:3006       linux
    scp://keche//tmp/        Idle    0:0         0:0              -
    -----                     ---    4:8      6012:6012       -----
    
    Total Nodes: 3  (Active: 0  Idle: 2  Down: 1)
    

  • checkjob -v: Displays the status of the staging request.

    Note The remaining time and file size information is currently not used. Use this output only to see file locations and whether or not the file has been staged.

    > checkjob -v jobid
    
    ...
    
    Stage-In Requirements:
    
      localhost:/home/brian/stage.txt => keche:/tmp/staged.txt  size:0B  status:[NONE]  remaining:00:00:01
        Transfer URL: file:///home/brian/stage.txt,ssh://keche/tmp/staged.txt
    
    ...

17.12.4 Peer-to-Peer SCP Key Authentication

To use scp as the data staging protocol, you must create SSH keys that allow users to copy files between the two peers without the need for passwords. For example, if UserA is present on the source peer and UserA's counterpart on the destination peer is UserB, then UserA must create an SSH key and configure UserB's account to allow password-less copying. This enables UserA to copy files to and from the destination peer using Moab's data staging capabilities.

Another common scenario is that several users on the source peer are mapped to a single user on the destination peer. In this case, each user on the source peer must create keys and set them up with the user on the destination peer. The following steps can be used to set up SSH keys between two (or more) peers:

Note These instructions were written for OpenSSH version 3.6 and might not work correctly for older versions.

Generate SSH Key on Source Peer

As the user who will be submitting jobs on the source peer, run the following command:

ssh-keygen -t rsa

You will be prompted for an optional passphrase; press Enter to leave it empty and accept the defaults for the other settings. When finished, this command creates two files, id_rsa and id_rsa.pub, inside the user's ~/.ssh/ directory.

Copy the Public SSH Key to the Destination Peer

Transfer the newly created public key (id_rsa.pub) to the destination peer:

scp ~/.ssh/id_rsa.pub ${DESTPEERHOST}:~

Disable Strict SSH Checking on Source Peer (Optional)

By appending the following to your ~/.ssh/config file, you can disable the SSH prompts that ask whether to add new hosts to the known hosts file. (These prompts can often cause problems with data staging functionality.) Note that ${DESTPEERHOST} should be the name of the host machine running the destination peer:

Host ${DESTPEERHOST}
CheckHostIP no
StrictHostKeyChecking no
BatchMode yes

Configure Destination Peer User

Now, log in to the destination peer as the destination user and set up the newly created public key to be trusted:

ssh ${DESTPEERUSER}@${DESTPEERHOST}
mkdir -p .ssh; chmod 700 .ssh
cat id_rsa.pub >> .ssh/authorized_keys
chmod 600 .ssh/authorized_keys
rm id_rsa.pub

If multiple source users map to a single destination user, then repeat the above commands for each source user's SSH public key.
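
For example, if two source users' public keys (the file names below are hypothetical) must both be trusted by the single destination user, append each key to the same authorized_keys file:

cat userA_id_rsa.pub userB_id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
rm userA_id_rsa.pub userB_id_rsa.pub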

Configure SSH Daemon on Destination Peer

Some configuration of the SSH daemon may be required on the destination peer. Typically, this is done by editing the /etc/ssh/sshd_config file. To verify correct configuration, see that the following attributes are set (not commented):

RSAAuthentication    yes
PubkeyAuthentication yes
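
Before restarting, you can optionally check the edited file for syntax errors; the sshd path below is the common default and may differ on your system:

/usr/sbin/sshd -t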

If configuration changes were required, the SSH daemon will need to be restarted:

/etc/init.d/sshd restart

Validate Correct SSH Configuration

If everything is properly configured, issuing the following command from the source peer should succeed without requiring a password:

scp ${DESTPEERHOST}:/etc/motd /tmp/
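
If the copy still prompts for a password, running ssh in verbose mode from the source peer can help show which keys and authentication methods are being tried:

ssh -v ${DESTPEERUSER}@${DESTPEERHOST} true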