This method of data staging has been deprecated in Moab Workload Manager 8.1.0 and will be removed from the product in a future release. See About Data Staging for information about the new method of staging data.
Moab provides a highly generalized data manager interface that can allow both simple and advanced data management services to be used to migrate data amongst peer clusters. Using a flexible script interface, services such as scp, NFS, and gridftp can be used to address data staging needs. This feature enables a Moab peer to push job data to a destination Moab peer.
Moab offers a simple, automatic configuration, as well as advanced configuration options. At a high level, configuring data staging across a peer-to-peer relationship consists of configuring one or more storage managers, associating them with the appropriate peer resource managers, and then specifying data requirements at the local level—when the job is submitted.
To use the data staging features, you must specify the --with-grid option at ./configure time. After properly configuring data staging, you can submit a job to the peer as any user who has SSH keys set up, and Moab will automatically (implicitly) stage back the standard output and standard error files created by the job. Files can also be staged in or out before a job runs by using the mstagein or mstageout options of msub.
Simple Configuration
Moab automatically does most of the data staging configuration based on a simplified set of parameters (most common defaults) in the configuration file (moab.cfg).
Do the following to configure peer data staging:
Configure at least two Moab clusters to work in a grid. Please refer to information throughout Moab Workload Manager for Grids for help on configuring Moab clusters to work together as peers in a grid.
Set up SSH keys so that users on the source grid peer can SSH to destination peers without the need for a password.
Make necessary changes to the moab.cfg file of the source grid peer to activate data staging, which involves creating a new data resource manager definition within Moab. The resource manager provides data staging services to existing peers in the grid. By defining the data resource manager within the moab.cfg, Moab automatically sets up all of the necessary data staging auxiliary scripts.
Use the following syntax for defining a data resource manager:
RMCFG[<RMName>] TYPE=NATIVE RESOURCETYPE=STORAGE VARIABLES=DATASPACEUSER=<DataSpaceUser>,DATASPACEDIR=<DataSpaceDir> SERVER=<DataServer>
<DataSpaceUser>: User used to SSH into <DataServer> to determine available space in <DataSpaceDir>. Moab runs a command similar to the following:
ssh <DataServer> -l <DataSpaceUser> df <DataSpaceDir>
Define the following URLs:
RMCFG[data] CLUSTERQUERYURL=exec://$TOOLSDIR/cluster.query.dstage.pl
RMCFG[data] SYSTEMMODIFYURL=exec://$TOOLSDIR/system.modify.dstage.pl
RMCFG[data] SYSTEMQUERYURL=exec://$TOOLSDIR/system.query.dstage.pl
RMCFG[data] RMINITIALIZEURL=exec://$TOOLSDIR/setup.config.pl

RMCFG[remote_data] TYPE=NATIVE RESOURCETYPE=STORAGE VARIABLES=DATASPACEUSER=datauser,DATASPACEDIR=/tmp SERVER=clusterhead
RMCFG[remote_cluster] TYPE=MOAB SERVER=clusterhead:42559 DATARM=remote_data
When restarting, Moab recognizes the added configuration and runs a Perl script in the Moab tool directory that configures the external scripts (also found in the tools directory) that Moab uses to perform data staging. You can view the data staging configuration by looking at the config.dstage.pl file in $MOABHOMEDIR/etc.
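The free-space query that Moab issues against <DataServer> can be reproduced by hand when troubleshooting a new data resource manager definition. The following is a minimal sketch, not Moab's own parsing logic: the function names avail_kb_from_df and check_remote_space are hypothetical, and the -P and -k options are an assumption added here (they are not part of the command shown earlier) so that df prints one line per file system in 1024-byte units.

```shell
#!/bin/sh
# Hypothetical helper: read `df -Pk` output on stdin and print the
# "Available" column (in KB) of the first file system line.
avail_kb_from_df() {
    awk 'NR == 2 { print $4 }'
}

# Hypothetical wrapper that follows the command form used by Moab:
#   ssh <DataServer> -l <DataSpaceUser> df <DataSpaceDir>
# Arguments: data server, data space user, data space directory.
check_remote_space() {
    ssh "$1" -l "$2" df -Pk "$3" | avail_kb_from_df
}
```

With the example configuration above, running check_remote_space clusterhead datauser /tmp would print the free space in KB, provided password-less SSH for that user is already working.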
Advanced Configuration
If you need a more customized data staging setup, contact your account representative.
Peer-to-Peer SCP Key Authentication
In order to use scp as the data staging protocol, you must create SSH keys that allow users to copy files between the two peers without passwords. For example, if UserA is present on the source peer and the counterpart on the destination peer is UserB, then UserA must create an SSH key and configure UserB to allow password-less copying. This enables UserA to copy files to and from the destination peer using Moab's data staging capabilities.
Another common scenario is that several users present on the source peer are mapped to a single user on the destination peer. In this case, each user on the source peer must create keys and set them up with the user at the destination peer. The steps below can be used to set up SSH keys between two (or more) peers:
These instructions were written for OpenSSH version 3.6 and might not work correctly for older versions.
Generate SSH Key on Source Peer
As the user who will be submitting jobs on the source peer, run the following command:
ssh-keygen -t rsa
You will be prompted for an optional passphrase. Press Return to leave it empty and accept the defaults for the other prompts. When finished, this command will have created two files, id_rsa and id_rsa.pub, inside the user's ~/.ssh/ directory.
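For scripted setups, the same key can be generated without interactive prompts. This is a sketch under stated assumptions: generate_key_if_missing is a hypothetical helper name, and it relies on the standard ssh-keygen options -q (quiet), -N (new passphrase, empty here to match pressing Return at the prompt), and -f (output file). It refuses to overwrite an existing key.

```shell
#!/bin/sh
# Hypothetical helper: create an RSA key pair with an empty passphrase
# unless the private key file already exists.
# Argument: path of the private key (defaults to ~/.ssh/id_rsa).
generate_key_if_missing() {
    keyfile=${1:-$HOME/.ssh/id_rsa}
    if [ -e "$keyfile" ]; then
        echo "key already exists, not overwriting: $keyfile" >&2
        return 1
    fi
    mkdir -p "$(dirname "$keyfile")" || return 1
    # Writes $keyfile and $keyfile.pub without prompting.
    ssh-keygen -q -t rsa -N "" -f "$keyfile"
}
```

Run as the submitting user on the source peer, generate_key_if_missing with no argument targets the default ~/.ssh/id_rsa location used in the rest of this procedure.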
Copy the Public SSH Key to the Destination Peer
Transfer the newly created public key (id_rsa.pub) to the destination peer:
scp ~/.ssh/id_rsa.pub ${DESTPEERHOST}:~
Disable Strict SSH Checking on Source Peer (Optional)
By appending the following to your ~/.ssh/config file you can disable SSH prompts which ask to add new hosts to the "known hosts file." (These prompts can often cause problems with data staging functionality.) Note that the ${DESTPEERHOST} should be the name of the host machine running the destination peer:
Host ${DESTPEERHOST}
    CheckHostIP no
    StrictHostKeyChecking no
    BatchMode yes
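When several destination peers are involved, appending this block by hand can lead to duplicate entries. The sketch below adds the block only if no identical Host line is present yet. The function name add_staging_host_block is hypothetical, and the duplicate check is deliberately simple (an exact match on the Host line), so treat it as an illustration and not a complete ssh_config parser.

```shell
#!/bin/sh
# Hypothetical helper: append the non-interactive SSH settings for one
# destination peer to an SSH client config file, once only.
# Arguments: destination peer host, config file (defaults to ~/.ssh/config).
add_staging_host_block() {
    host=$1
    config=${2:-$HOME/.ssh/config}
    mkdir -p "$(dirname "$config")" || return 1
    touch "$config" || return 1
    # Skip if an identical "Host <name>" line already exists.
    if grep -qxF -e "Host $host" "$config"; then
        return 0
    fi
    cat >> "$config" <<EOF

Host $host
    CheckHostIP no
    StrictHostKeyChecking no
    BatchMode yes
EOF
    chmod 600 "$config"
}
```

Keep in mind that disabling strict host key checking trades security for convenience, which is why the step is marked optional.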
Configure Destination Peer User
Now, log in to the destination peer as the destination user and set up the newly created public key to be trusted:
ssh ${DESTPEERUSER}@${DESTPEERHOST}
mkdir -p .ssh; chmod 700 .ssh
cat id_rsa.pub >> .ssh/authorized_keys
chmod 600 .ssh/authorized_keys
rm id_rsa.pub
If multiple source users map to a single destination user, then repeat the above commands for each source user's SSH public key.
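Repeating the commands for many source users makes it easy to append the same key twice. The following sketch wraps the steps above in a function that skips keys already listed in authorized_keys. The name trust_pubkey and its optional second argument are hypothetical, and the function assumes each public key file holds a single line, as files produced by ssh-keygen do.

```shell
#!/bin/sh
# Hypothetical helper, run as the destination user on the destination peer.
# Arguments: public key file, SSH directory (defaults to ~/.ssh).
trust_pubkey() {
    pubfile=$1
    sshdir=${2:-$HOME/.ssh}
    authfile=$sshdir/authorized_keys
    mkdir -p "$sshdir" && chmod 700 "$sshdir" || return 1
    touch "$authfile" && chmod 600 "$authfile" || return 1
    # Append only if the exact key line is not already present.
    if ! grep -qxF -e "$(cat "$pubfile")" "$authfile"; then
        cat "$pubfile" >> "$authfile"
    fi
}
```

For example, if the copied keys were saved under distinct names such as userA.pub and userC.pub (hypothetical file names), calling trust_pubkey once per file adds each key exactly once.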
Configure SSH Daemon on Destination Peer
Some configuration of the SSH daemon may be required on the destination peer. Typically, this is done by editing the /etc/ssh/sshd_config file. To verify correct configuration, see that the following attributes are set (not commented):
---
RSAAuthentication yes
PubkeyAuthentication yes
---
If configuration changes were required, the SSH daemon will need to be restarted:
/etc/init.d/sshd restart
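The check described above (both attributes set and not commented out) can be scripted before deciding whether a restart is needed. This is a simplified sketch: sshd_option_enabled is a hypothetical helper that only looks for an uncommented "<option> yes" line in the given file. It does not account for compiled-in defaults, Match blocks, or which occurrence of a repeated option takes effect.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the file contains an uncommented
# "<option> yes" line (option name compared case-insensitively).
# Arguments: sshd config file, option name.
sshd_option_enabled() {
    awk -v opt="$2" '
        tolower($1) == tolower(opt) && tolower($2) == "yes" { found = 1 }
        END { exit !found }
    ' "$1"
}

# Example use on the destination peer (commented out so that sourcing
# this file has no side effects):
# for opt in RSAAuthentication PubkeyAuthentication; do
#     sshd_option_enabled /etc/ssh/sshd_config "$opt" || echo "$opt is not set to yes"
# done
```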
Validate Correct SSH Configuration
If everything is properly configured, the following command, issued on the source peer, should succeed without requiring a password:
scp ${DESTPEERHOST}:/etc/motd /tmp/
Verify data staging is properly configured by using the following diagnostic commands:
> mdiag -R -v data
diagnosing resource managers

RM[data]  State: Active  Type: NATIVE  ResourceType: STORAGE
  Server:             keche
  Timeout:            30000.00 ms
  Cluster Query URL:  exec://$TOOLSDIR/grid/cluster.query.dstage.pl
  RM Initialize URL:  exec://$TOOLSDIR/grid/setup.config.pl
  System Modify URL:  exec://$TOOLSDIR/grid/system.modify.dstage.pl
  System Query URL:   exec://$TOOLSDIR/grid/system.query.dstage.pl
  Nodes Reported:     1 (scp://keche//tmp/)
  Partition:          SHARED
  Event Management:   (event interface disabled)
  Variables:          DATASPACEUSER=root,DATASPACEDIR=/tmp
  RM Languages:       NATIVE
  RM Sub-Languages:   -
The number of bytes transferred for each file is currently not used.
> checknode -v scp://keche//tmp/
node scp://keche//tmp/

State:      Idle  (in current state for 00:00:13)
Configured Resources: DISK: 578G
Utilized   Resources: DISK: 316G
Dedicated  Resources: ---
  MTBF(longterm):   INFINITY  MTBF(24h):   INFINITY
Active Data Staging Operations:
  job native.2 complete (1 bytes transferred) (/home/brian/stage.txt)
  job native.3 pending (1 bytes) (/home/brian/stage.txt)
Dedicated Storage Manager Disk Usage:  0 of 592235 MB
Cluster Query URL: exec://$TOOLSDIR/grid/cluster.query.dstage.pl
Partition:  SHARED  Rack/Slot:  ---
Flags:      rmdetected
RM[data]:   TYPE=NATIVE
EffNodeAccessPolicy: SHARED

Total Time: 00:12:15  Up: 00:12:15 (100.00%)  Active: 00:00:00 (0.00%)

Reservations: ---

> mdiag -n
compute node summary
Name                State   Procs   Memory      Opsys

compute1            Idle    4:4     3006:3006   linux
compute2            Down    0:4     3006:3006   linux
scp://keche//tmp/   Idle    0:0     0:0         -
-----               ---     4:8     6012:6012   -----

Total Nodes: 3  (Active: 0  Idle: 2  Down: 1)
The remaining time and size of the file information is currently not used. The information should only be used to see file locations and whether the file has been staged or not.
> checkjob -v jobid
...
Stage-In Requirements:

  localhost:/home/brian/stage.txt => keche:/tmp/staged.txt  size:0B  status:[NONE]  remaining:00:00:01
    Transfer URL: file:///home/brian/stage.txt,ssh://keche/tmp/staged.txt
...
To ensure that SCP key authentication is properly configured, the following command must succeed; that is, it must print the df output without printing FAILED or prompting for a password:
su - <DATASPACEUSER> -c "/usr/bin/ssh <destination host> -l <DATASPACEUSER> 'df -k //tmp/ 2>&1 || echo FAILED'"