(Click to open topic with navigation)
If the prologue script executes successfully, it should exit with a zero status. Otherwise, the script should return the appropriate error code as defined in the table below. The pbs_mom will report the script's exit status to pbs_server which will in turn take the associated action. The following table describes each exit code for the prologue scripts and the action taken.
Error | Description | Action |
---|---|---|
-4 | The script timed out | Job will be requeued |
-3 | The wait(2) call returned an error | Job will be requeued |
-2 | Input file could not be opened | Job will be requeued |
-1 |
Permission error (script is not owned by root, or is writable by others) |
Job will be requeued |
0 | Successful completion | Job will run |
1 | Abort exit code | Job will be aborted |
>1 | other | Job will be requeued |
Example 4-30:
Following are example prologue and epilogue scripts that write the arguments passed to them in the job's standard out file:
prologue | |
---|---|
Script | #!/bin/sh
echo "Prologue Args:" echo "Job ID: $1" echo "User ID: $2" echo "Group ID: $3" echo "" exit 0 |
stdout | Prologue Args:
Job ID: 13724.node01 User ID: user1 Group ID: user1 |
epilogue | |
---|---|
Script | #!/bin/sh
echo "Epilogue Args:" echo "Job ID: $1" echo "User ID: $2" echo "Group ID: $3" echo "Job Name: $4" echo "Session ID: $5" echo "Resource List: $6" echo "Resources Used: $7" echo "Queue Name: $8" echo "Account String: $9" echo "" exit 0 |
stdout | Epilogue Args:
Job ID: 13724.node01 User ID: user1 Group ID: user1 Job Name: script.sh Session ID: 28244 Resource List: neednodes=node01,nodes=1,walltime=00:01:00 Resources Used: cput=00:00:00,mem=0kb,vmem=0kb,walltime=00:00:07 Queue Name: batch Account String: |
Example 4-31:
The Ohio Supercomputer Center contributed the following scripts:
"prologue creates a unique temporary directory on each node assigned to a job before the job begins to run, and epilogue deletes that directory after the job completes.
Having a separate temporary directory on each node is probably not as good as having a good, high performance parallel filesystem.
prologue
#!/bin/sh
# Create TMPDIR on all the nodes
# Copyright 1999, 2000, 2001 Ohio Supercomputer Center
# prologue gets 3 arguments:
# 1 -- jobid
# 2 -- userid
# 3 -- grpid
#
jobid=$1
user=$2
group=$3
nodefile=/var/spool/pbs/aux/$jobid
if [ -r $nodefile ] ; then
nodes=$(sort $nodefile | uniq)
else
nodes=localhost
fi
tmp=/tmp/pbstmp.$jobid
for i in $nodes ; do
ssh $i mkdir -m 700 $tmp \&\& chown $user.$group $tmp
done
exit 0
epilogue
#!/bin/sh
# Clear out TMPDIR
# Copyright 1999, 2000, 2001 Ohio Supercomputer Center
# epilogue gets 9 arguments:
# 1 -- jobid
# 2 -- userid
# 3 -- grpid
# 4 -- job name
# 5 -- sessionid
# 6 -- resource limits
# 7 -- resources used
# 8 -- queue
# 9 -- account
#
jobid=$1
nodefile=/var/spool/pbs/aux/$jobid
if [ -r $nodefile ] ; then
nodes=$(sort $nodefile | uniq)
else
nodes=localhost
fi
tmp=/tmp/pbstmp.$jobid
for i in $nodes ; do
ssh $i rm -rf $tmp
done
exit 0
prologue, prologue.user, and prologue.parallel scripts can have dramatic effects on job scheduling if written improperly.
Related Topics