4.289 Prologue Error Processing

If the prologue script executes successfully, it should exit with a zero status. Otherwise, the script should return the appropriate error code as defined in the table below. pbs_mom reports the script's exit status to pbs_server, which in turn takes the associated action. The following table describes each prologue exit code and the action taken.

Exit code  Description                          Action
-4         The script timed out                 Job will be requeued
-3         The wait(2) call returned an error   Job will be requeued
-2         Input file could not be opened       Job will be requeued
-1         Permission error (script is not      Job will be requeued
           owned by root, or is writable by
           others)
0          Successful completion                Job will run
1          Abort exit code                      Job will be aborted
>1         Other                                Job will be requeued
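To check a prologue before installing it, an administrator might run it by hand and translate its exit status using the table above. The sketch below does that in POSIX sh; `prologue_action` and `dry_run_prologue` are hypothetical helpers, not part of TORQUE. Because a process exit status is limited to 0 to 255, the negative codes presumably describe failures detected by pbs_mom itself rather than values a script can return, so a script itself should exit 0, 1, or a value greater than 1.

```sh
#!/bin/sh
# Hypothetical helpers (not part of TORQUE) for dry-running a prologue.

# Print the pbs_server action for a prologue exit status, per the table.
prologue_action() {
    case "$1" in
        0) echo "run" ;;       # successful completion
        1) echo "abort" ;;     # abort exit code
        *) echo "requeue" ;;   # anything else: -4..-1 (pbs_mom errors) or > 1
    esac
}

# Run a prologue with sample arguments (job ID, user, group), discard its
# output, and report the action its exit status would trigger.
dry_run_prologue() {
    script=$1
    shift
    "$script" "$@" >/dev/null 2>&1
    prologue_action "$?"
}
```

For example, `dry_run_prologue ./prologue 13724.node01 user1 user1` prints `run` if the script exits 0. This does not emulate the ownership and permission check behind code -1.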

Example 4-200:  

The following example prologue and epilogue scripts write the arguments passed to them to the job's standard output file:

prologue

Script:

#!/bin/sh
echo "Prologue Args:"
echo "Job ID: $1"
echo "User ID: $2"
echo "Group ID: $3"
echo ""

exit 0

stdout:

Prologue Args:
Job ID: 13724.node01
User ID: user1
Group ID: user1
epilogue

Script:

#!/bin/sh
echo "Epilogue Args:"
echo "Job ID: $1"
echo "User ID: $2"
echo "Group ID: $3"
echo "Job Name: $4"
echo "Session ID: $5"
echo "Resource List: $6"
echo "Resources Used: $7"
echo "Queue Name: $8"
echo "Account String: $9"
echo ""

exit 0

stdout:

Epilogue Args:
Job ID: 13724.node01
User ID: user1
Group ID: user1
Job Name: script.sh
Session ID: 28244
Resource List: neednodes=node01,nodes=1,walltime=00:01:00
Resources Used: cput=00:00:00,mem=0kb,vmem=0kb,walltime=00:00:07
Queue Name: batch
Account String:
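The epilogue's sixth and seventh arguments arrive as comma-separated `name=value` lists, as the stdout above shows. To use a single value (for example, to log walltime), an epilogue could split the list as in this sketch. `resource_value` is a hypothetical helper, and it assumes no individual value contains a comma.

```sh
#!/bin/sh
# Hypothetical helper: print the value of one key from a comma-separated
# name=value list such as the epilogue's $6 (resource list) or $7
# (resources used). Prints nothing if the key is absent.
# Assumes individual values contain no commas.
resource_value() {
    printf '%s\n' "$1" | tr ',' '\n' | while IFS= read -r pair; do
        case "$pair" in
            # The key is quoted so it matches literally; the pattern is
            # anchored at the start, so "mem" does not match "vmem=...".
            "$2"=*) printf '%s\n' "${pair#*=}"; break ;;
        esac
    done
}
```

Inside the example epilogue, `echo "Walltime used: $(resource_value "$7" walltime)"` would print `Walltime used: 00:00:07` for the run shown above.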

Example 4-201:  

The Ohio Supercomputer Center contributed the following scripts:

"prologue creates a unique temporary directory on each node assigned to a job before the job begins to run, and epilogue deletes that directory after the job completes.

Having a separate temporary directory on each node is probably not as good as having a good, high-performance parallel file system."

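prologue

#!/bin/sh
# Create TMPDIR on all the nodes
# Copyright 1999, 2000, 2001 Ohio Supercomputer Center
# prologue gets 3 arguments:
# 1 -- jobid
# 2 -- userid
# 3 -- grpid
#
jobid=$1
user=$2
group=$3
# Node file written by pbs_mom; adjust /var/spool/pbs to the site's PBS_HOME
nodefile=/var/spool/pbs/aux/$jobid
if [ -r "$nodefile" ] ; then
    # The node file may list a node once per allocated processor; keep one entry
    nodes=$(sort "$nodefile" | uniq)
else
    nodes=localhost
fi
tmp=/tmp/pbstmp.$jobid
for i in $nodes ; do
    # Quote the remote command so that && is evaluated on the remote node;
    # user:group is the POSIX form of the older user.group syntax
    ssh "$i" "mkdir -m 700 $tmp && chown $user:$group $tmp"
done
exit 0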
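As written, the OSC prologue exits 0 even if `mkdir` fails on some node, so the job starts without its directory there. Given the exit-code table, a prologue could report the failure instead. This sketch wraps the loop in a hypothetical `create_tmpdirs` function that returns 2 (requeue) if any node fails. `BatchMode` and `ConnectTimeout` are standard OpenSSH client options; the 10-second timeout is an arbitrary assumption, chosen so that an unreachable node does not push the whole prologue into its own timeout.

```sh
#!/bin/sh
# Hypothetical variant of the OSC loop: create the per-job directory on
# every node and return 2 (requeue, per the exit-code table) if any node
# fails, instead of always exiting 0.
# Arguments: node list, directory, user, group.
# POSIX sh has no "local", so these are ordinary shell variables.
create_tmpdirs() {
    nodes=$1
    tmp=$2
    user=$3
    group=$4
    status=0
    for node in $nodes; do
        # BatchMode=yes fails instead of prompting for a password;
        # ConnectTimeout bounds the wait for an unreachable node.
        if ! ssh -o BatchMode=yes -o ConnectTimeout=10 "$node" \
                "mkdir -m 700 $tmp && chown $user:$group $tmp"; then
            echo "prologue: could not create $tmp on $node" >&2
            status=2
        fi
    done
    return "$status"
}
```

In the prologue above, the `for` loop could be replaced by `create_tmpdirs "$nodes" "$tmp" "$user" "$group" || exit 2`. Returning 1 instead would abort the job; requeueing suits transient failures but can repeat if a node stays broken.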

epilogue

#!/bin/sh
# Clear out TMPDIR
# Copyright 1999, 2000, 2001 Ohio Supercomputer Center
# epilogue gets 9 arguments:
# 1 -- jobid
# 2 -- userid
# 3 -- grpid
# 4 -- job name
# 5 -- sessionid
# 6 -- resource limits
# 7 -- resources used
# 8 -- queue
# 9 -- account
#
jobid=$1
nodefile=/var/spool/pbs/aux/$jobid
if [ -r "$nodefile" ] ; then
    nodes=$(sort "$nodefile" | uniq)
else
    nodes=localhost
fi
tmp=/tmp/pbstmp.$jobid
for i in $nodes ; do
    ssh "$i" rm -rf "$tmp"
done
exit 0
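The epilogue runs `rm -rf` on every node through ssh, and the remote shell re-parses that command line, so a job ID containing a space or `/` would change what gets deleted. A defensive epilogue might validate the ID first. `is_safe_jobid` is a hypothetical check that accepts only letters, digits, `.`, `_`, and `-`, which covers IDs such as `13724.node01`. It rejects array job IDs, which in TORQUE typically contain square brackets; supporting those would also require quoting the path on the remote side, because the remote shell would treat the brackets as a glob pattern.

```sh
#!/bin/sh
# Hypothetical guard (not part of the OSC scripts): succeed only if the
# job ID is non-empty and contains nothing but letters, digits, '.', '_'
# and '-'. Anything else (spaces, '/', ';', brackets, newlines) fails.
is_safe_jobid() {
    case "$1" in
        "" | *[!A-Za-z0-9._-]*) return 1 ;;
        *) return 0 ;;
    esac
}
```

The epilogue could call `is_safe_jobid "$jobid"` right after `jobid=$1` and skip the cleanup loop when it fails.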

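Improperly written prologue, prologue.user, and prologue.parallel scripts can have dramatic effects on job scheduling. For example, a script that hangs delays the job until it times out (exit code -4), and any exit status other than 0 causes the job to be requeued or aborted.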


© 2017 Adaptive Computing