5.651 Test 1 - Basic Operation

5.651.1 Introduction

This test determines if the proper environment has been established.

5.651.2 Test Steps

Submit a test job and then issue a hold on the job.

> qsub -c enabled test.sh

999.xxx.yyy

> qhold 999
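
The two steps above can also be scripted. The following is a minimal sketch, assuming qsub and qhold are on the PATH and that test.sh exists. The helper names job_seq and submit_and_hold are hypothetical and are not part of Torque.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not part of Torque.

# Print the numeric sequence part of a full job ID (999.xxx.yyy -> 999).
job_seq() {
    printf '%s\n' "${1%%.*}"
}

# Submit a checkpoint-enabled job, then place a hold on it.
# Prints the full job ID on success. Assumes qsub and qhold are on the PATH.
submit_and_hold() {
    script=$1
    jobid=$(qsub -c enabled "$script") || return 1
    # A non-zero exit status or any output from qhold is treated as a failure
    # in this sketch, and the message is passed through on stderr.
    out=$(qhold "$(job_seq "$jobid")" 2>&1) || { printf '%s\n' "$out" >&2; return 1; }
    [ -z "$out" ] || { printf '%s\n' "$out" >&2; return 1; }
    printf '%s\n' "$jobid"
}

# Usage (on a configured Torque host):
#   submit_and_hold test.sh
```

Treating any qhold output as a failure is a design choice of this sketch; adjust it if your site wraps qhold with something that prints informational messages.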

5.651.3 Possible Failures

Normally, qhold produces no output. If it produces an error message saying that qhold is not a supported feature, one of the following configuration errors might be present.

5.651.4 Successful Results

If no directory was configured for the checkpoint file, the default location is under the Torque directory, which in my case is /var/spool/torque/checkpoint.

Otherwise, go to the directory specified for the checkpoint image files. The directory is specified either with an option on job submission, for example -c dir=/home/test, or by setting an attribute on the execution queue with the command qmgr -c 'set queue batch checkpoint_dir=/home/test'.

Doing a directory listing shows the following.

# find /var/spool/torque/checkpoint

/var/spool/torque/checkpoint

/var/spool/torque/checkpoint/999.xxx.yyy.CK

/var/spool/torque/checkpoint/999.xxx.yyy.CK/ckpt.999.xxx.yyy.1205266630

# find /var/spool/torque/checkpoint |xargs ls -l

-r-------- 1 root root 543779 2008-03-11 14:17 /var/spool/torque/checkpoint/999.xxx.yyy.CK/ckpt.999.xxx.yyy.1205266630

 

/var/spool/torque/checkpoint:

total 4

drwxr-xr-x 2 root root 4096 2008-03-11 14:17 999.xxx.yyy.CK

 

/var/spool/torque/checkpoint/999.xxx.yyy.CK:

total 536

-r-------- 1 root root 543779 2008-03-11 14:17 ckpt.999.xxx.yyy.1205266630
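
The directory check above can be automated. This is a minimal sketch; the function name find_checkpoint_images is hypothetical, and it assumes that a custom checkpoint directory uses the same <jobid>.CK/ckpt.<jobid>.<timestamp> layout as the default location shown in the listing. Because the image files are readable only by root, inspecting their contents may require root privileges.

```shell
#!/bin/sh
# Hypothetical helper for illustration; not part of Torque.

# Print the checkpoint image files found for a job, or fail if none exist.
#   $1 = full job ID (for example 999.xxx.yyy)
#   $2 = checkpoint base directory (defaults to the location used in this example)
find_checkpoint_images() {
    jobid=$1
    base=${2:-/var/spool/torque/checkpoint}
    dir="$base/$jobid.CK"          # assumed layout: <base>/<jobid>.CK
    [ -d "$dir" ] || return 1
    found=$(find "$dir" -type f -name "ckpt.$jobid.*")
    [ -n "$found" ] || return 1
    printf '%s\n' "$found"
}

# Usage:
#   find_checkpoint_images 999.xxx.yyy
#   find_checkpoint_images 999.xxx.yyy /home/test
```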

Running qstat -f should show the job in a held state (job_state = H). Note that the attribute checkpoint_name is set to the name of the file seen above.

If a checkpoint directory has been specified, there will also be an attribute checkpoint_dir in the output of qstat -f.

$ qstat -f

Job Id: 999.xxx.yyy

    Job_Name = test.sh

    Job_Owner = [email protected]

    resources_used.cput = 00:00:00

    resources_used.mem = 0kb

    resources_used.vmem = 0kb

    resources_used.walltime = 00:00:06

    job_state = H

    queue = batch

    server = xxx.yyy

    Checkpoint = u

    ctime = Tue Mar 11 14:17:04 2008

    Error_Path = xxx.yyy:/home/test/test.sh.e999

    exec_host = test/0

    Hold_Types = u

    Join_Path = n

    Keep_Files = n

    Mail_Points = a

    mtime = Tue Mar 11 14:17:10 2008

    Output_Path = xxx.yyy:/home/test/test.sh.o999

    Priority = 0

    qtime = Tue Mar 11 14:17:04 2008

    Rerunable = True

    Resource_List.neednodes = 1

    Resource_List.nodect = 1

    Resource_List.nodes = 1

    Resource_List.walltime = 01:00:00

    session_id = 9402

    substate = 20

    Variable_List = PBS_O_HOME=/home/test,PBS_O_LANG=en_US.UTF-8,

        PBS_O_LOGNAME=test,

        PBS_O_PATH=/usr/local/perltests/bin:/home/test/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games,

        PBS_O_SHELL=/bin/bash,PBS_SERVER=xxx.yyy,

        PBS_O_HOST=xxx.yyy,PBS_O_WORKDIR=/home/test,

        PBS_O_QUEUE=batch

    euser = test

    egroup = test

    hashname = 999.xxx.yyy

    queue_rank = 3

    queue_type = E

    comment = Job started on Tue Mar 11 at 14:17

    exit_status = 271

    submit_args = test.sh

    start_time = Tue Mar 11 14:17:04 2008

    start_count = 1

    checkpoint_dir = /var/spool/torque/checkpoint/999.xxx.yyy.CK

    checkpoint_name = ckpt.999.xxx.yyy.1205266630

The value of Resource_List.* is the amount of resources requested.
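
To check the held state and the checkpoint attributes without reading the full listing, the qstat -f output can be filtered. This is a minimal sketch; the function name qstat_attr is hypothetical, and it assumes each attribute appears on one line in the form "name = value". Values that wrap onto continuation lines, such as Variable_List, are returned only up to the end of their first line.

```shell
#!/bin/sh
# Hypothetical helper for illustration; not part of Torque.

# Read `qstat -f` output on stdin and print the value of one attribute.
#   $1 = attribute name, for example job_state
qstat_attr() {
    awk -v key="$1" '
        {
            line = $0
            sub(/^[ \t]+/, "", line)          # drop leading indentation
            prefix = key " = "
            if (index(line, prefix) == 1) {   # line starts with "key = "
                print substr(line, length(prefix) + 1)
                exit
            }
        }'
}

# Usage (on a configured Torque host):
#   qstat -f 999 | qstat_attr job_state
#   qstat -f 999 | qstat_attr checkpoint_name
```

For the job shown above, the first command would be expected to print H and the second the name of the checkpoint file.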

© 2016 Adaptive Computing