TORQUE maintains accounting records for batch jobs in the following directory:
$TORQUEROOT/server_priv/accounting/<TIMESTAMP>
$TORQUEROOT defaults to /usr/spool/PBS and <TIMESTAMP> is in the format: YYYYMMDD.
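As a minimal sketch of how the path for a given day could be assembled, the Python helper below combines the root directory with a YYYYMMDD date. The function name `accounting_file` is hypothetical, and reading `TORQUEROOT` from the process environment is an assumption; the fallback is the default stated above.

```python
import os
from datetime import date

def accounting_file(day, root=None):
    """Return the accounting file path for the given date (hypothetical helper)."""
    if root is None:
        # Assumption: TORQUEROOT is exported in the environment. If it is not,
        # fall back to the documented default location.
        root = os.environ.get("TORQUEROOT", "/usr/spool/PBS")
    # <TIMESTAMP> is the date formatted as YYYYMMDD.
    return f"{root}/server_priv/accounting/{day:%Y%m%d}"

print(accounting_file(date(2014, 8, 26), root="/usr/spool/PBS"))
# /usr/spool/PBS/server_priv/accounting/20140826
```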
These records include events, time stamps, and information on resources requested and used.
Records for eight different event types are produced and are described in the following table:
Record marker | Record type | Description |
---|---|---|
A | abort | Job has been aborted by the server |
C | checkpoint | Job has been checkpointed and held |
D | delete | Job has been deleted |
E | exit | Job has exited (either successfully or unsuccessfully) |
Q | queue | Job has been submitted/queued |
R | rerun | Attempt to rerun the job has been made |
S | start | Attempt to start the job has been made (if the job fails to properly start, it may have multiple job start records) |
T | restart | Attempt to restart the job (from checkpoint) has been made (if the job fails to properly start, it may have multiple job start records) |
Accounting Variables
The following table describes the accounting variables. Descriptions for accounting variables not listed in the table, particularly those prefixed with Resource_List, are available at Job Submission.
Variable | Description |
---|---|
ctime | Time job was created |
etime | Time job became eligible to run |
qtime | Time job was queued |
start | Time job started to run |
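The time variables in the sample records in this topic appear to be Unix epoch seconds; treating them that way is an assumption of the sketch below, which derives a job's queue wait and run time. The helper name `job_timings` is hypothetical, and the `end` field is taken from the sample exit record rather than from the table above.

```python
from datetime import datetime, timezone

def job_timings(fields):
    """Derive queue wait and run time, in seconds, from accounting fields."""
    qtime = int(fields["qtime"])
    start = int(fields["start"])
    timings = {"queue_wait": start - qtime}
    # "end" only appears once the job has exited (see the sample E record).
    if "end" in fields:
        timings["run_time"] = int(fields["end"]) - start
    return timings

# Values copied from the sample exit record in this topic.
sample = {
    "ctime": "1409094464",
    "qtime": "1409094464",
    "etime": "1409094464",
    "start": "1409094470",
    "end": "1409094484",
}

print(job_timings(sample))
# {'queue_wait': 6, 'run_time': 14}

# Assumption: the values are epoch seconds, so they convert directly to UTC.
print(datetime.fromtimestamp(int(sample["start"]), tz=timezone.utc).isoformat())
# 2014-08-26T23:07:50+00:00
```

The 14-second run time matches the `resources_used.walltime=00:00:14` value in the same sample record.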
Sample records in this file can look like the following:
08/26/2014 17:07:44;Q;11923.napali;queue=batch
08/26/2014 17:07:50;S;11923.napali;user=dbeer group=company jobname=STDIN queue=batch ctime=1409094464 qtime=1409094464 etime=1409094464 start=1409094470 owner=dbeer@napali exec_host=napali/0+napali/1+napali/2+napali/3+napali/4+napali/5+torque-devtest-03/0+torque-devtest-03/1+torque-devtest-03/2+torque-devtest-03/3+torque-devtest-03/4+torque-devtest-03/5 Resource_List.neednodes=2:ppn=6 Resource_List.nodect=2 Resource_List.nodes=2:ppn=6
08/26/2014 17:08:04;E;11923.napali;user=dbeer group=company jobname=STDIN queue=batch ctime=1409094464 qtime=1409094464 etime=1409094464 start=1409094470 owner=dbeer@napali exec_host=napali/0+napali/1+napali/2+napali/3+napali/4+napali/5+torque-devtest-03/0+torque-devtest-03/1+torque-devtest-03/2+torque-devtest-03/3+torque-devtest-03/4+torque-devtest-03/5 Resource_List.neednodes=2:ppn=6 Resource_List.nodect=2 Resource_List.nodes=2:ppn=6 session=11352 total_execution_slots=12 unique_node_count=2 end=1409094484 Exit_status=265 resources_used.cput=00:00:00 resources_used.mem=82700kb resources_used.vmem=208960kb resources_used.walltime=00:00:14 Error_Path=/dev/pts/11 Output_Path=/dev/pts/11
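The sample lines above suggest a layout of four semicolon-separated fields: a time stamp, a record marker, a job ID, and a space-separated list of key=value pairs. The Python sketch below parses a line under that assumption. The names `parse_record` and `RECORD_TYPES` are hypothetical, and values that contain spaces are not handled.

```python
# Record markers and types, taken from the table in this topic.
RECORD_TYPES = {
    "A": "abort", "C": "checkpoint", "D": "delete", "E": "exit",
    "Q": "queue", "R": "rerun", "S": "start", "T": "restart",
}

def parse_record(line):
    """Split one accounting line into its parts (layout inferred from the samples)."""
    # Split at most three times so any semicolon in the message text survives.
    timestamp, marker, job_id, message = line.rstrip("\n").split(";", 3)
    fields = {}
    for token in message.split():
        # Split on the first "=" only, so "Resource_List.nodes=2:ppn=6"
        # keeps "2:ppn=6" as its value.
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return {
        "timestamp": timestamp,
        "marker": marker,
        "record_type": RECORD_TYPES.get(marker, "unknown"),
        "job_id": job_id,
        "fields": fields,
    }

rec = parse_record("08/26/2014 17:07:44;Q;11923.napali;queue=batch")
print(rec["record_type"], rec["job_id"], rec["fields"])
# queue 11923.napali {'queue': 'batch'}
```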
The value of Resource_List.* is the amount of resources requested, and the value of resources_used.* is the amount of resources actually used.
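total_execution_slots and unique_node_count give additional information about the job's resource usage. In the sample exit record, the job that requested nodes=2:ppn=6 reports total_execution_slots=12 and unique_node_count=2.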
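To illustrate the distinction between requested and used resources, the Python sketch below separates the two groups by prefix, using values copied from the sample exit record. The helper names are hypothetical. `requested_slots` assumes the simple `N:ppn=M` form seen in the sample, and its agreement with total_execution_slots is only checked against that one record.

```python
def split_resources(fields):
    """Separate Resource_List.* (requested) from resources_used.* (used)."""
    requested, used = {}, {}
    for key, value in fields.items():
        prefix, _, name = key.partition(".")
        if prefix == "Resource_List":
            requested[name] = value
        elif prefix == "resources_used":
            used[name] = value
    return requested, used

def hms_to_seconds(text):
    """Convert an HH:MM:SS string such as 00:00:14 to seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds

def requested_slots(nodes_spec):
    """Node count times ppn; assumes the simple 'N:ppn=M' form from the sample."""
    count_text, _, rest = nodes_spec.partition(":")
    ppn = 1
    for part in rest.split(":"):
        if part.startswith("ppn="):
            ppn = int(part[len("ppn="):])
    return int(count_text) * ppn

# Fields copied from the sample exit record.
fields = {
    "Resource_List.neednodes": "2:ppn=6",
    "Resource_List.nodect": "2",
    "Resource_List.nodes": "2:ppn=6",
    "total_execution_slots": "12",
    "unique_node_count": "2",
    "resources_used.cput": "00:00:00",
    "resources_used.mem": "82700kb",
    "resources_used.vmem": "208960kb",
    "resources_used.walltime": "00:00:14",
}

requested, used = split_resources(fields)
print(requested["nodes"])                    # 2:ppn=6
print(hms_to_seconds(used["walltime"]))      # 14
print(requested_slots(requested["nodes"]))   # 12
```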