Accounting Records

TORQUE maintains accounting records for batch jobs in the following directory:

$TORQUEROOT/server_priv/accounting/<TIMESTAMP>

$TORQUEROOT defaults to /usr/spool/PBS, and <TIMESTAMP> is in the format YYYYMMDD.
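
As a rough illustration of the path layout described above, the following sketch builds the accounting file path for a given date. The helper name accounting_file and its torque_root parameter are hypothetical, introduced only for this example; the default root is the one stated above.

```python
from datetime import date
from pathlib import PurePosixPath


def accounting_file(day, torque_root="/usr/spool/PBS"):
    """Return the accounting file path for the given date (YYYYMMDD file name)."""
    path = PurePosixPath(torque_root) / "server_priv" / "accounting" / day.strftime("%Y%m%d")
    return str(path)


print(accounting_file(date(2014, 8, 26)))
```

Pass a different torque_root if the installation does not use the default location.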

These records include events, time stamps, and information on resources requested and used.

Records for eight different event types are produced and are described in the following table:

Record marker  Record type  Description
A              abort        Job has been aborted by the server
C              checkpoint   Job has been checkpointed and held
D              delete       Job has been deleted
E              exit         Job has exited (either successfully or unsuccessfully)
Q              queue        Job has been submitted/queued
R              rerun        Attempt to rerun the job has been made
S              start        Attempt to start the job has been made (if the job fails to properly start, it may have multiple job start records)
T              restart      Attempt to restart the job (from checkpoint) has been made (if the job fails to properly start, it may have multiple job start records)
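
The marker table can be put to work when summarizing an accounting file. The sketch below tallies records by type. It assumes the marker is the second semicolon-separated field, as in the sample records in this section; the function name count_record_types is hypothetical, and the sample lines are abbreviated.

```python
from collections import Counter

# Marker-to-type mapping copied from the table above.
RECORD_TYPES = {
    "A": "abort", "C": "checkpoint", "D": "delete", "E": "exit",
    "Q": "queue", "R": "rerun", "S": "start", "T": "restart",
}


def count_record_types(lines):
    """Count records per type; markers not in the table are tallied as 'unknown'."""
    counts = Counter()
    for line in lines:
        fields = line.split(";", 3)
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        counts[RECORD_TYPES.get(fields[1], "unknown")] += 1
    return counts


# Abbreviated versions of the sample records in this section.
sample = [
    "08/26/2014 17:07:44;Q;11923.napali;queue=batch",
    "08/26/2014 17:07:50;S;11923.napali;user=dbeer queue=batch",
    "08/26/2014 17:08:04;E;11923.napali;user=dbeer Exit_status=265",
]
print(count_record_types(sample))
```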

Accounting Variables

The following table describes the accounting variables. Descriptions of accounting variables not listed in the table, particularly those prefixed with Resource_List, are available in Job Submission.

Variable  Description
ctime     Time job was created
etime     Time job became eligible to run
qtime     Time job was queued
start     Time job started to run
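
The time variables in the sample records of this section look like Unix epoch seconds (for example, start=1409094470). Under that assumption, which is inferred from the sample rather than stated here, the following sketch converts a value to a UTC datetime and computes how long the job waited between being queued and starting. The helper name to_utc is hypothetical.

```python
from datetime import datetime, timezone


def to_utc(epoch_seconds):
    """Convert an epoch-seconds value (string or int) to an aware UTC datetime."""
    return datetime.fromtimestamp(int(epoch_seconds), tz=timezone.utc)


# Values taken from the sample start record in this section.
times = {
    "ctime": "1409094464",
    "qtime": "1409094464",
    "etime": "1409094464",
    "start": "1409094470",
}

# Seconds between queueing and the start attempt.
queue_wait = int(times["start"]) - int(times["qtime"])

print(to_utc(times["start"]).isoformat())
print(queue_wait)
```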

Sample records in this file can look like the following:

08/26/2014 17:07:44;Q;11923.napali;queue=batch

08/26/2014 17:07:50;S;11923.napali;user=dbeer group=company jobname=STDIN queue=batch ctime=1409094464 qtime=1409094464 etime=1409094464 start=1409094470 owner=dbeer@napali exec_host=napali/0+napali/1+napali/2+napali/3+napali/4+napali/5+torque-devtest-03/0+torque-devtest-03/1+torque-devtest-03/2+torque-devtest-03/3+torque-devtest-03/4+torque-devtest-03/5 Resource_List.neednodes=2:ppn=6 Resource_List.nodect=2 Resource_List.nodes=2:ppn=6

08/26/2014 17:08:04;E;11923.napali;user=dbeer group=company jobname=STDIN queue=batch ctime=1409094464 qtime=1409094464 etime=1409094464 start=1409094470 owner=dbeer@napali exec_host=napali/0+napali/1+napali/2+napali/3+napali/4+napali/5+torque-devtest-03/0+torque-devtest-03/1+torque-devtest-03/2+torque-devtest-03/3+torque-devtest-03/4+torque-devtest-03/5 Resource_List.neednodes=2:ppn=6 Resource_List.nodect=2 Resource_List.nodes=2:ppn=6 session=11352 total_execution_slots=12 unique_node_count=2 end=1409094484 Exit_status=265 resources_used.cput=00:00:00 resources_used.mem=82700kb resources_used.vmem=208960kb resources_used.walltime=00:00:14 Error_Path=/dev/pts/11 Output_Path=/dev/pts/11
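
Judging from the samples above, each record consists of four semicolon-separated fields (timestamp, record marker, job ID, message), and the message is a space-separated list of key=value pairs. The following is a minimal parsing sketch under that assumption. The function name parse_record is hypothetical, and the sketch does not handle values that contain spaces.

```python
from datetime import datetime


def parse_record(line):
    """Split one accounting line into its four fields and a dict of attributes."""
    stamp, marker, job_id, message = line.rstrip("\n").split(";", 3)
    attrs = {}
    for token in message.split():
        # Split on the first '=' only, so values such as '2:ppn=6' stay intact.
        key, sep, value = token.partition("=")
        if sep:
            attrs[key] = value
    return {
        # The sample timestamps are month/day/year followed by a 24-hour time.
        "timestamp": datetime.strptime(stamp, "%m/%d/%Y %H:%M:%S"),
        "marker": marker,
        "job_id": job_id,
        "attrs": attrs,
    }


# Abbreviated version of the sample exit record above.
line = ("08/26/2014 17:08:04;E;11923.napali;user=dbeer queue=batch "
        "Resource_List.nodes=2:ppn=6 Exit_status=265 "
        "resources_used.walltime=00:00:14")
record = parse_record(line)
print(record["marker"], record["job_id"], record["attrs"]["Resource_List.nodes"])
```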

The value of Resource_List.* is the amount of resources requested, and the value of resources_used.* is the amount of resources actually used.
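
To compare requested and used resources, the two prefixes can be separated into their own dictionaries. The sketch below does this for attributes from the sample exit record and converts the used walltime to seconds. The helper names split_resources and hms_to_seconds are hypothetical, and the HH:MM:SS interpretation of walltime is an assumption based on the sample values.

```python
def hms_to_seconds(value):
    """Convert an HH:MM:SS string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def split_resources(attrs):
    """Return (requested, used) dicts keyed by the name after the prefix."""
    requested, used = {}, {}
    for key, value in attrs.items():
        if key.startswith("Resource_List."):
            requested[key.split(".", 1)[1]] = value
        elif key.startswith("resources_used."):
            used[key.split(".", 1)[1]] = value
    return requested, used


# Attributes taken from the sample exit record in this section.
attrs = {
    "Resource_List.neednodes": "2:ppn=6",
    "Resource_List.nodect": "2",
    "Resource_List.nodes": "2:ppn=6",
    "start": "1409094470",
    "end": "1409094484",
    "resources_used.cput": "00:00:00",
    "resources_used.mem": "82700kb",
    "resources_used.vmem": "208960kb",
    "resources_used.walltime": "00:00:14",
}

requested, used = split_resources(attrs)
walltime = hms_to_seconds(used["walltime"])
elapsed = int(attrs["end"]) - int(attrs["start"])
print(requested["nodes"], walltime, elapsed)
```

For this sample record, the used walltime matches the difference between the end and start values.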

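total_execution_slots and unique_node_count give additional information about the job's resource usage: the total number of execution slots and the number of distinct nodes allocated to the job. In the sample exit record, the job requested nodes=2:ppn=6 and the record reports total_execution_slots=12 and unique_node_count=2.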
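
In the sample exit record, both values can be reproduced from exec_host. The sketch below counts the entries and the distinct host names. It assumes exec_host lists one host/index entry per slot, joined with +, as in the sample; this relationship is inferred from the sample record and may not hold for other exec_host formats.

```python
# exec_host value taken from the sample records in this section.
exec_host = (
    "napali/0+napali/1+napali/2+napali/3+napali/4+napali/5+"
    "torque-devtest-03/0+torque-devtest-03/1+torque-devtest-03/2+"
    "torque-devtest-03/3+torque-devtest-03/4+torque-devtest-03/5"
)

# One entry per execution slot.
slots = exec_host.split("+")

# Drop the trailing '/index' to get the host name of each slot.
hosts = {slot.rsplit("/", 1)[0] for slot in slots}

print(len(slots), len(hosts))
```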

© 2014 Adaptive Computing