L.6 Tracing Energy Usage from the Cray XC System to MAM

This section contains screenshots that trace the energy used by a single job from the Cray XC system through to the Moab Accounting Manager (MAM). The job requested 10 compute nodes with 32 cores on each node and was assigned job ID 121.

L.6.1 Cray RUR File and Energy Consumption

The Cray XC system from which the following screenshots and information were obtained sets the location of its RUR file in its rur.conf file, as illustrated:

# The File output plugin.
# Write RUR output to a single plain text file
# Arg - The destination text file
[file]
output: /opt/cray/rur/default/bin/file_output.py
#arg: path-to-flat-textfile
arg: /lus/scratch/RUR/output/rur.output

The screenshot that follows shows that the RUR file is accessible on the login node through a known path (the argument to the cat command) to which the actual path has been mapped. The command’s output displays the energy_used metric in one of the RUR entries for job 121.sdb in the Cray RUR file.

login:~ # cat /etc/opt/cray/RUR/rur.output | grep 121.sdb
uid: 12795, apid: 9648, jobid: 121.sdb, cmdname: /home/crayadm/calc, plugin: energy ['energy_used', 3353472]
uid: 12795, apid: 9648, jobid: 121.sdb, cmdname: /home/crayadm/calc, plugin: timestamp APP_START 2014-10-20T22:44:32CDT APP_STOP 2014-10-20T23:03:44CDT

From this output, you can see that the job used 3,353,472 joules (plugin: energy). In addition, ALPS indicated that the job’s application (ALPS job step) executed for 1,152 seconds (19 minutes 12 seconds), as computed from the application's start and stop timestamps (plugin: timestamp).
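The two values above can be recovered from the RUR entries with a short script. The following is a minimal sketch, not part of any Cray or Adaptive Computing tool: the function names energy_joules and app_seconds are hypothetical, the regular expressions are written only against the two entries shown above, and the elapsed-time calculation assumes both timestamps carry the same time zone suffix.

```python
import re
from datetime import datetime

# The two RUR entries for job 121.sdb, copied from the listing above.
RUR_LINES = [
    "uid: 12795, apid: 9648, jobid: 121.sdb, cmdname: /home/crayadm/calc, "
    "plugin: energy ['energy_used', 3353472]",
    "uid: 12795, apid: 9648, jobid: 121.sdb, cmdname: /home/crayadm/calc, "
    "plugin: timestamp APP_START 2014-10-20T22:44:32CDT "
    "APP_STOP 2014-10-20T23:03:44CDT",
]

# Date and time portion of a timestamp, without the time zone suffix.
TS = r"(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})"


def energy_joules(lines):
    """Return energy_used (joules) from the energy plugin entry, or None."""
    for line in lines:
        match = re.search(r"plugin: energy \['energy_used', (\d+)\]", line)
        if match:
            return int(match.group(1))
    return None


def app_seconds(lines):
    """Return APP_STOP minus APP_START in seconds, or None.

    Assumption: both timestamps use the same time zone suffix (CDT here),
    so the suffix is skipped rather than interpreted.
    """
    for line in lines:
        match = re.search(rf"APP_START {TS}\S* APP_STOP {TS}", line)
        if match:
            start, stop = (
                datetime.strptime(value, "%Y-%m-%dT%H:%M:%S")
                for value in match.groups()
            )
            return int((stop - start).total_seconds())
    return None


print(energy_joules(RUR_LINES), app_seconds(RUR_LINES))  # prints: 3353472 1152
```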

L.6.2 Torque Resource Manager and Energy Consumption

This screenshot of the Torque resource manager's qstat command output shows that the energy_used generic metric is part of the Torque job’s resource usage information:

sdb:~ # qstat -f 121
Job Id: 121.sdb
Job_Name = calc
Job_Owner = crayadm@snake-p1
resources_used.cput = 00:00:00
resources_used.energy_used = 3353472
resources_used.mem = 6112kb
resources_used.vmem = 141256kb
resources_used.walltime = 00:19:14
job_state = C
queue = batch
server = sdb

In addition, according to Torque, job 121 took 1,154 seconds (00:19:14) to execute. This is 2 seconds more than the application time reported by ALPS; the difference is ALPS and Torque processing overhead, including the RUR-based energy consumption data extraction process.
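The walltime comparison above can be reproduced from the qstat output. The following is a minimal sketch under stated assumptions: parse_qstat and hms_to_seconds are hypothetical helper names, the sample text is a subset of the listing above, and the parser handles only simple "name = value" lines (real qstat -f output may indent and wrap long values, which this sketch does not handle).

```python
# A subset of the qstat -f listing above.
QSTAT_OUTPUT = """\
Job Id: 121.sdb
resources_used.energy_used = 3353472
resources_used.walltime = 00:19:14
job_state = C
"""

ALPS_APP_SECONDS = 1152  # application time computed from the RUR timestamps


def parse_qstat(text):
    """Collect 'name = value' lines into a dictionary of strings."""
    fields = {}
    for line in text.splitlines():
        name, separator, value = line.partition(" = ")
        if separator:  # skip lines such as 'Job Id: 121.sdb'
            fields[name.strip()] = value.strip()
    return fields


def hms_to_seconds(hms):
    """Convert an HH:MM:SS string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


fields = parse_qstat(QSTAT_OUTPUT)
torque_seconds = hms_to_seconds(fields["resources_used.walltime"])
print(torque_seconds, torque_seconds - ALPS_APP_SECONDS)  # prints: 1154 2
print(int(fields["resources_used.energy_used"]))  # prints: 3353472
```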

L.6.3 Moab Workload Manager (MWM) and Energy Consumption

This Moab workload manager screenshot shows the energy_used generic metric is part of the Moab job’s generic metrics information:

login:~ # checkjob -v 121

job 121 (RM job '121.sdb')

AName: calc
State: Completed 
Completion Code: 0    Time: Mon Oct 20 23:03:49
Creds:  user:crayadm  group:crayadm  class:batch
WallTime:   00:19:17 of 00:30:00
SubmitTime: Mon Oct 20 22:44:29
(Time Queued  Total: 00:00:00  Eligible: 00:00:00)

TemplateSets:  DEFAULT
NodeMatchPolicy: EXACTNODE
Total Requested Tasks: 320
Total Requested Nodes: 10

Req[0]  TaskCount: 320  Partition: snake
GMetric[energy_used]  Current: 0.00  Min: 0.00  Max: 0.00  Avg: 0.00 Total: 3353472.00
TasksPerNode: ==32  NodeCount:  10

Allocated Nodes:
[39:32][38:32][37:32][50:32][59:32][56:32]
[57:32][36:32][49:32][51:32]


SystemID:   Moab
SystemJID:  121
Notification Events: JobFail
Task Distribution: 39,38,37,50,59,56,57,36,49,51
UMask:          0000 
OutputFile:     nid00029:/ufs/home/crayadm/calc.o121
ErrorFile:      nid00029:/ufs/home/crayadm/calc.e121
StartCount:     1
Execution Partition:  snake
SrcRM:          snake  DstRM: snake  DstRMJID: 121.sdb
Submit Args:    -l nodes=10,ppn=32,walltime=30:00 calc
Flags:          RESTARTABLE
Attr:           checkpoint
StartPriority:  1
PE:             320.00

In addition, according to Moab, job 121 took 1,157 seconds (00:19:17) to execute. This figure includes ALPS, Torque, and Moab processing overhead; it is 3 seconds more than the walltime reported by Torque, and those 3 seconds are Torque and Moab job processing overhead.
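The energy total and the per-layer overhead can be cross-checked with a few lines of code. The following is a minimal sketch, assuming the GMetric line has exactly the layout shown in the checkjob output above; parse_gmetric and added_overhead are hypothetical names, and the three durations are the values reported in sections L.6.1 through L.6.3.

```python
import re

# The generic metric line from the checkjob -v listing above.
GMETRIC_LINE = (
    "GMetric[energy_used]  Current: 0.00  Min: 0.00  Max: 0.00  "
    "Avg: 0.00 Total: 3353472.00"
)

# Execution time in seconds as reported by each layer, innermost first.
DURATIONS = [("ALPS", 1152), ("Torque", 1154), ("Moab", 1157)]


def parse_gmetric(line):
    """Return the metric name and a dictionary of its statistics."""
    name = re.match(r"GMetric\[(\w+)\]", line).group(1)
    stats = {key: float(value) for key, value in re.findall(r"(\w+): ([\d.]+)", line)}
    return name, stats


def added_overhead(durations):
    """Return the seconds each layer adds on top of the layer below it."""
    return {
        layer: seconds - inner_seconds
        for (_, inner_seconds), (layer, seconds) in zip(durations, durations[1:])
    }


name, stats = parse_gmetric(GMETRIC_LINE)
print(name, int(stats["Total"]))  # prints: energy_used 3353472
print(added_overhead(DURATIONS))  # prints: {'Torque': 2, 'Moab': 3}
```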

L.6.4 Moab Accounting Manager (MAM), Energy Consumption, and Charging

This MAM screenshot shows the charge rates that apply to job 121:

sdb:~ # mam-list-chargerates
Name       Value Amount       Description                   
---------- ----- ------------ ------------------------------------ 
EnergyUsed       0.15/3600000 15 cents/kilowatt-hour (3.6M joules)
Processors       0.03/h       3 cents/processor-hour

Notice that the EnergyUsed charge rate is 15 cents per kilowatt-hour (1 kWh = 3,600,000 joules) and the Processors charge rate is 3 cents per core-hour.

This MAM screenshot shows the job ID, the job’s processor count, the job runtime in seconds, and the job’s energy usage, all of which are used in the job charge computation.

sdb:~ # mam-list-usagerecords -J 121 --show Instance,Processors,Duration,EnergyUsed
Instance Processors Duration EnergyUsed 
-------- ---------- -------- ---------- 
121      320        1157     3353472

This MAM screenshot shows the charge computed by MAM using the job time in seconds, the processor count, and the energy usage with the rates identified in the previous example.

sdb:~ # mam-list-usagerecords -J 121
Id Type Instance Charge Stage  User    Group   Account   Organization Class QualityOfService Machine Nodes Processors EnergyUsed Memory Duration StartTime           EndTime             Description 
-- ---- -------- ------ ------ ------- ------- --------- ------------ ----- ---------------- ------- ----- ---------- ---------- ------ -------- ------------------- ------------------- ----------- 
6  Job  121        3.23 Charge crayadm crayadm chemistry sciences     batch                  snake   10    320        3353472           1157     2014-10-20 22:44:29 2014-10-20 23:03:49

The charge of $3.23 is the sum of two itemized charges. The energy charge is the energy rate × energy usage: $0.15 per kWh × 3,353,472 joules ÷ 3,600,000 joules per kWh = $0.139728, which rounds to $0.14. The processor charge is the processor rate × processor-hours: $0.03 per processor-hour × 320 processors × 1,157 seconds ÷ 3,600 seconds per hour = $3.085333, which rounds to $3.09. The total is $0.14 + $3.09 = $3.23.
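The arithmetic above can be checked with a short script. The following is a minimal sketch, not a description of how MAM computes charges internally: the function names are hypothetical, and it assumes each itemized charge is rounded half up to two decimal places before the charges are summed, which mirrors the calculation described above but depends on a charge precision setting that is not shown in this section.

```python
from decimal import Decimal, ROUND_HALF_UP

# Rates from the mam-list-chargerates listing above.
ENERGY_RATE_PER_KWH = Decimal("0.15")      # 15 cents per kilowatt-hour
PROCESSOR_RATE_PER_HOUR = Decimal("0.03")  # 3 cents per processor-hour

JOULES_PER_KWH = 3_600_000
SECONDS_PER_HOUR = 3_600

# Assumption: charges are kept to two decimal places, rounding half up.
CENT = Decimal("0.01")


def to_cents(amount):
    """Round a Decimal amount to two decimal places."""
    return amount.quantize(CENT, rounding=ROUND_HALF_UP)


def energy_charge(joules):
    """Charge for energy usage given in joules."""
    return to_cents(ENERGY_RATE_PER_KWH * joules / JOULES_PER_KWH)


def processor_charge(processors, seconds):
    """Charge for a processor count held for a duration in seconds."""
    return to_cents(PROCESSOR_RATE_PER_HOUR * processors * seconds / SECONDS_PER_HOUR)


# Values from the usage record for job 121.
energy = energy_charge(3353472)
processors = processor_charge(320, 1157)
print(energy, processors, energy + processors)  # prints: 0.14 3.09 3.23
```

Decimal is used instead of float so that the rounding step behaves the same way as the hand calculation.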

This MAM screenshot identifies the individual charges described in the previous example that make up the job's total charge.

sdb:~ # mam-list-itemizedcharges -J 121
UsageRecord Instance Name       Value   Duration Rate         ScalingFactor Amount CreationTime        Description 
----------- -------- ---------- ------- -------- ------------ ------------- ------ ------------------- ----------- 
3182        121      EnergyUsed 3353472          0.15/3600000 1               0.14 2014-10-20 23:03:51
3182        121      Processors 320     1157     0.03/h       1               3.09 2014-10-20 23:03:51

Not all of the MAM configuration needed to compute a charge (for example, the charge precision) is shown here, because the intent of this section is to illustrate how an administrator can trace a job's energy consumption through the various software components and systems involved. See the Moab Accounting Manager documentation for any additional generic charging configuration needed.

© 2016 Adaptive Computing