
L.3 Moab Energy-Consumption-by-Job Accounting

A Cray XC system, running CLE 5.2 or later, can monitor and compute the energy consumed by each compute node used by a job. These energy consumption values can be used to compute the energy consumption of a job and charge for the cost of the energy consumed.

The figure that follows identifies the Moab HPC Suite and the Cray system components that, working together, permit a Cray site to charge a user for the energy a job consumes.

Image L-3: Moab Energy Accounting Architecture and Information Flow


L.3.1 Cray XC Systems Energy Monitoring

To monitor and gather energy consumption by job, the Cray ALPS subsystem must execute a job step "prologue" and "epilogue". Together, these scripts allow ALPS to compute the energy consumption of each compute node allocated to a job and to sum the per-node values to obtain the total energy consumption of a job step. If the prologue and epilogue scripts are not configured, ALPS does not compute or record any energy consumption.

L.3.2 Cray Resource Utilization Record (RUR) and Job Energy Consumption

The Cray ALPS subsystem records the resource usage of a job step in a Resource Utilization Record (RUR). These records can be stored in a system RUR file. Each time a job executes the aprun command, which defines a "job step", ALPS records one RUR entry for the job step in the RUR file. The entry includes the energy used by the job's compute nodes during the job step.

To compute the energy used by a job step, ALPS first executes the job step prologue script to record the current “power meter reading” of every compute node allocated to the job. It then executes the job step epilogue script to record the new “power meter reading” of each compute node, computes the difference between the first and second readings for each node, and sums the differences to obtain the job step’s energy_used value. The energy_used value is expressed in joules (1 joule = 1 watt-second; 1 kilowatt-hour = 3.6 million joules).
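The arithmetic described above can be sketched in a few lines of Python. This is only an illustration of the calculation, not the ALPS prologue or epilogue scripts themselves: the function names, the node IDs, and the meter readings are all hypothetical, and the readings are assumed to be cumulative values in joules.

```python
JOULES_PER_KWH = 3_600_000  # 1 kilowatt-hour = 3.6 million joules


def job_step_energy_used(prologue_readings, epilogue_readings):
    """Sum the per-node (epilogue - prologue) differences, in joules.

    Both arguments map a node ID to a cumulative meter reading in joules.
    (Hypothetical data layout, chosen for this sketch only.)
    """
    if prologue_readings.keys() != epilogue_readings.keys():
        raise ValueError("prologue and epilogue must cover the same nodes")
    return sum(
        epilogue_readings[node] - prologue_readings[node]
        for node in prologue_readings
    )


def joules_to_kwh(joules):
    """Convert joules to kilowatt-hours."""
    return joules / JOULES_PER_KWH


# Hypothetical readings for two hypothetical compute nodes.
prologue = {"nid00010": 1_000_000, "nid00011": 2_500_000}
epilogue = {"nid00010": 1_900_000, "nid00011": 3_400_000}

step_energy = job_step_energy_used(prologue, epilogue)
print(step_energy, "joules =", joules_to_kwh(step_energy), "kWh")
```

Each node in this example consumed 900,000 joules during the job step, so the job step's energy_used value is the sum of the two differences.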

To compute the energy consumption of a job, the energy_used values from the RUR entries of all of the job’s job steps must be summed to obtain the job’s total energy_used value.
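As a rough sketch of that summation, the Python below totals energy_used per job across job step records. It assumes the RUR entries have already been parsed into dictionaries; the field names jobid and apid, the record layout, and the sample values are assumptions made for this example and do not represent the actual RUR file format.

```python
from collections import defaultdict


def job_energy_used(rur_entries):
    """Return a dict mapping each job ID to its total energy_used in joules.

    rur_entries is an iterable of already-parsed job step records
    (hypothetical layout: one dict per job step).
    """
    totals = defaultdict(int)
    for entry in rur_entries:
        totals[entry["jobid"]] += entry["energy_used"]
    return dict(totals)


# Hypothetical job step records: job 1001 ran two job steps, job 1002 ran one.
entries = [
    {"jobid": "1001", "apid": 501, "energy_used": 318},
    {"jobid": "1001", "apid": 502, "energy_used": 1200},
    {"jobid": "1002", "apid": 503, "energy_used": 75},
]

totals = job_energy_used(entries)
print(totals)
```

A job that runs aprun several times therefore contributes several records, and only their sum reflects the job's full energy consumption.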

L.3.3 RUR File Processing and "energy_used" Generic Metric

If the RUR file is present, Torque automatically extracts the RUR entries for a job as part of the pbs_mom's job termination processing. It sums the job steps’ RUR energy_used values to obtain a single energy_used "generic metric" for the job, which the pbs_mom passes to the Torque pbs_server daemon as part of the job's resource usage information. The pbs_server daemon delivers the generic metric to the Moab Workload Manager, where it becomes part of the job's information in Moab.

L.3.4 "EnergyUsed" Generic Metric and Moab Accounting Manager (MAM)

If the Moab Workload Manager passes job information to the Moab Accounting Manager (MAM), the energy_used generic metric is included in that information. When MAM receives the metric, it converts the Moab job generic metric into its own EnergyUsed metric associated with the job. If MAM must compute a cost for the energy a job used, the system administrator must configure MAM with a charge that uses the EnergyUsed metric’s value to compute the energy cost of the job.
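To illustrate the kind of calculation such a charge might perform, the following Python sketch converts an EnergyUsed value in joules to kilowatt-hours and multiplies it by a rate. It is not MAM configuration syntax and does not call any MAM API; the function name, the rate of 0.12 per kilowatt-hour, and the sample energy value are hypothetical.

```python
JOULES_PER_KWH = 3_600_000  # 1 kilowatt-hour = 3.6 million joules


def energy_cost(energy_used_joules, rate_per_kwh):
    """Return the cost of the energy a job used.

    energy_used_joules: the job's EnergyUsed value, in joules.
    rate_per_kwh: site-defined price per kilowatt-hour (hypothetical).
    """
    return energy_used_joules / JOULES_PER_KWH * rate_per_kwh


# Hypothetical job that used 7.2 million joules (2 kWh) at 0.12 per kWh.
cost = energy_cost(7_200_000, 0.12)
print(f"{cost:.2f}")
```

Because energy_used is reported in joules, a site that prices energy per kilowatt-hour would need a conversion of this kind somewhere in its charge definition.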


© 2016 Adaptive Computing