Chapter 5 Node Resource Plug-in

There is now an API for creating a resource plug-in that reports custom varattrs, generic resources, generic metrics, and node features. Jobs can also report custom resources through the same plug-in. The purpose of this plug-in is to allow some resource integration to happen outside of the normal code release cycle and without having to be part of the main Torque codebase. This lets individual sites implement things that are not of general interest, and provides a tight integration option for resources that vary widely based on hardware.

Torque's resource plug-in capability provides an API through which a Torque plug-in can add arbitrary generic resources, generic metrics, varattrs, and features to a node. Additionally, a Torque plug-in can add arbitrary resource usage per job.

The API can be found in trq_plugin_api.h. To implement a plug-in, you must implement all of the API functions, even if the function does nothing. An implementation that does nothing may be found in contrib/resource_plugin.cpp. If you wish, you may simply add the desired functionality to this file, build the library, and link it to the MOM at build time.
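
As a rough illustration of what a filled-in plug-in might look like, the sketch below populates the four node-level categories and one per-job resource using the sample values that appear later in this chapter. The function names and signatures are assumptions made for this sketch, declared locally so that the example compiles on its own. The authoritative declarations are in trq_plugin_api.h, and a real plug-in should include that header and match it exactly.

```cpp
#include <map>
#include <set>
#include <string>
#include <sys/types.h>  // pid_t

// NOTE: every name and signature below is an assumption for illustration.
// Check trq_plugin_api.h for the declarations your Torque version expects.

// Generic resources: named integer quantities available on this node.
void report_node_generic_resources(std::map<std::string, unsigned int> &gres)
{
    gres["hbmem"] = 1024;
}

// Generic metrics: named floating-point measurements for this node.
void report_node_generic_metrics(std::map<std::string, double> &gmetrics)
{
    gmetrics["temperature"] = 75.2;
}

// Varattrs: named free-form string attributes for this node.
void report_node_varattrs(std::map<std::string, std::string> &varattrs)
{
    varattrs["octave"] = "3.2.4";
}

// Features: a set of labels describing this node.
void report_node_features(std::set<std::string> &features)
{
    features.insert("haswell");
}

// Per-job usage: named string values attributed to one job.
// The job id and pid set are unused in this trivial sketch.
void report_job_resources(const std::string &job_id,
                          const std::set<pid_t> &pids,
                          std::map<std::string, std::string> &usage)
{
    (void)job_id;
    (void)pids;
    usage["stormlight"] = "2broams";
}
```

Each function only writes into the container it is handed, so a function that has nothing to report can be left with an empty body.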

5.696.3 Plug-in Implementation Recommendations

Your plug-in must execute very quickly in order to avoid causing problems for the pbs_mom daemon. The node resource portion of the plug-in has a 5 second time limit, and the job resource usage portion has a 3 second time limit. The node resource portion executes each time the MOM sends a status to pbs_server, and the job resource usage portion executes once per job at the same time interval. The node resource and job resource portions block pbs_mom while they are executing, so they should execute in a short, deterministic amount of time.
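
One way to check a plug-in against these limits before deployment is to time each reporting call in a test harness. The sketch below is not part of Torque: the helper names, the constants, and the default 10% safety margin are all assumptions chosen for illustration. It measures a callable with a monotonic clock and compares the result to a fraction of the limit quoted above.

```cpp
#include <chrono>

// Limits quoted in this section (seconds). The constant names are hypothetical.
const double NODE_REPORT_LIMIT_SECONDS = 5.0;
const double JOB_REPORT_LIMIT_SECONDS = 3.0;

// Run fn once and return the wall-clock time it took, in seconds.
// steady_clock is used because it never jumps backwards.
template <typename Fn>
double seconds_taken(Fn fn)
{
    const auto start = std::chrono::steady_clock::now();
    fn();
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}

// Return true when fn finishes within safety_fraction of the limit.
// The 0.1 default (10% of the limit) is an arbitrary margin for this
// sketch, not a Torque requirement.
template <typename Fn>
bool fits_budget(Fn fn, double limit_seconds, double safety_fraction = 0.1)
{
    return seconds_taken(fn) <= limit_seconds * safety_fraction;
}
```

A test driver could wrap each reporting call in a lambda and pass it to fits_budget with the matching limit. A single measurement on an idle machine says little about worst-case behavior, so repeated runs under load are more informative.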

Remember, you are responsible for plug-ins, so please design well and test thoroughly.

5.696.4 Building the Plug-in

If you do not change the name of the .cpp file and wish to build it, execute the following:

export TRQ_HEADER_LOCATION=/usr/local/include/
g++ -fPIC -I $TRQ_HEADER_LOCATION resource_plugin.cpp -shared -o libresource_plugin.so

NOTE: Change TRQ_HEADER_LOCATION if you configured Torque with the --prefix option.

5.696.5 Testing the Plug-in

NOTE: You assume all responsibility for any plug-ins. This document is intended to assist you in testing the plug-ins, but this list of suggested tests may not be comprehensive. We do not assume responsibility if these suggested tests do not cover everything.

5.696.5.A Testing Basic Functionality

Once you've implemented and built your library, you can begin testing. For your convenience, a simple test driver can be found in plugin_driver.cpp. You can build this executable and link it against your library as shown below in order to manually verify the output:

export PLUGIN_LIB_PATH=/usr/local/lib
g++ plugin_driver.cpp -I $TRQ_HEADER_LOCATION -L $PLUGIN_LIB_PATH -lresource_plugin -o driver

You can then execute the driver and manually inspect the output:

./driver

NOTE: Change PLUGIN_LIB_PATH if you have installed the plug-in somewhere other than /usr/local/lib.

To illustrate, a simple plug-in that reports one job resource, one generic resource, one generic metric, one varattr, and one feature will have the following output:

$ ./driver
Your plugin reported the following for the random pid 7976:
stormlight = 2broams
Your plugin reports the following for this host:
	GRES:
		hbmem = 1024
 
	GMETRICS:
		temperature = 75.20
 
	VARATTRS:
		octave = 3.2.4
 
	FEATURES: haswell

5.696.5.B Testing for Memory Leaks

To keep your compute nodes from slowing down or even going down due to out-of-memory conditions, you should run your plug-in under valgrind to verify that it manages memory correctly.

Assuming you are executing the driver from the "Testing Basic Functionality" section, you can run:

valgrind --tool=memcheck --leak-check=full --log-file=plugin_valgrind_output.txt ./driver

If you are not familiar with valgrind, a good primer can be found at The Valgrind Quick Start Guide.

We recommend that you fix all errors reported by valgrind.

5.696.6 Enabling the Plug-in

Once you've implemented, built, and thoroughly tested your plug-in (remember that our suggestions may not address everything), you will want to enable it in Torque. Your plug-in can be linked in at build time:

./configure <your other options> --with-resource-plugin=<path to your resource plugin>

NOTE: You will want to make sure that the path you specify is in $LD_LIBRARY_PATH, or can otherwise be found by pbs_mom when you start the daemon.

Once you build, you can start the new MOM and observe the plug-in's output using pbsnodes, qstat -f, and the accounting file.

Sample pbsnodes output:

<normal output>
gres:hbmem = 20
gmetric:temperature = 76.20
varattr:octave = 3.2.4
features = haswell

The keywords at the front let Moab know what each line means, so it can use them accordingly.

Sample accounting file entry:

<normal entry until resources used> resources_used.cput=0
resources_used.energy_used=0 resources_used.mem=452kb
resources_used.vmem=22372kb resources_used.walltime=00:05:00
resources_used.stormlight=2broams

Resources reported by your plug-in will appear in the form:

resources_used.<name you supplied>=<value you supplied>

The above example includes the arbitrary resource stormlight and the value of 2broams.
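
To confirm that a plug-in's resources reach the accounting file, the entries can be checked programmatically. The sketch below is a hypothetical helper, not part of Torque. It assumes the fields of an entry are separated by whitespace, as in the sample above, and it collects every resources_used.<name>=<value> field into a map.

```cpp
#include <map>
#include <sstream>
#include <string>

// Collect every resources_used.<name>=<value> field from one accounting
// entry. Assumes fields are separated by whitespace and that values do
// not themselves contain whitespace.
std::map<std::string, std::string> parse_resources_used(const std::string &entry)
{
    static const std::string prefix = "resources_used.";
    std::map<std::string, std::string> used;
    std::istringstream fields(entry);
    std::string field;

    while (fields >> field) {
        // Skip fields that do not start with the resources_used. prefix.
        if (field.compare(0, prefix.size(), prefix) != 0)
            continue;

        // Split at the first '=' after the prefix; skip malformed fields.
        const std::string::size_type eq = field.find('=', prefix.size());
        if (eq == std::string::npos)
            continue;

        used[field.substr(prefix.size(), eq - prefix.size())] =
            field.substr(eq + 1);
    }
    return used;
}
```

Applied to the sample entry, the returned map would contain stormlight with the value 2broams alongside the built-in cput, energy_used, mem, vmem, and walltime fields.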

Sample qstat -f output:

<normal qstat -f output>
resources_used.stormlight = 2broams

The resource usage reported by the plug-in will appear at the end of the qstat -f output in the same format as in the accounting file.

© 2016 Adaptive Computing