5 Reporting Framework
The reporting framework is a set of tools to make time-based reports from numerical data. The following sections will
(1) provide an overview of the framework and the concepts related to it, and (2) work through an example report (CPU
Utilization) with details regarding which web services to use and with what data.
The REST API reference is located in the Report Resource section.
5.1 Overview
5.1.1 Concepts
The reporting framework uses 3 core concepts: reports, datapoints, and samples.
- Report - A report is a time-based view of numerical data.
- Datapoint - A datapoint is a consolidated set of data for a certain time period.
- Sample - A sample is a snapshot of a certain set of data at a particular point in time.
To illustrate, consider the memory utilization of a virtual machine: at any given point in time, you can get the memory
utilization by using your operating system's performance utilities (top for Linux, Task Manager for Windows):
2400/12040 MB
By repeatedly recording the memory utilization and the time over 1 minute, you could gather the following data:
Time | Memory Utilization |
---|---|
3:53:55 PM | 2400/12040 MB |
3:54:13 PM | 2410/12040 MB |
3:54:27 PM | 2406/12040 MB |
3:54:39 PM | 2402/12040 MB |
3:54:50 PM | 2409/12040 MB |
Each of the rows in the table above represents a sample of data. By averaging the rows we can consolidate them into one
or more datapoints:
Start Time | End Time | Memory Utilization |
---|---|---|
3:53:30 PM | 3:54:00 PM | 2400/12040 MB |
3:54:00 PM | 3:54:30 PM | 2408/12040 MB |
3:54:30 PM | 3:55:00 PM | 2406/12040 MB |
Note that each datapoint covers exactly the same amount of time, and averages all samples within that period of
time.
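The averaging described above can be sketched in a few lines of Python. This is only an illustration, not the framework's implementation: the function name `average_into_windows`, the helper `at`, the arbitrary date, and the choice to treat each window as including its start but excluding its end are all assumptions.

```python
from datetime import datetime, timedelta

def average_into_windows(samples, start, duration, count):
    """Average (timestamp, value) samples into `count` equal windows."""
    datapoints = []
    for i in range(count):
        window_start = start + i * duration
        window_end = window_start + duration
        # Assumed boundary rule: start is inclusive, end is exclusive
        values = [v for t, v in samples if window_start <= t < window_end]
        average = sum(values) / len(values) if values else None
        datapoints.append((window_start, window_end, average))
    return datapoints

def at(clock):
    # Hypothetical helper: a 24-hour clock time on an arbitrary fixed date
    return datetime.strptime("2012-01-01 " + clock, "%Y-%m-%d %H:%M:%S")

# The five memory samples from the table above (used MB only)
samples = [
    (at("15:53:55"), 2400),
    (at("15:54:13"), 2410),
    (at("15:54:27"), 2406),
    (at("15:54:39"), 2402),
    (at("15:54:50"), 2409),
]

datapoints = average_into_windows(samples, at("15:53:30"), timedelta(seconds=30), 3)
for window_start, window_end, average in datapoints:
    print(window_start.time(), window_end.time(), average)
```

The three averages come out as 2400.0, 2408.0, and 2405.5; the last is shown rounded to 2406 in the table.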
A report, then, is simply a list of datapoints with some additional configuration information:
Field | Value |
---|---|
Name | Memory Utilization Report |
Datapoint Duration | 30 seconds |
Report Size | 3 datapoints |
Datapoints:
Start Time | End Time | Memory Utilization |
---|---|---|
3:53:30 PM | 3:54:00 PM | 2400/12040 MB |
3:54:00 PM | 3:54:30 PM | 2408/12040 MB |
3:54:30 PM | 3:55:00 PM | 2406/12040 MB |
5.1.2 Capabilities
While storing simple information like memory utilization is nice, the reporting framework is built to automatically
handle much more complex information.
Consolidating Samples
Samples are JSON documents which are pushed into the report using the samples API. Samples are then stored until the
consolidation operation creates a datapoint out of them. The table below shows how different data types are handled
in this operation:
Type | Consolidation Function Handling |
---|---|
Numbers | Numerical data is averaged |
Strings | Strings are aggregated into an array |
Objects | The consolidation function recursively consolidates sub-objects |
Lists | Lists are combined into a single flat list containing all elements |
Mixed | If samples have different types of data for the same field, the values are aggregated into an array. |
Null | These values will be ignored unless all values for a sample field are set to null, resulting in a null result. |
If the mixed data types contain at least one number, the field is treated as numerical data: the non-numerical
values are ignored and the numbers are averaged.
Below is an example of how the consolidation function works:
Samples:
Time | NumberEx | StringEx | ListEx | MixedEx | MixedNumberEx |
---|---|---|---|---|---|
3:53:55 PM | 2400 | "str1" | ["elem1"] | "str1" | "str1" |
3:54:13 PM | 2410 | "str2" | ["elem2", "elem3"] | ["elem1"] | ["elem1"] |
3:54:27 PM | 2405 | "str3" | ["elem4"] | null | 5 |
Resulting Datapoint after consolidation:
Time | NumberEx | StringEx | ListEx | MixedEx | MixedNumberEx |
---|---|---|---|---|---|
3:55:00 PM | 2405 | ["str1", "str2", "str3"] | ["elem1", "elem2", "elem3", "elem4"] | ["str1", "elem1"] | 5 |
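The handling rules in the table can be approximated with a short recursive function. This is a sketch based only on the rules and example above, not the actual consolidation code: the name `consolidate_field` is hypothetical, and details the text does not specify (for example, how an object mixed with strings is handled) are assumptions.

```python
def consolidate_field(values):
    """Consolidate one field's values taken from several samples."""
    present = [v for v in values if v is not None]
    if not present:
        return None  # every sample had null for this field

    numbers = [v for v in present
               if isinstance(v, (int, float)) and not isinstance(v, bool)]
    if numbers:
        # At least one number: average the numbers, ignore everything else
        return sum(numbers) / len(numbers)

    if all(isinstance(v, dict) for v in present):
        # Objects: consolidate each sub-field recursively
        keys = []
        for obj in present:
            for key in obj:
                if key not in keys:
                    keys.append(key)
        return {key: consolidate_field([obj.get(key) for obj in present])
                for key in keys}

    # Strings, lists, and other mixed values: one flat list (assumed)
    flat = []
    for v in present:
        if isinstance(v, list):
            flat.extend(v)
        else:
            flat.append(v)
    return flat

samples = [
    {"NumberEx": 2400, "StringEx": "str1", "ListEx": ["elem1"],
     "MixedEx": "str1", "MixedNumberEx": "str1"},
    {"NumberEx": 2410, "StringEx": "str2", "ListEx": ["elem2", "elem3"],
     "MixedEx": ["elem1"], "MixedNumberEx": ["elem1"]},
    {"NumberEx": 2405, "StringEx": "str3", "ListEx": ["elem4"],
     "MixedEx": None, "MixedNumberEx": 5},
]

# Each sample is an object, so the whole set consolidates recursively
datapoint = consolidate_field(samples)
print(datapoint)
```

Applied to the three samples from the table, the sketch yields the same values as the resulting datapoint row, with the averages expressed as floats.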
Minimum Number of Samples
If your dataset is highly variable (i.e. values contained in samples are not very close together), converting a single
sample into a datapoint may provide misleading information. It may be better to have a datapoint with an "Unknown"
value. This can be accomplished by setting the minimum number of samples for a datapoint in the report.
The minimumSampleSize field in the Report API sets the minimum number of samples required to build a datapoint. If
fewer samples are available when the consolidation function runs, the sample data is discarded and the data field of
the datapoint is set to null, meaning that no data is available for that datapoint.
For information on how to set this option, see the REST API Report Resource section in the documentation.
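The effect of a minimum sample count can be illustrated with a small sketch. The function name `average_or_null` and the sample values are hypothetical; only the rule itself (too few samples yields a null datapoint) comes from the description above.

```python
def average_or_null(values, minimum_sample_size=1):
    """Return the mean of `values`, or None when too few samples exist."""
    if len(values) < minimum_sample_size:
        # Mirrors a datapoint whose data field is set to null
        return None
    return sum(values) / len(values)

# Two widely spread samples are not enough when three are required
sparse = average_or_null([12, 95], minimum_sample_size=3)
enough = average_or_null([12, 95, 40], minimum_sample_size=3)
print(sparse, enough)
```

With a minimum of three samples, the two-sample case yields no value rather than a potentially misleading average of two distant readings.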
Report Size
Reports have a predetermined number of datapoints, or size, which sets a limit on the amount of data that can be stored.
After the report size has been reached, as newly created datapoints are pushed into the report, the oldest datapoints
will automatically be deleted. This is to aid in managing the storage capacity of the server hosting MWS.
On report creation, a Mongo collection will be initialized whose size is the
maximum size of a single entry (currently 16 MB) multiplied by the report size.
Be careful when setting a large report size: creating many reports with large report sizes
can quickly allocate the entire disk.
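As a rough worked example of the allocation rule above, the following sketch multiplies the report size by the 16 MB entry limit stated in the text. The function name `preallocated_mb` is hypothetical, and actual disk usage may differ from this simple product.

```python
MAX_ENTRY_MB = 16  # maximum size of a single entry, as stated above

def preallocated_mb(report_size):
    """Estimate the collection size initialized for one report, in MB."""
    return report_size * MAX_ENTRY_MB

one_report = preallocated_mb(288)
ten_reports = 10 * preallocated_mb(288)
print(one_report, "MB for one report")     # 4608 MB, or 4.5 GB
print(ten_reports, "MB for ten reports")   # 46080 MB, or 45 GB
```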
5.2 Example Report (CPU Utilization)
To illustrate the behavior and usage of the reporting framework, this section walks through a sample report covering
CPU Utilization. It does not cover how to gather or display data for reports, but it does cover some basic operations
that are available with Moab Web Services to facilitate reporting.
5.2.1 Creating A Report
Before any data is sent to Moab Web Services, a report must first be created. To do this, send a request with an HTTP
method of POST and a JSON request body such as the following:
{
  "name": "cpu-util",
  "description": "An example report for cpu utilization",
  "consolidationFunction": "average",
  "datapointDuration": 600,
  "reportSize": 288
}
This creates a report, which can then be retrieved by sending a GET request to /rest/reports/cpu-util. The
datapointDuration of 600 signifies that datapoint consolidation should occur once every 10 minutes, while the
reportSize of 288 (i.e. the number of datapoints) means that the report will retain up to 2 days' worth of the latest
datapoints (288 datapoints × 10 minutes = 48 hours).
{
  "consolidationFunction": "average",
  "datapointDuration": 600,
  "datapoints": [],
  "description": "An example report for cpu utilization",
  "id": "aef6f6a3a0bz7bf6449537c9d",
  "keepSamples": false,
  "minimumSampleSize": 1,
  "name": "cpu-util",
  "reportSize": 288,
  "version": 0
}
Note that an ID has been generated automatically and that no datapoints are associated with the report.
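A client might build the creation request along the following lines. This is a minimal sketch using Python's standard library: the base URL `http://localhost:8080/mws` and the collection path `/rest/reports` for the POST are assumptions (only the GET path `/rest/reports/cpu-util` appears above), and authentication is not shown. The request is constructed but not sent.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080/mws"  # assumed host and context path

report = {
    "name": "cpu-util",
    "description": "An example report for cpu utilization",
    "consolidationFunction": "average",
    "datapointDuration": 600,
    "reportSize": 288,
}

def build_create_request(report, base_url=BASE_URL):
    """Build (but do not send) a POST request that creates a report."""
    body = json.dumps(report).encode("utf-8")
    return urllib.request.Request(
        base_url + "/rest/reports",  # assumed collection URL
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

request = build_create_request(report)
print(request.get_method(), request.full_url)

# To send the request against a real server (authentication omitted):
# with urllib.request.urlopen(request) as response:
#     created = json.load(response)
```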
5.2.2 Adding Samples
Until samples are added and associated with the report, datapoint consolidation will generate datapoints with a data
field equal to null. Once samples are added, however, they will be averaged and inserted into the next datapoint.
Create samples for the cpu-util report by sending a POST request as follows:
[
  {
    "agent": "cpu-monitor",
    "timestamp": "2012-01-01 12:00:00 UTC",
    "data": {
      "minutes1": 0.5,
      "minutes5": 0,
      "minutes15": 0
    }
  },
  {
    "agent": "cpu-monitor",
    "timestamp": "2012-01-01 12:01:00 UTC",
    "data": {
      "minutes1": 1,
      "minutes5": 0.5,
      "minutes15": 0.05
    }
  },
  {
    "agent": "cpu-monitor",
    "timestamp": "2012-01-01 12:02:00 UTC",
    "data": {
      "minutes1": 1,
      "minutes5": 0.5,
      "minutes15": 0.1
    }
  },
  {
    "agent": "cpu-monitor",
    "timestamp": "2012-01-01 12:03:00 UTC",
    "data": {
      "minutes1": 0.75,
      "minutes5": 1,
      "minutes15": 0.25
    }
  },
  {
    "agent": "cpu-monitor",
    "timestamp": "2012-01-01 12:04:00 UTC",
    "data": {
      "minutes1": 0,
      "minutes5": 1,
      "minutes15": 0.85
    }
  }
]
This sample data contains average load for the last 1, 5, and 15 minute intervals. The samples were recorded at one-minute
intervals starting at noon on January 1st, 2012.
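A monitoring agent could assemble a request body of this shape as sketched below. The helper name `make_sample` is hypothetical, and the load values are copied from the example rather than measured; the URL to which the samples are posted is documented in the samples API and is not assumed here.

```python
import json
from datetime import datetime, timedelta, timezone

def make_sample(agent, when, load1, load5, load15):
    """Build one sample document in the shape shown above."""
    return {
        "agent": agent,
        # Timestamp format copied from the example: "YYYY-MM-DD HH:MM:SS UTC"
        "timestamp": when.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M:%S UTC"),
        "data": {"minutes1": load1, "minutes5": load5, "minutes15": load15},
    }

start = datetime(2012, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
loads = [(0.5, 0, 0), (1, 0.5, 0.05), (1, 0.5, 0.1), (0.75, 1, 0.25), (0, 1, 0.85)]

# One sample per minute, starting at noon
samples = [
    make_sample("cpu-monitor", start + timedelta(minutes=i), *load)
    for i, load in enumerate(loads)
]
payload = json.dumps(samples, indent=2)
print(payload)
```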
5.2.3 Consolidating Data
A consolidation function must run to generate datapoints from the given samples. This scheduled consolidation will
occur at intervals of datapointDuration seconds. For each field in the data object in samples, all values will be
averaged.
If non-numeric values are included, the following strategies will be followed:
- All fields which contain at least one numeric value in any included sample will be averaged and the non-numeric or null values will be ignored.
- All fields which contain a list will be consolidated into a single, flat list.
- All fields which contain only non-numeric or null values will be consolidated into a single, flat list.
If no historical datapoints are provided in the creation of a report, as in this example, the next consolidation
will be scheduled for the current time plus the datapointDuration. In this example, the scheduled consolidation
is 10 minutes after the creation date. If historical datapoints are included in the report creation, the latest
datapoint's endDate plus the datapointDuration will be used as the scheduled time. If this date is in the past,
the next scheduled consolidation will occur at the appropriate interval from the last endDate.
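The scheduling rules above can be sketched as follows. The function name `next_consolidation` is hypothetical, and the handling of a scheduled time that lies in the past is one possible reading of "the appropriate interval": the sketch steps forward from the last endDate in whole datapointDuration intervals until it reaches a future time.

```python
from datetime import datetime, timedelta

def next_consolidation(now, duration_seconds, last_end_date=None):
    """Estimate when the next consolidation would be scheduled."""
    duration = timedelta(seconds=duration_seconds)
    if last_end_date is None:
        # No historical datapoints: current time plus datapointDuration
        return now + duration
    scheduled = last_end_date + duration
    if scheduled >= now:
        return scheduled
    # Assumed reading: stay aligned to the last endDate and skip
    # forward in whole intervals until the time is in the future
    intervals = (now - last_end_date) // duration + 1
    return last_end_date + intervals * duration

now = datetime(2012, 1, 1, 12, 0, 0)
fresh = next_consolidation(now, 600)
recent = next_consolidation(now, 600, datetime(2012, 1, 1, 11, 59, 0))
stale = next_consolidation(now, 600, datetime(2012, 1, 1, 11, 29, 0))
print(fresh, recent, stale)
```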
5.2.4 Retrieving Report Data
To retrieve the consolidated datapoints, simply perform a GET request on the report once again. Alternatively,
the GET for a report's datapoints may be used.
{
  "consolidationFunction": "average",
  "datapointDuration": 600,
  "datapoints": [
    {
      "firstSampleDate": null,
      "lastSampleDate": null,
      "data": null,
      "startDate": "2012-01-01 11:49:00 UTC",
      "endDate": "2012-01-01 11:59:00 UTC"
    },
    {
      "firstSampleDate": "2012-01-01 12:00:00 UTC",
      "lastSampleDate": "2012-01-01 12:04:00 UTC",
      "data": {
        "minutes1": 0.65,
        "minutes15": 0.25,
        "minutes5": 0.6
      },
      "startDate": "2012-01-01 11:59:00 UTC",
      "endDate": "2012-01-01 12:09:00 UTC"
    }
  ],
  "description": "An example report for cpu utilization",
  "id": "aef6f6a3a0bz7bf6449537c9d",
  "keepSamples": false,
  "minimumSampleSize": 1,
  "name": "cpu-util",
  "reportSize": 288,
  "version": 0
}
Note that of the two datapoints above, only the second actually contains data, while the data field of the first is
set to null. Only samples lying within the datapoint's duration, that is, from the startDate to the endDate, are
included in the consolidation. Therefore the first datapoint, which covers the 10 minute period just before the
samples' recorded timestamps, contains no data. The second, which covers the 10 minute period that includes the
samples' timestamps, contains the averaged sample data. This data could be used to display consolidated report data
in a custom interface.
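The averages in the second datapoint can be checked by hand or with a short script. The sketch below is illustrative only: the function name `average_window` is hypothetical, the window is assumed to include its startDate and exclude its endDate, and results are rounded to two decimals to hide floating-point noise.

```python
from datetime import datetime

FORMAT = "%Y-%m-%d %H:%M:%S UTC"  # timestamp format used in the examples

def parse(timestamp):
    return datetime.strptime(timestamp, FORMAT)

def average_window(samples, start_date, end_date):
    """Average each data field over the samples inside one datapoint window."""
    start, end = parse(start_date), parse(end_date)
    in_window = [s["data"] for s in samples
                 if start <= parse(s["timestamp"]) < end]
    if not in_window:
        return None  # no samples: the datapoint's data field is null
    return {field: round(sum(d[field] for d in in_window) / len(in_window), 2)
            for field in in_window[0]}

loads = [(0.5, 0, 0), (1, 0.5, 0.05), (1, 0.5, 0.1), (0.75, 1, 0.25), (0, 1, 0.85)]
samples = [
    {"timestamp": "2012-01-01 12:0%d:00 UTC" % minute,
     "data": {"minutes1": m1, "minutes5": m5, "minutes15": m15}}
    for minute, (m1, m5, m15) in enumerate(loads)
]

first = average_window(samples, "2012-01-01 11:49:00 UTC", "2012-01-01 11:59:00 UTC")
second = average_window(samples, "2012-01-01 11:59:00 UTC", "2012-01-01 12:09:00 UTC")
print(first)
print(second)
```

The first window contains no samples and yields no data, while the second reproduces the values 0.65, 0.6, and 0.25 shown in the report.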
5.2.5 Possible Configurations
Configuration options may be changed to affect the process of report generation. These are documented in the
API for the Report object and the Sample object.