5 Reporting Framework

The reporting framework is a set of tools to make time-based reports from numerical data. The following sections will (1) provide an overview of the framework and the concepts related to it, and (2) work through an example report (CPU Utilization) with details regarding which web services to use and with what data.

The REST API reference is located in the Report Resource section.

5.1 Overview

5.1.1 Concepts

The reporting framework uses 3 core concepts: reports, datapoints, and samples.
  • Report - A report is a time-based view of numerical data.
  • Datapoint - A datapoint is a consolidated set of data for a certain time period.
  • Sample - A sample is a snapshot of a certain set of data at a particular point in time.

To illustrate, consider the memory utilization of a virtual machine: at any given point in time, you can get the memory utilization by using your operating system's performance utilities (top for Linux, Task Manager for Windows):

2400/12040 MB

By recording the memory utilization and the time repeatedly over 1 minute, you could gather the following data:

Time          Memory Utilization
3:53:55 PM    2400/12040 MB
3:54:13 PM    2410/12040 MB
3:54:27 PM    2406/12040 MB
3:54:39 PM    2402/12040 MB
3:54:50 PM    2409/12040 MB

Each row in the table above represents a sample of data. By averaging the rows, we can consolidate them into one or more datapoints:

Start Time    End Time      Memory Utilization
3:53:30 PM    3:54:00 PM    2400/12040 MB
3:54:00 PM    3:54:30 PM    2408/12040 MB
3:54:30 PM    3:55:00 PM    2406/12040 MB

Note that each datapoint covers exactly the same amount of time and averages all samples within that period.

A report, then, is simply a list of datapoints with some additional configuration information:

Field                 Value
Name                  Memory Utilization Report
Datapoint Duration    30 seconds
Report Size           3 datapoints

Datapoints:

Start Time    End Time      Memory Utilization
3:53:30 PM    3:54:00 PM    2400/12040 MB
3:54:00 PM    3:54:30 PM    2408/12040 MB
3:54:30 PM    3:55:00 PM    2406/12040 MB
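
The relationship between samples, datapoints, and a report can be sketched in a few lines of Python. This is only an illustration of the concepts above, not the framework's implementation: the function name build_datapoints is hypothetical, and it assumes that each period includes its start time and excludes its end time.

```python
from datetime import datetime, timedelta

# Samples from the table above: (time, used memory in MB).
SAMPLES = [
    ("3:53:55 PM", 2400),
    ("3:54:13 PM", 2410),
    ("3:54:27 PM", 2406),
    ("3:54:39 PM", 2402),
    ("3:54:50 PM", 2409),
]

def parse_time(text):
    # Only the time of day matters here; the date part is ignored.
    return datetime.strptime(text, "%I:%M:%S %p")

def build_datapoints(samples, start, duration, report_size):
    """Average the samples that fall into each fixed-length period.

    Assumption: a period includes its start and excludes its end.
    """
    datapoints = []
    for i in range(report_size):
        period_start = start + i * duration
        period_end = period_start + duration
        values = [value for time, value in samples
                  if period_start <= parse_time(time) < period_end]
        average = sum(values) / len(values) if values else None
        datapoints.append((period_start, period_end, average))
    return datapoints

report = build_datapoints(
    SAMPLES,
    start=parse_time("3:53:30 PM"),
    duration=timedelta(seconds=30),
    report_size=3,
)
for period_start, period_end, average in report:
    print(period_start.strftime("%I:%M:%S %p"),
          period_end.strftime("%I:%M:%S %p"),
          average)
```

The three averages are 2400.0, 2408.0, and 2405.5; the last one appears in the table rounded to 2406.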

5.1.2 Capabilities

While storing simple information like memory utilization is nice, the reporting framework is built to automatically handle much more complex information.

Consolidating Samples

Samples are JSON documents which are pushed into the report using the samples API. Samples are then stored until the consolidation operation creates a datapoint out of them. The table below shows how different data types are handled in this operation:

Type       Consolidation Function Handling
Numbers    Numerical data is averaged
Strings    Strings are aggregated into an array
Objects    The consolidation function recursively consolidates sub-objects
Lists      Lists are combined into a single flat list containing all elements
Mixed      If samples have different types of data for the same field, the values are aggregated into an array.
Null       These values will be ignored unless all values for a sample field are set to null, resulting in a null result.

The one exception to the mixed-type rule is numbers: if the values for a field include at least one number, the field is treated as numerical data. The non-numerical values are ignored and the numbers are averaged.

Below is an example of how the consolidation function works:

Samples:
Time          NumberEx    StringEx    ListEx                MixedEx      MixedNumberEx
3:53:55 PM    2400        "str1"      ["elem1"]             "str1"       "str1"
3:54:13 PM    2410        "str2"      ["elem2", "elem3"]    ["elem1"]    ["elem1"]
3:54:27 PM    2405        "str3"      ["elem4"]             null         5

Resulting Datapoint after consolidation:

Time          NumberEx    StringEx                    ListEx                                  MixedEx              MixedNumberEx
3:55:00 PM    2405        ["str1", "str2", "str3"]    ["elem1", "elem2", "elem3", "elem4"]    ["str1", "elem1"]    5
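
The handling rules above can be approximated by a short recursive function. The following is a sketch written from the table and the example in this section, not the actual consolidation code; the function name consolidate is hypothetical, and the treatment of cases the table does not mention (for example, objects mixed with strings) is an assumption.

```python
def consolidate(values):
    """Consolidate one field's values across samples (sketch of the rules above)."""
    # Null values are ignored unless every value is null.
    present = [v for v in values if v is not None]
    if not present:
        return None
    # If at least one number is present, average the numbers and ignore the rest.
    # bool is a subclass of int in Python, so it is excluded explicitly.
    numbers = [v for v in present
               if isinstance(v, (int, float)) and not isinstance(v, bool)]
    if numbers:
        return sum(numbers) / len(numbers)
    # Objects are consolidated recursively, field by field.
    if all(isinstance(v, dict) for v in present):
        keys = []
        for v in present:
            for key in v:
                if key not in keys:
                    keys.append(key)
        return {key: consolidate([v.get(key) for v in present]) for key in keys}
    # Strings, lists, and other mixed values end up in a single flat list.
    # Assumption: an object mixed with non-objects is appended as-is.
    flat = []
    for v in present:
        if isinstance(v, list):
            flat.extend(v)
        else:
            flat.append(v)
    return flat

# The three samples from the example above.
SAMPLES = [
    {"NumberEx": 2400, "StringEx": "str1", "ListEx": ["elem1"],
     "MixedEx": "str1", "MixedNumberEx": "str1"},
    {"NumberEx": 2410, "StringEx": "str2", "ListEx": ["elem2", "elem3"],
     "MixedEx": ["elem1"], "MixedNumberEx": ["elem1"]},
    {"NumberEx": 2405, "StringEx": "str3", "ListEx": ["elem4"],
     "MixedEx": None, "MixedNumberEx": 5},
]

datapoint = consolidate(SAMPLES)
print(datapoint)
```

Because each sample is itself an object, passing the list of samples to consolidate applies the rules to every field in turn, which reproduces the datapoint shown in the example.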

Minimum Number of Samples

If your dataset is highly variable (i.e. values contained in samples are not very close together), converting a single sample into a datapoint may provide misleading information. It may be better to have a datapoint with an "Unknown" value. This can be accomplished by setting the minimum number of samples for a datapoint in the report.

The minimumSampleSize field of the report sets this minimum. If fewer samples than the specified minimum are available when the consolidation function runs, no data is available for the datapoint: the sample data is discarded and the data field of the datapoint is set to null.

For information on how to set this option, see the REST API Report Resource section in the documentation.
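
As a rough illustration of the effect of this option, the sketch below averages a list of numeric sample values only when enough of them are present. The function name datapoint_data is hypothetical and the logic is a simplification of the behavior described above, not the framework's code.

```python
def datapoint_data(values, minimum_sample_size=1):
    """Return the average of the values, or None if too few samples arrived."""
    if len(values) < minimum_sample_size:
        # Too few samples: they are discarded and the datapoint has no data.
        return None
    return sum(values) / len(values)

# Two widely differing samples arrived during the datapoint's duration.
print(datapoint_data([12, 97], minimum_sample_size=1))
print(datapoint_data([12, 97], minimum_sample_size=3))
```

With a minimum of 1 the two samples are averaged to 54.5; with a minimum of 3 the datapoint has no data.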

Report Size

Reports have a predetermined number of datapoints, or size, which sets a limit on the amount of data that can be stored. After the report size has been reached, as newly created datapoints are pushed into the report, the oldest datapoints will automatically be deleted. This is to aid in managing the storage capacity of the server hosting MWS.

On report creation, a MongoDB collection is initialized whose size is the maximum size of a single entry (currently 16 MB) multiplied by the report size. Be careful when setting a large report size: creating many reports with large report sizes will quickly allocate the entire disk.
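
To get a feel for the numbers, the allocation can be estimated before a report is created. The sketch below simply applies the multiplication described above; the function name allocated_mb is hypothetical, and the 16 MB figure is the one quoted in this section.

```python
MAX_ENTRY_MB = 16  # maximum size of a single entry, as stated above

def allocated_mb(report_size, report_count=1):
    """Estimate the space allocated for report_count reports of the given size."""
    return report_size * MAX_ENTRY_MB * report_count

print(allocated_mb(288))      # one report with 288 datapoints
print(allocated_mb(288, 10))  # ten such reports
```

A single report with 288 datapoints allocates 4608 MB, and ten of them allocate 46080 MB.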

5.2 Example Report (CPU Utilization)

To illustrate the behavior and usage of the reporting framework, this section walks through a sample report covering CPU utilization. It does not cover how to gather or display data for reports, but it does cover some basic operations that Moab Web Services provides to facilitate reporting.

5.2.1 Creating A Report

Before any data is sent to Moab Web Services, a report must first be created. To do this, send an HTTP POST request with a JSON request body:

POST /rest/reports
{
  "name":"cpu-util",
  "description":"An example report for cpu utilization",
  "consolidationFunction":"average",
  "datapointDuration":600,
  "reportSize":288
}

This creates a report, which can then be retrieved by sending a GET request to /rest/reports/cpu-util. The datapointDuration of 600 means that datapoint consolidation occurs once every 10 minutes, and the reportSize of 288 (the number of datapoints) means that the report retains up to 2 days' worth of the latest datapoints.

GET /rest/reports/cpu-util
{
    "consolidationFunction": "average",
    "datapointDuration": 600,
    "datapoints": [],
    "description": "An example report for cpu utilization",
    "id": "aef6f6a3a0bz7bf6449537c9d",
    "keepSamples": false,
    "minimumSampleSize": 1,
    "name": "cpu-util",
    "reportSize": 288,
    "version": 0
}

Note that an ID has been generated automatically and that no datapoints are associated with the report.
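
From a script, the creation request might be assembled as in the following sketch, which uses only the Python standard library. The base URL is a placeholder, authentication is left out, and the helper name build_create_report_request is hypothetical; the sketch builds the request without sending it.

```python
import json
import urllib.request

# Hypothetical address; replace it with the address of your own MWS server.
BASE_URL = "http://localhost:8080/mws"

def build_create_report_request(base_url, report):
    """Build (but do not send) the POST /rest/reports request."""
    return urllib.request.Request(
        url=base_url + "/rest/reports",
        data=json.dumps(report).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

report = {
    "name": "cpu-util",
    "description": "An example report for cpu utilization",
    "consolidationFunction": "average",
    "datapointDuration": 600,
    "reportSize": 288,
}
request = build_create_report_request(BASE_URL, report)

# Retention covered by the report: number of datapoints times their duration.
retention_days = report["reportSize"] * report["datapointDuration"] / 86400
print(request.get_method(), request.full_url, retention_days)

# To actually send the request (credentials are not part of this sketch):
# urllib.request.urlopen(request)
```

The retention works out to 2.0 days, which matches the 288 datapoints of 10 minutes each described above.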

5.2.2 Adding Samples

Until samples are added and associated with the report, datapoint consolidation will generate datapoints with a data field equal to null. Once samples are added, however, they will be averaged and inserted into the next datapoint.

Create samples for the cpu-util report by sending a POST request as follows:

POST /rest/reports/cpu-util/samples
[
  {
    "agent": "cpu-monitor",
    "timestamp":"2012-01-01 12:00:00 UTC",
    "data": {
      "minutes1": 0.5,
      "minutes5": 0,
      "minutes15": 0
    }
  },
  {
    "agent": "cpu-monitor",
    "timestamp":"2012-01-01 12:01:00 UTC",
    "data": {
      "minutes1": 1,
      "minutes5": 0.5,
      "minutes15": 0.05
    }
  },
  {
    "agent": "cpu-monitor",
    "timestamp":"2012-01-01 12:02:00 UTC",
    "data": {
      "minutes1": 1,
      "minutes5": 0.5,
      "minutes15": 0.1
    }
  },
  {
    "agent": "cpu-monitor",
    "timestamp":"2012-01-01 12:03:00 UTC",
    "data": {
      "minutes1": 0.75,
      "minutes5": 1,
      "minutes15": 0.25
    }
  },
  {
    "agent": "cpu-monitor",
    "timestamp":"2012-01-01 12:04:00 UTC",
    "data": {
      "minutes1": 0,
      "minutes5": 1,
      "minutes15": 0.85
    }
  }
]

This sample data contains the average load over the last 1, 5, and 15 minutes. The samples were recorded at one-minute intervals starting at noon on January 1st, 2012.
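
A monitoring agent could build each sample with a small helper such as the one sketched below. The helper name make_sample is hypothetical; the field names and the timestamp format are copied from the request above. On Unix systems the three load averages could come from os.getloadavg(), but here they are passed in so that the example is reproducible.

```python
from datetime import datetime, timezone

def make_sample(agent, when, load_averages):
    """Build one sample in the shape used by the request above.

    `when` must be a timezone-aware datetime; it is converted to UTC.
    `load_averages` is a (1 minute, 5 minute, 15 minute) tuple.
    """
    minutes1, minutes5, minutes15 = load_averages
    return {
        "agent": agent,
        "timestamp": when.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M:%S UTC"),
        "data": {
            "minutes1": minutes1,
            "minutes5": minutes5,
            "minutes15": minutes15,
        },
    }

sample = make_sample(
    "cpu-monitor",
    datetime(2012, 1, 1, 12, 0, 0, tzinfo=timezone.utc),
    (0.5, 0, 0),
)
print(sample)
```

The result is identical to the first sample in the request body above; a list of such samples forms the body of the POST request.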

5.2.3 Consolidating Data

A consolidation function must run to generate datapoints from the given samples. This scheduled consolidation will occur at intervals of datapointDuration seconds. For each field in the data object in samples, all values will be averaged. If non-numeric values are included, the following strategies will be followed:
  1. All fields which contain a numeric value in at least one included sample will be averaged and the non-numeric or null values will be ignored.
  2. All fields which contain a list will be consolidated into a single, flat list.
  3. All fields which contain only non-numeric or null values will be consolidated into a single, flat list.

If no historical datapoints are provided when the report is created, as in this example, the next consolidation is scheduled for the current time plus the datapointDuration. In this example, the scheduled consolidation is 10 minutes after the creation date. If historical datapoints are included in the report creation, the latest datapoint's endDate plus the datapointDuration is used as the scheduled time. If that date is in the past, the next scheduled consolidation occurs at the appropriate interval from the last endDate.
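
The scheduling rules can be summarized in a short function. This is a sketch only: the function name next_consolidation is hypothetical, and "the appropriate interval from the last endDate" is interpreted here as the first time after the current time that is a whole number of datapointDuration intervals past the last endDate. That interpretation is an assumption, not something this section states.

```python
from datetime import datetime, timedelta, timezone

def next_consolidation(now, duration_seconds, last_end_date=None):
    """Return the time of the next scheduled consolidation (sketch)."""
    duration = timedelta(seconds=duration_seconds)
    if last_end_date is None:
        # No historical datapoints: current time plus the datapointDuration.
        return now + duration
    scheduled = last_end_date + duration
    if scheduled < now:
        # Assumption: skip the missed intervals and stay aligned to the
        # last endDate, choosing the first aligned time after `now`.
        missed = (now - scheduled) // duration + 1
        scheduled += missed * duration
    return scheduled

created = datetime(2012, 1, 1, 11, 49, 0, tzinfo=timezone.utc)
print(next_consolidation(created, 600))

last_end = datetime(2012, 1, 1, 11, 0, 0, tzinfo=timezone.utc)
now = datetime(2012, 1, 1, 11, 35, 0, tzinfo=timezone.utc)
print(next_consolidation(now, 600, last_end))
```

The first call returns 11:59:00 UTC, 10 minutes after creation. The second returns 11:40:00 UTC, the first 10-minute step from 11:00:00 that lies after 11:35:00.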

5.2.4 Retrieving Report Data

To retrieve the consolidated datapoints, simply perform a GET request on the report once again. Alternatively, the GET for a report's datapoints may be used.

GET /rest/reports/cpu-util
{
    "consolidationFunction": "average",
    "datapointDuration": 600,
    "datapoints": [
        {
            "firstSampleDate": null,
            "lastSampleDate": null,
            "data": null,
            "startDate": "2012-01-01 11:49:00 UTC",
            "endDate": "2012-01-01 11:59:00 UTC"
        },
        {
            "firstSampleDate": "2012-01-01 12:00:00 UTC",
            "lastSampleDate": "2012-01-01 12:04:00 UTC",
            "data": {
                "minutes1": 0.65,
                "minutes15": 0.25,
                "minutes5": 0.6
            },
            "startDate": "2012-01-01 11:59:00 UTC",
            "endDate": "2012-01-01 12:09:00 UTC"
        }
    ],
    "description": "An example report for cpu utilization",
    "id": "aef6f6a3a0bz7bf6449537c9d",
    "keepSamples": false,
    "minimumSampleSize": 1,
    "name": "cpu-util",
    "reportSize": 288,
    "version": 0
}

Note that of the two datapoints above, only the second contains data; the data field of the first is null. Only samples that lie within a datapoint's duration (from its startDate to its endDate) are included in the consolidation. The first datapoint covers the 10-minute period just before the samples' recorded timestamps, so it contains no data. The second covers the 10-minute period that includes the samples, so it contains the averaged sample data. This data could be used to display consolidated report data in a custom interface.
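
The values in the response can be checked by hand. The sketch below takes the five samples posted earlier and the two datapoint periods from the response, keeps the samples that fall inside each period, and averages them. It assumes that a period includes its startDate and excludes its endDate, and it rounds the averages to two decimal places to hide floating-point noise; the function name average_window is hypothetical.

```python
from datetime import datetime

FORMAT = "%Y-%m-%d %H:%M:%S UTC"

SAMPLES = [
    ("2012-01-01 12:00:00 UTC", {"minutes1": 0.5, "minutes5": 0, "minutes15": 0}),
    ("2012-01-01 12:01:00 UTC", {"minutes1": 1, "minutes5": 0.5, "minutes15": 0.05}),
    ("2012-01-01 12:02:00 UTC", {"minutes1": 1, "minutes5": 0.5, "minutes15": 0.1}),
    ("2012-01-01 12:03:00 UTC", {"minutes1": 0.75, "minutes5": 1, "minutes15": 0.25}),
    ("2012-01-01 12:04:00 UTC", {"minutes1": 0, "minutes5": 1, "minutes15": 0.85}),
]

# startDate and endDate of the two datapoints in the response above.
WINDOWS = [
    ("2012-01-01 11:49:00 UTC", "2012-01-01 11:59:00 UTC"),
    ("2012-01-01 11:59:00 UTC", "2012-01-01 12:09:00 UTC"),
]

def parse(text):
    return datetime.strptime(text, FORMAT)

def average_window(samples, start, end):
    """Average every field of the samples whose timestamp is in [start, end)."""
    inside = [data for stamp, data in samples
              if parse(start) <= parse(stamp) < parse(end)]
    if not inside:
        return None
    return {field: round(sum(d[field] for d in inside) / len(inside), 2)
            for field in inside[0]}

results = [average_window(SAMPLES, start, end) for start, end in WINDOWS]
print(results)
```

The first period yields no data, and the second yields 0.65, 0.6, and 0.25 for minutes1, minutes5, and minutes15, which agrees with the response.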

5.2.5 Possible Configurations

Configuration options may be changed to affect the process of report generation. These are documented in the API for the Report object and the Sample object.