September 2011

Volume 26 Number 09

Forecast: Cloudy - Reporting on Diagnostics Data

By Joseph Fultz | September 2011

You’ve probably been hearing a lot about Azure lately. It’s a great development platform for creating and deploying Web apps. Unfortunately, management can be a bit of a challenge. Azure provides a nice framework for collecting and transferring diagnostic data about the running application and role, but falls short on getting the data into a useful data store—and it doesn’t help with the visualization and analysis of that data.

Enter the SQL Azure Reporting CTP, our topic for this month. It may be mundane (by developer standards), but I’m going to cover it in the context of managing and monitoring my Azure deployments. I’ll demonstrate it using data from performance counters captured as part of the diagnostics for a Web Role. To date, the mechanism most folks use is either to transfer the information back home or to access a SQL Azure data store from a local machine via reporting tools such as SQL Server Reporting Services or Microsoft Excel. Unfortunately, both of those paths have significant downsides. On the one hand, if all the data is transferred to a local store for reporting, that can both run up costs and present stale information. On the other hand, if you’re accessing SQL Azure to run a local report or tool, you might decrease the amount of data transferred by aggregating and summarizing it first, but data transfer still happens—and with a nontrivial report, you’ll likely still see some latency. By keeping the data in the cloud and also reporting on it from the cloud, you send only the report view of the data, which gives you the freshest data and the least latency for viewing the report.

The Design and Setup 

To get this project underway, the first thing I’ve got to do is determine what data I’ll be collecting, where I’m going to put it and how I’m going to report on it. In this case, I’m going to collect percent time in GC and percent CPU time, both of which are common counters I look at when running performance tests. I’ll need to move the data from its landing place in Azure Table Storage to SQL Azure in order to facilitate reporting. Figure 1 illustrates the flow of data.

Figure 1 The Flow of Data

Using this diagram of the data flow, I can also identify my areas of effort. The numbers associate the following work items with the diagram:

  1. Configure diagnostics for roles
    1. Done through configuration
  2. Transfer to SQL Azure
    1. Create a worker role that at some frequency moves data
    2. Define a schema for the data tables
  3. Reporting
    1. Create a query to retrieve needed data in a usable format
    2. Create a report definition to render data into the report

SQL Azure Reporting

At the time of this writing, the reporting functionality of SQL Azure exists as a CTP that’s available for sign-up at connect.microsoft.com/sqlazurectps. Once SQL Azure Reporting is active on my account, I can navigate to it in the management portal and see something like what’s shown in Figure 2.

Figure 2 SQL Azure Reporting CTP

It’s important to note that I’ll need both the Web Service URL and the Username later on as I create this example; if you’re following along, you’ll need yours as well. Once SQL Azure Reporting is provisioned, it’s ready and simply awaiting deployment of your reports.

You’ll also need Business Intelligence Development Studio (BIDS) to create the reports and deploy them to the service.

Diagnostics Setup

To collect the data I need, I have to create a couple of PerformanceCounterConfiguration objects and add them to the PerformanceCounters.DataSources in the OnStart method of the roles I’m interested in. I have a simple Web Role that’s not modified from the template, but it will provide a source of data. I’ll set both of my counters to sample every five seconds and then transfer the collected data to Table Storage every minute. Figure 3 shows the code for this. These numbers are completely ridiculous to use in anything other than a proof-of-concept sample, but that’s exactly what I’ve created, and I’d like to collect enough data in a short amount of time to generate something representative.

Figure 3 Sample Data Collection Code

// Get the default initial configuration for the diagnostic monitor.
DiagnosticMonitorConfiguration config =
  DiagnosticMonitor.GetDefaultInitialConfiguration();
 
// Create performance counter.
var performanceConfiguration = new PerformanceCounterConfiguration();
performanceConfiguration.CounterSpecifier = @"\Processor(_Total)\% Processor Time";
performanceConfiguration.SampleRate = System.TimeSpan.FromSeconds(5.0);
 
// Add counter to the configuration.
config.PerformanceCounters.DataSources.Add(performanceConfiguration);
 
performanceConfiguration = new PerformanceCounterConfiguration();
performanceConfiguration.CounterSpecifier =
  @"\.NET CLR Memory(_Global_)\% Time in GC";
performanceConfiguration.SampleRate = System.TimeSpan.FromSeconds(5.0);
 
config.PerformanceCounters.DataSources.Add(performanceConfiguration);
 
config.PerformanceCounters.ScheduledTransferPeriod =
  System.TimeSpan.FromMinutes(1.0);
 
DiagnosticMonitor.Start(
  "Microsoft.WindowsAzure.Plugins.Diagnostics.ConnectionString", config);

Capturing the Data and Reporting

Now I need to get the data into a format and a store suitable for reporting, and then I can create the report. My obvious choice for storing the data is SQL Azure, not only because it’s the structured storage solution for Azure, but also because it’s currently the only data source SQL Azure Reporting can consume. When choosing the deployment location for the Azure Storage account, the SQL Azure database and SQL Azure Reporting, it’s best to put them in the same datacenter, or at least in the same geographic region, to avoid any ingress or egress charges.

Data Schema and Transfer

For the example, I’ll just keep the schema simple. I have an Id column, which is an identity to use as the primary key (PK), though a more realistic PK would be something like ComputerName + CounterName + TimeStamp. For now, I’ll just use the identity field. Figure 4 shows my simple schema for storing my performance counter data.

Figure 4 Simple Schema for Storing Performance Counter Data

Name           Type
Id             Identity, Int
TimeStamp      Nvarchar(50)
CounterName    Nvarchar(50)
CounterValue   Real

I originally had the TimeStamp as an actual TimeStamp column, but I changed it to a string to make it easier to manipulate. As a matter of general practice, I use a local SQL Server instance for initial development and, when I’m ready, I script it to a new query window, change the connection to connect to SQL Azure, and run the script to create the tables.
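
For reference, a table matching this schema could be created with a script along these lines; the article doesn’t include the actual script, so the identity seed and nullability shown here are my own assumptions:

CREATE TABLE [dbo].[CounterData]
(
  [Id]           INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
  [TimeStamp]    NVARCHAR(50)      NOT NULL,
  [CounterName]  NVARCHAR(50)      NOT NULL,
  [CounterValue] REAL              NOT NULL
)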

Now that I have data being placed in Table Storage and also a place to put it in SQL Azure, I need the worker role to move the data, so I add a worker role to my solution. Within the OnStart method, I add the standard configuration publisher code because I need to grab some information from settings, as shown in Figure 5.

Figure 5 Standard Configuration Publisher Code

ServicePointManager.DefaultConnectionLimit = 12;
CloudStorageAccount.SetConfigurationSettingPublisher((configName, configSetter) =>
{
  configSetter(RoleEnvironment.GetConfigurationSettingValue(configName));
  RoleEnvironment.Changed += (sender, arg) =>
  {
    if (arg.Changes.OfType<RoleEnvironmentConfigurationSettingChange>()
      .Any((change) => (change.ConfigurationSettingName == configName)))
    {
      // If the new value can't be applied at run time, recycle the role instance.
      if (!configSetter(RoleEnvironment.GetConfigurationSettingValue(configName)))
      {
        RoleEnvironment.RequestRecycle();
      }
    }
  };
});

Next, I add a method named TransferPerfDataToSql to the class. Within this method I’m going to grab the data available and average it. I have the Run method sleeping for about a minute, so there will be multiple entries for each counter. Because I’m not really interested in the full fidelity of the collected counters, I’m going to average them at each point of transfer. Thus, whatever the interval is for sampling the performance counters, I’m going to turn it into a single entry for a minute-and-15-second interval. For clarity in this sample code, I’ll do this separately for each counter, though it could be done in one statement. Figure 6 shows what it looks like to do it for one counter.

Figure 6 Getting a Single Interval Value

var account = CloudStorageAccount.FromConfigurationSetting(
  "Microsoft.WindowsAzure.Plugins.Diagnostics.ConnectionString");
var context = new PerformanceDataContext(account.TableEndpoint.ToString(),
  account.Credentials);
var data = context.PerfData;
 
 
// Get average CPU.
List<PerformanceData> selectedData =
  (from d in data
   where d.CounterName == @"\Processor(_Total)\% Processor Time"
   select d).ToList();
 
double AvgCPU = (from d in selectedData
                 where d.CounterName == @"\Processor(_Total)\% Processor Time"
                 select d.CounterValue).Average();
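
The PerformanceData entity and PerformanceDataContext used above are thin wrappers over the StorageClient table classes; they aren’t shown in this column. A minimal sketch, assuming the SDK 1.x Microsoft.WindowsAzure.StorageClient types and the table name the diagnostics agent uses for counter samples, might look like the following (treat the property list as illustrative rather than exact):

// Assumed sketch: requires Microsoft.WindowsAzure.StorageClient and System.Linq.
public class PerformanceData : TableServiceEntity
{
  // Columns the diagnostics agent writes for each counter sample.
  public long EventTickCount { get; set; }
  public string Role { get; set; }
  public string RoleInstance { get; set; }
  public string CounterName { get; set; }
  public double CounterValue { get; set; }
}
 
public class PerformanceDataContext : TableServiceContext
{
  public PerformanceDataContext(string baseAddress, StorageCredentials credentials)
    : base(baseAddress, credentials) { }
 
  // The diagnostics agent transfers performance counter samples to this table.
  public IQueryable<PerformanceData> PerfData
  {
    get { return this.CreateQuery<PerformanceData>("WADPerformanceCountersTable"); }
  }
}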

After I get the value, I need to remove the data from Table Storage. I could call the REST API directly, but in this case I just loop through the selected objects and pass each one to DeleteObject on the context:

foreach (PerformanceData perfdata in selectedData)
{
  context.DeleteObject(perfdata);
}
 
context.SaveChanges();

I repeat this code for the GC counter as well. Each value I collect needs to be added to my data store. I’ve created an .edmx file for the CounterData table, against which I write the code shown in Figure 7 to add data to the table.

Figure 7 Adding Data to the CounterData Table

PerfDataEntities pde = new PerfDataEntities();
CounterData cd = new CounterData();
cd.CounterName = @"\Processor(_Total)\% Processor Time";
cd.CounterValue = (float)AvgCPU;
cd.TimeStamp = System.DateTime.Now.ToString();
 
pde.AddToCounterDatas(cd);
 
cd = new CounterData();
cd.CounterName = @"\.NET CLR Memory(_Global_)\% Time in GC";
cd.CounterValue = (float)AvgTimeGC;
cd.TimeStamp = System.DateTime.Now.ToString();
 
pde.AddToCounterDatas(cd);
pde.SaveChanges();
context.SaveChanges();
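
The worker role’s Run method that drives all of this isn’t listed here; it simply loops, sleeps and calls the transfer logic. A minimal sketch, assuming the TransferPerfDataToSql method described earlier, might look like this:

// Assumed sketch of the worker role's Run loop.
public override void Run()
{
  while (true)
  {
    // Sleep roughly a minute so several five-second samples accumulate
    // in Table Storage before each aggregation and transfer pass.
    System.Threading.Thread.Sleep(TimeSpan.FromMinutes(1.0));
    TransferPerfDataToSql();
  }
}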

With this in place, the only thing left to do is to write the actual report—that most coveted of all tasks. (Yes, that is sarcasm.) 

The Report

I’m not a particularly accomplished report writer and, to be quite honest, that’s a characteristic I want to be sure to consistently demonstrate so as to never end up with that part of the project myself. So, I’ll create a relatively simple report to illustrate displaying the data in a grid and plotting a line to visualize the data.

The first step is to query the data in the format I need in order to produce the report. The data I’ve stored in the SQL Azure database is the average value for about a minute of samples, which means that each data point represents a minute. For my report, however, I’d rather have the data as a single point for each hour of time, giving me 24 points per day that I might use to trend over a short number of days. To do this I’ll tease apart the TimeStamp field using DATEPART and group by both day and hour. This results in the following query:

SELECT
  DATEPART(DD, [TimeStamp]) AS [Day]
  ,DATEPART(HH, [TimeStamp]) AS [Hour]
  ,[CounterName]
  ,AVG([CounterValue]) AS [Value]
FROM [dbo].[CounterData]
GROUP BY DATEPART(DD, [TimeStamp]), DATEPART(HH, [TimeStamp]), CounterName
ORDER BY CounterName

Moving on, I open BIDS and create a new report project. I add two Data Sources, one for the cloud and one for local. It’s best to develop the report against a local store: previewing is faster, and no egress fees accrue for report execution during development. As a matter of practice when working on a report—or on any project for the cloud—when I get a piece of it working locally I run a quick litmus test in the cloud to make sure it works there as well. Having both data sources defined makes it simple to test against SQL Azure, and I’ll test major report sections by actually deploying the report and running it from the SQL Azure Reporting CTP. After adding the data sources, I add a dataset to the report and use the previously mentioned SQL query to define it.
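
For the cloud data source, the connection is just a standard ADO.NET connection string pointing at the SQL Azure database; the server, database and user names here are placeholders:

Server=tcp:yourserver.database.windows.net,1433;Database=PerfCounters;User ID=youruser@yourserver;Password=yourpassword;Encrypt=True;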

With the supporting elements in place in the report project, I add a blank report to the project and name it CounterDatabyHour.rdl. Because I want a grid of data and a line graph, I add a tablix and a sparkline chart. For the tablix I use the day value for the second through n column headers on the first row, the hour value for the second through n column headers on the second row, and the data as the values across the columns for each row generated by Group By CounterName. For the chart data I set the CounterName as the series data and use hour and day as the category groups. This results in a design surface that looks like what’s shown in Figure 8.

Figure 8 Designing the Chart

There are expressions in the tablix, but those are merely CStr functions on the field values. The report is ready to go and all I need to do is push it to the cloud. First, though, I need to configure the solution to deploy to the cloud. To do that, I open the solution properties and set the TargetServerURL to the URL provided in the Azure Management Portal with “reportserver” tacked on to it. Thus, mine is: https://rsprodctp2585.ctp.reporting.database.windows.net/ReportServer.

SQL Azure Reporting requires HTTPS for this operation, but that’s preferable in this case anyway, because a company would want to keep its server state and performance information to itself. With this set, I deploy the report and I’m prompted for my credentials—the username I copied from the Management Portal, as shown in Figure 9.

Figure 9 Deploying the Report

If nothing fails, the report should deploy successfully and I can navigate to the site directly, log in and run the report from the browser (Figure 10).

Figure 10 Viewing the Report in the Browser

In some cases you’ll want to link directly to the report, but in many others you might use a report control embedded within an application or a Web page. The key point here is that for the person consuming the reports, nothing is different from the typical experience of using SQL Server Reporting Services in his solution. The report developer performs some slightly altered operations, but it’s mostly the same for him, too.
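
For the embedded case, the standard ReportViewer control can point at the SQL Azure Reporting endpoint in remote processing mode. A brief sketch, assuming the ASP.NET ReportViewer control (Microsoft.ReportViewer.WebForms) and placeholder names; note that SQL Azure Reporting uses forms-based authentication, so a suitable IReportServerCredentials implementation (omitted here) must also be supplied:

// rvCounters is a ReportViewer control on the page; the URL and report path are placeholders.
rvCounters.ProcessingMode = Microsoft.Reporting.WebForms.ProcessingMode.Remote;
rvCounters.ServerReport.ReportServerUrl =
  new Uri("https://rsprodctp2585.ctp.reporting.database.windows.net/ReportServer");
rvCounters.ServerReport.ReportPath = "/CounterDatabyHour";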

Wrapping Up

My simple example is just that, but those who are a little more handy with BIDS and general report writing will find that the goodies in the SQL Azure Reporting CTP—for creating parameters, linked reports, expressions and other such report-writing features—provide a platform for rich reports that are about as complex as anyone would want them to be. SQL Azure Reporting gives developers a familiar paradigm for developing and distributing reports, with the essential difference that no one has to worry about the infrastructure to host it. One of the biggest benefits is using it for reporting on various types of diagnostic data, as I did here with the performance counters, and also for monitoring various aspects of any deployment. With SQL Azure Reporting, you can collect data to a central database for reporting and even report across deployments that store data in separate databases—as long as they’re SQL Azure databases. Other data sources are on the roadmap, but even in its current incarnation, SQL Azure Reporting is a powerful and welcome addition to the Azure toolbox.


Joseph Fultz is a software architect at HP, working as part of the HP.com Global IT group. Previously he was a software architect for Microsoft, working with its top-tier enterprise and ISV customers to define architecture and design solutions.

Thanks to the following technical expert for reviewing this article: Jim Keane