Detecting Anomalies in Performance Objectives Prior to Integration


Steve Skalski

June 2007

Updated December 2007

Revised February 2008

Summary: Sometimes, it is important to do performance-/stress-testing prior to integration-testing. (9 printed pages)


Performance, Extensibility, Flexibility
Determining Critical and Subcritical Paths
Creating an Environment that Replicates Business Units of Work
Exposing Potential Cracks in Service Reliability
Sizing the Server for Performance- and Stress-Testing
What Are We Going to Measure?
Lessons Learned and Takeaways
Critical-Thinking Questions
Further Study

Performance, Extensibility, Flexibility

As an avid hiker in the western part of Washington State, I have come to appreciate that the word "flat" does not exist here. Eventually, you get the urge to try something more vertical than a simple trail. After all, it's just a matter of trading distance for altitude, right? Mt. Rainier was that calling for me.

Mt. Rainier is located in western Washington in the United States and is surrounded by Mt. Rainier National Park. It is 14,411 feet in altitude and consists of 23 active glaciers, multiple crevasses per glacier, exposed rotting volcanic rock, and continuous melt and refreezing conditions during a majority of the year. It possesses the ideal training conditions, especially for expeditions heading for Mt. Everest. This environment houses several world-class climbing groups in nearby cities outside the national park.

My first thought was risk and risk mitigation. I was not going to put myself in danger of death by trying this summit attempt without professional help. I found a professional climber who had summited Mt. Rainier over 200 times at that time and now has summited the mountain 483 times. George was a consummate professional and the ideal choice to outsource risk.

Next, like proper hardware and software, was proper equipment. That was the easiest task, as the group that George was associated with published a list of equipment required for the climb. Again, risk was averted.

Finally, there was training, or performance. The training required for the climb consisted of two parts. The first part was a one-day class on mountain-climbing fundamentals. Discussed and practiced were techniques for climbing with minimum muscle burn, roping up into groups of six, belaying, freefall arrest, recognizing snow bridges, and a variety of other important instructions and techniques. The second part of training was physical endurance and the requirements for the actual climb. Being in excellent shape, I allowed one month in order to harden my body—a macho miscalculation, as it turned out.

The day of the actual climb was to be 18 hours, 18,000 feet, and 18 miles in duration. This was in two parts. The first was to go from Paradise to Camp Muir, or from 5,400 to 10,800 feet of elevation and about 4.5 miles. While 18,000 feet and 18 miles translate to 1,000 feet per mile, it is only an average. In reality, the toughest part of this piece is two miles of 1,660 feet per mile altitude gain. That was unforeseen by me; however, I made it. Beware of averages and what they mean.

After getting to Camp Muir, about 24 of us settled down in a pasteboard cabin that leaked cold wind. We were expected to sleep from about 19:00 to about 1:00, but that was difficult, partly because the body was trying to acclimate to the 10,800-foot altitude and partly because 24 individuals were making noise and going in and out of the cabin.

We were up at 1:00, ate, dressed, and started up the mountain. It is important to go early in the morning, because melt can cause a lot of dangerous conditions. There were several rest stops along the way, with the 12,300-foot mark on Disappointment Cleaver being the most memorable. Getting there was a tough climb, with long-reaching steps, and you ended up in a winded state. The guides have the ultimate authority in determining the group's safety and decide whether you continue the climb. While I was still gasping for air, the guide asked me if I was okay. I answered, "Give me a minute, and I will be." Wrong answer! That was enough, as far as the guide was concerned, and I and five others were sent back down to Camp Muir.

On the way back down, I was disappointed. Yet, I was already doing failure analysis and deciding how to prepare for my next climb. It was rather evident to me that my physical training had not prepared me for what I had encountered. What I needed was a plan that would simulate the true conditions. After all, this is what training is all about: to do and redo situations that you will encounter in a real-life scenario. Hmm, sounds like performance- and stress-testing in the IT world.

Needless to say, after my experience, George and I put together a training plan to simulate this experience. It was extended to three months and allowed for conditions that were greater than I had experienced on the mountain. One year later, I was standing on top of Mt. Rainier.

Last year, I started working on a service-oriented architecture (SOA) project. In particular, I was responsible for the request-and-response message flow in order-processing through the data-access layer (DAL), stored procedure (sproc), and Microsoft SQL Server 2005 layers. I wanted a structure that allowed for a centralized point of control for the order process, allowed for multiple monitors and controllers, and simplified the design of the monitor and controller. As I architected the sprocs, it became evident that they were not the traditional type of stored procedures; they were multithreaded, handled state transition and validity, were the center of queuing structures and integrity, and required the proper design strategy for achieving the performance goals of the QOS specifications. Clearly, this routing through the DAL was a critical path for order-processing and had to be solid prior to system integration.

Determining Critical and Subcritical Paths

Just as with the lessons from climbing Mt. Rainier, it was clear that a testing strategy had to be devised that was much more than the traditional unit-test scenario. Not only did testing have to verify that performance requirements were met, but the metrics also had to be realistic and reflect real-world cases.

My PM for the project first brought up the idea of early performance-testing and left the strategy to me. The first order of business was to look at the quality of service (QOS) document and see what the performance metrics were for the system. As expected, the metrics were defined as some minimum number of orders per unit time. I termed these business units (BUs) of work. Further examination of the scenarios, controller, and monitor showed the sequences of DAL calls that were representative of different BUs. The final step in BU analysis was to determine which BUs represented an order, as defined by the scenarios and the QOS document.

Creating an Environment that Replicates Business Units of Work

My instinct to "trust documentation, but verify code" led me to modify the DAL calls with a trace capability. On entry to a method, an in-memory table recorded the method name, a couple of parameters, and a time stamp. On exit from the method, the method name and a time stamp were recorded. A test was performed using scenarios representative of different aspects of the order process. Armed with this information, the details of different BUs were solidified.
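The trace capability can be pictured with a small sketch. The article does not show the actual DAL code, so everything named here is an assumption made for illustration: the `traced` decorator, the `TRACE_TABLE` list standing in for the memory table, and the two DAL methods `create_order` and `reserve_inventory`.

```python
import time
from functools import wraps

# In-memory trace table: one row per entry or exit event (illustrative stand-in).
TRACE_TABLE = []


def traced(max_params=2, clock=time.perf_counter):
    """Record entry (name, first few parameters, time stamp) and exit (name, time stamp)."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            TRACE_TABLE.append(("enter", func.__name__, args[:max_params], clock()))
            try:
                return func(*args, **kwargs)
            finally:
                # The exit row is written even if the call raises.
                TRACE_TABLE.append(("exit", func.__name__, None, clock()))
        return wrapper
    return decorator


@traced()
def create_order(customer_id, sku, quantity):  # hypothetical DAL call
    return {"customer": customer_id, "sku": sku, "quantity": quantity}


@traced()
def reserve_inventory(sku, quantity):  # hypothetical DAL call
    return True


def call_sequence():
    """Names of the traced calls, in the order in which they were entered."""
    return [name for event, name, _, _ in TRACE_TABLE if event == "enter"]


create_order(42, "ABC-1", 3)
reserve_inventory("ABC-1", 3)
print(call_sequence())
```

Reading the entry rows back in order gives the sequence of DAL calls that makes up a BU, which is the information the trace was meant to confirm.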

The SOA design allowed different components to reside at different URL locations. This showed up as gaps between time-stamp values, which were caused by network activity: time "on the wire" and network latency. These gaps clearly would limit the maximum rate at which the database procedures could be driven. While this might sound like a positive situation, it does not exercise the SQL Server 2005 database server to its maximum capability and does not expose potential "cracks" in reliability.
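One way to see such gaps is to subtract each exit time stamp from the next entry time stamp. The sketch below assumes trace rows of the form (event, method name, time stamp in milliseconds) and strictly sequential calls; the method names and numbers are invented sample data, not measurements from the project.

```python
def inter_call_gaps(events):
    """Return (previous call, next call, gap) for each exit followed by an entry."""
    gaps = []
    last_exit = None
    for kind, name, stamp in events:
        if kind == "enter" and last_exit is not None:
            prev_name, prev_stamp = last_exit
            gaps.append((prev_name, name, stamp - prev_stamp))
            last_exit = None
        elif kind == "exit":
            last_exit = (name, stamp)
    return gaps


def gap_share(events):
    """Fraction of the total traced time that was spent between calls."""
    total = events[-1][2] - events[0][2]
    in_gaps = sum(gap for _, _, gap in inter_call_gaps(events))
    return in_gaps / total if total else 0.0


# Invented sample trace (time stamps in milliseconds).
sample = [
    ("enter", "create_order", 0), ("exit", "create_order", 5),
    ("enter", "reserve_inventory", 25), ("exit", "reserve_inventory", 30),
    ("enter", "confirm_order", 70), ("exit", "confirm_order", 74),
]

for prev_call, next_call, gap in inter_call_gaps(sample):
    print(f"{prev_call} -> {next_call}: {gap} ms between calls")
print(f"share of time spent between calls: {gap_share(sample):.0%}")
```

When most of the elapsed time sits between calls rather than inside them, the database is idle for much of the run, which is why a test driven across the network cannot push it to its limit.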

Exposing Potential Cracks in Service Reliability

To determine the maximum performance of the database sprocs, the network time had to be eliminated. The performance tests had to execute BUs in an exclusive SQL Server 2005 environment. This could be accomplished with TSQL scripts that drove the BUs with "zero wait time" between the individual calls in a BU.
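The article's tests were TSQL scripts, which are not reproduced here. As a language-neutral illustration of the "zero wait time" idea, the following Python sketch runs the steps of a BU back to back on several threads with no pause between calls. The `drive_business_units` function and the counting `step` are assumptions made for the example; in a real test each step would be a sproc call.

```python
import threading
import time


def drive_business_units(bu_steps, threads, loops):
    """Run `loops` BUs on each of `threads` workers with no pause between calls."""
    errors = []

    def worker():
        for _ in range(loops):
            try:
                for step in bu_steps:
                    step()  # zero wait time: the next call is issued immediately
            except Exception as exc:  # a failed BU is counted, then the loop continues
                errors.append(exc)

    start = time.perf_counter()
    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for thread in pool:
        thread.start()
    for thread in pool:
        thread.join()
    elapsed = time.perf_counter() - start

    completed = threads * loops - len(errors)
    return {
        "completed": completed,
        "errors": len(errors),
        "elapsed_s": elapsed,
        "bu_per_s": completed / elapsed if elapsed > 0 else float("inf"),
    }


# Stand-in step: a real step would call a stored procedure.
counter = {"calls": 0}
lock = threading.Lock()


def step():
    with lock:
        counter["calls"] += 1


result = drive_business_units([step, step, step], threads=4, loops=50)
print(result["completed"], "BUs completed,", result["errors"], "errors")
```

Keeping the loop count fixed and changing only `threads` gives a series of runs that can be compared directly.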

Having devised a strategy for testing, the next order of business was to size and acquire a SQL Server hardware test platform. The QOS document and SOA product specifications did not require a Datacenter-class processing platform, and we settled on an available 4x Xeon 3.0 GHz box with 8 gigabytes (GB) of memory.

Sizing the Server for Performance- and Stress-Testing

The HD back end of the database required some additional thought, regarding possible performance scenarios that might be encountered. We analyzed the tables that were being accessed, based on the BUs, and tried to account for table-to-HD mapping possibilities, data-transfer rates, and queue depths encountered in preliminary testing. The tables were classified as read-only support tables, write-only tables, and read/write tables. The read-only support tables were small and would end up being memory-resident once accessed. The other two classes of tables would be the focus of any performance-tuning.

Table-access analysis required three volumes for the purpose of minimizing contention and HD queue depths. An available fiber SAN was configured with three separate paths (showing up as volumes) and multiple physical units per path to reduce queue-depth wait times.

In addition, as part of the selection criteria, the tests had to run long enough to gather meaningful performance data. This translated into storage volumes large enough for the different tables. The final back end consisted of 1 terabyte of SAN storage divided up into three volumes sized as 0.5 terabytes, 0.25 terabytes, and 0.25 terabytes.

What Are We Going to Measure?

Not knowing at the start of the tests where our potential bottlenecks would be, we chose a top-down approach to measuring performance characteristics. We started with the overall system performance-monitoring (perfmon) metrics of processor utilization, memory pressure, and disk-queue depths. As soon as a primary bottleneck was identified, we drilled down into that category to find the culprit.

In a stress- or performance-testing scenario, the test is configured to push the system. Typically, if the processor utilization is low, something in the other three categories (memory, HD I/O activity, or network activity) is causing a task to wait and not use the processor. It could be excessive memory use, resulting in paging or HD I/O. It could also be direct I/O activity, resulting in large queue depths on a unit. In either case, the time needed to complete an I/O that has just entered the queue is proportional to the queue depth. The more entries in the queue, the longer it takes the I/O to complete. Finally, network load and speed have an impact on the completion time of a request.
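The relationship between queue depth and completion time can be made concrete with a deliberately simple model. It assumes a single device that services requests first in, first out, with a constant service time; real disks and SANs are more complicated, and the 5 ms service time is an invented figure.

```python
def estimated_completion_ms(queue_depth, service_time_ms):
    """An I/O arriving behind `queue_depth` entries waits for each of them, then is serviced."""
    return (queue_depth + 1) * service_time_ms


for depth in (0, 2, 8):
    print(f"queue depth {depth}: about {estimated_completion_ms(depth, 5.0)} ms to complete")
```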

Recall that we had previously eliminated network metrics. Specific SQL Server metrics were chosen along the same lines with the addition of metrics that reflected internal SQL Server points of contention.

As in any performance test, there must be independent and dependent variables. In this case the independent variable was BU/second and the dependent variables were the metrics previously defined. Table 1 denotes the performance object, a selected counter, and the bottleneck threshold.

Table 1. Performance-testing dependent and independent variables with threshold

Performance object      | Selected counter           | Bottleneck threshold
------------------------|----------------------------|----------------------------------------
Processor               | Percent processor time     | >80-90 percent
Physical disk           | Avg. disk queue length (D) | >2*number of spindles
Physical disk           | Avg. disk queue length (E) | >2*number of spindles
Physical disk           | Avg. disk queue length (F) | >2*number of spindles
[missing]               | [missing]                  | >circa 50/second
SQL Server: Locks       | Lock waits/sec             |
SQL Server: Locks       | Lock time-outs/sec         |
SQL Server: Locks       | [missing]                  | >0 (any deadlocks are wasted resources)
SQL Server: SQL errors  | [missing]                  | >0 (any errors are wasted resources)
SQL Server: Databases   | [missing]                  | Independent variable
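The thresholds in Table 1 lend themselves to a simple automated check of each sample. The sketch below is only an illustration: the dictionary keys, the `find_bottlenecks` function, the spindle counts, and the sample values are all assumptions, and only the thresholds whose meaning is stated in the table are applied (processor time, disk queue length, deadlocks, and errors).

```python
def find_bottlenecks(sample, spindles, cpu_limit=80.0):
    """Return the categories in `sample` that exceed the Table 1 thresholds."""
    flags = []
    if sample["cpu_percent"] > cpu_limit:
        flags.append("processor")
    for volume, depth in sorted(sample["disk_queue"].items()):
        # Disk threshold from Table 1: more than 2 * number of spindles.
        if depth > 2 * spindles[volume]:
            flags.append(f"disk {volume}")
    if sample["deadlocks_per_sec"] > 0:
        flags.append("deadlocks")
    if sample["sql_errors_per_sec"] > 0:
        flags.append("sql errors")
    return flags


# Invented configuration and sample values.
spindles = {"D": 4, "E": 2, "F": 2}
sample = {
    "cpu_percent": 45.0,
    "disk_queue": {"D": 3.0, "E": 6.5, "F": 1.0},
    "deadlocks_per_sec": 0.0,
    "sql_errors_per_sec": 0.0,
}

print(find_bottlenecks(sample, spindles))
```

In this invented sample the processor is lightly loaded while volume E is over its queue threshold, which is the pattern the article describes: low processor utilization because work is waiting on I/O.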


Before running a series of performance tests, some simple checks must be run, including:

· A final run through the SQL query plan, with a view of the actual execution plan, to ensure that the infamous "missing index" will not reduce performance numbers. (Make sure that the execution plan is turned off for the actual performance run.)

· Run the test script in a one-loop mode, to ensure that no glaring errors occur.

· Check the database installation, to ensure that the tables are located on the correct volumes.

· Ensure that the performance monitor has all of the metrics set up correctly and that a recording mode has been tested.

After the preliminary checks have been made, one or two validation runs should be made to flush out any unforeseen exception conditions. During these runs, consider the following:

· What is happening to volume usage, relative to the number of loops?

· Is there enough disk storage available to complete the run?

· Run a minimum of two threads.

· Are any deadlocks occurring?

· Do the scripts complete without error?

· Between any runs, ensure that you delete the database and do a fresh install.

By this point, I had a feeling for the performance of the different test cases. I decided to vary the number of threads executed, while keeping the loop count constant. This served two purposes: to evaluate the performance (BUs/second) and to provide stability feedback (stress-testing). For each run, I extracted the key metrics into a Microsoft Office Excel spreadsheet. A typical run should have the following indicators:

· Number of threads for this run

· Processor utilization

· Average queue depth on C:, D:, E:, and F:

· Lock requests per second

· Lock waits per second

· Active transactions

· Disk bytes per second on C:, D:, E:, and F:

· Run start time

· Run end time

· Calculated elapsed time

· Calculated transactions per second
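The two calculated indicators in the list above follow directly from the recorded start and end times. The following sketch shows the arithmetic; the `summarize_run` function, the `bus_per_loop` parameter, and the sample numbers are assumptions, and the author's own figures were kept in a spreadsheet rather than computed this way.

```python
from datetime import datetime


def summarize_run(threads, loops, start, end, bus_per_loop=1):
    """Derive elapsed time and transactions per second for one run."""
    elapsed_s = (end - start).total_seconds()
    total = threads * loops * bus_per_loop
    return {
        "threads": threads,
        "elapsed_s": elapsed_s,
        "transactions": total,
        "tps": total / elapsed_s if elapsed_s > 0 else float("inf"),
    }


# Invented run: 4 threads, 1,000 loops each, 50 seconds of wall-clock time.
run = summarize_run(
    threads=4,
    loops=1000,
    start=datetime(2008, 2, 1, 10, 0, 0),
    end=datetime(2008, 2, 1, 10, 0, 50),
)
print(run)
```

Collecting one such row per thread count gives the series from which throughput can be plotted against the number of threads.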

Lessons Learned and Takeaways

Now that we have gone through the concepts of performance, scalability, and stress-testing, the relationship should be clear. Performance-testing, taken to the extreme, is stress-testing. When we ran multiple threads, we allowed for maximum interaction between individual TSQL statements on the various threads. The greater the number of threads, the greater the probability that any individual TSQL statement will execute at the same time as another statement in the same script. Sometimes, a failure will occur immediately; other times, it will take considerable time to surface. The greater the number of threads and the greater the number of processors, the greater the probability of surfacing an existing failure point.

As soon as the SQL Server 2005 server testing was completed, the core performance-testing was expanded to include the DAL, and then the entire BU process path. The early performance-testing that was done prior to integration had a positive effect by providing one fewer category of bugs to resolve, and led to an on-time release candidate. Detecting some of the deadlocks and performance issues found in this exercise shortened the integration-testing. This allowed the integration team to focus on issues that were unrelated to the DAL critical path.


Successfully climbing Mt. Rainier required practicing performance scenarios that would be greater than those encountered in the actual climb. This is exactly what has to be done in performance- and stress-testing software.

Critical-Thinking Questions

· What is happening to the QOS performance specifications, relative to the runs?

· Are the transaction rates increasing?

· Is the curve increasing at a linear rate, or does it have a knee?

· If it has a knee, there must be a bottleneck. Which of the three major categories is starting to get out of bounds? Make only one change, and rerun the series of tests.

· Is there a secondary effect during a long run in which any one of the three major perfmon indicators shows a rise? Sometimes, this rise is so slow that you do not notice it unless the run is relatively long.

· What are the performance-critical paths? What are the critical subpaths that can be isolated for performance-testing and stress-testing?

· What are the representative business units of work for each path?

· How can you increase the stress on a critical path?

· How do you validate the testing scenario?

· How does the QOS relate to critical-path performance-testing?

· How do you layer the performance- and stress-testing process?

· How do you define success?
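For the question about whether the throughput curve is linear or has a knee, one rough heuristic is to compare the gain from each added thread against the gain seen at the start of the series. This is a sketch under stated assumptions rather than the method used in the project: the `find_knee` function, the 50 percent cut-off, and the sample numbers are all invented, and the heuristic expects a positive initial slope.

```python
def find_knee(points, fraction=0.5):
    """Return the (threads_before, threads_after) interval where gains first flatten.

    `points` is a list of (threads, transactions_per_second) sorted by threads.
    The knee is the first interval whose slope falls below `fraction` of the
    slope of the first interval. Returns None if the curve stays near linear.
    """
    slopes = [
        (t0, t1, (y1 - y0) / (t1 - t0))
        for (t0, y0), (t1, y1) in zip(points, points[1:])
    ]
    if not slopes:
        return None
    baseline = slopes[0][2]
    for t0, t1, slope in slopes[1:]:
        if slope < fraction * baseline:
            return (t0, t1)
    return None


# Invented series: throughput scales up to 4 threads, then flattens.
series = [(1, 100.0), (2, 200.0), (4, 390.0), (8, 420.0), (16, 425.0)]
print(find_knee(series))
```

A knee found this way only says where to look. The perfmon categories for the runs on either side of the interval still have to be compared to find which one went out of bounds.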

Further Study

· Aubley, Curt. Tuning and Sizing NT Servers. Upper Saddle River, NJ: Prentice Hall, 1998.

· Friedman, Mark, and Odysseas Pentakalos. Windows 2000 Performance Guide. Sebastopol, CA: O'Reilly, 2002.

· Jain, Raj. The Art of Computer Systems Performance Analysis: Techniques for Experimental Design, Measurement, Simulation, and Modeling. New York, NY: Wiley, 1991.


Glossary

BU (Business unit): A term that I use to signify the elementary operations that together accomplish a business objective. For example, several sproc calls can be used to execute an order. At a higher level, several DAL calls could represent that same order. At an even higher level, several business-object calls could represent the same process.

Critical path: The chain of dependent events that determines the overall execution time.

DAL (Data-access layer): The interface layer between the business layer and the database.

Deadlock: A condition in which two user processes each hold a lock on a separate object, and each process is trying to acquire a lock that the other process owns. SQL Server breaks the deadlock by selecting the least critical process, rolling back its transaction, and notifying that process of the deadlock break.

Lock: In SQL Server, a method of serializing access to a resource, to prevent multiple parallel accesses from destroying its integrity.

Network latency: The amount of time between the request and the receipt of the full message. This time can vary, depending on how busy the network is.

QOS document: Quality-of-service document.

Query plan: The series of steps that SQL Server uses in executing sprocs or TSQL statements.

Queue depth: The number of entries that are in the queue waiting to be serviced.

SAN: Storage-area network.

SOA (Service-oriented architecture): An architecture that uses a collection of services that are called to perform a business unit of work.

Sproc: Stored procedure.

Terabyte: A unit of measure equal to 1,099,511,627,776 bytes, or 1,024 gigabytes.

Volume: A logical representation of storage that shows up as D: (for example). In SAN architecture, a volume can consist of one or more physical HD units. These units can be used to spread the I/O activity of a request, thereby lowering the average queue depth.

About the author

Steve Skalski is an architect developer with J&S Consultants, Inc., and has over 30 years of experience. Since the beta release of the .NET Framework, he has concentrated on C#, SQL Server performance, Web services, SOA, WSE security implementations, and, most recently, Microsoft CSF and WCF. Steve is an MCAD.NET, MCSD.NET early achiever, CISSP, and CISSP/ISSAP.


This article was published in Skyscrapr, an online resource provided by Microsoft.