Stress Testing

The goal of stress testing is to run a system at abnormally high load levels to identify issues such as memory leaks, thread contention, failure modes, and bottlenecks. Stress testing is a form of negative testing, which evaluates a system outside its normal operating boundaries to determine whether it fails in a predictable and acceptable way. Although stress testing can give some indication of how a system scales, you should use scale testing to determine its scale characteristics.

Diagnostic tests are often simpler to run on a single computer. The Partner Portal application was stress tested on both a 32-bit and a 64-bit standalone installation. Test scenarios were run individually and also mixed together in a combined stress test. Stressing individual scenarios makes it easier to isolate problem areas, while combining them reveals interaction problems under stress.

You can perform stress testing by using high loads for Web tests that simulate user behavior, or by using integration tests that call the code under test directly. Both techniques were used for stress testing the SharePoint Guidance Library components and the Partner Portal reference implementation. For more information about stress testing using integration tests, see Integration Testing.
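For the integration-test approach, the core idea is to call the code under test from many threads at once and record every failure instead of stopping at the first one, so that failure modes become visible. The following is a minimal Python sketch of that idea, not the harness the team used. The `stress` function and the `operation` callback are hypothetical names; `operation` stands in for whatever component you want to stress.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def stress(operation, workers=16, iterations=1000):
    """Call `operation(i)` concurrently and collect failures and timing.

    `operation` is a hypothetical stand-in for the code under test.
    """
    errors = []
    errors_lock = threading.Lock()

    def run_one(i):
        try:
            operation(i)
        except Exception as exc:
            # Record the failure and keep going: the point of a stress test
            # is to observe how (and how often) the system fails.
            with errors_lock:
                errors.append(exc)

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Consume the iterator so that every call runs before we stop timing.
        list(pool.map(run_one, range(iterations)))
    elapsed = time.perf_counter() - start

    return {
        "calls": iterations,
        "errors": errors,
        "seconds": elapsed,
        "calls_per_second": iterations / elapsed if elapsed > 0 else float("inf"),
    }
```

In practice you would raise `workers` and `iterations` until the machine is well above its normal load, and inspect the types of the collected exceptions rather than only their count.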

Although stress testing is not used to estimate system throughput, you should make sure that the throughput that is achieved during stress testing is reasonable. For example, if your system cannot serve one request per second while at a 50 percent processor load, it is likely that your system has bottlenecks that need to be identified and fixed.
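The sanity check in the preceding example is simple arithmetic: divide completed requests by test duration and compare the result with the processor load. The sketch below encodes that check. The default thresholds (1 request per second at 50 percent CPU) are taken from the example above and are assumptions; a reasonable threshold for your own system depends on how expensive each request is. The function name is hypothetical.

```python
def check_throughput(requests_completed, duration_seconds, cpu_percent,
                     min_requests_per_second=1.0, busy_cpu_percent=50.0):
    """Return (requests_per_second, suspicious) for one stress-test run.

    A run is flagged as suspicious when the processor is at least
    `busy_cpu_percent` busy but fewer than `min_requests_per_second`
    requests complete: the CPU is working, yet little useful work gets
    through, which likely points to a bottleneck.
    """
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    rps = requests_completed / duration_seconds
    suspicious = cpu_percent >= busy_cpu_percent and rps < min_requests_per_second
    return rps, suspicious
```

For example, 30 requests in 60 seconds at 50 percent CPU is 0.5 requests per second, which this check flags.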

Creating the Stress Test Environment

The SharePoint stress test environment should be created as a private network, isolated from the corporate domain, with a dedicated domain controller. This configuration allows you to control the traffic and server load without outside influences. The following illustration shows the environment that was used for stress testing the Partner Portal application.

SharePoint stress test environment


Analyzing Stress Test Performance Data

Performance analysis is an iterative process of running tests and analyzing the results to detect abnormalities. The Partner Portal development team used the Visual Studio Team System (VSTS) test suite to run the scale and stress tests.

The target for the stress test was a steady-state processor (CPU) utilization of 80 percent. The number of simulated user sessions was adjusted until the target load level was reached. The load tests ran for durations ranging from 10 minutes to 12 hours, and data was collected from the specified performance counters every 0.5 seconds. When a test completes, VSTS writes the results to disk. The data was analyzed by graphing the counter values and looking for patterns in the results. The following illustration shows a test result that revealed contention in the application.

Test result graph showing significant contention

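Adjusting the number of simulated user sessions until the processor reaches its target load, as described earlier in this section, is essentially a feedback loop. The following Python sketch shows one way to express it as a proportional adjustment. It is an assumption about how such balancing could be automated, not a description of what VSTS does. `balance_users` and the `measure_cpu` callback are hypothetical; in a real test, `measure_cpu` would apply the given load for a while and average the processor counter samples (collected every 0.5 seconds in the Partner Portal tests).

```python
def balance_users(measure_cpu, target=80.0, tolerance=2.0,
                  start_users=10, max_steps=50):
    """Adjust the simulated user count until CPU is within `tolerance` of `target`.

    `measure_cpu(users)` is a hypothetical callback that runs the load with
    `users` simulated sessions and returns the average CPU percentage.
    """
    users = start_users
    for _ in range(max_steps):
        cpu = measure_cpu(users)
        if abs(cpu - target) <= tolerance:
            return users, cpu
        if cpu <= 0:
            # No measurable load yet: double the user count.
            new_users = users * 2
        else:
            # Assume CPU scales roughly linearly with users and step toward target.
            new_users = round(users * target / cpu)
        if new_users == users:
            # Guarantee progress when rounding would leave the count unchanged.
            new_users += 1 if cpu < target else -1
        users = max(1, new_users)
    raise RuntimeError("user count did not converge on the target CPU load")
```

A proportional step converges quickly when CPU grows roughly linearly with load; when contention makes the relationship nonlinear, the loop simply takes more steps, and failure to converge is itself a signal worth investigating.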

Analyzing test results requires some knowledge of expected baseline values. In this example, the normal contention rate was approximately 1–2 per second; however, the test produced several hundred contentions per second, with a peak of over one million per second. That level of contention severely affected responsiveness and throughput. Further analysis pointed to a problem with the trace provider, and the team used the WinDbg debugger to trap the condition. Debugging showed that the contention was caused by the method used to write messages to the unified logging system (ULS). The team also found an issue with the trace provider's unregister call. They changed the logic so that the trace provider was registered and unregistered much less frequently, which reduced contention to less than one per second.
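Comparing counter samples against a known baseline is easy to automate once the results are exported. The sketch below takes a list of contention-rate samples (contentions per second) and reports whether any exceed a multiple of the baseline. The baseline of 2 per second comes from the example above; the tenfold threshold and the function name are hypothetical choices for illustration.

```python
def contention_report(samples, baseline_max=2.0, factor=10.0):
    """Summarize contention-rate samples against an expected baseline.

    Samples above `baseline_max * factor` are treated as abnormal.
    The factor is an assumed tolerance, not a standard value.
    """
    if not samples:
        raise ValueError("no samples to analyze")
    threshold = baseline_max * factor
    over = [s for s in samples if s > threshold]
    return {
        "peak": max(samples),
        "mean": sum(samples) / len(samples),
        "samples_over_threshold": len(over),
        "abnormal": bool(over),
    }
```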

A key aspect of becoming proficient at analysis is recognizing common patterns in the counter values and isolating the logic responsible for them. For example, if your system has a memory leak, you will see available memory decrease while the process's private bytes increase.
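The memory-leak pattern can be detected by fitting a trend line to each counter: a leak shows up as a clearly negative slope for available memory together with a clearly positive slope for private bytes. The following sketch uses an ordinary least-squares slope over evenly spaced samples. The 0.5-second default interval matches the sampling rate described earlier; the 1 KB-per-second minimum rate and both function names are assumptions for illustration.

```python
def slope(ys, interval=0.5):
    """Least-squares slope of evenly spaced samples, in units per second."""
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two samples")
    xs = [i * interval for i in range(n)]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


def looks_like_leak(available_bytes, private_bytes, interval=0.5,
                    min_rate=1024.0):
    """True when available memory trends down while private bytes trend up.

    `min_rate` (bytes per second) filters out small, noisy trends.
    """
    return (slope(available_bytes, interval) < -min_rate
            and slope(private_bytes, interval) > min_rate)
```

A trend line is more robust than comparing the first and last samples, because garbage collection and caching produce short-term swings that would otherwise look like growth or recovery.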

A successful stress test depends on scenarios that accurately reflect your production load. The following illustration shows a successful stress test: both Web server processes run above 80 percent processor utilization, and memory on all computers is stable.

Successful stress test

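The success criteria for this run, with Web server processor utilization above 80 percent and stable memory on every computer, can also be checked programmatically after each test. In the sketch below, "stable" is approximated as the spread between the largest and smallest memory sample staying within 5 percent of the mean; that tolerance, the server names in the usage note, and the function name are all hypothetical.

```python
def is_successful_run(cpu_by_server, memory_by_machine,
                      cpu_target=80.0, max_memory_spread=0.05):
    """Check that every Web server averaged at least `cpu_target` percent CPU
    and that memory on every machine stayed within an assumed spread."""

    def average(samples):
        return sum(samples) / len(samples)

    cpu_ok = all(average(s) >= cpu_target for s in cpu_by_server.values())

    def stable(samples):
        # Spread between the extremes, relative to the mean memory level.
        return (max(samples) - min(samples)) <= max_memory_spread * average(samples)

    memory_ok = all(stable(s) for s in memory_by_machine.values())
    return cpu_ok and memory_ok
```

For example, `is_successful_run({"WEB1": [82, 85, 81], "WEB2": [80, 84, 83]}, {"WEB1": [1000, 1010, 1005]})` passes, whereas a machine whose memory climbs from 1000 to 2000 during the run fails the stability check.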

For more information about counter patterns that indicate common issues, see the following MSDN blog entries:
