Overview of System Performance


Duncan Mackenzie
Microsoft Developer Network

March 2002

Summary: Discusses the goals of application and system architecture performance, scalability, and availability, and includes a list of resources that help you design and implement your systems to meet these goals. (15 printed pages)




A large aspect of application and system architecture is ensuring that a system will perform well, scale to the required number of users, and have minimum down time. These different goals—performance, scalability, and availability—are related, and are based on a set of core concepts that apply to almost any type of system. In this article, I will describe some of those core concepts and provide a list of resources to help you design and implement your systems to meet all these goals.


Although "availability" is a critical metric for enterprise systems, it can be defined in multiple ways, which makes it hard to achieve. Before beginning a discussion of this concept, I will start by detailing the meaning I apply to the term. People frequently classify system availability by measuring the percent of time in which the system is usable, although some measurements do not count planned down time (such as for maintenance). The following table shows these common classes and the associated availability percentages and related annual down time.

Availability class    Availability measurement    Annual down time
Two nines             99%                         3.7 days
Three nines           99.9%                       8.8 hours
Four nines            99.99%                      53 minutes
Five nines            99.999%                     5.3 minutes

For your own measurements of down time, it is useful to consider everything from the point of view of the business, classifying down time as any time the users are unable to access the system. If one server is down, but the users are not affected because another server is taking over the load, then the system is still available. If you wish to track server availability separately from system availability, most monitoring tools will allow you to track the time when each server is running and available to accept requests.

Note   The down time values given above are based on a requirement of 24/7 availability. If a system is only required to be available for part of that time, Monday to Friday from 9 a.m. to 5 p.m. for example, then the calculation should be based on that time span. A system that is required to be available 40 hours a week (about 2,080 hours a year) can have no more than roughly 1.25 minutes of annual down time to achieve five nines availability, but since maintenance and other planned outages can be scheduled outside of working hours, it is easier to achieve this goal.
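The arithmetic behind the table and the note can be checked with a short calculation. The following is a minimal sketch; the function name and the assumption of 52 working weeks per year are illustrative and do not come from the article.

```python
# Annual down time permitted by an availability target.
HOURS_PER_YEAR_24X7 = 24 * 365        # 8,760 hours
HOURS_PER_YEAR_40H_WEEK = 40 * 52     # 2,080 hours (assumes 52 working weeks)


def allowed_downtime_minutes(availability_percent, required_hours_per_year):
    """Return the minutes of down time per year that the target allows."""
    unavailable_fraction = 1 - availability_percent / 100
    return required_hours_per_year * 60 * unavailable_fraction


for label, pct in [("Two nines", 99.0), ("Three nines", 99.9),
                   ("Four nines", 99.99), ("Five nines", 99.999)]:
    full_time = allowed_downtime_minutes(pct, HOURS_PER_YEAR_24X7)
    office_hours = allowed_downtime_minutes(pct, HOURS_PER_YEAR_40H_WEEK)
    print(f"{label}: {full_time:.1f} min (24/7), "
          f"{office_hours:.1f} min (40 h/week)")
```

Dividing the 24/7 result by 60, or by 60 and then 24, gives the hour and day figures used in the table.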

Overall availability is calculated based on the total down time of the system over a period of time (5.3 minutes over a year equals 99.999% availability), but it can also be expressed using an alternative calculation that takes into account the time required to recover from a failure. In the calculation below, MTTF (Mean Time To Failure) is the average time between system failures and MTTR (Mean Time To Recover) is the average time to recover from these failures:


Availability = MTTF / (MTTF + MTTR)

Figure 1. Availability calculation

This is not a major change to the perception of availability—recovery time was always included in the time that the system was unavailable, but it does serve to clearly indicate the importance of rapid recovery in increasing availability. A system that is up for a year before a failure, but then takes three days to recover from that failure, is not as available as a system that fails ten times in that same year but recovers within 10 minutes.
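The comparison in the preceding paragraph can be worked through numerically. This sketch uses the common form of the calculation, MTTF / (MTTF + MTTR); treating the second system's ten failures as evenly spaced across the year is an assumption made only to keep the example simple.

```python
MINUTES_PER_YEAR = 365 * 24 * 60


def availability(mttf, mttr):
    """Availability = MTTF / (MTTF + MTTR); both in the same time unit."""
    return mttf / (mttf + mttr)


# System A: runs for a full year, then takes three days to recover.
system_a = availability(mttf=MINUTES_PER_YEAR, mttr=3 * 24 * 60)

# System B: ten failures within one year, ten minutes to recover from each.
total_downtime_b = 10 * 10
mttf_b = (MINUTES_PER_YEAR - total_downtime_b) / 10
system_b = availability(mttf=mttf_b, mttr=10)

print(f"System A: {system_a:.4%}")
print(f"System B: {system_b:.4%}")
```

System B comes out ahead even though it fails ten times as often, because its total time spent recovering is a small fraction of System A's.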

Causes of Down Time

Clients do not differentiate between hardware and software failures. They do not care if the hard disk crashed or if the data integrity rules failed; they simply measure the time that the system was unusable. In the industry, hardware failure accounts for less than 20 percent of all system outages, so it is imperative that a "High Availability" system address people and process failures at least as thoroughly as hardware failures, and perhaps more so. According to analyst Donna Scott in her article, NSM: Often the Weakest Link in Business Availability:

"…an average of 80 percent of mission-critical application service down time is directly caused by people or process failures. The other 20 percent is caused by technology failure, environmental failure or a disaster."

The Microsoft Operations Framework, which is based on the work done by the Microsoft internal operations group along with the experiences of our customers, is an excellent resource for looking at the people and process side of system availability.

Determining Single Points of Failure

A Single Point of Failure (SPOF) means that the failure of one system component would prevent continued access to applications and data. SPOFs are common in most systems and therefore difficult to remove completely. There are different areas you need to take into account when you are considering SPOFs, including:

The Facilities:

Do we have alternative power supplies to the data center?

If running a Web site, are we reliant on a single connection to the Internet?

The Network:

Are we using a single network lead to connect the network hardware?

If more than one network connection, is there a single Switch/Hub into which we might have plugged both leads from a machine?

The Machine:

Single CPU?

Single network card?

Single power supply?

The Data:

One or more independent drives (instead of RAID subsystem)?

Storage tied to single machine (instead of system area network, or SAN)?

Once you have had a look at the different parts of your system, you should be able to identify what SPOFs you have. Ideally you should remove these points of failure in every area; for example, it's no good having a really fault-tolerant server if it relies on a single switch to connect it to the network. Most SPOFs can be removed by using technology such as clustering, but don't forget that part of the problem is also process- and people-based: are you dependent on a single person to do something in the event of a failure?
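The effect of a remaining single point of failure can be estimated with basic reliability arithmetic. The sketch below assumes component failures are independent, and the availability figures for the server and the switch are hypothetical values chosen for illustration.

```python
from math import prod


def series(*availabilities):
    """All components are required: the chain is up only if each one is up."""
    return prod(availabilities)


def redundant(availability, copies):
    """Any one of `copies` identical components is enough to stay up."""
    return 1 - (1 - availability) ** copies


# Hypothetical component availabilities, for illustration only.
SERVER = 0.99
SWITCH = 0.999

# Two redundant servers that both depend on one switch (a SPOF).
single_switch = series(redundant(SERVER, 2), SWITCH)

# The same servers with the switch duplicated as well.
dual_switch = series(redundant(SERVER, 2), redundant(SWITCH, 2))

print(f"Redundant servers, single switch: {single_switch:.6f}")
print(f"Redundant servers, dual switches: {dual_switch:.6f}")
```

With a single switch, the overall figure can never exceed the availability of that switch, no matter how many servers sit behind it.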

Chapter 11 of the Microsoft Windows® 2000 Server Resource Kit, Planning a Reliable Configuration, covers single points of failure and other aspects of planning a reliable configuration.

Planning for the Expected and the Unexpected

When availability is discussed, the main focus is usually on dealing with unplanned outages such as server crashes, power failures, hard-drive failure, and network disruption, but down time is often caused by planned maintenance and changes to the system or infrastructure. To reduce this common cause of down time, a system design must allow for maintenance (rebuilding a server, upgrading an OS, or replacing a piece of hardware) without resulting in a loss of service. Many of the methods used to prevent unexpected down time also work well for planned outages though, so both can be handled without compromise.

Load Balancing and Clustering

Two important and related concepts used in designing a highly available system are clustering and load balancing, both of which use multiple resources to increase availability and possibly performance. Clustering is the general term for linking two or more independent servers together to perform a single task (or set of tasks), and its primary intent is to provide redundancy. Most clustering systems are configured to "fail over": if one server fails, the workload of that server is taken over by another member of the cluster (or, alternatively, the load is redistributed across the remaining cluster members). By letting a group of servers work together as if they were a single system, your system is more likely to stay available when a fault occurs. Although the primary intent of a cluster is redundancy (n servers that look and perform like a single server), that is not the only way in which a cluster can be configured. Alternatively, you can set up a cluster of servers to provide increased performance by distributing the processing load across them. Both of these technologies are "clustering" by definition, but the second scenario is described using the term Load Balancing to reflect the different behavior. A Load Balancing system still provides redundancy, as a server failure simply causes the load of n servers to be redistributed across n-1 servers.
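Redistributing the load of n servers across n-1 servers only preserves availability if the survivors have enough headroom. The following is a minimal capacity check; the request rates and per-server capacity are hypothetical, and an even split of load across servers is assumed.

```python
def load_per_server(total_load, servers):
    """Even share of the total load carried by each server."""
    if servers < 1:
        raise ValueError("at least one server is required")
    return total_load / servers


def survives_failure(total_load, servers, capacity_per_server, failures=1):
    """True if the remaining servers can absorb the redistributed load."""
    remaining = servers - failures
    if remaining < 1:
        return False
    return load_per_server(total_load, remaining) <= capacity_per_server


# Hypothetical figures: 900 requests/s in total, 250 requests/s per server.
print(survives_failure(900, servers=4, capacity_per_server=250))
print(survives_failure(900, servers=5, capacity_per_server=250))
```

In this example four servers run comfortably at 225 requests/s each, but losing one pushes the other three to 300 requests/s, beyond their capacity; a fifth server provides the spare capacity needed to tolerate one failure.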

Microsoft provides a variety of technologies to support the use of multiple servers, including:

  • The Microsoft Cluster Service (MSCS)
  • Network Load Balancing (NLB)
  • Microsoft Application Center

Microsoft Cluster Service

MSCS is one of the two complementary clustering technologies that are available for Windows 2000, with Network Load Balancing being the other. MSCS allows multiple servers (two with Microsoft Windows 2000 Advanced Server, and up to four with Microsoft Windows 2000 Datacenter Server) to be connected together and configured to support failover clustering. This is primarily intended to provide redundancy, although individual services can be distributed among the servers to spread some load and therefore provide increased performance. Full details on this technology, including links to additional information, can be found in Chapter 18 of the Windows 2000 Server Deployment Planning Guide, Ensuring the Availability of Applications and Services.

Network Load Balancing (NLB)

NLB is the second clustering technology included with Windows 2000, but it serves a different purpose than the Cluster Service described above. Network Load Balancing is intended to distribute traffic across a cluster of servers, allowing multiple machines to appear as a single server to clients. NLB provides "scale-out," handling more load by increasing the number of machines, while still increasing availability by redistributing load if a server in the cluster fails. A technical overview of NLB is provided by Chapter 19 of the Windows 2000 Distributed Systems Resource Kit, Network Load Balancing.

The two diagrams below illustrate the difference between the two different clustering technologies available in Windows:


Figure 2. Microsoft Cluster Service before and after a server has failed


Figure 3. Network Load Balancing before and after a server has failed

Microsoft Application Center 2000

A complete product to set up and manage server clusters, Microsoft Application Center 2000 builds on and extends the clustering features of Windows. Targeted primarily at Web applications, Application Center 2000 can manage clusters of IIS (Web) servers and COM+ component servers. The ability to handle clusters of COM+ servers means that Application Center can also be useful to non-Web applications. Beyond the technical details of setting up load balancing, managing the content of a Web or component server farm can be labor-intensive. Application Center 2000 provides a variety of features to keep the server content in sync (Web pages, components, and even settings), and to move content through the various stages of test, staging, and production. General information on Application Center is available on its Web site, and you can also find a variety of case studies and samples on MSDN that show the use of Application Center, including:

Planning for Reliability and High Availability (from Microsoft Commerce Server 2000)
Architecture Decisions for Dynamic Web Applications: Performance, Scalability, and Reliability

Note that a combination of clustering technologies can be used to provide you with the desired availability; you may be using Network Load Balancing across your Web server farm, but have your Microsoft SQL Server™ database set up with failover clustering using Microsoft Cluster Service. There are many different options in terms of your complete system configuration and Application Center itself builds on the existing Windows 2000 clustering technologies.

Server Specific Clustering Information

In addition to configuring the Operating System to support clustering, the other server products you are running must also be configured correctly to provide availability and/or performance benefits. Several Microsoft server products are listed below, along with details on how they can work with various clustering technologies. Although it is not a complete list, most products that support clustering should follow the same general principles as this selection of products.

Microsoft SQL Server 2000

Since database servers are a common topic for any discussion of availability, there is a large body of reference material available about database servers. SQL Server supports the Microsoft Cluster Service, through which it can participate in a failover cluster of Windows 2000 Advanced or Datacenter Servers, and it can also take advantage of Network Load Balancing in some situations. In addition to the services of the Operating System, SQL Server provides its own high-availability features including Log Shipping and enhanced backup and restore functionality. The following resources should help you understand the clustering options for SQL Server 2000, and the various related features that help provide high availability:

SQL Server 2000 Failover Clustering

Backing Up and Restoring Databases

Duwamish Diary: If At First You Don't Succeed, Failover

The SQL Server 2000 Resource Kit contains five excellent chapters on availability options, including:

Building a Highly Available Database Cluster (from Duwamish Online)

Microsoft Exchange Server 2000

Exchange is often used as a critical enterprise application, where high availability is essential, and it can achieve that level of availability through a variety of methods. Exchange supports both failover clustering, using the Microsoft Cluster Service, and load balancing. Both types of solutions are described in the links below:

IBM Redbook: Installing and Managing Microsoft Exchange 2000 Clusters

Microsoft Exchange 2000 Clustering: Exchange Virtual Servers

Microsoft Exchange 2000 Front-End Server and SMTP Gateway Hardware Scalability Guide

Microsoft Internet Information Server (IIS)

IIS installations are most likely to utilize some form of load balancing to achieve high availability, rather than failover clustering, whether that is accomplished through software (NLB) or hardware load balancing. Failover IIS clusters are still possible though, and the following guide details the available options:

Windows 2000 Web Server Best Practices for High Availability

Microsoft Application Center 2000 is capable of clustering IIS for availability and scalability purposes, as described in the following product tour:

Application Center Product Tour

Microsoft BizTalk Server 2000/2002

Microsoft BizTalk® Server 2000 and 2002 have addressed the requirement of high availability through multiple methods, including support for Microsoft Cluster Service and also through their own BizTalk-specific functionality.

Microsoft BizTalk Server 2002: High-Availability Solutions Using Microsoft Windows 2000 Cluster Service

Deploying BizTalk Server: Clustering Considerations

Microsoft BizTalk Server 2000: High-Availability Solutions Through Failover Clustering

BizTalk Server Deployment Considerations

Microsoft Commerce Server 2000

Because Commerce Server uses the services of Windows, IIS, and SQL Server, its clustering support is also based on the support offered by those products. In addition to looking at each of these products individually, there are several resources that detail clustering options specifically targeted at Commerce Server 2000:

Planning for Reliability and High Availability

Shopping Services Architect Patterns Summary

Microsoft Systems Architecture: Internet Data Center

Commerce Server 2000 High Availability Reference Architecture

Data Storage Considerations

In the process of finding single points of failure in your system(s), a key area that must be addressed is the data storage system used by your applications. The use of fault-tolerant RAID configuration is strongly recommended for all your data storage requirements; depending on your needs RAID 1, 5 or 1+0 will provide the features you need. For more information on the different RAID levels, check out Chapter 15 - High Availability Options: The Technical Side of High Availability of the SQL Server Resource Kit. When clustering is involved, such as with a clustered SQL Server, the disk storage will commonly be placed into some form of shared disk array (an external array connected via fiber channel or SCSI) so if a failover occurs, the newly active server has access to completely up-to-date data.


Scalability is another key system feature when building applications or infrastructure for the enterprise and it is a buzzword that is frequently used to advertise various server products. Scalability measures how a system's capacity can grow to handle increased load; the upper limit of this growth and the efficiency with which that growth occurs are key defining factors of how "scalable" a system is. In both technical and marketing materials, "scalability" and capacity are often discussed as if they are the same concept; a system that can handle a million users is described as extremely scalable, even if it could never be scaled-down to a reasonable cost to handle 1000 users, or could not be scaled up to two million users. Within the scope of this article, scalability describes whether or not a system can grow/shrink (scale) to handle a change in load and how efficiently that growth occurs.

The efficiency of a system's scalability is determined by the proportional change in hardware requirements as system load is increased. A system that can handle n users with a certain amount of hardware and can increase its capacity to double that number of users by doubling the hardware is said to have "linear scalability," which is very efficient growth compared to many enterprise-level systems. The Building a Highly Available and Scalable Web Farm article about the Duwamish Books sample application discusses the scalability of its Web application and the increased capacity found as additional Web servers were added to the system. A general discussion of scalability concepts can be found at the Microsoft Windows Server System Web site, and good coverage of scalability terms and concepts is provided by the Microsoft research paper on scalability terminology.
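One way to put a number on this efficiency is to divide the growth in capacity by the growth in hardware. The metric and the figures below are an illustrative sketch rather than a measure defined by the article; a result of 1.0 corresponds to linear scalability.

```python
def scaling_efficiency(base_capacity, base_hardware,
                       new_capacity, new_hardware):
    """Capacity growth divided by hardware growth (1.0 means linear)."""
    capacity_growth = new_capacity / base_capacity
    hardware_growth = new_hardware / base_hardware
    return capacity_growth / hardware_growth


# Hypothetical measurements: users supported versus number of servers.
linear = scaling_efficiency(1000, 2, 2000, 4)
sublinear = scaling_efficiency(1000, 2, 1600, 4)

print(f"Doubling hardware doubles capacity: {linear:.2f}")
print(f"Doubling hardware adds 60% capacity: {sublinear:.2f}")
```

Values below 1.0 indicate that each additional unit of hardware buys less capacity than the one before it, which usually points to a shared bottleneck elsewhere in the system.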

When scaling a system, there are two general ways to do it: scale up or scale out. Scaling up refers to using a single server and adding system hardware resources (processors and memory) as needed to increase overall system performance. The other method, scaling out, requires clustering and/or load balancing technologies to support the use of multiple servers, and additional servers are added as required to handle increased load. Certain applications (such as database servers, ERP systems, and others) are best suited to the scale-up model, running on a single server that is made as large as necessary. Other types of systems (such as Web sites and n-tier systems) are able to work well in a distributed environment and can be scaled out across multiple servers.

Supporting Scale-Up

With the release of Windows 2000 Datacenter Server, the upper ceiling for scaling a server up has been increased significantly. Servers running Windows 2000 Datacenter Server can now use up to 32 processors and 64 GB of memory, and take advantage of advanced application management features to divide that hardware among different applications running on the same machine. Very impressive scale-up can be achieved without being required to move to Windows 2000 Datacenter Server; Windows 2000 Advanced Server supports up to 8 processors (and 8 GB of memory), which is sufficient hardware for a large number of application systems. The materials below provide more information on those Microsoft Server products (including SQL Server) that fit well in a single monolithic server configuration.

Microsoft SQL Server 2000

SQL Server 2000 on Large Servers

Building a Highly Available Database Cluster

SQL Server Scalability Case Studies

Microsoft Internet Security and Acceleration Server (ISA)

Although ISA, like most networking/Internet solutions, is very well suited for scaling out, it also provides support for working with a large number of processors and a large amount of memory, allowing it to scale up as well. The Standard Edition of ISA supports up to four processors, while the Enterprise Edition has no limit on the number of processors it can support, but it would be limited by the version of Windows it was running on.

Installation and Deployment Guide

Microsoft Windows 2000 Datacenter

Scalability and Reliability Improvements in Windows 2000 Datacenter

Scaling Out

While scaling up has the advantages of single-server management, and works well with transaction-processing applications that were traditionally mainframe systems, there is a distributed approach to scalability as well. By scaling out and using multiple servers to handle your system's load, you are implementing a cluster scenario, which helps to improve your availability while still providing you with the increased capacity and performance that you require.

Web Applications

The use of scale out in deploying Web applications is extremely common and allows the use of Web server farms (and gardens) to support large amounts of traffic to a particular Web-based system. Microsoft Web server based products, such as Microsoft Commerce Server, Microsoft Content Management Server and IIS itself, support two main methods for scaling out: the use of Application Center 2000 to create and manage Web farms, and the use of Network Load Balancing to distribute traffic across multiple servers. Both of these technologies are covered through earlier links in the clustering sections of this document, but these two links provide more information specifically related to the scaling of Web applications:

Building a Highly Available and Scalable Web Farm (from Duwamish Online)

Scaling Business Web Sites with Application Center (Chapter 1 of the Application Center Resource Kit)

Microsoft Internet Security and Acceleration Server (ISA)

Although ISA was discussed earlier in the Supporting Scale-Up section, it provides for great scale-out performance using an array of ISA servers.

Scale-Out Caching with ISA Server

Microsoft SQL Server 2000

Scaling with SQL Server is often focused on scaling up, but it is also very capable of scaling out through the use of features such as Distributed Partitioned Views. These links provide more details on the scale-out features of SQL Server 2000:

Microsoft SQL Server MegaServers: Achieving Software Scale-Out

Designing Federated SQL Server 2000 Servers

Microsoft BizTalk Server 2002

BizTalk Server 2000 and 2002 can both be scaled out using a large number of servers, with individual servers capable of being dedicated to individual tasks or handling all parts of a system. The links below detail the various options available in scaling BizTalk Server and although they are documenting the behavior of BizTalk Server 2000, the concepts apply equally to BizTalk Server 2002:

BizTalk Server Deployment Considerations

Chapter 7: BizTalk Server Design of the Internet Data Center Reference Architecture

Designing Systems to be Scalable

With scalability, as with availability and performance, the entire system must be taken into account and bottlenecks identified. Even if your database server could scale to handle double the current load, your entire system may be bottlenecked by some specific aspect of the application, such as the hard drive controller or the network interface. By planning ahead for scalability, it is possible that bottlenecks can be identified and designed out of the system. Even if a single server is all that is planned, by designing as if there were anywhere from 1 to n individual servers, scaling to a multiple-server configuration at a later date will be much easier. Wherever possible, remove all assumptions about the physical distribution of your components, database, and Web servers. A system that is abstracted from its physical distribution can be scaled in a variety of ways, including distributing individual components or groups of components onto different machines.


Figure 4. Monolithic architecture

Even if you are sure you will be deploying to a single server (see Figure 4), designing for multiple servers (see Figure 5) will make scaling out much easier in the future.


Figure 5. Distributed application architecture
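The idea of keeping application code independent of physical distribution can be sketched in a few lines. Everything here is hypothetical: the host names, the `Deployment` class, and the `place_order` function are invented for illustration, and a real system would resolve tiers through its own configuration or directory mechanism.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deployment:
    """Maps each logical tier to the host or hosts that run it."""
    presentation: tuple
    business: tuple
    data: tuple

    def hosts_for(self, tier):
        return getattr(self, tier)


# The same three logical tiers placed on one machine or on several.
MONOLITHIC = Deployment(presentation=("server1",),
                        business=("server1",),
                        data=("server1",))
DISTRIBUTED = Deployment(presentation=("web1", "web2"),
                         business=("app1",),
                         data=("db1",))


def place_order(deployment, order_id):
    """Application code names a tier and never hard-codes a host."""
    business_hosts = deployment.hosts_for("business")
    business_host = business_hosts[order_id % len(business_hosts)]
    data_host = deployment.hosts_for("data")[0]
    return f"order {order_id}: business on {business_host}, data on {data_host}"


print(place_order(MONOLITHIC, 7))
print(place_order(DISTRIBUTED, 7))
```

Because `place_order` is identical in both cases, moving from the single-server layout to the distributed one is a change to the deployment description only.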

The Windows DNA model, in which you architect your system based on logical tiers (data, business and presentation) that could then be physically distributed as desired, was well suited for scaling out or scaling up as required without any change to your application code. In the .NET world, the same general principles still hold: do not make assumptions about the physical distribution of your application and divide your application into logical tiers that can be located as necessary. The following books, white papers and articles provide good coverage of the application design principles that lead to scalable systems:

Server Performance and Scalability Killers

INFO: Design Guidelines for VB Components Under ASP (Q243548)

Developing Scalable Web Applications (Microsoft Windows 2000 Advanced documentation)

Designing Distributed Applications with Visual Studio .NET (the starting point into a set of documentation on building applications that scale and perform well)

Shopping Services Architect Patterns Summary

A Blueprint for Building Web Sites Using the Microsoft Windows Platform


Although performance and scalability are tightly related and often considered at the same time, it can be helpful to separate the two for the purpose of discussion. How well a system performs is usually measured by the amount of time it takes to respond to specific user requests, or to accomplish a specific task. When scalability is included in this definition of performance, we have to modify it to include the number of users/requests being handled at the time of the performance measurement.
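Measuring response time at several load levels makes this definition concrete. The sketch below uses a sleeping stand-in for the operation under test; `handle_request` is hypothetical and would be replaced by a call to the real system, and the user counts are arbitrary.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def handle_request(_request_id):
    """Stand-in for the operation being measured (hypothetical)."""
    time.sleep(0.01)


def timed_call(request_id):
    """Return the elapsed time, in seconds, for one request."""
    start = time.perf_counter()
    handle_request(request_id)
    return time.perf_counter() - start


def measure(concurrent_users, requests_per_user=5):
    """Run the workload at a given concurrency and summarize the timings."""
    total_requests = concurrent_users * requests_per_user
    with ThreadPoolExecutor(max_workers=concurrent_users) as pool:
        timings = list(pool.map(timed_call, range(total_requests)))
    return {
        "users": concurrent_users,
        "requests": len(timings),
        "mean_s": mean(timings),
        "max_s": max(timings),
    }


for users in (1, 5, 20):
    print(measure(users))
```

Reporting the user count alongside each timing keeps the performance figure tied to the load under which it was observed.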

There are two main concepts to be considered with performance: designing high-performance applications, and testing the performance of existing systems and code. There is no specific order to these concepts though, as the testing of your system and the resulting design iterations should be an ongoing process.

Designing for Performance

Regardless of the type of system you are creating, there are general guidelines that help you to achieve high performance. Many of these guidelines may appear targeted at a specific language, product version, or even type of application, but all of them contain information that helps increase your awareness of key performance issues and possible solutions:

Performance Considerations for Run-Time Technologies in the .NET Framework

Performance Tips and Tricks in .NET Applications

Open the Box! Quick! (an article by Eric Gunnerson that looks at the performance implications of boxed value types in C#)

.NET Architectural Sample Applications
A variety of complete systems are provided in this section, all of which have been implemented using .NET and serve as excellent examples of different design and implementation strategies.

.NET Application Architecture Panel Discussion: PDC 2001
This article summarizes a panel discussion that took place at the Professional Developers Conference 2001 and covered a huge number of architectural issues revolving around .NET development.

Microsoft .NET PetShop Sample Application
As mentioned in the panel discussion (above), this sample was built to illustrate performance best practices with the .NET Framework.

Performance Testing

Once you have a working build of an application, or application component, you can start performance testing. If you are performance testing a complete system, or a product that you did not develop, you can use load testing tools such as the Web Application Stress tool (WAS), Application Center Test (ACT, part of Microsoft Visual Studio® .NET Enterprise Architect edition) and others to simulate usage and then measure the performance:

Benchmarks: Making Sure They're Meaningful (from the Performance Artist column in MSDN News)

Performance Testing with the Web Application Stress Tool

Understanding Performance Testing (for Web Applications)

Stress Testing Data Access Components in Windows DNA Applications

As an alternative to load simulation tools, which are designed to test the performance of a complete application, code profiling can be used on your own applications to obtain a detailed analysis of your application's performance. Code profiling involves the instrumentation of your compiled code, adding a variety of hooks so that internal application flow can be tracked/logged by an external application. The benefit of code profiling over external testing is that it directly pinpoints which classes, procedures and sometimes even lines (depending on the profiling tool), are taking up the majority of your application's execution time. Armed with this information, you can prioritize your tuning work to concentrate on those areas of code that are called most frequently or appear to have the largest effect on your performance. Code profiling is supported for .NET applications through the common language runtime (CLR), and this support is detailed in the following articles:

.NET CLR Profiling Services: Track Your Managed Components to Boost Application Performance

The .NET Profiling API and the DNProfiler Tool

Note   Many third-party tools, such as the Compuware DevPartner Profiler, exist to provide code profiling for managed and unmanaged applications and should be investigated before you consider writing your own profiling systems.
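The kind of output a profiler produces can be illustrated with any instrumenting profiler. The sketch below uses Python's standard `cProfile` and `pstats` modules purely as an analogy; it is not the CLR Profiling API, and the `slow_sum`, `fast_path`, and `workload` functions are invented for the example.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    """Deliberately expensive loop that should dominate the profile."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def fast_path(n):
    """Cheap function included for contrast."""
    return n * 2


def workload():
    slow_sum(200_000)
    fast_path(10)


# Record every function call made while the workload runs.
profiler = cProfile.Profile()
profiler.enable()
workload()
profiler.disable()

# Summarize the five entries with the highest cumulative time.
report = io.StringIO()
stats = pstats.Stats(profiler, stream=report)
stats.sort_stats("cumulative").print_stats(5)
print(report.getvalue())
```

The report lists call counts and time per function, which is the information needed to decide where tuning effort will pay off.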


Availability, scalability and performance are all related, as they all depend on combinations of physical architecture and design decisions. The use of clustering technology described in this article provides increased availability, and when used to distribute load across a set of servers, also provides for increased scalability and performance. This close relationship between these key metrics requires that all three be considered when determining the physical architecture of a system. It is possible to create a system that is simultaneously highly available, highly scalable and high-performance, but it is not an easy task, nor is it necessarily required. The ideal system is designed to achieve the availability, scalability and performance that its requirements indicate. There are many compromises that can be made to reduce the cost of building a highly available system, but in each case you are increasing the risk of a system failure. A detailed risk assessment that assigns a cost to down time will assist you in determining the correct balance between availability and budget priorities. The MSF risk management model focuses on application development but applies to any type of system and is a good starting point for performing your own risk assessments.
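Assigning a cost to down time can start with very simple arithmetic. The following sketch is not part of the MSF model; the cost per hour, the availability levels, and the upgrade cost are hypothetical inputs that a real assessment would replace with its own estimates.

```python
HOURS_PER_YEAR = 24 * 365


def annual_downtime_cost(availability_percent, cost_per_hour,
                         hours_per_year=HOURS_PER_YEAR):
    """Expected yearly cost of down time at a given availability level."""
    downtime_hours = hours_per_year * (1 - availability_percent / 100)
    return downtime_hours * cost_per_hour


def worth_upgrading(current_pct, target_pct, cost_per_hour,
                    annual_upgrade_cost):
    """Compare the yearly saving in down time cost with the upgrade cost."""
    saving = (annual_downtime_cost(current_pct, cost_per_hour)
              - annual_downtime_cost(target_pct, cost_per_hour))
    return saving > annual_upgrade_cost, saving


# Hypothetical inputs: 10,000 per hour of down time, 50,000 per year
# to move from three nines to four nines.
justified, saving = worth_upgrading(99.9, 99.99, 10_000, 50_000)
print(f"Yearly saving: {saving:,.0f}  Upgrade justified: {justified}")
```

If the saving falls below the cost of the extra hardware and operations effort, the lower availability target may be the better fit for the budget.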