Performance Testing Guidance for Web Applications
J.D. Meier, Carlos Farre, Prashant Bansode, Scott Barber, and Dennis Rea
Microsoft Corporation
September 2007
Objectives
- Learn what performance testing is.
- Learn the core activities of performance testing.
- Learn why performance testing matters.
- Learn about the relevance of project context to performance
testing.
- Learn how tuning fits into the performance testing cycle.
Overview
Performance testing is a type of testing intended to determine the
responsiveness, throughput, reliability, and/or scalability of a system under a
given workload. Performance testing is commonly conducted to accomplish the
following:
- Assess production readiness
- Evaluate against performance criteria
- Compare performance characteristics of multiple systems or
system configurations
- Find the source of performance problems
- Support system tuning
- Find throughput levels
This chapter provides a set of foundational building blocks on which to base
your understanding of performance testing principles, ultimately leading to
successful performance-testing projects. Additionally, this chapter introduces
various terms and concepts used throughout this guide.
How to Use This Chapter
Use this chapter to understand the purpose of performance testing and the core
activities that it entails. To get the most from this chapter:
- Use the “Project Context” section to understand how to
focus on the relevant items during performance testing.
- Use the “Relationship Between Performance Testing and
Tuning” section to understand the relationship between performance testing
and performance tuning, and to understand the overall performance tuning
process.
- Use the “Performance, Load, and Stress Testing” section to
understand various types of performance testing.
- Use the “Baselines” and “Benchmarking” sections to
understand the various methods of performance comparison that you can use to
evaluate your application.
- Use the “Terminology” section to learn the common
performance-testing terms so that you can use them correctly and
consistently in the context of your project.
Core Activities of Performance Testing
Performance testing is typically done to help identify bottlenecks in a
system, establish a baseline for future testing, support a performance tuning
effort, determine compliance with performance goals and requirements, and/or
collect other performance-related data to help stakeholders make informed
decisions related to the overall quality of the application being tested. In
addition, the results from performance testing and analysis can help you to
estimate the hardware configuration required to support the application(s) when
you “go live” to production operation.
Figure 1.1 Core Performance Testing Activities
The performance testing approach used in this guide consists of the
following activities:
- Activity 1. Identify the Test Environment. Identify
the physical test environment and the production environment as well as
the tools and resources available to the test team. The physical
environment includes hardware, software, and network configurations. Having
a thorough understanding of the entire test environment at the outset enables
more efficient test design and planning and helps you identify testing
challenges early in the project. In some situations, this process must be
revisited periodically throughout the project’s life cycle.
- Activity 2. Identify Performance Acceptance Criteria.
Identify the response time, throughput, and resource utilization goals and
constraints. In general, response time is a user concern, throughput is a
business concern, and resource utilization is a system concern.
Additionally, identify project success criteria that may not be captured
by those goals and constraints; for example, using performance tests to
evaluate what combination of configuration settings will result in the
most desirable performance characteristics.
- Activity 3. Plan and Design Tests. Identify key
scenarios, determine variability among representative users and how to
simulate that variability, define test data, and establish metrics to be
collected. Consolidate this information into one or more models of system
usage to be implemented, executed, and analyzed.
- Activity 4. Configure the Test Environment.
Prepare the test environment, tools, and resources necessary to execute
each strategy as features and components become available for test. Ensure
that the test environment is instrumented for resource monitoring as
necessary.
- Activity 5. Implement the Test Design. Develop the
performance tests in accordance with the test design.
- Activity 6. Execute the Test. Run and monitor your
tests. Validate the tests, test data, and results collection. Execute
validated tests for analysis while monitoring the test and the test
environment.
- Activity 7. Analyze Results, Report, and Retest.
Consolidate and share results data. Analyze the data both individually and
as a cross-functional team. Reprioritize the remaining tests and
re-execute them as needed. When all of the metric values are within
accepted limits, none of the set thresholds have been violated, and all of
the desired information has been collected, you have finished testing that
particular scenario on that particular configuration.
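The completion check described in Activity 7 can be sketched in a few lines of Python. This is a minimal illustration, not part of the guide's approach: the `Threshold` class, the metric names, and the limit values are all hypothetical, and a real project would take them from its own acceptance criteria.

```python
from dataclasses import dataclass


@dataclass
class Threshold:
    """One acceptance limit (hypothetical structure for illustration)."""
    metric: str
    limit: float
    higher_is_better: bool = False  # True for throughput-style metrics


def violations(measured, thresholds):
    """Return a description of every threshold that was violated or not measured."""
    problems = []
    for t in thresholds:
        value = measured.get(t.metric)
        if value is None:
            problems.append(f"{t.metric}: not collected")
        elif t.higher_is_better and value < t.limit:
            problems.append(f"{t.metric}: {value} is below minimum {t.limit}")
        elif not t.higher_is_better and value > t.limit:
            problems.append(f"{t.metric}: {value} exceeds maximum {t.limit}")
    return problems


# Assumed example values; none of these numbers come from the guide.
thresholds = [
    Threshold("response_time_s", 3.0),
    Threshold("cpu_utilization_pct", 75.0),
    Threshold("throughput_tps", 100.0, higher_is_better=True),
]
measured = {
    "response_time_s": 3.3,
    "cpu_utilization_pct": 62.0,
    "throughput_tps": 120.0,
}

for problem in violations(measured, thresholds):
    print(problem)
```

An empty result means that every metric was collected and is within its limit, which corresponds to the point at which testing of that scenario on that configuration is finished.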
Why Do Performance Testing?
At the highest level, performance testing is almost always conducted to
address one or more risks related to expense, opportunity costs, continuity,
and/or corporate reputation. Some more specific reasons for conducting
performance testing include:
- Assessing release readiness by:
- Enabling you to predict or estimate the performance
characteristics of an application in production and evaluate whether or
not to address performance concerns based on those predictions. These
predictions are also valuable to the stakeholders who make decisions
about whether an application is ready for release or capable of handling
future growth, or whether it requires a performance improvement/hardware
upgrade prior to release.
- Providing data indicating the likelihood of user
dissatisfaction with the performance characteristics of the system.
- Providing data to aid in the prediction of revenue losses
or damaged brand credibility due to scalability or stability issues, or
due to users being dissatisfied with application response time.
- Assessing infrastructure adequacy by:
- Evaluating the adequacy of current capacity.
- Determining the acceptability of stability.
- Determining the capacity of the application’s
infrastructure, as well as determining the future resources required to
deliver acceptable application performance.
- Comparing different system configurations to determine
which works best for both the application and the business.
- Verifying that the application exhibits the desired
performance characteristics, within budgeted resource utilization
constraints.
- Assessing adequacy of developed software performance by:
- Determining the application’s desired performance characteristics
before and after changes to the software.
- Providing comparisons between the application’s current and
desired performance characteristics.
- Improving the efficiency of performance tuning by:
- Analyzing the behavior of the application at various load
levels.
- Identifying bottlenecks in the application.
- Providing information related to the speed, scalability,
and stability of a product prior to production release, thus enabling you
to make informed decisions about whether and when to tune the system.
Project Context
For a performance testing project to be successful, both the approach to
testing performance and the testing itself must be relevant to the context of
the project. Without an understanding of the project context, performance
testing is bound to focus on only those items that the performance tester or
test team assumes to be important, as opposed to those that truly are important,
frequently leading to wasted time, frustration, and conflicts.
The project context is nothing more than those things that are, or may
become, relevant to achieving project success. This may include, but is not
limited to:
- The overall vision or intent of the project
- Performance testing objectives
- Performance success criteria
- The development life cycle
- The project schedule
- The project budget
- Available tools and environments
- The skill set of the performance tester and the team
- The priority of detected performance concerns
- The business impact of deploying an application that
performs poorly
Some examples of items that may be relevant to the performance-testing
effort in your project context include:
- Project vision. Before beginning performance
testing, ensure that you understand the current project vision. The
project vision is the foundation for determining what performance testing
is necessary and valuable. Revisit the vision regularly, as it has the
potential to change as well.
- Purpose of the system. Understand the purpose of
the application or system you are testing. This will help you identify the
highest-priority performance characteristics on which you should focus
your testing. You will need to know the system’s intent, the actual
hardware and software architecture deployed, and the characteristics of
the typical end user.
- Customer or user expectations. Keep customer or
user expectations in mind when planning performance testing. Remember that
customer or user satisfaction is based on expectations, not simply
compliance with explicitly stated requirements.
- Business drivers. Understand the business drivers –
such as business needs or opportunities – that are constrained to some
degree by budget, schedule, and/or resources. It is important to meet your
business requirements on time and within the available budget.
- Reasons for testing performance. Understand the
reasons for conducting performance testing very early in the project. Failing
to do so might lead to ineffective performance testing. These reasons
often go beyond a list of performance acceptance criteria and are bound to
change or shift priority as the project progresses, so revisit them
regularly as you and your team learn more about the application, its
performance, and the customer or user.
- Value that performance testing brings to the project.
Understand the value that performance testing is expected to bring to the
project by translating the project- and business-level objectives into
specific, identifiable, and manageable performance testing activities.
Coordinate and prioritize these activities to determine which performance
testing activities are likely to add value.
- Project management and staffing. Understand the
team’s organization, operation, and communication techniques in order to
conduct performance testing effectively.
- Process. Understand your team’s process and
interpret how that process applies to performance testing. If the team’s process
documentation does not address performance testing directly, extrapolate
the document to include performance testing to the best of your ability,
and then get the revised document approved by the project manager and/or
process engineer.
- Compliance criteria. Understand the regulatory
requirements related to your project. Obtain compliance documents to ensure
that you have the specific language and context of any statement related
to testing, as this information is critical to determining compliance
tests and ensuring a compliant product. Also understand that the nature of
performance testing makes it virtually impossible to follow the same
processes that have been developed for functional testing.
- Project schedule. Be aware of the project start
and end dates, the hardware and environment availability dates, the flow
of builds and releases, and any checkpoints and milestones in the project
schedule.
The Relationship Between Performance Testing and Tuning
When end-to-end performance testing reveals system or application
characteristics that are deemed unacceptable, many teams shift their focus from
performance testing to performance tuning, to discover what is necessary to
make the application perform acceptably. A team may also shift its focus to
tuning when performance criteria have been met but the team wants to reduce the
amount of resources being used in order to increase platform headroom, decrease
the volume of hardware needed, and/or further improve system performance.
Cooperative Effort
Although tuning is not the direct responsibility of most performance
testers, the tuning process is most effective when it is a cooperative effort
between all of those concerned with the application or system under test,
including:
- Product vendors
- Architects
- Developers
- Testers
- Database administrators
- System administrators
- Network administrators
Without the cooperation of a cross-functional team, it is almost impossible
to gain the system-wide perspective necessary to resolve performance issues
effectively or efficiently.
The performance tester, or performance testing team, is a critical component
of this cooperative team as tuning typically requires additional monitoring of
components, resources, and response times under a variety of load conditions
and configurations. Generally speaking, it is the performance tester who has
the tools and expertise to provide this information in an efficient manner,
making the performance tester the enabler for tuning.
Tuning Process Overview
Tuning follows an iterative process that is usually separate from, but not
independent of, the performance testing approach a project is following. The
following is a brief overview of a typical tuning process:
- Tests are conducted with the system or application
deployed in a well-defined, controlled test environment in order to ensure
that the configuration and test results at the start of the testing
process are known and reproducible.
- When the tests reveal performance characteristics deemed
to be unacceptable, the performance testing and tuning team enters a
diagnosis and remediation stage (tuning) that will require changes to be
applied to the test environment and/or the application. It is not uncommon
to make temporary changes that are deliberately designed to magnify an
issue for diagnostic purposes, or to change the test environment to see if
such changes lead to better performance.
- The cooperative testing and tuning team is generally given
full and exclusive control over the test environment in order to maximize
the effectiveness of the tuning phase.
- Performance tests are executed, or re-executed, after each
change to the test environment in order to measure the impact of a
remedial change.
- The tuning process typically involves a rapid sequence of
changes and tests. This process can take considerably more time if a
cooperative testing and tuning team is not fully available and dedicated
to this effort while in a tuning phase.
- When a tuning phase is complete, the test environment is
generally reset to its initial state, the successful remedial changes are applied
again, and any unsuccessful remedial changes (together with temporary
instrumentation and diagnostic changes) are discarded. The performance
test should then be repeated to prove that the correct changes have been
identified. It might also be the case that the test environment itself is
changed to reflect new expectations as to the minimal required production
environment. This is unusual, but a potential outcome of the tuning
effort.
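The change-and-retest loop outlined above can be sketched as follows. This is a simplified model under stated assumptions: `run_test` stands in for a full performance test and returns a single response time, the configuration keys (`cache_enabled`, `pool_size`) are invented for illustration, and `simulated_test` is a toy formula rather than a measurement of any real system.

```python
def tune(initial_config, candidate_changes, run_test):
    """Try each change in turn and keep it only if response time improves."""
    config = dict(initial_config)
    best = run_test(config)
    kept = []
    for change in candidate_changes:
        trial = {**config, **change}
        result = run_test(trial)  # re-execute the test after each change
        if result < best:
            config, best = trial, result
            kept.append(change)
        # otherwise the unsuccessful change is simply discarded
    return kept


def verify(initial_config, kept_changes, run_test):
    """Reset to the initial state, re-apply the successful changes, and retest."""
    config = dict(initial_config)
    for change in kept_changes:
        config.update(change)
    return run_test(config)


def simulated_test(config):
    """Toy stand-in for a performance test; returns a response time in seconds."""
    cache_gain = 1.5 if config["cache_enabled"] else 0.0
    pool_penalty = abs(config["pool_size"] - 50) * 0.02
    return 4.0 - cache_gain + pool_penalty


initial = {"cache_enabled": False, "pool_size": 10}
changes = [{"cache_enabled": True}, {"pool_size": 200}, {"pool_size": 50}]

kept = tune(initial, changes, simulated_test)
print(kept)
print(verify(initial, kept, simulated_test))
```

The separate `verify` step mirrors the final bullet: only the successful changes are re-applied to a clean environment, and the test is repeated to confirm that they account for the improvement.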
Performance, Load, and Stress Testing
Performance tests are usually described as belonging to one of the following
three categories:
- Performance testing. This type of testing
determines or validates the speed, scalability, and/or stability
characteristics of the system or application under test. Performance is
concerned with achieving response times, throughput, and resource-utilization
levels that meet the performance objectives for the project or product. In
this guide, performance testing represents the superset of all of the
other subcategories of performance-related testing.
- Load testing. This subcategory of performance testing
is focused on determining or validating performance characteristics of the
system or application under test when subjected to workloads and load
volumes anticipated during production operations.
- Stress testing. This subcategory of performance
testing is focused on determining or validating performance
characteristics of the system or application under test when subjected to
conditions beyond those anticipated during production operations. Stress
tests may also include tests focused on determining or validating
performance characteristics of the system or application under test when
subjected to other stressful conditions, such as limited memory,
insufficient disk space, or server failure. These tests are designed to
determine under what conditions an application will fail, how it will
fail, and what indicators can be monitored to warn of an impending
failure.
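The difference between a load test and a stress test is largely a matter of how much work is applied relative to what production is expected to see. The sketch below illustrates that with a very small thread-based driver. It is an assumption-laden toy: `fake_request` merely sleeps instead of calling an application, the user counts are arbitrary, and a real project would normally use a dedicated load-generation tool.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_virtual_user(request, iterations):
    """Issue the request repeatedly and record each response time in seconds."""
    timings = []
    for _ in range(iterations):
        start = time.perf_counter()
        request()
        timings.append(time.perf_counter() - start)
    return timings


def run_load(request, users, iterations):
    """Run the given number of concurrent virtual users and pool their timings."""
    with ThreadPoolExecutor(max_workers=users) as pool:
        futures = [
            pool.submit(run_virtual_user, request, iterations)
            for _ in range(users)
        ]
        return [t for f in futures for t in f.result()]


def fake_request():
    """Hypothetical stand-in for one request to the system under test."""
    time.sleep(0.01)


anticipated_users = 5  # assumed production concurrency, for illustration only

# Load test: workload anticipated during production operations.
load_timings = run_load(fake_request, users=anticipated_users, iterations=3)

# Stress test: workload beyond what production is expected to see.
stress_timings = run_load(fake_request, users=anticipated_users * 4, iterations=3)

print(len(load_timings), "samples under load")
print(len(stress_timings), "samples under stress")
```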
Baselines
Creating a baseline is the process of running a set of tests to capture
performance metric data for the purpose of evaluating the effectiveness of
subsequent performance-improving changes to the system or application. A
critical aspect of a baseline is that all characteristics and configuration
options except those specifically being varied for comparison must remain
invariant. Once a part of the system that is not intentionally being varied for
comparison to the baseline is changed, the baseline measurement is no longer a
valid basis for comparison.
With respect to Web applications, you can use a baseline to determine
whether performance is improving or declining and to find deviations across different
builds and versions. For example, you could measure load time, the number of
transactions processed per unit of time, the number of Web pages served per
unit of time, and resource utilization such as memory usage and processor usage.
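As a rough illustration of such a comparison, the following Python sketch reports the percent change of each metric relative to the baseline. The metric names and values are invented for the example. Whether an increase is an improvement depends on the metric: more pages per second is better, a longer load time is worse.

```python
def compare_to_baseline(baseline, current):
    """Return the percent change of each baseline metric found in the current run.

    A positive value means the metric increased relative to the baseline.
    """
    return {
        name: (current[name] - base) * 100.0 / base
        for name, base in baseline.items()
        if name in current
    }


# Assumed measurements from two builds; the numbers are illustrative only.
baseline = {"page_load_s": 2.0, "pages_per_s": 50.0, "memory_mb": 400.0}
current = {"page_load_s": 2.5, "pages_per_s": 45.0, "memory_mb": 400.0}

for name, change in compare_to_baseline(baseline, current).items():
    print(f"{name}: {change:+.1f}%")
```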
Some considerations about using baselines include:
- A baseline can be created for a system, component, or
application. A baseline can also be created for different layers of
the application, including a database, Web services, and so on.
- A baseline can set the standard for comparison, to
track future optimizations or regressions. It is important to
validate that the baseline results are repeatable, because considerable
fluctuations may occur across test results due to environment and workload
characteristics.
- Baselines can help identify changes in performance.
Baselines can help product teams identify changes in performance that reflect
degradation or optimization over the course of the development life cycle.
Identifying these changes in comparison to a well-known state or
configuration often makes resolving performance issues simpler.
- Baseline assets should be reusable. Baselines are
most valuable if they are created by using a set of reusable test assets. It
is important that such tests accurately simulate repeatable and actionable
workload characteristics.
- Baselines are metrics. Baseline results can be
articulated by using a broad set of key performance indicators, including
response time, processor capacity, memory usage, disk capacity, and
network bandwidth.
- Baselines act as a shared frame of reference.
Sharing baseline results allows your team to build a common store of acquired
knowledge about the performance characteristics of an application or
component.
- Avoid over-generalizing your baselines. If your project
entails a major reengineering of the application, you need to reestablish
the baseline for testing that application. A baseline is application-specific
and is most useful for comparing performance across different versions.
Sometimes, subsequent versions of an application are so different that previous
baselines are no longer valid for comparisons.
- Know your application’s behavior. It is a good
idea to ensure that you completely understand the behavior of the
application at the time a baseline is created. Making
optimization-focused changes to the system without that understanding is
frequently counterproductive.
- Baselines evolve. At times you will have to
redefine your baseline because of changes that have been made to the
system since the time the baseline was initially captured.
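One simple way to check the repeatability consideration above is to run the baseline test several times and look at how much the results fluctuate. The sketch below uses the coefficient of variation (standard deviation divided by mean). The 5 percent cutoff and the sample values are assumptions chosen for illustration, not figures from this guide.

```python
import statistics


def is_repeatable(samples, max_cv=0.05):
    """Return (repeatable, cv) for a list of results from repeated baseline runs.

    cv is the coefficient of variation: sample standard deviation / mean.
    max_cv is an assumed tolerance; choose a value that suits your project.
    """
    mean = statistics.mean(samples)
    cv = statistics.stdev(samples) / mean
    return cv <= max_cv, cv


# Hypothetical average response times (seconds) from five runs of the same test.
stable_runs = [2.0, 2.1, 1.9, 2.0, 2.0]
noisy_runs = [2.0, 3.0, 1.5, 2.6, 1.9]

for label, runs in (("stable", stable_runs), ("noisy", noisy_runs)):
    ok, cv = is_repeatable(runs)
    print(f"{label}: cv={cv:.3f} repeatable={ok}")
```

If the results are not repeatable, differences between a later run and the baseline cannot be attributed with confidence to the change being evaluated.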
Benchmarking
Benchmarking is the process of comparing your system’s performance
against a baseline that you have created internally or against an industry
standard endorsed by some other organization.
In the case of a Web application, you would run a set of tests that comply with
the specifications of an industry benchmark in order to capture the performance
metrics necessary to determine your application’s benchmark score. You can then
compare your application against other systems or applications that also
calculated their score for the same benchmark. You may choose to tune your
application performance to achieve or surpass a certain benchmark score. Some
considerations about benchmarking include:
- You need to play by the rules. A benchmark is
achieved by working with industry specifications or by porting an existing
implementation to meet such standards. Benchmarking entails identifying all
of the necessary components that will run together, the market where the
product exists, and the specific metrics to be measured.
- Because you play by the rules, you can be transparent.
Benchmarking results can be published to the outside world. Since comparisons
may be produced by your competitors, you will want to employ a strict set
of standard approaches for testing and data to ensure reliable results.
- You divulge results across various metrics. Performance
metrics may involve load time, number of transactions processed per unit
of time, Web pages accessed per unit of time, processor usage, memory
usage, search times, and so on.
Terminology
The following definitions are used throughout this guide. Every effort has
been made to ensure that these terms and definitions are consistent with formal
use and industry standards; however, some of these terms are known to have
certain valid alternate definitions and implications in specific industries and
organizations. Keep in mind that these definitions are intended to aid
communication and are not an attempt to create a universal standard.
| Term / Concept | Description |
| --- | --- |
| Capacity | The capacity of a system is the total workload it can handle without violating predetermined key performance acceptance criteria. |
| Capacity test | A capacity test complements load testing by determining your server’s ultimate failure point, whereas load testing monitors results at various levels of load and traffic patterns. You perform capacity testing in conjunction with capacity planning, which you use to plan for future growth, such as an increased user base or increased volume of data. For example, to accommodate future loads, you need to know how many additional resources (such as processor capacity, memory usage, disk capacity, or network bandwidth) are necessary to support future usage levels. Capacity testing helps you to identify a scaling strategy in order to determine whether you should scale up or scale out. |
| Component test | A component test is any performance test that targets an architectural component of the application. Commonly tested components include servers, databases, networks, firewalls, and storage devices. |
| Endurance test | An endurance test is a type of performance test focused on determining or validating performance characteristics of the product under test when subjected to workload models and load volumes anticipated during production operations over an extended period of time. Endurance testing is a subset of load testing. |
| Investigation | Investigation is an activity based on collecting information related to the speed, scalability, and/or stability characteristics of the product under test that may have value in determining or improving product quality. Investigation is frequently employed to prove or disprove hypotheses regarding the root cause of one or more observed performance issues. |
| Latency | Latency is a measure of responsiveness that represents the time it takes to complete the execution of a request. Latency may also represent the sum of several latencies or subtasks. |
| Metrics | Metrics are measurements obtained by running performance tests as expressed on a commonly understood scale. Some metrics commonly obtained through performance tests include processor utilization over time and memory usage by load. |
| Performance | Performance refers to information regarding your application’s response times, throughput, and resource utilization levels. |
| Performance test | A performance test is a technical investigation done to determine or validate the speed, scalability, and/or stability characteristics of the product under test. Performance testing is the superset containing all other subcategories of performance testing described in this chapter. |
| Performance budgets or allocations | Performance budgets (or allocations) are constraints placed on developers regarding allowable resource consumption for their component. |
| Performance goals | Performance goals are the criteria that your team wants to meet before product release, although these criteria may be negotiable under certain circumstances. For example, if a response time goal of three seconds is set for a particular transaction but the actual response time is 3.3 seconds, it is likely that the stakeholders will choose to release the application and defer performance tuning of that transaction for a future release. |
| Performance objectives | Performance objectives are usually specified in terms of response times, throughput (transactions per second), and resource-utilization levels and typically focus on metrics that can be directly related to user satisfaction. |
| Performance requirements | Performance requirements are those criteria that are absolutely non-negotiable due to contractual obligations, service level agreements (SLAs), or fixed business needs. Any performance criterion that will not unquestionably lead to a decision to delay a release until the criterion passes is not absolutely required and is therefore not a requirement. |
| Performance targets | Performance targets are the desired values for the metrics identified for your project under a particular set of conditions, usually specified in terms of response time, throughput, and resource-utilization levels. Resource-utilization levels include the amount of processor capacity, memory, disk I/O, and network I/O that your application consumes. Performance targets typically equate to project goals. |
| Performance testing objectives | Performance testing objectives refer to data collected through the performance-testing process that is anticipated to have value in determining or improving product quality. However, these objectives are not necessarily quantitative or directly related to a performance requirement, goal, or stated quality of service (QoS) specification. |
| Performance thresholds | Performance thresholds are the maximum acceptable values for the metrics identified for your project, usually specified in terms of response time, throughput (transactions per second), and resource-utilization levels. Resource-utilization levels include the amount of processor capacity, memory, disk I/O, and network I/O that your application consumes. Performance thresholds typically equate to requirements. |
| Resource utilization | Resource utilization is the cost of the project in terms of system resources. The primary resources are processor, memory, disk I/O, and network I/O. |
| Response time | Response time is a measure of how responsive an application or subsystem is to a client request. |
| Saturation | Saturation refers to the point at which a resource has reached full utilization. |
| Scalability | Scalability refers to an application’s ability to handle additional workload, without adversely affecting performance, by adding resources such as processor, memory, and storage capacity. |
| Scenarios | In the context of performance testing, a scenario is a sequence of steps in your application. A scenario can represent a use case or a business function such as searching a product catalog, adding an item to a shopping cart, or placing an order. |
| Smoke test | A smoke test is the initial run of a performance test to see if your application can perform its operations under a normal load. |
| Spike test | A spike test is a type of performance test focused on determining or validating performance characteristics of the product under test when subjected to workload models and load volumes that repeatedly increase beyond anticipated production operations for short periods of time. Spike testing is a subset of stress testing. |
| Stability | In the context of performance testing, stability refers to the overall reliability, robustness, functional and data integrity, availability, and/or consistency of responsiveness for your system under a variety of conditions. |
| Stress test | A stress test is a type of performance test designed to evaluate an application’s behavior when it is pushed beyond normal or peak load conditions. The goal of stress testing is to reveal application bugs that surface only under high load conditions. These bugs can include such things as synchronization issues, race conditions, and memory leaks. Stress testing enables you to identify your application’s weak points, and shows how the application behaves under extreme load conditions. |
| Throughput | Throughput is the number of units of work that can be handled per unit of time; for instance, requests per second, calls per day, hits per second, reports per year, etc. |
| Unit test | In the context of performance testing, a unit test is any test that targets a module of code where that module is any logical subset of the entire existing code base of the application, with a focus on performance characteristics. Commonly tested modules include functions, procedures, routines, objects, methods, and classes. Performance unit tests are frequently created and conducted by the developer who wrote the module of code being tested. |
| Utilization | In the context of performance testing, utilization is the percentage of time that a resource is busy servicing user requests. The remaining percentage of time is considered idle time. |
| Validation test | A validation test compares the speed, scalability, and/or stability characteristics of the product under test against the expectations that have been set or presumed for that product. |
| Workload | Workload is the stimulus applied to a system, application, or component to simulate a usage pattern, in regard to concurrency and/or data inputs. The workload includes the total number of users, concurrent active users, data volumes, and transaction volumes, along with the transaction mix. For performance modeling, you associate a workload with an individual scenario. |
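Several of the terms defined above reduce to simple arithmetic. The following Python sketch works through throughput, utilization, and latency as defined in this section. The function names and all of the input numbers are assumptions made for the example.

```python
def throughput(units_of_work, elapsed_s):
    """Units of work handled per second over the measurement interval."""
    return units_of_work / elapsed_s


def utilization_pct(busy_s, elapsed_s):
    """Percentage of the interval during which the resource was busy."""
    return busy_s / elapsed_s * 100.0


def total_latency_ms(subtask_latencies_ms):
    """Latency of a request expressed as the sum of its subtask latencies."""
    return sum(subtask_latencies_ms)


# Assumed observations from a 60-second measurement window.
requests_completed = 1200
window_s = 60
processor_busy_s = 45

rps = throughput(requests_completed, window_s)
busy = utilization_pct(processor_busy_s, window_s)
idle = 100.0 - busy

# Assumed breakdown of one request: network, application logic, database.
latency = total_latency_ms([50, 200, 100])

print(f"throughput: {rps} requests per second")
print(f"utilization: {busy}% busy, {idle}% idle")
print(f"latency: {latency} ms")
```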
Summary
Performance testing helps to identify bottlenecks in a system, establish a
baseline for future testing, support a performance tuning effort, and determine
compliance with performance goals and requirements. Including performance
testing very early in your development life cycle tends to add significant
value to the project.
For a performance testing project to be successful, the testing must be
relevant to the context of the project, which helps you to focus on the items
that are truly important.
If the performance characteristics are unacceptable, you will typically want
to shift the focus from performance testing to performance tuning in order to
make the application perform acceptably. You will likely also focus on tuning if
you want to reduce the amount of resources being used and/or further improve
system performance.
Performance, load, and stress tests are subcategories of performance testing,
each intended for a different purpose.
Creating a baseline against which to evaluate the effectiveness of
subsequent performance-improving changes to the system or application will
generally increase project efficiency.