Performance Testing Guidance for Web Applications
J.D. Meier, Carlos Farre, Prashant Bansode, Scott Barber, and Dennis Rea
Microsoft Corporation
September 2007
Objectives
- Learn the difference between concurrent users and user
sessions and why this is important when defining input for Web load tests.
- Learn how to identify individual usage scenarios.
- Learn about the metrics that will help in developing
realistic workload characterizations.
- Learn how to incorporate individual usage scenarios and
their variances into user groups.
- Learn how to identify and model special considerations
when blending groups of users into single models.
- Learn how to construct realistic workload models for Web
applications based on expectations, documentation, observation, log files,
and other data available prior to the release of the application to
production.
Overview
The most common purpose of Web load tests is to simulate the
user’s experience as realistically as possible. For performance testing to
yield results that are directly applicable to understanding the performance
characteristics of an application in production, the tested workloads must
represent a real-world production scenario. To create a reasonably accurate
representation of reality, you must understand the business context for the use
of the application, expected transaction volumes in various situations,
expected user path(s) by volume, and other usage factors. By focusing on groups
of users and how they interact with the application, this chapter demonstrates an
approach to developing workload models that approximate production usage based
on various data sources.
Testing a Web site in such a way that the test can reliably
predict performance is often more art than science. As critical as it is to
create load and usage models that will predict performance accurately, the
data necessary to create these models is typically not directly available to
the individuals who conduct the testing. When it is, it is typically not
complete or comprehensive.
While it is certainly true that simulating unrealistic
workload models can provide a team with valuable information when conducting
performance testing, you can only make accurate predictions about performance
in a production environment, or prioritize performance optimizations, when
realistic workload models are simulated.
How to Use This Chapter
Use this chapter to understand how to model workload
characterization, which can be used for performance testing to simulate
production characteristics. To get the most from this chapter:
- Use the “Approach for Modeling Application Usage” section to
get an overview of the approach for modeling workload characterization and
as a quick reference guide for you and your team.
- Use the various activity sections to understand the
details of the activities, and to find critical explanations of the
concepts of user behavior involved in workload modeling.
Approach for Modeling Application Usage
The process of identifying one or more composite application
usage profiles for use in performance testing is known as workload modeling.
Workload modeling can be accomplished in any number of ways, but to varying degrees
the following activities are conducted, either explicitly or implicitly, during
virtually all performance-testing projects that are successful in predicting or
estimating performance characteristics in a production environment:
- Identify the objectives.
- Identify key usage scenarios.
- Determine navigation paths for key scenarios.
- Determine individual user data and variances.
- Determine the relative distribution of scenarios.
- Identify target load levels.
- Prepare to implement the model.
These activities are discussed in detail in the following
sections.
Identify the Objectives
The objectives of creating a workload model typically center
on ensuring the realism of a test, or on designing a test to address a specific
requirement, goal, or performance-testing objective. (For more information, see
Chapter 9 – Determine Performance Testing Objectives and Chapter 10 – Quantify
End-User Response Time Goals.) When identifying the objectives, work with
targets that will satisfy the stated business requirements. Consider the
following key questions when formulating your objectives:
- What is the current or predicted business volume over
time? For example, how many orders are typically placed in a given time
period, and what other activities (number of searches, browsing, logging in,
and so on) support order placement?
- How is the business volume expected to grow over time?
Your projection should take into account future needs such as business
growth, possible mergers, introduction of new products, and so on.
- What is the current or predicted peak load level? This
projection should reflect activities that support sales and other critical
business processes, such as marketing campaigns, newly shipped products,
time-sensitive activities such as stock exchange transactions dependent on
external markets, and so on.
- How quickly do you expect peak load levels to be reached?
Your prediction should take into consideration unusual surges in business
activity — how fast can the organization adjust to the increased demand
when such an event happens?
- How long do the peak load levels continue? That is, how
long does the new demand need to be sustained before exhaustion of a
resource compromises the service level agreements (SLAs)? For example, an
economic announcement may cause the currency-exchange market to experience
prolonged activity for two or three days, as opposed to just a few hours.
This information can be gathered from Web server logs,
marketing documentation reflecting business requirements, or stakeholders. The
following are some of the objectives identified during this process:
- Ensure that one or more models represent the peak expected
load of X orders being processed per hour.
- Ensure that one or more models represent the difference
between “quarterly close-out” period usage patterns and “typical business
day” usage patterns.
- Ensure that one or more models represent
business/marketing projections for up to one year into the future.
It is acceptable if these objectives only make sense in the
context of the project at this point. The remaining activities will help you
fill in the necessary details to achieve the objectives.
Considerations
Consider the following key points when identifying
objectives:
- Throughout the process of creating workload models,
remember to share your assumptions and drafts with the team and solicit
their feedback.
- Do not get overly caught up in striving for perfection,
and do not fall into the trap of oversimplification. In general, it is a
good idea to start executing tests when you have a testable model and then
enhance the model incrementally while collecting results.
Determine Key Usage Scenarios
To simulate every possible user task or activity in a
performance test is impractical, if not a sheer impossibility. As a result, no
matter what method you use to identify key scenarios, you will probably want to
apply some limiting heuristic to the number of activities or key scenarios you
identify for performance testing. You may find the following limiting
heuristics useful:
- Include contractually obligated usage scenario(s).
- Include usage scenarios implied or mandated by performance
testing goals and objectives.
- Include most common usage scenario(s).
- Include business-critical usage scenario(s).
- Include performance-intensive usage scenario(s).
- Include usage scenarios of technical concern.
- Include usage scenarios of stakeholder concern.
- Include high-visibility usage scenarios.
The following information sources are frequently useful in
identifying usage scenarios that fit into the categories above:
- Requirements and use cases
- Contracts
- Marketing material
- Interviews with stakeholders
- Information about how similar applications are used
- Observing and asking questions of beta-testers and
prototype users
- Your own experiences with how similar applications are
used
If you have access to Web server logs for a current
implementation of the application ― whether it is a production
implementation of a previous release, a representative prototype, or a beta
release ― you can use data from those logs to validate and/or enhance the
data collected using the resources above.
After you have collected a list of what you believe are the
key usage scenarios, solicit commentary from the team members. Ask what they
think is missing, what they think can be de-prioritized, and, most importantly,
why. What does not seem to matter to one person may still be critical to
include in the performance test. This is due to potential side effects that
activity may have on the system as a whole, and the fact that the individual
who suggests that the activity is unimportant may be unaware of the
consequences.
Considerations
Consider the following key points when determining key usage
scenarios:
- Whenever you test a Web site with a significant amount of
new features/functionality, use interviews. By interviewing the
individuals responsible for selling/marketing the new features, you will
find out what features/functions will be expected and therefore most
likely to be used. By interviewing existing users, you can determine which
of the new features/functions they believe they are most likely to use.
- When testing a pre-production Web site, the best option is
to roll out a (stable) beta version to a group of representative users (roughly
10-20 percent of the size of the expected user base) and analyze the log
files from their usage of the site.
- Run simple in-house experiments using employees,
customers, clients, friends, or family members to determine, for example,
natural user paths and the page-viewing time differences between new and
returning users. This is a highly effective method of data
collection for Web sites that have never been live, as well as a way to
validate data collected by using other methods.
- Remember to ask about usage by various user types, roles,
or personas. It is frequently the case that team members will not remember
to tell you about the less common user types or roles if you do not
explicitly ask.
- Think about system users and batch processes as well as
human end users. For example, there might be a batch process that runs to
update the status of orders while users are performing activities in the
site. Be sure to account for those processes because they might be
consuming resources.
- For the most part, Web servers are very good at serving
text and graphics. Static pages with average-size graphics are probably
less critical than dynamic pages, forms, and multimedia pages.
Determine Navigation Paths for Key Scenarios
Now that you have a list of key scenarios, the next activity
is to determine how individual users actually accomplish the tasks or
activities related to those scenarios.
Human beings are unpredictable, and Web sites commonly offer
redundant functionality. Even with a relatively small number of users, it is
almost certain that real users will not only use every path you think they will
to complete a task, but they also will inevitably invent some that you had not
planned. Each path a user takes to complete an activity will put a different
load on the system. That difference may be trivial, or it may be enormous
― there is no way to be certain until you test it. There are many methods
to determine navigation paths, including:
- Identifying the user paths within your Web application
that are expected to have significant performance impact and that
accomplish one or more of the identified key scenarios
- Reading design and/or usage manuals
- Trying to accomplish the activities yourself
- Observing others trying to accomplish the activity without
instruction
After the application is released for unscripted user
acceptance testing, beta testing, or production, you will be able to determine
how the majority of users accomplish activities on the system under test by
evaluating Web server logs. It is always a good idea to compare your models
against reality and make an informed decision about whether to do additional
testing based on the similarities and differences found.
Apply the same limiting heuristics to navigation paths as
you did when determining which scenarios you wanted to include in your performance
simulation, and share your findings with the team. Ask what they think is
missing, what they think can be de-prioritized, and why.
Considerations
Consider the following key points when determining
navigation paths for key scenarios:
- Some users will complete more than one activity during a
visit to your site.
- Some users will complete the same activity more than once
per visit.
- Some users may not actually complete any activities during
a visit to your site.
- Navigation paths are often easiest to capture by using
page titles.
- If page titles do not work or are not intuitive for your
application, the navigation path may be easily defined by steps the user
takes to complete the activity.
- First-time users frequently follow a different path to
accomplish a task than users experienced with the application. Consider
this difference and what percentage of new versus return user navigation
paths you should represent in your model.
- Different users will spend different amounts of time on
the site. Some will log out, some will close their browser, and others
will leave their session to time out. Take these factors into account when
determining or estimating session durations.
- When discussing navigation paths with your team or others,
it is frequently valuable to use visual representations.
Example Visual Representation
Figure 12.1 Workload for Key Scenarios
Determine Individual User Data and Variances
No matter how accurate the model representing navigation
paths and usage scenarios is, it is not complete without accounting for the
data used by and the variances associated with individual users. While thinking
of users as interchangeable entities leads to tests being simpler to design and
analyze, and even makes some classes of performance issues easier to detect, it
masks much of the real-world complexity that your Web site is likely to
encounter in production. Accounting for and simulating this complexity is
crucial to finding the performance issues most likely to be encountered by real
users, as well as being an essential element to making any predictions or
estimations about performance characteristics in production.
The sections that follow detail some of the sources of
information from which to model individual user data and variances, and some of
the data and variances that are important to consider when creating your model
and designing your tests.
Web Site Metrics in Web Logs
For the purposes of this chapter, Web site metrics are the
variables that help you understand a site’s traffic and load patterns from the
server’s perspective. Web site metrics are generally averages that may vary
with the flow of users accessing the site, but they generally provide a
high-level view of the site’s usage that is helpful in creating models for
performance testing. These metrics ultimately reside in the Web server logs.
(There are many software applications that parse these logs to present these
metrics graphically or otherwise, but these are outside of the scope of this
chapter.) Some of the more useful metrics that can be read or interpreted from
Web server logs (assuming that the Web server is configured to keep logs)
include:
- Page views per period. A page view is a
page request that includes all dependent file requests (.jpg files, CSS
files, etc.). Page views can be tracked over hourly, daily, or weekly time
periods to account for cyclical patterns or bursts of peak user activity
on the Web site.
- User sessions per period. A user session is
the sequence of related requests originating from a user visit to the Web
site, as explained previously. As with page views, user sessions can span
hourly, daily, and weekly time periods.
- Session duration. This metric represents the
amount of time a user session lasts, measured from the first page request
until the last page request is completed. Session duration takes into
account the amount of time the user pauses when navigating from page to
page.
- Page request distribution. This metric represents
the distribution, in percentages, of page hits according to functional
types (Home, Login, Pay, etc.). The distribution percentages will
establish a weighting ratio of page hits based on the actual user
utilization of the Web site.
- Interaction speed. This metric represents the time
users take to transition between pages when navigating the Web site,
constituting the think time behavior. It is important to remember that
every user will interact with the Web site at a different rate.
- User abandonment. This metric represents the
length of time that users will wait for a page to load before growing
dissatisfied and exiting the site. Sessions that are abandoned are quite
normal on the Internet and consequently will have an impact on the load
test results.
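As a rough illustration of how some of these metrics might be derived, the following Python sketch computes page views per hour, session durations, and page request distribution from a handful of records. The record layout, the session identifiers, and the sample data are hypothetical; real Web server logs must first be parsed and grouped into sessions by whatever rule fits your site. Session duration is approximated here as the time between the first and last request timestamps.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical, already-parsed log records: (session_id, timestamp, page).
# These values are illustrative only and do not come from a real log.
SAMPLE_REQUESTS = [
    ("s1", datetime(2007, 9, 1, 9, 0, 0), "Home"),
    ("s1", datetime(2007, 9, 1, 9, 0, 40), "Login"),
    ("s1", datetime(2007, 9, 1, 9, 2, 0), "Pay"),
    ("s2", datetime(2007, 9, 1, 9, 30, 0), "Home"),
    ("s2", datetime(2007, 9, 1, 10, 5, 0), "Home"),
]


def summarize(requests):
    """Return (page views per hour, session durations in seconds,
    page request distribution in percent)."""
    views_per_hour = defaultdict(int)
    page_hits = defaultdict(int)
    first_seen, last_seen = {}, {}

    for session_id, ts, page in sorted(requests, key=lambda r: r[1]):
        # Bucket each page view into the hour in which it occurred.
        hour = ts.replace(minute=0, second=0, microsecond=0)
        views_per_hour[hour] += 1
        page_hits[page] += 1
        # Track the first and last request seen for each session.
        first_seen.setdefault(session_id, ts)
        last_seen[session_id] = ts

    durations = {
        s: (last_seen[s] - first_seen[s]).total_seconds() for s in first_seen
    }
    total_hits = sum(page_hits.values())
    distribution = {p: 100.0 * n / total_hits for p, n in page_hits.items()}
    return dict(views_per_hour), durations, distribution
```

Calling `summarize(SAMPLE_REQUESTS)` returns the three summaries as plain dictionaries, which can then be reviewed by hour, by session, or by page type.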
Determine the Relative Distribution of Scenarios
Having determined which scenarios to simulate and what the
steps and associated data are for those scenarios, and having consolidated
those scenarios into one or more workload models, you now need to determine how
often users perform each activity represented in the model relative to the
other activities needed to complete the workload model.
Sometimes one workload distribution is not enough. Research
and experience have shown that user activities often vary greatly over time. To
ensure test validity, you must validate that activities are evaluated according
to time of day, day of week, day of month, and time of year. As an example,
consider an online bill-payment site. If all bills go out on the 20th
of the month, the activity on the site immediately before the 20th
will be focused on updating accounts, importing billing information, and so on
by system administrators, while immediately after the 20th,
customers will be viewing and paying their bills until the payment due date of
the 5th of the next month. The most common methods for determining
the relative distribution of activities include:
- Extract the actual usage, load values, common and uncommon
usage scenarios (user paths), user delay time between clicks or pages, and
input data variance (to name a few) directly from log files.
- Interview the individuals responsible for
selling/marketing new features to find out what features/functions are
expected and therefore most likely to be used. By interviewing existing
users, you may also determine which of the new features/functions they
believe they are most likely to use.
- Deploy a beta release to a group of representative users (roughly
10-20 percent of the size of the expected user base) and analyze the log
files from their usage of the site.
- Run simple in-house experiments using employees,
customers, clients, friends, or family members to determine, for example,
natural user paths and the page-viewing time differences between new and
returning users.
- As a last resort, you can use your intuition, or best
guess, to make estimations based on your own familiarity with the site.
Teams and individuals use a wide variety of methods to
consolidate individual usage patterns into one or more collective models. Some
of those include spreadsheets, pivot tables, narrative text, Unified Modeling
Language (UML) collaboration diagrams, Markov Chain diagrams, and flow charts.
In each case the intent is to make the model as a whole easy to understand,
maintain, and communicate across the entire team.
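A relative distribution can also be kept as a simple table of percentages. The Python sketch below is one possible way to do this, not a prescribed method: it converts observed counts per scenario into percentages and then draws scenarios for simulated sessions in proportion to those percentages. The scenario names and counts are hypothetical.

```python
import random

# Hypothetical observed session counts per scenario (for example, taken
# from log analysis or a beta release); the numbers are illustrative only.
OBSERVED = {
    "Browse catalog": 450,
    "Search": 300,
    "Place order": 200,
    "Check order status": 50,
}


def to_percentages(counts):
    """Convert raw counts into a percentage distribution that sums to 100."""
    total = sum(counts.values())
    return {name: 100.0 * n / total for name, n in counts.items()}


def pick_scenarios(percentages, n, seed=0):
    """Draw n scenario names, weighted by their share of the distribution."""
    rng = random.Random(seed)  # fixed seed so repeated runs are comparable
    names = list(percentages)
    weights = [percentages[name] for name in names]
    return rng.choices(names, weights=weights, k=n)
```

For example, `pick_scenarios(to_percentages(OBSERVED), 1000)` yields a list of 1,000 scenario names whose mix approximates the observed distribution.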
One highly effective method is to create visual models of
navigation paths and the percentage of users you anticipate will perform each
activity. These models should be intuitive to the entire team, including end
users, developers, testers, analysts, and executive stakeholders. The key is to use
language and visual representations that make sense to your team without
extensive training. In fact, visual models are best when they convey their
intended meaning without the need for any training at all. After you create such
a model, it is valuable to circulate that model to both users and stakeholders
for review/comment. Following the steps taken to collect key usage scenarios, ask
the team members what they think is missing, what they think can be
de-prioritized, and why. Often, team members will simply write new percentages
on the visual model, making it very easy for everyone to see which activities have
achieved a consensus, and which have not.
Once you are confident that the model is appropriate for
performance testing, supplement that model with the individual usage data
collected for each navigation path during the “Determine Individual User Data
and Variances” activity, in such a way that the model contains all the data you
need to create the actual test.
Figure 12.2 Visual Model of Navigation Paths
Considerations
Consider the following key points when determining the
relative distribution of scenarios:
- Create visual models and circulate them to users and
stakeholders for review/comment.
- Ensure that the model is intuitive to non-technical users,
technical designers, and everyone in between.
- Because performance tests frequently consume large amounts
of test data, ensure that you include enough in your data files.
- Ensure that the model contains all of the supplementary
data necessary to create the actual test.
Identify Target Load Levels
A customer visit to a Web site comprises a series of related
requests known as a user session. Users with different behaviors who navigate
the same Web site are unlikely to cause overlapping requests to the Web server
during their sessions. Therefore, instead of modeling the user experience on
the basis of concurrent users, it is more useful to base your model on user
sessions. User sessions can be defined as a sequence of actions in a
navigational page flow, undertaken by a customer visiting a Web site.
Quantifying the Volume of Application Usage: Theory
It is frequently difficult to determine and express an
application’s usage volume because Web-based multi-user applications communicate
via stateless protocols. Although terms such as “concurrent users” and
“simultaneous users” are frequently used, they can be misleading when applied
to modeling user visits to a Web site. In Figures 12.3 and 12.4 below, each
line segment represents a user activity, and different activities are
represented by different colors. The solid black line segment represents the
activity “load the Home page.” User sessions are represented horizontally
across the graph. In this hypothetical representation, the same activity takes
the same amount of time for each user. The time elapsed between the Start of
Model and End of Model lines is one hour.
Figure 12.3 Server Perspective of User Activities
Figure 12.3 above represents usage volume from the perspective
of the server (in this case, a Web server). Reading the graph from top to
bottom and from left to right, you can see that user 1 navigates first to page
“solid black” and then to pages “white,” “polka dot,” “solid black,” “white,”
and “polka dot.” User 2 also starts with page “solid black,” but then goes to
pages “zebra stripe,” “grey,” etc. You will also notice that virtually any
vertical slice of the graph between the start and end times will reveal 10
users accessing the system, showing that this distribution is representative of
10 concurrent, or simultaneous, users. What should be clear is that the server
knows that 10 activities are occurring at any moment in time, but not how many
actual users are interacting with the system to generate those 10 activities.
Figure 12.4 below depicts
another distribution of activities by individual users that would generate the
server perspective graph above.
Figure 12.4 Actual Distribution of User Activities Over Time
In this graph, the activities of
23 individual users have been captured. Each of these users conducted some
activity during the time span being modeled, and their respective activities
can be thought of as 23 user sessions. Each of the 23 users began interacting
with the site at a different time. There is no particular pattern to the order
of activities, except that all users started with the “solid
black” activity. These 23 users actually represent the exact same activities in
the same sequence shown in Figure 12.3. However, as depicted in Figure 12.4, at
any given time there are 9 to 10 concurrent users. The modeling of usage for
the above case in terms of volume can be thought of in terms of total hourly
users, or user sessions counted between “Start of Model” and “End of Model.”
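The difference between counting user sessions and counting concurrent users can be sketched in a few lines of Python. The session start and end times below are hypothetical and are not the data behind the figures; the function simply sweeps through start and end events to find the largest number of sessions active at the same moment.

```python
# Hypothetical sessions within a one-hour model, as (start_minute, end_minute).
SAMPLE_SESSIONS = [(0, 20), (20, 40), (40, 60), (10, 30), (30, 50)]


def peak_concurrency(sessions):
    """Return the maximum number of sessions that overlap at any instant."""
    events = []
    for start, end in sessions:
        events.append((start, 1))   # a session begins
        events.append((end, -1))    # a session ends
    # At the same instant, process ends (-1) before starts (+1) so that
    # back-to-back sessions are not counted as overlapping.
    events.sort(key=lambda e: (e[0], e[1]))

    current = peak = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak


total_sessions = len(SAMPLE_SESSIONS)          # sessions in the hour
peak_users = peak_concurrency(SAMPLE_SESSIONS) # most sessions active at once
```

With this sample data, five user sessions occur during the hour even though no more than two are ever active at the same time, which is why the two measures should not be used interchangeably.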
Without some degree of empirical data (for example, Web
server logs from a previous release of the application), target load levels are
exactly that — targets. These targets are most frequently set by the business,
based on its goals related to the application and whether those goals are
market penetration, revenue generation, or something else. These represent the
numbers you want to work with at the outset.
Quantifying the Volume of Application Usage
If you have access to Web server logs for a current
implementation of the application — whether it is a production implementation
of a previous release, a representative prototype, or a beta release — you can
use data from these logs to validate and/or enhance the data collected by using
the resources above. By performing a quantitative analysis on Web server logs,
you can determine:
- The total number of visits to the site over a period of
time (month/week/day).
- The volume of usage, in terms of total averages and peak
loads, on an hourly basis.
- The duration of sessions for total averages and peak loads
on an hourly basis.
- The total hourly averages and peak loads translated into
overlapping user sessions to simulate real scalability volume for the load
test.
- The business cycles or special events that result in
significant changes in usage.
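One common way to translate hourly session counts into overlapping sessions is Little's Law, which this chapter does not name and which is offered here only as a steady-state approximation: the average number of sessions in progress equals the arrival rate multiplied by the average session duration. The Python sketch below applies that relationship; the input figures in the usage note are hypothetical.

```python
def average_concurrent_sessions(sessions_per_hour, avg_session_seconds):
    """Estimate the average number of overlapping sessions.

    Assumes arrivals are spread roughly evenly over the hour; bursts of
    activity will produce higher short-term peaks than this average.
    """
    return sessions_per_hour * avg_session_seconds / 3600.0


def sessions_per_hour_for(concurrent_sessions, avg_session_seconds):
    """Inverse estimate: hourly sessions implied by a concurrency level."""
    return concurrent_sessions * 3600.0 / avg_session_seconds
```

For instance, 1,200 sessions per hour with an average duration of 300 seconds corresponds to roughly 100 overlapping sessions on average. Peak-hour figures should be run through the same calculation separately from the overall averages.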
The following are the inputs and outputs used for
determining target load levels.
Inputs
- Usage data extracted from Web server logs
- Business volume (both current and projected) mapping to
objectives
- Key scenarios
- Distribution of work
- Session characteristics (navigational path, duration,
percentage of new users)
Output
By combining the volume information with objectives, key
scenarios, user delays, navigation paths, and scenario distributions from the
previous steps, you can determine the remaining details necessary to implement
the workload model under a particular target load.
Integrating Model Variance
Because the usage models are “best guesses” until production
data becomes available, it is a good idea to create no fewer than three usage
models for each target load. This has the effect of adding a rough confidence
interval to the performance measurements. Instead of relying on the results
from one test based on many fallible assumptions, stakeholders can also see how
much inaccuracies in those assumptions are likely to affect the performance
characteristics of the application.
The three usage models that teams generally find most
valuable are:
- Anticipated Usage (the model or models you created in the
“Determine Individual User Data and Variances” activity)
- Best Case Usage, in terms of performance (that is,
weighted heavily in favor of low-performance cost activities)
- Worst Case Usage, in terms of performance (that is,
weighted heavily in favor of high-performance cost activities)
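One possible way to derive the best-case and worst-case models from the anticipated model is to re-weight each activity by an estimate of its relative performance cost. The Python sketch below shows that idea under stated assumptions: the activity names, percentages, cost figures, and the `bias` exponent are all hypothetical, and teams may prefer to adjust the percentages by hand instead.

```python
# Hypothetical anticipated distribution (percent of sessions per activity).
ANTICIPATED = {"Browse": 50.0, "Search": 30.0, "Checkout": 20.0}

# Hypothetical relative performance cost of each activity (higher = costlier).
RELATIVE_COST = {"Browse": 1.0, "Search": 2.0, "Checkout": 4.0}


def reweight(distribution, cost, bias):
    """Shift a distribution toward costly (bias > 0) or cheap (bias < 0)
    activities, then rescale so the percentages again sum to 100."""
    raw = {name: pct * cost[name] ** bias for name, pct in distribution.items()}
    total = sum(raw.values())
    return {name: 100.0 * value / total for name, value in raw.items()}


worst_case = reweight(ANTICIPATED, RELATIVE_COST, 1.0)   # favors costly activities
best_case = reweight(ANTICIPATED, RELATIVE_COST, -1.0)   # favors cheap activities
```

A `bias` of zero returns the anticipated model unchanged, so the same function can produce all three models for a given target load.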
The following chart is an example of the information that
testing for all three of these models can provide. As you can see, in this
particular case the Anticipated Usage and Best Case Usage resulted in similar
performance characteristics. However, the Worst Case Usage showed that there is
nearly a 50-percent drop-off in the total load that can be supported between it
and the Anticipated Usage. Such information could lead to a reevaluation of the
usage model, or possibly to a decision to test with the Worst Case Usage model
moving forward as a kind of safety factor until empirical data becomes
available.
Figure 12.5 Usage Models
Considerations
Consider the following key points when identifying target
load levels:
- Although the volumes resulting from the activities above
may or may not end up correlating to the loads the application will
actually encounter, the business will want to know if and how well the
application as developed or deployed will support its target loads.
- Because the workload models you have constructed represent
the frequency of each activity as a percentage of the total load, you
should not need to update your models after determining target load
levels.
- Although it frequently is the case that each workload
model will be executed at a variety of load levels and that the load level
is very easy to change at run time using most load-generation tools, it is
still important to identify the expected and peak target load levels for
each workload model for the purpose of predicting or comparing with
production conditions. Changing load levels even slightly can sometimes
change results dramatically.
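Because the workload model expresses each activity as a percentage of the total load, applying it to a target load level is a matter of multiplication. The Python sketch below illustrates this with a hypothetical distribution and hypothetical expected and peak targets; none of these numbers come from the chapter.

```python
# Hypothetical workload model: percent of sessions per scenario.
WORKLOAD_MODEL = {"Browse": 50.0, "Search": 30.0, "Checkout": 20.0}

# Hypothetical target load levels, in user sessions per hour.
TARGET_LOADS = {"expected": 1000, "peak": 2500}


def sessions_per_scenario(distribution, target_sessions_per_hour):
    """Apply a percentage distribution to one target load level.

    Results are rounded to whole sessions, so the parts may differ from
    the target by a session or two when percentages do not divide evenly.
    """
    return {
        name: round(target_sessions_per_hour * pct / 100.0)
        for name, pct in distribution.items()
    }


# The same model is reused unchanged at every load level.
plan = {
    level: sessions_per_scenario(WORKLOAD_MODEL, load)
    for level, load in TARGET_LOADS.items()
}
```

Only the target load changes between the two entries in `plan`; the distribution itself stays the same, which is the property described in the considerations above.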
Prepare to Implement the Model
Implementation of the workload model as an executable test
is tightly tied to the implementation method — typically, creating scripts in a
load-generation tool. For more information about implementing and validating a
test, see Chapter 14 – Test Execution.
Considerations
Consider the following key points when preparing to
implement the model:
- Do not change your model without serious consideration
simply because the model is difficult to implement in your tool.
- If you cannot implement your model as designed, ensure
that you record the details about the model you do implement.
- Implementing the model frequently includes identifying
metrics to be collected and determining how to collect those metrics.
Summary
When conducting performance testing with the intent of
understanding, predicting, or tuning production performance, it is crucial that
test conditions be similar or at least close to production usage or projected
future business volume.
For accurate, predictive test results, the model of user
behavior must represent customer sessions based on page flow, frequency of hits,
the length of time that users pause between pages, and any other factor specific
to how users interact with your Web site.