Performance Testing Guidance for Web Applications
J.D. Meier, Carlos Farre, Prashant Bansode, Scott Barber, and Dennis Rea
Microsoft Corporation
September 2007
Objectives
- Learn how to determine realistic durations and
distribution patterns for user delay times.
- Learn how to incorporate realistic user delays into test
designs and test scripts.
- Learn about key variables to consider when defining
workload characterization.
- Learn about the elements of user behavior that will aid
with modeling the user experience when creating load tests.
Overview
This chapter describes the process of determining realistic individual
user delays, user data, and abandonment. For performance testing to yield
results that are directly applicable to understanding the performance
characteristics of an application in production, the tested workloads must
represent the real-world production environment. To create a reasonably
accurate representation of reality, you must model users with a degree of variability
and randomness similar to that found in a representative cross-section of
users.
How to Use This Chapter
Use this chapter to understand how to model variances such
as user delays, user data, and user abandonment so that your workload
characterization will create realistic usage patterns, thus improving the
accuracy of production simulations. To get the most from this
chapter:
- Use the “User Delay” section, along with the sections that
follow, to understand the key concepts of user delay modeling and its
impact on workload characterization.
- Use the “Determining Individual User Data” section to
understand the key concepts of user data and its impact on workload
characterization.
- Use the “User Abandonment” section to understand the key
concepts of user abandonment and its impact on workload characterization.
User Delays
The more accurately users are modeled, the more reliable
performance test results will be. One frequently overlooked aspect of accurate
user modeling is the modeling of user delays. This section explains how to
determine user delay times to be incorporated into your workload model and
subsequently into your performance scripts.
During a session, the user can be in a number of different
states — browsing, logging onto the system, and so on. Customers will have
different modes of interacting with the Web site; some users are familiar with
the site and quickly go from one page to another, while others take longer to
decide which action they will take. Therefore, characterizing user behavior
must involve modeling the customer sessions based on page flow, frequency of
hits, the amount of time users pause between viewing pages, and any other
factor specific to how users interact with your Web site.
Consequences of Improperly Modeling User Delays
To ensure realistic load tests,
any reasonable attempt at applying ranges and distributions is preferable to
ignoring the concept of varying user delays. Creating a load test in which
every user spends exactly the same amount of time on each page is simply not
realistic and will generate misleading results. For example, you can very
easily end up with results similar to the following.
Figure 13.1 Results for Using Static User
Delays
In case you are not familiar with
response graphs, each dot represents a user activity (in this case, a page
request); the horizontal axis shows the time, in seconds, from the start of the
test run; and individual virtual testers are listed on the vertical axis. This
particular response graph is an example of “banding” or “striping.” Banding
should be avoided when doing load or performance testing, although it may be
valuable as a stress test. From the server’s perspective, this test is the same
as 10 users executing the identical actions synchronously: Home page → wait x seconds → page1.
To put a finer point on it, hold a
ruler vertically against your screen and move it slowly across the graph from
left to right. This is what the server sees: no dots, no dots, no dots, lots of
dots, no dots. This is a very poor representation of actual user communities.
The following figure is a much
better representation of actual users, achieved by adding some small-range uniform
and normally distributed delays to the same test.
Figure 13.2 Results for Using Normally
Distributed User Delays
If you perform the same activity
with the ruler, you will see that the dots are more evenly distributed this
time, which dramatically increases both the realism of the simulated load and
the accuracy of the performance test results.
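The ruler exercise can also be approximated in a few lines of code. The following is a minimal sketch, not a load-testing tool: the function names, the 8-second static delay, and the 4-to-16-second range are assumptions chosen only for illustration. It counts how many distinct moments the server receives requests when 10 simulated users use static delays versus varied delays.

```python
import random

def request_times(num_users, num_pages, delay_fn):
    """Return, per user, the times (in seconds) at which each page is requested."""
    schedule = []
    for _ in range(num_users):
        t = 0.0
        times = []
        for _ in range(num_pages):
            times.append(round(t, 1))
            t += delay_fn()  # think time before the next page request
        schedule.append(times)
    return schedule

def distinct_instants(schedule):
    """Count the distinct moments at which the server receives any request."""
    return len({t for times in schedule for t in times})

rng = random.Random(42)  # fixed seed only so the sketch is repeatable

# Every user waits exactly 8 seconds between pages (assumed value).
static = request_times(10, 5, lambda: 8.0)
# Delays vary per user and per page (assumed range of 4 to 16 seconds).
varied = request_times(10, 5, lambda: rng.uniform(4.0, 16.0))

print(distinct_instants(static))  # 5: all ten users hit the server at the same five moments
print(distinct_instants(varied))  # many more: requests spread out over time
```

With static delays the server sees only five bursts of ten simultaneous requests, which is the banding shown in Figure 13.1.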
Step 1 – Determine User Delays
Delays that occur while users view content on Web pages —
also commonly known as think times — represent the answers to questions
such as “How long does it take a user to enter their login credentials?” and
“How much time will users spend reading this page?” You can use several different
methods to estimate think times associated with user activities on your Web
site. The best method, of course, is to use real data collected about your
production site. This is rarely possible, however, because testing generally
occurs before the site is released to production. This necessitates making
educated guesses or approximations regarding activity on the site.
The most commonly useful methods of determining think times include
the following:
- When testing a Web site that is already in production, you
can determine the actual values and distribution by extracting the average
and standard deviation for user viewing (or typing) time from the log file
for each page. With this information, you can easily determine the think
time for each page. Your production site may also have Web traffic–monitoring
software that provides this type of information directly.
- If you have no log files, you can run simple in-house
experiments using employees, customers, clients, friends, or family
members to determine, for example, the page-viewing time differences
between new and returning users. This type of simplified usability study
tends to be a highly effective method of data collection for Web sites
that have never been live, as well as a way to validate data collected by using
other methods.
- Time yourself using the site or performing similar
actions on a similar site. Obviously, this method is highly vulnerable to
personal bias, but it is a reasonable place to start until you get a
chance to time actual users during User Acceptance Testing (UAT) or
conduct your own usability study.
- In the absence of any better source of information, you
can leverage some of the metrics and statistics that have already been
collected by research companies such as Nielsen//NetRatings, Keynote, or
MediaMetrix. These statistics provide data on average page-viewing times
and user session duration based on an impersonal sample of users and Web
sites. Although these numbers are not from your specific Web site, they
can work quite well as first approximations.
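As an illustration of the log-file method, the following sketch computes the average and standard deviation of viewing time per page. It assumes the log has already been parsed into time-ordered (seconds, page) entries grouped by session; that structure, the function name, and the sample values are hypothetical. Note that the gap between two requests includes the page's response time, so treat the result as an approximation of think time.

```python
from collections import defaultdict
from statistics import mean, stdev

def think_time_stats(sessions):
    """sessions: dict mapping a session ID to a time-ordered list of (seconds, page) tuples.

    Returns a dict mapping each page to (average, standard deviation) in seconds.
    """
    gaps = defaultdict(list)
    for hits in sessions.values():
        # The gap between a request and the next one in the same session
        # approximates the time spent viewing (or typing on) the first page.
        for (t0, page), (t1, _next_page) in zip(hits, hits[1:]):
            gaps[page].append(t1 - t0)
    return {
        page: (mean(values), stdev(values) if len(values) > 1 else 0.0)
        for page, values in gaps.items()
    }

# Hypothetical sample data standing in for parsed log entries.
sample = {
    "s1": [(0, "/home"), (10, "/login"), (25, "/account")],
    "s2": [(3, "/home"), (17, "/login"), (29, "/account")],
}
stats = think_time_stats(sample)
for page, (avg, sd) in stats.items():
    print(f"{page}: average {avg:.1f} s, standard deviation {sd:.1f} s")
```

The last page of each session has no following request, so it produces no think-time sample.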
There is no need to spend a lot of time collecting
statistically significant volumes of data, or to be excessively precise. All
you really need to know is how long a typical user will spend performing an
activity, give or take a second or two. However, depending on the nature of
your site, you may want to determine user delay times separately for first-time
and experienced users.
Step 2 – Apply Delay Ranges
Simply determining how much time one person spends visiting
your pages, or what the variance in time between users is, is not enough in
itself — you must vary delay times by user. It is extremely unlikely that each
user will spend exactly the same amount of time on a page. It is also extremely
likely that conducting a performance test in which all users spend the same
amount of time on a page will lead to unrealistic or at least unreliable
results.
To convert the delay times or delay ranges from step 1 into
something that also represents the variability between users, the following three
pieces of information are required:
- The minimum delay time
- The maximum delay time
- The distribution or pattern of user delays between those
points
If you do not have a minimum and maximum value from your
analysis in step 1, you can apply heuristics as follows to determine acceptable
estimates:
- The minimum value could be:
- An experienced user who intended to go to the page but
will not remain there long (for example, a user who only needs the page
to load in order to scan, find, and click the next link).
- A user who realized that they clicked to the wrong page.
- A user who clicked through a form that had all of its
values pre-filled.
- The minimum length of time you think a user needs to type
the required information into the form.
- Half of the value that you determined was “typical.”
- The maximum value could be:
- Session time-out.
- Sufficient time for a user to look up information for a
form.
- No longer than it takes a slow reader to read the entire
page.
- The time it takes to read, three times out loud, the text
that users are expected to read. (This is the heuristic used by the film
industry for any onscreen text.)
- Double the value that you determined was “typical.”
Although you want your estimate to be relatively close to
reality, any range that covers ~75 percent of the expected users is sufficient
to ensure that you are not unintentionally skewing your results.
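The simplest of these heuristics (half and double the typical value) can be expressed as a small helper. This is a minimal sketch; the function name and parameters are assumptions, and any minimum or maximum you obtained in step 1 takes precedence over the heuristic.

```python
def delay_range(typical, minimum=None, maximum=None):
    """Return (min, max) delay in seconds.

    Falls back to half of the typical value for the minimum and
    double the typical value for the maximum when none is supplied.
    """
    low = typical / 2 if minimum is None else minimum
    high = typical * 2 if maximum is None else maximum
    if not low <= typical <= high:
        raise ValueError("typical delay must lie within the range")
    return low, high

# A page where a typical user spends about 8 seconds.
print(delay_range(8))              # (4.0, 16)
# The same page, capped by an assumed 30-second limit from step 1.
print(delay_range(8, maximum=30))  # (4.0, 30)
```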
Step 3 – Apply Distributions
There are numerous mathematical models for these types of
distributions. Four of these models cover the overwhelming majority of user
delay scenarios:
- Linear or uniform distribution
- Normal distribution
- Negative exponential distribution
- Double hump normal distribution
Linear or Uniform Distribution
A uniform distribution between a
minimum and a maximum value is the easiest to model. This distribution model
simply selects random numbers that are evenly distributed between the upper and
lower bounds of the range. This means that it is no more likely that the number
generated will be closer to the middle or either end of the range. The figure
below shows a uniform distribution of 1000 values generated between 0 and 25. Use
a uniform distribution in situations where there is a reasonably clear minimum
and maximum value, but you neither have nor expect to have a distinguishable pattern
between those end points.
Figure 13.3 Uniform Distribution
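A uniform delay can be generated with a standard random number function. The sketch below uses Python's random.uniform; the function name uniform_delay and the fixed seed are assumptions for illustration, and most load-testing tools provide an equivalent built-in.

```python
import random

def uniform_delay(minimum, maximum, rng=random):
    """Return a delay in seconds, equally likely to fall anywhere in the range."""
    return rng.uniform(minimum, maximum)

rng = random.Random(1)  # fixed seed only so the sketch is repeatable
delays = [uniform_delay(0, 25, rng) for _ in range(1000)]
print(min(delays), max(delays))
```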
Normal Distribution
A normal distribution, also known
as a bell curve, is more difficult to model but is more accurate in
almost all cases. This distribution model selects numbers randomly in such a
way that the frequency of selection is weighted toward the center, or average
value. The figure below shows a normal distribution of 1000 values generated
between 0 and 25 (that is, a mean of 12.5 and a standard deviation of 3.2).
Normal distribution is generally considered to be the most accurate
mathematical model of quantifiable measures of large cross-sections of people
when actual data is unavailable. Use a normal distribution in any situation
where you expect the pattern to be shifted toward the center of the end points.
The valid range of values for the standard deviation is from 0 (equivalent to a
static delay of the midpoint between the maximum and minimum values) to the
maximum value minus the minimum value (roughly equivalent to a uniform distribution).
If you have no way to determine the actual standard deviation, a reasonable
approximation is 25 percent of the range of the delay (that is, 0.25 times the range).
Figure 13.4 Normal Distribution
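One way to generate normally distributed delays that stay within the minimum and maximum is sketched below. The function name is an assumption, and so is the choice to redraw any value that falls outside the range (clamping to the end points is another option, but it piles extra values onto the boundaries). The default standard deviation follows the 25-percent-of-range approximation described above.

```python
import random

def normal_delay(minimum, maximum, std_dev=None, rng=random):
    """Return a delay in seconds, weighted toward the midpoint of the range."""
    mean = (minimum + maximum) / 2
    if std_dev is None:
        std_dev = 0.25 * (maximum - minimum)  # approximation when no data is available
    if std_dev == 0:
        return mean  # a standard deviation of 0 is a static delay at the midpoint
    while True:
        value = rng.gauss(mean, std_dev)
        if minimum <= value <= maximum:  # redraw values outside the range
            return value

rng = random.Random(1)  # fixed seed only so the sketch is repeatable
delays = [normal_delay(0, 25, std_dev=3.2, rng=rng) for _ in range(1000)]
print(sum(delays) / len(delays))  # close to the midpoint of the range
```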
Negative Exponential Distribution
Negative exponential distribution creates a distribution similar
to that shown in the graph below. This model skews the frequency of delay times
strongly toward one end of the range. This model is most useful for situations
such as users clicking a “play again” link that only activates after multimedia
content has completed playing. The following figure shows a negative
exponential distribution of 1000 values generated between 0 and 25.
Figure 13.5 Negative Exponential
Distribution
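A bounded negative exponential delay can be sketched with Python's random.expovariate. The function name, the redraw of values above the maximum, and the default average offset (one quarter of the range) are all assumptions for illustration; tune the average to match the shape you observe or expect.

```python
import random

def negative_exponential_delay(minimum, maximum, mean_offset=None, rng=random):
    """Return a delay in seconds, skewed strongly toward the minimum of the range."""
    if mean_offset is None:
        mean_offset = (maximum - minimum) / 4  # assumed default, not from measured data
    while True:
        # expovariate takes the rate, which is 1 divided by the desired average.
        value = minimum + rng.expovariate(1.0 / mean_offset)
        if value <= maximum:  # redraw the rare values beyond the maximum
            return value

rng = random.Random(1)  # fixed seed only so the sketch is repeatable
delays = [negative_exponential_delay(0, 25, rng=rng) for _ in range(1000)]
lower_half = sum(1 for d in delays if d < 12.5)
print(lower_half, len(delays) - lower_half)  # most delays fall in the lower half
```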
Double Hump Normal Distribution
The double hump normal
distribution creates a distribution similar to that shown in the graph below. To
understand when this distribution would be used, consider the first time you visit
a Web page that has a large amount of text. On that first visit, you will
probably want to read the text, but the next time you might simply click
through that page on the way to a page located deeper in the site. This is
precisely the type of user behavior this distribution represents. The figure below
shows that 60 percent of the users who view this page spend about 8 seconds on
the page scanning for the next link to click, and the other 40 percent of the
users actually read the entire page, which takes about 45 seconds. You can see
that both humps are normal distributions with different minimum, maximum, and
standard deviation values.
Figure 13.6 Double Hump Normal Distribution
To implement this pattern, simply write a snippet of code to
generate a number between 1 and 100 to represent a percentage of users. If that
number is below a certain threshold (in the graph above, below 61), call the
normal distribution function with the parameters to generate delays with the
first distribution pattern. If that number is at or above that threshold, call
the normal distribution function with the correct parameters to generate the
second distribution pattern.
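The steps above can be sketched as follows. The 61 threshold and the humps centered on about 8 and 45 seconds come from the example; the exact ranges (4 to 12 seconds and 30 to 60 seconds), the function names, and the 25-percent-of-range standard deviation are assumptions for illustration.

```python
import random

def bounded_normal(minimum, maximum, rng):
    """Normally distributed delay centered on the midpoint, redrawn if out of range."""
    mean = (minimum + maximum) / 2
    std_dev = 0.25 * (maximum - minimum)
    while True:
        value = rng.gauss(mean, std_dev)
        if minimum <= value <= maximum:
            return value

def double_hump_delay(rng, threshold=61, first=(4, 12), second=(30, 60)):
    """Pick one of two normal distributions based on a 1-100 roll."""
    roll = rng.randint(1, 100)  # inclusive on both ends
    if roll < threshold:
        return bounded_normal(*first, rng)   # users scanning for the next link
    return bounded_normal(*second, rng)      # users reading the entire page

rng = random.Random(1)  # fixed seed only so the sketch is repeatable
delays = [double_hump_delay(rng) for _ in range(1000)]
scanners = sum(1 for d in delays if d <= 12)
print(scanners / len(delays))  # roughly 0.6
```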
Determining Individual User Data
Once you have a list of key scenarios, you will need to
determine how individual users actually accomplish the tasks or activities
related to those scenarios, and the user-specific data associated with a user
accomplishing that task or activity.
Unfortunately, navigation paths alone do not provide all of
the information required to implement a workload simulation. To fully implement
the workload model, you need several more pieces of information. This
information includes:
- How long may users spend on a page?
- What data may need to be entered on each page?
- What conditions may cause a user to change navigation
paths?
Considerations
Consider the following key points when identifying unique
data for navigation paths and/or simulated users:
- Performance tests frequently consume large amounts of test
data. Ensure that you have enough data to conduct an effective test.
- Using the same data repeatedly will frequently lead to
invalid performance test results.
- Especially when designing and debugging performance tests,
test databases can become dramatically overloaded with data. Periodically
check to see if the database is storing unrealistic volumes of data for
the situation you are trying to simulate.
- Consider including invalid data in your performance tests.
For example, include some users who mistype their password on the first
attempt but do it correctly on a second try.
- First-time users usually spend significantly longer on
each page or activity than experienced users.
- The best possible test data is test data collected from a
production database or log file.
- Consider client-side caching. First-time users will be
downloading every object on the site, while frequent visitors are likely
to have many static objects and/or cookies stored in their local cache.
When capturing the uniqueness of the user’s behavior, consider whether
that user represents a first-time user or a user with an established
client-side cache.
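As one example of these considerations, the sketch below prepares unique credentials for each virtual user and has a small share of users mistype their password on the first attempt. The naming scheme, the 5 percent typo rate, and the way the typo is produced are all hypothetical; use data and rates that reflect your own application.

```python
import random

def login_attempts(password, typo_rate, rng):
    """Return the sequence of passwords a simulated user submits."""
    if rng.random() < typo_rate:
        # First attempt drops the last character; the second attempt is correct.
        return [password[:-1], password]
    return [password]

rng = random.Random(7)  # fixed seed only so the sketch is repeatable

# Unique data per virtual user, so the same record is not reused repeatedly.
users = [(f"user{i:04d}", f"Pw-{i:04d}!") for i in range(200)]
plans = {name: login_attempts(pw, typo_rate=0.05, rng=rng) for name, pw in users}

retries = sum(1 for attempts in plans.values() if len(attempts) == 2)
print(f"{retries} of {len(plans)} users mistype their password once")
```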
User Abandonment
User abandonment refers to situations where customers
exit the Web site before completing a task because of slow performance.
People have different rates of tolerance for performance, depending on their
psychological profile and the type of page they request. Failing to account for
user abandonment will cause loads that are highly unrealistic and improbable.
Load tests should simulate user abandonment as realistically as possible or
they may cause types of load that will never occur in real life — and create
bottlenecks that might never happen with real users. Load tests should report
the number of users that might abandon the Web site due to poor performance.
In a typical Web site traffic pattern, when the load gets
too heavy for the system/application to handle, the site slows down, causing
people to abandon it, thus decreasing the load until the system speeds back up
to an acceptable rate. Abandonment creates a self-policing mechanism that
restores performance to the levels seen before the overload occurred, even at
the cost of losing some customers. Therefore, one reason to accurately account
for user abandonment is to see just how many users “some” is. Another reason is
to determine the actual volume your application can maintain before you start
losing customers. Yet another reason to account for user abandonment is to
avoid simulating, and subsequently resolving, bottlenecks that realistically
might not even be possible.
If you do not account for abandonment at all, the load test
may wait indefinitely to receive the page or object it requested. When the test
eventually receives that object, even if “eventually” takes hours longer than a
real user would wait, the test will move on to the next object as if nothing
were wrong. If the request for an object simply is not acknowledged, the test
skips it and makes a note in the test execution log with no regard as to
whether that object was critical to the user. Note that there are some
cases where not accounting for abandonment is an accurate representation of
reality; for instance, a Web-based application that has been exclusively
created for an audience that has no choice but to wait because there are no
alternative methods of completing a required task.
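A very simple abandonment routine might look like the following sketch. It assumes each simulated user has a single patience value in seconds and leaves the site the first time a response takes longer than that; the function name, the patience value, and the response times are hypothetical, and real users' tolerance varies by person and by page type.

```python
def run_session(response_times, patience):
    """Walk through pages in order; stop when a response exceeds the user's patience.

    response_times: list of (page, seconds) in navigation order.
    Returns (pages completed, page on which the user abandoned or None).
    """
    completed = []
    for page, seconds in response_times:
        if seconds > patience:
            return completed, page  # the user gives up while waiting for this page
        completed.append(page)
    return completed, None

# Hypothetical measured response times for one navigation path.
path = [("home", 1.2), ("search", 9.0), ("checkout", 2.0)]
print(run_session(path, patience=8.0))   # (['home'], 'search')
print(run_session(path, patience=30.0))  # (['home', 'search', 'checkout'], None)
```

Setting the patience value very high reproduces the case described above, in which the audience has no choice but to wait.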
Considerations
The following are generally useful guidelines related to
user abandonment:
- Check the abandonment rate before evaluating response
times. If the abandonment rate for a particular page is less than about 2
percent, consider the possibility of those response times being outliers.
- Check the abandonment rate before drawing conclusions
about load. Remember, every user who abandons is one less user applying
load. Although the response-time statistics may look good, if you have
75-percent abandonment, the load applied is roughly 75 percent lighter than the
load you intended to test.
- If the abandonment rate is more than about 20 percent,
consider disabling the abandonment routine and re-executing the test to
help gain information about what is causing the problem.
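These checks are easy to automate when reviewing results. The sketch below applies the approximate 2 percent and 20 percent guidelines to the counts for one page or test run; the function name and the result structure are assumptions for illustration.

```python
def review_abandonment(started, abandoned):
    """Summarize abandonment for one page or test run using the guidelines above."""
    rate = abandoned / started
    notes = []
    if 0 < rate < 0.02:
        notes.append("low abandonment: the slow responses may be outliers")
    if rate > 0.20:
        notes.append("high abandonment: consider re-running with abandonment disabled")
    return {
        "rate": rate,
        "users_still_applying_load": started - abandoned,
        "notes": notes,
    }

# 750 of 1000 users abandoned, so only 250 users were still applying load.
print(review_abandonment(1000, 750))
```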
Summary
The process of designing realistic user delays into tests
and test scripts is critical for workload characterizations to generate
accurate results. For performance testing to yield results that are directly
applicable to understanding the performance characteristics of an application
in production or a projected future business volume, the tested workloads must
represent reality, replicating user delay patterns.
To create a reasonably accurate representation of reality, you
must model user delays, individual user data, and user abandonment with
variability and randomness similar to that found in a representative
cross-section of users.