Performance Testing Guidance for Web Applications
J.D. Meier, Carlos Farre, Prashant Bansode, Scott Barber, and Dennis Rea
Microsoft Corporation
September 2007
Objectives
- Learn how to apply principles of effective reporting to
performance test data.
- Learn when to share technical results versus produce
stakeholder reports.
- Learn what questions various team members expect performance
reports to answer.
Overview
Managers and stakeholders need more than simply the results
from various tests — they need conclusions based on those results, and
consolidated data that supports those conclusions. Technical team members also
need more than just results — they need analysis, comparisons, and details of how
the results were obtained. Team members of all types get value from performance
results being shared more frequently. In this chapter, you will learn how to
satisfy the needs of all the consumers of performance test results and data by
employing a variety of reporting and results-sharing techniques, and by learning
exemplar scenarios where each technique tends to be well received.
How to Use This Chapter
Use this chapter to understand the principles of effective performance
test results reporting, and as a reference for exemplars of effective data
presentation. To get the most from this chapter:
- Use the “Principles of Effective Reporting” section to
understand the key concepts and principles behind effective reporting.
- Use the “Frequently Reported Performance Data” section to
learn about various ways that performance data can be presented and the types
of results to which those methods are most effectively applied.
- Use the “Questions to Be Answered by Reporting” section to
understand how reports are designed for various audiences, and how to deliver
the right information to the right audience in a format that they find
intuitive.
Principles of Effective Reporting
The key to effective reporting is to present information of
interest to the intended audience in a quick, simple, and intuitive manner. The
following are some of the underlying principles of effective reporting:
- Report early, report often
- Report visually
- Report intuitively
- Use the right statistics
- Consolidate data correctly
- Summarize data effectively
- Customize reports for the intended audience
- Use concise verbal summaries
- Make the data available
Report Early, Report Often
Continual sharing of information and data is critical to the
efficiency and overall success of a performance-testing project. However, not
all of the information and data to be shared needs to take the form of a formal
or semiformal report. One effective approach is to send stakeholders summary
charts and tables every day or two in an e-mail message that contains a concise
statement of key points. Use the feedback and questions you receive from those
stakeholders when deciding what to put in the next formal or semiformal report.
In this way you can gauge the needs of your audience before writing what is
intended to be a stand-alone or final document.
Sharing information and data with the technical team can be an
even more straightforward process. It may be as simple as posting the location
of the new results files to a team wiki before you begin analyzing them, and
then posting links to any charts and graphs that derive from your analysis.
Report Visually
Most people find that data and statistics reported in a graphical
format are easier to digest. This is especially true of performance results
data, where the volume of data is frequently very large and most significant
findings result from detecting patterns in the data. It is possible to find
these patterns by scanning through tables or by using complex mathematical
algorithms, but the human eye is far quicker and more accurate in the vast
majority of cases.
Once a pattern or “point of interest” has been identified
visually, you will typically want to isolate that pattern by removing the
remaining “chart noise.” In this context, chart noise includes all of the data points
representing activities and time slices that contain no points of interest
(that is, the ones that look like you expect them to). Removing the chart noise
enables you to evaluate the pattern you are interested in more clearly, and
makes reports clearer.
Report Intuitively
Whether formal or informal, reports should be able to stand
on their own. If a report leaves the reader with questions as to why the
information is important, the report has failed. While reports do not need to provide
the answers to issues to be effective, the issues should be quickly and
intuitively clear from the presentation.
One method to validate the intuitiveness of a report is to
remove all labels or identifiers from charts and graphs and all identifying
information from narratives and then present the report to someone unfamiliar
with the project. If that person is able to quickly and correctly point to the
issue of concern in the chart or graph, or identify why the issue discussed in
the narrative is relevant, then you have created an intuitive report.
Use the Right Statistics
Even though there is a widespread need to understand many
statistical concepts, many software developers, testers, and managers either do
not have strong backgrounds in or do not enjoy statistics. This can lead to
significant misrepresentations of performance test results when reporting. If
you are not sure what statistics to use to highlight a particular issue, do not
hesitate to ask for assistance.
Consolidate Data Correctly
While it is not strictly necessary to consolidate results,
it tends to be much easier to demonstrate patterns in results when those
results are consolidated into one or two graphs rather than distributed over
dozens. That said, it is important to remember that only results from identical
test executions that are statistically similar can be consolidated into
performance report output tables and charts.
Additional Considerations
In order for results to be consolidated, both the test and
the test environment must be identical, and the test results must be
statistically equivalent. One approach to determining if results are similar
enough to be consolidated is to compare results from at least five test
executions and apply the following rules:
- If more than 20 percent (that is, more than one out of five) of the test
execution results appear not to be similar to the rest, something
is generally wrong with the test environment, the application, or the test
itself.
- If a 95th percentile value for any test
execution is greater than the maximum or less than the minimum value for
any of the other test executions, it is not statistically similar.
- If every page/timer result in a test execution is
noticeably higher or lower on the chart than the results of all the rest
of the test executions, it is not statistically similar.
- If a single page/timer result in a test execution is
noticeably higher or lower on the chart than all the rest of the test
execution results, but the results for all the rest of the pages/timers in
that test execution are not, the test executions are probably
statistically similar.
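The 95th-percentile rule and the 20 percent rule above can be sketched in code. This is a minimal illustration, not a definitive method: the function names, the nearest-rank percentile convention, the "dissimilar to more than half of the other executions" vote used to single out an odd execution, and the sample data are all assumptions made for this example.

```python
def percentile_95(values):
    """95th percentile by the nearest-rank convention (an assumption; others exist)."""
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # ceiling of 0.95 * n, 1-based
    return ordered[rank - 1]

def similar(run_a, run_b):
    """Apply the 95th-percentile rule in both directions for one pair of runs."""
    for this, other in ((run_a, run_b), (run_b, run_a)):
        p95 = percentile_95(this)
        if p95 > max(other) or p95 < min(other):
            return False
    return True

def review_runs(runs):
    """Return (odd_runs, environment_suspect) for a dict of run name -> times."""
    names = list(runs)
    odd = []
    for name in names:
        others = [n for n in names if n != name]
        mismatches = sum(not similar(runs[name], runs[n]) for n in others)
        # Assumption: a run is "odd" when it disagrees with most other runs.
        if mismatches > len(others) / 2:
            odd.append(name)
    # More than 20 percent of runs being odd suggests a problem with the
    # environment, the application, or the test itself.
    environment_suspect = len(odd) / len(names) > 0.20
    return odd, environment_suspect

# Hypothetical response times in seconds: four similar runs and one slow run.
runs = {f"run{i}": [1.0 + 0.01 * i + 0.05 * k for k in range(20)] for i in range(1, 5)}
runs["run5"] = [3.0 + 0.05 * k for k in range(20)]

odd, suspect = review_runs(runs)
print("Exclude from consolidation:", odd)
print("Investigate environment or test:", suspect)
```

The two chart-based rules rely on visual judgment ("noticeably higher or lower") and are deliberately left out of the sketch.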
Summarize Data Effectively
Summarizing results frequently makes it much easier to
demonstrate meaningful patterns in the test results. Summary charts and tables
present data from different test executions side by side so that trends and
patterns are easy to identify. The overall point of these tables and charts is
to show team members how the test results compare to the performance goals of
the system so they can make important decisions about taking the system live,
upgrading the system, or even, in some cases, completely reevaluating the
project.
Additional Considerations
Keep the following key points in mind when summarizing test
data:
- Use charts and tables that make your findings clear.
- Use text to supplement tables and charts, not the other
way around.
- If a chart or table is confusing to the reader, don’t use
it.
Customize Reports for the Intended Audience
Performance test results are most commonly read by one of
three audiences: technical team members, non-technical team members, and
stakeholders outside of the core team. These three groups tend to look for very
different things in a performance report and are inclined to prefer different
presentation methods. When reporting, make sure that you identify which group
or groups you are reporting to and what their expectations are before deciding
on the best way to present the results you have collected.
Use Concise Verbal Summaries
Results should have at least a short verbal summary
associated with them, and some results are best or most easily presented in
writing alone. What you decide to include in that text depends entirely on your
intended audience. Some audiences may require just one or two sentences capturing
the key point(s) you are trying to make with the graphic. For example:
“From observing this graph, you can see that the system
under test meets all stated performance goals up to 150 hourly users but at
that point degrades quickly to an essentially unusable state.”
Other audiences may also require a detailed explanation of
the graph being presented. For example:
“In this graph, you see the average response time in
seconds, portrayed vertically on the left side of the graph, plotted against
the total number of hourly users simulated during each test execution,
portrayed horizontally along the bottom of the graph. The intersection points
depict ”
Make the Data Available
There is a disturbingly popular belief that performance
testing (or other testing) data should not be shared in its raw form out of
fear that the consumers of that data will use or analyze it improperly. While
this concern is not invalid, of much greater concern is the fact that it is
simply not reasonable to expect any one person or team to be able to extract
all of the value from a set of data at one point in time. Data provides
different value to different people at different times, and the only way to get
the most out of the data is to make that data continually available to the
team. Additionally, making the data available tends to minimize some people’s perception
that the performance results are simply fabrications based on a set of tools
and processes that they do not understand.
Frequently Reported Performance Data
The following are the most frequently reported types of
results data. The sections that follow describe what makes this data
interesting to whom, as well as considerations for reporting that type of data.
- End-user response times
- Resource utilizations
- Volumes, capacities, and rates
- Component response times
- Trends
End-user Response Times
End-user response time is by far the most commonly requested
and reported metric in performance testing. If you have captured goals and
requirements effectively, this is a measure of presumed user satisfaction with
the performance characteristics of the system or application. Stakeholders are
interested in end-user response times to judge the degree to which users will
be satisfied with the application. Technical team members are interested
because they want to know if they are achieving the overall performance goals
from a user’s perspective, and if not, in what areas those goals are not being met.
Exemplar 1
Figure 16.1 Response Time
Exemplar 2
Figure 16.2 Response Time Degradation
Considerations
Even though end-user response times are the most commonly
reported performance-testing metric, there are still important points to
consider.
- Eliminate outliers before reporting. Even one
legitimate outlier can dramatically skew your results.
- Ensure that the statistics are clearly communicated.
The difference between an average and a 90th percentile, for
example, can easily be the difference between “ship it” and “fix it.”
- Report abandonment separately. If you are
accounting for user abandonment, the collected response times for
abandoned pages may not represent the same activity as non-abandoned
pages. To be safe, report response times for non-abandoned pages with an
end-user response time graph and response times and abandonment percentages
by page on a separate graph or table.
- Report every page or transaction separately. Even
though some pages may appear to represent an equivalence class, there
could be differences that you are unaware of.
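To make the points about outliers and clearly communicated statistics concrete, the following sketch computes several statistics for the same set of response times. It is illustrative only: the `summarize` function, the nearest-rank percentile convention, and the sample values are assumptions, not data from an actual test.

```python
import statistics

def summarize(times):
    """Summarize response times (seconds) with several statistics side by side."""
    ordered = sorted(times)
    # Nearest-rank 90th percentile; other percentile conventions exist.
    rank = (90 * len(ordered) + 99) // 100
    return {
        "average": statistics.mean(ordered),
        "median": statistics.median(ordered),
        "p90": ordered[rank - 1],
        "max": ordered[-1],
    }

# Hypothetical response times for one page, including a single 30-second outlier.
page_times = [1.8, 2.0, 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 2.9, 30.0]
summary = summarize(page_times)
for name, value in summary.items():
    print(f"{name}: {value:.2f} s")
```

With this sample, the single outlier pulls the average well above the 90th percentile, which is why a report should state exactly which statistic each number represents.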
Resource Utilizations
Resource utilizations are the second most requested and
reported metrics in performance testing. Most frequently, resource utilization
metrics are reported verbally or in a narrative fashion. For example, “The CPU
utilization of the application server never exceeded 45 percent. The target is
to stay below 70 percent.” It is generally valuable to report resource
utilizations graphically when there is an issue to be communicated.
Exemplar for Stakeholders
Figure 16.3 Processor Time
Exemplar for Technical Team Members
Figure 16.4 Processor Time and Queue
Additional Considerations
Points to consider when reporting resource utilizations
include:
- Know when to report all of the data and when to
summarize. Very often, simply reporting the peak value for a
monitored resource during the course of a test is adequate. Unless an
issue is detected, the report only needs to demonstrate that the correct
metrics were collected to detect the issue if it were present during the
test.
- Overlay resource utilization metrics with other load
and response data. Resource utilization metrics are most powerful
when presented on the same graph as load and/or response time data. If
there is a performance issue, this helps to identify relationships across
various metrics.
- If you decide to present more than one data point,
present them all. Resource utilization rates will often change
dramatically from one measurement to the next. The pattern of change
across measurements is at least as important as the current value. Moving
averages and trend lines obfuscate these patterns, which can lead to
incorrect assumptions and regrettable decisions.
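The effect of a moving average on spiky utilization data can be seen in a small sketch. The CPU samples, the window size of 5, and the `moving_average` helper are hypothetical; the 70 percent threshold echoes the example target quoted earlier in this section.

```python
def moving_average(samples, window):
    """Simple trailing moving average over a fixed-size window."""
    return [
        sum(samples[i - window + 1:i + 1]) / window
        for i in range(window - 1, len(samples))
    ]

# Hypothetical CPU utilization samples (percent) with two short spikes.
cpu = [20, 22, 21, 95, 23, 20, 22, 96, 21, 20]
smoothed = moving_average(cpu, window=5)

target = 70  # example target: stay below 70 percent
print("Raw peak:", max(cpu), "exceeds target:", max(cpu) > target)
print("Smoothed peak:", max(smoothed), "exceeds target:", max(smoothed) > target)
```

In this sample the raw data crosses the target twice, while the smoothed series never does, so a chart of only the moving average would hide the spikes.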
Volumes, Capacities, and Rates
Volume, capacity, and rate metrics are also frequently
requested by stakeholders, even though the implications of these metrics are often
more challenging to interpret. For this reason, it is important to report these
metrics in relation to specific performance criteria or a specific performance
issue. Some examples of commonly requested volume, capacity, and rate metrics
include:
- Bandwidth consumed
- Throughput
- Transactions per second
- Hits per second
- Number of supported registered users
- Number of records/items able to be stored in the database
Exemplar
Figure 16.5 Throughput
Additional Considerations
Points to consider when reporting volumes, capacities, and
rates include:
- Report metrics in context. Volume, capacity, and
rate metrics typically have little stand-alone value.
- Have test conditions and supporting data available.
While this is a good idea in general, it is particularly important with
volume, capacity, and rate metrics.
- Include narrative summaries with implications.
Again, while this is a good idea in general, it is virtually critical to
ensure understanding of volume, capacity, and rate metrics.
Component Response Times
Even though component response times are not reported to
stakeholders as commonly as end-user response times or resource utilization
metrics, they are frequently collected and shared with the technical team. These
response times help developers, architects, database administrators (DBAs), and
administrators determine what sub-part or parts of the system are responsible
for the majority of end-user response times.
Exemplar
Figure 16.6 Sequential Consecutive Database Updates
Additional Considerations
Points to consider when reporting component response times
include:
- Relate component response times to end-user activities.
Because it is not always obvious what end-user activities are impacted
by a component’s response time, it is a good idea to include those relationships
in your report.
- Explain the degree to which the component response time
matters. Sometimes the concern is that a component might become a
bottleneck under load because it is processing too slowly; at other times,
the concern is that end-user response times are noticeably degraded as a
result of the component. Knowing which of these conditions applies to your
project enables you to make effective decisions.
Trends
Trends are one of the most powerful but least-frequently
used data-reporting methods. Trends can show whether performance is improving
or degrading from build to build, or the rate of degradation as load increases.
Trends can help technical team members quickly understand whether the changes
they recently made achieved the desired performance impact.
Exemplar
Figure 16.7 Response Time Trends for Key Pages
Additional Considerations
Points to consider when reporting trends include:
- Trends typically do not add value until there are at
least three measurements. Sometimes trends cannot be effectively
detected until there are more than three measurements. Start creating your
trend charts with the first set of data, but be cautious about including
them in formal reports until you have collected enough data for there to
be an actual trend to report.
- Share trends with the technical team before including
them in formal reports. This is another generally good practice, but
it is particularly relevant to trends because developers, architects,
administrators, and DBAs often will have already backed out a change that
caused the trend to move in the wrong direction before you are able to
compile your report. In this case, you can decide that the trend report
is not worth including, or you can simply make an annotation describing
the cause and stating that the issue has already been resolved.
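The "at least three measurements" guideline can be expressed as a small check before a trend is reported. This is a sketch under stated assumptions: the `trend` function, its labels, and the sample response times are hypothetical, and a strictly rising or falling series is only one simple way to define a trend.

```python
def trend(measurements, minimum_points=3):
    """Classify a series of response times (seconds), oldest build first.

    Returns None when there are too few measurements to call it a trend.
    """
    if len(measurements) < minimum_points:
        return None
    deltas = [later - earlier for earlier, later in zip(measurements, measurements[1:])]
    if all(delta > 0 for delta in deltas):
        return "degrading"   # response time rises with every build
    if all(delta < 0 for delta in deltas):
        return "improving"   # response time falls with every build
    return "mixed"

# Hypothetical average response times for one key page across builds.
print(trend([2.0, 2.4]))            # too few points to report a trend
print(trend([2.0, 2.4, 3.1]))
print(trend([3.1, 2.4, 2.0, 1.9]))
```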
Questions to Be Answered by Reporting
Almost every team member has unique wants, needs, and
expectations when it comes to reporting data and results obtained through
performance testing. While this makes sharing information obtained through
performance testing challenging, knowing what various team members expect and
value in advance makes providing valuable information to the right people, at
the right level of detail and at the right time, much easier.
All Roles
Some questions that are commonly posed by team members
include:
- Is performance getting better or worse?
- Have we met the requirements/service level agreements (SLAs)?
- What reports are available?
- How frequently can I get reports?
- Can I get a report with more/less detail?
Executive Stakeholders
Executive stakeholders tend to have very specific reporting
needs and expectations that are often quite different from those of other team members.
Stakeholders tend to prefer information in small, digestible chunks that clearly
highlight the key points. Additionally, stakeholders like visual
representations of data that are intuitive at a glance, as well as “sound bite”–size
explanations of those visual representations. Finally, stakeholders tend to
prefer consolidated and summarized information on a less frequent (though not
significantly less frequent) basis than other team members. The following are
common questions that executive stakeholders want performance test reports to answer:
- Is this ready to ship?
- How will these results relate to production?
- How much confidence should I have in these results?
- What needs to be done to get this ready to ship?
- Is the performance testing proceeding as anticipated?
- Is the performance testing adding value?
Project-Level Managers
Project-level managers — including the project manager,
development lead or manager, and the test lead or manager — have all of the
same needs and questions as the executive stakeholders, except that they want
the answers more frequently and in more detail. Additionally, they commonly
want to know the following:
- Are performance issues being detected efficiently?
- Are performance issues being resolved efficiently?
- What performance testing should we be conducting that we
currently are not?
- What performance testing are we currently doing that is not
adding value?
- Are there currently any blockers? If so, what are they?
Technical Team Members
Although technical team members have some degree of interest
in all of the questions posed by managers and stakeholders, they are more
interested in receiving a continual flow of information related to test
results, monitored data, observations, and opportunities for analysis and
improvement. Technical team members tend to want to know the following:
- What do these results mean to my specialty/focus area?
- Where can I go to see the results for the last test?
- Where can I go to get the raw data?
- Can you capture metric X during the next test run?
Types of Results Sharing
In the most basic sense, there are three distinct types of
results sharing: raw data display, technical reports, and stakeholder reports. While
all are based on timely, accurate, and relevant communication of results,
observations, concerns, and recommendations, each type targets a different
audience, and the most effective methods of communicating data differ dramatically.
Raw Data Display
While not explicitly a reporting scenario, the sharing of
raw data for collaboration purposes involves many of the same principles of data
presentation that are applied to reports in order to improve the effectiveness
of the collaboration.
In general, most people would rather view data and
statistics in graphical form instead of in tables. In some cases, however, tables
are the most efficient way to show calculated results for all of the data. It
is recommended that you use tables sparingly in reports, while including the
tabular form of the data used to create charts and graphs as an appendix or
attachment to a report, so that interested stakeholders can refer to it.
Results from the following types of tests can be well
represented in a tabular format:
- Baseline
- Benchmark
- Scalability
- Any other user-experience–based test
Tables are an excellent way to present volumes of data in a
clean and orderly manner and to support the findings they ultimately lead to. However,
you should be careful not to overuse tables in your reports. Many people quickly
skip over tables and read only the surrounding text or examine only the charts
that go with them. Be certain that whether you use the tables discussed below
or other types, you present in your report only those tables that clearly make
an important point. Huge tables containing all of the supporting data may be of
interest to a few individuals, but not to most, and thus should be included only
in an appendix to a report. Raw data is most commonly shared in the following
formats:
- Spreadsheets
- Text files (and regular expression searches)
- Data collection tools (“canned” reports)
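As an illustration of searching text-file results with regular expressions, the sketch below pulls slow requests out of a plain-text results file. The line format, the `slow_requests` function, and the sample lines are assumptions made for this example; actual tools write their own formats.

```python
import re

# Hypothetical line format: "<timestamp> <page> <seconds>", for example
# "2007-09-01T10:00:03 /checkout 7.42". Real tools use their own formats.
LINE = re.compile(r"^(?P<timestamp>\S+)\s+(?P<page>\S+)\s+(?P<seconds>\d+(?:\.\d+)?)$")

def slow_requests(lines, threshold_seconds):
    """Return (page, seconds) pairs for lines whose time exceeds the threshold."""
    matches = []
    for line in lines:
        found = LINE.match(line.strip())
        if found is None:
            continue  # skip headers, blank lines, and anything unparseable
        seconds = float(found.group("seconds"))
        if seconds > threshold_seconds:
            matches.append((found.group("page"), seconds))
    return matches

sample = [
    "timestamp page seconds",
    "2007-09-01T10:00:01 /home 1.20",
    "2007-09-01T10:00:03 /checkout 7.42",
    "2007-09-01T10:00:04 /search 0.95",
]
slow = slow_requests(sample, threshold_seconds=5.0)
print(slow)
```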
Technical Reports
Technical reports are generally more formal than raw data
display, but not excessively so. Technical reports should stand on their own,
but since they are intended for technical members of the team who are currently
working on the project, they do not need to contain all of the supplemental
detail that a stakeholder report normally does. In the simplest sense,
technical reports are made up of the following:
- Description of the test, including workload model and test
environment
- Easily digestible data with minimal pre-processing
- Access to the complete data set and test conditions
- Short statements of observations, concerns, questions, and
requests for collaboration
Technical reports most commonly include data in the
following formats:
- Scatter plots
- Pareto charts
- Trend charts
- Summary spreadsheets
Stakeholder Reports
Stakeholder reports are the most formal of the performance
data sharing formats. These reports must be able to stand alone while at the
same time being intuitive to someone who is not working on the project in a
day-to-day technical role. Typically, these reports center on acceptance
criteria and risks. To be effective, stakeholder reports typically need to
include:
- The acceptance criteria to which the results relate
- Intuitive, visual representations of the most relevant
data
- A brief verbal summary of the chart or graph in terms of
criteria
- Intuitive, visual representations of the workload model
and test environment
- Access to associated technical reports, complete data sets,
and test conditions
- A summary of observations, concerns, and recommendations
When preparing stakeholder reports, consider that most stakeholder
reports meet with one (or more) of the following three reactions. All three are
positive in their own way but may not seem to be at first. These reactions and
some recommended responses follow:
- “These are great, but where’s the supporting data?”
This is the most common response from a technical stakeholder. Many
people and organizations want to have all of the data so that they can
draw their own conclusions. Fortunately, this is an easy question to
handle: simply include the entire spreadsheet with this supporting data as
an appendix to the report.
- “Very pretty, but what do they mean?” This is
where text explanations are useful. People who are not familiar with
performance testing or performance results often need to have the
implications of the results spelled out for them. Remember that more than
90 percent of the time, performance testers are the bearers of bad news
that the stakeholder was not expecting. The tester has the responsibility
to ensure that the stakeholder has confidence in the findings, as well as
to present this news in a constructive manner.
- “Terrific! This is exactly what I wanted! Don’t worry
about the final report — these will do nicely.” While this seems
like a blessing, do not take it as one. Sooner or later, your tables and
charts will be presented to someone who asks one of the two preceding
questions, or worse, asks how the data was obtained. If there is not at
least a final report that tells people where to find the rest of the data,
people will question the results because you are not present to answer
those questions.
Creating a Technical Report
Although six key components of a technical report are listed
below, all six may not be appropriate for every technical report. Similarly,
there may be additional information that should be included based on exactly
what message you are trying to convey with the report. While these six
components will result in successful technical reports most of the time,
remember that sometimes creativity is needed to make your message clear and
intuitive.
Consider including the following key components when
preparing a technical report:
- A results graph
- A table for single-instance measurements (e.g., maximum
throughput achieved)
- Workload model (graphic)
- Test environment (annotated graphic)
- Short statements of observations, concerns, questions, and
requests for collaboration
- References section
Exemplar Results Graph
Figure 16.8 Consolidated Statistics
Exemplar Tables for Single-Instance Measurements
Figure 16.9 Single Instance Measurements
Exemplar Workload Model Graphic
Figure 16.10 Workload Model
Exemplar Test Environment Graphic
Figure 16.11 Test Environment
Exemplar Summary Statement
“The results graph shows both response times and resource
utilization together. Close examination shows that Application Server CPU Usage
and queue length coincide with significantly degraded response time. It appears
as if the application server CPU usage was the catalyst to the degradation, but
this has yet to be confirmed. The remaining charts and graphs are included as
supplemental information for easy reference.”
Exemplar References Section
“Raw data and additional supporting information are
checked into the version-control system with the build and tagged as
PerfTest-{date}-{issue number}.”
Creating a Stakeholder Report
Although eight key components of a stakeholder report are
listed below, all eight may not be appropriate for every stakeholder report. Similarly,
there may be additional information that should be included based on exactly
what message you are trying to convey with the report. While these eight
components will result in successful stakeholder reports most of the time,
remember that sometimes creativity is needed to make your message clear and
intuitive.
Consider including the following key components when
preparing a stakeholder report:
- Criteria to which the results relate
- A results graph
- A table for single-instance measurements (e.g., maximum
throughput achieved)
- A brief verbal summary of the chart or graph in terms of
criteria
- Workload model (graphic)
- Test environment (annotated graphic)
- Summary of observations, concerns, and recommendations
- References section
Exemplar Criteria Statement
“This report relates to end-user response time
compliances as documented in the requirements management system as requirements
Perf### through Perf??? at one-half of the expected peak load with the most
commonly expected usage scenario.”
Exemplar Results Graph
Figure 16.12 Response Time Compliance Summary
Exemplar Tables for Single-Instance Measurements
Figure 16.13 Single Instance Measurements
Exemplar Criteria-Based Results Summary
“All metrics collected achieved their required values
except for the response times of pages 8 and 10.
- Page 10 failed to achieve its required value by 2
percent.
- Page 8 failed to achieve its required value by 38
percent.”
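A statement such as "failed to achieve its required value by 2 percent" implies a calculation that the exemplar does not spell out. One plausible reading, sketched below, is the amount by which the measured response time exceeds the required value, expressed as a percentage of the required value. The function name and the required and measured times are hypothetical, chosen only so that the output matches the percentages in the exemplar.

```python
def percent_over_requirement(measured, required):
    """Percentage by which a measured response time exceeds its required value."""
    return round((measured - required) / required * 100, 1)

# Hypothetical required and measured response times in seconds.
results = {"Page 8": (5.0, 6.9), "Page 10": (5.0, 5.1)}
for page, (required, measured) in results.items():
    over = percent_over_requirement(measured, required)
    status = "met" if over <= 0 else f"missed by {over:g} percent"
    print(f"{page}: {status}")
```

Whichever definition is used, stating it in the report avoids ambiguity about what "by 2 percent" means.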
Exemplar Workload Model Graphic
Figure 16.14 Workload Model
Exemplar Test Environment Graphic
Figure 16.15 Test Environment
Exemplar Observations and Recommendations Statement
“Based on the test conditions and results, the
performance testing and tuning team recommends the following.
- Continue performance testing with increasingly
strenuous scenarios and loads.
- Priority should be given to determining the root cause
of pages 8 and 10 not achieving their acceptance criteria, and
subsequently tuning those root causes.
- At such time as additional pages demonstrate a failure
to achieve their acceptance criteria, a dedicated root cause and tuning
cycle should be undertaken.”
Exemplar References Section
“All of the data used to create this report and execute
the tests that generated that data is checked into the version-control system
as read-only with the release candidate and tagged as PerfTest-{date}-{RC
number}-Validation.
“The same data has been temporarily copied to
{\\shared-resource\location} for individuals without access to the version-control
system.”
Summary
Performance test reporting is the process of presenting results
data that will support key technological and business decisions. The key to
creating effective reports is to consider the audience of the data before
determining how best to present the data. The most effective performance-test
results will present analysis, comparisons, and details behind how the results
were obtained, and will influence critical business decision-making.