Curt Devlin
Industry Architect—Solutions
Microsoft Corporation
developer & platform evangelism
February 2008
Summary: The Transaction Cost Analysis (TCA) methodology has proven its value
as a tool for capacity planning and system bottleneck detection of conventional
Web sites engaged in conventional business-to-consumer (B2C) scenarios. It is
possible to adapt TCA to yield similar benefits for software-as-a-service
(SaaS) capacity planning. This is a theoretical discussion about how and why
this methodology may need to be adapted to help answer some pressing
architectural questions that will inevitably confront organizations seeking to
build, run, consume, or monetize SaaS. This article also considers some of the
theoretical and practical limitations of TCA that must be addressed to
accurately forecast SaaS solution capacity. By recognizing and carefully
defining these limitations, we can stay within them and
resist the temptation to over-conclude from TCA results.
This article offers some guidance and
suggestions for adapting the TCA methodology to the specific requirements of
capacity planning for SaaS. If properly adapted, TCA can provide an important
weapon in the fight for accurate capacity planning and for mitigating some of
the risks associated with service-level assurance (SLA). When TCA was first
developed, B2C Web applications were the prevalent model for Internet services.
Many aspects of SaaS, such as B2B services and multi-tenancy, were relatively
uncommon during this period. Although SaaS offers some compelling advantages,
the tools and techniques previously used for driving load and measuring results
will need to be modified for testing these emerging service scenarios. To take
full advantage of TCA, we will need a few practical and theoretical adaptations
to achieve accurate forecasting.
Contents
The Origin of the Species
SaaS and Capacity Planning
SaaS Challenges
It Isn’t Easy Being Green
Revisiting TCA
Some Notes About Terminology
Beyond the Point of Exhaustion
Usage Profile, Verification, and Impact Prediction
Characterizing Load: Adapting Usage Profiles for SaaS
SaaS Composite Architecture and TCA Challenges
Performance Targets: Beware the Latent Latency
Not All Computers Are Created Equal
Probing the Engines of Cyberspace
Generating Load: Send in the Clones
Composition Architecture: TCA Over the Internet
A Habit of Highly Effective TCA: Sharpening the Tools
TCA As an Integral Development Tool
Conclusions
References
The Origin of the Species
Transaction Cost Analysis (TCA) is a fairly
rigorous and proven methodology for conducting capacity planning for commercial
Web sites. TCA is superior to conventional stress testing for this purpose
because it can provide a predictive model of capacity; this means it is capable
of accommodating unanticipated growth or change in usage patterns. A predictive
model is especially important for the multi-tenant SaaS model because it is
likely that different tenants will manifest different usage patterns—even for
the same services. In these cases, it will not be practically feasible to
conduct a new round of testing to determine how each new tenant on-boarding
will impact capacity.
The theoretical foundations of the TCA
methodology, not to be confused with the concept of TCA in economics, were worked
out at Microsoft by Morgan Oslake, Hilal Al-Hilali, David Guimbellot, Perry
Clarke, and David Howell as Capacity Model for Internet Transactions and were published in 1999. The
methodology has been used to publish capacity planning guidelines for some Microsoft
products, including Site Server and Exchange Server; later, it was used
for Commerce Server under the title of Commerce Server 2002 Performance and Transaction Cost Analysis for
the Catalog System. In addition, TCA has
been used by many other companies to study and predict capacity, detect
bottlenecks, and establish accurate hardware requirements for their own systems.
Although this methodology has been
criticized, most of the criticism (and some of the praise) is based on
fundamental misunderstandings of TCA and what it can yield. If it can be
successfully adapted, it can also be highly useful for capacity planning for SaaS
systems. However, reliable results can only be achieved if TCA is used for its
intended purpose and within its prescribed limitations.
SaaS and Capacity Planning
The idea of software-as-a-service (SaaS) is
a highly compelling one. In some respects, SaaS is a logical outgrowth and
extension of service-oriented architecture (SOA). To state the case naively, if
services are intended to be autonomous, contract-based, interoperable, and
accessible through HTTP, there should not be any obstacle to hosting them on
the Internet. SaaS, sometimes referred to as "services-in-the-cloud,"
is essentially the delivery of software in the form of services. SaaS can be
thought of as the service-oriented component of the Microsoft Software and
Services (S+S) approach. This article focuses on the SaaS component because it
poses the greatest challenges for TCA.
The idea of services-in-the-cloud promises
some compelling advantages—particularly for the enterprise. Perhaps most
importantly, this model promises to help drive down costs. For many companies,
the prospect of consuming services in the cloud promises to dramatically reduce
the costs associated with hosting them in their own data center. SaaS can also
make costs more predictable and easier to budget. The consumer has an opportunity
to shift the burden of patching, upgrading, adding new services, and ensuring
availability to an off-premise service provider. By annualizing costs through a
service contract, business managers can focus their attention on the growth and
development of the core business itself.
The SaaS model also holds great promise for
service providers. Although distributed computing has improved the
accessibility of information at the edge of the network, it has often done so
at the cost of increasing system complexity. As a rule, complexity drives
specialization, and IT is no exception. However, the SaaS model makes it
economically feasible for service providers to concentrate highly specialized
expertise—this makes the complexity of services more manageable. By using SaaS,
providers can amortize the cost of hardware, software, and business expertise,
as well as other resources, across multiple service consumers. In particular,
the multi-tenancy hosting model makes it possible to create business
efficiencies by sharing service resources among multiple consumers. SaaS offers
economies of scale that are hard to come by in more traditional, self-hosted
models of computing.
SaaS Challenges
The case for SaaS is compelling, indeed,
but whether you plan to build, run, consume, or monetize SaaS, you will face a
number of important questions and challenges that can have a decisive impact on
your success. The following are some questions you may consider:
· Should you use on-premise hosting or off-premise hosting?
· Should you use a single-tenancy model, multi-tenancy model, or hybrid model?
· How should you plan capacity for shared resources?
· How should you predict resource requirements for tenant on-boarding?
· How can you support or rely on SLA targets?
· How can you maximize performance, scalability, and availability?
· How should you minimize hardware and energy costs in the service center?
· How should you estimate the impact of proposed implementation changes?
Although a TCA study cannot provide
complete answers to all these questions, it may prove to be an invaluable tool
for quantifying hardware resource requirements, capacity and scalability
thresholds, and predicting impact of changing usage patterns. It can also help
identify key bottlenecks through pre-production testing and analysis; this
helps architects better understand the scalability and performance
characteristics of the systems they design before they reach production.
Both consumers and providers have a deeply
vested mutual interest in ensuring that SaaS services can meet requirements for
scalability, availability, and performance. A serious slowdown or interruption
of a service could be extremely damaging for both consumers and providers.
Because each service aggregator may be both a consumer and a provider of
services, they will have a powerful interest in meeting these requirements from
both perspectives. For providers who host using a multitenant model, serious
damage to the quality of service could have catastrophic consequences. For
multi-tenancy, pronounced slowdowns or disruptions will usually impact many, if
not all, tenants at once.
It Isn’t Easy Being Green
A service-level agreement (SLA) can define metrics for the key “abilities”; but taken alone, it can do little to
ensure that these metrics can actually be met. Faced with these uncertainties,
some service providers will resort to simply supersizing their hardware
resources, resulting in excessive cost, poor efficiency, and little or no
ability to predict the impact of growth or change in usage patterns. In some
cases, supersizing can actually mask key system bottlenecks. Such bottlenecks,
if undetected, can result in poor performance or scalability when a threshold
is breached, despite the expensive hardware.
In addition to the hardware costs, the
supersizing approach tends to accrue excess energy consumption and cost as
well. Unfortunately, when faced with the inherent danger of damaging their core
business from insufficient capacity, most SaaS hosts will opt to
oversize—unless they have a refined understanding of existing work capacity and
confidence in their ability to forecast their capacity to meet growing demand
for service.
The question must be asked: No matter how
energy efficient the service center hardware is, can we declare that it is
green, knowing that it is grossly over capacity? By analogy, am I being green
if I commute in the most fuel efficient SUV available, when a small or midsize
hybrid would easily suffice?
Perhaps efficiency must be gauged against
the actual work that must be done. By rightsizing the hardware based on the
work capacity needed, we can arrive at a more truly green concept of service
center efficiency. To do this, we must be able to carefully measure capacity
based on the actual services running in the service center. Fortunately, the
business incentive and the ecological incentive are in close alignment here.
Rightsizing can reduce service center cost and climate impact at the same time.
With global warming hard upon us, rightsizing is something that can be done
today while we await further improvements in hardware efficiency. To the extent
that a TCA study can provide more accurate capacity planning, it can help save hardware and energy cost.
Revisiting TCA
A detailed description of TCA is beyond the
scope of this article, but a brief review will help set the stage for
understanding how and why it must be adapted for SaaS scenarios.
Here’s an outline of the basic steps in
their original formulation:
1. Select the highest end hardware platform.
2. Simulate load of a linear operating regime to cost each transaction.
3. Interpolate resource costs over this regime.
4. Calculate resource cost as a function of usage profile.
For readers who are new to TCA, How
to Perform Capacity Planning for .NET Applications
provides a brief primer with a more detailed description of these steps.
Some Notes About Terminology
The terms associated with TCA are highly
overloaded. Some clarification may help to head off some common confusion.
Cost
For TCA, the term cost means the amount
of system resources that is used during execution of a service; for example,
the percentage of the total available CPU cycles used during the execution of a
specific operation.
Transaction
In TCA, a transaction is simply a
discrete operation. For a Web site, a page request/response might serve as a
transaction. The conventional meaning of an ACID database transaction,
involving a two-phase commit, is orthogonal to this notion. A TCA transaction
may involve one, many, or no ACID transactions.
Usage Profile
In simplest terms, a usage profile is the
collection of all significant operations performed by a particular site, along
with the relative frequency and cost of each of these operations. Some
confusion has arisen around this term because it has been closely associated
with the behavior of the “average user.” This is discussed in detail later in this
article.
Capacity
For TCA, capacity is defined as the
ability to execute work measured in transactions per second. The capacity of a
site is defined by its maximum throughput.
Beyond the Point of Exhaustion
To understand how and why TCA should be
adapted for SaaS scenarios, we must begin by understanding its basic
theoretical foundations. This discussion will begin by focusing on steps 2 and
3:
2. Simulate load of a linear operating regime to cost each transaction.
3. Interpolate resource costs over this regime.
These two steps form the heart of this methodology. Step 2
actually consists of a series of discrete tests for each operation
(transaction) type, gradually increasing load for each test. During each test,
CPU, disk I/O, memory, and network utilization are measured to determine the
resource costs per transaction. The purpose of this is to define the point of
exhaustion, where the server has exhausted its capacity to do further useful
work.
Figure 1. A typical TCA pattern of
exhaustion
With conventional load testing, the point
of exhaustion is typically defined when a limiting resource (such as CPU) has
reached 100 percent utilization. In contrast, the point of exhaustion for TCA
must be defined by the maximum useful work or throughput that can be executed
on a single computer. This also means that at least one computer must be
measured on each physical tier of the service being tested.
Using Figure 1 as a hypothetical example,
the maximum capacity for this computer is reached in iteration 7 at 80
operations (transactions) per second. Notice that we must continue adding load
and iterating until we have tested beyond the inflection point, where the transaction
rate begins to tail off. If we were to naively extrapolate based on the first
four or five test iterations, we might greatly overestimate capacity. On the
other hand, if we only test under maximum utilization (100 percent), we would
end up underestimating the computer's capacity by testing well beyond the point
of exhaustion, where throughput may be significantly less than the maximum
the computer can sustain.
Simply quantifying the maximum throughput
for a particular computer and a particular operation does not provide a sound
basis for extrapolating to new hardware or a changing usage pattern. For that,
we need to carefully measure resource utilization and correlate it with peak
throughput. This brings us to step 3.
Measuring the Cost
TCA measures the cost of a transaction in
terms of utilization of important resources. With notable exceptions, most
services tend to be constrained by CPU utilization regardless of whether they
are Web pages or Web services. For this reason, measuring CPU is a strict
requirement for TCA testing. However, measuring memory, disk I/O, and network
usage is also of primary importance for gaining a full and accurate picture of
the hardware requirements for each tier of the system. These resources are the
“big four,” but each may need to be measured by several specific metrics. For
example, it is sometimes valuable to measure %CPU for separate processes
that are part of the application. In addition, some metrics can be vital for
detecting system bottlenecks when CPU is not the constraining resource. For a
system primarily constrained by disk I/O, measuring %disk read and %disk
write may indicate the most effective way to optimize and perhaps eliminate
the disk bottleneck. For a system where network utilization is a key factor, it
may be necessary to measure specific network cards separately so that frontend
traffic and backend traffic can be differentiated, while traffic created by
collecting performance counters can be ignored.
Assuming a typical case, Figure 2 shows a
common CPU utilization pattern for the Service Operation 1 example we started
in Figure 1. For a service constrained by CPU, this is a very common pattern of
resource utilization. Sometimes referred to as a knee-curve, utilization grows
very slowly at first, reaches an inflection point, and then skyrockets toward
100 percent utilization.
Figure 2. A typical pattern for CPU utilization
A deeper explanation of this utilization
pattern is beyond the scope of this article, but many multithreaded,
server-side Windows-based applications adhere to it. In some cases, as load
begins to escalate, so do operating system–level activities such as (thread)
context-switching, CPU consumption, virtual memory management (page faulting), and so on. For this discussion, it must suffice to note that when resource utilization skyrockets, throughput generally declines. Later, we shall
see that average operation latency also skyrockets.
Identifying this inflection point is the
key to developing an accurate predictive model. The data point of greatest
interest in this trend is the one that corresponds to the point of maximum
throughput. If we superimpose the throughput trend on the utilization trend, as
shown in Figure 3, we can highlight this critical turning point where
throughput and utilization become inversely related. By correlating the % CPU
utilization at maximum throughput, we can determine a fairly precise and
reliable cost per transaction.
In the example, suppose the server being
tested is a 4-CPU computer with a 3.5 GHz processor speed. We can compute the theoretical
maximum computational cycles available as 14 billion cycles/sec, or 14,000 Mcycles/sec.
Figure 3 shows that when the server’s capacity to do useful work reaches the
point of exhaustion, 66 percent of the available cycles have been used. We are
now in a position to calculate the actual cost per transaction in CPU cycles at
the point of peak throughput on this computer:
66% x 14,000 Mcycles/sec / 80 transactions/sec = 115.5 Mcycles/transaction
Figure 3. Resource Utilization
Superimposed on Throughput
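The CPU cost calculation for Service Operation 1 can be expressed as a small Python function. The function and parameter names are assumptions chosen for this sketch; the input values are the ones used in the example in this article.

```python
def cpu_cost_per_transaction(cpu_count, clock_ghz, utilization_at_peak, peak_tps):
    """CPU cost in Mcycles per transaction at the point of maximum throughput."""
    # Theoretical cycles available per second, in Mcycles (1 GHz = 1,000 Mcycles/sec).
    total_mcycles_per_sec = cpu_count * clock_ghz * 1000.0
    return utilization_at_peak * total_mcycles_per_sec / peak_tps

# 4 CPUs at 3.5 GHz, 66 percent utilization at a peak of 80 transactions/sec.
cost = cpu_cost_per_transaction(cpu_count=4, clock_ghz=3.5,
                                utilization_at_peak=0.66, peak_tps=80)
print(f"{cost:.1f} Mcycles/transaction")
```

Expressing the cost in cycles rather than in percent utilization is what allows the figure to be carried over, with caution, to a server with a different number or speed of processors.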
Similar calculations can be made to compute
the transaction cost in terms of memory, disk I/O, and network utilization for
each operation. By repeating a regimen of tests for each service operation (on
each physical tier) we can measure these costs for the entire site. Although
these other resources are sometimes ignored to streamline testing, it is done
at the peril of overlooking existing or potential bottlenecks and critical
thresholds in the system. If existing bottlenecks are detected, they can
sometimes be eliminated with optimizations. If a potential bottleneck is
recognized, it can often be avoided. For example, if it is known that a storage
subsystem is operating near its maximum capacity, a faster system may be needed
in production to ensure some headroom for growth.
After the cost for each operation is
carefully measured in isolation, the next step is to calculate the total
capacity of the host site, but to do this we will need to know the proportion
of load that each operation represents with respect to the site as a whole.
Defining these relative proportions is the purpose of a usage profile,
which is discussed next. However, before turning to the topic of usage
profiles, it would be negligent to overlook some important conclusions.
This inverse relationship between
throughput and utilization that typically occurs beyond the point of maximum
throughput (and is illustrated in Figure 3) means that resource utilization is
a poor predictor of capacity requirements. If the measured costs for a
particular level of throughput are known, they are the most accurate predictors
of capacity requirements. At its core, this is what makes TCA such a highly valuable
methodology for capacity planning.
To drive this point home, consider
predictive analysis as a point of comparison. In predictive analysis, raw
resource utilization is monitored and recorded over time. For example, you
might record %CPU at peak times over a period of days or weeks and then
predict future CPU resource needs by assuming a
continued linear growth pattern. With predictive analysis, maximum capacity is
assumed to be some percentage of total capacity; for CPU, we might assume 75
percent to leave some headroom as a margin of safety.
Although this approach can provide some
rough-and-ready estimates, it is inherently unreliable. Figure 3 shows that
resource utilization, though it may grow linearly at first, will grow exponentially
after the point of maximum throughput is reached and the capacity to handle
additional load has been exhausted. In fact, if a tier of servers is
approaching this point, predictive analysis would yield a steep slope,
suggesting a growth pattern far greater than what is actually occurring, and
implying much greater resource requirements than may actually be warranted.
Moreover, the assumption that 75 percent CPU utilization will leave some
headroom may be completely erroneous. For example, if the maximum throughput
for the site is reached at 66 percent, as it is for Service Operation 1, you
should be concerned when %CPU is greater than 50 percent, because you know that
you are nearing the point of maximum throughput. In short, 66 percent
CPU is the tipping point for this service.
A much better approach would be to examine
site logs to analyze the growth of actual operation requests (load) and then
predict future capacity requirements by extrapolating from the measured costs
for these operations. This is essentially what TCA prescribes and precisely
what makes it a compelling model for planning. When accurate, reliable
forecasts are needed, as they often will be when a SaaS SLA is at stake, the
accuracy of TCA is tough to beat.
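As a rough sketch of this approach, the following Python fragment projects the request rate forward and converts it into a server count by using the measured cost per transaction. The linear growth model, the growth figures, and the function names are assumptions made for illustration; the cost (115.5 Mcycles), server size (14,000 Mcycles/sec), and tipping point (66 percent) come from the Service Operation 1 example.

```python
import math

def projected_rate(current_tps, growth_tps_per_month, months):
    """Assumed linear growth of the logged request rate."""
    return current_tps + growth_tps_per_month * months

def servers_required(request_tps, cost_mcycles, server_mcycles_per_sec, max_utilization):
    """Servers needed so that no server passes its measured tipping point."""
    demand = request_tps * cost_mcycles                # Mcycles/sec needed
    usable = server_mcycles_per_sec * max_utilization  # Mcycles/sec per server
    return math.ceil(demand / usable)

COST = 115.5         # Mcycles/transaction, measured in the example
SERVER = 14_000.0    # Mcycles/sec for the 4-CPU, 3.5 GHz server
TIPPING_POINT = 0.66 # CPU utilization at maximum throughput

# Hypothetical log analysis: 120 tps today, growing by 10 tps per month.
today = servers_required(120, COST, SERVER, TIPPING_POINT)
future_tps = projected_rate(120, 10, months=8)
in_eight_months = servers_required(future_tps, COST, SERVER, TIPPING_POINT)
print(f"Today: {today} servers; in 8 months ({future_tps} tps): {in_eight_months} servers")
```

The forecast is driven by request growth, which is what the business actually experiences, rather than by the slope of a utilization counter that becomes misleading near the tipping point.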
However, to achieve truly reliable
forecasts of capacity, we must also either know or estimate the frequency that
each service operation will be executed. This brings the discussion to
usage profiles and some important adaptations that will be needed for SaaS.
Usage Profile, Verification, and Impact Prediction
Essentially, a usage profile defines the
relative frequency that each operation is likely to be executed during normal
production operation of the site. In practical terms, a verification test must
drive load against the site using all operations concurrently but in the
expected proportions defined by the usage profile. In most cases, the profile
can be defined as a simple spreadsheet. This article borrows the actual profile
for the TCA study published for the Microsoft Commerce Server 2002 catalog
system searching, as shown in Figure 4.
Figure 4. An Abbreviated TCA Usage
Profile
The Frequency column defines the
proportions of each type of search operation. For example, if we were to
construct a verification test for this collection of operations,
we would construct a script that executed a Category search three times
as often as a free-text search. Notice that only the key factors for
each physical tier are shown. Disk I/O is not a factor for the Web server, but
it could very well be for the SQL Server database based on the nature of its
role. The theoretical prediction for CPU utilization on the Web server tier (IIS
P3MC) would be calculated by averaging the Internet Information Services
(IIS) CPU cost in Mcycles, weighted by frequency. If successful, we should
expect our measured CPU utilization to closely approximate the estimated
utilization when the point of maximum throughput is reached for the entire
site.
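The frequency-weighted calculation can be sketched as follows. The operation names, frequencies, and costs below are hypothetical and are not the values from Figure 4; only the 3-to-1 ratio of Category searches to free-text searches follows the text. The sketch also assumes that the mixed workload peaks at the same 66 percent CPU utilization used in the earlier example, which a verification run would need to confirm.

```python
def weighted_cost(profile):
    """Average cost per transaction, weighted by relative frequency.

    profile maps operation name -> (relative frequency, cost in Mcycles).
    """
    total_frequency = sum(freq for freq, _ in profile.values())
    return sum(freq * cost for freq, cost in profile.values()) / total_frequency

def predicted_max_tps(profile, server_mcycles_per_sec, utilization_at_peak):
    """Estimated maximum throughput for the whole profile on one server."""
    return utilization_at_peak * server_mcycles_per_sec / weighted_cost(profile)

# Hypothetical two-operation profile (not the Figure 4 data).
profile = {
    "category_search": (0.75, 40.0),
    "free_text_search": (0.25, 120.0),
}

average = weighted_cost(profile)
estimate = predicted_max_tps(profile, server_mcycles_per_sec=14_000.0,
                             utilization_at_peak=0.66)
print(f"Weighted cost: {average:.1f} Mcycles/transaction")
print(f"Predicted maximum throughput: {estimate:.0f} transactions/sec")
```

The verification test then drives the same mix against the site and compares the measured maximum throughput with this estimate.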
The main purpose of verification is to
validate the estimated maximum throughput with empirical testing. Newcomers to
TCA are sometimes dismayed when they discover that a verification run does not
closely approximate the theoretical prediction. Typically, when this happens,
verification testing tends to produce less throughput than predicted. This
sometimes leads to criticism of the methodology and loss of confidence in its
accuracy. However, most often, this discrepancy creeps in because of added
latency caused by lock contention as separate operations must wait for access
to shared resources. This behavior, often referred to as the convoy
phenomenon, is caused by an implicit queuing effect as operations wait for
access. Actually, one of the most underestimated benefits of verification
testing is the chance to discover this hidden latency and reduce or eliminate it
through various optimization techniques prior to production. Neither
conventional stress testing nor predictive analysis can reveal that a site is
underperforming its full capacity. Moreover, neither method can help define
when optimization efforts have reached the point of diminishing returns.
If verification testing shows a disparity
between estimated throughput and measured throughput, TCA provides an
opportunity to optimize and rerun the verification tests. If the disparity is
not reduced after retesting, it means that optimization efforts were
ineffective. If the gap does close, verification testing provides a basis for
determining whether further optimizations are warranted considering the law of
diminishing returns.
When sufficient verification has been
obtained, the usage profile becomes an indispensable baseline for predicting
the impact of a variety of potential changes. Occasionally, TCA has been
criticized because there is no basis for determining the frequency of each
operation when a site has not yet been in production. However, an educated
guess can often be made on the basis of activity on similar sites when
available. In any case, one of the strengths of TCA is that capacity estimates
can be easily adjusted without retesting after initial production patterns are
logged. If a change in usage pattern is observed, the usage profile can be
changed accordingly and a new estimate produced without retesting. If actual
usage patterns differ greatly from those assumed in the usage profile, it may
be advisable to rerun the verification test; but it is not necessary to retest
individual operations because these costs remain unchanged.
For a new tenant to be hosted for a
multi-tenancy service that is already in production, a reasonably accurate and
confident prediction about impact on current capacity can be made with
relatively little or no retesting—even if the new tenant has a substantially
different usage pattern on the site.
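A tenant on-boarding estimate of this kind might look like the following sketch. All tenant names, transaction rates, and operation costs are hypothetical; the server size and the 66 percent tipping point are carried over from the earlier example as assumptions. The point is that the measured per-operation costs are reused unchanged while only the usage rates vary by tenant.

```python
def cpu_demand(tenant_rates, costs):
    """Total CPU demand in Mcycles/sec across all tenants.

    tenant_rates maps tenant -> {operation: transactions/sec};
    costs maps operation -> measured cost in Mcycles/transaction.
    """
    return sum(rate * costs[operation]
               for rates in tenant_rates.values()
               for operation, rate in rates.items())

def fits(tenant_rates, costs, server_mcycles_per_sec, max_utilization):
    """True if the demand stays at or below the tipping point of one server."""
    return cpu_demand(tenant_rates, costs) <= server_mcycles_per_sec * max_utilization

# Hypothetical measured costs (Mcycles/transaction), shared by every tenant.
costs = {"category_search": 40.0, "free_text_search": 120.0}

# An existing tenant, then a new tenant with a very different usage pattern.
tenants = {"tenant_a": {"category_search": 60, "free_text_search": 20}}
before = cpu_demand(tenants, costs)

tenants["tenant_b"] = {"category_search": 10, "free_text_search": 30}
after = cpu_demand(tenants, costs)

print(f"Demand before: {before:.0f} Mcycles/sec; after: {after:.0f} Mcycles/sec")
print("Fits on one server:", fits(tenants, costs, 14_000.0, 0.66))
```

Because a real deployment adds lock contention among tenants, an estimate like this is a starting point; a verification run with the combined profile remains advisable when the new usage pattern differs greatly.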
Characterizing Load: Adapting Usage Profiles for SaaS
As mentioned at the outset, TCA has most often been
used for testing conventional B2C Web sites. In theory,
as well as by convention, therefore, capacity has typically been characterized
by the number of concurrent user connections. The usage profile has been
characterized in terms of the behavior of an “average user” in the course of a
typical session. A typical user might be said to browse 10 product pages,
search twice, add an item to the cart once, and check out only 0.2 times during a
15-minute session. For applications such as Exchange, it might be important to
characterize the size of e-mail messages sent and/or received as well as the
frequency because these parameters would be important indicators of average
network and disk I/O requirements.
However, services-in-the-cloud will be
requested by users, computers, or both. Notions such as user behavior, think
time, or typical session duration simply do not apply very well. Indeed, some
services may not have any “user” connections at all.
Fortunately, these notions are not
essential to TCA. Characterizing capacity in a user-centric way has been a
convenient fiction which has served to elucidate the methodology of TCA and
helped to communicate site capacity that is traceable to business concepts and
requirements for conventional Web sites. These terms should be discarded
because, in part, a usage profile can be defined adequately without them.
User-centric language threatens to obscure the real purpose of a usage profile
in TCA. Focusing, instead, on the transaction rate offers a generalization that
can be applied equally well to B2C and B2B scenarios—an important
generalization for SaaS.
After the cost and relative frequency of
each transaction type are known, the expected transaction rate (the number of
expected transactions bounded by time) is sufficient to accurately estimate
capacity requirements without resorting to the concept of user behavior and
without doing violence to the TCA methodology.
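To show that nothing is lost by dropping the user-centric vocabulary, the following sketch converts the "average user" session described earlier (10 page views, 2 searches, 1 cart addition, and 0.2 checkouts per 15-minute session) into plain transaction rates. The function name and the figure of 9,000 concurrent users are assumptions made for this example.

```python
SESSION_SECONDS = 15 * 60

# Operations per session for the "average user" described in the text.
ops_per_session = {"browse": 10, "search": 2, "add_to_cart": 1, "checkout": 0.2}

def transaction_rates(ops_per_session, session_seconds, concurrent_users):
    """Convert a per-session profile into transactions/sec per operation."""
    return {operation: count * concurrent_users / session_seconds
            for operation, count in ops_per_session.items()}

# Hypothetical load: 9,000 concurrent users.
rates = transaction_rates(ops_per_session, SESSION_SECONDS, concurrent_users=9000)
total_tps = sum(rates.values())

for operation, rate in rates.items():
    print(f"{operation}: {rate:.1f} tps ({rate / total_tps:.1%} of load)")
print(f"Total: {total_tps:.1f} tps")
```

Once the profile is expressed this way, it no longer matters whether the requests come from people, from other services, or from both; a B2B caller can be described by supplying its transaction rates directly.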
SaaS Composite Architecture and TCA Challenges
In the most general terms, latency
encompasses the total amount of time it takes to execute a transaction. If you
are buying a pair of new shoes, the total latency would include the time to
find a pair you like and try them for size, but it also includes the time spent
waiting in the checkout line. One of the common misconceptions about TCA is
that it measures the entire latency of a transaction as it is experienced by
the user. In fact, TCA only measures the latency from the moment the first byte
of the request arrives at the site to the time the last byte of the response
leaves the site. For traditional Web sites, whether they are intranet-based or
Internet-based, the portion of latency that occurs between the user’s browser
and the target site is not measured. In most cases, this latency cannot be
predicted or controlled. Consequently, TCA is unable to offer a good estimate
of average user response time for Internet services.
Figure 5. Measured Cost and Latency for a Traditional Web
Application
In this illustration, which approximates
the Commerce Server catalog search scenario tested in the earlier sample usage
profile, notice that only those portions of cost and latency incurred on the
Web server and database server are actually measured. This constitutes only a
portion of the latency that makes up the total response time for a single
transaction from the client perspective. To distinguish this predictable
portion of latency in this discussion, it will be referred to as service
response time. In the traditional Web application, TCA would seem to be
capable of measuring average service response time, if not user response time.
Although it is important to understand this
limitation, it should not be considered a theoretical defect of the TCA
methodology because this latency between the browser and the site does not
directly impact the cost of the transaction or the capacity of the site. The
fact that the latency incurred during Internet transport cannot be predicted or
controlled becomes especially obvious for a typical commercial Internet site.
The average bandwidth, even if it could be guessed, does not provide a highly
meaningful basis for predicting a specific user’s response time. Moreover, the
peaks and valleys of shifting bandwidth utilization on the Internet over time
make it virtually impossible to accurately predict user response times even for
a single user at a fixed location. If a specific user does experience slow
response, there may be little or nothing that can be done about it, short of
perhaps recommending that the user upgrade Internet service and/or hardware.
This limitation, though not a defect,
presents a difficult challenge for applying TCA to some important SaaS
scenarios. To see why, look more closely for a moment at the SaaS model
depicted in Figure 6. In composition architecture, multiple
services may be aggregated into a single composite service or Web application.
In this architecture, unlike many traditional Web applications, the
indeterminacy of the Internet now injects itself directly into the heart of the
service response time we would like to measure. As a result, if any of the
aggregated services are hosted in the cloud, service response time also becomes
unpredictable. For example, in Figure 6, the Web application host cannot
accurately predict its own service response times for the same reason that
average user response times could not be predicted in traditional Web
applications. Looked at another way, the SaaS service provider is in the same
predicament with respect to the Web application (service consumer) as the Web
application is with respect to the client (application consumer).
Figure 6. Measured Cost and Latency for SaaS Composition
Architecture
In the SaaS composition scenario, at least,
TCA is incapable of providing a reliable prediction of average service
response time—much less average user response time. Therefore, it is unwise
to rely on predictions about either. Attempts to estimate user response based
on TCA results are really nothing more than guesswork masquerading as rigorous
measurement. It is precisely at this juncture that we must resist the
temptation to over-conclude from what TCA tells us. Such estimates of response
time are inherently unreliable. In fact, considering the vagaries of many
private network traffic patterns, response times can be unpredictable even for
purely intranet-based scenarios. Defining a service-level assurance for
response time is a risky business—especially when two or more service tiers are
linked over the Internet.
Note that recognizing this limitation does
not invalidate the cost measurements and capacity predictions achieved through
TCA. Both the Web application provider and the service provider depicted in
Figure 6 can still reliably measure costs and predict capacity. Even network
utilization cost can be measured accurately for this purpose.
Latency becomes an especially important
issue for services because average response time is frequently a key
performance target for many SLA agreements. What our analysis really shows thus
far is that response time is inherently unpredictable in composition
architectures, especially when the composite services are hosted in the cloud.
However, as we shall see, TCA can furnish
extremely important information about minimum latency and how it is affected by
load. For example, TCA can tell us when specific response times cannot
be met with the available (tested) resources.
Performance Targets: Beware the Latent Latency
The composition scenario depicted in Figure
6 is perhaps the simplest case. As SaaS gathers momentum, more complex
scenarios will inevitably arise. For example, it is reasonable to expect that a
single frontend service will sometimes aggregate multiple offsite services,
magnifying uncertainties, and making response times even less predictable.
Such challenges notwithstanding, TCA can
provide some valuable and accurate information about performance targets. To
see how, we have to better understand the relationship between resource
utilization and latency. In simplest terms, as resource utilization increases,
the average latency per transaction rises dramatically.
This phenomenon may not amount to an ironclad
rule, but in the author's experience, it is an almost universal pattern, at
least with respect to the most constraining resource. The underlying principle
at work in this pattern is queuing theory, which is as near ironclad as
computer science generally gets. Jim Gray and Andreas Reuter have noted this
relationship between utilization and latency in Transaction Processing and
offered the following formula for predicting average response time, where the
term p is utilization:
Average_Response_Time(p) = (1 / (1 - p)) * Service_Time
Although the original intent of this
formula was to address the performance issues created by queuing, we can
harness this observation to draw some of our own conclusions. Let’s assume, for
the sake of our discussion, that our optimum Service_Time (when there is no
waiting) for Service Operation 1 is 300 milliseconds (ms). If we apply this
formula to the utilization pattern shown in Figure 3, we can project the
Average_Response_Time pattern shown in Figure 7.
Figure 7. Latency versus Utilization
If, in this case, our target for average
user response time is 800 ms at peak capacity, Figure 7 shows with certainty
that this target cannot be met with the existing hardware resources—even if the
added network latency of the Internet is zero.
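The arithmetic behind this conclusion can be sketched in a few lines of Python.
This is only an illustration of the formula quoted above. The function names
are hypothetical, the utilization values are arbitrary sample points rather
than the data behind Figure 3, and the 300 ms and 800 ms figures are the ones
assumed in this discussion.

```python
def average_response_time(utilization, service_time_ms):
    """Average_Response_Time(p) = (1 / (1 - p)) * Service_Time."""
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in the range [0, 1)")
    return service_time_ms / (1.0 - utilization)


def max_utilization_for_target(target_ms, service_time_ms):
    """Highest utilization at which the formula still meets target_ms."""
    if target_ms < service_time_ms:
        raise ValueError("target is below the no-wait service time")
    return 1.0 - service_time_ms / target_ms


SERVICE_TIME_MS = 300.0  # optimum service time assumed in the text
TARGET_MS = 800.0        # response-time target assumed in the text

if __name__ == "__main__":
    # Sample utilization points (illustrative only, not taken from Figure 3).
    for p in (0.0, 0.25, 0.5, 0.625, 0.75, 0.9):
        rt = average_response_time(p, SERVICE_TIME_MS)
        verdict = "meets" if rt <= TARGET_MS else "misses"
        print(f"utilization {p:5.3f}: {rt:7.1f} ms ({verdict} target)")
    limit = max_utilization_for_target(TARGET_MS, SERVICE_TIME_MS)
    print(f"target is reachable only at or below {limit:.1%} utilization")
```

Under these assumptions, the 800 ms target holds only up to 62.5 percent
utilization of the constraining resource; at 90 percent utilization the formula
projects roughly 3,000 ms.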
For a SaaS provider, TCA can show that,
even though the maximum throughput (transactions/sec) is sufficient to support
peak traffic, performance targets may not be met under full load. In short,
greater hardware resources or further optimizations may be needed to meet
minimum performance targets. For example, if the controlling resource is CPU
cycles, more or faster CPUs are required to reduce processing time.
Although we have used a formula to
illustrate this point, a theoretical prediction is not necessary. At least for
a root SaaS provider, average latency can be measured and recorded during
isolation testing. For this reason, measuring latency should be considered
essential for TCA when known response targets are involved. Measuring latency
can show service developers when performance becomes unacceptably slow. Let’s
look more closely at why.
In addition to accurate measurements of
transaction cost and capacity, TCA may also be able to offer service
aggregators some practical information about response time. Although TCA cannot
make average Internet latency predictable under any circumstance, it can at
least help to estimate the minimum achievable service response time.
Conceivably, this could be accomplished by spot testing against the actual
service center to be used. For example, such estimates, combined with actual
TCA measurements, could help determine whether it is even feasible to achieve
reasonable response times when aggregating specific services. This information
could prove invaluable when a choice between alternative service providers must
be made.
It is interesting to note that the original
formulation of TCA calls direct attention to this predictable relationship
between latency and utilization. Performance targets were defined both in terms
of minimum acceptable throughput and maximum acceptable transaction
latency. To achieve accurate estimates of both, careful measurement of
latency is a strict requirement of this methodology. This conclusion was
strongly implied in the original formulation, and we now draw explicit
attention to it.
A maximum acceptable latency should be
defined for each operation tested. If this limit is breached during testing,
the throughput at that level should be considered the maximum capacity for the
available hardware resources, regardless of whether the point of maximum
throughput has been reached.
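As a rough illustration, this rule can be applied to the measurements from a
series of test runs at increasing load. The sketch below is not part of the
original TCA tooling: the `LoadStep` type, the function name, and the sample
numbers are all hypothetical. The rule could be read as taking the throughput
at the breaching step itself; this sketch assumes the more conservative reading
and reports the highest throughput observed before the latency limit is
breached.

```python
from typing import NamedTuple, Sequence


class LoadStep(NamedTuple):
    """One test run: measured throughput and average latency."""
    throughput_tps: float
    avg_latency_ms: float


def effective_capacity(steps: Sequence[LoadStep], max_latency_ms: float) -> float:
    """Return the throughput to treat as maximum capacity.

    steps must be ordered by increasing load. Scanning stops at the first
    step whose average latency exceeds the limit, even if throughput was
    still rising at that point.
    """
    best = 0.0
    for step in steps:
        if step.avg_latency_ms > max_latency_ms:
            break
        best = max(best, step.throughput_tps)
    return best


# Hypothetical measurements for a single operation.
runs = [
    LoadStep(100.0, 350.0),
    LoadStep(200.0, 420.0),
    LoadStep(300.0, 610.0),
    LoadStep(400.0, 950.0),   # latency limit of 800 ms is breached here
    LoadStep(420.0, 1500.0),  # raw throughput peak
]

if __name__ == "__main__":
    print(effective_capacity(runs, max_latency_ms=800.0))
```

With these sample numbers the usable capacity is 300 transactions per second,
well short of the raw throughput peak of 420.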
During the development of TCA, the
theoretical minimum latency was established by carefully measuring the
execution time of each phase of an operation. Then a constant was added to this
value to account for the minimum acceptable wait time during concurrent
execution. If average latency actually measured during testing did not meet
this performance target, further optimizations were attempted to reduce
execution time, wait time, or both.
Another practical way to establish
performance targets for services, especially where SLA targets are at stake, is
to assume a reasonable average Internet latency and subtract
this from the total acceptable response time. If measured latency exceeds this
threshold, either more hardware or further optimization efforts are needed.
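The subtraction described here is simple enough to show directly. In this
minimal sketch the function names and all of the numbers are assumptions chosen
for illustration; in particular, the 150 ms Internet latency is a placeholder,
not a recommended value.

```python
def service_latency_budget(total_acceptable_ms, assumed_internet_ms):
    """Latency left for the service after the assumed Internet share."""
    budget = total_acceptable_ms - assumed_internet_ms
    if budget <= 0:
        raise ValueError("assumed Internet latency consumes the whole target")
    return budget


def exceeds_budget(measured_service_ms, total_acceptable_ms, assumed_internet_ms):
    """True when measured latency calls for more hardware or optimization."""
    budget = service_latency_budget(total_acceptable_ms, assumed_internet_ms)
    return measured_service_ms > budget


if __name__ == "__main__":
    # Hypothetical figures: 800 ms total target, 150 ms assumed for the Internet.
    print(service_latency_budget(800.0, 150.0))
    print(exceeds_budget(700.0, 800.0, 150.0))
```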
However, testing composite services will be
subject to many of the practical difficulties of conducting TCA-style testing
against a remote service host. Potentially, this might mean that such service
hosts offer online test services that closely approximate the performance of
actual production services. The more closely that test conditions approximate
the anticipated production runtime conditions, the more accurate and reliable
test results will be. At this point, we should turn our attention to a number
of other practical considerations, such as the tools for measuring and
instrumentation that are related to conducting TCA testing for services.
Not All Computers Are Created Equal
TCA originally recommended that services
should be tested on the highest performance hardware available. This
recommendation has led to one of the most common criticisms of this
methodology—perhaps not without reason. The justification for this
recommendation was the fact that many applications simply could not scale linearly
as more processors were added to a server. Moving from a two-way to a four-way
SMP computer generally did not produce twice the capacity. Faced with this fact,
it was reasoned that testing on the high-end servers would ensure that
application throughput would scale linearly at worst on less capable
servers. So it was thought that moving from a four-way computer to a two-way
computer would guarantee at least half the throughput and still alleviate the
need to test on dozens of different hardware configurations to achieve reliable
capacity numbers.
This conclusion was typically quite true
because less powerful hardware often performed super-linearly when compared to
its bigger brothers. Less powerful servers often exhibited considerably better
capacity than might be expected on the basis of these estimates. Many
applications did not scale well vertically (on eight or more processors) because
the threading model and locking strategy of the application were suboptimal on
these high-end SMP configurations. From the perspective of software vendors,
who can neither predict nor dictate the hardware, this approach makes perfect
sense. Testing and tuning on the highest performance hardware available helped
to ensure that services would perform well at the high end of the spectrum and
no worse in other areas. Unfortunately, this method of estimating also tends to
undermine TCA’s stated purpose of minimizing hardware costs because such
estimates would tend to result in purchasing more low-end servers than actually
necessary to meet demand.
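A small worked example shows how the "linear at worst" assumption can inflate a
hardware purchase. The sketch below is illustrative only: the function names,
the throughput figures, and the peak demand are hypothetical, and the linear
lower bound is the original TCA assumption being described here, not a
guarantee.

```python
import math


def linear_lower_bound(measured_tps, measured_cpus, target_cpus):
    """Scale measured throughput down linearly by processor count."""
    return measured_tps * target_cpus / measured_cpus


def servers_needed(peak_demand_tps, per_server_tps):
    """Whole servers required to carry the peak demand."""
    return math.ceil(peak_demand_tps / per_server_tps)


if __name__ == "__main__":
    # Hypothetical numbers: 1,000 tps measured on a four-way server.
    estimate_two_way = linear_lower_bound(1000.0, measured_cpus=4, target_cpus=2)
    actual_two_way = 650.0   # assumed super-linear result on the smaller server
    peak_demand = 5000.0     # assumed peak demand in tps

    print(servers_needed(peak_demand, estimate_two_way))  # planned from estimate
    print(servers_needed(peak_demand, actual_two_way))    # actually required
```

With these assumed figures, planning from the lower bound calls for ten two-way
servers where eight would have met demand.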
In accordance with Moore’s Law, computers
have gotten vastly more powerful since this recommendation was first made. The
most powerful servers available today can be extremely expensive. From a purely
practical perspective, it would be nice to know that reasonably accurate
capacity planning can be conducted without having to purchase millions of
dollars’ worth of servers for a test bed, especially if you do not intend to use
such equipment in production.
Far more importantly, PC architecture has
also changed substantially since TCA was developed. As a result, the claim that
estimates based on more powerful hardware will scale linearly at worst on less
powerful hardware is simply no longer valid—at least when crossing boundaries
between fundamentally different hardware and platform architectures. The most
obvious example of this is evident with the availability of 64-bit Windows
servers. Beginning with .NET Framework 2.0, it is possible to write services
that will run on either 32-bit or 64-bit systems. The flat memory model of
64-bit Windows represents a significant improvement over the 32-bit Windows
memory model, providing 16 terabytes of addressable virtual memory space per
process versus the maximum 4 gigabytes (GB) previously supported. Often, the
64-bit memory model eliminates many impinging memory constraints or excessive
CPU consumption due to page faults. When such fundamental differences in
platform architecture are evident, more powerful servers may make poor
indicators of performance on less powerful servers. Under these circumstances,
it is no longer reasonable to expect the latter to perform linearly at worst in
all cases. Conversely, it is no longer reasonable to assume that moving up the
ladder will result in sub-linear scaling either.
As PC architecture has evolved, other
innovations have been introduced, especially for enterprise class servers, to
address issues that typically only arise when many processors and large amounts
of physical RAM are available. Hardware diversity is increasing. Non-uniform
memory architecture (NUMA) and crossbar bus architecture are both good examples
of fundamental evolutionary changes in PC hardware architecture introduced from
the mainframe world. Capacity measurements made on one hardware platform can
easily get distorted when services are then run on these newer hardware
architectures or vice-versa. Although these latter architectures may seem somewhat
exotic compared to the more commodity x86 family of hardware, these high-end
servers are increasingly commonplace in service centers where Internet
scalability is required—especially, though not exclusively, for the data tier.
More to the point, they are becoming commonplace in the scenarios specifically
targeted by SaaS, multi-tenancy services.
Considering the high cost of such servers,
the need for accurate, reliable capacity planning may be more urgent than ever.
The fact that TCA cannot provide reliable estimates across these architectural
boundaries becomes far less important than knowing whether such systems can
meet the necessary capacity demands of a specific service, on a specific
server, with a specific configuration. TCA is well suited to answer these
questions. It can help determine which configuration will produce the optimum
service capacity and, therefore, the best return on investment for these key
hardware purchases. Most vendors understand the need to measure capacity and
performance before buying; and they will often provide access to test servers
in a lab environment or leases to support such testing and help minimize
initial obstacles to adoption.
The distortion that may occur when
attempting to extrapolate TCA capacity and performance estimates across
hardware architectures does not negate the value of TCA; far from it.
Performing careful capacity planning is actually more urgent than ever on these
specialized, high-end platforms. The value of a solid baseline and predictive
model of capacity based on usage profiles can be partly gauged in proportion to
the cost of these servers.
However, this conclusion means that we
cannot blindly extrapolate upward or downward on the hardware power scale.
Fundamental differences in hardware and operating system–level architecture
must be considered. Here again, TCA remains valuable if we observe this
limitation and steadfastly resist the temptation to over-conclude.
For ISVs who are planning to deliver a
software solution that implements services, the conclusion we have reached may
prove profoundly unsatisfying. If we cannot arrive at capacity estimates for a
solution independent from any hardware platform, what good are estimates to the
vendor or the prospective buyer? One reasonable approach may be to test the
solution on a specific hardware platform and configuration deemed suitable for
the solution and publish those estimates along with the hardware profile used
for testing—along with a disclaimer that “your mileage may vary” on different
hardware. One advantage of TCA is that estimates made on a specific hardware
profile with a documented usage profile are highly repeatable. In this sense,
TCA produces very real, very reliable numbers. For the buyer who elects to buy
and deploy this solution on the recommended hardware, estimates should be
golden. If the buyer decides to buy a different array of hardware, a “trust,
but verify” approach may be in order. In other words, the buyer should consider
re-testing on the target hardware.
Probing the Engines of Cyberspace
At this point, we should state the
expectation that many SaaS services will be mega services, requiring high
capacity and scalability. As implied earlier, this seems particularly likely
for those designed for multi-tenancy. If one surveys the B2C success stories of
the “Web 1.0” world, these characteristics are hallmarks. It is reasonable to
assume that high throughput and very high scale will also be needed for
multi-tenant services in the cloud. For service hosts in the “Web 2.0” world,
there will be strong economic drivers for deploying services aimed at the long
tail of small and medium size businesses as well as consumer-facing
applications that leverage SaaS on the back end. Frederick Chong and Gianpaolo
Carraro have written an article named Architecture
Strategies for Catching the Long Tail, which makes
a compelling case that economies-of-scale will be a key driver of SaaS
architecture. Notice, of course, that it is the service hosts who are able to
meet these high scale demands of the long tail that tend to become the colossi
of this brave new world—the Monsters and Amazons of services.
For our discussion, this raises a number of
important practical questions about testing at these great heights. How can we
efficiently and cost-effectively generate this much load? How can we record,
aggregate and analyze all the performance data necessary to ensure accurate
measurement and planning of these services? Are the necessary tools for
scripting, driving, and coordinating very high levels of load ready-to-hand?
Are these tools suitable for testing SOAP-based Web services requests, which
are fundamentally different in many ways from traditional Web page requests?
And perhaps most importantly, how can we reliably test when aggregated services
are hosted in a remote service center? Some practical strategy is clearly
needed for conducting accurate TCA testing over the Internet.
Generating Load: Send in the Clones
Generating load in sufficient quantity to
push services designed for massive capacity beyond their point of exhaustion is
no easy feat. Many clones are needed to exert such loads. A tool is needed to
orchestrate testing and ensure that all test clones are using the same test
scripts, uniformly generating load, starting and stopping in unison, and
gradually incrementing load for each test run. In addition, a realistic test
scenario usually necessitates juggling separate security contexts, parameter
data, and possibly cookies.
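The coordination described above can be sketched on a single machine, with
threads standing in for test clones. This is a toy illustration, not a
replacement for a real load-testing tool: `run_step`, `ramp`, and
`stub_operation` are hypothetical names, and the stub simply sleeps instead of
calling a real service. A barrier makes all workers start in unison, and the
load is raised step by step across runs.

```python
import threading
import time
from statistics import mean


def run_step(operation, workers, requests_per_worker):
    """Run one load step in which all workers start together."""
    barrier = threading.Barrier(workers + 1)  # +1 for the coordinator
    latencies = []
    lock = threading.Lock()

    def worker():
        local = []
        barrier.wait()  # block until every worker is ready
        for _ in range(requests_per_worker):
            started = time.perf_counter()
            operation()
            local.append(time.perf_counter() - started)
        with lock:
            latencies.extend(local)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    barrier.wait()  # release all workers at the same moment
    step_started = time.perf_counter()
    for t in threads:
        t.join()
    elapsed = time.perf_counter() - step_started
    return {
        "workers": workers,
        "requests": len(latencies),
        "throughput_per_sec": len(latencies) / elapsed,
        "avg_latency_ms": 1000.0 * mean(latencies),
    }


def ramp(operation, worker_counts, requests_per_worker):
    """Increase load gradually, one test run per worker count."""
    return [run_step(operation, n, requests_per_worker) for n in worker_counts]


def stub_operation():
    """Placeholder for a real service call (assumption: about 2 ms of work)."""
    time.sleep(0.002)


if __name__ == "__main__":
    for result in ramp(stub_operation, [1, 2, 4, 8], requests_per_worker=20):
        print(result)
```

A real study would also need per-clone security contexts, parameter data, and
cookies, which this sketch leaves out.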
Various tools capable of orchestrating test
clones in this way have evolved for testing Web applications. Early studies
often used the Web Application Stress Tool (WAST) because it was freely
available. Later, Application Center Test (ACT) provided a more flexible and
scriptable tool for this purpose. Other very capable, but sometimes quite
expensive, tools were also readily available. Most of these tools installed a
lightweight, HTTP proxy service on each clone. In addition to coordinating with
its peers, the proxy service could generate the GET, PUT, and POST
requests needed to drive load on a Web application.
Another phase of evolution was needed to
produce a tool capable of driving load using the more complex vocabulary of Web
services. To do this, a SOAP proxy (WSDL) for each service operation must be
generated, scripted, and then deployed and executed on each test clone.
Beginning with the 2005 release, Visual Studio Team System provides a built-in load test
capability specifically designed to support load testing against Web services
at very high scale. This tool has undergone further improvements for testing
Windows Communication Foundation (WCF) services in Visual Studio 2008.
The integration of load testing directly
into the development environment makes it possible to incorporate spot testing
against an established TCA baseline; this means developers can quickly
ascertain the performance and capacity impact of a proposed code change. Used
in this way, TCA can also help streamline optimization efforts and indicate
when the point of diminishing returns is reached. TCA tends to produce the
greatest value when it is treated as an integral part of application life cycle
management (ALM).
Composition Architecture: TCA Over the Internet
If we put theoretical difficulties of
predicting response time aside for the moment, composition and integration of
services still pose some interesting practical problems for TCA. Let’s imagine
a scenario like the one depicted in Figure 8, where a small or medium business
named Contoso is a consumer of a service hosted at A Datum. Let’s also suppose
that A Datum integrates its own, on-premise assets with those provided by
another services host called B Datum. If all is well, the end-user at Contoso
is blissfully unaware of the engines of cyberspace that are performing these
services. But remember, Contoso is likely to demand an SLA. For A Datum, the
challenge will be to conduct a TCA for these services so that it can forge an
SLA with Contoso that both parties can live with. But how, in concrete practical
terms, can such testing be conducted?
Figure 8. A SaaS Scenario for TCA Over
the Internet
What makes this different from many
traditional TCA scenarios is that the assets to be tested may be owned by
several different business entities, each with their own agenda and set of
concerns. Let’s consider a few of the alternative strategies as well as some of
the obvious pros and cons of each.
1. A Datum could conduct a TCA study against B Datum’s production services.
At first blush,
this approach seems very risky for B Datum because it must expose its resources
to high loads solely for the purpose of allowing a single tenant to test. Moreover,
for multi-tenant services, testing in isolation from production loads may be
impossible or highly inaccurate. However, for the single-tenancy model, this
option may be feasible if A Datum is allowed to test directly against the
unused infrastructure it will ultimately use in production. It might also be
usable for the very first client on-boarded in a multi-tenant scenario.
2. B Datum constructs and maintains test services parallel to its production
services, and A Datum conducts a TCA study using the test services.
This option
could be cost prohibitive for B Datum because it must operate and maintain a
full, parallel service center for test purposes only. Even this option can
produce somewhat misleading results if testing is done in isolation from the
load that would be created by other tenants in a multi-tenancy scenario.
Another drawback of this approach is that it would tend to serialize
on-boarding for B Datum based on access to the test bed. Although it is always
advisable to test under realistic conditions, this approach will also be
subject to the limitations mentioned earlier about testing over the Internet. A
Datum should not trust the average service response time figures.
3. B Datum conducts a careful and complete TCA study including test clones that
access the service via the Internet. A Datum relies on these estimates and
recalculates based on its own expected usage profile.
This may turn
out to be the most practical option for both parties in many situations. However,
the problem for A Datum is that this approach may not provide any useful
estimates of its own hardware requirements. For example, in our hypothetical
scenario A Datum must do some form of testing to determine the transaction cost
of acquiring and serving its own data. Savvy customers may demand to know these
results.
4. B Datum publishes a complete TCA study. A Datum conducts its own TCA study
using the B Datum transaction cost estimates to emulate service latency and
network utilization.
This is a highly
practical option for many situations and A Datum can develop capacity estimates
for its own services in conjunction with those of B Datum. However, one
important drawback of this strategy is that A Datum must spend the time and
effort to implement stub services that accurately emulate B Datum’s services
(based on B Datum’s TCA estimates). Occasionally, this may also require changes
to consumer (A Datum’s) code. A notable advantage of this strategy in
multi-tenant scenarios is that B Datum’s estimates will hopefully reflect full
multi-tenant loads. A Datum is required to trust the accuracy of B Datum’s
published estimates instead of testing against a live service. In this
scenario, TCA can function as a de facto standard.
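To make the fourth option more concrete, the following sketch shows one
possible shape for a stub that emulates a remote service from published latency
estimates. Everything in it is hypothetical: the `StubService` class, the
operation names, and the latency figures are invented for illustration and do
not come from any real TCA study. A realistic stub would also need to emulate
payload sizes and network utilization.

```python
import time


class StubService:
    """Stand-in for a remote service, driven by published latency estimates."""

    def __init__(self, latency_ms_by_operation, sleep=time.sleep):
        # Published average latency per operation, in milliseconds.
        self._latency = dict(latency_ms_by_operation)
        # The sleep function is injectable so tests need not actually wait.
        self._sleep = sleep
        self.calls = 0

    def invoke(self, operation, payload=None):
        """Wait for the published latency, then return a canned response."""
        latency_ms = self._latency[operation]  # KeyError if not published
        self._sleep(latency_ms / 1000.0)
        self.calls += 1
        return {
            "operation": operation,
            "status": "ok",
            "emulated_latency_ms": latency_ms,
        }


if __name__ == "__main__":
    # Hypothetical published estimates for two operations.
    stub = StubService({"GetQuote": 120.0, "PlaceOrder": 340.0})
    print(stub.invoke("GetQuote"))
```

The consumer code can then be load tested against the stub in isolation, which
is where the extra implementation effort mentioned above comes from.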
Clearly, there is no perfect or universal
strategy for TCA testing over the Internet under realistic conditions.
Regardless of which avenue is chosen, some practical compromises will be
necessary for conducting a study with composite services. By the way, note that
other forms of load testing are susceptible to many of these same practical
obstacles, but do not hold the promise of yielding reliable capacity planning
or a predictive model. Conducting a full-blown TCA study may not be cost
justified in every case. However, when truly high capacity is needed, as it will
be for many multi-tenant services, the cost of a careful and rigorous study can pay
for itself many times over.
Of course, doing nothing is also always an
option; but bear in mind that an SLA without a solid foundation in fact may
prove to be a castle in the sand. Consumers who are planning to make a big bet
on services-in-the-cloud would be wise to insist on full transparency about
this information. Any service provider that has done due diligence should be
able to provide these estimates and inform new tenants about how much of
existing capacity will be absorbed at the time of on-boarding.
A Habit of Highly Effective TCA: Sharpening the Tools
TCA has occasionally been criticized as a
Windows-only methodology. There is no real theoretical basis for this charge.
The general concepts and underlying principles that form the foundation of this
methodology are not specific to Windows or any other platform. Ironically, one
of the original references cited as helpful for understanding TCA was a book
about capacity planning for Sun Solaris by Brian Wong that is still in print
today. Not surprisingly, therefore, the basis for this claim is usually
anecdotal observations. For example, the tools needed for generating load and
measuring results all seem to be Windows-based. Or, the key counters seem to be
Windows performance counters, WMI statistics, and so on. There is nothing strange
about this because the TCA methodology was developed at Microsoft and it has
been most widely, if not exclusively, used for Windows-based applications.
However, this observation does focus attention on the practical necessity of
good tools for generating load, managing test runs, capturing resource
utilization, and analyzing results. In addition, these tools must evolve along
with the changing protocols, runtime environments, and data exchange patterns
of service-oriented solutions.
For many of the TCA studies on B2C Web
applications, such as the Commerce Server study referenced earlier, the
transaction rate is often measured by recording the Requests/Sec
counters provided by IIS and/or ASP.NET. Some of these counters can also be
used for testing ASMX Web services because these Web services are hosted
exclusively in IIS; but, by the time we arrive at WCF hosted services, the
picture changes in important ways. In WCF, HTTP service endpoints are managed
by the intrinsic channel architecture of WCF itself. IIS may not even be used
as the host process. Performance counters exposed by IIS or ASP.NET may be of
little use for WCF. Fortunately for our purposes, WCF exposes a plethora of its
own counters, including most, if not all, of the ones needed for accurate TCA
measurements. Delving further into these details is beyond the scope of this
article, but more information about the WCF counters is available in the WCF documentation.
However, the fact that these counters are
available doesn’t completely solve our problem. On Windows servers, performance
counters are normally monitored remotely over NetBIOS, which is not an Internet-friendly
protocol. Moreover, TCA recommends that these counters be collected on a
computer other than the servers being tested in order to minimize the “observer
effect.” Co-locating a “collection” server with the hosted services would seem
an obvious solution for this dilemma, but this can be difficult in practice
because of the need to stop and start data collection and test runs
simultaneously. Starting and stopping in concert makes it much easier to
isolate and analyze the measurements specific to each test run. There is a
tacit assumption within TCA, that test clones, tested servers, and data
collection points are all readily accessible during testing. In some SaaS
scenarios, such as the one depicted in Figure 8, this may be difficult or
impossible.
At first glance, collecting performance
counters via HTTP seems like a tantalizing way to resolve this issue. But it
would probably result in amplifying the observer effect. Transporting counter
data over the same protocols, ports, and network is likely to add a “tax” onto
some of the same resources being measured. Using Remote Desktop Protocol (RDP)
access hosted on a local collection server is one fairly elegant solution
because native RDP can be tunneled via HTTP and its I/O can be optimized,
permitting coordination while minimizing interference during test runs.
Even with this solution in place,
additional performance monitoring will still be needed for the clone computers
that generate test load. During a TCA, test clones must also be monitored to
ensure that service throughput is not being artificially limited by clones that
have reached their own point of exhaustion. Therefore, some provision is also
necessary to coordinate the basic measurements of test clones.
TCA As an Integral Development Tool
TCA unquestionably yields its greatest
value when it is an integral part of the development process. This point can
hardly be overemphasized. When code is developed with the expectation that it
will be tested with this methodology, it can be fully instrumented to support
accurate testing. Regardless of which monitoring tools are used or how many
native counters and system events are available in the run-time environment,
some additional custom instrumentation is usually needed to supplement them.
One of the misconceptions about TCA is that
testing and analysis should be conducted only after the code complete milestone
has been reached. Much to the contrary, testing is far more effective if it is
done as part of unit testing. With respect to services, each operation can be
tested in isolation as soon as it is functionally complete so that a solid
baseline for utilization and latency measurements can be established. Testing
services under high levels of load as early as possible can alert developers to
issues with scalability and performance and furnish opportunities to
efficiently tune and optimize a specific operation.
Occasionally, early testing will reveal a
fundamental design defect that will cause a specific operation to reach the
point of exhaustion (as we have defined it) even though utilization of key
resources are still at relatively low levels. For example, peak throughput is
reached and/or latency is unacceptably high, but %CPU is 20. Perhaps the most
common cause of this phenomenon is that concurrent threads of execution are
serialized by a shared resource. Dramatic increases in queue lengths are
another indication that serialization is occurring.
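A simple heuristic can flag this pattern in recorded test data. The sketch
below is only one possible check, and the function name, the thresholds, and
the sample measurements are all assumptions made for illustration. It looks for
a load step where throughput has stopped growing while CPU utilization remains
low.

```python
def suspect_serialization(steps, low_cpu_pct=50.0, min_gain=0.05):
    """Flag a throughput plateau that occurs while CPU is still lightly used.

    steps is a list of (offered_load, throughput_tps, cpu_pct) tuples ordered
    by increasing load. Both thresholds are arbitrary illustrative defaults.
    """
    for previous, current in zip(steps, steps[1:]):
        prev_tps, cur_tps, cur_cpu = previous[1], current[1], current[2]
        if prev_tps <= 0:
            continue  # cannot compute a relative gain from zero throughput
        gain = (cur_tps - prev_tps) / prev_tps
        if gain < min_gain and cur_cpu < low_cpu_pct:
            return True
    return False


# Hypothetical runs: throughput flattens near 200 tps at about 20% CPU.
serialized_runs = [(10, 100.0, 8.0), (20, 190.0, 15.0),
                   (40, 200.0, 20.0), (80, 201.0, 20.0)]

# Hypothetical runs: throughput flattens only when CPU is nearly exhausted.
cpu_bound_runs = [(10, 100.0, 20.0), (20, 200.0, 40.0),
                  (40, 390.0, 80.0), (80, 400.0, 97.0)]

if __name__ == "__main__":
    print(suspect_serialization(serialized_runs))
    print(suspect_serialization(cpu_bound_runs))
```

A positive result is only a prompt to investigate; queue lengths and lock
contention counters would be needed to confirm the cause.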
When this phenomenon is detected, it is
vitally important to realize that simply adding hardware resources is not an
effective solution. In this case, poor performance and throughput are caused by
excessive waiting instead of slow execution. As painful as it may be,
discovering and rectifying this problem early in the development cycle is
invaluable; finding and addressing it prior to production will generally pay
handsome dividends and will usually significantly reduce hardware costs. In
some cases, the remedy may not require extensive change, but it may yield great
improvements in scalability, performance, or both.
When TCA is an integral part of the
development process and developers code with testing in mind, the quality of
code often improves as well. Developers write code with a heightened awareness
of the need for efficient sharing of scarce resources and the danger of
serialization—in effect, scalability and performance are in the forefront of
their minds.
Conclusions
There is no question that devising and
executing a successful TCA study takes generous portions of hard work and
ingenuity. Yet most people who have experience with this methodology will tell
you that the results are usually well worth the effort. Regardless of the type
of Internet services that you plan to place under the microscope, there are
always numerous practical obstacles that must be overcome to achieve reliable
numbers. If you are a newcomer to TCA, it is always advisable to seek the help
of someone with real-world experience using this methodology.
To be effective, the tools and techniques
used to implement TCA studies have to be continually adapted to the evolution
of Internet service protocols and technologies. Despite any theoretical
limitations or practical difficulties, TCA continues to offer a valuable tool
for accurately measuring capacity and performance characteristics of these
services, but we must be careful to adapt it to the specific challenges posed
by SaaS architectures, including service integration, composition, and
multi-tenancy.
In light of the fundamental changes in, and
increased diversity of, specialized hardware architectures that have emerged,
we must be on guard against extrapolating estimates from one platform to
another. The diversity of hardware and platform architectures invalidates the
claim that less powerful hardware will always perform linearly at worst.
The unpredictable nature of network latency
on the Internet prevents us from accurately forecasting the average user
response time. Because Internet latency lies at the heart of many SaaS
services, TCA cannot reliably define average response times in these cases,
and service consumers and providers should be cautious about service-level
assurances based on this metric. Nevertheless, TCA can still provide very
important information for determining the minimum achievable latency at a
given level of load, including peak load.
SaaS architectures present both theoretical
and practical challenges, but achieving credible capacity estimates and
performance information using TCA is possible if this methodology is used
within these prescribed limitations. Used in this manner, TCA can yield a
predictive model of capacity that is particularly valuable for multi-tenant
services because a well-formed usage profile can help adjust capacity estimates
without retesting. This model of testing and analysis yields its best results
when used early and as an integral part of the development process.
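To illustrate how a usage profile can adjust a capacity estimate without
retesting, consider the following Python sketch. It assumes that each
operation's CPU cost has already been measured in megacycles and that a
usage profile gives the rate at which one user invokes each operation. The
function name, the operation names, and every number are hypothetical and
serve only to show the arithmetic.

```python
def users_supported(cost_mcycles, ops_per_user_per_sec,
                    available_mcycles_per_sec):
    """Estimate concurrent users from measured costs and a usage profile.

    cost_mcycles: measured CPU cost of each operation, in megacycles.
    ops_per_user_per_sec: how often one user invokes each operation.
    available_mcycles_per_sec: CPU budget, for example processor count
        times clock speed in MHz times the utilization limit allowed.
    """
    per_user = sum(cost_mcycles[op] * rate
                   for op, rate in ops_per_user_per_sec.items())
    if per_user <= 0:
        raise ValueError("usage profile generates no measurable load")
    return available_mcycles_per_sec / per_user


# Measured once (made-up values); reused for every profile below.
costs = {"search": 20.0, "checkout": 50.0}

# Two hypothetical tenants whose users behave differently.
tenant_a = {"search": 0.01, "checkout": 0.002}
tenant_b = {"search": 0.02, "checkout": 0.002}

# Assumed budget: 2 processors x 3000 MHz, capped at 80 percent utilization.
budget = 2 * 3000 * 0.8

print(round(users_supported(costs, tenant_a, budget)))
print(round(users_supported(costs, tenant_b, budget)))
```

Only the profile changes between the two estimates; the measured operation
costs are reused as they are. With these made-up inputs, the estimate falls
from roughly 16,000 users to roughly 9,600 when users search twice as often.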
Dedication & Acknowledgements
This article is dedicated to Jim Gray, a
Microsoft Distinguished Engineer and a fellow sailor, whose work and many extraordinary
contributions continue to influence and inspire so many people at Microsoft and
beyond. I got a copy of Jim’s book, Transaction Processing, many years ago, and
I still have occasion to refer to it and find value in it today.
I would also like to acknowledge the
helpful comments, insight and support of Morgan Oslake and Hilal Al-Hilali,
both of whom were members of the original brain trust that developed TCA and
first used it to improve the quality, performance and scalability of various
Microsoft services and server products. As always, I take full credit for any mistakes
or defects that might remain.
References
- “Capacity Model for Internet Transactions”,
Morgan Oslake, Hilal Al-Hilali, David Guimbellot, ©
1999, Microsoft Corporation http://technet.microsoft.com/en-us/commerceserver/bb608755.aspx
- “Commerce Server 2002 Performance and
Transaction Cost Analysis for the Catalog System”, Microsoft TechNet
Commerce Server Tech Center, http://technet.microsoft.com/en-us/commerceserver/bb608745.aspx
© 2008 Microsoft Corporation.
- Improving .NET Application Performance
and Scalability, patterns & practices, © 2004 Microsoft Corporation
- Increased page faulting is more common on
32-bit servers due to the segmented memory architecture. For more detail
about 64-bit Windows, see “Windows Server 2003 x64 Editions Deployment
Scenarios”, Charlie Russel, © 2005 Microsoft Corporation
- Transaction Processing: Concepts and
Techniques, Jim Gray and Andreas Reuter
- “Architecture Strategies for Catching the
Long Tail”, Frederick Chong and Gianpaolo Carraro, © 2006, Microsoft Corporation
- Configuration and Capacity Planning
for Solaris Servers, Brian L. Wong, © 1997 Sun Microsystems, Inc.