SaaS Capacity Planning: Transaction Cost Analysis Revisited
Developer & Platform Evangelism
Summary: The Transaction Cost Analysis (TCA) methodology has proven its value as a tool for capacity planning and system bottleneck detection for conventional Web sites engaged in conventional business-to-consumer (B2C) scenarios. It is possible to adapt TCA to yield similar benefits for software-as-a-service (SaaS) capacity planning. This is a theoretical discussion of how and why this methodology may need to be adapted to help answer some pressing architectural questions that will inevitably confront organizations seeking to build, run, consume, or monetize SaaS. This article also considers some of the theoretical and practical limitations of TCA that must be addressed to accurately forecast SaaS solution capacity. By recognizing and carefully defining these limitations, we can stay within them and resist the temptation to over-conclude from TCA results.
This article offers some guidance and suggestions for adapting the TCA methodology to the specific requirements of capacity planning for SaaS. If properly adapted, TCA can provide an important weapon in the fight for accurate capacity planning and for mitigating some of the risks associated with service-level agreements (SLAs). When TCA was first developed, B2C Web applications were the prevalent model for Internet services. Many aspects of SaaS, such as B2B services and multi-tenancy, were relatively uncommon during this period. Although SaaS offers some compelling advantages, the tools and techniques previously used for driving load and measuring results will need to be modified for testing these emerging service scenarios. To take full advantage of TCA, we will need a few practical and theoretical adaptations to achieve accurate forecasting.
The Origin of TCA
SaaS and Capacity Planning
It Isn’t Easy Being Green
Some Notes About Terminology
Beyond the Point of Exhaustion
Usage Profile, Verification, and Impact Prediction
Characterizing Load: Adapting Usage Profiles for SaaS
SaaS Composite Architecture and TCA Challenges
Performance Targets: Beware the Latent Latency
Not All Computers Are Created Equal
Probing the Engines of Cyberspace
Generating Load: Send in the Clones
Composition Architecture: TCA Over the Internet
A Habit of Highly Effective TCA: Sharpening the Tools
TCA As an Integral Development Tool
Transaction Cost Analysis (TCA) is a fairly rigorous and proven methodology for conducting capacity planning for commercial Web sites. TCA is superior to conventional stress testing for this purpose because it can provide a predictive model of capacity; this means it is capable of accommodating unanticipated growth or change in usage patterns. A predictive model is especially important for the multi-tenant SaaS model because it is likely that different tenants will manifest different usage patterns—even for the same services. In these cases, it will not be practically feasible to conduct a new round of testing to determine how each new tenant on-boarding will impact capacity.
The theoretical foundations of the TCA methodology, not to be confused with the concept of TCA in economics, were worked out at Microsoft by Morgan Oslake, Hilal Al-Hilali, David Guimbellot, Perry Clarke, and David Howell, and were published in 1999 as Capacity Model for Internet Transactions. The methodology has been used to publish capacity planning guidelines for some Microsoft products, including Site Server and Exchange Server; later, it was used for Commerce Server under the title Commerce Server 2002 Performance and Transaction Cost Analysis for the Catalog System. In addition, TCA has been used by many other companies to study and predict capacity, detect bottlenecks, and establish accurate hardware requirements for their own systems.
Although this methodology has been criticized, most of the criticism (and some of the praise) is based on fundamental misunderstandings of TCA and what it can yield. If it can be successfully adapted, it can also be highly useful for capacity planning for SaaS systems. However, reliable results can only be achieved if TCA is used for its intended purpose and within its prescribed limitations.
The idea of software-as-a-service (SaaS) is a highly compelling one. In some respects, SaaS is a logical outgrowth and extension of service-oriented architecture (SOA). To state the case naively, if services are intended to be autonomous, contract-based, interoperable, and accessible through HTTP, there should not be any obstacle to hosting them on the Internet. SaaS, sometimes referred to as "services-in-the-cloud," is essentially the delivery of software in the form of services. SaaS can be thought of as the service-oriented component of the Microsoft Software and Services (S+S) approach. This article focuses on the SaaS component because it poses the greatest challenges for TCA.
The idea of services-in-the-cloud promises some compelling advantages—particularly for the enterprise. Perhaps most importantly, this model promises to help drive down costs. For many companies, the prospect of consuming services in the cloud promises to dramatically reduce the costs associated with hosting them in their own data center. SaaS can also make costs more predictable and easier to budget. The consumer has an opportunity to shift the burden of patching, upgrading, adding new services, and ensuring availability to an off-premise service provider. By annualizing costs through a service contract, business managers can focus their attention on the growth and development of the core business itself.
The SaaS model also holds great promise for service providers. Although distributed computing has improved the accessibility of information at the edge of the network, it has often done so at the cost of increasing system complexity. As a rule, complexity drives specialization, and IT is no exception. However, the SaaS model makes it economically feasible for service providers to concentrate highly specialized expertise—this makes the complexity of services more manageable. By using SaaS, providers can amortize the cost of hardware, software, and business expertise, as well as other resources, across multiple service consumers. In particular, the multi-tenancy hosting model makes it possible to create business efficiencies by sharing service resources among multiple consumers. SaaS offers economies of scale that are hard to come by in more traditional, self-hosted models of computing.
The case for SaaS is compelling, indeed, but whether you plan to build, run, consume, or monetize SaaS, you will face a number of important questions and challenges that can have a decisive impact on your success. The following are some questions you may consider:
· Should you use on-premise hosting or off-premise hosting?
· Should you use a single-tenancy model, multi-tenancy model, or hybrid model?
· How should you plan capacity for shared resources?
· How should you predict resource requirements for tenant on-boarding?
· How can you support or rely on SLA targets?
· How can you maximize performance, scalability, and availability?
· How should you minimize hardware and energy costs in the service center?
· How should you estimate the impact of proposed implementation changes?
Although a TCA study cannot provide complete answers to all these questions, it may prove to be an invaluable tool for quantifying hardware resource requirements, capacity and scalability thresholds, and predicting impact of changing usage patterns. It can also help identify key bottlenecks through pre-production testing and analysis; this helps architects better understand the scalability and performance characteristics of the systems they design before they reach production.
Both consumers and providers have a deeply vested mutual interest in ensuring that SaaS services can meet requirements for scalability, availability, and performance. A serious slowdown or interruption of a service could be extremely damaging for both consumers and providers. Because each service aggregator may be both a consumer and a provider of services, they will have a powerful interest in meeting these requirements from both perspectives. For providers who host using a multi-tenant model, serious damage to the quality of service could have catastrophic consequences: pronounced slowdowns or disruptions will usually impact many, if not all, tenants at once.
A service-level agreement (SLA) can define metrics for the key “abilities”; but taken alone, it can do little to ensure that these metrics can actually be met. Faced with these uncertainties, some service providers will resort to simply supersizing their hardware resources, resulting in excessive cost, poor efficiency, and little or no ability to predict the impact of growth or change in usage patterns. In some cases, supersizing can actually mask key system bottlenecks. Such bottlenecks, if undetected, can result in poor performance or scalability when a threshold is breached—despite the expensive hardware.
In addition to the hardware costs, the supersizing approach tends to accrue excess energy consumption and cost as well. Unfortunately, when faced with the inherent danger of damaging their core business from insufficient capacity, most SaaS hosts will opt to oversize—unless they have a refined understanding of existing work capacity and confidence in their ability to forecast their capacity to meet growing demand for service.
The question must be asked: No matter how energy efficient the service center hardware is, can we declare that it is green, knowing that it is grossly over capacity? By analogy, am I being green if I commute in the most fuel efficient SUV available, when a small or midsize hybrid would easily suffice?
Perhaps efficiency must be gauged against the actual work that must be done. By rightsizing the hardware based on the work capacity needed, we can arrive at a more truly green concept of service center efficiency. To do this, we must be able to carefully measure capacity based on the actual services running in the service center. Fortunately, the business incentive and the ecological incentive are in close alignment here. Rightsizing can reduce service center cost and climate impact at the same time. With global warming hard upon us, rightsizing is something that can be done today while we await further improvements in hardware efficiency. To the extent that a TCA study can provide more accurate capacity planning, it can help save hardware and energy cost.
A detailed description of TCA is beyond the scope of this article, but a brief review will help set the stage for understanding how and why it must be adapted for SaaS scenarios.
Here’s an outline of the basic steps in their original formulation:
1. Select the highest end hardware platform.
2. Simulate load over a linear operating regime to cost each transaction.
3. Interpolate resource costs over this regime.
4. Calculate resource cost as a function of usage profile.
For readers who are new to TCA, How to Perform Capacity Planning for .NET Applications provides a brief primer and a more detailed description of these steps.
The terms associated with TCA are highly overloaded. Some clarification may help to head off some common confusion.
For TCA, the term cost means the amount of system resources that is used during execution of a service; for example, the percentage of the total available CPU cycles used during the execution of a specific operation.
In TCA, a transaction is simply a discrete operation. For a Web site, a page request/response might serve as a transaction. The conventional meaning of an ACID database transaction, involving a two-phase commit, is orthogonal to this notion. A TCA transaction may involve one, many, or no ACID transactions.
In simplest terms, a usage profile is the collection of all significant operations performed by a particular site, along with the relative frequency and cost of each of these operations. Some confusion has arisen around this term because it has been closely associated with the behavior of the “average user.” This is discussed in detail later in this article.
For TCA, capacity is defined as the ability to execute work measured in transactions per second. The capacity of a site is defined by its maximum throughput.
To understand how and why TCA should be adapted for SaaS scenarios, we must begin by understanding its basic theoretical foundations. This discussion will begin by focusing on steps 2 and 3:
2. Simulate load over a linear operating regime to cost each transaction.
3. Interpolate resource costs over this regime.
These two steps comprise the heart of the methodology. Step 2 actually consists of a series of discrete tests for each operation (transaction) type, gradually increasing load for each test. During each test, CPU, disk I/O, memory, and network utilization are measured to determine the resource costs per transaction. The purpose of this is to define the point of exhaustion—where the server has exhausted its capacity to do further useful work.
Figure 1. A typical TCA pattern of exhaustion
With conventional load testing, the point of exhaustion is typically defined when a limiting resource (such as CPU) has reached 100 percent utilization. In contrast, the point of exhaustion for TCA must be defined by the maximum useful work or throughput that can be executed on a single computer. This also means that at least one computer must be measured on each physical tier of the service being tested.
Using Figure 1 as a hypothetical example, the maximum capacity for this computer is reached in iteration 7 at 80 operations (transactions) per second. Notice that we must continue adding load and iterating until we have tested beyond the inflection point, where the transaction rate begins to tail off. If we were to naively extrapolate based on the first four or five test iterations, we might greatly overestimate capacity. On the other hand, if we tested only at maximum utilization (100 percent), we would be measuring well beyond the point of exhaustion, where throughput may be significantly less than the maximum, and would therefore underestimate the computer's capacity.
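To make this concrete, here is a minimal sketch in Python. The throughput numbers are hypothetical, loosely modeled on the Figure 1 example; in a real study they would come from your own test iterations.

```python
# Hypothetical throughput measurements (transactions/sec) for successive
# load-test iterations, loosely modeled on the Figure 1 example.
throughput = [12, 25, 39, 53, 66, 75, 80, 78, 71]

def point_of_exhaustion(tps_per_iteration):
    """Return (iteration, peak_tps): the 1-based test iteration at which
    throughput peaks before it begins to tail off."""
    peak = max(tps_per_iteration)
    return tps_per_iteration.index(peak) + 1, peak

iteration, peak_tps = point_of_exhaustion(throughput)
# With this sample data, throughput peaks at 80 tps in iteration 7.
```

Note that the final two iterations, where throughput declines, are what confirm the peak is genuine; stopping at iteration 5 would invite the naive extrapolation described above.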
Simply quantifying the maximum throughput for a particular computer and a particular operation does not provide a sound basis for extrapolating to new hardware or a changing usage pattern. For that, we need to carefully measure resource utilization and correlate it with peak throughput. This brings us to step 3.
Measuring the Cost
TCA measures the cost of a transaction in terms of utilization of important resources. With notable exceptions, most services tend to be constrained by CPU utilization regardless of whether they are Web pages or Web services. For this reason, measuring CPU is a strict requirement for TCA testing. However, measuring memory, disk I/O, and network usage are also of primary importance for gaining a full and accurate picture of the hardware requirements for each tier of the system. These resources are the “big four,” but each may need to be measured by several specific metrics. For example, it is sometimes valuable to measure %CPU for separate processes that are part of the application. In addition, some metrics can be vital for detecting system bottlenecks when CPU is not the constraining resource. For a system primarily constrained by disk I/O, measuring %disk read and %disk write may indicate the most effective way to optimize and perhaps eliminate the disk bottleneck. For a system where network utilization is a key factor, it may be necessary to measure specific network cards separately so that frontend traffic and backend traffic can be differentiated, while traffic created by collecting performance counters can be ignored.
Assuming a typical case, Figure 2 shows a common CPU utilization pattern for the Service Operation 1 example we started in Figure 1. For a service constrained by CPU, this is a very common pattern of resource utilization. In this pattern, sometimes referred to as a knee curve, utilization grows very slowly at first, reaches an inflection point, and then skyrockets toward 100 percent utilization.
Figure 2. A typical pattern for CPU utilization
A deeper explanation of this utilization pattern is beyond the scope of this article, but many multithreaded, server-side Windows-based applications adhere to it. In some cases, as load begins to escalate, so do operating system–level activities such as (thread) context-switching, CPU consumption, virtual memory management (page faulting), and so on. For this discussion, it must suffice to note that when resource utilization skyrockets, throughput generally declines. Later, we shall see that average operation latency also skyrockets.
Identifying this inflection point is the key to developing an accurate predictive model. The data point of greatest interest in this trend is the one that corresponds to the point of maximum throughput. If we superimpose the throughput trend on the utilization trend, as shown in Figure 3, we can highlight this critical turning point where throughput and utilization become inversely related. By correlating the % CPU utilization at maximum throughput, we can determine a fairly precise and reliable cost per transaction.
In the example, suppose the server being tested has 4 CPUs, each with a 3.5-GHz processor speed. We can compute the theoretical maximum computational cycles available as 4 × 3.5 billion = 14 billion cycles/sec, or 14,000 Mcycles/sec. Figure 3 shows that when the server’s capacity to do useful work reaches the point of exhaustion, 66 percent of the available cycles have been used. We are now in a position to calculate the actual cost per transaction in CPU cycles at the point of peak throughput on this computer:
66% x 14,000 Mcycles/sec / 80 transactions/sec = 115.5 Mcycles/transaction
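The same arithmetic can be expressed as a small, reusable sketch (the function name is ours, not part of the TCA literature):

```python
def cost_per_transaction(cpu_count, clock_mhz, utilization, peak_tps):
    """CPU cost of a single transaction, in Mcycles, measured at the
    point of peak throughput."""
    total_mcycles_per_sec = cpu_count * clock_mhz   # e.g. 4 x 3,500 = 14,000
    return utilization * total_mcycles_per_sec / peak_tps

# The Figure 3 numbers: 4 CPUs at 3.5 GHz, 66% utilization, 80 tps.
cost = cost_per_transaction(cpu_count=4, clock_mhz=3500,
                            utilization=0.66, peak_tps=80)
# cost works out to 115.5 Mcycles/transaction
```

The key point is that both utilization and throughput are taken at the same data point, the point of exhaustion; sampling them at different load levels would make the ratio meaningless.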
Figure 3. Resource Utilization Superimposed on Throughput
Similar calculations can be made to compute the transaction cost in terms of memory, disk I/O, and network utilization for each operation. By repeating a regimen of tests for each service operation (on each physical tier) we can measure these costs for the entire site. Although these other resources are sometimes ignored to streamline testing, it is done at the peril of overlooking existing or potential bottlenecks and critical thresholds in the system. If existing bottlenecks are detected, they can sometimes be eliminated with optimizations. If a potential bottleneck is recognized, it can often be avoided. For example, if it is known that a storage subsystem is operating near its maximum capacity, a faster system may be needed in production to ensure some headroom for growth.
After the cost for each operation is carefully measured in isolation, the next step is to calculate the total capacity of the host site; to do this, we will need to know the proportion of load that each operation represents with respect to the site as a whole. Defining these relative proportions is the purpose of a usage profile, which is discussed next. Before turning to usage profiles, however, it would be negligent to overlook some important conclusions.
This inverse relationship between throughput and utilization that typically occurs beyond the point of maximum throughput (and is illustrated in Figure 3) means that resource utilization is a poor predictor of capacity requirements. If the measured costs for a particular level of throughput are known, they are the most accurate predictors of capacity requirements. At its core, this is what makes TCA such a highly valuable methodology for capacity planning.
To drive this point home, consider predictive analysis as a point of comparison. In predictive analysis, raw resource utilization is monitored and recorded over time. For example, you might record %CPU at peak times over a period of days or weeks and then predict future CPU resource needs by assuming a continued linear growth pattern. With predictive analysis, maximum capacity is assumed to be some percentage of total capacity—for CPU, we might assume 75 percent to leave some headroom as a margin of safety.
Although this approach can provide some rough-and-ready estimates, it is inherently unreliable. Figure 3 shows that resource utilization, though it may grow linearly at first, will grow exponentially after the point of maximum throughput is reached and the capacity to handle additional load has been exhausted. In fact, if a tier of servers is approaching this point, predictive analysis would yield a steep slope, suggesting a growth pattern far greater than what is actually occurring and implying much greater resource requirements than may actually be warranted. Moreover, the assumption that 75 percent CPU utilization will leave some headroom may be completely erroneous. For example, if the maximum throughput for the site is reached at 66 percent, as it is for Service Operation 1, you should be concerned when %CPU is greater than 50 percent, because we know that we are nearing the point of maximum throughput. In short, we know that 66 percent CPU is the tipping point for this service.
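One practical consequence is that a monitoring alert threshold can be derived from the measured tipping point rather than from an arbitrary 75 percent rule of thumb. A sketch, assuming the 66 percent tipping point from the example and an illustrative 25 percent headroom margin:

```python
def alert_threshold(tipping_point, headroom=0.25):
    """Derive a monitoring alert threshold from the measured tipping
    point, preserving a fraction of headroom below it."""
    return tipping_point * (1.0 - headroom)

# Service Operation 1 exhausts at 66% CPU; with 25% headroom the alert
# should fire near 50% utilization, not the conventional 75%.
threshold = alert_threshold(0.66)
```

The headroom fraction here is an assumption for illustration; the appropriate margin depends on how quickly load can grow relative to how quickly capacity can be added.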
A much better approach would be to examine site logs to analyze the growth of actual operation requests (load) and then predict future capacity requirements by extrapolating from the measured costs for these operations. This is essentially what TCA prescribes and precisely what makes it a compelling model for planning. When accurate, reliable forecasts are needed—as they often will be when a SaaS SLA is at stake, for instance—the accuracy of TCA is tough to beat.
However, to achieve truly reliable forecasts of capacity, we must also either know or estimate the frequency with which each service operation will be executed. This brings the discussion to usage profiles and some important adaptations that will be needed for SaaS.
Essentially, a usage profile defines the relative frequency with which each operation is likely to be executed during normal production operation of the site. In practical terms, a verification test must drive load against the site using all operations concurrently, but in the expected proportions defined by the usage profile. In most cases, the profile can be defined as a simple spreadsheet. This article borrows the actual profile from the TCA study published for the Microsoft Commerce Server 2002 catalog system search operations, as shown in Figure 4.
Figure 4. An Abbreviated TCA Usage Profile
The Frequency column defines the proportions of each type of search operation. For example, if we were to construct a verification test for this collection of operations, we would construct a script that executed a Category search three times as often as a free-text search. Notice that only the key factors for each physical tier are shown. Disk I/O is not a factor for the Web server, but it could very well be for the SQL Server database, based on the nature of its role. The theoretical prediction for CPU utilization on the Web server tier (IIS P3MC) would be calculated by averaging the Internet Information Services (IIS) CPU cost in Mcycles, weighted by frequency. If successful, we should expect our measured CPU utilization to closely approximate the estimated utilization when the point of maximum throughput is reached for the entire site.
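The frequency-weighted average can be sketched as follows. The frequencies echo the 3:1 Category/free-text ratio described in the text, but the operation names and per-operation costs are illustrative, not the published Commerce Server figures.

```python
# Hypothetical usage profile: operation -> (relative frequency,
# Web-tier CPU cost in Mcycles per transaction).
profile = {
    "Category search":  (0.60, 110.0),
    "Free-text search": (0.20, 150.0),
    "Product lookup":   (0.20,  90.0),
}

def weighted_cpu_cost(profile):
    """Average per-transaction CPU cost, weighted by operation frequency."""
    total_freq = sum(freq for freq, _ in profile.values())
    return sum(freq * cost for freq, cost in profile.values()) / total_freq

avg_cost = weighted_cpu_cost(profile)   # Mcycles per "average" transaction
```

Dividing the usable Mcycles/sec of a tier by this weighted cost yields the theoretical maximum site throughput that the verification test is meant to validate.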
The main purpose of verification is to validate the estimated maximum throughput with empirical testing. Newcomers to TCA are sometimes dismayed when they discover that a verification run does not closely approximate the theoretical prediction. Typically, when this happens, verification testing tends to produce less throughput than predicted. This sometimes leads to criticism of the methodology and loss of confidence in its accuracy. However, most often, this discrepancy creeps in because of added latency caused by lock contention as separate operations must wait for access to shared resources. This phenomenon, often referred to as the convoy phenomenon, is caused by an implicit queuing effect as operations wait for access. Actually, one of the most underestimated benefits of verification testing is the chance to discover this hidden latency and reduce or eliminate it through various optimization techniques prior to production. Neither conventional stress testing nor predictive analysis can reveal that a site is underperforming its full capacity. Moreover, neither method can help define when optimization efforts have reached the point of diminishing returns.
If verification testing shows a disparity between estimated throughput and measured throughput, TCA provides an opportunity to optimize and rerun the verification tests. If the disparity is not reduced after retesting, it means that optimization efforts were ineffective. If the gap does close, verification testing provides a basis for determining whether further optimizations are warranted considering the law of diminishing returns.
When sufficient verification has been obtained, the usage profile becomes an indispensable baseline for predicting the impact of a variety of potential changes. Occasionally, TCA has been criticized because there is no basis for determining the frequency of each operation when a site has not yet been in production. However, an educated guess can often be made on the basis of activity on similar sites when available. In any case, one of the strengths of TCA is that capacity estimates can be easily adjusted without retesting after initial production patterns are logged. If a change in usage pattern is observed, the usage profile can be changed accordingly and a new estimate produced without retesting. If actual usage patterns differ greatly from those assumed in the usage profile, it may be advisable to rerun the verification test; but it is not necessary to retest individual operations because these costs remain unchanged.
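This property is worth illustrating: once per-operation costs have been measured, a new capacity estimate for a changed usage mix is pure arithmetic, with no retesting. A sketch with hypothetical operation mixes and costs; the tipping-point and hardware figures reuse the earlier 66 percent / 14,000-Mcycles/sec example.

```python
# Per-operation CPU costs (Mcycles/transaction) measured once in isolation.
costs = {"Category search": 110.0, "Free-text search": 150.0}

def site_capacity_tps(frequencies, costs, usable_mcycles_per_sec):
    """Maximum site throughput for a given usage mix, reusing previously
    measured per-operation costs (no retesting required)."""
    total_freq = sum(frequencies.values())
    avg_cost = sum(f * costs[op] for op, f in frequencies.items()) / total_freq
    return usable_mcycles_per_sec / avg_cost

usable = 0.66 * 14_000   # Mcycles/sec usable at the measured tipping point

old_mix = {"Category search": 0.75, "Free-text search": 0.25}
new_mix = {"Category search": 0.40, "Free-text search": 0.60}  # heavier free-text

old_tps = site_capacity_tps(old_mix, costs, usable)
new_tps = site_capacity_tps(new_mix, costs, usable)  # lower: costlier mix
```

The shift toward the more expensive free-text operation reduces the estimated maximum throughput, which is exactly the kind of impact prediction needed when usage patterns drift.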
For a new tenant to be hosted for a multi-tenancy service that is already in production, a reasonably accurate and confident prediction about impact on current capacity can be made with relatively little or no retesting—even if the new tenant has a substantially different usage pattern on the site.
As mentioned at the outset, TCA has been used most often for testing conventional B2C Web sites. In theory, as well as by convention, therefore, capacity has typically been characterized by the number of concurrent user connections. The usage profile has been characterized in terms of the behavior of an “average user” in the course of a typical session. A typical user might be said to browse 10 product pages, search twice, add an item to the cart once, and check out only 0.2 times during a 15-minute session. For applications such as Exchange, it might be important to characterize the size of e-mail messages sent and/or received, as well as their frequency, because these parameters would be important indicators of average network and disk I/O requirements.
However, services-in-the-cloud will be requested by users, computers, or both. Notions such as user behavior, think time, or typical session duration simply do not apply very well. Indeed, some services may not have any “user” connections at all.
Fortunately, these notions are not essential to TCA. Characterizing capacity in a user-centric way has been a convenient fiction that has served to elucidate the methodology of TCA and helped to communicate site capacity in terms traceable to business concepts and requirements for conventional Web sites. These terms can be discarded, in part because a usage profile can be defined adequately without them. User-centric language threatens to obscure the real purpose of a usage profile in TCA. Focusing instead on the transaction rate offers a generalization that can be applied equally well to B2C and B2B scenarios—an important generalization for SaaS.
After the cost and relative frequency of each transaction type is known, the expected transaction rate (the number of expected transactions bounded by time) is sufficient to accurately estimate capacity requirements without resorting to the concept of user behavior and without doing violence to the TCA methodology.
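As a sketch of such a transaction-rate-centric estimate, suppose we reuse the earlier per-transaction cost and server figures; all values here are illustrative, and the function is ours rather than part of the published methodology.

```python
import math

def servers_required(expected_tps, cost_mcycles,
                     server_mcycles_per_sec, tipping_point=0.66):
    """Servers needed to sustain an expected transaction rate, given the
    per-transaction CPU cost and per-server capacity at the tipping point."""
    usable = tipping_point * server_mcycles_per_sec
    per_server_tps = usable / cost_mcycles
    return math.ceil(expected_tps / per_server_tps)

# A forecast of 300 tps against 115.5-Mcycle transactions on
# 14,000-Mcycles/sec servers (80 tps each at the tipping point).
n = servers_required(300, 115.5, 14_000)
```

Notice that no notion of users, sessions, or think time appears anywhere; the expected transaction rate and measured costs are sufficient, which is what makes the approach portable to B2B and machine-to-machine SaaS scenarios.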
In the most general terms, latency encompasses the total amount of time it takes to execute a transaction. If you are buying a pair of new shoes, the total latency would include the time to find a pair you like and try them for size, but it also includes the time spent waiting in the checkout line. One of the common misconceptions about TCA is that it measures the entire latency of a transaction as it is experienced by the user. In fact, TCA only measures the latency from the moment the first byte of the request arrives at the site to the time the last byte of the response leaves the site. For traditional Web sites, whether they are intranet-based or Internet-based, the portion of latency that occurs between the user’s browser and the target site is not measured. In most cases, this latency cannot be predicted or controlled. Consequently, TCA is unable to offer a good estimate of average user response time for Internet services.
Figure 5. Measured Cost and Latency for a Traditional Web Application
In this illustration, which approximates the Commerce Server catalog search scenario tested in the earlier sample usage profile, notice that only those portions of cost and latency incurred on the Web server and database server are actually measured. This constitutes only a portion of the latency that makes up the total response time for a single transaction from the client perspective. To distinguish this predictable portion of latency in this discussion, it will be referred to as service response time. In the traditional Web application, TCA would seem to be capable of measuring average service response time, if not user response time.
Although it is important to understand this limitation, it should not be considered a theoretical defect of the TCA methodology because this latency between the browser and the site does not directly impact the cost of the transaction or the capacity of the site. The fact that the latency incurred during Internet transport cannot be predicted or controlled becomes especially obvious for a typical commercial Internet site. The average bandwidth, even if it could be guessed, does not provide a highly meaningful basis for predicting a specific user’s response time. Moreover, the peaks and valleys of shifting bandwidth utilization on the Internet over time make it virtually impossible to accurately predict user response times even for a single user at a fixed location. If a specific user does experience slow response, there may be little or nothing that can be done about it—short of perhaps recommending that the user upgrade Internet service and/or hardware.
This limitation, though not a defect, presents a difficult challenge for applying TCA to some important SaaS scenarios. To see why, consider for a moment the SaaS model depicted in Figure 6. In a composition architecture, multiple services may be aggregated into a single composite service or Web application. In this architecture, unlike many traditional Web applications, the indeterminacy of the Internet injects itself directly into the heart of the service response time we would like to measure. In either case (composite service or Web application), if any of the aggregated services are hosted in the cloud, service response time becomes unpredictable. For example, in Figure 6, the Web application host cannot accurately predict its own service response times for the same reason that average user response times could not be predicted in traditional Web applications. Looked at another way, the SaaS service provider is in the same predicament with respect to the Web application (service consumer) as the Web application is with respect to the client (application consumer).
Figure 6. Measured Cost and Latency for SaaS Composition Architecture
In the SaaS composition scenario, at least, TCA is incapable of providing a reliable prediction of average service response time—much less average user response time. Therefore, it is unwise to rely on predictions about either. Attempts to estimate user response based on TCA results are really nothing more than guesswork masquerading as rigorous measurement. It is precisely at this juncture that we must resist the temptation to over-conclude from what TCA tells us. Such estimates of response time are inherently unreliable. In fact, considering the vagaries of many private network traffic patterns, response times can be unpredictable even for purely intranet-based scenarios. Defining a service-level assurance for response time is a risky business—especially when two or more service tiers are linked over the Internet.
Note that recognizing this limitation does not invalidate the cost measurements and capacity predictions achieved through TCA. Both the Web application provider and the service provider depicted in Figure 6 can still reliably measure costs and predict capacity. Even network utilization cost can be measured accurately for this purpose.
Latency becomes an especially important issue for services because average response time is frequently a key performance target in SLAs. What our analysis really shows thus far is that response time is inherently unpredictable in composition architectures, especially when the composite services are hosted in the cloud.
However, as we shall see, TCA can furnish extremely important information about minimum latency and how it is affected by load. For example, TCA can tell us when specific response times cannot be met with the available (tested) resources.
The composition scenario depicted in Figure 6 is perhaps the simplest case. As SaaS gathers momentum, more complex scenarios will inevitably arise. For example, it is reasonable to expect that a single frontend service will sometimes aggregate multiple offsite services, magnifying uncertainties and making response times even less predictable.
Such challenges notwithstanding, TCA can provide some valuable and accurate information about performance targets. To see how, we have to better understand the relationship between resource utilization and latency. In simplest terms, as resource utilization increases, the average latency per transaction rises dramatically.
This phenomenon may not amount to an ironclad rule, but in the author's experience, it is an almost universal pattern—at least with respect to the most constraining resource. The underlying principle at work in this pattern is queuing theory, which is as near ironclad as computer science generally gets. Jim Gray and Andreas Reuter noted this relationship between utilization and latency in Transaction Processing and offered the following formula for predicting average response time, where the term p is utilization:
Average_Response_Time(p) = (1 / (1 – p)) * Service_Time
Although the original intent of this formula was to address the performance issues created by queuing, we can harness this observation to draw some of our own conclusions. Let’s assume, for the sake of our discussion, that our optimum Service_Time (when there is no waiting) for Service Operation 1 is 300 milliseconds (ms). If we apply this formula to the utilization pattern shown in Figure 3, we can project the Average_Response_Time pattern shown in Figure 7.
Figure 7. Latency versus Utilization
If, in this case, our target for average user response time is 800 ms at peak capacity, Figure 7 shows with certainty that this target cannot be met with the existing hardware resources—even if the added network latency of the Internet is zero.
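The projection in Figure 7 can be reproduced directly from this formula. The following sketch (Python, purely for illustration) applies the 300 ms Service_Time assumed above and shows why an 800 ms target caps usable utilization at roughly 62.5 percent of the constraining resource:

```python
# Gray/Reuter queuing formula from the text:
# Average_Response_Time(p) = Service_Time / (1 - p).
# The 300 ms figure is the Service_Time assumed in the discussion above.

def average_response_time_ms(utilization, service_time_ms=300.0):
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in [0, 1)")
    return service_time_ms / (1.0 - utilization)

# Project latency at rising utilization of the constraining resource:
for p in (0.25, 0.50, 0.625, 0.75, 0.90):
    print(f"p = {p:5.3f}  ->  {average_response_time_ms(p):6.0f} ms")

# An 800 ms target implies 300 / (1 - p) <= 800, that is, p <= 0.625; once
# utilization passes ~62.5 percent, the target cannot be met on this hardware.
```

Note that the projection says nothing about added Internet transport latency; it bounds only the time spent within the service itself.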
For a SaaS provider, TCA can show that, even though the maximum throughput (transactions/sec) is sufficient to support peak traffic, performance targets may not be met under full load. In short, greater hardware resources or further optimizations may be needed to meet minimum performance targets. For example, if the controlling resource is CPU cycles, more or faster CPUs are required to reduce processing time.
Although we have used a formula to illustrate this point, a theoretical prediction is not necessary. At least for a root SaaS provider, average latency can be measured and recorded during isolation testing. For this reason, measuring latency should be considered essential for TCA when known response targets are involved. Measuring latency can show service developers when performance becomes unacceptably slow. Let's look more closely at why.
In addition to accurate measurements of transaction cost and capacity, TCA may also be able to offer service aggregators some practical information about response time. Although TCA cannot make average Internet latency predictable under any circumstance, it can at least help to estimate the minimum achievable service response time. Conceivably, this could be accomplished by spot testing against the actual service center to be used. For example, such estimates, combined with actual TCA measurements, could help determine whether it is even feasible to achieve reasonable response times when aggregating specific services. This information could prove invaluable when a choice between alternative service providers must be made.
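One simple way to spot-test the latency floor is to take the best of many repeated round trips against the actual service center. The sketch below is illustrative Python; the probe callable is an assumption standing in for a real service call:

```python
import time

def estimate_min_latency_ms(probe, samples=20):
    """Return the minimum observed round-trip time across repeated probes.

    `probe` is any zero-argument callable that performs one request against
    the candidate service. The best of many samples approximates the floor
    below which response times cannot fall, whatever the average turns out
    to be on a given day.
    """
    best = float("inf")
    for _ in range(samples):
        start = time.perf_counter()
        probe()
        best = min(best, (time.perf_counter() - start) * 1000.0)
    return best

# Hypothetical stand-in for a real service invocation:
floor_ms = estimate_min_latency_ms(lambda: time.sleep(0.01), samples=5)
```

Combined with TCA's own measurements, such a floor estimate helps decide whether a composite response-time target is even feasible with a given provider.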
It is interesting to note that the original formulation of TCA calls direct attention to this predictable relationship between latency and utilization. Performance targets were defined both in terms of minimum acceptable throughput and maximum acceptable transaction latency. To achieve accurate estimates of both, careful measurement of latency is a strict requirement of this methodology. This conclusion was strongly implied in the original formulation, and we now draw explicit attention to it.
A maximum acceptable latency should be defined for each operation tested. If this limit is breached during testing, the throughput at that level should be considered the maximum capacity for the available hardware resources, regardless of whether the point of maximum throughput has been reached.
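This rule is straightforward to mechanize. The sketch below (illustrative Python; the sample runs are invented numbers) walks test runs in order of rising load and caps capacity at the first run that breaches the latency limit:

```python
def effective_capacity_tps(test_runs, max_latency_ms):
    """test_runs: (throughput_tps, avg_latency_ms) pairs in rising-load order.

    Per the rule above, the first run that breaches the latency limit fixes
    maximum capacity at that run's throughput, even if raw throughput would
    continue to climb at higher load levels.
    """
    peak = 0.0
    for throughput, latency in test_runs:
        if latency > max_latency_ms:
            return throughput
        peak = max(peak, throughput)
    return peak  # limit never breached: capacity is the observed peak

runs = [(100, 320), (180, 350), (240, 420), (280, 650), (295, 1100)]
print(effective_capacity_tps(runs, max_latency_ms=500))   # capped at 280 tps
print(effective_capacity_tps(runs, max_latency_ms=2000))  # raw peak, 295 tps
```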
During the development of TCA, the theoretical minimum latency was established by carefully measuring the execution time of each phase of an operation. Then a constant was added to this value to account for the minimum acceptable wait time during concurrent execution. If average latency actually measured during testing did not meet this performance target, further optimizations were attempted to reduce execution time, wait time, or both.
Another practical way to establish performance targets for services, especially where SLA targets are at stake, is to assume a reasonable average Internet latency and subtract it from the total acceptable response time. If measured latency exceeds the remaining threshold, either more hardware or further optimization efforts are needed.
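The arithmetic is trivial but worth making explicit. In this sketch (illustrative Python; the figures are invented), the assumed Internet latency is subtracted from the SLA target to yield the budget the service itself must meet in testing:

```python
def service_latency_budget_ms(sla_response_ms, assumed_internet_latency_ms):
    """Response-time budget left for the service once an assumed average
    Internet latency is set aside. A planning figure, not a guarantee."""
    budget = sla_response_ms - assumed_internet_latency_ms
    if budget <= 0:
        raise ValueError("SLA target unachievable under the assumed latency")
    return budget

# e.g. a 1,000 ms SLA target less 250 ms assumed for transport leaves 750 ms;
# measured service latency above that budget calls for hardware or tuning.
print(service_latency_budget_ms(1000, 250))
```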
However, testing composite services will be subject to many of the practical difficulties of conducting TCA-style testing against a remote service host. Potentially, this might mean that such service hosts must offer online test services that closely approximate the performance of actual production services. The more closely that test conditions approximate the anticipated production runtime conditions, the more accurate and reliable the test results will be. At this point, we should turn our attention to a number of other practical considerations, such as the tools for measurement and instrumentation, that are related to conducting TCA testing for services.
TCA originally recommended that services should be tested on the highest performance hardware available. This recommendation has led to one of the most common criticisms of this methodology—perhaps not without reason. The justification for this recommendation was the fact that many applications simply could not scale linearly as more processors were added to a server. Moving from a two-way to a four-way SMP computer generally did not produce twice the capacity. Faced with this fact, it was reasoned that testing on high-end servers would ensure that application throughput would scale linearly at worst on less capable servers. So it was thought that moving from a four-way computer to a two-way computer would guarantee at least half the throughput and still alleviate the need to test on dozens of different hardware configurations to achieve reliable capacity numbers.
This assumption typically held because less powerful hardware often performed super-linearly when compared to its bigger brothers, exhibiting considerably better capacity than these estimates would suggest. Many applications do not scale well vertically (on eight or more processors) because the threading model and locking strategy of the application are suboptimal on these high-end SMP configurations. From the perspective of software vendors, who can neither predict nor dictate the hardware, this approach makes perfect sense. Testing and tuning on the highest performance hardware available helped to ensure that services would perform well at the high end of the spectrum and no worse elsewhere. Unfortunately, this method of estimating also tends to undermine TCA's stated purpose of minimizing hardware costs, because such estimates would tend to result in purchasing more low-end servers than are actually necessary to meet demand.
In accordance with Moore’s Law, computers have gotten vastly more powerful since this recommendation was first made. The most powerful servers available today can be extremely expensive. From a purely practical perspective, it would be nice to know that reasonably accurate capacity planning can be conducted without having to purchase millions of dollars worth of servers for a test bed--especially if you do not intend to use such equipment in production.
Far more importantly, PC architecture has also changed substantially since TCA was developed. As a result, the claim that estimates based on more powerful hardware will scale linearly at worst on less powerful hardware is simply no longer valid—at least when crossing boundaries between fundamentally different hardware and platform architectures. The most obvious example is the availability of 64-bit Windows servers. Beginning with .NET Framework 2.0, it is possible to write services that will run on either 32-bit or 64-bit systems. The flat memory model of 64-bit Windows represents a significant improvement over the 32-bit Windows memory model, providing 16 terabytes of addressable virtual memory space per process versus the maximum 4 gigabytes (GB) previously supported. Often, the 64-bit memory model eliminates memory constraints that would otherwise impinge on a service, along with the excessive CPU consumption caused by page faults. When such fundamental differences in platform architecture are present, more powerful servers may make poor indicators of performance on less powerful servers. Under these circumstances, it is no longer reasonable to expect the latter to perform linearly at worst in all cases. Conversely, it is no longer reasonable to assume that moving up the ladder will result in sub-linear scaling either.
As PC architecture has evolved, other innovations have been introduced, especially for enterprise-class servers, to address issues that typically arise only when many processors and large amounts of physical RAM are available. Hardware diversity is increasing. Non-uniform memory architecture (NUMA) and crossbar bus architecture are both good examples of fundamental evolutionary changes in PC hardware architecture introduced from the mainframe world. Capacity measurements made on one hardware platform can easily be distorted when services are then run on these newer hardware architectures, or vice versa. Although these architectures may seem somewhat exotic compared to commodity x86 hardware, such high-end servers are increasingly commonplace in service centers where Internet scalability is required—especially, though not exclusively, for the data tier. More to the point, they are becoming commonplace in the scenarios specifically targeted by SaaS: multi-tenant services.
Considering the high cost of such servers, the need for accurate, reliable capacity planning may be more urgent than ever. The fact that TCA cannot provide reliable estimates across these architectural boundaries becomes far less important than knowing whether such systems can meet the necessary capacity demands of a specific service, on a specific server, with a specific configuration. TCA is well suited to answer these questions. It can help determine which configuration will produce the optimum service capacity and, therefore, the best return on investment for these key hardware purchases. Most vendors understand the need to measure capacity and performance before buying, and they will often provide access to test servers in a lab environment, or lease arrangements, to support such testing and help minimize initial obstacles to adoption.
The distortion that may occur when attempting to extrapolate TCA capacity and performance estimates across hardware architectures does not negate the value of TCA; far from it. Performing careful capacity planning is actually more urgent than ever on these specialized, high-end platforms. The value of a solid baseline and predictive model of capacity based on usage profiles can be gauged partly in proportion to the cost of these servers.
However, this conclusion means that we cannot blindly extrapolate upward or downward on the hardware power scale. Fundamental differences in hardware and operating system–level architecture must be considered. Here again, TCA remains valuable if we observe this limitation and steadfastly resist the temptation to over-conclude.
For ISVs who are planning to deliver a software solution that implements services, the conclusion we have reached may prove profoundly unsatisfying. If we cannot arrive at capacity estimates for a solution independent from any hardware platform, what good are estimates to the vendor or the prospective buyer? One reasonable approach may be to test the solution on a specific hardware platform and configuration deemed suitable for the solution and publish those estimates along with the hardware profile used for testing—along with a disclaimer that “your mileage may vary” on different hardware. One advantage of TCA is that estimates made on a specific hardware profile with a documented usage profile are highly repeatable. In this sense, TCA produces very real, very reliable numbers. For the buyer who elects to buy and deploy this solution on the recommended hardware, estimates should be golden. If the buyer decides to buy a different array of hardware, a “trust, but verify” approach may be in order. In other words, the buyer should consider re-testing on the target hardware.
At this point, we should state the expectation that many SaaS services will be mega-services, requiring high capacity and scalability. As implied earlier, this seems particularly likely for those designed for multi-tenancy. If one surveys the B2C success stories of the "Web 1.0" world, these characteristics are hallmarks. It is reasonable to assume that high throughput and very high scale will also be needed for multi-tenant services in the cloud. For service hosts in the "Web 2.0" world, there will be strong economic drivers for deploying services aimed at the long tail of small and medium-size businesses as well as consumer-facing applications that leverage SaaS on the back end. Frederick Chong and Gianpaolo Carraro have written an article titled Architecture Strategies for Catching the Long Tail, which makes a compelling case that economies of scale will be a key driver of SaaS architecture. Notice, of course, that it is the service hosts who are able to meet the high-scale demands of the long tail that tend to become the colossi of this brave new world—the Monsters and Amazons of services.
For our discussion, this raises a number of important practical questions about testing at these great heights. How can we efficiently and cost-effectively generate this much load? How can we record, aggregate, and analyze all the performance data necessary to ensure accurate measurement and planning of these services? Are the necessary tools for scripting, driving, and coordinating very high levels of load ready to hand? Are these tools suitable for testing SOAP-based Web service requests, which are fundamentally different in many ways from traditional Web page requests? And perhaps most importantly, how can we reliably test when aggregated services are hosted in a remote service center? Some practical strategy is clearly needed for conducting accurate TCA testing over the Internet.
Generating load in sufficient quantity to push services designed for massive capacity beyond their point of exhaustion is no easy feat. Many clones are needed to exert such loads. A tool is needed to orchestrate testing and ensure that all test clones are using the same test scripts, uniformly generating load, starting and stopping in unison, and gradually incrementing load for each test run. In addition, a realistic test scenario usually necessitates juggling separate security contexts, parameter data, and possibly cookies.
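The orchestration requirements just described can be sketched in miniature. In this illustrative Python fragment (the names and the fake `send_request` workload are assumptions, not part of any real tool), clones share one script, start in unison behind a barrier, and load is stepped up between runs:

```python
import threading
import time

def send_request():
    time.sleep(0.001)  # stand-in for one scripted service request

def clone(barrier, stop, counter, lock):
    barrier.wait()                  # all clones begin the run together
    while not stop.is_set():
        send_request()
        with lock:
            counter[0] += 1         # shared tally of completed requests

def run_load_step(n_clones, duration_s):
    barrier = threading.Barrier(n_clones + 1)
    stop, lock, counter = threading.Event(), threading.Lock(), [0]
    clones = [threading.Thread(target=clone,
                               args=(barrier, stop, counter, lock))
              for _ in range(n_clones)]
    for t in clones:
        t.start()
    barrier.wait()                  # release every clone at the same instant
    time.sleep(duration_s)
    stop.set()                      # stop all clones in unison
    for t in clones:
        t.join()
    return counter[0] / duration_s  # observed requests/sec at this load step

for n in (1, 2, 4):                 # gradually incrementing load per test run
    print(f"{n} clone(s) -> {run_load_step(n, 0.25):.0f} req/s")
```

A real test harness adds what this sketch omits: per-clone security contexts, parameter data, cookies, and result capture.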
Various tools capable of orchestrating test clones in this way have evolved for testing Web applications. Early studies often used the Web Application Stress Tool (WAST) because it was freely available. Later, Application Center Test (ACT) provided a more flexible and scriptable tool for this purpose. Other very capable, but sometimes quite expensive, tools were also readily available. Most of these tools installed a lightweight HTTP proxy service on each clone. In addition to coordinating with its peers, the proxy service could generate the GET, PUT, and POST requests needed to drive load on a Web application.
Another phase of evolution was needed to produce a tool capable of driving load using the more complex vocabulary of Web services. To do this, a SOAP proxy (generated from the WSDL) for each service operation must be created, scripted, and then deployed and executed on each test clone. Beginning with Visual Studio 2005, Visual Studio Team System provides a built-in load test capability specifically designed to support load testing against Web services at very high scale. This tool has undergone further improvements for testing Windows Communication Foundation (WCF) services in Visual Studio 2008.
The integration of load testing directly into the development environment makes it possible to incorporate spot testing against an established TCA baseline; this means developers can quickly ascertain the performance and capacity impact of a proposed code change. Used in this way, TCA can also help streamline optimization efforts and indicate when the point of diminishing returns is reached. TCA tends to produce the greatest value when it is treated as an integral part of application life cycle management (ALM).
If we put theoretical difficulties of predicting response time aside for the moment, composition and integration of services still pose some interesting practical problems for TCA. Let’s imagine a scenario like the one depicted in Figure 8, where a small or medium business named Contoso is a consumer of a service hosted at A Datum. Let’s also suppose that A Datum integrates its own, on-premise assets with those provided by another services host called B Datum. If all is well, the end-user at Contoso is blissfully unaware of the engines of cyberspace that are performing these services. But remember, Contoso is likely to demand an SLA. For A Datum, the challenge will be to conduct a TCA for these services so that it can forge an SLA with Contoso that both parties can live with. But how, in concrete practical terms can such testing be conducted?
Figure 8. A SaaS Scenario for TCA Over the Internet
What makes this different from many traditional TCA scenarios is that the assets to be tested may be owned by several different business entities, each with their own agenda and set of concerns. Let’s consider a few of the alternative strategies as well as some of the obvious pros and cons of each.
1. A Datum could conduct a TCA study against B Datum’s production services.
At first blush, this approach seems very risky for B Datum because it must expose its resources to high loads solely for the purpose of allowing a single tenant to test. Moreover, for multi-tenant services, testing in isolation from production loads may be impossible or highly inaccurate. However, for the single-tenancy model, this option may be feasible if A Datum is allowed to test directly against the unused infrastructure it will ultimately use in production. It might also be usable for the very first client on-boarded in a multi-tenant scenario.
2. B Datum constructs and maintains test services parallel to its production services and A Datum conducts a TCA study using the test services.
This option could be cost prohibitive for B Datum because it must operate and maintain a full, parallel service center for test purposes only. Even this option can produce somewhat misleading results if testing is done in isolation from the load that would be created by other tenants in a multi-tenancy scenario. Another drawback of this approach is that it would tend to serialize on-boarding for B Datum based on access to the test bed. Although it is always advisable to test under realistic conditions, this approach will also be subject to the limitations mentioned earlier about testing over the Internet. A Datum should not trust the average service response time figures.
3. B Datum conducts a careful and complete TCA study including test clones that access the service via the Internet. A Datum relies on these estimates and recalculates based on its own expected usage profile.
This may turn out to be the most practical option for both parties in many situations. However, the problem for A Datum is that this approach may not provide any useful estimates of its own hardware requirements. For example, in our hypothetical scenario, A Datum must do some form of testing to determine the transaction cost of acquiring and serving its own data. Savvy customers may demand to know these results.
4. B Datum publishes a complete TCA study. A Datum conducts its own TCA study using the B Datum transaction cost estimates to emulate service latency and network utilization.
This is a highly practical option for many situations and A Datum can develop capacity estimates for its own services in conjunction with those of B Datum. However, one important drawback of this strategy is that A Datum must spend the time and effort to implement stub services that accurately emulate B Datum’s services (based on B Datum’s TCA estimates). Occasionally, this may also require changes to consumer (A Datum’s) code. A notable advantage of this strategy in multi-tenant scenarios is that B Datum’s estimates will hopefully reflect full multi-tenant loads. A Datum is required to trust the accuracy of B Datum’s published estimates instead of testing against a live service. In this scenario, TCA can function as a de facto standard.
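A minimal sketch of such a stub, in illustrative Python (the 120 ms figure and the jitter are hypothetical stand-ins for B Datum's published TCA estimates), might look like this:

```python
import random
import time

class StubServiceOperation:
    """Emulates a remote operation's latency from published TCA estimates,
    so A Datum can load-test its own tier without calling the live service."""

    def __init__(self, published_avg_ms, jitter_ms=0.0):
        self.published_avg_ms = published_avg_ms
        self.jitter_ms = jitter_ms

    def invoke(self, payload):
        # Sleep for the published average latency, plus or minus some jitter.
        delay_ms = self.published_avg_ms + random.uniform(-self.jitter_ms,
                                                          self.jitter_ms)
        time.sleep(max(delay_ms, 0.0) / 1000.0)
        return {"status": "ok", "echo": payload}  # canned response, no real work

# Hypothetical numbers taken from B Datum's published study:
lookup = StubServiceOperation(published_avg_ms=120, jitter_ms=30)
# A Datum's load tests call lookup.invoke(...) in place of the live service.
```

The fidelity of the whole exercise rests on how accurately the published figures reflect real multi-tenant load, which is precisely the trust relationship the strategy describes.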
Clearly, there is no perfect or universal strategy for TCA testing over the Internet under realistic conditions. Regardless of which avenue is chosen, some practical compromises will be necessary for conducting a study with composite services. Note that other forms of load testing are susceptible to many of these same practical obstacles, but do not hold the promise of yielding reliable capacity planning or a predictive model. Conducting a full-blown TCA study may not be cost-justified in every case. But when truly high capacity is needed, as it will be for many multi-tenant services, the cost of a careful and rigorous study can pay for itself many times over.
Of course, doing nothing is also always an option; but bear in mind that an SLA without a solid foundation in fact may prove to be a castle in the sand. Consumers who are planning to make a big bet on services-in-the-cloud would be wise to insist on full transparency about this information. Any service provider that has done due diligence should be able to provide these estimates and inform new tenants about how much of existing capacity will be absorbed at the time of on-boarding.
TCA has occasionally been criticized as a Windows-only methodology. There is no real theoretical basis for this charge. The general concepts and underlying principles that form the foundation of this methodology are not specific to Windows or any other platform. Ironically, one of the original references cited as helpful for understanding TCA was a book about capacity planning for Sun Solaris by Brian Wong that is still in print today. Not surprisingly, therefore, the basis for this claim is usually anecdotal. For example, the tools needed for generating load and measuring results all seem to be Windows-based. Or, the key counters seem to be Windows performance counters, WMI statistics, and so on. There is nothing strange about this, because the TCA methodology was developed at Microsoft and has been most widely, if not exclusively, used for Windows-based applications. However, this observation does focus attention on the practical necessity of good tools for generating load, managing test runs, capturing resource utilization, and analyzing results. In addition, these tools must evolve along with the changing protocols, runtime environments, and data exchange patterns of service-oriented solutions.
For many of the TCA studies on B2C Web applications, such as the Commerce Server study referenced earlier, the transaction rate is often measured by recording the Requests/Sec counters provided by IIS and/or ASP.NET. Some of these counters can also be used for testing ASMX Web services because those Web services are hosted exclusively in IIS; but by the time we arrive at WCF-hosted services, the picture changes in important ways. In WCF, HTTP service endpoints are managed by the intrinsic channel architecture of WCF itself. IIS may not even be used as the host process. Performance counters exposed by IIS or ASP.NET may be of little use for WCF. Fortunately for our purposes, WCF exposes a plethora of its own counters, including most, if not all, of the ones needed for accurate TCA measurements. Delving further into these details is beyond the scope of this article, but more information is available in the WCF performance counters documentation.
However, the fact that these counters are available doesn't completely solve our problem. On Windows servers, performance counters can be monitored remotely only over protocols, such as RPC, that are not Internet-friendly. Moreover, TCA recommends that these counters be collected on a computer other than the servers being tested in order to minimize the "observer effect." Co-locating a "collection" server with the hosted services would seem an obvious solution to this dilemma, but this can be difficult in practice because of the need to stop and start data collection and test runs simultaneously. Starting and stopping in concert makes it much easier to isolate and analyze the measurements specific to each test run. There is a tacit assumption within TCA that test clones, tested servers, and data collection points are all readily accessible during testing. In some SaaS scenarios, such as the one depicted in Figure 8, this may be difficult or impossible.
At first glance, collecting performance counters via HTTP seems like a tantalizing way to resolve this issue, but it would probably amplify the observer effect: transporting counter data over the same protocols, ports, and network is likely to add a "tax" onto some of the same resources being measured. Using Remote Desktop Protocol (RDP) access to a local collection server is one fairly elegant solution, because native RDP can be tunneled via HTTP and its I/O can be optimized, permitting coordination while minimizing interference during test runs.
Even with this solution in place, additional performance monitoring will still be needed for the clone computers that generate test load. During a TCA, test clones must also be monitored to ensure that service throughput is not being artificially limited by clones that have reached their own point of exhaustion. Therefore, some provision is also necessary to coordinate the basic measurements of test clones.
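A simple post-run check can flag clones whose own CPU utilization approached exhaustion, and whose runs should therefore be treated as suspect. This is illustrative Python with invented sample data:

```python
def clones_near_exhaustion(clone_cpu_samples, threshold_pct=80.0):
    """clone_cpu_samples maps clone name -> list of %CPU samples for one run.

    Any clone that peaked at or above the threshold may have throttled the
    load it generated, artificially capping the measured service throughput.
    """
    return sorted(name for name, samples in clone_cpu_samples.items()
                  if max(samples) >= threshold_pct)

samples = {
    "clone-01": [35, 42, 40],
    "clone-02": [88, 91, 79],  # this clone neared its own point of exhaustion
    "clone-03": [55, 60, 58],
}
print(clones_near_exhaustion(samples))  # load figures from flagged clones are suspect
```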
TCA unquestionably yields its greatest value when it is an integral part of the development process. This point can hardly be overemphasized. When code is developed with the expectation that it will be tested with this methodology, it can be fully instrumented to support accurate testing. Regardless of which monitoring tools are used or how many native counters and system events are available in the run-time environment, some additional custom instrumentation is usually needed to supplement them.
One of the misconceptions about TCA is that testing and analysis should be conducted only after the code complete milestone has been reached. Much to the contrary, testing is far more effective if it is done as part of unit testing. With respect to services, each operation can be tested in isolation as soon as it is functionally complete so that a solid baseline for utilization and latency measurements can be established. Testing services under high levels of load as early as possible can alert developers to issues with scalability and performance and furnish opportunities to efficiently tune and optimize a specific operation.
Occasionally, early testing will reveal a fundamental design defect that causes a specific operation to reach the point of exhaustion (as we have defined it) even though utilization of key resources is still at relatively low levels. For example, peak throughput is reached and/or latency is unacceptably high, but CPU utilization is only 20 percent. Perhaps the most common cause of this phenomenon is that concurrent threads of execution are serialized by a shared resource. Dramatic increases in queue lengths are another indication that serialization is occurring.
When this phenomenon is detected, it is vitally important to realize that simply adding hardware resources is not an effective solution. In this case, poor performance and throughput are caused by excessive waiting instead of slow execution. As painful as it may be, discovering and rectifying this problem early in the development cycle is invaluable; finding and addressing it prior to production will generally pay handsome dividends and will usually significantly reduce hardware costs. In some cases, the remedy may not require extensive change, but it may yield great improvements in scalability, performance, or both.
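The symptom described above can be reproduced in miniature. In this illustrative Python sketch, every request must pass through one shared lock; once the lock saturates, adding workers no longer raises throughput, even though the CPU remains mostly idle:

```python
import threading
import time

LOCK = threading.Lock()
HOLD_S = 0.005                      # time each request holds the shared resource

def request():
    with LOCK:                      # the serializing shared resource
        time.sleep(HOLD_S)          # "work" performed while holding it

def throughput(workers, requests_per_worker=20):
    def worker():
        for _ in range(requests_per_worker):
            request()
    start = time.perf_counter()
    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return workers * requests_per_worker / (time.perf_counter() - start)

for w in (1, 4, 16):
    print(f"{w:2d} workers -> {throughput(w):5.0f} req/s")
# Throughput plateaus near 1/HOLD_S regardless of worker count, while the CPU
# sits idle in the lock queue: the serialization signature described above.
```

Adding hardware to such a system buys nothing; shortening or eliminating the critical section is the only real fix.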
When TCA is an integral part of the development process and developers code with testing in mind, the quality of code often improves as well. Developers write code with a heightened awareness of the need for efficient sharing of scarce resources and the danger of serialization—in effect, scalability and performance are in the forefront of their minds.
There is no question that devising and executing a successful TCA study takes generous portions of hard work and ingenuity. Yet most people who have experience with this methodology will tell you that the results are usually well worth the effort. Regardless of the type of Internet services that you plan to place under the microscope, there are always numerous practical obstacles that must be overcome to achieve reliable numbers. If you are a newcomer to TCA, it is always advisable to seek the help of someone with real world experience using this methodology.
To be effective, the tools and techniques used to implement TCA studies must be continually adapted to the evolution of Internet service protocols and technologies. Despite any theoretical limitations or practical difficulties, TCA continues to offer a valuable tool for accurately measuring the capacity and performance characteristics of these services, but we must be careful to adapt it to the specific challenges posed by SaaS architectures, including service integration, composition, and multi-tenancy.
In light of the fundamental changes and increased diversity of specialized hardware architectures that have emerged, we must be on guard against extrapolating estimates from one platform to another. The diversity of hardware and platform architectures invalidates the assumption that performance on less powerful hardware will degrade linearly at worst.
The unpredictable nature of network latency on the Internet prevents us from accurately forecasting average user response time. Because Internet latency lies at the heart of many SaaS services, TCA cannot reliably define average response times in these cases, and service consumers and providers alike should be cautious about service-level assurances based on this metric. Nevertheless, TCA can still provide very important information for determining the minimum achievable latency at a given level of load.
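The "minimum achievable latency at a given level of load" can be bounded with Little's Law, which relates concurrent requests (L), throughput (λ), and residence time (W) as L = λW. The sketch below is a hypothetical illustration with invented numbers; it estimates the server-side latency floor, on top of which the unpredictable WAN latency is added:

```python
# Hypothetical sketch: Little's Law (L = lambda * W) relates in-flight
# requests, throughput, and residence time. From measured peak throughput
# and concurrency, the server-side latency floor can be estimated even
# though Internet latency on top of it cannot.

def min_latency_seconds(concurrent_requests: float, throughput_rps: float) -> float:
    """W = L / lambda: average time each request spends in the system."""
    return concurrent_requests / throughput_rps

# 200 requests in flight at a sustained 400 requests/second:
print(min_latency_seconds(200, 400))  # 0.5 seconds at the server, before WAN delay
```

This is exactly the kind of figure a provider can defend in an SLA discussion: a floor measured under controlled load, rather than an average that depends on network conditions outside anyone's control.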
SaaS architectures present both theoretical and practical challenges, but achieving credible capacity estimates and performance information using TCA is possible if this methodology is used within these prescribed limitations. Used in this manner, TCA can yield a predictive model of capacity that is particularly valuable for multi-tenant services because a well-formed usage profile can help adjust capacity estimates without retesting. This model of testing and analysis yields its best results when used early and as an integral part of the development process.
Dedication & Acknowledgements
This article is dedicated to Jim Gray, a Microsoft Distinguished Engineer and a fellow sailor, whose work and many extraordinary contributions continue to influence and inspire so many people at Microsoft and beyond. I got a copy of Jim’s book, Transaction Processing, many years ago, and I still have occasion to refer to it and find value in it today.
I would also like to acknowledge the helpful comments, insight, and support of Morgan Oslake and Hilal Al-Hilali, both of whom were members of the original brain trust that developed TCA and first used it to improve the quality, performance, and scalability of various Microsoft services and server products. As always, I take full credit for any mistakes or defects that might remain.
References
- “Capacity Model for Internet Transactions”, Morgan Oslake, Hilal Al-Hilali, and David Guimbellot, © 1999 Microsoft Corporation, http://technet.microsoft.com/en-us/commerceserver/bb608755.aspx
- “Commerce Server 2002 Performance and Transaction Cost Analysis for the Catalog System”, Microsoft TechNet Commerce Server Tech Center, © 2008 Microsoft Corporation, http://technet.microsoft.com/en-us/commerceserver/bb608745.aspx
- Improving .NET Application Performance and Scalability, patterns & practices, © 2004 Microsoft Corporation
- “Windows Server 2003 x64 Editions Deployment Scenarios”, Charlie Russel, © 2005 Microsoft Corporation (increased page faulting is more common on 32-bit servers because of their segmented memory architecture)
- Transaction Processing: Concepts and Techniques, Jim Gray and Andreas Reuter
- “Architecture Strategies for Catching the Long Tail”, Frederick Chong and Gianpaolo Carraro, © 2006 Microsoft Corporation
- Configuration and Capacity Planning for Solaris Servers, Brian L. Wong, © 1997 Sun Microsystems, Inc.