by Dan Rogers and Ulrich Homann
Summary: Energy is an increasingly scarce and expensive resource. This reality will continue to have a profound effect on how IT solutions are designed, deployed, and used, particularly at the data center level. While virtualization and other power-saving technologies may go part of the way to solving the problem, virtualizing inherently inefficient applications has obvious limits.
This paper presents ideas for how to design power-efficient applications. Reducing the resources required to get work done will soon not only save companies money by reducing the need to build new data centers, it will be an absolute necessity for continued growth.
Theory of Constraints
Challenges and Realities
Which Constraints to Optimize?
A Note on Power Cost Models
Scale Unit-Based Architecture and Design
Microsoft’s Investment in System Center
Companies around the world are either already facing, or are close to reaching, hard limits in the amount of power they can afford or are allowed to consume. In Figure 1, we see that server management and administration costs appear to consume the largest part of data center costs, and we would think of them as the limiting factors in data center growth. However, if you cap the power spend, it draws a box around the growth chart, with power becoming the limiting factor — and that is exactly what the survey data on the right of the figure shows.
Figure 1. Data Center Costs
With world governments looking at energy and the U.S. Congress mulling capping data center expenditures as a part of gross national power, available power and cooling are indeed key constraints to future growth. To deal with these constraints, organizations are attempting to limit or reduce the amount of energy they use. However, today’s most utilized approaches, which focus primarily on infrastructure optimization, may be too narrow to deal with the power challenges of tomorrow. What is required are methods of optimizing infrastructure usage that run across the entire ecosystem, spanning the disciplines of application architecture and design, data center management, and IT operations.
A useful starting point is Dr. Goldratt’s Theory of Constraints (TOC), familiar to many readers. Essentially, TOC stipulates that identifying and managing system-level constraints (or interdependencies) is a more effective route to optimization than trying to optimize each subsystem individually. Originating in factory production management, TOC has been extended to general application and serves us well as an introduction to green IT. The most relevant principle of TOC here is throughput optimization, which assumes that you achieve maximum throughput by controlling system bottlenecks — that is, by arranging bottlenecks in the system so that they maximize the possible throughput of the available resources. In the case of IT, the constraints are infrastructure components — servers, images, virtual machines, memory, bandwidth, and so on — that control the speed with which valuable business output can be produced.
Many well-respected and proven application and solution design patterns and behaviors in use today conflict with today’s constraints, and that conflict will only sharpen as time goes on.
Synchronicity is dead. Today’s application developers and architects came of age in a world where resources were typically available whenever they were needed. Synchronicity of needs and resources has been a core staple of our trade since we left the mainframe. As power capacity becomes increasingly scarce, governments have begun discussing hard limits on how much energy data centers can consume. This indicates that in the future developers will be forced to expend significant additional time optimizing application power consumption, simply because this costs less than exceeding the limits on consumption.
Envisioned application usage success as a design tenet. Most solutions today are designed around a hypothetical end state rather than as an organic system that grows to meet demand. For example, a company with 150,000 users may deploy a SharePoint solution with 50 GB of disk space for each user. The solution design, capacity, and deployment plan assume “success”; that is, they assume that all 150,000 users will actually collaborate and use their 50 GB of disk space immediately.
Figure 2 shows a typical load profile for a fictitious usage of a deployed application/solution with load ramping up over two years. Extra capacity may be reassuring, but it is typically wasteful: Resources — power, storage, and so forth — are being allocated for the ultimate usage profile on Day One of the solution deployment rather than on a just-in-time basis that conserves precious resources and saves costs.
Figure 2. Load profile for a deployed solution
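To make the cost of Day One provisioning concrete, the following minimal Python sketch compares the two strategies over a two-year ramp. The linear growth curve, the capacity of 2,000 users per server, and the function name `server_months` are illustrative assumptions, not figures from a real deployment.

```python
import math

def server_months(monthly_demand, unit_capacity, just_in_time):
    """Total server-months powered on over the planning horizon."""
    # Day One provisioning sizes the deployment for the eventual peak.
    peak_units = math.ceil(max(monthly_demand) / unit_capacity)
    total = 0
    for demand in monthly_demand:
        if just_in_time:
            # Just-in-time provisioning runs only what this month needs.
            total += max(1, math.ceil(demand / unit_capacity))
        else:
            total += peak_units
    return total

# Hypothetical ramp: 1,000 users in month 1, growing to 24,000 in month 24.
demand = [1000 * month for month in range(1, 25)]

day_one = server_months(demand, unit_capacity=2000, just_in_time=False)
jit = server_months(demand, unit_capacity=2000, just_in_time=True)
print(day_one, jit)  # 288 156
```

Under these assumptions, provisioning for the end state on Day One keeps almost twice as many server-months powered on as growing with demand.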
What is required? And when is it required? In the mid-’90s, Microsoft IT used to run an HR application with a separate payroll solution that required a substantial number of servers. Those servers ran for only two days per month and sat idle the rest of the time. Frugal managers pointed this out, but the IT staff didn’t feel comfortable reusing the systems for other tasks. Given that everyone likes getting paid, the IT folks won every time. Today, this has changed. The first response is no longer, “Can we add another computer or data center?” Instead, the approach is, “Can we design it to use fewer resources?” To do that, fluctuations in capacity demands need to be well understood; that is, one needs to ask, “What is required when?”
Design by seat of your pants (SOYP). Planning by SOYP is a sure way to introduce inefficiency into a system. Yet even major installations are often designed today using what is fundamentally guesswork based on past experience and vague capacity data. The combination of input factors generally leads to a solution design that is targeted at peak load plus n% [20% < n < 50%]. The same algorithm used to plan project timelines gets applied to capacity because nobody wants to be the one that messes up the project by underestimating.
‘Belt-and-Suspenders’ solution design approach. Early client/server books taught us to be over-cautious. Because our system solutions were not quite as robust as needed, we therefore went to great lengths to avoid risk, embarrassment, or exposure: If the belt should break, the suspenders will keep the pants on. In the application design arena, we devised application and infrastructure partners with multiple recovery solutions for every possible failure. The phrase “single point of failure” became the impetus for countless spending sprees: three Web servers where two would do, two load balancers because the one might just smoke someday, two databases on clustered, redundant storage area networks (SANs), and so on.
The need for recovery and robustness solutions doesn’t go away, but needs to be balanced with environmental objectives of the organization. As power use grows in importance, application architects will be required to pay more attention to balanced designs — we can’t count on virtualization or hardware advancements to keep the focus off of our discipline. Everyone will have to pitch in for more energy-efficient IT.
With the law of synchronicity challenged by power constraints, we have to ask what optimizations will allow us to provide desired services while running applications and systems at ever-lower cost. As Figure 3 shows, the levers are energy spend, service units available, and service units consumed.
Figure 3. Servers use vital resources whether on or off.
The way to read this figure is to recognize that it is a four-dimensional graph. (In this rendering, the cost unit is the data center. If your scale is based on smaller units, adjust appropriately.) The small blocks represent scale units, meaning that they represent the fixed capacity gained as a result of a fixed amount of capital spent. You cannot add less than one unit, although you could vary the size of the units added. The lesson is that capacity does not move up in smooth fashion to match demand but rather in graduated jumps, leaving a gap between service units consumed and energy spend. The overall spend is your key optimization factor. You need to “run them full,” the corollary to the IT department’s mantra of “turn them off.” To maximize energy efficiency, more work must be accomplished without simply adding units.
Power is paid for differently based on how much you use. Smaller customers pay for actual power used. They will typically try to reduce the ratio of power used to services rendered, and also to shut down non-required services. Large power consumers commit to a certain level of consumption at a given rate for a given time period. They will try to reduce the amount of power for which they contract or try to increase the number of services they can provide based on the power they have purchased. They will not typically think first of shutting off resources since they do not get any immediate benefit from it.
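The difference between the two purchasing models can be sketched in a few lines of Python. All rates, volumes, and function names here are hypothetical, and the overage rate in particular is an assumption; real power contracts vary.

```python
def metered_cost(kwh_used, cents_per_kwh):
    """Small customer: pays only for the energy actually used."""
    return kwh_used * cents_per_kwh

def committed_cost(kwh_used, kwh_committed, cents_per_kwh, overage_cents_per_kwh):
    """Large customer: pays for the committed block even if part of it goes unused.
    The overage rate is an assumption made for this sketch."""
    overage = max(0, kwh_used - kwh_committed)
    return kwh_committed * cents_per_kwh + overage * overage_cents_per_kwh

def cost_per_service_unit(total_cost, service_units):
    return total_cost / service_units

# Shutting down servers (10,000 kWh -> 8,000 kWh) helps the metered customer...
metered_before = metered_cost(10_000, 10)
metered_after = metered_cost(8_000, 10)

# ...but leaves the committed customer's bill unchanged.
committed_before = committed_cost(10_000, 10_000, 8, 12)
committed_after = committed_cost(8_000, 10_000, 8, 12)

# The committed customer gains by delivering more services on the same contract.
per_unit_before = cost_per_service_unit(committed_before, 400)
per_unit_after = cost_per_service_unit(committed_before, 500)
```

In this sketch the metered bill falls with usage, while the committed bill only improves per unit of service delivered, which is why the two kinds of customer optimize differently.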
Now that we understand what drives optimization and what the application architecture and design-specific challenges are that stand in the way of this optimization, it is time to look at things we can actually do to effect energy efficient design.
Without the ability to gather and analyze data on the behavior of systems and their components, optimization is guesswork. The following guidelines are from a Microsoft-internal paper by James Hamilton summarizing learning from our large-scale Internet properties:
A simple approach is best. Decide which results you want measured, and then measure them. Don’t settle for approximations or statistical sampling alone. The key to successful measurement is using throughput measures instead of point-in-time samples.
A useful tool to support your efforts is the logging application block of the Enterprise Library released by the Microsoft patterns & practices team. This building block, among others, can support instrumentation investments in application code.
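As a language-neutral illustration of measuring throughput rather than point-in-time samples, here is a minimal Python sketch. It is independent of the Enterprise Library; the class name `ThroughputMeter` and its methods are hypothetical, and the clock is injectable only so that the example is repeatable.

```python
import time

class ThroughputMeter:
    """Accumulates completed work so that throughput is measured over an
    interval instead of being inferred from point-in-time samples."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._start = clock()
        self._completed = 0

    def record(self, units=1):
        """Call once per completed unit of work (request, transaction, and so on)."""
        self._completed += units

    def snapshot(self):
        """Return (units completed, units per second) since the last snapshot,
        then start a new measurement interval."""
        now = self._clock()
        elapsed = now - self._start
        completed = self._completed
        rate = completed / elapsed if elapsed > 0 else 0.0
        self._start, self._completed = now, 0
        return completed, rate

# Usage with a fake clock: 50 units of work completed in 10 seconds.
fake_times = iter([0.0, 10.0])
meter = ThroughputMeter(clock=lambda: next(fake_times))
for _ in range(50):
    meter.record()
completed, rate = meter.snapshot()
```

Because every unit of work is counted, the reported rate reflects what the system actually produced in the interval, not what it happened to be doing when a sampler looked.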
Composition is a wide field in the land of application architecture. For our purposes, we simply wish to encourage architects and designers to challenge their assumptions about components, services, and system design rather than debating the relative merits of specific models.
One of the challenges identified earlier is the “what is required and when” approach. This problem is as cogent for code as it is for hardware. Application code, components, and modules can be deployed and consume resources despite being necessary only a fraction of the time — or not at all. Even systems highly suitable for partitioning are often run as monoliths. This approach may be the least risky from an availability standpoint, but it makes the dangerous assumption that resources are unlimited. The well-known approaches and methods of composition and factorization can help arrive at a resource-friendly design that still meets user needs.
Factorization and composition are powerful tools in the quest to minimize resource usage. The goal is to achieve the minimal default application surface that can be run as a stand-alone unit. Yet it is important not to over-optimize for too limited a set of goals. To allow for selective deployment of functionality, your application code should be factored into functional groups. This approach allows components to execute at specific points in time without sacrificing overall function. Work can then be scheduled in well-identified, resource-friendly units.
Windows Server 2008 follows this model. Figure 4 shows the taxonomy used by the Windows Server development team to help them work toward the goal of enabling customers to optimize based on their different respective constraints.
Figure 4. Composition and Dependencies for a Single Server or Workload
Windows Server Manager uses roles to deploy and manage features. Deployment choices can be made on the feature and role level, no longer bound tightly together in unnecessarily large chunks such as workload, solution, or product. (Components in the Windows Server world are effectively invisible.) All Windows applications released with Windows Server 2008 follow this taxonomy. Other Microsoft products such as Exchange 2007 follow a similar role-based factorization approach.
While this taxonomy might appear specific to Microsoft and Windows, it works for other operating systems and applications as well. In fact, it is a general composition framework for software development — for applications, operating systems, databases, or line-of-business solutions such as ERP or CRM systems.
The model could be too rich for some applications, but the basic concept of roles is familiar to most developers. Even a one-role application is typically dependent on other roles (a database, for example) or interacts with other roles or applications (such as a Web site interacting with an ERP system). In traditional client/server applications, however, the relationships are hidden in the compiled bindings. To enable predictable and automated resource management, these dependencies must be surfaced as modeled elements included with the application bits.
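One way to picture dependencies surfaced as modeled elements is a small manifest that lists, for each role, the roles it requires. The Python sketch below is an assumption-laden illustration: the role names, the `ROLE_DEPENDENCIES` structure, and the `deployment_order` function are hypothetical and do not represent the Windows Server taxonomy or any product format.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical manifest: each role lists the roles it depends on.
ROLE_DEPENDENCIES = {
    "web-frontend": {"application-server"},
    "application-server": {"database"},
    "reporting": {"database"},
    "database": set(),
}

def deployment_order(requested_roles, dependencies):
    """Return only the roles needed for the request, dependencies first."""
    needed = set()
    stack = list(requested_roles)
    while stack:
        role = stack.pop()
        if role not in needed:
            needed.add(role)
            stack.extend(dependencies[role])
    # TopologicalSorter expects a mapping of node -> predecessors.
    graph = {role: dependencies[role] for role in needed}
    return list(TopologicalSorter(graph).static_order())

order = deployment_order(["web-frontend"], ROLE_DEPENDENCIES)
print(order)  # ['database', 'application-server', 'web-frontend']
```

With the dependencies modeled explicitly, a provisioning tool can deploy exactly what a request needs; here the reporting role is never started because nothing asked for it.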
The taxonomy in Figure 4 describes composition and dependencies for a single server or workload. Most solutions in today’s data centers are distributed and involve multiple tiers. In Figure 5, we add dimensions to the model allowing us to describe dependencies across servers and tiers. Server/Workload and Server Role are the same as in the Windows Server taxonomy in Figure 4, enabling us to seamlessly join the models.
Figure 5. Distributed Application Model Dimensions
This kind of information allows the operations team to understand critical resource dependencies and usage. It also enables automated provisioning of resources aligned with functional and non-functional requirements as well as operational patterns such as user or transaction load at any given point in time.
The application architect’s job is to structure the application to allow functional elements to be deployed on demand, just-in-time, whether this is within a single server or across multiple servers. We believe that this provides operators with the flexibility to optimize for total spend no matter which energy purchasing model is in effect. It also enables operators to clearly understand and differentiate required functionality and dependencies within applications and across applications. In this way, the deployment will meet the user requirements with minimal usage of resources.
Earlier, we discussed the problem of systems designed around “success” — a hypothetical maximum number of users. Building an application to serve every user that will accrue over time is a sure path to waste; an oversized system also increases the risk of point failure and may be fiscally unfeasible at the outset.
To take a simple example, you can’t buy a load balancer that scales dynamically. They all have a fixed number of ports. The capacity of your physical equipment becomes a scale limit that has a huge impact on cost. You can buy a 500-port balancer, but if you only need 100 ports the first year, the waste will eat the money you saved through buying the more capacious hardware. Just as you wouldn’t build a wall with one huge brick, you don’t want to design your IT deployments for envisioned maximum capacity.
Smart scaling requires using an approach based upon scale-units. Scale-units trigger “managed growth” based on well-defined “growth factors” or “growth drivers.” Combining this approach with the instrumentation and composition/factorization described earlier, deployed in a just-in-time fashion, aligns resource use with actual user demand without compromising the architecture. This approach is practiced in large-scale Internet properties.
This is where scale-unit planning comes in. A scale unit is a defined block of IT resources that represents a unit of growth. It includes everything needed to grow another step: computing capacity, network capacity, power distribution units, floor space, cooling and so on. A scale unit must support the initial resource needs while enabling you to support the necessary delta of key growth factors, be they transactions, number of users, ports, or what have you. The size of the scale unit is chosen to optimize the trade-off between resources needed immediately and those that can grow over time.
Growth drivers are generally hidden in capacity planning documents for any given product. For example, and speaking generally, the Microsoft SharePoint Server planning guide tells you to plan for resources (servers, storage, bandwidth, and so forth) based on the number and sizes of SharePoint sites, desired responses per second, and expected number of active users. These are the growth drivers for SharePoint Server. Based on the “success as a design tenet” model, capacity projections are typically matched to an architecture, which is then transformed into resource requests and deployment. Our challenge is this: We don’t want to deploy for “success.” We want to deploy just-in-time to use resources most efficiently.
The design process is actually quite simple and straightforward:
Effective scale-unit deployment requires effective monitoring, deployment (provisioning) that is automated (or at least well-managed), and a clear cause-and-effect relationship between growth drivers and scale-units.
Here is an example using SharePoint. Let’s assume a fictional SharePoint Server deployment for our much-beloved Contoso manufacturing company. The desired end state is supposed to be reached in 2010, as business units across the globe adopt the solution. Scale-units and growth drivers for the solution are shown in Table 1.
Table 1. Scale Units and Growth Drivers for Example SharePoint Deployment
|Growth driver|Scale-unit|
|---|---|
|Number of active users putting pressure on responses per second|SharePoint application server|
|1 TB limit on size of all content databases for a single SQL Server instance|SQL Server instance|
Figure 6 shows the scale-units that grow the deployment based on the growth drivers identified in Table 1 as well as the eventual maximum deployment.
Figure 6. Scale-Units Grow
The solution is initially deployed as an end-to-end solution with two SharePoint application server instances and one SQL Server instance. While the graphic appears to show sequential triggering, the growth drivers are independent of each other, although the deployment of one scale-unit might affect some other aspects of the deployed solution.
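The cause-and-effect relationship between growth drivers and scale-units in this example can be sketched in Python. The 1 TB limit per SQL Server instance and the initial deployment of two application servers and one SQL Server instance come from the example; the figure of 5,000 active users per application server, the driver names, and the `units_to_add` function are assumptions made purely for illustration.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class GrowthDriver:
    name: str                 # the monitored measure
    scale_unit: str           # what gets deployed when the driver grows
    capacity_per_unit: float  # how much of the driver one scale-unit absorbs
    initial_units: int        # size of the initial end-to-end deployment

DRIVERS = [
    # 5,000 users per application server is a hypothetical planning figure.
    GrowthDriver("active_users", "SharePoint application server", 5000, 2),
    # 1 TB of content databases per SQL Server instance, as in Table 1.
    GrowthDriver("content_db_tb", "SQL Server instance", 1.0, 1),
]

def units_to_add(drivers, observed, deployed):
    """Evaluate each growth driver independently and report the shortfall."""
    additions = {}
    for driver in drivers:
        required = max(driver.initial_units,
                       math.ceil(observed[driver.name] / driver.capacity_per_unit))
        shortfall = required - deployed[driver.scale_unit]
        if shortfall > 0:
            additions[driver.scale_unit] = shortfall
    return additions

deployed = {"SharePoint application server": 2, "SQL Server instance": 1}
user_growth = units_to_add(DRIVERS, {"active_users": 12_000, "content_db_tb": 0.8}, deployed)
data_growth = units_to_add(DRIVERS, {"active_users": 9_000, "content_db_tb": 1.2}, deployed)
```

Each driver triggers its own scale-unit without reference to the other, which mirrors the independence noted above. A production trigger would presumably fire with some lead time before the limit is reached rather than at the limit itself.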
To examine the effectiveness of this approach, let’s combine some of the ideas we’ve discussed and revisit the model we used earlier to discuss scale-out units and capacity planning.
Look at the scale labeled “Service Units Consumed,” represented by the red line on the right hand side of the graph in Figure 3. At time T-0, the data center has been built and no capacity is being used. We begin to consume the resources available in the data center and as time passes, use outstrips the capacity. The area below the purple line is used capacity. The area above the purple line is unused capacity. Effectively managing unused capacity means real power savings.
For application architects, virtualization is not the starting point for greener IT — it is a tool that helps efficient applications use fewer resources. At the start of capacity growth, virtualization enables us to match hardware resources more precisely to user needs. This reduces power spend by reducing provisioned but unused hardware capacity. In the future, virtualization will allow you to shut down excess resources that are already in place. Such a solution requires switchable, computer-controlled power circuitry; dynamic network configuration; and software designed to scale up and down in parts that can be composed and separated as necessary. The solution must be linked to and driven by monitoring systems that not only track application health but “understand” capacity drivers and capacity consumption. With those core elements, computing resources (and therefore power) can be throttled based on how well the solution is meeting its service level agreement (SLA). If it is performing within the necessary parameters, spare capacity can be shut down.
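The throttling decision described here can be reduced to a very small control rule. The Python sketch below is a simplification under stated assumptions: it uses a single latency measure as the SLA, adds or removes one scale-unit at a time, and the function name, the 70 percent headroom threshold, and the unit limits are all hypothetical.

```python
def next_unit_count(active_units, observed_latency_ms, sla_latency_ms,
                    min_units=1, max_units=10, headroom=0.7):
    """Decide how many scale-units should be powered on for the next interval."""
    if observed_latency_ms > sla_latency_ms:
        # SLA is being missed: bring one more unit online, if any is left.
        return min(max_units, active_units + 1)
    if observed_latency_ms < headroom * sla_latency_ms:
        # Comfortably inside the SLA: spare capacity can be shut down.
        return max(min_units, active_units - 1)
    # Within the SLA but without much margin: leave the deployment alone.
    return active_units

shrink = next_unit_count(4, observed_latency_ms=100, sla_latency_ms=200)
grow = next_unit_count(4, observed_latency_ms=250, sla_latency_ms=200)
hold = next_unit_count(4, observed_latency_ms=180, sla_latency_ms=200)
```

The gap between the shrink threshold and the SLA is a safety margin; without it the controller would switch units off and on again at every small fluctuation in load.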
This is the core scenario driving a massive industry shift toward virtualization. It’s amazingly complex, requiring more control and more accurate models (capacity models, consumption models, monitoring on multiple dimensions, safety margins, and power plans, to name a few) than we’ve ever seen in software systems.
Even if your work is not measured in units the size of data centers, the challenge remains: Use less to get the same amount of work done. This is where the architectural partitioning of applications becomes critical. Today, the problem is one of excess resources: “Find the unused ones and turn them off.” It is a problem that can be solved by IT pros in the trenches. In the future, “more” will not even be an option. Only designers and architects can solve that problem, and it is one that must be solved before software is ever deployed or even built. Applications that rely on synchronicity follow the “get more to get more” approach, which uses virtualization to solve the problem of waste in off-peak hours. Instead, we can drive the wave to the optimization side of the resource equation by enabling the applications that are used in the business to consume resources as fully and effectively as possible. Instead of thinking, “buy a lot of computers,” constrain the problem by asking yourself how to get the job done on a third or a quarter of what you think is necessary. This is the world that will be a reality in the next 20 years: a world where five servers’ worth of software has to be virtually scheduled onto two computers’ worth of capacity. The way to do this is to make sure you can use all of the waste (the space above the purple line) efficiently and effectively, so that you are using all of the resources available all of the time.
Architects must design applications the parts of which can function independently without losing the ability to share data and generate work product in the proper order. Returning to the example of the payroll system that is only active two days in a month, our challenge is to find a way to use those resources during the other 28 days. Instead of dedicated software configurations on fixed hardware, we need to design applications which can be stored on the virtual shelf in pieces, ready to be pulled out and activated in various combinations just-in-time.
Achieving this requires a powerful, flexible set of control software. The control software and the application have to work together to track and profile loads across functional components (such as server roles) as well as topology (such as sites and server groups). We have to track and profile static usage, as well as dynamic usage, introducing time as a key resource management dimension. Figure 7 shows a simplified load profile over a 24-hour period. Using time enables us to break the synchronicity principle and increase the service output despite a fixed resource budget.
Figure 7. Simplified Load Profile over 24 Hours
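A short Python sketch shows why time matters as a resource management dimension. The two 24-hour load profiles below are invented for illustration (they are not the data behind Figure 7), with load expressed in server units per hour.

```python
def sum_of_peaks(profiles):
    """Capacity needed when every application gets dedicated hardware."""
    return sum(max(profile) for profile in profiles)

def peak_of_sum(profiles):
    """Capacity needed when applications share hardware hour by hour."""
    return max(sum(hour_loads) for hour_loads in zip(*profiles))

# Hypothetical hourly load, hours 0 through 23.
interactive = [1] * 8 + [6] * 10 + [1] * 6   # busy during the working day
batch = [5] * 6 + [0] * 14 + [5] * 4         # runs overnight only

dedicated = sum_of_peaks([interactive, batch])
shared = peak_of_sum([interactive, batch])
print(dedicated, shared)  # 11 6
```

Under these assumed profiles, dedicated hardware must be sized for 11 server units, while scheduling both workloads onto shared capacity needs only 6, because their peaks never coincide.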
Managing individual applications is a critical step in the right direction. However, the real power is in static and dynamic resource management across the entire portfolio of applications that provide the desired service output. The control software has to be enabled to track resource usage and to identify scale-units that can be managed on a scheduled basis (that is, not running at all times) or scaled up and down dynamically based on the managed load profile across all applications. Properly factored, documented, and instrumented applications are the key to this new dimension of management, which will enable us to optimize resources in ways we have not yet seen in distributed environments.
Just as we have begun componentization of the operating system, Microsoft has been pushing monitoring, software distribution, problem diagnosis, and virtualization strategies. A key investment is in management, infrastructure control, and automated operations, embodied in the Microsoft System Center product family. Today, Microsoft Operations Manager, Virtual Machine Manager, and Configuration Manager are already on a path to merge and enable the dynamic and flexible control systems necessary for efficient use of resources. Looking forward, anticipate a rapid convergence on total system management solutions that let customers optimize to run more using fewer resource units, be they floor space, cooling, or compute capacity.
How does this fit with the building block approach? System Center will control the work within the dynamic operating environment. From a design and software development perspective, a Management Pack — the System Center control metadata — will define what to look for (monitoring data provided by application instrumentation that exposes key capacity and resource consumption information) and provide the formulae that determine what health or capacity impact these measures have on the critical system components. System Center will know what is running and how well and balance workloads over time.
Management packs will become a key control component for each of the compartmentalized elements of your systems. The management pack architecture is simple (an XML file that follows a particular set of schemata) but powerful, because it is metadata that describes the condition and health of factored components. What works for large synchronous applications also works well for factored applications. In fact, the best management pack designs today (for example, those for Windows Server 2008, SharePoint, and Dynamics applications) exhibit layering and componentization out of the gate.
As System Center evolves, management packs will become core parts of business applications, infrastructure, and foundation elements in the stack. Since System Center is already capable of monitoring computer health for Linux and Unix servers, the foundation is laid for a revolution in application flexibility — one that will let you manage your systems with substantially fewer computing resources, but still lets you choose the operating system and software suppliers. No matter what technology you build on, as long as you enable the shift to fractional application design, you or someone else will be able to describe the monitoring and capacity utilization surface as a management pack. That will enable your applications to be managed economically and efficiently, unbound by the hardware constraints that limit application efficiency today.
Application architects can design distributed systems for more effective resource management. First, we can align well-defined optimization goals with factorization. Scale-units enable us to group and align well-factored solutions with minimized deployments without compromising the overall deployment architecture. Scale-units combined with instrumentation will enable control software like System Center to manage a portfolio of applications based on the established optimization goal and approach. These capabilities are not far off. In a world with absolute limits on power consumption, the sooner we can bring them to bear, the better.
Measuring National Power in the Postindustrial Age, Ashley J. Tellis et al., RAND Corporation, 2000
Wikipedia - Theory of Constraints
Dan Rogers joined Microsoft in 1998 and has worked on a number of products including BizTalk, Visual Studio, and MSN. He currently works in the System Center product team and focuses on monitoring strategies for enabling dynamic IT solutions. While in MSN, he helped design monitoring solutions for large-scale dynamic properties such as popular storage and email services, and has been a member of a number of projects involving lights-out operations for a global data center initiative.
Ulrich Homann is the lead solutions architect in the strategic projects group of Microsoft worldwide services. As part of the office of the CTO, this group is responsible for the management and enterprise architecture of key strategic projects all over the world.