Project Genome: Wireless Sensor Network for Data Center Cooling
by Jie Liu, Feng Zhao, Jeff O’Reilly, Amaya Souarez, Michael Manos, Chieh-Jan Mike Liang, and Andreas Terzis
Summary: The IT industry is one of the fastest-growing sectors of the U.S. economy in terms of energy consumption. According to a 2007 EPA report, U.S. data centers alone consumed 61 billion kWh in 2006 — enough energy to power 5.8 million average households. Even under conservative estimates, IT energy consumption is projected to double by 2011. Reducing data center energy consumption is a pressing issue for the entire IT industry now and into the future. In this article, we argue that dense and real-time environmental monitoring systems are needed to improve the energy efficiency of IT facilities.
Only a fraction of the electricity consumed by a data center actually powers IT equipment such as servers and networking devices. The rest is used by various environmental control systems such as Computer Room Air Conditioning (CRAC), water chillers, and (de-)humidifiers, or is simply lost during power delivery and conversion. The data center Power Usage Effectiveness (PUE), defined as the ratio of total facility power consumption to the power used by IT equipment, is a metric used by The Green Grid to measure a data center’s “overhead.” A higher figure indicates greater energy overhead, while a lower figure indicates a more efficient facility. For example, the average data center has a PUE of approximately 2, indicating that only half of the total energy consumed powers IT equipment; in some facilities the PUE can be as high as 3.5.
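The PUE arithmetic is simple enough to state as code. This small sketch (the function name and sample figures are ours) mirrors the definition above:

```python
def pue(total_facility_kw: float, it_equipment_kw: float) -> float:
    """Power Usage Effectiveness: total facility power over IT power."""
    if it_equipment_kw <= 0:
        raise ValueError("IT load must be positive")
    return total_facility_kw / it_equipment_kw

# A facility drawing 2,000 kW in total to run a 1,000 kW IT load:
print(pue(2000, 1000))  # 2.0: only half the energy powers IT equipment
```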
One key reason for the high PUE is the lack of visibility into data center operating conditions. Conventional wisdom dictates that IT equipment needs excessive cooling to operate reliably, so the AC systems in many data centers use very low set points and very high fan speeds to reduce the danger of creating any potential hot spots. Furthermore, when servers issue thermal alarms, data center operators have limited means to diagnose the problem and make informed decisions other than further decreasing the CRAC’s temperature settings.
Given the data centers’ complex airflow and thermodynamics, dense and real-time environmental monitoring systems are necessary to improve their energy efficiency. The data these systems collect can help data center operators troubleshoot thermal alarms, make intelligent decisions on rack layout and server deployments, and innovate on facility management. The data can be particularly useful as data centers start to require more sophisticated cooling control to accommodate environmental and workload changes. Air-side economizers bring in outside air for cooling. Dynamic server provisioning strategies, such as those presented by Chen et al. at NSDI 2008 (see Resources), can turn on or shut down a large number of servers to follow load fluctuations. Both techniques are effective ways to reduce total data center energy consumption, but variation in heat distribution across space and time may also cause thermal instability.
Wireless sensor network (WSN) technology is an ideal candidate for this monitoring task as it is low-cost, nonintrusive, can provide wide coverage, and can be easily repurposed. Wireless sensors require no additional network and facility infrastructure in an already complicated data center IT environment. Compared to sensors on motherboards, external sensors are less sensitive to CPU or disk activities, thus the collected data is less noisy and is easier to understand.
At the same time, data center monitoring introduces new challenges to wireless sensor networks. A data center can have several adjacent server colocation rooms. Each colocation room can have several hundred racks. In practice, we have observed up to 5°C temperature variation across a couple of meters. Multiple sensing points per rack means thousands of sensors in the facility. Not only is the size of the network large, the network density can also be high. Dozens of sensors can lie within the one-hop communication range of the radio (10 to 50 meters for IEEE 802.15.4 radios, for example), leading to high packet collision probabilities. In stark contrast to the scale and reliability requirements of the data center monitoring application, current wireless sensor network deployments comprise tens to hundreds of motes and achieve data yields of only 20 to 60 percent.
This article presents the architecture and implementation of the Microsoft Research Data Center Genome (DC Genome) system, with a focus on RACNet, a large-scale sensor network for high-fidelity data center environmental monitoring. The overarching goal of the project is to understand, through data collection and analysis, how energy is consumed in data centers as a function of facility design, cooling supply, server hardware, and workload distribution, and then to use this understanding to optimize and control data center resources. We report results from approximately 700 sensors deployed in a multi-megawatt data center, one of the largest sensor networks in production use. Contrary to the common belief that wireless sensor networks cannot maintain high data yield, RACNet provides over 99 percent data reliability, with 90 percent of the data collected within the current soft real-time requirement of 30 seconds.
There are many data center designs, from ad hoc server cabinets to dedicated containers. However, most professional data centers use a cold-aisle/hot-aisle cooling design. Figure 1 shows the cross section of a data center room that follows this design. Server racks are installed on a raised floor in aisles. Cool air is blown by the CRAC system into the sub-floor and vented back up to the servers through perforated floor tiles. The aisles with these vents are called cold aisles. Typically, servers in the racks draw cool air from the front and blow hot exhaust air to the back into hot aisles. To use the cool air effectively, servers are arranged face-to-face across cold aisles. As Figure 1 illustrates, cool and hot air eventually mixes near the ceiling and is drawn back into the CRAC, where the mixed exhaust air exchanges heat with chilled water supplied by a pipe from water chillers outside the facility. Usually, there is a temperature sensor at the CRAC’s air intake; the chilled water valve opening and (sometimes) the CRAC fan speed are controlled to regulate that temperature to a set point.
Figure 1. An illustration of the cross section of a data center. Cold air is blown from floor vents, while hot air rises from hot aisles. Mixed air eventually returns to the CRAC where it is cooled with the help of chilled water, and the cycle repeats.
Data center operators have to balance two competing goals: minimizing the energy the CRAC consumes while at the same time ensuring that server operation is not negatively affected by high temperatures. However, setting the CRAC’s parameters is a non-trivial task, because the airflow and thermodynamics of a data center can be fairly complicated. The underlying reason is that heat distribution depends on many factors such as chilled water temperature, CRAC fan speed, the distances between racks and the CRACs, rack layout, server types, and server workload. Figure 2 illustrates the end result of the complex interplay of all these factors for a particular data center cold aisle. The thermal image shows that the temperature across racks and across different heights of the same rack varies significantly. Heat distribution patterns also change over time. Without visibility into the patterns of heat distribution, data center operators have little choice but to over-cool the entire data center.
Figure 2. The thermal image of a cold aisle in a data center. The infrared thermal image shows significant variations on intake air temperature across racks and at different heights.
Some data centers use Computational Fluid Dynamics (CFD) simulations to estimate heat distribution and guide their cooling management strategies. Such CFD simulations are useful, particularly during a data center’s design phase. They provide guidelines about room size, ceiling height, and equipment density. However, there are limitations to their usefulness: Accurate thermal models for computing devices are difficult to obtain; as soon as the rack layout or server types change, the current CFD model becomes obsolete; and updating CFD models is a time-consuming and expensive process.
Figure 3 depicts the architecture of the DC Genome system. Both the physical and cyber properties of a data center are measured to produce models and tools for facility management and performance optimization. Key components include:
- Facility layout: The rack, CRAC, and power distribution layout not only provide a basis for data presentation, but also affect cooling efficiency and, ultimately, data center capacity.
- Cooling system: The cooling system includes equipment such as the CRAC, water chillers, air economizers, and (de-)humidifiers, which are typically monitored by the building management system through a Supervisory Control and Data Acquisition (SCADA) system. The cooling equipment consumes the majority of a data center’s non-critical electrical load (IT equipment is the critical load). Other factors such as outside weather conditions can also affect cooling efficiency.
- Power system: Besides non-critical power consumed by the cooling and power distribution system, detailed monitoring of the power consumed by various IT equipment is essential.
- Server performance: Server activities are typically represented by the utilization of key components such as processors, disks, memory, and network cards. Measuring these performance counters is key to understanding how heat is generated by various servers.
- Load variation: Server and network load can usually be measured by the network activities for online service hosting. With application-level knowledge, more meaningful indicators of system load, such as queries per second or concurrent users, can be derived.
- Environmental conditions: Physical properties, such as temperature distribution, have traditionally been difficult to collect at a fine granularity. The RACNet system tackles this key challenge.
Figure 3. Overall architecture for the Data Center Genome system. Data collected from physical and cyber systems in data centers is correlated and analyzed to provide models and tools for data center management and performance optimization.
Data collected from various sources can be used to build models that correlate the physical and performance parameters. The resulting data center “Genome” is a rich family of models useful for various purposes. Deriving and applying these models relies on building algorithms and tools for analysis, classification, prediction, optimization, and scheduling. The data and tools can be used by data center operators, facility managers, and decision makers to perform various tasks, such as:
- Real-time monitoring and control: Examples include resolving thermal alarms, discovering and mitigating hot spots, and adaptive cooling control.
- Change management: Given a small number of servers to be deployed, a data center operator can make informed placement decisions based on the available space, spare power, and cooling capacity.
- Capacity planning: Given a load growth model and an understanding of resource dependencies, one can analyze the capacity utilization over various dimensions to decide whether to install more servers into existing data centers, to upgrade server hardware, or to build new data centers to meet future business need.
- Dynamic server provisioning and load distribution: Server load can vary significantly over time. The traditional philosophy of static server provisioning and even load distribution forces the data center to be provisioned for the worst-case scenario. Recent studies show significant energy saving benefits from consolidating servers and load. Controlling air cooling precisely enough to track dynamic critical power variations is difficult, but the inverse strategy of distributing load according to cooling efficiency is promising.
- Fault diagnostics and fault tolerance: Many hardware faults in data centers are caused by either long-term stress or abrupt changes in operating conditions. On the other hand, modern software architectures can tolerate significant hardware failures without sacrificing software reliability or user experience. This is changing the game of data center reliability: one should consider the total cost of ownership, including both acquiring hardware and maintaining its operating conditions.
In the rest of the article, we focus on RACNet, the wireless sensor network aspect of the DC Genome system. It fills in a missing piece in holistic data center management.
The design of RACNet faces several technical challenges that must be resolved in order to achieve high-fidelity environmental monitoring across an entire data center.
- Low cost of ownership: There may be thousands of sensing points in a data center. The cheaper we can implement the system — in terms of hardware, infrastructure, installation labor, and maintenance — the more likely the technology will be adopted.
- High data fidelity: The DC Genome system relies on continuous data streams from the sensors for high-level modeling, analysis, and decision making. We set a 30-second sampling interval for temperature and humidity sensing, and we require 99 percent data yield. To facilitate real-time monitoring and control, we also require over 90 percent of the data to be received by the deadline (when the next samples are taken).
- Seamless integration: Environmental sensors are organic parts of the overall data center management system. The sensor network should be integrated with the rest of the infrastructure, such as facility management, asset management, and performance management. This requires us to have an open interface for the sensor data, while hiding the complexity of managing thousands of devices.
We tackle these challenges through innovative hardware, protocol, and system designs.
Genomotes are sensor motes we developed specifically for the DC Genome project. To meet the requirement of low cost of ownership, we chose IEEE 802.15.4 wireless technology over wired connections and WiFi (see Resources). Wireless nodes offer easy installation and immunity to network administrative boundaries. Compared to WiFi, an 802.15.4 radio uses less power, has a simpler network stack, and requires fewer processing cycles, which allows us to reduce total cost by using simpler microcontrollers.
Although other similar wireless sensor designs are available on the market (such as SUNSPOT, Tmote, and SynapSense nodes), Genomotes are customized to simplify the installation process and reduce the number of wireless nodes in the network without sacrificing flexibility and scalability. Specifically, as shown in Figure 4, we designed two classes of Genomotes, master motes and slave sensors.
Figure 4. MSR Genomotes with the relative size of a U.S. quarter; master mote (left) and slave sensor (right).
A Genomote master has a CC2420 802.15.4 radio, 1 MB of flash memory, and a rechargeable battery, and is typically installed at the top of a rack. In addition to the radio, it has an RS232 socket. Each slave node has two serial ports, which are used to connect multiple slaves to the same head, forming a daisy chain that spans the height of a rack. Slave nodes are equipped with various sensors, such as temperature, humidity, and so on. Since they share the same serial communication protocol, different kinds of sensors can be mixed and matched on the same chain.
Lacking a radio, external memory, and battery, slave sensors cost about half as much as master motes. The master periodically collects the slaves’ measurements using a simple polling protocol and stores them in its local flash. DC Genome gateways then periodically retrieve stored measurements from each master using a reliable Data Collection Protocol (rDCP). Thanks to the low-power circuit design, a chain of four nodes can be powered through any one of its motes by a single server’s USB port.
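The article does not specify the serial protocol’s wire format, but the master’s polling loop can be sketched as follows, with a hypothetical frame layout and a stand-in object for the RS232 chain:

```python
import struct

POLL, REPLY = 0x01, 0x02  # hypothetical frame types (not the real protocol)

class FakeChain:
    """Stand-in for the RS232 daisy chain: each addressed slave answers a poll."""
    def __init__(self, readings):
        self.readings = readings  # {slave address: temperature reading}
        self.buf = b""
    def write(self, frame):
        ftype, addr = struct.unpack("BB", frame)
        if ftype == POLL and addr in self.readings:
            self.buf = struct.pack("<BBf", REPLY, addr, self.readings[addr])
    def read(self, n):
        out, self.buf = self.buf[:n], self.buf[n:]
        return out

def poll_chain(port, chain_length):
    """Master's loop: poll each slave in turn, buffer replies for rDCP pickup."""
    samples = []
    for addr in range(1, chain_length + 1):
        port.write(struct.pack("BB", POLL, addr))
        frame = port.read(6)  # 1-byte type, 1-byte address, 4-byte float
        if len(frame) == 6 and frame[0] == REPLY and frame[1] == addr:
            (reading,) = struct.unpack("<f", frame[2:6])
            samples.append((addr, reading))
    return samples
```

Polling, rather than having slaves push data, keeps the shared serial chain collision-free — the same design choice rDCP later applies to the radio channel.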
This hierarchical design has several benefits. First, separating data acquisition and forwarding means that the master can work with slaves covering different sensing modalities. Second, because the ratio of slaves to masters is high, simplifying the slave’s design minimizes the overall deployment cost, especially for large-scale networks. Finally, the design reduces the number of wireless nodes in the network that compete for limited bandwidth, while allowing individual racks to be moved without tangling wires.
Our system faces several challenges for reliable data collection. Low power wireless radios like IEEE 802.15.4 are known to have high bit-error rates compared to other wireless technologies. At the same time, data centers impose a tough RF environment due to the high metal contents of servers, racks, cables, railings, and so on. Furthermore, the high density of wireless nodes in RACNet — several dozen within the same communication hop — increases the likelihood of packet collisions.
RACNet’s innovative rDCP data collection protocol achieves high throughput and high reliability using three key technologies:
- Channel diversity: IEEE 802.15.4 defines 16 channels in the 2.4 GHz ISM band. Although the radio chip can only tune into one channel at any given time, rDCP can coordinate among multiple base stations to use multiple channels concurrently. The number of nodes on each channel is dynamically balanced to adapt to channel quality changes. Using multiple channels reduces the number of nodes on each channel, reducing the chances for packet collision.
- Adaptive bidirectional collection tree: On each wireless channel, a collection tree is dynamically built to adapt to link quality changes. Due to the large number of nodes in a one-hop communication range, viable links are abundant. Choosing the right links for high quality communication is key to improving hop-to-hop success rates.
- Coordinated data retrieval: Thanks to the flash memory on board, each master node caches data locally before it is retrieved. To avoid losing data due to packet collision, data is polled by the base station, rather than pushed by the sensors. Only one data retrieval stream exists on an active channel at any given time.
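The channel-balancing idea can be illustrated with a greedy rebalancing step. The load metric and names below are our own simplification for illustration, not the actual rDCP algorithm (see the RACNet technical report in Resources):

```python
def rebalance(node_counts, channel_quality):
    """One greedy balancing step: move a node off the most loaded channel.

    node_counts:     {802.15.4 channel: number of motes assigned}
    channel_quality: {802.15.4 channel: packet success rate in (0, 1]}
    """
    # Effective load: the same crowd hurts more on a poor-quality channel.
    load = {ch: node_counts[ch] / max(channel_quality[ch], 0.01)
            for ch in node_counts}
    src = max(load, key=load.get)
    dst = min(load, key=load.get)
    if src != dst and node_counts[src] > 0:
        node_counts[src] -= 1
        node_counts[dst] += 1
    return node_counts
```

Repeating this step drains nodes away from crowded or degraded channels until the effective loads even out.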
Figure 5 shows the data collection yield over three days from 174 wireless master motes driving 522 additional sensors. The data yield is computed as the ratio between the number of data entries actually collected and the theoretical total of 120 samples per sensor per hour. Over 95 percent of the sensors consistently achieve data yields above 99 percent. (Yields above 100 percent are an artifact of in-network time synchronization: when local time proceeds faster and must occasionally be adjusted back to the global time, multiple samples are taken at roughly the same time stamp.)
Figure 5. Data yield percentage from 174 wireless nodes (696 sensors) in a production data center. It shows the minimum, 5th percentile, median, 95th percentile, and maximum hourly yield from all sensors.
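The yield computation described above amounts to the following (a 30-second sampling period gives the 120 samples per sensor per hour used in Figure 5; the function name is ours):

```python
def hourly_yield(entries_received: int, sensors: int, period_s: int = 30) -> float:
    """Percentage yield: collected entries over the expected hourly total."""
    expected = sensors * (3600 // period_s)  # 120 samples/sensor/hour at 30 s
    return 100.0 * entries_received / expected

# All 83,520 expected entries from 696 sensors in one hour:
print(hourly_yield(83520, 696))  # 100.0
```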
Figure 6 further shows the data collection latency, defined as the time difference between when a sample is taken and when it is entered into the DC Genome central database. When three wireless channels are used concurrently, over 90 percent of the sensor data is collected before the 30-second deadline.
Figure 6. Data collection latency distribution of 10,000 data samples using three wireless channels.
These unprecedented results show that a wireless sensor network can be used to reliably collect environmental data in data centers with low hardware cost and easy installation and maintenance.
We have deployed thousands of Genomotes in multiple production data centers. In this section, we present some results that provide new insights to data center operations and workload management.
Figure 7 presents heat maps generated from 24 sensors in the front and back of a row. In the cold aisle, the temperature difference between the hottest and coldest spots is as much as 10°C. It is evident that the racks’ mid sections, rather than their bottoms, are the coolest areas, even though cool air blows up from the floor. This counter-intuitive heat distribution is observed in almost all data centers and is driven by Bernoulli’s principle. This principle states that an increase in fluid (e.g. air flow) speed decreases its pressure. Fast moving cold air near the floor creates low pressure pockets which draw warm air from the back of the rack. The high temperature at the top right corner is due to uneven air flow which prevents cool air from reaching that area. As a consequence, hot air from the back of the rack flows to the front.
Figure 7. Heat map of a cold aisle and a hot aisle, generated from sensor data.
Heat maps like these can be useful in many ways. For example, if cool air can reach the top right corner by slightly increasing the CRAC’s fan speed, then the overall temperature of the supplied air can be increased. Moreover, these measurements can guide the CRAC control system. Instead of using the temperature at the CRAC’s return air point to control the amount of cooling, we can regulate the chilled water valve opening based on the maximum intake air temperature across all active servers. However, designing optimal control laws remains a significant challenge, as changes at a single cooling supply point can affect different data center locations disproportionally.
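As noted, the actual control law remains an open problem. Purely for illustration, a naive proportional step on the chilled water valve driven by the hottest server intake might look like this (all names, the target, and the gain are hypothetical):

```python
def adjust_valve(opening: float, max_intake_c: float,
                 target_c: float = 25.0, gain: float = 0.05) -> float:
    """Open the chilled-water valve when the hottest server intake runs above
    the target, close it when below; clamp to the physical range [0, 1]."""
    error = max_intake_c - target_c
    return min(1.0, max(0.0, opening + gain * error))
```

A real controller would also have to cope with the disproportional, location-dependent response discussed above, which a single proportional loop on one measurement cannot capture.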
Thermal runaway, the temperature transient a data center experiences when it loses its cool air supply, is a critical operational parameter. Predicting thermal runaway transients through simulation is difficult because accuracy depends on the thermal properties of the IT equipment, which are difficult to obtain. RACNet, on the other hand, collected actual thermal runaway data during an instance when a CRAC was temporarily shut down for maintenance.
Figure 8 plots the temperature evolution at various locations across a row of ten racks during the maintenance interval. The CRAC was turned off for 12 minutes. The midsections — normally the coolest regions — experienced rapid temperature increases when the CRAC stopped. In contrast, temperature changed moderately at the two ends of the row, especially at the top and bottom of the rack. This is because those racks have better access to room air, which serves as a cooling reserve. This is an important finding because large temperature changes in a short period of time can be fatal to hard drives. For example, 20°C/hr is the maximum safe rate of temperature change for the Seagate SAS 300GB 15K RPM hard drive, according to its specifications. Notice that, in the middle of rack 7, the rate of temperature change is almost 40°C/hr in the first 15 minutes of CRAC shutdown. This implies that storage intensive servers need to be placed carefully if the data center has a high risk of losing CRAC power.
Figure 8. Intake air temperature from a row of ten racks, labeled from 1 to 10, during a thermal runaway event. Each rack has three sensors at the top, middle, and bottom, respectively. Temperature changes depend on locations. (Click on the picture for a larger image)
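The 40°C/hr figure follows directly from the sampled temperatures. A helper like this (ours, not part of RACNet) recovers the worst-case slope from a sensor trace:

```python
def max_rate_of_change(times_min, temps_c):
    """Largest temperature slope, in deg C per hour, between consecutive samples."""
    return max((temps_c[i + 1] - temps_c[i]) /
               ((times_min[i + 1] - times_min[i]) / 60.0)
               for i in range(len(times_min) - 1))

# Rack 7 mid-section: roughly 10 deg C of rise over the first 15 minutes
print(max_rate_of_change([0, 15], [21.0, 31.0]))  # 40.0, twice Seagate's 20 C/hr limit
```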
Thermal stability challenges in dynamic server provisioning
Dynamic server provisioning and virtualization can effectively adjust the number of active servers based on server work load, reducing the total energy consumption during periods of low utilization. Given that many servers are functionally equivalent, which servers should be shut down to minimize energy consumption? Moreover, can turning off servers result in uneven heat generation and cause thermal instability?
We answer these questions by performing sensitivity analysis over the collected measurements. Assume — as is true for many commodity servers — that a server’s fan speed is constant over time, independent of the server’s workload. Then the difference, ∆T, between the exhaust temperature and the intake temperature is proportional to the amount of heat the server generates. Ideally, a CRAC responds to the generated heat and provides cold air to the rack intake; so, the greater ∆T is, the lower the intake air temperature should be. Figure 9 presents one such example: scatter plots of ∆T against intake air temperature at the middle sections of racks 1 and 7, together with linear trend lines. The corresponding R² metrics show how well the linear regressions fit. We observe that the CRAC responds to temperature changes at rack 7 much better than to those at rack 1. In fact, an increase of ∆T at rack 1 is uncorrelated with its intake air temperature.

Such CRAC sensitivity variations create additional challenges for dynamically shutting down servers. Consider a scenario in which locations A and B rely on the same CRAC to provide cool air, but the CRAC is extremely sensitive to servers at location A while insensitive to servers at location B. Suppose we migrate load from the servers at location A to the servers at location B and shut down the servers at A, making ∆TA = 0. The CRAC then believes that little heat is being generated in its effective zone and raises the temperature of the cooling air. However, because the CRAC is not sensitive to ∆TB at location B, the active servers there, despite their increased workload, receive an insufficient supply of cool air. Servers at B are then at risk of triggering thermal alarms and shutting down.
Figure 9. Sensitivity analysis across the middle sections of racks 1 and 7. The CRAC provides cooler air when rack 7 generates more heat, compared to rack 1. (Click on the picture for a larger image)
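The per-rack sensitivity fit behind Figure 9 is ordinary least-squares linear regression. A self-contained sketch (function and variable names ours):

```python
def linreg_r2(x, y):
    """Slope, intercept, and R^2 of an ordinary least-squares line fit."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return slope, intercept, 1.0 - ss_res / ss_tot

# x = per-server delta-T samples, y = corresponding intake temperatures:
# a steep negative slope with R^2 near 1 means the CRAC "sees" this rack's
# heat and supplies cooler air in response (rack 7); a flat line with low
# R^2 means it does not (rack 1).
```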
The RACNet system presented in this article is among the first attempts to provide fine-grained, real-time visibility into data center cooling behavior. Such visibility is becoming increasingly important as cooling accounts for a significant portion of total data center energy consumption.
This practical application challenges existing sensor network technologies in terms of reliability and scalability. The rDCP protocol tackles these challenges with three key technologies: channel diversity, bi-directional collection trees, and coordinated data downloading. This is the first reported result of maintaining higher than 99 percent data yield in production sensor networks.
Collecting cooling data is a first step toward understanding the energy usage patterns of data centers. To reduce the total data center energy consumption without sacrificing user performance or device life, we need an understanding of key operation and performance parameters — power consumption, device utilizations, network traffic, application behaviors, and so forth — that is both holistic and fine-grained. With such knowledge, we will be able to close the loop between physical resources and application performance.
DC Genome is a multi-year collaboration between Microsoft Research and Microsoft Global Foundation Services. The authors would like to thank Darrell Amundson, Martin Anthony Cruz, Sean James, Liqian Luo, Buddha Manandhar, Suman Nath, Bodhi Priyantha, Rasyamond Raihan, Kelly Roark, Michael Sosebee, and Qiang Wang, for their direct and indirect contributions to the project.
Resources
2.4 GHz IEEE 802.15.4 / ZigBee-ready RF Transceiver, Texas Instruments
“An analysis of a large scale habitat monitoring application,” Robert Szewczyk et al., 2nd ACM Conference on Embedded Networked Sensor Systems (SenSys 2004), Baltimore, Md., November 2004
“Energy-aware server provisioning and load dispatching for connection-intensive internet services,” Gong Chen et al., 5th USENIX Symposium on Networked Systems Design & Implementation (NSDI 2008), San Francisco, Calif., April 2008
EPA Report on Server and Data Center Energy Efficiency, U.S. Environmental Protection Agency, ENERGY STAR Program, 2007
“Fidelity and yield in a volcano monitoring sensor network,” Geoff Werner-Allen et al., 7th Symposium on Operating Systems Design and Implementation (OSDI ’06), Seattle, Wash., November 2006
“FireWxNet: a multi-tiered portable wireless system for monitoring weather conditions in wildland fire environments,” Carl Hartung et al., 4th International Conference on Mobile Systems, Applications, and Services (MobiSys 2006), Uppsala, Sweden, June 2006
“The green grid data center power efficiency metrics: PUE and DCiE,” The Green Grid
“In the data center, power and cooling costs more than the IT equipment it supports,” Christian L. Belady, ElectronicsCooling, February 2007
“RACNet: Reliable ACquisition Network for high-fidelity data center sensing,” Chieh-Jan Mike Liang et al., Microsoft Research Technical Report MSR-TR-2008-145, 2008
“Smart cooling of data centers,” C. D. Patel et al., Proceedings of the International Electronic Packaging Technical Conference and Exhibition, Maui, Hawaii, June 2003
Project Sun SPOT
“Wireless sensor networks for structural health monitoring,” S. Kim et al., 4th ACM Conference on Embedded Networked Sensor Systems (SenSys 2006), Boulder, Colo., November 2006
About the Authors
Dr. Jie Liu is a senior researcher in the Networked Embedded Computing Group at Microsoft Research, where his work focuses on understanding and managing the physical properties of computing. His contributions have generally been in modeling frameworks, simulation technologies, program/protocol design, resource control, and novel applications of these systems. He has published extensively in these areas and filed a number of patents. His recent work ranges from large-scale networked embedded systems, such as sensor networks, to large-scale networked computing infrastructures, such as data centers.
Feng Zhao (http://research.microsoft.com/~zhao) is a principal researcher at Microsoft Research, where he manages the Networked Embedded Computing Group. He received a Ph.D. in electrical engineering and computer science from MIT. Feng was a principal scientist at Xerox PARC and has taught at Ohio State and Stanford. He serves as the founding editor-in-chief of ACM Transactions on Sensor Networks, and has written and contributed to over 100 technical papers and books, including a recent book, Wireless Sensor Networks: An Information Processing Approach, with Leo Guibas (Morgan Kaufmann). He has received a number of awards, and his work has been featured in news media such as BBC World News, BusinessWeek, and Technology Review.
Jeff O’Reilly is a senior program manager in Microsoft’s Data Center Solutions Research and Engineering group. He has worked in the data center industry for the past 10 years in management and operations. Jeff’s recent focus is on implementation and integration of SCADA systems for the purposes of optimizing data center operations.
Amaya Souarez is a group manager within the Datacenter Solutions Group of Microsoft Global Foundation Services, leading the DC Automation & Tools team.
Michael Manos is a seasoned information systems management executive with over 15 years of industry experience and technical certifications. In his current role, Michael is responsible for the world-wide operations and construction efforts of all Internet and enterprise data centers for Microsoft Corporation. In addition to his responsibility for the ongoing administrative and technical support of servers, network equipment, and data center equipment residing within these facilities, his role includes the design, construction, and facility-related technology research as it relates to data center architecture.
Chieh-Jan Mike Liang is a computer science Ph.D. student at Johns Hopkins University. He is a member of the HiNRG group. His current research focuses on large-scale sensor networks—specifically, problems related to efficient network architecture and robust sensor network platform.
Andreas Terzis is an assistant professor in the Department of Computer Science at Johns Hopkins University, where he heads the Hopkins InterNetworking Research (HiNRG) Group. His research interests are in the broad area of wireless sensor networks, including protocol design, system support, and data management. Andreas is a recipient of the NSF CAREER award.