Thermal management

This topic covers recommendations for thermal management on devices that run Windows 10. Heat management plays an important role in delivering safe systems that perform well, even under high-energy workloads. Machines are becoming more mobile and compact, which makes hardware design for thermal dissipation more challenging. At the same time, users’ expectations for performance and capabilities continue to grow, increasing the heat generated by system components. Good thermal design is now more critical than ever.

Form factors such as tablets and clamshells generally have the following problematic thermal characteristics:

  • Small, compact enclosure with little room for airflow
  • No fan, or only small fans
  • Large battery and display relative to the overall form factor
  • Expectation to drive as much performance as possible from the hardware

Because of these characteristics, devices generally require a centralized unit to manage thermal constraints throughout the platform. Systems with or without fans may need to reduce the performance, or even the functionality, of various parts of the platform to lower the temperature in overheated locations while still preserving the user’s experience (for example, watching a video). Like power management, heat management must be done on a platform-wide basis, orchestrating local device thermal constraints within the scope of global heat conditions.

The main contributors to thermal spikes in the platform are:

  • CPU and GPU (that is, the SoC itself)
  • Battery charging
  • Display backlight

What is a "good" thermal solution?

What does it mean to design a system with a good thermal solution? This overarching question is vague and needs to be broken down into more specific questions. What does it mean for components to “perform at their best”? Is it realistic for all components to deliver 100% at all times? For short periods of time? How short? How hot is too hot? Is "hot" different for each device?

This section covers only the overview and basics of thermal management. For more information, refer to Thermal management in Windows.

Thermal management guidelines

  • Safety – Regardless of how the customer uses the device, it should never become too hot to handle.
  • Operating range – The system operates over the normal operating range of ambient temperatures.
  • Full-performance experiences – The performance of core Windows experiences is not compromised under normal operating conditions.
  • Quiet while idle – When the machine is in a low power state such as Connected Standby, users should not hear or see the fan turn on.
  • Diagnosability – We recommend that all cooling mitigations, active or passive, be initiated through OS-provided mechanisms so that issues can be identified in the field and reported back via telemetry.

The following sections take a more focused, user-experience-driven approach to the thermal problem.

Thermal user experience

From a user’s perspective, thermal management should be invisible. Slates and other SoC-based systems should have the hardware and software energy efficiency necessary to stay cool under normal use. The system should stay on and fully functioning without any thermal mitigations for as long as possible. This goal depends on good hardware design. Even when the system is undergoing thermal mitigations, users should not experience any disruptions as long as the machine is still usable. Only when the machine can no longer operate should users be notified of the thermal situation.

Thermal scenarios

You can group thermal challenges into three scenarios:

  1. The system is on, but producing too much heat to be sustainable.

    The system is fully functioning when it detects that it is getting warm. Thermal mitigations (passive cooling and active cooling) must begin reducing the heat produced to a sustainable rate. If thermal mitigations are not successful, scenario 2 occurs.

  2. The system is on, but unusable due to thermal constraints.

    The user notices this in the following ways:

    1. The system display is on but does not respond to the user’s inputs.
    2. The machine is too hot to continue running and must immediately shut down or hibernate.
    3. The machine is much too hot, and firmware cuts power to the system.

  3. The system is off and cannot boot due to thermal constraints.

    When a user tries to turn on a machine in an environment that is too hot or too cold, the machine fails to boot.

Thermal mitigations experience

While the system is on and in use, the user should not notice any thermal mitigations. In other words, the user would never know when their machine is or is not thermally mitigated. The goal of thermal mitigations is to allow the customer to use the machine for as much and as long as possible without disruption.

IHV and OEM partners have a great deal of control over system design and can reduce the need for thermal mitigation by designing hardware that dissipates heat effectively. For more information, refer to Thermal management in Windows.

Critical Thermal "shutdown" experience

In extreme conditions, where the ambient temperature is far above the targeted temperature range, heavy workloads might cause the system temperature to keep rising even with throttling mechanisms engaged. When the critical shutdown threshold is reached, the system immediately starts the critical thermal action. By default, the system hibernates if hibernation is enabled, and shuts down if it is not. This thermal shutdown path is the shortest shutdown path available and usually completes within 1 second. No notification is given to apps or services, so apps do not have a chance to autosave or close.

Furthermore, each system must have knowledge of its hardware failsafe temperature, which is determined in ACPI by the IHV. This value is very important because the firmware must be able to detect thermal constraints before the OS is loaded and prevent the system from damaging itself or harming the user. If at any time while the system is on it reaches its hardware failsafe trip point, it immediately cuts power and turns off. The failsafe trip point is usually slightly higher than the critical shutdown trip point, so that the system can hibernate or shut down before the hardware failsafe is reached. It is possible to reach the failsafe trip point before the system completes a critical hibernate or shutdown; however, with a well-chosen critical threshold, this should seldom occur.
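The escalation described in this section (mitigation, then the critical action, then the hardware failsafe) can be modeled as a simple decision function. This is a minimal sketch, not how Windows or any firmware implements it: the function and enum names are hypothetical, and the temperature at which mitigation starts (`passive_c`) is an assumed input. It only encodes the ordering stated above, with the failsafe trip point above the critical trip point, and the default critical action of hibernating when hibernation is enabled and shutting down otherwise.

```python
from enum import Enum


class ThermalAction(Enum):
    """Hypothetical labels for the responses described in this section."""
    NONE = "none"
    MITIGATE = "mitigate"      # passive/active cooling while the system stays usable
    HIBERNATE = "hibernate"    # critical action when hibernation is enabled
    SHUT_DOWN = "shut_down"    # critical action when hibernation is unavailable
    CUT_POWER = "cut_power"    # hardware failsafe: power is cut without the OS


def choose_action(temp_c: float, passive_c: float, critical_c: float,
                  failsafe_c: float, hibernate_enabled: bool) -> ThermalAction:
    """Pick the response for a temperature, assuming passive < critical < failsafe."""
    if not passive_c < critical_c < failsafe_c:
        raise ValueError("expected passive_c < critical_c < failsafe_c")
    # Check the highest trip point first: the failsafe acts regardless of the OS.
    if temp_c >= failsafe_c:
        return ThermalAction.CUT_POWER
    if temp_c >= critical_c:
        # Default critical action: hibernate if enabled, otherwise shut down.
        return ThermalAction.HIBERNATE if hibernate_enabled else ThermalAction.SHUT_DOWN
    if temp_c >= passive_c:
        return ThermalAction.MITIGATE
    return ThermalAction.NONE
```

Placing the failsafe slightly above the critical threshold is what leaves room for the `HIBERNATE` or `SHUT_DOWN` branch to finish before the `CUT_POWER` branch would apply.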

Thermal boot experience

Systems have thermal constraints during boot as well. Before boot, firmware on each system must check the temperature to ensure that boot can complete successfully. There are two major concerns:

  • Ambient temperature is too hot. Boot is often a resource-intensive action and produces a lot of heat. If too much heat is produced during boot, the system may hit the hardware failsafe temperature and cut power to itself.
  • Ambient temperature is too cold. The battery may not be able to supply enough power to the system.

System designers should model and predict the thermal characteristics of boot, in particular the temperature increase.

Firmware must check that there is enough headroom for the system to boot without crossing the hardware failsafe trip point.

For example, if the system temperature consistently rises 5°C during boot and the failsafe temperature is 40°C, the system should be at most 35°C at boot. This check must be done in firmware because the Windows Thermal Framework is not yet loaded at boot and cannot mitigate the issue.
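The headroom check in this example reduces to comparing the current temperature against the failsafe temperature minus the predicted boot rise. The sketch below assumes the boot rise has already been characterized by the system designer; the function name and the optional cold limit (`min_boot_temp_c`) are hypothetical, since this section does not give a specific low-temperature threshold. Real firmware would implement this check in its own environment, not in Python.

```python
from typing import Optional


def boot_allowed(current_temp_c: float, boot_rise_c: float,
                 failsafe_temp_c: float,
                 min_boot_temp_c: Optional[float] = None) -> bool:
    """Return True if booting now should not cross the hardware failsafe.

    min_boot_temp_c is a hypothetical cold limit; pass None to skip that check.
    """
    # Too hot: the predicted peak (current + boot rise) would exceed the failsafe.
    if current_temp_c > failsafe_temp_c - boot_rise_c:
        return False
    # Too cold: the battery may not be able to supply enough power.
    if min_boot_temp_c is not None and current_temp_c < min_boot_temp_c:
        return False
    return True
```

With the numbers from the example, `boot_allowed(35, 5, 40)` is `True` and `boot_allowed(36, 5, 40)` is `False`.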

In the case where a machine is too hot or cold to boot, the system should provide user feedback to give the user an opportunity to alleviate the issue and re-attempt boot when appropriate.

Thermal management requirements

Hardware

The following are good thermal hardware design practices.

  • Meet applicable industry standards (for example, IEC 62368) around consumer electronics safety.
  • Hardware must have a failsafe temperature trip point that shuts down the system or prevents boot.
  • Sensor hardware must be accurate to ±2 °C.
  • Sensor hardware must not require software polling to determine if a threshold temperature is exceeded.
  • While operating, the system display brightness is never thermally limited to less than 100 nits.

HLK test requirements for InstantGo systems

All InstantGo systems have thermal requirements, regardless of processor architecture and form factor. These requirements are tested in the Windows Hardware Lab Kit (HLK).

  1. All InstantGo systems must have at least one thermal zone.
  2. Each thermal zone must report actual temperature on the sensor.
  3. At least one thermal zone must have critical shutdown temperature defined. An exception is made for Intel® Dynamic Platform and Thermal Framework.
  4. All InstantGo systems with fans must expose fan activity to the OS. The fan needs to notify the OS of its activity at all times, including during idle resiliency in InstantGo. Currently, this notification causes no action from the OS; its main purposes are diagnosability and telemetry. The fan notification can be integrated with existing tracing tools, including Windows Performance Analyzer, which system designers can use to tune the platform design.
  5. All InstantGo systems with fans must keep the fan off while in “Sleep” or Standby (S0 Low Power Idle) state. The HLK test runs a realistic InstantGo workload that should not cause the fan to turn on. If the thermal solution was designed with this requirement in mind, it should pass this test without difficulty.
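The five requirements above can be expressed as a checklist over a description of the system. This is a hypothetical model for reasoning about a design, not the HLK test itself: the `ThermalZone` and `InstantGoSystem` types and all their fields are assumptions, and the standby check simply records whether the fan ran during a standby workload rather than running one.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ThermalZone:
    """Hypothetical description of one thermal zone."""
    name: str
    reports_temperature: bool
    critical_temp_c: Optional[float] = None  # None means no critical trip point


@dataclass
class InstantGoSystem:
    """Hypothetical description of the thermal-relevant parts of a system."""
    zones: List[ThermalZone] = field(default_factory=list)
    has_fan: bool = False
    fan_activity_exposed: bool = False
    fan_ran_during_standby: bool = False
    uses_intel_dptf: bool = False


def check_thermal_requirements(system: InstantGoSystem) -> List[str]:
    """Return one message per unmet requirement (empty list means all pass)."""
    failures: List[str] = []
    if not system.zones:
        failures.append("1: no thermal zone")
    for zone in system.zones:
        if not zone.reports_temperature:
            failures.append(f"2: zone {zone.name} does not report temperature")
    # Requirement 3 has an exception for Intel Dynamic Platform and Thermal Framework.
    has_critical = any(z.critical_temp_c is not None for z in system.zones)
    if not system.uses_intel_dptf and not has_critical:
        failures.append("3: no zone defines a critical shutdown temperature")
    # Requirements 4 and 5 apply only to systems with fans.
    if system.has_fan:
        if not system.fan_activity_exposed:
            failures.append("4: fan activity is not exposed to the OS")
        if system.fan_ran_during_standby:
            failures.append("5: fan turned on during S0 Low Power Idle")
    return failures
```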

For implementation details regarding these requirements, refer to Thermal management in Windows.

Thermal management solutions

The Windows Thermal Framework based on ACPI is the recommended thermal management solution for all systems. The major benefits include the ability to easily diagnose thermal related issues with inbox tools and valuable telemetry gathered in the wild. For details on Windows Thermal Framework, refer to Thermal management in Windows.

However, alternative solutions to the Windows Thermal Framework are acceptable if the above requirements are met. In addition, each core silicon or SoC partner may have their own proprietary thermal solutions compatible with and supported by Windows (for example, Intel® Dynamic Platform and Thermal Framework).

Battery class

Windows offers battery support through two driver options: a native miniclass driver and an ACPI-based Control method battery driver.

Control method battery

The Windows Control Method Battery miniclass driver (CMBATT.SYS) uses the Thermal Cooling Interface. This necessitates a new ACPI control method to set the thermal limit for charging. This method is implemented as a Device Specific Method (_DSM), as defined in the ACPI 5.0 specification, Section 9.14.1.

Charging thermal limit device specific method

UUID                                   Revision  Function  Description
4c2067e3-887d-475c-9720-4af1d3ed602e   0         1         Set Battery Charge Throttle

  • Arguments
    Arg0: UUID: 4c2067e3-887d-475c-9720-4af1d3ed602e

    Arg1: Revision: 0

    Arg2: Function: 1

    Arg3: Thermal limit, as a percentage (integer value from 0 to 100)

  • Return
    No return value

The firmware is responsible for taking the current thermal limit into account when engaging charging.
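To make the argument layout concrete, the sketch below models the firmware side of this _DSM in Python. It is illustrative only: real firmware implements _DSM in ACPI source, and the class name, the maximum charge current, and the linear mapping from thermal limit to charge current are all assumptions, since this section only says that firmware must take the limit into account. A real _DSM also answers function index 0 (the supported-functions query defined by the ACPI specification), which this sketch omits.

```python
import uuid

# UUID, revision, and function index from the table above.
CHARGE_THROTTLE_DSM_UUID = uuid.UUID("4c2067e3-887d-475c-9720-4af1d3ed602e")
DSM_REVISION = 0
SET_BATTERY_CHARGE_THROTTLE = 1


class BatteryFirmwareModel:
    """Hypothetical model of firmware state for charge throttling."""

    def __init__(self, max_charge_current_ma: int):
        self.max_charge_current_ma = max_charge_current_ma
        self.thermal_limit_pct = 100  # assumed default: no throttling

    def dsm(self, arg0: uuid.UUID, arg1: int, arg2: int, arg3: int) -> None:
        """Handle Arg0..Arg3 as listed above; there is no return value."""
        if arg0 != CHARGE_THROTTLE_DSM_UUID or arg1 != DSM_REVISION:
            return  # not this _DSM or revision: ignore
        if arg2 == SET_BATTERY_CHARGE_THROTTLE:
            # Clamp defensively to 0..100 (an assumption, not a stated rule).
            self.thermal_limit_pct = max(0, min(100, arg3))

    def charge_current_ma(self, requested_ma: int) -> int:
        """Charge current after applying the limit (assumed linear scaling)."""
        cap = self.max_charge_current_ma * self.thermal_limit_pct // 100
        return min(requested_ma, cap)
```

For example, after `dsm(CHARGE_THROTTLE_DSM_UUID, 0, 1, 50)` on a model with a 2000 mA maximum, a 2000 mA request is reduced to 1000 mA.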

Display class

Windows' built-in display driver (MONITOR.SYS) uses the Thermal Cooling Interface. It maps the thermal limit directly to display brightness: brightness is capped at the thermal limit percentage, unless the brightness requested by the operating system is already below that level.
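The brightness behavior described here amounts to taking the smaller of the requested level and the thermal limit, both expressed as percentages. The function below is a hypothetical illustration of that rule, not MONITOR.SYS code.

```python
def thermally_limited_brightness(requested_pct: int, thermal_limit_pct: int) -> int:
    """Brightness percentage after applying a thermal limit (hypothetical helper)."""
    # A request already below the cap passes through unchanged;
    # otherwise brightness is capped at the thermal limit percentage.
    return min(requested_pct, thermal_limit_pct)
```

For example, a request of 80% under a 60% limit yields 60%, while a request of 40% stays at 40%.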

Processor class

Windows' Processor Power Management infrastructure supports the Thermal Cooling Interface. Processor drivers are notified through this interface (instead of a direct call) when a thermal limit must be applied. Processor drivers also report the new thermal limit back to the kernel so that future performance state requests are based on the new limit.
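The key point is that the limit is remembered and applied to later requests, not just the request in flight. The class below is a hypothetical model of that bookkeeping, assuming performance is expressed as a percentage; it does not reflect the actual kernel or processor driver interfaces.

```python
class ProcessorPerfModel:
    """Hypothetical model: a stored thermal limit caps future performance requests."""

    def __init__(self) -> None:
        self.thermal_limit_pct = 100  # assumed initial state: no thermal limit

    def on_thermal_limit(self, limit_pct: int) -> None:
        # Called when the cooling interface applies a new limit; storing it
        # stands in for reporting the limit back to the kernel.
        self.thermal_limit_pct = limit_pct

    def resolve_request(self, requested_pct: int) -> int:
        # Every later request is evaluated against the current limit.
        return min(requested_pct, self.thermal_limit_pct)
```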

Graphics class

Display miniport drivers use the Thermal Cooling Interface directly. If the platform extension plug-in (PEP) handles passive cooling for the graphics stack, it must filter the stack and respond to the interface query on behalf of the miniport.

Related topics

  • Thermal management in Windows
  • Windows Hardware Lab Kit