ARM Architecture Support

Windows Mobile: Not Supported. Windows Embedded CE: Supported.

8/28/2008

This topic covers support for ARM architectures in Windows Embedded CE 6.0.

Cache

On ARMv4 and ARMv5 processors, the cache is organized as a virtual-indexed, virtual-tagged (VIVT) cache, in which both the index and the tag are based on the virtual address. The main advantage of this method is that cache lookups are faster, because the translation look-aside buffer (TLB) is not involved in matching cache lines for a virtual address. However, this method requires more frequent cache flushing because of cache aliasing, in which the same physical address is mapped to, and can be cached under, multiple virtual addresses. Flushing is also needed when switching processes, because the same virtual address can refer to different physical addresses in different processes.

Note

Throughout this topic, the term flush is used for writing back and invalidating cache lines.
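
To make the aliasing problem concrete, the following toy model (not CE code) simulates a direct-mapped, write-back VIVT cache. It assumes a hypothetical 8 KB cache, which is larger than a 4 KB page, so two virtual aliases of one physical page (the hypothetical ALIAS_A and ALIAS_B) land in different cache lines. A write through one alias stays in the cache, and a read through the other alias sees stale memory until the cache is flushed (written back and invalidated).

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy geometry (hypothetical): 2048 direct-mapped lines of one 32-bit word
 * each, i.e. an 8 KB cache that is larger than a 4 KB page. */
#define VIVT_LINES     2048u
#define VIVT_PAGE_SIZE 4096u

/* Two page-aligned virtual pages that alias one physical page (hypothetical). */
#define ALIAS_A 0x00010000u
#define ALIAS_B 0x00021000u

typedef struct {
    bool     valid;
    bool     dirty;
    uint32_t vaddr;   /* VIVT: the tag is derived from the virtual address */
    uint32_t data;
} VivtLine;

static VivtLine vivt_cache[VIVT_LINES];
static uint32_t phys_page[VIVT_PAGE_SIZE / 4];   /* the single physical page */

/* Both aliases translate to the same offset in the same physical page. */
static uint32_t toy_va_to_pa(uint32_t va) { return va & (VIVT_PAGE_SIZE - 1u); }

static void vivt_write_back(VivtLine *l) {
    if (l->valid && l->dirty) {
        phys_page[toy_va_to_pa(l->vaddr) / 4] = l->data;
        l->dirty = false;
    }
}

/* Index and tag both come from the virtual address; no TLB lookup is needed. */
static VivtLine *vivt_lookup(uint32_t va) {
    VivtLine *l = &vivt_cache[(va / 4) % VIVT_LINES];
    if (!(l->valid && l->vaddr == va)) {   /* miss */
        vivt_write_back(l);                /* evict whatever was there */
        l->valid = true;
        l->dirty = false;
        l->vaddr = va;
        l->data  = phys_page[toy_va_to_pa(va) / 4];
    }
    return l;
}

uint32_t vivt_read(uint32_t va) { return vivt_lookup(va)->data; }

/* Write-back policy: physical memory is not updated until eviction or flush. */
void vivt_write(uint32_t va, uint32_t value) {
    VivtLine *l = vivt_lookup(va);
    l->data  = value;
    l->dirty = true;
}

/* "Flush" as used in this topic: write back, then invalidate every line. */
void vivt_flush_all(void) {
    for (uint32_t i = 0; i < VIVT_LINES; i++) {
        vivt_write_back(&vivt_cache[i]);
        vivt_cache[i].valid = false;
    }
}
```

Writing 42 through ALIAS_A and then reading through ALIAS_B returns the stale value 0; after vivt_flush_all, the same read returns 42. This is the situation that forces the OS to flush or to mark aliased pages as uncached on ARMv4 and ARMv5.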

On ARMv6 and ARMv7 processors, the cache is organized as a virtual-indexed, physical-tagged (VIPT) cache. The cache line index is derived from the virtual address, but the tag is taken from the physical address. The main advantage is that cache aliasing is not an issue, because every physical address has a unique tag in the cache. However, a cache entry cannot be confirmed as valid until the TLB has translated the virtual address to a physical address that matches the tag. Generally, the performance gain achieved by avoiding cache aliasing offsets the cost of the TLB lookup.
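
The following sketch shows why aliases stop mattering under VIPT, assuming a hypothetical geometry in which each cache way is no larger than a 4 KB page. In that case every index bit lies inside the page offset, which is identical in the virtual and physical address, so two aliases of one physical address select the same set and carry the same physical tag. The vivt_slot function is included only for comparison.

```c
#include <stdint.h>

/* Hypothetical geometry: 32-byte lines, 4 KB per cache way, 4 KB pages. */
#define LINE_BYTES 32u
#define WAY_BYTES  4096u
#define PAGE_BYTES 4096u

/* The alias-free property holds only if the index stays within the page offset. */
_Static_assert(WAY_BYTES <= PAGE_BYTES, "index bits must stay within the page offset");

typedef struct { uint32_t index; uint32_t tag; } CacheSlot;

/* VIPT: index from the virtual address, tag from the physical address. */
CacheSlot vipt_slot(uint32_t va, uint32_t pa) {
    CacheSlot s = { (va % WAY_BYTES) / LINE_BYTES, pa / WAY_BYTES };
    return s;
}

/* VIVT, for comparison: index and tag both from the virtual address. */
CacheSlot vivt_slot(uint32_t va) {
    CacheSlot s = { (va % WAY_BYTES) / LINE_BYTES, va / WAY_BYTES };
    return s;
}
```

For two aliases 0x00010040 and 0x00021040 of the physical address 0x80000040, vipt_slot returns the same index (2) and tag for both, while vivt_slot gives them different tags.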

CE 6.0 supports both VIVT and VIPT caching by automatically detecting the architecture ID and using that information to control cache flushing.
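
This topic does not describe how CE reads the architecture ID. As a sketch only, assuming the ID comes from the architecture field (bits [19:16]) of the ARM Main ID Register (MIDR), the decision could look like the following. Reading MIDR requires a privileged MRC instruction, so the hypothetical cache_model_from_midr function takes the register value as a parameter; the actual CE detection logic may differ.

```c
#include <stdint.h>

typedef enum { CACHE_MODEL_VIVT, CACHE_MODEL_VIPT } CacheModel;

/* MIDR bits [19:16] hold the architecture code: 0x1-0x6 cover ARMv4 through
 * ARMv5TEJ, 0x7 is ARMv6, and 0xF indicates the revised CPUID scheme used by
 * ARMv7 (and some later ARMv6 cores). */
static unsigned midr_architecture(uint32_t midr) {
    return (midr >> 16) & 0xFu;
}

/* Hypothetical policy: ARMv6 and later use VIPT handling, earlier cores VIVT. */
CacheModel cache_model_from_midr(uint32_t midr) {
    unsigned arch = midr_architecture(midr);
    return (arch == 0x7u || arch == 0xFu) ? CACHE_MODEL_VIPT : CACHE_MODEL_VIVT;
}
```

For example, a MIDR value of 0x41069265 (architecture code 0x6) selects VIVT handling, while 0x4107B362 (0x7) and 0x410FC080 (0xF) select VIPT handling.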

Cache flushes are categorized as one of the following:

  • TLB flush
  • Instruction cache (I-cache) flush
  • Data cache (D-cache) flush

Cache flushes generally occur in the following situations:

  • User-initiated cache flush, by using the CacheRangeFlush function
  • Turning the device off and back on
  • Page acquisition or release, when internal OEM functions are used to get a page and then free it
  • Process deletion
  • Uncaching a page
  • API call return to a server other than the current server
  • Thread switching to a process other than the current active process

For ARMv6 and ARMv7 processors, cache flushing on a thread switch to a process other than the currently active process is limited to the following cases:

  • The hardware does not support a VIPT I-cache: In ARMv6 and ARMv7, a VIPT I-cache is optional. The data cache is either VIPT or, in MPCore systems, physically indexed and physically tagged (PIPT). If the hardware does not support a VIPT I-cache, the OS flushes the I-cache.
  • The system has run out of address-space identifiers (ASIDs): Each process address space is identified by an ASID. When all ASIDs have been used, the OS flushes the whole TLB.

This means that on ARMv4 and ARMv5 processors, the whole cache (I-cache, D-cache, and TLB) is flushed on every thread switch to a different process, whereas on ARMv6 and ARMv7 processors, the D-cache is never flushed on a thread switch. The I-cache is flushed only if the processor does not support a VIPT I-cache, and the TLB is flushed only if all 255 supported ASIDs have been used. This reduction in cache flushes should improve overall system performance.
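
The rules above can be summarized as a small decision function. This is a hypothetical sketch of the policy described in this topic, not the CE kernel's implementation; the CpuCaps and FlushPlan types are illustrative.

```c
#include <stdbool.h>

typedef struct {
    bool flush_dcache;
    bool flush_icache;
    bool flush_tlb;
} FlushPlan;

typedef struct {
    bool is_armv6_or_later;   /* ARMv6/ARMv7: VIPT D-cache, ASID-tagged TLB */
    bool icache_is_vipt;      /* optional on ARMv6/ARMv7 */
} CpuCaps;

/* Hypothetical: what to flush when a thread of another process is scheduled. */
FlushPlan plan_process_switch(CpuCaps cpu, bool asids_exhausted) {
    FlushPlan p = { true, true, true };   /* ARMv4/ARMv5 (VIVT): flush everything */
    if (cpu.is_armv6_or_later) {
        p.flush_dcache = false;                /* never on ARMv6/ARMv7 */
        p.flush_icache = !cpu.icache_is_vipt;  /* only without a VIPT I-cache */
        p.flush_tlb    = asids_exhausted;      /* only when ASIDs run out */
    }
    return p;
}
```

On an ARMv7 processor with a VIPT I-cache and free ASIDs, the function returns a plan that flushes nothing; on ARMv4 or ARMv5 it always flushes all three.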

In addition, moving to VIPT has performance advantages for the following OS features in CE 6.0:

  • Memory-mapped files: On an ARMv4 or ARMv5 system, all read/write views are marked as uncached to prevent aliasing. Uncached views reduce overall system performance, but with a VIVT cache, aliasing must be prevented. On ARMv6 and ARMv7 systems, views are marked as cached.
  • VirtualAllocCopyEx: In CE 6.0, if a kernel-mode driver uses VirtualAllocCopyEx to create an explicit alias, in which two virtual addresses map to the same physical address, the OS marks both the source and destination addresses as uncached on ARMv4 and ARMv5 systems to avoid cache aliasing. On ARMv6 and ARMv7 systems, the source and destination addresses are marked as cached. Although this function can be called only from kernel mode, this behavior affects both kernel-mode and user-mode drivers; Device Manager copies the data only for user-mode drivers.

L2 Cache

In ARMv7 processors, L2 cache is considered part of the architecture; in earlier versions, it was optional. In Windows Embedded CE, L2 cache support is limited to performing an L2 cache write-back or discard along with the corresponding L1 cache write-back or discard. Hardware-specific L2 cache handling is provided by OEM code. Enabling L2 cache support in the OEM adaptation layer (OAL) is similar to implementing the other cache operations in the OEMCacheRangeFlush function. Typically, if a device supports L2 cache, the OAL code is updated to honor the L2 cache flush flags passed to OEMCacheRangeFlush and to discard or write back the L2 cache accordingly.
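
A minimal sketch of how an OEMCacheRangeFlush-style routine might honor L2 flags follows. The flag names and values (SK_*) are placeholders for this sketch only; a real OAL uses the flag definitions from the CE headers, and the stubbed hooks stand in for the CP15 or L2-controller operations of a specific SoC. The ordering shown (write back inner to outer, discard outer to inner) is one common convention.

```c
#include <stddef.h>
#include <stdint.h>

/* Placeholder flag bits for this sketch only; not the CE header values. */
#define SK_WRITEBACK    0x1u
#define SK_DISCARD      0x2u
#define SK_L2_WRITEBACK 0x4u
#define SK_L2_DISCARD   0x8u

/* Stub "hardware" hooks that log the order of operations:
 * c = clean L1, C = clean L2, I = invalidate L2, i = invalidate L1. */
static char   op_log[16];
static size_t op_len;

static void log_op(char op) {
    if (op_len + 1 < sizeof op_log) {
        op_log[op_len++] = op;
        op_log[op_len] = '\0';
    }
}
static void l1_clean(uintptr_t a, size_t n)      { (void)a; (void)n; log_op('c'); }
static void l2_clean(uintptr_t a, size_t n)      { (void)a; (void)n; log_op('C'); }
static void l2_invalidate(uintptr_t a, size_t n) { (void)a; (void)n; log_op('I'); }
static void l1_invalidate(uintptr_t a, size_t n) { (void)a; (void)n; log_op('i'); }

/* Hypothetical OEMCacheRangeFlush-style routine that also honors L2 flags. */
void sketch_cache_range_flush(uintptr_t addr, size_t len, uint32_t flags) {
    /* Write back inner to outer, so dirty L1 data reaches L2 before L2 is cleaned. */
    if (flags & SK_WRITEBACK)    l1_clean(addr, len);
    if (flags & SK_L2_WRITEBACK) l2_clean(addr, len);
    /* Discard outer to inner, so L1 cannot refill from stale L2 lines. */
    if (flags & SK_L2_DISCARD)   l2_invalidate(addr, len);
    if (flags & SK_DISCARD)      l1_invalidate(addr, len);
}
```

Passing all four flags produces the operation order clean L1, clean L2, invalidate L2, invalidate L1.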

Virtual Memory

All virtual addresses are translated to physical addresses using page table entries. CE 6.0 introduced the following changes for ARMv6 and ARMv7 page table entries:

  • Non-global (NG) bit: This bit is set in the page table entry for any non-kernel-mode address. In Windows Embedded CE, that means any address below the shared heap address (0x70000000).
  • Separate page directories for global and non-global addresses: ARMv6 and ARMv7 use different page directories for global and non-global addresses. This reduces the number of predefined pages needed to store non-global mappings from four to two.
  • Execute-never (XN) bit: This bit is set in page table entries for non-executable pages. In CE 6.0, it is set only for the function trap area, the range starting at 0xf0000000. Any attempt to execute code from these pages causes a fault, which the OS uses to trap the function call and route it appropriately. Future releases of Windows Embedded CE could potentially use this bit to prevent code execution from the stack, the heap, and other non-code pages.
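
The following sketch shows how the NG and XN rules above map onto an ARMv6/ARMv7 short-descriptor small-page (4 KB) entry, in which bit 0 is XN, bit 1 marks a small page, and bit 11 is nG. Access-permission and memory-type bits are omitted, and make_small_page_pte is a hypothetical helper. The second function illustrates the "four pages to two" arithmetic under an assumption this topic does not state: that the split uses TTBR0/TTBR1 with TTBCR.N = 1, so the TTBR0 table covers only addresses below 0x80000000 and is half the size of a full 16 KB first-level table.

```c
#include <stdint.h>

/* ARMv6/ARMv7 short-descriptor small-page entry bits. */
#define PTE_XN         (1u << 0)
#define PTE_SMALL_PAGE (1u << 1)
#define PTE_NG         (1u << 11)

#define SHARED_HEAP_BASE 0x70000000u   /* from this topic */
#define TRAP_AREA_BASE   0xF0000000u   /* from this topic; assumed to run to 4 GB */

/* Hypothetical helper: base address plus the NG/XN rules described above. */
uint32_t make_small_page_pte(uint32_t va, uint32_t pa) {
    uint32_t pte = (pa & 0xFFFFF000u) | PTE_SMALL_PAGE;
    if (va < SHARED_HEAP_BASE) pte |= PTE_NG;   /* non-global: tagged with an ASID */
    if (va >= TRAP_AREA_BASE)  pte |= PTE_XN;   /* executing here faults */
    return pte;
}

/* TTBR0 first-level table is 16 KB >> TTBCR.N; result in 4 KB pages. */
unsigned ttbr0_table_pages(unsigned ttbcr_n) {
    return (16384u >> ttbcr_n) / 4096u;
}
```

With TTBCR.N = 0 the table needs four pages; with N = 1 it needs two.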

ASID

In ARMv6 and ARMv7, each process address space is assigned a unique ASID, which is used to tag its TLB entries. CE 6.0 supports a maximum of 255 ASIDs. Therefore, the OS does not have to flush the TLB on a thread context switch unless all 255 ASIDs have been used.
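
A minimal sketch of ASID allocation with a flush on exhaustion follows. It assumes, hypothetically, that ASIDs 1 through 255 are usable; the tlb_flushes counter stands in for a full TLB invalidate. A real kernel must also reassign ASIDs to live processes after the rollover, which this sketch does not model.

```c
#include <stdint.h>

#define MAX_ASID 255u   /* this topic: CE 6.0 supports up to 255 ASIDs */

typedef struct {
    uint32_t next;          /* next ASID to hand out (assumed range 1..255) */
    uint32_t tlb_flushes;   /* counts simulated whole-TLB flushes */
} AsidAllocator;

void asid_init(AsidAllocator *a) {
    a->next = 1;
    a->tlb_flushes = 0;
}

/* Hypothetical: hand out the next ASID; when all are used, flush and restart. */
uint8_t asid_alloc(AsidAllocator *a) {
    if (a->next > MAX_ASID) {
        a->tlb_flushes++;   /* stands in for invalidating the whole TLB */
        a->next = 1;
    }
    return (uint8_t)a->next++;
}
```

The first 255 calls return 1 through 255 without any flush; the 256th call triggers one TLB flush and returns 1 again.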

The rules for invalidating TLB entries are affected in the following ways:

  • If the ASID of the current process does not match the ASID of the TLB entry to be invalidated, the OS invalidates the whole TLB, because TLB entries are tagged with an ASID.
  • This rule applies only to user-mode addresses on which the NG bit is set, because the ASID tags only non-global TLB entries. For global addresses, the OS invalidates only the specified address range.
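
The two rules above reduce to a short decision, sketched here with hypothetical names. Non-global addresses are identified by the 0x70000000 boundary described in the Virtual Memory section.

```c
#include <stdbool.h>
#include <stdint.h>

#define SHARED_HEAP_BASE 0x70000000u   /* addresses below this have NG set */

typedef enum { TLB_INVALIDATE_RANGE, TLB_INVALIDATE_ALL } TlbAction;

/* Hypothetical: choose how to invalidate a TLB entry for va. */
TlbAction choose_tlb_invalidate(uint32_t va, uint8_t entry_asid, uint8_t current_asid) {
    bool non_global = va < SHARED_HEAP_BASE;   /* entry is tagged with an ASID */
    if (non_global && entry_asid != current_asid)
        return TLB_INVALIDATE_ALL;             /* per the rule above: whole TLB */
    return TLB_INVALIDATE_RANGE;               /* global, or same ASID */
}
```

A user-mode address owned by another process forces a whole-TLB invalidate; a global address is invalidated by range regardless of ASID.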

Unaligned Data Access

In CE 6.0, unaligned data access is supported only on ARMv6 and ARMv7 processors. Individual applications can enable or disable unaligned access by using the IOCTL_KLIB_UNALIGNENABLE I/O control. If the underlying hardware does not support unaligned access, an error is returned.
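
Code that must also run where unaligned access is unavailable or disabled can avoid dereferencing a misaligned pointer directly. A common portable C idiom, shown here as a sketch with hypothetical helper names, is to copy the bytes with memcpy and let the compiler choose an access sequence that is legal for the target it was built for.

```c
#include <stdint.h>
#include <string.h>

/* Read a 32-bit value (host byte order) from any byte offset. */
uint32_t load_u32_any_alignment(const unsigned char *p) {
    uint32_t v;
    memcpy(&v, p, sizeof v);   /* no misaligned uint32_t dereference in the source */
    return v;
}

/* Returns nonzero if p is suitably aligned for a direct 32-bit access. */
int is_aligned_u32(const void *p) {
    return ((uintptr_t)p % sizeof(uint32_t)) == 0;
}
```

For example, after storing 0xA1B2C3D4 at offset 1 of a byte buffer, load_u32_any_alignment(buf + 1) reads it back without a misaligned pointer dereference in the source code.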

See Also

Concepts

Kernel Overview
Memory Architecture