Optimizing for tile-based rendering on Windows Phone 8
[ This article is for Windows Phone 8 developers. If you’re developing for Windows 10, see the latest documentation. ]
Windows Phone 8 devices use GPUs that take a tile-based rendering approach. This approach improves GPU performance and reduces power consumption by using an internal tile cache for memory-intensive rendering operations, avoiding many of the off-chip memory transactions that would otherwise be required. Conceptually, tile-based rendering partitions a render target into tile-cache-sized pieces, i.e. tiles, and replays the rendering commands directed at the render target multiple times, once for each tile. The driver for a tile-based GPU uses a "binning" process to determine which vertices need to be rendered on a given tile, clipping triangles and line segments where necessary. However, optimizing tiled rendering for application scenarios involving multiple or composite render targets is more challenging for the driver, primarily because it's not necessarily apparent to the driver when an application has finished with a tile from the current render target. By default, when moving to a new render target tile, the driver preserves tile contents, i.e. resolves the tile back to memory before loading the next tile, which can incur unnecessary memory overhead.
The tile cache architecture of a tile-based GPU is highly optimized to perform in-cache graphics memory transactions very efficiently. For example, alpha blending is effectively free on tile-based GPUs by design, whereas a traditional GPU incurs additional overhead from the required read-modify-write transactions with external graphics memory. Similarly, multisample antialiasing (MSAA) render targets incur a greatly reduced overhead on tile-based GPUs when compared to traditional GPUs, again due to the optimized tile cache architecture.
Direct3D 11 (D3D11) includes APIs that enable applications to provide hints to the GPU driver that help the driver avoid unnecessary tile-related memory transactions. In the sections that follow, existing and new D3D11.1 GPU hinting APIs are explained in the context of common application scenarios and tile-based rendering.
In general, applications employing multiple render targets should aim to complete the processing of one render target for a given scene before moving on to the next, to avoid unnecessary tile resolves and the performance penalty they involve. For example, an application employing one or more off-screen render targets that generate textures to be sampled when rendering the back buffer for the current frame should complete rendering each texture before rendering to the back buffer.
For application scenarios that involve compositing several texture layers, merges should be deferred rather than performed incrementally; "incremental rendering" involves unnecessary, performance-robbing external memory transactions as well as unnecessary D3D11 calls. For example, consider the consequences of incrementally merging four texture layers rather than deferring the merge:
The preferred, deferred merge results in 4 calls to set the render target.
The incremental merge results in 6 calls to set the render target.
In the preferred, deferred method, note that Clear is called immediately after switching to each render target. This signals to the GPU driver that "incremental rendering" of a given render target is not required, i.e. that it is not necessary to read each render target tile back into the tile cache prior to rendering. (See also the alternative to Clear in the next section.) In the inefficient, incremental example, in addition to the extra calls to OMSetRenderTargets, the tiles for render target D must be read back into memory multiple times.
For scenarios where an application knows in advance that every pixel in an existing render target resource will be replaced, i.e. the existing contents will not be required for subsequent rendering operations, two new "Discard" methods have been added in D3D11.1 that make it possible to signal this to the GPU driver. Previously, the application had two alternatives in this scenario:
Draw the scene without clearing, which is expensive on a tile-based GPU because the GPU must restore each tile, i.e. read each tile into the tile cache, before drawing. Clearly if the application will not use the previous contents, these restores represent significant unnecessary overhead.
Clear before drawing, which allows the driver to avoid restoring tiles, but does require render target write operations to execute the clear command. Again, these are unnecessary if the previous contents will not be used by the application.
The new D3D11.1 Discard methods provide applications with a more efficient third option, as shown in the table that follows:
DiscardResource: Signals to the GPU driver that the current contents of the resource, and of all views of it, are no longer defined and do not need to be preserved on a subsequent rendering pass.
DiscardView: Essentially the same operation as DiscardResource, but scoped to a specific render target view.
Note that once an application has called Discard on a render target resource or view, it is expected that Draw calls will follow that result in every pixel being updated. Render target contents after Discard are undefined, so an application cannot alpha blend over them or draw only partially covering shapes.
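As a hedged, illustrative fragment (not a complete program; it assumes an ID3D11DeviceContext1* named d3dContext and an ID3D11RenderTargetView* named renderTargetView created elsewhere, and DrawFullScene is a hypothetical helper), a per-frame discard might look like:

```cpp
// Sketch only. DiscardView tells the driver that the view's previous
// contents are undefined, so no tile restore is needed before drawing.
d3dContext->DiscardView(renderTargetView);

// Bind the target and draw. Every pixel must be covered after a
// Discard, because the contents are undefined.
d3dContext->OMSetRenderTargets(1, &renderTargetView, nullptr);
DrawFullScene(d3dContext);  // hypothetical helper that overwrites every pixel
```

Calling Discard before the bind, rather than clearing, avoids both the tile restores and the render target writes that a Clear would otherwise perform.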
Applications that know in advance that only scoped portions of a render target will be updated for a given scene should use the ID3D11DeviceContext::RSSetScissorRects method, so that the GPU driver can determine in advance which tiles will not need to be processed.
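As a hedged, illustrative fragment (again assuming an ID3D11DeviceContext* named d3dContext created elsewhere), restricting rendering to a known dirty region might look like:

```cpp
// Sketch only. The bound rasterizer state must have ScissorEnable set
// to TRUE for the scissor rectangle to take effect.
D3D11_RECT dirtyRect = { 0, 0, 256, 256 };  // hypothetical updated region
d3dContext->RSSetScissorRects(1, &dirtyRect);
// ...subsequent draw calls affect only the scissored region, and the
// driver can skip tiles that lie entirely outside it...
```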