A Roadmap for DirectX

Direct3D has always been the highest-performance API for rendering graphics on the Windows platform. The release of Windows Vista® pushes this technology to the next level. Direct3D is now integrated directly into Vista, which uses the Direct3D graphics pipeline for desktop composition, the Picture Viewer, and the Windows Presentation Foundation.

This article summarizes some of the new Direct3D Vista features with an eye toward explaining how they can benefit a DirectX developer. Each section also indicates where additional information about the feature can be found.


Windows Vista introduces two new flavors of DirectX: Direct3D 9Ex and Direct3D 10.

Direct3D 9Ex is an extended version of Direct3D 9 designed for applications that use Direct3D 9 but also want to use some of the new Vista driver features.

Direct3D 10 is a new DirectX technology that has been completely rebuilt for Vista. It provides the most advanced graphics rendering features available.

Additionally, Direct3D 9 and legacy DirectX APIs still work on Vista, allowing backward compatibility for legacy code.

The Windows Vista Display Driver Model (WDDM)

Both Direct3D 9Ex and Direct3D 10 are built on the new Windows Display Driver Model (WDDM). Graphics hardware must implement the WDDM in order to take advantage of Direct3D on Vista.

WDDM provides increased stability, virtualized memory allocation, security and the ability to multitask with the graphics processor (GPU).

Increased stability is accomplished in WDDM by splitting drivers into two components: a streamlined kernel-mode driver and a user-mode driver. By taking most of the driver code out of the kernel, WDDM allows the system to recover from fatal crashes and hangs without taking down the entire operating system.

WDDM now allows for "virtualized" video memory. Virtualization abstracts video memory so that it is no longer necessary to think about creating a resource in either video or system memory. Just specify what the resource is going to be used for and the system will place the memory in the best place possible. Additionally, virtualization allows for the allocation of more memory than actually exists on the hardware. Memory is then paged into the correct hardware as needed.
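The over-commit behavior described above can be pictured with a toy model. The sketch below is not how WDDM is implemented: the class name `VideoMemoryPager`, the one-slot-per-resource accounting, and the least-recently-used eviction policy are all assumptions made for illustration, since this article does not say which policy the GPU memory manager uses.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <unordered_map>

// Toy model of virtualized video memory (hypothetical, for illustration only).
// More resources may exist than fit in video memory; a resource is paged in
// when the GPU touches it, evicting the least recently used resident resource.
class VideoMemoryPager {
public:
    explicit VideoMemoryPager(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
    }

    // Marks a resource as used. Returns true when it had to be paged in.
    bool touch(int resourceId) {
        auto it = where_.find(resourceId);
        if (it != where_.end()) {
            // Already resident: move it to the front (most recently used).
            resident_.splice(resident_.begin(), resident_, it->second);
            return false;
        }
        if (resident_.size() == capacity_) {
            // Video memory is full: evict the least recently used resource.
            where_.erase(resident_.back());
            resident_.pop_back();
        }
        resident_.push_front(resourceId);
        where_[resourceId] = resident_.begin();
        return true;
    }

    bool isResident(int resourceId) const {
        return where_.count(resourceId) != 0;
    }

private:
    std::size_t capacity_;
    std::list<int> resident_;  // front = most recently used
    std::unordered_map<int, std::list<int>::iterator> where_;
};
```

With a capacity of two, touching resources 1, 2, 1, and then 3 pages out resource 2, because resource 1 was used more recently.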

Memory allocations are limited to the application address space primarily for security reasons. WDDM provides increased security by isolating applications and their resources from each other.

The previous driver model (XPDM) was designed for a single application that made heavy use of the graphics hardware. This was done for performance reasons, but it made sharing the GPU difficult. Most users experienced situations where pressing Alt+Tab to leave a game would force a lost device state in the application. When focus returned to the game, it had to re-create all resources and reset the device, which in the best case took time and in the worst case caused crashing bugs.

WDDM now provides a GPU memory manager and scheduler that allow multiple applications to access the GPU simultaneously. Because Direct3D applications no longer require exclusive access to the GPU, it is possible to switch focus between applications with little penalty. Under WDDM, Direct3D devices are only lost during driver upgrades, physical removal of the device, GPU resets, and unexpected errors.

WDDM mimics the behavior of earlier driver models when running legacy versions of DirectX.

For more about WDDM, see http://msdn.microsoft.com/windowsvista/default.aspx?pull=/library/en-us/dnlong/html/WinVistaDisplayDriverModel.asp

Direct3D 10

The Direct3D 10 Layered Runtime

Direct3D 10 is built in four layers: one thin core layer and three optional layers that add functionality.

Layer Name

Description

Core

Provides all the basic functionality and is the only layer that is required to access Direct3D 10.

Debug

Adds parameter and consistency validation and a great deal of debug output.

Shader-Reflection

Provides methods that allow an application to retrieve shader or effect information from the device.

Thread-Safe

Allows multithreaded applications to access the device. This layer is enabled by default, but does not have a performance impact on single-threaded applications.

Different combinations of layers can be specified when the device is created.
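As a rough illustration of choosing layers at device creation, the sketch below maps a bitmask of creation flags to the set of active layers. The `LayerFlag` enum and `layersFor` function are hypothetical stand-ins; a real application would pass `D3D10_CREATE_DEVICE_FLAG` values such as `D3D10_CREATE_DEVICE_DEBUG` and `D3D10_CREATE_DEVICE_SINGLETHREADED` to `D3D10CreateDevice`. Only two of the optional layers are modelled here.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for device-creation flags (not the real constants).
enum LayerFlag : std::uint32_t {
    kSingleThreaded = 1u << 0,  // opt out of the thread-safe layer
    kDebug          = 1u << 1   // opt in to the debug layer
};

struct LayerSet {
    bool core;
    bool debug;
    bool threadSafe;
};

// Translates a flag combination into the set of active layers. The core
// layer is always present; the thread-safe layer is on unless disabled.
inline LayerSet layersFor(std::uint32_t flags) {
    assert((flags & ~(kSingleThreaded | kDebug)) == 0 && "unknown flag");
    return LayerSet{true, (flags & kDebug) != 0, (flags & kSingleThreaded) == 0};
}
```

Passing no flags yields the core and thread-safe layers, which matches the default described above; combining both flags with a bitwise OR yields the core and debug layers only.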

See "API Layers" in the Direct3D 10 documentation for more detail.

The Direct3D 10 Graphics Pipeline

Versions of Direct3D before Direct3D 8 provided a handful of predefined algorithms called the fixed-function graphics pipeline. Memory resources like geometric models and textures were run through these algorithms to generate the final displayed image. There were many different ways to configure these algorithms to generate different effects, but a developer could not handcraft the pipeline to create new graphic effects.

Direct3D 8 introduced the programmable pipeline model of graphics programming, which breaks the pipeline up into a number of stages, each of which is responsible for a particular type of resource processing. Direct3D 10 further generalizes and increases the capacity of the programmable pipeline. Direct3D 10 has seven stages, three of which are programmable. Programmable pipeline stages allow a developer to create the algorithm that the stage uses by writing a shader.

The following table lists the pipeline stages:

Stage Name


Input Assembler Stage

NEW: Supplies geometric data (triangles, lines, points) to the pipeline.

Vertex Shader Stage

Programmable stage that processes individual vertices of geometric data. Moving and lighting the vertices are common operations.

Geometry Shader Stage

NEW: Programmable stage that processes primitive geometry (entire triangles, lines, or points). This stage can create and output new geometry based on the input primitive. Shaders in this stage can be used to create entire shapes programmatically.

Stream Output Stage

NEW: Writes vertex data generated from the stages that come before it into memory. This can be used to feed the vertices back into the pipeline for another pass or to allow them to be copied for processing by the CPU.

Rasterizer Stage

Sets up the geometry so that it can be mapped to the 2D view that is finally displayed.

Pixel Shader Stage

Programmable stage that calculates a value for each pixel covered by a rendered primitive. Shaders in this stage can be used to generate advanced per-pixel lighting or reflection effects.

Output Merger Stage

Combines the various types of output data with depth/stencil buffers and writes the result to the render target.

See the "Pipeline Stages" section in the Direct3D 10 documentation for more information.

Direct3D 10 Shaders

A shader is a set of commands that accepts graphics resources as input, runs a series of instructions on the resources, and outputs the result. Direct3D 10 provides three different kinds of shaders that correspond to the three programmable pipeline stages: Vertex Shaders, Geometry Shaders, and Pixel Shaders. All of the shaders are based on a common shader core and provide the same base functionality. In addition to the base functionality, each type of shader offers some unique functionality specific to its particular stage. Shaders are written using the High Level Shading Language (HLSL), which allows the programming of shaders at an algorithm level. There is no limit to the number of instructions in a shader.

Shader information can now be retrieved through reflection using the ID3D10ShaderReflection interfaces if the Shader-Reflection layer is included in device creation.

See the "Shader Features" section of the Direct3D 10 documentation for more information.

Geometry Shaders

New to DirectX, Geometry Shaders can create geometric objects for the 3D scene. This can be used to generate a wide variety of effects. Shadow volumes, fur, fins, procedural geometry, detailing, GPU particle systems, displacement effects, and programmatic vegetation (see the "Pipes GS Sample" in the Direct3D 10 documentation) are just a few examples of what can be accomplished with this powerful new shader type. Because the shader runs on the GPU, it consumes no CPU resources, leaving the CPU free for other tasks.
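To make the idea of creating geometry from an input primitive concrete, here is a CPU-side sketch in C++ of one common use: expanding a single point into a quad. A real geometry shader would be written in HLSL and run on the GPU; the `Vec3` type and `expandPointToQuad` function are hypothetical names used only for this illustration.

```cpp
#include <cassert>
#include <vector>

struct Vec3 {
    float x, y, z;
};

// CPU-side illustration of what a geometry shader does: it receives one
// input primitive (here a point) and emits new geometry (here a quad,
// expressed as a four-vertex triangle strip in the XY plane).
inline std::vector<Vec3> expandPointToQuad(const Vec3& center, float halfSize) {
    assert(halfSize > 0.0f);
    return {
        {center.x - halfSize, center.y - halfSize, center.z},
        {center.x - halfSize, center.y + halfSize, center.z},
        {center.x + halfSize, center.y - halfSize, center.z},
        {center.x + halfSize, center.y + halfSize, center.z},
    };
}
```

One vertex goes in and four come out, which is the kind of amplification a GPU particle system relies on.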

See the "Tutorial 13: Geometry Shader" section of the Direct3D 10 documentation for an example of geometry shaders in action.

The Stream Output Stage

New to DirectX, the Stream Output stage writes vertex data out of the graphics pipeline into a buffer. This allows vertex data generated by the Vertex Shader stage or the Geometry Shader stage (depending on how the pipeline is configured) to be read back into the pipeline for another pass. This can be accomplished through the DrawAuto method without engaging the CPU. Alternatively, the data can be copied into a resource for further processing by the CPU.
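The feedback loop can be pictured as two buffers that swap roles after every pass. The following C++ sketch models that pattern on the CPU; it does not call Direct3D, and `Particle`, `runPass`, and `simulate` are hypothetical names chosen for the example.

```cpp
#include <cassert>
#include <utility>
#include <vector>

struct Particle {
    float height;
    float velocity;
};

// One pass: read vertices from `input` and write the updated vertices to
// `output`, which stands in for the stream output buffer.
inline void runPass(const std::vector<Particle>& input,
                    std::vector<Particle>& output, float dt) {
    assert(&input != &output);  // this sketch needs two distinct buffers
    output.clear();
    for (const Particle& p : input) {
        output.push_back({p.height + p.velocity * dt, p.velocity});
    }
}

// Feeds the output of each pass back in as the input of the next pass by
// swapping the two buffers.
inline std::vector<Particle> simulate(std::vector<Particle> current,
                                      int passes, float dt) {
    std::vector<Particle> next;
    for (int i = 0; i < passes; ++i) {
        runPass(current, next, dt);
        std::swap(current, next);
    }
    return current;
}
```

On the GPU the same ping-pong arrangement keeps the data in video memory between passes, so the CPU never has to touch it.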

See the "Stream-Output Stage" section of the Direct3D 10 documentation for more information.

cbuffers and tbuffers

HLSL provides two types of buffers for storing constant variables: cbuffers and tbuffers. A cbuffer acts like a C struct in effect files.

Here is an example of a Constant Buffer that declares a world matrix, an object position and an array index:

cbuffer MyObject
{
   float4x4 matWorld;
   float3 vObjectPosition;
   int arrayIndex;
}

In Direct3D 10, all shader constants must be placed in constant buffers. You can place all constants in one buffer or in many buffers, organizing them in any way. It is often most efficient to organize constants into buffers based on how frequently they are updated and used. Organizing the constants by frequency of update reduces the number of calls and updates necessary, which makes the best use of CPU-to-GPU bandwidth. Latency when accessing these buffers from a shader is also reduced.
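The benefit of grouping by update frequency can be estimated with simple arithmetic. The C++ sketch below mirrors the MyObject cbuffer on the CPU, assuming the usual HLSL rule that constants are packed into 16-byte registers, and compares the bytes written per frame for two layouts. The `PerFrame` struct and both helper functions are hypothetical and exist only for this comparison.

```cpp
#include <cstddef>
#include <cstdint>

// CPU-side mirror of the MyObject cbuffer. Assumes HLSL packs constants
// into 16-byte registers, so the float3 and the int share one register.
struct MyObject {
    float        matWorld[16];        // float4x4: four registers
    float        vObjectPosition[3];  // float3: first 12 bytes of a register
    std::int32_t arrayIndex;          // int: last 4 bytes of that register
};
static_assert(sizeof(MyObject) == 80, "expected five 16-byte registers");

// Hypothetical constants that change once per frame rather than per object.
struct PerFrame {
    float matView[16];
    float matProjection[16];
};

// Bytes written per frame when every constant lives in one buffer that is
// rewritten for each object drawn.
constexpr std::size_t bytesWithSingleBuffer(std::size_t objects) {
    return objects * (sizeof(PerFrame) + sizeof(MyObject));
}

// Bytes written per frame when the per-frame buffer is updated once and
// only the per-object buffer is rewritten for each object.
constexpr std::size_t bytesWithSplitBuffers(std::size_t objects) {
    return sizeof(PerFrame) + objects * sizeof(MyObject);
}
```

For 100 objects, the single-buffer layout writes 20,800 bytes per frame and the split layout writes 8,128 bytes.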

A tbuffer in Direct3D 10 is a buffer of shader constants that is read through texture loads. This gives it better performance for arbitrarily indexed data.

See "Shader Constant Variables" in the Direct3D 10 documentation for more information.

Device State

Direct3D 10 organizes device state into five State objects.

State Object

Description

Input Layout

Sets the format and size of the geometric data in the input buffer that streams primitives from memory into the pipeline.

Rasterizer

Sets the state of the Rasterizer Stage including the fill or cull modes, enabling scissor rectangles for clipping and mapping geometric primitives in 3D space to the 2D viewport.

Depth Stencil

Configures the depth buffer and sets up stencil testing.

Blend

Sets how the Output Merger stage blends Pixel Shader output with the current render target.

Sampler

Sets a texture sampler used by shaders to filter textures in memory.

See the "State Objects" section of the Direct3D 10 documentation for more information. "Tutorial 14: State Management" demonstrates some of the interesting things that you can do with state objects.

Putting It Together: Effects

The Effects system for Direct3D 10 has been completely re-written. The FX10 system is much faster than FX9 and memory usage is lower. FX10 specifically targets Direct3D 10 hardware.

Effects allow developers to create one file that contains all the shaders and device states that are needed for a specific graphics approach. This could include displacement mapping, reflection rendering, geometry creation, or the result of any combination of shaders and device state. Effects are useful because they encapsulate a specific graphics technique, which allows a developer to use the same effect file in content creation applications, different games, and other applications to achieve the same result.

Effects can be compiled into byte code for improved runtime performance. Otherwise they will be compiled when they are loaded into memory.

Many of the samples in the Direct3D 10 documentation use FX files. See "Tutorial 3: Shaders and Effect System" for a comprehensive discussion about using effect files.


Resources

Direct3D 10 has a redesigned resource architecture. A resource is an area in memory that can be accessed by the graphics pipeline. Resources typically contain things like textures, input geometry, shader resources, vertex buffers, index buffers, and constant buffers.

For each resource that is created, the developer can balance the amount of functionality that the resource requires against the performance of the resource. In general, accessing Direct3D 10 resources has a much lower overhead than in previous versions of DirectX.

Direct3D 10 introduces Texture Arrays to simplify texture management. A texture array contains one or more textures that can be indexed from within the application or by shaders. A whole array can also be set as a render target.

Resources can now be stored typeless. A typeless resource must be interpreted as a specific type by obtaining a view. A view allows the data in a resource to be reinterpreted in a different format. This changes its presentation, but not the actual underlying data. This approach increases performance because resources do not have to be type-validated on each draw call.

Views can enable different interpretations of the resources at different bind locations (in the graphics pipeline). For instance, a shader resource view can interpret a Texture2DArray with six textures as a cube map.
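The idea that a view changes the presentation of the data without changing the data itself can be shown with a few bytes on the CPU. The C++ sketch below is only an analogy: `TypelessTexel`, the `viewAs...` helpers, and `arraySliceForFace` are hypothetical names, the float result assumes IEEE 754 single precision, and the cube face order (+X, -X, +Y, -Y, +Z, -Z) is an assumption about how the six array slices are laid out.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Four raw bytes with no type attached, standing in for one element of a
// typeless resource.
struct TypelessTexel {
    unsigned char bytes[4];
};

inline TypelessTexel storeBits(std::uint32_t bits) {
    TypelessTexel t;
    std::memcpy(t.bytes, &bits, sizeof bits);
    return t;
}

// Two "views" of the same bytes. Neither one modifies the stored data.
inline std::uint32_t viewAsUint(const TypelessTexel& t) {
    std::uint32_t value;
    std::memcpy(&value, t.bytes, sizeof value);
    return value;
}

inline float viewAsFloat(const TypelessTexel& t) {
    static_assert(sizeof(float) == 4, "sketch assumes 32-bit floats");
    float value;
    std::memcpy(&value, t.bytes, sizeof value);
    return value;
}

// Assumed face order when six array slices are viewed as a cube map.
enum CubeFace { PositiveX = 0, NegativeX, PositiveY, NegativeY, PositiveZ, NegativeZ };

// Array slice that holds a face when a cube starts at slice `firstSlice`.
inline unsigned arraySliceForFace(unsigned firstSlice, CubeFace face) {
    assert(face >= PositiveX && face <= NegativeZ);
    return firstSlice + static_cast<unsigned>(face);
}
```

The bit pattern 0x3F800000 reads as the integer 1065353216 through the unsigned view and as 1.0 through the float view, while the stored bytes stay the same.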

See the "Resources" section of the Direct3D 10 documentation for a comprehensive discussion of resources.

Direct3D 10 and Caps

Previous versions of DirectX used capability flags, or CAPS, to determine which features were supported by the hardware. With Direct3D 10, all functionality is guaranteed to be available on Direct3D 10 compatible video hardware, eliminating the need to query the device for CAPS.

Existing applications that depend on device CAPS can work around their absence by overriding device enumeration and forcing device CAPS to sensible values.

Direct3D 9Ex

Direct3D 9Ex includes a number of improvements that allow it to take advantage of some WDDM features.

One of the biggest improvements in Direct3D 9Ex is its ability to use WDDM's GPU memory manager and scheduler to reduce the number of situations in which the device is lost. Because many applications can share the GPU under WDDM, fewer situations now cause a device lost state in Direct3D 9Ex. The device is only lost when the hardware is reset due to a hang, or when the device driver stops. Mode changes no longer cause all video memory resources and swap chains to be discarded.

Direct3D resource sharing between devices or processes is also now allowed through Direct3D 9Ex.

For more information about Direct3D 9Ex, see the "DirectX for Windows Vista" section of the Microsoft Windows SDK documentation.


A sample demonstrating new features in Direct3D 9Ex can be found in the Windows Platform SDK at the following path:

../Microsoft SDKs/Windows/v6.0/Samples/multimedia/Direct3d/Direct3D9Ex

This sample creates two DirectX devices: one runs in a low-priority background thread and is responsible for rendering a number of cubes; the other runs in a higher-priority thread and composites the background rendering with a foreground cursor. The background rendering is copied to a shared surface to allow the foreground device to access it. This allows the cursor to continue to behave fluidly when the background is overwhelmed by rendering.
