From the June 2001 issue of MSDN Magazine.

DirectX 8.0: Enhancing Real-Time Character Animation with Matrix Palette Skinning and Vertex Shaders

Benjamin Freidlin
This article assumes you're familiar with C++, DirectX 8.0, and 3D Graphics
Download the code for this article: Matrix.exe (37KB)
Browse the code for this article at Code Center: Soft-Skinning Shader
SUMMARY DirectX 8.0 allows the creation of smooth and realistic character movements that are more life-like than simple articulated structure animations. This is made possible by its improved support for vertex tweening and blended vertex deformations, also known as soft-skinning.
      After a brief history of the use of these techniques in DirectX, soft-skinning using the fixed function pipeline is discussed. This is followed by the use of matrix palettes from within vertex shaders to create a customized soft-skinning solution that takes advantage of the benefits of vertex shaders, such as hardware acceleration and custom lighting algorithms without the limitations of fixed-function solutions.
As the world of real-time 3D graphics continues to evolve, developers move closer to achieving the types of realistic images seen in high-budget feature films. Computer games stand to gain the most from these advancements as more powerful hardware joins forces with greatly improved APIs, enabling developers to create real-time 3D imagery that's as interactive as it is visually stunning. Microsoft® DirectX® 8.0 is at the forefront of these advancements and manages to break through the chains of what was once a fixed-function programming model. The new API exposes an unprecedented level of control and flexibility that allows you to reach down to every vertex and pixel that passes through the graphics pipeline.
      Perhaps one of the greatest impacts of this new programmable pipeline will be on the development of more powerful character animation techniques. DirectX 8.0 provides more ways than ever to create smooth and realistic character motion beyond simple articulated structure animations, with improved support for vertex tweening and blended vertex deformations (hereafter referred to as soft-skinning). While the technique of tweening continues to be popular, soft-skinning offers greater dynamic control as well as increased realism and is therefore the focus of this article. I'll start with a brief history and comparison of these techniques as they relate to DirectX and then move on to the fundamental concepts behind soft-skinning. Finally, I'll show you two ways to implement soft-skinning using DirectX 8.0 support for fixed function indexed vertex blending and vertex shader matrix palettes.

Looking Back

      Prior to the release of DirectX 8.0, game developers had several ways of animating interactive 3D characters. One of the earliest methods was a technique commonly known as articulated body-part animation, whereby an application would use a hierarchical set of interconnected mesh pieces that formed the body of a character. The classic example of this technique would be a head connected to the torso, the torso to the upper arm, the upper arm to the lower arm, and so forth. The benefits of this technique were that it was simple to implement and it could be executed relatively quickly in software, an obvious prerequisite of early real-time 3D applications. Furthermore, you could obtain the various transformations that comprised a character's motion from an animation package, then use any number of interpolation techniques to generate the intermediate rotations for each piece of the body. From then on it was as simple as setting the matrix for each mesh piece before calling one of the DrawPrimitiveXxx functions. If executed carefully, an application could animate the different pieces of the body individually and create a reasonable facsimile of human motion—at least, reasonable enough for games of that era.
      One of the drawbacks to this method is the result of an unpredictable and unpleasant folding effect when limbs are rotated too far about a joint, such as when the lower leg bends completely in at the knee. In addition to this, it is difficult to achieve smooth shading across the boundaries of the mesh pieces because of the anomalous perturbation of vertex normals surrounding a joint.
      This problem forced 3D modelers to spend the bulk of their time trying to perfect the placement of vertices where a character's limbs connected at the joint, or otherwise covering up troublesome joints by modifying the character mesh to include clothing accessories such as belts and long sleeve shirts, all in an attempt to hide anomalies. This detracted from the illusion of a believable human appearance, and characters ended up looking like they were made of solid material rather than soft skin.

Enter Bugs Bunny

      In a nod to Mickey Mouse and Bugs Bunny, vertex tweening (also known as vertex key framing) borrows from the very same techniques used to bring these venerable characters to life. In fact, the 3D version of the technique doesn't stray very far from the original. An artist models and animates a character from within 3D software, such as Alias-Wavefront's Maya or Softimage's XSI, as they would normally do. At this point the artist can specify the key frames of the animation and save the vertex data that comprises them for consumption by the application. The app can generate the missing frames by interpolating the vertex positions between the key frames (hence the term tweening). An artist might create the initial, middle, and end pose for a combat sequence in the animation software and export these meshes to a file. The app then loads the meshes and, given a point-in-time, uses interpolation functions to generate the in-between positions of the vertices.
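The interpolation step can be illustrated with a minimal sketch. It assumes simple linear interpolation between two key-frame meshes that share the same vertex count and ordering; the names Vec3 and tweenMesh are hypothetical and are not taken from the article's sample code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };

// Linearly interpolate every vertex between two key-frame meshes.
// t = 0 returns keyA, t = 1 returns keyB.
std::vector<Vec3> tweenMesh(const std::vector<Vec3>& keyA,
                            const std::vector<Vec3>& keyB,
                            float t)
{
    assert(keyA.size() == keyB.size()); // key frames must share topology
    std::vector<Vec3> out(keyA.size());
    for (std::size_t i = 0; i < keyA.size(); ++i) {
        out[i].x = keyA[i].x + t * (keyB[i].x - keyA[i].x);
        out[i].y = keyA[i].y + t * (keyB[i].y - keyA[i].y);
        out[i].z = keyA[i].z + t * (keyB[i].z - keyA[i].z);
    }
    return out;
}
```

An application would typically derive t from the current animation time relative to the two surrounding key frames.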
      This technique produces much smoother animations because the key frames are generated by the animation software in a way that allows the artist to eliminate the folding. Once the animation software's method of creating the different key poses has eliminated the creasing, the problem is out of the way for good and the application can generate smooth in-betweens that are guaranteed not to distort. You might be wondering how the artist's animation software was able to eliminate the creasing. The answer lies in a soft-skinning technique similar to the one I'll be covering later in the article, except that I will show you how to perform a simplified version of the technique in real time.
      The drawback to tweening is that it requires the application to hold multiple copies of the mesh in memory, one for each key frame. As the number of different character animations increases, so does the memory usage. In the past it was somewhat cumbersome to implement tweening because the in-between mesh for each frame was created dynamically, and would need to be copied over the bus from system memory to the GPU before being drawn. This had to be done in small chunks in order to take advantage of the parallelism between the GPU and CPU, and thus prevent the pipeline from stalling. To be fair, however, some of these limitations can be sidestepped in DirectX 8.0 by performing vertex tweening with a programmable vertex shader.
      Perhaps more of a concern than these performance issues (which most developers have been able to work around) is the fact that tweening can place a limit on the level of interactivity and realism of an application. This is because it's not conducive to the dynamic generation of animations at runtime, since each key frame is predefined by the artist's use of the animation software well before the application ever sees it. A more flexible solution would allow the artist to set up the meshes for animation as they normally do, by attaching them to what's called a skeletal system and then letting the game perform the skeletal deformation itself. This would enable the application to create unique animation sequences based on user interaction, instead of interpolating from pre-canned poses. Besides offering increased realism, this technique can be more easily integrated with various physics simulators. The technique of skeletal animation is essentially what real-time soft-skinning is based on, and will be the focus of the rest of this article.

I Regret to Deform You

      Now that you understand the concepts behind articulated structure animation and vertex tweening, let's dive into the world of soft-skinning. Before I evaluate the past and present support in DirectX for this technique, I'll take a closer look at the fundamental principles it embodies. For this, you need to go back to the concepts of articulated structure animation and pay close attention to what is actually happening to the vertices of the mesh. Let's take the upper and lower leg as an example. The application considers two groups of faces that comprise the upper and lower leg and a transformation matrix to go with each. Each transformation is referred to as a bone because, like a real bone in a human body, it influences the shape of the outer flesh as the limb rotates about a joint. (Note that when discussing soft-skinning, you'll sometimes hear the terms bone and matrix used interchangeably. For the sake of clarity, I will combine the two as one term, "bone matrix," for the remainder of the article.) Now let's say you want to position the character's left leg so that it is slightly bent at the knee, as if he were walking. If the character was originally posed with his feet parallel to his shoulders, the upper-leg transformation might rotate roughly 35 degrees, while the lower-leg transformation might rotate -20 degrees and then be concatenated with the upper leg's transformation.

Figure 1 Ow, My Knee Hurts

      Let's stop right here. You know from the previous discussion that this exact process is likely to generate an unwelcome folding effect. The vertices in the lower leg that form the knee joint with the vertices of the upper leg are going to fold in on each other, and the deformation around the inner knee will be hard and unrealistic. Figure 1 shows an example of this anomaly. This is happening because the application is doing exactly what it was told to do; it was given a transformation matrix that rotated the vertices of the lower leg -20 degrees, which is good for most of the leg (say roughly 75 percent of the vertices that are furthest from the knee joint), but not for the remaining 25 percent, which are placed on or near the joint. The solution lies in allowing the vertices near the joint to be influenced not only by the lower-leg transformation, but to a certain extent by the upper-leg transformation as well. You can achieve this by assigning two bone matrices to this set of vertices along with a pair of weight values for each vertex. The weights (also referred to as beta values) will determine the degree of influence that each bone matrix will have on the vertex. With this information, the application can do a linear blend of the vertex transformed by each of the influencing bones. The following formula supports the case I've stated so far.

\[ V' = b_0\,(V M_0) + b_1\,(V M_1) \]

      In this formula V is the vertex in model space to be transformed by the matrix M0, the result of which is scaled by the blend weight b0, and summed with the result of the same process applied again using the matrix M1 and the blend weight b1. The outcome of this formula is a vertex V' in world space that's been influenced by two different bone matrices.
      Figure 2 shows the result of this formula applied to the character. You'll see that the folding has been alleviated and the bend of the leg appears more realistic. It's worth noting that even though two bone matrices are being applied to the vertices, the blend weight for the second matrix should be calculated so that the weights are normalized.

Figure 2 Ah, Much Better

      In other words, b0+b1=1 for all vertex beta values; therefore, b1 doesn't have to be stored explicitly; it can be calculated as 1.0-b0. This is because unless you are employing an alternative blending formula that will provide a meaningful interpretation of the weight's influence, a set of non-normalized weights is not likely to produce the results you are seeking. The added benefit of normalized blend weights is that you'll be sending one fewer floating-point component over the bus for each vertex. The following formula is the previous formula rewritten to use normalized blending weights.


\[ V' = b_0\,(V M_0) + (1 - b_0)\,(V M_1) \]
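The normalized two-bone blend described above can be sketched on the CPU as follows. This is only an illustration under stated assumptions: it uses the row-vector convention (v' = v * M), and the helper names Vec3, Mat4, rotationZ, transformPoint, and blendTwoBones are hypothetical.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };
struct Mat4 { float m[4][4]; }; // row-vector convention: v' = v * M

// Rotation about the Z axis by the given angle in degrees.
Mat4 rotationZ(float degrees)
{
    const float r = degrees * 3.14159265f / 180.0f;
    const float c = std::cos(r), s = std::sin(r);
    Mat4 out = {{{    c,    s, 0.0f, 0.0f},
                 {   -s,    c, 0.0f, 0.0f},
                 { 0.0f, 0.0f, 1.0f, 0.0f},
                 { 0.0f, 0.0f, 0.0f, 1.0f}}};
    return out;
}

// Transform a point (implicit w = 1) by a bone matrix.
Vec3 transformPoint(const Vec3& v, const Mat4& M)
{
    return { v.x * M.m[0][0] + v.y * M.m[1][0] + v.z * M.m[2][0] + M.m[3][0],
             v.x * M.m[0][1] + v.y * M.m[1][1] + v.z * M.m[2][1] + M.m[3][1],
             v.x * M.m[0][2] + v.y * M.m[1][2] + v.z * M.m[2][2] + M.m[3][2] };
}

// V' = b0 * (V * M0) + (1 - b0) * (V * M1)
Vec3 blendTwoBones(const Vec3& v, const Mat4& M0, const Mat4& M1, float b0)
{
    assert(b0 >= 0.0f && b0 <= 1.0f);
    const Vec3 p0 = transformPoint(v, M0);
    const Vec3 p1 = transformPoint(v, M1);
    const float b1 = 1.0f - b0; // second weight is implied, not stored
    return { b0 * p0.x + b1 * p1.x,
             b0 * p0.y + b1 * p1.y,
             b0 * p0.z + b1 * p1.z };
}
```

A vertex near the knee might use b0 close to 0.5, while vertices further down the leg would use a weight that favors the lower-leg bone matrix almost entirely.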

Gain Weights and Look Good

      Although you may be impressed by how the character's leg deforms more realistically, it has probably occurred to you that most humans have more than two bones or ligaments that influence a given area on a body. This should tell you that by adding additional bone matrices in the right places as well as modifying the vertices of the mesh to accept the proper influence from them, you could achieve an even more realistic deformation. Certain cases, such as the human face, can benefit from as many as 15 bones. The following is a generalized formula for cases with a variable number of bone matrices.


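\[ V' = \sum_{i=0}^{n-2} b_i\,(V M_i) + \Bigl(1 - \sum_{i=0}^{n-2} b_i\Bigr)\,(V M_{n-1}) \]

where n is the number of bone matrices that influence the vertex.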
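For a variable number of bone matrices, the same idea can be sketched as a loop. This assumes n bone matrices and n-1 stored weights, with the last weight implied by normalization; the names blendBones and translation are hypothetical helpers introduced only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };
struct Mat4 { float m[4][4]; }; // row-vector convention: v' = v * M

// Transform a point (implicit w = 1) by a bone matrix.
Vec3 transformPoint(const Vec3& v, const Mat4& M)
{
    return { v.x * M.m[0][0] + v.y * M.m[1][0] + v.z * M.m[2][0] + M.m[3][0],
             v.x * M.m[0][1] + v.y * M.m[1][1] + v.z * M.m[2][1] + M.m[3][1],
             v.x * M.m[0][2] + v.y * M.m[1][2] + v.z * M.m[2][2] + M.m[3][2] };
}

// Hypothetical helper: a pure translation matrix, handy for testing.
Mat4 translation(float tx, float ty, float tz)
{
    Mat4 out = {{{1.0f, 0.0f, 0.0f, 0.0f},
                 {0.0f, 1.0f, 0.0f, 0.0f},
                 {0.0f, 0.0f, 1.0f, 0.0f},
                 {  tx,   ty,   tz, 1.0f}}};
    return out;
}

// bones holds n matrices; weights holds the first n-1 blend weights.
// The last weight is 1 minus the sum of the stored weights.
Vec3 blendBones(const Vec3& v,
                const std::vector<Mat4>& bones,
                const std::vector<float>& weights)
{
    assert(!bones.empty() && weights.size() + 1 == bones.size());
    Vec3 out = {0.0f, 0.0f, 0.0f};
    float remaining = 1.0f;
    for (std::size_t i = 0; i < bones.size(); ++i) {
        const float b = (i + 1 < bones.size()) ? weights[i] : remaining;
        const Vec3 p = transformPoint(v, bones[i]);
        out.x += b * p.x;
        out.y += b * p.y;
        out.z += b * p.z;
        remaining -= b;
    }
    return out;
}
```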

      Now is a good time to point out that although I've been likening the bone matrices that influence a mesh to the bones in a real skeleton, they aren't necessarily anatomically correct. While it is certainly possible to create a hierarchy of bone matrices that have a one-to-one relationship to the bones and ligaments in a human skeleton, it's not likely to be the fastest way to perform real-time soft-skinning. You should remember that as more bone matrices influence a vertex, more calculations are necessary, slowing your animation. Ultimately, you should rely on the goals of your application to help determine how many bone matrices are appropriate for your characters.

Blending it All Together

      Now that you understand how soft-skinning works, let's take this knowledge and see how it relates to DirectX. Prior to DirectX 8.0, there was limited support for soft-skinning. DirectX 7.x exposed several render states on the device interface from which an application could perform a linear blend of up to four bone matrices. Clearly this is a limitation because not only is the realism of the deformation constrained, so is application performance. To efficiently perform soft-skinning on the whole of a character (which presumably needs more than four bone matrices), the application would have to batch calls to the DrawPrimitiveXxx functions based on which four bone matrices influenced a given mesh piece. This is necessary in order to refrain from resetting the same bone matrix multiple times and to minimize the number of DrawPrimitiveXxx calls, thereby increasing the number of vertices drawn in each call.
      DirectX 8.0 alleviates these limitations and provides an application with two ways to perform soft-skinning. I will start by explaining how to implement the easier and somewhat more limited method of indexed vertex blending and then move on to the more powerful technique of vertex shader matrix palettes.

Skinning the Easy Way

      Indexed vertex blending might be best described as an expansion of the geometry-blending functionality found in DirectX 7.0. It's exposed through a new renderstate added to the fixed function pipeline. It works by allowing the application to preset up to 256 bone matrices into which the vertices of a mesh can index, although each vertex can reference a maximum of four of these bone matrices. When the geometry is drawn, the final blended vertex is automatically calculated by the DirectX runtime using the formula previously discussed. The indices to the bone matrices themselves are stored in the vertex as 8-bit values packed into a DWORD. This technique is an overall improvement over geometry blending because instead of each polygon being influenced by a maximum of four bone matrices, each vertex can now be influenced by as many as four bone matrices. This means that a single polygon can be influenced by as many as 12 different bone matrices, providing greater control over its final position. For those of you familiar with geometry blending in DirectX 7.0, this might seem confusing at first because even though only four bone matrices could influence a given polygon, they were still specified per vertex due to the way the DrawPrimitive functions worked. So let it be clear that this restriction has been lifted in the indexed vertex blending technology found in DirectX 8.0.
      As a bonus to transforming a vertex by a greater number of bone matrices, a greater number of the mesh's vertices can be transformed during a single call since an application can now preset more of the bone matrices required for a given mesh at once. Finally, if you provide normals with your vertices and signify this with the D3DFVF_NORMAL flag, DirectX 8.0 will automatically compute the inverse transpose of the matrices that are necessary for real-time lighting calculations.
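Packing four 8-bit bone matrix indices into a single 32-bit value can be sketched as shown here. The byte order (first index in the lowest byte) is an assumption made for illustration, and packBoneIndices and unpackBoneIndex are hypothetical helper names rather than DirectX functions.

```cpp
#include <cassert>
#include <cstdint>

// Pack four 8-bit bone indices into one 32-bit value.
// Assumption: index 0 occupies the lowest byte.
std::uint32_t packBoneIndices(std::uint8_t i0, std::uint8_t i1,
                              std::uint8_t i2, std::uint8_t i3)
{
    return  static_cast<std::uint32_t>(i0)
         | (static_cast<std::uint32_t>(i1) << 8)
         | (static_cast<std::uint32_t>(i2) << 16)
         | (static_cast<std::uint32_t>(i3) << 24);
}

// Recover the index stored in the given slot (0 through 3).
std::uint8_t unpackBoneIndex(std::uint32_t packed, unsigned slot)
{
    assert(slot < 4);
    return static_cast<std::uint8_t>((packed >> (8 * slot)) & 0xFFu);
}
```

The packed value would then be stored in the DWORD member of the vertex that follows the last blend weight.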
      There are a few details to consider before implementing indexed vertex blending. First of all, remember when I said that the number of bone matrices that can be indexed into is 256? While this number is accurate when the blending is done in software, it will be much lower when it's done in hardware. To find out the exact number, your application should inspect the MaxVertexBlendMatrixIndex member of the D3DCAPS8 structure. If the value is 0, then this indicates that the device does not support indexed vertex blending at all. Another consideration is how an application describes the layout of its vertex structure to DirectX. For instance, if an application will be using flexible vertex format (FVF) flags, it will need to define its custom vertex using one of the D3DFVF_XYZBn flags and the DWORD containing the packed indices must be present immediately following the last blend weight.
      The following is an example of a custom vertex format for indexed vertex blending.
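struct BLENDEDVERTEX
{
    float x, y, z;              // position in model space

    float weight, weight2;      // two explicit blend weights; the third is implied
    DWORD matrixIndices;        // four 8-bit bone matrix indices packed into a DWORD

    float normal[3];            // vertex normal
};

// The last blend slot counted by D3DFVF_XYZB3 holds the packed indices
#define D3DFVF_CUSTOMVERTEX (D3DFVF_XYZB3 | \
                             D3DFVF_LASTBETA_UBYTE4 | \
                             D3DFVF_NORMAL)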
Notice that although the code defines a vertex structure with the D3DFVF_XYZB3 flag, there are only two blend weights present in the BLENDEDVERTEX structure itself. (With D3DFVF_LASTBETA_UBYTE4 set, the last of the three blend slots holds the packed matrix indices.) To understand why, recall that the number of weights present in the vertex should always be one fewer than the number of influencing bone matrices. As I pointed out earlier, this is because indexed vertex blending in DirectX 8.0 requires normalized blend weights, and as the formula prescribes, the last blend weight is calculated as the difference of 1.0 and the sum of all blend weights present in the vertex. Note that by implementing soft-skinning using vertex shader matrix palettes (which I'll discuss shortly), you can decide yourself whether or not to use non-normalized weights.
      The last important consideration relates to application performance. In order to minimize unnecessary bus traffic, your application should batch its calls to DrawPrimitiveXxx based on the number of bone matrices that influence a given mesh piece (for example, separate the character into multiple vertex buffers, one containing all vertices influenced by only one bone matrix, one containing all vertices influenced by two bone matrices, and so on). If this isn't done, you'll be forced to pad vertices with extra zero weights of a quantity equal to the difference of the greatest number of weights used in the mesh and the number used in the vertex in question. Since these zero weights would have no effect on the final blended vertex, they would be a waste of precious bus bandwidth as well as processor cycles, and may result in lower polygon throughput. To be fair, this is a limitation in both indexed vertex blending and vertex shader-based matrix palettes, because there is no early out within either path of computation.
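One way to prepare such batches is to sort vertices into buckets by the number of bone matrices that influence them. The sketch below assumes each vertex has between one and four influences and that the counts are already known; bucketByInfluenceCount is a hypothetical helper name.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Returns four buckets of vertex indices: bucket k holds every vertex
// influenced by exactly k + 1 bone matrices.
std::array<std::vector<std::size_t>, 4>
bucketByInfluenceCount(const std::vector<int>& influenceCounts)
{
    std::array<std::vector<std::size_t>, 4> buckets;
    for (std::size_t i = 0; i < influenceCounts.size(); ++i) {
        const int n = influenceCounts[i];
        assert(n >= 1 && n <= 4); // at most four influences per vertex
        buckets[static_cast<std::size_t>(n - 1)].push_back(i);
    }
    return buckets;
}
```

Each bucket could then be written to its own vertex buffer with exactly the number of weights it needs, so no vertex carries padded zero weights.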
      Taking these considerations into account, let's look at the code in Figure 3, which demonstrates how to render a mesh using indexed vertex blending. It starts by checking the device for proper support, then sets the bone matrices that will influence the vertices in the mesh using the D3DTS_WORLDMATRIX(n) macro. At this point it enables the D3DRS_VERTEXBLEND state and sets the D3DRS_INDEXEDVERTEXBLENDENABLE renderstate to TRUE. Finally, it renders the geometry with a call to IDirect3DDevice8::DrawIndexedPrimitive.
      You may have noticed that besides enabling the D3DRS_INDEXEDVERTEXBLENDENABLE renderstate, the code also enables the D3DRS_VERTEXBLEND renderstate by specifying the number of weights. This makes sense if you think of indexed vertex blending as an additional state of the device that expands the power of regular geometry blending. It's worth noting that if you enable the D3DRS_VERTEXBLEND renderstate but not the D3DRS_INDEXEDVERTEXBLENDENABLE state, the weights in the vertices will simply reference into the first four matrices, which may produce some funny-looking deformations, unless your character has only four bones.
      When it's all said and done, indexed vertex blending is a convenient way for applications to perform skinning on meshes that don't have complex skeletal structures, so long as the application doesn't require any vertex processing formulas that fall outside of the capabilities of traditional fixed-function renderstates. Despite the improvements over regular geometry blending, indexed vertex blending isn't as flexible as you might like. One of the drawbacks is that the number of bone matrices that can influence a vertex is still limited to four, and when processed in hardware this number may continue to vary from one graphics card manufacturer to another. Additionally, going forward there may not be widespread hardware support for indexed vertex blending. For applications that want the most control over the final geometry transformation, the answer lies in performing the weighted blending yourself with the help of a vertex shader and matrix palettes.

What Is the Matrix Palette?

      Now that I've covered the fundamentals of soft-skinning and how to implement it using the fixed function pipeline, let's examine the more powerful and advanced approach of using matrix palettes from within vertex shaders. Simply speaking, by using matrix palettes you'll be implementing a customized soft-skinning solution through the use of the DirectX 8.0 vertex shader technology. Besides being free of the rules and limitations of a fixed-function solution, you'll be able to take advantage of all the other benefits that come with vertex shaders, such as custom lighting algorithms and hardware acceleration. In the end you'll be able to customize every aspect of the soft-skinning code path to take maximum advantage of your application's requirements. If you've already developed an impressive effect using a vertex shader, with a little thought you'll be able to modify it to work alongside a matrix palette shader. The possibilities are plentiful.
      There are two important considerations to discuss before preparing to implement a skinning shader. First, the DirectX 8.0 vertex shader specification guarantees that the device's constant register file will consist of at least 96 four-component floating-point vectors. This is important because you will store the bone matrices in the constant registers so that they are accessible to the skinning shader. This range of registers will make up the palette of bone matrices along with any other needed parameters. Because a standard 4×4 matrix occupies four registers, the shader will be limited to a maximum of 24 bone matrices, and that's without using any other parameters necessary for lighting. With a little ingenuity, however, you can make good use of these constant registers, providing a respectable palette of eight bone matrices while still having room left for lighting and displacement parameters. For starters, you can save register space by using 4×3 matrices, which occupy three registers each, and already you've freed eight registers (one per bone matrix). To determine the exact number of vertex shader constant registers available on the device, check the MaxVertexShaderConst member of the D3DCAPS8 structure. While future versions of DirectX might increase the number of constant registers available to vertex shaders, for now your implementation will need to work with the 96 that are guaranteed to be available.
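The register budget and the 4×3 packing can be sketched as follows. This assumes the row-vector convention with an affine bone matrix whose fourth column is (0, 0, 0, 1), so that column can be dropped; each stored register holds one column of the matrix so that a single four-component dot product with the position yields one output component. The names kConstantRegisters, maxBones, Float4, and packBone4x3 are hypothetical.

```cpp
#include <cassert>

constexpr int kConstantRegisters = 96; // minimum guaranteed by the specification

// Number of bone matrices that fit after reserving registers for other parameters.
constexpr int maxBones(int rowsPerBone, int reservedRegisters)
{
    return (kConstantRegisters - reservedRegisters) / rowsPerBone;
}

struct Float4 { float x, y, z, w; };
struct Mat4 { float m[4][4]; }; // row-vector convention: v' = v * M

// Store an affine bone matrix in three registers instead of four.
// Register c holds column c of M, including the translation term in w.
void packBone4x3(const Mat4& M, Float4 out[3])
{
    assert(out != nullptr);
    for (int c = 0; c < 3; ++c) {
        out[c].x = M.m[0][c];
        out[c].y = M.m[1][c];
        out[c].z = M.m[2][c];
        out[c].w = M.m[3][c];
    }
}
```

With no registers reserved, maxBones(4, 0) gives 24 and maxBones(3, 0) gives 32, which matches the arithmetic in the text.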
      Second, you need to take into account whether the device that your application is using supports address registers. An address register enables a shader to address into the device's constant register file using a relative offset. Without it, the shader would only be able to access a constant register using a literal numerical value known at compile time. To use an example more familiar to you, consider an array declared in C++. Imagine that your application had declared an array of 96 vectors called arrVectors, and that you weren't able to address into it using code like arrVectors[i], where i was some runtime-determined value. Without that ability, your application would have to know at compile time the exact address into the array for the element it sought. At first this might seem like a huge limitation for vertex shaders; however, many interesting effects can be accomplished without address registers. Unfortunately, matrix palette skinning is not among them.
      To understand why matrix palette skinning can't be performed without an address register, just consider that the vertex shader won't know the value of a bone matrix index until it unpacks it from the vertex stream, at which time it will use this value to address into the constant register file to obtain the bone matrix in question. An application can check the VertexShaderVersion member of the D3DCAPS8 structure to determine if the device in question supports address registers. If the reported vertex shader version is 1.1 or higher, then a single address register is supported, namely a0. When working with a device running in software, at least one address register will be supported; how many more than that will depend on future versions of DirectX.
      Now I can begin to discuss the implementation of the skinning shader. Let's start by examining a bit of pseudocode for those who haven't written assembly in a while (myself included), since the vertex shader assembler macros are a restricted type of assembler in both form and function. The following are the operations to perform on each vertex that passes through the pipeline.
  1. Unpack the index to the first matrix from the vertex's index component.
  2. Transform the vertex by the bone matrix in question.
  3. Scale the vertex by the first blend weight.
  4. Store the vertex in a temporary register.
          Perform steps 1 through 4 until bone n is reached.
  5. For bone n, perform steps 1 and 2, then scale vertex by (1 minus the sum of all weights).
  6. Add all the blended vertices that were obtained during steps 1 through 5.
  7. Transform to homogeneous clipping space.
  8. Output the final vertex position.
      You've probably noticed that these steps follow the variable blend weight formula discussed earlier.
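Steps 1 through 6 can be emulated on the CPU to make the data flow easier to follow. The sketch below is not the shader from the article: it assumes each bone occupies three consecutive constant registers, that the indices have already been converted to the first register of each bone, and that the input position has w set to 1. The transformation to clip space (steps 7 and 8) is left out, and the names Float4, dp4, and skinPosition are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Float4 { float x, y, z, w; };

// Four-component dot product, mirroring the shader's dp4 instruction.
float dp4(const Float4& a, const Float4& b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z + a.w * b.w;
}

// constants:     emulated constant register file
// position:      model-space position with w = 1
// firstRegister: for each of the n bones, the register where its matrix starts
// weights:       the first n-1 blend weights
Float4 skinPosition(const std::vector<Float4>& constants,
                    const Float4& position,
                    const std::vector<int>& firstRegister,
                    const std::vector<float>& weights)
{
    assert(!firstRegister.empty());
    assert(weights.size() + 1 == firstRegister.size());

    // Step 5: the last weight is 1 minus the sum of the stored weights.
    float lastWeight = 1.0f;
    for (float w : weights) lastWeight -= w;

    Float4 sum = {0.0f, 0.0f, 0.0f, 1.0f};
    for (std::size_t i = 0; i < firstRegister.size(); ++i) {
        const int a0 = firstRegister[i]; // step 1: index into the palette
        assert(a0 >= 0 && a0 + 2 < static_cast<int>(constants.size()));
        const float b = (i < weights.size()) ? weights[i] : lastWeight;
        // Steps 2 through 4 and 6: transform, scale by the weight, accumulate.
        sum.x += b * dp4(position, constants[a0 + 0]);
        sum.y += b * dp4(position, constants[a0 + 1]);
        sum.z += b * dp4(position, constants[a0 + 2]);
    }
    return sum;
}
```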
      Before examining the code for the soft-skinning vertex shader, let's first discuss the shader's declarator. Declarators are nothing more than an array of DWORDs that bind the vertex stream data to the vertex shader's registers, and you can use the declarator preprocessor macros to help you create one. Figure 4 shows a declarator created specifically for the matrix palette skinning shader, which will accept one vertex, eight blend weights, a diffuse color, and one set of texture coordinates. The data will be available to the shader from stream 0. (Note that just prior to declaring the vertex stream bindings, I have declared one constant register. As you'll see later on, I'll use this to help calculate the fourth and final blend weight. You can ignore it for now.)
      Note that even though you're blending with eight different weights, the declarator stores them using two registers, as specified by the D3DVSDT_FLOAT4 and D3DVSDT_FLOAT3 macros. You might be wondering: why not use the D3DVSDT_FLOAT1 macro for each of the eight weights? The reason has to do with the way the macros expand. Each of the D3DVSDT_FLOATx macros expands into a four-component vector, taking up a full register and leaving the remaining components unused. For instance, if you had used a D3DVSDT_FLOAT1 macro to bind the first weight (which is a single scalar value), you'd be wasting the y, z, and w components, which could easily be used to store weights 2 through 4. Therefore, if you were to use one D3DVSDT_FLOAT1 macro for each of the seven weights (remember, so far the eighth is being calculated automatically), you'd be wasting a total of 21 component slots, the equivalent of roughly five registers. The sample declarator I've presented in Figure 4 dispenses with the D3DVSDE_xxx macros and reassigns the vertex registers to the vertex stream elements of my choosing. This is simply a matter of preference, although if you don't use the D3DVSDE_xxx macros, you'll need to pass a 0 as the FVF parameter when creating the vertex shader via the IDirect3DDevice8::CreateVertexShader function.
      Additionally, note that I've used D3DVSDT_UBYTE4 as the register type for binding the bone matrix index data. This enables the application to pass the indices as packed bytes inside a DWORD. They will be automatically unpacked once they reach the vertex shader and spread out over the individual components of the register. While in the end this won't save you any register space, it will help reduce the bus traffic between the CPU and the GPU because the bone indices will be compacted.
      However, I should mention that while writing this article it came to my attention that graphics cards based on NVIDIA's NV20 chipset will not be supporting D3DVSDT_UBYTE4. Although you can expect this to be an exception and not the norm for the forthcoming generation of graphics hardware for DirectX 8.0, there are two strategies I suggest you take. First, since support for D3DVSDT_UBYTE4 is reflected in the D3DCAPS8 structure, make sure your application checks for it by testing the D3DVTXPCAPS_NO_VSDT_UBYTE4 flag in the VertexProcessingCaps member. Second, in the case of the NV20, you can take advantage of its support of 32-bit packed RGBA components. Simply pack the bone matrix indices into a D3DCOLOR value. Then perform a multiply on the vertex register containing the indices and a constant value of (255 * n) + .1, where n is the number of rows in your matrix. For example:
mul r1,v2.zyxw,c37.wwww
The constant component c37.wwww should be set to the value (255 * n) + .1, where n is the number of rows in your matrix. It's important to note that the components of the register containing the indices must be swizzled so that z and x are exchanged, as in the above example. This workaround will operate in one instruction, and r1 will now contain the indices to the bone matrices. With cards that don't support D3DVSDT_UBYTE4, your application will have to store each weight in a single register component.
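The arithmetic behind this workaround can be checked with a small CPU-side sketch. It assumes that each byte of the D3DCOLOR value reaches the shader as a normalized component (byte / 255) and that moving the scaled value into the address register discards the fractional part; the latter is an assumption about the conversion, and registerOffsetFromColorByte is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>

// Emulates: index byte -> normalized color component -> scaled register offset.
int registerOffsetFromColorByte(unsigned char boneIndex, int rowsPerBone)
{
    assert(rowsPerBone > 0);
    const float component = boneIndex / 255.0f;        // what the shader reads
    const float scale = 255.0f * rowsPerBone + 0.1f;   // value loaded into c37.w
    // Assumed conversion: the fractional part is dropped when addressing.
    return static_cast<int>(std::floor(component * scale));
}
```

The small .1 bias keeps the product from landing just below the intended integer because of floating-point rounding, which would otherwise select the previous register.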
      When developing with a small and finite set of registers, the techniques I've just discussed are important because you'll need to save as much space as possible for other shader parameters. If you're careful with your shader design, you'll get by quite well.

Down to Business

      Now that you have a shader declarator, you can begin to implement the shader itself. Figure 5 shows the complete code for a matrix palette skinning shader that blends each vertex using eight bone matrices. The example uses eight bone matrices so that it's easy to follow. It's trivial to add additional bone matrices, as long as enough constant registers remain. Aside from performing the soft-skinning, the shader will pass along unmodified texture coordinates and a diffuse color. No additional operations, such as lighting (which requires the inverse transpose of the bone matrices to be applied to the vertex normals), will be performed.
      As you can see, the code for the shader follows the blending formula I discussed earlier in this article. Before transforming the vertex by each bone matrix, the bone's index is moved into the address register, a0. From this point the shader can use the address register to retrieve the appropriate bone matrix from the constant register file and transform the vertex. It does this by performing a dot product between each row vector of the bone matrix and the x, y, and z components of the vertex, then storing the result in a temporary register to be blended later. The astute among you may notice that unless a certain condition is met, the method used by the shader to address into the constant register file will only work for the first bone matrix, namely bone 0. This is because each bone matrix is made up of three vectors. Once you get past bone 0, you'll have to add the value 3 to each index (or however many rows are in your bone matrices). Unfortunately, there are no conditional instructions available to a vertex shader and, therefore, there is no way to determine at shader runtime whether the index value is 0 or greater than 0.
      There are two ways to solve this problem, one being to multiply the index value by 3 and store the result in a temporary register. Then the shader could add this temporary value to the row number it wants to obtain from the matrix. The problem is that this will require an extra multiplication operation for each bone matrix used. The better alternative is to pre-multiply each bone matrix index by the number of rows in the matrix, and use this result as the new value of the index. Your application would then pass the vertex data to the shader with the correctly scaled index already in place. This index pre-multiplication step could be performed by the application as it loads the mesh into memory, or the modeling software could do it before the geometry is exported.
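The pre-multiplication step might look something like the following at mesh load time. This is a minimal sketch, not the article's sample code: the SkinVertex layout and the prescaleBoneIndices name are assumptions made for illustration, with indices stored as one byte each.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical vertex layout: position, three explicit blend weights,
// and four one-byte bone indices.
struct SkinVertex {
    float position[3];
    float weights[3];
    std::uint8_t boneIndices[4];
};

// Replace each bone index with the offset of that bone's first row in the
// constant register file, so the shader can use it without a multiply.
void prescaleBoneIndices(std::vector<SkinVertex>& vertices, int rowsPerMatrix) {
    assert(rowsPerMatrix > 0);
    for (SkinVertex& v : vertices) {
        for (std::uint8_t& index : v.boneIndices) {
            const int scaled = index * rowsPerMatrix;
            assert(scaled <= 255);  // the scaled value must still fit in a byte
            index = static_cast<std::uint8_t>(scaled);
        }
    }
}
```

Run this once per mesh. Running it a second time on the same data would scale the indices again and send the shader to the wrong registers.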
      The last steps of the process scale each temporary vertex register by the appropriate blend weight and add them together to arrive at the final vertex. You can obtain the last blend weight by doing a dot product of the register containing the blend weights and a constant register set to (-1,-1,-1,1). Assuming you've temporarily preset the fourth weight to a value of 1, a dot product of (-1,-1,-1,1) and (b1,b2,b3,1) will result in the correct fourth weight value, since 1 minus the sum of the first three blend weights is whatever weight remains.
      That said, the example I just gave wastes two components by declaring three of the four components to have the same value, -1. As I've been saying, you'll want to save register space to allow you to pass additional parameters to your shader. Luckily, DirectX 8.0 provides you with the swizzle modifier, enabling you to write the dot product instruction like this:
dp4 r16.w, r16, c[16].xxxy
      Now the value of the x component will be substituted for the y and z value, saving you two components. Besides the space savings, if the vertex shader is running in software, the swizzle might result in a slight speed advantage, since the component's value is obtained only once. In hardware that won't make any difference.
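The effect of the swizzled dot product can be checked on the CPU. In this sketch Float4, dp4, swizzleXXXY, and lastBlendWeight are hypothetical helpers; the only assumption about the constant register is that x holds -1 and y holds 1, leaving z and w free for other parameters.

```cpp
#include <cassert>
#include <cmath>

// Four floats standing in for one vertex shader register (x, y, z, w).
struct Float4 { float x, y, z, w; };

// Four-component dot product, as performed by the dp4 instruction.
float dp4(const Float4& a, const Float4& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z + a.w * b.w;
}

// The .xxxy swizzle: x is replicated into y and z, and y is read as w.
Float4 swizzleXXXY(const Float4& c) {
    return { c.x, c.x, c.x, c.y };
}

// Fourth weight = 1 - (b1 + b2 + b3), computed the way the shader does it.
float lastBlendWeight(float b1, float b2, float b3) {
    assert(b1 >= 0.0f && b2 >= 0.0f && b3 >= 0.0f);
    const Float4 weights = { b1, b2, b3, 1.0f };        // fourth weight preset to 1
    const Float4 constant = { -1.0f, 1.0f, 0.0f, 0.0f }; // z and w stay available
    return dp4(weights, swizzleXXXY(constant));
}
```

For weights of 0.5, 0.25, and 0.125, the function returns the remaining 0.125.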
      Once the vertex is blended, it's transformed by the world-view-projection matrix into homogeneous clipping space and passed on to the shader's vertex output register, oPos. The vertex shader code in Figure 5 transforms the vertex by explicitly using the dp4 (four-component dot product) instruction for each row of each bone matrix. If you prefer, you can use the matrix macro instructions provided by DirectX 8.0, which would allow you to specify the transformation in just one line of code. You could use the m4x3 macro in place of three dp4 instructions. This may make your code neater, but the transformation will still require three dot products.
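A CPU reference version of the blend is handy for checking shader output while debugging. The sketch below is not the Figure 5 shader: Float4, dp4, m4x3, and skinVertex are hypothetical names, four influences per vertex is an assumption, and the indices are taken to be pre-multiplied by the three rows of each bone matrix.

```cpp
#include <cassert>
#include <cmath>

// Four floats standing in for one vertex shader register (x, y, z, w).
struct Float4 { float x, y, z, w; };

// Four-component dot product, as performed by the dp4 instruction.
float dp4(const Float4& a, const Float4& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z + a.w * b.w;
}

// One dp4 per row, which is the work an m4x3 macro expands to.
Float4 m4x3(const Float4& v, const Float4* rows) {
    return { dp4(v, rows[0]), dp4(v, rows[1]), dp4(v, rows[2]), 1.0f };
}

// Blend a vertex by four bones. 'constants' stands in for the constant
// register file, and each scaled index is the offset of a bone's first row.
Float4 skinVertex(const Float4& position, const Float4* constants,
                  const int scaledIndices[4], const float weights[4]) {
    assert(constants != nullptr);
    Float4 blended = { 0.0f, 0.0f, 0.0f, 1.0f };
    for (int i = 0; i < 4; ++i) {
        const Float4 t = m4x3(position, constants + scaledIndices[i]);
        blended.x += t.x * weights[i];
        blended.y += t.y * weights[i];
        blended.z += t.z * weights[i];
    }
    return blended;
}
```

The result is the blended position before the world-view-projection transform. Blending the point (1,1,1) equally between an identity bone and a bone that translates by 2 along x should give (2,1,1).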
      With the custom vertex type, the declarator, and the shader taken care of, it's now just a matter of animating the bone matrices that influence the character's body. The way you go about doing that is really up to you; however, you'll get the smoothest motion from using quaternions and spherical linear interpolation. For a more in-depth treatment of this subject, I suggest you read the first volume of Alan Watt's book 3D Computer Games Technology (Addison-Wesley, 2000). After you've updated a character's bone matrix transformations during each frame, it's simply a matter of invoking IDirect3DDevice8::SetVertexShaderConstant, which accepts a DWORD specifying the constant register address that the data should be loaded into, a pointer to an array of four-component floating-point vectors, and the number of constants in the array. If you're using a matrix class with proper operator overloading (such as the D3DXMATRIX class in the Direct3DX class library) you can simply pass a pointer to the matrix class itself along with the register address and the number of constants to load, which for a 4x3 matrix will be three, one for each row.
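For readers who haven't used spherical linear interpolation, here is a minimal sketch of blending two bone orientations. Quat and slerp are hypothetical names, unit-length input quaternions are assumed, and the 0.9995 cutoff for falling back to a straight linear blend is an arbitrary choice, not something taken from the article or the SDK.

```cpp
#include <cassert>
#include <cmath>

// A rotation stored as a unit quaternion (x, y, z, w).
struct Quat { float x, y, z, w; };

// Interpolate from a to b along the shortest arc; t runs from 0 to 1.
Quat slerp(Quat a, Quat b, float t) {
    assert(t >= 0.0f && t <= 1.0f);
    float cosTheta = a.x * b.x + a.y * b.y + a.z * b.z + a.w * b.w;
    if (cosTheta < 0.0f) {  // q and -q are the same rotation; take the short way
        b = { -b.x, -b.y, -b.z, -b.w };
        cosTheta = -cosTheta;
    }
    float wa = 1.0f - t;    // linear weights, used when the angle is tiny
    float wb = t;
    if (cosTheta < 0.9995f) {
        const float theta = std::acos(cosTheta);
        const float sinTheta = std::sin(theta);
        wa = std::sin((1.0f - t) * theta) / sinTheta;
        wb = std::sin(t * theta) / sinTheta;
    }
    const Quat q = { wa * a.x + wb * b.x, wa * a.y + wb * b.y,
                     wa * a.z + wb * b.z, wa * a.w + wb * b.w };
    // Renormalize to absorb rounding error and the linear fallback.
    const float len = std::sqrt(q.x * q.x + q.y * q.y + q.z * q.z + q.w * q.w);
    return { q.x / len, q.y / len, q.z / len, q.w / len };
}
```

Interpolating halfway between no rotation and a 90-degree rotation about the z axis should yield a 45-degree rotation about z. The interpolated quaternion would then be converted to a matrix before being loaded into the constant registers.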
      This snippet shows a simple example of setting the bone matrix.
// animate the bone matrices
animateBoneMatrix( m_pUpperLegBone, m_pLowerLegBone );

// load the bone matrices into the constant registers; each 4x3 matrix
// occupies three registers, so the second bone starts at register 3
m_pD3DDevice->SetVertexShaderConstant( 0, m_pUpperLegBone, 3 );
m_pD3DDevice->SetVertexShaderConstant( 3, m_pLowerLegBone, 3 );
At this point, the application can call the appropriate DrawPrimitiveXxx function and the soft-skinning shader will use the bone matrices to transform the character's geometry.
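With more than a couple of bones, it may be simpler to lay the whole palette out contiguously and load it in one call. The sketch below only prepares the data: Float4, BoneMatrix4x3, boneRegister, and flattenPalette are hypothetical names, and the actual device call appears only in a comment because it depends on your application's device pointer.

```cpp
#include <cassert>
#include <vector>

// Four floats standing in for one constant register (x, y, z, w).
struct Float4 { float x, y, z, w; };

// A 4x3 bone matrix stored as the three row vectors the shader dots against.
struct BoneMatrix4x3 { Float4 rows[3]; };

const int kRowsPerBone = 3;

// Register address of a bone's first row; this matches the pre-multiplied
// index stored in the vertex data when firstRegister is 0.
int boneRegister(int firstRegister, int boneIndex) {
    assert(boneIndex >= 0);
    return firstRegister + boneIndex * kRowsPerBone;
}

// Copy every bone's rows into one contiguous array of constants.
std::vector<Float4> flattenPalette(const std::vector<BoneMatrix4x3>& bones) {
    std::vector<Float4> constants;
    constants.reserve(bones.size() * kRowsPerBone);
    for (const BoneMatrix4x3& bone : bones) {
        for (const Float4& row : bone.rows) {
            constants.push_back(row);
        }
    }
    return constants;
}

// With a real device, the upload would then be a single call along the lines of:
//   m_pD3DDevice->SetVertexShaderConstant( 0, &constants[0], (DWORD)constants.size() );
```

Keep in mind that the palette shares the constant register file with everything else the shader needs, such as the world-view-projection matrix, so the number of bones that fit is limited.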

Optimize, Optimize, Optimize

      Now you have all the necessary information for implementing your own matrix palette skinning solution. As you're no doubt aware, every processor cycle counts in the world of real-time graphics software, especially when a program will touch every vertex in a mesh. Throughout this article I've suggested performance and optimization tips wherever I thought it wouldn't complicate the fundamental concepts being explained. That said, there are still more ways to optimize the matrix palette solution than I've presented.
      One good way to begin optimizing is to look for alternate ways of representing your skeletal transformations. When trying to find other areas for improvement in your soft-skinning shader, let the requirements of your application be your ultimate guide. You're likely to find different optimizations depending on the structure of the character skeleton and the architecture of your application.
      It's an exciting time in the world of real-time 3D graphics, and there's no doubt that the future is bright for both game developers and fans of DirectX. Soft-skinning provides developers a new avenue to explore in the realm of character animation, and DirectX 8.0 is a vehicle for your journey. As always, no one technique is perfect and you should always use the right tool for the job. A good application will use soft-skinning where it provides the most impact, and fall back on tweening when the situation calls for it. With that said, I wish you luck on your experimentation and look forward to seeing what kind of great applications you create. Feel free to drop me a line any time if you have any questions.
For related articles see:
Programmable Shaders for DirectX 8.0
Using Vertex Shaders: Part 1

For background information see:
Alan Watt's book, 3D Computer Games Technology (Addison-Wesley)
DirectX 8.0 SDK

Benjamin Freidlin is the Technical Editor at MSDN Magazine. When he's not listening to Yankee baseball on the radio or playing video games, he's usually fiddling with the latest DirectX bits. He can be reached at benfr@microsoft.com.
