Understanding XML Digital Signature

04/03/2007

Rich Salz
DataPower Technology

July 2003

Applies to:

Web Services Specifications (WS-Security Specification, et al.)
Microsoft® .NET Framework

Summary: This article looks at the XML Digital Signature specification, explaining its processing model and some of its capabilities. It provides a more detailed, lower-level understanding of how the WS-Security specification implements its message security feature. (12 printed pages)

Introduction
Digital Signature Cryptography with No Real Math
Format of a Signature
Conclusion

Introduction

Digital signatures are important because they provide end-to-end message integrity guarantees, and can also provide authentication information about the originator of a message. In order to be most effective, the signature must be part of the application data, so that it is generated at the time the message is created, and it can be verified at the time the message is ultimately consumed and processed.

SSL/TLS also provides message integrity (as well as message privacy), but it only does this while the message is in transit. Once the message has been accepted by the server (or, more generally, the peer receiver), the SSL protection must be "stripped off" so that the message can be processed.

As a more subtle point, SSL only works between the communication endpoints. If I'm developing a new Web service and using a conventional HTTP server (such as IIS or Apache) as a gateway, or if I'm communicating with a large enterprise that has SSL accelerators, the message integrity is only good up until the SSL connection is terminated.

As an analogy, consider a conventional letter. If I'm sending a check to my phone company, I sign the check (the message) and put it in an envelope to get privacy and delivery. Upon receipt of the mail, the phone company removes the envelope, throws it away, and then processes the check. I could make my message part of the envelope, such as by gluing the payment to a postcard and mailing that, but that would be foolish.

An XML signature would define a series of XML elements that could be embedded in, or otherwise affiliated with, any XML document. It would allow the receiver to verify that the message has not been modified from what the sender intended.

The XML-Signature Syntax and Processing specification (abbreviated in this article as XML DSIG) was a joint effort of the W3C and the IETF. It's been an official W3C Recommendation since February 2002. Many implementations are available; within the .NET Framework, System.Security.Cryptography.Xml implements it. Within the WS-Security triad—authentication, content integrity, and content privacy—XML DSIG provides integrity and can be used to provide sender authentication.

Digital Signature Cryptography with No Real Math

Before we can really understand XML DSIG, we need to have an understanding of some basic cryptography. I'll cover the concepts in this section, but don't panic: no complex math is involved.

A digital signature provides an integrity check on some content. If a single byte of the original content has been modified (an extra zero added to a price, a "2" changed to a "4", a "No" changed to a "Yes", and so on), the signature will fail to verify. Here's how it works.

The first step is to hash the message. A cryptographic hash takes an arbitrary stream of bytes and converts it to a single fixed-size value known as a digest. Hashing is a one-way process: it's computationally infeasible to recreate a message from its digest, or to find two different messages that produce the same digest value.

The most common hash mechanism is SHA1, the Secure Hash Algorithm. It was created by the US Government and released as a standard in 1995; the full specification is available at http://www.itl.nist.gov/fipspubs/fip180-1.htm. SHA1 takes any message shorter than 2**64 bits and produces a 20-byte (160-bit) result. (That means there are 2**160 possible digest values; by comparison, current estimates put the number of protons in the universe at around 10**80, or roughly 2**266.)
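Here is a short Python sketch that uses the standard hashlib module. It computes a SHA1 digest and checks a received copy of the message against it. The message strings are made up.

```python
import hashlib

def digest(message: bytes) -> bytes:
    """H(M): the 20-byte SHA1 digest of a message."""
    return hashlib.sha1(message).digest()

m = b"Pay $100 to the phone company"
sent_digest = digest(m)                        # H(M), sent along with M

received = b"Pay $100 to the phone company"    # what actually arrived
tampered = b"Pay $1000 to the phone company"   # one extra zero

print(len(sent_digest))                  # 20
print(digest(received) == sent_digest)   # True: H'(M) matches H(M)
print(digest(tampered) == sent_digest)   # False: any change alters the digest
```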

So if I generate a message M and create a digest (written as H(M), for "the hash of M"), and you receive M and H(M), you can create your own digest H'(M); if the two digest values match, we know that you got what I sent. To protect M against modification, I only need to protect H(M) from being modified.

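How do we do that? There are two common approaches. The first is to mix a shared secret into the digest; in other words, create H(S+M). When you get the message, you use your own copy of S to create H'(S+M). This keyed digest is called an HMAC, or Hashed Message Authentication Code. (The standard HMAC construction actually mixes the secret in twice, through two nested hash computations, rather than simply prepending it, but the idea is the same.)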
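Python's standard hmac module implements the real HMAC construction, so the shared-secret approach can be sketched as follows. The secret and message values are placeholders. The comparison uses hmac.compare_digest because its running time does not depend on where the two values differ.

```python
import hashlib
import hmac

def mac(secret: bytes, message: bytes) -> bytes:
    """HMAC-SHA1 of the message under the shared secret S."""
    return hmac.new(secret, message, hashlib.sha1).digest()

shared_secret = b"placeholder-shared-secret"   # S, known to both parties
m = b"<order><qty>2</qty></order>"

tag = mac(shared_secret, m)                    # sent along with M

# The receiver recomputes the HMAC with its own copy of S.
ok = hmac.compare_digest(mac(shared_secret, m), tag)
# An attacker without S cannot produce a matching value.
forged = hmac.compare_digest(mac(b"guessed-secret", m), tag)
print(ok, forged)  # True False
```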

When we use an HMAC, the strength of the integrity protection depends on the (in)ability of the attacker to figure out S. Therefore, S should be something not easily guessed, and something that is changed often. One of the best ways to meet these requirements is to use Kerberos. In Kerberos, a central authority distributes "tickets" that contain a temporary session key whenever two entities want to communicate. This session key is used as the shared secret. When I want to send you a signature, I get a ticket to talk to you. I open my part of the ticket to get S, and I send you the message, its HMAC, and your part of the ticket. You open the ticket (using the password that you originally registered with Kerberos) and get S and information about my identity. You can now take the message, M, generate your own H'(S+M), and compare it with the HMAC I sent. If they match, you know that you received my message intact, and Kerberos told you who I am.

Another method to protect the digest is to use public-key cryptography, such as RSA. In public-key cryptography there are two keys: a private key, known only to the holder, and a public key, accessible to anyone who wants to communicate with the key holder. With RSA, anything encrypted with the private key can be decrypted with the public key, and vice versa.

Let's look at a simple example that demonstrates how public-key cryptography works. In this example, we'll limit our messages to the letters a through z, and assign them the values one through 26. To encrypt, we'll add the value of the private key; in this case it's +4:

Letter	h	e	l	l	o
Numeric Value	8	5	12	12	15
Private Key	4	4	4	4	4
Encrypted Value	12	9	16	16	19

To decrypt, we add the public key, which will be +22; if the result is outside the number range, we add or subtract 26 until it's valid. (Put another way, to decrypt we add the public key, and take the result modulo 26).

Encrypted Value	12	9	16	16	19
Public Key	22	22	22	22	22
Raw decrypted value	34	31	38	38	41
Normalized value	8	5	12	12	15
Plaintext	h	e	l	l	o
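The toy cipher in the tables above can be written out in a few lines of Python. It is only the article's teaching device: the two keys add up to 26, so either one is trivially derived from the other.

```python
# Letters a..z are the values 1..26. The private key adds 4 and the
# public key adds 22; results are reduced back into the range 1..26.
PRIVATE_KEY = 4
PUBLIC_KEY = 22

def to_numbers(text: str) -> list[int]:
    return [ord(c) - ord("a") + 1 for c in text]

def to_letters(values: list[int]) -> str:
    return "".join(chr(v - 1 + ord("a")) for v in values)

def apply_key(values: list[int], key: int) -> list[int]:
    # Shift into 0..25, add the key modulo 26, then shift back to 1..26.
    return [(v - 1 + key) % 26 + 1 for v in values]

encrypted = apply_key(to_numbers("hello"), PRIVATE_KEY)
print(encrypted)   # [12, 9, 16, 16, 19]

decrypted = to_letters(apply_key(encrypted, PUBLIC_KEY))
print(decrypted)   # hello
```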

RSA works the same way, except that instead of addition we use exponentiation and the numbers are hundreds of digits long.
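Textbook RSA with the small numbers often used in teaching examples (p = 61, q = 53) shows the exponentiation at work. These values were chosen for readability; real moduli are hundreds of digits long. `pow(e, -1, phi)` requires Python 3.8 or later.

```python
# Textbook RSA with tiny, well-known example numbers (far too small to be secure).
p, q = 61, 53
n = p * q                    # 3233, the public modulus
phi = (p - 1) * (q - 1)      # 3120
e = 17                       # public exponent
d = pow(e, -1, phi)          # private exponent: 2753

m = 65                       # a "message" must be a number smaller than n

# Exponentiation modulo n takes the place of the toy cipher's addition.
c = pow(m, e, n)             # encrypt with the public key...
assert pow(c, d, n) == m     # ...and decrypt with the private key

s = pow(m, d, n)             # "encrypt" with the private key...
assert pow(s, e, n) == m     # ...and anyone holding the public key can undo it
print(n, d)                  # 3233 2753
```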

Using RSA, I generate a digest, H(M), and encrypt it with my private key, {H(M)}private-key, which is the signature. When you receive the message, M, you generate the digest, H'(M), and decrypt the signature using my public key, getting the H(M) that I generated. If H(M) and H'(M) are the same, then we know that M is the same. Further, you know that whoever has the private key—that is, me—is the sender of the message.
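The next sketch puts the two ideas together: it signs a SHA1 digest with RSA and then verifies it. It makes two assumptions. First, the key is built from two Mersenne primes (2**89 - 1 and 2**107 - 1), so the modulus is just large enough to hold a 160-bit digest. Second, no padding is applied. The real rsa-sha1 algorithm pads the digest (PKCS #1 v1.5) and uses much larger keys.

```python
import hashlib

# Hypothetical toy key pair; the modulus is about 196 bits.
P = 2**89 - 1
Q = 2**107 - 1
N = P * Q
E = 65537
D = pow(E, -1, (P - 1) * (Q - 1))   # requires Python 3.8+

def h(message: bytes) -> int:
    """H(M) as an integer (always smaller than N here)."""
    return int.from_bytes(hashlib.sha1(message).digest(), "big")

def sign(message: bytes) -> int:
    # {H(M)}private-key: "encrypt" the digest with the private exponent.
    return pow(h(message), D, N)

def verify(message: bytes, signature: int) -> bool:
    # Recover H(M) with the public key and compare it with H'(M).
    return pow(signature, E, N) == h(message)

m = b"Pay $100 to the phone company"
sig = sign(m)
print(verify(m, sig))                                  # True
print(verify(b"Pay $1000 to the phone company", sig))  # False
```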

Format of a Signature

XML DSIG uses a single namespace, and we'll assume the following declaration is present in our examples:

    xmlns:ds="http://www.w3.org/2000/09/xmldsig#"

A top-level <ds:Signature> element is fairly simple. It has information about what is being signed, the signature, the keys used to create the signature, and a place to store arbitrary information:

    <element name="Signature" type="ds:SignatureType"/>
    <complexType name="SignatureType">
      <sequence> 
        <element ref="ds:SignedInfo"/> 
        <element ref="ds:SignatureValue"/> 
        <element ref="ds:KeyInfo" minOccurs="0"/> 
        <element ref="ds:Object" minOccurs="0" maxOccurs="unbounded"/> 
      </sequence>  
      <attribute name="Id" type="ID" use="optional"/>
    </complexType>

We'll look at these in increasing order of their complexity.

     Id
     ds:SignatureValue
     ds:Object
     ds:SignedInfo
     ds:KeyInfo
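The SignatureType sequence can be illustrated with a hypothetical helper built on Python's standard xml.etree.ElementTree. It checks that a parsed ds:Signature has its children in the order the schema requires, and it looks only at element order, not content.

```python
import xml.etree.ElementTree as ET

DS = "{http://www.w3.org/2000/09/xmldsig#}"

def follows_signature_schema(sig: ET.Element) -> bool:
    """Check SignatureType child order: SignedInfo, SignatureValue,
    an optional KeyInfo, then any number of Object elements."""
    names = [child.tag.replace(DS, "ds:") for child in sig]
    if names[:2] != ["ds:SignedInfo", "ds:SignatureValue"]:
        return False
    rest = names[2:]
    if rest[:1] == ["ds:KeyInfo"]:
        rest = rest[1:]
    return all(name == "ds:Object" for name in rest)

good = ET.fromstring(
    '<ds:Signature xmlns:ds="http://www.w3.org/2000/09/xmldsig#">'
    "<ds:SignedInfo/><ds:SignatureValue/><ds:KeyInfo/>"
    '<ds:Object Id="a"/><ds:Object Id="b"/>'
    "</ds:Signature>"
)
bad = ET.fromstring(
    '<ds:Signature xmlns:ds="http://www.w3.org/2000/09/xmldsig#">'
    "<ds:SignatureValue/><ds:SignedInfo/>"
    "</ds:Signature>"
)
print(follows_signature_schema(good), follows_signature_schema(bad))  # True False
```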

The ds:Signature/@Id attribute

The optional Id attribute gives a particular signature instance a name that other content can refer to, which matters when a document contains multiple signatures. Multiple signatures are common in business policies, such as when both the manager and the Travel Office must approve a trip application.

The ds:SignatureValue element

This element contains the actual signature. As signatures are always binary data, XML DSIG specifies that the signature value is always a simple element with Base64-encoded content:

    <element name="SignatureValue" type="ds:SignatureValueType"/> 
    <complexType name="SignatureValueType">
      <simpleContent>
        <extension base="base64Binary">
          <attribute name="Id" type="ID" use="optional"/>
        </extension>
      </simpleContent>
    </complexType>

In order to interpret the SignatureValue, it's necessary to understand the content in the SignedInfo element, which we'll discuss below. Until then, it's just an opaque string of bytes:

    <SignatureValue>
        WvZUJAJ/3QNqzQvwne2vvy7U5Pck8ZZ5UTa6pIwR7GE+PoGi6A1kyw==
    </SignatureValue>
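Without the ds:SignedInfo context, all a processor can do with this value is Base64-decode it, as this minimal Python sketch does. The example decodes to 40 bytes. That size would fit a DSA-SHA1 signature (two 20-byte integers), for instance, but nothing in the element itself says which algorithm produced it.

```python
import base64

# The example SignatureValue above, including its surrounding whitespace.
signature_value = """
        WvZUJAJ/3QNqzQvwne2vvy7U5Pck8ZZ5UTa6pIwR7GE+PoGi6A1kyw==
"""

# Remove whitespace, then decode to the raw (still opaque) signature bytes.
raw = base64.b64decode("".join(signature_value.split()))
print(len(raw))  # 40
```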

The ds:Signature/ds:Object element

As we'll see below, an XML DSIG can cover multiple items. An item will often be able to stand on its own, such as a Web page or XML business document, but sometimes an item is best treated as metadata for the "true" content being signed. For example, the data might be a "property" of the signature, such as a timestamp for when the signature was generated.

The ds:Object element can be used to hold such data within the Signature:

    <element name="Object" type="ds:ObjectType"/> 
    <complexType name="ObjectType" mixed="true">
      <sequence minOccurs="0" maxOccurs="unbounded">
        <any namespace="##any" processContents="lax"/>
      </sequence>
      <attribute name="Id" type="ID" use="optional"/> 
      <attribute name="MimeType" type="string" use="optional"/>
      <attribute name="Encoding" type="anyURI" use="optional"/> 
    </complexType>

The Id attribute allows a signature to have multiple objects that can be independently addressed. The MimeType is used to identify the data so that other processors can use it; it has no meaning to the DSIG processor.

The Encoding specifies how to pre-process the content; currently only base-64 encoding is defined.

Here are two objects (of identical content) that could be used as a simple indicator of when a document was signed. A service that provided this in its signatures might be useful to online contests, auctions, or other activities that have submission deadlines:

    <ds:Object Id="ts-bin" Encoding="http://www.w3.org/2000/09/xmldsig#base64">
        V2VkIEp1biAgNCAxMjoxMTowMyBFRFQgMjAwMwo=
    </ds:Object>
    <ds:Object Id="ts-text">
        Wed Jun  4 12:11:03 EDT 2003
    </ds:Object>
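Decoding the ts-bin content in Python shows the timestamp it carries. Base64 data must be padded to a multiple of four characters, and Python's decoder rejects the unpadded form. The sketch therefore includes the single trailing "=" this value needs.

```python
import base64

# Content of the ts-bin Object (Encoding="http://www.w3.org/2000/09/xmldsig#base64").
ts_bin = "V2VkIEp1biAgNCAxMjoxMTowMyBFRFQgMjAwMwo="

decoded = base64.b64decode(ts_bin).decode("ascii")
print(repr(decoded))  # 'Wed Jun  4 12:11:03 EDT 2003\n'
```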

The ds:SignedInfo element

Have you ever heard the aphorism, "Any problem in computer science can be solved with another level of indirection"? Well, as we're about to see, XML DSIG is a prime example of that.

The content of ds:SignedInfo can be divided into two parts: information about the SignatureValue, and information about the application content. We can see this from the following XML Schema fragment:

   <element name="SignedInfo" type="ds:SignedInfoType"/> 
   <complexType name="SignedInfoType">
     <sequence> 
       <element ref="ds:CanonicalizationMethod"/>
       <element ref="ds:SignatureMethod"/> 
       <element ref="ds:Reference" maxOccurs="unbounded"/> 
     </sequence>  
     <attribute name="Id" type="ID" use="optional"/> 
   </complexType>

XML is fairly lax about its syntax. For example, the order of attributes and the way their values are quoted don't really matter. As far as XML processing software is concerned, the following two examples are completely equivalent:

    <a foo='yes' boo="no"/>
    <a boo="no" foo="yes"  ></a>

(Careful readers might try to find the other two differences I added.) But signatures require message digests, and such differences matter a great deal.

In order to work around this, the content must be canonicalized. Canonicalization, or C14N, is the process of picking one path through all the possible output options, so that sender and receiver can generate the exact same byte value, no matter what intermediate XML software might be involved. C14N is a deep subject, worthy of its own article.
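The effect of canonicalization on the two <a> elements can be sketched with Python's xml.etree.ElementTree.canonicalize (Python 3.8+). That function implements Canonical XML 2.0, not the Exclusive C14N algorithm named in this article's examples. For simple input like this, the two produce the same output.

```python
import hashlib
import xml.etree.ElementTree as ET

a = """<a foo='yes' boo="no"/>"""
b = """<a boo="no" foo="yes"  ></a>"""

# The raw bytes differ, so a digest of the raw text differs too.
print(hashlib.sha1(a.encode()).digest() == hashlib.sha1(b.encode()).digest())  # False

# Canonicalization sorts attributes, uses double quotes, and writes
# empty elements as start/end tag pairs.
ca = ET.canonicalize(a)
cb = ET.canonicalize(b)
print(ca)        # <a boo="no" foo="yes"></a>
print(ca == cb)  # True
```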

The ds:SignedInfo/ds:CanonicalizationMethod element specifies how to reconstruct the exact byte stream of the ds:SignedInfo element itself. The ds:SignedInfo/ds:SignatureMethod element specifies the algorithm used to compute the signature value. Examples are HMAC-SHA1 with a shared secret (such as a Kerberos session key) and RSA-SHA1. Taken together, these two elements tell us how to create the digest, and how to protect it from modification.

Here's an example:

    <ds:SignedInfo>
        <ds:CanonicalizationMethod
             Algorithm="http://www.w3.org/2001/10/xml-exc-c14n#"/>
        <ds:SignatureMethod
             Algorithm="http://www.w3.org/2000/09/xmldsig#rsa-sha1"/>
        ...

The ds:Reference element

The ds:SignatureValue element contains a signature that only covers the ds:SignedInfo element: only the content of ds:SignedInfo is included in the signature digest. So how do we actually sign other content? The trick—and the power—is in the ds:Reference elements.

As we can see from the schema definition of ds:SignedInfoType, above, a signature can have multiple references. This allows a single XML DSIG to cover multiple objects—all the parts in a MIME message, an XML file and the XSLT script that converts it to HTML, and so on.

The ds:Reference element refers to other content. It contains a digest of the content, an indication of how that digest was generated (e.g., SHA1), and a specification of how the content should be transformed before the digest is generated. The transformations provide amazing flexibility to XML DSIG. Here is the Schema fragment:

   <element name="Reference" type="ds:ReferenceType"/>
   <complexType name="ReferenceType">
     <sequence> 
       <element ref="ds:Transforms" minOccurs="0"/> 
       <element ref="ds:DigestMethod"/> 
       <element ref="ds:DigestValue"/> 
     </sequence>
     <attribute name="Id" type="ID" use="optional"/> 
     <attribute name="URI" type="anyURI" use="optional"/> 
     <attribute name="Type" type="anyURI" use="optional"/> 
   </complexType>

The Type attribute can provide a processing hint, but isn't generally useful.

The URI points to the actual content being referred to. Since it's a URI, the full power of the Web is available. For example, I can sign the contents of the MSDN home page:

    <ds:Reference URI="https://msdn.microsoft.com">
        <ds:DigestMethod Algorithm="http://www.w3.org/2000/09/xmldsig#sha1"/>
        <ds:DigestValue>HB7i8RaV7ZvuUlaTzZVx0S3POpU=</ds:DigestValue>
    </ds:Reference>

I can also refer to content within the XML document, such as the timestamp shown above:

    <ds:Reference URI="#ts-text">
        <ds:DigestMethod Algorithm="http://www.w3.org/2000/09/xmldsig#sha1"/>
        <ds:DigestValue>pN3j2OeC0+/kCatpvy1dYfG1g68=</ds:DigestValue>
    </ds:Reference>

And, of course, I can have both references inside the same signature.

A same-document fragment URI (such as "#Body") is most commonly used with WS-Security to sign a SOAP message:

    <SOAP:Envelope xmlns:SOAP="http://schemas.xmlsoap.org/soap/envelope/">
        <SOAP:Header>
           <wsse:Security
                   xmlns:wsse="http://schemas.xmlsoap.org/ws/2002/07/secext">
               ...
               <ds:Signature>
                    ...
                    <ds:SignedInfo>
                        <ds:Reference URI='#Body'>
                            ...
                        </ds:Reference>
                        ...
                    </ds:SignedInfo>
                    ...
               </ds:Signature>
               ...
           </wsse:Security>
        </SOAP:Header>
        <SOAP:Body Id='Body'>
            ...
        </SOAP:Body>
    </SOAP:Envelope>

As you probably expect, the ds:DigestMethod specifies the hashing algorithm, and ds:DigestValue is the Base64 value of the hash of the content.
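A ds:DigestValue is just the Base64 encoding of the digest of the (transformed) referenced content. The minimal Python sketch below uses the standard library's C14N 2.0 as a stand-in for the algorithm a real Reference would name. It assumes the referenced content is available as a standalone fragment. A real implementation would also resolve the URI and apply the listed transforms.

```python
import base64
import hashlib
import xml.etree.ElementTree as ET

def digest_value(xml_fragment: str) -> str:
    """Base64(SHA1(canonical form)): the shape of a ds:DigestValue."""
    canonical = ET.canonicalize(xml_fragment)   # C14N 2.0 stands in for exc-c14n
    sha1 = hashlib.sha1(canonical.encode("utf-8")).digest()
    return base64.b64encode(sha1).decode("ascii")

# Two serializations of the same hypothetical SOAP body content.
v1 = digest_value('<Body Id="Body"><qty>2</qty></Body>')
v2 = digest_value("<Body Id='Body' ><qty>2</qty></Body >")
print(v1 == v2, len(v1))  # True 28
```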

The most powerful part of the ds:Reference element is the set of transforms that may appear. The ds:Transforms element is simply a list of ds:Transform elements, each of which specifies a processing step. The schema defines the list of transforms, and gives a defined structure to only one transform parameter, ds:XPath:

   <element name="Transforms" type="ds:TransformsType"/>
   <complexType name="TransformsType">
     <sequence>
       <element ref="ds:Transform" maxOccurs="unbounded"/>  
     </sequence>
   </complexType>

   <element name="Transform" type="ds:TransformType"/>
   <complexType name="TransformType" mixed="true">
     <choice minOccurs="0" maxOccurs="unbounded"> 
       <any namespace="##other" processContents="lax"/>
       <element name="XPath" type="string"/> 
     </choice>
     <attribute name="Algorithm" type="anyURI" use="required"/> 
   </complexType>

The content of a transform depends on its Algorithm attribute. For example, if simple XML is being signed, there will most likely be a single transform that specifies a C14N algorithm:

  <ds:Transforms>
    <ds:Transform Algorithm="http://www.w3.org/2001/10/xml-exc-c14n#"/>
  </ds:Transforms>

XML DSIG defines several transforms. One of them is an XPath transform that makes it easy to sign a portion of a document; for example, you can ignore all the text and sign only the markup:

    <ds:Transforms>
        <ds:Transform Algorithm="http://www.w3.org/TR/1999/REC-xpath-19991116">
            <ds:XPath>not(self::text())</ds:XPath>
        </ds:Transform>
        <ds:Transform
                Algorithm="http://www.w3.org/2001/10/xml-exc-c14n#"/>
    </ds:Transforms>

Other defined transforms include an embedded XSLT stylesheet, decrypting encrypted data, and so on.
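The XPath-then-C14N pair of transforms can be approximated in Python by clearing every text node before canonicalizing and hashing. This is a hypothetical stand-in, not an XPath engine. It shows that, with this transform, text changes leave the digest unchanged while markup changes do not. That is also why a signer has to choose transforms with care: here, changing "No" to "Yes" would go undetected.

```python
import base64
import hashlib
import xml.etree.ElementTree as ET

def markup_only_digest(xml_text: str) -> str:
    """Rough stand-in for not(self::text()) followed by C14N and SHA1:
    drop every text node, canonicalize what is left, then hash it."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        elem.text = None   # text directly inside the element
        elem.tail = None   # text following the element's end tag
    canonical = ET.canonicalize(ET.tostring(root, encoding="unicode"))
    sha1 = hashlib.sha1(canonical.encode("utf-8")).digest()
    return base64.b64encode(sha1).decode("ascii")

original = "<form><name>Rich</name><approved>No</approved></form>"
new_text = "<form><name>Rich</name><approved>Yes</approved></form>"
new_markup = "<form><name>Rich</name><approval>No</approval></form>"

print(markup_only_digest(original) == markup_only_digest(new_text))    # True
print(markup_only_digest(original) == markup_only_digest(new_markup))  # False
```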

The ds:KeyInfo element

At this point, we know how to refer to content, transform and hash it, and create a signature that covers (protects) that content. Recall that content is protected by using indirection: the ds:SignatureValue covers the ds:SignedInfo, which contains ds:References that contain the digest values of the application data. Change any of those things, and the chain of math computations is broken, and the signature won't verify.
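The whole chain can be sketched end to end in Python under three assumptions:

- the signature method is HMAC-SHA1 with a made-up shared key;
- there is one Reference, and its content is passed in directly (no URI resolution);
- the standard library's C14N 2.0 replaces the Exclusive C14N that the SignedInfo names.

The sketch shows both links of the chain. Tampering with the content breaks the DigestValue, and recomputing the DigestValue then breaks the SignatureValue.

```python
import base64
import hashlib
import hmac
import xml.etree.ElementTree as ET

DS_NS = "http://www.w3.org/2000/09/xmldsig#"
KEY = b"hypothetical-shared-secret"   # illustrative HMAC key

def b64_sha1(text: str) -> str:
    """Base64 of the SHA1 digest of a string's UTF-8 bytes."""
    return base64.b64encode(hashlib.sha1(text.encode("utf-8")).digest()).decode("ascii")

def build_signed_info(content: str) -> str:
    """SignedInfo with one Reference carrying the digest of the canonicalized content."""
    digest = b64_sha1(ET.canonicalize(content))
    return (
        f'<ds:SignedInfo xmlns:ds="{DS_NS}">'
        '<ds:CanonicalizationMethod Algorithm="http://www.w3.org/2001/10/xml-exc-c14n#"/>'
        f'<ds:SignatureMethod Algorithm="{DS_NS}hmac-sha1"/>'
        '<ds:Reference URI="#Body">'
        f'<ds:DigestMethod Algorithm="{DS_NS}sha1"/>'
        f"<ds:DigestValue>{digest}</ds:DigestValue>"
        "</ds:Reference>"
        "</ds:SignedInfo>"
    )

def signature_value(signed_info: str) -> str:
    """HMAC-SHA1 over the canonicalized SignedInfo, and nothing else."""
    canonical = ET.canonicalize(signed_info).encode("utf-8")
    return base64.b64encode(hmac.new(KEY, canonical, hashlib.sha1).digest()).decode("ascii")

def verify(content: str, signed_info: str, sig_value: str) -> bool:
    # Link 1: the SignatureValue must match the SignedInfo it covers.
    if not hmac.compare_digest(signature_value(signed_info), sig_value):
        return False
    # Link 2: the DigestValue inside SignedInfo must match the content.
    path = f"{{{DS_NS}}}Reference/{{{DS_NS}}}DigestValue"
    stored = ET.fromstring(signed_info).findtext(path)
    return stored == b64_sha1(ET.canonicalize(content))

body = '<Body Id="Body"><qty>2</qty></Body>'
si = build_signed_info(body)
sv = signature_value(si)

tampered = body.replace("2", "20")
forged_si = build_signed_info(tampered)   # attacker recomputes the digest...

print(verify(body, si, sv))               # True
print(verify(tampered, si, sv))           # False: digest no longer matches
print(verify(tampered, forged_si, sv))    # False: ...so the SignatureValue fails
```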

The only thing left to do is to identify the signer, or at least the key that generated the signature (or, more cryptographically, the key that protects the digest from being modified). This is the job of the ds:KeyInfo element:

   <element name="KeyInfo" type="ds:KeyInfoType"/> 
   <complexType name="KeyInfoType" mixed="true">
     <choice maxOccurs="unbounded">     
       <element ref="ds:KeyName"/> 
       <element ref="ds:KeyValue"/> 
       <element ref="ds:RetrievalMethod"/> 
       <element ref="ds:X509Data"/> 
       <element ref="ds:PGPData"/> 
       <element ref="ds:SPKIData"/>
       <element ref="ds:MgmtData"/>
       <any processContents="lax" namespace="##other"/>
       <!-- (1,1) elements from (0,unbounded) namespaces -->
     </choice>
     <attribute name="Id" type="ID" use="optional"/>
   </complexType>

As we can see, XML DSIG supports a wide variety of key types and key infrastructures, and WS-Security goes further. We'll only look at two: a simple name, and an X.509 certificate. Using ds:KeyName is worthwhile when building a custom application for a closed environment:

    <element name="KeyName" type="string"/>

It's up to the process verifying the signature to map the name into its internal store and fetch the appropriate key. Common values of ds:KeyName include an e-mail address or a directory entry name.
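In a closed environment, a verifier might resolve ds:KeyName with nothing more than a lookup table. In this Python sketch the key store, the names, and the HMAC keys are all hypothetical. The name is stripped of surrounding whitespace before the lookup, because KeyName content is often written across several lines.

```python
import hashlib
import hmac
import xml.etree.ElementTree as ET

DS = "{http://www.w3.org/2000/09/xmldsig#}"

# Hypothetical verifier-side key store: ds:KeyName values mapped to HMAC keys.
KEY_STORE = {
    "rsalz@datapower.com": b"secret-for-rich",
    "travel-office@example.com": b"secret-for-travel",
}

def key_for(key_info_xml: str) -> bytes:
    """Look up the key named by ds:KeyName; raises KeyError for unknown names."""
    key_info = ET.fromstring(key_info_xml.strip())
    name = key_info.findtext(f"{DS}KeyName").strip()
    return KEY_STORE[name]

key_info = """
<ds:KeyInfo xmlns:ds="http://www.w3.org/2000/09/xmldsig#">
    <ds:KeyName>
        rsalz@datapower.com
    </ds:KeyName>
</ds:KeyInfo>
"""

message = b"<trip>approved</trip>"
tag = hmac.new(b"secret-for-rich", message, hashlib.sha1).digest()   # sent by the signer

expected = hmac.new(key_for(key_info), message, hashlib.sha1).digest()
ok = hmac.compare_digest(expected, tag)
print(ok)  # True
```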

X.509 certificates are supported through the ds:X509Data element. This element allows the signer to embed their certificate (in Base64). Alternatively, the signer can identify the certificate in one of several other ways: the subject's name, the issuer's name and serial number, the subject key identifier (SKI), or another format. The signer can also include a current copy of the Certificate Revocation List (CRL), to show that the signer's certificate had not been revoked at the time the document was signed. The Schema fragment below shows the different ways to identify an X.509 certificate:

    <element name="X509Data" type="ds:X509DataType"/> 
    <complexType name="X509DataType">
      <sequence maxOccurs="unbounded">
        <choice>
          <element name="X509IssuerSerial" type="ds:X509IssuerSerialType"/>
          <element name="X509SKI" type="base64Binary"/>
          <element name="X509SubjectName" type="string"/>
          <element name="X509Certificate" type="base64Binary"/>
          <element name="X509CRL" type="base64Binary"/>
          <any namespace="##other" processContents="lax"/>
        </choice>
      </sequence>
    </complexType>

    <complexType name="X509IssuerSerialType"> 
      <sequence> 
        <element name="X509IssuerName" type="string"/> 
        <element name="X509SerialNumber" type="integer"/> 
      </sequence>
    </complexType>

Because different applications will store and retrieve certificates using different schemes, XML digital signatures often include multiple names for the same key by embedding them within the same ds:KeyInfo element. In this example, we provide both a user-friendly name (useful for a GUI application to do a pop-up), and a unique identifier in the form of issuer and serial number (useful for a directory search):

    <ds:KeyInfo>
        <ds:KeyName>
            rsalz@datapower.com
        </ds:KeyName>
        <ds:X509Data>
            <ds:X509SubjectName>
                cn=Rich Salz, o=DataPower, c=US
            </ds:X509SubjectName>
            <ds:X509IssuerSerial>
                <ds:X509IssuerName>
                    ou=Development, o=DataPower, c=US
                </ds:X509IssuerName>
                <ds:X509SerialNumber>32</ds:X509SerialNumber>
            </ds:X509IssuerSerial>
        </ds:X509Data>
    </ds:KeyInfo>
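Reading both identifiers back out of a ds:KeyInfo like this one takes only a few lines of Python with xml.etree.ElementTree. The sketch uses the element names declared in the X509IssuerSerialType schema above (ds:X509IssuerName and ds:X509SerialNumber). It also drops the example's internal whitespace for brevity.

```python
import xml.etree.ElementTree as ET

NS = {"ds": "http://www.w3.org/2000/09/xmldsig#"}

key_info = ET.fromstring("""
<ds:KeyInfo xmlns:ds="http://www.w3.org/2000/09/xmldsig#">
    <ds:KeyName>rsalz@datapower.com</ds:KeyName>
    <ds:X509Data>
        <ds:X509SubjectName>cn=Rich Salz, o=DataPower, c=US</ds:X509SubjectName>
        <ds:X509IssuerSerial>
            <ds:X509IssuerName>ou=Development, o=DataPower, c=US</ds:X509IssuerName>
            <ds:X509SerialNumber>32</ds:X509SerialNumber>
        </ds:X509IssuerSerial>
    </ds:X509Data>
</ds:KeyInfo>
""".strip())

# A GUI might show the friendly name...
display_name = key_info.findtext("ds:KeyName", namespaces=NS)
# ...while a directory search uses the (issuer, serial number) pair.
issuer_serial = "ds:X509Data/ds:X509IssuerSerial/"
issuer = key_info.findtext(issuer_serial + "ds:X509IssuerName", namespaces=NS)
serial = int(key_info.findtext(issuer_serial + "ds:X509SerialNumber", namespaces=NS))

print(display_name)       # rsalz@datapower.com
print((issuer, serial))   # ('ou=Development, o=DataPower, c=US', 32)
```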

Conclusion

We've taken an extensive walk through the XML DSIG specification using the schema definition to describe the features that are available and the processing that is required to generate and verify an XML DSIG document. We started with the basic signature element (ds:SignedInfo), looked at how it incorporates references to application content to protect that content, and finished with looking at part of the ds:KeyInfo element to see how an application can verify a signature, and perhaps validate the signer's identity. These three aspects provide the most basic and low-level components of protecting the integrity of XML (and other) content. Not surprisingly, their flexibility means that using them directly can be quite complicated.

One of the most strategic uses for XML DSIG is certainly going to be with the WS-Security specification, which provides a more application-oriented view of data protection and user authentication. To learn how XML DSIG is used within WS-Security, see Understanding WS-Security.