Controlling White Space with the DOM (Windows Embedded CE 6.0)

1/6/2010

When a text file is opened with the xmlDoc.load or xmlDoc.loadXML method (where xmlDoc is an XML DOM document), the parser strips most white space from the file unless specifically directed otherwise. Instead of storing the white space, the parser sets a flag on each node to record whether one or more spaces, tabs, newlines, or carriage returns follow that node in the text. This approach is efficient: it reduces both the size of each XML file and the number of calculations required to redisplay the XML in a browser. However, because the stripped characters themselves are not stored, an XML document loaded this way can lose formatting information in its content. Tabs, in particular, can be lost because in the default mode they are treated as nothing more than generic white space.

XSLT uses the XML DOM, not the source document, to guide the transformation. Because the white space has already been stripped to process the XML into the DOM, white space characters are lost even before the transformation takes place. Most of the XSLT-related methods for specifying white space in the source data document or style sheets are applied too late to make a difference in formatting.

The preserveWhiteSpace property tells the XML parser whether to keep the white space from the source file when it builds the XML DOM. The default is FALSE. If you set the property explicitly, you must set it before loading a file; otherwise, the parser strips white space characters and reduces the file to the smallest possible stream. When preserveWhiteSpace is TRUE, the XML document retains all of the characters in the file when it is converted into a DOM; when it is FALSE, the white space characters are stripped from the file.

If you set the preserveWhiteSpace property from TRUE to FALSE and then back to TRUE for a given DOM document, the spaces will not reappear. Setting the property to FALSE actually removes the white space from the DOM, which cannot reconstruct it.

If you are working with XML as a data format streamed to some other process, leave preserveWhiteSpace at its default of FALSE, or set it to FALSE explicitly. If retaining positional information is important, for example, in conversions to non-XML formats like tab-separated data, set preserveWhiteSpace to TRUE. Be aware that this option increases the number of characters stored and places more demands on the browser.

To demonstrate how white space can be programmatically controlled using the DOM, see the following three documents:

  • striporpres.xml
    The XML source document, which includes elements that contain different kinds of white space, including tabs, spaces, and newlines.
  • striporpres.xsl
    An XSLT style sheet that makes invisible white space visible for demonstration purposes.
  • striporpres.htm
    An HTML file that contains Microsoft JScript® that loads the striporpres.xml and striporpres.xsl documents into separate DOM document objects, and then alternately sets the preserveWhiteSpace property of the DOMDocument object to TRUE or FALSE. The JScript also applies the striporpres.xsl style sheet to the striporpres.xml document using the transformNode method and assigns the resulting string to the innerHTML property, which is used to display the results in Internet Explorer.

striporpres.xml

The following shows the code for the striporpres.xml document. In this document, each of the <whitespace> elements includes different kinds of white space, including tabs, spaces, and newlines.

<whitespaceTest>
    <whitespace>Tabs[		]</whitespace>
    <whitespace>Spaces[  ]</whitespace>
    <whitespace>Newlines[

	]</whitespace>
</whitespaceTest> 

The striporpres.xml document contains no <?xml-stylesheet?> processing instruction. Instead, a style sheet is applied programmatically to the document using the transformNode method. Unlike a "pure XSLT" solution, this technique allows you to set and reset the preserveWhiteSpace property.

striporpres.xsl style sheet

The striporpres.xsl style sheet consists of two template rules. The first rule applies to the source document root node and instantiates an HTML <pre> element node in the result tree. The results of transforming the <whitespace> elements, which the second template rule handles, are placed in this node. As in the <xsl:preserve-space> and <xsl:strip-space> Example, the XML Path Language (XPath) translate() function is used to make the invisible white space visible.

<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <pre><xsl:apply-templates /></pre>
  </xsl:template>
  <xsl:template match="whitespace">
    <!-- Use translate() XPath function to convert character X to
      character Y. -->
    <xsl:value-of select="translate(.,' &#10;&#13;&#9;','-NRT')"/>
  </xsl:template>
</xsl:stylesheet>
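
The character mapping performed by translate() in the second template rule can be illustrated outside XSLT. The following is a minimal sketch that emulates the XPath 1.0 translate() function in plain script; the translate function and the WHITESPACE and MARKERS variables are written for this illustration and are not part of MSXML.

```javascript
// Emulates XPath 1.0 translate(): each character of `from` that occurs in
// `text` is replaced by the character at the same position in `to`.
// Characters in `from` that have no counterpart in `to` are removed.
function translate(text, from, to) {
  var result = '';
  for (var i = 0; i < text.length; i++) {
    var ch = text.charAt(i);
    var pos = from.indexOf(ch);
    if (pos === -1) {
      result += ch;              // not listed in `from`: copy unchanged
    } else if (pos < to.length) {
      result += to.charAt(pos);  // listed: substitute the matching character
    }                            // listed without a counterpart: drop it
  }
  return result;
}

// Same mapping as the style sheet: space, newline (&#10;),
// carriage return (&#13;), and tab (&#9;) become -, N, R, and T.
var WHITESPACE = ' \n\r\t';
var MARKERS = '-NRT';

console.log(translate('Tabs[\t\t]', WHITESPACE, MARKERS));       // Tabs[TT]
console.log(translate('Spaces[  ]', WHITESPACE, MARKERS));       // Spaces[--]
console.log(translate('Newlines[\n\n\t]', WHITESPACE, MARKERS)); // Newlines[NNT]
```

Because every white space character is replaced by a visible marker, the output shows exactly which characters survived parsing.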

striporpres.htm

The striporpres.htm file contains a JScript function, preserveStripPreserve, that is called when the file is loaded. The preserveStripPreserve function first sets the preserveWhiteSpace property, and then loads striporpres.xml into the objSrcTree DOMDocument object.

<html>
<head>
  <title>Demo: Controlling White Space Output via the DOM</title>
    <script language='JScript'>
      <!--
        function preserveStripPreserve() 
        {
          // Associate the result tree object with any element(s) whose
          // id attribute is "testResults."
          var objResTree = document.all['testResults'];
          // Declare two new Msxml DOMDocument objects.
          var objSrcTree = new ActiveXObject('Msxml2.DOMDocument');
          var objXSLT = new ActiveXObject('Msxml2.DOMDocument');

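          // Load synchronously so that each document is fully parsed
          // before transformNode is called.
          objSrcTree.async = false;
          objXSLT.async = false;

          // Load the two DOMDocuments with the XML document and the
          // XSLT style sheet. preserveWhiteSpace must be set before load.
          objSrcTree.preserveWhiteSpace = true;
          objSrcTree.load('stripOrPres.xml');
          objXSLT.load('stripOrPres.xsl');
          // Use the transformNode method to apply the XSLT to the XML.
          var strResult = objSrcTree.transformNode(objXSLT);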

          // Now reset preserveWhiteSpace to false, and re-load the source...
          objSrcTree.preserveWhiteSpace = false;
          objSrcTree.load('stripOrPres.xml');
          // ...and rerun the transform. Note the concatenation of
          // this transformation's result tree to the one created when
          // preserveWhiteSpace was true.
          strResult = strResult + objSrcTree.transformNode(objXSLT);
          // Assign the resulting string to the result tree object's
          // innerHTML property.
          objResTree.innerHTML = strResult;

          return true;
        }
      -->
    </script>
</head>
<body onload='preserveStripPreserve()'>
  <div id='testResults'></div>
</body>
</html>

Results

In Internet Explorer, the striporpres.htm file appears as follows.

This block shows the results from setting the preserveWhiteSpace property to TRUE.

    Tabs[TT]

    Spaces[--]

    Newlines[NNT]

This block shows the results from setting the preserveWhiteSpace property to FALSE.

Tabs[TT]Spaces[--]Newlines[NNT]

In the first block, the contents of each of the original <whitespace> elements appear on a separate line. In the second block, where the preserveWhiteSpace property is set to FALSE, all contents of these elements appear on a single line. In addition, the first block is indented, while the second block is not. The indents and line breaks in the first block are a result of newline and tab characters in the source document between the boundaries of the <whitespace> elements. This white space is affected by the preserveWhiteSpace property.
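
The difference between the two blocks can be reproduced with a small model. The sketch below is a toy illustration, not how MSXML is implemented: it represents the children of <whitespaceTest> as a list of nodes, drops white-space-only text nodes when the flag is false, and then makes white space visible inside the <whitespace> elements only. The names sourceChildren, loadChildren, makeVisible, and transform are hypothetical, and the model assumes each <whitespace> element is indented by a single tab.

```javascript
// Toy model for illustration; it is not how MSXML is implemented.

// Children of <whitespaceTest>, with the invisible characters written as
// escapes. Assumption: each <whitespace> element is indented by one tab.
var sourceChildren = [
  { type: 'text',    value: '\n\t' },
  { type: 'element', value: 'Tabs[\t\t]' },
  { type: 'text',    value: '\n\t' },
  { type: 'element', value: 'Spaces[  ]' },
  { type: 'text',    value: '\n\t' },
  { type: 'element', value: 'Newlines[\n\n\t]' },
  { type: 'text',    value: '\n' }
];

// Models loading: when preserveWhiteSpace is false, text nodes that contain
// only white space are dropped. Everything else is kept unchanged.
function loadChildren(children, preserveWhiteSpace) {
  var kept = [];
  for (var i = 0; i < children.length; i++) {
    var node = children[i];
    var whitespaceOnly =
      node.type === 'text' && /^[ \t\r\n]*$/.test(node.value);
    if (preserveWhiteSpace || !whitespaceOnly) {
      kept.push(node);
    }
  }
  return kept;
}

// Models the translate() call in the style sheet.
function makeVisible(text) {
  return text.replace(/ /g, '-').replace(/\n/g, 'N')
             .replace(/\r/g, 'R').replace(/\t/g, 'T');
}

// Models the style sheet: <whitespace> content has its white space made
// visible, and text between the elements is copied through as is.
function transform(children) {
  var out = '';
  for (var i = 0; i < children.length; i++) {
    var node = children[i];
    out += node.type === 'element' ? makeVisible(node.value) : node.value;
  }
  return out;
}

var preserved = transform(loadChildren(sourceChildren, true));
var stripped = transform(loadChildren(sourceChildren, false));

// JSON.stringify shows the remaining newlines and tabs as escapes.
console.log(JSON.stringify(preserved));
console.log(JSON.stringify(stripped));
```

In this model, the preserved result still contains the newline and tab characters between the elements, while the stripped result is the single run Tabs[TT]Spaces[--]Newlines[NNT]. The content inside each element is identical in both.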

The white space within the <whitespace> elements is not affected by the value of the preserveWhiteSpace property. To strip leading and trailing white space from an element's text and collapse each internal run of white space to a single space, use the XML Path Language (XPath) normalize-space() Function.
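
As a rough illustration of what normalize-space() does to a string, the following sketch emulates it in plain script. The normalizeSpace function is written for this illustration and is not an MSXML API; it treats space, tab, carriage return, and newline as white space, as XPath does.

```javascript
// Emulates XPath normalize-space(): collapse every run of white space
// (space, tab, carriage return, newline) to one space, then remove the
// single leading and trailing space that may remain.
function normalizeSpace(text) {
  return text.replace(/[ \t\r\n]+/g, ' ').replace(/^ | $/g, '');
}

console.log(normalizeSpace('  Newlines[\n\n\t]  ')); // Newlines[ ]
console.log(normalizeSpace('Spaces[  ]'));           // Spaces[ ]
console.log(JSON.stringify(normalizeSpace('\n\t'))); // ""
```

A string that contains only white space normalizes to the empty string, which is why the function is useful for discarding white-space-only text.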

Note

Both of the "Newlines" blocks in the result tree contain a newline-newline-tab sequence, NNT, even though the corresponding <whitespace> element in the XML source document appeared to include only a pair of newlines. The extra tab is in the source document between the second newline and the ] character.

See Also

Reference

DOMDocument

Concepts

Controlling White Space with XSLT
Using the Normalize-space() Function
transformNode Method
preserveWhiteSpace Property