Real-World XML
Manipulate XML Data Easily with Integrated Readers and Writers in the .NET Framework
Dino Esposito
This article assumes you're familiar with XML and the .NET Framework.
SUMMARY
In the .NET Framework, XmlTextReader and XmlTextWriter provide for XML-driven reading and writing operations. In this article, the author discusses the architecture of readers and how they relate to XMLDOM and SAX parsers. He also shows how to use readers to parse and validate XML documents, how to leverage writers to create well-formed documents, and how to optimize the processing of large XML documents using functions to read and write Base64 and BinHex-encoded text. He then reviews how to implement a stream-based read/write parser that combines the functions of a reader and a writer into a single class.

About three years ago, I left a software conference believing that no future programming would be possible without a strong understanding of XML. XML has indeed come a long way since the early days, finding its way into even the deepest recesses of common programming frameworks. In this article, I'll review the role and internal characteristics of the Microsoft® .NET Framework API that deals with XML documents and then I'll move on to address a few open points.
From MSXML to XML in .NET
Prior to the advent of the .NET Framework, you used to write XML-driven applications for Windows® by exploiting the services of MSXML—a COM-based library. Unlike the classes in the .NET Framework, the MSXML library is more of a bolted-on API than a piece of code that is fully integrated with the underlying operating system. MSXML can certainly communicate with the rest of your application, but it doesn't really integrate with the surrounding environment.
The MSXML library can be imported in Win32® and even common language runtime (CLR)-targeted code, but it remains an external black box acting as a server component. .NET Framework-based applications, on the other hand, can use XML core classes right alongside the other namespaces of the .NET Framework so that the final code is well integrated and easy to read.
As a self-contained component, the MSXML parser provides advanced features such as asynchronous parsing. This feature is apparently lacking in the XML classes of the .NET Framework. By integrating XML classes with other classes in the .NET Framework, however, you can easily obtain the same functions and gain even more control over the process.
A library of XML functions should provide at least a set of basic services including parsing, querying, and transformations. In the .NET Framework, you find classes that support XPath queries and XSLT transformations, as well as classes to read and write XML documents. In addition, the .NET Framework incorporates classes that perform XML-related tasks such as object serialization (the XmlSerializer and the SoapFormatter classes), application configuration (the AppSettingsReader class), and data persistence (the DataSet class). In this article, I'll focus on the classes that accomplish basic I/O operations.
The XML Parsing Model
Since XML is a markup language, a tool capable of analyzing and understanding the lexical syntax is needed to effectively use the information stored in the documents. This tool is the XML parser—a sort of black box component that reads in markup text and returns platform-specific objects.
All the XML parsers available to programmers fall in one of two main categories, regardless of the underlying platform: tree-based and event-based processors. These two categories are commonly identified with their two most popular and concrete implementations, the Microsoft XML Document Object Model (XMLDOM) and Simple API for XML (SAX). The XMLDOM parser is a generic tree-based API which renders XML documents as an in-memory structure. The SAX parser provides an event-based API for processing each significant element in a stream of XML data. In general, a Document Object Model (DOM) implementation can be loaded from a SAX stream, so the two types of processors are not mutually exclusive.
Conceptually speaking, a SAX parser is diametrically opposed to an XMLDOM parser and the gap between the two models is indeed fairly large. XMLDOM is well defined in its set of functionalities and there is not much more you can reasonably expect from the evolution of this model. With this large set of functionalities, of course, comes a downside—a large memory footprint and hefty bandwidth that is required in order to process large documents.
SAX parsers work by letting client applications pass living instances of platform-specific objects to handle parser events. The parser controls the whole process and pushes data to the application which, in turn, is free to accept or just ignore it. This model is extremely lean and features a very limited memory footprint.
The .NET Framework provides full support for the XMLDOM parsing model, but not for SAX. There's a good reason for this. The .NET Framework supports two different models of parser: XMLDOM parsers and XML readers. The apparent lack of support for SAX parsers does not mean that you have to renounce the functionality that they offer. All the functions of a SAX parser can be implemented easily and more effectively by using an XML reader. Unlike a SAX parser, a .NET Framework reader works under the total control of the client application. In this way, the application itself can then pull out only the data it really needs and skip over the remainder of the XML stream. With SAX, the parser passes all available information to the client application, which will then have to either use or discard the information.
Readers are based on .NET Framework streams and work in much the same way as a database cursor. Interestingly, the classes that implement this cursor-like parsing model also provide the substrate for the .NET Framework implementation of XMLDOM parser. Two abstract classes, XmlReader and XmlWriter, are at the very foundation of all the .NET Framework XML classes, including XMLDOM classes, ADO.NET-related classes, and configuration classes. So in the .NET Framework you have two possible approaches when it comes to processing XML data. You can either use classes built directly on XmlReader and XmlWriter, or classes that expose information through the well-known XMLDOM object model. A more general-purpose introduction to readers of documents in the .NET Framework is available in my August 2002 Cutting Edge column.
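The pull model described above can be sketched in a few lines. This is a minimal illustration, not code from the article's figures: the class name PullModelSketch and the method FirstAttributeValue are hypothetical, and the XML is supplied as a string purely for convenience. The point it tries to show is that the caller, not the parser, decides when to stop.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical helper, for illustration only.
public static class PullModelSketch
{
    // Returns the value of attributeName on the first elementName element,
    // or null if no such element (or attribute) is found.
    public static string FirstAttributeValue(
        string xml, string elementName, string attributeName)
    {
        XmlTextReader reader = new XmlTextReader(new StringReader(xml));
        try
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element &&
                    reader.Name == elementName)
                {
                    // The application stops pulling here; the remaining
                    // markup is never turned into nodes.
                    return reader.GetAttribute(attributeName);
                }
            }
            return null;
        }
        finally
        {
            reader.Close();
        }
    }
}
```

Calling PullModelSketch.FirstAttributeValue(xml, "mag", "name") on a document with several mag elements returns after the first match. With a push-style parser, the same application would keep receiving events for the rest of the document and would have to ignore them.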
The XmlReader Class
An XML reader supplies a programming interface that callers use to connect to XML documents and pull out the data they need. If you look at readers more closely, you realize that under the hood they work just like applications that fetch data out of a database. The database server returns a reference to a cursor object, which contains all the query results and makes them available on demand. The clients of an XML reader receive a reference to an instance of the reader class, which abstracts the underlying data stream and renders it as an XML tree. Methods on the reader class allow you to scroll forward through the contents, moving from node to node rather than from byte to byte or from record to record.
When viewed from the perspective of readers, an XML document is not a tagged text file but a serialized collection of nodes. Such a cursor model is specific to the .NET Framework; you will not find a similar programming API available elsewhere.
There are several key differences between readers and XMLDOM parsers. XML readers are forward-only, they have no notion of the surrounding nodes (siblings, parent, ancestors, children), and they are limited to reading. In the .NET Framework, reading and writing XML documents are two completely separate functions that require different, unrelated classes: XmlReader and XmlWriter. To be able to edit the contents of an XML document, you either use the XMLDOM parser (built around the XmlDocument class) or you design a custom class that aggregates two distinct entities like the reader and the writer under a common logical roof. Let's start by analyzing the programming features of the reader classes.
XmlReader is an abstract class that you can use to build more refined functionality. User applications are usually based on one of three derived classes: XmlTextReader, XmlValidatingReader, or XmlNodeReader. All these classes share a common set of properties (see Figure 1) and methods (see Figure 2). Note that sometimes the value that the properties actually contain depends on the reader class you are using in your code. Hence, the description of each property provided in Figure 1 refers to its intended goal, even though this may not reflect the role of the property in a derived reader class. For example, CanResolveEntity returns True only for XmlValidatingReader; it is set to False for any other type of reader class. Likewise, the behavior and the return value of some of the methods listed in Figure 2 are influenced by the type of the node the reader is working on. For example, all methods that involve attributes are void if the node is not an element node.

Figure 2 Methods of XmlReader-derived Classes
| Method | Description |
|---|---|
| Close | Closes the reader and sets the internal state to Closed |
| GetAttribute | Gets the value of the specified attribute; an attribute can be accessed by index, local, or qualified name |
| IsStartElement | Indicates whether the current content node is a start tag |
| LookupNamespace | Returns the namespace URI to which the given prefix maps |
| MoveToAttribute | Moves the pointer to the specified attribute; an attribute can be accessed by index, local, or qualified name |
| MoveToContent | Moves the pointer ahead to the next content node or end of file; the method returns immediately if the current node is already a content node such as non-white space text, CDATA, Element, EndElement, EntityReference, or EndEntity |
| MoveToElement | Moves the pointer back to the element node that contains the current attribute node; relevant only when the current node is an attribute |
| MoveToFirstAttribute | Moves to the first attribute of the current Element node |
| MoveToNextAttribute | Moves to the next attribute of the current Element node |
| Read | Reads the next node and advances the pointer |
| ReadAttributeValue | Parses the attribute value into one or more Text or EntityReference nodes |
| ReadElementString | Reads and returns the text from a text-only element |
| ReadEndElement | Checks that the current content node is an end tag and advances the reader to the next node; throws an exception if the node is not an end tag |
| ReadInnerXml | Reads and returns all the content below the current node, including markup information |
| ReadOuterXml | Reads and returns all the content of the current node, including markup information |
| ReadStartElement | Checks that the current node is an element and advances the reader to the next node; throws an exception if the node is not a start tag |
| ReadString | Reads the contents of an element or text node as a string; the method concatenates all the text until the next markup; for attribute nodes, it is equivalent to reading the attribute value |
| ResolveEntity | Expands and resolves the current entity reference node |
| Skip | Skips the children of the current node |

Figure 1 Properties of XmlReader-derived Classes
| Property | Description |
|---|---|
| AttributeCount | Gets the number of attributes on the current node |
| BaseURI | Gets the base URI of the current node |
| CanResolveEntity | Gets a value indicating whether the reader can resolve entities |
| Depth | Gets the depth of the current node in the XML document |
| EOF | Indicates whether the reader has reached the end of the stream |
| HasAttributes | Indicates whether the current node has any attributes |
| HasValue | Indicates whether the current node can have a value |
| IsDefault | Indicates whether the current node is an attribute that originated from the default value defined in the DTD or schema |
| IsEmptyElement | Indicates whether the current node is an empty element, that is, one written with a self-closing tag and no content (it may still carry attributes) |
| Item | Indexer property which returns the value of the specified attribute |
| LocalName | Gets the name of the current node with any prefix removed |
| Name | Gets the fully qualified name of the current node |
| NamespaceURI | Gets the namespace URI of the current node; relevant to Element and Attribute nodes only |
| NameTable | Gets the name table object associated with the reader |
| NodeType | Gets the type of the current node |
| Prefix | Gets the namespace prefix associated with the current node |
| QuoteChar | Gets the quotation mark character used to enclose the value of an attribute |
| ReadState | Gets the state of the reader from the ReadState enumeration |
| Value | Gets the text value of the current node |
| XmlLang | Gets the xml:lang scope within which the current node resides |
| XmlSpace | Gets the current xml:space scope from the XmlSpace enumeration (Default, None, or Preserve) |
The XmlTextReader class is designed to provide fast access to streams of XML data in a forward-only, read-only manner. The reader verifies that the submitted XML is well formed and throws an exception if it is not. The reader also performs a quick check to ensure that the referenced Document Type Definition (DTD), if any, exists. There's no case in which the XmlTextReader class validates the document contents against a schema or a DTD. The XmlTextReader parser is ideal for the quick processing of well-formed XML data accessible through file names, URLs, or open streams. If you need data validation, then you should use the XmlValidatingReader class.
An instance of the XmlTextReader class can be created in a number of ways and from a variety of sources including disk files, URLs, streams, and text readers:
XmlTextReader reader = new XmlTextReader(file);
Note that all the public constructors available require you to indicate the source of the data, be it a stream, a file, or whatever. The default constructor of the XmlTextReader class is marked as protected and, as such, is not used directly. As with all .NET Framework reader classes, once the reader object is up and running, you have to use the Read method to access the data. Readers move from their initial state to the first element only using the Read method. To move from any node to the next, you can continue using Read as well as a number of other more specialized methods including Skip, MoveToContent, and ReadInnerXml. To process the entire contents of an XML source, you typically set up a loop controlled by the Read method's return value: True as long as a node was successfully read and False when there is nothing left to read.
Figure 3 shows a simple function that outputs the node layout of a given XML document. The function opens the document and sets up a loop to navigate through all the contents. Each time the Read method is called, the reader's internal pointer is moved one node forward. Although you will most likely be using the Read method to process element nodes, consider that in general when you move from one node to the next you are not necessarily moving between nodes of the same type. For example, the method does not move through attribute nodes. The reader's MoveToContent method lets you skip all the heading nodes and position the pointer directly on the first content node. Thus, the method skips over nodes of type ProcessingInstruction, DocumentType, Comment, Whitespace, and SignificantWhitespace.
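As a rough sketch of how MoveToContent behaves, the following helper positions a reader on the first content node of a document that starts with a declaration and a comment. The class and method names are hypothetical and the XML is passed as a string only to keep the example self-contained.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical helper, for illustration only.
public static class MoveToContentSketch
{
    // Skips the heading nodes (declaration, comments, whitespace) and
    // returns the name of the first content node if it is an element,
    // or null otherwise.
    public static string FirstContentElementName(string xml)
    {
        XmlTextReader reader = new XmlTextReader(new StringReader(xml));
        try
        {
            // MoveToContent returns the type of the node it stopped on.
            XmlNodeType type = reader.MoveToContent();
            return type == XmlNodeType.Element ? reader.Name : null;
        }
        finally
        {
            reader.Close();
        }
    }
}
```

For a document whose prolog is followed by a mags root element, the helper returns "mags" without the caller having to test and discard each heading node in a Read loop.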

Figure 3 Outputting an XML Document Node Layout
// Requires: using System.IO; using System.Xml;
string GetXmlFileNodeLayout(string file)
{
    // Open the stream
    XmlTextReader reader = new XmlTextReader(file);

    // Loop through the nodes and accumulate text into a string
    StringWriter writer = new StringWriter();
    string tabPrefix = "";
    while (reader.Read())
    {
        // Write the start tag
        if (reader.NodeType == XmlNodeType.Element)
        {
            tabPrefix = new string('\t', reader.Depth);
            writer.WriteLine("{0}<{1}>", tabPrefix, reader.Name);
        }
        // Write the end tag
        else if (reader.NodeType == XmlNodeType.EndElement)
        {
            tabPrefix = new string('\t', reader.Depth);
            writer.WriteLine("{0}</{1}>", tabPrefix, reader.Name);
        }
    }

    // Collect the accumulated text
    string buf = writer.ToString();
    writer.Close();

    // Close the stream
    reader.Close();
    return buf;
}
Each node is given a type picked up from the NodeType enumeration. In the code shown in Figure 3, only two types of nodes are relevant: Element and EndElement. The output of the code duplicates the structure of the original document but discards attributes and text, preserving only the node layout. Let's assume I use the following XML fragment:
<mags>
    <mag name="MSDN Magazine">
        MSDN Magazine
    </mag>
    <mag name="MSDN Voices">
        MSDN Voices
    </mag>
</mags>
The final output looks like the following:
<mags>
    <mag>
    </mag>
    <mag>
    </mag>
</mags>
The indentation of child nodes is obtained using the reader's Depth property, which returns an integer value denoting the level of nesting of the current node. All the text is accumulated in a StringWriter object—an extremely handy stream-based wrapper for the StringBuilder class.
As I mentioned earlier, attribute nodes are not automatically visited by a reader that moves forward with the Read method. To visit the set of attributes of the current element node you must use a similar loop, but one that is controlled by the MoveToNextAttribute method. The following code accesses all the attributes of the current node (the one you selected with Read) and concatenates their names and values into a comma-separated string:
if (reader.HasAttributes)
    while (reader.MoveToNextAttribute())
        buf += reader.Name + "=\"" + reader.Value + "\",";
reader.MoveToElement();
Once you're finished with node attributes, consider calling MoveToElement on the reader object. MoveToElement will "move" the internal pointer back to the element node that contains the attributes. To be precise, the method does not really "move" the pointer because the pointer never moved away from that element node during the attribute navigation. The MoveToElement method simply refreshes some internal members so that they expose values from the element node rather than from the last attribute that was read. For example, before you call MoveToElement the Name property returns the name of the last attribute that was read; afterward, it returns the name of the element that contains it. If you have no more work to do on the node and want to proceed to the next element once you're finished with the attributes, you don't really need to call MoveToElement.
Parsing the Contents of Attributes
In most cases, the content of an attribute is a simple string of text. However, it does not mean that string is the actual type of an attribute value. Sometimes the attribute value consists of the string representation of a more specific type, such as a Date or a Boolean, that you then convert into the native type using the methods of the static classes, XmlConvert or System.Convert. The two classes perform nearly identical tasks, but the XmlConvert class works according to the XML Schema Definition (XSD) data type specification and ignores the current locale.
Suppose that you have an XML fragment like the following:
<person birthday="2-8-2001" />
Let's also assume that, according to the current locale, the birthday attribute is February 8, 2001. If you convert the string into a specific .NET Framework type (the DateTime type) using the System.Convert class, everything will work as expected and the string will be transformed into the expected date object. By contrast, if you convert the string using XmlConvert, you'll get a parse error because the XmlConvert class does not recognize a correct date in the string. The reason is that in XML a date must have the YYYY-MM-DD format to be understood. The XmlConvert class works as a translator between CLR types and XSD types. When the conversion takes place, the result is locale independent.
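The difference between the two converters can be sketched as follows. The class and method names are hypothetical, and the culture name "en-US" in the usage note is an assumption chosen to match the February 8 reading. The XmlConvert overload that takes an XmlDateTimeSerializationMode comes from later versions of the Framework than the one this article targets; it is used here because the single-argument overload was subsequently marked obsolete.

```csharp
using System;
using System.Globalization;
using System.Xml;

// Hypothetical helper, for illustration only.
public static class DateConversionSketch
{
    // Locale-aware parsing: the given culture decides how the digits
    // are split into month and day.
    public static DateTime ParseWithCulture(string text, string cultureName)
    {
        return Convert.ToDateTime(text, new CultureInfo(cultureName));
    }

    // XSD-style parsing: returns false when the text is not written in
    // an XML Schema date format such as YYYY-MM-DD.
    public static bool TryParseXsd(string text, out DateTime value)
    {
        try
        {
            value = XmlConvert.ToDateTime(
                text, XmlDateTimeSerializationMode.Unspecified);
            return true;
        }
        catch (FormatException)
        {
            value = DateTime.MinValue;
            return false;
        }
    }
}
```

Under these assumptions, ParseWithCulture("2-8-2001", "en-US") yields February 8, 2001, whereas TryParseXsd rejects the same string and accepts "2001-02-08" regardless of the machine's regional settings.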
In some situations, the value of the attribute comprises plain text along with entities. Of all the reader classes, only XmlValidatingReader is actually capable of resolving entities. The XmlTextReader class, although unable to resolve entity references, can separate text from entities when both are embedded in an attribute's value. For this to happen, you must parse the attribute's content using the ReadAttributeValue method instead of simply reading it through the Value property.
The ReadAttributeValue method parses the attribute value and isolates each constituent token, be it plain text or an entity. You call ReadAttributeValue repeatedly in a loop until the end of the attribute string value is reached. Since XmlTextReader does not resolve entities, there is not much you can do with the embedded entity other than writing your own resolver or maybe recognizing and skipping it. The following code snippet shows how you can call a custom resolver:
while (reader.ReadAttributeValue())
{
    if (reader.NodeType == XmlNodeType.EntityReference)
        // Resolve the "reader.Name" reference and add
        // the result to a buffer
        buf += YourResolverCode(reader.Name);
    else
        // Just append the value to the buffer
        buf += reader.Value;
}
The final value of a mixed content attribute is determined by accumulating the text in a global buffer. When the attribute value has been fully parsed, the ReadAttributeValue method returns False.
Manipulating XML Text
When working with XML markup text, some features can quickly become issues if not properly tackled. One such stumbling point is character conversion, which is sometimes necessary in order to transmit non-XML text in an XML stream of data. Not all the characters you may find available on a given platform are necessarily valid XML characters. Only the characters in the XML specification (www.w3.org/TR/2000/REC-xml-20001006.html) can be safely used for element and attribute names.
The XmlConvert class provides key functionality for tunneling non-XML names through XML between servers. When names contain invalid XML characters, the methods EncodeName and DecodeName can adjust them to fit into an XML name schema. Several applications, including SQL Server™ and Microsoft Office, allow and support Unicode characters in their documents. However, some of these characters are not valid as XML names. The typical circumstance, which demonstrates the importance of XmlConvert, occurs when you manipulate database column names containing blanks. Although SQL Server allows for a column name like Invoice Details, this would not be a valid name for an XML stream. The space must be replaced with its hexadecimal encoding, resulting in Invoice_x0020_Details. The only valid form for hexadecimal sequences is _xHHHH_, where HHHH stands for a four-digit hexadecimal value. Similar but different formats are left unaltered, though they can be considered logically equivalent. Here's how you can programmatically obtain that string:
XmlConvert.EncodeName("Invoice Details");
The reverse operation is accomplished by DecodeName. This method translates an XML name back to its original form by unescaping any escaped sequence. Notice that only fully escaped forms are detected. For example, only _x0020_ is rendered as a blank, while _x20_ would not be processed:
XmlConvert.DecodeName("Invoice_x0020_Details");
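A quick way to check what the two methods actually produce is to run a name through both and look at the result. The following sketch does just that; the class and method names are hypothetical, and only XmlConvert.EncodeName and XmlConvert.DecodeName are calls into the Framework.

```csharp
using System;
using System.Xml;

// Hypothetical helper, for illustration only.
public static class NameEncodingSketch
{
    // Encodes a name, decodes it again, and reports both forms.
    public static string RoundTrip(string name)
    {
        string encoded = XmlConvert.EncodeName(name);
        string decoded = XmlConvert.DecodeName(encoded);
        return encoded + " -> " + decoded;
    }
}
```

Passing a column name that contains a blank shows the escaped form next to the restored original. Passing an already valid name, such as InvoiceDetails, should return it unchanged on both sides, since there is nothing to escape.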
In XML, whitespace found in the body of an XML document can either be significant or insignificant. Whitespace is said to be significant when it appears in the text of an element node or when it appears to be within the scope of a whitespace declaration:
<MyNode xml:space="preserve">
<!-- any space here must be preserved -->
•••
</MyNode>
In XML, the blanket term "whitespace" encompasses not just blanks (ASCII 0x20), but also carriage returns (ASCII 0x0D), line feeds (ASCII 0x0A), and tabs (ASCII 0x09).
The XmlTextReader class lets you control how whitespace is handled through the WhitespaceHandling property. This property accepts and returns a value taken from the WhitespaceHandling enumeration, which lists three feasible options. The default option is All and means that both significant and insignificant spaces are returned as distinct nodes: SignificantWhitespace and Whitespace, respectively. The option None indicates that no whitespace at all will be returned as a node. Finally, the option Significant discards all insignificant whitespace and only returns nodes of the type SignificantWhitespace. Note that the WhitespaceHandling property is one of the few reader properties that can be changed at any time and takes effect on the very next Read operation. The other "sensitive" properties are Normalization and XmlResolver.
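The effect of each option is easy to observe by counting the whitespace nodes a reader reports. The sketch below is illustrative only: the class and method names are hypothetical, and the property and enumeration are written as WhitespaceHandling, which is how they are spelled in System.Xml.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical helper, for illustration only.
public static class WhitespaceSketch
{
    // Counts the Whitespace and SignificantWhitespace nodes that the
    // reader returns under the given handling option.
    public static int CountWhitespaceNodes(
        string xml, WhitespaceHandling handling)
    {
        XmlTextReader reader = new XmlTextReader(new StringReader(xml));
        reader.WhitespaceHandling = handling;
        int count = 0;
        try
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Whitespace ||
                    reader.NodeType == XmlNodeType.SignificantWhitespace)
                {
                    count++;
                }
            }
        }
        finally
        {
            reader.Close();
        }
        return count;
    }
}
```

For a small document whose child element sits on its own indented line, the All option reports the line breaks around the child as separate nodes, while None reports no whitespace nodes at all.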
Strings and Fragments
Programmers who cut their teeth on MSXML will certainly notice a key difference between the COM library and the .NET Framework XML API. The .NET Framework classes don't provide a native way to parse XML data stored in a string. Unlike the MSXML parser object, the XmlTextReader class doesn't provide any kind of loadXML method to build a reader from a well-formed string. This lack of a loadXML-style method is moot. There's no need for one because you can get the same functionality using a particular text reader—the StringReader class.
One of the XmlTextReader constructors accepts a TextReader-derived object and instantiates the XML reader from the contents of the text reader. A text reader class is a stream which has been optimized for character input. The StringReader class inherits from TextReader and uses an in-memory string as the input stream. The following code snippet shows how to initialize an XML reader with a well-formed XML string:
string xmlText = "...";
StringReader strReader = new StringReader(xmlText);
XmlTextReader reader = new XmlTextReader(strReader);
Likewise, using the StringWriter class in lieu of a TextWriter class, you can create XML documents in memory strings.
A special type of XML string is the XML fragment. The fragment is made of XML text in which the root-level rules for well-formed XML documents are not applied. As a result, an XML fragment differs from an ordinary document because it may lack a single root node. For example, the following XML string is a valid XML fragment but not a valid XML document since XML documents must have a root node:
<firstname>Dino</firstname>
<lastname>Esposito</lastname>
The .NET Framework XML API allows programmers to associate XML fragments with a parser context made of information like the encoding character set, the DTD document, namespaces, language, and whitespace handling:
public XmlTextReader(
    string xmlFragment,
    XmlNodeType fragType,
    XmlParserContext context
);
The xmlFragment parameter contains the XML string to parse. The fragType argument represents the type of the fragment and is given by the type of the fragment's root node(s). Only element, attribute, and document nodes are permitted as the root of a fragment. The parser context is expressed by the XmlParserContext class.
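A minimal sketch of fragment parsing, assuming an element-type fragment and an otherwise empty parser context, might look like the following. The class and method names are hypothetical. The generic List type postdates the Framework version this article targets and is used only to return the result conveniently.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

// Hypothetical helper, for illustration only.
public static class FragmentSketch
{
    // Parses a fragment that may lack a single root node and returns
    // the names of the elements it contains, in document order.
    public static List<string> ElementNames(string xmlFragment)
    {
        // An empty context: no predefined namespaces, no xml:lang,
        // and no xml:space scope.
        NameTable nameTable = new NameTable();
        XmlNamespaceManager namespaces = new XmlNamespaceManager(nameTable);
        XmlParserContext context = new XmlParserContext(
            nameTable, namespaces, null, XmlSpace.None);

        XmlTextReader reader = new XmlTextReader(
            xmlFragment, XmlNodeType.Element, context);
        List<string> names = new List<string>();
        try
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element)
                    names.Add(reader.Name);
            }
        }
        finally
        {
            reader.Close();
        }
        return names;
    }
}
```

Given the two-element fragment shown earlier, the method returns firstname followed by lastname, even though the text has no single root.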
Validating Readers
The XmlValidatingReader class is an implementation of the XmlReader class that provides support for several types of XML validation: DTD, XML-Data Reduced (XDR) schemas, and XSD. DTD and XSD are official recommendations issued by the W3C, whereas XDR is the Microsoft implementation of an early working draft of XML Schemas.
You can use the XmlValidatingReader class to validate entire XML documents as well as XML fragments. The XmlValidatingReader class works on top of an XML reader—typically an instance of the XmlTextReader class. The text reader is used to walk through the nodes of the document, whereas the validating reader is expected to validate each single piece of XML according to the requested validation type.
The XmlValidatingReader class implements only a very small subset of the functionalities that an XML reader must expose. The class always works on top of an existing XML reader and mirrors many methods and properties. The dependency of validating readers on an existing text reader is particularly evident if you look at the class constructors. An XML validating reader cannot be directly initialized from a file or a URL. The list of available constructors comprises the following overloads:
public XmlValidatingReader(XmlReader);
public XmlValidatingReader(Stream, XmlNodeType, XmlParserContext);
public XmlValidatingReader(string, XmlNodeType, XmlParserContext);
A validating reader can parse any XML fragments accessible through a string or an open stream, as well as any XML document for which a reader is provided.
The list of methods for which the class provides a significant implementation is very short. In addition to Read there are the Skip and the ReadTypedValue methods. The Skip method jumps over the children of the currently active node in the underlying reader. (It's worth noting that you cannot skip over badly formed XML text.) The Skip method also validates the skipped contents. The ReadTypedValue method returns the value of the underlying node as a CLR type. If the method can map an XSD type with a CLR type, it will return it as such. If a direct mapping isn't possible, the node value will be returned as a string.
The validating reader is just what its name suggests—a node-based reader that validates the structure of the current node against the current schema. The validation occurs incrementally; there is no method that returns a Boolean value to indicate whether the given document is valid. You move around the input document using the Read method as usual. Actually, you use the validating reader as you would any other XML-based .NET Framework reader. At each step, the structure of the currently visited node is validated against the specified schema and an exception is raised if an error is found. Figure 4 contains a console application that takes the name of a file on the command line and then outputs the results of the validation process.

Figure 4 Console App
using System;
using System.Xml;
using System.Xml.Schema;

class MyXmlValidApp
{
    public MyXmlValidApp(String fileName)
    {
        try {
            Validate(fileName);
        }
        catch (Exception e) {
            // Raised, for example, when the document is not well formed
            Console.WriteLine("Error:\t{0}", e.Message);
            Console.WriteLine("Exception raised: {0}",
                e.GetType().ToString());
        }
    }

    private void Validate(String fileName)
    {
        XmlTextReader xtr = new XmlTextReader(fileName);
        XmlValidatingReader vreader = new XmlValidatingReader(xtr);
        vreader.ValidationType = ValidationType.Auto;
        vreader.ValidationEventHandler += new
            ValidationEventHandler(this.ValidationEventHandle);

        // Validation happens node by node as the reader advances
        vreader.Read();
        vreader.MoveToContent();
        while (vreader.Read()) {}

        xtr.Close();
        vreader.Close();
    }

    // Called once for each validation error found in the document
    public void ValidationEventHandle(Object sender,
        ValidationEventArgs args)
    {
        Console.Write("Validation error: " + args.Message + "\r\n");
    }

    public static void Main(String[] args)
    {
        if (args.Length == 0) {
            Console.WriteLine("Please specify the XML file to validate.");
            return;
        }
        MyXmlValidApp o = new MyXmlValidApp(args[0]);
    }
}
The ValidationType property sets the type of validation you want—DTD, XSD, XDR, or none. If no validation type is specified (by using the ValidationType.Auto option), the reader automatically applies the validation it determines to be most appropriate for the document. The caller application is notified of any error through a ValidationEventHandler event. If you don't specify a custom event handler, an XML exception is thrown against the application. Defining a ValidationEventHandler method is a way to catch any XML exceptions caused by inconsistencies in the source document. Note that the reader's mechanisms for checking whether a document is well formed and for checking schema compliance are distinct. If a validating reader happens to work on a badly formed XML document, no event is fired but an XmlException exception is raised.
The validation takes place as the user moves the pointer forward with the Read method. Once the node has been parsed and read, it gets passed to the internal validator object for further processing. The validator operates based on the node type and the type of validation that has been requested. It makes sure that the node has all the attributes and the children it is expected to contain.
The validator object internally invokes two flavors of objects: the DTD parser and the schema builder. The DTD parser processes the content of the current node and its subtree against the DTD. The schema builder builds up a schema object model (SOM) for the current node based on the XDR or XSD schema source code. The schema builder class is actually the base class for more specialized XDR and XSD schema builders. What matters, though, is that XDR and XSD schemas are treated in much the same way and with no difference in performance.
If a node has children, another temporary reader is used to collect information about the children of the node so that the schema information for the node can be fully investigated, as you can see in Figure 5.
Figure 5 Using A Temporary Reader on Child Nodes
Note that although the signature of one of the XmlValidatingReader constructors refers generically to an XmlReader class as the underlying reader, that reader can only be an instance of the XmlTextReader class or a class which derives from it. This means that you cannot use any class which happens to inherit from XmlReader (such as a custom XML reader). Internally, the XmlValidatingReader class assumes that the underlying reader is an XmlTextReader object and specifically casts the input reader to XmlTextReader. If you use XmlNodeReader or a custom reader class, you will not get any error at compile time but an exception will be thrown at run time.
Node Readers
XML reader classes provide a way to process the contents of documents in an incremental way, one node after the other. So far, I've assumed that the source document is a disk-based stream or a string. However, nothing really prevents me from using an XMLDOM object as the source document. In this case, though, I would need a special class with an ad hoc Read method. For this purpose, the .NET Framework provides the XmlNodeReader class.
Just as XmlTextReader visits all the nodes of a specified XML stream, the XmlNodeReader class visits all the nodes that form an XMLDOM subtree. The XMLDOM class (XmlDocument in the .NET Framework) supplies XPath-based methods such as SelectNodes and SelectSingleNode. The net effect of such methods is that the matching node sets are loaded into memory. If you need to process all the nodes in the subtree, a node reader is more efficient as it just processes one node after the next in an incremental cursor-like manner:
// xmldomNode is the XML DOM node
XmlNodeReader nodeReader = new XmlNodeReader(xmldomNode);
while (nodeReader.Read()) {
    // Do something here
}
Using the XmlNodeReader class along with the XML DOM classes is also beneficial when you're processing an XMLDOM tree full of custom data excerpted from a configuration file (for example, web.config).
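To make the idea concrete, here is a small sketch that loads a document into an XmlDocument, picks one subtree with an XPath expression, and walks only that subtree with a node reader. The class and method names are hypothetical, and the generic List type is a later addition to the Framework used here for convenience.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

// Hypothetical helper, for illustration only.
public static class NodeReaderSketch
{
    // Returns the names of the elements found in the subtree rooted at
    // the first node that matches the XPath expression.
    public static List<string> ElementNamesUnder(string xml, string xpath)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);

        List<string> names = new List<string>();
        XmlNode start = doc.SelectSingleNode(xpath);
        if (start == null)
            return names;   // nothing matched

        // The reader visits the selected node and its descendants only.
        XmlNodeReader reader = new XmlNodeReader(start);
        try
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element)
                    names.Add(reader.Name);
            }
        }
        finally
        {
            reader.Close();
        }
        return names;
    }
}
```

Because the reader is built on a single node, sibling subtrees elsewhere in the document are never visited, which is what makes this approach a good fit for processing one section of a larger tree.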
The XmlTextWriter Class
Creating XML documents in a programmatic way has never been part