This documentation is archived and is not being maintained.

Special Character Conversion when Writing XML Content

The XmlWriter class includes a method, WriteRaw, that lets you write raw markup manually. This method does not escape special characters. In contrast, the WriteString method replaces certain characters with their equivalent entity references. The characters that are escaped are those given in section 2.4, Character Data and Markup, and section 3.3.3, Attribute-Value Normalization, of the Extensible Markup Language (XML) 1.0 (fourth edition) recommendation. If the WriteString method is called while writing an attribute value, it also escapes ' and ". Character values in the range 0x0-0x1F are encoded as numeric character entities &#0; through &#0x1F;, except for the white space characters 0x9, 0xA, and 0xD.

Therefore, the guiding principle for choosing between WriteString and WriteRaw is this: use WriteString when every character must be checked and escaped where necessary, and use WriteRaw when the text must be written exactly as it is given.
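The escaping rules described above can be observed with a short test program. This is a minimal sketch, not part of the original sample: the class name EscapeDemo, the method WriteEscaped, and the element and attribute names are made up for illustration, and it assumes the default XmlTextWriter settings (attribute values quoted with ").

```csharp
using System;
using System.IO;
using System.Xml;

static class EscapeDemo
{
    // Hypothetical helper: writes <e a="...">...</e>, passing both values
    // through WriteString so the writer decides which characters to escape.
    public static string WriteEscaped(string attributeValue, string text)
    {
        StringWriter output = new StringWriter();
        XmlTextWriter w = new XmlTextWriter(output);
        w.WriteStartElement("e");
        w.WriteStartAttribute(null, "a", null);
        w.WriteString(attributeValue);   // the quote character is escaped here
        w.WriteEndAttribute();
        w.WriteString(text);             // <, > and & are escaped here
        w.WriteEndElement();
        w.Close();
        return output.ToString();
    }

    public static void Main()
    {
        // Markup characters in text and a quote inside an attribute value.
        Console.WriteLine(WriteEscaped("say \"hi\"", "1 < 2 & 3"));
        // A tab is white space and is kept; 0x7 is a control character
        // and is written as a numeric character entity.
        Console.WriteLine(WriteEscaped("plain", "tab:\t bell:\u0007"));
    }
}
```

Replacing the WriteString calls with WriteRaw in this sketch would send the same strings to the output unchanged.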

The WriteNode method copies everything from the node on which the reader is currently positioned to the writer. The reader is then advanced to the next sibling node for further processing. The WriteNode method is a quick way to copy information from one document into another.

The following table shows the supported NodeTypes for the WriteNode method.

Node type                Description
Element                  Writes out the element node and all attribute nodes.
Attribute                No operation. Use WriteStartAttribute or WriteAttributeString to write the attribute.
Text                     Writes out the text node.
CDATA                    Writes out the CDATA section node.
EntityReference          Writes out the entity reference node.
ProcessingInstruction    Writes out the processing instruction node.
Comment                  Writes out the comment node.
DocumentType             Writes out the document type node.
Whitespace               Writes out the Whitespace node.
SignificantWhitespace    Writes out the SignificantWhitespace node.
EndElement               No operation.
EndEntity                No operation.
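The WriteNode behavior for an Element node can be illustrated with a small sketch. The class name WriteNodeDemo, the method CopyFirstChild, and the input string are hypothetical; the sketch assumes an input with no white space between elements, so that the first Read after the root lands on a child element.

```csharp
using System;
using System.IO;
using System.Xml;

static class WriteNodeDemo
{
    // Hypothetical helper: copies the first child element of the root to a
    // string and reports the name of the node the reader is left on.
    public static string CopyFirstChild(string xml, out string nextNodeName)
    {
        XmlTextReader reader = new XmlTextReader(new StringReader(xml));
        StringWriter output = new StringWriter();
        XmlTextWriter writer = new XmlTextWriter(output);

        reader.MoveToContent();           // position the reader on the root element
        reader.Read();                    // move to the first node inside the root
        writer.WriteNode(reader, false);  // copy the element, its attributes, and its content
        nextNodeName = reader.Name;       // the reader now sits on the next sibling

        writer.Close();
        reader.Close();
        return output.ToString();
    }

    public static void Main()
    {
        string next;
        string copied = CopyFirstChild("<root><b x=\"1\">text</b><c /></root>", out next);
        Console.WriteLine(copied);
        Console.WriteLine("Reader is now on: " + next);
    }
}
```

Because WriteNode has already advanced the reader, a loop that also calls Read after every WriteNode call can skip a node.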

The following example shows the difference between the WriteString and WriteRaw methods when given the "<" character. This code sample uses WriteString.

XmlTextWriter w = new XmlTextWriter(Console.Out);
w.WriteStartElement("myRoot");
w.WriteString("<");
w.WriteEndElement();
w.Flush();

Output

<myRoot>&lt;</myRoot>

This code sample uses WriteRaw. The output contains an unescaped "<" as element content, so it is not well-formed XML.

w.WriteStartElement("myRoot");
w.WriteRaw("<");
w.WriteEndElement();

Output

<myRoot><</myRoot>
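One way to see the consequence of the two outputs above is to feed them back to a parser. The following is a sketch under stated assumptions: the class RawVersusString and its methods are invented for this illustration, and XmlDocument.LoadXml is used only as a convenient well-formedness check.

```csharp
using System;
using System.IO;
using System.Xml;

static class RawVersusString
{
    // Hypothetical helper: writes <myRoot>content</myRoot> using either
    // WriteRaw (no escaping) or WriteString (escaping).
    public static string WriteContent(string content, bool raw)
    {
        StringWriter output = new StringWriter();
        XmlTextWriter w = new XmlTextWriter(output);
        w.WriteStartElement("myRoot");
        if (raw)
            w.WriteRaw(content);
        else
            w.WriteString(content);
        w.WriteEndElement();
        w.Close();
        return output.ToString();
    }

    // Returns true when the parser accepts the text as well-formed XML.
    public static bool IsWellFormed(string xml)
    {
        try
        {
            new XmlDocument().LoadXml(xml);
            return true;
        }
        catch (XmlException)
        {
            return false;
        }
    }

    public static void Main()
    {
        string escaped = WriteContent("<", false);
        string unescaped = WriteContent("<", true);
        Console.WriteLine(escaped + " well-formed: " + IsWellFormed(escaped));
        Console.WriteLine(unescaped + " well-formed: " + IsWellFormed(unescaped));
    }
}
```

WriteRaw remains appropriate when the string is already valid markup, for example WriteContent("<b>x</b>", true), because the writer passes it through without inspecting it.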

The following example shows how to convert an element-centric XML document to an attribute-centric document, and how to convert an attribute-centric document back to an element-centric one. An element-centric document is designed to have many elements but few attributes. An attribute-centric document has fewer elements, because values that would be child elements in an element-centric design are stored as attributes instead. The result is fewer elements, but more attributes per element.

This sample is useful if you have designed your XML data in one mode and need to convert it to the other.
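To make the two designs concrete, the sketch below writes the same customer data in both shapes. The class CentricShapes and its methods are hypothetical; the element and attribute names are borrowed from the sample input on this page.

```csharp
using System;
using System.IO;
using System.Xml;

static class CentricShapes
{
    // Element-centric: each value becomes a child element.
    public static string ElementCentric(string first, string last)
    {
        StringWriter output = new StringWriter();
        XmlTextWriter w = new XmlTextWriter(output);
        w.WriteStartElement("Customer");
        w.WriteElementString("firstname", first);
        w.WriteElementString("lastname", last);
        w.WriteEndElement();
        w.Close();
        return output.ToString();
    }

    // Attribute-centric: each value becomes an attribute of the parent element.
    public static string AttributeCentric(string first, string last)
    {
        StringWriter output = new StringWriter();
        XmlTextWriter w = new XmlTextWriter(output);
        w.WriteStartElement("Customer");
        w.WriteAttributeString("firstname", first);
        w.WriteAttributeString("lastname", last);
        w.WriteEndElement();
        w.Close();
        return output.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(ElementCentric("Jerry", "Larson"));
        Console.WriteLine(AttributeCentric("Jerry", "Larson"));
    }
}
```

Both WriteElementString and WriteAttributeString escape their values in the same way as WriteString, so either shape stays well-formed when the data contains markup characters.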

The following XML is an element-centric document. The elements contain no attributes.

Input - centric.xml

<?xml version='1.0' encoding='UTF-8'?>
<root>
    <Customer>
        <firstname>Jerry</firstname>
        <lastname>Larson</lastname>
        <Order>
        <OrderID>Ord-12345</OrderID>
          <OrderDetail>
            <Quantity>1301</Quantity>
            <UnitPrice>$3000</UnitPrice>
            <ProductName>Computer</ProductName>
          </OrderDetail>
        </Order>
    </Customer>
</root>

The following sample application does the conversion.

// The program converts an element-centric document to an
// attribute-centric document, or attribute-centric to element-centric.
using System;
using System.Xml;
using System.IO;
using System.Text;
using System.Collections;

class ModeConverter {
    private const int bufferSize=2048;

    internal class ElementNode {
        String _name;
        String _prefix;
        String _namespace;
        bool   _startElement;
        internal ElementNode() {
            this._name = null;
            this._prefix = null;
            this._namespace = null;
            this._startElement = false;
        }
        internal ElementNode(String prefix, String name, String nameSpace) {
            this._name = name;
            this._prefix = prefix;
            this._namespace = nameSpace;
        }
        public String name{
            get { return _name;   }
        }
        public String prefix{
            get { return _prefix;   }
        }
        public String nameSpace{
            get { return _namespace;   }
        }
        public bool startElement{
            get { return _startElement;   }
            set { _startElement = value;}
        }
    }
    public static void Main(String[] args) {
        ModeConverter modeConverter = new ModeConverter();
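        // Three arguments are required: mode, source file, and target file.
        // Check the length first so that args[0] is never read from an empty array.
        if (args.Length < 3 || args[0] == "?") {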
            modeConverter.Usage();
            return;
        }
        FileStream sourceFile = new FileStream(args[1], FileMode.Open, FileAccess.Read, FileShare.Read);
        FileStream targetFile = new FileStream(args[2], FileMode.Create, FileAccess.ReadWrite, FileShare.ReadWrite);
        if (args[0] == "-a") {
            modeConverter.ConertToAttributeCentric(sourceFile, targetFile);
        } else {
            modeConverter.ConertToElementCentric(sourceFile, targetFile);
        }
        return;
    }
    public void Usage() {
        Console.WriteLine("? This help message \n");
        Console.WriteLine("Convert -mode sourceFile, targetFile \n");
        Console.WriteLine("\t mode: e element centric\n");
        Console.WriteLine("\t mode: a attribute centric\n");
    }
    public void ConertToAttributeCentric(FileStream sourceFile, FileStream targetFile) {
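        // The stack tracks the elements that have been read but not yet closed,
        // including those whose start tags have not been written yet.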
        Stack stack = new Stack();
        XmlTextReader reader = new XmlTextReader(sourceFile);
        reader.Read();
        XmlTextWriter writer = new XmlTextWriter(targetFile, reader.Encoding);
        writer.Formatting = Formatting.Indented;

        do {
            switch (reader.NodeType) {
                case XmlNodeType.XmlDeclaration:
                    writer.WriteStartDocument(null == reader.GetAttribute("standalone") || "yes" == reader.GetAttribute("standalone"));
                    break;

                case XmlNodeType.Element:
                    ElementNode element = new ElementNode(reader.Prefix, reader.LocalName, reader.NamespaceURI);

                    if (0 == stack.Count) {
                        writer.WriteStartElement(element.prefix, element.name, element.nameSpace);
                        element.startElement=true;
                    }

                    stack.Push(element);
                    break;

                case XmlNodeType.Attribute:
                    throw new Exception("We should never be here!");

                case XmlNodeType.Text:
                    // The element that holds this text becomes an attribute of its parent.
                    ElementNode attribute = (ElementNode)stack.Pop();
                    element = (ElementNode)stack.Peek();
                    if (!element.startElement) {
                        writer.WriteStartElement(element.prefix, element.name, element.nameSpace);
                        element.startElement=true;
                    }
                    writer.WriteStartAttribute(attribute.prefix, attribute.name, attribute.nameSpace);
                    writer.WriteRaw(reader.Value);
                    reader.Read(); //jump over the EndElement
                    break;

                case XmlNodeType.EndElement:
                    writer.WriteEndElement();
                    stack.Pop();

                    break;

                case XmlNodeType.CDATA:
                    writer.WriteCData(reader.Value);
                    break;

                case XmlNodeType.Comment:
                    writer.WriteComment(reader.Value);
                    break;

                case XmlNodeType.ProcessingInstruction:
                    writer.WriteProcessingInstruction(reader.Name, reader.Value);
                    break;

                case XmlNodeType.EntityReference:
                    writer.WriteEntityRef( reader.Name);
                    break;

                case XmlNodeType.Whitespace:
                    writer.WriteWhitespace(reader.Value);
                    break;

                case XmlNodeType.None:
                    writer.WriteRaw(reader.Value);
                    break;

                case XmlNodeType.SignificantWhitespace:
                    writer.WriteWhitespace(reader.Value);
                    break;

                case XmlNodeType.DocumentType:
                    writer.WriteDocType(reader.Name, reader.GetAttribute("PUBLIC"), reader.GetAttribute("SYSTEM"), reader.Value);
                    break;

                case XmlNodeType.EndEntity:
                    break;


                default:
                    Console.WriteLine("UNKNOWN Node Type = " + ((int)reader.NodeType));
                    break;
             }
        } while (reader.Read());

        writer.WriteEndDocument();

        reader.Close();
        writer.Flush();
        writer.Close();
    }

    // Use the WriteNode to simplify the process.
    public void ConertToElementCentric(FileStream sourceFile, FileStream targetFile) {
        XmlTextReader reader = new XmlTextReader(sourceFile);
        reader.Read();
        XmlTextWriter writer = new XmlTextWriter(targetFile, reader.Encoding);
        writer.Formatting = Formatting.Indented;
        do {
            switch (reader.NodeType) {

                case XmlNodeType.Element:
                    writer.WriteStartElement(reader.Prefix, reader.LocalName, reader.NamespaceURI);
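                    if (reader.MoveToFirstAttribute()) {
                        do {
                            writer.WriteStartElement(reader.Prefix, reader.LocalName, reader.NamespaceURI);
                            writer.WriteRaw(reader.Value);
                            writer.WriteEndElement();
                        } while (reader.MoveToNextAttribute());

                        // Return to the owning element so IsEmptyElement reflects it.
                        reader.MoveToElement();
                    }
                    // An empty element produces no EndElement node, so close it here;
                    // other elements are closed by the EndElement case.
                    if (reader.IsEmptyElement) {
                        writer.WriteEndElement();
                    }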
                    break;

                case XmlNodeType.Attribute:
                     throw new Exception("We should never be here!");

                case XmlNodeType.Whitespace:
                    writer.WriteWhitespace(reader.Value);
                    break;

                case XmlNodeType.EndElement:
                    writer.WriteEndElement();
                    break;

                case XmlNodeType.Text:
                    throw new Exception("The input document is not an attribute-centric document\n");

                default:
                    Console.WriteLine(reader.NodeType);
                    writer.WriteNode(reader, false);
                    break;
             }
        } while (reader.Read());

        reader.Close();
        writer.Flush();
        writer.Close();
    }
}

After the code is compiled, run it from the command line by typing <compiled name> -a centric.xml <output file name>. Because the program opens the output file with FileMode.Create, the file is created if it does not exist and overwritten if it does.

For the following output, assuming the C# program was compiled to centric_cs, the command line is C:\centric_cs -a centric.xml centric_out.xml.

The mode of -a tells the application to convert the input XML to attribute-centric, while a mode of -e will change it to element-centric. The output below is the new attribute-centric output generated using the -a mode. The elements now contain attributes instead of nested elements.

<?xml version="1.0" encoding="utf-8" standalone="yes" ?>

<root>
<Customer firstname="Jerry" lastname="Larson">
 <Order OrderID="Ord-12345">
  <OrderDetail Quantity="1301" UnitPrice="$3000" ProductName="Computer" />
 </Order>
</Customer>
</root>
