How to: Stream XML Fragments from an XmlReader

When you have to process large XML files, it might not be feasible to load the whole XML tree into memory. This topic shows how to stream fragments using an XmlReader.

One of the most effective ways to use an XmlReader to read XElement objects is to write your own custom axis method. An axis method typically returns a collection such as IEnumerable<T> of XElement, as shown in the example in this topic. In the custom axis method, after you create an XML fragment by calling the ReadFrom method, return that fragment to the caller by using yield return. Because this makes the method an iterator, it has deferred execution semantics: each fragment is read from the XmlReader only when the caller requests the next item.

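When you create an XML tree from an XmlReader object, the XmlReader must be positioned on an element. The ReadFrom method does not return until it has read the close tag of the element. When it returns, the reader is positioned on the node that follows that close tag, so code that loops over the reader must not unconditionally call Read again, or it can skip that node.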

If you want to create a partial tree, you can instantiate an XmlReader, position the reader on the node that you want to convert to an XElement tree, and then create the XElement object.
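The partial-tree approach described above can be sketched as follows. This is a minimal illustration, not part of the original sample: the class PartialTreeSketch, the helper ReadFirst, and the sample markup are hypothetical names chosen for this sketch. It uses XmlReader.ReadToFollowing to position the reader and XNode.ReadFrom to build the subtree.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Linq;

static class PartialTreeSketch
{
    // Hypothetical helper: builds an XElement from the first element
    // called `name`, or returns null if no such element exists.
    public static XElement ReadFirst(TextReader input, string name)
    {
        using (XmlReader reader = XmlReader.Create(input))
        {
            // Position the reader on the element to convert.
            if (!reader.ReadToFollowing(name))
                return null;

            // The reader is on an element, so ReadFrom can build the
            // subtree; it reads through the matching close tag.
            return XNode.ReadFrom(reader) as XElement;
        }
    }

    static void Main()
    {
        string markup =
            "<Root><Skip>x</Skip><Item Id=\"7\"><Name>abc</Name></Item></Root>";

        XElement item = ReadFirst(new StringReader(markup), "Item");

        // Only the Item subtree was materialized; Root and Skip were not.
        Console.WriteLine((string)item.Element("Name")); // prints "abc"
    }
}
```

Only the selected subtree is held in memory; the rest of the document is read past without being turned into LINQ to XML objects.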

The topic How to: Stream XML Fragments with Access to Header Information contains information and an example on how to stream a more complex document.

The topic How to: Perform Streaming Transform of Large XML Documents contains an example of using LINQ to XML to transform extremely large XML documents while maintaining a small memory footprint.

This example creates a custom axis method, StreamRootChildDoc, which you can query by using a LINQ query. The method is designed specifically to read a document that has a repeating Child element.

Note

The following example uses the yield return construct of C#. Equivalent code is provided in Visual Basic using a class that implements the IEnumerable(Of XElement) interface. For an example of implementing IEnumerable(Of T) in Visual Basic, see Walkthrough: Implementing IEnumerable(Of T) in Visual Basic.

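// Namespaces required by the methods below, which are members of a class.
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

static IEnumerable<XElement> StreamRootChildDoc(StringReader stringReader)
{
    using (XmlReader reader = XmlReader.Create(stringReader))
    {
        reader.MoveToContent();
        // Walk the document node by node and stream each Child element.
        while (!reader.EOF)
        {
            if (reader.NodeType == XmlNodeType.Element && reader.Name == "Child")
            {
                // ReadFrom leaves the reader on the node after the close tag,
                // so do not call Read() here. An extra Read() would skip the
                // next Child when no whitespace separates the elements.
                XElement el = XElement.ReadFrom(reader) as XElement;
                if (el != null)
                    yield return el;
            }
            else
            {
                reader.Read();
            }
        }
    }
}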

static void Main(string[] args)
{
    string markup = @"<Root>
      <Child Key=""01"">
        <GrandChild>aaa</GrandChild>
      </Child>
      <Child Key=""02"">
        <GrandChild>bbb</GrandChild>
      </Child>
      <Child Key=""03"">
        <GrandChild>ccc</GrandChild>
      </Child>
    </Root>";

    IEnumerable<string> grandChildData =
        from el in StreamRootChildDoc(new StringReader(markup))
        where (int)el.Attribute("Key") > 1
        select (string)el.Element("GrandChild");

    foreach (string str in grandChildData) {
        Console.WriteLine(str);
    }
}

This example produces the following output:

bbb
ccc

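In this example, the source document is very small. However, even if there were millions of Child elements, this example would still have a small memory footprint, because only one Child element is materialized as an XElement at a time and each one becomes eligible for garbage collection once the query has moved past it.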
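To try the same technique against a larger document on disk, something like the following sketch could be used. It is an illustration under stated assumptions, not part of the original sample: the class FileStreamingSketch, the method StreamChildren, the temporary file, and the element count of 100,000 are all hypothetical. The loop calls Read only when it has not just called ReadFrom, so adjacent Child elements with no whitespace between them are not skipped.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

static class FileStreamingSketch
{
    // Hypothetical file-based axis method: streams every element
    // called `name` from the XML file at `path`, one at a time.
    public static IEnumerable<XElement> StreamChildren(string path, string name)
    {
        using (XmlReader reader = XmlReader.Create(path))
        {
            reader.MoveToContent();
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == name)
                {
                    // ReadFrom already advances past the element.
                    XElement el = XNode.ReadFrom(reader) as XElement;
                    if (el != null)
                        yield return el;
                }
                else
                {
                    reader.Read();
                }
            }
        }
    }

    static void Main()
    {
        string path = Path.GetTempFileName();
        try
        {
            // Write a test document; XmlWriter does not indent by default,
            // so the Child elements are adjacent with no whitespace.
            using (XmlWriter writer = XmlWriter.Create(path))
            {
                writer.WriteStartElement("Root");
                for (int i = 0; i < 100000; i++)
                {
                    writer.WriteStartElement("Child");
                    writer.WriteAttributeString("Key", i.ToString());
                    writer.WriteElementString("GrandChild", "v" + i);
                    writer.WriteEndElement();
                }
                writer.WriteEndElement();
            }

            // Count the elements without loading the whole tree.
            int count = StreamChildren(path, "Child").Count();
            Console.WriteLine(count); // prints 100000
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```

Because StreamChildren is an iterator, the file is opened only when enumeration starts and is closed when enumeration finishes or the enumerator is disposed.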
