Searching for Content in Word 2007 Documents by Using the Open XML SDK 2.0 for Microsoft Office

Summary:  Learn how to use the typesafe classes and enumerations that are in the Open XML Software Development Kit (SDK) 2.0 for Microsoft Office to find paragraphs in a Microsoft Office Word 2007 document by style name or content. In addition, view and compare sample code that accomplishes the task in two ways: one by using the Open XML SDK 2.0 classes, and the other by using .NET Language-Integrated Query for XML data (XLINQ). (19 printed pages)

Office Visual How To

Applies to:  2007 Microsoft Office System, Microsoft Office Excel 2007, Microsoft Office PowerPoint 2007, Microsoft Office Word 2007, Microsoft Visual Studio 2008, Open XML SDK 2.0 for Microsoft Office

Joel Krist, iSoftStone

September 2009

Overview

The Open XML Software Development Kit 2.0 for Microsoft Office makes it possible to create and manipulate Microsoft Office Word 2007, Microsoft Office Excel 2007, and Microsoft Office PowerPoint 2007 documents programmatically via the Open XML formats. The typesafe classes included with the SDK provide a layer of abstraction between the developer and the Open XML formats, simplifying the process of working with Office 2007 documents and enabling the creation of solutions that are not dependent on the presence of the Office client applications to handle document creation.

The sample code in this visual how-to article shows how to use the classes in the Open XML SDK 2.0 for Microsoft Office to find paragraphs in a Word 2007 document by style name or content. The sample solution also includes code that uses XLINQ to provide the same functionality so that you can compare the two coding styles.

See It

Watch the Video

Length: 00:05:02 | Size: 4.96 MB | Type: WMV file


Code It

Download the sample code

This visual how-to article presents a solution that creates a Windows console application that finds paragraphs in a Word 2007 document by style name or content. The ideas and the code are based on a series of blog posts by Eric White, Finding Paragraphs by Style Name or Content in an Open XML Word Processing Document, in which he develops a set of XLINQ queries that provide the same functionality.

The code in this how-to article uses the same approach as Eric's code but uses the Open XML SDK 2.0 for Microsoft Office classes. The accompanying sample code includes Eric's original code so that you can compare the two approaches.

This section walks through the following steps:

  1. Creating the test document.

  2. Creating a Windows console application solution in Microsoft Visual Studio 2008.

  3. Adding references to the DocumentFormat.OpenXml and WindowsBase assemblies.

  4. Adding the sample code to the solution.

Creating the Test Document

The sample code that accompanies this visual how-to article uses a known test document to verify that it is finding paragraphs correctly. For the sample code to work successfully, the test document must contain paragraphs that are in a specific order, that are formatted with specific styles, and that contain specific text.

Note

The sample code includes the required test document, so if you download the sample code, you can skip the following steps, which specify how to create the test document.

To create the test document

  1. Start Word 2007.

  2. In the first paragraph of the new document, type Aaa, format it as a Heading 1, and then press ENTER to start a new paragraph.

  3. In the second paragraph, type Bbb, format it as a Heading 2, and then press ENTER to start a new paragraph.

  4. In the third paragraph, type Ccc, format it as a Heading 3, and then press ENTER to start a new paragraph.

  5. In the fourth paragraph, type 111 and then press ENTER to start a new paragraph.

  6. In the fifth paragraph, type 222 and then press ENTER to start a new paragraph.

  7. In the sixth paragraph, type 333 and then press ENTER to start a new paragraph.

  8. In the seventh paragraph, type Hello, format it in the Title style, and then press ENTER to start a new paragraph.

  9. In the eighth paragraph, type hello and then press ENTER to start a new paragraph. Depending on the current settings, Word might automatically capitalize the first letter because it is the first word of the sentence. If so, press CTRL+Z to undo the change, and make sure that the string hello is all lowercase.

  10. In the ninth paragraph, insert a table that has exactly three rows and two columns. Position the cursor in the paragraph after the table and press ENTER to create a paragraph after the table. Type the following strings in the table cells:

    Aaa          Bbb

    Ccc          Ddd

    Eee          Fff

  11. In the tenth paragraph, insert a Rich Text content control (click Aa in the Controls group on the Developer tab) and then press ENTER to create a paragraph after the control. Type aaa, bbb, and ccc in the content control, each in its own paragraph. Make sure that the text is all lower case.

    Figure 1. Text in the Rich Text content control

  12. In the paragraph below the rich text content control, type Good-bye and then press ENTER to start a new paragraph.

  13. In the new paragraph, type good-bye, all lower case, and then press ENTER to start a new paragraph.

  14. In the final paragraph, insert a plain text content control and add the string 111 222 333 to the control.

    After you complete the procedure, the final document should look like the following figure.

    Figure 2. Final test document

  15. The sample code specifies that the test document is located in the C:\Temp folder. Save the test document with the name SearchContent.docx to the C:\Temp folder.

  16. Exit Word.

Creating a Windows Console Application in Visual Studio 2008

This visual how-to article uses a Windows console application to provide the framework for the sample code. However, you could use the same approach that is illustrated here with other application types as well.

To create a Windows Console Application in Visual Studio 2008

  1. Start Microsoft Visual Studio 2008.

  2. On the File menu, point to New, and then click Project.

  3. In the New Project dialog box, select the Visual C# Windows type in the Project types pane.

  4. Select Console Application in the Templates pane, and then name the project ParagraphSearch.

    Figure 3. Create new solution in the New Project dialog box

  5. Click OK to create the solution.

Adding References to the DocumentFormat.OpenXml and WindowsBase Assemblies

The sample code uses the classes and enumerations that are in the DocumentFormat.OpenXml.dll assembly that is installed with the Open XML SDK 2.0 for Microsoft Office. To add the reference to the assembly in the following steps or to build the sample code that accompanies this visual how-to, you must first download and install the Open XML SDK 2.0 for Microsoft Office so that the assembly is available.

To add references to the DocumentFormat.OpenXml and WindowsBase assemblies

  1. Add a reference to the DocumentFormat.OpenXml assembly by doing the following:

    1. On the Project menu in Visual Studio, click Add Reference to open the Add Reference dialog box.

    2. Select the .NET tab, scroll down to DocumentFormat.OpenXml, select it, and then click OK.

      Figure 4. Add Reference to DocumentFormat.OpenXml

  2. The classes in the DocumentFormat.OpenXml assembly use the System.IO.Packaging.Package class that is defined in the WindowsBase assembly. Add a reference to the WindowsBase assembly by doing the following:

    1. On the Project menu in Visual Studio, click Add Reference to open the Add Reference dialog box.

    2. Select the .NET tab, scroll down to WindowsBase, select it, and then click OK.

      Figure 5. Add Reference to WindowsBase

Adding the Sample Code to the Solution

Replace the entire contents of the Program.cs source file with the following code.

// The default version of this code builds the Open XML Format SDK 2.0
// version of the solution. Uncomment the following #define to build
// the XLINQ version.

//#define XLINQ

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

#if XLINQ
using System.IO;
using System.Xml;
using System.Xml.Linq;
#endif

using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

public static class LocalExtensions
{
  public static string StringConcatenate<T>(this IEnumerable<T> source,
    Func<T, string> func)
  {
    StringBuilder sb = new StringBuilder();
    foreach (T item in source)
      sb.Append(func(item));
    return sb.ToString();
  }

  public static string StringConcatenate<T>(this IEnumerable<T> source,
    Func<T, string> func, string separator)
  {
    StringBuilder sb = new StringBuilder();
    foreach (T item in source)
      sb.Append(func(item)).Append(separator);
    if (sb.Length > separator.Length)
      sb.Length -= separator.Length;
    return sb.ToString();
  }

#if XLINQ
  public static XDocument GetXDocument(this OpenXmlPart part)
  {
    XDocument xdoc = part.Annotation<XDocument>();
    if (xdoc != null)
      return xdoc;
    using (StreamReader sr = new StreamReader(part.GetStream()))
    using (XmlReader xr = XmlReader.Create(sr))
      xdoc = XDocument.Load(xr);
    part.AddAnnotation(xdoc);
    return xdoc;
  }
#endif
}

#if XLINQ
public static class W
{
  public static XNamespace w =
    "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

  public static XName style = w + "style";
  public static XName type = w + "type";
  public static XName styleId = w + "styleId";
  public static XName name = w + "name";
  public static XName val = w + "val";
  public static XName basedOn = w + "basedOn";
  public static XName r = w + "r";
  public static XName ins = w + "ins";
  // "default" is not a valid identifier, so must use _default
  public static XName _default = w + "default";
  public static XName body = w + "body";
  public static XName pPr = w + "pPr";
  public static XName pStyle = w + "pStyle";
  public static XName p = w + "p";
  public static XName t = w + "t";
}
#endif

class Program
{
  static bool ContainsAnyStyles(IEnumerable<string> stylesToSearch,
    IEnumerable<string> searchStrings)
  {
    return stylesToSearch.Intersect(searchStrings).Any();
  }

  static bool ContainsAnyContent(string stringToSearch,
    IEnumerable<string> searchStrings,
    IEnumerable<Regex> regularExpressions, bool isRegularExpression,
    bool caseInsensitive)
  {
    if (isRegularExpression)
      return regularExpressions.Any(r => r.IsMatch(stringToSearch));
    else
      if (caseInsensitive)
        return searchStrings.Any(
          s => stringToSearch.ToLower().Contains(s));
      else
        return searchStrings.Any(s => stringToSearch.Contains(s));
  }
  
#if XLINQ

  static IEnumerable<string> GetAllStyleIdsAndNames(
  WordprocessingDocument doc, string styleId)
  {
    string localStyleId = styleId;
    yield return styleId;
 
    string styleNameForFirstStyle = (string)doc
      .MainDocumentPart
      .StyleDefinitionsPart
      .GetXDocument()
      .Root
      .Elements(W.style)
      .Where(e => (string)e.Attribute(W.type) == "paragraph" &&
        (string)e.Attribute(W.styleId) == styleId)
      .Elements(W.name)
      .Attributes(W.val)
      .FirstOrDefault();

    if (styleNameForFirstStyle != null)
      yield return styleNameForFirstStyle;

    while (true)
    {
      XElement style = doc
        .MainDocumentPart
        .StyleDefinitionsPart
        .GetXDocument()
        .Root
        .Elements(W.style)
        .Where(e => (string)e.Attribute(W.type) == "paragraph" &&
          (string)e.Attribute(W.styleId) == localStyleId)
        .FirstOrDefault();

      if (style == null)
        yield break;

      var basedOn = (string)style
        .Elements(W.basedOn)
        .Attributes(W.val)
        .FirstOrDefault();

      if (basedOn == null)
        yield break;

      yield return basedOn;

      XElement basedOnStyle = doc
        .MainDocumentPart
        .StyleDefinitionsPart
        .GetXDocument()
        .Root
        .Elements(W.style)
        .Where(e => (string)e.Attribute(W.type) == "paragraph" &&
          (string)e.Attribute(W.styleId) == basedOn)
        .FirstOrDefault();


      string basedOnStyleName = (string)basedOnStyle
        .Elements(W.name)
        .Attributes(W.val)
        .FirstOrDefault();

      if (basedOnStyleName != null)
        yield return basedOnStyleName;

       localStyleId = basedOn;
    }
  }

  static int[] SearchInDocument(WordprocessingDocument doc,
    IEnumerable<string> styleSearchString,
    IEnumerable<string> contentSearchString,
    bool isRegularExpression, bool caseInsensitive)
  {
    RegexOptions options;
    Regex[] regularExpressions = null;
    if (isRegularExpression && contentSearchString != null)
    {
      if (caseInsensitive)
        options = RegexOptions.IgnoreCase | RegexOptions.Compiled;
      else
        options = RegexOptions.Compiled;
      regularExpressions = contentSearchString
        .Select(s => new Regex(s, options)).ToArray();
    }

    string[] contentSearchStringToUse = null;
    if (contentSearchString != null)
    {
      if (!isRegularExpression && caseInsensitive)
        contentSearchStringToUse =
          contentSearchString.Select(s => s.ToLower()).ToArray();
      else
        contentSearchStringToUse = contentSearchString.ToArray();
    }

    var defaultStyleName = (string)doc
      .MainDocumentPart
      .StyleDefinitionsPart
      .GetXDocument()
      .Root
      .Elements(W.style)
      .Where(style =>
        (string)style.Attribute(W.type) == "paragraph" &&
        (string)style.Attribute(W._default) == "1")
      .First()
      .Attribute(W.styleId);

    var q1 = doc
      .MainDocumentPart
      .GetXDocument()
      .Root
      .Element(W.body)
      .Elements()
      .Select((p, i) =>
      {
        var styleNode = p
          .Elements(W.pPr)
          .Elements(W.pStyle)
          .FirstOrDefault();
        var styleName = styleNode != null ?
          (string)styleNode.Attribute(W.val) :
          defaultStyleName;
        return new
        {
          Element = p,
          Index = i,
          StyleName = styleName
        };
       }
       );

    var q2 = q1
      .Select(i =>
      {
        string text = null;
        if (i.Element.Name == W.p)
          text = i.Element.Elements()
            .Where(z => z.Name == W.r || z.Name == W.ins)
            .Descendants(W.t)
            .StringConcatenate(element => (string)element);
        else
          text = i.Element
            .Descendants(W.p)
            .StringConcatenate(p => p
              .Elements()
              .Where(z => z.Name == W.r || z.Name == W.ins)
              .Descendants(W.t)
              .StringConcatenate(element => (string)element),
                Environment.NewLine);
    
        return new
        {
          Element = i.Element,
          StyleName = i.StyleName,
          Index = i.Index,
          Text = text
        };
      }
      );

    var q3 = q2
      .Select(i =>
        new
        {
          Element = i.Element,
          StyleName = i.StyleName,
          Index = i.Index,
          Text = i.Text,
          InheritedStyles =
            GetAllStyleIdsAndNames(doc, i.StyleName).Distinct()
        }
      );

    int[] q4 = null;
    if (styleSearchString != null)
      q4 = q3
        .Where(i => ContainsAnyStyles(
          i.InheritedStyles, styleSearchString))
        .Select(i => i.Index)
        .ToArray();

    int[] q5 = null;
    if (contentSearchStringToUse != null)
      q5 = q3
        .Where(i => ContainsAnyContent(
          i.Text, contentSearchStringToUse, regularExpressions,
          isRegularExpression, caseInsensitive))
        .Select(i => i.Index)
        .ToArray();

    int[] q6 = null;
    if (q4 != null && q5 != null)
      q6 = q4.Intersect(q5).ToArray();
    else
      q6 = q5 != null ? q5 : q4;

    return q6;
  }

#else

  static IEnumerable<string> GetAllStyleIdsAndNames(
    WordprocessingDocument doc, string styleId)
  {
    string localStyleId = styleId;
    yield return styleId;

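    // Look up the display name of the style. The query yields null,
    // rather than throwing, if the style or its name element is missing.
    string styleNameForFirstStyle = doc
      .MainDocumentPart
      .StyleDefinitionsPart
      .Styles
      .Elements<Style>()
      .Where(e => e.Type == StyleValues.Paragraph &&
        e.StyleId.Value == styleId)
      .SelectMany(e => e.Elements<StyleName>())
      .Select(n => n.Val.Value)
      .FirstOrDefault();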

    if (styleNameForFirstStyle != null)
      yield return styleNameForFirstStyle;

    while (true)
    {
      Style style = doc
        .MainDocumentPart
        .StyleDefinitionsPart
        .Styles
        .Elements<Style>()
        .Where(e => e.Type == StyleValues.Paragraph &&
          e.StyleId.Value == localStyleId)
        .FirstOrDefault();

      if (style == null)
        yield break;

      var basedOn = style
        .Elements<BasedOn>()
        .FirstOrDefault();

      if (basedOn == null)
        yield break;

      yield return basedOn.Val.Value;

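      Style basedOnStyle = doc
        .MainDocumentPart
        .StyleDefinitionsPart
        .Styles
        .Elements<Style>()
        .Where(e => e.Type == StyleValues.Paragraph &&
          e.StyleId.Value == basedOn.Val.Value)
        .FirstOrDefault();

      // Read the name from the base style, not from the current style.
      // Guard against a basedOn value that refers to a missing style.
      StyleName basedOnStyleNameElement = basedOnStyle != null ?
        basedOnStyle.Elements<StyleName>().FirstOrDefault() : null;
      string basedOnStyleName = basedOnStyleNameElement != null ?
        basedOnStyleNameElement.Val.Value : null;

      if (basedOnStyleName != null)
        yield return basedOnStyleName;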

      localStyleId = basedOn.Val.Value;
    }
  }
  
  static int[] SearchInDocument(WordprocessingDocument doc,
    IEnumerable<string> styleSearchString,
    IEnumerable<string> contentSearchString,
    bool isRegularExpression, bool caseInsensitive)
  {
    RegexOptions options;
    Regex[] regularExpressions = null;
    if (isRegularExpression && contentSearchString != null)
    {
      if (caseInsensitive)
        options = RegexOptions.IgnoreCase | RegexOptions.Compiled;
      else
        options = RegexOptions.Compiled;
      regularExpressions = contentSearchString
        .Select(s => new Regex(s, options)).ToArray();
    }

    string[] contentSearchStringToUse = null;
    if (contentSearchString != null)
    {
      if (!isRegularExpression && caseInsensitive)
        contentSearchStringToUse =
          contentSearchString.Select(s => s.ToLower()).ToArray();
      else
        contentSearchStringToUse = contentSearchString.ToArray();
    }

    var defaultStyleName = doc
      .MainDocumentPart
      .StyleDefinitionsPart
      .Styles
      .Elements<Style>()
      .Where(style => style.Type == StyleValues.Paragraph &&
        style.Default == BooleanValues.One)
      .First()
      .StyleId.Value;

    var q1 = doc
      .MainDocumentPart
      .Document
      .Body
      .Elements()
      .Select((p, i) =>
      {
        var styleNode = p
          .Descendants<ParagraphStyleId>()
          .FirstOrDefault();
        var styleName = styleNode != null ?
          styleNode.Val.Value :
          defaultStyleName;
        return new
        {
          Element = p,
          Index = i,
          StyleName = styleName
        };
      }
      );

    var q2 = q1
      .Select(i =>
      {
        string text = null;
        if (i.Element is Paragraph)
          text = i.Element
            .Descendants<Text>()
            .Where(z => z.Parent is Run || z.Parent is InsertedRun)
            .StringConcatenate(element => element.Text);
        else
          text = i.Element
            .Descendants<Paragraph>()
            .StringConcatenate(p => p
              .Descendants<Text>()
              .Where(z => z.Parent is Run || z.Parent is InsertedRun)
              .StringConcatenate(element => element.Text),
              Environment.NewLine);

        return new
        {
          Element = i.Element,
          StyleName = i.StyleName,
          Index = i.Index,
          Text = text
        };
      }
      );

    var q3 = q2
      .Select(i =>
        new
        {
          Element = i.Element,
          StyleName = i.StyleName,
          Index = i.Index,
          Text = i.Text,
          InheritedStyles =
            GetAllStyleIdsAndNames(doc, i.StyleName).Distinct()
        }
      );

    int[] q4 = null;
    if (styleSearchString != null)
      q4 = q3
        .Where(i => ContainsAnyStyles(
          i.InheritedStyles, styleSearchString))
        .Select(i => i.Index)
        .ToArray();

    int[] q5 = null;
    if (contentSearchStringToUse != null)
      q5 = q3
        .Where(i => ContainsAnyContent(
          i.Text, contentSearchStringToUse, regularExpressions,
          isRegularExpression, caseInsensitive))
        .Select(i => i.Index)
        .ToArray();

    int[] q6 = null;
    if (q4 != null && q5 != null)
      q6 = q4.Intersect(q5).ToArray();
    else
      q6 = q5 != null ? q5 : q4;

    return q6;
  }

#endif

  static int[] SearchInDocument(string filename,
    IEnumerable<string> styleSearchString,
    IEnumerable<string> contentSearchString,
    bool isRegularExpression, bool caseInsensitive)
  {
    using (WordprocessingDocument doc =
      WordprocessingDocument.Open(filename, false))

      return SearchInDocument(doc, styleSearchString,
        contentSearchString, isRegularExpression, caseInsensitive);
  }

  static int[] SearchInDocument(string filename,
    string styleSearchString, string contentSearchString,
    bool isRegularExpression, bool caseInsensitive)
  {
    return SearchInDocument(filename,
      styleSearchString != null ?
        new List<string>() { styleSearchString } : null,
      contentSearchString != null ?
        new List<string>() { contentSearchString } : null,
      isRegularExpression, caseInsensitive);
  }

  static void Main(string[] args)
  {
    string fileToSearch = @"C:\Temp\SearchContent.docx";

#if XLINQ
    Console.WriteLine("Using XLINQ");
    Console.WriteLine("-----------");
#else
    Console.WriteLine("Using Open XML Format SDK 2.0");
    Console.WriteLine("-----------------------------");
#endif

    Console.WriteLine("Test 1");
    int[] results1 = SearchInDocument(
      fileToSearch, new[] { "Normal" }, new[] { "h.*o", "aaa" },
      true, false);
    foreach (var i in results1) Console.WriteLine(i);
    Console.WriteLine(results1.SequenceEqual(new[] { 7, 10 }) ?
      "Passed" : "Failed");
    Console.WriteLine();

    Console.WriteLine("Test 2");
    int[] results2 = SearchInDocument(
      fileToSearch, new[] { "NotAStyle" }, new[] { "h.*o", "aaa" },
      true, false);
    foreach (var i in results2) Console.WriteLine(i);
    Console.WriteLine(results2.SequenceEqual(new int[] { }) ?
      "Passed" : "Failed");
    Console.WriteLine();

    Console.WriteLine("Test 3");
    int[] results3 = SearchInDocument(
      fileToSearch, new[] { "Heading1" }, null, true, false);
    foreach (var i in results3) Console.WriteLine(i);
    Console.WriteLine(results3.SequenceEqual(new int[] { 0 }) ?
      "Passed" : "Failed");
    Console.WriteLine();

    Console.WriteLine("Test 4");
    int[] results4 = SearchInDocument(
      fileToSearch, new[] { "Normal" }, new[] { "h.*o", "aaa" },
      true, true);
    foreach (var i in results4) Console.WriteLine(i);
    Console.WriteLine(
      results4.SequenceEqual(new int[] { 0, 6, 7, 8, 10 }) ?
      "Passed" : "Failed");
    Console.WriteLine();

    Console.WriteLine("Test 5");
    int[] results5 = SearchInDocument(
      fileToSearch, null, new[] { "hello", "aaa" }, false, false);
    foreach (var i in results5) Console.WriteLine(i);
    Console.WriteLine(
      results5.SequenceEqual(new int[] { 7, 10 }) ?
      "Passed" : "Failed");
    Console.WriteLine();

    Console.WriteLine("Test 6");
    int[] results6 = SearchInDocument(
      fileToSearch, null, new[] { "hello", "aaa" }, false, true);
    foreach (var i in results6) Console.WriteLine(i);
    Console.WriteLine(
      results6.SequenceEqual(new int[] { 0, 6, 7, 8, 10 }) ?
      "Passed" : "Failed");
    Console.WriteLine();

    Console.WriteLine("Test 7");
    int[] results7 = SearchInDocument(fileToSearch, "Heading1", "Aaa",
      false, false);
    foreach (var i in results7) Console.WriteLine(i);
    Console.WriteLine(results7.SequenceEqual(new int[] { 0 }) ?
      "Passed" : "Failed");
    Console.WriteLine();
  }
}

Press CTRL+F5 to build and run the solution in Visual Studio. The code opens SearchContent.docx, the test document in the C:\Temp folder, and performs a set of paragraph searches on it. To change the name or location of the test document, change the value of the fileToSearch variable that is defined in the Main method. By default, the sample code builds the version of the solution that uses the Open XML SDK 2.0 classes. To build the XLINQ version, uncomment the #define XLINQ statement at the beginning of the source file.

//#define XLINQ

To validate the search results, the code compares the paragraph indexes that each search returns to the indexes that are expected for the test document. The following figure shows the output that the code displays, including the indexes of the paragraphs that matched each search and a Passed or Failed value that denotes whether the expected results were returned.

Figure 6. Search results output
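
The Passed or Failed value depends on two steps that are easy to overlook in the listing: the style matches and the content matches are intersected when both criteria are supplied, and the result is then compared to the expected indexes in order. The following minimal sketch isolates those two steps. The ResultCheckSketch class and its Combine and Check methods are hypothetical names used for illustration, and the index values in the usage note are made up rather than taken from the test document.

```csharp
using System.Linq;

// Hypothetical helper for illustration; not part of the sample code.
public static class ResultCheckSketch
{
  // Mirrors the last step of SearchInDocument: if both searches ran,
  // keep only the indexes that appear in both result sets. Otherwise,
  // return whichever result set is available (possibly null).
  public static int[] Combine(int[] styleMatches, int[] contentMatches)
  {
    if (styleMatches != null && contentMatches != null)
      return styleMatches.Intersect(contentMatches).ToArray();
    return contentMatches != null ? contentMatches : styleMatches;
  }

  // Mirrors the check in Main. SequenceEqual compares element by
  // element, so the order of the indexes matters.
  public static string Check(int[] actual, int[] expected)
  {
    return actual.SequenceEqual(expected) ? "Passed" : "Failed";
  }
}
```

For example, Combine(new[] { 0, 3, 7, 10 }, new[] { 7, 10, 11 }) returns the indexes 7 and 10, and Check reports Passed when that result is compared to new[] { 7, 10 } but Failed when it is compared to new[] { 10, 7 }.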

Read It

This section uses code snippets from the Code It section to compare and contrast code that uses classes from the Open XML SDK 2.0 to search for paragraphs in Word as opposed to using XLINQ.

Note

For a thorough explanation of the functionality and the process that led to the development of the initial XLINQ queries, see Finding Paragraphs by Style Name or Content in an Open XML Word Processing Document from Eric White's blog.

Two methods differentiate the code in this visual how-to article from the code in Eric's blog: the GetAllStyleIdsAndNames method and the overload of the SearchInDocument method that accepts a WordprocessingDocument parameter. The rest of this section focuses on two representative changes to those methods that illustrate some of the differences between the two coding approaches.

The code fragments that follow show how the SearchInDocument method determines the name of the default style that is applied to paragraphs that do not have a specific style applied to them. Both coding styles get the main document part and then the style definitions part.

After that, the XLINQ code uses the GetXDocument extension method to stream the XML from the style definitions part into an XDocument instance. That way, it can query the part content for the style that is tagged as the default paragraph style.

In contrast, the code that uses the SDK classes shields you from the underlying WordprocessingML schema. It uses the StyleDefinitionsPart.Styles property and then queries for the default paragraph style by using the Style.Type and Style.Default properties, and the StyleValues and BooleanValues enumerations.

The following is the code that uses XLINQ.

var defaultStyleName = (string)doc
  .MainDocumentPart
  .StyleDefinitionsPart
  .GetXDocument()
  .Root
  .Elements(W.style)
  .Where(style =>
    (string)style.Attribute(W.type) == "paragraph" &&
    (string)style.Attribute(W._default) == "1")
  .First()
  .Attribute(W.styleId);

The following is the code that uses the Open XML SDK 2.0 classes.

var defaultStyleName = doc
  .MainDocumentPart
  .StyleDefinitionsPart
  .Styles
  .Elements<Style>()
  .Where(style => style.Type == StyleValues.Paragraph &&
    style.Default == BooleanValues.One)
  .First()
  .StyleId.Value;
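
To see what the XLINQ query is matching against, the following minimal sketch runs the same kind of query over a small, hand-written fragment of styles markup instead of a real style definitions part. The DefaultStyleSketch class, the SampleStylesXml fragment, and the FindDefaultParagraphStyleId method are assumptions made for illustration and are not part of the sample. Unlike the fragments above, the sketch uses FirstOrDefault, so markup without a default paragraph style yields null instead of an exception.

```csharp
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper for illustration; not part of the sample code.
public static class DefaultStyleSketch
{
  static readonly XNamespace w =
    "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

  // Hand-written stand-in for the contents of a style definitions part.
  public const string SampleStylesXml =
    "<w:styles xmlns:w='http://schemas.openxmlformats.org/" +
    "wordprocessingml/2006/main'>" +
    "<w:style w:type='paragraph' w:default='1' w:styleId='Normal'>" +
    "<w:name w:val='Normal'/></w:style>" +
    "<w:style w:type='paragraph' w:styleId='Heading1'>" +
    "<w:name w:val='heading 1'/><w:basedOn w:val='Normal'/></w:style>" +
    "<w:style w:type='character' w:default='1' " +
    "w:styleId='DefaultParagraphFont'>" +
    "<w:name w:val='Default Paragraph Font'/></w:style>" +
    "</w:styles>";

  // Returns the styleId of the default paragraph style, or null if the
  // markup does not declare one.
  public static string FindDefaultParagraphStyleId(string stylesXml)
  {
    return XDocument.Parse(stylesXml)
      .Root
      .Elements(w + "style")
      .Where(s => (string)s.Attribute(w + "type") == "paragraph" &&
        (string)s.Attribute(w + "default") == "1")
      .Select(s => (string)s.Attribute(w + "styleId"))
      .FirstOrDefault();
  }
}
```

Calling DefaultStyleSketch.FindDefaultParagraphStyleId(DefaultStyleSketch.SampleStylesXml) returns Normal. The character style that is also marked as a default is skipped because of the type test, which is the same filtering that the SDK version expresses with StyleValues.Paragraph.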

The next two code fragments are from the GetAllStyleIdsAndNames method and show how the code determines the base style for a given style. Knowing the base style lets you search for a base style and find paragraphs that are formatted with styles that derive from it. The XLINQ code uses a static class named W to manage the XName objects that it uses when querying for elements and attributes. The code in the second fragment uses the BasedOn class to find the element, which frees you from working with element and attribute names directly.

The following is the code that uses XLINQ.

var basedOn = (string)style
  .Elements(W.basedOn)
  .Attributes(W.val)
  .FirstOrDefault();

The following is the code that uses the Open XML SDK 2.0 classes.

var basedOn = style
  .Elements<BasedOn>()
  .FirstOrDefault();
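
The fragments above resolve a single basedOn link; GetAllStyleIdsAndNames repeats that lookup until it reaches a style that has no base style. The following minimal sketch shows that walk in isolation, using XLINQ over a hand-written fragment of styles markup. The BasedOnChainSketch class, the SampleStylesXml fragment (including the MyHeading style), and the GetStyleIdChain method are assumptions made for illustration. The sketch collects only style IDs, not style names, and it stops if an ID repeats so that malformed markup cannot cause an endless loop.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper for illustration; not part of the sample code.
public static class BasedOnChainSketch
{
  static readonly XNamespace w =
    "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

  // Hand-written stand-in for the contents of a style definitions part.
  public const string SampleStylesXml =
    "<w:styles xmlns:w='http://schemas.openxmlformats.org/" +
    "wordprocessingml/2006/main'>" +
    "<w:style w:type='paragraph' w:default='1' w:styleId='Normal'>" +
    "<w:name w:val='Normal'/></w:style>" +
    "<w:style w:type='paragraph' w:styleId='Heading1'>" +
    "<w:name w:val='heading 1'/><w:basedOn w:val='Normal'/></w:style>" +
    "<w:style w:type='paragraph' w:styleId='MyHeading'>" +
    "<w:name w:val='My Heading'/><w:basedOn w:val='Heading1'/></w:style>" +
    "</w:styles>";

  // Returns the given style ID followed by the IDs of the styles that it
  // is based on, nearest base style first.
  public static List<string> GetStyleIdChain(string stylesXml,
    string styleId)
  {
    XElement root = XDocument.Parse(stylesXml).Root;
    List<string> chain = new List<string>();
    string current = styleId;

    // Stop when there is no base style, or when an ID repeats.
    while (current != null && !chain.Contains(current))
    {
      chain.Add(current);
      string idToFind = current;
      current = root
        .Elements(w + "style")
        .Where(s => (string)s.Attribute(w + "type") == "paragraph" &&
          (string)s.Attribute(w + "styleId") == idToFind)
        .Elements(w + "basedOn")
        .Attributes(w + "val")
        .Select(a => (string)a)
        .FirstOrDefault();
    }
    return chain;
  }
}
```

With the sample markup, GetStyleIdChain(BasedOnChainSketch.SampleStylesXml, "MyHeading") returns MyHeading, Heading1, and Normal, in that order. A search for the Normal style would therefore also match a paragraph formatted as MyHeading, which is the behavior that the style search in the sample relies on.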

Explore It