Creating Valid Open XML Documents by Using the Validation Tools in the Open XML Format SDK

Summary: Learn how to use the validation functionality built into the Open XML Format SDK 2.0 to create Open XML files for the 2007 Microsoft Office system that comply with the Open XML schema and standard. (7 printed pages)

Tarik Nesh-Nash, Microsoft Corporation

April 2009

Applies to: 2007 Microsoft Office system

Contents

  • Introduction to the Problem and the Solution

  • Creating a Validation Helper Routine

  • Creating a Table in a Document

  • Adding a Cell to the Table

  • Validating Documents

  • Conclusion

  • Additional Resources

Introduction to the Problem and the Solution

One challenge that Open XML developers face is how to create Open XML files that comply fully with the Open XML schema and standard, and that are also interoperable with Office client applications such as Microsoft Office Word 2007.

With the release in April 2009 of the second Open XML Format SDK 2.0 Community Technology Preview (CTP), developers can use the validation functionality built into the SDK to create those fully compliant Open XML files. Although the SDK's validation feature does not catch all validation errors, it is nevertheless a powerful developer tool.

This article describes a step-by-step process that you can use to create and modify your Open XML documents. The particular problem that the article addresses is how to create and modify a table in a new Open XML document, but you can apply the principles and constraints that you learn in this article to all of your Open XML documents.

Important

Before you read this article, be sure that you are already familiar with the Open XML documentation and with the Open XML Format SDK tools, particularly ClassExplorer.

Creating a Validation Helper Routine

The following code example uses a simple routine to check whether a document is valid. It uses the two main validation classes in the Open XML Format SDK:

  • OpenXmlValidator: Specifies the validation entry point; contains a Validate method that can validate a document, package, part, or element.

  • ValidationErrorInfo: Stores information about a particular validation error. The information includes the error description and the location of the error (the Node, Part, and Path properties).

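// Requires: using System.Diagnostics; using System.Linq;
// using DocumentFormat.OpenXml.Packaging; using DocumentFormat.OpenXml.Validation;
public static bool IsDocumentValid(WordprocessingDocument mydoc)
{
    OpenXmlValidator validator = new OpenXmlValidator();
    // Copy the results to a list so that the sequence is enumerated only once.
    var errors = validator.Validate(mydoc).ToList();
    // Write each error description on its own line in the debug output.
    foreach (ValidationErrorInfo error in errors)
        Debug.WriteLine(error.Description);
    return errors.Count == 0;
}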

This routine returns false if the SDK validation discovers any error, and prints the error description in the debug console. You could also set a breakpoint and use the Watch window to get more information from the runtime ValidationErrorInfo.
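The details that you would otherwise inspect in the Watch window can also be written to the debug output. The following is a minimal sketch, assuming the ValidationErrorInfo members available in the released Open XML SDK 2.0 (Description, ErrorType, and Node); member names in a CTP build may differ. The class name ValidationReport and the method name DescribeErrors are hypothetical names chosen for illustration.

```csharp
using System.Collections.Generic;
using System.Diagnostics;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Validation;

public static class ValidationReport
{
    // Hypothetical helper: returns one formatted line per validation error
    // and also writes each line to the debug output.
    public static List<string> DescribeErrors(WordprocessingDocument doc)
    {
        var lines = new List<string>();
        OpenXmlValidator validator = new OpenXmlValidator();

        foreach (ValidationErrorInfo error in validator.Validate(doc))
        {
            // Node can be null for some errors, so guard before reading its type.
            string node = error.Node != null ? error.Node.GetType().Name : "(none)";

            string line = string.Format("{0}: {1} [node={2}]",
                error.ErrorType, error.Description, node);

            Debug.WriteLine(line);
            lines.Add(line);
        }

        return lines;
    }
}
```

Returning the lines, rather than only a Boolean, lets the caller log or display the errors; an empty list means that the validator found nothing to report.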

Creating a Table in a Document

Suppose that you decide to start writing code right away with the idea that you can learn progressively from experience. To begin, you try to create a table in a document by writing the following code.

public static void LearnInsertTable(string file)
{
    using (WordprocessingDocument myDoc =
            WordprocessingDocument.Create(file,
            WordprocessingDocumentType.Document))
    {
            MainDocumentPart mainPart = myDoc.AddMainDocumentPart();
            mainPart.Document = new Document();
            mainPart.Document.Append(new Table());

            Debug.Assert(IsDocumentValid(myDoc), "Invalid File!");
    }
}

This validation fails and generates the following information:

  • Description   The element has an invalid child element 'http://schemas.openxmlformats.org/wordprocessingml/2006/main:tbl'. List of possible elements expected: <http://schemas.openxmlformats.org/wordprocessingml/2006/main:background>.

  • Node   {DocumentFormat.OpenXml.Wordprocessing.Document}

In other words, you cannot append Table directly to Document.

To learn what you must do, open ClassExplorer and investigate Document and Table. It turns out that Document requires a Body child element, and that Table must be appended to that Body. Update your code accordingly.

public static void LearnInsertTable(string file)
{
    using (WordprocessingDocument myDoc =
            WordprocessingDocument.Create(file,
            WordprocessingDocumentType.Document))
    {
            MainDocumentPart mainPart = myDoc.AddMainDocumentPart();
            mainPart.Document = new Document();
            Body body = new Body();
            body.Append(new Table());
            mainPart.Document.Append(body);
            Debug.Assert(IsDocumentValid(myDoc), "Invalid File!");
    }
}

When you run the code again, the validation still fails, this time with the following information.

  • Description   The element has incomplete content. List of possible elements expected: <http://schemas.openxmlformats.org/wordprocessingml/2006/main:tblPr>.

  • Node   {DocumentFormat.OpenXml.Wordprocessing.Table}

In short, the Table is missing content; you must add TableProperties (tblPr) to the code.

Delete the following code.

            body.Append(new Table());

Replace it with this code.

            Table table = new Table();
            table.Append(new TableProperties());
            body.Append(table);

When you run the code, you get another validation error, which tells you that a TableGrid (tblGrid) element must follow the TableProperties element, so you modify the code as follows.

            Table table = new Table();
            table.Append(new TableProperties());
            table.Append(new TableGrid());
            body.Append(table);

Now the application runs successfully, and the generated document opens correctly in Office Word 2007.

Adding a Cell to the Table

The generated document has a single blank table cell at the top of the page.

The following code shows how you might begin to add a cell that contains the text "Hello!".

          table.Append(new TableCell(new Text("Hello!")));

As you might expect, when you try to run the code, the validation fails.

  • Description   The element has an invalid child element 'http://schemas.openxmlformats.org/wordprocessingml/2006/main:tc'.

  • Node   {DocumentFormat.OpenXml.Wordprocessing.Table}

In other words, you cannot insert a cell directly inside a table.

When you open ClassExplorer and read the documentation about Table, you discover that you must first add a row, and then insert a cell inside that row.

          table.Append(new TableRow(new TableCell(new Text("Hello!"))));

The code still fails, but this time the validation fails because the child element of TableCell is invalid. When you read the documentation about TableCell in ClassExplorer, you discover that it requires a Paragraph.

You update the code for your cell as follows.

table.Append(
    new TableRow(
        new TableCell(
            new Paragraph(new Run(new Text("Hello!"))))));

The application runs successfully, and the document opens in Word.

The code for your table and cells now looks like the following.

public static void LearnInsertTable(string file)
{
    using (WordprocessingDocument myDoc =
        WordprocessingDocument.Create(file, WordprocessingDocumentType.Document))
    {
        MainDocumentPart mainPart = myDoc.AddMainDocumentPart();
        mainPart.Document = new Document();
        Body body = new Body();
        mainPart.Document.Body = body;
        
        Table table = new Table();
        table.Append(new TableProperties());
        table.Append(new TableGrid());
        table.Append(new TableRow(new TableCell(new Paragraph(new Run(new Text("Hello!"))))));
        body.Append(table);
        myDoc.MainDocumentPart.Document.Save();

        Debug.Assert(IsDocumentValid(myDoc), "Invalid File!");
    }
}

The following example shows the complete code, including the validation helper routine, that creates the document, table, and cell.

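// Requires: using System.Diagnostics; using System.Linq;
// using DocumentFormat.OpenXml.Packaging; using DocumentFormat.OpenXml.Validation;
public static bool IsDocumentValid(WordprocessingDocument mydoc)
{
    OpenXmlValidator validator = new OpenXmlValidator();
    // Copy the results to a list so that the sequence is enumerated only once.
    var errors = validator.Validate(mydoc).ToList();
    // Write each error description on its own line in the debug output.
    foreach (ValidationErrorInfo error in errors)
        Debug.WriteLine(error.Description);
    return errors.Count == 0;
}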

public static void LearnInsertTable(string file)
{
    using (WordprocessingDocument myDoc =
        WordprocessingDocument.Create(file, WordprocessingDocumentType.Document))
    {
        MainDocumentPart mainPart = myDoc.AddMainDocumentPart();
        mainPart.Document = new Document();
        Body body = new Body();
        mainPart.Document.Body = body;
        
        Table table = new Table();
        
        table.Append(new TableProperties());
        table.Append(new TableGrid());
        table.Append(new TableRow(new TableCell(new Paragraph(new Run(new Text("Hello!"))))));
        body.Append(table);
        myDoc.MainDocumentPart.Document.Save();

        Debug.Assert(IsDocumentValid(myDoc), "Invalid File!");
    }
}

Validating Documents

Sometimes you must debug invalid or corrupted documents without access to the code that created them. You can use the Part and Path properties of the ValidationErrorInfo objects that the Open XML Format SDK validator returns to help you locate the problem.

For the sake of example, suppose that you start with the following Document.xml.

<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
    <w:tbl/>
</w:document>

When you attempt to validate this document, it generates the following error.

  • Description   The element has invalid child element 'http://schemas.openxmlformats.org/wordprocessingml/2006/main:tbl'. List of possible elements expected: <http://schemas.openxmlformats.org/wordprocessingml/2006/main:background>.

  • Node   {DocumentFormat.OpenXml.Wordprocessing.Document}

  • Part   {DocumentFormat.OpenXml.Packaging.MainDocumentPart}

  • Path   /w:document[1]

The Part property identifies the part in which the validation error occurs; in this case, it is the MainDocumentPart.

The Path property returns the XPath of the node where the error exists. In this case, it is the first instance of the Document node.
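To read the same Part and Path values from a file on disk, you can open the package read-only and run the validator on it. The following is a minimal sketch, assuming the released Open XML SDK 2.0 API, in which Path exposes an XPath string and Part exposes a Uri; the class name PackageChecker and the method name PrintErrorLocations are hypothetical names chosen for illustration.

```csharp
using System;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Validation;

public static class PackageChecker
{
    // Hypothetical helper: prints where each validation error occurs
    // and returns the number of errors found.
    public static int PrintErrorLocations(string file)
    {
        int count = 0;

        // Open read-only (second argument is false) so the file is not modified.
        using (WordprocessingDocument doc = WordprocessingDocument.Open(file, false))
        {
            OpenXmlValidator validator = new OpenXmlValidator();

            foreach (ValidationErrorInfo error in validator.Validate(doc))
            {
                count++;

                // Part and Path can be null for some errors, so guard each one.
                Console.WriteLine("Part: {0}",
                    error.Part != null ? error.Part.Uri.ToString() : "(none)");
                Console.WriteLine("Path: {0}",
                    error.Path != null ? error.Path.XPath : "(none)");
                Console.WriteLine("Description: {0}", error.Description);
            }
        }

        return count;
    }
}
```

A return value of zero means that the validator reported no errors for the file; as noted earlier, that does not guarantee that the document is free of every possible problem.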

Using that information, you can determine that tbl is an invalid child of Document. From there, you can follow a process similar to the one described earlier in this article, and change the Open XML file accordingly.

Ultimately, the correct part is as follows.

<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:body>
    <w:tbl>
      <w:tblPr />
      <w:tblGrid />
      <w:tr>
        <w:tc>
          <w:p>
            <w:r>
              <w:t>Hello!</w:t>
            </w:r>
          </w:p>
        </w:tc>
      </w:tr>
    </w:tbl>
  </w:body>
</w:document>

Conclusion

You can use the Open XML Format SDK validation features to create and modify Open XML documents with very little knowledge about the Open XML file format.

Although the SDK validation might not catch every validation error in an Open XML document, it can be a powerful tool that helps you debug code and learn more about the Open XML format.

Additional Resources

For more information, see the following resources: