Export (0) Print
Expand All
Expand Minimize

Manipulating Excel 2007 and PowerPoint 2007 Files with the Open XML Format API (Part 1 of 2)

Office 2007

Summary: The Welcome to the Open XML Format SDK 1.0 is a library for accessing Open XML Format files. This article, the first in a series of two, describes the Open XML object model code that you can use to access and manipulate Microsoft Office Excel 2007 and Microsoft Office PowerPoint 2007 files. (12 printed pages)

The 2007 Microsoft Office system introduces new file formats that are based on XML called Open XML Formats. Microsoft Office Word 2007, Microsoft Office Excel 2007, and Microsoft Office PowerPoint 2007 all use these formats as the default file format. Open XML formats are useful because they are an open standard and are based on well-known technologies: ZIP and XML. Microsoft provides a library for accessing these files as part of the .NET Framework 3.0 technologies in the DocumentFormat.OpenXml namespace in the Welcome to the Open XML Format SDK 1.0. The Open XML Format members are contained in theDocumentFormat.OpenXml API and provide strongly-typed part classes to manipulate Open XML documents. The SDK simplifies the task of manipulating Open XML packages. The Open XML Format API encapsulates many common tasks that developers perform on Open XML Format packages, so you can perform complex operations with just a few lines of code.

NoteNote

You can find additional samples of manipulating Open XML Format files as well as references for each member contained in the Open XML object model in the Open XML Format SDK 1.0.

The Open XML Package Convention specification defines a set of XML files that contain the content and define the relationships for all of the parts stored in a single package. These packages combine the parts that make up the document files for the 2007 Microsoft Office programs that support the Open XML Format. The Open XML Format API discussed in this article allows you to create packages and manipulate the files that make up the packages.

In the following code, you remove document comments for a specific user name.

public static void XLDeleteCommentsByUser(string fileName, string userName)
{
   const string commentsSchema = "http://schemas.openxmlformats.org/spreadsheetml/2006/main";

   using (SpreadsheetDocument doc = SpreadsheetDocument.Open(fileName, true))
   {
      //  Manage namespaces to perform XML XPath queries.
      NameTable nt = new NameTable();
      XmlNamespaceManager nsManager = new XmlNamespaceManager(nt);
      nsManager.AddNamespace("sh", commentsSchema);

      foreach (WorksheetPart part in doc.WorkbookPart.WorksheetParts)
      {
         if (part.WorksheetCommentsPart != null)
         {
            XmlDocument commentDoc = new XmlDocument();
            commentDoc.Load(part.WorksheetCommentsPart.GetStream());
            string searchString = string.Format("//sh:authors/sh:author[text()='{0}']", userName);
            XmlNode node = commentDoc.SelectSingleNode(searchString, nsManager);
            if (node != null)
            {
                XmlNodeList nodes = node.SelectNodes("preceding-sibling::sh:author", nsManager);
                int authorID = nodes.Count;

                XmlNode commentListNode = commentDoc.SelectSingleNode("sh:comments/sh:commentList", nsManager);
                if (commentListNode != null)
                {
                   searchString = string.Format("./sh:comment[@authorId='{0}']", authorID);
                   XmlNodeList comments = commentListNode.SelectNodes(searchString, nsManager);
                   foreach (System.Xml.XmlNode commentNode in comments)
                   {
                      commentListNode.RemoveChild(commentNode);
                   }
                }
            }
            //  Save the comment XML back to its part.
       commentDoc.Save(part.WorksheetCommentsPart.GetStream(FileMode.Create));
         }
      }
   }
}

In this procedure, you pass in two parameters: the full path to the file and the user name of the person whose comments you want to delete. Then you open the input file as an Open XML package by using the Open method of the SpreadsheetDocument object. Next, you set up a namespace manager by using the XmlNamespaceManager object and by setting a reference to the default SpreadsheetML namespace with the sh qualifier.

The procedure then checks for the existence of the comments part in the package. The contents of the comments part are loaded into a memory-resident XML document. Then you search the workbook for nodes by a specific user name by using the XPath expression sh:authors/sh:author. If the user name is found, you then find the ordinal value of the user name. Then you retrieve the index of the selected node. To do that, you count the number of preceding nodes by retrieving a reference to those nodes.

Finally, the updated comments markup stream is saved back to the comments part.

In the following code, you delete a worksheet from a spreadsheet document.

public static bool XLDeleteSheet(string fileName, string sheetToDelete)
{
   bool returnValue = false;
   using (SpreadsheetDocument xlDoc = SpreadsheetDocument.Open(fileName, true))
   {
      XmlDocument doc = new XmlDocument();
      doc.Load(xlDoc.WorkbookPart.GetStream());

      XmlNamespaceManager nsManager = new XmlNamespaceManager(doc.NameTable);
      nsManager.AddNamespace("d", doc.DocumentElement.NamespaceURI);

      string searchString = string.Format("//d:sheet[@name='{0}']", sheetToDelete);
      XmlNode node = doc.SelectSingleNode(searchString, nsManager);
      if (node != null)
      {
         XmlAttribute relationAttribute = node.Attributes["r:id"];
         if (relationAttribute != null)
         {
            string relId = relationAttribute.Value;
            xlDoc.WorkbookPart.DeletePart(relId);
            node.ParentNode.RemoveChild(node);
            doc.Save(xlDoc.WorkbookPart.GetStream(FileMode.Create));
            returnValue = true;
          }
      }
   }
   return returnValue;
}

In this procedure, you pass in two parameters: the full path to the workbook and the name of the worksheet to delete. Then you open the input file as an Open XML package by using the Open method of the SpreadsheetDocument object. Next, you load the contents of the workbook into an XML DOM document. You then set up a namespace manager by using the XmlNamespaceManager object and by setting a reference to the default SpreadsheetML namespace with the d qualifier. Then you search the document for the specified worksheet by using the name attribute of the //d:sheet node.

For all of the matching nodes (if any exist), the relationship Id is retrieved and the worksheet corresponding to that Id is deleted.

And finally, the updated SpreadsheetML markup is saved back to the main workbook part.

Given a workbook and a worksheet name, the following code returns either a list of hidden row numbers, or a list of hidden column numbers.

public static List<int> XLDetectHiddenRowsOrCols(string fileName, string sheetName, bool detectRows)
{
   const string worksheetSchema = "http://schemas.openxmlformats.org/spreadsheetml/2006/main";

   //  Set up the return value list.
   List<int> itemList = new System.Collections.Generic.List<int>();

   using (SpreadsheetDocument xlDoc = SpreadsheetDocument.Open(fileName, false))
   {
      //  Load the contents of the workbook.
      XmlDocument doc = new XmlDocument();
      doc.Load(xlDoc.WorkbookPart.GetStream());

      //  Create a namespace manager, so you can search.
      //  Add a prefix for the default namespace.
      XmlNamespaceManager nsManager = new XmlNamespaceManager(doc.NameTable);
      nsManager.AddNamespace("d", worksheetSchema);

      string searchString = string.Format("//d:sheet[@name='{0}']", sheetName);
      XmlNode sheetNode = doc.SelectSingleNode(searchString, nsManager);
      if (sheetNode != null)
      {
          //  Get the relId attribute.
          XmlAttribute relationAttribute = sheetNode.Attributes["r:id"];
          if (relationAttribute != null)
          {
            string relId = relationAttribute.Value;

            WorksheetPart sheetPart = (WorksheetPart)xlDoc.WorkbookPart.GetPartById(relId);
            //  First, get the relationship between the document and the sheet.

            //  Load the contents of the workbook.
            XmlDocument sheetDoc = new XmlDocument();
            sheetDoc.Load(sheetPart.GetStream(FileMode.Open));

            if (detectRows)
            {
                //  Retrieve the list of hidden rows.
                foreach (System.Xml.XmlNode node in sheetDoc.SelectNodes("//d:row[@hidden='1']", nsManager))
                {
                  //  For each hidden row, add information to the output list.
                  XmlAttribute rowAttr = node.Attributes["r"];
                  if (rowAttr != null)
                  {
                      itemList.Add(Convert.ToInt32(rowAttr.Value));
                  }
                }
            }
            else
            {
               //  Retrieve the list of hidden columns.
               foreach (System.Xml.XmlNode node in sheetDoc.SelectNodes("//d:cols/d:col", nsManager))
               {
                  XmlAttribute hiddenAttr = node.Attributes["hidden"];
                  if (hiddenAttr != null)
                  {
                      if (hiddenAttr.Value == "1")
                      {
                          //  Get the range of columns that are hidden.
                          XmlAttribute minAttr = node.Attributes["min"];
                          XmlAttribute maxAttr = node.Attributes["max"];
                          if (minAttr != null && maxAttr != null)
                          {
                             //  Finally, add the range of values to the list.
                             int minValue = Convert.ToInt32(minAttr.Value);
                             int maxValue = Convert.ToInt32(maxAttr.Value);
                             for (int i = minValue; i <= maxValue; i++)
                             {
                                itemList.Add(i);
                             }
                          }
                      }
                   }
                }
             }
          }
       }
   }
   return itemList;
}

In this procedure, you pass in three parameters: the full path to the Excel 2007 workbook, the name of the worksheet, and, a Boolean value specifying whether to search for hidden rows. If detectRows is true, the procedure returns a list of hidden rows. If detectRows is false, then the list of hidden columns is returned. Then you open the input file as an Open XML package by using the Open method of the SpreadsheetDocument object. Next, you load the contents of the workbook into an XML DOM document. You then set up a namespace manager by using the XmlNamespaceManager object and by setting a reference to the default SpreadsheetML namespace, with the d qualifier. Then you search the document for the node representing the specified worksheet by using the name attribute of the //d:sheet node.

If the matching node is found, you get the relationship between the workbook and the worksheet. First, load the contents of the worksheet. If detectRows is true, then retrieve the list of hidden rows by using the hidden attribute of the //d:row node.

NoteNote

Rows are numbered starting with 1. Columns are numbered starting with zero (0).

If detectRows is false then it is assumed that you want to search for hidden columns, which you do by using the hidden attribute of the //d:cols/d:col node.

And finally, the list of items is returned to the calling procedure.

Given an Excel 2007 workbook and the title of a chart, create a Word 2007 file and export the chart.

public static void XLExportChart(string inputFileName, string outputFileName, string chartTitle)
{
   const string drawingMLSchema = "http://schemas.openxmlformats.org/drawingml/2006/main";
   const string chartMLSchema = "http://schemas.openxmlformats.org/drawingml/2006/chart";

   bool chartFound = false;

   using (SpreadsheetDocument xlDoc = SpreadsheetDocument.Open(inputFileName, false))
   {
      foreach (WorksheetPart sheetPart in xlDoc.WorkbookPart.WorksheetParts)
      {
          if (sheetPart.DrawingsPart == null)
            continue;
          //  Loop through all the parts related to the worksheet.
          foreach (ChartPart chartPart in sheetPart.DrawingsPart.ChartParts)
          {
              //  A chart is found. Does it have the correct title?      

              //  Create a namespace manager, so you can search.
              NameTable nt = new NameTable();
              XmlNamespaceManager nsManager = new XmlNamespaceManager(nt);
              nsManager.AddNamespace("c", chartMLSchema);
              nsManager.AddNamespace("a", drawingMLSchema);

              //  Load the chart document from the part's stream, and 
              //  search for the requested title.
              XmlDocument chartDoc = new XmlDocument(nt);
              chartDoc.Load(chartPart.GetStream());

              XmlNode titleNode = chartDoc.SelectSingleNode("//c:chart//c:title//a:t", nsManager);
              if (titleNode != null)
              {
                 if (string.Compare(titleNode.InnerText, chartTitle, true) == 0)
                 {
                    using (WordprocessingDocument newDoc = WordprocessingDocument.Create(outputFileName, WordprocessingDocumentType.Document))
                    {
                        newDoc.AddMainDocumentPart();
                        newDoc.MainDocumentPart.AddPart<ChartPart>(chartPart);
                    }

                    //  Tell the outer loops that you are finished.
                    chartFound = true;
                 }
            }
          }
          //  You need to export only a single chart, so get out now.
          if (chartFound)
          {
            break;
          }
      }
   }
}

In this procedure, you pass in three parameters: the full path to the workbook, the full path of the output file, and the name of the chart.

NoteNote

In this procedure, the Word 2007 document is used as temporary storage for the chart part and, as such, none of the relationships are defined. Thus, the document cannot be opened in Word. You could substitute the code in this procedure with code that correctly adds a part to the document package. For a sample of this type of code, see the Open XML Format SDK 1.0.

First you open the input file as an Open XML package by using the Open method of the SpreadsheetDocument object. Next, you set up a namespace manager by using the XmlNamespaceManager object and by setting a reference to the default chartML namespace with the c qualifier and to the drawingML namespace with the a qualifier.

Then the procedure loops through each chartPart part searching for the chart title by using the //c:chart//c:title//a:t XPath expression. If it is found, a Word 2007 document is created and the chartPart part is exported to the package.

In the following code example, you create a list of all of the hidden worksheets in a workbook.

public static List<string> XLGetHiddenSheets(string fileName)
{
   List<string> sheets = new List<string>();

   using (SpreadsheetDocument xlDoc = SpreadsheetDocument.Open(fileName, false))
   {
      //  Get the main document part (workbook.xml).
      XmlDocument doc = new XmlDocument();
      doc.Load(xlDoc.WorkbookPart.GetStream());

      //  Create a NamespaceManager to handle the default namespace, 
      //  and create a prefix for the default namespace.
      XmlNamespaceManager nsManager = new XmlNamespaceManager(doc.NameTable);
      nsManager.AddNamespace("default", doc.DocumentElement.NamespaceURI);

      foreach (System.Xml.XmlNode node in doc.SelectNodes("//default:sheets//default:sheet[@state='hidden']", nsManager))
      {
         string sheetName = node.Attributes["name"].Value;
         sheets.Add(sheetName);
      }
   }
   return sheets;
}

In this procedure, you pass in one parameter: the full path to the workbook. Then you open the input file as an Open XML package by using the Open method of the SpreadsheetDocument object and load the data into an XML document. Next, you set up a namespace manager by using the XmlNamespaceManager object and by setting a reference to the default SpreadsheetML namespace to search for the hidden worksheets.

Then you search the document for hidden worksheets by using the state attribute of the //default:sheets//default:sheet node. The name of each hidden worksheet is added to the sheets list. After all of the worksheets are searched, the sheets list is returned to the calling procedure.

In the following code, you retrieve the value of a cell in a worksheet.

public static string XLGetCellValue(string fileName, string sheetName, string addressName)
{
   const string worksheetSchema = "http://schemas.openxmlformats.org/spreadsheetml/2006/main";
   const string sharedStringSchema = "http://schemas.openxmlformats.org/spreadsheetml/2006/main";

   string cellValue = null;

   //  Retrieve the stream containing the requested
   //  worksheet's info.
   using (SpreadsheetDocument xlDoc = SpreadsheetDocument.Open(fileName, false))
   {
      //  Get the main document part (workbook.xml).
      XmlDocument doc = new XmlDocument();
      doc.Load(xlDoc.WorkbookPart.GetStream());

      //  Create a namespace manager, so you can search.
      //  Add a prefix (d) for the default namespace.
      NameTable nt = new NameTable();
      XmlNamespaceManager nsManager = new XmlNamespaceManager(nt);
      nsManager.AddNamespace("d", worksheetSchema);
      nsManager.AddNamespace("s", sharedStringSchema);

      string searchString = string.Format("//d:sheet[@name='{0}']", sheetName);
      XmlNode sheetNode = doc.SelectSingleNode(searchString, nsManager);
      if (sheetNode != null)
      {
         //  Get the relId attribute.
          XmlAttribute relationAttribute = sheetNode.Attributes["r:id"];
         if (relationAttribute != null)
         {
            string relId = relationAttribute.Value;
            //  Load the contents of the workbook.
            XmlDocument sheetDoc = new XmlDocument(nt);
            sheetDoc.Load(xlDoc.WorkbookPart.GetPartById(relId).GetStream());

            XmlNode cellNode = sheetDoc.SelectSingleNode(string.Format("//d:sheetData/d:row/d:c[@r='{0}']", addressName), nsManager);
            if (cellNode != null)
            {
               XmlAttribute typeAttr = cellNode.Attributes["t"];
               string cellType = string.Empty;
               if (typeAttr != null)
               {
                  cellType = typeAttr.Value;
               }

               XmlNode valueNode = cellNode.SelectSingleNode("d:v", nsManager);
               if (valueNode != null)
               {
                  cellValue = valueNode.InnerText;
               }
               if (cellType == "b")
               {
                  if (cellValue == "1")
                  {
                     cellValue = "TRUE";
                  }
                  else
                  {
                     cellValue = "FALSE";
                  }
               }
               else if (cellType == "s")
               {
                   if (xlDoc.WorkbookPart.SharedStringTablePart != null)
                   {
                      XmlDocument stringDoc = new XmlDocument(nt);
                      stringDoc.Load(xlDoc.WorkbookPart.SharedStringTablePart.GetStream());
                      //  Add the string schema to the namespace manager.
                      nsManager.AddNamespace("s", sharedStringSchema);

                      int requestedString = Convert.ToInt32(cellValue);
                      string strSearch = string.Format("//s:sst/s:si[{0}]", requestedString + 1);
                      XmlNode stringNode = stringDoc.SelectSingleNode(strSearch, nsManager);
                      if (stringNode != null)
                      {
                          cellValue = stringNode.InnerText;
                      }
                   }
                }
            }
         }
       }
   }
   return cellValue;
}

In this procedure, you pass in three parameters: the full path to the workbook, the name of the worksheet, and the address of the cell containing the value you want to retrieve. Then you open the input file as an Open XML package by using the Open method of the SpreadsheetDocument object and load the data into an XML document. Next, you set up a namespace manager by using the XmlNamespaceManager object and by setting a reference to the default worksheetSchema namespace with d the qualifier and the sharedStringSchema with the s qualifier. The sharedStringSchema namespace refers to the SharedStringTablePart part, which contains strings that are shared across cells.

Next, you retrieve the node representing the specified worksheet in the main workbook part by selecting the name attribute of the //d:sheet node. If the node is found, the relationship Id of the worksheet is retrieved and used to load the worksheet into an XML document. Then you retrieve the value. In the node, if the t attribute contains an s, this indicates that this is a shared string and must be looked up in the SharedStringTablePart part. Otherwise, the value can be retrieved directly from the node.

NoteNote

This code checks only for Boolean and String values individually.

Finally, the procedure returns either the cell value or a Boolean value specifying whether the search succeeded.

As this article demonstrates, working with Excel 2007 and PowerPoint 2007 files is much easier with the Welcome to the Open XML Format SDK 1.0. In part two of this series of articles (coming soon), I describe other common tasks that you can perform with Excel 2007 and PowerPoint 2007 files.

Show:
© 2014 Microsoft