Office Open XML Formats: Retrieving Lists of PowerPoint 2007 Slides
Summary: Learn how to retrieve lists of PowerPoint slides programmatically using code snippets for use with Visual Studio 2005.
Applies to: 2007 Microsoft Office System, Microsoft Office PowerPoint 2007, Microsoft Visual Studio 2005
Ken Getz, MCW Technologies, LLC
To help you get started, download a set of forty code snippets for Microsoft Visual Studio 2005, each of which demonstrates a technique for working with the Open XML file formats; see 2007 Office System Sample: Open XML File Format Code Snippets for Visual Studio 2005. After you install the code snippets, create a sample Microsoft Office PowerPoint presentation to test with. (For more information, see Read It.) Create a Windows Application project in Visual Studio 2005, open the code editor, right-click and select Insert Snippet, and then select the PowerPoint: Get List of Slide Titles snippet from the list of available 2007 Microsoft Office snippets. If you are using Microsoft Visual Basic, inserting the snippet adds a reference to WindowsBase.dll and the Imports statements that the code requires.
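The Imports statements themselves do not appear in this text. Judging from the types the snippet uses (Package and PackagePart, FileMode and FileAccess, XmlDocument), they are presumably the following; treat this list as an assumption rather than the snippet's exact output:

```vb
' Assumed imports, inferred from the types used in the snippet.
Imports System.IO            ' FileMode, FileAccess
Imports System.IO.Packaging  ' Package, PackagePart, PackageRelationship, PackUriHelper
Imports System.Xml           ' XmlDocument, XmlNamespaceManager, XmlNode
```

List(Of String) lives in System.Collections.Generic, which Visual Basic projects normally import at the project level.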
If you use Microsoft Visual C#, you need to add the reference to the WindowsBase.dll assembly and the corresponding using statements yourself so that you can compile the code. (Code snippets in C# cannot set references and insert using statements for you.) If the WindowsBase.dll reference does not appear on the .NET tab of the Add Reference dialog box, click the Browse tab, locate the C:\Program Files\Reference Assemblies\Microsoft\Framework\v3.0 folder, and then click WindowsBase.dll.
The PPTGetSlideTitles snippet delves programmatically into the various document parts and relationships between the parts to retrieve a list of slide titles. To test it out, store your sample presentation somewhere easy to find (for example, C:\Demo.pptx). In a Windows application, insert the PPTGetSlideTitles snippet, and then call it using the sample below. You see a list of slide titles in the Output window.
Dim titles As List(Of String) = PPTGetSlideTitles("C:\demo.pptx")
For Each title As String In titles
    Debug.Print(title)
Next
The snippet code starts with the following block:
Public Function PPTGetSlideTitles( _
    ByVal fileName As String) As List(Of String)

    ' Return a generic list containing all
    ' the slide titles.

    Const documentRelationshipType As String = _
        "http://schemas.openxmlformats.org/officeDocument/2006/" & _
        "relationships/officeDocument"
    Const presentationmlNamespace As String = _
        "http://schemas.openxmlformats.org/" & _
        "presentationml/2006/main"

    ' Fill this collection with a list of all
    ' the titles of all the slides in the
    ' requested slide deck.
    Dim titles As New List(Of String)

    ' Next block goes here.

    Return titles
End Function
The code returns a generic List containing a string value for each slide in the document you specify. As with any other work with the Open XML File Formats, you want to use relationships between document parts to find the various parts you need. The code includes a constant, documentRelationshipType, that contains the fixed relationship type you need to find the document part within the PowerPoint package. The presentationmlNamespace constant contains the namespace you need when searching. The code declares a generic List to contain the results. At the end of the procedure, it returns that generic list.
Nearly every procedure that interacts with the Office Open XML File Formats needs to open a package, either for read-only, or for both reading and writing. In this exercise, you are only reading content from the file, so you can open the package in read-only mode. The next block of code does this for you:
Dim documentPart As PackagePart = Nothing
Dim documentUri As Uri = Nothing

Using pptPackage As Package = _
    Package.Open(fileName, FileMode.Open, FileAccess.Read)

    ' Next block goes here.

End Using
The code creates the pptPackage variable, using the System.IO.Packaging.Package type, and fills it by calling the Package.Open method, passing in the name of the file to open, the file mode, and the file access. When you are finished with the package, you must close it. The snippet does its work inside a Using block, which closes the package automatically when the block ends.
Every 2007 Office document contains a single document part, which acts as the start part. This document part contains the document itself. In just about every situation, the goal is to find that part first. The next code block finds the document's start part—the XML part representing the document content. It calls the Package.GetRelationshipsByType method, passing in the constant that contains the document relationship name (see Figure 2). The code then loops through all the returned relationships, and retrieves the document URI, relative to the root of the package. You must loop through the PackageRelationship objects to retrieve the one you want. In every case, this loop only executes once:
For Each relationship As PackageRelationship _
    In pptPackage.GetRelationshipsByType( _
    documentRelationshipType)
    documentUri = PackUriHelper.ResolvePartUri( _
        New Uri("/", UriKind.Relative), relationship.TargetUri)
    documentPart = pptPackage.GetPart(documentUri)
    Exit For
Next

' Next block goes here.
To search for the list of relationship IDs, the code starts by setting up an XmlNamespaceManager instance. The namespace manager includes a namespace abbreviated “p”, referring to a namespace named using the presentationmlNamespace constant, discussed above. Next, the code creates an XmlDocument instance, and loads the XML content from the document part into the new XML document. Finally, this code example calls the XmlDocument.SelectNodes method, passing in a query string to find the nodes shown in Figure 3. Note that the variable names and comments in the code in this snippet refer to sheets in many places, instead of slides. Clearly, copy and paste errors occurred in its creation.
' Manage namespaces to perform Xml XPath queries.
Dim nt As New NameTable()
Dim nsManager As New XmlNamespaceManager(nt)
nsManager.AddNamespace("p", presentationmlNamespace)

' Iterate through the slides and extract
' the title string from each.
Dim xDoc As New XmlDocument(nt)
xDoc.Load(documentPart.GetStream())

Dim sheetNodes As XmlNodeList = _
    xDoc.SelectNodes("//p:sldIdLst/p:sldId", nsManager)
If sheetNodes IsNot Nothing Then

    ' Next block goes here.

End If
The code next loops through each item in the node list and retrieves the r:id attribute of each item. This attribute provides the relationship ID the code needs to load the individual slides:
Dim relAttr As XmlAttribute = Nothing
Dim sheetRelationship As PackageRelationship = Nothing
Dim sheetPart As PackagePart = Nothing
Dim sheetUri As Uri = Nothing
Dim sheetDoc As XmlDocument = Nothing
Dim titleNode As XmlNode = Nothing

' Look at each sheet node, retrieving
' the relationship id.
For Each xNode As XmlNode In sheetNodes
    relAttr = xNode.Attributes("r:id")
    If relAttr IsNot Nothing Then

        ' Next block goes here.

    End If
Next
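The lookup xNode.Attributes("r:id") matches on the qualified name, so it depends on the document using the prefix r for the relationships namespace. As a hedged alternative, the sketch below looks the attribute up by local name and namespace URI instead. The module name, the function name GetSlideRelationshipIds, and the relationshipsNamespace constant are illustrative and are not part of the snippet; the function works on an XML string so that you can try it without a package.

```vb
Imports System.Collections.Generic
Imports System.Xml

Module SlideIdSketch
    Private Const presentationmlNamespace As String = _
        "http://schemas.openxmlformats.org/presentationml/2006/main"
    ' Assumption: the namespace that the r prefix is bound to in presentation.xml.
    Private Const relationshipsNamespace As String = _
        "http://schemas.openxmlformats.org/officeDocument/2006/relationships"

    ' Hypothetical helper: returns the relationship ID of every p:sldId element.
    Public Function GetSlideRelationshipIds( _
        ByVal presentationXml As String) As List(Of String)

        Dim ids As New List(Of String)
        Dim nt As New NameTable()
        Dim nsManager As New XmlNamespaceManager(nt)
        nsManager.AddNamespace("p", presentationmlNamespace)

        Dim xDoc As New XmlDocument(nt)
        xDoc.LoadXml(presentationXml)

        For Each slideNode As XmlNode In _
            xDoc.SelectNodes("//p:sldIdLst/p:sldId", nsManager)
            ' Look up the attribute by local name and namespace, not by prefix.
            Dim relAttr As XmlAttribute = _
                slideNode.Attributes("id", relationshipsNamespace)
            If relAttr IsNot Nothing Then ids.Add(relAttr.Value)
        Next
        Return ids
    End Function
End Module
```

Each p:sldId element also carries an unqualified id attribute; passing the namespace URI selects the relationship ID rather than that one.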
For each slide relationship, the code uses the PackagePart.GetRelationship method to retrieve the relationship corresponding to the specific ID (listed in Figure 4). For each relationship, the code resolves the URI it finds in the relationships part, and retrieves a reference to the individual slide part:
' Retrieve the PackageRelationship object
' for the sheet:
sheetRelationship = documentPart.GetRelationship(relAttr.Value)
If sheetRelationship IsNot Nothing Then
    sheetUri = PackUriHelper.ResolvePartUri( _
        documentUri, sheetRelationship.TargetUri)
    sheetPart = pptPackage.GetPart(sheetUri)
    If sheetPart IsNot Nothing Then

        ' Next block goes here.

    End If
End If
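To make the URI resolution step concrete, here is a small sketch that resolves a relationship target against the URI of the part that owns the relationship. The module and function names are illustrative, and the sample values in the usage note are assumptions modeled on the parts described in this article. The code requires a reference to WindowsBase.dll.

```vb
Imports System
Imports System.IO.Packaging

Module ResolveUriSketch
    ' Hypothetical helper: resolves a relative relationship target
    ' against the URI of the source part and returns the result as text.
    Public Function ResolveSlideUri( _
        ByVal sourcePartUri As String, _
        ByVal relationshipTarget As String) As String

        Dim source As New Uri(sourcePartUri, UriKind.Relative)
        Dim target As New Uri(relationshipTarget, UriKind.Relative)
        Return PackUriHelper.ResolvePartUri(source, target).ToString()
    End Function
End Module
```

For example, calling ResolveSlideUri("/ppt/presentation.xml", "slides/slide1.xml") resolves the target relative to the ppt folder, which is why the snippet passes documentUri rather than the package root when it resolves slide relationships.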
Finally, the code has a reference to the slide part. It loads a new XmlDocument instance with the XML content of the slide and searches that content for the element that represents the slide's title. If the search finds a matching node, the code adds to the list the InnerText property of a node several levels above the match. You may wonder why it reads the text from several levels up rather than from the title text itself. The reason is that if you use several different fonts or styles in the title, the text is broken up among multiple elements. By retrieving the inner text of a parent node, you are guaranteed to retrieve all the text. At this point, the code has retrieved the title of a single slide. It repeats this routine for each slide in the presentation.
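The snippet's final block is not reproduced in this text. The following is a minimal sketch of the technique just described, not the snippet's actual code. It assumes that a title shape is marked by a p:ph placeholder element whose type attribute is title or ctrTitle, and that the enclosing p:sp shape is three levels above that placeholder. The module name and the function name GetTitleFromSlideXml are illustrative, and the function takes the slide XML as a string so that it runs without a package.

```vb
Imports System.Xml

Module SlideTitleSketch
    Private Const presentationmlNamespace As String = _
        "http://schemas.openxmlformats.org/presentationml/2006/main"

    ' Hypothetical helper: returns the title text of one slide,
    ' or Nothing if the slide has no title placeholder.
    Public Function GetTitleFromSlideXml(ByVal slideXml As String) As String
        Dim nt As New NameTable()
        Dim nsManager As New XmlNamespaceManager(nt)
        nsManager.AddNamespace("p", presentationmlNamespace)

        Dim slideDoc As New XmlDocument(nt)
        slideDoc.LoadXml(slideXml)

        ' Assumed query: find the placeholder that marks a title shape.
        Dim titleNode As XmlNode = slideDoc.SelectSingleNode( _
            "//p:sp//p:ph[@type='title' or @type='ctrTitle']", nsManager)
        If titleNode Is Nothing Then
            Return Nothing
        End If

        ' Climb from p:ph to p:nvPr to p:nvSpPr to p:sp. The InnerText
        ' of the shape concatenates every text run in the title.
        Return titleNode.ParentNode.ParentNode.ParentNode.InnerText
    End Function
End Module
```

Inside the snippet, the equivalent step would load sheetDoc from sheetPart.GetStream() and add the resulting text to the titles list.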
Read It
It is important to understand the file structure of a simple PowerPoint document so that you can find the data you need. In this case, you want the title for each slide in the presentation. To do that, create a PowerPoint document with several slides in it, giving each slide a title. I named my document Demo.pptx; it contains four slides, as shown in Figure 1.
To investigate the contents of the document, follow these steps:
In Windows Explorer, rename the document, changing the extension to .zip. For example, Demo.pptx.zip.
Open the ZIP file, using either Windows Explorer or a ZIP application.
View the _rels\.rels file, shown in Figure 2. This document contains information about the relationships between the parts in the document. Note the value for the presentation.xml part, as highlighted in the figure—this information allows you to find specific parts.
Open ppt\presentation.xml, shown in Figure 3. The highlighted element, p:sldIdLst, contains one reference for each slide in the deck. The snippet you will investigate reads each of these slide references in order to retrieve the slide title.
Open ppt\_rels\presentation.xml.rels, as shown in Figure 4. This document contains information about the relationships between the document part and all the subsidiary parts. The code snippet uses this information to find each of the slides so that it can retrieve the title from the slide. Note, for example, that the slide whose relationship ID is rId2 refers to slides/slide1.xml.
Open ppt\slides\slide1.xml, as shown in Figure 5—this part contains the slide title. The code snippet uses XML-searching techniques to find this particular element within the XML content. The code repeats the actions for each slide in the presentation.
Close the tool you are using to review the presentation, and rename the file so that it has the .pptx extension again.
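If you prefer not to rename the file, you can inspect the same structure from code. The sketch below is an illustration rather than part of the snippet set. It lists the package-level relationships (the content of _rels\.rels) and the parts in the package. The module and function names are hypothetical, and the code requires a reference to WindowsBase.dll.

```vb
Imports System.Collections.Generic
Imports System.IO
Imports System.IO.Packaging

Module PackageInspectionSketch
    ' Hypothetical helper: describes the package-level relationships
    ' and the parts found in an Open XML package.
    Public Function ListPackageContents( _
        ByVal fileName As String) As List(Of String)

        Dim lines As New List(Of String)
        Using pkg As Package = _
            Package.Open(fileName, FileMode.Open, FileAccess.Read)

            ' Package-level relationships, as stored in _rels\.rels.
            For Each rel As PackageRelationship In pkg.GetRelationships()
                lines.Add("Relationship " & rel.Id & " (" & _
                    rel.RelationshipType & ") -> " & rel.TargetUri.ToString())
            Next

            ' Every part in the package, with its content type.
            For Each part As PackagePart In pkg.GetParts()
                lines.Add("Part " & part.Uri.ToString() & _
                    " [" & part.ContentType & "]")
            Next
        End Using
        Return lines
    End Function
End Module
```

You could call it from a Windows application as ListPackageContents("C:\Demo.pptx") and print each line with Debug.Print, in the same way as the slide titles.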