Assembling Documents on SharePoint 2010 Sites by Merging Content from Excel, PowerPoint, and Word

Office 2010

Summary: Combining data and images from various Microsoft Office programs to create a single document is a frequently requested scenario. Doing this from documents stored in SharePoint Foundation 2010 has several advantages. Learn to create Word documents in a document library by merging components from Excel, PowerPoint, and other Word documents. (25 printed pages)

Document assembly seems to be a hot topic these days especially when you combine it with the power of Microsoft SharePoint Server 2010. Prior to the 2007 Microsoft Office system, accessing text, tables, images, and other information to create documents in Microsoft Word, for example, relied on COM Automation to access each program contributing to the merge. Automation has its share of issues such as interruptions by pop-up dialog boxes, timing constraints, inability to scale out, and other limitations. Additionally, running Microsoft Office solutions on the server is not recommended or supported by Microsoft.

With the release of 2007 Microsoft Office system, the Open XML file formats present documents as a composite of parts and relationships. For example, there are image parts, document parts, and charts parts. This enables you to access and create documents and their component parts without using Automation. In addition, the Open XML SDK 2.0 for Microsoft Office enables you to create and manipulate documents, and expose data and other components programmatically, without using Microsoft Office Automation. And because Automation is no longer required to work with Microsoft Office documents, you can now perform merges and other actions directly on the server.

Likewise, SharePoint Server 2010 is a great a tool for storing, sharing, and controlling documents and providing functionality to work with documents. For example, the Document Center in SharePoint Server 2010 is an area where you can collaborate and store documents in document libraries. Document libraries are containers that are already configured to use workflow, version history, and other features that are important for working with documents at the enterprise level.

This article describes a rich document assembly solution which takes Microsoft Word 2010 documents, Microsoft Excel 2010 documents, and Microsoft PowerPoint 2010 documents and merges them together to form a final report in Word. It does all of this from a SharePoint 2010 Web Part.

You can download the solution project from the Code Gallery.

Imagine a scenario where you work for a company that analyzes stocks and generates reports for every company and stock analyzed. These reports are typically very rich and usually involve more than one person contributing to the content. Content is separated into multiple Word documents, Excel documents, and PowerPoint documents where each document is assigned to a person. After all the content is written, the content is assembled into a final report as a Word document. The company asks you to write a solution that merges all these documents programmatically.

This solution uses document sets, a feature in SharePoint 2010. Document sets enables you to manage collection of documents as single objects. Think of this feature as a binder of related content.

In this solution, a custom document set includes a set of files (six in this example) that correspond to the various components of the final analysis report. Figure 1 shows a document set for a company called Contoso.

Figure 1. Document set for the Contoso company sample

Document set for Contoso company sample

Based on this scenarios, by using document sets, you follow these steps:

  1. Create a set of documents that represent the document set for this solution. One of the documents is a Word template that represents the appearance of the final report.

  2. Create and add a custom document set to a document library.

  3. Create a Web Part for the document library with a button labeled Assemble Documents. The button merges all the content together into a final document. Add this Web Part to the document set library.

  4. Using the Open XML SDK 2.0, open the template document from the document set and search for all content controls.

  5. For every content control you find, find the corresponding document content in the library and merge that content into the final document.

  6. After the document assembly is complete, offer the user the ability to open or save the report.

Step 1 – Create the Template

Setting up the correct template makes all the difference when you are creating Office document solutions. In this solution, you have a document set with six files, where one of the files is a template file. The template document represents the look of the final report. Then you use content controls within the template to specify semantic regions within the document to be used to merge content together.

To make the process easier, the titles of the content controls represent the types of content to be merged. For example, a Word document, a chart from a spreadsheet, a table from a spreadsheet, and a SmartArt graphic from a presentation. The content of a content control represents the name of the file that contains the content to be merged. For example, Figure 2 shows the template document highlighting one of the content controls labeled "Word:Document" with the content set to "Introduction".

Figure 2. Content control for information from the Word document

Content control for document information

This content control represents the region where you merge the Contoso - Introduction document (from Word) into the template file. For the sake of completeness, the other content controls are labeled Spreadsheet:Chart, Spreadsheet:Table, and Presentation:SmartArt.

Step 2 – Set Up a Document Set in SharePoint 2010

Next, you must create document sets for the six documents on the SharePoint 2010 site.

To create document sets

  1. Create your document library first. In the navigation pane, click Libraries and then click Create.

  2. In the Create dialog, click Document Library, type a name for the library (Doc Set Assembly Demo) and then click OK.

  3. Next create your content type. Click Site Actions, click Site Settings, and then in the Galleries section, click Site content types.

  4. Click Create and then type a name for the content type.

  5. In the Select parent content type from list, select Document Set Content Types and then click OK.

  6. Now create the document set. In the Site Content Types dialog, in the Settings section, click Document Set settings.

  7. In the Create dialog, click Document Library, type a name for the library (Doc Set Assembly Demo) and then click OK.

  8. In the Default Content section, browse to the first document (Analysis.docx), select it, and then click Open.

  9. Now click Add new default content, browse to the second document (Introduction.docx), and then click Open.

  10. Repeat this for the remaining four documents and when you are finished, click OK.

  11. Click back to the document library and then under Library Tools, click Library, and then click Library Settings.

  12. In the Document Library Setting dialog under General Settings, click Advanced Settings.

  13. In the Content Types section, click Yes for Allow management of content types and then click OK.

  14. Back in the Document Library Setting dialog under Content Types, click Add from existing site content types.

  15. In the Available Site Content Types list, select your content type and then click Add.

  16. Add the new document set to your library. Click back to the document library and under Library Tools, click Documents.

  17. Click the New Document list and select your content type, type a name for the new document set, and then click OK.

At this point, you should have a library set up with a document set content type displaying the six documents that use the document set name prefix (see Figure 1).

Step 3 – Create a Web Part with an Assemble Documents Command

To make this solution usable, you must add a command within the document library that lets you merge documents together. The easiest way to perform this task is to create a Web Part within the Microsoft Visual Studio 2010 project.

To create the Web Part project in Visual Studio 2010

  1. Start Visual Studio 2010.

  2. On the File menu, click New, and then Project.

  3. Under Installed Templates, click SharePoint, 2010, and then Visual Web Part.

  4. In the Name box, type DocumentAssembly, and then click OK.

  5. In the SharePoint Configuration Wizard page, select your Web application, and then click Finish.

  6. After you create the Web Part, open the VisualWebPart1.cs file and add the following references to the existing references at the top of the file.

    using System.Linq;
    using System.IO;
    using DocumentFormat.OpenXml.Packaging;
    using Word = DocumentFormat.OpenXml.Wordprocessing;
    using Draw = DocumentFormat.OpenXml.Drawing;
    using DocumentFormat.OpenXml.Drawing.Charts;
    using WP = DocumentFormat.OpenXml.Drawing.Wordprocessing;
    using DocumentFormat.OpenXml.Drawing.Spreadsheet;
    using Excel = DocumentFormat.OpenXml.Spreadsheet;
    using DocumentFormat.OpenXml;
    using PPT = DocumentFormat.OpenXml.Presentation;
    using Dgm = DocumentFormat.OpenXml.Drawing.Diagrams;
  7. Next, modify the CreateChildControls method as follows.

    protected override void CreateChildControls() 
       Control control = this.Page.LoadControl(_ascxPath); 
       Button btnSubmit = new Button(); 
       btnSubmit.Text = "Assemble Documents"; 
       btnSubmit.Click += new EventHandler(btnSubmit_Click); 

    This method adds a new button control where you add logic to merge the documents that are contained in a given document set (the merge code is called from the btnSubmit_Click event).

When you create the Web Part, add the button to the document set. The easiest way to do this task is to use Microsoft SharePoint Designer 2010.

To add the custom Web Part to the document set library

  1. From SharePoint Designer 2010, open the SharePoint site.

  2. Navigate to the document library that contains the document set.

  3. Click All Files, Custom document library name, Forms, and then Document set name. Type docsethomepage.aspx.

  4. Click the bottom of WebPartZone_CenterMain.

  5. Next, click Insert, Web Part, and then select your custom Web Part.

    Figure 3. Adding the Web Part in SharePoint Designer 2010

    Adding Web Part in SharePoint Designer 2010
  6. Save and close SharePoint Designer 2010.

At this point, you should see an Assemble Documents command displayed for any document set as shown in Figure 4.

Figure 4. Assemble Documents button added to the document set

Assemble Documents button in document

Step 4 – Find the Content Controls

Finding content controls in a document involves the following steps:

  1. Open template document from the SharePoint site.

  2. Load the document into memory.

  3. Open the document (from memory) with the Open XML SDK 2.0.

  4. Loop through all content controls within the document.

  5. For every content control, determine the title.

  6. Based on the title, perform a particular action.

The following code performs these steps.

void btnSubmit_Click(object sender, EventArgs e) 
   SPFolder folder = SPContext.Current.ListItem.Folder; 
   char[] splitter = { '/' }; 
   string[] folderName = folder.Name.Split(splitter); 
   string filePrefix = @"Stock Analysis Demo/" + folderName[0] + "/" + folderName[0]; 
   SPFile template = folder.Files[filePrefix + " - Template.docx"]; 
   SPFile file; 
   byte[] byteArray = template.OpenBinary(); 
   using (MemoryStream mem = new MemoryStream()) 
      mem.Write(byteArray, 0, (int)byteArray.Length); 
      using (WordprocessingDocument myDoc = WordprocessingDocument.Open(mem, true)) 
         MainDocumentPart mainPart = myDoc.MainDocumentPart; 
         foreach (Word.SdtElement sdt in mainPart.Document .Descendants<Word.SdtElement>().ToList()) 
            Word.SdtAlias alias = sdt.Descendants<Word.SdtAlias>().FirstOrDefault(); 
            if (alias != null) 
               string sdtTitle = alias.Val.Value; 
               if (sdtTitle == "Spreadsheet:Table") 
                  file = folder.Files[filePrefix + " - " + sdt.InnerText + ".xlsx"]; 
                  ImportTableFromSpreadsheet(mainPart, sdt, file); 
               else if (sdtTitle == "Spreadsheet:Chart") 
                  file = folder.Files[filePrefix + " - " + sdt.InnerText + ".xlsx"]; 
                  ImportChartFromSpreadsheet(mainPart, sdt, file); 
               else if (sdtTitle == "Presentation:SmartArt") 
                  file = folder.Files[filePrefix + " - " + sdt.InnerText + ".pptx"]; 
                  ImportSmartArtFromPowerPoint(mainPart, sdt, file); 
               else if (sdtTitle == "Word:Document") 
                  file = folder.Files[filePrefix + " - " + sdt.InnerText + ".docx"]; 
                  AddAltChunk(mainPart, sdt, file); 
      // For the remainder of the code in this procedure, see step 6.

Step 5 – Merge the Content Together

In this solution, there are four kinds of content to be merged:

  • Text from a Word document

  • SmartArt graphics from a PowerPoint presentation

  • Charts from an Excel spreadsheet

  • Tables of data from an Excel spreadsheet

The following sections describe these steps.

Step 5a – Merge the Word Documents Together

By far, the easiest way to assemble Word documents is to take advantage of the altChunks object. You can find more information about document assembly with altChunks object in the article Creating Documents by Using the Open XML Format SDK 2.0 (Part 3 of 3)

Here is the code to merge documents together in SharePoint Server 2010.

protected int id = 1; 
void AddAltChunk(MainDocumentPart mainPart, Word.SdtElement sdt, SPFile filename) 
   string altChunkId = "AltChunkId" + id; 
   byte[] byteArray = filename.OpenBinary(); 
   AlternativeFormatImportPart chunk = mainPart.AddAlternativeFormatImportPart( 
   AlternativeFormatImportPartType.WordprocessingML, altChunkId); 
   using (MemoryStream mem = new MemoryStream()) 
      mem.Write(byteArray, 0, (int)byteArray.Length); 
      mem.Seek(0, SeekOrigin.Begin); 
   Word.AltChunk altChunk = new Word.AltChunk(); 
   altChunk.Id = altChunkId; 

   // Replace content control with altChunk information.  
   OpenXmlElement parent = sdt.Parent; 
   parent.InsertAfter(altChunk, sdt); 

Step 5b – Import SmartArt Graphics from PowerPoint to Word

The next step is to import the SmartArt from PowerPoint to Word. You can find more information about this in the blog post Importing SmartArt from PowerPoint to Word.

Here is the code that you need to do this task.

void ImportSmartArtFromPowerPoint(MainDocumentPart mainPart, Word.SdtElement sdt, SPFile filename) 
   string docLayoutPartId = ""; 
   string docDataPartId = ""; 
   string docColorsPartId = ""; 
   string docStylePartId = ""; 
   byte[] byteArray = filename.OpenBinary(); 
   using (MemoryStream mem = new MemoryStream()) 
      mem.Write(byteArray, 0, (int)byteArray.Length); 
      using (PresentationDocument myPres = PresentationDocument.Open(mem, true)) 
         PresentationPart presPart = myPres.PresentationPart; 
         // Get the slide that contains the SmartArt graphic. 
         SlidePart slide = (SlidePart)presPart.GetPartById("rId3"); 
         // Get all the appropriate parts associated with the SmartArt. 
         DiagramLayoutDefinitionPart layoutPart = 
         DiagramDataPart dataPart = slide.DiagramDataParts.First(); 
         DiagramColorsPart colorsPart = slide.DiagramColorsParts.First(); 
         DiagramStylePart stylePart = slide.DiagramStyleParts.First(); 
         // Get some of the appropriate properties off the SmartArt graphic. 
         PPT.GraphicFrame graphicFrame = 
         PPT.NonVisualDrawingProperties drawingPr = graphicFrame 
         Draw.Extents extents = 
         // Import SmartArt into the Word document. 
         // Add the SmartArt parts to the Word document. 
         DiagramLayoutDefinitionPart docLayoutPart = 
         DiagramDataPart docDataPart = 
         DiagramColorsPart docColorsPart = 
         DiagramStylePart docStylePart = 
         // Get all the relationship ids of the added parts. 
         docLayoutPartId = mainPart.GetIdOfPart(docLayoutPart); 
         docDataPartId = mainPart.GetIdOfPart(docDataPart); 
         docColorsPartId = mainPart.GetIdOfPart(docColorsPart); 
         docStylePartId = mainPart.GetIdOfPart(docStylePart); 
         // Use the document reflector to figure out how to add a SmartArt 
         // graphic to Word. 
         // Change attribute values based on specifics related to the SmartArt. 
         Word.Paragraph p = new Word.Paragraph( 
            new Word.Run( 
            new Word.Drawing( 
            new WP.Inline( 
            new WP.Extent() { Cx = extents.Cx, Cy = extents.Cy }, 
            new WP.EffectExtent() 
            { LeftEdge = 0L, TopEdge = 0L, RightEdge = 0L, BottomEdge = 0L }, 
            new WP.DocProperties() {Id = drawingPr.Id, Name = drawingPr.Name }, 
            new WP.NonVisualGraphicFrameDrawingProperties(), 
            new Draw.Graphic( 
            new Draw.GraphicData( 
            new Dgm.RelationshipIds() { DataPart = docDataPartId, 
               LayoutPart = docLayoutPartId, StylePart = docStylePartId, 
               ColorPart = docColorsPartId }) 
            { Uri = "" })) 
            { DistanceFromTop = (UInt32Value)0U, DistanceFromBottom = (UInt32Value)0U, 
               DistanceFromLeft = (UInt32Value)0U, DistanceFromRight = (UInt32Value)0U 
           // Swap out the content control for the SmartArt. 
           OpenXmlElement parent = sdt.Parent; 
          parent.InsertAfter(p, sdt); 

Step 5c – Import the Chart from Excel to Word

In this step, you import a chart from an Excel spreadsheet to the Word template. You can find more information about doing this in the blog post Importing Charts from Spreadsheets to Wordprocessing Documents.

Here is the same code as seen in the blog post modified to work with files that exist within SharePoint 2010.

void ImportChartFromSpreadsheet(MainDocumentPart mainPart, Word.SdtElement sdt, 
   SPFile spreadsheetFileName) 
   // Create a paragraph that has an inline drawing object. 
   Word.Paragraph p = new Word.Paragraph(); 
   Word.Run r = new Word.Run(); 
   Word.Drawing drawing = new Word.Drawing(); 
   // These dimensions work perfectly for the template document. 
   WP.Inline inline = new WP.Inline( new WP.Extent() { Cx = 5486400, Cy = 3200400 }); 
   byte[] byteArray = spreadsheetFileName.OpenBinary(); 
   using (MemoryStream mem = new MemoryStream()) 
      mem.Write(byteArray, 0, (int)byteArray.Length); 
     // Open the Excel spreadsheet. 
     using (SpreadsheetDocument mySpreadsheet = SpreadsheetDocument.Open(mem, true)) 
        // Get all of the appropriate parts. 
        WorkbookPart workbookPart = mySpreadsheet.WorkbookPart; 
        WorksheetPart worksheetPart = XLGetWorksheetPartByName(mySpreadsheet, 
        DrawingsPart drawingPart = worksheetPart.DrawingsPart; 
        ChartPart chartPart = (ChartPart)drawingPart.GetPartById("rId1"); 
       // Clone the chart part and add it to the Word document. 
       ChartPart importedChartPart = mainPart.AddPart<ChartPart>(chartPart); 
       string relId = mainPart.GetIdOfPart(importedChartPart); 
       // The frame element contains information for the chart. 
       GraphicFrame frame = 
       string chartName =      frame.NonVisualGraphicFrameProperties.NonVisualDrawingProperties.Name; 
       // Clone this node so that you can add it to the Word document. 
       Draw.Graphic clonedGraphic = (Draw.Graphic)frame.Graphic.CloneNode(true); 
       ChartReference c = clonedGraphic.GraphicData.GetFirstChild<ChartReference>(); 
       c.Id = relId; 
       // Give the chart a unique ID and name. 
       WP.DocProperties docPr = new WP.DocProperties(); 
       docPr.Name = chartName; 
       docPr.Id = GetMaxDocPrId(mainPart) + 1; 
       // Add the chart data to the inline drawing object. 
       inline.Append(docPr, clonedGraphic); 
   OpenXmlElement parent = sdt.Parent; 
   parent.InsertAfter(p, sdt); 

WorksheetPart XLGetWorksheetPartByName(SpreadsheetDocument document, string sheetName) 
   WorkbookPart wbPart = document.WorkbookPart; 
   // Find the sheet with the supplied name, and then use that Sheet object 
   // to retrieve a reference to the appropriate worksheet. 
   Excel.Sheet theSheet = wbPart.Workbook.Descendants<Excel.Sheet>() 
      .Where(s => s.Name == sheetName).FirstOrDefault(); 
   if (theSheet == null) 
       throw new ArgumentException("sheetName"); 
   return (WorksheetPart)(wbPart.GetPartById(theSheet.Id)); 
uint GetMaxDocPrId(MainDocumentPart mainPart) 
   uint max = 1; 
   // Get the maximum ID value of the docPr elements. 
   foreach (WP.DocProperties docPr in mainPart.Document.Descendants<WP.DocProperties>()) 
      uint id = docPr.Id; 
      if (id > max) max = id; 
   return max; 

Step 5d – Import Table Data from Excel to Word

In this step, you import a table from Excel to the template. Here are the steps that you need to do this task.

  1. Create a Word table with the Open XML SDK 2.0 that holds the data from Excel.

  2. Open the Excel file with the Open XML SDK 2.0.

  3. Retrieve the specific sheet that contains the table of data.

  4. In the found sheet, locate the SheetData object which contains the table of data.

  5. For every row in the SheetData object, retrieve all of the cell data.

  6. After all data in the row is retrieved, create a Word row with the same data.

  7. Append the created Word row to the Word table.

To help with these tasks, you can take advantage of some Open XML SDK 2.0 code examples. For more information, see the 2007 Office System Sample: Open XML Format SDK 2.0 Code Snippets for Visual Studio 2008.

Here is the code that you need to do these tasks.

void ImportTableFromSpreadsheet(MainDocumentPart mainPart, Word.SdtElement sdt, SPFile spreadsheetFileName) 
   ArrayList cellText = new ArrayList(); 
   // Create a Word table. 
   Word.Table tbl = new Word.Table(); 
   Word.TableProperties tblPr = new Word.TableProperties(); 
   Word.TableStyle tblStyle = new Word.TableStyle(); 
   tblStyle.Val = "LightShading-Accent1"; 
   Word.TableWidth tblW = new Word.TableWidth(); 
   tblW.Width = 5000; 
   tblW.Type = Word.TableWidthUnitValues.Pct; 
   byte[] byteArray = spreadsheetFileName.OpenBinary(); 
   using (MemoryStream mem = new MemoryStream()) 
      mem.Write(byteArray, 0, (int)byteArray.Length); 
      using (SpreadsheetDocument mySpreadsheet = SpreadsheetDocument.Open(mem, true)) 
         WorkbookPart workbookPart = mySpreadsheet.WorkbookPart; 
         WorksheetPart worksheetPart = XLGetWorksheetPartByName(mySpreadsheet, "Sheet1"); 
         Excel.SheetData sheetData = 
         foreach (Excel.Row r in sheetData) 
            foreach (Excel.Cell c in r) 
               cellText.Add(XLGetCellValue(c, workbookPart)); 
            Word.TableRow tr = CreateRow(cellText); 
            cellText = new ArrayList(); 
   // Swap out the content control for the SmartArt. 
   OpenXmlElement parent = sdt.Parent; 
   parent.InsertAfter(tbl, sdt); 
Word.TableRow CreateRow(ArrayList cellText) 
   Word.TableRow tr = new Word.TableRow(); 
   // Create cells with simple text. 
   foreach (string s in cellText) 
      Word.TableCell tc = new Word.TableCell(); 
      Word.Paragraph p = new Word.Paragraph(); 
      Word.Run r = new Word.Run(); 
      Word.Text t = new Word.Text(s); 
   return tr; 
// Get the value of a cell, given a file name, sheet name, and address name. 
string XLGetCellValue(Excel.Cell c, WorkbookPart wbPart) 
   string value = null; 
   // If the cell does not exist, return an empty string. 
   if (c != null) 
      value = c.InnerText; 
      // If the cell represents an integer number, you are finished. 
      // For dates, this code returns the serialized value that 
      // represents the date. The code handles strings and Boolean values 
      // individually. For shared strings, the code looks up the corresponding 
      // value in the shared string table. For Boolean values, the code converts 
      // the value into the words TRUE or FALSE. 
      if (c.DataType != null) 
         switch (c.DataType.Value) 
            case Excel.CellValues.SharedString: 
              // For shared strings, look up the value in the shared strings table. 
              var stringTable =  wbPart.GetPartsOfType<SharedStringTablePart>().FirstOrDefault(); 
              // If the shared string table is missing, something is wrong. 
              // Just return the index that you found in the cell. 
              // Otherwise, look up the correct text in the table. 
              if (stringTable != null) 
                 value =  stringTable.SharedStringTable.ElementAt(int.Parse(value)).InnerText; 
          case Excel.CellValues.Boolean: 
             switch (value) 
                case "0": 
                   value = "FALSE"; 
                   value = "TRUE"; 
   return value; 

Step 6 – Offer the Assembled Document to the User

Now offer the assembled document to the user by using an Open, Save, or Cancel dialog box as shown in Figure 5.

Figure 5. File Download dialog presented to the user

File Download dialog

Here is the code example to create this dialog based on a document that is in memory.

void btnSubmit_Click(object sender, EventArgs e) 
   // For the code that goes here, see step 4.
   HttpResponse resp = HttpContext.Current.Response; 
   resp.AddHeader("Content-Disposition", "attachment; filename=Assembled Document.docx"); 
   resp.ContentEncoding = System.Text.Encoding.UTF8; 
   resp.OutputStream.Write(mem.ToArray(), 0, (int)mem.Length); 

Step 7 - Sign the Project

The final step in creating the project is to sign the project with a strong-name key file, if one does not already exist.

To sign the project with a strong-name key file

  1. With the DocumentAssembly node selected in Solution Explorer, right-click, and then click Properties.

  2. In the Project Designer, click the Signing tab.

  3. Select the Sign the assembly check box.

  4. Specify a new key file. In the Choose a strong name key file drop-down list, select <New...>. Note that new key files are always created in the .pfx format.

    The Create Strong Name Key dialog box appears.

  5. In the Create Strong Name Key dialog box, type a name and password for the new key file, and then click OK.

After putting everything together and running the code, you have a document that contains all the content that is contained within the library merged into a final report as shown in Figure 6.

Figure 6. Completed document with the merged components

Completed document with merged components

This article describes how you can combine the power of SharePoint 2010 Web Parts and the Open XML SDK 2.0 to simplify the assembly of documents from different parts. By using content controls and alt-chunks, it is easy to insert whole sections from other documents. I encourage you to experiment more with the techniques described here to create your own custom solutions.