Processing Documents in Bulk Using SharePoint 2010 and Open XML 2.0

Summary:  Learn how to process documents in bulk automatically by using the Open XML SDK 2.0 for Microsoft Office to generate specific documents from a master document, SharePoint 2010 to store those documents, Word Automation Services to convert them to XPS files to print, and .NET Framework code to send the documents to the printer.

Applies to: Business Connectivity Services | Office 2010 | Open XML | SharePoint Designer 2010 | SharePoint Foundation 2010 | SharePoint Online | SharePoint Server 2010 | Visual Studio | Word Automation Services

Published:  April 2011

Provided by:  Bob McClellan

Contents

  • Overview of Processing Documents in Bulk Using SharePoint 2010 and Open XML SDK 2.0

  • Preparing the XML File and Master Word Document

  • Retrieving and Reading the XML File from SharePoint 2010

  • Retrieving the Master Word Document from the SharePoint Library

  • Binding the Content Controls to Custom XML

  • Generating the Specific Word Documents

  • Using Word Automation Services to Convert the Documents to XPS Files

  • Printing the Generated and Converted XPS Documents

  • Conclusion

  • Additional Resources


Overview of Processing Documents in Bulk Using SharePoint 2010 and Open XML SDK 2.0

Many businesses need a method to generate thousands of documents from a single master document and then print them. This article shows how to automate that process by using the Open XML SDK 2.0 for Microsoft Office to generate specific documents from the master document, Microsoft SharePoint 2010 to store those documents, Word Automation Services to convert them for printing, and Microsoft .NET Framework code to send the documents to the printer.

The basis of this article is a scenario in which you run a business that wants to print form letters to send to customers. You have a data source that you can export into an XML file and then use as input to the process. The automated solution uses a Microsoft Word 2010 document that is set up as the master document. It also uses content controls that are tagged to match the XML file. Both the master document and the XML file are stored in a SharePoint document library. The automated process performs the following steps:

  1. Retrieve the XML file from SharePoint 2010.

  2. Retrieve the master document from SharePoint 2010 and bind the content controls to custom XML.

  3. Generate a document for each record in the XML file, and then store the resulting Word 2010 document in the SharePoint document library.

  4. Use Word Automation Services to create XPS files from the generated Word 2010 documents.

  5. Retrieve each XPS document from the SharePoint document library and send it to the printer.

Note

The process uses only the SharePoint document library to store the documents. Nothing is stored in a local file system.

Before you begin to create this automated process, ensure that you have the following:

  • Write access to a SharePoint 2010 document library

  • Microsoft Visual Studio 2010 Professional

  • Word 2010

A virtual machine that meets all these requirements, 2010 Information Worker Demonstration and Evaluation Virtual Machine (RTM), is available for download from Microsoft. If you use that virtual machine, you can follow the steps in this article exactly. You need only the "A" machine.

Preparing the XML File and Master Word Document

The first step is to create an XML file that has the source data and that specifies which master document and printer to use. The following is a sample XML file.

<?xml version="1.0" encoding="utf-8"?>
<Automation Template="Invoice Template" Printer="Send to OneNote 2010">
    <Records>
        <Record>
            <InvoiceNum>200</InvoiceNum>
            <Name>John Smith</Name>
            <CompanyName>A. Datum Corporation</CompanyName>
            <StreetAddress>5678 N. Maple St.</StreetAddress>
            <City>Boston</City>
            <State>MA</State>
            <ZipCode>01234</ZipCode>
            <Phone>555-0100</Phone>
            <CustID>7733-99</CustID>
            <Salesperson>Adams</Salesperson>
            <DueDate>5/31/2008</DueDate>
            <Total>4,556.88</Total>
        </Record>
        <Record>
            <InvoiceNum>201</InvoiceNum>
            <Name>Kim Akers</Name>
            <CompanyName>Proseware, Inc.</CompanyName>
            <StreetAddress>3456 W. Elm St.</StreetAddress>
            <City>Los Angeles</City>
            <State>CA</State>
            <ZipCode>12345</ZipCode>
            <Phone>555-0110</Phone>
            <CustID>7736-94</CustID>
            <Salesperson>Halstead</Salesperson>
            <DueDate>6/30/2008</DueDate>
            <Total>1,200.00</Total>
        </Record>
        <Record>
            <InvoiceNum>202</InvoiceNum>
            <Name>Mark Hanson</Name>
            <CompanyName>Fourth Coffee</CompanyName>
            <StreetAddress>45678 E. Hickory Ave.</StreetAddress>
            <City>New York</City>
            <State>NY</State>
            <ZipCode>23456</ZipCode>
            <Phone>555-0145</Phone>
            <CustID>7653-00</CustID>
            <Salesperson>Patel</Salesperson>
            <DueDate>5/15/2008</DueDate>
            <Total>9,554.22</Total>
        </Record>
    </Records>
</Automation>

The Automation element has a Template attribute that specifies the name of the master document (without the .docx extension) and a Printer attribute that specifies the printer to use for the generated documents. The Records element contains one Record element for each document to generate. The child elements of each Record supply the values for the content controls in the master document.
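The structure described above can be explored with a few lines of LINQ to XML. The following is a minimal sketch rather than part of the sample program: it parses a shortened, inline copy of the sample file instead of loading it from SharePoint, and the AutomationXmlSketch class and ReadRecords method are names introduced here for illustration only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class AutomationXmlSketch
{
    // Shortened copy of the sample file (assumption: only two fields per record).
    public const string SampleXml =
        "<Automation Template=\"Invoice Template\" Printer=\"Send to OneNote 2010\">" +
        "<Records>" +
        "<Record><InvoiceNum>200</InvoiceNum><CompanyName>A. Datum Corporation</CompanyName></Record>" +
        "<Record><InvoiceNum>201</InvoiceNum><CompanyName>Proseware, Inc.</CompanyName></Record>" +
        "</Records></Automation>";

    // Returns one dictionary per Record element, keyed by child element name.
    public static List<Dictionary<string, string>> ReadRecords(XDocument doc)
    {
        return doc.Descendants("Record")
            .Select(r => r.Elements().ToDictionary(e => e.Name.LocalName, e => e.Value))
            .ToList();
    }

    public static void Main()
    {
        XDocument doc = XDocument.Parse(SampleXml);
        XElement root = doc.Element("Automation");

        // The two attributes select the master document and the printer.
        Console.WriteLine("Template: " + root.Attribute("Template").Value);
        Console.WriteLine("Printer:  " + root.Attribute("Printer").Value);

        // Each Record becomes one generated document.
        foreach (Dictionary<string, string> record in ReadRecords(doc))
        {
            Console.WriteLine(record["InvoiceNum"] + " -> " + record["CompanyName"]);
        }
    }
}
```

Keying each record by element name mirrors how the content control tags are matched to element names later in the process.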

The master document is a standard Word 2010 document that has content controls in the positions where you want to show data from a record. Tag each content control so that its tag matches the name of the corresponding child element of Record. If you are not familiar with content controls in Word, use the following steps to add them to a document.

Note

If you already have the Developer tab displayed in Microsoft Word 2010, skip the following procedure and go directly to the one that follows it, To add and configure a content control.

To show the Developer tab on the ribbon in Word 2010

  1. On the ribbon, click the File tab, and then click Options in the left pane.

  2. In the Word Options dialog box, click Customize Ribbon.

  3. In the Customize the Ribbon list on the right side, select the Developer check box.

  4. Click OK.

    Figure 1. Showing the Developer tab on the ribbon


The next step is to set specific parts of the document as content controls. The content controls mark where each generated document displays the values from each record.

To add and configure a content control

  1. As you work in your document, include some generic values for the text that you want to replace in the generated document. For example, you might type XXXCompany where you want the company name to appear.

  2. Select the text that you want to replace.

  3. In the Developer tab on the ribbon, click Rich Text Content Control in the Controls group (the blue Aa in the upper-left corner of the group).

  4. Click Properties in the Controls group to open the Content Control Properties dialog box.

  5. Type the tag you want in the Tag field. The value that you specify should match the element name that is a child of the Record element in the XML file.

For example, the content control that shows the company name uses CompanyName as its tag.

Figure 2. Setting the tag for a content control


The sample code includes a sample master document, Invoice Template.docx, with content controls and tags that match the sample XML file, Automation.xml. The sample program will obtain those files from a SharePoint document library. To run the sample, create a new document library named Automation and then upload the two files.
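Because a content control can pick up a value only when its tag matches a child element of Record, it may be worth checking the tags against a record before running the process. The following sketch shows one possible check. It is not part of the sample code: the tag list is hard-coded here as an assumption (in practice it would be read from the master document), and TagCheckSketch and FindUnmatchedTags are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class TagCheckSketch
{
    // Returns the tags that have no same-named child element in the record.
    // XML names are case-sensitive, so "companyname" would not match "CompanyName".
    public static List<string> FindUnmatchedTags(IEnumerable<string> tags, XElement record)
    {
        HashSet<string> elementNames = new HashSet<string>(
            record.Elements().Select(e => e.Name.LocalName));
        return tags.Where(t => !elementNames.Contains(t)).ToList();
    }

    public static void Main()
    {
        XElement record = XElement.Parse(
            "<Record><InvoiceNum>200</InvoiceNum>" +
            "<CompanyName>A. Datum Corporation</CompanyName></Record>");

        // Hypothetical tag list; "Company" is a deliberate mismatch.
        string[] tags = { "InvoiceNum", "CompanyName", "Company" };

        foreach (string tag in FindUnmatchedTags(tags, record))
        {
            Console.WriteLine("No element named '" + tag + "' in the record.");
        }
    }
}
```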

Retrieving and Reading the XML File from SharePoint 2010

The program begins by reading Automation.xml, the XML file from the document library that is specified on the command line. The sample project specifies http://intranet.com as the site and Automation as the name of the document library, but you can change those values in the Debug options. Or, you can build an executable file and then specify the site and document library from the command line when you run it.

The site name is used to create an SPSite object. The library is then found by name. The docPath variable is built to specify the XML file name. The XML can be loaded into an XDocument object for convenient access by using standard LINQ to XML methods. The values for the Template attribute and Printer attribute are read from the Automation element, as shown in the following code.

static void Main(string[] args)
{
    // The site URL and the document library name come from the command line.
    string siteURL = args[0];
    string docLib = args[1];

    using (SPSite spSite = new SPSite(siteURL))
    {
        // Find the document library by name.
        SPDocumentLibrary library = spSite.RootWeb.Lists[docLib] as
          SPDocumentLibrary;

        // Build the full URL of the XML file and load it into an XDocument.
        string docPath = spSite.MakeFullUrl(
          library.RootFolder.ServerRelativeUrl) + "/Automation.xml";
        SPFile file = spSite.RootWeb.GetFile(docPath);
        XDocument automation = XDocument.Load(
          new StreamReader(file.OpenBinaryStream()));

        // Read the master document name and the printer name.
        XElement autoElement = automation.Element("Automation");
        string templateName = autoElement.Attribute("Template").Value;
        string printerName = autoElement.Attribute("Printer").Value;

Retrieving the Master Word Document from the SharePoint Library

The next step is to retrieve the master Word document from the SharePoint library.

To retrieve the master Word document

  1. Create an SPFile object for the document.

  2. Create a new MemoryStream to hold the document in memory for editing.

  3. Open a binary stream that reads the document in the document library.

  4. Create a BinaryReader for the source document and a BinaryWriter for the destination, the memory stream.

  5. Use the Write method of the BinaryWriter object to fill the memory stream from the SharePoint document.

After the memory stream contains a copy of the document, you can use the stream to create a Package object and then a WordprocessingDocument object, which is an Open XML object that enables the program to modify the master document.

// Open the template document.
docPath = spSite.MakeFullUrl(library.RootFolder.ServerRelativeUrl)
  + "/" + templateName + ".docx";
SPFile templateFile = spSite.RootWeb.GetFile(docPath);
Stream docStream = new MemoryStream();
Stream docTemplateStream = templateFile.OpenBinaryStream();
BinaryReader docTemplateReader = new BinaryReader(docTemplateStream);
BinaryWriter docWriter = new BinaryWriter(docStream);
docWriter.Write(docTemplateReader.ReadBytes(
  (int)docTemplateStream.Length));
docWriter.Flush();
docTemplateReader.Close();
docTemplateStream.Dispose();
Package package = Package.Open(docStream, FileMode.Open,
  FileAccess.ReadWrite);
WordprocessingDocument template = WordprocessingDocument.Open(package);
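The copy into memory does not depend on SharePoint and can be tried in isolation. The following sketch shows an alternative way to fill the memory stream, reading in fixed-size chunks instead of a single ReadBytes call, so it does not rely on the source stream reporting its length. CopyToMemory is a hypothetical helper, not part of the sample code. A manual loop is used because Stream.CopyTo is available only from .NET Framework 4, and the SharePoint 2010 server object model targets .NET Framework 3.5.

```csharp
using System;
using System.IO;

public static class StreamCopySketch
{
    // Hypothetical helper: copies everything from source into a new,
    // resizable MemoryStream and rewinds the result.
    public static MemoryStream CopyToMemory(Stream source)
    {
        MemoryStream destination = new MemoryStream();
        byte[] buffer = new byte[8192];
        int read;

        // Read returns 0 only at the end of the stream.
        while ((read = source.Read(buffer, 0, buffer.Length)) > 0)
        {
            destination.Write(buffer, 0, read);
        }

        destination.Position = 0; // rewind so the next reader starts at the beginning
        return destination;
    }

    public static void Main()
    {
        // Stand-in for the bytes of the master document (assumption: any bytes will do here).
        byte[] fakeDocument = { 0x50, 0x4B, 0x03, 0x04 };

        using (Stream source = new MemoryStream(fakeDocument))
        using (MemoryStream copy = CopyToMemory(source))
        {
            Console.WriteLine("Copied " + copy.Length + " bytes.");
        }
    }
}
```

A MemoryStream created with the parameterless constructor can grow, which matters here because the package is opened for read/write access and the document changes size as it is edited.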

Binding the Content Controls to Custom XML

You must modify the Word document to bind the content controls to custom XML in order to easily update the document with each record. The code uses two methods to complete the binding. The first method, CreateCustomXML, adds the custom XML part. The second method, BindControls, modifies the content controls in the main document so that they are bound to the new custom XML part. This is shown in the following code example.

string guid;
Uri XMLUri = CreateCustomXML(template, automation.Descendants("Record").First(), out guid);
BindControls(template, guid);
template.Close();
docPath = spSite.MakeFullUrl(library.RootFolder.ServerRelativeUrl)
  + "/Output 00001.docx";
spSite.RootWeb.Files.Add(docPath, docStream, true);

After the document is modified, it is saved to the document library as Output 00001.docx. Because the first record was used to create the custom XML, this document already has the correct values. The remaining documents are created by repeatedly modifying the memory stream.

The CreateCustomXML method creates a CustomXML part and then fills it with a copy of the first Record element from the Automation.xml file. A new GUID is generated for the data store, and then a CustomXMLPropertiesPart is created with that GUID. This is shown in the following code example.

static Uri CreateCustomXML(WordprocessingDocument document,
  XElement customXML, out string guid)
{
    CustomXmlPart XMLPart = document.MainDocumentPart.AddCustomXmlPart(
      CustomXmlPartType.CustomXml);
    StreamWriter sw = new StreamWriter(
      XMLPart.GetStream(FileMode.Create, FileAccess.ReadWrite));
    sw.Write(customXML.ToString());
    sw.Flush();
    sw.Close();
    guid = "{" + Guid.NewGuid().ToString().ToUpper() + "}";
    CustomXmlPropertiesPart propPart = XMLPart.AddNewPart
      <CustomXmlPropertiesPart>();
    propPart.DataStoreItem = new DataStoreItem(
      new SchemaReferences()) { ItemId = guid };
    return XMLPart.Uri;
}

The BindControls method searches through the main document for content controls. These are identified by the sdtPr element, and known as SdtProperties in the Open XML SDK 2.0. The program reads the tag value and uses it to create a dataBinding element to the data store by using the previously created GUID. This is shown in the following code example.

static void BindControls(WordprocessingDocument document, string guid)
{
    foreach (SdtProperties item in
      document.MainDocumentPart.Document.Descendants<SdtProperties>())
    {
        string tag = item.Descendants<Tag>().First<Tag>()
          .Val.ToString();
        item.Append(new DataBinding() { XPath = "/Record/" + tag,
          StoreItemId = guid });
    }
}
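To make the result of the binding more concrete, the following sketch builds the markup of a dataBinding element with LINQ to XML and prints it. It only illustrates the shape of the element, assuming the standard WordprocessingML namespace and attribute names; the sample code itself uses the Open XML SDK classes shown above. DataBindingSketch and its methods are names introduced here for illustration.

```csharp
using System;
using System.Xml.Linq;

public static class DataBindingSketch
{
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Formats a GUID the way CreateCustomXML does: upper case, in braces.
    public static string FormatStoreItemId(Guid id)
    {
        return id.ToString("B").ToUpperInvariant();
    }

    // Builds the element that ties one content control to /Record/<tag>
    // in the custom XML part identified by storeItemId.
    public static XElement BuildDataBinding(string tag, string storeItemId)
    {
        return new XElement(W + "dataBinding",
            new XAttribute(XNamespace.Xmlns + "w", W.NamespaceName),
            new XAttribute(W + "xpath", "/Record/" + tag),
            new XAttribute(W + "storeItemID", storeItemId));
    }

    public static void Main()
    {
        string id = FormatStoreItemId(Guid.NewGuid());
        Console.WriteLine(BuildDataBinding("CompanyName", id));
    }
}
```

The XPath expression is rooted at Record because the custom XML part holds a single Record element, not the whole Automation.xml file.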

Generating the Specific Word Documents

Now that the binding is complete, it is much easier to generate the remaining documents. The code uses a count variable to number the output file names. For each Record element in the XML file, a document is created that has that record in the custom XML part. As shown in the following example, the code skips the first document because it is already generated and saved. As each variation is generated, it is saved under a new name in the SharePoint document library.

// Loop through all the records from the XML file.
int count = 1;
foreach (XElement element in automation.Descendants("Record"))
{
    if (count != 1)
    {
        package = Package.Open(docStream, FileMode.Open,
          FileAccess.ReadWrite);
        template = WordprocessingDocument.Open(package);
        foreach (CustomXmlPart part in
          template.MainDocumentPart.CustomXmlParts)
        {
            if (part.Uri == XMLUri)
            {
                Stream stream = part.GetStream(FileMode.Create,
                  FileAccess.ReadWrite);
                StreamWriter sw = new StreamWriter(stream);
                sw.Write(element.ToString());
                sw.Flush();
                sw.Close();
                break;
            }
        }
        template.Close();
        docPath = spSite.MakeFullUrl(
          library.RootFolder.ServerRelativeUrl) + "/Output "
          + count.ToString("D5") + ".docx";
        spSite.RootWeb.Files.Add(docPath, docStream, true);
    }
    count++;
}
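The numbering scheme is what links each generated Word document to the XPS file that is printed later, because the printing code looks for the same number with an .xps extension. The following sketch isolates that naming logic. OutputName is a hypothetical helper, and the record count of three is simply the size of the sample file.

```csharp
using System;

public static class OutputNameSketch
{
    // Builds a name such as "Output 00001.docx"; "D5" pads the number to five digits.
    public static string OutputName(int number, string extension)
    {
        return "Output " + number.ToString("D5") + extension;
    }

    public static void Main()
    {
        int recordCount = 3; // the sample Automation.xml has three records

        for (int number = 1; number <= recordCount; number++)
        {
            Console.WriteLine(OutputName(number, ".docx") + " -> " +
                OutputName(number, ".xps"));
        }
    }
}
```

Zero padding keeps the files in record order when the library is sorted by name.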

Using Word Automation Services to Convert the Documents to XPS Files

The next section of the code uses Word Automation Services to convert all of the Word documents into XPS files, which can be printed easily without using Word. The code configures the conversion job and then starts it. For simplicity, the job converts every Word document in the library, including the master document, but only the numbered documents are printed.

Word Automation Services jobs are processed periodically, at an interval that is defined in the options for the service. The for loop at the end checks the job status every five seconds and exits when every item in the job has either succeeded or failed. At that point, the document library contains an XPS version of each Word document that was converted successfully.

// Use Word Automation Services to convert to XPS files.
ConversionJob job = new ConversionJob(WordAutomationServicesName);
job.UserToken = spSite.UserToken;
job.Settings.UpdateFields = true;
job.Settings.OutputFormat = SaveFormat.XPS;
job.Settings.OutputSaveBehavior = SaveBehavior.AlwaysOverwrite;
SPList listToConvert = spSite.RootWeb.Lists[docLib];
job.AddLibrary(listToConvert, listToConvert);
job.Start();
for (; ; )
{
    Thread.Sleep(5000);
    ConversionJobStatus status = new ConversionJobStatus(WordAutomationServicesName, job.JobId, null);
    if (status.Count == status.Succeeded + status.Failed)
        break;
}
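The loop above runs until every item is accounted for, so it never exits if an item stays in another state. One possible variation is to put an upper limit on the wait. The following sketch shows the idea with a generic polling helper. WaitUntil is a hypothetical name, and the delegate passed to it stands in for the ConversionJobStatus check, which is not reproduced here.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

public static class PollingSketch
{
    // Hypothetical helper: calls isDone repeatedly, sleeping between calls,
    // and returns false if the timeout expires before isDone returns true.
    public static bool WaitUntil(Func<bool> isDone, TimeSpan interval, TimeSpan timeout)
    {
        Stopwatch clock = Stopwatch.StartNew();
        while (!isDone())
        {
            if (clock.Elapsed >= timeout)
            {
                return false;
            }
            Thread.Sleep(interval);
        }
        return true;
    }

    public static void Main()
    {
        // Stand-in for the job status check: reports completion on the third poll.
        int polls = 0;
        bool finished = WaitUntil(() => ++polls >= 3,
            TimeSpan.FromMilliseconds(10), TimeSpan.FromSeconds(1));

        Console.WriteLine(finished ? "Job finished." : "Timed out.");
    }
}
```

In the real program, the delegate would create a ConversionJobStatus object and compare its counts, and the interval and timeout would be chosen to suit how often the service is configured to run.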

Printing the Generated and Converted XPS Documents

All that remains now is to print the generated and converted documents. Each XPS document is located by name, read from the document library, and then sent to the print queue by using the XpsDocumentWriter object.

// Print output XPS files.
LocalPrintServer srv = new LocalPrintServer();
PrintQueue pq = srv.GetPrintQueue(printerName);
for (int num = 1; num < count; num++)
{
    XpsDocumentWriter xdw = PrintQueue.CreateXpsDocumentWriter(pq);
    docPath = spSite.MakeFullUrl(library.RootFolder.ServerRelativeUrl)
      + "/Output " + num.ToString("D5") + ".xps";
    templateFile = spSite.RootWeb.GetFile(docPath);
    package = Package.Open(templateFile.OpenBinaryStream(),
      FileMode.Open, FileAccess.Read);
    XpsDocument xdoc = new XpsDocument(package);
    xdoc.Uri = new Uri(docPath);
    xdw.Write(xdoc.GetFixedDocumentSequence());
    xdoc.Close();
}

Conclusion

This article describes how to use SharePoint 2010 and the Open XML SDK 2.0 for Microsoft Office to automatically generate and process multiple documents from a single master document and then to print them. The sample code that accompanies this article, SharePoint 2010: Processing Documents in Bulk Using SharePoint 2010 and Open XML, works for any Microsoft Word document, provided that the tags on the content controls match the element names in the XML file. The XML file can contain as many records as you want. The documents are all saved on the server, so there is a record of what was generated, and they are also automatically printed. The result is an automated process that requires no user intervention from start to finish.

Additional Resources

For more information, see the following resources: