Developing with SharePoint 2010 Word Automation Services

Summary: Learn to use Word Automation Services to do server-side document conversions to and from a variety of document formats. Used together with the Open XML SDK, it lets you accomplish tasks that are difficult with the SDK alone, such as updating the table of contents or repaginating documents.

Applies to: Business Connectivity Services | Office 2010 | Open XML | SharePoint Designer 2010 | SharePoint Foundation 2010 | SharePoint Online | SharePoint Server 2010 | Visual Studio | Word Automation Services

Provided by: Eric White, Microsoft Corporation | Tristan Davis, Microsoft Corporation | Zeyad Rajabi, Microsoft Corporation

Contents

  • Using Word Automation Services to Change Document Formats

  • One Word Automation Services Scenario

  • How Word Automation Services Works

  • Building a Word Automation Services application

  • Monitoring Conversion Status

  • Identifying Documents That Failed to Convert

  • Deleting Source Files after Conversion

  • Integrating with the Open XML SDK

  • Conclusion

  • Additional Resources


Using Word Automation Services to Change Document Formats

There are some tasks that are difficult when using the Open XML SDK 2.0 for Microsoft Office, such as repagination, conversion to other document formats such as PDF, or updating of the table of contents, fields, and other dynamic content in documents. Word Automation Services is a new feature of SharePoint 2010 that can help in these scenarios. It is a shared service that provides unattended, server-side conversion of documents into other formats, and some other important pieces of functionality. It was designed from the outset to work on servers and can process many documents in a reliable and predictable manner.

Using Word Automation Services, you can convert from Open XML WordprocessingML to other document formats. For example, you may want to convert many documents to the PDF format and spool them to a printer or send them by e-mail to your customers. Or, you can convert from other document formats (such as HTML or Word 97-2003 binary documents) to Open XML word-processing documents.

In addition to the document conversion facilities, there are other important areas of functionality that Word Automation Services provides, such as updating field codes in documents, and converting altChunk content to paragraphs with the normal style applied. These tasks can be difficult to perform using the Open XML SDK 2.0. However, it is easy to use Word Automation Services to do them. In the past, you might have automated the Word client application to perform tasks like these. However, this approach is problematic. The Word client is an application that is best suited for authoring documents interactively, and was not designed for high-volume processing on a server. When performing these tasks, Word might display a dialog box reporting an error. If the Word client is being automated on a server, there is no user to respond to the dialog box, and the process can come to an untimely stop. The issues associated with automation of Word are documented in the Knowledge Base article Considerations for Server-side Automation of Office.

One Word Automation Services Scenario

This scenario describes how you can use Word Automation Services to automate processing documents on a server.

  • An expert creates some Word template documents that follow specific conventions. She might use content controls to give structure to the template documents. This provides a good user experience and a reliable programmatic approach for determining the locations in the template document where data should be replaced in the document generation process. These template documents are typically stored in a SharePoint document library.

  • A program runs on the server to merge the template documents together with data, generating a set of Open XML WordprocessingML (DOCX) documents. This program is best written by using the Open XML SDK 2.0 for Microsoft Office, which is designed specifically for generating documents on a server. These documents are placed in a SharePoint document library.

  • After the set of documents is generated, the documents might be automatically printed. Or, they might be sent by e-mail to a set of users, either as WordprocessingML documents, or perhaps as PDF, XPS, or MHTML documents after converting them from WordprocessingML to the desired format.

  • As part of the conversion, you can instruct Word Automation Services to update fields, such as the table of contents.

Using the Open XML SDK 2.0 for Microsoft Office together with Word Automation Services enables you to create rich, end-to-end solutions that perform well and do not require automation of the Word client application.

One of the key advantages of Word Automation Services is that it can scale out to your needs. Unlike the Word client application, you can configure it to use multiple processors. Further, you can configure it to load balance across multiple servers if your needs require that.

Another key advantage is that Word Automation Services has perfect fidelity with the Word client applications. Document layout, including pagination, is identical regardless of whether the document is processed on the server or client.

Supported Source Document Formats

The supported source document formats are as follows.

  1. Open XML File Format documents (.docx, .docm, .dotx, .dotm)

  2. Word 97-2003 documents (.doc, .dot)

  3. Rich Text Format files (.rtf)

  4. Single File Web Pages (.mht, .mhtml)

  5. Word 2003 XML Documents (.xml)

  6. Word XML Document (.xml)

Supported Destination Document Formats

The supported destination document formats include all of the supported source document formats, and the following.

  1. Portable Document Format (.pdf)

  2. XML Paper Specification (.xps)
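
Before adding files to a conversion job, it can be useful to check file name extensions against the two lists above. The following is a minimal sketch of such a check. The class and method names are hypothetical, the check looks only at the extension (so any .xml file passes, whether or not it is a Word XML document), and it does not call Word Automation Services.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper: extension-based pre-check built from the lists above.
static class SupportedFormats
{
    // Extensions that are valid as both source and destination formats.
    static readonly HashSet<string> SourceExtensions =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        {
            ".docx", ".docm", ".dotx", ".dotm", ".doc", ".dot",
            ".rtf", ".mht", ".mhtml", ".xml"
        };

    // Extensions that are valid only as destination formats.
    static readonly HashSet<string> DestinationOnlyExtensions =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { ".pdf", ".xps" };

    public static bool IsSupportedSource(string fileName)
    {
        return SourceExtensions.Contains(Path.GetExtension(fileName));
    }

    public static bool IsSupportedDestination(string fileName)
    {
        string extension = Path.GetExtension(fileName);
        return SourceExtensions.Contains(extension)
            || DestinationOnlyExtensions.Contains(extension);
    }

    static void Main()
    {
        Console.WriteLine(IsSupportedSource("Test.docx"));      // True
        Console.WriteLine(IsSupportedSource("Test.pdf"));       // False
        Console.WriteLine(IsSupportedDestination("Test.pdf"));  // True
    }
}
```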

Other Capabilities of Word Automation Services

In addition to the ability to load and save documents in various formats, Word Automation Services includes other capabilities.

You can cause Word Automation Services to update the table of contents, the table of authorities, and index fields. This is important when generating documents. After generating a document, if the document has a table of contents, it is an especially difficult task to determine document pagination so that the table of contents is updated correctly. Word Automation Services handles this for you easily.

Open XML word-processing documents can contain various field types, which enables you to add dynamic content into a document. You can use Word Automation Services to cause all fields to be recalculated. For example, you can include a field type that inserts the current date into a document. When fields are updated, the associated content is also updated, so that the document displays the current date at the location of the field.

One of the powerful ways that you can use content controls is to bind them to XML elements in a custom XML part. See the article, Building Document Generation Systems from Templates with Word 2010 and Word 2007 for an explanation of bound content controls, and links to several resources to help you get started. You can replace the contents of bound content controls by replacing the XML in the custom XML part. You do not have to alter the main document part. The main document part contains cached values for all bound content controls, and if you replace the XML in the custom XML part, the cached values in the main document part are not updated. This is not a problem if you expect users to view these generated documents only by using the Word client. However, if you want to process the WordprocessingML markup more, you must update the cached values in the main document part. Word Automation Services can do this.

Alternate format content (as represented by the altChunk element) is a great way to import HTML content into a WordprocessingML document. The article, Building Document Generation Systems from Templates with Word 2010 and Word 2007, discusses alternate format content and its uses, and provides links to help you get started. However, until you open and save a document that contains altChunk elements, the document contains HTML, and not ordinary WordprocessingML markup such as paragraphs, runs, and text elements. You can use Word Automation Services to import the HTML (or other forms of alternative content) and convert it to WordprocessingML markup that contains familiar WordprocessingML paragraphs that have styles.

You can also convert to and from formats that were used by previous versions of Word. If you are building an enterprise-class application that is used by thousands of users, you may have some users who are using Word 2007 or Word 2003 to edit Open XML documents. You can convert Open XML documents so that they contain only the markup and features that are used by either Word 2007 or Word 2003.

Limitations of Word Automation Services

Word Automation Services does not include capabilities for printing documents. However, it is straightforward to convert WordprocessingML documents to PDF or XPS and spool them to a printer.

A question that sometimes arises is whether you can use Word Automation Services without purchasing and installing SharePoint Server 2010. Word Automation Services takes advantage of facilities of SharePoint 2010, and is a feature of it. You must purchase and install SharePoint Server 2010 to use it. Word Automation Services is in the standard edition and enterprise edition.

How Word Automation Services Works

By default, Word Automation Services is a service that installs and runs with a stand-alone SharePoint Server 2010 installation. If you are using SharePoint 2010 in a server farm, you must explicitly enable Word Automation Services.

To use it, you use its programming interface to start a conversion job. For each conversion job, you specify which files, folders, or document libraries you want the conversion job to process. Based on your configuration, when you start a conversion job, it begins a specified number of conversion processes on each server. You can specify the frequency with which it starts conversion jobs, and you can specify the number of conversions to start for each conversion process. In addition, you can specify the maximum percentage of memory that Word Automation Services can use.

The configuration settings enable you to configure Word Automation Services so that it does not consume too many resources on SharePoint servers that are part of your important infrastructure. The settings that you must use are dictated by how you want to use the SharePoint Server. If it is only used for document conversions, you want to configure the settings so that the conversion service can consume most of your processor time. If you are using the conversion service for low priority background conversions, you want to configure accordingly.

Important

We recommend that the number of worker processes be set to no more than one less than the number of processors on your server. If you have a four-processor server, set the number of worker processes to three at most.

If you are running a server farm installation, the number of worker processes should be set to one less than the number of processors on the server that has the fewest processors in the server farm.

We recommend that you configure the system for a maximum of 90 document conversions per worker process per minute.
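
The arithmetic behind these recommendations can be sketched as follows. This is an illustration only. The class and method names are hypothetical, and the lower bound of one worker process is an assumption added here so that a single-processor server does not yield zero.

```csharp
using System;
using System.Linq;

// Hypothetical helper that applies the sizing recommendations above.
static class ConversionCapacity
{
    // One less than the smallest processor count in the farm.
    // Assumption: never return fewer than one worker process.
    public static int MaxWorkerProcesses(params int[] processorsPerServer)
    {
        if (processorsPerServer == null || processorsPerServer.Length == 0)
            throw new ArgumentException("At least one server is required.");
        return Math.Max(1, processorsPerServer.Min() - 1);
    }

    // Upper bound at 90 conversions per worker process per minute.
    public static int MaxConversionsPerMinute(int workerProcesses)
    {
        return workerProcesses * 90;
    }

    static void Main()
    {
        // A farm whose servers have 8, 4, and 16 processors.
        int workers = MaxWorkerProcesses(8, 4, 16);
        Console.WriteLine("Worker processes: {0}", workers);           // 3
        Console.WriteLine("Conversions per minute: {0}",
            MaxConversionsPerMinute(workers));                         // 270
    }
}
```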

In addition to writing code that starts conversion processes, you can also write code to monitor the progress of conversions. This lets you inform users or post alert results when very large conversion jobs are completed.

Word Automation Services lets you configure four additional aspects of conversions.

  1. You can limit the number of file formats that it supports.

  2. You can specify the number of documents converted by a conversion process before it is restarted. This is valuable because invalid documents can cause Word Automation Services to consume too much memory. All memory is reclaimed when the process is restarted.

  3. You can specify the number of times that Word Automation Services attempts to convert a document. By default, this is set to two so that if Word Automation Services fails in its attempts to convert a document, it attempts to convert it only one more time (in that conversion job).

  4. You can specify the length of elapsed time before conversion processes are monitored. This is valuable because Word Automation Services monitors conversions to make sure that conversions have not stalled.

Configuring Word Automation Services

Unless you have installed a server farm, by default, Word Automation Services is installed and started in SharePoint Server 2010. However, as a developer, you may want to alter its configuration so that you have a better development experience. By default, it starts conversion processes at 15 minute intervals. If you are testing code that uses it, you can benefit from setting the interval to one minute. In addition, there are scenarios where you may want Word Automation Services to use as many resources as possible. Those scenarios may also benefit from setting the interval to one minute.

To adjust the conversion process interval to one minute

  1. Start SharePoint 2010 Central Administration.

  2. On the home page of SharePoint 2010 Central Administration, click Manage Service Applications.

  3. In the Service Applications administration page, service applications are sorted alphabetically. Scroll to the bottom of the page, and then click Word Automation Services. If you are installing a server farm and have installed Word Automation Services manually, whatever you entered for the name of the service is what you see on this page.

  4. In the Word Automation Services administration page, configure the conversion throughput field to the desired frequency for starting conversion jobs.

  5. As a developer, you may also want to set the number of conversion processes, and to adjust the number of conversions per worker process. If you adjust the frequency with which conversion processes start without adjusting the other two values, and you attempt to convert many documents, you make the conversion process much less efficient. The best value for these numbers should take into consideration the power of your computer that is running SharePoint Server 2010.

  6. Scroll to the bottom of the page and then click OK.

Building a Word Automation Services application

Because Word Automation Services is a service of SharePoint Server 2010, you can only use it in an application that runs directly on a SharePoint Server. You must build the application as a farm solution. You cannot use Word Automation Services from a sandboxed solution.

A convenient way to use Word Automation Services is to write a web service that you can use from client applications.

However, the easiest way to show how to write code that uses Word Automation Services is to build a console application. You must build and run the console application on the SharePoint Server, not on a client computer. The code to start and monitor conversion jobs is identical to the code that you would write for a Web Part, a workflow, or an event handler. Showing how to use Word Automation Services from a console application enables us to discuss the API without adding the complexities of a Web Part, an event handler, or a workflow.

Important

Note that the following sample applications call Sleep(Int32) so that the examples query for status every five seconds. This is not the best approach for code that you intend to deploy on production servers. Instead, write a workflow with a delay activity.

To build the application

  1. Start Microsoft Visual Studio 2010.

  2. On the File menu, point to New, and then click Project.

  3. In the New Project dialog box, in the Installed Templates pane, expand Visual C#, and then click Windows.

  4. To the right of the Installed Templates pane, click Console Application.

  5. By default, Visual Studio creates a project that targets .NET Framework 4. However, you must target .NET Framework 3.5. From the list at the top of the New Project dialog box, select .NET Framework 3.5.

  6. In the Name box, type the name that you want to use for your project, such as FirstWordAutomationServicesApplication.

  7. In the Location box, type the location where you want to place the project.

    Figure 1. Creating a solution in the New Project dialog box


  8. Click OK to create the solution.

  9. By default, Visual Studio 2010 creates projects that target x86 CPUs, but to build SharePoint Server applications, you must target Any CPU.

  10. If you are building a Microsoft Visual C# application, in the Solution Explorer window, right-click the project, and then click Properties.

  11. In the project properties window, click Build.

  12. Point to the Platform Target list, and select Any CPU.

    Figure 2. Target Any CPU when building a C# console application


  13. If you are building a Microsoft Visual Basic application, in the project properties window, click Compile.

    Figure 3. Compile options for a Visual Basic application


  14. Click Advanced Compile Options.

    Figure 4. Advanced Compiler Settings dialog box


  15. Point to the Platform Target list, and then click Any CPU.

  16. To add a reference to the Microsoft.Office.Word.Server assembly, on the Project menu, click Add Reference to open the Add Reference dialog box.

  17. Select the .NET tab, and add the component named Microsoft Office 2010 component.

    Figure 5. Adding a reference to Microsoft Office 2010 component


  18. Next, add a reference to the Microsoft.SharePoint assembly.

    Figure 6. Adding a reference to Microsoft SharePoint


The following examples provide the complete C# and Visual Basic listings for the simplest Word Automation Services application.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using Microsoft.SharePoint;
using Microsoft.Office.Word.Server.Conversions;

class Program
{
    static void Main(string[] args)
    {
        string siteUrl = "https://localhost";
        // If you manually installed Word automation services, then replace the name
        // in the following line with the name that you assigned to the service when
        // you installed it.
        string wordAutomationServiceName = "Word Automation Services";
        using (SPSite spSite = new SPSite(siteUrl))
        {
            ConversionJob job = new ConversionJob(wordAutomationServiceName);
            job.UserToken = spSite.UserToken;
            job.Settings.UpdateFields = true;
            job.Settings.OutputFormat = SaveFormat.PDF;
            job.AddFile(siteUrl + "/Shared%20Documents/Test.docx",
                siteUrl + "/Shared%20Documents/Test.pdf");
            job.Start();
        }
    }
}

Imports Microsoft.SharePoint
Imports Microsoft.Office.Word.Server.Conversions
Module Module1
    Sub Main()
        Dim siteUrl As String = "https://localhost"
        ' If you manually installed Word automation services, then replace the name
        ' in the following line with the name that you assigned to the service when
        ' you installed it.
        Dim wordAutomationServiceName As String = "Word Automation Services"
        Using spSite As SPSite = New SPSite(siteUrl)
            Dim job As ConversionJob = New ConversionJob(wordAutomationServiceName)
            job.UserToken = spSite.UserToken
            job.Settings.UpdateFields = True
            job.Settings.OutputFormat = SaveFormat.PDF
            job.AddFile(siteUrl + "/Shared%20Documents/Test.docx", _
                siteUrl + "/Shared%20Documents/Test.pdf")
            job.Start()
        End Using
    End Sub
End Module

Note

Replace the URL assigned to siteUrl with the URL to the SharePoint site.

To build and run the example

  1. Add a Word document named Test.docx to the Shared Documents folder in the SharePoint site.

  2. Build and run the example.

  3. After waiting one minute for the conversion process to run, navigate to the Shared Documents folder in the SharePoint site, and refresh the page. The document library now contains a new PDF document, Test.pdf.
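
The same Settings object that selects PDF output also controls the other capabilities described earlier, such as converting a document so that it contains only the features of an earlier version of Word. The following variation of the first example is a sketch under stated assumptions: it assumes that the ConversionJobSettings class exposes a CompatibilityMode property with a CompatibilityMode.Word2007 value and that SaveFormat.Document selects the .docx format, so verify both against the Microsoft.Office.Word.Server.Conversions reference before relying on it. The output file name TestWord2007.docx is hypothetical.

```csharp
using System;
using Microsoft.SharePoint;
using Microsoft.Office.Word.Server.Conversions;

class ConvertForWord2007
{
    static void Main(string[] args)
    {
        // Replace with the URL of your SharePoint site.
        string siteUrl = "https://localhost";
        // Replace if you assigned a different name when installing the service.
        string wordAutomationServiceName = "Word Automation Services";
        using (SPSite spSite = new SPSite(siteUrl))
        {
            ConversionJob job = new ConversionJob(wordAutomationServiceName);
            job.UserToken = spSite.UserToken;
            job.Settings.UpdateFields = true;
            // Keep the Open XML format for the output document.
            job.Settings.OutputFormat = SaveFormat.Document;
            // Assumption: restricts markup to features that Word 2007 supports.
            job.Settings.CompatibilityMode = CompatibilityMode.Word2007;
            job.Settings.OutputSaveBehavior = SaveBehavior.AlwaysOverwrite;
            // TestWord2007.docx is a hypothetical output name.
            job.AddFile(siteUrl + "/Shared%20Documents/Test.docx",
                siteUrl + "/Shared%20Documents/TestWord2007.docx");
            job.Start();
        }
    }
}
```

Like the first example, this program must be built and run on the SharePoint Server, and the converted document appears only after the next conversion interval.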

Monitoring Conversion Status

In many scenarios, you want to monitor the status of conversions, to inform the user when the conversion process is complete, or to process the converted documents in additional ways. You can use the ConversionJobStatus class to query Word Automation Services about the status of a conversion job. You pass the name of the WordServiceApplicationProxy class as a string (by default, "Word Automation Services"), and the conversion job identifier, which you can get from the ConversionJob object. You can also pass a GUID that specifies a tenant partition. However, if the SharePoint Server farm is not configured for multiple tenants, you can pass null (Nothing in Visual Basic) as the argument for this parameter.

After you instantiate a ConversionJobStatus object, you can access several properties that indicate the status of the conversion job. The following are the three most interesting properties.

ConversionJobStatus Properties

Property     Return Value
Count        Number of documents currently in the conversion job.
Succeeded    Number of documents successfully converted.
Failed       Number of documents that failed conversion.

Whereas the first example specified a single document to convert, the following example converts all documents in a specified document library. You have the option of creating all converted documents in a different document library than the source library, but for simplicity, the following example specifies the same document library for both the input and output document libraries. In addition, the following example specifies that the conversion job should overwrite the output document if it already exists.


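// This fragment belongs inside the using (SPSite spSite = ...) block of the
// first example and also requires: using System.Threading;
Console.WriteLine("Starting conversion job");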
ConversionJob job = new ConversionJob(wordAutomationServiceName);
job.UserToken = spSite.UserToken;
job.Settings.UpdateFields = true;
job.Settings.OutputFormat = SaveFormat.PDF;
job.Settings.OutputSaveBehavior = SaveBehavior.AlwaysOverwrite;
SPList listToConvert = spSite.RootWeb.Lists["Shared Documents"];
job.AddLibrary(listToConvert, listToConvert);
job.Start();
Console.WriteLine("Conversion job started");
ConversionJobStatus status = new ConversionJobStatus(wordAutomationServiceName,
    job.JobId, null);
Console.WriteLine("Number of documents in conversion job: {0}", status.Count);
while (true)
{
    Thread.Sleep(5000);
    status = new ConversionJobStatus(wordAutomationServiceName, job.JobId,
        null);
    if (status.Count == status.Succeeded + status.Failed)
    {
        Console.WriteLine("Completed, Successful: {0}, Failed: {1}",
            status.Succeeded, status.Failed);
        break;
    }
    Console.WriteLine("In progress, Successful: {0}, Failed: {1}",
        status.Succeeded, status.Failed);
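}

' This fragment belongs inside the Using spSite block of the first example
' and also requires: Imports System.Threading
Console.WriteLine("Starting conversion job")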
Dim job As ConversionJob = New ConversionJob(wordAutomationServiceName)
job.UserToken = spSite.UserToken
job.Settings.UpdateFields = True
job.Settings.OutputFormat = SaveFormat.PDF
job.Settings.OutputSaveBehavior = SaveBehavior.AlwaysOverwrite
Dim listToConvert As SPList = spSite.RootWeb.Lists("Shared Documents")
job.AddLibrary(listToConvert, listToConvert)
job.Start()
Console.WriteLine("Conversion job started")
Dim status As ConversionJobStatus = _
    New ConversionJobStatus(wordAutomationServiceName, job.JobId, Nothing)
Console.WriteLine("Number of documents in conversion job: {0}", status.Count)
While True
    Thread.Sleep(5000)
    status = New ConversionJobStatus(wordAutomationServiceName, job.JobId, _
                                     Nothing)
    If status.Count = status.Succeeded + status.Failed Then
        Console.WriteLine("Completed, Successful: {0}, Failed: {1}", _
                          status.Succeeded, status.Failed)
        Exit While
    End If
    Console.WriteLine("In progress, Successful: {0}, Failed: {1}", _
                      status.Succeeded, status.Failed)
End While

To run this example, add some WordprocessingML documents to the Shared Documents library. When you run this example, you see output similar to the following.

Starting conversion job
Conversion job started
Number of documents in conversion job: 4
In progress, Successful: 0, Failed: 0
In progress, Successful: 0, Failed: 0
Completed, Successful: 4, Failed: 0
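
The polling loop in this example runs until every document has either succeeded or failed, so it never returns if that condition is never met. A bounded variant is sketched below. The Polling class and WaitUntil method are hypothetical helpers, not part of Word Automation Services. The delegate stands in for the status query, which in the example above would create a new ConversionJobStatus object and compare Count with Succeeded + Failed.

```csharp
using System;
using System.Threading;

// Hypothetical helper: poll a condition at a fixed interval with a timeout.
static class Polling
{
    // Calls isComplete until it returns true or the timeout elapses.
    // Returns true if the condition was met, false if the wait timed out.
    public static bool WaitUntil(Func<bool> isComplete, TimeSpan interval,
        TimeSpan timeout)
    {
        DateTime deadline = DateTime.UtcNow + timeout;
        while (true)
        {
            if (isComplete())
                return true;
            if (DateTime.UtcNow >= deadline)
                return false;
            Thread.Sleep(interval);
        }
    }

    static void Main()
    {
        // Stand-in condition that becomes true on the third check.
        int checks = 0;
        bool done = WaitUntil(() => ++checks >= 3,
            TimeSpan.FromMilliseconds(10), TimeSpan.FromSeconds(5));
        // Prints "Completed: True after 3 checks".
        Console.WriteLine("Completed: {0} after {1} checks", done, checks);
    }
}
```

As with the article's own loops, this still blocks a thread while it waits, so it suits console tools and tests rather than production server code.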

Identifying Documents That Failed to Convert

You may want to determine which documents failed conversion, perhaps to inform the user, or to take remedial action such as removing the invalid document from the input document library. You can call the GetItems() method, which returns a collection of ConversionItemInfo objects. When you call the GetItems() method, you pass a parameter that specifies whether you want to retrieve a collection of failed conversions or successful conversions. The following example shows how to do this.


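// This fragment belongs inside the using (SPSite spSite = ...) block of the
// first example and also requires: using System.Threading; and
// using System.Collections.ObjectModel;
Console.WriteLine("Starting conversion job");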
ConversionJob job = new ConversionJob(wordAutomationServiceName);
job.UserToken = spSite.UserToken;
job.Settings.UpdateFields = true;
job.Settings.OutputFormat = SaveFormat.PDF;
job.Settings.OutputSaveBehavior = SaveBehavior.AlwaysOverwrite;
SPList listToConvert = spSite.RootWeb.Lists["Shared Documents"];
job.AddLibrary(listToConvert, listToConvert);
job.Start();
Console.WriteLine("Conversion job started");
ConversionJobStatus status = new ConversionJobStatus(wordAutomationServiceName,
    job.JobId, null);
Console.WriteLine("Number of documents in conversion job: {0}", status.Count);
while (true)
{
    Thread.Sleep(5000);
    status = new ConversionJobStatus(wordAutomationServiceName, job.JobId, null);
    if (status.Count == status.Succeeded + status.Failed)
    {
        Console.WriteLine("Completed, Successful: {0}, Failed: {1}",
            status.Succeeded, status.Failed);
        ReadOnlyCollection<ConversionItemInfo> failedItems =
            status.GetItems(ItemTypes.Failed);
        foreach (var failedItem in failedItems)
            Console.WriteLine("Failed item: Name:{0}", failedItem.InputFile);
        break;
    }
    Console.WriteLine("In progress, Successful: {0}, Failed: {1}", status.Succeeded,
        status.Failed);
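}

' This fragment belongs inside the Using spSite block of the first example
' and also requires: Imports System.Threading and
' Imports System.Collections.ObjectModel
Console.WriteLine("Starting conversion job")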
Dim job As ConversionJob = New ConversionJob(wordAutomationServiceName)
job.UserToken = spSite.UserToken
job.Settings.UpdateFields = True
job.Settings.OutputFormat = SaveFormat.PDF
job.Settings.OutputSaveBehavior = SaveBehavior.AlwaysOverwrite
Dim listToConvert As SPList = spSite.RootWeb.Lists("Shared Documents")
job.AddLibrary(listToConvert, listToConvert)
job.Start()
Console.WriteLine("Conversion job started")
Dim status As ConversionJobStatus = _
    New ConversionJobStatus(wordAutomationServiceName, job.JobId, Nothing)
Console.WriteLine("Number of documents in conversion job: {0}", status.Count)
While True
    Thread.Sleep(5000)
    status = New ConversionJobStatus(wordAutomationServiceName, job.JobId, _
                                     Nothing)
    If status.Count = status.Succeeded + status.Failed Then
        Console.WriteLine("Completed, Successful: {0}, Failed: {1}", _
                          status.Succeeded, status.Failed)
        Dim failedItems As ReadOnlyCollection(Of ConversionItemInfo) = _
            status.GetItems(ItemTypes.Failed)
        For Each failedItem In failedItems
            Console.WriteLine("Failed item: Name:{0}", failedItem.InputFile)
        Next
        Exit While
    End If
    Console.WriteLine("In progress, Successful: {0}, Failed: {1}", _
                      status.Succeeded, status.Failed)
End While

To run this example, create an invalid document and upload it to the document library. An easy way to create an invalid document is to rename a WordprocessingML document, appending .zip to the file name. Then open the package and delete the main document part (document.xml), which is in the word folder of the package. Finally, rename the file again, removing the .zip extension so that it has the normal .docx extension.

When you run this example, it produces output similar to the following.

Starting conversion job
Conversion job started
Number of documents in conversion job: 5
In progress, Successful: 0, Failed: 0
In progress, Successful: 0, Failed: 0
In progress, Successful: 4, Failed: 0
In progress, Successful: 4, Failed: 0
In progress, Successful: 4, Failed: 0
Completed, Successful: 4, Failed: 1
Failed item: Name:http://intranet.contoso.com/Shared%20Documents/IntentionallyInvalidDocument.docx
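
The invalid test document described earlier can also be produced programmatically instead of by renaming the file and editing the package by hand. The following sketch uses the System.IO.Packaging API (which requires a reference to WindowsBase.dll). The class and method names are hypothetical, and the code assumes that the main document part is at the conventional location /word/document.xml.

```csharp
using System;
using System.IO;
using System.IO.Packaging; // Requires a reference to WindowsBase.dll.

// Hypothetical helper for creating an intentionally invalid test document.
static class InvalidDocumentMaker
{
    // Copies sourcePath to targetPath and deletes the main document part
    // from the copy. Returns true if the part was found and deleted.
    public static bool CreateInvalidCopy(string sourcePath, string targetPath)
    {
        File.Copy(sourcePath, targetPath, true);
        using (Package package = Package.Open(targetPath, FileMode.Open,
            FileAccess.ReadWrite))
        {
            // Assumption: the main part is at the conventional location.
            Uri mainPartUri = new Uri("/word/document.xml", UriKind.Relative);
            if (!package.PartExists(mainPartUri))
                return false;
            package.DeletePart(mainPartUri);
            return true;
        }
    }

    static void Main(string[] args)
    {
        if (args.Length != 2)
        {
            Console.WriteLine("Usage: <source.docx> <target.docx>");
            return;
        }
        bool removed = CreateInvalidCopy(args[0], args[1]);
        Console.WriteLine(removed
            ? "Main document part removed."
            : "Main document part not found; the copy is unchanged.");
    }
}
```

Upload the resulting file to the document library before you run the conversion example.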

Another approach to monitoring a conversion process is to use event handlers on a SharePoint list to determine when a converted document is added to the output document library.

Deleting Source Files after Conversion

In some situations, you may want to delete the source documents after conversion. The following example shows how to do this.


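// This fragment belongs inside the using (SPSite spSite = ...) block of the
// first example and also requires: using System.Threading; and
// using System.Collections.ObjectModel;
Console.WriteLine("Starting conversion job");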
ConversionJob job = new ConversionJob(wordAutomationServiceName);
job.UserToken = spSite.UserToken;
job.Settings.UpdateFields = true;
job.Settings.OutputFormat = SaveFormat.PDF;
job.Settings.OutputSaveBehavior = SaveBehavior.AlwaysOverwrite;
SPFolder folderToConvert = spSite.RootWeb.GetFolder("Shared Documents");
job.AddFolder(folderToConvert, folderToConvert, false);
job.Start();
Console.WriteLine("Conversion job started");
ConversionJobStatus status = new ConversionJobStatus(wordAutomationServiceName,
    job.JobId, null);
Console.WriteLine("Number of documents in conversion job: {0}", status.Count);
while (true)
{
    Thread.Sleep(5000);
    status = new ConversionJobStatus(wordAutomationServiceName, job.JobId, null);
    if (status.Count == status.Succeeded + status.Failed)
    {
        Console.WriteLine("Completed, Successful: {0}, Failed: {1}",
            status.Succeeded, status.Failed);
        Console.WriteLine("Deleting only items that successfully converted");
        ReadOnlyCollection<ConversionItemInfo> convertedItems =
            status.GetItems(ItemTypes.Succeeded);
        foreach (var convertedItem in convertedItems)
        {
            Console.WriteLine("Deleting item: Name:{0}", convertedItem.InputFile);
            folderToConvert.Files.Delete(convertedItem.InputFile);
        }
        break;
    }
    Console.WriteLine("In progress, Successful: {0}, Failed: {1}",
        status.Succeeded, status.Failed);
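}

' This fragment belongs inside the Using spSite block of the first example
' and also requires: Imports System.Threading and
' Imports System.Collections.ObjectModel
Console.WriteLine("Starting conversion job")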
Dim job As ConversionJob = New ConversionJob(wordAutomationServiceName)
job.UserToken = spSite.UserToken
job.Settings.UpdateFields = True
job.Settings.OutputFormat = SaveFormat.PDF
job.Settings.OutputSaveBehavior = SaveBehavior.AlwaysOverwrite
Dim folderToConvert As SPFolder = spSite.RootWeb.GetFolder("Shared Documents")
job.AddFolder(folderToConvert, folderToConvert, False)
job.Start()
Console.WriteLine("Conversion job started")
Dim status As ConversionJobStatus = _
    New ConversionJobStatus(wordAutomationServiceName, job.JobId, Nothing)
Console.WriteLine("Number of documents in conversion job: {0}", status.Count)
While True
    Thread.Sleep(5000)
    status = New ConversionJobStatus(wordAutomationServiceName, job.JobId, _
                                     Nothing)
    If status.Count = status.Succeeded + status.Failed Then
        Console.WriteLine("Completed, Successful: {0}, Failed: {1}", _
                          status.Succeeded, status.Failed)
        Console.WriteLine("Deleting only items that successfully converted")
        Dim convertedItems As ReadOnlyCollection(Of ConversionItemInfo) = _
            status.GetItems(ItemTypes.Succeeded)
        For Each convertedItem In convertedItems
            Console.WriteLine("Deleting item: Name:{0}", convertedItem.InputFile)
            folderToConvert.Files.Delete(convertedItem.InputFile)
        Next
        Exit While
    End If
    Console.WriteLine("In progress, Successful: {0}, Failed: {1}", _
                      status.Succeeded, status.Failed)
End While
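
The polling loop above waits indefinitely, which is fine for a console demonstration but could hang if some items never reach the succeeded or failed state. The following is a minimal sketch, not part of the original sample, of a polling helper with a time-out. To keep it self-contained, it does not reference the SharePoint assemblies: JobCounts, JobPoller, and WaitForCompletion are hypothetical names, and the readStatus delegate stands in for code that creates a new ConversionJobStatus and copies its Count, Succeeded, and Failed values.

```csharp
using System;
using System.Threading;

// Hypothetical snapshot of the job counters. In the article's code these
// values would be copied from ConversionJobStatus.Count, .Succeeded, and .Failed.
public struct JobCounts
{
    public int Total;
    public int Succeeded;
    public int Failed;

    // Same completion test that the article's polling loops use.
    public bool IsComplete
    {
        get { return Total == Succeeded + Failed; }
    }
}

public static class JobPoller
{
    // Calls readStatus repeatedly until the job is complete or the time-out
    // elapses. Returns true on completion and false on time-out; in both
    // cases 'last' holds the most recent counters that were read.
    public static bool WaitForCompletion(
        Func<JobCounts> readStatus,
        TimeSpan pollInterval,
        TimeSpan timeout,
        out JobCounts last)
    {
        DateTime deadline = DateTime.UtcNow + timeout;
        while (true)
        {
            last = readStatus();
            if (last.IsComplete)
            {
                return true;
            }
            if (DateTime.UtcNow >= deadline)
            {
                return false;
            }
            Thread.Sleep(pollInterval);
        }
    }
}
```

In the example above, readStatus could be a lambda expression that constructs a ConversionJobStatus from wordAutomationServiceName and job.JobId and returns its counters. A return value of false would then indicate that the job did not finish within the allotted time, and the source files should be left in place.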

Integrating with the Open XML SDK

The power of Word Automation Services becomes clear when you use it in combination with the Open XML SDK 2.0 for Microsoft Office. You can programmatically modify a document in a document library by using the Open XML SDK 2.0, and then use Word Automation Services to perform one of the tasks that are difficult to accomplish with the Open XML SDK alone. A common need is to programmatically generate a document, and then generate or update its table of contents. Consider the following document, which contains a table of contents.

Figure 7. Document with a table of contents

Let’s assume you want to modify this document, adding content that should be included in the table of contents. This next example takes the following steps.

  1. Opens the site and retrieves the Test.docx document by using a Collaborative Application Markup Language (CAML) query.

  2. Opens the document by using the Open XML SDK 2.0, and adds a new paragraph styled as Heading 1 at the beginning of the document.

  3. Starts a conversion job with UpdateFields set to true, converting Test.docx to TestWithNewToc.docx so that the table of contents is regenerated. It waits for the conversion to complete, and reports whether the document was converted successfully.

Console.WriteLine("Querying for Test.docx");
SPList list = spSite.RootWeb.Lists["Shared Documents"];
SPQuery query = new SPQuery();
query.ViewFields = @"<FieldRef Name='FileLeafRef' />";
query.Query =
  @"<Where>
      <Eq>
        <FieldRef Name='FileLeafRef' />
        <Value Type='Text'>Test.docx</Value>
      </Eq>
    </Where>";
SPListItemCollection collection = list.GetItems(query);
if (collection.Count != 1)
{
    Console.WriteLine("Test.docx not found");
    Environment.Exit(0);
}
Console.WriteLine("Opening");
SPFile file = collection[0].File;
byte[] byteArray = file.OpenBinary();
using (MemoryStream memStr = new MemoryStream())
{
    memStr.Write(byteArray, 0, byteArray.Length);
    using (WordprocessingDocument wordDoc =
        WordprocessingDocument.Open(memStr, true))
    {
        Document document = wordDoc.MainDocumentPart.Document;
        Paragraph firstParagraph = document.Body.Elements<Paragraph>()
            .FirstOrDefault();
        if (firstParagraph != null)
        {
            Paragraph newParagraph = new Paragraph(
                new ParagraphProperties(
                    new ParagraphStyleId() { Val = "Heading1" }),
                new Run(
                    new Text("About the Author")));
            Paragraph aboutAuthorParagraph = new Paragraph(
                new Run(
                    new Text("Eric White")));
            firstParagraph.Parent.InsertBefore(newParagraph, firstParagraph);
            firstParagraph.Parent.InsertBefore(aboutAuthorParagraph,
                firstParagraph);
        }
    }
    Console.WriteLine("Saving");
    string linkFileName = file.Item["LinkFilename"] as string;
    file.ParentFolder.Files.Add(linkFileName, memStr, true);
}
Console.WriteLine("Starting conversion job");
ConversionJob job = new ConversionJob(wordAutomationServiceName);
job.UserToken = spSite.UserToken;
// Update fields during conversion so that the table of contents is regenerated.
job.Settings.UpdateFields = true;
// Keep the output as an Open XML document (.docx) instead of another format.
job.Settings.OutputFormat = SaveFormat.Document;
job.AddFile(siteUrl + "/Shared%20Documents/Test.docx",
    siteUrl + "/Shared%20Documents/TestWithNewToc.docx");
job.Start();
Console.WriteLine("After starting conversion job");
while (true)
{
    Thread.Sleep(5000);
    Console.WriteLine("Polling...");
    ConversionJobStatus status = new ConversionJobStatus(
        wordAutomationServiceName, job.JobId, null);
    if (status.Count == status.Succeeded + status.Failed)
    {
        Console.WriteLine("Completed, Successful: {0}, Failed: {1}",
            status.Succeeded, status.Failed);
        break;
    }
}

Console.WriteLine("Querying for Test.docx")
Dim list As SPList = spSite.RootWeb.Lists("Shared Documents")
Dim query As SPQuery = New SPQuery()
query.ViewFields = "<FieldRef Name='FileLeafRef' />"
query.Query = ( _
   <Where>
       <Eq>
           <FieldRef Name='FileLeafRef'/>
           <Value Type='Text'>Test.docx</Value>
       </Eq>
   </Where>).ToString()
Dim collection As SPListItemCollection = list.GetItems(query)
If collection.Count <> 1 Then
    Console.WriteLine("Test.docx not found")
    Environment.Exit(0)
End If
Console.WriteLine("Opening")
Dim file As SPFile = collection(0).File
Dim byteArray As Byte() = file.OpenBinary()
Using memStr As MemoryStream = New MemoryStream()
    memStr.Write(byteArray, 0, byteArray.Length)
    Using wordDoc As WordprocessingDocument = _
        WordprocessingDocument.Open(memStr, True)
        Dim document As Document = wordDoc.MainDocumentPart.Document
        Dim firstParagraph As Paragraph = _
            document.Body.Elements(Of Paragraph)().FirstOrDefault()
        If firstParagraph IsNot Nothing Then
            Dim newParagraph As Paragraph = New Paragraph( _
                New ParagraphProperties( _
                    New ParagraphStyleId() With {.Val = "Heading1"}), _
                New Run( _
                    New Text("About the Author")))
            Dim aboutAuthorParagraph As Paragraph = New Paragraph( _
                New Run( _
                    New Text("Eric White")))
            firstParagraph.Parent.InsertBefore(newParagraph, firstParagraph)
            firstParagraph.Parent.InsertBefore(aboutAuthorParagraph, _
                                               firstParagraph)
        End If
    End Using
    Console.WriteLine("Saving")
    Dim linkFileName As String = file.Item("LinkFilename")
    file.ParentFolder.Files.Add(linkFileName, memStr, True)
End Using
Console.WriteLine("Starting conversion job")
Dim job As ConversionJob = New ConversionJob(wordAutomationServiceName)
job.UserToken = spSite.UserToken
' Update fields during conversion so that the table of contents is regenerated.
job.Settings.UpdateFields = True
' Keep the output as an Open XML document (.docx) instead of another format.
job.Settings.OutputFormat = SaveFormat.Document
job.AddFile(siteUrl + "/Shared%20Documents/Test.docx", _
    siteUrl + "/Shared%20Documents/TestWithNewToc.docx")
job.Start()
Console.WriteLine("After starting conversion job")
While True
    Thread.Sleep(5000)
    Console.WriteLine("Polling...")
    Dim status As ConversionJobStatus = New ConversionJobStatus( _
        wordAutomationServiceName, job.JobId, Nothing)
    If status.Count = status.Succeeded + status.Failed Then
        Console.WriteLine("Completed, Successful: {0}, Failed: {1}", _
                          status.Succeeded, status.Failed)
        Exit While
    End If
End While

Running this example against a document similar to the one shown earlier in this section produces a new document, as shown in Figure 8.

Figure 8. Document with updated table of contents

Conclusion

The Open XML SDK 2.0 is a powerful tool for building server-side document generation and document processing systems. However, some aspects of document manipulation are difficult with the SDK alone, such as converting documents to other formats and updating fields, tables of contents, and other dynamic content. Word Automation Services fills this gap with a high-performance solution that can scale out to your requirements. Using the Open XML SDK 2.0 in combination with Word Automation Services enables many scenarios that are difficult when using only the Open XML SDK 2.0.

Additional Resources
