Page Publishing Using SharePoint Document Converters (ECM)

Last modified: September 21, 2009

Applies to: SharePoint Server 2010

The Web content management feature in Microsoft SharePoint Server 2010 includes the ability to convert documents into Web pages that can be published to a specified location and updated from the source document as necessary. Users can author documents in the client application of their choice, taking advantage of the features that application offers; store those documents in SharePoint Server 2010; and then have SharePoint Server 2010 generate a publishing page from each document.

The document to page conversion process is built on the document converter infrastructure in SharePoint Server 2010. You start a conversion through the user interface or programmatically, by using either the Add method or the UpdateContentFromSourceDocument method. The document to convert is passed to the DocConversionLoadBalancerService, along with an optional XML file containing converter settings. The DocConversionLoadBalancerService in turn calls the DocConversionLauncherService. The DocConversionLauncherService launches the specified converter, which converts the document into HTML by using the configuration settings passed to it. The converter produces a fully formed HTML document.

Finally, the document to page converter infrastructure performs post-processing that accomplishes the following:

  • Separation of the contents of the <body> tag and any inline <style> tags from the HTML generated by the converter.

  • Creation of a new page, or update of an existing page, at the specified location, using the selected page layout.

  • Placement of the <body> and <style> content into the specified fields in that page.
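The following Python sketch illustrates the first post-processing step. It uses the standard html.parser module to separate the contents of the <body> element, and the text of any <style> elements, from a converter's HTML output. The names ConverterOutputSplitter and split_converter_html are hypothetical. This is not the SharePoint Server 2010 implementation, and it assumes reasonably well-formed converter output.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class ConverterOutputSplitter(HTMLParser):
    """Hypothetical sketch: collect <body> markup and <style> text separately."""

    def __init__(self) -> None:
        # Keep entity references such as &amp; exactly as written in the markup.
        super().__init__(convert_charrefs=False)
        self.body_parts: List[str] = []
        self.style_blocks: List[str] = []
        self._in_body = False
        self._style_buf: Optional[List[str]] = None  # not None while inside <style>

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self._in_body = True
        elif tag == "style":
            self._style_buf = []  # start collecting a new style block
        elif self._in_body:
            self.body_parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are copied verbatim.
        if self._in_body:
            self.body_parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "body":
            self._in_body = False
        elif tag == "style":
            if self._style_buf is not None:
                self.style_blocks.append("".join(self._style_buf).strip())
            self._style_buf = None
        elif self._in_body:
            self.body_parts.append(f"</{tag}>")

    def handle_data(self, data):
        if self._style_buf is not None:
            self._style_buf.append(data)  # style text never goes into the body
        elif self._in_body:
            self.body_parts.append(data)

    def handle_entityref(self, name):
        if self._in_body and self._style_buf is None:
            self.body_parts.append(f"&{name};")

    def handle_charref(self, name):
        if self._in_body and self._style_buf is None:
            self.body_parts.append(f"&#{name};")

    def handle_comment(self, data):
        if self._in_body and self._style_buf is None:
            self.body_parts.append(f"<!--{data}-->")


def split_converter_html(html: str) -> Tuple[str, str]:
    """Return (body_markup, style_text) for one converter output document."""
    parser = ConverterOutputSplitter()
    parser.feed(html)
    parser.close()
    body = "".join(parser.body_parts).strip()
    styles = "\n".join(block for block in parser.style_blocks if block)
    return body, styles
```

In this sketch, style blocks found anywhere in the document are removed from the body markup and returned separately. The two strings can then be written into separate page fields, which is the idea behind the third post-processing step.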

The following figure shows the document to page conversion process.

Document to page conversion process

The post-processing steps of publishing a page and inserting the HTML content generated by the converter into that page differ from, and replace, the post-processing procedures of standard document converters.

Post-processing for standard document converters includes copying metadata from the original document to the converted document, and placing the converted document directly into the same document library as the original document. For more information about the standard document conversion process, see Document Converters in SharePoint Server 2010 (ECM).

SharePoint Server 2010 includes four document to page converters:

  • Docx file to Web page

  • Docm file to Web page

  • Microsoft Office InfoPath file to Web page

  • Generic XML file to Web page: converts an XML file to a Web page by applying an XSLT transformation that the user specifies. The XSLT must transform the XML into HTML.

Relationship Between Original Document and Published Page

In document to page conversions, unlike standard document conversions, both the published page and the original document retain object model properties that link them to each other. The published page includes a property that represents the original document on which it is based; similarly, the original document includes a property that represents the last published page generated from it.

You can use a single converter to publish multiple pages from the same original document. You can also use multiple converters to publish pages from the same original document. You can even publish multiple pages from the same original document, and into the same document library, if you specify different conversion settings.

Be aware that the original document stores a link only to the most recently published page. It does not keep links to pages published from it earlier.
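The linking behavior described in this section can be modeled with a minimal Python sketch. The SourceDocument and PublishedPage classes and the publish function are hypothetical stand-ins for the real object model properties. The sketch shows only that every page keeps a link back to its source, while the source keeps a single link that each new publication overwrites.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SourceDocument:
    url: str
    # Hypothetical: only one forward link is kept, to the latest page.
    last_published_page: Optional["PublishedPage"] = None


@dataclass
class PublishedPage:
    url: str
    converter: str
    # Each published page keeps its own link back to the original document.
    # Excluded from repr/compare to avoid walking the two-way link.
    source_document: SourceDocument = field(repr=False, compare=False)


def publish(doc: SourceDocument, converter: str, page_url: str) -> PublishedPage:
    """Hypothetical helper: create a page and record it on the source document."""
    page = PublishedPage(url=page_url, converter=converter, source_document=doc)
    doc.last_published_page = page  # overwrites any link to an earlier page
    return page
```

After two publications from the same document, both pages still point back to the document, but the document points only to the second page.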

Synchronous and Asynchronous Conversion

You can choose, either through the user interface or programmatically, whether to run a document to page conversion immediately (synchronously) or as an asynchronous timer job.

Be aware that the speed of a conversion can be affected by the following:

  • The number of document converter launcher services

  • The volume of conversion requests

Suppose you call a converter synchronously and the first launcher is already busy converting another file. In that case, the call is routed to the second launcher, and so on, until an available launcher is found. However, if your only launcher is busy, or if all the launchers are in use, the synchronous conversion call fails. SharePoint Server 2010 handles this situation by resubmitting the conversion as an asynchronous job. Your conversion may then take longer, because it does not proceed until the timer job starts, finds the conversion request, and then performs the conversion. This resubmission occurs whether the conversion was initiated through the user interface or programmatically.
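The routing and fallback behavior described above can be sketched in Python as follows. Launcher, ConversionRouter, and timer_job_queue are hypothetical names. The sketch ignores real concerns such as load balancing across servers, timeouts, and concurrency. It shows only the order in which launchers are tried, and the resubmission of the request as an asynchronous job when no launcher is free.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Launcher:
    name: str
    busy: bool = False  # True while this launcher is converting another file


@dataclass
class ConversionRouter:
    """Hypothetical model of synchronous routing with asynchronous fallback."""

    launchers: List[Launcher]
    timer_job_queue: List[str] = field(default_factory=list)

    def _first_free(self) -> Optional[Launcher]:
        # Try the first launcher, then the second, and so on.
        for launcher in self.launchers:
            if not launcher.busy:
                return launcher
        return None

    def convert_synchronously(self, document: str) -> Optional[str]:
        """Return the name of the launcher used, or None if resubmitted."""
        launcher = self._first_free()
        if launcher is None:
            # The synchronous call fails; resubmit it as an asynchronous job.
            self.timer_job_queue.append(document)
            return None
        return launcher.name

    def run_timer_job(self) -> List[Tuple[str, str]]:
        """Simulate the timer job picking up queued requests when launchers free up."""
        completed = []
        while self.timer_job_queue:
            launcher = self._first_free()
            if launcher is None:
                break  # still no capacity; leave the rest for the next run
            completed.append((self.timer_job_queue.pop(0), launcher.name))
        return completed
```

A request that arrives while every launcher is busy returns None from convert_synchronously. That request is completed only on a later timer job run, which reflects the extra delay described above.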