Document Parsing and Content Types

Last modified: November 01, 2010

Applies to: SharePoint Foundation 2010

When Microsoft SharePoint Foundation invokes a document parser to promote document properties, the parser writes all document properties to an instance of the IParserPropertyBag interface. SharePoint Foundation then determines which properties in the property bag match columns on the document library. If the property bag indicates that the document has an assigned content type, and the document library supports that content type, SharePoint Foundation promotes the document properties that match the columns included in the content type.

For more information, see Document Property Promotion and Demotion.
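The matching step described above can be sketched in a few lines of Python. This is only an illustration of the filtering idea, not SharePoint Foundation code: the function name promote_properties, the dictionary standing in for the property bag, and the rule that a content type further narrows the library's columns are all assumptions made for the example.

```python
def promote_properties(property_bag, library_columns, content_type_columns=None):
    """Return the parsed properties that would be promoted (hypothetical model).

    property_bag: dict of property name -> value, standing in for the bag
        the parser fills.
    library_columns: column names defined on the document library.
    content_type_columns: column names in the document's content type, or
        None when the document has no supported content type.
    """
    # Start from the columns that exist on the library.
    eligible = set(library_columns)

    # Assumption: a supported content type narrows promotion to its own columns.
    if content_type_columns is not None:
        eligible &= set(content_type_columns)

    # Properties with no matching column are not promoted.
    return {name: value for name, value in property_bag.items() if name in eligible}
```

In this model, a property that has no matching column is simply left out of the result; the sketch says nothing about what the real parser does with it.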

Using the document parser interface, document parsers can access the content type that is assigned to a document and store the content type in the document itself. In addition, document parsers can update the content type definition that is stored in a document so that it matches the version of the content type definition that is used by a list or document library.

When SharePoint Foundation invokes the parser to parse a document, and the parser writes the document's content type to the property bag object as a document property, SharePoint Foundation compares the content type ID in the document with the content type IDs that are associated with the document library to which the document is being uploaded. If the document's content type is associated with that document library, SharePoint Foundation promotes the appropriate document properties and saves the document. SharePoint Foundation also updates the content type schema in the property bag object and expects the parser to update any content type schema that is embedded in the document.
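As a rough illustration of the case where the document's content type is associated with the library, the following Python sketch models the ID comparison and the schema update. The keys "ContentTypeId" and "ContentTypeSchema", the function name sync_schema, and the exact-match comparison of IDs are placeholders chosen for this example; the text above does not specify how the property bag names these values or how IDs are compared.

```python
def sync_schema(property_bag, library_content_types):
    """Update the schema in the bag if the library knows the content type.

    property_bag: dict standing in for the parser's property bag.
    library_content_types: dict of content type ID -> current schema for
        the content types associated with the library.
    Returns True when the document's content type is associated with the
    library and the schema in the bag was refreshed, otherwise False.
    """
    content_type_id = property_bag.get("ContentTypeId")  # hypothetical key
    if content_type_id not in library_content_types:
        return False

    # The library's version of the schema replaces the one from the document;
    # the parser would then be expected to write it back into the file.
    property_bag["ContentTypeSchema"] = library_content_types[content_type_id]
    return True
```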

In some cases, however, the document’s content type is not associated with the document library to which the user is uploading the document. For example, the user might have created the document from a document template that contained a content type, or the user might have moved the document from one document library to another.

If the document’s content type is not associated with the document library, SharePoint Foundation takes the following actions:

  • If the document contains a document property for content type, but that document property is empty, SharePoint Foundation invokes the parser to demote the default list content type for the document library into the document. SharePoint Foundation then promotes the document properties that match columns in the default list content type and stores the document.

    This occurs if the document has not yet been assigned a content type.

  • If the document is assigned a content type that is not associated with the document library, SharePoint Foundation determines whether the document library allows any content type. If it does, SharePoint Foundation leaves the document’s content type unchanged. SharePoint Foundation does not promote the document content type; however, it does promote any document properties that match document library columns.

    You can set lists to allow any content type. To do this, add the Unknown Document Type content type to the list. Documents of any content type can then be uploaded to the list without having their content types overwritten. This enables users to move a document to the list without losing the document's metadata, as would occur if the content type were overwritten.

  • If the document is assigned a content type that is not associated with the document library, and the document library does not allow any content type, SharePoint Foundation invokes the parser to demote the default list content type for the document library into the document. SharePoint Foundation then promotes the document properties that match columns in the default list content type and stores the document.
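The cases above, together with the case in which the content type is associated with the library, can be summarized as a small decision function. This Python sketch is a simplified model and not SharePoint Foundation code: the Action names, the resolve_content_type function, and the exact-match comparison of content type IDs are assumptions made for illustration.

```python
from enum import Enum


class Action(Enum):
    USE_DOCUMENT_TYPE = "promote using the document's content type"
    DEMOTE_DEFAULT_TYPE = "demote the library's default content type"
    KEEP_DOCUMENT_TYPE = "leave content type unchanged, promote matching columns"


def resolve_content_type(doc_type_id, library_type_ids, default_type_id, allows_any_type):
    """Return (action, effective content type ID) for an uploaded document.

    doc_type_id: content type ID found in the document; empty or None when
        the property exists but no content type has been assigned.
    library_type_ids: IDs of content types associated with the library.
    default_type_id: ID of the library's default list content type.
    allows_any_type: True when the library allows any content type.
    """
    # No content type assigned yet: the default list content type is demoted.
    if not doc_type_id:
        return Action.DEMOTE_DEFAULT_TYPE, default_type_id

    # Content type is associated with the library: use it as is.
    if doc_type_id in library_type_ids:
        return Action.USE_DOCUMENT_TYPE, doc_type_id

    # Unassociated type, but the library allows any content type.
    if allows_any_type:
        return Action.KEEP_DOCUMENT_TYPE, doc_type_id

    # Unassociated type and the library is restrictive: fall back to default.
    return Action.DEMOTE_DEFAULT_TYPE, default_type_id
```

The empty-property check comes first so that a document with no assigned content type receives the default even when the library allows any content type, which matches the first case in the list.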

The following figure details the actions taken by SharePoint Foundation if the parser includes the document's content type as a document property in the property bag that is returned to SharePoint Foundation when the parser parses a document.

Logic flow of document parser process

SharePoint Foundation never promotes a document's content type onto a document library.
