Architecture of External BLOB Storage
Published: May 2010
Before the introduction of the external binary large object (BLOB) store provider (EBS Provider), the semantics of BLOB storage routed the binary data stream associated with a SharePoint file to the Microsoft SQL Server content database, which it shared with the site's structured data. Under that scenario, when you invoked a Save command on the SharePoint file, a parser in the Save path recognized the Save command and promoted a parcel of metadata out of the file stream. Then the metadata, along with the BLOB associated with the file, was stored in the SQL Server content database.
However, after you install, configure, and enable the EBS Provider, the semantics change considerably. (See Figure 1.) Now the middle-tier storage access stack, rather than the Web application, routes BLOB data streams. The stack uses the EBS Provider to store BLOB data in the external BLOB store, and the provider returns metadata that allows the stack to retrieve the BLOB on demand. Importantly, the SharePoint Foundation object model is completely insulated from the semantics of the EBS Provider and from the existence of an external BLOB store. This separation ensures that existing applications and services are fully agnostic to storage implementations. Only the storage access stack is aware of the existence and the semantics of the external BLOB store.
The EBS Provider is your custom implementation of the provider interface, the unmanaged ISPExternalBinaryProvider, and is integrated into the storage access stack as a COM component.
The provider interface exposes two methods: StoreBinary and RetrieveBinary. The storage access stack recognizes the Save and Open commands and, when the commands are associated with BLOB files, invokes the StoreBinary and RetrieveBinary methods, respectively.
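The two-method contract can be modeled outside COM to make the dispatch concrete. The following is a minimal Python sketch, not the unmanaged ISPExternalBinaryProvider interface itself: the class names, the snake_case method names, the simplified signatures, and the handle_command helper are all hypothetical, and a dictionary stands in for the external BLOB store.

```python
from abc import ABC, abstractmethod
import uuid


class ExternalBinaryProvider(ABC):
    """Python stand-in for the two-method provider contract (hypothetical names)."""

    @abstractmethod
    def store_binary(self, data: bytes) -> bytes:
        """Persist the stream externally and return a provider-chosen BLOB ID."""

    @abstractmethod
    def retrieve_binary(self, blob_id: bytes) -> bytes:
        """Return the stream previously stored under blob_id."""


class InMemoryProvider(ExternalBinaryProvider):
    """Toy provider: a dict plays the role of the external BLOB store."""

    def __init__(self):
        self._store = {}

    def store_binary(self, data: bytes) -> bytes:
        blob_id = uuid.uuid4().bytes  # unique 16-byte identifier
        self._store[blob_id] = data
        return blob_id

    def retrieve_binary(self, blob_id: bytes) -> bytes:
        return self._store[blob_id]


def handle_command(provider: ExternalBinaryProvider, command: str, payload: bytes) -> bytes:
    """Storage access stack role: map Save to store_binary and Open to retrieve_binary."""
    if command == "Save":
        return provider.store_binary(payload)
    if command == "Open":
        return provider.retrieve_binary(payload)
    raise ValueError(f"unsupported command: {command}")
```

In this model, a Save returns the identifier that a later Open must supply, which mirrors the division of labor between the stack and the provider.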
BLOB data is written to the external BLOB store in response to a Save command. Figure 2 is a functional illustration of how a Save command from the front-end Web application is routed through the storage access stack to the EBS Provider, which stores the BLOB externally. Finally, a record of the BLOB's location is persisted as metadata in the content database.
When a Save command is invoked on the front-end Web application, the application middle-tier logic provides business logic validation, including antivirus checks, property promotion, rights management, and other pre-processing tasks. Then the storage access stack recognizes that the Save command is for a BLOB file. The provider interface passes the request to the EBS Provider, and the EBS Provider saves the binary stream to the external BLOB store.
The EBS Provider then returns the BLOB ID (BlobId) to the interface, and the interface passes the ID to the storage access stack. The access stack then persists the ID and the BLOB metadata in the content database.
The EBS Provider is responsible for returning a unique identifier ([Out] ppbBinaryId) for the BLOB file that it places in the external BLOB store.
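The Save path described above can be sketched as follows. This is an illustrative Python model under stated assumptions, not SharePoint code: the two dictionaries stand in for the external BLOB store and the content database, and the function names, the row layout, and the sample URL are hypothetical.

```python
import uuid

external_blob_store = {}   # stands in for the external BLOB store
content_database = {}      # stands in for the SQL Server content database


def provider_store_binary(stream: bytes) -> bytes:
    """EBS Provider role: write the stream externally and return a unique ID."""
    blob_id = uuid.uuid4().bytes
    external_blob_store[blob_id] = stream
    return blob_id


def save_file(url: str, stream: bytes, metadata: dict) -> None:
    """Storage access stack role for a Save command on a BLOB file."""
    # Middle-tier validation (antivirus checks, property promotion, and so on)
    # is assumed to have completed before this point.
    blob_id = provider_store_binary(stream)
    # Only the ID and the metadata reach the content database, never the bytes.
    content_database[url] = {"blob_id": blob_id, "metadata": dict(metadata)}


save_file("/docs/report.docx", b"binary payload", {"Title": "Report"})
```

After the call, the content database row holds only the identifier and metadata, while the bytes live in the external store under that identifier.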
Retrieving BLOB data from the external BLOB store is the reverse of the Save operation. When the storage access stack recognizes an Open command on a file that is associated with a BLOB, it invokes the RetrieveBinary method on the provider interface to retrieve the file from the external BLOB store. Figure 3 is a functional illustration of how, in response to an Open command from the front-end Web application, the storage access stack retrieves the BLOB ID from the content database and then uses the ID to retrieve the binary stream from the external BLOB store.
The storage access stack retrieves the metadata and BlobId by sending a Transact-SQL query to the content database. It then passes the returned BlobId to the EBS Provider by calling the RetrieveBinary method on the ISPExternalBinaryProvider interface, so that the provider can fetch the appropriate binary file from the external BLOB store. The method returns an ILockBytes interface to the storage access stack.
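The Open path can be sketched in the same spirit. In this hedged Python model, an in-memory SQLite database stands in for the SQL Server content database, io.BytesIO stands in for the ILockBytes interface, and the table schema, function names, and sample values are assumptions made for illustration only.

```python
import io
import sqlite3

# SQLite stands in for the content database (hypothetical schema).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE docs (url TEXT PRIMARY KEY, title TEXT, blob_id BLOB)")
db.execute(
    "INSERT INTO docs VALUES (?, ?, ?)",
    ("/docs/report.docx", "Report", b"\x01\x02"),
)

# A dict stands in for the external BLOB store, keyed by BLOB ID.
external_blob_store = {b"\x01\x02": b"binary payload"}


def provider_retrieve_binary(blob_id: bytes) -> io.BytesIO:
    """EBS Provider role: return a byte-stream object for the stored BLOB."""
    return io.BytesIO(external_blob_store[blob_id])


def open_file(url: str):
    """Storage access stack role for an Open command on a BLOB file."""
    # Step 1: query the content database for the metadata and the BLOB ID.
    row = db.execute(
        "SELECT title, blob_id FROM docs WHERE url = ?", (url,)
    ).fetchone()
    if row is None:
        raise FileNotFoundError(url)
    title, blob_id = row
    # Step 2: hand the ID to the provider, which fetches the bytes externally.
    return title, provider_retrieve_binary(blob_id)


title, stream = open_file("/docs/report.docx")
```

The caller receives the metadata from the database and a readable stream from the provider, without ever addressing the external store directly.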
As with the StoreBinary method, the EBS Provider is responsible for logging retrieval events. SharePoint Foundation logs unexpected HRESULT returns, but otherwise it acts as though returns are simply S_OK or E_FAIL.
EBS Provider errors are mapped by the storage access stack to known error codes.
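One way to picture the treatment of provider return values is a function that logs anything other than S_OK or E_FAIL and then reduces it to one of those two. The sketch below is an assumption about how such a reduction could work, not the documented mapping: the function name is hypothetical, and the rule that the severity bit decides the outcome is an illustrative choice. The numeric values of S_OK and E_FAIL are the standard COM constants.

```python
import logging

S_OK = 0x00000000
E_FAIL = 0x80004005

log = logging.getLogger("ebs_sketch")


def normalize_hresult(hr: int) -> int:
    """Collapse a provider HRESULT to S_OK or E_FAIL, logging anything else."""
    hr &= 0xFFFFFFFF  # treat the value as an unsigned 32-bit HRESULT
    if hr in (S_OK, E_FAIL):
        return hr
    log.warning("unexpected HRESULT from provider: 0x%08X", hr)
    # Assumption: the severity bit (bit 31) decides success versus failure.
    return E_FAIL if hr & 0x80000000 else S_OK
```

For example, under this assumption a provider that returns E_ACCESSDENIED (0x80070005) would produce a log entry and be handled as E_FAIL.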