Performance Issues When Synchronizing with SharePoint Foundation
Last modified: October 08, 2009
Applies to: SharePoint Foundation 2010
There are certain performance issues to keep in mind when you are synchronizing with a server. These include items that affect performance, such as latency, throughput, bandwidth, paging, and filtering and ordering to return specific datasets.
Latency, the time delay between when the user makes a request and when they receive information back from the server, is very important to users. There are also often limits on the amount of time allowed to process a request on a database or on a front-end Web server, as well as limits on the size of the request itself, so that extremely long requests can turn into denied requests.
Using the rowLimit parameter on GetListItemChangesSinceToken to limit the amount of data requested each time is crucial for the above reasons, but be aware that using this parameter also increases the total time required for the synchronization process to complete.
Reducing the total number of cycles required to process a request helps performance by reducing latency. However, with multiple clients, it is more important to reduce the adverse effects one client has on the others. Most of the time, it is easier, simpler, safer, and more effective to require that the server perform some processing than to implement the same processing on multiple clients. However, to increase throughput, it is almost always better to process on the client. Although the server will likely have more processing power, the client will likely have more available CPU time. Data requests to the server from a synchronization client should be as small and simple as possible.
When performing a full synchronization (no change token), the client should request a maximum number of items returned per page by using the rowLimit parameter. If the filtered number of items in the list is greater than the maximum number of items returned per page, the server will return a ListItemCollectionPositionNext attribute to be used to request the next page.
The current change token of the list is returned only on the first page, so that changes made to the first page are not lost. The client stores the change token from the first page for a subsequent incremental synchronization.
Secondary pages do not include list and global properties such as permissions, alternate URLs, and Time to Live (TTL) values.
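The paging behavior described above can be sketched as a simple client loop. This is only an illustration under stated assumptions: fetch_page and make_fake_server are hypothetical names, the tuple returned by fetch_page is an invented shape standing in for a parsed GetListItemChangesSinceToken response, and the in-memory server exists only so that the sketch can run without SharePoint.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical page shape: (items, change_token, next_position).
# change_token is populated only on the first page; next_position stands in
# for the ListItemCollectionPositionNext value and is None on the last page.
Page = Tuple[List[dict], Optional[str], Optional[str]]


def full_sync(fetch_page: Callable[[int, Optional[str]], Page],
              row_limit: int) -> Tuple[List[dict], Optional[str]]:
    """Fetch every page of a full synchronization.

    Returns all items plus the change token taken from the first page,
    which the client keeps for the next incremental synchronization.
    """
    items: List[dict] = []
    stored_token: Optional[str] = None
    position: Optional[str] = None
    first_page = True
    while True:
        page_items, change_token, next_position = fetch_page(row_limit, position)
        if first_page:
            # Only the first page carries the change token.
            stored_token = change_token
            first_page = False
        items.extend(page_items)
        if next_position is None:
            break  # no further pages
        position = next_position
    return items, stored_token


def make_fake_server(all_items: List[dict], token: str):
    """In-memory stand-in for the server (assumption, for demonstration only)."""
    def fetch_page(row_limit: int, position: Optional[str]) -> Page:
        start = int(position) if position is not None else 0
        chunk = all_items[start:start + row_limit]
        end = start + len(chunk)
        next_position = str(end) if end < len(all_items) else None
        change_token = token if start == 0 else None
        return chunk, change_token, next_position
    return fetch_page
```

A real client would replace fetch_page with a function that calls the Web service and parses the response. The loop itself only depends on the two facts stated above: the change token comes from the first page, and a next-page position is present whenever more pages remain.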
Using the Row Limit
rowLimit is also supported on incremental synchronization (change token supplied), but for an incremental synchronization, this parameter limits the processing of the internal change log. Also, there is an internal limit of 100 rows per page. Although the client can be sure the number of items returned will never be greater than that limit, in certain circumstances all changes may not have been synchronized, even if the number of items returned is smaller than the limit. This occurs when the server stops processing the change log as soon as it reaches a number of updates that is equal to the limit. In that case, the server returns the MoreChanges attribute to indicate there are more changes in the change log. Instead of waiting for the next synchronization update, the client should request more changes immediately by using the returned change token.
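As a minimal sketch of the client behavior just described, the loop below keeps requesting changes while the server reports that more remain. The names fetch_changes and make_fake_change_log are hypothetical, and the returned tuple is an assumed shape for a parsed response, not the actual Web service output.

```python
from typing import Callable, List, Tuple

# Hypothetical response shape: (changed_items, new_change_token, more_changes).
# more_changes stands in for the MoreChanges attribute.
ChangeBatch = Tuple[List[dict], str, bool]


def incremental_sync(fetch_changes: Callable[[str, int], ChangeBatch],
                     change_token: str,
                     row_limit: int = 100) -> Tuple[List[dict], str]:
    """Drain the change log, starting from a stored change token.

    The loop is driven by the more_changes flag rather than by the number
    of items returned, because a batch smaller than the limit does not
    guarantee that every change has been synchronized.
    """
    changes: List[dict] = []
    while True:
        batch, change_token, more_changes = fetch_changes(change_token, row_limit)
        changes.extend(batch)
        if not more_changes:
            # Keep the final token for the next incremental synchronization.
            return changes, change_token


def make_fake_change_log(log: List[dict]):
    """In-memory stand-in for the server change log (assumption, demo only)."""
    def fetch_changes(token: str, row_limit: int) -> ChangeBatch:
        start = int(token)
        batch = log[start:start + row_limit]
        end = start + len(batch)
        return batch, str(end), end < len(log)
    return fetch_changes
```

In this sketch the token is just an index into a list. A real change token is opaque and should be passed back to the server unchanged.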
The rowLimit affects incremental synchronizations as follows:
There is an internal limit of 100 rows per page. There is no modified time index that can be used to filter the items returned. In addition, SQL Server has a limit of 160 on the number of OR clauses in a query, and as a query approaches this number it starts performing poorly. A limit of 100 leaves a potential extra 60 for the filter requested by the client.
We could have made several separate SQL queries, but that would imply supporting all ordering and filtering on the middle tier.
For this reason, the server must return a change token that is not current, so that the extra changes can be processed on a separate call. The server could still look at the entire change log to better determine the latest point at which the number of items returned would be smaller than the limit, but even this would not be accurate without filtering on the middle tier.
When you apply filtering, you can return a specific set of items in a list, rather than the entire list. The two most common scenarios in which filtering is used are folder synchronization, where the user gets only the items inside a folder, and certain Group Board scenarios, where the user gets only the items that are associated with that user.
You can filter by using the contains parameter or the query parameter. The contains parameter is more restrictive because it is basically the Where clause of a Collaborative Application Markup Language (CAML) query, while the query parameter is the full query. The contains parameter is safer to use because certain scenarios can be optimized.
The query parameter is more powerful and flexible than the contains parameter, but you must understand how it can affect performance. Some of the ways in which the structure of the query parameter can affect performance include:
A client should avoid filtering by using a nonindexed column. Otherwise, fetching a page requires a scan of the entire list until the number of items requested is found.
A client should also avoid ordering by a column unless that column is indexed. Otherwise, fetching a page will, at a minimum, require a sort on the entire filtered dataset.
If the filter is not on the same indexed column as the order, then SQL Server may still scan the entire list to avoid sorting the filtered dataset.
An incremental synchronization has an implicit filter. You can also request items with a specific identifier (ID). In this case, the client should always request items ordered by ID only. You can filter by other categories as well, because the dataset is restricted to a maximum of 100.
To filter by folder, you can use the Folder query option, but the list should be ordered by FileLeafRef. A recursive query should first be ordered by FileDirRef as well.
There is also a way to filter by multiple folders by using code such as the following.
<Or>
  <BeginsWith>
    <FieldRef Name="FileRef"/>
    <Value Type="Note">Shared Documents/folder1/</Value>
  </BeginsWith>
  <BeginsWith>
    <FieldRef Name="FileRef"/>
    <Value Type="Note">Shared Documents/folder2/</Value>
  </BeginsWith>
</Or>
This will synchronize the full contents of folder1 and folder2.
The client should use this with the contains parameter and add the following query option.
This will ensure that the SQL Server query is optimized appropriately by ordering it as FileDirRef, FileLeafRef and constraining the right columns.
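The multiple-folder filter shown earlier can also be generated rather than written by hand. The following is a hedged sketch that uses only the Python standard library. The helper names begins_with and folders_filter are hypothetical. Because a CAML Or element takes exactly two operands, the sketch nests Or elements when more than two folders are supplied.

```python
import xml.etree.ElementTree as ET
from typing import List


def begins_with(folder_url: str) -> ET.Element:
    """Build one <BeginsWith> clause that matches FileRef against a folder URL."""
    node = ET.Element("BeginsWith")
    ET.SubElement(node, "FieldRef", {"Name": "FileRef"})
    value = ET.SubElement(node, "Value", {"Type": "Note"})
    value.text = folder_url  # special characters are escaped on serialization
    return node


def folders_filter(folder_urls: List[str]) -> str:
    """Return a CAML fragment that matches items under any of the given folders.

    One folder yields a bare <BeginsWith>; each additional folder wraps the
    clause built so far in a new two-operand <Or>.
    """
    if not folder_urls:
        raise ValueError("at least one folder URL is required")
    nodes = [begins_with(url) for url in folder_urls]
    combined = nodes[0]
    for node in nodes[1:]:
        parent = ET.Element("Or")
        parent.append(combined)
        parent.append(node)
        combined = parent
    return ET.tostring(combined, encoding="unicode")
```

Calling folders_filter(["Shared Documents/folder1/", "Shared Documents/folder2/"]) produces a fragment with the same structure as the hand-written filter. Because SQL Server performs poorly as the number of OR clauses in a query grows, it is advisable to keep the folder list short.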