By Kevin Idzi, Software
Development Engineer, Microsoft Corporation
Introduction
Integration Services package developers often ask about the
best way to get data out of or into a SharePoint list. The SharePoint List
Source and Destination Sample available on the Microsoft SQL Server
Community Samples: Integration Services page on Codeplex provides an
optimized solution with an easy-to-use interface. The sample also includes an
API for accomplishing these tasks efficiently outside of Integration Services.
There a few different ways to extract or load SharePoint
data:
- Use the SharePoint APIs to add or remove items
one by one.
- Use the SharePoint Batch API to generate XML and
submit the XML.
- Use the Lists Web service, which uses the same
XML as the Batch API.
Calling the Web service is a powerful way to transfer data
to or from SharePoint, whether or not you have extensive rights in your
SharePoint environment, because it does not alter the SharePoint server itself
in any way. The Web service also uses the Batch XML structure, which provides
better performance than the server APIs for extracting or loading data.
Features of the SharePoint List Source and Destination
The SharePoint List source and destination use public
SharePoint Web services and have several features that enhance their
performance and their ease of use:
- Only the fields that you want are returned from
SharePoint.
- Large lists are not transferred all at once.
They are paged in batches, with a configurable batch size.
- Column type information from SharePoint is used
for mapping to Integration Services data types
- CAML queries can be added to the query to filter
the rows to be returned.
- Update and Delete operations from an Integration
Services package are simple.
- Important custom properties of the source and
destination can be set by using Integration Services expressions.
Preparing to Use the SharePoint List Source and Destination
To prepare to use the SharePoint List source and
destination:
1. Download the appropriate installer package from the
Codeplex site: Microsoft
SQL Server Community Samples: Integration Services - Release: SharePoint List
Source and Destination.
2. Run the installer package.
.jpg)
Figure 1. Welcome
page of the Setup utility.
3. Add the source and destination to the Toolbox:
a. Open Business Intelligence Development Studio or Visual
Studio.
b. Create a new Integration Services project, or open an
existing one.
.jpg)
Figure 2.
Creating a new Integration Services project.
c. On the Tools
menu, select Choose Toolbox Items.
.jpg)
Figure 3. The Choose Toolbox Items option on the Tools menu.
d. On the SSIS Data
Flow Items tab of the Choose Toolbox
Items dialog box, select the SharePoint
List Source and the SharePoint List
Destination by checking the boxes. Then click OK.
.jpg)
Figure 4. The SSIS Data Flow Items tab of the Choose Toolbox Items dialog box
e. If you find that the source and destination have been
added to the General list in the
toolbox, drag the source to the Data
Flow Sources list, and drag the destination to the Data Flow Destinations list.
.jpg)
Figure 5. Moving
the source and destination to the appropriate categories in the Toolbox.
4. Prepare your SharePoint list for testing.
.jpg)
Figure 6. A
sample SharePoint list.
Extracting Data from SharePoint by Using the SharePoint List Source
Here is the visual representation of the SharePoint List
source after you add it to the data flow of an Integration Services package.
.jpg)
Figure 7. The
SharePoint List source in the data flow.
Setting the Properties of the SharePoint List Source
You can configure the following custom properties of the
SharePoint List source in the Advanced
Editor.
.jpg)
Figure 8. The
custom properties of the SharePoint List source displayed in the Advanced Editor.
BatchSize
The SharePoint List source retrieves items from SharePoint
in batches. Try 1,000 items at a time as a starting point – this should work
well with a typical 15-column SharePoint list. You can vary this number based
on the number of columns that you want to retrieve and the width of the
columns.
If you only request a few small columns, then you can
increase this number for better performance. If you request a lot of columns,
or wide columns, then you may need to reduce the batch size.
Determining the best batch size may require some trial and
error. Keep in mind that SharePoint builds and sends its response in XML
format. If the XML response is too large, an exception with an uninformative
message will be raised.
If you have problems with the batch size, set the batch size
to match the paging size used for the SharePoint list view before you try a
larger number. If your results include text fields of widely varying lengths,
decide whether it’s necessary to include those columns.
CamlQuery
You can use a CAML query to filter the data returned by the
server. You can create a dynamic CAML query by using an Integration Services
expression to set this property.
IncludeFolders
By default, folders that are found in a list are not
returned. However, you can change this setting.
IsRecursive
By default, only the list items at the top level are
returned. If the list contains folders that contain other items, you can change
this setting to load both the items in the top-level folder, and the items in
all child folders.
SiteListName
The name of the SharePoint list, as seen on the SharePoint list
page.
SiteListViewName
The name of the SharePoint list view from which you want to
retrieve data. You can use a list view to pre-filter rows and eliminate the
need to add dynamic CAML queries to your package.
If you do not specify a list view name, then the default
list view is used. The default list view is the one that appears first in the View dropdown list, which may not show
all of the items in the list. Leaving this option blank is the same as entering
the default list view, which is typically All
Items if the default has not been changed.
.jpg)
Figure 9. Viewing
all items in a SharePoint list.
SiteURL
The URL for the primary site on which the list is found. Do
not include any other subfolders or list paths, or the location of an .asmx
file.
Selecting Columns
Selecting Columns by Using the Column Mappings Page
After you configure the custom properties for the SharePoint
List source, go to the Column Mappings
page of the Advanced Editor to
select the columns from the list that you want to include in the data flow. To
improve the performance of the extraction from SharePoint, select Ignore for any columns that you don’t
need in your data flow.
Selecting Columns by Using the Input and Output Properties Page
When you remove columns by using the Column Mappings page, the columns contain no data, but they still
consume space in the buffers of data that pass through the data flow. To remove
unwanted columns completely and optimize the use of memory for buffers, remove
the columns from the Output Columns
list on the Input and Output Properties
page.
.jpg)
Figure 10. The
list of columns on the Column Mappings
page of the Advanced Editor.
Writing Data to SharePoint by Using the SharePoint List Destination
Here is the visual representation of the SharePoint List
destination after you add it to the data flow of an Integration Services
package.
.jpg)
Figure 11. The
SharePoint List destination in the data flow.
Setting the Properties of the SharePoint List Destination
You can configure the following custom properties of the
SharePoint List destination in the Advanced
Editor.
.jpg)
Figure 12. The
custom properties of the SharePoint List destination displayed in the Advanced Editor.
BatchSize
Since updates or deletes add to the size of the XML message,
you have to use a smaller batch size for updates and deletes than for
retrieving list items. Try 200 updates at a time as a starting point, and
adjust this value as necessary.
BatchType
The SharePoint List destination can create, update or delete
rows of data.
- Create a Row (Modification Batch Type). A row is
created on the server when you select this batch type and do not specify an ID
value for a given row of data. Any columns that are not included receive the
default values or a NULL value on the server.
- Update a Row (Modification Batch Type). A row is
updated on the server when you select this batch type and specify the ID of an
existing row. Only the columns contained in the data flow are modified. If you
specify this batch type, but the data flow does not contain any updated rows,
an error is raised.
- Delete a Row (Deletion Batch Type). A row is
deleted on the server when you select this batch type and specify the ID of an
existing row. If you specify this batch type, but the data flow does not
contain any deleted rows, an error is raised.
SiteListName
This property behaves the same as the property of the same
name for the SharePoint List source.
SiteListViewName
This property behaves the same as the property of the same
name for the SharePoint List source. If a row being modified or deleted is
present in the view, then the action will not occur and will cause the data flow
to raise an error.
SiteURL
This property behaves the same as the property of the same
name for the SharePoint List source.
Selecting Columns
After you configure the custom properties for the SharePoint
List destination, go to the Column
Mappings page of the Advanced Editor
to select the columns from the data flow that you want to load into SharePoint.
Any columns that exist on the server but not in the data flow will be ignored
during the loading of data.
Working with SharePoint Data in the Data Flow
Looking Up Values in a SharePoint List
If you have to look up a value in a SharePoint list, you can
use the Lookup transformation in your data flow, and use the SharePoint List
source to load the lookup table. You may have to add a Derived Column
transformation or a Script component that splits data in the lookup column on
the ";#" delimiter to separate the ID value from the description.
If you are replacing values in your data with the values
that you look up in the list, then loading the changed data back into
SharePoint, you only have to include the ID from the lookup column. SharePoint
ignores the description if you include it.
Understanding the Data Types of Columns
The data types of columns that are loaded by the SharePoint
List source are derived from columns in the SharePoint list.
SharePoint supplies NULL values when data has not been
entered and a column does not have a default value. When a new column is added
to a list, all existing rows contain NULL in that column by default. The logic
of your package has to recognize the possibility of NULL values.
Previewing Data
You can easily preview the data in your data flow by adding
a Data Viewer.
.jpg)
Figure 13.
Configuring a Data Viewer to preview data.
To preview data by adding a Data Viewer to the data flow:
1. Double-click on the path (that is, the connecting arrow)
between the two components at the point in the data flow where you want to view
the data.
2. In the Data Flow
Path Editor, select Data Viewers,
then click Add.
3. In the Configure
Data Viewer dialog box, select Grid,
then click OK to close the dialog
box. Click OK again to close the Data Flow Path Editor.
When you run the package, you will see a grid that shows the
current state of the data as it flows from the source to the destination. This
can help you to ensure that you have correctly configured the properties of the
components and the logic of the package.
Setting Properties by Using Integration Services Expressions
You can set many of the custom properties of the SharePoint
List source and destination by using Integration Services expressions. Keep in
mind that, in the data flow, you have to create these expressions on the
containing Data Flow task, and not on the individual data flow component.
.jpg)
Figure 14. Using expressions
to set the properties of the SharePoint List source and destination.
You can use expressions to set the following properties of
the SharePoint List source:
- SiteUrl
- SiteListName
- SiteListViewName
- CamlQuery
You can use expressions to set the following properties of
the SharePoint List destination:
- SiteUrl
- SiteListName
- SiteListViewName
There are several ways to set an expression that offer
considerable flexibility to the package developer:
- You can use a package variable to set an
expression. The expression can take its value from a package variable, which
might also contain an expression. This expression can construct CAML
dynamically, or contain the results of a query sent to the SharePoint site by
an Execute SQL task.
- You can use a package configuration to set an
expression. Configurations simplify deployment by extracting values that you
may want to change from the package into a separate file or storage location.
Using SharePointUtility.dll Outside of Integration Services
The SharePoint List source and destination install and use
an assembly (SharePointUtility.dll)
of utility functions which can also be used outside of Integration Services for
accessing SharePoint Lists.
.jpg)
Figure 15. The
methods of the ListServiceUtility
class in the SharePointUtility.dll
assembly.
The ListServiceUtility
class in the SharePointUtility.dll
assembly has the following features:
- Although the methods of the class require parameter values, the parameters
do not use GUID values and do not expose SharePoint XML.
- When you request list data, the methods of the
class return rows with a dictionary for the columns and values. When you update
rows, you have to provide a list of dictionaries in return, where the names in
the dictionary are column names.
- When you work with the methods of the class, you never see
the ows_ prefix, and spaces are converted from their XML representation
(_x0020_) to a space character.
- Files can be uploaded and removed easily. The
methods also support SharePoint on a port other than port 80. (Older versions
of Windows only support port 80 for WebDAV).
- XML results are reformatted for update
operations. If an error occurs during the update, a different XML structure
that contains a description is returned, as in the following examples:
Normal result of an
update
<result ID=”1” >
<action>Success</action>
<row [attributes are columns, and their values]></row>
</result>
Result of an update
after an error
<result ID=”1” >
<action>Failure</action>
<errorCode>0x81020015</errorCode>
<errorDescription>The changes requested conflict with those made by another client.</errorDescription>
<row [attributes are columns, and their values]></row>
</result>
Conclusion
The SharePoint List Source and Destination Sample makes it
easy to get data out of or into a SharePoint list from an Integration Services
package. The sample also includes a utility library that simplifies working
with SharePoint lists outside of Integration Services.
If you have feedback or questions about these components,
please visit the Codeplex site at http://www.codeplex.com/SQLSrvIntegrationSrv.