Creating an Enterprise Search Crawl Log Viewer for SharePoint Server 2007

Summary: Create a Windows Forms application that has the ability to view and filter the entries in the Enterprise Search crawl log. (8 printed pages)

Joel Krist, Akona Consulting

July 2008

Applies to: Microsoft Office SharePoint Server 2007

Contents

  • Enterprise Search and the Crawl Log

  • Crawl Log Viewer User Interface

  • Code for the Crawl Log Viewer Sample

  • Classes Used by the Sample

  • CrawlLogFilterContext Class

  • CrawlLogData Class

  • ContentSourceWrapper Class

  • StatusMessageWrapper Class

  • Methods Implemented By the Sample

  • Conclusion

  • Additional Resources

Download the sample code that accompanies this article: Enterprise Search Crawl Log Viewer Sample Code

Enterprise Search and the Crawl Log

Enterprise Search in Microsoft Office SharePoint Server 2007 stores information related to the crawling of content sources in the crawl log. The crawl log includes success, warning, and error details for individual crawled items. The SharePoint Shared Services Administration Crawl Log page provides a user interface (UI) for viewing the entries in the crawl log. Using the Crawl Log page, you can apply filters to narrow and focus the scope of the entries that are displayed.

Figure 1. Search Crawl Log page

Office SharePoint Server 2007 provides programmatic access to the contents of the search crawl log through the Enterprise Search Administration object model. The Crawl Log Viewer sample that accompanies this article shows how to use types from the Microsoft.Office.Server.Search.Administration namespace to create a Windows Forms application that provides the same ability as the Crawl Log page to view and filter the contents of the search crawl log.

Important

The Crawl Log Viewer sample provided with this article is intended only to show how the contents of the search crawl log can be accessed programmatically by using the SharePoint object model. The sample is not intended to be used as a tool by SharePoint administrators to run queries against the crawl log. Running queries against the crawl log with filters, especially in larger environments, can have a negative impact on crawl and query performance because of the load placed on Microsoft SQL Server.

Crawl Log Viewer User Interface

The UI of the Crawl Log Viewer sample is very similar to the UI provided by the SharePoint Shared Services Administration Crawl Log page. Figure 2 shows the Crawl Log Viewer UI.

Figure 2. Search Crawl Log Viewer UI

To display entries from the search crawl log, a user must first enter the URL of the SharePoint site for which to load the search context. Then, the user clicks the Load Search Context button. The search context represents the search service instance for a Shared Services Provider (SSP) and includes, among other things, the defined content sources and status messages. After the search context is set, the user specifies the filters he or she wants, and then clicks the Filter button. Entries in the crawl log that match the specified filters are displayed with paging support in the results grid.

Code for the Crawl Log Viewer Sample

The Crawl Log Viewer sample was created by using Visual Studio 2008, and targets the .NET Framework 3.0. Because the sample makes use of the Enterprise Search Administration object model, you must build the sample and run it on a computer that has Microsoft Office SharePoint Server 2007 installed so that the required SharePoint assemblies are available to the sample.

Classes Used by the Sample

The Crawl Log Viewer sample uses the SearchContext, LogViewer, and CrawlLogFilters classes, and the CrawlLogFilterProperty, StringFilterOperator, and CatalogType enumerations, all defined in the Microsoft.Office.Server.Search.Administration namespace. The sample wraps the SharePoint log viewer classes and types, together with the filter context information, in several helper classes. The following sections describe these helper classes.

CrawlLogFilterContext Class

The CrawlLogFilterContext class stores the context of the current crawl log filter session. It wraps an internal CrawlLogFilters instance that reflects the filter values specified by the user, and it exposes a method that uses an internal LogViewer instance to retrieve filtered views of the crawl log. In addition, the class maintains a cache of previously returned results to improve performance and to support paging of crawl log entries.

When the user clicks the Load Search Context button, the sample creates a single instance of the CrawlLogFilterContext class. The following code example shows the instantiation of the CrawlLogFilterContext class. The code uses the site URL specified by the user to instantiate an SPSite object, which it then uses to create a SearchContext object. The SearchContext object is passed to the constructor of the CrawlLogFilterContext class.

// Create an SPSite object based on the specified URL.
using (SPSite site = new SPSite(textBoxSiteURL.Text))
{
    // Get the search context for the site.
    SearchContext searchContext = SearchContext.GetContext(site);

    // Create an instance of the CrawlLogFilterContext class which
    // wraps instances of the CrawlLogFilters and LogViewer classes and
    // caches viewed results.
    crawlLogFilterContext = new CrawlLogFilterContext(searchContext);

    // (The remainder of the handler is not shown here.)
}

The following code example shows the constructor of the CrawlLogFilterContext class, which creates internal instances of the LogViewer class, CrawlLogFilters class, and CrawlLogData class.

public CrawlLogFilterContext(SearchContext searchContext)
{
    _logViewer = new LogViewer(searchContext);
    _crawlLogFilters = new CrawlLogFilters();
    _crawlLogDataCache = new List<CrawlLogData>();
}

The CrawlLogFilterContext class exposes methods and properties that allow access to the methods and properties of the underlying LogViewer and CrawlLogFilters objects. The following code example shows the implementation of the CrawlLogFilterContext.MessageType property, which is used to get and set the message type filter value. The property set accessor uses the AddFilter method of the internal CrawlLogFilters object to add the filter to its collection of filters.

public MessageType MessageType
{
    get { return _messageType; }
    set
    {
        _messageType = value;
        _crawlLogFilters.AddFilter(value);
    }
}

The following code example, taken from the click handler for the Filter button, uses the CrawlLogFilterContext.MessageType property to add a message type filter.

switch (comboBoxStatusTypes.Text)
{
    case "Error":
        crawlLogFilterContext.MessageType = MessageType.Error;
        break;

    case "Success":
        crawlLogFilterContext.MessageType = MessageType.Success;
        break;

    case "Warning":
        crawlLogFilterContext.MessageType = MessageType.Warning;
        break;
}
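The same mapping from combo box text to a message type can also be written as a lookup table. The following is a minimal sketch under stated assumptions, not code from the sample: SampleMessageType is a local stand-in for the SharePoint MessageType enumeration so that the block compiles without the SharePoint assemblies, and StatusTypeLookup and TryGetMessageType are hypothetical names. Treating any unrecognized text (such as "All") as "add no filter" is also an assumption.

```csharp
using System;
using System.Collections.Generic;

// Local stand-in for the SharePoint MessageType enumeration (assumption:
// only the three members used in the article's switch statement).
enum SampleMessageType { Success, Warning, Error }

static class StatusTypeLookup
{
    private static readonly Dictionary<string, SampleMessageType> Map = CreateMap();

    private static Dictionary<string, SampleMessageType> CreateMap()
    {
        Dictionary<string, SampleMessageType> map =
            new Dictionary<string, SampleMessageType>(StringComparer.OrdinalIgnoreCase);
        map.Add("Error", SampleMessageType.Error);
        map.Add("Success", SampleMessageType.Success);
        map.Add("Warning", SampleMessageType.Warning);
        return map;
    }

    // Returns true and sets 'type' when the text names a status type.
    // Returns false for null or any other text, meaning "add no filter".
    public static bool TryGetMessageType(string text, out SampleMessageType type)
    {
        if (text == null)
        {
            type = default(SampleMessageType);
            return false;
        }
        return Map.TryGetValue(text, out type);
    }

    public static void Main()
    {
        string[] inputs = { "Error", "warning", "All" };
        foreach (string input in inputs)
        {
            SampleMessageType type;
            if (TryGetMessageType(input, out type))
                Console.WriteLine(input + " -> " + type);
            else
                Console.WriteLine(input + " -> no filter");
        }
    }
}
```

A table keeps the recognized strings in one place, at the cost of an extra type; for three cases the switch statement is equally reasonable.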

The CrawlLogFilterContext class provides the GetCrawlLogData method to return filtered views of the crawl log. The method returns a CrawlLogData instance that contains a DataTable object returned by the LogViewer.GetCurrentCrawlLogData method. The UI code binds the table to a DataGridView object to display the page of filtered results. The GetCrawlLogData method first checks whether the requested page of results is already cached and, if so, returns the cached CrawlLogData instance. If the results are not cached, the method calls the GetCurrentCrawlLogData method of the internal LogViewer instance, passing in the internal CrawlLogFilters object, which contains the filters specified by the user and the index of the first crawl log entry to return.

public CrawlLogData GetCrawlLogData()
{
    // If the crawl log data for the current page is not 
    // cached yet, then...
    if (_currentPageNumber > _crawlLogDataCache.Count)
    {
        // Go get it and cache it.
        int nextStart = 0;

        // Get the index to start at from the end index of the
        // last cached page. (StartAt and TotalEntries are members of
        // this class that are not shown in this article.)
        if (_crawlLogDataCache.Count > 0)
            StartAt =
                _crawlLogDataCache[
                    _crawlLogDataCache.Count - 1].EndIndex + 1;

        // Use the LogViewer instance to request the filtered
        // view of log entries.
        DataTable crawlLogDataTable =
            _logViewer.GetCurrentCrawlLogData(
                _crawlLogFilters, out nextStart);

        // Trim the result set if we got back more than the
        // requested number of entries.
        while (crawlLogDataTable.Rows.Count > TotalEntries)
            crawlLogDataTable.Rows.RemoveAt(
                crawlLogDataTable.Rows.Count - 1);

        // Create a new CrawlLogData object based on the results and
        // add it to the cache. 
        CrawlLogData crawlLogData =
            new CrawlLogData(
                crawlLogDataTable, _startAt, nextStart == -1);

        _crawlLogDataCache.Add(crawlLogData);

        // Return the crawl log data so that it can be displayed.
        return crawlLogData;
    }
    else
    {
        // Return the cached crawl log data.
        return _crawlLogDataCache[_currentPageNumber - 1];
    }
}
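The cache-then-fetch pattern used by GetCrawlLogData can be tried without a SharePoint server. The following is a minimal sketch, not the sample's code: PageCache, CachedPage, PageCacheDemo, and the FetchPage delegate are hypothetical names, an in-memory list stands in for the crawl log, and FetchPage plays the role of LogViewer.GetCurrentCrawlLogData. The sketch assumes 1-based entry indexes and a fetcher that reports -1 as the next start index when no entries remain, which is what the nextStart == -1 check in the excerpt suggests.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for LogViewer.GetCurrentCrawlLogData: returns up to
// 'count' entries starting at the 1-based index 'startAt' and sets
// 'nextStart' to -1 when no entries remain after this page.
delegate List<string> FetchPage(int startAt, int count, out int nextStart);

// Hypothetical stand-in for CrawlLogData: one cached page of results.
class CachedPage
{
    public readonly List<string> Items;
    public readonly int StartIndex;
    public readonly bool IsLastPage;

    public CachedPage(List<string> items, int startIndex, bool isLastPage)
    {
        Items = items;
        StartIndex = startIndex;
        IsLastPage = isLastPage;
    }

    public int EndIndex { get { return StartIndex + Items.Count - 1; } }
}

class PageCache
{
    private readonly List<CachedPage> _pages = new List<CachedPage>();
    private readonly FetchPage _fetch;
    private readonly int _pageSize;

    public int FetchCalls; // number of trips to the backing store

    public PageCache(FetchPage fetch, int pageSize)
    {
        _fetch = fetch;
        _pageSize = pageSize;
    }

    // Returns the 1-based page. Like the excerpt, this assumes the caller
    // moves forward one page at a time, so an uncached page is always the
    // one that follows the last cached page.
    public CachedPage GetPage(int pageNumber)
    {
        if (pageNumber <= _pages.Count)
            return _pages[pageNumber - 1]; // cache hit, no fetch

        int startAt = 1;
        if (_pages.Count > 0)
            startAt = _pages[_pages.Count - 1].EndIndex + 1;

        int nextStart;
        List<string> items = _fetch(startAt, _pageSize, out nextStart);
        FetchCalls++;

        CachedPage page = new CachedPage(items, startAt, nextStart == -1);
        _pages.Add(page);
        return page;
    }
}

static class PageCacheDemo
{
    // Builds a cache over an in-memory list of 'totalEntries' fake entries.
    public static PageCache Create(int totalEntries, int pageSize)
    {
        List<string> log = new List<string>();
        for (int i = 1; i <= totalEntries; i++)
            log.Add("entry " + i);

        FetchPage fetch = delegate(int startAt, int count, out int nextStart)
        {
            int first = Math.Min(startAt - 1, log.Count);
            int take = Math.Min(count, log.Count - first);
            nextStart = (first + take < log.Count) ? first + take + 1 : -1;
            return log.GetRange(first, take);
        };
        return new PageCache(fetch, pageSize);
    }

    public static void Main()
    {
        PageCache cache = Create(7, 3);
        for (int page = 1; page <= 3; page++)
        {
            CachedPage p = cache.GetPage(page);
            Console.WriteLine("Page " + page + ": entries " + p.StartIndex +
                " to " + p.EndIndex + ", last page: " + p.IsLastPage);
        }
        cache.GetPage(1); // served from the cache
        Console.WriteLine("Fetches: " + cache.FetchCalls);
    }
}
```

With seven entries and a page size of three, the third page holds a single entry and is flagged as the last page, and requesting page 1 again does not increase the fetch count.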

CrawlLogData Class

The CrawlLogData class is used by the Crawl Log Viewer sample to support caching and paging of crawl log entries. It wraps indexing information about the page of results and the DataTable object that contains the page's crawl log entries.

class CrawlLogData
{
    int _startIndex;
    int _itemCount;
    bool _isLastPage;
    DataTable _crawlLogDataTable;

    public int StartIndex
    {
        get { return _startIndex; }
    }

    public int EndIndex
    {
        get { return _startIndex + _itemCount - 1; }
    }

    public int ItemCount
    {
        get { return _itemCount; }
    }

    public bool IsLastPage
    {
        get { return _isLastPage; }
    }

    public bool IsFirstPage
    {
        get { return _startIndex == 1; }
    }

    public DataTable CrawlLogDataTable
    {
        get { return _crawlLogDataTable; }
    }

    public CrawlLogData(DataTable crawlLogDataTable, int startIndex,
        bool isLastPage)
    {
        _startIndex = startIndex;
        _itemCount = crawlLogDataTable.Rows.Count;
        _isLastPage = isLastPage;

        _crawlLogDataTable = crawlLogDataTable;
    }
}
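The index arithmetic in CrawlLogData is easier to follow with concrete numbers. The following sketch is illustrative only: PageIndexExample and its methods are hypothetical names, the single "Url" column and its values are placeholders, and a start index of 1 for the first page is an assumption based on the IsFirstPage property shown above. It trims an oversized result table to the page size, as GetCrawlLogData does, and then derives the end index and the start index of the next page.

```csharp
using System;
using System.Data;

static class PageIndexExample
{
    // Builds a table with 'rows' placeholder rows. The "Url" column name
    // and the row values are illustrative, not the real crawl log schema.
    public static DataTable BuildTable(int rows)
    {
        DataTable table = new DataTable("CrawlLog");
        table.Columns.Add("Url", typeof(string));
        for (int i = 1; i <= rows; i++)
            table.Rows.Add("http://example/item" + i);
        return table;
    }

    // Removes rows from the end until at most 'maxRows' remain.
    public static void TrimTo(DataTable table, int maxRows)
    {
        while (table.Rows.Count > maxRows)
            table.Rows.RemoveAt(table.Rows.Count - 1);
    }

    // Same arithmetic as the CrawlLogData.EndIndex property.
    public static int EndIndex(int startIndex, int itemCount)
    {
        return startIndex + itemCount - 1;
    }

    public static void Main()
    {
        DataTable page = BuildTable(53); // more rows than requested
        TrimTo(page, 50);                // keep one page of 50 entries

        int start = 1;
        int end = EndIndex(start, page.Rows.Count);

        Console.WriteLine("Rows kept: " + page.Rows.Count);
        Console.WriteLine("Page covers entries " + start + " to " + end);
        Console.WriteLine("Next page starts at " + (end + 1));
    }
}
```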

ContentSourceWrapper Class

The ContentSourceWrapper class is used by the Crawl Log Viewer sample to allow filtering of the crawl log by content source. The class is very simple and wraps the content source name and identifier.

class ContentSourceWrapper
{
    private string _Name;
    private int _ID;

    public ContentSourceWrapper(string Name, int ID)
    {
        _Name = Name;
        _ID = ID;
    }

    public string ContentSourceName
    {
        get { return _Name; }
        set { _Name = value; }
    }

    public int ContentSourceID
    {
        get { return _ID; }
        set { _ID = value; }
    }
}

The following code example shows instances of the ContentSourceWrapper class being added to the Content Sources combo box.

// Get the search context's content sources and add them to the
// combo box to allow filtering by content source.
Content content = new Content(searchContext);
                    
foreach (ContentSource cs in content.ContentSources)
    comboBoxContentSources.Items.Add(
        new ContentSourceWrapper(cs.Name, cs.Id));

The following code example shows the content source identifier of the item that is selected in the Content Sources combo box being added as a filter.

if (comboBoxContentSources.Text != "All")
    crawlLogFilterContext.ContentSourceID =
        ((ContentSourceWrapper)comboBoxContentSources.SelectedItem)
            .ContentSourceID;
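The excerpts do not show how the combo box obtains the text it displays for each wrapper object. A Windows Forms ComboBox shows the result of ToString for each item unless its DisplayMember property is set, so one option is to override ToString in the wrapper. The following sketch shows that option together with a cast that tolerates a plain "All" string item. NamedIdItem, SelectionExample, and GetSelectedId are hypothetical names, the item values are made up, and nothing here is taken from the sample's source.

```csharp
using System;

// Hypothetical variant of the wrapper classes: pairs display text with an
// identifier and overrides ToString so that list controls show the text.
class NamedIdItem
{
    private readonly string _name;
    private readonly int _id;

    public NamedIdItem(string name, int id)
    {
        _name = name;
        _id = id;
    }

    public string Name { get { return _name; } }
    public int Id { get { return _id; } }

    public override string ToString()
    {
        return _name;
    }
}

static class SelectionExample
{
    // Returns the identifier of the selected item, or null when the
    // selection is not a NamedIdItem (for example, a plain "All" string),
    // in which case no filter should be added.
    public static int? GetSelectedId(object selectedItem)
    {
        NamedIdItem item = selectedItem as NamedIdItem;
        if (item == null)
            return null;
        return item.Id;
    }

    public static void Main()
    {
        // Made-up items standing in for the contents of a combo box.
        object[] items = { "All", new NamedIdItem("Intranet", 1) };
        foreach (object item in items)
        {
            int? id = GetSelectedId(item);
            Console.WriteLine(item + " -> " +
                (id.HasValue ? id.Value.ToString() : "no filter"));
        }
    }
}
```

Checking the type of the selected item, rather than comparing the combo box text with "All", avoids an invalid cast if a content source happens to be named "All".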

StatusMessageWrapper Class

The StatusMessageWrapper class is used by the Crawl Log Viewer sample to allow filtering of the crawl log by status message. The class wraps the status message text and identifier.

class StatusMessageWrapper
{
    private string _Message;
    private int _ID;

    public StatusMessageWrapper(string Message, int ID)
    {
        _Message = Message;
        _ID = ID;
    }

    public string StatusMessage
    {
        get { return _Message; }
        set { _Message = value; }
    }

    public int StatusMessageID
    {
        get { return _ID; }
        set { _ID = value; }
    }
}

The following code example shows instances of the StatusMessageWrapper class being added to the Last Status Message combo box.

// Get all status messages for the search context and add them to the
// combo box to allow filtering by status message.
DataTable allStatusMessages =
    crawlLogFilterContext.LogViewer.GetAllStatusMessages();

foreach (DataRow row in allStatusMessages.Rows)
    if ((row["ErrorMsg"]).ToString() != String.Empty)
        comboBoxStatusMessages.Items.Add(
            new StatusMessageWrapper((row["ErrorMsg"]).ToString(),
                Convert.ToInt32(row["ErrorId"])));
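The row-filtering step above can be tried against an in-memory table. In the following sketch, only the "ErrorId" and "ErrorMsg" column names come from the excerpt; the column types, the row values, and the names StatusMessageExample, BuildSampleTable, and GetNonEmptyMessages are assumptions made for illustration, and the table stands in for the result of LogViewer.GetAllStatusMessages.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

static class StatusMessageExample
{
    // Builds a made-up table shaped like the one the excerpt reads.
    public static DataTable BuildSampleTable()
    {
        DataTable table = new DataTable();
        table.Columns.Add("ErrorId", typeof(int));
        table.Columns.Add("ErrorMsg", typeof(string));
        table.Rows.Add(1, "Placeholder message A");
        table.Rows.Add(2, "");              // empty text, skipped
        table.Rows.Add(3, "Placeholder message B");
        table.Rows.Add(4, DBNull.Value);    // null text, skipped
        return table;
    }

    // Returns (id, message) pairs for rows whose message text is not empty.
    public static List<KeyValuePair<int, string>> GetNonEmptyMessages(
        DataTable table)
    {
        List<KeyValuePair<int, string>> result =
            new List<KeyValuePair<int, string>>();

        foreach (DataRow row in table.Rows)
        {
            // DBNull.Value.ToString() returns an empty string, so null
            // and empty messages are handled by the same check.
            string message = row["ErrorMsg"].ToString();
            if (message.Length == 0)
                continue;

            result.Add(new KeyValuePair<int, string>(
                Convert.ToInt32(row["ErrorId"]), message));
        }
        return result;
    }

    public static void Main()
    {
        foreach (KeyValuePair<int, string> pair in
            GetNonEmptyMessages(BuildSampleTable()))
        {
            Console.WriteLine(pair.Key + ": " + pair.Value);
        }
    }
}
```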

The identifier of the selected status message is added to the collection of filters by using the CrawlLogFilterContext.StatusMessageId property.

if (comboBoxStatusMessages.Text != "All")
    crawlLogFilterContext.StatusMessageId =
        ((StatusMessageWrapper)comboBoxStatusMessages.SelectedItem)
            .StatusMessageID;

Methods Implemented By the Sample

The following table provides details about key methods that are implemented by the sample.

Table 1. Methods implemented by the Crawl Log Viewer sample

Method | Source File | Description
buttonLoadSearchContext_Click | MainForm.cs | Implemented by the MainForm class. Handles the click event for the Load Search Context button. Creates a SearchContext object for a specified site and retrieves the associated content sources and status messages.
buttonFilter_Click | MainForm.cs | Implemented by the MainForm class. Handles the click event for the Filter button. Adds the selected filters to the CrawlLogFilterContext object's collection of filters, and calls the GetCrawlLogData method to retrieve the filtered view.
GetCrawlLogData | MainForm.cs | Implemented by the MainForm class. Gets the crawl log data by using the specified filters.
GetCrawlLogData | CrawlLogFilterContext.cs | Implemented by the CrawlLogFilterContext class. Returns a CrawlLogData instance representing the crawl log entries for the current page.

Conclusion

The Crawl Log Viewer sample uses the Enterprise Search Administration object model to let you view and filter the contents of the search crawl log. The sample provides several helper classes that wrap the crawl log viewer classes and types from the Microsoft.Office.Server.Search.Administration namespace.

Additional Resources

For more information, see the following resources: