Architect and Develop Search-Enabled Enterprise Applications

Author: Beat Schwegler, Technical Evangelist, Microsoft Western Europe HQ

October 2007

Summary: As a solution architect, you must ensure that your applications fit into your organization’s strategy for enterprise search. Because search is increasingly viewed as a business process, you must consider indexing and search functionality as seriously as you consider other functional and non-functional requirements.

Contents

Introduction
Navigation vs. Search
Web Search vs. Enterprise Search
Enabling Search in Your Applications
Adding Search as an Application Capability
Taking Advantage of the ‘Database of Intention’
Conclusion
Appendix A. Search Features in SharePoint Products and Technologies
About the Author

Introduction

Until recently, many solution architects have treated search capabilities in applications almost as an afterthought. Search features were typically added to an application to help the user find information stored by the solution, but often little thought was given to how:

  • Users might find data stored in other enterprise stores.
  • Information stored by the solution might be made available to enterprise-wide searches that are performed outside of the application.

Many organizations now, however, view the actions that are involved in searching for information as a business process in its own right. This view is supported by hard facts: An independent survey by IDC found that the average information worker in the US, salaried at $60,000, spends approximately nine-and-a-half hours per week searching for information (IDC: The Hidden Costs of Information Work, April 2006, Doc #201334). That equates to a business cost of about $15,000 per worker, per year. Of the time spent searching, the survey found that approximately three-and-a-half hours were completely wasted; effectively, each information worker wasted about $5,000 per year by searching but not finding what he or she was looking for.
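
The cost figures above can be checked with simple arithmetic. The following sketch assumes a 40-hour working week, which the survey summary quoted here does not state. Under that assumption, the time spent searching costs $14,250 per worker per year and the wasted portion costs $5,250, consistent with the rounded figures of about $15,000 and $5,000.

```python
def annual_search_cost(salary: float, hours_per_week: float,
                       work_week_hours: float = 40.0) -> float:
    """Return the share of an annual salary consumed by a weekly activity.

    work_week_hours is an assumption (40 hours); the text quotes only the
    salary and the weekly hours spent searching.
    """
    return salary * hours_per_week / work_week_hours


searching = annual_search_cost(60_000, 9.5)  # all time spent searching
wasted = annual_search_cost(60_000, 3.5)     # searches that found nothing
print(f"searching: ${searching:,.0f}  wasted: ${wasted:,.0f}")
```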

When information workers are not even aware of existing content, because it is not returned by searches, they re-create that content; the costs involved in this loss of productivity far outweigh even the tangible costs of wasted searches.

Organizations now realize that they need enterprise-wide search capabilities to reduce the wasted costs, and to improve efficiency by surfacing knowledge to information workers when they need it.

The reasons for wasted searches are many and varied, including:

  • Restricted search. If enterprise applications are only capable of delivering search results from their own data, users must know from where to start their searches if they are to find useful results. In effect, users must almost know where the data is, before they start searching.
  • Restricted indexing. Even if organizations implement enterprise-wide search, they must ensure that all enterprise data is indexed, and therefore accessible, from the search solutions. Ideally, a user should be able to perform searches, and have results returned, regardless of the location or format of those results.
  • Poor ranking of search results. If the most relevant results for a search are not ranked appropriately and displayed near the top of the results list, users will soon cease to use those search capabilities.

As a solution architect, you must consider the requirements for search in your solutions as seriously as you would consider other functional and non-functional requirements. In essence, you must search-enable your solutions so that:

  • Your application data can be indexed and searched from other enterprise search solutions. You must also consider how the data from your solution is exposed to your organization’s enterprise search tools. Ideally, users who perform searches on the intranet, for example, should be able to see results from your solution as well as information from other systems and content stores.
  • Users can perform searches from within your solution. At a minimum, you should consider how a user will search the data from your solution. Ideally, though, you should integrate the search capabilities of your solution with an enterprise search solution, so that results from other applications and data stores are also returned to the user.

Furthermore, you must ensure that the search engines used by your solutions provide accurate ranking of results if you expect users to adopt and use those search capabilities.

Navigation vs. Search

As you architect your solution and consider how users will access information, you should think about the differences between navigating to information and searching for information. Although navigation and search are two different concepts, there is considerable overlap in how you should design these features into your solutions, so you must understand the logical overlap. The overlap exists because both mechanisms have the same objective: to enable the user to locate information.

Structured navigation helps users to browse to information that they are looking for; users often have only a general idea of where the information is located, and use navigation features, such as tree-views, organized links, and other visual metaphors to browse for information. Effectively, users navigate the structure to narrow down the information until they have homed in on their target data. Navigation is therefore typically used to find information in a given context and provides a good mechanism for users to explore related data in that context.

Search, on the other hand, is typically used as a starting point when users either do not know what context or structure contains the information they seek, or when multiple contexts may hold relevant data.

Regardless of the logical overlap, all information-based solutions should provide both context-specific and non-contextual methods of finding data. However, when you architect your solutions, you should also include the concept of switching from one mechanism to the other. For example, users might start navigating through a solution to attempt to locate information, but they encounter problems if:

  • They have taken a wrong path.
  • The data is stored by an external system.
  • The information base is too large and unwieldy.
  • Multiple contexts contain parts of what they are looking for.
  • They find some information, but it is not as relevant as knowledge in a different context.

Therefore, you should design your solution so that users can instigate a search as soon as they feel that the navigation approach does not work for them in any instance.

The converse switch, however, is equally valid and important to your solution, if you want to provide rich access to information. A user often uses search facilities to find information and is presented with an overwhelming number of results. Users can often refine their search terms, or use features such as searching for a narrower term within the existing search results, but you can go further than that. For example, you might consider adding navigation features to the search results, such as enabling the user to pivot the results on specific groupings, or even on the properties and metadata of the returned information.
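
Pivoting results on a property can be as simple as grouping the returned rows by that property's value. The following is a minimal sketch; the result rows and the property names (author, file_type) are hypothetical, and a real solution would use whatever metadata its search engine returns.

```python
from collections import defaultdict


def pivot_results(results, prop):
    """Group search results by the value of one metadata property."""
    groups = defaultdict(list)
    for result in results:
        # Results that lack the property are collected under "(none)".
        groups[result.get(prop, "(none)")].append(result)
    return dict(groups)


# Hypothetical result rows, for illustration only.
results = [
    {"title": "Q3 sales report", "author": "Kim", "file_type": "xlsx"},
    {"title": "Sales playbook", "author": "Ana", "file_type": "docx"},
    {"title": "Sales forecast", "author": "Kim", "file_type": "xlsx"},
]

by_author = pivot_results(results, "author")
for author, items in by_author.items():
    print(author, [item["title"] for item in items])
```

The user interface can then present each group as a navigation link that narrows the result list to that group.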

Topic Maps and Search

Topic maps are an emerging standard for dealing with searchable and navigable taxonomies. The topic map defines entities in the taxonomy as well as the relationships between entities. This enables the user to gain a picture of how different pieces of content relate to other pieces, such as by content term, author, site, or any other attribute that has been defined in the topic map. The user can use the topic map to discover authors and articles that they may not have otherwise seen. Effectively, search functions as a soft link between entities in the topic map; topic maps define the taxonomy (including attributes and relationships), and the data is returned by search. For a good example of topic maps and search in practice, see http://msdn2.microsoft.com/en-us/skyscrapr/bb428857.aspx

As a solution architect, you should become familiar with the emerging topic map standards. Topic maps are based on an ISO standard, and define the entities and relationships in XML Topic Maps (XTM) format. For more information about topic maps, see http://www.topicmaps.org/

Web Search vs. Enterprise Search

Most users are familiar with using Web-based search engines. Users have become used to expecting highly relevant results shown at the top of the search results, and they do not care where the information is stored or what format it is in. They have also become accustomed to searching for a specific term or combination of terms, but they do not expect to have to type in words that match the results exactly. They expect the search facilities to be smart enough to deal with synonyms, closely related terms, and spelling errors. They even expect search to suggest alternative searches.

The experience users have had with searching the Web presents three issues to you as a solution architect for enterprise applications.

First, you need to design into your solution the functional features that users view as essential for all search tools. If your search features are not at least as rich as those offered by Web search engines, then your solution may be viewed unfavorably by the users.

Second, you must implement the essential functional features so that they work effectively with your organization’s information stores. That means solving problems such as deciding what data to index, determining how different files and data formats can be represented in search results, and working out how information relevancy is ranked in search results.

Third, you must consider many non-functional requirements specifically for the search features in your solution, including:

  • Search performance
  • Security and privacy of the data returned by search (including content previews)
  • Scalability of the search features
  • Flexibility of what users can do with the search results
  • Availability metrics and requirements for search
  • Manageability and maintainability of the search configurations and optimizations
  • Impact that indexing and search has on the other functional features of your solution

Another concept that you must consider is that finding information is not always the sole objective of performing a search. Finding an expert in a given field, or a business specialist who is very familiar with a given process, may be useful to information workers when they are searching for knowledge to enable them to complete a task or solve a business problem. Your search solutions, therefore, should include the concept of people search, and that adds another challenge, because you must design how that information is gathered, where it is stored, how it is indexed, and how it is presented in search results. However, this effort at the design stage is very worthwhile.

As with most architectural designs, most organizations have very specific search requirements, so you should consider all of these issues on a case-by-case basis.

Enabling Search in Your Applications

When you are architecting a solution, you do not need to re-invent the wheel, at least as far as search is concerned. You can use mature search services to add powerful search features to your solutions.

The SharePoint Products and Technologies suite provides a number of different servers that you can use to add powerful, robust, and flexible search features to your solution. All of the products in the suite use the same core indexing engine and search technologies. For more information about the products and their specific search features, see Appendix A.

For a complete picture, Figure 1 illustrates the logical architecture of search and indexing in Microsoft® Office SharePoint® Server 2007, Enterprise Edition.

Figure 1. Microsoft Office SharePoint Server 2007 Search Architecture

Microsoft Office SharePoint Server 2007 crawls content from a variety of sources, including:

  • File shares
  • SharePoint sites and data
  • Other Web content
  • Data from relational databases and Web services
  • Documents from third-party systems
  • Data from proprietary systems

Each type of content source is accessed through a component called a protocol handler. For example, file shares are accessed by using the FILE protocol handler, Web sites are crawled over the HTTP protocol, and data in relational databases or Web services is accessed by using the BDC (Business Data Catalog) handler. A number of protocol handlers are included with Microsoft Office SharePoint Server 2007 (depending on the product version and license), but it is important to understand that the architecture is extensible: if you need to index data in a custom third-party solution, you can plug a protocol handler into the architecture.

In addition to using a protocol handler to access the content source, Microsoft Office SharePoint Server 2007 uses iFilters to read the data and content. For example, PDF documents are accessed by using the PDF iFilter. As with protocol handlers, the iFilter architecture is extensible.

During the crawl, the gather service returns two streams of data—one for object metadata and permissions, and another for object content. The properties and permissions are stored in a property store table and the content is sent to the full-text index. The permissions information is used to filter the search results so that users see only content appropriate to their permission levels.

Before the content stream is added to the content index, word breakers and word stemmers are used to break the stream into words. Word stems are tokenized, so that different verb forms can be dealt with at query time. Noise words, such as “the” or “it”, are removed because they are so common that there is no value in indexing them.
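
The word-breaking, stemming, and noise-word steps can be illustrated with a toy inverted index. This is a minimal sketch and not how the SharePoint indexing engine is implemented: the noise-word list is illustrative, the stemmer only strips a few English suffixes, and applying the same stemmer at both index and query time is a simplifying assumption.

```python
import re
from collections import defaultdict

NOISE_WORDS = {"the", "it", "a", "an", "and", "of", "is"}  # illustrative list


def break_words(text):
    """Word breaker: split a content stream into lowercase words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def stem(word):
    """Toy stemmer that strips a few suffixes; not a real stemming algorithm."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def build_index(documents):
    """Map each stemmed, non-noise word to the IDs of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in break_words(text):
            if word not in NOISE_WORDS:
                index[stem(word)].add(doc_id)
    return index


def search(index, query):
    """Return IDs of documents that contain every non-noise query term."""
    terms = [stem(w) for w in break_words(query) if w not in NOISE_WORDS]
    if not terms:
        return set()
    return set.intersection(*(index.get(term, set()) for term in terms))


documents = {
    "doc1": "The server crawls file shares",
    "doc2": "It crawled the Web site",
}
index = build_index(documents)
print(sorted(search(index, "crawling")))
```

Because "crawls", "crawled", and "crawling" all reduce to the same stem in this sketch, a query for any one form matches both documents.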

Figure 2 shows a scaled-out physical architecture to illustrate the different search and indexing roles in a Microsoft Office SharePoint Server 2007 farm.

Note: A Microsoft Office SharePoint Server 2007 farm can be configured in a number of different ways, from a single server through to a fully scaled out architecture. Discussion of farm architecture is outside the scope of this paper, and the fully scaled-out architecture is only included for ease of explanation.

The Index Server performs all the tasks described. The full text index is then propagated to the query servers, as shown below (1). Propagation occurs on a continuous basis.

Figure 2. Microsoft Office SharePoint Server 2007 Farm Search and Indexing Architecture

A query issued by a user (2) is received by a Web front end. The Web front end then contacts a query server component to retrieve data from the full text index (3a). It also contacts the database server to retrieve metadata and permissions (3b). The content and the metadata are returned to the Web front end (4a and 4b), and the Web server renders the results, applies security trimming, and returns the rows (including links to the source content) to the client that issued the query. The security trimming ensures that users only see results for which they have permissions. This solves two problems:

  • Users are not presented with an authentication dialog box when they click a result to which they do not have access.
  • Users are not made aware of even the existence of potentially sensitive information, just by performing a search.
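
Conceptually, security trimming is a filter applied to the result rows before they are rendered. The following sketch assumes that each result carries the set of groups allowed to read it; the field name allowed_groups, the group names, and the sample rows are hypothetical and do not reflect how SharePoint stores permissions.

```python
def trim_results(results, user_groups):
    """Return only the results that the user is permitted to see.

    A result is kept when its allowed groups share at least one member
    with the groups that the user belongs to.
    """
    return [r for r in results if r["allowed_groups"] & user_groups]


# Hypothetical result rows with their permission metadata.
results = [
    {"title": "Cafeteria menu", "allowed_groups": {"Everyone"}},
    {"title": "Salary review 2007", "allowed_groups": {"HR"}},
    {"title": "Sales pipeline", "allowed_groups": {"Sales", "Executives"}},
]

visible = trim_results(results, user_groups={"Everyone", "Sales"})
print([r["title"] for r in visible])
```

The trimmed row is dropped entirely rather than shown as inaccessible, so the user learns nothing about its existence.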

If a result represents a file or a Web page, the link points the user directly to the content. If, however, the result represents an item in a line-of-business application, such as a row in a database or a piece of data exposed by a Web service, then the BDC application definition files should provide actions to describe what happens when a user clicks the link. For example, the action could retrieve additional columns of data from the data source, or it could enable the user to drill down into the data to find related information.

The following sections expand on the product-specific features and capabilities, and provide architectural guidance about how you can use them in your solutions. For now, from a solution architect point of view, you simply need to be aware that:

  • You can use SharePoint search to index your application’s data so that it is searchable outside of your solution.
  • You can issue queries that are served by SharePoint query servers in a number of ways from your solution, and render the results in your application; results from all content sources can be included if you want to provide enterprise search capabilities, or you can restrict the results to your solution data if required.

Building Applications on a Searchable Platform

The easiest way to include search capabilities in your solutions is to build your applications on Microsoft Windows SharePoint Services 3.0 or Microsoft Office SharePoint Server 2007. These products have evolved from simple collaboration platforms into proper application development platforms. The advantages of building on Windows SharePoint Services or Microsoft Office SharePoint Server 2007 include the fact that many typical but complex issues have been solved, such as:

  • How to implement document storage, source control, information versioning, and collaboration features
  • How to implement information security policies
  • How to categorize information
  • How to satisfy non-functional requirements, such as security, performance, and availability
  • How to implement physical data storage
  • How to include extensible workflows
  • How to implement standard user interfaces

You should be aware that SharePoint-based solutions are not restricted to document and collaboration scenarios. Anything from a help-desk application, through e-commerce solutions, to customer-relationship management systems can be built on SharePoint technologies. The custom list infrastructure makes the SharePoint products suitable even for solutions that traditionally required their own databases.

If you can build your solution on one of the SharePoint products, then you automatically have access to powerful search features, with very little development effort involved.

The level of search functionality depends on which product you use to build your solution. If you build on Microsoft Office SharePoint Server 2007, your solution natively has access to complete enterprise search capabilities, including people search, and search of file shares and external systems. If you build on Windows SharePoint Services, you still have access to powerful search features but are restricted to your solution’s own data and documents.

Making Proprietary File Formats Searchable

Regardless of whether you build your solution on a SharePoint-based platform, you must make architectural decisions about how files are stored by your application, if you want them to be indexed by Microsoft Office SharePoint Server 2007. The most important factor is the file formats that your solution supports. As mentioned previously, each file type requires an iFilter so that the indexing engine can read the document contents. Microsoft Office SharePoint Server 2007 and Windows SharePoint Services include iFilters for all Microsoft Office formats and for common Web formats (such as HTML, XML, and text). Adobe provides iFilters for PDF files, and other vendors are currently developing iFilters for their specific formats. If you have a proprietary format, you need to obtain an iFilter if it exists, or develop your own. Be aware that creating iFilters is not a simple task: your developers require an in-depth knowledge of how the binary data is stored, and how that represents file content and metadata. They must also develop in C++, rather than Visual C# or Visual Basic .NET.

Important: Microsoft Office SharePoint Server 2007 ships in both 32-bit and 64-bit versions and iFilters are sensitive to processor architecture. For example, you would need the 32-bit PDF iFilter if your index server is built on 32-bit architecture, and you would need the 64-bit version if you use a 64-bit architecture. At the time of writing, only the 32-bit version of the Adobe PDF filter is available, so if you need to index PDF files now, then you should install the 32-bit version of Microsoft Office SharePoint Server 2007 as the index server. Note, though, that you can mix 32-bit and 64-bit versions for different server roles in a SharePoint farm. The query servers and Web front end architectures do not affect iFilter versions.

Enabling Search and Indexing of Databases and Web Services

Even if your solution is not built on the SharePoint platform, you can still have your application data indexed and make it searchable to enterprise search tools. Microsoft Office SharePoint Server 2007 includes the powerful BDC (Business Data Catalog), which can be used to access data in ODBC or OLE-DB compliant databases. The BDC can also be used to access data that is returned from Web services.

BDC features are used throughout Microsoft Office SharePoint Server 2007; they provide content from databases and Web services for use in SharePoint lists, and can drive Web parts so that users can retrieve and display data in Web pages. They can also be used as sources for key performance indicators. The enterprise search features include a protocol handler so that BDC applications can be indexed and searched.

From a solution architect point of view, you must be aware of some issues that arise when you use the BDC to expose your solution data for search.

First, the BDC can only index textual data (including characters, numeric data, and dates) from databases and Web services. The BDC does not index columns that contain binary data or files. If you need to have that type of data indexed, you should consider using the SharePoint platform for storage, so that you can easily configure the native SharePoint protocol handler to index your binary data.

Second, the BDC supports security at either the application or entity level. This security information is evaluated when users access the data through the BDC, but it is also incorporated into the security trimming algorithms that are used to render search results. If you need to protect search results with row-level security, you can still use the BDC as a content source for the indexing process, but you must also develop a custom security trimmer assembly that can trim the search results appropriately. It is a relatively simple task for a developer to create a custom security trimmer (developers can use Visual C# or Visual Basic .NET to achieve this), but custom trimmers can seriously affect search performance. Effectively, the custom trimmer must implement a method for retrieving security information from the source database or Web service, and this method is called once for each row that is returned in the search results. The method typically calls a stored procedure or Web service operation to return the appropriate security metadata (which your developers must also create). Calling this method for each search result may not only adversely affect performance of the Microsoft Office SharePoint Server 2007 search components; it may also place a heavy load on the source database or Web service.
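
The performance concern can be made concrete by counting the calls that row-level trimming generates. The following sketch is illustrative only and does not use the SharePoint trimmer interface: CountingAclSource is a hypothetical stand-in for the stored procedure or Web service operation that returns security metadata, and trim_rows is a hypothetical trimming routine.

```python
class CountingAclSource:
    """Hypothetical stand-in for a stored procedure or Web service
    operation that returns the users allowed to read a given row."""

    def __init__(self, acl):
        self.acl = acl
        self.calls = 0  # number of round trips made to the source system

    def allowed_users(self, row_id):
        self.calls += 1
        return self.acl.get(row_id, set())


def trim_rows(row_ids, user, source):
    """Keep the rows that the user may read, asking the source once per row."""
    return [row_id for row_id in row_ids if user in source.allowed_users(row_id)]


source = CountingAclSource({1: {"kim"}, 2: {"ana"}, 3: {"kim", "ana"}})
visible = trim_rows([1, 2, 3], "kim", source)
print(visible, "lookups:", source.calls)
```

In this sketch, three result rows cost three lookups against the source system, so the load on that system grows with the number of results trimmed per query and with the number of queries issued.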

Adding Search as an Application Capability

In addition to exposing your solution data and files to the SharePoint search engine, you must also consider how users will perform searches from within your solution. The basic questions you must answer are:

  • Will users be restricted to searching the application’s data, or will they be able to perform enterprise-wide searches from your solution?
  • What components will be used to perform the searches?
  • How will search results be rendered and what additional functionality will be embedded in the results?

The answers depend on your specific requirements, but in general:

  • You can access the SharePoint query process programmatically in a number of different ways.
  • You can choose to have Microsoft Office SharePoint Server 2007 render your results in a Web page, or you can build your own user interface for results.

The following sections describe the different approaches you can take.

Using URL Syntax for Search

The easiest way to initiate a search by using SharePoint query technologies is to issue an HTTP GET request to a pre-defined search results Web page that is provided by SharePoint.

Note: This approach is best suited to Web-based applications that are not based on SharePoint. If you are designing a SharePoint-based Web solution, you can use the built-in search Web parts.

When your solution issues the request, it should provide query string parameters that define the search. For example, the following URL performs a search for the keyword term London:

http://moss.litwareinc.com/results.aspx?k=London

Your solution can search for multiple terms by including space (%20) characters, as in the following URL:

http://moss.litwareinc.com/results.aspx?k=New%20York

It is a simple task to gather search terms from a user, and then issue an appropriate search query. Your solution can provide ASP.NET TextBox controls or other inputs, and an ASP.NET Button control or LinkButton control for starting the search. Your server-side code can collect the search terms from the inputs, then build the URL, and finally perform a Response.Redirect call or Server.Transfer call to initiate the request.

The valid query strings are described in Table 1.

Query String Parameter | Description
k | Keyword to search for. Unlike previous versions of SharePoint, a logical AND is used between multiple terms, as in the example for New York.
s | Scope used to restrict search results. As part of your design, you should consider how to use SharePoint scopes. Scopes represent a logical view of the index. Scopes are compiled, and consist of rules that define the view. The rules might pertain to a property that is included in the index, or they might relate to the content sources. For example, if you want to restrict search results to your solution’s data, you can create a single scope for your application in SharePoint, and then use its name in the URL queries.
v | View used to render the results. You can only use one of two supported values for this parameter: v=relevance sorts the results in order of relevance (if you do not specify this parameter, relevance is used as the default view); v=date sorts the results by date modified.
start | Page of results to display. The default is the first page.

Table 1. Search Parameters
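
The parameters in Table 1 can be combined into a single request URL. The following is a minimal sketch that builds such a URL with standard URL encoding; the function name build_search_url is hypothetical, the results page address is the example address used above, and the scope name in the usage example is invented for illustration.

```python
from urllib.parse import quote, urlencode

RESULTS_PAGE = "http://moss.litwareinc.com/results.aspx"  # example address from the text


def build_search_url(keywords, scope=None, view=None, start=None,
                     results_page=RESULTS_PAGE):
    """Build a search results URL from the k, s, v, and start parameters."""
    if view is not None and view not in ("relevance", "date"):
        raise ValueError("view must be 'relevance' or 'date'")
    params = {"k": keywords}
    if scope is not None:
        params["s"] = scope
    if view is not None:
        params["v"] = view
    if start is not None:
        params["start"] = start
    # quote_via=quote encodes spaces as %20 rather than '+'.
    return f"{results_page}?{urlencode(params, quote_via=quote)}"


print(build_search_url("New York"))
print(build_search_url("London", scope="Litware App", view="date"))
```

Encoding the values, rather than concatenating raw user input, keeps characters such as & and # in a search term from being interpreted as part of the URL structure.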

Hosting Search and Results Pages

By default, the user is redirected to the SharePoint site to view the results. However, you might want the user to remain in your Web application. You could implement this scenario by embedding the SharePoint search and results pages in a separate frame within your application. Alternatively, if you are developing a Windows Forms–based solution, you can include a browser control for displaying the search and results pages from within your application.

Regardless of whether you host the search and results pages within your application or redirect the user to the SharePoint Search Center to view the results, you can modify the look and feel of the results to match the style of your solution. Result customization can take a number of different forms, ranging from setting properties of the result Web parts, through defining XSL transforms for displaying results, to creating new Web parts for rendering search results. For more information about customizing the results page, see:

Using URL-Based RSS Feeds for Search

Microsoft Office SharePoint Server 2007 and Windows SharePoint Services provide RSS feeds for many features and data. For example, a user can subscribe to an RSS feed for a task list, a document library, or any other type of list. Furthermore, the results of a specific search can be used as a source for an RSS feed. Users can then be notified through RSS when new results that match the search are discovered. As a solution architect, you might consider adding RSS-consumer functionality to your solution for search results. A simple way to do this is to issue an HTTP request to the URL that represents the RSS subscription. As you design your solution, you can determine the URLs used for RSS search feeds by performing a search in a Microsoft Office SharePoint Server 2007 site. You can then review the link for subscribing to the RSS feed from the results page.

Also consider, if your solution uses the HTTP GET method for performing searches, whether you might want to rely on the links on the search results page that enable users to subscribe to search results through RSS, rather than developing specific code for achieving the same goal.
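
If you do add RSS-consumer functionality, the core of it is parsing the feed and detecting items that have not been seen before. The following sketch parses a hand-written RSS 2.0 sample; the feed content and addresses are hypothetical, and retrieving the feed over HTTP (including any authentication that the SharePoint site requires) is left out.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed for a saved search, for illustration only.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Search results: London</title>
    <item>
      <title>London office directory</title>
      <link>http://moss.litwareinc.com/docs/london-directory.docx</link>
    </item>
    <item>
      <title>London sales figures</title>
      <link>http://moss.litwareinc.com/docs/london-sales.xlsx</link>
    </item>
  </channel>
</rss>"""


def parse_feed_items(feed_xml):
    """Return the title and link of every item in an RSS 2.0 feed."""
    root = ET.fromstring(feed_xml)
    return [
        {"title": item.findtext("title", default=""),
         "link": item.findtext("link", default="")}
        for item in root.iter("item")
    ]


def new_items(items, seen_links):
    """Return the items whose links have not been seen before."""
    return [item for item in items if item["link"] not in seen_links]


items = parse_feed_items(SAMPLE_FEED)
seen = {"http://moss.litwareinc.com/docs/london-directory.docx"}
for item in new_items(items, seen):
    print("New result:", item["title"])
```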

Using the Query Web Service for Search

If you want users to issue search queries to Microsoft Office SharePoint Server 2007, but remain in your application, and you want to have a lot of control over how the results are displayed and used, you can incorporate both the collecting of search terms and the display of results into your solution.

To achieve this, you can access the Query Web service from your solution and receive the search results programmatically. This approach works for many different types of solution, including Web applications, console applications, and Windows Forms executables.

To access the Enterprise Search Query Web service and its methods, you typically add a Web reference to the search.asmx Web service. The Web reference URL takes the following form:

http://Server_Name/[sites/][Site_Name/]_vti_bin/search.asmx

After you have set a Web reference, you can instantiate an object based on the QueryService class, set its properties, and invoke its methods to issue queries programmatically. There are two different methods for performing the query: the Query method, and the QueryEx method. The distinction between these two methods is that Query returns results as a string containing XML, whereas QueryEx returns an ADO.NET DataSet. In either case, your solution can manipulate and display the results to meet the specific requirements of your solution. For either method, your solution must construct a query packet that describes the search query. The query packet contains the query terms and additional query options. The following code sample shows the basic XML-based structure of a query packet.


<QueryPacket>
   <Query>
      <QueryId />
      <SupportedFormats>
            <Format />
      </SupportedFormats>
      <Context>
            <QueryText />
            <OriginatorContext />
      </Context>
      <Range>
            <StartAt />
            <Count />
      </Range>
      <Properties>
            <Property />
      </Properties>
      <SortByProperties> 
            <SortByProperty />
      </SortByProperties>
      <EnableStemming />
      <TrimDuplicates />
      <IncludeSpecialTermResults />
      <IgnoreAllNoiseQuery />
      <IncludeRelevantResults />
      <IncludeHighConfidenceResults />
   </Query>
</QueryPacket>
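
A query packet can be assembled with any XML library. The following sketch fills in a subset of the elements from the skeleton above; the function name build_query_packet and the default values are assumptions, and the skeleton does not show any namespace declaration or element attributes that the service may require, so they are omitted here and should be taken from the SDK.

```python
import xml.etree.ElementTree as ET


def build_query_packet(query_text, start_at=1, count=10,
                       enable_stemming=True, trim_duplicates=True):
    """Build a query packet string using elements from the skeleton above.

    The defaults are illustrative assumptions, not documented service defaults.
    """
    packet = ET.Element("QueryPacket")
    query = ET.SubElement(packet, "Query")

    context = ET.SubElement(query, "Context")
    ET.SubElement(context, "QueryText").text = query_text

    result_range = ET.SubElement(query, "Range")
    ET.SubElement(result_range, "StartAt").text = str(start_at)
    ET.SubElement(result_range, "Count").text = str(count)

    ET.SubElement(query, "EnableStemming").text = str(enable_stemming).lower()
    ET.SubElement(query, "TrimDuplicates").text = str(trim_duplicates).lower()

    # ElementTree escapes characters such as & and < in the query text.
    return ET.tostring(packet, encoding="unicode")


packet_xml = build_query_packet("R&D budget", count=25)
print(packet_xml)
```

Building the packet with an XML library, rather than by string concatenation, ensures that user-supplied search terms cannot produce malformed XML.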

For more information about using the Query Web service to perform queries programmatically, see the Microsoft Office SharePoint Server 2007 SDK topic:

Using the Object Model for Search

If you are architecting your solution on the Microsoft Office SharePoint Server 2007 platform, your code has access to the full SharePoint in-process object model. The functionality and approach for issuing search queries by using the object model is similar to using the Web service, although the syntax and protocols are different. The object model supports both keyword and SQL syntax. The distinction between these two types of queries is that they use different syntax, and therefore have different capabilities. For example, keyword queries are easy to develop because you do not need to write search-specific SQL language constructs, but keyword syntax supports only phrase, word, or prefix exact matches. Also, while you can use keyword syntax to specify whether to include or exclude a particular search term, you cannot construct complex groupings of included and excluded terms.

In comparison, SQL queries provide support for constructing complex search queries, including logical operators, wildcards, full-text operators, sorting clauses, and comparison operators.

You can issue keyword queries by instantiating an object based on the KeywordQuery class, whereas you perform SQL queries by using the FullTextSQLQuery class. Both types of queries return a ResultTableCollection object. Each ResultTable object in the collection implements the ADO.NET IDataReader interface, so your solution can access results by using standard ADO.NET techniques, such as loading them into a DataTable object.
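
To make the difference between the two syntaxes concrete, the following sketch composes the query text for each style. It only builds strings; it does not call the SharePoint object model. The helper names are hypothetical, and the SQL statement (the Scope() source, the FREETEXT predicate over DefaultProperties, and ordering by Rank) reflects a commonly used form that you should verify against the SDK before relying on it.

```python
def _quote_phrase(term):
    """Wrap multi-word terms in double quotes so they are treated as a phrase."""
    return f'"{term}"' if " " in term else term


def keyword_query(include, exclude=()):
    """Build keyword syntax: listed terms are searched for, '-' excludes a term."""
    parts = [_quote_phrase(term) for term in include]
    parts += ["-" + _quote_phrase(term) for term in exclude]
    return " ".join(parts)


def sql_query(columns, free_text, order_by="Rank"):
    """Build a SQL syntax query (assumed form; check the SDK for details)."""
    # Double any single quotes so the text is a valid SQL string literal.
    safe_text = free_text.replace("'", "''")
    return (
        f"SELECT {', '.join(columns)} FROM Scope() "
        f"WHERE FREETEXT(DefaultProperties, '{safe_text}') "
        f"ORDER BY {order_by} DESC"
    )


kw = keyword_query(["mountain bike", "frame"], exclude=["road"])
sql = sql_query(["Title", "Path"], "rider's guide")
```

The keyword form stays short and is easy to generate from user input, whereas the SQL form makes the selected columns, the predicate, and the sort order explicit, which is where the additional capabilities of SQL syntax come from.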

For more information about how to construct queries, execute them through the object model, and render results, see the Microsoft Office SharePoint Server 2007 SDK.

Taking Advantage of the ‘Database of Intention’

As with all search solutions, SharePoint indexes and searches are only as good as the data they contain. As a solution architect, you must ensure that the data in your application is indexed appropriately, and that the search capabilities in your solution yield relevant results to your users.

Your search solution must do more than match keywords to content; it should take the search experience to the next level by also functioning as a ‘database of intention’. From the users’ perspective, the search engine then appears to understand what they are looking for, and can make informed decisions when ranking the search results. The search engine should even be able to suggest alternatives to the users.

As a solution architect, you can use many of the mature features provided by Microsoft Office SharePoint Server 2007 to achieve these goals. For example, you can define managed properties for files and other items, and you can then configure the gatherer service to index those properties. You can go further than that by creating SharePoint content types that contain managed properties. Content types and their properties not only help to enhance the search experience, they also help in taxonomy schemes, workflows, and other collaborative solutions, by adding levels of manageability and flexibility to your information systems.

You should also make use of search-specific configurations to enhance the relevancy and accuracy of search results. For example, you can add to the noise words used by the gatherer service, and you can specify synonyms for related terms. Furthermore, you can specify keywords that are important in your information domains, and even have those keywords associated with specific Best Bet sites or other content sources.
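
The effect of noise words and synonyms on a query can be pictured with the following conceptual sketch. In SharePoint these are configuration settings rather than application code, so this is not how the product implements them; the word lists and the function name expand_query are hypothetical and exist only to illustrate the idea.

```python
# Hypothetical configuration data for the illustration.
NOISE_WORDS = {"a", "an", "and", "for", "of", "the"}
SYNONYMS = {"bike": {"bicycle", "cycle"}}


def expand_query(text, noise_words=NOISE_WORDS, synonyms=SYNONYMS):
    """Drop noise words and group each remaining term with its synonyms.

    Returns a list of term groups; a document matches a group if it
    contains any term in that group.
    """
    groups = []
    for word in text.lower().split():
        if word in noise_words:
            continue  # noise words carry no search value
        groups.append(sorted({word} | synonyms.get(word, set())))
    return groups


groups = expand_query("The bike for mountains")
```

Here the query is reduced to two term groups: one for "bike" together with its configured synonyms, and one for "mountains". Tuning the two lists for your information domain changes which documents can match at all, before ranking is applied.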

Additionally, you can designate content stores that you know contain high-quality information as authoritative sites, so that they are ranked more highly than other sites.

The most powerful, and most complex, way of tuning relevancy is to modify the default ranking parameters. For example, you might want to change the weighting given to specific file types, or you might want to specify that the author property should be given more weight than, say, the title property. You can modify ranking parameters only through custom code. Research this area very carefully before doing so, because changes can adversely affect query accuracy. For more information about how to modify these parameters, see the Microsoft Office SharePoint Server 2007 SDK.

You should also architect your solution so that users provide the metadata that the ranking process needs. For example, many users do not fill in file properties, such as author, keywords, subject, and title, even though this data is invaluable to both the indexing and the ranking engine.

Finally, the most important task that you can perform is to monitor the accuracy of search over time. Microsoft Office SharePoint Server 2007 provides many query reports, such as those that show common queries, those that report queries with zero results, and peak query usage statistics. You should interpret these reports based on what you know about your information domain. For example, if queries for Mountain Bike return zero results, but your company is a bike manufacturer, then perhaps relevant data is not being indexed; you could therefore ensure that new content sources are included in the index.

Alternatively, you might see that queries are consistently pointing users to inappropriate sources when you know that other high-quality information exists elsewhere; you could therefore set up authoritative sites, or associate the high-quality store with specific keywords and Best Bets.
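
If you export query report data for your own analysis, a review of zero-result queries might be sketched as follows. The input format (pairs of query text and result count) and the function name zero_result_queries are assumptions made for this example; they do not describe the format of the built-in SharePoint reports.

```python
from collections import Counter


def zero_result_queries(log, min_count=2):
    """Return (query, occurrences) for queries that repeatedly found nothing.

    log: iterable of (query_text, result_count) pairs (hypothetical format).
    Queries are normalized to lowercase so that variants are counted together.
    """
    counts = Counter(
        query.strip().lower() for query, result_count in log if result_count == 0
    )
    # most_common() lists the most frequently failing queries first.
    return [(query, n) for query, n in counts.most_common() if n >= min_count]


sample_log = [
    ("Mountain Bike", 0),
    ("mountain bike", 0),
    ("helmet", 12),
    ("saddle", 0),
]
failing = zero_result_queries(sample_log)
```

With the sample data, only "mountain bike" is reported, because it failed twice. A recurring zero-result query for a term that is central to your business is a strong hint that a content source is missing from the index.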

Every organization has different requirements and experience of search relevancy, depending on the quality and properties of data. What they all have in common, however, is that relevance can only be improved by monitoring usage and reports over time. As your body of information evolves, so your indexing and search strategies will change. Consequently, you need to fine-tune the search solutions on a regular basis.

Conclusion

As a solution architect, you must ensure that your applications fit into your organization’s strategy for enterprise search. Because search is increasingly viewed as a business process, you must consider indexing and search functionality as seriously as you consider other functional and non-functional requirements. The two basic issues that you should consider are how data in your solution is indexed and exposed to enterprise search engines, and how users of your solution search for information, regardless of whether they search from your application or from the wider body of information that constitutes your company’s collective knowledge. The SharePoint Products and Technologies suite of services provides rich, robust functionality that you can use to meet these search requirements.

Appendix A. Search Features in SharePoint Products and Technologies

The SharePoint Products and Technologies suite includes:

  • Windows SharePoint Services 3.0 (WSS)
  • Microsoft Office SharePoint Server 2007 for Search (MOSS-FS) Standard Edition
  • Microsoft Office SharePoint Server 2007 for Search (MOSS-FS) Enterprise Edition
  • Microsoft Office SharePoint Server 2007 (MOSS) Standard Edition
  • Microsoft Office SharePoint Server 2007 (MOSS) Enterprise Edition

The following table lists the search-specific features and capabilities of each product:

Features                                        WSS      MOSS-FS Standard  MOSS-FS Enterprise  MOSS Standard  MOSS Enterprise
Index SharePoint Sites and SharePoint Data      Yes      Yes               Yes                 Yes            Yes
Index Web Sites                                 No       Yes               Yes                 Yes            Yes
Index File Shares                               No       Yes               Yes                 Yes            Yes
Index Microsoft Exchange Server Public Folders  No       Yes               Yes                 Yes            Yes
Index Lotus Notes Databases                     No       Yes               Yes                 Yes            Yes
Index Third-Party Document Repositories         No       Yes               Yes                 Yes            Yes
Secure Content Access Control                   Yes      Yes               Yes                 Yes            Yes
Enhanced Search Center User Interface           No       No                No                  Yes            Yes
Search for People and Expertise                 No       No                No                  Yes            Yes
Index Relational Data                           No       No                No                  No             Yes
Index Web Service Data                          No       No                No                  No             Yes
Item Limit                                      500,000  No Limit          No Limit            No Limit       No Limit

About the Author

Beat Schwegler is the Enterprise and Technical Evangelism Lead in the Developer and Platform Evangelism Group for Microsoft Western Europe. He provides advice and consulting to enterprise companies in software architecture and related topics and is a frequent speaker at international events and conferences. He has more than 14 years of experience in professional software development and architecture and has been involved in a wide variety of projects, ranging from real-time building control systems and best-selling shrink-wrapped products to large-scale CRM and ERP systems.
