Author: Beat Schwegler, Technical Evangelist, Microsoft Western Europe HQ
October 2007
Summary: As a solution architect, you must ensure that your applications fit into your
organization’s strategy for enterprise search. Because search is increasingly
viewed as a business process, you must consider indexing and search
functionality as seriously as you consider other functional and non-functional
requirements.
Contents
Introduction
Navigation vs. Search
Web Search vs. Enterprise Search
Enabling Search in Your Applications
Adding Search as an Application Capability
Taking Advantage of the ‘Database of Intention’
Conclusion
Appendix A. Search Features in SharePoint Products and Technologies
About the Author
Introduction
Until recently, many solution architects
have treated search capabilities in applications almost as an afterthought. Search
features were typically added to an application to help the user find
information stored by the solution, but often little thought was given to how:
- Users might find data stored in other
enterprise stores.
- Information stored by the solution might
be made available to enterprise-wide searches that are performed outside
of the application.
Many organizations now, however, view the
actions that are involved in searching for information as a business process
in its own right. This view is supported by hard facts: An independent survey
by IDC found that the average information worker in the US, salaried at $60,000, spends approximately nine-and-a-half hours per week searching for
information (IDC: The Hidden Costs of Information Work, April 2006, Doc #201334).
That equates to a business cost of about $15,000 per worker, per year. Of the
time spent searching, the survey found that approximately three-and-a-half
hours were completely wasted; effectively, each information worker wasted about
$5,000 per year by searching but not finding what he or she was looking
for.
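The annual figures quoted above can be reproduced with simple arithmetic. The following is a rough sketch only: it assumes a 40-hour working week (an assumption that is not stated in the text), so the survey's own method may differ.

```python
# Back-of-the-envelope check of the survey figures quoted above.
# Assumption (not from the survey): a 40-hour working week, so time spent
# searching is costed as the same fraction of annual salary.
ANNUAL_SALARY = 60_000   # USD, from the text
HOURS_PER_WEEK = 40      # assumed
SEARCH_HOURS = 9.5       # hours per week spent searching, from the text
WASTED_HOURS = 3.5       # hours per week of unsuccessful searching, from the text


def annual_cost(hours_per_week):
    """Salary cost of spending `hours_per_week` on an activity for a year."""
    return ANNUAL_SALARY * hours_per_week / HOURS_PER_WEEK


search_cost = annual_cost(SEARCH_HOURS)
wasted_cost = annual_cost(WASTED_HOURS)
print(f"Searching: ${search_cost:,.0f} per worker per year")
print(f"Wasted:    ${wasted_cost:,.0f} per worker per year")
```

Under that assumption the results are $14,250 and $5,250, which is consistent with the rounded figures of about $15,000 and about $5,000.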
When information workers are not even aware
of existing content, because it is not returned by searches, they re-create
that content; the costs involved in this loss of productivity far outweigh
even the tangible costs of wasted searches.
Organizations now realize that they need
enterprise-wide search capabilities to reduce the wasted costs, and to improve
efficiency by surfacing knowledge to information workers when they need it.
The reasons for wasted searches are many
and varied, including:
- Restricted search. If enterprise
applications are only capable of delivering search results from their own
data, users must know from where to start their searches if they are to
find useful results. In effect, users must almost know where the
data is, before they start searching.
- Restricted indexing. Even if
organizations implement enterprise-wide search, they must ensure that all
enterprise data is indexed, and therefore accessible, from the search
solutions. Ideally, a user should be able to perform searches, and have
results returned, regardless of the location or format of those results.
- Poor ranking of search results. If the
most relevant results for a search are not ranked appropriately and
displayed near the top of the results list, users will soon cease to use
those search capabilities.
As a solution architect, you must consider
the requirements for search in your solutions as seriously as you would
consider other functional and non-functional requirements. In essence, you must
search-enable your solutions so that:
- Your application data can be indexed and
searched from other enterprise search solutions. You must also consider
how the data from your solution is exposed to your organization’s
enterprise search tools. Ideally, users who perform searches on the
intranet, for example, should be able to see results from your solution as
well as information from other systems and content stores.
- Users can perform searches from within
your solution. At a minimum, you should consider how a user will
search the data from your solution. Ideally, though, you should integrate
the search capabilities of your solution with an enterprise search
solution, so that results from other applications and data stores are also
returned to the user.
Furthermore, you must ensure that the
search engines used by your solutions provide accurate ranking of results if
you expect users to adopt and use those search capabilities.
Navigation vs. Search
As you architect your solution and consider
how users will access information, you should think about the differences
between navigating to information and searching for information.
Although navigation and search are two different concepts, there is
considerable overlap in how you should design these features into your
solutions, so you must understand that overlap. It exists because
both mechanisms have the same objective: to enable the user to locate
information. Structured navigation helps users to browse to information that
they are looking for; users often have only a general idea of where the
information is located, and use navigation features, such as tree-views,
organized links, and other visual metaphors to browse for information.
Effectively, users navigate the structure to narrow down the information until
they have homed-in on their target data. Navigation is therefore typically used
to find information in a given context and provides a good mechanism for users
to explore related data in that context. Search, on the other hand, is
typically used as a starting point when users either do not know what context
or structure contains the information they seek, or when multiple contexts may
hold relevant data.
Regardless of the logical overlap, all
information-based solutions should provide both context-specific and
non-contextual methods of finding data. However, when you architect your
solutions, you should also include the concept of switching from one mechanism
to the other. For example, users might start navigating through a solution to
attempt to locate information, but they encounter problems if:
- They have taken a wrong path.
- The data is stored by an external system.
- The information base is too large and
unwieldy.
- Multiple contexts contain parts of what
they are looking for.
- They find some information, but it is not
as relevant as knowledge in a different context.
Therefore, you should design your solution
so that users can instigate a search as soon as they feel that the navigation
approach does not work for them in any instance.
The converse switch, however, is equally
valid and important to your solution, if you want to provide rich access to
information. A user often uses search facilities to find information and is
presented with an overwhelming number of results. Users can often refine their
search terms, or use features such as searching for a narrower term within the
existing search results, but you can go further than that. For example, you
might consider adding navigation features to the search results, such as
enabling the user to pivot the results on specific groupings, or even on the
properties and metadata of the returned information.
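As a minimal sketch of pivoting, the following groups a list of results by one metadata property so that a user can browse the groups instead of a flat list. The result dictionaries and the property names (`author`, `type`) are hypothetical; a real solution would pivot on whatever properties its search engine returns.

```python
from collections import defaultdict


def pivot(results, prop):
    """Group result titles by the value of one metadata property."""
    groups = defaultdict(list)
    for result in results:
        # Results that lack the property are collected under "(none)".
        groups[result.get(prop, "(none)")].append(result["title"])
    return dict(groups)


# Hypothetical search results with illustrative metadata.
results = [
    {"title": "Q3 sales report", "author": "Ana", "type": "Document"},
    {"title": "Sales kickoff", "author": "Ben", "type": "Web page"},
    {"title": "Sales forecast", "author": "Ana", "type": "Spreadsheet"},
]

by_author = pivot(results, "author")
print(by_author)
```

Each key in the returned dictionary can be rendered as a navigation link that narrows the result list to that group.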
Topic Maps and Search
Topic maps are an emerging standard for
dealing with searchable and navigable taxonomies. The topic map defines
entities in the taxonomy as well as the relationships between entities. This
enables the user to gain a picture of how different pieces of content relate to
other pieces, such as by content term, author, site, or any other attribute
that has been defined in the topic map. The user can use the topic map to discover
authors and articles that they may not have otherwise seen. Effectively, search
functions as a soft link between entities in the topic map; topic maps define
the taxonomy (including attributes and relationships), and the data is returned
by search. For a good example of topic maps and search in practice, see http://msdn2.microsoft.com/en-us/skyscrapr/bb428857.aspx
As a solution architect, you should become
familiar with the emerging topic map standards. Topic maps are based on an ISO
standard, and define the entities and relationships in XML Topic Maps (XTM)
format. For more information about topic maps, see http://www.topicmaps.org/
Web Search vs. Enterprise Search
Most users are familiar with using
Web-based search engines. Users have become used to expecting highly relevant
results shown at the top of the search results, and they do not care where the
information is stored or what format it is in. They have also become accustomed
to searching for a specific term or combination of terms, but they do not
expect to have to type in words that match the results exactly. They expect the
search facilities to be smart enough to deal with synonyms, closely related
terms, and spelling errors. They even expect search to suggest alternative
searches.
The experience users have had with
searching the Web presents three issues to you as a solution architect for
enterprise applications.
First, you need to design into your
solution the functional features that users view as essential for all search
tools. If your search features are not at least as rich as those offered by Web
search engines, then your solution may be viewed unfavorably by the users.
Second, you must implement the essential
functional features so that they work effectively with your organization’s
information stores. That means solving problems such as deciding what data to
index, determining how different files and data formats can be represented in
search results, and working out how information relevancy is ranked in search
results.
Third, you must consider many
non-functional requirements specifically for the search features in your
solution, including:
- Search performance
- Security and privacy of the data returned
by search (including content previews)
- Scalability of the search features
- Flexibility of what users can do with the
search results
- Availability metrics and requirements for
search
- Manageability and maintainability of the
search configurations and optimizations
- Impact that indexing and search have on
the other functional features of your solution
Another concept that you must consider is
that finding information is not always the sole objective of performing a search.
Finding an expert in a given field, or a business specialist who is very
familiar with a given process, may be useful to information workers when they
are searching for knowledge to enable them to complete a task or solve a
business problem. Your search solutions, therefore, should include the concept
of people search, and that adds another challenge, because you must
design how that information is gathered, where it is stored, how it is indexed,
and how it is presented in search results. However, this effort at the design
stage is very worthwhile.
As with most aspects of architectural design, search
requirements are specific to each organization, so you should consider
all of these issues on a case-by-case basis.
Enabling Search in Your Applications
When you are architecting a solution, you
do not need to re-invent the wheel, at least as far as search is concerned. You
can use mature search services to add powerful search features to your
solutions.
The SharePoint Products and Technologies
suite provides a number of different servers that you can use to add powerful,
robust, and flexible search features to your solution. All of the products in
the suite use the same core indexing engine and search technologies. For more
information about the products and their specific search features, see
Appendix A.
For a complete picture, Figure 1
illustrates the logical architecture of search and indexing in Microsoft®
Office SharePoint® Server 2007, Enterprise Edition.
Figure 1. Microsoft Office SharePoint Server 2007 Search Architecture
Microsoft Office
SharePoint Server 2007 crawls content from a variety of sources, including:
- File shares
- SharePoint sites and data
- Other Web content
- Data from relational databases and Web
services
- Documents from third-party systems
- Data from proprietary systems
Each type of content source is accessed
through a component called a protocol handler. For example, file shares
are accessed by using the FILE protocol handler, Web sites are crawled over the
HTTP protocol, and data in relational databases or Web services is accessed by using
the BDC (Business Data Catalog) handler. A number of protocol handlers are
included with Microsoft Office SharePoint Server 2007 (depending on the product
version and license), but it is important to understand that the architecture
is extensible—if you need to index data in a custom third-party solution, you
can plug a protocol handler into the architecture.
In addition to using a protocol handler to
access the content source, Microsoft Office SharePoint Server 2007 uses
iFilters to read the data and content. For example, PDF documents are accessed
by using the PDF iFilter. As with protocol handlers, the iFilter architecture
is extensible.
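The two extension points described above can be pictured as a pair of lookup tables: one keyed on how a content source is reached, and one keyed on the file type. The following is a conceptual sketch only, not SharePoint code; the registries, the `bdc` URL scheme, and the `plan_crawl` function are hypothetical names used for illustration.

```python
import os
from urllib.parse import urlparse

# Toy registries. The entries mirror the handlers and filters named in the
# text; the descriptions are illustrative stand-ins, not product components.
PROTOCOL_HANDLERS = {
    "file": "FILE protocol handler",
    "http": "HTTP protocol handler",
    "bdc": "BDC protocol handler",   # hypothetical scheme name
}

IFILTERS = {
    ".pdf": "PDF iFilter",
    ".doc": "Office iFilter",
    ".html": "HTML iFilter",
}


def plan_crawl(url):
    """Return (protocol handler, iFilter or None) for one content URL."""
    parts = urlparse(url)
    scheme = parts.scheme.lower()
    if scheme not in PROTOCOL_HANDLERS:
        # Extensibility point: a custom handler would be registered here.
        raise ValueError(f"no protocol handler registered for '{scheme}'")
    extension = os.path.splitext(parts.path)[1].lower()
    return PROTOCOL_HANDLERS[scheme], IFILTERS.get(extension)


print(plan_crawl("file://fileserver/share/report.pdf"))
```

Supporting a new content source or file format then amounts to adding an entry to the relevant table, which is the sense in which both architectures are extensible.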
During the crawl, the gather service
returns two streams of data—one for object metadata and permissions, and
another for object content. The properties and permissions are stored in a property
store table and the content is sent to the full-text index. The permissions
information is used to filter the search results so that users see only content
appropriate to their permission levels.
Before the content stream is added to the
content index, word breakers and word stemmers are used to break the stream
into words. Word stems are tokenized, so that different verb forms can be dealt
with at query time. Noise words, such as “the” or “it”, are removed because they
are so common that there is no value in indexing them.
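To make word breaking, noise-word removal, and stemming concrete, here is a deliberately naive sketch. It is not the algorithm that SharePoint uses: the noise-word list and suffix rules are illustrative assumptions, and for simplicity it applies the same normalization to both indexed content and query terms.

```python
import re

NOISE_WORDS = {"the", "it", "a", "an", "of", "and"}   # illustrative list only
SUFFIXES = ("ing", "ed", "s")                          # naive stemming rules


def stem(word):
    """Strip one common suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def index_terms(text):
    """Break text into lowercase words, drop noise words, and stem the rest."""
    words = re.findall(r"[a-z0-9]+", text.lower())     # word breaking
    return [stem(word) for word in words if word not in NOISE_WORDS]


print(index_terms("The crawler indexed it"))
```

Because "indexed" and "indexing" both reduce to the same stem in this sketch, a query that uses one verb form can match content that uses the other.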
Figure 2 shows a scaled-out physical
architecture to illustrate the different search and indexing roles in a
Microsoft Office SharePoint Server 2007 farm.
Note: A
Microsoft Office SharePoint Server 2007 farm can be configured in a number of
different ways, from a single server through to a fully scaled out
architecture. Discussion of farm architecture is outside the scope of this
paper, and the fully scaled-out architecture is only included for ease of explanation.
The Index
Server performs all the tasks described. The full text index is then
propagated to the query servers, as shown below (1). Propagation occurs on a
continuous basis.
Figure 2. Microsoft Office SharePoint Server 2007 Farm Search and Indexing Architecture
A query issued by
a user (2) is received by a Web front end. The Web front end then contacts a
query server component to retrieve data from the full text index (3a). It also
contacts the database server to retrieve metadata and permissions (3b). The
content and the metadata are returned to the Web front end (4a and 4b), and the
Web server renders the results, applies security trimming, and returns the rows
(including links to the source content) to the client that issued the query.
The security trimming ensures that users only see results for which they have permissions.
This solves two problems:
- Users are not presented with an
authentication dialog box when they click a result for which they do not
have access.
- Users are not made aware of even the
existence of potentially sensitive information, just by performing a
search.
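Conceptually, security trimming is a filter applied to the result list before it is rendered. The sketch below assumes that each result carries the set of principals (users or groups) allowed to read it; the `Result` type and the principal names are hypothetical, and the product's own trimming logic is not shown in this paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    title: str
    url: str
    allowed: frozenset   # principals permitted to read the item


def trim(results, user_principals):
    """Keep only the results that at least one of the user's principals may read."""
    principals = set(user_principals)
    return [result for result in results if result.allowed & principals]


# Hypothetical index entries and a user who belongs to one group.
results = [
    Result("Holiday calendar", "http://intranet/cal", frozenset({"Everyone"})),
    Result("Salary review", "http://intranet/hr/pay", frozenset({"HR"})),
]
visible = trim(results, ["Everyone", "Sales"])
print([result.title for result in visible])
```

The trimmed item is removed from the list entirely, so the user never learns that it exists.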
If a result represents a file or a Web
page, the link points the user directly to the content. If, however, the result
represents an item in a line-of-business application, such as a row in a database
or a piece of data exposed by a Web service, then the BDC application
definition files should provide actions to describe what happens when a
user clicks the link. For example, the action could retrieve additional columns
of data from the data source, or it could enable the user to drill-down into the
data to find related information.
The following sections expand on the
product-specific features and capabilities, and provide architectural guidance
about how you can use them in your solutions. For now, from a solution
architect point of view, you simply need to be aware that:
- You can use SharePoint search to index
your application’s data so that it is searchable outside of your solution.
- You can issue queries that are served by
SharePoint query servers in a number of ways from your solution, and render
the results in your application; results from all content sources can be
included if you want to provide enterprise search capabilities, or you can
restrict the results to your solution data if required.
Building Applications on a
Searchable Platform
The easiest way to include search
capabilities in your solutions is to build your applications on Microsoft
Windows SharePoint Services 3.0 or Microsoft Office SharePoint Server 2007.
These products have evolved from simple collaboration platforms into proper
application development platforms. The advantages of building on Windows
SharePoint Services or Microsoft Office SharePoint Server 2007 include the fact
that many typical but complex issues have been solved, such as:
- How to implement document storage, source
control, information versioning, and collaboration features
- How to implement information security
policies
- How to categorize information
- How to satisfy non-functional
requirements, such as security, performance, and availability
- How to implement physical data storage
- How to include extensible workflows
- How to implement standard user interfaces
You should be aware that SharePoint-based
solutions are not restricted to document and collaboration scenarios. Anything
from a help-desk application, through e-commerce solutions, to
customer-relationship management systems can be built on SharePoint
technologies. The custom list infrastructure makes the SharePoint products
suitable even for solutions that traditionally required their own databases.
If you can build your solution on one of
the SharePoint products, then you automatically have access to powerful search
features, with very little development effort involved.
The level of search functionality depends
on which product you use to build your solution. If you build on Microsoft
Office SharePoint Server 2007, your solution natively has access to complete
enterprise search capabilities, including people search, and search of file
shares and external systems. If you build on Windows SharePoint Services, you
still have access to powerful search features but are restricted to your
solution’s own data and documents.
Making Proprietary File Formats
Searchable
Regardless of whether you build your
solution on a SharePoint-based platform, you must make architectural decisions
about how files are stored by your application, if you want them to be indexed
by Microsoft Office SharePoint Server 2007. The most important factor is the file
formats that your solution supports. As mentioned previously, each file type
requires an iFilter so that the indexing engine can read the document contents.
Microsoft Office SharePoint Server 2007 and Windows SharePoint Services include
iFilters for all Microsoft Office formats, and common Web formats (such as
HTML, XML, and text). Adobe provides iFilters for PDF files, and other vendors are
currently developing iFilters for their specific formats. If you have a
proprietary format, you need to obtain an iFilter if it exists, or develop your
own. Be aware that creating iFilters is not a simple task—your developers
require an in-depth knowledge of how the binary data is stored, and how that
represents file content and metadata. They must also develop in C++, rather
than Visual C# or Visual Basic .NET.
Important: Microsoft
Office SharePoint Server 2007 ships in both 32-bit and 64-bit versions and
iFilters are sensitive to processor architecture. For example, you would need
the 32-bit PDF iFilter if your index server is built on 32-bit architecture, and
you would need the 64-bit version if you use a 64-bit architecture. At the time
of writing, for example, only the 32-bit version of the Adobe PDF filter is
available, so if you need to index PDF files now, then you should install the
32-bit version of Microsoft Office SharePoint Server 2007 as the index server.
Note, though, that you can mix 32-bit and 64-bit versions for different server
roles in a SharePoint farm. The query servers and Web front end architectures
do not affect iFilter versions.
Enabling Search and Indexing of
Databases and Web Services
Even if your solution is not built on the
SharePoint platform, you can still have your application data indexed and make
it searchable to enterprise search tools. Microsoft Office SharePoint Server
2007 includes the powerful BDC (Business Data Catalog), which can be used to
access data in ODBC or OLE-DB compliant databases. The BDC can also be used to
access data that is returned from Web services.
BDC features are used throughout Microsoft
Office SharePoint Server 2007; they provide content from databases and Web
services for use in SharePoint lists, and can drive Web parts so that users can
retrieve and display data in Web pages. They can also be used as sources for
key performance indicators. The enterprise search features include a protocol
handler so that BDC applications can be indexed and searched.
From a solution architect point of view,
you must be aware of some issues that arise when you use the BDC to expose your
solution data for search.
First, the BDC can only index textual data
(including characters, numeric data, and dates) from databases and Web
services. The BDC does not index columns that contain binary data or files. If
you need to have that type of data indexed, you should consider using the
SharePoint platform for storage, so that you can easily configure the native
SharePoint protocol handler to index your binary data.
Second, the BDC supports security at either
the application or entity level. This security information is evaluated when
users access the data through the BDC, but it is also incorporated into the
security trimming algorithms that are used to render search results. If you
need to protect search results with row-level security, you can still use the
BDC as a content source for the indexing process, but you must also develop a
custom security trimmer assembly that can trim the search results
appropriately. It is a relatively simple task for a developer to create a
custom security trimmer (developers can use Visual C# or Visual Basic .NET to achieve this), but
custom trimmers can seriously affect search performance. Effectively, the
custom trimmer must implement a method for retrieving security information from
the source database or Web service, and this method is called once for each row
that is returned in the search results. The method typically calls a stored
procedure or Web service operation to return the appropriate security metadata
(which your developers must also create). Calling this method for each search result
may not only adversely affect performance of the Microsoft Office SharePoint
Server 2007 search components; it may also place a heavy load on the source
database or Web service.
Adding Search as an Application Capability
In addition to exposing your solution data
and files to the SharePoint search engine, you must also consider how users
will perform searches from within your solution. The basic questions you must
answer are:
- Will users be restricted to searching the
application’s data, or will they be able to perform enterprise-wide
searches from your solution?
- What components will be used to perform
the searches?
- How will search results be rendered and
what additional functionality will be embedded in the results?
The answers depend on your specific
requirements, but in general:
- You can access the SharePoint query
process programmatically in a number of different ways.
- You can choose to have Microsoft Office
SharePoint Server 2007 render your results in a Web page, or you can build
your own user interface for results.
The following sections describe the
different approaches you can take.
Using URL Syntax for Search
The easiest way to initiate a search by
using SharePoint query technologies is to issue an HTTP GET request to a
pre-defined search results Web page that is provided by SharePoint.
Note: This
approach is best suited to Web-based applications that are not based on
SharePoint. If you are designing a SharePoint-based Web solution, you can use
the built-in search Web parts.
When your solution issues the request, it
should provide query string parameters that define the search. For example, the
following URL performs a search for the keyword term London:
http://moss.litwareinc.com/results.aspx?k=London
Your solution can search for multiple terms
by including space (%20) characters, as in the following URL:
http://moss.litwareinc.com/results.aspx?k=New%20York
It is a simple task to gather search terms
from a user, and then issue an appropriate search query. Your solution can
provide ASP.NET TextBox controls or other inputs, and an ASP.NET Button
control or LinkButton control for starting the search. Your server-side
code can collect the search terms from the inputs, then build the URL, and
finally perform a Response.Redirect call or Server.Transfer call
to initiate the request.
The valid query strings are described in
Table 1.
| Query String Parameter | Description |
|---|---|
| k | Keyword to search for. Unlike previous versions of SharePoint, a logical AND is used between multiple terms, as in the example for New York. |
| s | Scope used to restrict search results. As part of your design, you should consider how to use SharePoint scopes. Scopes represent a logical view of the index. Scopes are compiled, and consist of rules that define the view. The rules might pertain to a property that is included in the index, or they might relate to the content sources. For example, if you want to restrict search results to your solution’s data, you can create a single scope for your application in SharePoint, and then use its name in the URL queries. |
| v | View used to render the results. You can only use one of two supported values for this parameter. v=relevance sorts the results in order of relevance; if you do not specify this parameter, relevance is used as the default view. v=date sorts the results by date modified. |
| start | Page of results to display. The default is the first page. |
Table 1. Search Parameters
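The parameters in Table 1 can be combined into a URL in a few lines of code. The following is a sketch in Python rather than ASP.NET: the host name is taken from the examples above, while the `build_search_url` function and the scope name "Litware Orders" are hypothetical.

```python
from urllib.parse import quote, urlencode

RESULTS_PAGE = "http://moss.litwareinc.com/results.aspx"   # from the examples above


def build_search_url(keywords, scope=None, view=None, start=None):
    """Build a search results URL from the query string parameters in Table 1."""
    params = {"k": keywords}
    if scope is not None:
        params["s"] = scope
    if view is not None:
        if view not in ("relevance", "date"):
            raise ValueError("view must be 'relevance' or 'date'")
        params["v"] = view
    if start is not None:
        params["start"] = start
    # quote_via=quote encodes spaces as %20, matching the New York example.
    return RESULTS_PAGE + "?" + urlencode(params, quote_via=quote)


print(build_search_url("New York"))
print(build_search_url("London", scope="Litware Orders", view="date"))
```

Encoding the values through a library function, rather than concatenating strings, also ensures that characters such as `&` in a user's search terms cannot corrupt the query string.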
Hosting Search and Results Pages
By default, the user is redirected to the
SharePoint site to view the results. However, you might want the user to remain
in your Web application. You could implement this scenario by embedding the
SharePoint search and results pages in a separate frame within your application.
Alternatively, if you are developing a Windows Forms–based solution, you can
include a browser control for displaying the search and results pages from
within your application.
Regardless of whether you host the search
and results pages within your application or redirect the user to the
SharePoint Search center to view the results, you can modify the look and feel
of the results to match the style of your solution. Result customization can
take a number of different forms, including setting properties of the result
Web parts, through defining XSL transforms for displaying results, to creating
new Web parts for rendering search results. For more information about
customizing the results page, see:
Using URL-Based RSS Feeds for
Search
Microsoft Office SharePoint Server 2007 and
Windows SharePoint Services provide RSS feeds for many features and data. For
example, a user can subscribe to an RSS feed for a task list, a document
library, or any other type of list. Furthermore, the results of a specific
search can be used as a source for an RSS feed. Users can then be notified
through RSS when new results that match the search are discovered. As a
solution architect, you might consider adding RSS-consumer functionality to
your solution for search results. A simple way to do this is to issue an HTTP
request to the URL that represents the RSS subscription. As you design your
solution, you can determine the URLs used for RSS search feeds by performing a
search in a Microsoft Office SharePoint Server 2007 site. You can then review
the link for subscribing to the RSS feed from the results page.
Also consider, if your solution uses the
HTTP GET method for performing searches, whether you might want to rely on the
links on the search results page that enable users to subscribe to search
results through RSS, rather than developing specific code for achieving the
same goal.
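If you do add RSS-consumer functionality, the processing step is small once the feed has been downloaded. The sketch below parses a feed that follows the common RSS 2.0 layout (`channel`, `item`, `title`, `link`); the sample document and its URLs are invented for illustration, and the exact elements in a SharePoint search feed should be checked against a real subscription.

```python
import xml.etree.ElementTree as ET

# Invented sample in the general RSS 2.0 shape; not captured from SharePoint.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Search: London</title>
    <item>
      <title>London office guide</title>
      <link>http://moss.litwareinc.com/docs/london.doc</link>
    </item>
    <item>
      <title>London sales figures</title>
      <link>http://moss.litwareinc.com/docs/sales.xls</link>
    </item>
  </channel>
</rss>"""


def parse_search_feed(xml_text):
    """Return a list of (title, link) pairs, one per item in the feed."""
    root = ET.fromstring(xml_text)
    return [
        (item.findtext("title", default=""), item.findtext("link", default=""))
        for item in root.iter("item")
    ]


for title, link in parse_search_feed(SAMPLE_FEED):
    print(title, link)
```

In a real solution the XML text would come from an HTTP request to the subscription URL, issued on a schedule so that new matches can be detected.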
Using the Query Web Service for
Search
If you want users to issue search queries
to Microsoft Office SharePoint Server 2007, but remain in your
application, and you want to have a lot of control over how the results are
displayed and used, you can incorporate both the collecting of search terms and
the display of results into your solution.
To achieve this, you can access the Query
Web service from your solution and receive the search results programmatically.
This approach works for many different types of solution, including Web
applications, console applications, and Windows Forms executables.
To access the
Enterprise Search Query Web service and its methods, you typically add a Web
reference to the search.asmx Web service. The Web reference URL takes the
following form:
http://Server_Name/[sites/][Site_Name/]_vti_bin/search.asmx
After you have set a Web reference, you can
instantiate an object based on the QueryService class, set its
properties, and invoke its methods to issue queries programmatically. There are
two different methods for performing the query: the Query method, and
the QueryEx method. The distinction between these two methods is that Query
returns results as a string containing XML, whereas QueryEx returns an
ADO.NET DataSet. In either case, your solution can manipulate and
display the results to meet the specific requirements of your solution. For
either method, your solution must construct a query packet that describes the
search query. The query packet contains the query terms and additional query
options. The following code sample shows the basic XML-based structure of a
query packet.
<QueryPacket>
  <Query>
    <QueryId />
    <SupportedFormats>
      <Format />
    </SupportedFormats>
    <Context>
      <QueryText />
      <OriginatorContext />
    </Context>
    <Range>
      <StartAt />
      <Count />
    </Range>
    <Properties>
      <Property />
    </Properties>
    <SortByProperties>
      <SortByProperty />
    </SortByProperties>
    <EnableStemming />
    <TrimDuplicates />
    <IncludeSpecialTermResults />
    <IncludeRelevantResults />
    <IgnoreAllNoiseQuery />
    <IncludeHighConfidenceResults />
  </Query>
</QueryPacket>
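As an illustration of filling in this structure, the following sketch builds a minimal packet that sets only the query text and the result range. It uses only element names that appear in the sample above; the `build_query_packet` function is hypothetical, and any XML namespace, attributes, or format values that the real service requires are not shown in this paper and must be taken from the SDK.

```python
import xml.etree.ElementTree as ET


def build_query_packet(query_text, start_at=1, count=10):
    """Build a minimal query packet as an XML string (sketch, see assumptions)."""
    packet = ET.Element("QueryPacket")
    query = ET.SubElement(packet, "Query")

    context = ET.SubElement(query, "Context")
    ET.SubElement(context, "QueryText").text = query_text

    result_range = ET.SubElement(query, "Range")
    ET.SubElement(result_range, "StartAt").text = str(start_at)
    ET.SubElement(result_range, "Count").text = str(count)

    # ElementTree escapes characters such as & and < in the query text.
    return ET.tostring(packet, encoding="unicode")


print(build_query_packet("New York", count=5))
```

Building the packet with an XML library rather than string concatenation means that user-entered search terms cannot break the structure of the document.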
For more information about using the Query
Web service to perform queries programmatically, see the Microsoft Office
SharePoint Server 2007 SDK topic:
Using the Object Model for Search
If you are architecting your solution on
the Microsoft Office SharePoint Server 2007 platform, your code has access to
the full SharePoint in-process object model. The functionality and approach for
issuing search queries by using the object model is similar to using the Web
service, although the syntax and protocols are different. The object model
supports both keyword and SQL syntax. The distinction between these two types
of queries is that they use different syntax, and therefore have different
capabilities. For example, keyword queries are easy to develop because you do
not need to write search-specific SQL language constructs, but keyword syntax
supports only phrase, word, or prefix exact matches. Also, while you can use keyword
syntax to specify whether to include or exclude a particular search term, you
cannot construct complex groupings of included and excluded terms.
In comparison, SQL queries provide support
for constructing complex search queries, including logical operators,
wildcards, full-text operators, sorting clauses, and comparison operators.
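The contrast between the two syntaxes can be illustrated with a short sketch that builds the query text for each. The helper names keyword_query and sql_query are hypothetical. The column names, the SCOPE() source, and the FREETEXT predicate are assumptions based on commonly published examples of the search SQL syntax, so check them against the SDK before relying on them.

```python
def keyword_query(include, exclude=()):
    """Build keyword-syntax query text (hypothetical helper).

    Multi-word terms are quoted as phrases; excluded terms get a '-' prefix.
    """
    def term(value):
        return f'"{value}"' if " " in value else value

    parts = [term(t) for t in include] + ["-" + term(t) for t in exclude]
    return " ".join(parts)


def sql_query(free_text, columns=("Title", "Path", "Rank")):
    """Build SQL-syntax query text (hypothetical helper).

    Single quotes in the search text are doubled so that they do not
    terminate the string literal.
    """
    escaped = free_text.replace("'", "''")
    return (
        f"SELECT {', '.join(columns)} FROM SCOPE() "
        f"WHERE FREETEXT(DefaultProperties, '{escaped}') "
        "ORDER BY Rank DESC"
    )


keyword_text = keyword_query(["mountain bike", "trail"], exclude=["helmet"])
sql_text = sql_query("mountain bike")
```

The keyword form stays short and needs no knowledge of the index schema, whereas the SQL form names the returned columns and the sort order explicitly, which is where its additional flexibility comes from.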
You can issue keyword queries by
instantiating an object based on the KeywordQuery class, whereas you
perform SQL queries by using the FullTextSqlQuery class. Both types of
queries return a ResultTableCollection object. Each ResultTable object in
the collection implements the ADO.NET IDataReader interface,
so your solution can access results by using standard ADO.NET techniques,
such as loading them into DataTable objects.
For more information about how to construct
queries, execute them through the object model, and render results, see the
Microsoft Office SharePoint Server 2007 SDK.
Taking Advantage of the ‘Database of Intention’
As with all search solutions, the power and
flexibility of SharePoint indexes and searches are only as good as the data they
contain. As a solution architect, you must ensure that the data in your
application is indexed appropriately, and that the search capabilities in your
solution yield relevant results for your users.
The search solution should not just match
keywords to content; it should take the search experience to the next level
by also functioning as a ‘database of intention’. From the users’ perspective,
the search engine then appears to understand what they are looking for, and
can make informed decisions when ranking the search results. The search engine
should even be able to make alternative suggestions to the users.
As a solution architect, you can use many
of the mature features provided by Microsoft Office SharePoint Server 2007 to
achieve these goals. For example, you can define managed properties for files
and other items, and you can then configure the gatherer service to index those
properties. You can go further than that by creating SharePoint content types
that contain managed properties. Content types and their properties not
only help to enhance the search experience, they also help in taxonomy schemes,
workflows, and other collaborative solutions, by adding levels of manageability
and flexibility to your information systems.
You should also make use of search-specific
configurations to enhance the relevancy and accuracy of search results. For
example, you can add to the noise words used by the gatherer service, and you
can specify synonyms for related terms. Furthermore, you can specify keywords
that are important in your information domains, and even have those keywords
associated with specific Best Bet sites or other content sources.
Additionally, you can designate content
stores that you know contain high-quality information as authoritative
sites, so that results from those stores are ranked more highly than results
from other sites.
The most powerful and complex way of tuning
relevancy is to modify the default ranking parameters. For example, you might
want to change the weighting given to specific file types, or you might want to
specify that the author should be given more weight than, say, the title
property. You can modify ranking parameters only through custom code. You
should research this area very carefully before doing so, because changes can
adversely affect query accuracy. If you do need to modify these
parameters, see the Microsoft Office SharePoint Server 2007 SDK for more
information about how to do so.
You should also architect your solution to
require that users provide as much metadata as possible, to help the ranking
process. For example, many users do not fill in file properties, such as author,
keywords, subject, and title, even though this data is
invaluable to both the indexing and the ranking engines.
Finally, the most important task that you
can perform is to monitor the accuracy of search over time. Microsoft Office SharePoint
Server 2007 provides many query reports, such as those that show common
queries, those that report queries with zero results, and peak query usage
statistics. You should interpret these reports based on what you know about
your information domain. For example, if queries for Mountain Bike
return zero results, but your company is a bike manufacturer, then perhaps
relevant data is not being indexed; you could therefore ensure that new content
sources are included in the index.
Alternatively, you might see that queries
are consistently pointing users to inappropriate sources when you know that
other high quality information exists elsewhere; you could therefore set up
authoritative sites, or associate the high-quality store with specific keywords
and Best Bets.
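The kind of report interpretation described above can also be partly automated. The following minimal sketch assumes that query report data has been exported as rows of (query text, result count); that export format and the function name zero_result_queries are hypothetical and are not part of the product.

```python
from collections import Counter


def zero_result_queries(log_rows, min_occurrences=2):
    """Return (query, count) pairs for queries that repeatedly found nothing.

    log_rows is an iterable of (query_text, result_count) tuples, which is
    an assumed export format. Queries are compared case-insensitively.
    """
    misses = Counter(
        query.strip().lower()
        for query, result_count in log_rows
        if result_count == 0
    )
    # most_common() sorts by frequency, so the worst gaps come first.
    return [(q, n) for q, n in misses.most_common() if n >= min_occurrences]


# Hypothetical sample data for a bike manufacturer.
rows = [
    ("Mountain Bike", 0),
    ("mountain bike", 0),
    ("helmet", 12),
    ("road bike", 0),
]
recurring_misses = zero_result_queries(rows)
```

Queries that appear in this list are candidates for new content sources, synonyms, or Best Bets, depending on whether the underlying content exists but is not being found, or is not indexed at all.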
Every organization has different
requirements and experience of search relevancy, depending on the quality and
properties of data. What they all have in common, however, is that relevance
can only be improved by monitoring usage and reports over time. As your body of
information evolves, so your indexing and search strategies will change.
Consequently, you need to fine-tune the search solutions on a regular basis.
Conclusion
As a solution architect, you must ensure
that your applications fit into your organization’s strategy for enterprise
search. Because search is increasingly viewed as a business process, you must
consider indexing and search functionality as seriously as you consider other
functional and non-functional requirements. The two basic issues that you
should consider are how data in your solution is indexed and exposed to
enterprise search engines, and how users of your solution search for
information, regardless of whether they search from your application or from
the wider body of information that constitutes your company’s collective
knowledge. The SharePoint Products and Technologies suite of services provides
rich, robust functionality that you can use to meet these search requirements.
Appendix A. Search Features in SharePoint Products and Technologies
The SharePoint Products and Technologies suite
includes:
- Windows SharePoint Services 3.0 (WSS)
- Microsoft Office SharePoint Server 2007
for Search (MOSS-FS) Standard Edition
- Microsoft Office SharePoint Server 2007
for Search (MOSS-FS) Enterprise Edition
- Microsoft Office SharePoint Server 2007
(MOSS) Standard Edition
- Microsoft Office SharePoint Server 2007
(MOSS) Enterprise Edition
The following table lists the search-specific
features and capabilities of each product:
| Features | Windows SharePoint Services | MOSS-FS Standard | MOSS-FS Enterprise | MOSS Standard | MOSS Enterprise |
|---|---|---|---|---|---|
| Index SharePoint Sites and SharePoint Data | • | • | • | • | • |
| Index Web Sites |  | • | • | • | • |
| Index File Shares |  | • | • | • | • |
| Index Microsoft Exchange Server Public Folders |  | • | • | • | • |
| Index Lotus Notes Databases |  | • | • | • | • |
| Index Third-Party Document Repositories |  | • | • | • | • |
| Secure Content Access Control | • | • | • | • | • |
| Enhanced Search Center User Interface |  |  |  | • | • |
| Search for People and Expertise |  |  |  | • | • |
| Index Relational Data |  |  |  |  | • |
| Index Web Service Data |  |  |  |  | • |
| Item Limit | 500,000 | No Limit | No Limit | No Limit | No Limit |
About the Author
Beat Schwegler is the Enterprise and Technical Evangelism Lead in the Developer and Platform Evangelism Group for Microsoft Western Europe. He provides advice and consulting to enterprise companies on software architecture and related topics, and is a frequent speaker at international events and conferences. He has more than 14 years of experience in professional software development and architecture and has been involved in a wide variety of projects, ranging from real-time building control systems and best-selling shrink-wrapped products to large-scale CRM and ERP systems.