Find Similar (FAST Search Server 2010 for SharePoint)

The find similar features enable you to search for documents that are similar to already retrieved query results.

The similarity evaluation is based on a statistical measure. FAST Search Server 2010 for SharePoint automatically creates a similarity component that is added to the query.

In this article
Type of Find Similar Query
Item Similarity Vector Reference
Sorting of Find Similar Query Results

Applies to: SharePoint Server 2010

Type of Find Similar Query

This property represents the type of find similar query to perform.

The document vectors for each item, sorted by decreasing weights, can be used to build three types of similarity searches for an item d, given an original query Q. These similarity search requests are transformed to a new unique query, using the following rewrite of the query (shown using a symbolic representation, not the exact query language):

  • FindSimilar: Query = Q OR <s1,w1> [OR <sm,wm>]*

    The similarity vectors are added to the query using an OR operator. The original query is included in the rewritten query, but the new query can also match similar items that do not meet the original query conditions.

  • RefineSimilar: Query = Q AND (<s1,w1> [OR <sm,wm>]*)

    The query matches only if both the original query conditions and the similarity vector conditions are met. For example, use this type to refine the original query so that it returns only items similar to the item indicated in <SimilarTo>.

  • ExcludeSimilar: Query = Q ANDNOT (<s1,w1> [OR <sm,wm>]*)

    The query matches if the original query conditions are met and the similarity vector conditions are not met.

    <s,w> indicates the item's similarity vector as computed during item processing.

Default: FindSimilar

Note

The similarity component that is added to the query (<s1,w1> [OR <sm,wm>]*) queries the default full-text index.
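
The three rewrites above can be sketched in Python. This is a minimal illustration of the symbolic representation only, not the exact query language and not the product's implementation; the function names and the `similar_type` parameter are assumptions introduced for this example.

```python
from typing import List, Tuple

# A similarity vector as a list of (term, weight) pairs (hypothetical type alias).
Vector = List[Tuple[str, float]]


def similarity_component(vector: Vector) -> str:
    """Join the <s,w> pairs with OR, mirroring <s1,w1> [OR <sm,wm>]*."""
    return " OR ".join(f"<{term},{weight}>" for term, weight in vector)


def rewrite_query(query: str, vector: Vector, similar_type: str = "FindSimilar") -> str:
    """Return the symbolic rewrite of `query` for the given similarity type."""
    component = similarity_component(vector)
    if similar_type == "FindSimilar":
        # The original query OR the similarity component.
        return f"{query} OR {component}"
    if similar_type == "RefineSimilar":
        # The original query AND the similarity component.
        return f"{query} AND ({component})"
    if similar_type == "ExcludeSimilar":
        # The original query, excluding items that match the similarity component.
        return f"{query} ANDNOT ({component})"
    raise ValueError(f"Unknown similarity type: {similar_type}")


vector = [("sharepoint", 0.9), ("search", 0.5)]
print(rewrite_query("Q", vector))                    # Q OR <sharepoint,0.9> OR <search,0.5>
print(rewrite_query("Q", vector, "RefineSimilar"))   # Q AND (<sharepoint,0.9> OR <search,0.5>)
print(rewrite_query("Q", vector, "ExcludeSimilar"))  # Q ANDNOT (<sharepoint,0.9> OR <search,0.5>)
```

Defaulting `similar_type` to FindSimilar mirrors the documented default.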

Item Similarity Vector Reference

This property represents a similarity reference when searching for similar items. This is a similarity vector representation that is returned for each item in the query result in the docvector managed property.

The value is a string in the following format:

[string1,weight1][string2,weight2]...[stringN,weightN]

When performing a find similar query, the SimilarTo element should contain a string parameter with the value of the docvector managed property of the item that is to be used as the similarity reference. The similarity vector consists of a set of "term,weight" expressions, indicating the most important terms or concepts in the item and the corresponding perceived importance (weight). Terms can be single words or phrases.

The weight is a float value between 0 and 1, where 1 indicates the highest relevance.

The similarity vector is created during item processing.
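
To make the string format concrete, the following Python sketch parses a docvector value into (term, weight) pairs. It assumes that terms do not contain square brackets and that each weight is a plain decimal number; the function name `parse_docvector` and the sample value are hypothetical and are not taken from the product.

```python
import re
from typing import List, Tuple

# One "[term,weight]" group. The term may contain spaces (phrases) and commas;
# the weight is whatever follows the last comma inside the brackets.
_PAIR = re.compile(r"\[([^\[\]]*),([^,\[\]]*)\]")


def parse_docvector(value: str) -> List[Tuple[str, float]]:
    """Parse "[string1,weight1][string2,weight2]..." into (term, weight) pairs."""
    pairs = []
    for term, weight_text in _PAIR.findall(value):
        weight = float(weight_text)  # raises ValueError if the weight is not numeric
        if not 0.0 <= weight <= 1.0:
            raise ValueError(f"Weight out of range 0..1: {weight}")
        pairs.append((term.strip(), weight))
    return pairs


# Hypothetical docvector value, for illustration only.
sample = "[fast search,0.83][sharepoint,0.5]"
print(parse_docvector(sample))  # [('fast search', 0.83), ('sharepoint', 0.5)]
```

When issuing a find similar query, the docvector string is passed through unchanged as the similarity reference; parsing it like this is only useful for inspecting the terms and weights.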

Sorting of Find Similar Query Results

The SortSimilar property specifies whether the results of a find similar query are sorted by similarity or by relevance score (rank).

When performing a find similar query, the results can be sorted in two ways:

  • By relevance score (rank). This is the sorting method for normal queries, and corresponds to SortSimilar="False".

  • By similarity. This is the default sorting for similarity queries, where the most similar items are listed first. This corresponds to SortSimilar="True".

Default: True
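
The effect of the two sort modes can be illustrated with a small Python sketch. The result records and their `rank` and `similarity` fields are hypothetical stand-ins; the actual scores are computed by the search engine, and this example only shows how the ordering differs.

```python
from typing import Dict, List


def order_results(results: List[Dict], sort_similar: bool = True) -> List[Dict]:
    """Order results by similarity (sort_similar=True) or by rank (sort_similar=False)."""
    key = "similarity" if sort_similar else "rank"
    # Highest score first in both modes.
    return sorted(results, key=lambda item: item[key], reverse=True)


# Hypothetical result set with made-up scores.
results = [
    {"title": "A", "rank": 0.9, "similarity": 0.2},
    {"title": "B", "rank": 0.4, "similarity": 0.8},
    {"title": "C", "rank": 0.6, "similarity": 0.5},
]

print([r["title"] for r in order_results(results)])                      # ['B', 'C', 'A']
print([r["title"] for r in order_results(results, sort_similar=False)])  # ['A', 'C', 'B']
```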