Ranking and Sorting (FAST Search Server 2010 for SharePoint)
Updated: August 2011
Learn how to sort the query results for FAST Search Server 2010 for SharePoint.
You can specify how to sort the query results for FAST Search Server 2010 for SharePoint in the following four ways:
Sort by Rank: Enables you to sort the query result by relevance (rank).
Sort by Managed Property: Enables you to sort the query result based on the value of one or more managed properties.
Sort by Formula: Enables you to sort the query result by a formula specified in the query request.
Sort in Random Order: Enables you to sort the query result in random order, or add a random component to the sort order.
Applies to: Microsoft FAST Search Server 2010 for SharePoint
If you use the Query Web service, you specify the sort criteria by using the SortByProperties Element in Microsoft.Search.Query Schema. This element contains one or more SortByProperty elements, each representing one level in the sort specification.
If you use the Query object model, you specify the sort criteria by using the SortList property of the KeywordQuery class. This property returns a collection of Sort objects, each representing one level in the sort specification.
If the sort specification has multiple levels, sorting is applied in the order in which the levels appear. After the results are sorted on the first level, the next level affects only results that have the same value for the first-level sort criterion.
You can specify individual sort direction for each level in the sort specification.
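The multilevel behavior described above can be modeled outside the search engine. The following Python sketch is only an illustration of the semantics, not of how FAST Search Server implements sorting; the item dictionaries and the property names author and size are hypothetical.

```python
from operator import itemgetter


def multilevel_sort(items, levels):
    """Sort dicts by several levels; each level is (property, descending)."""
    result = list(items)
    # Python's sort is stable, so applying the levels from last to first
    # leaves the first level as the primary sort criterion.
    for prop, descending in reversed(levels):
        result.sort(key=itemgetter(prop), reverse=descending)
    return result


# Hypothetical result items with two sortable properties.
docs = [
    {"title": "a", "author": "Smith", "size": 10},
    {"title": "b", "author": "Jones", "size": 30},
    {"title": "c", "author": "Smith", "size": 20},
]

# Level 1: author ascending. Level 2: size descending.
ordered = multilevel_sort(docs, [("author", False), ("size", True)])
print([d["title"] for d in ordered])  # ['b', 'c', 'a']
```

The second level (size) only decides the order of the two items that share the same author value.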
The default sorting mechanism is to sort query results by relevancy (rank). This means that FAST Search Server 2010 for SharePoint places the most relevant results first in the query result set. For more information about how to tune the relevance ranking, see Improving Relevance for FAST Search Server 2010 for SharePoint.
If you want to use a rank profile that is different from the default rank profile, you can specify the name of the rank profile in the sort specification.
If you sort by rank, the result is always sorted in descending order.
In addition to using a rank profile, you can also impact the rank calculation in the query string. You can do that in two ways, as follows.
You can specify query result sorting based on the value of one or more managed properties in the query result. This means that FAST Search Server 2010 for SharePoint performs the sorting based on all results that match the query.
You can sort based on text and numeric properties. For text properties, the sorting is based on standard text string sorting. For numeric properties (including managed properties of type Datetime), the sorting is based on numeric value.
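The difference between text sorting and numeric sorting matters when numbers are stored in a text property. The following Python sketch illustrates the general effect; it assumes plain code-point string comparison, which may differ in detail from the string sorting that the search engine applies.

```python
# Hypothetical property values stored as text.
values = ["9", "10", "100", "25"]

# Text sorting compares character by character, so "10" sorts before "9".
text_order = sorted(values)

# Numeric sorting compares the numeric value.
numeric_order = sorted(values, key=int)

print(text_order)     # ['10', '100', '25', '9']
print(numeric_order)  # ['9', '10', '25', '100']
```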
You can specify query result sorting based on a sort specification that uses a mathematical formula to create the sorting value.
Sort by formula is an extension of the single-level and multilevel sorting functionality for query results. The feature enables you to specify a formula instead of a managed property as sorting criteria.
By using the sort by formula feature you can apply mathematical operations on the value of one or more managed properties for each item in the query result.
The following are examples of what you can implement by using this feature:
K-nearest neighbor algorithm to classify documents.
Euclidean distance or Manhattan distance to calculate geographical distances.
Preferred value, for example, to sort documents based on how far a given managed property value is from a preferred value.
This feature does not include control of statistical dynamic rank parameters such as term frequency and proximity.
The formula is evaluated left to right and uses standard mathematical-operator precedence. That is, functions and parenthetical groups are evaluated first, multiplication and division operations are performed next, and addition and subtraction operations are performed last.
The final result of a formula must be in the value range of a 32-bit signed integer. Otherwise, the sorting may not be correct.
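The precedence rules and the 32-bit limit can be checked before a formula is sent in a query. The following Python sketch is a client-side sanity check only; the helper name fits_int32 and the sample size value are hypothetical, and the sketch does not reproduce the engine's own evaluation.

```python
INT32_MIN = -(2**31)
INT32_MAX = 2**31 - 1


def fits_int32(value):
    """Return True if value lies in the 32-bit signed integer range."""
    return INT32_MIN <= value <= INT32_MAX


# Standard precedence: multiplication before addition,
# unless parentheses group the addition.
print(2 + 3 * 4)    # 14
print((2 + 3) * 4)  # 20

size = 50_000  # hypothetical managed property value

# abs(2000-size) stays well inside the range.
print(fits_int32(abs(2000 - size)))  # True

# pow(size,2) is 2,500,000,000, which exceeds the range.
print(fits_int32(pow(size, 2)))      # False
```

Estimating the largest value that a property can take helps you see whether a formula such as a square or a product could leave the supported range.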
Specifying the Sort Formula in a Query
You specify a sort formula instead of a managed property in the sorting specification of the query request.
The sort specification has the following format:
[formula:<sort-formula>]
In the format, <sort-formula> is the sort formula expression.
The square brackets are part of the sort specification syntax.
The default sort direction is Descending. You can also sort by ascending value, which is useful, for example, if the formula specifies a geographical distance.
Following is an example that shows how to specify sort by formula with ascending sort order in a Query Web service request using the SortByProperties element.
<SortByProperties> <SortByProperty name="[formula:abs(2000-size)]" direction="Ascending" /> </SortByProperties>
For more information on how to add the sort specification to a query, see How to Specify Sorting in a Query Request.
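If you generate the sort specification in client code, building the XML with a library avoids quoting mistakes. The following Python sketch uses the standard xml.etree.ElementTree module to produce the element shown in the example above. The function name formula_sort_xml is hypothetical, and the check for spaces reflects the rule stated under Sort Formula Expressions.

```python
import xml.etree.ElementTree as ET


def formula_sort_xml(formula, direction="Descending"):
    """Build a SortByProperties element for a single sort formula."""
    if " " in formula:
        raise ValueError("sort formula expressions must not contain spaces")
    if direction not in ("Ascending", "Descending"):
        raise ValueError("direction must be Ascending or Descending")
    root = ET.Element("SortByProperties")
    # The square brackets are part of the sort specification syntax.
    ET.SubElement(
        root,
        "SortByProperty",
        {"name": f"[formula:{formula}]", "direction": direction},
    )
    return ET.tostring(root, encoding="unicode")


print(formula_sort_xml("abs(2000-size)", "Ascending"))
```

The resulting string can be placed in the query XML at the position where the SortByProperties element is expected.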
Using Managed Properties in the Formula
You can apply a sort formula on the value of managed properties of type Integer, Decimal, and Datetime. You must enable sorting for the specified managed property in the index schema.
For managed properties of type Decimal, the value is multiplied by 10^(decimal digits) before being used in the formula evaluation.
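The scaling of Decimal values can be illustrated with a small calculation. The following Python sketch assumes a hypothetical managed property configured with two decimal digits; the function name scale_decimal is illustrative, and the truncation of extra digits is an assumption of this sketch rather than documented engine behavior.

```python
from decimal import Decimal


def scale_decimal(value, decimal_digits):
    """Multiply a decimal value by 10^decimal_digits and return an integer."""
    # int() truncates any digits beyond decimal_digits (sketch assumption).
    return int(Decimal(value) * (Decimal(10) ** decimal_digits))


# With two decimal digits, 3.14 is represented as 314 in the formula.
print(scale_decimal("3.14", 2))   # 314
print(scale_decimal("19.99", 2))  # 1999
```

Under this reading, a constant that is compared with a Decimal property in a formula probably needs the same scaling. For example, a preferred value of 20.00 would be written as 2000.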
For managed properties of type Datetime, the value is converted to the number of 100-nanosecond intervals since January 1 29000 BC before being used in the formula evaluation. There are 366 days in the year.
Sort Formula Expressions
Table 1 lists the functions that can be used in the sort formula expression. The expression must not contain spaces.
By default a division by zero results in an exception, and the query returns with an error. By using the errtolast operator, you can avoid the query error and instead place the failing items at the end of the result set.
A special keyword that represents the dynamic rank of an item.
Example: abs(rank-100) will use the distance from rank value 100 as the sorting criteria.
Specifies that numbers can be given as integer or double values.
Examples: 503, 3.14, 5.4352262
Specifies that any character sequence not recognized as a function name is treated as a managed property name. You must enable sorting for the specified managed property in the Index Schema (FAST Search Server 2010 for SharePoint).
Example: You can define a managed property named height with sorting enabled. This enables you to use "height" as an expression in the formula. The formula will use the value of the height managed property.
( and )
Used to group calculations ensuring correct precedence.
The square root of n.
The exponential function that is equivalent to pow(2.71828182846,n).
The natural logarithm of n.
The absolute value of n.
The ceiling of n. That is, if n is not a whole number, round up to the next whole number. If n is a whole number, use n.
The floor of n. That is, if n is not a whole number, round down to the next whole number. If n is a whole number, use n.
The rounding of n to the nearest even whole number. Also known as "Bankers rounding" or "Round half to even".
The sine of n radians.
The cosine of n radians.
The tangent of n radians.
The arcsine, in radians, of n.
The arccosine, in radians, of n.
The arctangent, in radians, of n.
The value of x raised to the power of y.
A two-argument arctangent of the angle in radians between the positive x axis and the specified Cartesian coordinate (x,y).
An operator that can be used to provide discrete values for given value distribution ranges for an expression.
The expression b can be a managed property or any other formula expression. The arguments n1, n2, … represent numeric thresholds. You can specify an arbitrary number of bucket thresholds.
A given value for the input expression b is rounded down to the closest numeric threshold given. If lower than the lowest threshold given, the resulting value is zero.
An operator that can be used to control how to handle formula exceptions; x can be any formula expression. If the calculation of this formula expression leads to a mathematical exception for an item in the result set, such as division by zero, these items appear at the end of the sort list, regardless of specified sort direction.
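The behavior of the two operators described last in the table (the threshold operator and errtolast) and of round-half-to-even rounding can be modeled in a few lines. The following Python sketch is an emulation based only on the descriptions above. The function names bucket and sort_errtolast and the sample items are hypothetical, and the sketch does not show the actual operator syntax.

```python
def bucket(value, *thresholds):
    """Round value down to the closest threshold; zero if below all of them."""
    eligible = [t for t in thresholds if t <= value]
    return max(eligible) if eligible else 0


def sort_errtolast(items, formula, descending=True):
    """Sort by formula; items whose formula fails go last in either direction."""
    ok, failed = [], []
    for item in items:
        try:
            ok.append((formula(item), item))
        except (ZeroDivisionError, ValueError):
            failed.append(item)
    ok.sort(key=lambda pair: pair[0], reverse=descending)
    return [item for _, item in ok] + failed


# Thresholds 5, 15, 50, 100 give the discrete values 0, 5, 15, 50, 100.
print([bucket(v, 5, 15, 50, 100) for v in (3, 5, 14, 60, 250)])

# Python's built-in round() also rounds halves to the nearest even number.
print(round(2.5), round(3.5))

items = [
    {"id": 1, "size": 4, "hits": 2},
    {"id": 2, "size": 9, "hits": 0},  # division by zero for this item
    {"id": 3, "size": 8, "hits": 1},
]
ratio = lambda item: item["size"] / item["hits"]
print([i["id"] for i in sort_errtolast(items, ratio, descending=True)])
print([i["id"] for i in sort_errtolast(items, ratio, descending=False)])
```

In both sort directions the item that causes the division by zero ends up last, which matches the description of errtolast.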
Performance Characteristics for Sort by Formula
Using a sort formula implies that the formula calculations are applied to all matching items in the result set. This means that the query performance impact depends on the number of items that match the query.
Long formulas with many operators require more processing time than short formulas.
Using Sort by Formula for Geographical Distance
You can use sort by formula to apply a ranking based on distance. This requires that you include managed properties that represent the latitude and longitude of each item.
For example, you can use one of the following standard formulas:
Euclidean distance. See Example 2.
You must use managed properties of type integer for the latitude and longitude values. On the content side, you may convert the latitude and longitude values to integer values in an external pipeline extensibility component. On the query side you may perform the same conversion in the query client.
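One way to perform the integer conversion is to multiply the coordinates by a fixed scale factor and round. The following Python sketch assumes a hypothetical scale factor of 100 (two decimal digits of a degree). It also estimates the worst-case value of a squared-distance formula on latitude and longitude, because the final result of a formula must fit in a 32-bit signed integer. Treat the choice of scale factor as an assumption to adapt to your own data.

```python
SCALE = 100  # hypothetical scale factor: two decimal digits of a degree
INT32_MAX = 2**31 - 1


def to_scaled_int(degrees, scale=SCALE):
    """Convert a coordinate in degrees to a scaled integer value."""
    return round(degrees * scale)


def worst_case_squared(scale):
    """Largest possible squared distance for scaled latitude and longitude."""
    max_lat_diff = 180 * scale  # latitude spans -90 to 90 degrees
    max_lon_diff = 360 * scale  # longitude spans -180 to 180 degrees
    return max_lat_diff**2 + max_lon_diff**2


print(to_scaled_int(59.9139))    # 5991
print(to_scaled_int(-122.3321))  # -12233

# A scale of 100 keeps the squared distance inside the 32-bit range,
# while a scale of 1000 does not.
print(worst_case_squared(100) <= INT32_MAX)   # True
print(worst_case_squared(1000) <= INT32_MAX)  # False
```

The same conversion must be applied to the base position that the query client places in the formula.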
The following examples show how to specify the sort formula in a Query Web service request.
Example 1. Place the items that have the height managed property closest to 20 on top of the result list.
<SortByProperties> <SortByProperty name="[formula:abs(20-height)]" direction="Ascending" /> </SortByProperties>
Example 2. Sort by true 3-D Euclidean distance from a given base position (for example, the user's position), based on position information that is provided in the managed properties latitude, longitude and height. The following formula provides the 3-D Euclidean distance, given that the base position is 50/100/200 (latitude/longitude/height):
sqrt(pow(50-latitude,2)+pow(100-longitude,2)+pow(200-height,2))
If you want to apply a distance-based result sorting (not combining the distance with other parameters in a formula), you can remove the sqrt() component, as this does not change the sorting sequence. This improves the query performance.
<SortByProperties> <SortByProperty name="[formula:pow(50-latitude,2)+pow(100-longitude,2)+pow(200-height,2)]" direction="Ascending" /> </SortByProperties>
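Removing sqrt() is safe because the square root is a monotonically increasing function, so it does not change which of two distances is larger. The following Python sketch checks this on a few hypothetical items, using the base position 50/100/200 from the example.

```python
import math

BASE = {"latitude": 50, "longitude": 100, "height": 200}


def squared_distance(item):
    """Sum of squared differences to the base position (no square root)."""
    return sum((BASE[k] - item[k]) ** 2 for k in BASE)


def distance(item):
    """True 3-D Euclidean distance to the base position."""
    return math.sqrt(squared_distance(item))


# Hypothetical items with integer position properties.
items = [
    {"id": "a", "latitude": 52, "longitude": 101, "height": 200},
    {"id": "b", "latitude": 50, "longitude": 100, "height": 210},
    {"id": "c", "latitude": 47, "longitude": 100, "height": 200},
]

by_distance = [i["id"] for i in sorted(items, key=distance)]
by_squared = [i["id"] for i in sorted(items, key=squared_distance)]
print(by_distance, by_squared)  # both are ['a', 'c', 'b']
```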
Example 3. Round the values of size into buckets, rounding values down to one of the values 0, 5, 15, 50, 100. Sort with largest values first.
Using Sort by Formula in the Query Web Service
Follow the steps in Walkthrough: Querying FAST Search Server From a Client Application, and extend the code to add a sort by formula specification as described previously in Example 1.
Replace the definition of the queryXML2 string, to add a sort by formula specification in the query.
// queryXML2 is the part of the XML after the query string.
string queryXML2 = @"</QueryText>
  </Context>
  <ResultProvider>FASTSearch</ResultProvider>
  <Range>
    <Count>10</Count>
  </Range>
  <SortByProperties>
    <SortByProperty name='[formula:abs(20-height)]' direction='Ascending' />
  </SortByProperties>
</Query>
</QueryPacket>";
You may apply random sorting of the query result, or add a random component to the result sorting.
The random sort specification has the following format:
[random:seed=<seed>:hashfield=<managed property>:addtorankmax=<max random value>]
The square brackets are part of the sort specification syntax.
Table 2 explains the parameters to the random sort specification.
seed
The seed for the random value generation.
The seed value is input to a function that generates a random number. This random number is used in the final sorting.
Using only the seed option will give you a randomly sorted query result set. The sorting order for the same query (when using the same seed) may change after an index update.
hashfield
A managed property that is used as the hash value for the random generation.
You can use this parameter to ensure that the sorting order for the same query (when using the same seed) does not change after an index update.
The managed property must be of type integer and must have sorting enabled in the index schema.
You may fill this managed property with random or unique values (for example a sequence number populated by an item processing stage).
addtorankmax
Use this parameter if you want to apply a limited randomization of the query result, where the result is still mainly sorted by rank.
A random value between 0 (zero) and the value specified for this parameter will be added to the rank value of each item in the result set. This allows you to randomize results within an interval.
It is the original (prior to the sorting) rank value that is provided with the query result, not the rank including the random value.
If you provide the same seed for equal queries, the items are presented in the same order. This enables you to preserve the same random order when paging through search results. Use the hashfield parameter if you want to preserve the same random order even when an index update occurs between the queries.
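The effect of the seed, hashfield, and addtorankmax parameters can be modeled as follows. This Python sketch is a conceptual emulation only. The random function that the search engine uses is not described here, so the SHA-256 based key, the function names, and the sample items are all assumptions made for illustration.

```python
import hashlib


def random_key(seed, hash_value):
    """Derive a repeatable pseudo-random number from seed and hash value."""
    digest = hashlib.sha256(f"{seed}:{hash_value}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def random_order(items, seed):
    """Random order that depends only on the seed and each item's hashvalue."""
    return sorted(items, key=lambda item: random_key(seed, item["hashvalue"]))


def rank_with_noise(items, seed, addtorankmax):
    """Sort by rank plus a random value between 0 and addtorankmax."""
    def noisy_rank(item):
        fraction = random_key(seed, item["hashvalue"]) / 2**64
        return item["rank"] + fraction * addtorankmax
    return sorted(items, key=noisy_rank, reverse=True)


# Hypothetical items; hashvalue plays the role of the hashfield property.
items = [
    {"id": "a", "hashvalue": 1, "rank": 1000},
    {"id": "b", "hashvalue": 2, "rank": 100},
    {"id": "c", "hashvalue": 3, "rank": 90},
]

# Same seed, same order, which keeps paging consistent.
print(random_order(items, 6543) == random_order(items, 6543))

# A newly indexed item does not disturb the relative order of the others.
new_item = {"id": "d", "hashvalue": 4, "rank": 50}
after_update = [i for i in random_order(items + [new_item], 6543) if i["id"] != "d"]
print(after_update == random_order(items, 6543))

# With addtorankmax=200, an item that leads by more than 200 stays first.
print(rank_with_noise(items, 6543, 200)[0]["id"])
```

Because each key depends only on the seed and the item's own hash value, adding or removing other items cannot change the relative order of the remaining ones. This is the property that hashfield is described as providing.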
The following examples show how to specify random sorting in a Query Web service request.
Example 1. Random sorting of the entire result set.
Example 2. Random sorting of the entire result set. Preserve same random sequence for the same query with the same seed even if an index switch occurs. A custom managed property named hashvalue must be available in the index schema, and populated with random or sequential numeric values for all indexed items.
<SortByProperties> <SortByProperty name="[random:seed=6543:hashfield=hashvalue]" /> </SortByProperties>
Example 3. Add a random value between 0 and 200 to the rank value for each item. This will reorder the result set to a limited extent.
Changed reference from Federation OM to Query OM
Added section on random sorting.
Added new sub-section on random ranking.
Added web service example.
Changed how to apply sort by formula, no longer using FQLFormula as sort direction. This enables using descending and ascending sort direction in a formula
More information on rank and managed property based sorting. Documentation of additional sort formula operators.