mdq.Similarity (Transact-SQL)

Returns a similarity score. This score indicates the similarity between two strings that are compared by a specified match algorithm. This function is in the mdq schema and is available only in the Master Data Services database. 

Topic link iconTransact-SQL Syntax Conventions


mdq.Similarity (input1,input2,method,containmentBias , [ minScoreHint ] )

input1

Is an expression of character or binary data to be compared with input2. input1 can be a constant, variable, or column. input1 must be of a data type that can implicitly convert to nvarchar, except text, ntext, and image.

input1 is nvarchar(4000) with no default.

input2

Is an expression of character or binary data to be compared with input1. input2 can be a constant, variable, or column. input2 must be of a data type that can implicitly convert to nvarchar, except text, ntext, and image.

input2 is nvarchar(4000) with no default.

method

Represents an enumeration of the match algorithm to use in the comparison between input1 and input2. method is a value of type tinyint. Valid values are:

  • 0 for the Levenshtein edit distance algorithm.

  • 1 for the Jaccard similarity coefficient algorithm.

  • 2 for a form of the Jaro-Winkler distance algorithm.

  • 3 for a longest common subsequence algorithm.

containmentBias

Specifies how exact the fuzzy index is when it is comparing strings of different lengths. Values must be in the range 0.0 to 1.0 inclusive. A value of 0.0 implies that the index will reduce the score of matches that have different string lengths. A value of 1.0 implies that the index is less exact. containmentBias is an expression of type float with no default. containmentBias is used only for the Jaccard similarity coefficient and longest common subsequence algorithms.

minScoreHint

Specifies the minimum similarity score to use. The valid range is 0.0 to 1.0. When a value is specified, any match lower than that value will cause the function to return a result of 0. If a match is higher than the specified value, the result will be the similarity score.

This parameter is used as a performance hint. If the internal algorithms can immediately determine whether the inputs could never reach the specified minScoreHint value, then the routine will exit early. This saves the cost of computing the exact score.

minScoreHint is an optional parameter of type float. It has no default value. Specify 0.0 if you do not have a specific value to use.

Requires membership in the public role.

The following example compares two strings by using the match algorithms of Levenshtein edit distance (0), Jaccard similarity coefficient (1), a form of Jaro-Winkler distance (2), and longest common subsequence (3).

USE MDM_Sample;
GO

--Using Levenshtein edit distance algorithm.
SELECT mdq.Similarity(N'Alexia Geogio', N'Alexandra George', 0, 0.0, 0.0);
--Using Jaccard similarity coefficient algorithm.
SELECT mdq.Similarity(N'Alexia Geogio', N'Alexandra George', 1, 0.0, 0.0);
--Using Jaro-Winkler distance algorithm.
SELECT mdq.Similarity(N'Alexia Geogio', N'Alexandra George', 2, 0.0, 0.0);
--Using longest common subsequence algorithm.
SELECT mdq.Similarity(N'12345', N'93459', 3, 0.0, 0.0);
Show: