SharePoint Search SQL Query Language Information
Last modified: April 21, 2010
Applies to: SharePoint Server 2010
Microsoft SharePoint Server 2010 search provides content crawling and search features that support full-text searching. The query language used by SharePoint Server search extends the standard SQL-92 and SQL-99 database query syntax to enhance its usefulness with text-based searches.
Identifiers specify the names of columns (sometimes referred to as properties), catalogs, and aliases. Literals, by contrast, specify string and numeric values.
You can create identifiers that are up to 128 characters in length, and in one of two types, distinguished by the characters used in the identifier name:
Regular identifiers Contain only the characters A-Z, a-z, 0-9, and underscore, and begin with a letter. You do not need to enclose regular identifiers in double quotation marks.
Delimited identifiers Can contain any valid Unicode character, and must be enclosed in double quotation marks.
Identifiers that you create as Uniform Resource Names (URNs) must contain special characters, are always delimited identifiers, and must be enclosed in double quotation marks.
You can use the asterisk (*) as a special column identifier when you want to specify that SharePoint Server search includes all of the indexed properties in the query. Although it is not a regular identifier, it does not require double quotation marks.
A literal is a string of characters that represents a value in a query statement. You use literals to compare column values or to specify search terms. SharePoint Server search supports the following types of literals.
String literals are not limited in length, and can contain either American National Standard Institute (ANSI) or Unicode characters. You must enclose string literals in single quotation marks. To include a single quotation mark inside a string literal, use two single quotation marks. Represent an empty string as two consecutive single quotation marks ('').
Numeric literals represent numbers-including positive and negative integers, decimal numbers, and currency values-and can contain the digits 0-9, a period, and the letter E (or e). You can define numeric literals by using scientific notation (for example, 2.3E-05). Do not enclose a numeric literal in single quotation marks or it is interpreted as a string literal and is compared by using string comparison techniques. Currency values cannot contain currency symbols.
A hexadecimal literal represents an unsigned integer specified in hexadecimal notation, and can contain the digits 0-9 and the letters A-F and a-f. Hexadecimal literals must begin with 0x.
The SQL-92 standard requires that hexadecimal literals be enclosed in single quotation marks, however, SharePoint Server search does not support that notation.
Boolean literals represent logical values, and can be either TRUE or FALSE. Do not enclose a Boolean literal in single quotation marks, or it is interpreted as a string literal.
Date literals represent specific dates, time stamps, or relative times, and are enclosed in single quotation marks. You must put dates in the form year/month/day or year-month-day, where the month, day, and year are numbers. Specify the year with a four-digit value, for example, 2004. Time values must be in the form hours:minutes:seconds. Relative time syntax is based on the DATEADD Function in SharePoint Search SQL Syntax.
SharePoint Server search queries are not case-sensitive. This includes query keywords (SELECT is identical to Select, select, and sELect) and search terms. If an identifier or literal uses Unicode characters that have case-mapping semantics, then the search engine considers that all the cases are equivalent.
SharePoint Server search queries are not sensitive to case, nor are they sensitive to accents when using either the FREETEXT predicate or CONTAINS predicate. A search for the term "resume" also returns documents that contain "résumé". If an identifier or literal contains accented Unicode characters, the search query also matches entries that contain Unicode combined characters that use the base character of the accented character in the identifier or literal.
This is not the case for the character ranges 0x2e81-f8ff and 0x1100-0x11ff.
When used in a relational database, rows that are returned by a search query must meet all the conditions called for by the query. In contrast, an SharePoint Server search query can return documents that meet the search conditions to varying degrees.
For example, a search for the term "program" in a relational database produces records that contain that specific spelling of the word. Whether a record contains one or one hundred instances of the word has no impact on the results. In contrast, SharePoint Server search returns a relevance value associated with the matching documents. The relevance of documents having "program" in the title is higher than those that contain the word only in the last paragraph. Similarly, documents containing variations of the search term, for example "programs" and "programming", also match and are returned by the query.
SharePoint Server search queries return integer relevance values in the column named "rank".
Rank values returned by the query are integers ranging from 0 to 1000.
Higher rank values indicate documents that match the search conditions better than others.
Rank values apply only to the current query, so they cannot be compared for results across queries.
Rank values are relative to the other documents matching the query. Therefore, the rank value of a particular document depends on the other documents that also match the query.
The rank value for items matching a purely relational predicate is 1000.
SharePoint Server search supports full Unicode searching. You can index documents that use Unicode, and create search queries that use Unicode identifiers and literals. However, the language setting of an application or the operating system can have far-reaching effects on how SharePoint Server search works.
Document and System Locale Settings
When the operating system—or even an application—is set to use a particular language and locale, many settings are affected. These settings include numeric format, date format, currency format, uppercase and lowercase mapping, dictionary sort ordering, tokenization, and others. Although these settings help SharePoint Server search provide excellent localized support, unexpected results can occur when documents from one locale are searched by using a system set to another locale.
For example, the list of noise words (words discarded during indexing and from queries because they give no meaning or context) in each language is very different. In German, the word die is equivalent to the English word the. If you index a German document and then search for "die" by using an English query system, documents might be returned, even though the word should be ignored. A German system that is given the same query would return an error stating that the search query contained only noise words.
When the IFilter object processes a document's text properties and content, it reports the language of that document to the content indexer. By using this information, SharePoint Server search can apply the appropriate word breaker and noise words list.
You can specify the language used in search queries. Both the FREETEXT and CONTAINS predicates for the WHERE clause support specifying a language. You can indicate the query language by providing a numeric locale identifier (LCID) in the CONTAINS or FREETEXT predicate, by using the following syntax:
CONTAINS | FREETEXT
For more information, see the syntax for the CONTAINS and FREETEXT predicates.