1.1 Glossary

This document uses the following terms:

access URL: An internal Uniform Resource Locator (URL) that is used by a crawler to identify and gain access to an item.

anchor content source: A content source that is used to import the anchor text from links between items into the full-text index catalog.

anchor crawl: A process in which anchor text from links between items is added to a full-text index catalog.

authority hops: The number of site levels to be navigated from a start address to a specific item.

base64 encoding: A binary-to-text encoding scheme whereby an arbitrary sequence of bytes is converted to a sequence of printable ASCII characters, as described in [RFC4648].
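As a brief illustration, Python's standard `base64` module implements the [RFC4648] encoding. The sample bytes below are arbitrary and include a non-printable byte.

```python
import base64

# Arbitrary binary input, including a non-printable byte (0x00).
data = b"crawl\x00"

# Encode to printable ASCII using the RFC 4648 standard alphabet.
encoded = base64.b64encode(data).decode("ascii")

# Decoding recovers the original bytes exactly.
decoded = base64.b64decode(encoded)
print(encoded)
```

Six input bytes produce eight output characters with no "=" padding, because the input length is a multiple of three.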

binary large object (BLOB): A discrete packet of data that is stored in a database and is treated as a sequence of uninterpreted bytes.

Business Data Connectivity (BDC): A shared service that stores information about business application data that exists outside a server farm. It can be used to display business data in lists, Web Parts, search results, user profiles, and custom applications. Previously referred to as Business Data Catalog.

certificate: A data structure that binds a public key to information about an entity. An X.509v3 certificate contains a public key, a distinguished name (DN) of some entity assumed to have control over the private key corresponding to that public key, and some number of other attributes and extensions assumed to relate to the entity thus referenced. Other forms of certificates can bind other pieces of information.

content source: A set of options for specifying the type of content to be crawled and the start addresses for the content to be indexed. A content source is defined by the protocol handler that is used to access specific systems, such as SharePoint sites, file systems, and external websites. A content source can contain up to 500 start addresses.

cookie: A small data file that is stored on a user's computer and carries state information between participating protocol servers and protocol clients.

crawl: The process of traversing a URL space to acquire items to record in a search catalog.

crawl account: A user account that has access to all of the content that is traversed by a crawl component.

crawl mapping: A mapping that associates an access URL, which is used to obtain an item from a content source, with a display URL, which is the address that is shown for the item in search results.
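The following is a minimal sketch of applying a crawl mapping, assuming that mappings are stored as access-URL prefixes paired with display-URL prefixes. The glossary does not specify the matching rule; the prefix substitution, the `CRAWL_MAPPINGS` table, and the host names are hypothetical.

```python
# Hypothetical crawl mappings: access URL prefix -> display URL prefix.
CRAWL_MAPPINGS = {
    "http://intranet-crawl.example/": "https://intranet.example/",
}


def to_display_url(access_url: str) -> str:
    """Return the display URL for an access URL, or the access URL itself
    when no mapping applies (assumed prefix-substitution semantics)."""
    for access_prefix, display_prefix in CRAWL_MAPPINGS.items():
        if access_url.startswith(access_prefix):
            return display_prefix + access_url[len(access_prefix):]
    return access_url
```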

crawl queue: A data structure that stores the list of items to crawl next.
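A crawl queue is commonly a first-in, first-out queue, which yields a breadth-first traversal of the URL space. The sketch below assumes that model. It also stops following links after a configurable number of page hops from the start address. The in-memory `URL_SPACE`, the URLs, and the `crawl` function are hypothetical.

```python
from collections import deque
from typing import List

# Hypothetical in-memory URL space: each URL maps to the URLs it links to.
URL_SPACE = {
    "http://a.example/": ["http://a.example/1", "http://a.example/2"],
    "http://a.example/1": ["http://a.example/2", "http://a.example/3"],
    "http://a.example/2": [],
    "http://a.example/3": ["http://a.example/"],
}


def crawl(start: str, max_page_hops: int) -> List[str]:
    """Breadth-first crawl from a start address using a FIFO crawl queue."""
    queue = deque([(start, 0)])  # (url, page hops from the start address)
    seen = {start}               # avoid enqueuing the same item twice
    acquired = []
    while queue:
        url, hops = queue.popleft()
        acquired.append(url)     # "acquire" the item for the search catalog
        if hops == max_page_hops:
            continue             # do not follow links beyond the hop limit
        for link in URL_SPACE.get(url, []):
            if link not in seen:
                seen.add(link)
                queue.append((link, hops + 1))
    return acquired
```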

crawl rule: A set of preferences that applies to a specific URL or range of URLs. A crawl rule can be used to include or exclude items in a crawl and to specify the content access account to use when crawling that URL or range of URLs.
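The sketch below shows one way to evaluate crawl rules. It assumes wildcard URL patterns, assumes that the first matching rule wins, and uses an optional per-rule content access account. The `CrawlRule` type, the rule ordering policy, and the account name are hypothetical and are not taken from this specification.

```python
from fnmatch import fnmatchcase
from typing import List, NamedTuple, Optional


class CrawlRule(NamedTuple):
    pattern: str                   # wildcard URL pattern ("*" matches any run of characters)
    include: bool                  # True to include matching items, False to exclude them
    account: Optional[str] = None  # hypothetical content access account for matching URLs


def evaluate(url: str, rules: List[CrawlRule]) -> Optional[CrawlRule]:
    """Return the first rule whose pattern matches url, or None if no rule applies."""
    for rule in rules:
        if fnmatchcase(url, rule.pattern):
            return rule
    return None


# Hypothetical rules, ordered from most to least specific.
RULES = [
    CrawlRule("http://intranet.example/private/*", include=False),
    CrawlRule("http://intranet.example/*", include=True, account="EXAMPLE\\crawler"),
]
```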

crawler: A process that browses and indexes content from a content source.

delete crawl: A process that is started automatically after a content source or start address is deleted and that removes the associated items from a search catalog.

display URL: The URL that is displayed on a search results page for each search result. This can be different from an access URL. See also access URL.

domain: A set of users and computers sharing a common namespace and management infrastructure. At least one computer member of the set must act as a domain controller (DC) and host a member list that identifies all members of the domain, as well as optionally hosting the Active Directory service. The domain controller provides authentication of members, creating a unit of trust for its members. Each domain has an identifier that is shared among its members. For more information, see [MS-AUTHSOD] section 1.1.1.5 and [MS-ADTS].

exclusion list: A list of items to exclude from query results and to remove from a search index the next time that a crawl occurs.

file: A single, discrete unit of content.

file extension: The sequence of characters in a file's name that follows the last "." character. Application vendors choose such sequences so that files created by their applications can be uniquely identified. This allows file management software to determine which application is to be used to open a file.
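For illustration, Python's `os.path.splitext` splits a name at its last "." character. The `file_extension` helper below is hypothetical and strips the leading "." that `splitext` returns.

```python
import os.path


def file_extension(name: str) -> str:
    """Return the characters after the last '.' in a file name, or '' if none."""
    _, ext = os.path.splitext(name)  # ext includes the leading '.'
    return ext[1:]
```

Note that `splitext` treats a name that only begins with a dot, such as ".bashrc", as having no extension. A stricter reading of the definition above would handle that case differently.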

forms authentication: An authentication method in which unauthenticated requests are redirected to an HTML form by using HTTP. If the credentials that are submitted through the form are authenticated, the system issues a cookie that stores the credentials or a key for reacquiring the identity. In subsequent requests, the cookie is submitted in request headers and the requests are authenticated and authorized by an ASP.NET event handler that uses the validation method that is specified by the protocol client.

full crawl: A crawl process that indexes all of the items in a specified content source, regardless of whether the item was modified.

full-text index catalog: A collection of full-text index components and other files that are organized in a specific directory structure and contain the data that is needed to perform queries.

globally unique identifier (GUID): A term used interchangeably with universally unique identifier (UUID) in Microsoft protocol technical documents (TDs). Interchanging the usage of these terms does not imply or require a specific algorithm or mechanism to generate the value. Specifically, the use of this term does not imply or require that the algorithms described in [RFC4122] or [C706] must be used for generating the GUID. See also universally unique identifier (UUID).
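As an example, Python's standard `uuid` module produces values in the usual textual and 16-byte binary forms. Here it uses random (version 4) generation. As the definition notes, no particular generation algorithm is implied.

```python
import uuid

# uuid4() generates a random (version 4) UUID; the definition above does not
# require any particular generation algorithm.
guid = uuid.uuid4()

text = str(guid)   # 36-character hexadecimal form in 8-4-4-4-12 groups
raw = guid.bytes   # 16-byte binary form

# Both forms round-trip to the same value.
assert uuid.UUID(text) == guid
assert uuid.UUID(bytes=raw) == guid
```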

host hop: The process of traversing to a server with a different host name during a crawl.
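The sketch below detects a host hop by comparing host names. It assumes that only the host name matters, so case and port numbers are ignored. The `is_host_hop` helper is hypothetical.

```python
from urllib.parse import urlsplit


def is_host_hop(from_url: str, to_url: str) -> bool:
    """True when following a link moves the crawl to a different host name."""
    # .hostname is lowercased and excludes any port number.
    return urlsplit(from_url).hostname != urlsplit(to_url).hostname
```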

HTTP GET: An HTTP method for retrieving a resource, as described in [RFC2616].

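HTTP POST: An HTTP method for sending data, enclosed in the request body, to be processed by the resource identified in the request, as described in [RFC2616].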

Hypertext Transfer Protocol (HTTP): An application-level protocol for distributed, collaborative, hypermedia information systems (text, graphic images, sound, video, and other multimedia files) on the World Wide Web.

Hypertext Transfer Protocol Secure (HTTPS): An extension of HTTP that securely encrypts and decrypts web page requests. In some older protocols, "Hypertext Transfer Protocol over Secure Sockets Layer" is still used (Secure Sockets Layer has been deprecated). For more information, see [SSL3] and [RFC5246].

inclusion list: A list of items to include in query results and to add to a search index the next time that a crawl occurs.

incremental crawl: A crawl process that indexes only a subset of the items in a crawled content source, selected according to whether those items were modified.
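To contrast a full crawl with an incremental crawl, the sketch below assumes that each item reports a last-modified time and that the previous crawl time is known. The `ITEMS` records and the `items_to_index` function are hypothetical, and the sketch ignores deleted items.

```python
from datetime import datetime, timezone

# Hypothetical item records: URL -> last-modified time reported by the source.
ITEMS = {
    "file://server/share/a.txt": datetime(2024, 1, 1, tzinfo=timezone.utc),
    "file://server/share/b.txt": datetime(2024, 3, 1, tzinfo=timezone.utc),
}


def items_to_index(items, last_crawl=None):
    """Full crawl when last_crawl is None; otherwise only items modified after it."""
    if last_crawl is None:
        return sorted(items)  # full crawl: every item, modified or not
    return sorted(url for url, modified in items.items() if modified > last_crawl)
```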

index server: A server that is assigned the task of crawling.

item: A unit of content that can be indexed and searched by a search application.

metadata index: A data structure that is stored on a back-end database server. It stores properties that are associated with each item, and the attributes of those properties.

page hop: The process of traversing from one item to another during a crawl. See also site hop.

query component: A search component that contains and manages an index partition, which contains all or a subset of the data that is collected by a crawl component. A query component can also process requests for that data.

registry: A local system-defined database in which applications and system components store and retrieve configuration data. It is a hierarchical data store with lightly typed elements that are logically stored in tree format. Applications use the registry API to retrieve, modify, or delete registry data. The data stored in the registry varies according to the version of the operating system.

search catalog: All of the crawl data that is associated with a specific search application. A search catalog provides information that is used to generate query results.

search query: A complete set of conditions that are used to generate search results, including query text, sort order, and ranking parameters.

search service account: A user account under which a search service runs.

search service application: A shared service application that provides indexing and querying capabilities.

security trimmer: A filter that is used to limit search results to only those resources that a user can view, based on the user's permission level and the access control list (ACL) for a resource. A security trimmer helps to ensure that search results display only those resources that a user has permission to view.
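A minimal sketch of security trimming follows. It assumes that each result carries the set of principals its ACL grants read access to, and it keeps a result only if one of the user's principals appears in that set. The `RESULTS` data, principal names, and `trim` function are hypothetical. Real ACL evaluation (deny entries, group expansion) is more involved.

```python
# Hypothetical search results, each with the principals granted read access by its ACL.
RESULTS = [
    {"url": "https://intranet.example/hr/salaries.xlsx", "readers": {"HR"}},
    {"url": "https://intranet.example/news/welcome.aspx", "readers": {"Everyone"}},
]


def trim(results, user_principals):
    """Keep only results whose ACL grants read access to one of the user's principals."""
    return [r for r in results if r["readers"] & user_principals]
```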

share: A resource offered by a Common Internet File System (CIFS) server for access by CIFS clients over the network. A share typically represents a directory tree and its included files (referred to commonly as a "disk share" or "file share") or a printer (a "print share"). If the information about the share is saved in persistent store (for example, Windows registry) and reloaded when a file server is restarted, then the share is referred to as a "sticky share". Some share names are reserved for specific functions and are referred to as special shares: IPC$, reserved for interprocess communication, ADMIN$, reserved for remote administration, and A$, B$, C$ (and other local disk names followed by a dollar sign), assigned to local disk devices.

site collection: A set of websites that are in the same content database, have the same owner, and share administration settings. A site collection can be identified by a GUID or the URL of the top-level site for the site collection. Each site collection contains a top-level site, can contain one or more subsites, and can have a shared navigational structure.

SOAP: A lightweight protocol for exchanging structured information in a decentralized, distributed environment. SOAP uses XML technologies to define an extensible messaging framework, which provides a message construct that can be exchanged over a variety of underlying protocols. The framework has been designed to be independent of any particular programming model and other implementation-specific semantics. SOAP 1.2 supersedes SOAP 1.1. See [SOAP1.2-1/2003].

SOAP action: The HTTP request header field used to indicate the intent of the SOAP request, using a URI value. See [SOAP1.1] section 6.1.1 for more information.

SOAP body: A container for the payload data being delivered by a SOAP message to its recipient. See [SOAP1.2-1/2007] section 5.3 for more information.

SOAP fault: A container for error and status information within a SOAP message. See [SOAP1.2-1/2007] section 5.4 for more information.
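To show how a SOAP body and a SOAP fault relate, the sketch below parses a SOAP 1.2 envelope with Python's `xml.etree.ElementTree`. It reports either the fault reason or the first payload element. Only the envelope namespace URI comes from SOAP 1.2. The helper names, the sample fault, and its reason text are hypothetical.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://www.w3.org/2003/05/soap-envelope"  # SOAP 1.2 envelope namespace


def q(local: str) -> str:
    """Qualify a local name with the SOAP 1.2 envelope namespace (ElementTree form)."""
    return "{" + SOAP_ENV + "}" + local


def inspect_envelope(envelope_xml: str):
    """Return ("fault", reason text) when the SOAP body carries a SOAP fault,
    otherwise ("ok", tag of the first payload element or None)."""
    root = ET.fromstring(envelope_xml)
    body = root.find(q("Body"))
    if body is None:
        raise ValueError("envelope has no SOAP body")
    fault = body.find(q("Fault"))
    if fault is not None:
        return "fault", fault.findtext(q("Reason") + "/" + q("Text"), default="")
    children = list(body)
    return "ok", children[0].tag if children else None


# Hypothetical fault response; the reason text is illustrative only.
FAULT_RESPONSE = f"""<env:Envelope xmlns:env="{SOAP_ENV}">
  <env:Body>
    <env:Fault>
      <env:Code><env:Value>env:Sender</env:Value></env:Code>
      <env:Reason><env:Text xml:lang="en">Invalid start address</env:Text></env:Reason>
    </env:Fault>
  </env:Body>
</env:Envelope>"""
```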

start address: A URL that identifies a point at which to start a crawl. Administrators specify start addresses when they create or edit a content source.

Status-Code: A 3-digit integer result code in an HTTP response message, as described in [RFC2616].
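For illustration, the Status-Code is the second space-separated field of an HTTP/1.1 status line. The hypothetical `parse_status_line` helper below extracts it and rejects anything that is not exactly three ASCII digits.

```python
from typing import Tuple


def parse_status_line(line: str) -> Tuple[str, int, str]:
    """Split an HTTP/1.1 status line into (version, Status-Code, reason phrase)."""
    # Pad so that a line with a missing reason phrase (or missing code) still unpacks.
    version, code, reason = (line.split(" ", 2) + ["", ""])[:3]
    if len(code) != 3 or not (code.isascii() and code.isdigit()):
        raise ValueError(f"not a 3-digit Status-Code: {code!r}")
    return version, int(code), reason
```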

Uniform Resource Identifier (URI): A string that identifies a resource. The URI is an addressing mechanism defined in Internet Engineering Task Force (IETF) Uniform Resource Identifier (URI): Generic Syntax [RFC3986].

Uniform Resource Locator (URL): A string of characters in a standardized format that identifies a document or resource on the World Wide Web. The format is as specified in [RFC1738].

Universal Naming Convention (UNC): A string format that specifies the location of a resource. For more information, see [MS-DTYP] section 2.2.57.

URL encode: The process of encoding characters that have reserved meanings for a Uniform Resource Locator (URL), as described in [RFC1738].
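As an example, Python's `urllib.parse.quote` percent-encodes characters in a URL path. Its set of characters left unencoded follows RFC 3986 and differs slightly from the "safe" set in [RFC1738]. The sample path is hypothetical.

```python
from urllib.parse import quote, unquote

path = "/sites/Team Site/Shared Documents/Q&A #1.docx"

# quote() percent-encodes reserved and unsafe characters; "/" is left as-is
# by default (safe="/") so the path structure survives.
encoded = quote(path)

# unquote() reverses the encoding.
assert unquote(encoded) == path
```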

URL space: A list of Uniform Resource Locators (URLs) that contains information about the links from each URL to other URLs.

web server: A server computer that hosts websites and responds to requests from applications.

web service: A unit of application logic that provides data and services to other applications and can be called by using standard Internet transport protocols such as HTTP, Simple Mail Transfer Protocol (SMTP), or File Transfer Protocol (FTP). Web services can perform functions that range from simple requests to complicated business processes.

Web Services Description Language (WSDL): An XML format for describing network services as a set of endpoints that operate on messages that contain either document-oriented or procedure-oriented information. The operations and messages are described abstractly and are bound to a concrete network protocol and message format in order to define an endpoint. Related concrete endpoints are combined into abstract endpoints, which describe a network service. WSDL is extensible, which allows the description of endpoints and their messages regardless of the message formats or network protocols that are used.

website: A group of related webpages that is hosted by a server on the World Wide Web or an intranet. Each website has its own entry points, metadata, administration settings, and workflows. Also referred to as site.

white space: A character that represents a blank space in typography and is not rendered as a visible mark on a screen.

WSDL operation: A single action or function of a web service. The execution of a WSDL operation typically requires the exchange of messages between the service requestor and the service provider.

X.509: An ITU-T standard for public key infrastructure subsequently adapted by the IETF, as specified in [RFC3280].

XML namespace: A collection of names that is used to identify elements, types, and attributes in XML documents identified in a URI reference [RFC3986]. A combination of XML namespace and local name allows XML documents to use elements, types, and attributes that have the same names but come from different sources. For more information, see [XMLNS-2ED].

XML namespace prefix: An abbreviated form of an XML namespace, as described in [XML].
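To illustrate how XML namespace prefixes stand in for namespace URIs, the sketch below parses a document in which two elements share the local name "item" but come from different namespaces. The namespace URIs and element names are hypothetical.

```python
import xml.etree.ElementTree as ET

DOC = """<root xmlns:a="urn:example:a" xmlns:b="urn:example:b">
  <a:item>from a</a:item>
  <b:item>from b</b:item>
</root>"""

root = ET.fromstring(DOC)

# The parser replaces each prefix with its namespace URI ("{uri}local" form),
# so the two "item" elements remain distinguishable.
tags = [child.tag for child in root]

# A caller-chosen prefix map lets queries use a short prefix again; the prefix
# itself ("x") need not match the one used in the document.
ns = {"x": "urn:example:b"}
b_text = root.find("x:item", ns).text
```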

XML schema: A description of a type of XML document that is typically expressed in terms of constraints on the structure and content of documents of that type, in addition to the basic syntax constraints that are imposed by XML itself. An XML schema provides a view of a document type at a relatively high level of abstraction.

MAY, SHOULD, MUST, SHOULD NOT, MUST NOT: These terms (in all caps) are used as defined in [RFC2119]. All statements of optional behavior use either MAY, SHOULD, or SHOULD NOT.