Understanding how search engines work

Expression Studio 4.0

Search engines use an automated program called a "spider" or a "crawler" to roam the web and collect site data. A crawler records the text in your site and the location of the text in your site. It also follows the links in your site to the target pages of those links, where it starts the process over for the new page. The data collected by the crawler is then added to the database of the search engine, where it is indexed.
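The crawl loop described above can be sketched in a few lines. This is a minimal illustration, not how any particular search engine works: the `PAGES` dictionary is a hypothetical in-memory stand-in for the web, and the `crawl` function and its record format are assumptions made for this example.

```python
from collections import deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
PAGES = {
    "/home": ("welcome to our bakery", ["/bread", "/contact"]),
    "/bread": ("fresh sourdough bread daily", ["/home"]),
    "/contact": ("call the bakery", []),
}

def crawl(start):
    """Visit pages breadth-first, recording each word and where it appears."""
    seen = {start}
    queue = deque([start])
    records = []  # one (url, position, word) tuple per word found
    while queue:
        url = queue.popleft()
        text, links = PAGES[url]
        # Record the text and the location of the text in the page.
        for position, word in enumerate(text.split()):
            records.append((url, position, word))
        # Follow each link to its target page and repeat the process there.
        for target in links:
            if target in PAGES and target not in seen:
                seen.add(target)
                queue.append(target)
    return records

records = crawl("/home")
```

The `seen` set keeps the crawler from revisiting a page when two pages link to each other, as `/home` and `/bread` do here.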

When a user initiates a search, the search engine queries its database for sites that contain the terms that the user has provided. Those sites are ranked according to the search engine's algorithm, and then provided to the user in a results page.
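One common way to answer such a query is an inverted index, which maps each term to the pages that contain it. The sketch below assumes that approach; the `DOCS` data and the `build_index` and `search` functions are hypothetical, and the ranking step is reduced to alphabetical order because real ranking algorithms are not described here.

```python
from collections import defaultdict

# Hypothetical crawled data: URL -> page text.
DOCS = {
    "/home": "welcome to our bakery",
    "/bread": "fresh sourdough bread daily",
    "/contact": "call the bakery",
}

def build_index(docs):
    """Map each lowercase term to the set of URLs that contain it."""
    index = defaultdict(set)
    for url, text in docs.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs that contain every term in the query."""
    terms = query.lower().split()
    if not terms:
        return []
    matches = set.intersection(*(index.get(term, set()) for term in terms))
    # Placeholder ordering; a real engine would apply its ranking algorithm.
    return sorted(matches)

index = build_index(DOCS)
results = search(index, "bakery")
```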

What crawlers check

When a search-engine spider crawls your site, it collects text from the following page elements:

  • The <title> tag.

  • The <meta name="description"> tag.

  • The <meta name="keywords"> tag.

  • The heading tags (<h1> through <h6>).

  • Hyperlinks (<a> tags).

Crawlers also collect paragraph text, although that text may not figure as prominently in the results page as text from the five elements listed above.
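As a rough illustration of how text might be pulled from those elements, the following sketch uses the `html.parser` module from the Python standard library. The `ElementCollector` class and the `SAMPLE` page are made up for this example; a production crawler would also handle malformed markup, character encodings, and nested elements more carefully.

```python
from html.parser import HTMLParser

# Map each tracked tag to the bucket that receives its text.
TEXT_TAGS = {"title": "title", "a": "links"}
TEXT_TAGS.update({f"h{n}": "headings" for n in range(1, 7)})

class ElementCollector(HTMLParser):
    """Collect text from the title, meta, heading, and hyperlink elements."""

    def __init__(self):
        super().__init__()
        self.found = {"title": [], "description": [], "keywords": [],
                      "headings": [], "links": []}
        self._open = []  # tracked tags that are currently open

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta":
            name = (attrs.get("name") or "").lower()
            if name in ("description", "keywords"):
                self.found[name].append(attrs.get("content") or "")
        elif tag in TEXT_TAGS:
            self._open.append(tag)

    def handle_endtag(self, tag):
        if self._open and self._open[-1] == tag:
            self._open.pop()

    def handle_data(self, data):
        text = data.strip()
        if text and self._open:
            self.found[TEXT_TAGS[self._open[-1]]].append(text)

SAMPLE = """<html><head>
<title>Sourdough Bakery</title>
<meta name="description" content="Fresh bread baked daily">
<meta name="keywords" content="bread, sourdough">
</head><body>
<h1>Our Bread</h1>
<p>We bake every morning.</p>
<a href="/contact">Contact us</a>
</body></html>"""

collector = ElementCollector()
collector.feed(SAMPLE)
```

In this sketch the paragraph text is deliberately skipped so that the result shows only the five element types from the list.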

What crawlers ignore

You can specify which pages and links a crawler should skip by using the following site tools:

  • Robots.txt, a file at the root of your site that lists the paths that crawlers should not request.

  • A noindex directive, typically written as <meta name="robots" content="noindex">, which tells crawlers not to index the page that contains it.

  • A nofollow value in the rel attribute of a hyperlink (<a rel="nofollow">), which tells crawlers not to follow that link to its target page. (If the target page can be reached through another link that does not carry nofollow, the target page can still be indexed.)
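The sketch below shows how a well-behaved crawler might honor these three signals. It assumes the common conventions: noindex appears in a `<meta name="robots">` tag and nofollow appears in the `rel` attribute of a hyperlink. `RobotFileParser` is part of the Python standard library; the `DirectiveScanner` class, the robots.txt rules, and the sample page are hypothetical.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real crawler would download it
# from the root of the site before requesting any other page.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

robots = RobotFileParser()
robots.parse(ROBOTS_TXT.splitlines())

class DirectiveScanner(HTMLParser):
    """Detect a noindex directive and sort links by nofollow."""

    def __init__(self):
        super().__init__()
        self.noindex = False
        self.followable = []  # links the crawler may follow
        self.skipped = []     # links marked rel="nofollow"

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            content = (attrs.get("content") or "").lower()
            if "noindex" in [token.strip() for token in content.split(",")]:
                self.noindex = True
        elif tag == "a" and attrs.get("href"):
            rel = (attrs.get("rel") or "").lower().split()
            bucket = self.skipped if "nofollow" in rel else self.followable
            bucket.append(attrs["href"])

PAGE = """<html><head><meta name="robots" content="noindex"></head>
<body>
<a href="/bread">Bread</a>
<a href="/login" rel="nofollow">Log in</a>
</body></html>"""

scanner = DirectiveScanner()
scanner.feed(PAGE)

may_fetch_private = robots.can_fetch("*", "https://example.com/private/notes.html")
may_fetch_home = robots.can_fetch("*", "https://example.com/index.html")
```

These signals are requests rather than access controls: a crawler that ignores them can still fetch the pages.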


How search engines rank pages

Different search engines use different content-ranking algorithms. Generally, the following factors determine how your page is ranked in the results page.


Relevance

Search engines use the following factors to weight each instance of a search term:

  • Location   Location refers to where a term appears in a page. If a specific search term is in the <title> tag of a page on your site, search engines will generally assume that your page is very relevant to the term.

  • Frequency   Frequency refers to the number of times a term occurs in a page. If a specific search term appears more than once in a page, search engines will probably give your page a higher relevancy score.

By combining location and frequency factors, search engines can weight each instance of the search term to determine how relevant your page is. For example, a term may appear several times in a page but only in the body text. Because that term does not appear in the <title> tag or in any heading tags, a search engine may assume that the term is not the central focus of the page and may give less weight to those instances.
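To make the combination of location and frequency concrete, here is a small scoring sketch. The weights in `LOCATION_WEIGHTS` are invented for illustration, as are the `relevance` function and the two sample pages; actual search engines do not publish their weights.

```python
# Assumed weights: a term in the title counts more than one in a heading,
# which counts more than one in the body text.
LOCATION_WEIGHTS = {"title": 5.0, "heading": 3.0, "body": 1.0}

def relevance(page, term):
    """Sum location weight times frequency over every part of the page."""
    term = term.lower()
    score = 0.0
    for location, text in page.items():
        frequency = text.lower().split().count(term)
        score += LOCATION_WEIGHTS.get(location, 0.0) * frequency
    return score

# The term appears once in the title and once in a heading.
focused = {
    "title": "Sourdough bread recipes",
    "heading": "Baking sourdough at home",
    "body": "Mix flour and water.",
}

# The term appears three times, but only in the body text.
passing = {
    "title": "Kitchen diary",
    "heading": "Weekend notes",
    "body": "I tried sourdough again. The sourdough rose well "
            "and the sourdough crust was crisp.",
}

focused_score = relevance(focused, "sourdough")  # 5.0 + 3.0 = 8.0
passing_score = relevance(passing, "sourdough")  # 3 * 1.0 = 3.0
```

Under these assumed weights, the page that mentions the term only in body text scores lower even though the term occurs there more often.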


Authority

Authority refers to relevance granted by other sites. If your page has a high relevancy score for a term, it ranks higher when pages on other sites that also score highly for that term link to it.

Authority is also granted by the users of search engines. The more users who select your site from the results page of a search engine (often referred to as "click-throughs"), the more authority that search engine grants to your page.
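The two sources of authority can be combined in a toy formula. Everything in this sketch is an assumption made for illustration: the `authority` function, the `click_weight` parameter, and the sample URLs, scores, and click counts do not come from any real search engine.

```python
def authority(page, links, relevance_scores, clicks, click_weight=0.5):
    """Add the relevance of each linking page to a weighted click count."""
    # Inbound links count for more when the linking page is itself relevant.
    inbound = sum(
        relevance_scores.get(source, 0.0)
        for source, target in links
        if target == page and source != page
    )
    return inbound + click_weight * clicks.get(page, 0)

# Hypothetical link graph as (linking page, target page) pairs.
links = [
    ("a.example/reviews", "my.example/bread"),
    ("b.example/blog", "my.example/bread"),
    ("b.example/blog", "other.example/"),
]
relevance_scores = {"a.example/reviews": 8.0, "b.example/blog": 2.0}
clicks = {"my.example/bread": 4}

score = authority("my.example/bread", links, relevance_scores, clicks)
# Inbound relevance 8.0 + 2.0, plus 0.5 * 4 click-throughs = 12.0
```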


© 2011 Microsoft Corporation. All rights reserved.
