How to Optimize SharePoint Server 2007 Web Content Management Sites for Search Engines

Summary: Learn techniques to create Microsoft Office SharePoint Server 2007 Web content management (WCM) sites that are optimized for search engines, and what to avoid to help improve search ratings. (5 printed pages)

Avneesh Kaushik, Microsoft Global Services India (MGSI)

July 2008

Applies to: Microsoft Office SharePoint Server 2007

Contents

  • Overview of Search Engine Optimizations

  • Search Visibility in SharePoint Server

  • Welcome Page Redirect

  • <Meta> Tags

  • Enterprise Search in SharePoint Server for an Internet-Facing WCM Site

  • Conclusion

  • About the Author

  • Additional Resources

Overview of Search Engine Optimizations

Search engine optimization is the process of optimizing sites and pages for search engines to result in better relevance and ranking for the site. The standard best practices or rules for search engine optimization that you might apply for Web pages are equally relevant and important for Web content management (WCM) sites you develop in Microsoft Office SharePoint Server 2007. A page or site that is not optimized properly for search engines can appear much lower in the search results or have a lower rank, which can result in loss of traffic to the site. A site or page that is better optimized for search engines appears higher in the search results and helps increase traffic to your site.

Note

This article does not address how to configure Enterprise Search in Microsoft Office SharePoint Server 2007.

Improvements for Ranking

You can improve the ranking of documents on your WCM sites in the following ways:

  • Include keywords, key phrases, and a description that reflects the page’s content.

  • Use the Robots Exclusion Standard, for example, robots.txt (see About /robots.txt).

  • Place your content as high up in the page (HTML) as possible to give it more relevance. A typical page has a lot of content (HTML) and code (ECMAScript, such as JScript or JavaScript, or XML) in it, and this mix of content and code can affect how a search engine processes the page.

  • Use proper semantic code (appropriate HTML elements) such as headlines. For example, use headline elements tagged as <h1> to <h6>. Use proper HTML list items; for example, ordered, unordered, or definition lists tagged with <ol>, <ul>, or <dl> tags. Use alt and title attributes with image (<img>) tags. Use a Favorites icon for bookmarking and to keep your error log clean. For more information, see How to Add a Shortcut Icon to a Web Page.

  • Use the Platform for Internet Content Selection (PICS) specification (see W3C and ICRA tools) to provide a signal to search engines and filter programs that your Web site is safe for children.

  • Consider keyword density. Because of a general misuse of HTML keyword <meta> tags, current crawlers compare the number of times a word or a phrase appears in an HTML page to the number of times it appears in its <meta> tag to properly determine its relevancy. Using too few or too many keywords can have a negative effect.

  • Use descriptive text in your hyperlinks.

  • Consider the levels of directories or subsites in a path to a Web page, or the URL depth.

  • Use descriptive page titles.

  • Build some automated process in Office SharePoint Server 2007 to create or update the sitemap file for a WCM site. The sitemap file is a simple XML file that contains URLs for the site’s pages and some metadata to help search engines crawl or discover pages in a site (for more information, see What are Sitemaps?). You can achieve this by using custom workflows or by customizing a default SharePoint publishing workflow.

  • Use valid HTML or XHTML. Although most ASP.NET 2.0 controls are XHTML-compliant, Office SharePoint Server 2007 and SharePoint controls are not. However, considering XHTML when you are designing master pages and page layouts will always help. If you need to get compliant output from controls that are not compliant, you can refer to Scott Guthrie's article CSS Control Adapter Toolkit for ASP.NET 2.0 for possible options. You can also code all custom Web Parts and field controls to be XHTML-compliant.
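As a rough illustration of several items in the preceding list, the following fragment sketches how a descriptive title, a Favorites icon, headline elements, a proper list, descriptive link text, and image attributes might look in the rendered page. The page title, file names, and link targets are hypothetical placeholders and are not taken from an actual SharePoint site.

```html
<html>
<head>
  <!-- Descriptive, page-specific title -->
  <title>Inter-island flight schedule</title>
  <!-- Favorites (shortcut) icon; the path is a placeholder -->
  <link rel="shortcut icon" href="/favicon.ico" />
</head>
<body>
  <!-- Main content placed early in the page, under a single top-level headline -->
  <h1>Inter-island flight schedule</h1>
  <h2>Popular routes</h2>
  <ul>
    <!-- Link text describes the target instead of saying "click here" -->
    <li><a href="/routes/morning.aspx">Morning departures</a></li>
    <li><a href="/routes/evening.aspx">Evening departures</a></li>
  </ul>
  <!-- alt and title attributes describe the image for crawlers and visitors -->
  <img src="/images/route-map.png" alt="Map of inter-island routes" title="Route map" />
</body>
</html>
```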

Practices to Avoid

Try to avoid doing the following in the Web pages for your WCM site:

  • Naming all pages in the site with the same page title.

  • Including a specific keyword or keywords or a phrase too often in the <meta> tags or content of a Web page, also called keyword stuffing. In this scenario, the crawler might determine that these keywords or phrases are suspect, and they might be discarded when the search engine is calculating relevance.

  • Using hidden text to fill a page with keywords that a search engine can recognize but that are not visible to a visitor.

  • Using complex URLs, in which a page is multiple levels deep in a site (for example, http://someserver.com/subsite/pages/somepage.aspx). Such pages might not be easily crawled. You can use a combination of a URL rewriter and a sitemap file to address this in Office SharePoint Server. This crawler behavior also highlights the importance of using proper site structure (which is part of the information architecture).

  • Using temporary redirects (this can be a significant issue with a SharePoint landing page). For more information, see Welcome Page Redirect.

  • Using complex pages. Take care to keep the pages in a SharePoint site simple, minimizing the use of items such as inline styles or ECMAScript (JScript, JavaScript). The more elements that a page contains, the more difficult it can be for the crawlers to process the page properly.
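The sitemap file mentioned in this article follows the Sitemaps protocol. As a minimal sketch, a file that lists a welcome page and one deeper page might look like the following. The URLs, dates, and the default.aspx page name are placeholders chosen for illustration; only the loc element is required for each url entry.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <!-- Full URL of the page (required) -->
    <loc>http://someserver.com/Pages/default.aspx</loc>
    <!-- Optional hints for the crawler -->
    <lastmod>2008-07-01</lastmod>
    <changefreq>weekly</changefreq>
    <priority>1.0</priority>
  </url>
  <url>
    <!-- A page several levels deep that a crawler might otherwise miss -->
    <loc>http://someserver.com/subsite/pages/somepage.aspx</loc>
    <lastmod>2008-06-15</lastmod>
  </url>
</urlset>
```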

Search Visibility in SharePoint Server

By default, the setting for search visibility is on. However, you can set the options for search visibility to "no", which excludes that site from search results. These options let you control what is crawled (and not crawled) on your site. To exclude the site from third-party search engines or crawlers, add a robots <meta> tag with the noindex and nofollow values to the master page, either directly or by using the MetaTagsGenerator control, available on the SharePoint 2007 WCM Utilities page on CodePlex.
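For third-party crawlers, the exclusion described above is typically expressed as a single robots <meta> tag in the head section of the master page. The following fragment is a sketch of that tag only; where exactly it belongs in a given master page depends on how that page is structured.

```html
<head>
  <!-- Asks compliant crawlers not to index this page or follow its links -->
  <meta name="robots" content="noindex, nofollow" />
</head>
```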

Figure 1. Setting search visibility options


Welcome Page Redirect

The landing page or welcome page in Office SharePoint Server 2007 is a 302 redirect page. This can cause problems, because temporary redirects can adversely affect a page's ranking. If you must use redirection, it is always better to use 301 permanent redirects. Unfortunately, it is not possible to change this behavior in Office SharePoint Server; however, you can use any URL rewriter and make your welcome or landing page a permanent redirect.

Following is a sample HTTPModule that you can add to the Office SharePoint Server application to handle 302 redirects for the top-level Web site (root) site collection. You must write more complex conditions to handle subsites, and additional code to discover landing or welcome pages for subsites.

If you use a URL rewriter, you need to add only a line to the web.config file for each subsite.

To create the HTTPModule

  1. Create a class library project named MOSSRedirectModule.

  2. Add the following code to a new class named HTTPModule.cs.

    using System;
    using System.Collections;
    using System.Web;
    using Microsoft.SharePoint;
    using Microsoft.SharePoint.Publishing;
    namespace MOSSRedirectModule
    {
       public class HttpModule : System.Web.IHttpModule {
          /// <summary>
          /// Initializes an instance of this class.
          /// </summary>
          public HttpModule() {      }
          /// <summary>
          /// Disposes of any resources used.
          /// </summary>
          public void Dispose() {      }
          /// <summary>
          /// Initializes the module by hooking the application's BeginRequest event.
          /// </summary>
          /// <param name="context">The HttpApplication this module is bound to.</param>
          public void Init(System.Web.HttpApplication context) {
             context.BeginRequest += new EventHandler(context_BeginRequest);
          }
    
          void context_BeginRequest(object sender, EventArgs e) {
             HttpRequest Request = HttpContext.Current.Request;
             String ResponsePath = null;
             // Replace the hard-coded URL with code that reads the server from web.config.
             if (Request.Url.OriginalString == "http://server:port/") {
                SPSecurity.RunWithElevatedPrivileges(delegate() {
                   // Dispose the SPSite and SPWeb objects when finished with them.
                   using (SPSite siteCollection = new SPSite("http://server:port/"))
                   using (SPWeb webSite = siteCollection.OpenWeb()) {
                      PublishingWeb PubWeb = PublishingWeb.GetPublishingWeb(webSite);
                      // File name of the site's welcome page.
                      ResponsePath = PubWeb.DefaultPage.Name;
                   }
                });
                RedirectPermanent("http://server:port/Pages/" + ResponsePath);
             }
          }
    
          private void RedirectPermanent(string ResponsePath) {
             if (!String.IsNullOrEmpty(ResponsePath)) {
                HttpResponse Response = HttpContext.Current.Response;
                Response.StatusCode = 301;
                Response.StatusDescription = "Moved Permanently";
                Response.RedirectLocation = ResponsePath;
                Response.Write("html");
                Response.End();
             }
          }
       }
    }
    

To configure the HTTPModule

  1. Copy the MOSSRedirectModule DLL to the app_bin directory of the application or to the global assembly cache. You must give the DLL a strong name before you can add it to the global assembly cache. For more information, see Signing an Assembly with a Strong Name.

  2. Add the following tags to the web.config file for every front-end Web server in the farm.

    <compilation batch="false" debug="false">
        <assemblies>
            <add assembly="MOSSRedirectModule, Version=1.0.0.0, Culture=neutral, PublicKeyToken=xxxxxx"/>
        </assemblies>
    </compilation>
    <httpModules>
        <clear />
        <add name="HttpModule" type="MOSSRedirectModule.HttpModule" />
    </httpModules>
    

After you finish these steps, you can use a tool such as Fiddler to verify whether the module is working properly.
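When the module is working, a request for the root URL should return a response that starts roughly as follows. This is only a sketch of what to look for in the captured traffic: the page name default.aspx is an assumption (the actual value is whatever welcome page the site uses), the status text depends on the StatusDescription the module sets, and other headers are omitted.

```text
HTTP/1.1 301 Moved Permanently
Location: http://server:port/Pages/default.aspx
```

If the response still shows status code 302, the request is probably not matching the URL comparison in the module.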

<Meta> Tags

A few page properties can affect how your page ranks in a search engine. When you create a page, you can set the Title and Description properties, which appear on the final page as the <title> element and the description <meta> tag, described earlier in this document.

By default, Office SharePoint Server cannot add any other custom <meta> tag, such as Keywords. You can add such tags by using the MetaTagsGenerator control, available on the SharePoint 2007 WCM Utilities page on CodePlex. Keywords no longer play as important a role in relevance because webmasters have misused them extensively. Modern crawlers look at factors such as keyword density to calculate relevance, and most ignore the keywords <meta> tag completely.
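As a sketch of the rendered result, the head section of a page with these properties set might contain markup similar to the following. The text values are hypothetical, and the exact markup that Office SharePoint Server or the MetaTagsGenerator control emits may differ.

```html
<head>
  <!-- From the page's Title property -->
  <title>Inter-island flight schedule</title>
  <!-- From the page's Description property -->
  <meta name="description" content="Daily departure and arrival times for inter-island flights." />
  <!-- Custom tag; most crawlers ignore it -->
  <meta name="keywords" content="flight schedule, inter-island flights" />
</head>
```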

Figure 2. Set Title and Description properties


Enterprise Search in SharePoint Server for an Internet-Facing WCM Site

You should be aware of a few things when configuring Enterprise Search in Microsoft Office SharePoint Server 2007. One consideration is that configuring Enterprise Search to work with forms authentication sites takes extra work. Because the search crawler needs Integrated Windows authentication, you must extend the site: the default zone should use Integrated Windows authentication, and the extended zone should use forms authentication. Office SharePoint Server can change the URL port to show the correct URL when content is crawled by using Windows authentication and is searched from a forms authentication-extended site. For example, if you crawl a site such as http://someserver:81, and then try to search the content for the site from a forms authentication-extended site such as http://someserver:82, the results will show port 82, not 81.

It is possible to crawl forms authentication sites with Microsoft Office SharePoint Server Service Pack 1 (SP1); however, it is a simple crawl that does not provide any security or rich metadata information. For more information, see Prepare to Crawl Host-Named Sites that Use Forms Authentication, and "Crawling Content" in Forms Authentication in SharePoint Products and Technologies (Part 3): Forms Authentication vs. Windows Authentication.

Another thing to keep in mind when setting up a content source that points to an Internet-facing site (accessed by using http://www.someURL.com) is that you should add a rule to crawl the SharePoint site as a normal HTTP site. If you do not, the crawler will try to crawl it as a SharePoint site. The crawler will then have trouble because of the lack of Integrated Windows authentication capabilities in most Internet-facing sites.

Conclusion

With the increase in popularity of Microsoft Office SharePoint Server 2007 as a platform for Internet-facing sites, it has become very important to optimize Web pages and sites for search engines. Office SharePoint Server 2007 provides many options to craft and adjust pages to facilitate crawling, but you need focused effort and awareness to do the right things in the right way. Otherwise, you can affect a site's rankings adversely, and impact the success of the site. By implementing the tips for search engine optimization presented in this article, you can improve a site's ranking in search results.

About the Author

Avneesh Kaushik is a senior consultant working with Microsoft Global Services India (MGSI) in the IW/Collaboration service line. He has helped design and deploy the Hawaiian Airlines WCM site.

Additional Resources

For more information, see the following resources: