Internationalized Domain Name Support in Internet Explorer 7

Eric Lawrence
Microsoft Corporation

September 26, 2006

Introduction

Browser support for navigating to URLs written in users' native languages is critical for making the Internet truly international. Internet Explorer 7 permits navigation to Internationalized Domain Names (IDN) composed of Unicode characters from all of the world's languages. IDN relies upon a standardized mechanism known as "Punycode" for encoding Unicode domain names using only the ASCII characters that are permitted by the global Domain Name System (DNS).
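
To make the encoding concrete, the following is a minimal sketch in Python, whose built-in "idna" codec applies the IDNA 2003 rules (Punycode plus the "xn--" prefix) label by label. The host name bücher.example and the helper names are illustrative assumptions; this shows the encoding itself, not Internet Explorer's internal implementation.

```python
# Sketch only: relies on Python's built-in "idna" codec (IDNA 2003).

def to_punycode(hostname: str) -> str:
    """Return the ASCII (Punycode) form of a Unicode host name."""
    return hostname.encode("idna").decode("ascii")


def to_unicode(hostname: str) -> str:
    """Return the Unicode form of a Punycode-encoded host name."""
    return hostname.encode("ascii").decode("idna")


if __name__ == "__main__":
    print(to_punycode("bücher.example"))         # xn--bcher-kva.example
    print(to_unicode("xn--bcher-kva.example"))   # bücher.example
```

ASCII-only labels pass through unchanged, which is why existing domain names are unaffected by the encoding.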

This change has implications for legacy URL compatibility, because earlier versions of Internet Explorer encoded domain names in different formats (ANSI and UTF-8).

In order to mitigate the security threat posed by Unicode look-alike characters, Internet Explorer may render some existing IDN names in the encoded format to help prevent spoofing attacks.

The remainder of this article will help you understand how to address the compatibility impact of this feature and how to generate URLs which can be correctly navigated in Internet Explorer 7.

Understanding the Compatibility Impact

End User and Network Administrator

As a user of Internet Explorer, you may experience compatibility impact of IDN support in the following ways:

  • Symptom: Internet Explorer shows the Web address in encoded format (starting with "xn--") and an Information Bar appears. The Information Bar contains the text: "This Web address contains letters or symbols that cannot be displayed with the current language settings."

    Cause: To protect the user against being "spoofed" by a misleading domain name, Internet Explorer will display the encoded format when the domain name contains characters not used by the user's list of preferred content languages.

    Look-alike attacks (sometimes called "homograph" attacks) are possible within the ASCII character set (the usual examples are www.example.com vs. www.examp1e.com, where the second URL replaces the letter L with the number 1). But, with IDN, the character set expands from a few dozen characters to many thousands of characters from all of the world's languages, thereby increasing the risk of spoofing attacks.

    Workaround: You may add languages to the set of configured languages by clicking on the Information Bar or by choosing Tools | Internet Options and clicking the Languages button on the General tab. Note that the order of configured languages is important, so you should ensure that your preferred content language appears first in the list.

    Network administrators may choose to disable the Information Bar by setting the DWORD named DisableIDNPrompt to 1 under HKLM or HKCU \Software\Policies\Microsoft\Windows\CurrentVersion\Internet Settings.

  • Symptom: When navigating to an intranet Web site where the domain name contains Unicode characters, the browser may fail to find the site and instead show an HTTP/404 error page.

    Cause: Previous versions of Internet Explorer used a different URL format when representing intranet domain names containing non-ASCII characters. Some environments may not yet support the new IDN Punycode standard and hence some sites may not be reachable using the IDN Punycode address format.

    Workaround: You may revert to IE6 handling of Unicode domain names using the Internet Control panel. Choose Tools | Internet Options. On the Advanced tab, scroll to the International section. Uncheck "Send IDN server names" and/or "Send IDN server names through proxy connections", depending on whether the target domain is reached via a proxy server.

    Network administrators may choose to disable Punycode and revert to IE6 behavior by setting the DWORD named EnablePunycode under HKLM or HKCU\Software\Microsoft\Windows\CurrentVersion\Internet Settings to one of the following values:

    • If the value is 0, then Punycode is never used.
    • If the value is 1, then Punycode is used when talking directly to origin servers.
    • If the value is 2, then Punycode is used when talking to a proxy server.
    • If the value is 3 (the default), then Punycode is used when talking to both origin and proxy servers.
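
The four values listed above read naturally as two independent bits, one for origin-server connections and one for proxy connections. The sketch below interprets a value under that assumption; the constant and function names are hypothetical, and reading the value from the registry is deliberately left out.

```python
# Sketch only: treats EnablePunycode as a two-bit mask, which is an
# interpretation of the four documented values rather than a documented API.

DIRECT = 0x1  # hypothetical name: Punycode for origin-server connections
PROXY = 0x2   # hypothetical name: Punycode for proxy connections


def punycode_usage(enable_punycode: int) -> dict:
    """Map an EnablePunycode value (0-3) to the connections that use Punycode."""
    if enable_punycode not in (0, 1, 2, 3):
        raise ValueError("EnablePunycode is expected to be 0, 1, 2, or 3")
    return {
        "origin": bool(enable_punycode & DIRECT),
        "proxy": bool(enable_punycode & PROXY),
    }


if __name__ == "__main__":
    for value in range(4):
        print(value, punycode_usage(value))
```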

Web site Developer

As a Web site developer, you may experience compatibility impact of IDN support in the following ways:

  • Symptom: Internet Explorer shows the Web address in encoded format (starting with "xn--") and an Information Bar appears. The Information Bar contains the text: "This Web address contains letters or symbols that cannot be displayed with the current language settings."

    Cause: Encoded addresses cannot be visually confused with non-encoded addresses and this feature makes it difficult for a fraudster to launch a "phishing" attack which attempts to trick a user into revealing confidential information.

    Look-alike attacks (sometimes called "homograph" attacks) are possible within the legacy ASCII domain name character set (the usual examples are www.example.com vs. www.examp1e.com). But, with IDN, the character set expands from a few dozen characters to many thousands of characters from all of the world's languages, thereby increasing the risk of spoofing attacks.

    Internet Explorer includes a number of restrictions on allowable IDN addresses in order to protect the user from spoofing attacks.

    A domain name is displayed in encoded form if any of the following are true:

    1. The domain name contains characters outside of the user's chosen languages. Note that ASCII-only labels are always permitted for compatibility with existing sites.

    2. The domain name contains characters which are not part of any language.

    3. Any one of the labels contains a mix of scripts that do not appear together within a single language. For instance, Greek characters cannot mix with Cyrillic within a single label. (A label is a segment of a domain name, delimited by dots. www.microsoft.com contains three labels, "www", "microsoft" and "com".) This restriction helps to prevent attacks where a fraudster registers a domain name which is identical to a well-known domain name, except that (for instance) a Latin "a" character has been replaced with a visually-identical Cyrillic "a" character.

    Workaround: Ensure that your domain name does not contain characters from multiple languages within a single label. If you need to use multiple languages, use one label per language.

    Ensure that the domain name is written using characters from the language your target audience is most likely to have configured in their browser.

    Many Web site owners choose to use IDN domain names as redirects to ASCII-based domain names. ASCII URLs are reachable in all browser versions, and such domain names do not show an Information Bar in any locale.
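
Before registering a domain name, it can help to check each label for mixed scripts. The following rough Python sketch approximates a character's script by the first word of its Unicode name, which is an assumption made for illustration and not the algorithm Internet Explorer uses. It flags any label that mixes scripts at all, so it is stricter than the rule described above: languages that legitimately combine scripts, such as Japanese, would also be flagged.

```python
import unicodedata


def scripts_in_label(label: str) -> set:
    """Approximate the scripts used in a label from Unicode character names."""
    scripts = set()
    for ch in label:
        if ch.isdigit() or ch == "-":
            continue  # digits and hyphens are shared across scripts
        # The first word of a character name ("LATIN", "CYRILLIC", "GREEK")
        # serves here as a rough stand-in for the character's script.
        scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts


def mixed_script_labels(hostname: str) -> list:
    """Return the labels of hostname that draw on more than one script."""
    return [
        label
        for label in hostname.split(".")
        if len(scripts_in_label(label)) > 1
    ]


if __name__ == "__main__":
    # "\u0430" is CYRILLIC SMALL LETTER A, visually similar to Latin "a".
    print(mixed_script_labels("www.ex\u0430mple.com"))
    print(mixed_script_labels("www.example.com"))
```

A label that is flagged here is a candidate for being displayed in encoded form; a label that passes may still be encoded if its script is not among the user's configured languages.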

  • Symptom: When using scripting to retrieve URL properties on DHTML objects, the URL may be unexpectedly returned as Unicode.

    Cause: URL properties are converted into Unicode form when assigned to object model properties. This is particularly relevant if your code attempts to perform comparisons between URL strings and strings elsewhere in script code.

    Workaround: Ensure URL handling routines are written robustly to handle differences in domain name encoding. In particular, ensure that you use Unicode rather than Punycode when comparing JScript strings to URLs. Test scripts with all expected combinations of browser languages to ensure that URL handling code was not written with incorrect assumptions.
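
The comparison advice above amounts to converting both host names to a single form before comparing them. The sketch below illustrates the idea in Python rather than JScript, using the built-in "idna" codec; the function names and example URLs are assumptions made for illustration.

```python
from urllib.parse import urlsplit


def unicode_host(url: str) -> str:
    """Return the URL's host in lowercase Unicode form."""
    host = urlsplit(url).hostname or ""
    try:
        # Decode any "xn--" labels; plain ASCII labels pass through unchanged.
        return host.encode("ascii").decode("idna")
    except UnicodeError:
        # The host already contains non-ASCII characters (or is not valid
        # Punycode), so it is returned as-is.
        return host


def same_host(url_a: str, url_b: str) -> bool:
    """Compare two URLs by host, ignoring Punycode versus Unicode spelling."""
    return unicode_host(url_a) == unicode_host(url_b)


if __name__ == "__main__":
    print(same_host("http://xn--bcher-kva.example/a", "http://bücher.example/b"))
```

Comparing in Unicode form, as recommended above, means the check gives the same answer whichever form the object model happens to return.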

Application Developer

As an application developer, you may experience compatibility impact of IDN support in the following ways:

  • Symptom: When using WININET to connect to a Web site where the domain name contains Unicode characters, the networking component may fail to connect to the site.

    Cause: Previous versions of WININET used a different URL format when representing intranet domain names containing non-ASCII characters. Some environments may not yet support the new IDN Punycode standard and hence some sites may not be reachable using the IDN Punycode address format.

    Workaround: Before making an HTTP request, configure IDN using the InternetSetOption function.

    • If the dwIDNSettings value is 0, then Punycode is never used.
    • If the value is 1, then Punycode is used when talking directly to origin servers.
    • If the value is 2, then Punycode is used when talking to a proxy server.
    • If the value is 3 (the default), then Punycode is used when talking to both origin and proxy servers.

  • Symptom: When using code to retrieve URL properties on DHTML objects, the URL may be unexpectedly returned in Punycode or Unicode, depending on the user's settings.

    Cause: To help prevent spoofing attacks in the Internet Explorer UI, URL properties are converted into their display form when assigned to object model properties. This is particularly relevant if your code attempts to perform comparisons between URL strings and strings elsewhere in code.

    Workaround: Ensure URL handling routines are written robustly to handle differences in domain name encoding. In particular, ensure that you use Unicode rather than Punycode when comparing strings to URLs. Test code with all expected combinations of browser languages to ensure that URL handling code was not written with incorrect assumptions. When writing native code applications, use the IUri API to parse the URL components.

How You Can Take Advantage of IDN Support

End User

  • Internet Explorer now permits navigation to Internationalized Domain Names written with Unicode characters from all of the world's languages.

  • If you do not need to navigate to Internationalized Domain Names, you can enhance security by setting the "Always show encoded addresses" option in the International section of the Advanced tab in the Internet Control panel. When this option is set, all IDN Web addresses will be shown in encoded form.

Network Administrator

  • If your users do not need to navigate to Internationalized Domain Names, you may choose to force all IDN Web addresses to appear in encoded form by setting the DWORD named ShowPunycode to 1 under HKLM or HKCU \Software\Policies\Microsoft\Windows\CurrentVersion\Internet Settings.

Web site Developer

  • By formatting your Internationalized Domain Names in Punycode, you can ensure that users of pre-IE7 browsers can navigate to your sites. In Internet Explorer 7, such links will appear in Unicode if the characters are permitted by the user's configured display language.
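
One way to produce such links is to rewrite each URL's host into Punycode when generating pages. The following is a minimal Python sketch under stated assumptions: the function name and example URL are hypothetical, the built-in "idna" codec (IDNA 2003) performs the conversion, and any user name or password embedded in the URL is dropped for simplicity.

```python
from urllib.parse import urlsplit, urlunsplit


def punycode_url(url: str) -> str:
    """Return url with its host converted to Punycode (ASCII) form."""
    parts = urlsplit(url)
    if not parts.hostname:
        return url  # relative URL or no host: nothing to convert
    host = parts.hostname.encode("idna").decode("ascii")
    # Rebuild the network location, keeping an explicit port if one was given.
    netloc = host if parts.port is None else f"{host}:{parts.port}"
    return urlunsplit(
        (parts.scheme, netloc, parts.path, parts.query, parts.fragment)
    )


if __name__ == "__main__":
    print(punycode_url("http://bücher.example/katalog?id=1"))
```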

Application Developer

Eric Lawrence is a program manager on the Internet Explorer team.