How to: Verify That Strings Are in Valid E-Mail Format

Updated: December 2011

The following example verifies that a string is in valid email format.

The example defines an IsValidEmail method, which returns true if the string contains a valid email address and false if it does not, but takes no other action.

To verify that the email address is valid, the IsValidEmail method calls the Regex.Replace(String, String, MatchEvaluator) method with the (@)(.+)$ regular expression pattern to separate the domain name from the email address. The third parameter is a MatchEvaluator delegate that represents the method that processes and replaces the matched text. The regular expression pattern is interpreted as follows.

Pattern    Description
(@)        Match the @ character. This is the first capturing group.
(.+)       Match one or more occurrences of any character. This is the second capturing group.
$          End the match at the end of the string.
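
As a minimal sketch of how this pattern hands the two capturing groups to a MatchEvaluator, the following snippet uppercases everything after the @ character. The DomainSplitSketch class and the UppercaseDomain method are hypothetical names used only for illustration. They are not part of the example in this topic, whose evaluator is the DomainMapper method.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DomainSplitSketch
{
    // Hypothetical helper: rewrites only the text captured by (.+) and keeps the "@" from group 1.
    public static string UppercaseDomain(string address)
    {
        return Regex.Replace(address, @"(@)(.+)$",
            match => match.Groups[1].Value + match.Groups[2].Value.ToUpperInvariant());
    }

    public static void Main()
    {
        // Group 1 captures "@" and group 2 captures "proseware.com".
        Console.WriteLine(UppercaseDomain("david.jones@proseware.com"));
        // Displays: david.jones@PROSEWARE.COM
    }
}
```

The local part is left untouched because the match starts at the @ character, so only the matched portion of the string is replaced.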

The domain name along with the @ character is passed to the DomainMapper method, which uses the IdnMapping class to translate Unicode characters that are outside the US-ASCII character range to Punycode. The method also sets the invalid flag to true if the IdnMapping.GetAscii method throws an ArgumentException, which indicates that the domain name contains invalid characters. The method returns the Punycode domain name preceded by the @ symbol to the IsValidEmail method.
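
The conversion step can be tried in isolation. The following is a rough sketch only: the IdnSketch class and the ToAsciiOrNull helper are hypothetical names, and returning null in place of setting a flag is an assumption made to keep the snippet self-contained.

```csharp
using System;
using System.Globalization;

public static class IdnSketch
{
    // Hypothetical helper: returns the Punycode form of a domain name,
    // or null if IdnMapping.GetAscii rejects it with an ArgumentException.
    public static string ToAsciiOrNull(string domainName)
    {
        try
        {
            return new IdnMapping().GetAscii(domainName);
        }
        catch (ArgumentException)
        {
            return null;
        }
    }

    public static void Main()
    {
        // "b\u00FCcher" is "bücher"; the non-ASCII label is converted to Punycode.
        Console.WriteLine(ToAsciiOrNull("b\u00FCcher.example"));   // xn--bcher-kva.example

        // A name that is already ASCII is returned unchanged.
        Console.WriteLine(ToAsciiOrNull("proseware.com"));         // proseware.com

        // An empty label (two consecutive periods) is expected to be rejected.
        Console.WriteLine(ToAsciiOrNull("proseware..com") ?? "(rejected)");
    }
}
```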

The IsValidEmail method then calls the Regex.IsMatch(String, String) method to verify that the address conforms to a regular expression pattern.

Note that the IsValidEmail method does not perform authentication to validate the email address. It merely determines whether its format is valid for an email address.


using System;
using System.Globalization;
using System.Text.RegularExpressions;

public class RegexUtilities
{
   bool invalid = false;

   public bool IsValidEmail(string strIn)
   {
       invalid = false;
       if (String.IsNullOrEmpty(strIn))
          return false;

       // Use IdnMapping class to convert Unicode domain names.
       strIn = Regex.Replace(strIn, @"(@)(.+)$", this.DomainMapper);
       if (invalid) 
          return false;

       // Return true if strIn is in valid e-mail format.
       return Regex.IsMatch(strIn, 
              @"^(?("")(""[^""]+?""@)|(([0-9a-z]((\.(?!\.))|[-!#\$%&'\*\+/=\?\^`\{\}\|~\w])*)(?<=[0-9a-z])@))" + 
              @"(?(\[)(\[(\d{1,3}\.){3}\d{1,3}\])|(([0-9a-z][-\w]*[0-9a-z]*\.)+[a-z0-9]{2,17}))$", 
              RegexOptions.IgnoreCase);
   }

   private string DomainMapper(Match match)
   {
      // IdnMapping class with default property values.
      IdnMapping idn = new IdnMapping();

      string domainName = match.Groups[2].Value;
      try {
         domainName = idn.GetAscii(domainName);
      }
      catch (ArgumentException) {
         invalid = true;      
      }      
      return match.Groups[1].Value + domainName;
   }
}


In this example, the regular expression pattern ^(?("")(""[^""]+?""@)|(([0-9a-z]((\.(?!\.))|[-!#\$%&'\*\+/=\?\^`\{\}\|~\w])*)(?<=[0-9a-z])@))(?(\[)(\[(\d{1,3}\.){3}\d{1,3}\])|(([0-9a-z][-\w]*[0-9a-z]*\.)+[a-z0-9]{2,17}))$ is interpreted as shown in the following table. Note that the regular expression is compiled using the RegexOptions.IgnoreCase flag.

Pattern                                         Description
^                                               Begin the match at the start of the string.
(?("")                                          Determine whether the first character is a quotation mark. (?("") is the beginning of an alternation construct.
(?("")(""[^""]+?""@)                            If the first character is a quotation mark, match a beginning quotation mark followed by at least one occurrence of any character other than a quotation mark, followed by an ending quotation mark. The quoted string must be followed by an at sign (@).
|(([0-9a-z]                                     If the first character is not a quotation mark, match any alphabetic character from a to z or A to Z (the comparison is case-insensitive), or any numeric character from 0 to 9.
(\.(?!\.))                                      If the next character is a period, match it only if it is not followed by another period. (?!\.) is a zero-width negative lookahead assertion that prevents two consecutive periods from appearing in the local part of an email address.
|[-!#\$%&'\*\+/=\?\^`\{\}\|~\w]                 If the next character is not a period, match any word character or one of the following characters: -!#$%&'*+/=?^`{}|~
((\.(?!\.))|[-!#\$%&'\*\+/=\?\^`\{\}\|~\w])*    Match the alternation pattern (a period followed by a non-period, or one of a number of characters) zero or more times.
(?<=[0-9a-z])                                   Continue the match if the character that precedes the @ character is A through Z, a through z, or 0 through 9. The (?<=[0-9a-z]) construct defines a zero-width positive lookbehind assertion.
@                                               Match the @ character.
(?(\[)                                          Check whether the character that follows @ is an opening bracket.
(\[(\d{1,3}\.){3}\d{1,3}\])                     If it is an opening bracket, match the opening bracket followed by an IP address (four sets of one to three digits, with each set separated by a period) and a closing bracket.
|(([0-9a-z][-\w]*[0-9a-z]*\.)+[a-z0-9]{2,17})   If the character that follows @ is not an opening bracket, match one alphanumeric character with a value of A-Z, a-z, or 0-9, followed by zero or more occurrences of a word character or a hyphen, followed by zero or more alphanumeric characters with a value of A-Z, a-z, or 0-9, followed by a period. This pattern can be repeated one or more times, and must be followed by two to seventeen alphanumeric (a-z, A-Z, 0-9) characters. This portion of the regular expression is designed to capture the domain name.
$                                               End the match at the end of the string.
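
The conditional alternation construct, (?(expression)yes|no), appears twice in the pattern and may be easier to follow on a much smaller input. The following sketch is illustrative only: the ConditionalSketch class and its deliberately simplified pattern are assumptions and are not part of the email pattern itself.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ConditionalSketch
{
    // If the input starts with "[", require "[digits]"; otherwise require letters only.
    private const string Pattern = @"^(?(\[)\[\d+\]|[a-z]+)$";

    public static bool Matches(string input)
    {
        return Regex.IsMatch(input, Pattern, RegexOptions.IgnoreCase);
    }

    public static void Main()
    {
        Console.WriteLine(Matches("[123]"));  // True:  bracket seen, "yes" branch matches
        Console.WriteLine(Matches("Host"));   // True:  no bracket, "no" branch matches
        Console.WriteLine(Matches("[host"));  // False: bracket seen, but "yes" branch fails
        Console.WriteLine(Matches("123"));    // False: no bracket, and digits are not letters
    }
}
```

The test expression (\[) is evaluated as a zero-width assertion, so the opening bracket still has to be consumed by the branch that follows it.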

The IsValidEmail and DomainMapper methods can be included in a library of regular expression utility methods, or they can be included as private static or instance methods in the application class. If they are used in a regular expression library, you can call them by using code such as the following:


public class Application
{
   public static void Main()
   {
      RegexUtilities util = new RegexUtilities();
      string[] emailAddresses = { "david.jones@proseware.com", "d.j@server1.proseware.com", 
                                  "jones@ms1.proseware.com", "j.@server1.proseware.com", 
                                  "j@proseware.com9", "js#internal@proseware.com", 
                                  "j_9@[129.126.118.1]", "j..s@proseware.com", 
                                  "js*@proseware.com", "js@proseware..com", 
                                  "js@proseware.com9", "j.s@server1.proseware.com" };

      foreach (var emailAddress in emailAddresses) {
         if (util.IsValidEmail(emailAddress))
            Console.WriteLine("Valid: {0}", emailAddress);
         else
            Console.WriteLine("Invalid: {0}", emailAddress);
      }                                            
   }
}
// The example displays the following output:
//       Valid: david.jones@proseware.com
//       Valid: d.j@server1.proseware.com
//       Valid: jones@ms1.proseware.com
//       Invalid: j.@server1.proseware.com
//       Valid: j@proseware.com9
//       Valid: js#internal@proseware.com
//       Valid: j_9@[129.126.118.1]
//       Invalid: j..s@proseware.com
//       Invalid: js*@proseware.com
//       Invalid: js@proseware..com
//       Valid: js@proseware.com9
//       Valid: j.s@server1.proseware.com


Date              History                                                                  Reason
December 2011     Added IDNA support.                                                      Customer feedback.
September 2011    Revised the regular expression to handle consecutive quotation marks.    Customer feedback.

Community Content
May not cover all valid domains (name@a.domain.com)

Unless I'm mistaken, the domain portion of this expression requires all parts to be at least 2 characters in length. The domain name of "a.domain.com" is valid but will not pass this RegEx.

Characters and Domain Names

No, you're not mistaken. "a.domain.com" is indeed valid but will be rejected by this regex. The regex also requires that the top-level domain name have no more than six characters. This also isn't a requirement, and some top-level domains (or their Punycode equivalents) do have more than six characters. We'll modify the regular expression to address this issue in the documentation refresh that is currently scheduled for late November. 

--Ron Petrusha
Common Language Runtime User Education
Microsoft Corporation

NOTE: This issue has been addressed. As of December 2011, the regular expression no longer requires that each element of a domain name have at least two characters.

Email Regex that DOES include unicode domains

I was wondering if anybody has found a solution that validates an email that includes unicode characters as in from a unicode domain? I have searched at length and have yet to find a solution that works.

Thanks in advance!

Handling Internationalized Domain Names

The easiest way to handle internationalized domain names (domain names with Unicode characters outside the ASCII range) is to use the IdnMapping class in the .NET Framework to convert the internationalized domain name to Punycode. We'll update the regular expression to support internationalized domain names for the documentation refresh in late November. We plan to update the first portion of the regular expression to validate email aliases that contain Unicode characters outside the ASCII character range sometime early next year.

--Ron Petrusha
Common Language Runtime User Education
Microsoft Corporation

NOTE: IDNA support (support for Unicode characters outside the US ASCII range) has been added to the regular expression as of December 2011.

Good example of how not to verify email address format

This would be valid if all domains and mailboxes were ASCII-encoded.

The problem is that non-ASCII (that is, Unicode) domain names, including top-level domain names, are here to stay. There are already quite a few such TLDs (e.g. Chinese or Arabic), and soon there will be many more. Right now, all we can say about email address format is that it consists of one or more Unicode characters, followed by an at sign (@), followed by one or more Unicode characters, a dot (the domain label separator), and then either another set of labels or the top-level domain. And of course the top-level domain could be one or more Unicode characters (right now it seems that it is at least two, but I would not be so sure about the future).

I18n is sometimes a pain in the back.

Unicode and Domain Names 

This regular expression indeed does not support non-ASCII characters in domain names, and they are certainly becoming more and more common. We will modify this example (using the IdnMapping class as part of the IsValidEmail method) to support Unicode domain names outside the ASCII character range. See my response to the "Email Regex that DOES include unicode domains" post.

By the way, thanks to everyone for pointing out the lack of Unicode support. Without your feedback, we would have overlooked a rather glaring limitation.

--Ron Petrusha
Common Language Runtime User Education
Microsoft Corporation

NOTE: IDNA support (support for Unicode characters outside the US ASCII range) has been added to the regular expression as of December 2011.

Why Doesn't System.Net.MailMessage Support This?

Why not expose this (or the exact analogous logic used internally by System.Net.MailMessage's constructor) as a public (static?) method of System.Net.MailMessage? (E.g., System.Net.MailMessage.IsValidEmailAddress(string).) That way, (1) we know that we're getting the exact same logic that System.Net.MailMessage uses to accept or reject email address strings, and (2) we don't have to depend on exception-throwing to do so.

Think Int32.TryParse, but for email address formatted strings.

System.Net.MailMessage and Regular Expressions

If you want to validate an email address using classes in the System.Net namespace, the best choice is probably the MailAddress class rather than the MailMessage class. The MailAddress class specifically represents an email address, and its constructor throws a FormatException if the parameter that represents an email address is invalid. However, it does not rely on regular expressions to validate an email address.

The value of this regex (despite limitations that developers have continually pointed out and that we have tried to address) is that it offers an alternative if you choose not to use the MailAddress for one reason or another (such as that throwing an exception is too expensive for your application). And it offers a fairly complex regex pattern that we've tried to document and explain in detail for developers who would like to learn more about building complex regular expression patterns. 

--Ron Petrusha
Common Language Runtime User Education
Microsoft Corporation

Emails with underscore get rejected

Example: test_@domain.com

Verify Email address sample recoded in PowerShell
<#
.SYNOPSIS
This script validates email addresses based on
MSFT published Regular Expression. This is a
re-write with PowerShell of an existing bit of
MSDN sample code
.DESCRIPTION
This script first creates a function to validate
an email address. It uses a large regex that is
documented at the MSDN page noted below. The script
then creates an array of email addresses and then
validates them against the function and displays
the results.
.NOTES
File Name : Confirm-ValidEmailAddress.ps1
Author : Thomas Lee - tfl@psp.co.uk
Requires : PowerShell Version 2.0
.LINK
This script posted to:
http://www.pshscripts.blogspot.com
MSDN sample posted to:
http://msdn.microsoft.com/en-us/library/01escwtf.aspx
.EXAMPLE
Valid: david.jones@proseware.com
Valid: d.j@server1.proseware.com
Valid: jones@ms1.proseware.com
Invalid: j.@server1.proseware.com
Invalid: j@proseware.com9
Valid: js#internal@proseware.com
Valid: j_9@[129.126.118.1]
Invalid: j..s@proseware.com
Invalid: js*@proseware.com
Invalid: js@proseware..com
Invalid: js@proseware.com9
Valid: j.s@server1.proseware.com
Valid: tfl@psp.co.uk
Valid: cuddly.penguin@cookham.net
#>

Function IsValidEmail {
Param ([string] $In)
# Returns true if In is in valid e-mail format.
[system.Text.RegularExpressions.Regex]::IsMatch($In,
"^(?("")(""[^""]+?""@)|(([0-9a-zA-Z]((\.(?!\.))|[-!#\$%&'\*\+/=\?\^`\{\}\|~\w])*)(?<=[0-9a-zA-Z])@))" +
"(?(\[)(\[(\d{1,3}\.){3}\d{1,3}\])|(([0-9a-zA-Z][-\w]*[0-9a-zA-Z]\.)+[a-zA-Z]{2,6}))$");
} # End of IsValidEmail

[string[]] $emailAddresses = "david.jones@proseware.com", "d.j@server1.proseware.com",
"jones@ms1.proseware.com", "j.@server1.proseware.com",
"j@proseware.com9", "js#internal@proseware.com",
"j_9@[129.126.118.1]", "j..s@proseware.com",
"js*@proseware.com", "js@proseware..com",
"js@proseware.com9", "j.s@server1.proseware.com",
"tfl@psp.co.uk", "cuddly.penguin@cookham.net"

ForEach ($emailAddress in $emailAddresses) {
if (IsValidEmail($emailAddress)) {
"Valid: {0}" -f $emailAddress
}
else {
"Invalid: {0}" -f $emailAddress
}
}

One easy solution

Because people often land on this page just to check email addresses, not necessarily to look for RegEx samples, I would like to share the shortest solution:

try
{
   var addr = new System.Net.Mail.MailAddress(email);
   // Valid address
}
catch
{
   // The address is invalid
}

I reckon it wouldn't be any slower, and it will definitely be a more robust solution. I used Reflector to look into the code and found that you guys implemented gentle parsing without regular expressions. Maybe this is the key to meeting the RFCs (RFC 822, 2821, 2822, 3696, etc.) :)

Better ASCII Validator

I found a better validator posted here: http://www.rhyous.com/2010/06/15/regular-expressions-in-cincluding-a-new-comprehensive-email-pattern/. It allows a single character subdomain.

- Paul.