A character class defines a set of characters. Character class subtraction yields a set of characters that is the result of excluding the characters in one character class from another character class.
A character class subtraction expression has the following form:
[base_group-[excluded_group]]
The square brackets ([]) and hyphen (-) are mandatory. The base_group is a positive or negative character group as described in the Character Class Syntax table. The excluded_group component is another positive or negative character group, or another character class subtraction expression (that is, you can nest character class subtraction expressions).
For example, suppose you have a base group that consists of the character range from "a" through "z". To define the set of characters that consists of the base group except for the character "m", use [a-z-[m]]. To define the set of characters that consists of the base group except for the set of characters "d", "j", and "p", use [a-z-[djp]]. To define the set of characters that consists of the base group except for the character range from "m" through "p", use [a-z-[m-p]].
Consider the nested character class subtraction expression, [a-z-[d-w-[m-o]]]. The expression is evaluated from the innermost character range outward. First, the character range from "m" through "o" is subtracted from the character range "d" through "w", which yields the set of characters from "d" through "l" and "p" through "w". That set is then subtracted from the character range from "a" through "z", which yields the set of characters [abcmnoxyz].
You can use any character class with character class subtraction. To define the set of characters that consists of all Unicode characters from \u0000 through \uFFFF except white-space characters (\s), the characters in the punctuation general category (\p{P}), the characters in the IsGreek named block (\p{IsGreek}), and the Unicode NEXT LINE control character (\x85), use [\u0000-\uFFFF-[\s\p{P}\p{IsGreek}\x85]].
Choose character classes for a character class subtraction expression that will yield useful results. Avoid an expression that yields an empty set of characters, which cannot match anything, or an expression that is equivalent to the original base group. For example, the empty set is the result of the expression [\p{IsBasicLatin}-[\x00-\x7F]], which subtracts all characters in the IsBasicLatin character range from the IsBasicLatin general category. Similarly, the original base group is the result of the expression [a-z-[0-9]]. This is because the base group, which is the character range of letters from "a" through "z", does not contain any characters in the excluded group, which is the character range of decimal digits from "0" through "9".
The following example defines a regular expression, ^[0-9-[2468]]+$, that matches zero and odd numbers in an input string. The regular expression is interpreted as shown in the following table.
Element | Description |
|---|
^ | Begin the match at the start of the input string. |
[0-9-[2468]]+ | Match one or more occurrences of any character from 0 to 9 except for 2, 4, 6, and 8. In other words, match one or more occurrences of zero or an odd number. |
$ | End the match at the end of the input string. |
Imports System.Text.RegularExpressions
Module Example
Public Sub Main()
Dim inputs() As String = { "123", "13579753", "3557798", "335599901" }
Dim pattern As String = "^[0-9-[2468]]+$"
For Each input As String In inputs
Dim match As Match = Regex.Match(input, pattern)
If match.Success Then Console.WriteLine(match.Value)
Next
End Sub
End Module
' The example displays the following output:
' 13579753
' 335599901
using System;
using System.Text.RegularExpressions;
public class Example
{
public static void Main()
{
string[] inputs = { "123", "13579753", "3557798", "335599901" };
string pattern = @"^[0-9-[2468]]+$";
foreach (string input in inputs)
{
Match match = Regex.Match(input, pattern);
if (match.Success)
Console.WriteLine(match.Value);
}
}
}
// The example displays the following output:
// 13579753
// 335599901
Back to Top