
Updated: July 2010
Represents text as a series of Unicode characters.
Assembly: mscorlib (in mscorlib.dll)
A string is a sequential collection of Unicode characters that is used to represent text. A String object is a sequential collection of System.Char objects that represent a string. The value of the String object is the content of the sequential collection, and that value is immutable (that is, it is read-only). For more information about the immutability of strings, see the Immutability and the StringBuilder Class section in this topic.
This topic includes the following sections:
Instantiating a String Object
Char Objects and Unicode Characters
Strings and Embedded Null Characters
Strings and Indexes
Null Strings and Empty Strings
Immutability and the StringBuilder Class
Ordinal vs. Culture-Sensitive Operations
Normalization
String Operations by Category
Instantiating a String Object
You can instantiate a String object in the following ways:
By assigning a string literal to a String variable. This is the most commonly used method for creating a string. The following example uses assignment to create several strings. Note that in C#, because the backslash (\) is an escape character, literal backslashes in a string must be escaped or the entire string must be @-quoted.
Dim string1 As String = "This is a string created by assignment."
Console.WriteLine(string1)
Dim string2 As String = "The path is C:\PublicDocuments\Report1.doc"
Console.WriteLine(string2)
' The example displays the following output:
'       This is a string created by assignment.
'       The path is C:\PublicDocuments\Report1.doc
By calling a String class constructor. The following example instantiates strings by calling several class constructors. Note that some of the constructors include pointers to character arrays or signed byte arrays as parameters. Visual Basic does not support calls to these constructors.
Dim chars() As Char = { "w"c, "o"c, "r"c, "d"c }

' Create a string from a character array.
Dim string1 As New String(chars)
Console.WriteLine(string1)

' Create a string that consists of a character repeated 20 times.
Dim string2 As New String("c"c, 20)
Console.WriteLine(string2)
' The example displays the following output:
'       word
'       cccccccccccccccccccc
By using the string concatenation operator (+ in C# and & or + in Visual Basic) to create a single string from any combination of String instances and string literals. The following example illustrates the use of the string concatenation operator.
Dim string1 As String = "Today is " + Date.Now.ToString("D") + "."
Console.WriteLine(string1)

Dim string2 As String = "This is one sentence. " + "This is a second. "
string2 += "This is a third sentence."
Console.WriteLine(string2)
' The example displays output like the following:
'       Today is Wednesday, July 06, 2011.
'       This is one sentence. This is a second. This is a third sentence.
By retrieving a property or calling a method that returns a string. The following example uses the methods of the String class to extract a substring from a larger string.
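A minimal sketch of this approach follows. It assumes the text of interest is enclosed in parentheses; the ExtractParenthesized helper and the sample sentence are hypothetical and are used only for illustration. The sketch calls IndexOf to locate the delimiters and Substring to return the text between them.

```vb
Imports System

Module SubstringExample
   ' Hypothetical helper: returns the text between the first "(" and the
   ' next ")", or an empty string if either delimiter is missing.
   Public Function ExtractParenthesized(text As String) As String
      Dim startPos As Integer = text.IndexOf("("c)
      If startPos < 0 Then Return String.Empty

      Dim endPos As Integer = text.IndexOf(")"c, startPos + 1)
      If endPos < 0 Then Return String.Empty

      ' Substring returns a new string; the original is not changed.
      Return text.Substring(startPos + 1, endPos - startPos - 1)
   End Function

   Public Sub Main()
      Dim sentence As String = "The String class (in mscorlib.dll) represents text."
      Console.WriteLine(ExtractParenthesized(sentence))
   End Sub
End Module
' The example displays the following output:
'       in mscorlib.dll
```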
By calling a formatting method to convert a value or object to its string representation. The following example uses the Composite Formatting feature to embed the string representation of two objects into a string.
Dim dateAndTime As DateTime = #07/06/2011 7:32:00AM#
Dim temperature As Double = 68.3
Dim result As String = String.Format("At {0:t} on {0:D}, the temperature was {1:F1} degrees Fahrenheit.",
                                     dateAndTime, temperature)
Console.WriteLine(result)
' The example displays the following output:
'       At 7:32 AM on Wednesday, July 06, 2011, the temperature was 68.3 degrees Fahrenheit.
Char Objects and Unicode Characters
Each character in a string is defined by a Unicode scalar value, also called a Unicode code point or the ordinal (numeric) value of the Unicode character. Each code point is encoded by using UTF-16 encoding, and the numeric value of each element of the encoding is represented by a Char object.
A single Char object usually represents a single code point; that is, the numeric value of the Char equals the code point. For example, the code point for the character "a" is U+0061. However, a code point might require more than one encoded element (more than one Char object). The Unicode standard defines three types of characters that correspond to multiple Char objects: graphemes, Unicode supplementary code points, and characters in the supplementary planes.
A grapheme is represented by a base character followed by one or more combining characters. For example, the character ä is represented by a Char object whose code point is U+0061 followed by a Char object whose code point is U+0308. This character can also be defined by a single Char object that has a code point of U+00E4. As the following example shows, a culture-sensitive comparison for equality indicates that these two representations are equal, although an ordinary ordinal comparison does not. However, if the two strings are normalized, an ordinal comparison also indicates that they are equal. (For more information on normalizing strings, see the Normalization section.)
Imports System.Globalization
Imports System.IO

Module Example
   Public Sub Main()
      Dim sw As New StreamWriter(".\graphemes.txt")
      Dim grapheme As String = ChrW(&H0061) + ChrW(&h0308)
      sw.WriteLine(grapheme)

      Dim singleChar As String = ChrW(&h00e4)
      sw.WriteLine(singleChar)

      sw.WriteLine("{0} = {1} (Culture-sensitive): {2}", grapheme, singleChar,
                   String.Equals(grapheme, singleChar,
                                 StringComparison.CurrentCulture))
      sw.WriteLine("{0} = {1} (Ordinal): {2}", grapheme, singleChar,
                   String.Equals(grapheme, singleChar,
                                 StringComparison.Ordinal))
      sw.WriteLine("{0} = {1} (Normalized Ordinal): {2}", grapheme, singleChar,
                   String.Equals(grapheme.Normalize(),
                                 singleChar.Normalize(),
                                 StringComparison.Ordinal))
      sw.Close()
   End Sub
End Module
' The example produces the following output:
'       ä
'       ä
'       ä = ä (Culture-sensitive): True
'       ä = ä (Ordinal): False
'       ä = ä (Normalized Ordinal): True
A Unicode supplementary code point (a surrogate pair) is represented by a Char object whose code point is a high surrogate followed by a Char object whose code point is a low surrogate. The code points of high surrogates range from U+D800 to U+DBFF. The code points of low surrogates range from U+DC00 to U+DFFF. Surrogate pairs are used to represent characters in the 16 Unicode supplementary planes. The following example creates a surrogate character and passes it to the Char.IsSurrogatePair(Char, Char) method to determine whether it is a surrogate pair.
Module Example
   Public Sub Main()
      Dim surrogate As String = ChrW(&hD800) + ChrW(&hDC03)
      For ctr As Integer = 0 To surrogate.Length - 1
         Console.Write("U+{0:X2} ", Convert.ToUInt16(surrogate(ctr)))
      Next
      Console.WriteLine()
      Console.WriteLine(" Is Surrogate Pair: {0}",
                        Char.IsSurrogatePair(surrogate(0), surrogate(1)))
   End Sub
End Module
' The example displays the following output:
'       U+D800 U+DC03
'        Is Surrogate Pair: True
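The relationship between a supplementary code point and its surrogate pair can also be sketched by using the Char.ConvertFromUtf32 and Char.ConvertToUtf32 methods. The CharCount helper is hypothetical, and the code point U+10003 is chosen only because it corresponds to the surrogate pair used in the preceding example.

```vb
Imports System

Module SupplementaryExample
   ' Hypothetical helper: returns the number of Char objects that UTF-16
   ' needs to encode a single code point.
   Public Function CharCount(codePoint As Integer) As Integer
      Return Char.ConvertFromUtf32(codePoint).Length
   End Function

   Public Sub Main()
      ' U+10003 lies outside the Basic Multilingual Plane.
      Dim s As String = Char.ConvertFromUtf32(&H10003)
      Console.WriteLine("Char objects:   {0}", s.Length)
      Console.WriteLine("High surrogate: U+{0:X4}", Convert.ToUInt16(s(0)))
      Console.WriteLine("Low surrogate:  U+{0:X4}", Convert.ToUInt16(s(1)))
      ' Combine the pair back into a single code point.
      Console.WriteLine("Code point:     U+{0:X}", Char.ConvertToUtf32(s, 0))
   End Sub
End Module
' The example displays the following output:
'       Char objects:   2
'       High surrogate: U+D800
'       Low surrogate:  U+DC03
'       Code point:     U+10003
```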
Strings and Embedded Null Characters
In the .NET Framework, a String object can include embedded null characters, which count as a part of the string's length. However, in some languages such as C and C++, a null character indicates the end of a string; it is not considered a part of the string and is not counted as part of the string's length. This means that the following common assumptions that C and C++ programmers or libraries written in C or C++ might make about strings are not necessarily valid when applied to String objects:
The value returned by the strlen or wcslen functions does not necessarily equal String.Length.
The string created by the strcpy_s or wcscpy_s functions is not necessarily identical to the string created by the String.Copy method.
You should ensure that native C and C++ code that instantiates String objects, and code that is passed String objects through platform invoke, do not assume that an embedded null character marks the end of the string.
Embedded null characters in a string are also treated differently when a string is sorted (or compared) and when a string is searched. Null characters are ignored when performing culture-sensitive comparisons between two strings, including comparisons using the invariant culture. They are considered only for ordinal or case-insensitive ordinal comparisons. On the other hand, embedded null characters are always considered when searching a string with methods such as Contains, StartsWith, and IndexOf.
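The following sketch illustrates how an embedded null character affects length, ordinal comparison, and searching. The WithEmbeddedNull helper and the sample value are hypothetical. The last line prints the result of a culture-sensitive comparison without stating an expected value, because that result depends on the comparison rules of the runtime in use.

```vb
Imports System
Imports Microsoft.VisualBasic

Module EmbeddedNullExample
   ' Hypothetical helper: builds "abc" and "def" separated by a null character.
   Public Function WithEmbeddedNull() As String
      Return "abc" & ChrW(0) & "def"
   End Function

   Public Sub Main()
      Dim s As String = WithEmbeddedNull()

      ' The null character counts toward the length: 7, not 6.
      Console.WriteLine("Length: {0}", s.Length)

      ' An ordinal comparison considers the null character: False.
      Console.WriteLine("Ordinal equality: {0}",
                        String.Equals(s, "abcdef", StringComparison.Ordinal))

      ' Searching for a Char is ordinal and finds the null character at index 3.
      Console.WriteLine("Index of null: {0}", s.IndexOf(ChrW(0)))

      ' Culture-sensitive comparison; no expected value is claimed here.
      Console.WriteLine("Invariant culture equality: {0}",
                        String.Compare(s, "abcdef", StringComparison.InvariantCulture) = 0)
   End Sub
End Module
```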
Strings and Indexes
An index is the position of a Char object (not a Unicode character) in a String. An index is a zero-based, nonnegative number that starts from the first position in the string, which is index position zero. A number of search methods, such as IndexOf and LastIndexOf, return the index of a character or substring in the string instance.
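As a brief sketch of zero-based indexes, the following code reports the positions that IndexOf and LastIndexOf return for a sample string. The FindFirstAndLast helper and the sample values are hypothetical; the searches specify StringComparison.Ordinal so that the results do not depend on the current culture.

```vb
Imports System

Module IndexExample
   ' Hypothetical helper: returns the first and last index of value in text.
   Public Function FindFirstAndLast(text As String, value As String) As Integer()
      Return New Integer() { text.IndexOf(value, StringComparison.Ordinal),
                             text.LastIndexOf(value, StringComparison.Ordinal) }
   End Function

   Public Sub Main()
      Dim positions() As Integer = FindFirstAndLast("banana", "an")
      Console.WriteLine("First: {0}", positions(0))
      Console.WriteLine("Last:  {0}", positions(1))
      ' A character that does not occur yields -1.
      Console.WriteLine("Missing: {0}", "banana".IndexOf("x"c))
   End Sub
End Module
' The example displays the following output:
'       First: 1
'       Last:  3
'       Missing: -1
```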
The Chars property lets you access individual Char objects by their index position in the string. Because the Chars property is the default property (in Visual Basic) or the indexer (in C#), you can access the individual Char objects in a string by using code such as the following. This code looks for white space or punctuation characters in a string to determine how many words the string contains.
Module Example
   Public Sub Main()
      Dim s1 As String = "This string consists of a single short sentence."
      Dim nWords As Integer = 0

      s1 = s1.Trim()
      For ctr As Integer = 0 To s1.Length - 1
         If Char.IsPunctuation(s1(ctr)) Or Char.IsWhiteSpace(s1(ctr)) Then
            nWords += 1
         End If
      Next
      Console.WriteLine("The sentence{2} {0}{2}has {1} words.",
                        s1, nWords, vbCrLf)
   End Sub
End Module
' The example displays the following output:
'       The sentence
'        This string consists of a single short sentence.
'       has 8 words.
Because the String class implements the IEnumerable interface, you can also iterate through the Char objects in a string by using a foreach construct, as the following example shows.
Module Example
   Public Sub Main()
      Dim s1 As String = "This string consists of a single short sentence."
      Dim nWords As Integer = 0

      s1 = s1.Trim()
      For Each ch In s1
         If Char.IsPunctuation(ch) Or Char.IsWhiteSpace(ch) Then
            nWords += 1
         End If
      Next
      Console.WriteLine("The sentence{2} {0}{2}has {1} words.",
                        s1, nWords, vbCrLf)
   End Sub
End Module
' The example displays the following output:
'       The sentence
'        This string consists of a single short sentence.
'       has 8 words.
Consecutive index values might not correspond to consecutive Unicode characters, because a Unicode character might be encoded as more than one Char object. To work with Unicode characters instead of Char objects, use the System.Globalization.StringInfo and TextElementEnumerator classes. The following example illustrates the difference between code that works with Char objects and code that works with Unicode characters. It compares the number of characters or text elements in each word of a sentence. The string includes two sequences of a base character followed by a combining character.
Imports System.Collections.Generic
Imports System.Globalization

Module Example
   Public Sub Main()
      ' First sentence of The Mystery of the Yellow Room, by Leroux.
      Dim opening As String = "Ce n'est pas sans une certaine émotion que " +
                              "je commence à raconter ici les aventures " +
                              "extraordinaires de Joseph Rouletabille."
      ' Character counters.
      Dim nChars As Integer = 0
      ' Objects to store word count.
      Dim chars As New List(Of Integer)()
      Dim elements As New List(Of Integer)()

      For Each ch In opening
         ' Skip the ' character.
         If ch = ChrW(&h0027) Then Continue For

         If Char.IsWhiteSpace(ch) Or Char.IsPunctuation(ch) Then
            chars.Add(nChars)
            nChars = 0
         Else
            nChars += 1
         End If
      Next

      Dim te As TextElementEnumerator = StringInfo.GetTextElementEnumerator(opening)
      Do While te.MoveNext()
         Dim s As String = te.GetTextElement()
         ' Skip the ' character.
         If s = ChrW(&h0027) Then Continue Do

         If String.IsNullOrEmpty(s.Trim()) Or
            (s.Length = 1 AndAlso Char.IsPunctuation(Convert.ToChar(s))) Then
            elements.Add(nChars)
            nChars = 0
         Else
            nChars += 1
         End If
      Loop

      ' Display character counts.
      Console.WriteLine("{0,6} {1,20} {2,20}",
                        "Word #", "Char Objects", "Characters")
      For ctr As Integer = 0 To chars.Count - 1
         Console.WriteLine("{0,6} {1,20} {2,20}",
                           ctr, chars(ctr), elements(ctr))
      Next
   End Sub
End Module
' The example displays the following output:
'    Word #         Char Objects           Characters
'         0                    2                    2
'         1                    4                    4
'         2                    3                    3
'         3                    4                    4
'         4                    3                    3
'         5                    8                    8
'         6                    8                    7
'         7                    3                    3
'         8                    2                    2
'         9                    8                    8
'        10                    2                    1
'        11                    8                    8
'        12                    3                    3
'        13                    3                    3
'        14                    9                    9
'        15                   15                   15
'        16                    2                    2
'        17                    6                    6
'        18                   12                   12
Null Strings and Empty Strings
A string that has been declared but has not been assigned a value is Nothing. Attempting to call methods on that string throws a NullReferenceException. A null string is different from an empty string, which is a string whose value is "" or String.Empty. In some cases, passing either a null string or an empty string as an argument in a method call throws an exception. For example, passing a null string to the Int32.Parse method throws an ArgumentNullException, and passing an empty string throws a FormatException. In other cases, a method argument can be either a null string or an empty string. For example, if you are providing an IFormattable implementation for a class, you want to equate both a null string and an empty string with the general ("G") format specifier.
The String class includes the following two convenience methods that enable you to test whether a string is Nothing or empty:
IsNullOrEmpty, which indicates whether a string is either Nothing or is equal to String.Empty. This method eliminates the need to use code such as the following:
IsNullOrWhiteSpace, which indicates whether a string is Nothing, equals String.Empty, or consists exclusively of white-space characters. This method eliminates the need to use code such as the following:
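The kind of hand-written test that these two methods replace can be sketched as follows. The IsNothingOrEmpty and IsNothingOrBlank helpers are hypothetical stand-ins written for this comparison; the example prints their results next to the results of the built-in methods for four sample values.

```vb
Imports System

Module NullOrEmptyExample
   ' Hypothetical hand-written test that String.IsNullOrEmpty replaces.
   Public Function IsNothingOrEmpty(s As String) As Boolean
      Return s Is Nothing OrElse s.Length = 0
   End Function

   ' Hypothetical hand-written test that String.IsNullOrWhiteSpace replaces.
   Public Function IsNothingOrBlank(s As String) As Boolean
      Return s Is Nothing OrElse s.Trim().Length = 0
   End Function

   Public Sub Main()
      ' Samples: a null string, an empty string, white space only, and text.
      Dim samples() As String = { Nothing, "", "   ", "text" }
      For Each s As String In samples
         Console.WriteLine("{0,-6} {1,-6} {2,-6} {3,-6}",
                           IsNothingOrEmpty(s), String.IsNullOrEmpty(s),
                           IsNothingOrBlank(s), String.IsNullOrWhiteSpace(s))
      Next
   End Sub
End Module
' The example displays the following output:
'       True   True   True   True
'       True   True   True   True
'       False  False  True   True
'       False  False  False  False
```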
The following example uses the IsNullOrEmpty method in the IFormattable.ToString implementation of a custom Temperature class. The method supports the "G", "C", "F", and "K" format strings. If an empty format string or a format string whose value is Nothing is passed to the method, its value is changed to the "G" format string.
Public Overloads Function ToString(fmt As String, provider As IFormatProvider) As String _
                Implements IFormattable.ToString
   ' Treat a null or empty format string as the general ("G") format.
   If String.IsNullOrEmpty(fmt) Then fmt = "G"
   If provider Is Nothing Then provider = CultureInfo.CurrentCulture

   Select Case fmt.ToUpperInvariant()
      ' Return degrees in Celsius.
      Case "G", "C"
         Return temp.ToString("F2", provider) + "°C"
      ' Return degrees in Fahrenheit.
      Case "F"
         Return (temp * 9 / 5 + 32).ToString("F2", provider) + "°F"
      ' Return degrees in Kelvin, formatted with the same provider as the other cases.
      Case "K"
         Return (temp + 273.15).ToString("F2", provider) + "K"
      Case Else
         Throw New FormatException(
               String.Format("The {0} format string is not supported.", fmt))
   End Select
End Function
Immutability and the StringBuilder Class
A String object is called immutable (read-only), because its value cannot be modified after it has been created. Methods that appear to modify a String object actually return a new String object that contains the modification.
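As a short sketch of this behavior, the following code calls two methods that appear to modify a string and then shows that the original value is unchanged. The RemoveAll helper and the sample value are hypothetical.

```vb
Imports System

Module ImmutabilityExample
   ' Hypothetical helper: returns a new string with every occurrence of
   ' fragment removed. The source string itself is never changed.
   Public Function RemoveAll(source As String, fragment As String) As String
      Return source.Replace(fragment, String.Empty)
   End Function

   Public Sub Main()
      Dim original As String = "immutable"
      Dim upper As String = original.ToUpperInvariant()
      Dim shortened As String = RemoveAll(original, "im")

      Console.WriteLine(original)
      Console.WriteLine(upper)
      Console.WriteLine(shortened)
      ' The uppercase result is a different object from the original.
      Console.WriteLine(Object.ReferenceEquals(original, upper))
   End Sub
End Module
' The example displays the following output:
'       immutable
'       IMMUTABLE
'       mutable
'       False
```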
Because strings are immutable, string manipulation routines that perform repeated additions or deletions to what appears to be a single string can exact a significant performance penalty. For example, the following code uses a random number generator to create a string with 1000 characters in the range 0x0001 to 0x052F. Although the code appears to use string concatenation to append a new character to the existing string named str, it actually creates a new String object for each concatenation operation.
Imports System.IO
Imports System.Text

Module Example
   Public Sub Main()
      Dim rnd As New Random()

      Dim str As String = String.Empty
      Dim sw As New StreamWriter(".\StringFile.txt",
                                 False, Encoding.Unicode)

      ' Append 1000 random characters; each += creates a new String object.
      For ctr As Integer = 1 To 1000
         str += ChrW(rnd.Next(1, &h0530))
         If str.Length Mod 60 = 0 Then str += vbCrLf
      Next
      sw.Write(str)
      sw.Close()
   End Sub
End Module
You can use the StringBuilder class instead of the String class for operations that make multiple changes to the value of a string. Unlike instances of the String class, StringBuilder objects are mutable; when you concatenate, append, or delete substrings from a string, the operations are performed on a single string. When you have finished modifying the value of a StringBuilder object, you can call its StringBuilder.ToString method to convert it to a string. The following example replaces the String used in the previous example to concatenate 1000 random characters in the range 0x0001 to 0x052F with a StringBuilder object.
Imports System.IO
Imports System.Text

Module Example
   Public Sub Main()
      Dim rnd As New Random()
      Dim sb As New StringBuilder()
      Dim sw As New StreamWriter(".\StringFile.txt",
                                 False, Encoding.Unicode)

      ' Append 1000 random characters to a single mutable buffer.
      For ctr As Integer = 1 To 1000
         sb.Append(ChrW(rnd.Next(1, &h0530)))
         If sb.Length Mod 60 = 0 Then sb.AppendLine()
      Next
      sw.Write(sb.ToString())
      sw.Close()
   End Sub
End Module
Ordinal vs. Culture-Sensitive Operations
Members of the String class perform either ordinal or culture-sensitive (linguistic) operations on a String object. An ordinal operation acts on the numeric value of each Char object. A culture-sensitive operation acts on the value of the String object, and takes culture-specific casing, sorting, formatting, and parsing rules into account. Culture-sensitive operations execute in the context of an explicitly declared culture or the implicit current culture. The two kinds of operations can produce very different results when they are performed on the same string.
Security Note |
|---|
If your application makes a security decision about a symbolic identifier such as a file name or named pipe, or about persisted data such as the text-based data in an XML file, the operation should use an ordinal comparison instead of a culture-sensitive comparison. This is because a culture-sensitive comparison can yield different results depending on the culture in effect, whereas an ordinal comparison depends solely on the binary value of the compared characters. |
Important |
|---|
Most methods that perform string operations include an overload that has a parameter of type StringComparison, which enables you to specify whether the method performs an ordinal or culture-sensitive operation. In general, you should call this overload to make the intent of your method call clear. For best practices and guidance for using ordinal and culture-sensitive operations on strings, see Best Practices for Using Strings in the .NET Framework. |
Operations for casing, parsing and formatting, comparison and sorting, and testing for equality can be either ordinal or culture-sensitive. The following sections discuss each category of operation.
Casing
Casing rules determine how to change the capitalization of a Unicode character; for example, from lowercase to uppercase. Often, a casing operation is performed before a string comparison. For example, a string might be converted to uppercase so that it can be compared with another uppercase string. You can convert the characters in a string to lowercase by calling the ToLower or ToLowerInvariant method, and you can convert them to uppercase by calling the ToUpper or ToUpperInvariant method. In addition, you can use the TextInfo.ToTitleCase method to convert a string to title case.
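The methods named above can be sketched in a few lines. The ToInvariantTitleCase helper and the sample strings are hypothetical; the invariant culture is used throughout so that the results do not depend on the current culture.

```vb
Imports System
Imports System.Globalization

Module CasingExample
   ' Hypothetical helper: converts a string to title case by using the
   ' casing rules of the invariant culture.
   Public Function ToInvariantTitleCase(value As String) As String
      Return CultureInfo.InvariantCulture.TextInfo.ToTitleCase(value)
   End Function

   Public Sub Main()
      Dim title As String = "war and peace"
      Console.WriteLine(title.ToUpperInvariant())
      Console.WriteLine("WAR".ToLowerInvariant())
      Console.WriteLine(ToInvariantTitleCase(title))
   End Sub
End Module
' The example displays the following output:
'       WAR AND PEACE
'       war
'       War And Peace
```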
Casing operations can be based on the rules of the current culture, a specified culture, or the invariant culture. The following example illustrates some of the differences in casing rules between cultures when converting strings to uppercase.
Imports System.Globalization
Imports System.IO

Module Example
   Public Sub Main()
      Dim sw As New StreamWriter(".\case.txt")
      Dim words As String() = { "file", "sıfır", "Dženana" }
      Dim cultures() As CultureInfo = { CultureInfo.InvariantCulture,
                                        New CultureInfo("en-US"),
                                        New CultureInfo("tr-TR") }

      For Each word In words
         sw.WriteLine("{0}:", word)
         For Each culture In cultures
            Dim name As String = If(String.IsNullOrEmpty(culture.Name),
                                    "Invariant", culture.Name)
            Dim upperWord As String = word.ToUpper(culture)
            sw.WriteLine(" {0,10}: {1,7} {2, 38}", name,
                         upperWord, ShowHexValue(upperWord))
         Next
         sw.WriteLine()
      Next
      sw.Close()
   End Sub

   ' Returns the UTF-16 code units of a string as hexadecimal byte pairs.
   Private Function ShowHexValue(s As String) As String
      Dim retval As String = Nothing
      For Each ch In s
         Dim bytes() As Byte = BitConverter.GetBytes(ch)
         retval += String.Format("{0:X2} {1:X2} ", bytes(1), bytes(0))
      Next
      Return retval
   End Function
End Module
' The example displays the following output:
'    file:
'        Invariant:    FILE               00 46 00 49 00 4C 00 45
'            en-US:    FILE               00 46 00 49 00 4C 00 45
'            tr-TR:    FİLE               00 46 01 30 00 4C 00 45
'
'    sıfır:
'        Invariant:   SıFıR         00 53 01 31 00 46 01 31 00 52
'            en-US:   SIFIR         00 53 00 49 00 46 00 49 00 52
'            tr-TR:   SIFIR         00 53 00 49 00 46 00 49 00 52
'
'    Dženana:
'        Invariant:  DžENANA   01 C5 00 45 00 4E 00 41 00 4E 00 41
'            en-US:  DŽENANA   01 C4 00 45 00 4E 00 41 00 4E 00 41
'            tr-TR:  DŽENANA   01 C4 00 45 00 4E 00 41 00 4E 00 41
Parsing and Formatting
Formatting and parsing are inverse operations. Formatting rules determine how to convert a value, such as a date and time or a number, to its string representation, whereas parsing rules determine how to convert a string representation to a value such as a date and time. Both formatting and parsing rules are dependent on cultural conventions. The following example illustrates the ambiguity that can arise when interpreting a culture-specific date string. Without knowing the conventions of the culture that was used to produce a date string, it is not possible to know whether 03/01/2011, 3/1/2011, and 01/03/2011 represent January 3, 2011 or March 1, 2011.
Imports System.Globalization

Module Example
   Public Sub Main()
      Dim dat As Date = #3/1/2011#
      Dim cultures() As CultureInfo = { CultureInfo.InvariantCulture,
                                        New CultureInfo("en-US"),
                                        New CultureInfo("fr-FR") }

      For Each culture In cultures
         Console.WriteLine("{0,-12} {1}", If(String.IsNullOrEmpty(culture.Name),
                           "Invariant", culture.Name),
                           dat.ToString("d", culture))
      Next
   End Sub
End Module
' The example displays the following output:
'       Invariant    03/01/2011
'       en-US        3/1/2011
'       fr-FR        01/03/2011
Similarly, as the following example shows, a single string can produce different dates depending on the culture whose conventions are used in the parsing operation.
Imports System.Globalization

Module Example
   Public Sub Main()
      Dim dateString As String = "07/10/2011"
      Dim cultures() As CultureInfo = { CultureInfo.InvariantCulture,
                                        CultureInfo.CreateSpecificCulture("en-GB"),
                                        CultureInfo.CreateSpecificCulture("en-US") }
      Console.WriteLine("{0,-12} {1,10} {2,8} {3,8}", "Date String", "Culture",
                        "Month", "Day")
      Console.WriteLine()
      For Each culture In cultures
         Dim dat As Date = DateTime.Parse(dateString, culture)
         Console.WriteLine("{0,-12} {1,10} {2,8} {3,8}", dateString,
                           If(String.IsNullOrEmpty(culture.Name),
                              "Invariant", culture.Name),
                           dat.Month, dat.Day)
      Next
   End Sub
End Module
' The example displays the following output:
'       Date String     Culture    Month      Day
'
'       07/10/2011    Invariant        7       10
'       07/10/2011        en-GB       10        7
'       07/10/2011        en-US        7       10
String Comparison and Sorting
Sort rules determine the alphabetic order of Unicode characters and how two strings compare to each other. For example, the String.Compare(String, String, StringComparison) method compares two strings based on the StringComparison parameter. If the parameter value is StringComparison.CurrentCulture, the method performs a linguistic comparison that uses the conventions of the current culture; if the parameter value is StringComparison.Ordinal, the method performs an ordinal comparison. Consequently, as the following example shows, if the current culture is U.S. English, the first call to the String.Compare(String, String, StringComparison) method (using culture-sensitive comparison) considers "a" less than "A", but the second call to the same method (using ordinal comparison) considers "a" greater than "A".
Imports System.Globalization
Imports System.Threading

Module Example
   Public Sub Main()
      Thread.CurrentThread.CurrentCulture = CultureInfo.CreateSpecificCulture("en-US")
      Console.WriteLine(String.Compare("A", "a", StringComparison.CurrentCulture))
      Console.WriteLine(String.Compare("A", "a", StringComparison.Ordinal))
   End Sub
End Module
' The example displays the following output:
'       1
'       -32
The .NET Framework supports word, string, and ordinal sort rules:
A word sort performs a culture-sensitive comparison of strings in which certain nonalphanumeric Unicode characters might have special weights assigned to them. For example, the hyphen (-) might have a very small weight assigned to it so that "coop" and "co-op" appear next to each other in a sorted list. For a list of the String methods that compare two strings using word sort rules, see the String Operations by Category section.
A string sort also performs a culture-sensitive comparison. It is similar to a word sort, except that there are no special cases, and all nonalphanumeric symbols come before all alphanumeric Unicode characters. Two strings can be compared using string sort rules by calling the CompareInfo.Compare method overloads that have an options parameter that is supplied a value of CompareOptions.StringSort. Note that this is the only method that the .NET Framework provides to compare two strings using string sort rules.
An ordinal sort compares strings based on the numeric value of each Char object in the string. An ordinal comparison is automatically case-sensitive because the lowercase and uppercase versions of a character have different code points. However, if case is not important, you can specify an ordinal comparison that ignores case. This is equivalent to converting the string to uppercase by using the invariant culture and then performing an ordinal comparison on the result. For a list of the String methods that compare two strings using ordinal sort rules, see the String Operations by Category section.
A culture-sensitive comparison is any comparison that explicitly or implicitly uses a CultureInfo object, including the invariant culture that is specified by the CultureInfo.InvariantCulture property. The implicit culture is the current culture, which is specified by the Thread.CurrentCulture and CultureInfo.CurrentCulture properties. A culture-sensitive comparison is generally appropriate for sorting, whereas an ordinal comparison is not. An ordinal comparison is generally appropriate for determining whether two strings are equal (that is, for determining identity), whereas a culture-sensitive comparison is not.
Note |
|---|
The culture-sensitive sorting and casing rules used in string comparison depend on the version of the .NET Framework. In the .NET Framework 4, sorting, casing, normalization, and Unicode character information is synchronized with Windows 7 and conforms to the Unicode 5.1 standard. |
For more information about word, string, and ordinal sort rules, see the System.Globalization.CompareOptions topic. For recommendations on when to use each rule, see Best Practices for Using Strings in the .NET Framework.
Ordinarily, you do not call string comparison methods such as Compare directly to determine the sort order of strings. Instead, comparison methods are called by sorting methods such as Array.Sort or List(Of T).Sort. The following example performs four different sorting operations (word sort using the current culture, word sort using the invariant culture, ordinal sort, and string sort using the current culture) without explicitly calling a string comparison method from the sorting code. Note that the word sort, the ordinal sort, and the string sort each produce a different ordering of the strings, whereas the two word sorts produce the same ordering for this data.
Imports System.Collections
Imports System.Collections.Generic
Imports System.Globalization

Module Example
   Public Sub Main()
      Dim strings() As String = { "coop", "co-op", "cooperative",
                                  "co" + ChrW(&h00AD) + "operative",
                                  "cœur", "coeur" }

      ' Perform a word sort using the current (en-US) culture.
      Dim current(strings.Length - 1) As String
      strings.CopyTo(current, 0)
      Array.Sort(current, StringComparer.CurrentCulture)

      ' Perform a word sort using the invariant culture.
      Dim invariant(strings.Length - 1) As String
      strings.CopyTo(invariant, 0)
      Array.Sort(invariant, StringComparer.InvariantCulture)

      ' Perform an ordinal sort.
      Dim ordinal(strings.Length - 1) As String
      strings.CopyTo(ordinal, 0)
      Array.Sort(ordinal, StringComparer.Ordinal)

      ' Perform a string sort using the current culture.
      Dim stringSort(strings.Length - 1) As String
      strings.CopyTo(stringSort, 0)
      Array.Sort(stringSort, New SCompare())

      ' Display array values
      Console.WriteLine("{0,13} {1,13} {2,15} {3,13} {4,13}",
                        "Original", "Word Sort", "Invariant Word",
                        "Ordinal Sort", "String Sort")
      Console.WriteLine()

      For ctr As Integer = 0 To strings.Length - 1
         Console.WriteLine("{0,13} {1,13} {2,15} {3,13} {4,13}",
                           strings(ctr), current(ctr), invariant(ctr),
                           ordinal(ctr), stringSort(ctr))
      Next
   End Sub
End Module

' IComparer<String> implementation to perform string sort.
Friend Class SCompare : Implements IComparer(Of String)
   Public Function Compare(x As String, y As String) As Integer _
                   Implements IComparer(Of String).Compare
      Return CultureInfo.CurrentCulture.CompareInfo.Compare(x, y, CompareOptions.StringSort)
   End Function
End Class
' The example displays the following output:
'         Original     Word Sort  Invariant Word  Ordinal Sort   String Sort
'
'             coop          cœur            cœur         co-op         co-op
'            co-op         coeur           coeur         coeur          cœur
'      cooperative          coop            coop          coop         coeur
'      cooperative         co-op           co-op   cooperative          coop
'             cœur   cooperative     cooperative   cooperative   cooperative
'            coeur   cooperative     cooperative          cœur   cooperative
Caution |
|---|
If your primary purpose in comparing strings is to determine whether they are equal, you should call the String.Equals method. Typically, you should use Equals to perform an ordinal comparison. The String.Compare method is intended primarily to sort strings. |
String search methods, such as String.StartsWith and String.IndexOf, also can perform culture-sensitive or ordinal string comparisons. The following example illustrates the differences between ordinal and culture-sensitive comparisons using the IndexOf method. A culture-sensitive search in which the current culture is English (United States) considers the substring "oe" to match the ligature "œ". Because a soft hyphen (U+00AD) is a zero-width character, the culture-sensitive search treats the soft hyphen as equivalent to Empty and finds a match at the beginning of the string. An ordinal search, on the other hand, does not find a match in either ligature case, and it finds the soft hyphen at its actual position in the string (index 2).
Module Example
   Public Sub Main()
      ' Search for "oe" and "œu" in "œufs" and "oeufs".
      Dim s1 As String = "œufs"
      Dim s2 As String = "oeufs"
      FindInString(s1, "oe", StringComparison.CurrentCulture)
      FindInString(s1, "oe", StringComparison.Ordinal)
      FindInString(s2, "œu", StringComparison.CurrentCulture)
      FindInString(s2, "œu", StringComparison.Ordinal)
      Console.WriteLine()

      Dim softHyphen As String = ChrW(&h00AD)
      Dim s3 As String = "co" + softHyphen + "operative"
      FindInString(s3, softHyphen, StringComparison.CurrentCulture)
      FindInString(s3, softHyphen, StringComparison.Ordinal)
   End Sub

   Private Sub FindInString(s As String, substring As String,
                            options As StringComparison)
      Dim result As Integer = s.IndexOf(substring, options)
      If result <> -1 Then
         Console.WriteLine("'{0}' found in {1} at position {2}",
                           substring, s, result)
      Else
         Console.WriteLine("'{0}' not found in {1}",
                           substring, s)
      End If
   End Sub
End Module
' The example displays the following output:
'       'oe' found in œufs at position 0
'       'oe' not found in œufs
'       'œu' found in oeufs at position 0
'       'œu' not found in oeufs
'
'       '' found in cooperative at position 0
'       '' found in cooperative at position 2
Testing for Equality
Use the String.Compare method to determine the relationship of two strings in the sort order. Typically, this is a culture-sensitive operation. In contrast, call the String.Equals method to test for equality. Because the test for equality usually compares user input with some known string, such as a valid user name, a password, or a file system path, it is typically an ordinal operation.
Caution |
|---|
It is possible to test for equality by calling the String.Compare method and determining whether the return value is zero. However, this practice is not recommended. To determine whether two strings are equal, you should call one of the overloads of the String.Equals method. The preferred overload to call is either the instance Equals(String, StringComparison) method or the static Equals(String, String, StringComparison) method, because both methods include a System.StringComparison parameter that explicitly specifies the type of comparison. |
The following example illustrates the danger of performing a culture-sensitive comparison for equality when an ordinal one should be used instead. In this case, the intent of the code is to prohibit file system access from URLs that begin with "FILE://" or "file://" by performing a case-insensitive comparison of the beginning of a URL with the string "FILE://". However, if a culture-sensitive comparison is performed using the Turkish (Turkey) culture on a URL that begins with "file://", the comparison for equality fails, because the Turkish uppercase equivalent of the lowercase "i" is "İ" instead of "I". As a result, file system access is inadvertently permitted. On the other hand, if an ordinal comparison is performed, the comparison for equality succeeds, and file system access is denied.
Imports System.Globalization
Imports System.Threading

Module Example
    Public Sub Main()
        Thread.CurrentThread.CurrentCulture = CultureInfo.CreateSpecificCulture("tr-TR")

        Dim filePath As String = "file://c:/notes.txt"

        Console.WriteLine("Culture-sensitive test for equality:")
        If Not TestForEquality(filePath, StringComparison.CurrentCultureIgnoreCase) Then
            Console.WriteLine("Access to {0} is allowed.", filePath)
        Else
            Console.WriteLine("Access to {0} is not allowed.", filePath)
        End If
        Console.WriteLine()

        Console.WriteLine("Ordinal test for equality:")
        If Not TestForEquality(filePath, StringComparison.OrdinalIgnoreCase) Then
            Console.WriteLine("Access to {0} is allowed.", filePath)
        Else
            Console.WriteLine("Access to {0} is not allowed.", filePath)
        End If
    End Sub

    Private Function TestForEquality(str As String, cmp As StringComparison) As Boolean
        Dim position As Integer = str.IndexOf("://")
        If position < 0 Then Return False

        Dim substring As String = str.Substring(0, position)
        Return substring.Equals("FILE", cmp)
    End Function
End Module
' The example displays the following output:
'       Culture-sensitive test for equality:
'       Access to file://c:/notes.txt is allowed.
'
'       Ordinal test for equality:
'       Access to file://c:/notes.txt is not allowed.
Normalization
Some Unicode characters have multiple representations. For example, any of the following code point sequences can represent the letter "ắ":
U+1EAF
U+0103 U+0301
U+0061 U+0306 U+0301
Multiple representations for a single character complicate searching, sorting, matching, and other string operations.
The Unicode standard defines a process called normalization that returns one binary representation of a Unicode character for any of its equivalent binary representations. Normalization can use several algorithms, called normalization forms, that follow different rules. The .NET Framework supports Unicode normalization forms C, D, KC, and KD. When strings have been normalized to the same normalization form, they can be compared by using ordinal comparison. For more information about normalization and normalization forms, see System.Text.NormalizationForm.
You can determine whether a string is normalized to normalization form C by calling the String.IsNormalized method, or you can call the String.IsNormalized(NormalizationForm) method to determine whether a string is normalized to a specified normalization form. You can also call the String.Normalize method to convert a string to normalization form C, or you can call the String.Normalize(NormalizationForm) method to convert a string to a specified normalization form.
The following example illustrates string normalization. It defines the letter “ố” in three different ways in three different strings, and uses an ordinal comparison for equality to determine that each string differs from the other two strings. It then converts each string to the supported normalization forms, and again performs an ordinal comparison of each string in a specified normalization form. In each case, the second test for equality shows that the strings are equal.
Imports System.Globalization
Imports System.IO
Imports System.Text

Module Example
    Private sw As StreamWriter

    Public Sub Main()
        sw = New StreamWriter(".\TestNorm1.txt")

        ' Define three versions of the same word.
        Dim s1 As String = "sống"        ' create word with U+1ED1
        Dim s2 As String = "s" + ChrW(&h00F4) + ChrW(&h0301) + "ng"
        Dim s3 As String = "so" + ChrW(&h0302) + ChrW(&h0301) + "ng"

        TestForEquality(s1, s2, s3)
        sw.WriteLine()

        ' Normalize and compare strings using each normalization form.
        For Each formName In [Enum].GetNames(GetType(NormalizationForm))
            sw.WriteLine("Normalization {0}:", formName)
            Dim nf As NormalizationForm = CType([Enum].Parse(GetType(NormalizationForm), formName),
                                                NormalizationForm)
            Dim sn() As String = NormalizeStrings(nf, s1, s2, s3)
            TestForEquality(sn)
            sw.WriteLine(vbCrLf)
        Next

        sw.Close()
    End Sub

    Private Sub TestForEquality(ParamArray words As String())
        For ctr As Integer = 0 To words.Length - 2
            For ctr2 As Integer = ctr + 1 To words.Length - 1
                sw.WriteLine("{0} ({1}) = {2} ({3}): {4}",
                             words(ctr), ShowBytes(words(ctr)),
                             words(ctr2), ShowBytes(words(ctr2)),
                             words(ctr).Equals(words(ctr2), StringComparison.Ordinal))
            Next
        Next
    End Sub

    Private Function ShowBytes(str As String) As String
        Dim result As String = Nothing
        For Each ch In str
            result += String.Format("{0} ", Convert.ToUInt16(ch).ToString("X4"))
        Next
        Return result.Trim()
    End Function

    Private Function NormalizeStrings(nf As NormalizationForm, ParamArray words() As String) As String()
        For ctr As Integer = 0 To words.Length - 1
            If Not words(ctr).IsNormalized(nf) Then
                words(ctr) = words(ctr).Normalize(nf)
            End If
        Next
        Return words
    End Function
End Module
' The example displays the following output:
'       sống (0073 1ED1 006E 0067) = sống (0073 00F4 0301 006E 0067): False
'       sống (0073 1ED1 006E 0067) = sống (0073 006F 0302 0301 006E 0067): False
'       sống (0073 00F4 0301 006E 0067) = sống (0073 006F 0302 0301 006E 0067): False
'
'       Normalization FormC:
'
'       sống (0073 1ED1 006E 0067) = sống (0073 1ED1 006E 0067): True
'       sống (0073 1ED1 006E 0067) = sống (0073 1ED1 006E 0067): True
'       sống (0073 1ED1 006E 0067) = sống (0073 1ED1 006E 0067): True
'
'
'       Normalization FormD:
'
'       sống (0073 006F 0302 0301 006E 0067) = sống (0073 006F 0302 0301 006E 0067): True
'       sống (0073 006F 0302 0301 006E 0067) = sống (0073 006F 0302 0301 006E 0067): True
'       sống (0073 006F 0302 0301 006E 0067) = sống (0073 006F 0302 0301 006E 0067): True
'
'
'       Normalization FormKC:
'
'       sống (0073 1ED1 006E 0067) = sống (0073 1ED1 006E 0067): True
'       sống (0073 1ED1 006E 0067) = sống (0073 1ED1 006E 0067): True
'       sống (0073 1ED1 006E 0067) = sống (0073 1ED1 006E 0067): True
'
'
'       Normalization FormKD:
'
'       sống (0073 006F 0302 0301 006E 0067) = sống (0073 006F 0302 0301 006E 0067): True
'       sống (0073 006F 0302 0301 006E 0067) = sống (0073 006F 0302 0301 006E 0067): True
'       sống (0073 006F 0302 0301 006E 0067) = sống (0073 006F 0302 0301 006E 0067): True
String Operations by Category
The String class provides members for comparing strings, testing strings for equality, finding characters or substrings in a string, modifying a string, extracting substrings from a string, combining strings, formatting values, copying a string, and normalizing a string.
Comparing Strings
You can compare strings to determine their relative position in the sort order by using the following String methods:
Compare returns an integer that indicates the relationship of one string to a second string in the sort order.
CompareOrdinal returns an integer that indicates the relationship of one string to a second string based on a comparison of their code points.
CompareTo returns an integer that indicates the relationship of the current string instance to a second string in the sort order. The CompareTo method provides the IComparable and IComparable(Of T) implementations for the String class.
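The three comparison methods listed above can be contrasted in a short sketch. The module name CompareSketch and the Describe helper are hypothetical, introduced only for illustration; the culture-sensitive call is pinned to the invariant culture (an assumption made here) so that the result does not vary with machine settings.

```vb
Imports System
Imports System.Globalization

Module CompareSketch
    ' Hypothetical helper: turns a comparison result into a readable relation.
    Public Function Describe(result As Integer) As String
        If result < 0 Then Return "precedes"
        If result > 0 Then Return "follows"
        Return "equals"
    End Function

    Public Sub Main()
        Dim first As String = "apple"
        Dim second As String = "Banana"

        ' Culture-sensitive comparison, pinned to the invariant culture so the
        ' result does not depend on the current thread's culture.
        Dim byCulture As Integer = String.Compare(first, second, False, CultureInfo.InvariantCulture)
        Console.WriteLine("Compare:        {0} {1} {2}", first, Describe(byCulture), second)

        ' Ordinal comparison: "B" (U+0042) has a lower code point than "a" (U+0061).
        Dim byCodePoint As Integer = String.CompareOrdinal(first, second)
        Console.WriteLine("CompareOrdinal: {0} {1} {2}", first, Describe(byCodePoint), second)

        ' CompareTo compares the current instance with another string.
        Console.WriteLine("CompareTo:      {0} {1} {0}", first, Describe(first.CompareTo(first)))
    End Sub
End Module
' The sketch is expected to display:
'       Compare:        apple precedes Banana
'       CompareOrdinal: apple follows Banana
'       CompareTo:      apple equals apple
```

The first two calls disagree because the sort order places "a" before "B", whereas the code point of "B" is lower than that of "a".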
Testing Strings for Equality
You call the Equals method to determine whether two strings are equal. The instance Equals and the static Equals overloads let you specify whether the comparison is culture-sensitive or ordinal, and whether case is considered or ignored. Most tests for equality are ordinal, and comparisons for equality that determine access to a system resource (such as a file system object) should always be ordinal.
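As a minimal sketch of an ordinal test for equality, the following code calls both the instance and the static Equals overloads with an explicit StringComparison value. The module name EqualsSketch, the IsSameName helper, and the sample file names are hypothetical.

```vb
Imports System

Module EqualsSketch
    ' Hypothetical helper: ordinal, case-insensitive test for equality.
    Public Function IsSameName(first As String, second As String) As Boolean
        Return String.Equals(first, second, StringComparison.OrdinalIgnoreCase)
    End Function

    Public Sub Main()
        Dim stored As String = "Report1.doc"
        Dim requested As String = "REPORT1.DOC"

        ' Instance overload, case considered: the strings differ.
        Console.WriteLine(stored.Equals(requested, StringComparison.Ordinal))   ' False

        ' Static overload through the helper, case ignored: the strings match.
        Console.WriteLine(IsSameName(stored, requested))                        ' True
    End Sub
End Module
```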
Finding Characters in a String
The String class includes two kinds of search methods:
Methods that return a Boolean value to indicate whether a particular substring is present in a string instance. These include the Contains, EndsWith, and StartsWith methods.
Methods that indicate the starting position of a substring in a string instance. These include the IndexOf, IndexOfAny, LastIndexOf, and LastIndexOfAny methods.
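The two kinds of search methods can be sketched side by side. The module name SearchSketch and the FileNameStart helper are hypothetical; the path is the one used earlier in this topic, and ordinal comparisons are specified where an overload accepts a StringComparison value.

```vb
Imports System

Module SearchSketch
    ' Hypothetical helper: index at which the file name starts in a Windows path.
    Public Function FileNameStart(filePath As String) As Integer
        Return filePath.LastIndexOf("\"c) + 1
    End Function

    Public Sub Main()
        Dim filePath As String = "C:\PublicDocuments\Report1.doc"
        Dim marks() As Char = {"."c, ":"c}

        ' Methods that return a Boolean value.
        Console.WriteLine(filePath.StartsWith("c:\", StringComparison.OrdinalIgnoreCase))  ' True
        Console.WriteLine(filePath.EndsWith(".doc", StringComparison.Ordinal))             ' True
        Console.WriteLine(filePath.Contains("Report"))                                     ' True

        ' Methods that return a zero-based starting position, or -1 if nothing is found.
        Console.WriteLine(filePath.IndexOf("\"c))                        ' 2
        Console.WriteLine(filePath.LastIndexOf("\"c))                    ' 18
        Console.WriteLine(filePath.IndexOfAny(marks))                    ' 1
        Console.WriteLine(filePath.LastIndexOfAny(marks))                ' 26
        Console.WriteLine(filePath.Substring(FileNameStart(filePath)))   ' Report1.doc
    End Sub
End Module
```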
Modifying a String
The String class includes the following methods that appear to modify the value of a string:
Insert inserts a string into the current String instance.
PadLeft inserts one or more occurrences of a specified character at the beginning of a string.
PadRight inserts one or more occurrences of a specified character at the end of a string.
Remove deletes a substring from the current String instance.
Replace replaces a substring with another substring in the current String instance.
ToLower and ToLowerInvariant convert all the characters in a string to lowercase.
ToUpper and ToUpperInvariant convert all the characters in a string to uppercase.
Trim removes all occurrences of a character from the beginning and end of a string.
TrimEnd removes all occurrences of a character from the end of a string.
TrimStart removes all occurrences of a character from the beginning of a string.
Important |
|---|
All string modification methods return a new String object. They do not modify the value of the current instance. |
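The following sketch illustrates that the modification methods return new strings and leave the original instance untouched. The module name ModifySketch and the Frame helper are hypothetical.

```vb
Imports System

Module ModifySketch
    ' Hypothetical helper: trims the input and frames it with two asterisks per side.
    Public Function Frame(value As String) As String
        Dim trimmed As String = value.Trim()
        Return trimmed.PadLeft(trimmed.Length + 2, "*"c).PadRight(trimmed.Length + 4, "*"c)
    End Function

    Public Sub Main()
        Dim original As String = "  draft  "

        Console.WriteLine(Frame(original))                        ' **draft**
        Console.WriteLine(original.Trim().Insert(0, "re"))        ' redraft
        Console.WriteLine(original.Trim().Remove(0, 2))           ' aft
        Console.WriteLine(original.Trim().Replace("dr", "cr"))    ' craft
        Console.WriteLine(original.Trim().ToUpperInvariant())     ' DRAFT

        ' The original instance is unchanged by any of the calls above.
        Console.WriteLine("[{0}]", original)                      ' [  draft  ]
    End Sub
End Module
```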
Extracting Substrings from a String
The String.Split method separates a single string into multiple strings. Overloads of the method allow you to specify multiple delimiters, to determine the maximum number of substrings that the method extracts, and to determine whether empty strings (which occur when delimiters are adjacent) are included among the returned strings.
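A minimal sketch of the Split overloads described above follows. The module name SplitSketch, the Fields helper, and the sample input are hypothetical; String.Join is used only to display the returned array on one line.

```vb
Imports System

Module SplitSketch
    Private ReadOnly Separators() As Char = {","c, ";"c}

    ' Hypothetical helper: splits on commas and semicolons, dropping empty entries.
    Public Function Fields(line As String) As String()
        Return line.Split(Separators, StringSplitOptions.RemoveEmptyEntries)
    End Function

    Public Sub Main()
        Dim line As String = "red,green;;blue"

        ' Multiple delimiters; the adjacent delimiters produce an empty string.
        Console.WriteLine(String.Join("|", line.Split(Separators)))      ' red|green||blue

        ' Empty strings excluded from the result.
        Console.WriteLine(String.Join("|", Fields(line)))                ' red|green|blue

        ' At most two substrings; the remainder is left unsplit.
        Console.WriteLine(String.Join("|", line.Split(Separators, 2)))   ' red|green;;blue
    End Sub
End Module
```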
Combining Strings
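The following String methods can be used for string concatenation:
Concat combines one or more substrings into a single string.
Join concatenates one or more substrings into a single string and adds a separator between each substring.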
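As a rough illustration of string concatenation, the following sketch combines an array of words with String.Concat and String.Join. The module name CombineSketch and the Sentence helper are hypothetical.

```vb
Imports System

Module CombineSketch
    ' Hypothetical helper: joins words with single spaces and appends a period.
    Public Function Sentence(ParamArray words() As String) As String
        Return String.Concat(String.Join(" ", words), ".")
    End Function

    Public Sub Main()
        Dim words() As String = {"strings", "are", "immutable"}

        Console.WriteLine(String.Concat(words))        ' stringsareimmutable
        Console.WriteLine(String.Join(", ", words))    ' strings, are, immutable
        Console.WriteLine(Sentence(words))             ' strings are immutable.
    End Sub
End Module
```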
Formatting Values
The String.Format method uses the composite formatting feature to replace one or more placeholders in a string with the string representation of some object or value. The Format method is often used to do the following:
To embed the string representation of a numeric value in a string.
To embed the string representation of a date and time value in a string.
To embed the string representation of an enumeration value in a string.
To embed the string representation of some object that supports the IFormattable interface in a string.
To right-justify or left-justify a substring in a field within a larger string.
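The uses listed above can be sketched with a few composite format strings. The module name FormatSketch and the Row helper are hypothetical; the invariant culture is passed explicitly (an assumption made here) so that the separators in the output do not depend on the current culture.

```vb
Imports System
Imports System.Globalization

Module FormatSketch
    ' Hypothetical helper: left-justifies a label in 10 characters and
    ' right-justifies an amount in 12 characters with two decimal places.
    Public Function Row(label As String, amount As Double) As String
        Return String.Format(CultureInfo.InvariantCulture, "{0,-10}|{1,12:N2}", label, amount)
    End Function

    Public Sub Main()
        Dim inv As CultureInfo = CultureInfo.InvariantCulture

        ' Numeric value.
        Console.WriteLine(String.Format(inv, "{0:N2}", 1234.5))                            ' 1,234.50
        ' Date and time value.
        Console.WriteLine(String.Format(inv, "{0:yyyy-MM-dd}", New DateTime(2010, 7, 1)))  ' 2010-07-01
        ' Enumeration value.
        Console.WriteLine(String.Format(inv, "{0}", DayOfWeek.Thursday))                   ' Thursday
        ' Left-justified and right-justified fields.
        Console.WriteLine(Row("Total", 1234.5))                         ' Total     |    1,234.50
    End Sub
End Module
```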
Copying a String
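You can call the following String methods to make a copy of a string:
Clone returns a reference to an existing String object.
Copy creates a copy of an existing string.
CopyTo copies a portion of a string to a character array.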
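The difference between copying a reference and copying a value can be sketched with the Clone, Copy, and CopyTo methods. The module name CopySketch and the Prefix helper are hypothetical, and the sketch assumes the .NET Framework; later .NET versions mark String.Copy as obsolete.

```vb
Imports System

Module CopySketch
    ' Hypothetical helper: copies the first count characters into a new
    ' character array with CopyTo and wraps them in a new string.
    Public Function Prefix(source As String, count As Integer) As String
        Dim buffer(count - 1) As Char
        source.CopyTo(0, buffer, 0, count)
        Return New String(buffer)
    End Function

    Public Sub Main()
        Dim source As String = "Unicode"

        ' Clone returns a reference to the same instance, not a new string.
        Dim cloned As String = CStr(source.Clone())
        Console.WriteLine(Object.ReferenceEquals(source, cloned))   ' True

        ' Copy creates a new instance that has the same value.
        Dim copied As String = String.Copy(source)
        Console.WriteLine(Object.ReferenceEquals(source, copied))   ' False
        Console.WriteLine(source = copied)                          ' True

        ' CopyTo copies a portion of the string to a character array.
        Console.WriteLine(Prefix(source, 3))                        ' Uni
    End Sub
End Module
```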
Normalizing a String
In Unicode, a single character can be represented by more than one sequence of code points. Normalization converts these equivalent representations into the same binary representation. The String.Normalize method performs the normalization, and the String.IsNormalized method determines whether a string is normalized.
Windows 7, Windows Vista SP1 or later, Windows XP SP3, Windows XP SP2 x64 Edition, Windows Server 2008 (Server Core Role not supported), Windows Server 2008 R2 (Server Core Role not supported), Windows Server 2003 SP2
The .NET Framework does not support all versions of every platform. For a list of the supported versions, see .NET Framework System Requirements.