UTF8Encoding.GetPreamble Method ()

 

Returns a Unicode byte order mark encoded in UTF-8 format, if the UTF8Encoding encoding object is configured to supply one.

Namespace:   System.Text
Assembly:  mscorlib (in mscorlib.dll)

public override byte[] GetPreamble()

Return Value

Type: System.Byte[]

A byte array containing the Unicode byte order mark, if the UTF8Encoding encoding object is configured to supply one. Otherwise, this method returns a zero-length byte array.

The UTF8Encoding object can provide a preamble, which is a byte array that can be prefixed to the sequence of bytes that result from the encoding process. Prefacing a sequence of encoded bytes with a byte order mark (code point U+FEFF) helps the decoder determine the byte order and the transformation format, or UTF. The Unicode byte order mark (BOM) is serialized as 0xEF 0xBB 0xBF. Note that the Unicode Standard neither requires nor recommends the use of a BOM for UTF-8 encoded streams.

You can instantiate a UTF8Encoding object whose GetPreamble method returns a valid BOM in the following ways:

  • By retrieving the UTF8Encoding object returned by the Encoding.UTF8 property.

  • By calling a UTF8Encoding constructor with a encoderShouldEmitUTF8Identifier parameter and setting its value set to true.

All other UTF8Encoding objects are configured to return an empty array rather than a valid BOM.

The BOM provide nearly certain identification of an encoding for files that otherwise have lost a reference to their encoding, such as untagged or improperly tagged web data or random text files stored when a business did not have international concerns. Often user problems might be avoided if data is consistently and properly tagged.

For standards that provide an encoding type, a BOM is somewhat redundant. However, it can be used to help a server send the correct encoding header. Alternatively, it can be used as a fallback in case the encoding is otherwise lost.

There are some disadvantages to using a BOM. For example, knowing how to limit the database fields that use a BOM can be difficult. Concatenation of files can be a problem also, for example, when files are merged in such a way that an unnecessary character can end up in the middle of data. In spite of the few disadvantages, however, the use of a BOM is highly recommended.

For more information on byte order and the byte order mark, see The Unicode Standard at the Unicode home page.

System_CAPS_cautionCaution

To ensure that the encoded bytes are decoded properly when they are saved as a file or as a stream, you can prefix the beginning of a stream of encoded bytes with a preamble. Note that the GetBytes method does not prepend a BOM to a sequence of encoded bytes; supplying a BOM at the beginning of an appropriate byte stream is the developer's responsibility.

The following example uses the GetPreamble method to return the Unicode byte order mark encoded in UTF-8 format. Notice that the default constructor for UTF8Encoding does not provide a preamble.

using System;
using System.Text;

class Example
{
    public static void Main()
    {
        // The default constructor does not provide a preamble.
        UTF8Encoding UTF8NoPreamble = new UTF8Encoding();
        UTF8Encoding UTF8WithPreamble = new UTF8Encoding(true);

        Byte[] preamble;

        preamble = UTF8NoPreamble.GetPreamble();
        Console.WriteLine("UTF8NoPreamble");
        Console.WriteLine(" preamble length: {0}", preamble.Length);
        Console.Write(" preamble: ");
        ShowArray(preamble);
        Console.WriteLine();

        preamble = UTF8WithPreamble.GetPreamble();
        Console.WriteLine("UTF8WithPreamble");
        Console.WriteLine(" preamble length: {0}", preamble.Length);
        Console.Write(" preamble: ");
        ShowArray(preamble);
    }

    public static void ShowArray(Byte[] bytes)
    {
        foreach (var b in bytes)
            Console.Write("{0:X2} ", b);

        Console.WriteLine();
    }
}
// The example displays the following output:
//    UTF8NoPreamble
//     preamble length: 0
//     preamble:
//
//    UTF8WithPreamble
//     preamble length: 3
//     preamble: EF BB BF

The following example instantiates two UTF8Encoding objects, the first by calling the parameterless UTF8Encoding() constructor, which does not provide a BOM, and the second by calling the UTF8Encoding(Boolean) constructor with its encoderShouldEmitUTF8Identifier argument set to true. It then calls the GetPreamble method to write the BOM to a file before writing a UF8-encoded string. As the console output from the example shows, the file that saves the bytes from the second encoder has three more bytes than the first.

using System;
using System.IO;
using System.Text;

public class Example
{
   public static void Main()
   {
      String s = "This is a string to write to a file using UTF-8 encoding.";

      // Write a file using the default constructor without a BOM.
      var enc = new UTF8Encoding();
      Byte[] bytes = enc.GetBytes(s);
      WriteToFile(@".\NoPreamble.txt", enc, bytes);

      // Use BOM.
      enc = new UTF8Encoding(true);
      WriteToFile(@".\Preamble.txt", enc, bytes);
   }

   private static void WriteToFile(String fn, Encoding enc, Byte[] bytes)
   {
      var fs = new FileStream(fn, FileMode.Create);
      Byte[] preamble = enc.GetPreamble();
      fs.Write(preamble, 0, preamble.Length);
      Console.WriteLine("Preamble has {0} bytes", preamble.Length);
      fs.Write(bytes, 0, bytes.Length);
      Console.WriteLine("Wrote {0} bytes to {1}.", fs.Length, fn);
      fs.Close();
      Console.WriteLine();
   }
}
// The example displays the following output:
//       Preamble has 0 bytes
//       Wrote 57 bytes to NoPreamble.txt.
//
//       Preamble has 3 bytes
//       Wrote 60 bytes to Preamble.txt.

You can also compare the files by using the fc command in a console window, or you can inspect the files in a text editor that includes a Hex View mode. Note that when the file is opened in an editor that supports UTF-8, the BOM is not displayed.

Universal Windows Platform
Available since 8
.NET Framework
Available since 1.1
Portable Class Library
Supported in: portable .NET platforms
Silverlight
Available since 2.0
Windows Phone Silverlight
Available since 7.0
Windows Phone
Available since 8.1
Return to top
Show: