How to: Convert RTF to Plain Text

How to: Convert RTF to Plain Text (C# Programming Guide)

Updated: July 2008

Rich Text Format (RTF) is a document format developed by Microsoft in the late 1980s to enable the exchange of documents across operating systems. Both Microsoft Word and WordPad can read and write RTF documents. In the .NET Framework, you can use the RichTextBox control to create a word processor that supports RTF and enables a user to apply formatting to text in a WYSIWYG manner.

You can also use the RichTextBox control to programmatically remove the RTF formatting codes from a document and convert it to plain text. You do not need to embed the control in a Windows Form to perform this kind of operation.

To use the RichTextBox control in a project

  1. Add a reference to System.Windows.Forms.dll.

  2. Add a using directive for the System.Windows.Forms namespace (optional).

The following example provides a sample RTF file to be converted. The file contains RTF formatting, such as font information, and it also contains four Unicode characters and four extended ASCII characters. The file is opened, passed to the RichTextBox as RTF, retrieved as text, displayed in a MessageBox, and output to a file in UTF-8 format.

    // Save the following RTF file to the same folder as your .exe file, and call it "test.rtf".
    /*
    {\rtf1\ansi\ansicpg1252\deff0\deflang1033{\fonttbl{\f0\fswiss\fcharset0 Arial;}{\f1\fnil\fprq1\fcharset0 Courier New;}{\f2\fswiss\fprq2\fcharset0 Arial;}}
    {\colortbl ;\red0\green128\blue0;\red0\green0\blue0;}
    {\*\generator Msftedit;}\viewkind4\uc1\pard\f0\fs20 This is the \i Greek \i0 word "psyche": \cf1\f1\u968?\u965?\u967?\u942?\cf2\f2 . It is encoded in Unicode.\par
    Here are four extended \b ASCII \b0 characters (Windows code page 1252):  \'e2\'e4\u1233?\'e5\cf0\par
    }
    */
    class ConvertFromRTF
    {
        static void Main()
        {
            string path = @"test.rtf";

            // Create the RichTextBox. (Requires a reference to System.Windows.Forms.dll.)
            System.Windows.Forms.RichTextBox rtBox = new System.Windows.Forms.RichTextBox();

            // Get the contents of the RTF file. Note that when it is 
            // stored in the string, it is encoded as UTF-16. 
            string s = System.IO.File.ReadAllText(path);

            // Display the RTF text.
            System.Windows.Forms.MessageBox.Show(s);

            // Convert the RTF to plain text.
            rtBox.Rtf = s;
            string plainText = rtBox.Text;

            // Display plain text output in MessageBox because console 
            // cannot display Greek letters.
            System.Windows.Forms.MessageBox.Show(plainText);

            // Output plain text to file, encoded as UTF-8.
            System.IO.File.WriteAllText(@"output.txt", plainText);
        }
    }

An RTF file is itself stored as 8-bit text. However, the format lets you specify Unicode characters (written as \uN escapes, such as \u968) in addition to extended ASCII characters from a specified code page (written as \'xx hexadecimal escapes, such as \'e2). Because the RichTextBox.Text property is of type string, the converted characters are encoded as Unicode UTF-16. Any extended ASCII characters and Unicode characters from the source RTF document are correctly encoded in the text output.
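
To see how the escapes come through the conversion, the following is a minimal sketch that feeds a short inline RTF string to a RichTextBox and prints the UTF-16 code unit of each character in the result. The class name RtfEscapeDemo and the helper ToPlainText are hypothetical names chosen for this illustration, and the inline RTF is a shortened variant of the sample file, not the file itself.

```csharp
using System;
using System.Windows.Forms;

static class RtfEscapeDemo
{
    // Hypothetical helper (not part of the .NET Framework):
    // converts an RTF string to plain text with a RichTextBox.
    public static string ToPlainText(string rtf)
    {
        using (RichTextBox box = new RichTextBox())
        {
            // Setting Rtf throws ArgumentException if the string is not valid RTF.
            box.Rtf = rtf;
            return box.Text;
        }
    }

    static void Main()
    {
        // Four Unicode escapes (\uN) followed by three code page 1252 escapes (\'xx).
        string rtf = @"{\rtf1\ansi\ansicpg1252\deff0{\fonttbl{\f0\fswiss\fcharset0 Arial;}}" +
                     @"\uc1\pard\f0\fs20 \u968?\u965?\u967?\u942? \'e2\'e4\'e5\par}";

        string text = ToPlainText(rtf);

        // Print each UTF-16 code unit in hexadecimal instead of the character
        // itself, because the console may not display Greek letters.
        foreach (char c in text)
        {
            Console.Write("U+{0:X4} ", (int)c);
        }
        Console.WriteLine();
    }
}
```

If the conversion behaves as described above, the Greek letters should appear as U+03C8, U+03C5, U+03C7, and U+03AE (decimal 968, 965, 967, and 942), and the code page 1252 characters as U+00E2, U+00E4, and U+00E5. The control may also append line-break characters for the \par paragraph mark.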

If you use the File.WriteAllText method to write the text to disk, the text will be encoded as UTF-8 (without a Byte Order Mark).
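
If you need a different output encoding, File.WriteAllText has an overload that takes an Encoding argument. The following sketch writes the same string three ways and reports the size and first two bytes of each file. The file names and the class name EncodingDemo are assumptions made for this illustration, and the string stands in for the converted text.

```csharp
using System;
using System.IO;
using System.Text;

static class EncodingDemo
{
    static void Main()
    {
        // Stand-in for the converted text: the four Greek letters from the sample file.
        string plainText = "\u03C8\u03C5\u03C7\u03AE";

        // Two-argument overload: UTF-8 without a byte order mark.
        File.WriteAllText("output-utf8.txt", plainText);

        // Explicit encodings: UTF-8 with a byte order mark, and
        // UTF-16 little-endian (Encoding.Unicode) with a byte order mark.
        File.WriteAllText("output-utf8-bom.txt", plainText, new UTF8Encoding(true));
        File.WriteAllText("output-utf16.txt", plainText, Encoding.Unicode);

        string[] names = { "output-utf8.txt", "output-utf8-bom.txt", "output-utf16.txt" };
        foreach (string name in names)
        {
            byte[] bytes = File.ReadAllBytes(name);
            Console.WriteLine("{0}: {1} bytes, starts with {2}",
                name, bytes.Length, BitConverter.ToString(bytes, 0, 2));
        }
    }
}
```

Each Greek letter takes two bytes in both UTF-8 and UTF-16, so the three files should be 8, 11, and 10 bytes long. The difference is the byte order mark: three bytes for UTF-8 and two bytes for UTF-16.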




July 2008: Added topic. Content bug fix.

© 2016 Microsoft