3.1 Encapsulating HTML into RTF

Given the following source HTML content:

 <HTML><head>
 <style>
 <!--
  /* Style Definitions */
  p.MsoNormal, li.MsoNormal {font-family:Arial;}
 -->
 </style>
 <!-- This is a HTML comment.
 There is a horizontal tab (%x09) character before the comment,
 and some new lines inside the comment. -->
 </head>
 <body>
 <p
 class="MsoNormal">Note the line break inside a P tag. <b>This is bold text</b> </p>
 <p class="MsoNormal">
 This is a normal text with a character references: &nbsp; &lt; &uml;<br>
 characters that have special meaning in RTF: {}\<br>
 </p>
 <ol>
 <li class="MsoNormal">This is a list item
 </ol>
 </body>
 </HTML>

An encapsulating RTF writer can (by conforming to this algorithm) produce the following RTF:

 {\rtf1\ansi\ansicpg1251\fromhtml1 \deff0
 {\fonttbl {\f0\fmodern Courier New;}{\f1\fswiss Arial;}{\f2\fswiss\fcharset0 Arial;}}
 {\colortbl\red0\green0\blue0;\red0\green0\blue255;} 
 {\*\htmltag64}
 \uc1\pard\plain\deftab360 \f0\fs24
 {\*\htmltag <HTML><head>\par
 <style>\par
 <!--\par
  /* Style Definitions */\par
  p.MsoNormal, li.MsoNormal \{font-family:Arial;\}\par
 -->\par
 </style>\par
 \tab <!-- This is a HTML comment.\par
 There is a horizontal tab (%x09) character before the comment, \par
 and some new lines inside the comment. -->\par
 </head>\par
 <body>\par
 <p\par
 class="MsoNormal">}
 {\htmlrtf \f1 \htmlrtf0 Note the line break inside a P tag. {\*\htmltag <b>}{\htmlrtf \b \htmlrtf0 This is bold text{\*\htmltag </b>}} \htmlrtf\par\htmlrtf0}
 \htmlrtf \par \htmlrtf0
 {\*\htmltag </p>\par
 <p class="MsoNormal">\par}
 {\htmlrtf \f1 \htmlrtf0 This is a normal text with a character references:
 {\*\htmltag &nbsp;}\htmlrtf \'a0\htmlrtf0  {\*\htmltag &lt;}\htmlrtf <\htmlrtf0  {\*\htmltag &uml;}\htmlrtf {\f2\'a8}\htmlrtf0{\*\htmltag <br>\par}\htmlrtf\line\htmlrtf0
 characters that have special meaning in RTF: \{\}\\{\*\htmltag <br>\par}\htmlrtf\line\htmlrtf0\htmlrtf\par\htmlrtf0}
 {\*\htmltag </p>\par
 <ol>\par
     <li class="MsoNormal">}{\htmlrtf {{\*\pn\pnlvlbody\pndec\pnstart1\pnindent360{\pntxta.}}\li360\fi-360{\pntext 1.\tab} \f1 \htmlrtf0 This is a list item}\htmlrtf\par\htmlrtf0}
 {\*\htmltag \par
 </ol>\par
 </body>\par
 </HTML>\par }}
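
The RTF above escapes the characters that have special meaning in RTF ({, } and \), writes the horizontal tab as \tab, and writes each line break inside an HTMLTAG destination as \par. A minimal sketch of that escaping step follows. The function name escape_rtf_text is hypothetical and does not appear in this specification, and the sketch does not handle non-ASCII characters or code pages.

```python
def escape_rtf_text(text: str) -> str:
    """Escape HTML source text for embedding in RTF (illustrative sketch only)."""
    out = []
    # Treat CRLF and bare LF the same way, as a single line break.
    for ch in text.replace("\r\n", "\n"):
        if ch in "\\{}":
            out.append("\\" + ch)      # \ { } become \\ \{ \}
        elif ch == "\t":
            out.append("\\tab ")       # horizontal tab becomes \tab
        elif ch == "\n":
            out.append("\\par\n")      # line break becomes \par plus a raw newline
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    print(escape_rtf_text("p.MsoNormal, li.MsoNormal {font-family:Arial;}"))
```

The raw newline written after \par is only for readability of the RTF stream; this sketch assumes a reader ignores raw line breaks, as the example above does.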

A de-encapsulating RTF reader can recover the original HTML document from the RTF example in this section by conforming to this algorithm.
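
The recovery described above can be illustrated with a deliberately simplified sketch. It is not the full algorithm of this specification. It assumes that the \htmlrtf state is saved and restored at group boundaries, that destinations introduced by \* other than \htmltag are skipped, and that only \par, \tab, \'hh and the escaped characters need translating. The names deencapsulate, TOKEN and HEADER_DESTINATIONS are hypothetical, and the default code page is an assumption.

```python
import re

# One token per match: control word, hex byte, control symbol, brace, or text run.
TOKEN = re.compile(
    r"\\([A-Za-z]+)(-?\d+)? ?"   # control word, optional parameter, optional space
    r"|\\'([0-9a-fA-F]{2})"      # \'hh hex-escaped byte
    r"|\\(.)"                    # control symbol such as \{ \} \\ \*
    r"|([{}])"                   # group delimiters
    r"|([^\\{}]+)",              # plain text
    re.DOTALL,
)

# Header destinations whose text is never part of the HTML (assumed subset).
HEADER_DESTINATIONS = {"fonttbl", "colortbl", "stylesheet", "info"}


def deencapsulate(rtf: str, codepage: str = "cp1252") -> str:
    """Recover HTML from an encapsulated RTF fragment (illustrative sketch only)."""
    out = []
    suppressed = False   # True between \htmlrtf and \htmlrtf0
    ignored = False      # True inside a destination that carries no HTML
    stack = []           # saved (suppressed, ignored) per open group
    for word, param, hexbyte, symbol, brace, text in (m.groups() for m in TOKEN.finditer(rtf)):
        emit = not (suppressed or ignored)
        if brace == "{":
            stack.append((suppressed, ignored))
        elif brace == "}":
            if stack:
                suppressed, ignored = stack.pop()
        elif word == "htmlrtf":
            suppressed = param != "0"
        elif word == "htmltag":
            ignored = False              # \*\htmltag content is HTML, keep it
        elif word in HEADER_DESTINATIONS:
            ignored = True
        elif word == "par" and emit:
            out.append("\r\n")
        elif word == "tab" and emit:
            out.append("\t")
        elif hexbyte is not None and emit:
            out.append(bytes.fromhex(hexbyte).decode(codepage))
        elif symbol == "*":
            ignored = True               # skip unless \htmltag follows
        elif symbol is not None and symbol in "{}\\" and emit:
            out.append(symbol)
        elif text is not None and emit:
            # Raw line breaks in the RTF stream are not content.
            out.append(text.replace("\r", "").replace("\n", ""))
    return "".join(out)


if __name__ == "__main__":
    fragment = r"{\*\htmltag <b>}{\htmlrtf \b \htmlrtf0 This is bold text{\*\htmltag </b>}}"
    print(deencapsulate(fragment))
```

The sketch is meant for short fragments such as the one in the usage example; a conforming reader has to handle the complete RTF header and the remaining rules of the algorithm.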
