Using the HTML to XHTML Conversion Tool [InfoPath 2003 SDK Documentation]

Applies to:

Microsoft Office InfoPath 2003

Microsoft Office InfoPath 2003 Service Pack 1

The Microsoft Office InfoPath 2003 Software Development Kit (SDK) HTML to XHTML conversion tool converts regular HTML into well-formed XHTML that can be edited in an InfoPath form. This is useful when a form designer needs to insert HTML documents that were created outside of InfoPath into a form. Because InfoPath accepts only well-formed XHTML, the HTML must first be converted. The conversion process attempts to fix malformed HTML by inserting missing closing tags (such as </p> tags), by creating self-closing tags where needed (such as <br/> tags), and by repairing attributes that are not properly formed.

Warning  The HTML to XHTML conversion tool may fail when it encounters badly formed HTML. It is not designed to correct every possible instance of malformed HTML, such as HTML that completely lacks closing tags. For example, the tool will not correct HTML like the following: <a><b>text

The HTML to XHTML conversion tool is implemented as a Component Object Model (COM) component whose object model contains one object with two methods. The object is named XHTMLUtilities, and the methods that it implements are convertToXHTML and convertToXHTMLEx. This simple object model can be used from any COM-compliant programming language.

The file that contains the methods of the HTML to XHTML conversion tool is named html2xhtml.dll, and it is located in the <drive>:\Program Files\Microsoft Office 2003 Developer Resources\Microsoft Office InfoPath 2003 SDK\Tools folder. This DLL must be registered on your computer (typically with the regsvr32 command) before you can reference it in script. The same folder also contains a file named html2xhtml_sample.htm, which demonstrates how to use the XHTMLUtilities object.
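Because script can reference the object only after the DLL is registered, it can help to check that the object is creatable before attempting a conversion. The following is a minimal sketch, not part of the SDK: createXHTMLUtilities is a hypothetical helper name, and the sketch assumes that a failed creation surfaces as a script exception, which is how new ActiveXObject normally reports an unregistered component.

```javascript
// Hypothetical helper (not part of the SDK).
// Returns an XHTMLUtilities object, or null when the object cannot be created.
function createXHTMLUtilities()
{
   // ActiveXObject exists only in COM-capable script hosts such as
   // Windows Script Host.
   if (typeof ActiveXObject === "undefined")
   {
      return null;
   }

   try
   {
      // The ProgID is the one used in the SDK examples.
      return new ActiveXObject("HTML2XHTML.XHTMLUtilities");
   }
   catch (e)
   {
      // Most likely cause: html2xhtml.dll has not been registered.
      return null;
   }
}
```

A script could call this helper first and, when it returns null, display a message asking the user to register html2xhtml.dll.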

The following sections discuss the two methods of the XHTMLUtilities object.

The convertToXHTML method

Creates an XHTML string from a supplied HTML or XHTML string.

expression.convertToXHTML(ByVal bstrHTML As String) As String

expression Required. An expression that returns a reference to the XHTMLUtilities object.

bstrHTML Required String. The HTML that is to be converted to XHTML.

returns String. The new XHTML string.

Remarks

The convertToXHTML method may fail to produce the appropriate XHTML string if XML is passed instead of HTML.

Example

The following Windows Script Host example uses the FileSystemObject object to read the HTML text stored in a specified file, calls the convertToXHTML method to convert that text to XHTML, and then uses the FileSystemObject object again to create a new file that contains the XHTML text:

// Expect two command-line arguments: the input HTML file and the output XHTML file.
var args = WScript.Arguments;

if (args.length != 2)
{
   WScript.Echo("Usage: " + WScript.ScriptName + " <INPUTHTML> <OUTPUTXHTML>");
}
else
{
   var strInputFile = args.item(0);
   var strOutputFile = args.item(1);
   var objFSO = WScript.CreateObject("Scripting.FileSystemObject");

   // Read the entire HTML file into a string.
   var objInputFile = objFSO.OpenTextFile(strInputFile, 1 /*ForReading*/, false);
   var strHTML = objInputFile.ReadAll();
   objInputFile.Close();

   // Convert the HTML to XHTML.
   var oXHTMLUtils = new ActiveXObject("HTML2XHTML.XHTMLUtilities");
   var strXHTML = oXHTMLUtils.convertToXHTML(strHTML);

   // Write the XHTML to the output file, overwriting any existing file.
   var objOutputFile = objFSO.CreateTextFile(strOutputFile, true);
   objOutputFile.Write(strXHTML);
   objOutputFile.Close();
}
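Because the conversion tool may fail on badly formed input, a script might guard the call rather than let an error end the run. The following is a minimal sketch under stated assumptions: tryConvertToXHTML is a hypothetical helper, and the documentation does not say how a failure is reported, so the sketch assumes that it surfaces either as a script exception or as an empty result.

```javascript
// Hypothetical helper (not part of the SDK).
// oXHTMLUtils is any object that exposes convertToXHTML, such as the
// XHTMLUtilities object created in the example above.
function tryConvertToXHTML(oXHTMLUtils, strHTML)
{
   try
   {
      var strXHTML = oXHTMLUtils.convertToXHTML(strHTML);

      // Assumption: no output for non-empty input is treated as a failure.
      if (!strXHTML && strHTML)
      {
         return { ok: false, xhtml: "", error: "Conversion returned no output." };
      }
      return { ok: true, xhtml: strXHTML || "", error: "" };
   }
   catch (e)
   {
      // Assumption: a failed conversion surfaces as a script exception.
      return { ok: false, xhtml: "", error: e.message || String(e) };
   }
}
```

In the example above, the direct call could be replaced with var result = tryConvertToXHTML(oXHTMLUtils, strHTML), and the output file would then be written only when result.ok is true.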

The convertToXHTMLEx method

Creates an XHTML string from a supplied HTML or XHTML string, and returns information about any changes that were made.

expression.convertToXHTMLEx(ByVal bstrHTML As String, ByVal iOptions As Long, ByRef pfStatus As Long) As String

expression Required. An expression that returns a reference to the XHTMLUtilities object.

bstrHTML  Required String. The HTML that is to be converted to XHTML.

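iOptions Required Long. A value that specifies whether the default XHTML namespace is inserted. It can be one of the following values:

Value   Option               Conversion operation
0       DefaultNamespace     Inserts the default XHTML namespace.
1       NoDefaultNamespace   Does not insert the default XHTML namespace.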

pfStatus  Required Long. A value indicating whether changes were made. A value of zero (false) indicates no changes were made; a value of one (true) indicates that changes were made.

returns String. The new XHTML string.
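The numeric values for iOptions and pfStatus are easier to read in script when they are given names. The following is a minimal sketch: the constant names and the describeConversionStatus helper are hypothetical and are not defined by the SDK; only the values 0 and 1 and their meanings come from the descriptions above.

```javascript
// Values for the iOptions argument, taken from the table above.
var XHTML_DEFAULT_NAMESPACE = 0;    // insert the default XHTML namespace
var XHTML_NO_DEFAULT_NAMESPACE = 1; // do not insert the default XHTML namespace

// Hypothetical helper: turns a pfStatus value into a readable message.
function describeConversionStatus(lStatus)
{
   // The caller may not have received a status value at all.
   if (typeof lStatus === "undefined" || lStatus === null)
   {
      return "Status is not available.";
   }

   // Zero (false) means no changes; a nonzero value means changes were made.
   return lStatus
      ? "Changes were made during conversion."
      : "No changes were made during conversion.";
}
```

A script could then pass XHTML_NO_DEFAULT_NAMESPACE instead of the literal 1 when it calls convertToXHTMLEx, and echo describeConversionStatus for the status value afterward.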

Remarks

The convertToXHTMLEx method may fail to produce the appropriate XHTML string if XML is passed instead of HTML.

Example

In the following example, the convertToXHTMLEx method is used to convert the HTML text contained in a file to XHTML:

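// Expect two command-line arguments: the input HTML file and the output XHTML file.
var args = WScript.Arguments;

if (args.length != 2)
{
   WScript.Echo("Usage: " + WScript.ScriptName + " <INPUTHTML> <OUTPUTXHTML>");
}
else
{
   var strInputFile = args.item(0);
   var strOutputFile = args.item(1);
   var objFSO = WScript.CreateObject("Scripting.FileSystemObject");
   var bReturn;   // variable passed for the pfStatus argument

   // Read the entire HTML file into a string.
   var objInputFile = objFSO.OpenTextFile(strInputFile, 1 /*ForReading*/, false);
   var strHTML = objInputFile.ReadAll();
   objInputFile.Close();

   // Convert the HTML to XHTML without inserting the default XHTML namespace.
   var oXHTMLUtils = new ActiveXObject("HTML2XHTML.XHTMLUtilities");
   var strXHTML = oXHTMLUtils.convertToXHTMLEx(strHTML, 1 /*NoDefaultNamespace*/, bReturn);

   // Write the XHTML to the output file, overwriting any existing file.
   var objOutputFile = objFSO.CreateTextFile(strOutputFile, true);
   objOutputFile.Write(strXHTML);
   objOutputFile.Close();
}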