[Note: This topic is pre-release documentation and is subject to change in future releases. Blank topics are included as placeholders.]
Web services created using ASP.NET provide an HTML parsing solution that lets developers parse content from a remote HTML page and expose the resulting data programmatically. For a detailed explanation, see HTML Parsing by ASP.NET XML Web Services.
To specify an operation and input parameters
Create a Web Services Description Language (WSDL) document, which is typically saved with the file name extension .wsdl. The document's content must consist of valid XML according to the WSDL schema. For a prototype, you can use a WSDL document dynamically generated for a Web service running on ASP.NET. Make a request with a ?wsdl argument appended to the Web service URL.
Specify the elements that define the operation for each Web service method that parses HTML text. This step and the next one require knowledge of the WSDL format.
If the parsing method takes input parameters, specify the elements that represent those parameters and associate them with the operation.
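The prototype step above (requesting the service URL with ?wsdl appended) can be sketched as a small console program. This is a minimal sketch, not part of the original procedure: the class name WsdlPrototypeFetcher, the helper BuildWsdlUri, and the example URL http://localhost/Service.asmx are all hypothetical, and the program assumes an ASP.NET Web service is actually reachable at whatever URL you pass in.

```csharp
using System;
using System.Net;

// Hypothetical helper for illustration; not generated by any tool.
public static class WsdlPrototypeFetcher
{
    // Builds the URL that asks ASP.NET for the dynamically generated
    // service description by setting the query string to "wsdl".
    public static Uri BuildWsdlUri(string serviceUrl)
    {
        UriBuilder builder = new UriBuilder(serviceUrl);
        builder.Query = "wsdl";
        return builder.Uri;
    }

    public static void Main(string[] args)
    {
        if (args.Length < 2)
        {
            Console.WriteLine("usage: WsdlPrototypeFetcher <serviceUrl> <outputFile.wsdl>");
            return;
        }

        // Example (assumed URL): http://localhost/Service.asmx
        Uri wsdlUri = BuildWsdlUri(args[0]);

        // Save the generated description so it can be edited by hand.
        using (WebClient client = new WebClient())
        {
            client.DownloadFile(wsdlUri, args[1]);
        }
        Console.WriteLine("Saved " + wsdlUri.AbsoluteUri + " to " + args[1]);
    }
}
```

The saved file is only a starting point; the operation, message, and binding elements still have to be edited by hand as described in the steps above.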
To specify the data returned from a parsed HTML page
Note: The operation name inside a binding must be globally unique. Alternatively, run Wsdl.exe with the namespace specified to prevent naming collisions caused by other WSDL files imported into the same application.
Add <match> XML elements in the service description within the <text> XML element for each piece of data you want to return from the parsed HTML page.
Apply attributes to the <match> element. The valid attributes are presented in a table under the topic HTML Parsing by ASP.NET XML Web Services.
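Because each pattern is a regular expression, it can help to try one against sample markup before writing it into the service description. The following is a minimal sketch using System.Text.RegularExpressions. It assumes that the text of interest is the first capturing group and that a case-insensitive match corresponds to RegexOptions.IgnoreCase. The class MatchPatternDemo and the helper FirstCapture are hypothetical names, and the attribute table in HTML Parsing by ASP.NET XML Web Services remains the authority on how the attributes are actually interpreted.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for trying out patterns; not part of any generated proxy.
public static class MatchPatternDemo
{
    // Returns the first capturing group of the first match,
    // or null when the pattern does not match at all.
    public static string FirstCapture(string html, string pattern, bool ignoreCase)
    {
        RegexOptions options = ignoreCase ? RegexOptions.IgnoreCase : RegexOptions.None;
        Match match = Regex.Match(html, pattern, options);
        return match.Success ? match.Groups[1].Value : null;
    }

    public static void Main()
    {
        string html = "<HTML><HEAD><TITLE>Sample Title</TITLE></HEAD>" +
                      "<BODY><H1>Some Heading Text</H1></BODY></HTML>";

        Console.WriteLine(FirstCapture(html, "TITLE>(.*?)<", false)); // Sample Title
        Console.WriteLine(FirstCapture(html, "H1>(.*?)<", false));    // Some Heading Text

        // Regular expressions are case-sensitive unless told otherwise.
        string lower = "<title>lowercase tags</title>";
        Console.WriteLine(FirstCapture(lower, "TITLE>(.*?)<", false) == null); // True
        Console.WriteLine(FirstCapture(lower, "TITLE>(.*?)<", true));          // lowercase tags
    }
}
```

The lazy quantifier (.*?) stops at the first following < character, which is why each capture ends at the closing tag instead of running to the end of the page.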
To generate client proxy code for the Web service
Run the Web Services Description Language tool (Wsdl.exe), passing the WSDL file you created as input.
The following code example is a simple Web page containing <TITLE> and <H1> tags.
<HTML>
<HEAD>
<TITLE>Sample Title</TITLE>
</HEAD>
<BODY>
<H1>Some Heading Text</H1>
</BODY>
</HTML>
The following code example is a service description that parses the HTML page and extracts the text within the <TITLE> and <H1> tags. In the code example, a TestHeaders method is defined for the GetTitleHttpGet binding. The TestHeaders method defines two pieces of data that can be returned from the parsed HTML page in <match> XML elements: Title and H1, which capture the contents of the <TITLE> and <H1> tags, respectively.
<?xml version="1.0"?>
<definitions xmlns:s="http://www.w3.org/2001/XMLSchema"
xmlns:http="http://schemas.xmlsoap.org/wsdl/http/"
xmlns:mime="http://schemas.xmlsoap.org/wsdl/mime/"
xmlns:soapenc="http://schemas.xmlsoap.org/soap/encoding/"
xmlns:soap="http://schemas.xmlsoap.org/wsdl/soap/"
xmlns:s0="http://tempuri.org/"
targetNamespace="http://tempuri.org/"
xmlns="http://schemas.xmlsoap.org/wsdl/">
<types>
<s:schema targetNamespace="http://tempuri.org/"
attributeFormDefault="qualified"
elementFormDefault="qualified">
<s:element name="TestHeaders">
<s:complexType derivedBy="restriction"/>
</s:element>
<s:element name="TestHeadersResult">
<s:complexType derivedBy="restriction">
<s:all>
<s:element name="result" type="s:string" nullable="true"/>
</s:all>
</s:complexType>
</s:element>
<s:element name="string" type="s:string" nullable="true"/>
</s:schema>
</types>
<message name="TestHeadersHttpGetIn"/>
<message name="TestHeadersHttpGetOut">
<part name="Body" element="s0:string"/>
</message>
<portType name="GetTitleHttpGet">
<operation name="TestHeaders">
<input message="s0:TestHeadersHttpGetIn"/>
<output message="s0:TestHeadersHttpGetOut"/>
</operation>
</portType>
<binding name="GetTitleHttpGet" type="s0:GetTitleHttpGet">
<http:binding verb="GET"/>
<operation name="TestHeaders">
<http:operation location="MatchServer.html"/>
<input>
<http:urlEncoded/>
</input>
<output>
<text xmlns="http://microsoft.com/wsdl/mime/textMatching/">
<match name='Title' pattern='TITLE&gt;(.*?)&lt;'/>
<match name='H1' pattern='H1&gt;(.*?)&lt;'/>
</text>
</output>
</operation>
</binding>
<service name="GetTitle">
<port name="GetTitleHttpGet" binding="s0:GetTitleHttpGet">
<http:address location="http://localhost" />
</port>
</service>
</definitions>
The following code example is a portion of the proxy class generated by Wsdl.exe for the previous service description.
' Visual Basic
Imports System.Web.Services.Protocols

' GetTitle is the name of the proxy class.
Public Class GetTitle
    Inherits HttpGetClientProtocol

    Public Function TestHeaders() As TestHeadersMatches
        Return CType(Me.Invoke("TestHeaders", (Me.Url + _
            "/MatchServer.html"), New Object(-1) {}), TestHeadersMatches)
    End Function
End Class

Public Class TestHeadersMatches
    Public Title As String
    Public H1 As String
End Class
// C#
using System.Web.Services.Protocols;

// GetTitle is the name of the proxy class.
public class GetTitle : HttpGetClientProtocol
{
    public TestHeadersMatches TestHeaders()
    {
        return ((TestHeadersMatches)(this.Invoke("TestHeaders",
            (this.Url + "/MatchServer.html"), new object[0])));
    }
}

public class TestHeadersMatches
{
    public string Title;
    public string H1;
}
See Also
HTML Parsing by ASP.NET XML Web Services
.NET Framework Regular Expressions
MatchAttribute
XML Web Services Using ASP.NET
Web Services Description Language Tool (Wsdl.exe)