So far you have learned how computers connect to one another so that they can exchange information. Once the connection is established, the next stage is for data to flow between the two computers.
In most situations, the computer that initiates the exchange is called the client, and the computer that receives the connection is called the server. A computer program runs on the server at all times, listening for connections from clients. On the client computer, another program (such as a web browser) connects to the server whenever it requires information. For example, when you request a Web page, the browser makes a connection to the Web server for that page when you click the Go button.
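The roles described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: both "computers" are the same machine (the loopback address 127.0.0.1), the operating system picks a free port, and the message format ("You asked for: ...") is invented for the example rather than taken from any real protocol.

```python
import socket
import threading


def serve_once(listener):
    """Server side: wait for one client, read its request, send a reply."""
    conn, _ = listener.accept()
    with conn:
        # A short message on the loopback interface is assumed to arrive
        # in a single read; a real server would not rely on that.
        request = conn.recv(1024)
        conn.sendall(b"You asked for: " + request)


# The server program starts first and listens for connections.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))  # port 0 lets the OS choose a free port
listener.listen()
port = listener.getsockname()[1]
server_thread = threading.Thread(target=serve_once, args=(listener,))
server_thread.start()

# The client initiates the exchange by connecting to the server.
with socket.create_connection(("127.0.0.1", port)) as client:
    client.sendall(b"index.html")
    chunks = []
    while True:
        data = client.recv(1024)
        if not data:  # the server closed the connection
            break
        chunks.append(data)
reply = b"".join(chunks)

server_thread.join()
listener.close()
print(reply.decode())
```

The server here handles a single client and then stops; a real server would keep accepting connections in a loop.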
In a typical client/server scenario, the client sends some data called a request to the server, and the server determines the nature of the request and formulates a response, which it sends back to the client. For example, when an email program reads your email from a mail server, the following steps occur:
The client sends a user name and password to the server.
The server responds to say that the user name and password are accepted.
The client requests a list of emails that are on the server.
The server responds with a list of emails, not including the body of each email.
The client requests the body of a specific email (for example when you double-click on the email to view it).
The server sends the email body.
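The steps above can be mimicked with a toy, in-memory "mail server". This is a sketch only: the command names (LOGIN, LIST, BODY), the user name and password, and the mailbox contents are all invented for illustration, and no real mail protocol or network connection is involved.

```python
# Toy stand-in for a mail server. Everything here is hypothetical sample data.
MAILBOX = {
    1: {"subject": "Hello", "body": "Hi there, how are you?"},
    2: {"subject": "Meeting", "body": "The meeting is at 10 am."},
}
USERS = {"alice": "secret"}


def handle_request(session, request):
    """Server side: work out what the request asks for and build a response."""
    command, *args = request.split()
    if command == "LOGIN" and len(args) == 2:
        user, password = args
        if USERS.get(user) == password:
            session["user"] = user
            return "OK logged in"
        return "ERROR bad credentials"
    if "user" not in session:
        return "ERROR not logged in"
    if command == "LIST":
        # Message numbers and subjects only, not the bodies.
        return "OK " + "; ".join(f"{n} {m['subject']}" for n, m in MAILBOX.items())
    if command == "BODY" and len(args) == 1 and args[0].isdigit():
        message = MAILBOX.get(int(args[0]))
        return "OK " + message["body"] if message else "ERROR no such message"
    return "ERROR unknown command"


# Client side: send the three requests in order and collect the responses.
session = {}
replies = [
    handle_request(session, request)
    for request in ("LOGIN alice secret", "LIST", "BODY 2")
]
for reply in replies:
    print(reply)
```

Note that the server refuses LIST and BODY until a LOGIN request has succeeded, which mirrors the order of the steps in the list.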
The client and server must have a common understanding of the contents of the request and response messages. As you saw earlier, when two computers make a network connection they share a common understanding, defined by the TCP/IP protocol, that enables them to send data and confirm that it has been delivered. On top of this network protocol sits another messaging protocol that specifies a common language for computer applications to talk to each other. The protocol between two computer applications is called an application protocol. You can think of a network protocol as the regulations that govern how to construct a road, and an application protocol as the regulations that govern how to drive on the road.
The most common Internet application protocol is the Hypertext Transfer Protocol (HTTP), which is the main protocol used in the World Wide Web. You may also have come across the File Transfer Protocol (FTP), which you can use to transfer files from one computer to another.
Hypertext Transfer Protocol (HTTP)
When a browser communicates with a Web server, it sends an HTTP request to the Web server. The contents of the HTTP request specify what action the client requires the server to take. HTTP supports several methods (sometimes called ‘verbs’) that the client can specify in the request; the most commonly used methods are GET and POST. A browser uses the GET method when it requests a Web page, and it uses the POST method when it sends data to be processed, for example when you click the Submit or Search button on a Web page.
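As a small illustration of the difference between the two methods, Python's standard urllib library chooses GET for a request with no body and POST for a request that carries data. The sketch below only constructs the request objects and sends nothing over the network; the /search path and the "q" form field are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Requesting a page: there is no request body, so the method is GET.
page_request = Request("http://www.example.com/index.html")

# Submitting a form: the encoded form fields travel in the request body,
# and urllib selects POST because data is present.
# The /search path and the "q" field are made up for this example.
form_data = urlencode({"q": "examples"}).encode("ascii")
form_request = Request("http://www.example.com/search", data=form_data)

print(page_request.get_method())
print(form_request.get_method())
print(form_request.data)
```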
The following is an example of the information sent back and forth between the browser and the Web server when the browser sends a GET request for http://www.example.com/index.html. Don’t worry about the detail of the request and response; this is simply to give you a flavor of what is going on inside the HTTP protocol.
The first line of the request specifies the method (GET), the Web page or resource (/index.html), and the protocol version (HTTP/1.1). The following lines are request headers that give extra information to the Web server. The response begins with a status line that includes the protocol version (HTTP/1.1), a status code (200) and a text description of what the status code means (OK). The next few lines are response headers that contain extra information about the Web server and the Web page.
After the headers there is a blank line, followed by the actual content of the Web page. This is the real payload of the response, and it contains the information that the browser will actually display.
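To make the layout of these messages concrete, the following sketch builds the text of a minimal GET request and splits a response into its status line, headers, and content. The particular headers and the sample response are assumptions chosen for illustration; they are not a capture of what a real browser or server sends, and the helper names are hypothetical.

```python
def build_get_request(host, path):
    """Return the text of a minimal HTTP/1.1 GET request."""
    lines = [
        f"GET {path} HTTP/1.1",  # method, resource, protocol version
        f"Host: {host}",         # request headers follow the first line
        "Connection: close",
        "",                      # a blank line ends the headers
        "",
    ]
    return "\r\n".join(lines)


def parse_response(raw):
    """Split a response into (protocol, code, reason, headers, body)."""
    head, _, body = raw.partition("\r\n\r\n")  # blank line separates the body
    status_line, *header_lines = head.split("\r\n")
    protocol, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return protocol, int(code), reason, headers, body


request_text = build_get_request("www.example.com", "/index.html")

# A made-up response of the kind a Web server might send back.
sample_response = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/html\r\n"
    "Content-Length: 20\r\n"
    "\r\n"
    "<html>Welcome</html>"
)
protocol, code, reason, headers, body = parse_response(sample_response)
print(request_text)
print(protocol, code, reason)
print(headers)
print(body)
```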
Hypertext Markup Language (HTML)
The purpose of HTTP is simply to provide a common ‘language’ in which the browser and the Web server can exchange information about Web pages and other resources. The real substance of the exchange, when the browser requests a Web page, is the section of the response that describes how the page will appear inside the browser window.
The description of the Web page that the server sends is in a standard format, so that the browser can understand how the Web server wants it to display the page. This time the standard format is not in fact a protocol, because it does not define how two computers exchange information. It is actually a language, although the distinction is fine: there are rules about computer languages just as there are about network protocols. The language that Web servers and browsers use to describe a Web page is Hypertext Markup Language (HTML). HTML is a language that is used primarily to format data on a page. It doesn’t contain any advanced support for doing complex operations; it just serves to lay out the contents in a readable way on the Web page. When the browser receives an HTML page, it converts the HTML description into a screen display by a process called rendering. The browser reads the HTML instructions and “renders” the result to the screen.
HTML is a text-based language, which means that you can view and edit HTML in a standard text editor like Notepad. It consists of the text on the Web page, along with ‘markup tags’ that indicate to the browser how that text should be displayed. The markup specifies items such as the font to use for sections of text, where to display embedded images, and of course hyperlinks that enable you to link to different Web pages. You can look at the HTML for any Web page in your browser by viewing the page source. (To view the page source in Internet Explorer, click the View menu, and then click Source.)
This guide does not intend to teach you how to write HTML. There are many books and online resources that can help you to learn HTML. However, the document below is a short example of a simple HTML page, which demonstrates some important concepts.
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<title>Welcome</title>
</head>
<body>
<img src="/images/logo.jpg" height="50px" width="200px" border="0" />
<h1>Welcome to Example.com</h1>
<p>Welcome to the Example.com Web site. We hope you will find lots of examples here.</p>
<p>The full list of examples is on <a href="/example-list.html">this Web page.</a></p>
</body>
</html>
HTML consists of a series of tags denoted by angle brackets around a tag name, such as the <html> tag in the example. At the end of the document is another tag, </html>. The forward slash indicates that this is the end tag that corresponds to the <html> tag at the start of the document. Everything between the two tags is called the <html> element. Inside the <html> element is a <head> element, which in turn contains a <title> element, and a <body> element, which contains the visible content of the page. An HTML document consists of elements that are nested one inside another in this way.
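The nesting of elements can be made visible with a short script. The sketch below uses Python's standard html.parser module to print each tag of the example page indented by its depth. The OutlinePrinter class and the VOID_TAGS set are names invented for this illustration, and the set lists only a few of the tags that never have an end tag.

```python
from html.parser import HTMLParser

# Tags such as <img> have no end tag, so they never contain other elements.
# This is a partial list, sufficient for the example page.
VOID_TAGS = {"img", "br", "hr", "meta", "link", "input"}


class OutlinePrinter(HTMLParser):
    """Record each start tag, indented by how deeply it is nested."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.outline = []

    def handle_starttag(self, tag, attrs):
        self.outline.append("  " * self.depth + tag)
        if tag not in VOID_TAGS:
            self.depth += 1  # following tags are inside this element

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS:
            self.depth -= 1  # the element has been closed


PAGE = """<html>
<head>
<title>Welcome</title>
</head>
<body>
<img src="/images/logo.jpg" height="50px" width="200px" border="0" />
<h1>Welcome to Example.com</h1>
<p>The full list of examples is on <a href="/example-list.html">this Web page.</a></p>
</body>
</html>"""

parser = OutlinePrinter()
parser.feed(PAGE)
parser.close()
print("\n".join(parser.outline))
```

The printed outline shows head and body one level inside html, and the a element one level inside its p element.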
Some tags carry attributes inside the angle brackets as well as a tag name, for example the <a> and <img> tags. Attributes contain extra information about the tag that the browser uses to render the Web page. The <img> tag in the example above tells the browser to embed an image in the rendered Web page, and it has the following attributes:
src. This refers to the Uniform Resource Locator (URL) of the image to embed in the page.
height. This denotes how high the image should be in the rendered Web page – in this case, 50 pixels.
width. This denotes how wide the image should be in the rendered Web page – in this case, 200 pixels.
border. This denotes the width of the border around the image – in this case, the border is set to 0 so the browser should not display a border.
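The attributes listed above can be read out of the tag programmatically. This minimal sketch, again using Python's standard html.parser module, collects the attributes of every <img> tag it encounters; the ImageAttributeReader class name is invented for the example.

```python
from html.parser import HTMLParser


class ImageAttributeReader(HTMLParser):
    """Collect the attributes of each <img> tag as a dictionary."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs taken from inside the tag.
        if tag == "img":
            self.images.append(dict(attrs))


reader = ImageAttributeReader()
reader.feed('<img src="/images/logo.jpg" height="50px" width="200px" border="0" />')
reader.close()

for name, value in reader.images[0].items():
    print(f"{name} = {value}")
```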
Note that the HTML document does not contain the image itself. Instead, the document contains only the URL of the image. When the browser renders this document, it performs the following steps:
The browser sees the <img> tag in the document and recognizes that it should display an embedded image.
The browser creates an HTTP GET request for the specified URL for the image file, and sends this request to the Web server.
When the server has responded by sending the image, the browser embeds the image in the rendered Web page.
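The second of these steps relies on the browser turning the src value, which is relative to the site, into a full URL before it can request the image. A rough sketch of that step, assuming the page was loaded from http://www.example.com/index.html, might look like this; it only works out what the request would contain and does not contact any server.

```python
from urllib.parse import urljoin, urlsplit

# Address of the page being rendered (assumed for this example).
page_url = "http://www.example.com/index.html"
# The src attribute taken from the <img> tag in the page.
image_src = "/images/logo.jpg"

# Resolve the src value against the page address to get the full image URL.
image_url = urljoin(page_url, image_src)

# The first lines of the GET request the browser would send for the image.
parts = urlsplit(image_url)
request_line = f"GET {parts.path} HTTP/1.1"
host_header = f"Host: {parts.netloc}"

print(image_url)
print(request_line)
print(host_header)
```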