A Uniform Resource Locator (URL) is a compact representation of the location and access method for a resource located on the Internet. Each URL consists of a scheme (such as HTTP, HTTPS, or FTP) and a scheme-specific string. This string can also include a combination of a directory path, search string, or name of the resource. The WinINet functions provide the ability to create, combine, break down, and canonicalize URLs. For more information on URLs, see RFC 1738, Uniform Resource Locators (URL).
The URL functions operate in a task-oriented manner: they do not verify the content or format of the URL passed to them. The calling application should track its use of these functions to ensure that the data is in the intended format. For example, when no flags are used, the InternetCanonicalizeUrl function converts the character "%" into the escape sequence "%25". If InternetCanonicalizeUrl is then called on the already canonicalized URL, "%25" is converted into "%2525", which no longer represents the original character.
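The double-encoding problem can be reproduced with a small, portable model. The sketch below does not call WinINet. `percent_encode` is a hypothetical helper that, as a simplifying assumption, escapes only "%" and the space character as %XX. Running it twice on the same string shows "%" turning into "%25" and then into "%2525".

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified encoder (not WinINet): escapes only '%' and
 * ' ' as %XX. Writes a NUL-terminated result to out and returns its
 * length, or (size_t)-1 if out is too small. */
size_t percent_encode(const char *in, char *out, size_t out_size)
{
    static const char hex[] = "0123456789ABCDEF";
    size_t n = 0;

    for (; *in != '\0'; ++in) {
        unsigned char c = (unsigned char)*in;
        if (strchr("% ", c) != NULL) {          /* character must be escaped */
            if (n + 3 >= out_size)
                return (size_t)-1;
            out[n++] = '%';
            out[n++] = hex[c >> 4];
            out[n++] = hex[c & 0x0F];
        } else {                                /* copy other characters as is */
            if (n + 1 >= out_size)
                return (size_t)-1;
            out[n++] = (char)c;
        }
    }
    if (n >= out_size)                          /* only when out_size == 0 */
        return (size_t)-1;
    out[n] = '\0';
    return n;
}
```

Called once on "100%", the function yields "100%25". Called again on that result, it yields "100%2525". A server decodes "100%2525" back to "100%25", not to the intended "100%".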
What Is a Canonicalized URL?
To access resources through the Internet, a URL must follow the accepted URL syntax and semantics. Canonicalization is the process of formatting a URL so that it conforms to this syntax and these semantics.
Characters that must be encoded include the following: characters that have no corresponding graphic character in the US-ASCII coded character set (hexadecimal 80-FF, which are not used in US-ASCII, and hexadecimal 00-1F and 7F, which are control characters); blank spaces; "%" (which is used to encode other characters); and unsafe characters (<, >, ", #, {, }, |, \, ^, ~, [, ], and ').
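The rules above can be expressed as a small classification function. This sketch follows the list in this section literally. `must_encode` is a hypothetical name, and the function makes no claim about the exact character set that InternetCanonicalizeUrl uses internally.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: returns true if byte c must be percent-encoded
 * under the rules listed in this section. */
bool must_encode(unsigned char c)
{
    if (c >= 0x80)                  /* 80-FF: not used in US-ASCII */
        return true;
    if (c <= 0x1F || c == 0x7F)     /* control characters */
        return true;
    if (c == ' ' || c == '%')       /* blank space and the escape character */
        return true;
    /* Unsafe characters as listed above. c is nonzero here, so strchr
       cannot match the string's terminating NUL. */
    return strchr("<>\"#{}|\\^~[]'", c) != NULL;
}
```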
Using the WinINet Functions to Handle URLs
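The following table summarizes the URL functions.
| Function | Description |
| --- | --- |
| InternetCanonicalizeUrl | Converts a URL into canonical form. |
| InternetCombineUrl | Combines a base URL and a relative URL, and canonicalizes the result. |
| InternetCrackUrl | Separates a URL into its component parts. |
| InternetCreateUrl | Creates a URL from the components in a URL_COMPONENTS structure. |
| InternetOpenUrl | Opens a connection to an FTP, HTTP, or HTTPS resource specified by a URL. |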
Canonicalizing URLs
Canonicalizing a URL converts a URL that might contain unsafe characters, such as blank spaces and reserved characters, into an accepted format.
The InternetCanonicalizeUrl function can be used to canonicalize URLs. Because this function is task-oriented, the application should track its use carefully. InternetCanonicalizeUrl does not check whether the URL passed to it has already been canonicalized, nor does it verify that the URL it returns is valid.
The following five flags control how InternetCanonicalizeUrl handles a particular URL, and they can be used in combination. If no flags are specified, the function encodes the URL by default.
| Value | Meaning |
| --- | --- |
| ICU_BROWSER_MODE | Do not encode or decode characters after "#" or "?", and do not remove trailing white space after "?". If this value is not specified, the entire URL is encoded, and trailing white space is removed. |
| ICU_DECODE | Convert all %XX sequences to characters, including escape sequences, before the URL is parsed. |
| ICU_ENCODE_SPACES_ONLY | Encode spaces only. |
| ICU_NO_ENCODE | Do not convert unsafe characters to escape sequences. |
| ICU_NO_META | Do not remove meta sequences (such as "." and "..") from the URL. |
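Meta sequences are the "." and ".." segments of a URL path. The portable sketch below collapses them in an absolute path, in the spirit of the dot-segment removal described in RFC 3986. It only illustrates what ICU_NO_META turns off and is not WinINet's algorithm. `remove_meta_sequences` and the fixed segment limit are assumptions made for this example.

```c
#include <stddef.h>
#include <string.h>

#define MAX_SEGMENTS 64   /* assumption: enough for this illustration */

/* Hypothetical helper: collapses "." and ".." segments in an absolute
 * path such as "/a/b/../c/./d". Returns 0 on success, or -1 if the path
 * is not absolute, has too many segments, or out is too small. */
int remove_meta_sequences(const char *path, char *out, size_t out_size)
{
    const char *seg_start[MAX_SEGMENTS];
    size_t seg_len[MAX_SEGMENTS];
    size_t depth = 0, n = 0, i;
    int trailing_slash = 0;
    const char *p = path;

    if (path[0] != '/')
        return -1;

    while (*p != '\0') {                     /* p points at a '/' here */
        const char *start = p + 1;
        const char *slash = strchr(start, '/');
        size_t len = slash ? (size_t)(slash - start) : strlen(start);

        if (len == 1 && start[0] == '.') {
            trailing_slash = 1;              /* "/a/." becomes "/a/" */
        } else if (len == 2 && start[0] == '.' && start[1] == '.') {
            if (depth > 0)
                depth--;                     /* ".." drops the previous segment */
            trailing_slash = 1;
        } else {
            if (depth == MAX_SEGMENTS)
                return -1;
            seg_start[depth] = start;        /* keep an ordinary segment */
            seg_len[depth] = len;
            depth++;
            trailing_slash = 0;
        }
        p = start + len;                     /* next '/' or the terminator */
    }

    if (out_size < 2)
        return -1;
    if (depth == 0) {                        /* everything collapsed to the root */
        out[0] = '/';
        out[1] = '\0';
        return 0;
    }
    for (i = 0; i < depth; ++i) {            /* rebuild "/seg1/seg2/..." */
        if (n + 1 + seg_len[i] >= out_size)
            return -1;
        out[n++] = '/';
        memcpy(out + n, seg_start[i], seg_len[i]);
        n += seg_len[i];
    }
    if (trailing_slash) {
        if (n + 1 >= out_size)
            return -1;
        out[n++] = '/';
    }
    out[n] = '\0';
    return 0;
}
```

For example, "/a/b/../c/./d" becomes "/a/c/d". A trailing ".." such as "/a/b/.." leaves a trailing slash, giving "/a/".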
The ICU_DECODE flag should be used only on canonicalized URLs, because it assumes that all %XX sequences are escape codes and converts them into the characters indicated by the code. If the URL contains a "%" character that is not part of an escape code, ICU_DECODE still treats it as one, which might cause InternetCanonicalizeUrl to create an invalid URL.
To have InternetCanonicalizeUrl return a completely decoded URL, specify both the ICU_DECODE and ICU_NO_ENCODE flags. This combination assumes that the URL passed to InternetCanonicalizeUrl has already been canonicalized.
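The following portable sketch shows why decoding is safe only on canonicalized input. `percent_decode` is a hypothetical decoder. Like the ICU_DECODE behavior described above, it treats every "%" followed by two hexadecimal digits as an escape code. Its handling of a "%" without two hex digits after it (copying the "%" unchanged) is an assumption of this sketch, not documented WinINet behavior.

```c
#include <stddef.h>

/* Returns the value of a hexadecimal digit, or -1 if c is not one. */
static int hex_value(int c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Hypothetical decoder: replaces every %XX in s, in place, with the byte
 * it names and returns the new length. Short-circuit evaluation keeps
 * it from reading past the terminator after a trailing '%'. */
size_t percent_decode(char *s)
{
    char *src = s, *dst = s;
    int hi, lo;

    while (*src != '\0') {
        if (src[0] == '%' &&
            (hi = hex_value((unsigned char)src[1])) >= 0 &&
            (lo = hex_value((unsigned char)src[2])) >= 0) {
            *dst++ = (char)(hi * 16 + lo);   /* treat as an escape code */
            src += 3;
        } else {
            *dst++ = *src++;                 /* ordinary character */
        }
    }
    *dst = '\0';
    return (size_t)(dst - s);
}
```

Decoding the canonicalized "/my%20file.htm" correctly restores the space. In "/sale/100%BEST", however, the "%" was meant literally. Because "%BE" looks like an escape code, the decoder silently replaces it with the byte 0xBE.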
Combining Base and Relative URLs
A relative URL is a compact representation of the location of a resource relative to an absolute base URL. The base URL must be known to the parser and usually includes the scheme, network location, and parts of the URL path. An application can call InternetCombineUrl to combine the relative URL with its base URL. InternetCombineUrl also canonicalizes the resulting URL.
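A rough, portable sketch of resolving a relative URL against a base URL follows. `combine_url` is hypothetical and deliberately simplified. It handles only three cases: references that are already absolute (detected, as an assumption, by the presence of "://"), references that start with "/", and plain relative paths. Unlike InternetCombineUrl, it does not canonicalize the result; for example, it leaves "." and ".." segments in place.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified combiner. Returns 0 on success, or -1 if the
 * base has no "://" or out is too small. */
int combine_url(const char *base, const char *rel, char *out, size_t out_size)
{
    const char *scheme_end = strstr(base, "://");
    const char *path_start, *last_slash;
    int written;

    if (strstr(rel, "://") != NULL) {
        /* Already absolute: the base is ignored. */
        written = snprintf(out, out_size, "%s", rel);
    } else {
        if (scheme_end == NULL)
            return -1;
        path_start = strchr(scheme_end + 3, '/');   /* first '/' after host */
        if (rel[0] == '/') {
            /* Absolute path: keep only scheme and network location. */
            size_t prefix = path_start ? (size_t)(path_start - base)
                                       : strlen(base);
            written = snprintf(out, out_size, "%.*s%s", (int)prefix, base, rel);
        } else if (path_start == NULL) {
            /* Base has no path: the relative path hangs off the root. */
            written = snprintf(out, out_size, "%s/%s", base, rel);
        } else {
            /* Relative path: replace the base's last path segment. */
            last_slash = strrchr(path_start, '/');
            written = snprintf(out, out_size, "%.*s%s",
                               (int)(last_slash - base + 1), base, rel);
        }
    }
    return (written < 0 || (size_t)written >= out_size) ? -1 : 0;
}
```

For example, "intro.htm" resolved against "http://www.example.com/docs/guide/index.htm" gives "http://www.example.com/docs/guide/intro.htm". Resolving "/images/logo.png" against the same base gives "http://www.example.com/images/logo.png".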
Cracking URLs
The InternetCrackUrl function separates a URL into its component parts and returns the components indicated by the URL_COMPONENTS structure that is passed to the function.
The components that make up the URL_COMPONENTS structure are the scheme number, host name, port number, user name, password, URL path, and additional information (such as search parameters). Each component, except the scheme and port numbers, has a string member that holds the information and a member that holds the length of the string member. The scheme and port numbers have only a member that stores the corresponding value; both are returned on every successful call to InternetCrackUrl.
To retrieve a particular component in the URL_COMPONENTS structure, set the member that stores the string length of that component to a nonzero value. The string member can be either the address of a buffer or NULL.
If the pointer member contains the address of a buffer, the string length member must contain the size of that buffer. InternetCrackUrl copies the component information as a string into the buffer and stores the string length in the string length member.
If the pointer member is NULL, the string length member can be set to any nonzero value. InternetCrackUrl stores the address of the first character of the component within the URL string and sets the string length member to the number of characters that belong to the component.
After the call, each pointer member that was NULL with a nonzero length member points to the appropriate starting point in the URL string. Because such a component is not zero-terminated, use the length stored in the length member to determine where the component's information ends.
To finish initializing the URL_COMPONENTS structure properly, set the dwStructSize member to the size of the URL_COMPONENTS structure, in bytes.
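The NULL-pointer mode of InternetCrackUrl returns each component as a pointer into the original URL plus a length, rather than as a separate zero-terminated string. The portable sketch below mimics that contract for the host name only. `struct url_piece` and `find_host` are hypothetical, and the parser handles only the simple form scheme://[user[:password]@]host[:port][/path] (no IPv6 literals).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one URL_COMPONENTS string member in the
 * "NULL pointer, nonzero length" mode: ptr points into the original
 * URL and len marks where the component ends. Nothing is copied, so
 * the component is NOT NUL-terminated. */
struct url_piece {
    const char *ptr;
    size_t len;
};

/* Locates the host name in scheme://[user[:password]@]host[:port][/path].
 * Returns 0 on success, or -1 if the URL has no "://". */
int find_host(const char *url, struct url_piece *host)
{
    const char *start = strstr(url, "://");
    const char *end;
    const char *p;

    if (start == NULL)
        return -1;
    start += 3;                              /* skip "://" */
    end = start + strcspn(start, "/?#");     /* end of the authority part */

    /* Skip any user name and password: the host follows the last '@'. */
    for (p = start; p < end; ++p)
        if (*p == '@')
            start = p + 1;

    host->ptr = start;
    host->len = strcspn(start, ":/?#");      /* stop at port, path, or query */
    return 0;
}
```

To print such a component, bound it by its length, for example `printf("%.*s\n", (int)h.len, h.ptr);`. Printing `h.ptr` as an ordinary string would run on into the port, path, and query.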
The following example cracks the URL in the edit box, IDC_PreOpen1, and lists its components in the list box, IDC_PreOpenList. The control IDs are passed to the Cracker function as parameters. In this mode, each string component returned by InternetCrackUrl points into the URL buffer and is not zero-terminated. To display one component at a time, the listURLpart helper therefore copies exactly the number of characters given by the component's length member.
#include <windows.h>
#include <strsafe.h>
#include <wininet.h>
#pragma comment(lib, "wininet.lib")
#pragma comment(lib, "user32.lib")

// listURLpart is overloaded below, so compile this sample as C++.

#define CRACKER_BUFFER_SIZE MAX_PATH

// For sample source code implementing the InternetErrorOut( )
// function referenced below, see the "Handling Errors" topic
// under "Using WinInet".
extern BOOL WINAPI InternetErrorOut( HWND hWnd, DWORD dwError,
                                     LPCTSTR szFailingFunctionName );

// Forward declarations of the listURLpart helper functions:
BOOL listURLpart( HWND hDlg, int nListBoxId,
                  LPCTSTR szPartName, LPCTSTR part, DWORD partLength );
BOOL listURLpart( HWND hDlg, int nListBoxId,
                  LPCTSTR szPartName, int partValue );

// Static list describing the URL scheme types enumerated in
// INTERNET_SCHEME. INTERNET_SCHEME_PARTIAL is -2, so an nScheme
// value is offset by 2 to index this array.
static LPCTSTR schemeType[] =
{
    TEXT( "[Partial URL]" ),     // 0
    TEXT( "[Unknown scheme]" ),  // 1
    TEXT( "[Default scheme]" ),  // 2
    TEXT( "FTP" ),               // 3
    TEXT( "Gopher" ),            // 4
    TEXT( "HTTP" ),              // 5
    TEXT( "HTTPS" ),             // 6
    TEXT( "File" ),              // 7
    TEXT( "News" ),              // 8
    TEXT( "MailTo" ),            // 9
    TEXT( "Socks" ),             // 10
    TEXT( "JavaScript" ),        // 11
    TEXT( "VBScript" )           // 12
};
#define CRACKER_SCHEME_TYPE_ARRAY_SIZE 13

BOOL WINAPI Cracker( HWND hDlg, int nURLtextBoxId, int nListBoxId )
{
    int i;
    size_t urlLength, nameLength;   // StringCchLength writes a size_t
    LPCTSTR failedFunctionName;
    TCHAR URL_buffer[CRACKER_BUFFER_SIZE];
    URL_COMPONENTS URLparts;

    // Start from a zeroed structure; this also sets every lpsz...
    // pointer member to NULL.
    ZeroMemory( &URLparts, sizeof( URLparts ) );
    URLparts.dwStructSize = sizeof( URLparts );

    // A nonzero length with a NULL pointer asks InternetCrackUrl to
    // return a pointer into URL_buffer plus the component's length.
    // These members determine which components are displayed.
    URLparts.dwSchemeLength    = 1;
    URLparts.dwHostNameLength  = 1;
    URLparts.dwUserNameLength  = 1;
    URLparts.dwPasswordLength  = 1;
    URLparts.dwUrlPathLength   = 1;
    URLparts.dwExtraInfoLength = 1;

    SendDlgItemMessage( hDlg, nListBoxId, LB_RESETCONTENT, 0, 0 );

    if( !GetDlgItemText( hDlg, nURLtextBoxId,
                         URL_buffer, CRACKER_BUFFER_SIZE ) )
    {
        failedFunctionName = TEXT( "GetDlgItemText" );
        goto CrackerError_01;
    }

    if( FAILED( StringCchLength( URL_buffer, CRACKER_BUFFER_SIZE,
                                 &urlLength ) ) )
    {
        failedFunctionName = TEXT( "StringCchLength" );
        goto CrackerError_01;
    }

    // Pass the length in TCHARs; strlen( ) would not compile in a
    // Unicode build.
    if( !InternetCrackUrl( URL_buffer, (DWORD) urlLength, 0,
                           &URLparts ) )
    {
        failedFunctionName = TEXT( "InternetCrackUrl" );
        goto CrackerError_01;
    }

    failedFunctionName = TEXT( "listURLpart" );

    i = URLparts.nScheme + 2;
    if( ( i >= 0 ) && ( i < CRACKER_SCHEME_TYPE_ARRAY_SIZE ) )
    {
        if( FAILED( StringCchLength( schemeType[i],
                                     CRACKER_BUFFER_SIZE,
                                     &nameLength ) ) ||
            !listURLpart( hDlg, nListBoxId, TEXT( "Scheme type" ),
                          schemeType[i], (DWORD) nameLength ) )
            goto CrackerError_01;
    }

    if( !listURLpart( hDlg, nListBoxId, TEXT( "Scheme text" ),
                      URLparts.lpszScheme,
                      URLparts.dwSchemeLength ) ||
        !listURLpart( hDlg, nListBoxId, TEXT( "Host name" ),
                      URLparts.lpszHostName,
                      URLparts.dwHostNameLength ) ||
        !listURLpart( hDlg, nListBoxId, TEXT( "Port number" ),
                      (int) URLparts.nPort ) ||
        !listURLpart( hDlg, nListBoxId, TEXT( "User name" ),
                      URLparts.lpszUserName,
                      URLparts.dwUserNameLength ) ||
        !listURLpart( hDlg, nListBoxId, TEXT( "Password" ),
                      URLparts.lpszPassword,
                      URLparts.dwPasswordLength ) ||
        !listURLpart( hDlg, nListBoxId, TEXT( "Path" ),
                      URLparts.lpszUrlPath,
                      URLparts.dwUrlPathLength ) ||
        !listURLpart( hDlg, nListBoxId, TEXT( "Extra information" ),
                      URLparts.lpszExtraInfo,
                      URLparts.dwExtraInfoLength ) )
        goto CrackerError_01;

    return( TRUE );

CrackerError_01:
    // For sample source code of the InternetErrorOut( ) function
    // referenced below, see the "Handling Errors"
    // topic under "Using WinInet".
    InternetErrorOut( hDlg, GetLastError( ), failedFunctionName );
    return FALSE;
}

// listURLpart( ) helper function for string parts
BOOL listURLpart( HWND hDlg, int nListBoxId,
                  LPCTSTR szPartName, LPCTSTR part, DWORD partLength )
{
    TCHAR outputBuffer[CRACKER_BUFFER_SIZE];
    LPTSTR nextStart;
    size_t nextSize;

    if( partLength == 0 )   // Skip empty components
        return( TRUE );

    // Build "name: value". StringCchCopyNEx copies exactly partLength
    // characters (the component is not zero-terminated) and
    // zero-terminates outputBuffer on success.
    if( FAILED( StringCchCopyEx( outputBuffer, CRACKER_BUFFER_SIZE,
                                 szPartName, &nextStart,
                                 &nextSize, 0 ) ) ||
        FAILED( StringCchCopyEx( nextStart, nextSize, TEXT( ": " ),
                                 &nextStart, &nextSize, 0 ) ) ||
        FAILED( StringCchCopyNEx( nextStart, nextSize, part,
                                  (size_t) partLength,
                                  &nextStart, &nextSize, 0 ) ) )
        return( FALSE );

    if( SendDlgItemMessage( hDlg, nListBoxId, LB_ADDSTRING, 0,
                            (LPARAM) outputBuffer ) < 0 )
        return( FALSE );
    return( TRUE );
}

// listURLpart( ) helper function for numeric parts
BOOL listURLpart( HWND hDlg, int nListBoxId,
                  LPCTSTR szPartName, int partValue )
{
    TCHAR outputBuffer[CRACKER_BUFFER_SIZE];

    if( FAILED( StringCchPrintf( outputBuffer, CRACKER_BUFFER_SIZE,
                                 TEXT( "%s: %d" ), szPartName,
                                 partValue ) ) ||
        ( SendDlgItemMessage( hDlg, nListBoxId, LB_ADDSTRING, 0,
                              (LPARAM) outputBuffer ) < 0 ) )
        return( FALSE );
    return( TRUE );
}
Creating URLs
The InternetCreateUrl function uses the information in the URL_COMPONENTS structure to create a Uniform Resource Locator.
The components that make up the URL_COMPONENTS structure are the scheme, host name, port number, user name, password, URL path, and additional information (such as search parameters). Each component, except the port number, has a string member that holds the information and a member that holds the length of the string member.
For each required component, the pointer member should contain the address of the buffer holding the information. The length member should be set to zero if the pointer member contains the address of a zero-terminated string; the length member should be set to the string length if the pointer member contains the address of a string that is not zero-terminated. The pointer member of any components that are not required must be NULL.
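The zero-length convention can be illustrated with a portable model of URL assembly. The names below (`struct component`, `create_url`) are hypothetical, and only a subset of the URL_COMPONENTS fields is modeled. Treating a port of 0 as "omit the port" is an assumption of this sketch, not InternetCreateUrl's documented behavior.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical mirror of part of URL_COMPONENTS: ptr == NULL means the
 * component is not used; len == 0 means ptr is NUL-terminated;
 * otherwise len is the number of characters to take from ptr. */
struct component {
    const char *ptr;
    size_t len;
};

/* Appends prefix + component to out at offset *n; 0 on success. */
static int append(char *out, size_t out_size, size_t *n,
                  const char *prefix, struct component c)
{
    size_t plen, clen;

    if (c.ptr == NULL)
        return 0;                            /* component not required */
    plen = strlen(prefix);
    clen = c.len != 0 ? c.len : strlen(c.ptr);
    if (*n + plen + clen >= out_size)
        return -1;                           /* no room for text + NUL */
    memcpy(out + *n, prefix, plen);
    memcpy(out + *n + plen, c.ptr, clen);
    *n += plen + clen;
    out[*n] = '\0';
    return 0;
}

/* Builds scheme://host[:port][path][extra]; returns 0 or -1. */
int create_url(struct component scheme, struct component host,
               unsigned port, struct component path,
               struct component extra, char *out, size_t out_size)
{
    size_t n = 0;
    char port_text[12];
    struct component port_comp = { NULL, 0 };

    if (out_size == 0 || scheme.ptr == NULL || host.ptr == NULL)
        return -1;
    out[0] = '\0';
    if (port != 0) {
        snprintf(port_text, sizeof port_text, "%u", port);
        port_comp.ptr = port_text;           /* NUL-terminated, so len 0 */
    }
    if (append(out, out_size, &n, "", scheme) ||
        append(out, out_size, &n, "://", host) ||
        append(out, out_size, &n, ":", port_comp) ||
        append(out, out_size, &n, "", path) ||
        append(out, out_size, &n, "", extra))
        return -1;
    return 0;
}
```

In the tests, a host of `{ "www.example.com/ignored", 15 }` contributes only its first 15 characters. This shows how a nonzero length lets a caller pass a string that is not zero-terminated where it should end.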
Accessing URLs Directly
FTP and HTTP resources on the Internet can be accessed directly by using the InternetOpenUrl, InternetReadFile, and InternetFindNextFile functions.
InternetOpenUrl opens a connection to the resource at the URL passed to the function. Once the connection is made, the next step depends on the resource. If the resource is a file, InternetReadFile can download it. If the resource is a directory, InternetFindNextFile can enumerate the files within it (except when using CERN proxies). For more information on InternetReadFile, see Reading Files. For more information on InternetFindNextFile, see Finding the Next File.
For applications that need to operate through a CERN proxy, InternetOpenUrl can be used to access FTP directories and files. The FTP requests are packaged to look like HTTP requests, which the CERN proxy accepts.
InternetOpenUrl uses the HINTERNET handle created by the InternetOpen function and the URL of the resource. The URL must include the scheme (http:, ftp:, file: [for a local file], or https: [for Hypertext Transfer Protocol Secure]) and the network location (such as www.microsoft.com). The URL can also include a path (for example, /isapi/gomscom.asp?TARGET=/windows/feature/) and a resource name (for example, default.htm). For HTTP or HTTPS requests, additional headers can be supplied through the lpszHeaders parameter.
InternetQueryDataAvailable, InternetFindNextFile, InternetReadFile, and InternetSetFilePointer (HTTP or HTTPS URLs only) can use the handle created by InternetOpenUrl to download the resource.
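InternetReadFile reports the end of the data by succeeding with zero bytes read, so a download loop keeps calling it until that happens. The portable sketch below models that loop without WinINet. `struct mock_resource` and `mock_read` are hypothetical stand-ins for the handle returned by InternetOpenUrl and for InternetReadFile. Real code would call `InternetReadFile(hUrl, buffer, sizeof buffer, &dwRead)` and check GetLastError on failure.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory resource standing in for an InternetOpenUrl
 * handle. */
struct mock_resource {
    const char *data;
    size_t size;
    size_t pos;
};

/* Hypothetical reader with InternetReadFile's end-of-data contract:
 * it succeeds and reports 0 bytes read once the data is exhausted. */
static bool mock_read(struct mock_resource *r, void *buf,
                      size_t to_read, size_t *bytes_read)
{
    size_t left = r->size - r->pos;
    size_t n = to_read < left ? to_read : left;

    memcpy(buf, r->data + r->pos, n);
    r->pos += n;
    *bytes_read = n;
    return true;
}

/* Reads the whole resource into out (NUL-terminated). Returns the
 * number of bytes read, or -1 on a read error or if out is too small. */
long download_all(struct mock_resource *r, char *out, size_t out_size)
{
    char chunk[4];              /* deliberately tiny to force many reads */
    size_t total = 0, got;

    if (out_size == 0)
        return -1;
    for (;;) {
        if (!mock_read(r, chunk, sizeof chunk, &got))
            return -1;          /* real code: check GetLastError() */
        if (got == 0)
            break;              /* success with 0 bytes: end of resource */
        if (total + got >= out_size)
            return -1;
        memcpy(out + total, chunk, got);
        total += got;
    }
    out[total] = '\0';
    return (long)total;
}
```

Using a small chunk size makes the loop run several times even for short data. This exercises the termination condition rather than relying on a single read returning everything.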
The following diagram illustrates which handles to use with each function.
The root HINTERNET handle created by InternetOpen is used by InternetOpenUrl. The HINTERNET handle created by InternetOpenUrl can be used by InternetQueryDataAvailable, InternetReadFile, InternetFindNextFile (not shown in the diagram), and InternetSetFilePointer (HTTP or HTTPS URLs only).
For more information, see HINTERNET Handles.
Build date: 11/19/2009