
CreateUri function

Creates a new IUri instance, and initializes it from a Uniform Resource Identifier (URI) string. CreateUri also normalizes and validates the URI.

Syntax

STDAPI CreateUri(
  _In_       LPCWSTR   pwzURI,
  _In_       DWORD     dwFlags = Uri_CREATE_CANONICALIZE,
  _Reserved_ DWORD_PTR dwReserved,
  _Out_      IUri      **ppURI
);

Parameters

pwzURI [in]

A constant pointer to a UTF-16 character string that specifies the URI.

dwFlags [in]

A valid combination of the following flags.

Uri_CREATE_ALLOW_RELATIVE (0x0001)

Default. If the scheme is unspecified and not implicitly "file," assume relative.

Uri_CREATE_ALLOW_IMPLICIT_WILDCARD_SCHEME (0x0002)

If the scheme is unspecified and not implicitly "file," assume wildcard.

Uri_CREATE_ALLOW_IMPLICIT_FILE_SCHEME (0x0004)

Default. If the scheme is unspecified and URI starts with a drive letter (X:) or UNC path (\\), assume "file."

Uri_CREATE_NOFRAG (0x0008)

If there is a query string, don't look for a fragment.

Uri_CREATE_NO_CANONICALIZE (0x0010)

Do not canonicalize the scheme, host, authority, path, query, or fragment.

Uri_CREATE_CANONICALIZE (0x0100)

Default. Canonicalize the scheme, host, authority, path, query, and fragment.

Uri_CREATE_FILE_USE_DOS_PATH (0x0020)

Use DOS path compatibility mode to create "file" URIs.

Uri_CREATE_DECODE_EXTRA_INFO (0x0040)

Default. Perform the percent-encoding and percent-decoding canonicalizations on the query and fragment. This flag takes precedence over Uri_CREATE_NO_CANONICALIZE.

Uri_CREATE_NO_DECODE_EXTRA_INFO (0x0080)

Do not perform the percent-encoding or percent-decoding canonicalizations on the query and fragment. This flag takes precedence over Uri_CREATE_CANONICALIZE.

Uri_CREATE_CRACK_UNKNOWN_SCHEMES (0x0200)

Default. Hierarchical URIs with unrecognized schemes will be treated like hierarchical URIs.

Uri_CREATE_NO_CRACK_UNKNOWN_SCHEMES (0x0400)

Hierarchical URIs with unrecognized schemes will be treated like opaque URIs.

Uri_CREATE_PRE_PROCESS_HTML_URI (0x0800)

Default. Perform preprocessing on the URI to remove control characters and white space, as if the URI had come from the raw href value of an HTML page.

Uri_CREATE_NO_PRE_PROCESS_HTML_URI (0x1000)

Do not perform preprocessing to remove control characters and white space as appropriate.

Uri_CREATE_IE_SETTINGS (0x2000)

Use Internet Explorer registry settings to determine default URL-parsing behavior.

Uri_CREATE_NO_IE_SETTINGS (0x4000)

Default. Do not use Internet Explorer registry settings.

Uri_CREATE_NO_ENCODE_FORBIDDEN_CHARACTERS (0x8000)

Do not percent-encode characters that are forbidden by RFC-3986. Use with Uri_CREATE_FILE_USE_DOS_PATH to create file monikers.

Uri_CREATE_NORMALIZE_INTL_CHARACTERS (0x00010000)

Default. Percent encode all extended Unicode characters, then decode all percent encoded extended Unicode characters (except those identified as dangerous).

dwReserved [in]

Reserved. Must be set to 0.

ppURI [out]

An IUri interface pointer that receives the new instance.

Return value

Returns one of the following values.

Return code Description
S_OK

Success.

E_INVALIDARG

dwFlags conflict, or ppURI is NULL.

E_OUTOFMEMORY

There is insufficient memory to create the IUri.

INET_E_INVALID_URL

The string does not contain a recognized URI format.

INET_E_SECURITY_PROBLEM

The URI contains syntax that attempts to bypass security.

E_FAIL

Unknown error while parsing the URI.

 

Remarks

CreateUri returns E_INVALIDARG if conflicting flags are specified in dwFlags; for example, Uri_CREATE_DECODE_EXTRA_INFO together with Uri_CREATE_NO_DECODE_EXTRA_INFO, or Uri_CREATE_ALLOW_RELATIVE together with Uri_CREATE_ALLOW_IMPLICIT_WILDCARD_SCHEME.

INET_E_SECURITY_PROBLEM is returned if the URI specifies userinfo but the Windows Internet Explorer feature control FEATURE_HTTP_USERNAME_PASSWORD_DISABLE is enabled.
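The conflicting-flags rule can be illustrated with a small portable check. This is only a sketch, not the Urlmon implementation: the constant names and the HasConflictingFlags helper are hypothetical, the values are copied from the dwFlags table, and only the first two pairs are named as conflicts in the text above. The remaining pairs are assumed to conflict based on their names.

```cpp
#include <cstdint>

// Flag values copied from the dwFlags table; defined locally so the sketch
// compiles without Urlmon.h. The names are hypothetical stand-ins.
constexpr std::uint32_t kAllowRelative         = 0x0001;
constexpr std::uint32_t kAllowImplicitWildcard = 0x0002;
constexpr std::uint32_t kNoCanonicalize        = 0x0010;
constexpr std::uint32_t kDecodeExtraInfo       = 0x0040;
constexpr std::uint32_t kNoDecodeExtraInfo     = 0x0080;
constexpr std::uint32_t kCanonicalize          = 0x0100;
constexpr std::uint32_t kCrackUnknownSchemes   = 0x0200;
constexpr std::uint32_t kNoCrackUnknownSchemes = 0x0400;

// Returns true if both members of any mutually exclusive pair are set.
bool HasConflictingFlags(std::uint32_t flags)
{
    static const std::uint32_t kExclusivePairs[][2] = {
        {kAllowRelative, kAllowImplicitWildcard},        // named in Remarks
        {kDecodeExtraInfo, kNoDecodeExtraInfo},          // named in Remarks
        {kCanonicalize, kNoCanonicalize},                // assumed from names
        {kCrackUnknownSchemes, kNoCrackUnknownSchemes},  // assumed from names
    };
    for (const auto& pair : kExclusivePairs) {
        if ((flags & pair[0]) != 0 && (flags & pair[1]) != 0) {
            return true;
        }
    }
    return false;
}
```

A caller could run a check like this before calling CreateUri to report a clearer error than a bare E_INVALIDARG.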

Hierarchical vs. Opaque Protocol Schemes

Hierarchical URIs and opaque URIs are mutually exclusive. A hierarchical URI conforms to the RFC-defined syntax for URIs. (Refer to RFC3986: Uniform Resource Identifier (URI), Generic Syntax.) An opaque URI is parsed without an authority in the following manner.

scheme ":" path [ "#" fragment ]  

By default, all URIs are treated as hierarchical unless the Uri_CREATE_NO_CRACK_UNKNOWN_SCHEMES flag is set. (Unknown protocol schemes are those not defined in the URL_SCHEME enumeration.) The two flags Uri_CREATE_ALLOW_RELATIVE and Uri_CREATE_ALLOW_IMPLICIT_WILDCARD_SCHEME apply only if the input string is neither an implicit file path nor an absolute (hierarchical) URI. The syntax for relative URIs is a shortened form of the syntax for absolute URIs, in which some prefix of the URI is missing and the path segments "." and ".." are allowed to remain until the URI is combined with a base URI. The wildcard URI scheme can be stated explicitly as "*:[[//]authority][path]" or implicitly by the "authority[path]" form.
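The opaque form shown above (scheme ":" path [ "#" fragment ]) can be sketched as a simple split on the first colon and the first hash that follows it. The OpaqueUri struct and ParseOpaque function are hypothetical names for illustration. The sketch does no validation of scheme characters and is not how CreateUri is implemented.

```cpp
#include <string>

// Hypothetical holder for the three parts of an opaque URI.
struct OpaqueUri {
    std::string scheme;
    std::string path;
    std::string fragment;  // empty when the input has no '#'
    bool valid = false;    // false when no scheme could be found
};

// Splits input as: scheme ":" path [ "#" fragment ].
// No authority is extracted; "//host" stays part of the path.
OpaqueUri ParseOpaque(const std::string& input)
{
    OpaqueUri result;
    const std::string::size_type colon = input.find(':');
    if (colon == std::string::npos || colon == 0) {
        return result;  // no scheme, so not an opaque URI
    }
    result.scheme = input.substr(0, colon);

    const std::string::size_type hash = input.find('#', colon + 1);
    if (hash == std::string::npos) {
        result.path = input.substr(colon + 1);
    } else {
        result.path = input.substr(colon + 1, hash - colon - 1);
        result.fragment = input.substr(hash + 1);
    }
    result.valid = true;
    return result;
}
```

For an input such as "foo://host/path#top", this treats "//host/path" as the path, which is the practical difference from hierarchical parsing.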

CreateUri can parse URIs in both the URL syntax and the Uniform Resource Name (URN) syntax. The difference between URLs and URNs is whether there is a protocol that enables access to the identified resource. Accessing the resource identified by an IUri is outside the scope of the Consolidated URL (cURL) API.

Creating File Schemes from File Paths

There are two kinds of file scheme URIs. The first is the well-formed, or "healthy," URL style that supports query strings, fragments, percent-encoded octets, and so on. The other is essentially a DOS file path with "file://" prepended. This latter form is generated when Uri_CREATE_FILE_USE_DOS_PATH is set and should be used only for legacy communication.

Warning   Legacy file scheme URIs should be used only with legacy APIs that will not accept healthy file scheme URIs. Legacy file scheme URIs do not allow percent encoded octets, which can lead to ambiguity. Therefore, legacy file scheme URIs should not be used unless absolutely necessary.

 

The following is a comparison of the two forms of file scheme URIs.

DOSPATH:   C:\Windows\My Documents 100%20\file.txt
HEALTHY:   file:///C:/Windows/My%20Documents%20100%2520/file.txt
LEGACY:    file://C:\Windows\My Documents 100%20\file.txt

DOSPATH:   \\server\share\My Documents 100%20\file.txt
HEALTHY:   file://server/share/My%20Documents%20100%2520/file.txt
LEGACY:    file://\\server\share\My Documents 100%20\file.txt 
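The DOSPATH-to-HEALTHY mapping in the comparison above can be approximated in a few lines. This is a minimal sketch under stated assumptions: DosPathToFileUri is a hypothetical helper, the input is treated as single-byte text, and every byte other than unreserved characters, the colon, and path separators is percent-encoded. It reproduces the two rows shown but is not the algorithm CreateUri uses.

```cpp
#include <cctype>
#include <string>

// Converts a DOS or UNC path to a "healthy" file URI (illustrative only).
std::string DosPathToFileUri(const std::string& dosPath)
{
    static const char kHex[] = "0123456789ABCDEF";

    // UNC paths (\\server\share) put the server in the authority position.
    const bool isUnc = dosPath.compare(0, 2, "\\\\") == 0;
    std::string uri = isUnc ? "file://" : "file:///";

    for (std::string::size_type i = isUnc ? 2 : 0; i < dosPath.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(dosPath[i]);
        if (c == '\\') {
            uri += '/';  // backslash becomes forward slash
        } else if (std::isalnum(c) || c == '-' || c == '.' || c == '_' ||
                   c == '~' || c == ':' || c == '/') {
            uri += static_cast<char>(c);  // safe to copy as-is
        } else {
            uri += '%';  // everything else, including ' ' and '%'
            uri += kHex[c >> 4];
            uri += kHex[c & 0x0F];
        }
    }
    return uri;
}
```

Because the literal "%" in the path is itself encoded as "%25", the directory name "100%20" becomes "100%2520", which is why the healthy form is unambiguous where the legacy form is not.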

The Uri_CREATE_ALLOW_IMPLICIT_FILE_SCHEME flag allows the creation of a file scheme URI from a Microsoft Win32 file path. The flag does not change how the input string is interpreted; if a Win32 file path is passed in, CreateUri simply succeeds or fails depending on whether Uri_CREATE_ALLOW_IMPLICIT_FILE_SCHEME is set.

Understanding Canonicalization

Canonicalization, or conversion into the standard URI format, involves the following steps.

  1. The scheme is changed to lowercase.

  2. If the host is an IPv4 or IPv6 address, it is converted to normal form.

  3. If the host is a named host, it is changed to lowercase. Internationalized Domain Names (IDNs) with labels in Punycode are converted to Unicode.

  4. If the explicit port is the same as the default port for the scheme, it is removed.

  5. Backslash (\) characters in the path are changed to forward slash characters (/) in http, https, ftp, news, nntp, snews, and telnet schemes.

  6. If the URI has an authority but no path, the path is set to "/".

  7. Relative path segments "./" and "../" are removed, and the path is shortened as appropriate.

  8. Percent-encoded characters in the format "%XX" (where X is a hexadecimal digit) are decoded if they are in the unreserved set.

  9. Characters that are forbidden to appear in a URI are percent encoded. Forbidden characters are those that are neither in the "reserved" nor "unreserved" sets. The percent sign (%), which is used for percent encoding, is allowed. Refer to the following table for details.

    Class        Characters
    unreserved   alphanumeric, hyphen (-), period (.), underscore (_), and tilde (~)
    reserved     gen-delims + sub-delims
    gen-delims   colon (:), slash (/), question mark (?), hash (#), square brackets ([]), and at sign (@)
    sub-delims   exclamation point (!), dollar sign ($), ampersand (&), single quote ('), parentheses (()), asterisk (*), plus sign (+), comma (,), semicolon (;), and equal sign (=)
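Step 7 in the list above (removing "./" and "../" segments) can be sketched as a segment stack. RemoveDotSegments is a hypothetical helper written for illustration. It is a simplified take on the RFC 3986 approach, assumes dot segments are already percent-decoded, and may differ from CreateUri on edge cases such as relative paths that collapse to nothing.

```cpp
#include <string>
#include <vector>

// Removes "." and ".." segments from a path (illustrative only).
std::string RemoveDotSegments(const std::string& path)
{
    const bool absolute = !path.empty() && path[0] == '/';
    std::vector<std::string> kept;
    bool trailingSlash = false;

    std::string::size_type start = absolute ? 1 : 0;
    while (start <= path.size()) {
        std::string::size_type end = path.find('/', start);
        const bool isLast = (end == std::string::npos);
        if (isLast) {
            end = path.size();
        }
        const std::string segment = path.substr(start, end - start);

        if (segment == "." || segment == "..") {
            // ".." drops its logical parent; "." drops only itself.
            if (segment == ".." && !kept.empty()) {
                kept.pop_back();
            }
            trailingSlash = isLast;  // "/a/b/.." resolves to "/a/"
        } else if (isLast && segment.empty()) {
            trailingSlash = true;    // input ended with '/'
        } else {
            kept.push_back(segment);
        }
        start = end + 1;
    }

    std::string result = absolute ? "/" : "";
    bool first = true;
    for (const std::string& segment : kept) {
        if (!first) {
            result += '/';
        }
        result += segment;
        first = false;
    }
    if (trailingSlash && !kept.empty()) {
        result += '/';
    }
    return result;
}
```

Using a stack keeps the logic to one pass: each ".." only ever needs the most recently kept segment.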

     

The following is a raw URI value.

hTTp://us%45r%3Ainfo@examp%4CE.com:80/path/a/b/./c/../%2E%2E/Forbidden`<|> Characters

After canonicalization, the absolute URI appears as follows.

http://usEr%3Ainfo@example.com/path/a/Forbidden%60%3C%7C%3E%20Characters
  • In the username component, the %45 is decoded to "E" because it is in the unreserved set, while the %3A (:) is left encoded because the colon is a reserved character.
  • In the host component, the %4C is first decoded to "L" and then changed to lowercase.
  • The port "80" (the default port for http) is removed.
  • The "./" in the path is removed.
  • The "../" following the "c/" in the path is removed along with its logical parent, the "c/" path segment.
  • The %2E characters are in the unreserved set and are converted to "." forming "../". This new "../" is removed along with its logical parent path segment, which in this case is "b/".
  • All of the characters between "Forbidden" and "Characters" (including the space) are percent encoded because they are forbidden to appear in a URI.
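The percent-decoding and percent-encoding behavior in this walkthrough (steps 8 and 9) can be sketched as one pass over the string. CanonicalizeOctets and its helpers are hypothetical names. The sketch assumes single-byte input in the default "C" locale, does not lowercase the host, and is not the Urlmon implementation.

```cpp
#include <cctype>
#include <string>

// Unreserved set from the table: alphanumeric, '-', '.', '_', '~'.
bool IsUnreserved(unsigned char c)
{
    return std::isalnum(c) || c == '-' || c == '.' || c == '_' || c == '~';
}

// Reserved set from the table: gen-delims plus sub-delims.
bool IsReserved(unsigned char c)
{
    static const std::string kReserved = ":/?#[]@!$&'()*+,;=";
    return kReserved.find(static_cast<char>(c)) != std::string::npos;
}

// Returns 0-15 for a hexadecimal digit, or -1 otherwise.
int HexValue(unsigned char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

// Decodes %XX only when it names an unreserved character, and
// percent-encodes characters that are neither reserved nor unreserved.
std::string CanonicalizeOctets(const std::string& in)
{
    static const char kHex[] = "0123456789ABCDEF";
    std::string out;
    for (std::string::size_type i = 0; i < in.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(in[i]);
        if (c == '%' && i + 2 < in.size()) {
            const int hi = HexValue(static_cast<unsigned char>(in[i + 1]));
            const int lo = HexValue(static_cast<unsigned char>(in[i + 2]));
            if (hi >= 0 && lo >= 0) {
                const unsigned char decoded =
                    static_cast<unsigned char>(hi * 16 + lo);
                if (IsUnreserved(decoded)) {
                    out += static_cast<char>(decoded);  // e.g. %45 -> E
                } else {
                    out.append(in, i, 3);               // e.g. %3A stays
                }
                i += 2;
                continue;
            }
        }
        if (IsUnreserved(c) || IsReserved(c) || c == '%') {
            out += static_cast<char>(c);
        } else {
            out += '%';  // forbidden character: encode it
            out += kHex[c >> 4];
            out += kHex[c & 0x0F];
        }
    }
    return out;
}
```

Applied to the userinfo "us%45r%3Ainfo", this yields "usEr%3Ainfo", matching the first bullet above.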

Examples

The following example creates an IUri object from a null-terminated URI string and then uses IUri::GetHost to retrieve the host value.

// pwszUri is assumed to be a caller-supplied LPCWSTR, for example L"http://www.example.com/".
IUri *pIUri = NULL;
HRESULT hr = CreateUri(
    pwszUri,                    // NULL terminated URI
    Uri_CREATE_ALLOW_RELATIVE,  // Flags to control behavior
    0,                          // Reserved must be 0
    &pIUri);

if (SUCCEEDED(hr))
{
    BSTR bstrHost = NULL;
    hr = pIUri->GetHost(&bstrHost);

    if (S_OK == hr)
    {
        // Host exists. Do something with it.
        SysFreeString(bstrHost);
    }
    else if (S_FALSE == hr)
    {
        // No Host in this URI.
    }

    pIUri->Release();
}

Requirements

Minimum supported client

Windows XP with SP2

Minimum supported server

Windows Server 2003 with SP1

Product

Internet Explorer 7

Header

Urlmon.h

Library

Urlmon.lib

DLL

Urlmon.dll

See also

Reference

CreateUriFromMultiByteString

CreateUriWithFragment