Cutting Edge

Validating ASP.NET Query Strings

Dino Esposito

Code download available at: Cutting Edge 2007_03.exe (168 KB)

Contents

The Problem
Defining a Strategy
Declarative Query Strings
Coding the QueryString HTTP Module
Query String Validation
Considerations and Alternatives
Summary

For years, classic ASP developers implemented page authentication by inserting some generic code at the top of each page that would grab user credentials, attach a cookie, and redirect. All that repetitive code was swept away by the ASP.NET HTTP modules for authentication. As a result, ASP.NET applications don't have to link each and every page being secured to the authentication module of choice. Everything can be done declaratively through the web.config file and a bunch of external resources, such as the login page and the membership database.

ASP.NET also introduced other system modules and programming techniques that minimize repetitive code and rationalize the implementation of a Web application's common features. For example, site maps, anonymous users, and profiles are now built-in features and you no longer have to write or copy the code for them over and over.

The focus on security also led to the inclusion of a fair number of native barriers in the ASP.NET runtime, which saves the developer from the burden of cross-checking input data against at least some possible forms of attack. Of course, this doesn't mean that an ASP.NET application is secure by design, but it does mean that the security bar starts higher than in the past. However, it is still up to developers to raise it even higher.

ASP.NET pages are designed to post data to themselves and group input parameters in the body of an HTTP POST packet. Most ASP.NET applications don't use the query string to pass input data as frequently as classic ASP applications did. Nonetheless, the query string is still a legitimate way to import external data into an ASP.NET page. But who validates this data?

Recent statistics show that cross-site scripting (XSS) attacks are gaining momentum and they claim the lion's share of discovered attacks. Successful XSS attacks are always due to unvalidated or improperly validated input data, and more often than not, this data comes through the query string.

Starting with version 1.1, ASP.NET preprocesses any posted data (forms and query string), looking for suspicious combinations of characters that may be exploited by XSS attackers. But this barrier is not a silver bullet and, as Michael Howard says in his November 2006 MSDN Magazine article, "Secure Habits: 8 Simple Rules For Developing More Secure Code" (available at msdn.microsoft.com/msdnmag/issues/06/11/SecureHabits), you have to take responsibility. If your pages use query string parameters, you need to ensure that they are properly validated before use. How do you do that?

In this column, I build an HTTP module that reads an XML file where you have hardcoded the expected structure of the query string. The module then validates the query string of any requested page against the given schema. And you don't need to touch the code of any page. (For more on preventing XSS attacks, see the Microsoft Anti-Cross Site Scripting Library v1.5.)

The Problem

Developers can't afford to just leave pages that take input from the query string unattended. Values have to be validated and the format of the query string carefully checked. Such a validation process contains two distinct steps: static validation (which checks the type and existence of required parameters) and dynamic validation (which verifies whether specified values are coherent with the expectations of the rest of the code). Dynamic validation is specific to each page and can't be delegated to an external, page-agnostic component. In contrast, static validation relies on a list of general-purpose checks (parameter required, type, length) that can be executed without instantiating the page.

Just as in classic ASP you had to include generic authentication code in each and every secured page, in ASP.NET you have to include query string validation code in each page. ASP.NET moved the standard authentication code to a small group of system-provided HTTP modules, but didn't take care of the query string. Meanwhile, the recent growth of XSS and SQL injection attacks makes it necessary to cross-check every possible source of input. An external component linked to the application that implements strict static validation for the query string parameters is a great help because it automatically ensures that no ASP.NET page request is ever executed when the query string doesn't comply with the declared schema.

More importantly, with an external component, no changes are required to the page's source code. All that you have to do is register the component with the application through the configuration file and add an XML file that describes the query string syntax for each interested page. Let's detail the strategy in a bit more depth.

Defining a Strategy

ASP.NET offers HTTP modules as a tool to inject your own code into the runtime pipeline before the requested page class is instantiated and processed. From a syntax perspective, an HTTP module is merely a class that implements a given interface. From a broader architectural perspective, an HTTP module is a kind of observer with the same lifetime as the application. The module observes the request processing activity and registers to listen for a few specific events, such as BeginRequest, EndRequest, or PostMapRequestHandler. The full list of application events for an ASP.NET request can be found in the documentation of the System.Web.HttpApplication class (msdn2.microsoft.com/0dbhtdck.aspx).

Once installed, an HTTP module kicks in each time a request processed by the ASP.NET runtime reaches the stage when the observed event is fired. Note that the ASP.NET runtime doesn't necessarily process requests for all resources hosted by the ASP.NET application. By default, static resources, such as cascading style sheets (CSS) and JPG files, are served directly by the Web server without even bothering the ASP.NET application, unless IIS has been configured to allow ASP.NET to handle these resources.
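As a minimal sketch of what installing a module involves, the following web.config fragment registers the module class through the <httpModules> section. The assembly name QueryStringLib is a placeholder I made up for illustration; substitute the assembly that actually contains the class (or omit the assembly part if the class lives in App_Code).

```xml
<configuration>
  <system.web>
    <httpModules>
      <!-- "QueryStringLib" is a hypothetical assembly name -->
      <add name="QueryStringModule"
           type="QueryStringModule, QueryStringLib" />
    </httpModules>
  </system.web>
</configuration>
```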

My query string HTTP module will listen for the begin-request event and validate the contents of the query string against a previously loaded schema. If the number of parameters matches and each provided value is compatible with the expected type, the module lets the request reach the next stage. Otherwise, the request is terminated with an appropriate HTTP status code or an ASP.NET exception is thrown.

I mentioned an XML file where the syntax of the query string would be stored. This doesn't really have to be an XML file. (If it is an XML file, the schema is entirely up to you.) You just need a data source that declaratively persists information about the expected structure of a page's query string. It could be a simple XML file, or it could be a sophisticated provider-based service. My June 2006 column provides a good example of a custom application service designed to use providers (msdn.microsoft.com/msdnmag/issues/06/06/CuttingEdge).

Declarative Query Strings

Figure 1 shows a sample XML file and schema that the query string HTTP module will recognize. Under the root node <querystring>, there are as many <page> nodes as there are pages in the application that may process values from the query string. In the companion code of this column, the file shown in Figure 1 is named web.querystring. The name is arbitrary, of course, as is the schema.

Figure 1 Sample web.querystring Configuration File

<!--
<page url="..." abortOnError="TRUE|false">
  <param name="..." 
         type="Int|Text|Bool" 
         optional="FALSE|true"
         length="number (for Text type only)"
         casesensitive="false|true" />
</page>
-->

<querystring>

  <page url="/source/test.aspx" abortOnError="true">
    <param name="id" type="Int" optional="true" casesensitive="true" />
    <param name="code" type="Text" length="5" optional="false" />
    <param name="detailed" type="Bool" optional="false" />
  </page>

  <page url="/source/Test1.aspx" abortOnError="false">
    <param name="guid" type="Int" />
  </page>

</querystring>

(From a security standpoint, the main issue is not that a page receives values through the query string but that the page may use those values. If some code in the page processes input sent through the query string, then as a developer you must ensure that the input is safe and not evil. For this reason, you might want to add into the XML file a <page> node only for the pages in the application that actually consume data via the query string.)

In the sample schema, the <page> element has two attributes: url and abortOnError. The former indicates the relative URL to the page, while the latter is an optional Boolean attribute that indicates whether the page request should be aborted in case of bad input. If you choose to abort the page, then the user either receives an HTTP error or an ASP.NET exception, based on what you decide to do after unacceptable data is found in the query string. Regardless of how you manifest the outcome, there's no need to edit the code of the involved ASP.NET page. The possible termination of the request occurs in the HTTP module, before the page class is identified and instantiated.

There is an alternative approach. In it, the HTTP module lets the request go, but adds detailed information to the HTTP context to notify the page class of what was detected. The page then has the responsibility of taking proper countermeasures, such as showing an ad hoc error page. In this situation, the page author must integrate any query string anomalies in the context of the error-handling strategy set for the application. The downside of this approach is that it requires changes to the code of each page involved with the query string. (I'll return to this point later.)

The abortOnError attribute is set to true by default, meaning that any anomaly in the query string will abort the page request. Under each <page> node, there is a list of <param> nodes, one for each supported query string parameter. In the sample code, a parameter can be defined using the attributes in Figure 2.

Figure 2 Attributes Supported by the <param> Node

Attribute       Description
Name            Indicates the name of the query parameter.
Type            Indicates the type of the query string parameter. Feasible values are: Text, Int, Bool.
Optional        A Boolean attribute, this indicates whether the parameter is optional or not. It is set to false by default.
Length          Indicates the maximum length of a parameter of type Text.
CaseSensitive   A Boolean attribute, this indicates whether the name of the parameter is case sensitive. It is set to false by default, meaning the parameter can be specified on a query string with any combination of lowercase and uppercase letters.

All values passed on the query string are received by ASP.NET as strings. The QueryString property defined on the HttpRequest object is, therefore, a NameValueCollection object where keys and values are strings. The string format, though, is a mere serialization format. Each query string parameter can surely represent not only a string but also a Boolean or numeric value, plus special string subtypes like URL, GUID, and file names. Therefore, in the web.querystring file, you specify the expected type of the parameter using the values of a custom enum type, QueryStringParamTypes:

Friend Enum QueryStringParamTypes As Integer
    Text = 1
    Int = 2
    Bool = 3
End Enum

The list of supported types can be extended to add, for example, various numeric types. Parameters of type Text can also specify a maximum length through the attribute Length. A page that can accept, say, a 5-char customer ID from the query string has no reason for not limiting the length of that parameter. In addition, the web.querystring may be used to enable checks on the case sensitivity of the parameter's name and may designate a parameter as optional. The contents of the web.querystring file are parsed by the query string HTTP module and transformed into an in-memory object.
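To make the parsing step concrete, here is a minimal sketch of how a file with the schema of Figure 1 could be turned into an in-memory table keyed by page URL. It is not the loader from the companion code: the PageSketch and ParamSketch classes and the LoadFromXml function are simplified stand-ins I introduce for illustration, and the sketch assumes that attribute defaults are applied as described in the text (abortOnError true, optional and casesensitive false).

```vb
Imports System
Imports System.Collections
Imports System.Xml

' Hypothetical, simplified stand-ins for the descriptor classes.
Public Class ParamSketch
    Public Name As String
    Public TypeName As String = "Text"
    Public Length As Integer = 0             ' 0 means "no limit" in this sketch
    Public IsOptional As Boolean = False
    Public CaseSensitive As Boolean = False
End Class

Public Class PageSketch
    Public Url As String
    Public AbortOnError As Boolean = True
    Public Parameters As New ArrayList()
End Class

Public Module QueryStringLoaderSketch

    ' Returns a Hashtable of PageSketch objects keyed by lowercased URL.
    Public Function LoadFromXml(ByVal xmlText As String) As Hashtable
        Dim doc As New XmlDocument()
        doc.LoadXml(xmlText)

        Dim pages As New Hashtable()
        For Each pageNode As XmlNode In doc.SelectNodes("/querystring/page")
            Dim page As New PageSketch()
            page.Url = ReadAttr(pageNode, "url", "").ToLowerInvariant()
            page.AbortOnError = Boolean.Parse( _
                ReadAttr(pageNode, "abortOnError", "true"))

            For Each paramNode As XmlNode In pageNode.SelectNodes("param")
                Dim p As New ParamSketch()
                p.Name = ReadAttr(paramNode, "name", "")
                p.TypeName = ReadAttr(paramNode, "type", "Text")
                p.Length = Integer.Parse(ReadAttr(paramNode, "length", "0"))
                p.IsOptional = Boolean.Parse( _
                    ReadAttr(paramNode, "optional", "false"))
                p.CaseSensitive = Boolean.Parse( _
                    ReadAttr(paramNode, "casesensitive", "false"))
                page.Parameters.Add(p)
            Next

            pages(page.Url) = page
        Next
        Return pages
    End Function

    ' Reads an attribute value, falling back to a default when it is absent.
    Private Function ReadAttr(ByVal node As XmlNode, ByVal name As String, _
            ByVal defaultValue As String) As String
        Dim attr As XmlAttribute = node.Attributes(name)
        If attr Is Nothing Then Return defaultValue
        Return attr.Value
    End Function
End Module
```

Keys are lowercased on the way in so that a lookup by the lowercased request path finds the entry regardless of how the URL was typed in the file.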

Coding the QueryString HTTP Module

The source code of the QueryString HTTP module is shown in Figure 3. As mentioned, an HTTP module class implements IHttpModule, which consists of the Init and Dispose methods. These methods are invoked when the module is loaded and unloaded in the context of the application. In the Init method, an HTTP module typically registers a listener for the application events it wants to observe. In this case, it registers a handler for the BeginRequest event. In addition, the module processes the web.querystring file and creates an in-memory representation of its contents. The Init method is invoked only when the module is loaded, so the contents of the configuration file are read once and cached, and changes to the web.querystring file are not detected until the Web application is restarted. This is not necessarily a problem, as it is fairly unlikely that you will need to change the web.querystring file in production without stopping and restarting the application. However, you could also extend the code in Figure 3 to use a file watcher object that detects any changes to the web.querystring file and reloads it in a timely fashion.

Figure 3 QueryStringModule Class

Imports System
Imports System.Collections
Imports System.Web
Imports System.IO

Public Class QueryStringModule : Implements IHttpModule

    Private _app As HttpApplication
    Private _queryStringData As Hashtable

    Public Sub Init(ByVal context As System.Web.HttpApplication) _
            Implements System.Web.IHttpModule.Init
        _app = context
        AddHandler _app.BeginRequest, AddressOf OnEnter

        ' Load and cache the XML querystring file
        Dim fileName As String = _
            HttpContext.Current.Server.MapPath("~/web.querystring")
        _queryStringData = QueryStringHelper.LoadFromFile(fileName)
    End Sub

    Public Sub Dispose() Implements System.Web.IHttpModule.Dispose
    End Sub

    Private Sub OnEnter(ByVal source As Object, ByVal e As EventArgs)
        ' Retrieve the query string data structure for the current page
        Dim currentPage As String = _
            HttpContext.Current.Request.Path.ToLower()
        Dim qsDesc As QueryStringDescriptor = _
            _queryStringData.Item(currentPage)

        ' Pages with no entry in web.querystring are not validated
        If qsDesc Is Nothing Then Return

        ' Validate the query string
        Dim isValid As Boolean
        isValid = QueryStringHelper.Validate( _
           HttpContext.Current.Request.QueryString, qsDesc)

        ' Abort the request if validation fails
        If Not isValid Then
            If qsDesc.AbortOnError Then
                HttpContext.Current.Response.StatusCode = 500
                HttpContext.Current.Response.[End]()
            Else
                ' Add information about the error to Context.Items
                HttpContext.Current.Items( _
                    QueryStringHelper.QueryStringValidationStatus) = _
                        QueryStringHelper.GetErrorCode()
            End If
        Else
            ' Add information for the page to the Context.Items
            HttpContext.Current.Items( _
                QueryStringHelper.QueryStringValidationStatus) = _
                    QueryStringHelper.GetErrorCode()

            ' Add typed values  
            HttpContext.Current.Items( _
                QueryStringHelper.QueryStringValues) = _
                    QueryStringHelper.GetTypedValues( _
                        HttpContext.Current.Request.QueryString, qsDesc)
        End If
    End Sub
End Class

The contents of the web.querystring file are mapped to an object of type QueryStringDescriptor, as shown in Figure 4. The descriptor contains the URL of the page, a flag to indicate what to do in case of failed validation, and the list of supported query string parameters. Each parameter is described through an instance of the QueryStringParamInfo class. QueryStringParamCollection is the related collection class. It is a typical generic collection class enriched with two lookup methods: Contains, which verifies whether a parameter with a given name can be found in the collection, and Find, which returns the parameter descriptor instance.

Figure 4 Helper Classes for the QueryString Module

Friend Class QueryStringDescriptor
    Public Url As String
    Public AbortOnError As Boolean
    Public Parameters As QueryStringParamCollection
End Class

Friend Class QueryStringParamInfo
    Public Name As String
    Public [Type] As QueryStringParamTypes
    Public Length As Integer
    Public [Optional] As Boolean
    Public CaseSensitive As Boolean
End Class

Friend Class QueryStringParamCollection
    Inherits System.Collections.ObjectModel.Collection( _
        Of QueryStringParamInfo)

    Public Overloads Function Contains(ByVal name As String) As Boolean
        For i As Integer = 0 To Count - 1
            Dim comparison As StringComparison = _
                StringComparison.OrdinalIgnoreCase
            Dim currentItem As QueryStringParamInfo = Item(i)
            If currentItem.CaseSensitive Then
                comparison = StringComparison.Ordinal
            End If

            If String.Equals(currentItem.Name, name, comparison) Then
                Return True
            End If
        Next

        Return False
    End Function

    Public Function Find(ByVal name As String) As QueryStringParamInfo
        For i As Integer = 0 To Count - 1
            Dim currentItem As QueryStringParamInfo = Item(i)
            If String.Equals(currentItem.Name, name, _
                    StringComparison.OrdinalIgnoreCase) Then
                Return currentItem
            End If
        Next

        Return Nothing
    End Function
End Class

<Flags()> _
Public Enum QueryStringErrorCodes
    NoError = 0
    TooManyParameters = 1
    InvalidQueryParameter = 2
    MissingRequiredParameter = 4
    InvalidContent = 8
End Enum

The query string descriptor caches information about the query string of a given page. The web.querystring file, though, can reference multiple pages. For this reason, all descriptors for all pages referenced by web.querystring are grouped in a hash table using the page URL as the key. The following code snippet shows how the BeginRequest handler of the HTTP module retrieves the descriptor for the currently requested page:

Dim currentPage As String
currentPage = HttpContext.Current.Request.Path.ToLower()
Dim qsDesc As QueryStringDescriptor = _
    _queryStringData.Item(currentPage)

The query string descriptor is an in-memory representation of the correct syntax for the page's query string. The next step is to validate the posted query string against this schema.

Query String Validation

The validation process consists of three steps. First, the module counts the number of parameters in the posted query string. If the posted query string has more parameters than expected, the validation fails. Next, the module iterates on posted query string parameters and ensures that each parameter matches an entry in the declared schema. If additional, unknown parameters are found, the validation fails. Finally, the module iterates on all parameters defined in the schema and verifies that all required parameters are specified and that each specified parameter has a value of the proper type.
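The three steps can be sketched as follows. This is a simplified illustration rather than the code of the companion download: the ParamRule class and the Validate function are hypothetical names, the sketch knows only the Int and Text types, and it always compares parameter names without regard to case.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Collections.Specialized

' Hypothetical, simplified parameter rule (not the column's actual class).
Public Class ParamRule
    Public Name As String
    Public IsInt As Boolean          ' True = Int, False = Text in this sketch
    Public IsOptional As Boolean

    Public Sub New(ByVal name As String, ByVal isInt As Boolean, _
            ByVal isOptional As Boolean)
        Me.Name = name
        Me.IsInt = isInt
        Me.IsOptional = isOptional
    End Sub
End Class

Public Module ValidationSketch

    Public Function Validate(ByVal query As NameValueCollection, _
            ByVal rules As List(Of ParamRule)) As Boolean
        ' Step 1: more parameters than declared means failure.
        If query.Count > rules.Count Then Return False

        ' Step 2: every posted parameter must match a declared rule.
        For Each key As String In query.AllKeys
            If FindRule(rules, key) Is Nothing Then Return False
        Next

        ' Step 3: required parameters must be present and
        ' every supplied value must parse to its declared type.
        For Each rule As ParamRule In rules
            Dim value As String = query(rule.Name)
            If value Is Nothing Then
                If Not rule.IsOptional Then Return False
            ElseIf rule.IsInt Then
                Dim parsed As Integer
                If Not Int32.TryParse(value, parsed) Then Return False
            End If
        Next
        Return True
    End Function

    ' Returns the rule with the given name, or Nothing if none is declared.
    Private Function FindRule(ByVal rules As List(Of ParamRule), _
            ByVal name As String) As ParamRule
        For Each rule As ParamRule In rules
            If String.Equals(rule.Name, name, _
                    StringComparison.OrdinalIgnoreCase) Then Return rule
        Next
        Return Nothing
    End Function
End Module
```

Returning a single Boolean keeps the sketch short; a fuller version would accumulate the flags of an error-code enum so that the caller can tell which check failed.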

The data validation step attempts to parse the value of a given parameter to its declared type. Here's the code snippet used to validate numeric values:

If paramType = QueryStringParamTypes.Int Then
    Dim result As Integer
    Dim success As Boolean = Int32.TryParse(paramValue, result)
    If Not success Then Return False
End If

By design, the .NET Framework parses Boolean values only from the strings true and false. The validation subsystem of the query string HTTP module also accepts the strings yes and no.
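A minimal sketch of such a relaxed Boolean check follows. The TryParseBool name is hypothetical, and treating yes and no as case insensitive is an assumption of mine, not something the column's source confirms.

```vb
Imports System

Public Module BoolParsingSketch

    ' Tries true/false first (via Boolean.TryParse), then yes/no.
    Public Function TryParseBool(ByVal value As String, _
            ByRef result As Boolean) As Boolean
        If Boolean.TryParse(value, result) Then Return True

        If String.Equals(value, "yes", _
                StringComparison.OrdinalIgnoreCase) Then
            result = True
            Return True
        End If

        If String.Equals(value, "no", _
                StringComparison.OrdinalIgnoreCase) Then
            result = False
            Return True
        End If

        ' Anything else is not a Boolean for the purposes of this sketch.
        result = False
        Return False
    End Function
End Module
```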

In the end, the contents of the query string are parsed and type-validated as the first step in the request's pipeline. If everything is OK, the request is processed. Otherwise, the request is immediately terminated with a proper HTTP status code. Here's an example:

HttpContext.Current.Response.StatusCode = 500
HttpContext.Current.Response.[End]()

The user is served a page like the one in Figure 5. You might complain that nothing indicates the real reason for the IIS error, but the HTTP status code, as well as the generic description, is clear about the origin of the error: an internal server-side error during the processing of the request. As the aforementioned Michael Howard article explains, the least possible amount of information should always be disclosed in error pages to avoid the risk of gently handing out details to potential hackers. In this respect, an HTTP 500 error is generic enough about what really occurred. Anyway, as the previous code snippet shows, the HTTP status code can be set at will.

Figure 5 Result of a Bad Query String

Considerations and Alternatives

Should you really abort the request in case of badly formatted data or are you better off caching somewhere the result of the validation and letting the page code make the final decision about the user? Furthermore, should you really catch and process the query string so early in the request lifecycle? Let's tackle this latter point first.

Figure 6 lists the application-wide events that characterize the request processing. If not at the beginning of the request, where else should you check the query string? A good place is immediately after authorization. If the request proceeds beyond the authorization stage, you can be relatively sure that the page HTTP handler will be invoked.

Figure 6 Global Application Events

Event                                          Description
BeginRequest                                   Indicates the beginning of the request processing.
AuthenticateRequest, PostAuthenticateRequest   Wraps the request authentication process.
AuthorizeRequest, PostAuthorizeRequest         Wraps the request authorization process.
ResolveRequestCache, PostResolveRequestCache   Wraps the process that checks whether the request can be served with previously cached output pages.
PostMapRequestHandler                          Indicates that the HTTP handler to serve the request has been found.
AcquireRequestState, PostAcquireRequestState   Wraps the retrieval of session state for the request.
PostRequestHandlerExecute                      Indicates that the HTTP handler to serve the request has been executed.
ReleaseRequestState, PostReleaseRequestState   Wraps the release of session state for the request.
UpdateRequestCache, PostUpdateRequestCache     Wraps the process that checks whether the output of the requested resource should be cached for further reuse.
EndRequest                                     Indicates the end of the request processing.

But can you do it later? In general, any event handler up to and including PostAcquireRequestState would work. The user code (the code page authors write in the codebehind or inline in the ASPX file) is executed only past the PostAcquireRequestState event. Consequently, there's no way the page can consume the query string until after the global PostAcquireRequestState event is fired. However, you shouldn't wait this long. Checking the query string after authorization but before page execution can save you a couple of extra operations, namely checking the output cache and retrieving session state. If you're going to kill the page due to a bad query string, there's no reason to first load the session state, especially if it's coming from an out-of-process source like SQL Server.

In the end, there are only two application events where the query string check should be placed: BeginRequest or PostAuthorizeRequest. You should opt for the latter if user information is required to process the query string, for example if some users are allowed to specify certain parameters based on their role. In this case, you can also add a roles attribute to the schema of Figure 1. In any other scenario, by placing the interception in BeginRequest, you can kill the page at a very early stage in the pipeline, preventing further processing.

Things are quite different if you still want the page code to handle the bad query string and try to gracefully degrade or recover. For this, I think any event before the page is executed will work fine. I would opt for PostAcquireRequestState, which is the latest point in the pipeline where you can check the query string before the page code executes. You also have session state available at this point. I haven't noted it yet, even though the context makes it patently clear: the query string information is available from the beginning in the QueryString collection of the Request intrinsic object.

So let's assume that you want the HTTP module to check the query string and pass its findings down the pipeline and up to the page code. There are a couple of possible approaches you could take. Before I discuss them, I should note that any such approach is code intrusive and requires changes to the source code of each page with a query string.

The simplest way for an HTTP module to communicate with the handler in charge of a given request is by stuffing data into the Items collection of the HttpContext object. The property Items is a hash table for HTTP modules and handlers to write and read information. Any data stored in the Items table has the same lifetime as the request.

The HTTP module gains access to the context object of the current request using the static Current property on the HttpContext class, like so:

HttpContext.Current.Items("QueryStringStatus") = errorCode

Because Items is exposed as an IDictionary backed by a hash table, both key and value can be any .NET type. The query string module uses a public enum type to list all possible error codes:

<Flags()> _
Public Enum QueryStringErrorCodes
    NoError = 0
    TooManyParameters = 1
    InvalidQueryParameter = 2
    MissingRequiredParameter = 4
    InvalidContent = 8
End Enum

The combination of these codes, which better describes what's wrong with the query string, is stuffed into a conventionally named slot in the hash table. The HTTP module and pages must agree on a naming convention so the page can retrieve and use this information. The HTTP module defines a public constant, which represents the name of the slot:

Public Const QueryStringValidationStatus As String = _
       "QueryStringValidationStatus"

The page can use the following code to retrieve the message from the HTTP module and decide what to do with the info:

Dim result As QueryStringErrorCodes = _ 
    DirectCast(Context.Items( _
        QueryStringHelper.QueryStringValidationStatus), _
        QueryStringErrorCodes)

Say you also want the module to provide the page with typed values taken from a valid query string. Consider the following URL and assume the query string is correct:

https://www.yourserver.com/page.aspx?detailed=true

The page should incorporate code to parse the query string value and transform it into a Boolean value. Such a conversion is already done in the HTTP module during the validation step. Nothing is easier than sharing these typed values with the target page by placing a hash table of them in another Items slot (see the source code for details).
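As a rough sketch of that idea, the function below builds a table of typed values from an already validated query string. It is not the GetTypedValues method of the companion code: the signature is hypothetical, and the declared types are passed in as a simple name-to-type-name dictionary rather than as the descriptor object used by the module.

```vb
Imports System
Imports System.Collections
Imports System.Collections.Specialized

Public Module TypedValuesSketch

    ' typeNames maps a parameter name to "Int", "Bool", or "Text"
    ' (a hypothetical shape chosen to keep the sketch self-contained).
    ' The query string is assumed to have passed validation already.
    Public Function GetTypedValues(ByVal query As NameValueCollection, _
            ByVal typeNames As IDictionary) As Hashtable
        Dim typed As New Hashtable(StringComparer.OrdinalIgnoreCase)

        For Each key As String In query.AllKeys
            If key Is Nothing OrElse Not typeNames.Contains(key) Then
                Continue For
            End If

            Dim raw As String = query(key)
            Select Case CStr(typeNames(key))
                Case "Int"
                    typed(key) = Int32.Parse(raw)
                Case "Bool"
                    typed(key) = Boolean.Parse(raw)
                Case Else
                    typed(key) = raw
            End Select
        Next
        Return typed
    End Function
End Module
```

The module would store the returned table in a second Items slot, and the page would read a value back with a cast such as CBool(values("detailed")), where values is the table retrieved from that slot.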

A neater approach is to add a new read-only property to each page with a query string. Assuming you call it IsValidQueryString, it will look like this:

Public ReadOnly Property IsValidQueryString() As Boolean
   Get 
      Dim result As QueryStringErrorCodes = DirectCast( _
        Context.Items(QueryStringHelper.QueryStringValidationStatus), _
            QueryStringErrorCodes)
      Return (result = QueryStringErrorCodes.NoError)
   End Get
End Property

Even better, you could define such a property on a base class and derive all query-string enabled pages from this class.
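A base class along those lines might look like the sketch below. QueryStringAwarePage is a name I made up, and the slot name is repeated as a literal only to keep the sketch self-contained; in a real project you would reference the constant and the enum defined by the module instead. The sketch also assumes that a missing status should count as "not validated".

```vb
Imports System
Imports System.Web.UI

' Hypothetical base class; derive query-string-enabled pages from it.
Public Class QueryStringAwarePage
    Inherits Page

    ' Same slot name as the QueryStringValidationStatus constant.
    Private Const StatusKey As String = "QueryStringValidationStatus"

    Public ReadOnly Property IsValidQueryString() As Boolean
        Get
            Dim status As Object = Context.Items(StatusKey)

            ' No status means the module stored nothing for this request;
            ' this sketch treats that case as "not validated".
            If status Is Nothing Then Return False

            ' NoError has the value 0 in QueryStringErrorCodes.
            Return Convert.ToInt32(status) = 0
        End Get
    End Property
End Class
```

A page then only needs to inherit from this class and test IsValidQueryString, typically in its Page_Load handler, before it touches any query string value.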

Summary

Not all ASP.NET pages use the query string. However, the query string can be used as input for Web pages. As such, it is a potential vehicle for attacks on pages with security holes. If your pages need a query string barrier, be ready to write the same code over and over again in all of the pages where you use the query string.

The QueryString module presented in this column requires no coding in source pages and automatically checks the posted query string against a given schema saved in a separate XML file. This means there's zero impact on existing code, while offering one more built-in barrier against attackers. But remember that this is no silver bullet.

Send your questions and comments for Dino to cutting@microsoft.com.

Dino Esposito is a mentor at Solid Quality Learning and the author of Programming Microsoft ASP.NET 2.0 (Microsoft Press, 2005). Based in Italy, Dino is a frequent speaker at industry events worldwide. Get in touch with Dino at cutting@microsoft.com or join the blog at weblogs.asp.net/despos.