Concurrent Affairs

The ReaderWriterGate Lock

Jeffrey Richter

Contents

The ReaderWriterGate Object Model
Using the ReaderWriterGate
How ReaderWriterGate is Implemented
Why the ReaderWriterGate is Cool
Knowing When the Callback Completes
Conclusion

One day I was working on a consulting project and came across a thread synchronization problem that I hadn't run into before. The company was creating a Web service and almost all the client requests that came into their server needed to access some shared data in a read-only fashion. Occasionally, a request that needed to modify the shared data would come in, and we needed to keep the data in sync. It sounded like the perfect scenario for using reader-writer locks (as I discussed in my June 2006 Concurrent Affairs column).

However, after I got more involved with the project, I realized that this company's requirements had an unusual twist: when a Web service request came into the server and the lock was taken for writing, the lock was held for an extremely long time. As the writer thread was modifying the shared data, other Web service requests to read the data were coming into the server. The problem here was that threads in the pool were waking up to process the read requests, but all of these threads were immediately put to sleep waiting for the writer thread to complete its task.

It didn't take long before hundreds of pooled threads were created and all of these threads were in a wait state inside the operating system kernel. We now had lots of threads that couldn't run just sitting around wasting resources (such as each thread's stack and the associated thread kernel object). This was bad. And then, it got worse! When the writer thread completed writing to the shared data, it released the lock. The lock then saw that hundreds of reader threads were waiting for it and it released all of the waiting reader threads. The server computer had four CPUs in it and these four little CPUs had to process the work of all these runnable threads. Windows® would schedule all these threads using its normal round-robin scheduling technique, but the overhead of constantly switching the CPUs from one thread to another to another was simply killing performance, scalability, and throughput.

For months, this problem gnawed at me until I finally devised my own thread synchronization lock, which I call the ReaderWriterGate. This lock is fast, and it fits well into a wide variety of applications. The interesting thing about the ReaderWriterGate is that calling threads almost never block, and if they do block, they are guaranteed to block for only an incredibly short period of time. And, because the ReaderWriterGate internally uses one of my ResourceLock-derived classes (discussed in the June column I mentioned earlier), I could change just one line of code to have it use a spin lock internally so that threads never block inside the Windows kernel (as I described in my October 2005 Concurrent Affairs column). This would give even better performance but, for fairness, it would require that all threads using the ReaderWriterGate have the same thread priority and that priority boosting be disabled. Furthermore, the ReaderWriterGate can perform an extraordinary amount of work using just a few threads, so very few resources are required regardless of the number of requests being made against the gate.

I have my own implementation of this primitive, which I make available free of charge as part of the Power Threading library, available for download from www.wintellect.com. If you want to use my implementation, you can, pursuant to the End-User License Agreement (EULA) that accompanies the library; see the EULA for details.

What I'd like to do now is explain the idea behind the ReaderWriterGate, discuss a possible implementation, and offer a slightly new way of thinking about threading and thread synchronization in general. It is my hope that you will see places in your existing code where this kind of thinking can be applied so that with minimal re-architecting, you could incorporate some of these ideas to give your applications better performance and scalability.

The ReaderWriterGate Object Model

To use the ReaderWriterGate class, there are only a few methods you need to be aware of:

public sealed class ReaderWriterGate 
{
   public ReaderWriterGate();

   public void QueueWrite(
      ReaderWriterGateCallback callback, Object state);
   public void QueueRead(
      ReaderWriterGateCallback callback, Object state);

   ... // Some methods not shown to simplify the discussion
}

As you can see, this is an incredibly simple object model (simpler than most locks). When you want to execute a task that requires read-only/shared access to a resource, you call the QueueRead method; when you want to execute a task that requires write/exclusive access to a resource, you call the QueueWrite method. Both of these methods take two arguments. The first argument, an instance of ReaderWriterGateCallback, identifies a method that you write; this method does the actual reading or writing. ReaderWriterGateCallback is a delegate type defined as follows:

delegate void ReaderWriterGateCallback(
    ReaderWriterGateReleaser releaser);

The second argument passed to QueueRead and QueueWrite is a reference to any object. This reference will be passed to your callback method that does the actual reading or writing. This simply gives you a way to get some data from the method that wants the task done and provide it to the thread or method that performs the actual task. For those of you familiar with the .NET ThreadPool and its QueueUserWorkItem method, this model should feel a bit familiar.
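
To make the parallel concrete, here is a minimal, hypothetical snippet that contrasts the two calls. The gate and someState variables (and the QueueExamples helper itself) are placeholders I invented for illustration; they are not part of the library:

// Hypothetical helper contrasting the thread pool with the gate
private static void QueueExamples(ReaderWriterGate gate, Object someState) 
{
   // Plain thread pool: the work item runs with no synchronization at all
   ThreadPool.QueueUserWorkItem(
      delegate(Object state) { /* unsynchronized work */ }, someState);

   // ReaderWriterGate: the callback runs only when shared read access is safe
   gate.QueueRead(
      delegate(ReaderWriterGateReleaser r) { /* read the protected data */ }, 
      someState);
}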

Notice that the ReaderWriterGateCallback delegate requires that your callback method take just one parameter that is of the ReaderWriterGateReleaser type. The following shows you what this type looks like:

public sealed class ReaderWriterGateReleaser : IDisposable 
{
   // Returns second argument passed to QueueXxx
   public Object State { get; }

   // Returns reference to Gate
   public ReaderWriterGate Gate { get; }    

   public void Release();    // Calls Dispose internally
   public void Dispose();    // Useful with C#'s using statement
}

Within your callback method, you can query the State property to retrieve the Object reference that was passed as the second argument to the QueueRead/QueueWrite methods. Your callback method can also query the Gate property to retrieve a reference to the ReaderWriterGate that was used to orchestrate the calling of the method. It is unlikely that your method would ever query this property unless it was going to initiate some other task on the same resource (guarded by the same ReaderWriterGate object).

The main purpose of the ReaderWriterGateReleaser object is to give your callback method a way to indicate that it no longer needs the gate while it continues to execute. Next, I will present an example that shows why you might want to do this and how to do it properly. In fact, I suspect that most callback methods will ignore the ReaderWriterGateReleaser argument entirely. If the argument is ignored, the gate is automatically released when your callback method returns.

Using the ReaderWriterGate

Let's imagine that you're implementing a Web service that accepts catalog orders from customers. In addition, the service accepts requests to update the catalog from company employees. When the service initializes, it constructs an instance of a hypothetical CatalogOrderSystem class (shown in Figure 1). When a CatalogOrderSystem object is constructed, it initializes a private field to refer to a ReaderWriterGate object. This object will be used to orchestrate threads reading from and writing to the CatalogOrderSystem object.

Figure 1 CatalogOrderSystem Class

internal sealed class CatalogOrderSystem 
{
   // The gate used to protect access to the Catalog Order System’s data
   private ReaderWriterGate m_gate = new ReaderWriterGate();

   public void UpdateCatalogWS(CatalogEntry[] catalogEntries) 
   {
      ... // Perform any validation/pre-processing on catalogEntries...

      // Updating the catalog requires exclusive access to it.
      m_gate.QueueWrite(UpdateCatalog, catalogEntries);
   }

   // The code in this method has exclusive access to the catalog.
   private void UpdateCatalog(ReaderWriterGateReleaser r) 
   {
      CatalogEntry[] catalogEntries = (CatalogEntry[]) r.State;
      
      ... // Update the catalog with the new entries...
   } // When this method returns, exclusive access is relinquished.

   public void BuyCatalogProductsWS(CatIDAndQuantity[] items) 
   {
      // Buying products requires read access to the catalog.
      m_gate.QueueRead(BuyCatalogProducts, items);
   }

   // The code in this method has shared read access to the catalog.
   private void BuyCatalogProducts(ReaderWriterGateReleaser r) 
   {
      using (r) 
      {
         CatIDAndQuantity[] items = (CatIDAndQuantity[])r.State;
         foreach (CatIDAndQuantity item in items) 
         {
            ... // Process each catalog item to build customer’s order...
         }
      } // When r is Disposed, read access is relinquished.

      ... // Save customer’s order to database
      ... // Send customer e-mail confirming order
   }
}

Let's say that a request comes in to update the order catalog; when it does, the UpdateCatalogWS method is called and is passed a reference to an array of CatalogEntry objects. UpdateCatalogWS first performs any validation or preprocessing required on the array's objects. This work is performed while the ReaderWriterGate is not acquired, allowing this thread to operate concurrently with other threads that may currently be accessing the CatalogOrderSystem object. If all goes well, exclusive access to the catalog data is required to actually modify the order catalog with the new entries. So, QueueWrite is called and is passed UpdateCatalog (the method that actually updates the catalog) and the array of new catalogEntries.

Internally, the ReaderWriterGate will have a thread pool thread call UpdateCatalog when and only when exclusive access can be given to the code executing within the method. When UpdateCatalog is called, it can assume that no other threads are reading or writing to the CatalogOrderSystem object, and it can update the catalog entries or do whatever it needs to do. When the UpdateCatalog method returns, the gate is automatically released, which allows other queued methods to be called so that their code can manipulate the CatalogOrderSystem object.

Now, let's say that a request comes in to place an order, which means the BuyCatalogProductsWS method is called and is passed a reference to an array of CatIDAndQuantity objects. BuyCatalogProductsWS needs to read the catalog information in order to complete the customer's order, so read access is required. QueueRead is therefore called, passing BuyCatalogProducts (the method that actually completes the customer's order) and the array of items.

The ReaderWriterGate will call BuyCatalogProducts when and only when read access can be given to the code executing within the method. When BuyCatalogProducts is called, it can assume that no other threads are writing to the CatalogOrderSystem object (although other threads may be reading from it) and it can do whatever it needs to do. When the BuyCatalogProducts method executes, it first looks up the entries in the catalog; this requires read access. After all the entries have been looked up, the catalog no longer needs to be accessed, but more work is required to finish processing the customer's order. Normally, the ReaderWriterGate is released when the callback method returns; to improve performance and scalability, however, the BuyCatalogProducts method wants to release the gate early, as soon as it is no longer executing code that reads from the catalog.

To release the gate before the method returns, I used a C# using statement. At this statement's closing brace, the C# compiler emits a finally block that calls the ReaderWriterGateReleaser's Dispose method. This notifies the ReaderWriterGate that the thread has finished reading the resource and that the gate may allow other queued methods to be called. At this point, the BuyCatalogProducts method does not return; instead, it continues processing the customer's request. When BuyCatalogProducts eventually does return, the code inside the ReaderWriterGate that called it knows that the gate has already been released and will not release it again. In fact, in compliance with the .NET Framework guidelines for implementing IDisposable, if you call Dispose (or Release) on a single ReaderWriterGateReleaser object multiple times, only the first invocation actually releases the gate; additional invocations do nothing and simply return.
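
To make this "release once" behavior concrete, here is a minimal sketch of how a releaser class might be built. The field names, the constructor signature, and the internal DoneWithResource method on the gate (sketched later) are my assumptions for illustration; they are not necessarily how the Power Threading library actually implements it:

public sealed class ReaderWriterGateReleaser : IDisposable 
{
   private ReaderWriterGateCallback m_callback; // Method the gate will invoke
   private ReaderWriterGate m_gate;             // Gate that owns this request
   private Boolean m_writer;                    // true if exclusive access
   private Object m_state;                      // State passed to QueueXxx
   private Boolean m_released;                  // Guards against double release

   internal ReaderWriterGateReleaser(ReaderWriterGateCallback callback, 
      ReaderWriterGate gate, Boolean writer, Object state) 
   {
      m_callback = callback; m_gate = gate; 
      m_writer = writer; m_state = state;
   }

   public Object State { get { return m_state; } }
   public ReaderWriterGate Gate { get { return m_gate; } }

   // Used by the gate when it invokes the queued method
   internal ReaderWriterGateCallback Callback { get { return m_callback; } }

   public void Release() { Dispose(); }

   public void Dispose() 
   {
      // Only the first call releases the gate; later calls simply return
      if (m_released) return;
      m_released = true;
      m_gate.DoneWithResource(m_writer);  // Hypothetical internal gate method
   }
}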

How ReaderWriterGate is Implemented

Now that you see how to use the ReaderWriterGate class, let's look at how I implemented it. A ReaderWriterGate object contains the private instance fields shown in Figure 2.

Figure 2 ReaderWriterGate Private Instance Fields

public sealed class ReaderWriterGate 
{
   // Used for thread-safe access to other fields
   private ResourceLock m_syncLock = new MonitorResourceLock();

   // The current state of the gate; see the ReaderWriterGateStates enum
   private ReaderWriterGateStates m_state = ReaderWriterGateStates.Free;

   // The number of methods desiring shared access currently executing
   private Int32 m_numReaders = 0;

   // A FIFO queue of methods desiring exclusive access
   private Queue<ReaderWriterGateReleaser> m_qWriteRequests =
      new Queue<ReaderWriterGateReleaser>();

   // A FIFO queue of methods desiring shared access
   private Queue<ReaderWriterGateReleaser> m_qReadRequests =
      new Queue<ReaderWriterGateReleaser>();

   // The possible states of the gate
   private enum ReaderWriterGateStates 
   {
      // No methods are desiring access
      Free,

      // One or more methods desiring shared access are executing
      // and no writer methods are queued up desiring access
      OwnedByReaders,

      // One or more methods desiring shared access are executing
      // and one or more writer methods are queued up desiring access
      OwnedByReadersAndWriterPending,

      // One method desiring exclusive access is executing
      OwnedByWriter
   }
}

When a thread calls QueueWrite, this method immediately constructs a ReaderWriterGateReleaser object, passing to the constructor the delegate to the method that will do the actual writing to the resource, a reference to the ReaderWriterGate object, a Boolean value of true indicating that the callback method will be writing to the resource, and the state object passed as the second argument to QueueWrite. The thread calling QueueWrite checks the m_state field to determine what to do. Figure 3 shows the basic logic implemented within QueueWrite.

Figure 3 Basic Logic of QueueWrite

1. If m_state is Free
    1.1 Set m_state to OwnedByWriter
    1.2 Invoke callback method via ThreadPool’s QueueUserWorkItem
    1.3 Return to caller
2. If m_state is OwnedByReaders or OwnedByReadersAndWriterPending
    2.1 Set m_state to OwnedByReadersAndWriterPending
    2.2 Add ReaderWriterGateReleaser object to m_qWriteRequests
    2.3 Return to caller
3. If m_state is OwnedByWriter
    3.1 Do not change m_state
    3.2 Add ReaderWriterGateReleaser object to m_qWriteRequests
    3.3 Return to caller

Note that QueueWrite always returns immediately; it never waits for the callback method to complete its execution. This behavior is different from normal thread synchronization locks such as Monitor or ReaderWriterLock, where the calling thread is blocked until it is granted access. With the ReaderWriterGate, the calling thread either has a thread pool thread execute the callback method or adds an entry to a queue; the calling thread never waits to get access to the resource being protected by the gate. This is an extremely important distinction. The thread calling QueueWrite cannot be sure that the callback method has executed by the time QueueWrite returns.
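
Translated into C#, the pseudocode in Figure 3 might look roughly like the sketch below. For simplicity I substitute an ordinary lock statement over a hypothetical m_syncObj field for the ResourceLock the real class uses, and I assume a private InvokeCallback helper (sketched later with the release logic) that runs the callback on a thread pool thread and then releases the gate:

// Simplified sketch of QueueWrite; m_syncObj and InvokeCallback are assumed names
public void QueueWrite(ReaderWriterGateCallback callback, Object state) 
{
   ReaderWriterGateReleaser releaser = 
      new ReaderWriterGateReleaser(callback, this, true, state);

   lock (m_syncObj)   // The real class uses its ResourceLock field here
   {
      switch (m_state) 
      {
         case ReaderWriterGateStates.Free:
            // Nobody owns the gate: mark it owned by a writer and run now
            m_state = ReaderWriterGateStates.OwnedByWriter;
            ThreadPool.QueueUserWorkItem(InvokeCallback, releaser);
            break;

         case ReaderWriterGateStates.OwnedByReaders:
         case ReaderWriterGateStates.OwnedByReadersAndWriterPending:
            // Readers are executing: record that a writer is now pending
            m_state = ReaderWriterGateStates.OwnedByReadersAndWriterPending;
            m_qWriteRequests.Enqueue(releaser);
            break;

         case ReaderWriterGateStates.OwnedByWriter:
            // Another writer owns the gate: queue this request behind it
            m_qWriteRequests.Enqueue(releaser);
            break;
      }
   }
   // The method returns immediately; the callback runs later on a pool thread
}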

When a thread calls QueueRead, this method immediately constructs a ReaderWriterGateReleaser object, passing to the constructor the delegate to the method that will do the actual reading from the resource, a reference to the ReaderWriterGate object, a Boolean value of false indicating that the callback method will be reading from the resource and the state object passed as the second argument to QueueRead. Figure 4 shows the basic logic implemented within QueueRead.

Figure 4 Pseudocode for QueueRead

1. If m_state is Free or OwnedByReaders
    1.1 Set m_state to OwnedByReaders
    1.2 Add 1 to m_numReaders
    1.3 Invoke callback method via ThreadPool’s QueueUserWorkItem
    1.4 Return to caller
2. If m_state is OwnedByWriter or OwnedByReadersAndWriterPending
    2.1 Do not change m_state
    2.2 Add ReaderWriterGateReleaser object to m_qReadRequests
    2.3 Return to caller

Like QueueWrite, QueueRead always returns immediately and never waits for the callback method to complete its execution. When QueueRead is called, it either queues the callback method to the thread pool or it adds the ReaderWriterGateReleaser object onto the m_qReadRequests queue and then the method returns. The thread calling QueueRead cannot be sure that the callback method has executed by the time QueueRead returns.
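
QueueRead follows the same pattern. A corresponding sketch of Figure 4's logic, under the same simplifying assumptions as the QueueWrite sketch, might look like this:

// Simplified sketch of QueueRead; same assumptions as the QueueWrite sketch
public void QueueRead(ReaderWriterGateCallback callback, Object state) 
{
   ReaderWriterGateReleaser releaser = 
      new ReaderWriterGateReleaser(callback, this, false, state);

   lock (m_syncObj) 
   {
      if (m_state == ReaderWriterGateStates.Free || 
          m_state == ReaderWriterGateStates.OwnedByReaders) 
      {
         // No writer is executing or pending: dispatch the reader right away
         m_state = ReaderWriterGateStates.OwnedByReaders;
         m_numReaders++;
         ThreadPool.QueueUserWorkItem(InvokeCallback, releaser);
      }
      else 
      {
         // A writer is executing or pending: queue the reader for later
         m_qReadRequests.Enqueue(releaser);
      }
   }
}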

Why the ReaderWriterGate is Cool

Let's go back to the Web service example. Say a client makes a write request against the server. Initially, a thread pool thread wakes up and calls QueueWrite. If the gate is Free, the callback method is queued to the thread pool, QueueWrite returns immediately, and the thread pool thread goes back to the pool. Back in the pool, it will see the queued callback method and call it to do the work. In this case, one thread does everything and no context switch occurs; the performance is awesome!

Now, let's say that the callback method takes a long time to execute. While it is executing, additional requests could be coming into the Web server. Let's say 100 requests come in to read data. All of these requests get queued to the common language runtime's (CLR's) thread pool. Since one thread pool thread is busy executing the previously queued write method, another thread pool thread will wake up and call into the Web service code. The Web service code will call QueueRead, which will see that the gate is currently owned by a writer, construct a ReaderWriterGateReleaser object identifying the callback, and add this object to the m_qReadRequests queue. This second thread pool thread will then return from QueueRead and will ultimately return all the way back to the thread pool.

Note that this thread is not blocked. In fact, if 99 more read requests come into the Web service, this thread will extract those 99 read requests (one at a time) off the thread pool and simply add a bunch of ReaderWriterGateReleaser objects to the m_qReadRequests queue. At this point, the thread pool has needed just two threads: one is back in the pool ready to handle more incoming requests, and the other is still executing the write method. If a write request comes in, a ReaderWriterGateReleaser object is constructed and added to the m_qWriteRequests queue, and once again the thread returns immediately to the thread pool. This is incredibly scalable and uses an extremely small number of system resources!

Now, when the first thread pool thread finishes processing the write method, the gate is automatically released, and the ReaderWriterGate must then examine its queues to determine what write or read methods to invoke next. The internal logic looks something like Figure 5.

Figure 5 Pseudocode for Examining Queues

1. If a reader thread is releasing the gate, do steps 1.1 & 1.2;
    else, go to step #2
    1.1. Subtract 1 from m_numReaders
    1.2. If m_numReaders is > 0, return to caller; else, go to step #2
2. If m_qWriteRequests.Count > 0
    2.1. Set m_state to OwnedByWriter
    2.2. Invoke 1 writer callback method via ThreadPool’s
         QueueUserWorkItem
    2.3. Return to caller
3. If m_qReadRequests.Count > 0
    3.1. Set m_state to OwnedByReaders
    3.2. Set m_numReaders = m_qReadRequests.Count
    3.3. Invoke all reader callback methods via ThreadPool’s
         QueueUserWorkItem
    3.4. Return to caller
4. Set m_state to Free
5. Return to caller

From the logic, you can see that when a reader callback method returns, the gate checks whether it was the last reader. If not, no additional methods are invoked, and the thread pool thread that executed the callback returns to the pool so that it can be used to do something else.

If there are no more readers reading, then the gate checks to see if there are any writer methods queued up in the m_qWriteRequests queue. If there are, the next writer's callback method is posted to the CLR's thread pool, causing a thread pool thread to wake up and execute the code that writes to the resource. Note that only one writer method is invoked at a time, ensuring exclusive access to the data being protected by the gate. Also note that my implementation always prefers to wake writers, which can starve readers, but this is the usual case with reader/writer locks. You could always change the policy to prefer readers if you desire.

If there are no writers queued up, then the gate checks to see if there are any reader methods queued up in the m_qReadRequests queue. If there are, all of the reader callback methods are posted to the CLR's thread pool since, theoretically, they can all execute at once. However, you may remember that earlier I mentioned that there was a significant performance problem when a normal reader/writer lock releases all its waiting reader threads: all the threads become runnable and the operating system must perform a lot of context switching, which adversely affects performance.

In the case of the ReaderWriterGate, all of the reader callback methods are queued to the thread pool, which invokes only a few of them at a time using just a few threads. Ideally, the number of threads used equals the number of CPUs in the computer and, therefore, no context switching takes place. These few threads execute the read callback methods and then go back to the thread pool to execute a few more. The end result is that very few threads are required to perform lots of work, and the work executes more quickly because fewer context switches are necessary. Better performance with fewer resources: you gotta love it!
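
For completeness, here is a sketch of how the logic in Figure 5 could be coded, along with the hypothetical InvokeCallback helper assumed by the earlier sketches. Again, the names DoneWithResource, InvokeCallback, and m_syncObj are mine for illustration, and the real Power Threading implementation may differ:

// Runs a queued callback on a thread pool thread, then releases the gate
private void InvokeCallback(Object o) 
{
   ReaderWriterGateReleaser releaser = (ReaderWriterGateReleaser) o;
   try 
   {
      releaser.Callback(releaser);  // Execute the user's read or write method
   }
   finally 
   {
      releaser.Dispose();  // No-op if the callback already released the gate
   }
}

// Called (via the releaser) when a read or write callback is done with the gate
internal void DoneWithResource(Boolean writer) 
{
   lock (m_syncObj) 
   {
      // Step 1: a reader is releasing; if other readers remain, just return
      if (!writer && (--m_numReaders > 0)) return;

      // Step 2: writers are waiting; wake exactly one of them
      if (m_qWriteRequests.Count > 0) 
      {
         m_state = ReaderWriterGateStates.OwnedByWriter;
         ThreadPool.QueueUserWorkItem(InvokeCallback, m_qWriteRequests.Dequeue());
         return;
      }

      // Step 3: no writers are waiting; wake all queued readers
      if (m_qReadRequests.Count > 0) 
      {
         m_state = ReaderWriterGateStates.OwnedByReaders;
         m_numReaders = m_qReadRequests.Count;
         while (m_qReadRequests.Count > 0)
            ThreadPool.QueueUserWorkItem(InvokeCallback, m_qReadRequests.Dequeue());
         return;
      }

      // Steps 4 & 5: nothing is queued, so the gate becomes free
      m_state = ReaderWriterGateStates.Free;
   }
}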

Knowing When the Callback Completes

Shortly after I released the first version of my Power Threading Library with my ReaderWriterGate, someone tried to use it in an ASP.NET application. His scenario was similar to what I have already described: clients were making requests for Web pages, and to gain scalability, the Web pages needed to access shared data using reader/writer semantics. Fortunately, ASP.NET 2.0 has support for implementing asynchronous Web pages. For more information about this, I encourage you to read Jeff Prosise's Wicked Code column about asynchronous pages in ASP.NET 2.0, which appeared in the October 2005 issue of MSDN® Magazine.

To take advantage of asynchronous Web pages, you need to issue an asynchronous request and return to the ASP.NET infrastructure an IAsyncResult object that it can use to know when the asynchronous operation has completed. You use the ReaderWriterGate class's QueueWrite and QueueRead methods to start the asynchronous operation, but in my original implementation I offered no way for you to know when the callback method had actually completed its task. This meant that the ReaderWriterGate could not easily be used in an ASP.NET Web Forms application.

After I got this user's feedback, I realized how useful this would be and I added a few additional methods to the ReaderWriterGate class. Here are the new methods:

public sealed class ReaderWriterGate 
{
   public IAsyncResult BeginRead(ReaderWriterGateCallback callback, 
      Object state, AsyncCallback asyncCallback, Object asyncState);
   public void EndRead(IAsyncResult ar);

   public IAsyncResult BeginWrite(ReaderWriterGateCallback callback, 
      Object state, AsyncCallback asyncCallback, Object asyncState);
   public void EndWrite(IAsyncResult ar);

   ... // Other methods not shown
}

You would use the BeginWrite/BeginRead and EndWrite/EndRead methods instead of the QueueWrite/QueueRead methods when writing an asynchronous ASP.NET Web page or any kind of application where you need to know when the callback method has completed execution. These BeginXxx methods always return immediately, but now, when they return, I give you back an IAsyncResult object that identifies the queued callback method. The IAsyncResult object I return fits in with the rest of the CLR's asynchronous programming model, and you can use it just as you would any other IAsyncResult object. In the case of an ASP.NET Web page, you can return my object to the ASP.NET infrastructure directly and it will know to complete processing the page when the callback method completes. Also, just like the CLR's asynchronous programming model, when you call EndWrite/EndRead, I throw any exception that was unhandled by the callback method.
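
To illustrate roughly how these methods plug into an asynchronous ASP.NET 2.0 page, the code-behind sketch below wires BeginRead and EndRead into the page's AddOnPreRenderCompleteAsync method (the page's @ Page directive would also need Async="true"). The page class, the static gate field, and the RenderCatalog callback are hypothetical names I made up for this example:

public partial class CatalogPage : System.Web.UI.Page 
{
   // Hypothetical gate protecting shared catalog data used by all requests
   private static ReaderWriterGate s_gate = new ReaderWriterGate();

   protected void Page_Load(Object sender, EventArgs e) 
   {
      // Tell ASP.NET that this page completes asynchronously
      AddOnPreRenderCompleteAsync(
         new BeginEventHandler(BeginRenderCatalog),
         new EndEventHandler(EndRenderCatalog));
   }

   private IAsyncResult BeginRenderCatalog(Object sender, EventArgs e, 
      AsyncCallback cb, Object extraData) 
   {
      // Queue a read; ASP.NET resumes the page when the callback completes
      return s_gate.BeginRead(RenderCatalog, null, cb, extraData);
   }

   private void EndRenderCatalog(IAsyncResult ar) 
   {
      // Rethrows any exception the callback method left unhandled
      s_gate.EndRead(ar);
   }

   private void RenderCatalog(ReaderWriterGateReleaser r) 
   {
      // Read the shared catalog data and populate the page's controls here
   }
}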

Conclusion

I am particularly proud of the ReaderWriterGate idea and of my implementation. I think it solves a real-world need for producing scalable applications and services using a minimum of resources. Even though threading has been around for many years, I still believe that there are new ways to think about it that can lead to new ideas and ways of architecting applications and services to start taking advantage of the multiple-CPU machines that are now becoming commonplace.

Send your questions and comments for Jeffrey to mmsync@microsoft.com.

Jeffrey Richter is a cofounder of Wintellect (www.Wintellect.com), an architecture review, consulting, and training firm. He is the author of several books, including CLR via C# (Microsoft Press, 2006). Jeffrey is also a contributing editor to MSDN Magazine and has been consulting with Microsoft since 1990.