Best Practices for Maximizing Scalability and Cost Effectiveness of Queue-Based Messaging Solutions on Azure

Updated: March 31, 2015

Authored by: Amit Srivastava and Valery Mizonov

Reviewed by: Brad Calder, Sidney Higa, Christian Martinez, Steve Marx, Curt Peterson, Paolo Salvatori, and Trace Young

This article offers prescriptive guidance and best practices for building scalable, highly efficient and cost-effective queue-based messaging solutions on the Azure platform. The intended audience for this article includes solution architects and developers designing and implementing cloud-based solutions that leverage the Azure platform’s queue storage services.

A traditional queue-based messaging solution utilizes the concept of a message storage location known as a message queue, which is a repository for data that will be sent to or received from one or more participants, typically via an asynchronous communication mechanism.

The purpose of this article is to examine how developers can take advantage of particular design patterns in conjunction with capabilities provided by the Azure platform to build optimized and cost-effective queue-based messaging solutions. The article takes a deeper look at the most commonly used approaches to implementing queue-based interactions in Azure solutions, and provides recommendations for improving performance, increasing scalability and reducing operating expense.

The underlying discussion is mixed with relevant best practices, hints and recommendations where appropriate. The scenario described in this article highlights a technical implementation that is based upon a real-world customer project.

A typical messaging solution that exchanges data between its distributed components using message queues includes publishers depositing messages into queues and one or more subscribers intended to receive these messages. In most cases, the subscribers, sometimes referred to as queue listeners, are implemented as single- or multi-threaded processes, either continuously running or initiated on demand as per a scheduling pattern.

At a higher level, there are two primary dispatch mechanisms used to enable a queue listener to receive messages stored on a queue:

  • Polling (pull-based model): A listener monitors a queue by checking the queue at regular intervals for new messages. When the queue is empty, the listener continues polling the queue, periodically backing off by entering a sleep state.

  • Triggering (push-based model): A listener subscribes to an event that is triggered (either by the publisher itself or by a queue service manager) whenever a message arrives on a queue. The listener in turn can initiate message processing thus not having to poll the queue in order to determine whether or not any new work is available.

It is also worth mentioning that there are different flavors of both mechanisms. For instance, polling can be blocking or non-blocking. Blocking keeps a request on hold until a new message appears on a queue (or a timeout is encountered), whereas a non-blocking request completes immediately if there is nothing on a queue. With a triggering model, a notification can be pushed to the queue listeners either for every new message, only when the very first message arrives at an empty queue, or when the queue depth reaches a certain level.

The dequeue operations supported by the Azure Queue Service API are non-blocking. This means that API methods such as GetMessage or GetMessages will return immediately if no message is found on a queue. By contrast, the Azure Service Bus queues offer blocking receive operations which block the calling thread until a message arrives on a queue or a specified timeout period has elapsed.
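
The difference between the two receive styles can be illustrated with Python’s standard library queue standing in for the two services; this is a behavioral sketch only, not the actual Azure Queue or Service Bus API:

```python
import queue

q = queue.Queue()

# Non-blocking receive (Azure Queue storage semantics): returns
# immediately with "no message" when the queue is empty.
def get_message_nonblocking(q):
    try:
        return q.get_nowait()
    except queue.Empty:
        return None

# Blocking receive (Service Bus semantics): waits until a message
# arrives or the timeout elapses before reporting "no message".
def receive_blocking(q, timeout_seconds):
    try:
        return q.get(timeout=timeout_seconds)
    except queue.Empty:
        return None

assert get_message_nonblocking(q) is None   # returns at once on an empty queue
q.put("work item")
assert receive_blocking(q, timeout_seconds=1) == "work item"
```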

The most common approach to implementing queue listeners in Azure solutions today can be summarized as follows:

  1. A listener is implemented as an application component that is instantiated and executed as part of a worker role instance.

  2. The lifecycle of the queue listener component would often be bound to the run time of the hosting role instance.

  3. The main processing logic is comprised of a loop in which messages are dequeued and dispatched for processing.

  4. Should no messages be received, the listening thread enters a sleep state the duration of which is often driven by an application-specific back-off algorithm.

  5. The receive loop executes and the queue is polled until the listener is notified to exit the loop and terminate.
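
The steps above can be sketched as a minimal Python illustration; the dequeue, process and stop callbacks are hypothetical stand-ins for the Azure queue client and the hosting role’s lifecycle signals:

```python
import time

def run_queue_listener(dequeue, process, should_stop,
                       min_delay=0.1, max_delay=2.0):
    """Steps 3-5 above: dequeue and dispatch in a loop, back off while
    idle, and exit when the host signals termination."""
    delay = min_delay
    while not should_stop():
        message = dequeue()
        if message is not None:
            process(message)
            delay = min_delay                     # useful work: reset back-off
        else:
            time.sleep(delay)                     # step 4: idle sleep
            delay = min(delay * 2, max_delay)     # grow the polling interval

# Demo with an in-memory list standing in for an Azure queue:
pending, handled = ["msg-1", "msg-2"], []
run_queue_listener(
    dequeue=lambda: pending.pop(0) if pending else None,
    process=handled.append,
    should_stop=lambda: not pending and len(handled) == 2,
    min_delay=0.01)
```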

The following flowchart diagram depicts the logic commonly used when implementing a queue listener with a polling mechanism in Azure applications:

For the purposes of this article, more complex design patterns, for example those that require the use of a central queue manager (broker), are not used.

The use of a classic queue listener with a polling mechanism may not be the optimal choice when using Azure queues because the Azure pricing model measures storage transactions in terms of application requests performed against the queue, regardless of whether the queue is empty. The purpose of the next sections is to discuss some techniques for maximizing performance and minimizing the cost of queue-based messaging solutions on the Azure platform.

In this section, we examine how to improve the relevant design aspects to achieve higher performance, better scalability and cost efficiency.

Perhaps the easiest way to qualify an implementation pattern as a “more efficient solution” is through a design that meets the following goals:

  • Reduces operational expenditures by removing a significant portion of storage transactions that do not yield any usable work.

  • Eliminates excessive latency imposed by a polling interval when checking a queue for new messages.

  • Scales up and down dynamically by adapting processing power to volatile volumes of work.

The implementation pattern should also meet these goals without introducing a level of complexity that effectively outweighs the associated benefits.

When evaluating the total cost of ownership (TCO) and return on investment (ROI) for a solution deployed on the Azure platform, the volume of storage transactions is one of the main variables in the TCO equation. Reducing the number of transactions against Azure queues decreases the operating costs as it relates to running solutions on Azure.

The storage space cost associated with an Azure queue can be computed as follows:

Queue space: 24 bytes + Len(QueueName) * 2 bytes + For-Each Metadata(4 bytes + Len(MetadataName) * 2 bytes + Len(Value) * 2 bytes)

Message space: 12 bytes + Len(Message)
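
These formulas translate into a short calculation; the sketch below is illustrative Python, with the queue name and metadata entries chosen purely as examples:

```python
def queue_space_bytes(queue_name, metadata=None):
    # 24 bytes of fixed overhead, plus 2 bytes per character of the
    # queue name (names are stored as Unicode).
    size = 24 + len(queue_name) * 2
    for name, value in (metadata or {}).items():
        # 4 bytes of overhead per metadata entry, plus the name and
        # value at 2 bytes per character each.
        size += 4 + len(name) * 2 + len(value) * 2
    return size

def message_space_bytes(message):
    # 12 bytes of fixed overhead plus the message payload length.
    return 12 + len(message)

assert queue_space_bytes("orders") == 24 + 12
assert queue_space_bytes("orders", {"env": "prod"}) == 36 + 4 + 6 + 8
assert message_space_bytes(b"hello") == 17
```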

In the context of a queue-based messaging solution, the volume of storage transactions can be reduced using a combination of the following methods:

  1. When putting messages in a queue, group related messages into a single larger batch, compress and store the compressed image in a blob storage and use the queue to keep a reference to the blob holding the actual data. This approach helps optimize transaction cost and storage space cost.

  2. When retrieving messages from a queue, batch multiple messages together in a single storage transaction. The GetMessages method in the Queue Service API allows de-queuing the specified number of messages in a single transaction (see the note below).

  3. When checking the presence of work items on a queue, avoid aggressive polling intervals and implement a back-off delay that increases the time between polling requests if a queue remains continuously empty.

  4. Reduce the number of queue listeners – when using a pull-based model, use only 1 queue listener per role instance when a queue is empty. To further reduce the number of queue listeners per role instance to zero, use a notification mechanism to instantiate queue listeners when the queue receives work items.

  5. If queues remain empty for most of the time, automatically scale down the number of role instances and continue to monitor relevant system metrics to determine if and when the application should scale up the number of instances to handle increasing workload.

  6. Implement a mechanism to remove poison messages. Poison messages are typically malformed messages that the application cannot process. If left unprocessed, such messages could accumulate and incur repeated transactional and processing costs. A simple implementation mechanism could be to remove messages older than a threshold duration from the queue and write them to an archival system for further evaluation.

  7. Reduce expected time-out failures. When you send a request to the service, you can specify your own time-out and set it to be smaller than the SLA time-out. In this scenario, when the request times out, it is classified as an expected time-out and counted towards billable transactions.
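
Recommendation 1 (grouping related messages, compressing the batch, and queuing only a reference to the blob holding the actual data) can be sketched as follows; the blob store and queue are simulated here with in-memory stand-ins rather than the real storage APIs:

```python
import json
import uuid
import zlib

blob_store = {}   # in-memory stand-in for Azure blob storage
work_queue = []   # in-memory stand-in for an Azure queue

def enqueue_batch(messages):
    """Compress a batch of related messages into one blob and enqueue a
    single small reference, instead of one queue message per item."""
    blob_name = str(uuid.uuid4())
    blob_store[blob_name] = zlib.compress(json.dumps(messages).encode("utf-8"))
    work_queue.append(blob_name)          # one put transaction per batch

def dequeue_batch():
    blob_name = work_queue.pop(0)         # one get (plus one delete) per batch
    payload = zlib.decompress(blob_store.pop(blob_name))
    return json.loads(payload.decode("utf-8"))

# 1,000 work items travel through the queue as a single small message:
enqueue_batch([{"txn": i} for i in range(1000)])
batch = dequeue_batch()
```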

Most of the above recommendations can be translated into a fairly generic implementation that handles message batches and encapsulates many of the underlying queue/blob storage and thread management operations. Later in this article, we will examine how to do this.

When retrieving messages via the GetMessages method, the maximum batch size supported by Queue Service API in a single dequeue operation is limited to 32.

Generally speaking, the cost of Azure queue transactions increases linearly as the number of queue service clients increases, such as when scaling up the number of role instances or increasing the number of dequeue threads. To illustrate the potential cost impact of a solution design that does not take advantage of the above recommendations, we will provide an example backed up by concrete numbers.

If the solution architect does not implement relevant optimizations, the billing system architecture described above will likely incur excessive operating expenses once the solution is deployed and running on the Azure platform. The reasons for the possible excessive expense are described in this section.

As noted in the scenario definition, the business transaction data arrives at regular intervals. However, let’s assume that the solution is busy processing workload just 25% of the time during a standard 8-hour business day. That results in 6 hours (8 hours * 75%) of “idle time” when there may not be any transactions coming through the system. Furthermore, the solution will not receive any data at all during the 16 non-business hours every day.

During the idle period totaling 22 hours, the solution is still attempting to dequeue work, as it has no explicit knowledge of when new data arrives. During this time window, each individual dequeue thread will perform up to 79,200 transactions (22 hours * 60 min/hour * 60 transactions/min) against an input queue, assuming a default polling interval of 1 second.

As previously mentioned, the pricing model in the Azure platform is based upon individual “storage transactions”. A storage transaction is a request made by a user application to add, read, update or delete storage data. As of the writing of this whitepaper, storage transactions are billed at a rate of $0.01 for 10,000 transactions (not taking into account any promotional offerings or special pricing arrangements).

When calculating the number of queue transactions, keep in mind that putting a single message on a queue would be counted as 1 transaction, whereas consuming a message is often a 2-step process involving the retrieval followed by a request to remove the message from the queue. As a result, a successful dequeue operation will incur 2 storage transactions. Please note that even if a dequeue request results in no data being retrieved, it still counts as a billable transaction.

The storage transactions generated by a single dequeue thread in the above scenario will add approximately $2.38 (79,200 / 10,000 * $0.01 * 30 days) to a monthly bill. In comparison, 200 dequeue threads (or, alternatively, 1 dequeue thread in each of 200 worker role instances) will increase the cost by $475.20 per month. That is the cost incurred while the solution was not performing any computations at all, just checking the queues for available work items. The above example is deliberately simplified, as it is unlikely that anyone would implement a service exactly this way, but it illustrates why the optimizations described next are important.
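
The arithmetic above can be reproduced with a short sketch; the prices and intervals are the figures quoted in this section, not current rates:

```python
def monthly_idle_polling_cost(idle_hours_per_day, polling_interval_seconds,
                              dequeue_threads, price_per_10k=0.01, days=30):
    """Billable transactions generated by idle polling, at the quoted
    rate of $0.01 per 10,000 storage transactions."""
    polls_per_day = idle_hours_per_day * 3600 / polling_interval_seconds
    transactions = polls_per_day * dequeue_threads * days
    return transactions / 10000 * price_per_10k

# A single thread polling every second through 22 idle hours a day:
assert round(monthly_idle_polling_cost(22, 1, 1), 2) == 2.38
# 200 dequeue threads (or one thread in each of 200 role instances):
assert round(monthly_idle_polling_cost(22, 1, 200), 2) == 475.20
```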

To optimize the performance of queue-based Azure messaging solutions, one approach is to use the publish/subscribe messaging layer provided with the Azure Service Bus, as described in this section.

In this approach, developers will need to focus on creating a combination of polling and real-time push-based notifications, enabling the listeners to subscribe to a notification event (trigger) that is raised upon certain conditions to indicate that a new workload is put on a queue. This approach enhances the traditional queue polling loop with a publish/subscribe messaging layer for dispatching notifications.

In a complex distributed system, this approach would necessitate the use of a “message bus” or “message-oriented middleware” to ensure that notifications can be reliably relayed to one or more subscribers in a loosely coupled fashion. Azure Service Bus is a natural choice for addressing messaging requirements between loosely coupled distributed application services running on Azure and running on-premises. It is also a perfect fit for a “message bus” architecture that will enable exchanging notifications between processes involved in queue-based communication.

The processes engaged in a queue-based message exchange could employ the following pattern:


Specifically, and as it relates to the interaction between queue service publishers and subscribers, the same principles that apply to the communication between Azure role instances would meet the majority of requirements for push-based notification message exchange.
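
The push-based idea can be illustrated with a minimal in-process sketch; a threading condition variable stands in for the Service Bus notification layer (this is not the Service Bus API), and the listener waits to be signaled instead of polling:

```python
import threading

class NotificationChannel:
    """Hypothetical stand-in for a pub/sub notification layer: listeners
    block on an event instead of polling, and the publisher signals
    whenever new work is placed on a queue."""
    def __init__(self):
        self._cond = threading.Condition()
        self._pending = 0

    def publish_work_available(self):
        with self._cond:
            self._pending += 1
            self._cond.notify()               # wake one idle listener

    def wait_for_work(self, timeout=None):
        with self._cond:
            if self._pending == 0:
                self._cond.wait(timeout)      # no transactions burned while waiting
            if self._pending > 0:
                self._pending -= 1
                return True
            return False                      # timed out with no work

channel = NotificationChannel()
results = []

def listener():
    while channel.wait_for_work(timeout=1.0):
        results.append("processed")           # dequeue + process would go here

t = threading.Thread(target=listener)
t.start()
channel.publish_work_available()
channel.publish_work_available()
t.join()
assert results == ["processed", "processed"]
```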

The use of the Azure Service Bus is subject to a pricing model that takes into account the volume of messaging operations against a Service Bus messaging entity such as a queue or a topic.

It is therefore important to perform a cost-benefit analysis to assess the pros and cons of introducing the Service Bus into a given architecture. Along those lines, it is worth evaluating whether or not the introduction of the notification dispatch layer based on the Service Bus would, in fact, lead to cost reduction that can justify the investments and additional development efforts.

For more information on the pricing model for Service Bus, please refer to the relevant sections in Azure Platform FAQs.

While the impact on latency is fairly easy to address with a publish/subscribe messaging layer, a further cost reduction could be realized by using dynamic (elastic) scaling, as described in the next section.

Azure storage defines scalability targets at the overall account level and at the per-partition level. A queue in Azure is a single partition of its own, and therefore it can process up to 2000 messages per second. When the number of messages exceeds this quota, the storage service responds with an HTTP 503 Server Busy message. This message indicates that the platform is throttling the queue. Application designers should perform capacity planning to ensure that an appropriate number of queues can sustain the application’s request rate. If a single queue cannot handle the application’s request rate, design a partitioned queue architecture with multiple queues to ensure scalability.

An application could also leverage several different queues for different message types. This ensures application scalability by allowing multiple queues to co-exist without choking a single queue. This also allows discrete control over queue processing based on the sensitivity and priority of the messages that are stored in different queues. High priority queues could have more workers dedicated to them than low priority queues.
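
A partitioned layout with per-priority queues could be sketched as follows; the queue names and the hashing scheme are illustrative assumptions, not a prescribed convention:

```python
import hashlib

# Hypothetical partitioned layout: several queues share normal traffic,
# plus a dedicated queue isolates high-priority messages.
NORMAL_PARTITIONS = ["workitems-0", "workitems-1", "workitems-2", "workitems-3"]

def route_message(message_key, priority="normal"):
    """Pick a target queue so that no single queue bears the full
    request rate and high-priority traffic gets its own workers."""
    if priority == "high":
        return "workitems-high"
    # Stable hash of the key keeps related messages on one partition.
    digest = hashlib.md5(message_key.encode("utf-8")).digest()
    return NORMAL_PARTITIONS[digest[0] % len(NORMAL_PARTITIONS)]

# Related messages land on the same partition; priorities are split out.
assert route_message("customer-42") == route_message("customer-42")
assert route_message("invoice-7", priority="high") == "workitems-high"
```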

The Azure platform makes it possible for customers to scale up and down faster and easier than ever before. The ability to adapt to volatile workloads and variable traffic is one of the primary value propositions of the cloud platform. This means that “scalability” is no longer an expensive IT vocabulary term; it is now an out-of-the-box feature that can be programmatically enabled on demand in a well-architected cloud solution.

Dynamic scaling is the technical capability of a given solution to adapt to fluctuating workloads by increasing and reducing working capacity and processing power at runtime. The Azure platform natively supports dynamic scaling through the provisioning of a distributed computing infrastructure on which compute hours can be purchased as needed.

It is important to differentiate between the following two types of dynamic scaling on the Azure platform:

  • Role instance scaling refers to adding and removing additional web or worker role instances to handle the point-in-time workload. This often includes changing the instance count in the service configuration. Increasing the instance count will cause Azure runtime to start new instances whereas decreasing the instance count will in turn cause it to shut down running instances.

  • Process (thread) scaling refers to maintaining sufficient capacity in terms of processing threads in a given role instance by tuning the number of threads up and down depending on the current workload.

Dynamic scaling in a queue-based messaging solution calls for a combination of the following general recommendations:

  1. Monitor key performance indicators including CPU utilization, queue depth, response times and message processing latency.

  2. Dynamically increase or decrease the number of role instances to cope with the spikes in workload, either predictable or unpredictable.

  3. Programmatically expand and trim down the number of processing threads to adapt to variable load conditions handled by a given role instance.

  4. Partition and process fine-grained workloads concurrently using the Task Parallel Library in the .NET Framework 4.

  5. Maintain a viable capacity in solutions with highly volatile workload in anticipation of sudden spikes to be able to handle them without the overhead of setting up additional instances.
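
Recommendation 3, tuning the number of processing threads to the observed load, can be sketched as a simple sizing function; the thresholds below are illustrative assumptions, not prescriptive values:

```python
import math

def target_thread_count(queue_depth, min_threads=1, max_threads=25,
                        messages_per_thread=100):
    """Size the worker thread pool from the observed queue depth.
    A role instance would re-evaluate this periodically and expand or
    trim its pool toward the returned target."""
    if queue_depth == 0:
        return min_threads                 # keep a single listener when idle
    needed = math.ceil(queue_depth / messages_per_thread)
    return max(min_threads, min(max_threads, needed))

assert target_thread_count(0) == 1
assert target_thread_count(250) == 3       # scale up under load
assert target_thread_count(10_000) == 25   # cap at the configured maximum
```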

The Service Management API makes it possible for an Azure hosted service to modify the number of its running role instances by changing the deployment configuration at runtime.

The maximum number of Azure small compute instances (or the equivalent number of other sized compute instances in terms of number of cores) in a typical subscription is limited to 20 by default. Any requests for increasing this quota should be raised with the Azure Support team. For more information, see the Azure Platform FAQs.

With the introduction of Azure Auto Scaling, the platform can scale the instance count up or down based on the queue message depth. This is a very natural fit for dynamic scaling. The added advantage is that the Azure platform monitors and scales tasks for the application.

Dynamic scaling of the role instance count may not always be the most appropriate choice for handling load spikes. For instance, a new role instance can take a few seconds to spin up and there are currently no SLA metrics provided with respect to spin-up duration. Instead, a solution may need to simply increase the number of worker threads to deal with the volatile workload increase. While workload is being processed, the solution will monitor the relevant load metrics and determine whether it needs to dynamically reduce or increase the number of worker processes.

At present, the scalability target for a single Azure queue is “constrained” at 2000 transactions/sec. If an application attempts to exceed this target, for example by performing queue operations from multiple role instances running hundreds of dequeue threads, it may receive an HTTP 503 “Server Busy” response from the storage service. When this occurs, the application should implement a retry mechanism using an exponential back-off delay algorithm. However, if the HTTP 503 errors occur regularly, it is recommended to use multiple queues and implement a partitioning-based strategy to scale across them.
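
The retry-with-back-off handling for throttled requests could be sketched as follows; the exception type is a hypothetical stand-in for the service’s HTTP 503 response, and the callable stands in for the real dequeue call:

```python
import random
import time

class ServerBusyError(Exception):
    """Stand-in for the storage service's HTTP 503 'Server Busy' response."""

def dequeue_with_backoff(try_dequeue, max_attempts=5,
                         base_delay=0.5, max_delay=30.0):
    """Retry a throttled operation with a randomized exponential
    back-off delay; `try_dequeue` is a caller-supplied callable that
    raises ServerBusyError when the service is throttling."""
    for attempt in range(max_attempts):
        try:
            return try_dequeue()
        except ServerBusyError:
            if attempt == max_attempts - 1:
                raise                              # give up after the last attempt
            delay = min(max_delay, base_delay * (2 ** attempt))
            time.sleep(random.uniform(0, delay))   # jitter avoids lockstep retries
```

A randomized delay matters here: if every thread backed off by the same fixed amount, the throttled clients would all retry in lockstep and hit the queue simultaneously again.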

In most cases, auto-scaling the worker processes is the responsibility of an individual role instance. By contrast, role instance scaling often involves a central element of the solution architecture that is responsible for monitoring performance metrics and taking the appropriate scaling actions. The diagram below depicts a service component called Dynamic Scaling Agent that gathers and analyzes load metrics to determine whether it needs to provision new instances or decommission idle instances.


It is worth noting that the scaling agent service can be deployed either as a worker role running on Azure or as an on-premises service. Irrespective of the deployment topology, the service will be able to access the Azure queues.

As an alternative to manual dynamic scaling, consider using the built-in auto-scaling feature of Azure.

Now that we have covered the latency impact, storage transaction costs and dynamic scale requirements, it is a good time to consolidate our recommendations into a technical implementation.

For the sake of a concrete example, we will generalize a real-world customer scenario as follows.

A SaaS solution provider launches a new billing system implemented as an Azure application servicing the business needs for customer transaction processing at scale. The key premise of the solution is centered upon the ability to offload compute-intensive workload to the cloud and leverage the elasticity of the Azure infrastructure to perform the computationally intensive work.

The on-premises element of the end-to-end architecture consolidates and dispatches large volumes of transactions to an Azure hosted service regularly throughout the day. Volumes vary from a few thousand to hundreds of thousands of transactions per submission, reaching millions of transactions per day. Additionally, assume that the solution must satisfy an SLA-driven requirement for a guaranteed maximum processing latency.

The solution architecture is founded on the distributed map-reduce design pattern and is comprised of a multi-instance worker role-based cloud tier using the Azure queue storage for work dispatch. Transaction batches are received by the Process Initiator worker role instance, decomposed (de-batched) into smaller work items and enqueued into a collection of Azure queues for the purposes of load distribution.

Workload processing is handled by multiple instances of the processing worker role fetching work items from queues and passing them through computational procedures. The processing instances employ multi-threaded queue listeners to implement parallel data processing for optimal performance.

The processed work items are routed into a dedicated queue from which they are dequeued by the Process Controller worker role instance, aggregated and persisted into a data store for data mining, reporting and analysis.

The solution architecture can be depicted as follows:


The diagram above depicts a typical architecture for scaling out large or complex compute workloads. The queue-based message exchange pattern adopted by this architecture is also very typical for many other Azure applications and services which need to communicate with each other via queues. This enables taking a canonical approach to examining specific fundamental components involved in a queue-based message exchange.

To maximize the efficiency and cost effectiveness of queue-based messaging solutions running on the Azure platform, solution architects and developers should consider the following recommendations.

As a solution architect, you should:

  • Provision a queue-based messaging architecture that uses the Azure queue storage service for high-scale asynchronous communication between tiers and services in cloud-based or hybrid solutions.

  • Recommend a partitioned queuing architecture to scale beyond 2000 messages/sec.

  • Understand the fundamentals of the Azure pricing model and optimize the solution to lower transaction costs through a series of best practices and design patterns.

  • Consider dynamic scaling requirements by provisioning an architecture that is adaptive to volatile and fluctuating workloads.

  • Employ the right auto-scaling techniques and approaches to elastically expand and shrink compute power to further optimize the operating expense.

  • Evaluate Azure auto scaling to see if it fits the application’s need for dynamic scaling.

  • Evaluate the cost-benefit ratio of reducing latency through taking dependency on Azure Service Bus for real-time push-based notification dispatch.

As a developer, you should:

  • Design a messaging solution that employs batching when storing and retrieving data from Azure queues.

  • Implement an efficient queue listener service ensuring that queues will be polled by a maximum of one dequeue thread when empty.

  • Dynamically scale down the number of worker role instances when queues remain empty for a prolonged period of time.

  • Implement an application-specific random exponential back-off algorithm to reduce the effect of idle queue polling on storage transaction costs.

  • Adopt the right techniques to prevent exceeding the scalability targets for a single queue when implementing highly multi-threaded multi-instance queue publishers and consumers.

  • Employ a robust retry policy capable of handling a variety of transient conditions when publishing and consuming data from Azure queues.

  • Use the one-way eventing capability provided by Azure Service Bus to support push-based notifications in order to reduce latency and improve performance of the queue-based messaging solution.

  • Explore the new capabilities of the .NET Framework 4, such as TPL, PLINQ and the Observer pattern, to maximize the degree of parallelism, improve concurrency and simplify the design of multi-threaded services.

The accompanying sample code is available for download from the MSDN Code Gallery. The sample code also includes all the required infrastructure components, such as a generics-aware abstraction layer for the Azure queue service, which were not supplied in the above code snippets. Note that all source code files are governed by the Microsoft Public License as explained in the corresponding legal notices.

© 2015 Microsoft