When you stroll (or browse) through a well-stocked electronics store these days, you’ll find an amazing array of “things” that have the ability to connect to a network. Don’t just think phones, tablets, notebooks or desktops, or just TVs, Blu-ray players and set-top boxes. Think espresso makers, refrigerators and picture frames. Think garage door openers, air conditioning and alarm systems. If you look around and behind the cover panels in industrial or commercial environments such as your own office building, you’ll find temperature and humidity sensors, motion sensors, surveillance cameras and a multitude of other kinds of sensors or control switches inside equipment.
Many of these devices generate useful data: temperature readings; number of cups of brewed coffee and how the grinder has been set; infrared images showing that no one is in the conference room and therefore the lights can be turned off.
It’s easy to imagine a scenario where you’d like to upload some data into a “thing” as well, such as pushing the latest pictures of your children (or your pet) to a picture frame sitting on grandma’s sideboard; or one where you want to flip a switch from a distance—even if that distance only means your phone is connected via a 3G/4G mobile carrier network—to turn the temperature in the house up a notch. From a networking perspective that’s three worlds away, but from the consumer perspective there’s no appreciable difference between flipping the switch at home and flipping it while sitting in a cab on the way back from the airport after a two-week vacation.
The opportunities around connected devices are enormous. Supplying services for special-purpose devices might indeed provide more monetization potential for forward-looking cloud developers than apps on general-purpose screen devices tailored for human interaction, such as phones, tablets or the many PC form factors. This seems especially true when you combine such services with cloud technologies emerging around “big data” analysis.
For the purpose of the following architecture discussion, let’s imagine the offering is a Software as a Service (SaaS) for air conditioners. While the scenario is fictitious and so are all the numbers, the patterns and magnitudes are fairly close to actual scenarios that the Azure team is discussing with partners and customers.
What’s nice about air conditioners—from a business perspective—is that there’s healthy demand, and global climate trends indicate they won’t be going out of fashion anytime soon. Less nice is that they’re enormously hungry for electricity and can overload the electrical grid in hotter regions, resulting in rolling brownouts.
The SaaS solution, whose architecture I’ll outline, targets electricity companies looking for analytic insight into air conditioner use for the purpose of capacity management, and for a mechanism that lets them make broad emergency adjustments to the air conditioning systems hanging on their electricity grid when the grid is on the verge of collapse.
The bet is that utility customers would prefer their room temperatures forcibly adjusted upward to a cozy 80° F/27° C, rather than having the power grid cut out, leaving them with no defense against the scorching 100° F/38° C outside temperatures.
Let’s further assume the SaaS will be paired with a number of air conditioner manufacturers to integrate the required hardware and protocols. Once the devices are installed and connected to the service, there will be opportunities to sell electricity-company- or manufacturer-branded companion apps through mobile app stores that allow the customers to monitor and control their air conditioners from their phones or tablets. Figure 1 shows an overview of the scenario.
Figure 1 Overview of the Software as a Service Solution
As you’re architecting the solution, you need to accommodate two general directions of message traffic. Inbound (from the cloud perspective), you need to collect metrics and events from the devices, aggregate the data and flow it toward the analysis system. Outbound, you need to be able to send control commands to the devices.
Each air conditioning device should be able to send humidity, temperature and device status readings at intervals ranging from once per hour to once every 10 minutes, as required. Device status changes should also raise and send events. Control commands will be much rarer, probably no more than one or two per device per day.
For the initial ramp-up, set a target scale of 250,000 devices in the first year, accelerating to 2.5 million devices after the third year. At that scale, you can expect 0.25 to 1.5 million events per hour for the first year and around 2.5 to 15 million events per hour by the third year.
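Those volume figures follow directly from the device counts and reporting intervals; here’s a quick sanity check of the arithmetic in Python:

```python
def events_per_hour(devices: int, interval_minutes: int) -> int:
    """Total events per hour for a fleet reporting once every `interval_minutes`."""
    return devices * (60 // interval_minutes)

# Year one: 250,000 devices, reporting hourly up to once every 10 minutes.
low_y1 = events_per_hour(250_000, 60)     # 250,000 events/hour
high_y1 = events_per_hour(250_000, 10)    # 1,500,000 events/hour

# Year three: 2.5 million devices.
low_y3 = events_per_hour(2_500_000, 60)   # 2,500,000 events/hour
high_y3 = events_per_hour(2_500_000, 10)  # 15,000,000 events/hour
```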
You need to address three major areas of concern in the architecture: provisioning, event fan-in and command fan-out.
Provisioning Relates to setting up new devices, assigning them a unique identity within the system and giving each device a way to prove that identity. Furthermore, the devices must be associated with particular resources in the system and must receive a configuration document containing all of this information.
Event Fan-In Involves designing the system so it can handle the desired throughput for event ingestion. 15 million events per hour translates (generously rounded) to some 4,200 events per second, which justifies thinking hard about how to partition the system. This is especially the case because those events originate from some 4,200 distinct sources each second, with every client connecting remotely and establishing a fresh session over a link with potentially significant network latency.
Because the events and resulting statistics from each particular housing unit matter not only at the macro level, but also to the residents who purchase the mobile application and want to have accurate trend graphs, you not only have to flow rolled-up data for analysis but also figure out how to retain 360 million events per day.
Command Fan-Out Deals with the flow of commands to the devices. Commands might instruct a device to change its configuration or they might be status inquiries that tell the device to emit a status event. In cases where the electricity company needs to make adjustments across the board to alleviate pressure on the grid, they might want to send a single command to all devices at once and as fast as possible. In the mobile phone case, where a resident wants to adjust the temperature for comfort, the command is targeted at a particular device or all devices in a housing unit.
Figure 2 illustrates the general layout of the fan-in model for the air conditioning scenario.
Figure 2 The Fan-in Model
The first and most obvious aspect of the architecture is the presence of partitions. Instead of looking at a population of millions of connected devices as a whole, you’re subdividing the device population into much smaller, and therefore more manageable, partitions of several thousand devices each. But manageability isn’t the only reason for introducing partitions.
First, each resource in a distributed system has a throughput- and storage-capacity ceiling. Therefore, you want to limit the number of devices associated with any single Service Bus Topic such that the events sent by the devices won’t exceed the Topic’s throughput capacity, and any message backlog that might temporarily build up doesn’t exceed the Topic’s storage capacity. Second, you want to allocate appropriate compute resources and you also don’t want to overtax the storage back end with too many writes. As a result, you should bundle a relatively small set of resources with reasonably well-known performance characteristics into an autonomous and mostly isolated “scale-unit.”
Each scale-unit supports a maximum number of devices—and this is important for limiting risks in a scalability ramp-up. A nice side effect of introducing scale-units is that they significantly reduce the risk of full system outages. If a system depends on a single data store and that store has availability issues, the whole system is affected. But if, instead, the system consists of 10 scale-units that each maintain an independent store, a hiccup in one store affects only 10 percent of the system.
The architecture decision shown in Figure 2 is that the devices drop all events into Azure Service Bus Topics instead of into a service edge that the solution provides directly. Azure Service Bus already provides a scaled-out and secure network service gateway for messaging, and it makes sense to leverage that functionality whenever possible.
For this particular scenario you’ll assume the devices are capable of supporting HTTPS and therefore can talk directly to Azure Service Bus with its existing protocol support. However, as the devices get cheaper and smaller and have only a few kilobytes of memory and a slow processor, supporting SSL—and therefore HTTPS—can become a significant challenge, and even HTTP can start to appear enormously heavy. These are cases where building and running a custom network gateway (see the far right of Figure 2) with a custom or vertical-industry protocol is a good idea.
At the time this article was written, Service Bus only supported HTTP/HTTPS for the scenarios described above. Since then, support for Advanced Message Queuing Protocol (AMQP) 1.0 was added, first in the cloud with Azure Service Bus (mid-2013) and then in Service Bus for Windows Server 1.1 in October 2013. AMQP is a binary, bi-directional open protocol with support for many platforms, including embedded Linux. We at Microsoft work directly with the Apache Software Foundation, contributing to Apache Qpid Proton, which Service Bus customers with embedded systems solutions already use today.
AMQP should be seen as the preferred protocol for implementing the communication paths for the patterns presented in this article. For more information about the current thinking and work Microsoft is doing in this area, I encourage you to follow my Subscribe! video blog on Channel 9.
Each ingestion Topic is configured with three subscriptions. The first subscription is for a storage writer that writes the received event into an event store; the second subscription is for an aggregation component that rolls up events and forwards them into the global statistics system; and the third subscription is for routing messages to the device control system.
As a throughput ceiling for each of these Topics, assume an average of 100 messages per second at the entity level. Given you’re creating and routing three copies of every submitted event, you can ingest at most some 33 messages per second into each Topic. That’s obviously a (shockingly) defensive number. The reason to be very cautious is that you’re dealing with distributed devices that send messages every hour, and it would be somewhat naïve to assume a perfectly random distribution of event submissions across any given hour. If you assume the worst-case scenario of a 10-minute event interval with one extra control-interaction feedback message per device per hour, you can expect seven messages per hour from each device and, therefore, rounding down, you can hook some 50,000 devices onto each Topic.
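One plausible reading of the rounding above, sketched in Python (the throughput ceiling, copy count and message rates are the assumptions stated in the text, not measured values):

```python
TOPIC_CEILING_MSG_PER_SEC = 100  # assumed per-entity throughput ceiling
SUBSCRIPTION_COPIES = 3          # storage writer, aggregator, control routing

# With three copies made of every event, the defensive ingestion rate is ~33/sec.
safe_ingest_per_sec = TOPIC_CEILING_MSG_PER_SEC // SUBSCRIPTION_COPIES  # 33

# Worst case: a 10-minute reporting interval plus one control reply per hour.
worst_case_msgs_per_device_hour = 60 // 10 + 1  # 7

# Sizing against the entity ceiling and rounding down lands at "some 50,000".
devices_per_topic = (TOPIC_CEILING_MSG_PER_SEC * 3600) // worst_case_msgs_per_device_hour
# -> 51,428, i.e. roughly 50,000 devices per ingestion Topic
```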
While this flow rate is nothing to worry about for storage throughput, storage capacity and the manageability of the event store are concerns. The per-device event data at a resolution of one hour for 50,000 devices amounts to some 438 million event records per year. Even if you’re very frugal about how you pack those records and use, say, only 50 bytes per record, you’ll still be looking at 22GB of payload data per year for each of the scale-units. That’s still manageable, but it also underlines that you need to keep an eye on storage capacity and storage growth as you think about sizing scale-units.
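The storage estimate works out like this (record size and retention resolution are the assumptions from the text):

```python
DEVICES_PER_SCALE_UNIT = 50_000
EVENTS_PER_DEVICE_PER_DAY = 24  # one retained record per hour
RECORD_SIZE_BYTES = 50          # very tightly packed, as assumed above

records_per_year = DEVICES_PER_SCALE_UNIT * EVENTS_PER_DEVICE_PER_DAY * 365
# -> 438,000,000 records per scale-unit per year

payload_gb_per_year = records_per_year * RECORD_SIZE_BYTES / 1e9
# -> about 21.9 GB, i.e. "22GB" per scale-unit per year
```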
Based on the one-year ramp-up projection, you’ll need five of these scale-units for the first year and 50 after the third.
Now let’s look in more detail at how the incoming events will be handled and used. To illustrate that, I’m zooming in on one of the scale-units in Figure 3.
Figure 3 A Scale-Unit
The messages originating from the devices can be split into two broad categories: control replies and events. The control replies are filtered by the subscription for the control system. The events, containing the current device status and sensor readings, are routed to the storage writer and to the aggregator via separate subscriptions.
The storage writer’s job is to receive an event from the subscription and write it into the event store.
For storing events, Azure Table Storage is a great candidate. To support the partitioning model, a storage account will be shared across all partitions in the system and have one table per partition. The table store requires a partition key to allocate data rows to a storage section, and you’ll use the device’s identifier for that key, which will yield nice lookup performance when you need to serve up historical stats to plot graphs. For the row key, simply use a string-encoded timestamp in a compact YYYYMMDDHHMMSS format that yields chronological sorting and allows for range queries for plotting graphs. Also, because individual devices will, in this case, never send events in intervals of less than a second, you don’t need to worry about potential collisions. The timestamp will be derived from the message’s EnqueuedTimeUtc property that the Azure Service Bus broker automatically stamps on each incoming message, so you don’t have to trust the device’s clock. If you were building a high-throughput solution where collisions were possible, you could additionally leverage the message’s SequenceNumber property, which is a unique, sequential 64-bit identifier the broker stamps on each message.
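The key scheme just described can be sketched in a few lines; `make_keys` is a hypothetical helper, and in the real system the timestamp argument would be the message’s EnqueuedTimeUtc property rather than anything the device reports:

```python
from datetime import datetime, timezone

def make_keys(device_id: str, enqueued_utc: datetime) -> tuple[str, str]:
    """Partition key = device identifier (fast per-device lookups);
    row key = compact UTC timestamp that sorts chronologically and
    supports range queries for plotting graphs."""
    return device_id, enqueued_utc.strftime("%Y%m%d%H%M%S")

pk, rk = make_keys("0123456", datetime(2012, 7, 1, 13, 45, 30, tzinfo=timezone.utc))
# pk == "0123456", rk == "20120701134530"; because the format is fixed-width,
# lexicographic order equals chronological order.
```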
The aggregator’s job is to take incoming events and roll them up so that they can be forwarded into one of the system-wide statistics Topics that feed the analytics infrastructure. This is often done by computing averages or sums across a set of events for a particular duration, and forwarding only the computed values. In this case, you might be interested in trends on energy and temperature readings on a per-neighborhood basis with a 15-minute resolution. Thus the aggregator, running across two Web roles that each see roughly half of the data, will maintain a map of devices to houses and neighborhoods, aggregate data into per-neighborhood buckets as device events arrive, and emit an event with the computed aggregates every 15 minutes. Because each aggregator instance gets to see only half of the events, the analytics back end behind the statistics Topic might have to do a final rollup of the two events to get the full picture for the neighborhood.
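A minimal sketch of the rollup idea: the class below averages temperature readings per neighborhood and is flushed every 15 minutes toward the statistics Topic. The class name, the device-to-neighborhood map and the single temperature field are assumptions made for this illustration:

```python
from collections import defaultdict

class NeighborhoodAggregator:
    """Illustrative rollup: accumulates per-neighborhood temperature sums
    and counts, and emits averages when the 15-minute window is flushed."""

    def __init__(self, device_to_neighborhood: dict[str, str]):
        self._map = device_to_neighborhood
        self._sums = defaultdict(float)
        self._counts = defaultdict(int)

    def on_event(self, device_id: str, temperature: float) -> None:
        # Bucket the reading by the neighborhood the device belongs to.
        hood = self._map[device_id]
        self._sums[hood] += temperature
        self._counts[hood] += 1

    def flush(self) -> dict[str, float]:
        """Emit per-neighborhood averages and reset for the next window."""
        averages = {h: self._sums[h] / self._counts[h] for h in self._counts}
        self._sums.clear()
        self._counts.clear()
        return averages

agg = NeighborhoodAggregator({"dev-1": "elm-street", "dev-2": "elm-street"})
agg.on_event("dev-1", 24.0)
agg.on_event("dev-2", 26.0)
```

Because each aggregator instance sees only part of the event stream, the values it emits are themselves partial rollups that the analytics back end may need to combine.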
The first MSDN Magazine article in this series, “Building the Internet of Things” (msdn.microsoft.com/magazine/hh852591), is about using Microsoft StreamInsight in the device context. It’s a great illustration of the rollup stage; the example described in the article could easily take the place of the aggregator in this architecture.
As Figure 4 shows, the Control System will send commands into Topics and devices will receive messages from subscriptions for fan-out of commands.
Figure 4 The Fan-out Model
The key argument for using a subscription per device is reliability. When you send a command, you want the command to eventually arrive at the device, even if the device is turned off or not currently connected. So you need the equivalent of a mailbox for the device.
The cost for that extra reliability and flexibility is somewhat greater complexity. The throughput ceiling for each entity I’ve discussed also manifests in the number of subscriptions Azure Service Bus allows on each Topic; the system quota currently limits a Topic to 2,000 subscriptions. How many subscriptions you actually allocate on each Topic for the fan-out scenario depends on the desired flow rate. The cost associated with filtering and selecting a message into a subscription can be looked at as a copy operation. Therefore, a Topic with 1,000 subscriptions will yield a flow rate of one message per second if you assume a per-entity throughput of 1,000 messages per second.
For the emergency commands, you need to send one command to all connected devices in a broadcast fashion. For targeted commands, where a consumer adjusts the temperature on a particular device or in a particular house via an app connected on 3G, the adjustment should happen within a few seconds.
To figure out whether one message per second is also good enough for targeted commands, assume that a resident will adjust or query the current status of his air conditioning system only rarely, with most commands occurring during significant weather events. Pessimistically assume every device gets adjusted once a day and that adjustments generally occur between 7 a.m. and 7 p.m. With 1,000 subscriptions per Topic, as in the broadcast case, those commands would require a flow rate of only about 1.4 messages per minute. Even if everyone turned their air conditioning on during a single hour, you’d still be OK with some 16 messages per minute. As a result of these scale and throughput considerations, limit each fan-out Topic to 1,000 subscriptions, which means you’ll need 50 fan-out Topics for each scale-unit.
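The fan-out arithmetic from the last two paragraphs, checked in Python (throughput and usage figures are the assumptions from the text):

```python
PER_ENTITY_THROUGHPUT = 1000  # assumed messages/sec an entity can process
SUBS_PER_FANOUT_TOPIC = 1000  # one "mailbox" subscription per device

# Filtering a message into a subscription is effectively a copy operation,
# so a full broadcast across 1,000 subscriptions sustains about 1 msg/sec.
broadcast_rate_per_sec = PER_ENTITY_THROUGHPUT / SUBS_PER_FANOUT_TOPIC  # 1.0

# Targeted commands: pessimistically one adjustment per device per day,
# spread over the 12 hours between 7 a.m. and 7 p.m.
cmds_per_minute = SUBS_PER_FANOUT_TOPIC / (12 * 60)  # ~1.4 messages/minute

# Even if every adjustment landed in a single hour, the rate stays modest.
peak_cmds_per_minute = SUBS_PER_FANOUT_TOPIC / 60    # ~16 messages/minute
```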
You can target messages to particular devices or housing units or issue a broadcast message to all devices using subscription filters. Figure 5 illustrates a simple filter that allows messages carrying either a property with a particular DeviceId or a particular HousingUnit, or which have the custom Broadcast property set to true. If any of these conditions are true, the message is selected for the subscription. Thus, the control system can use the exact same Topic for broadcast and targeted messages and select the targets by setting the respective routing properties on the message.
Figure 5 Targeting Messages
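A filter of the kind Figure 5 describes might look like the SQL-style filter expression below; the property names are the ones used in the text, while the device and housing-unit values are made up for illustration. The small Python predicate mirrors the filter’s semantics:

```python
# Shape of the subscription filter described above (values are invented):
FILTER_EXPRESSION = "DeviceId = '0123456' OR HousingUnit = 'H42' OR Broadcast = true"

def matches(props: dict) -> bool:
    """Python mirror of the filter semantics: the message is selected for
    the subscription if any one of the three conditions holds."""
    return (props.get("DeviceId") == "0123456"
            or props.get("HousingUnit") == "H42"
            or props.get("Broadcast") is True)

# A broadcast message and a message addressed to this device both match;
# a message addressed to some other device does not.
```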
The control system can also get replies to commands through its subscription on the fan-in side whereby command replies carry special filter properties that are handled in a similar fashion.
Azure Service Bus uses the notion of namespaces to organize resources and to govern access to those resources. Namespaces are created via the Azure Portal and are associated with a particular datacenter region. If your solution spans multiple large geographic regions, you can create multiple namespaces and then pin resources and clients to a particular Azure datacenter with the obvious advantage of shorter network routes. For this solution I might create a namespace in the North/Central U.S. region (clemensv-ac-ncus-prod.servicebus.windows.net) and one in the South/Central U.S. region (clemensv-ac-scus-prod); or I might create namespaces for particular utility customers (clemensv-ac-myenergy-prod). The “prod” suffix distinguishes production namespaces from possible parallel test or staging namespaces.
In order to make setup and management of scale-units easier, leverage the namespace’s hierarchical structure. The scale-unit name, forming the base path for all entities within the scale-unit, could be just a number, or it could be associated with a particular customer (/myenergy-3). Because you have a single ingestion Topic, you can put it directly underneath (/myenergy-3/fan-in) and then put the fan-out topics underneath a separate segment by simply numbering the topics (/myenergy-3/fan-out/01). The fan-out subscriptions for the individual devices follow that structure, using the device-identifier as the subscription name (/myenergy-3/fan-out/01/subscriptions/0123456).
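A management script might derive all entity paths for a scale-unit from that naming convention; the helper below is a hypothetical sketch, but the path layout is exactly the one just described:

```python
def scale_unit_paths(namespace: str, scale_unit: str, fanout_topics: int) -> dict:
    """Build the entity paths for one scale-unit: a single ingestion Topic
    and a numbered set of fan-out Topics underneath the scale-unit base path."""
    base = f"sb://{namespace}.servicebus.windows.net/{scale_unit}"
    return {
        "fan_in": f"{base}/fan-in",
        "fan_out": [f"{base}/fan-out/{i:02d}" for i in range(1, fanout_topics + 1)],
    }

def device_subscription(fanout_topic: str, device_id: str) -> str:
    """Per-device mailbox subscription, named after the device identifier."""
    return f"{fanout_topic}/subscriptions/{device_id}"

paths = scale_unit_paths("clemensv-ac-ncus-prod", "myenergy-3", 50)
# paths["fan_in"]  -> ".../myenergy-3/fan-in"
# device_subscription(paths["fan_out"][0], "0123456")
#                  -> ".../myenergy-3/fan-out/01/subscriptions/0123456"
```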
Creating a new scale-unit is effectively a management script (using the Azure Service Bus API) that creates a new structure of this shape. The subscriptions for the devices and the filter rules are created during device provisioning. Of course, you also have to create, deploy and configure the matching storage and compute resources. For the event store, as noted earlier, you’ll set up a new table per scale-unit.
Provisioning is about onboarding a factory-new device as it gets installed in a housing unit. For this article I’ll skip over details such as device deactivation and just discuss a provisioning model in which the devices are set up with a unique device-id at the factory. The alternative model would be to have devices come from the factory entirely uninitialized and implement a scheme similar to what you might know from services such as Netflix, Hulu or Twitter, where the fresh device contacts a Web service to obtain a code and then displays the code, and the user logs on to an activation site by entering the code.
Figure 6 shows the flow for provisioning a device that has a previously assigned identifier. The first step in setting up the device is not shown, and deals with getting the device on a network, which might happen via wired Ethernet, Wi-Fi or some wireless carrier network.
Figure 6 The Provisioning Model
For provisioning, the system runs a special Web service that handles setting up and managing device identity and configuration. The partition allocator keeps track of how many devices have been activated, and which devices are assigned to which scale-unit. Once a device is associated with a scale-unit, the device gets provisioned into the scale-unit.
In step 1, the device establishes a connection to the provisioning service by presenting its device identifier. In this model, all issued device identifiers are on an allow list (step 2) that permits activating the device once. Once the device is verified to be good for activation, the provisioning service calls the Access Control Service (ACS) to create a new service identity and key for the device (step 3), and also creates a “Send” permission rule for the ingestion Topic and a “Listen” rule for the fan-out subscription created immediately afterward in step 4. With these, the device’s account has only and exactly the rights it needs to interact with the system. After step 4, you have a collection of all the resource URIs the device will interact with, as well as an account and key that the device can use for authenticating against the ACS to obtain a token. All that information is put into the response to the device’s activation call, and the device writes it into permanent configuration storage.
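The flow can be sketched as follows. Note that this is purely illustrative: `FakeAllowList`, `FakeAcs` and `FakeAllocator` are hypothetical in-memory stand-ins for the real allow list, the Access Control Service and the partition allocator, and the field names in the returned configuration document are invented for this sketch:

```python
class FakeAllowList:
    """Permits each factory-issued device identifier to activate exactly once."""
    def __init__(self, ids):
        self._ids = set(ids)
    def claim_once(self, device_id):
        if device_id in self._ids:
            self._ids.remove(device_id)
            return True
        return False

class FakeAcs:
    """Records identities and Send/Listen rules instead of calling the real ACS."""
    def __init__(self):
        self.rules = []
    def create_service_identity(self, device_id):
        return f"dev-{device_id}", "not-a-real-key"
    def grant(self, identity, right, path):
        self.rules.append((identity, right, path))

class FakeAllocator:
    """Assigns every device to one fixed scale-unit for this sketch."""
    def assign(self, device_id):
        return "myenergy-3"

def provision(device_id, allowlist, acs, allocator):
    """Steps 1-4 from Figure 6: verify the device, create its identity,
    grant least-privilege rules and return the configuration document."""
    if not allowlist.claim_once(device_id):          # step 2: one-shot allow list
        raise PermissionError("device not eligible for activation")
    scale_unit = allocator.assign(device_id)         # pick the scale-unit
    identity, key = acs.create_service_identity(device_id)  # step 3
    fan_in = f"{scale_unit}/fan-in"
    fan_out_sub = f"{scale_unit}/fan-out/01/subscriptions/{device_id}"
    acs.grant(identity, "Send", fan_in)              # may only send events
    acs.grant(identity, "Listen", fan_out_sub)       # may only hear its mailbox
    # Step 4 result: what the device writes into permanent configuration storage.
    return {"identity": identity, "key": key,
            "ingestion_topic": fan_in, "command_subscription": fan_out_sub}

acs = FakeAcs()
config = provision("0123456", FakeAllowList(["0123456"]), acs, FakeAllocator())
```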
It’s worth noting that though a good amount of scale-out partitioning has been introduced in this article, the Access Control identities aren’t being partitioned in the example given here.
To alleviate the pressure on the ACS, you can simply have your client acquire fewer tokens. The default token expiration for the ACS relying party definition for Azure Service Bus is 1,200 seconds, or 20 minutes. You can dial this up to 24 hours, or 86,400 seconds, and have the device cache the acquired token for that long. In this case, devices will have to acquire a fresh token only once per day. The only downside is that you can’t revoke access from the bearer of an intercepted long-lived token, but for tokens constrained to send or listen on particular entities, that seems like a tolerable risk.
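The caching behavior is simple to sketch; `fetch_token` below is a hypothetical stand-in for the real ACS round trip, and the slack interval is an assumption made so a token is refreshed shortly before it expires:

```python
import time

class CachedTokenProvider:
    """Client-side token cache: fetch a long-lived token once and reuse it
    until shortly before expiry, so the ACS is hit only once per lifetime."""

    def __init__(self, fetch_token, lifetime_secs=86_400, slack_secs=300):
        self._fetch = fetch_token
        self._lifetime = lifetime_secs
        self._slack = slack_secs
        self._token = None
        self._expires_at = 0.0

    def get_token(self) -> str:
        now = time.time()
        if self._token is None or now >= self._expires_at - self._slack:
            # One ACS round trip per day with a 24-hour token lifetime.
            self._token = self._fetch(self._lifetime)
            self._expires_at = now + self._lifetime
        return self._token

calls = []
provider = CachedTokenProvider(lambda ttl: calls.append(ttl) or f"token-{len(calls)}")
provider.get_token()
provider.get_token()  # served from the cache; no second fetch
```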
Connecting “things” to and through the cloud is an area that the Azure Service Bus team will pay significant attention to over the next several years. The architecture outlined in this article is one it considers a good start, and it does—for instance, in the concept of scale-units—include hard-earned best practices from building Azure Service Bus itself. There’s a lot of room for optimizations around message flow, aggregation and distribution scenarios, not only in terms of capacity, but also for simpler programming and configuration models.
You’ll start seeing some of those optimizations, such as automatic forwarding, appear in Azure Service Bus within the next few months, and the team will be leveraging these new features to create better abstractions that will make scale-out scenarios like the one described here even simpler to manage. Distributing events to millions of devices in a timely fashion will always require a significant number of distinct system resources, but there’s a lot of abstraction magic you can add to reduce the visible number of moving parts.
To make the scenario more tangible and more fun, the next article in this series will help you build a simple air conditioner (yes, hardware!) powered by the Microsoft .NET Micro Framework. This little do-it-yourself air conditioner will talk to a back-end service via Azure Service Bus that will follow the general architecture explained here. See you then.
Clemens Vasters is the principal technical lead on the Azure Service Bus team. Vasters has been on the Service Bus team from the earliest incubation stages and works on the technical feature roadmap for Azure Service Bus, which includes push notifications and high-scale signaling for the Web and devices. He is also a frequent conference speaker and architecture courseware author. Follow him on Twitter at twitter.com/clemensv.
Thanks to the following technical experts for reviewing this article: Elio Damaggio, Abhishek Lal, Colin Miller and Todd Holmquist-Sutherland