Marc Mercuri
Summary: This article discusses the distributed-systems momentum, spotlighting its remaining challenges and considerations.
Introduction
Reference Scenario #1: Fabrikam Package Tracking Service
Reference Scenario #2: Product Finder Service
Conclusion
We’re at an exciting crossroads for distributed computing. Historically considered something of a black art, distributed computing is now being brought to the masses. Emerging programming models empower mere mortals to develop systems that are not only distributed but also concurrent, able both to scale up to use all the cores in a single machine and to scale out across a farm.
The emergence of these programming models is happening in tandem with the entry of pay-as-you-go cloud offerings for services, compute, and storage. No longer do you need to make significant up-front investments in physical hardware, a specialized environment in which to house it, or a full-time staff to maintain it. The promise of cloud computing is that anyone with an idea, a credit card, and some coding skills has the opportunity to develop scalable distributed systems.
With the major barriers to entry being lowered, there will without a doubt be a large influx of individuals and corporations that will either introduce new offerings or expand on existing ones, be it with their own first-party services, new composite services that add value on top of the services of others, or distributed systems that use both categories of services as building blocks.
In the recent past, “distributed systems” in practice were often distributed across a number of nodes within a single enterprise or a close group of business partners. As we move to this new model, they become Distributed with a capital D: distributed across tenants on the same host, located across multiple external hosts, or spread over a mix of internally and externally hosted services.
One of the much-heralded opportunities in this brave new world is the creation of composite commercial services. Specifically, the ability of an individual or a corporation to add incremental value to third-party services and then resell those compositions is cited as both an accelerator for innovation and a revenue-generation opportunity in which individuals can participate at various levels.
This is indeed a very powerful opportunity, but looking below the surface, there are a number of considerations for architecting this next generation of systems. This article aims to highlight some of these considerations and initiate discussion in the community.
When talking about composite services, it’s always easier if you have an example to reference. In this article, we’ll refer back to a scenario involving a simple composite service that includes four parties: Duwamish Delivery, Contoso Mobile, Fabrikam, and AdventureWorks (see Figure 1).
Figure 1. Simple composite-service scenario
Duwamish Delivery is an international company that delivers packages. It offers a metered Web service that allows its consumers to determine the current status of a package in transit.
Contoso Mobile is a mobile-phone company that provides a metered service that allows a service consumer to send text messages to mobile phones.
Fabrikam provides a composite service that consumes both the Duwamish Delivery Service and the Contoso Mobile Service. The composite service is a long-running workflow that monitors the status of a package and, the day before the package is due to be delivered, sends a text-message notification to the recipient.
AdventureWorks is a retail Web site that sells bicycles and related equipment. Fabrikam’s service is consumed by the AdventureWorks ordering system, and is called each time a new order is shipped.
A delta between supply and demand, whether the product is a gadget, a toy, or a ticket, is not uncommon. Oftentimes, there are multiple vendors of these products with varying supply and price.
A. Datum Corporation sees this as an opportunity for a composite service and will build a product locator service that consumes first-party services from popular product-related service providers such as Amazon, eBay, Yahoo!, CNET, and others (see Figure 2).
Figure 2. Popular product-related service providers
The service allows consumers to specify a product name or UPC code and returns a consolidated list of matching products, prices, and availability from the data provided by the first-party services.
In addition, A. Datum Corporation’s service allows consumers to specify both a timeframe and a price range. The composite service monitors the price of the specified product during the identified timeframe, and when the price or availability changes within that timeframe, it sends a notification to the consumer.
As the supply of high-demand items can deplete quickly, this service should interact with consumers and provide these notifications across different touch points (e-mail, SMS text message, and so on). A. Datum will utilize its own e-mail servers, but for text messaging will leverage Contoso Mobile’s text-messaging service.
If composite services reach their potential, an interesting tree of customers will emerge. Services will be composited into new composite services, which themselves will be composited. When identifying the customers of a service, one must have a flexible means of referencing where a customer sits in the hierarchy and be able to reflect their position as a customer in relation to the various services in the tree. A common approach is to refer to customers with a designation of Cx, where x identifies the distance of the customer from the service in question.
We will use this approach when referring to the participants in our reference scenarios.
In our first reference scenario, Fabrikam is a C1 of Duwamish Delivery and Contoso Mobile. AdventureWorks is a C1 of Fabrikam. AdventureWorks would be seen as a C2 of Duwamish Delivery and Contoso Mobile.
In our second scenario, A. Datum would be the C1 of Amazon, eBay, Yahoo!, CNET, and others. Consumers of the composite service would be seen as a C1 of A. Datum and a C2 of the first-party service providers.
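The Cx designation can be made concrete with a small sketch. The following Python is illustrative only: the `CONSUMERS` map and the `customer_level` function are hypothetical names introduced here, and the data simply encodes the first reference scenario. It walks the composition tree breadth-first and reports how many hops separate a customer from a provider.

```python
from collections import deque

# Hypothetical map of provider -> direct (C1) customers, taken from
# reference scenario 1. A real system would load this from its own records.
CONSUMERS = {
    "Duwamish Delivery": ["Fabrikam"],
    "Contoso Mobile": ["Fabrikam"],
    "Fabrikam": ["AdventureWorks"],
    "AdventureWorks": [],
}


def customer_level(provider, customer, consumers=CONSUMERS):
    """Return x such that `customer` is a Cx of `provider`, or None if unrelated."""
    queue = deque([(provider, 0)])
    seen = {provider}
    while queue:
        node, depth = queue.popleft()
        if node == customer and depth > 0:
            return depth
        for child in consumers.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append((child, depth + 1))
    return None


print(customer_level("Duwamish Delivery", "Fabrikam"))     # 1
print(customer_level("Contoso Mobile", "AdventureWorks"))  # 2
```

The same customer can sit at different levels relative to different providers, which is why the level is computed per provider rather than stored on the customer.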
Now that we have a lingua franca for discussing the customers in the composite-services n-level hierarchy, let’s look at considerations for the services themselves.
It is common for services to be designed as stateless and typically invoked with either a request/response or a one-way pattern. As you move to a world of cloud services and distributed systems, you’ll see more and more services become long running and stateful, with “services” oftentimes now being an interface to a workflow or business process.
Both of our reference scenarios reflect this. In the first scenario, a package-tracking service monitors the state of a package until it is delivered. Depending on the delivery option chosen, this process runs for a minimum of a day, but could run for multiple days, even weeks, as the package makes its way to its destination. In our second reference scenario, the elapsed time could be even longer: weeks, months, even years.
When moving the focus from short-lived, stateless service interactions to these stateful, longer-running services, there are a number of important things to keep in mind when designing your systems.
One of these considerations is the importance of the ability to support subscriptions and publications.
Many long-running, stateful services need to relay information to their service consumers at some point in their lifetime. You’ll want to design your services and systems to minimize frequent polling of your service, which unnecessarily places a higher load on your server(s) and increases your bandwidth costs.
By allowing consumers to submit a subscription for notifications, you can publish acknowledgments, status messages, updates, or requests for action as appropriate. In addition to bandwidth considerations, as you start building composites from metered services, the costs associated with not implementing a publish-and-subscribe mechanism rise.
In reference scenario 2, the service is querying back-end e-commerce sites for the availability of an item. If a consumer queries this service for a DVD that won’t be released until next year, it’s a waste of resources for the consumer to query this service (and consequently for the service to query the back-end providers) regularly for the next 365 days.
Scenario 2 also highlights that if your service communicates via mechanisms affiliated with consumer devices (that is, SMS text-message or voice applications), the value of publishing only to your C1 is minimal; you’ll want to allow your C1 customers to subscribe to notifications not just for themselves but also for their customers.
Designing your services to support publications and subscriptions is important in developing long-running services.
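As a rough illustration of the publish-and-subscribe idea discussed above, here is a minimal in-memory sketch in Python. The `NotificationHub` class, the topic string, and the inbox lists are all assumptions made for this example; a real service would persist subscriptions and deliver over a network transport.

```python
from typing import Callable


class NotificationHub:
    """In-memory stand-in for a service's subscription registry (hypothetical)."""

    def __init__(self):
        self._subscribers = {}  # topic -> list of callbacks

    def subscribe(self, topic: str, callback: Callable[[str], None]) -> None:
        self._subscribers.setdefault(topic, []).append(callback)

    def publish(self, topic: str, message: str) -> int:
        """Deliver `message` to every subscriber of `topic`; return the delivery count."""
        callbacks = self._subscribers.get(topic, [])
        for callback in callbacks:
            callback(message)
        return len(callbacks)


hub = NotificationHub()
inbox_c1, inbox_c2 = [], []
# The C1 subscribes for itself and, on its customer's behalf, for a C2.
hub.subscribe("product:dvd-123", inbox_c1.append)
hub.subscribe("product:dvd-123", inbox_c2.append)
delivered = hub.publish("product:dvd-123", "now available")
```

With this shape, neither consumer has to poll for a year waiting for the DVD to ship; each receives exactly one message when the state changes.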
One of the great things about moving processes to services in the cloud is that the processes can run 24 hours a day, 7 days a week, for years at a time. They can be constantly monitoring, interacting, and participating on your behalf.
Eventually, the service will need to interact with a consumer, be it to return an immediate response, an update, or a request to interact with the underlying workflow of a service.
The reality is that while your services may be running 24/7 in the same place, your consumer may not be. After you’ve created a publish-and-subscribe mechanism, you should consider how and when to communicate. Will you interact only when an end consumer’s laptop is open? Will you send alerts via e-mail, SMS text message, RSS feed, or a voice interface?
When designing your service, you should do so with the understanding that when you transition to the pay-as-you-go cloud, your customers may range from a couple of guys in a garage to the upper echelons of the Fortune 500. The former may be interested in your service, but priced out of using it due to the associated costs for communication services like SMS or voice interfaces. The latter group may not want to pay for your premium services, and instead utilize their own vendors. And there is a large group of organizations in between, which would be happy to pay for the convenience.
When designing your solutions, be sure to consider the broad potential audience for your offering. If you provide basic and premium versions of your service, you’ll be able to readily adapt to your audiences’ sensitivities, be it price, convenience, or control.
Per Wikipedia, “A white label product or service is a product or service produced by one company (the producer) that other companies (the marketers) rebrand to make it appear as if they made it.”
While the writer of the entry perhaps did not have Web services in mind, the definition is definitely applicable. In a world of composite services, your service consumer may be your C1, but the end consumer is really a C9 at the end of a long chain of service compositions. For those that do choose to have your service perform communications on their behalf, white labeling should be a consideration.
In our first reference scenario, the service providing the communication mechanism is Contoso Mobile. That service is consumed by Fabrikam, which offers a composite service to the AdventureWorks Web site. Should Contoso Mobile, Fabrikam, AdventureWorks, or a combination brand the messaging?
When communicating to the end consumer, you will need to decide whether to architect your service to include branding or other customization by your C1. This decision originates with the communication service provider and, if customization is offered, it is passed along as the service is consumed and composited downstream.
When looking at whether or not to white label a service, you may want to consider a hybrid model where you allow for some customization by your C1 but still include your own branding in the outbound communications. In our first reference scenario, this could take the form of each text message ending with “Delivered by Contoso Mobile.”
I refer to this as the “Hotmail model.” When the free e-mail service Hotmail was introduced, it included a message at the bottom of each e-mail that identified that the message was sent via Hotmail. There is definitely a category of customers willing to allow this messaging for a discounted version of a premium service.
After you’ve decided how these services will communicate and who will have control over the messaging, you need to determine when the communication will occur and which communication mode is appropriate at a given point of time.
Interaction no longer ends when a desktop client or a Web browser is closed, so you’ll want to allow your C1 to specify which modes of communication should be used and when. This can take the form of a simple rule or a set of nested rules that define an escalation path, such as:
communicate via e-mail
  if no response for 15 minutes
    communicate via SMS
    if no response for 15 minutes
      communicate via voice
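One way to represent such an escalation path is as an ordered list of steps. The sketch below is a minimal Python illustration, not a prescribed design: the `Step` class, the `ESCALATION` list, and `channels_due` are hypothetical names, and the wait after the final voice step is an assumption because the rule above does not specify one.

```python
from dataclasses import dataclass


@dataclass
class Step:
    channel: str        # e.g. "email", "sms", "voice"
    wait_minutes: int   # silence tolerated before moving to the next step


# The escalation path from the rule above; the last wait value is assumed.
ESCALATION = [Step("email", 15), Step("sms", 15), Step("voice", 15)]


def channels_due(minutes_without_response, path=ESCALATION):
    """Return the channels that should have been tried after the given silence."""
    due, elapsed = [], 0
    for step in path:
        if minutes_without_response < elapsed:
            break
        due.append(step.channel)
        elapsed += step.wait_minutes
    return due


print(channels_due(0))   # ['email']
print(channels_due(15))  # ['email', 'sms']
print(channels_due(30))  # ['email', 'sms', 'voice']
```

Keeping the path as data rather than nested conditionals lets each C1 supply its own ordering and wait times.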
Rules for communication are important. Services are running in data centers, and there’s no guarantee that those data centers are in the same time zone as the consumer. In our global economy, servers could be on the other side of the world, and you’ll want to make sure that you have a set of rules identifying whether it’s never appropriate to initiate a phone call at 3 a.m. or whether it is permitted under certain circumstances.
If you provide communication mechanisms as part of your service, it will be important to design them such that your C1 can specify business rules that identify the appropriate conditions.
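A time-of-day rule of this kind might look like the following Python sketch. Everything here is an assumption for illustration: the function name `voice_call_allowed`, the default quiet hours of 21:00 to 08:00, the `urgent` override, and the use of a fixed UTC offset in place of a full time-zone database.

```python
from datetime import datetime, time, timedelta, timezone


def voice_call_allowed(now_utc, utc_offset_hours, urgent=False,
                       quiet_start=time(21, 0), quiet_end=time(8, 0)):
    """Decide whether a voice call may be placed at the consumer's local time.

    `now_utc` must be a timezone-aware datetime. The quiet-hours window is
    assumed to wrap past midnight (start later in the day than end).
    """
    consumer_tz = timezone(timedelta(hours=utc_offset_hours))
    local = now_utc.astimezone(consumer_tz).time()
    in_quiet_hours = local >= quiet_start or local < quiet_end
    return urgent or not in_quiet_hours


# The data center's clock reads 11:00 UTC; a consumer at UTC-8 sees 3 a.m.
now = datetime(2024, 1, 15, 11, 0, tzinfo=timezone.utc)
print(voice_call_allowed(now, -8))               # False
print(voice_call_allowed(now, -8, urgent=True))  # True
print(voice_call_allowed(now, 1))                # True
```

The point of the sketch is that the decision is made in the consumer's local time, not the server's, and that the C1 can loosen the rule for specific circumstances.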
As stated previously, long-running, stateful services could run for days, weeks, months, even years. This introduces some interesting considerations, specifically as it relates to communicating and coordinating system events.
When developing your service, consider how you will communicate information down the chain to the consumers of your service and up the chain to the services you consume.
In a scenario where you have a chain of five services, and the third service in the chain needs to shut down (either temporarily or permanently), it should communicate to its consumers that it will be offline for a period of time. You should also provide a mechanism for a consumer to request that you delay the shutdown of your service and notify them of your response.
Similarly, your service should communicate to the services it consumes. Your service should first inspect its state to determine whether there is any activity currently ongoing with the service(s) that it consumes and, if so, communicate that it will be shutting down and that the relay of messages should be delayed for a set period of time.
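The downstream half of this coordination can be sketched as follows. This Python is a simplified illustration under stated assumptions: `PlannedShutdown`, its two-hour delay budget, and the in-memory `notified` list are hypothetical, and real notifications would go out over whatever publication mechanism the service offers.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class PlannedShutdown:
    starts_at: datetime
    max_delay: timedelta = timedelta(hours=2)   # hypothetical policy limit
    notified: list = field(default_factory=list)

    def announce(self, consumers):
        """Tell every downstream consumer when the service will go offline."""
        for name in consumers:
            self.notified.append((name, self.starts_at))

    def request_delay(self, consumer, delay):
        """Grant the delay if it fits the remaining budget; return the decision."""
        if delay <= self.max_delay:
            self.starts_at += delay
            self.max_delay -= delay
            return True
        return False


shutdown = PlannedShutdown(starts_at=datetime(2024, 1, 1, 12, 0))
shutdown.announce(["Service 4", "Service 5"])
granted = shutdown.request_delay("Service 4", timedelta(hours=1))
refused = shutdown.request_delay("Service 5", timedelta(hours=3))
```

Returning an explicit decision from `request_delay` reflects the requirement above that a consumer be notified of the response to its request.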
There will be times when it is appropriate for an application to poll your service to retrieve status. When designing your service, you’ll want to provide an endpoint that serializes state relevant to a customer.
Two formats to consider for this are RSS and ATOM feeds. These provide a number of key benefits, as specified in Table 1.
Table 1. Benefits of RSS and ATOM feeds
| Benefit | Description |
| --- | --- |
| Directly consumable by existing clients | A number of clients already have a means for consuming RSS or ATOM feeds. On the Windows platform, common applications including Microsoft Office Outlook, Internet Explorer, Vista Sidebar, and Sideshow have support to render RSS and ATOM feeds. |
| Meta-data benefits | Both RSS and ATOM have elements to identify categories, which can be used to relay information relevant to both software and UI-based consumers. Using this context, these consumers can make assumptions on how to best utilize the state payload contained within the feed. |
| Easily filterable and aggregatable | State could be delivered as an item in the feed, and those items could be easily aggregated or filtered, compositing data from multiple feeds. |
| Client libraries exist on most platforms | Because RSS and ATOM are commonly used on the Web, there are client-side libraries readily available on most platforms. |
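To show what serializing customer state as a feed might look like, here is a small Python sketch using the standard library's `xml.etree.ElementTree`. The `state_feed` function and the item dictionary keys are assumptions made for this example, and the output is deliberately incomplete: it omits elements a valid Atom document requires at feed level, such as `id` and `updated`.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"


def state_feed(customer_id, items):
    """Serialize per-customer state items as a simplified Atom-style feed."""
    ET.register_namespace("", ATOM_NS)
    feed = ET.Element(f"{{{ATOM_NS}}}feed")
    ET.SubElement(feed, f"{{{ATOM_NS}}}title").text = f"Status for {customer_id}"
    for item in items:
        entry = ET.SubElement(feed, f"{{{ATOM_NS}}}entry")
        ET.SubElement(entry, f"{{{ATOM_NS}}}id").text = item["id"]
        ET.SubElement(entry, f"{{{ATOM_NS}}}title").text = item["title"]
        ET.SubElement(entry, f"{{{ATOM_NS}}}updated").text = item["updated"]
        # The category carries the metadata that lets consumers filter entries.
        ET.SubElement(entry, f"{{{ATOM_NS}}}category", term=item["category"])
    return ET.tostring(feed, encoding="unicode")


xml_text = state_feed("AdventureWorks", [{
    "id": "urn:example:package:42",          # hypothetical identifier
    "title": "Package 42 is out for delivery",
    "updated": "2024-01-15T11:00:00Z",
    "category": "package-status",
}])
```

Because each state change is an entry with a category, a consumer can merge feeds from several services and filter on the category term.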
As soon as you take a dependency on a third-party service, you introduce a point of failure that is outside of your direct control. Unlike software libraries, where there is a fixed binary, third-party services can and do change without your involvement or knowledge.
As no provider promises 100 percent uptime, if you’re building a composite service, you will want to identify comparable services for those that you consume and incorporate them into your architecture as a backup.
If, for example, Contoso Mobile’s SMS service were to go offline, it would be helpful to be able to reroute requests to a third-party service, say A. Datum Communications. While the interfaces to Contoso Mobile’s and A. Datum Communications’ services may be different, if they both use a common data contract to describe text messages, it should be fairly straightforward to include failover in the architecture.
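A failover arrangement built on a shared data contract might be sketched like this in Python. The `TextMessage` contract, the `ProviderUnavailable` exception, and the two sender functions are hypothetical stand-ins; neither provider's real interface is described in this article, so the outage is simulated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextMessage:
    """Common data contract shared by every provider adapter (assumed shape)."""
    to: str
    body: str


class ProviderUnavailable(Exception):
    pass


def send_with_failover(message, providers):
    """Try each (name, send) pair in order; return the name that succeeded."""
    for name, send in providers:
        try:
            send(message)
            return name
        except ProviderUnavailable:
            continue
    raise ProviderUnavailable("all providers failed")


def contoso_send(message):
    # Simulates the primary provider being offline.
    raise ProviderUnavailable("Contoso Mobile offline")


sent = []


def adatum_send(message):
    # Simulates the backup provider accepting the message.
    sent.append(message)


used = send_with_failover(
    TextMessage(to="+15550100", body="Your package arrives tomorrow"),
    [("Contoso Mobile", contoso_send), ("A. Datum Communications", adatum_send)],
)
print(used)  # A. Datum Communications
```

Each adapter translates the common `TextMessage` into its provider's own call, so the failover loop never needs to know how the interfaces differ.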
In the second reference scenario, for example, eBay and Amazon are both data sources. While each of these APIs has result sets that include a list of products, the data structures that represent these products are different. In some cases, elements with the same or similar names have both different descriptions and data types.
This poses an issue not just for the aggregation of result sets, but also for the ability of a service consumer to seamlessly switch over from Vendor A’s service to that of another vendor. For the service consumer, it’s not just about vendor lock-in, but the ability to build a solution confidently that can fail over readily to another provider. Whether it is a startup such as Twitter or a Web 1.0 heavyweight such as Amazon, vendors’ services do go offline. And an offline service impacts the business’s bottom line.
Generic contracts, for both data and service definitions, provide benefits from a consumption and aggregation standpoint, and would provide value in both of our reference scenarios. Service providers may be hesitant to utilize generic contracts, as they make it easier for customers to leave their service. The reality is that while generic contracts do make it easier for a customer to switch, serious applications will want to have a secondary service as a backup regardless. For those new to market, this should be seen as an opportunity, and generic contracts should be incorporated into your design.
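The aggregation benefit of a generic data contract can be illustrated with a short Python sketch. The `Product` contract and both vendor payload shapes are invented for this example and do not reflect the actual eBay or Amazon schemas; they only mimic the kind of mismatch described above, where similarly named fields carry different types.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Product:
    """Generic product contract used by the composite service (assumed shape)."""
    name: str
    price: Decimal
    in_stock: bool
    source: str


def from_vendor_a(raw):
    # Hypothetical shape: price as a decimal string, stock as a count.
    return Product(raw["title"], Decimal(raw["price"]), raw["quantity"] > 0, "vendor-a")


def from_vendor_b(raw):
    # Hypothetical shape: price in integer cents, availability as a flag.
    return Product(raw["name"], Decimal(raw["price_cents"]) / 100, raw["available"], "vendor-b")


def consolidate(a_results, b_results):
    """Normalize both result sets and return one list, cheapest first."""
    products = [from_vendor_a(r) for r in a_results]
    products += [from_vendor_b(r) for r in b_results]
    return sorted(products, key=lambda p: p.price)


results = consolidate(
    [{"title": "Road Bike", "price": "21.50", "quantity": 3}],
    [{"name": "Road Bike", "price_cents": 1999, "available": False}],
)
```

Once every source is mapped to the same contract, sorting, filtering, and swapping one vendor for another all operate on a single type.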
Many service providers today maintain a Web page that supplies service status and/or a separate service that allows service consumers to identify whether a service is online. This is a nice (read: no debugger required) sanity check that your consumers can use to determine whether your service is really offline or if the issue lies in their code.
If you’re offering composite services, you may want to provide a finer level of granularity, which provides transparency not just on your service but also on the underlying services you consume.
There is also an opportunity to enhance the standard status service design to allow individuals to subscribe to alerts on the status of a service. This allows a status service to proactively notify a consumer when a service may go offline and again when it returns to service. This is particularly important to allow your consumers to proactively handle failovers, enabling some robust system-to-system interactions.
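A status service with both per-dependency transparency and alert subscriptions could be sketched as below. The `StatusService` class and the status strings are assumptions made for illustration; delivery to subscribers is modeled as a plain callback rather than a network call.

```python
class StatusService:
    """Tracks the status of a composite service's dependencies (hypothetical)."""

    def __init__(self, dependencies):
        self._status = {name: "online" for name in dependencies}
        self._listeners = []

    def subscribe(self, listener):
        """Register a callback invoked as listener(name, status) on each change."""
        self._listeners.append(listener)

    def report(self, name, status):
        """Record a status and notify subscribers only when it actually changes."""
        if self._status.get(name) != status:
            self._status[name] = status
            for listener in self._listeners:
                listener(name, status)

    def overall(self):
        """Summarize the composite: online only if every dependency is online."""
        if all(s == "online" for s in self._status.values()):
            return "online"
        return "degraded"


alerts = []
status = StatusService(["Duwamish Delivery", "Contoso Mobile"])
status.subscribe(lambda name, state: alerts.append((name, state)))
status.report("Contoso Mobile", "offline")
status.report("Contoso Mobile", "offline")  # no change, so no second alert
summary_during_outage = status.overall()
status.report("Contoso Mobile", "online")
```

A subscriber that receives the offline alert can trigger its own failover, then switch back when the online alert arrives.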
This can provide some significant nontechnical benefits as well, the biggest of which is a reduced number of support calls. This can also help mitigate damage to your brand; if your service depends on a third-party service (say, Contoso Mobile), your customers may be more understanding if they understand the issue is outside of your control.
For commercial services, you will, of course, want to identify your C1 consumers and provide a mechanism to authenticate them. Consider whether you need to capture the identities of customers beyond C1, from C2 to C3 to Cx.
Why would you want to do this? Two common scenarios: compliance and payment.
If your service is subject to government compliance laws, you may need to audit who is making use of your service. Do you need to identify the consumer of your service, the end consumer, or the hierarchy of consumers in between?
In regards to payment, consider our first reference scenario. Duwamish and Contoso sell directly to Fabrikam, and Fabrikam sells directly to AdventureWorks. It seems straightforward, but what if Fabrikam is paid by AdventureWorks but decides not to pay Duwamish and Contoso? This composition is simple until you expand the tree, for example with another company that creates a shopping-cart service that consumes the Fabrikam service and is itself consumed in several e-commerce applications.
While you can easily shut off the service to a C1 customer, you may want to design your service to selectively shut down incoming traffic. The reality is that in a world of hosted services, the barriers to switching services should be lower. Shutting off a C1 customer may result in C2 customers migrating to a new service provider out of necessity.
For example, if Fabrikam also sells its service to Amazon, Contoso Mobile and Duwamish may recognize that shutting off service to C2 customer Amazon could drive it to select a new C1, significantly cutting into projected future revenues for both Contoso Mobile and Duwamish. AdventureWorks, however, is a different story and doesn’t drive significant traffic. Both Contoso and Duwamish could decide to reject service calls from AdventureWorks but allow service calls from Amazon while they work through their financial dispute with Fabrikam.
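Selective shutoff of this kind presupposes that each call carries the identities of the customers along its chain. Under that assumption, an admission check might look like the Python sketch below; the `admit` function, the `call_chain` list, and the two policy sets are hypothetical names introduced for illustration.

```python
def admit(call_chain, suspended_c1, exemptions):
    """Decide whether to accept a service call.

    `call_chain` lists the customers from C1 outward, for example
    ["Fabrikam", "Amazon"] for a call that Amazon makes through Fabrikam.
    """
    c1 = call_chain[0]
    if c1 not in suspended_c1:
        return True
    # The C1 is in dispute: admit only calls that originate from an
    # explicitly exempted downstream customer.
    return any(customer in exemptions for customer in call_chain[1:])


suspended = {"Fabrikam"}
exempt = {"Amazon"}
print(admit(["Fabrikam", "Amazon"], suspended, exempt))          # True
print(admit(["Fabrikam", "AdventureWorks"], suspended, exempt))  # False
print(admit(["Fabrikam"], suspended, exempt))                    # False
```

This is only possible if identities beyond C1 were captured in the first place, which ties the payment scenario back to the identity question raised earlier.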
There is no doubt that now is an exciting time to be designing software, particularly in the space of distributed systems. This opens up a number of opportunities and also requires a series of considerations. This article should be viewed as an introduction to some of these considerations, not an exhaustive list; the goal was to introduce enough of them to serve as a catalyst for further conversation. Be sure to visit the forums on the Architecture Journal site to continue the conversation.
Marc Mercuri is a principal architect on the Platform Architecture Team, whose recent projects include the public-facing Web sites Tafiti.com and RoboChamps.com. Marc has been in the software industry for 15 years and is the author of Beginning Information Cards and CardSpace: From Novice to Professional, and coauthored the books Windows Communication Foundation Unleashed! and Windows Communication Foundation: Hands On!
This article was published in the Architecture Journal, a print and online publication produced by Microsoft. For more articles from this publication, please visit the Architecture Journal Web site.