Eugenio Pace and Gianpaolo Carraro
Summary: Building on the Software as a Service (SaaS) momentum, distributed computing is evolving toward a model where cloud-based infrastructure becomes a key player, to the point where a new breed of companies deploys all of its assets to the cloud. Is this reasonable? Only time will tell. In this article, we will take an enthusiastic yet pragmatic look at the cloud opportunities. In particular, we will propose a model for deciding what should be pushed to the cloud and what should be kept in-house, explore a few examples of cloud-based infrastructure, and discuss the architectural trade-offs that result from that usage.
Introduction
LOtSS in the Enterprise: Big Pharma
Applying LOtSS to LitwareHR
Cloud Services at the UI
Cloud Services at the Data Layer
Authentication and Authorization in the Cloud
Decompose-Optimize-Recompose
Conclusion
Resources
Introduction
From a purely economic perspective, driving a car makes very little sense. It is one of the most expensive ways of moving a unit of mass over a unit of distance. If it is so expensive compared to virtually all other means of transportation (public transport, trains, or even planes), why do so many people drive cars? The answer is quite simple: control. Setting aside the status-symbol element of a car, by choosing the car as a means of transportation, a person has full control over when to depart and which route to travel (scenic or highway), and, maybe most appealing, the journey can go from the garage to the final destination without relying on external parties. Let’s contrast that with taking the train. Trains have strict schedules and finite routes, depart and arrive only at train stations, might carry loud fellow passengers, and are prone to going on strike. But, of course, the train is cheaper, and you don’t have to drive. So, which one is better? It depends on whether you are optimizing for cost or for control.
Let’s continue this transportation analogy and look at freight trains. Under the right circumstances (typically, long distances and bulk freight), transport by rail is more economical and energy efficient than pretty much anything else. For example, on June 21, 2001, a train more than seven kilometers long, comprising 682 ore cars, made the Newman-Port Hedland trip in Western Australia. It is hard to beat such economy of scale. It is important to note, however, that this amazing feat was possible only at the expense of two particular elements: choice of destination and type of goods transported. Newman and Port Hedland were the only two cities that this train was capable of transporting ore to and from. Transporting ore to any other city, or transporting anything other than bulk, would have required a different transportation method.
By restricting the cities that the train can serve and the type of content it can transport, this train was able to carry more than 650 wagons. If the same train had been asked to transport both bulk and passengers, as well as to serve all the cities of the western coast of Australia, the wagon count would likely have been reduced by at least an order of magnitude.
The first key point we learn from the railroads is that high optimization can be achieved through specialization. Another way to think about it is that economy of scale is inversely proportional to the degree of freedom a system has. Restricting the degrees of freedom (specializing the system) achieves economy of scale (optimization).
Of course, moving goods between only two places is not very interesting, even if you can do it very efficiently. This is why less optimized but more agile means of transport, such as trucks, are often used to transport smaller quantities of goods to more destinations.
Post offices around the globe demonstrate this with delivery networks that are hybrid models: high economy of scale (point-to-point transportation, for example, a train between two major cities); medium economy of scale (trucks dispatching mail from the train station to the multiple regional centers); and very low economy of scale but super-high flexibility (delivery people using a car, a bike, or their feet, who are capable of reaching the most remote dwelling).
The final point we can learn from the railroads is the notion of efficient transloading. Transloading happens when goods are transferred from one means of transport to another; for example, from a train to a truck, as in the postal service scenario. Since transloading can involve a lot of cost when it is not done efficiently, it is commonly done in transloading hubs, where specialized equipment can facilitate the process. Another important innovation that lowered transloading costs was the standard shipping container. By packaging all goods in a uniform shipping container, the friction between the different modes of transport was virtually removed, and finer-grained hybrid transportation networks could be built.
In a world of low transloading costs, decisions no longer have to be based on the constraints of the global system. Instead, decisions can be optimized at the local subsystem without worrying about the impact on other subsystems.
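We call the application of these three key lessons (optimization through specialization, hybrid networks that combine specialized and flexible means, and low-cost transloading between them) LOtSS.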
The rest of this article is about applying LOtSS in an enterprise scenario (Big Pharma) and an ISV scenario (LitwareHR).
LOtSS in the Enterprise: Big Pharma
How does LOtSS apply to the world of an enterprise? Let’s examine it through the example of Big Pharma, a fictitious pharmaceutical company.
Big Pharma believes it does two things better than its competition: clinical trials and molecular research. However, it found that 80 percent of its IT budget is allocated to less strategic assets such as e-mail, CRM, and ERP. Of course, these capabilities are required to run the business, but none of them really help differentiate Big Pharma from its competitors. As Dr. Thomas (Big Pharma’s fictitious CEO) often says, “We have never beaten the competition because we had a better ERP system… It is all about our molecules.”
Therefore, Dr. Thomas would be happy to get industry-standard services for a better price. This is why Big Pharma is looking into leveraging “cloud offerings” to further optimize its IT. The intent is to embrace a hybrid architecture, where economy of scale can be tapped for commodity services while retaining full control of the core capabilities. The challenge is to smoothly run and manage IT assets that span corporate boundaries. In LOtSS terms, Big Pharma is interested in optimizing its IT by selectively using the cloud where economy of scale is attainable. For this to work, Big Pharma needs to minimize the “transloading” costs, which in this case are the costs of crossing the corporate firewall.
After analysis, Big Pharma is ready to move out of its data center two capabilities that vendors offer as services at industry-average quality: CRM and e-mail. Notice that “industry average” doesn’t necessarily mean “low quality.” These are simply commoditized systems with features adequate for Big Pharma’s needs. Because these providers offer their capabilities to a large number of customers, Big Pharma can tap into the economies of scale of these service providers, lowering the TCO of each capability.
Big Pharma’s ERP, on the other hand, is heavily customized, and its requirements are such that no SaaS offering from an ISV matches them. However, Big Pharma chooses to optimize this capability at a different level, by hosting it outside its data center. It still owns the ERP software itself, but operations, hardware, air conditioning, and power are now all the responsibility of the hoster. (See Figure 1.)
Figure 1. Big Pharma on-premise vs. Big Pharma leveraging hosted “commodity” services for noncore business capabilities and cloud services for aspects of its critical software
The consequences of moving these systems outside Big Pharma’s corporate boundaries cannot be ignored. Three common aspects of IT (security, management, and integration across these boundaries) could potentially introduce unacceptable transloading costs.
From an access control perspective, Big Pharma does not want to keep separate user names and passwords on each hosted service. Instead, it wants to retain centralized control of authorization rules for its employees by leveraging the existing hierarchy of roles in its Active Directory. In addition, Big Pharma wants a single sign-on experience for all its employees, regardless of who provides the service or where it is hosted. (See Figure 2.)
Figure 2. Single sign-on through identity federation and a cloud STS
One proven solution for decreasing the “transloading costs” of cross-domain authentication and authorization is federated identity combined with claims-based authorization. Well-known, standards-based technologies, such as WS-Federation, WS-Trust, and SAML tokens, can be used to implement this efficiently.
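To make claims-based authorization more concrete, here is a minimal Python sketch of what a hosted service (the relying party) might do with a token issued by a trusted security token service (STS). It assumes a deliberately simplified token format: a JSON claim set signed with an HMAC key shared between the STS and the service. Real deployments use standard token formats such as SAML, usually signed with certificates. The issuer URL, key, claim names, and the "SalesManager" rule are all hypothetical.

```python
import base64
import hashlib
import hmac
import json
import time

# Hypothetical trust configuration for the hosted service (the relying party).
TRUSTED_ISSUER = "https://sts.bigpharma.example"
SHARED_KEY = b"demo-key-shared-with-sts"  # real systems typically use X.509 certificates instead


def issue_token(claims: dict, key: bytes = SHARED_KEY) -> str:
    """Plays the STS role: encode the claims and append an HMAC-SHA256 signature."""
    body = base64.urlsafe_b64encode(json.dumps(claims, sort_keys=True).encode())
    signature = base64.urlsafe_b64encode(hmac.new(key, body, hashlib.sha256).digest())
    return f"{body.decode()}.{signature.decode()}"


def validate_token(token: str, key: bytes = SHARED_KEY) -> dict:
    """Relying party: verify signature, issuer, and expiry before trusting any claim."""
    body, _, signature = token.partition(".")
    expected = base64.urlsafe_b64encode(hmac.new(key, body.encode(), hashlib.sha256).digest())
    if not hmac.compare_digest(expected, signature.encode()):
        raise PermissionError("signature mismatch")
    claims = json.loads(base64.urlsafe_b64decode(body))
    if claims.get("iss") != TRUSTED_ISSUER:
        raise PermissionError("untrusted issuer")
    if claims.get("exp", 0) < time.time():
        raise PermissionError("token expired")
    return claims


def can_edit_accounts(claims: dict) -> bool:
    """Authorization uses the role claim carried by the token, not a local user table."""
    return "SalesManager" in claims.get("roles", [])


# Example: an employee's token, issued by the STS from Active Directory roles, used at the CRM.
token = issue_token({"iss": TRUSTED_ISSUER, "sub": "alice", "roles": ["SalesManager"],
                     "exp": time.time() + 3600})
print(can_edit_accounts(validate_token(token)))
```

The hosted CRM never sees a password and keeps no user table of its own; it only needs to trust the issuer, which is what keeps the cross-boundary "transloading" cheap.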
From a systems management perspective, Big Pharma employs a dedicated IT staff to make sure everything is working within agreed SLAs. Also, employees experiencing issues with the CRM, ERP, or e-mail will still call Big Pharma’s helpdesk to get them resolved. It is therefore important that the IT staff can manage the IT assets regardless of whether they are internal or external, which means that each hosted system has to provide and expose a management API that Big Pharma IT can integrate into its existing management tools. For example, a ticket opened by an employee at Big Pharma for a problem with e-mail will automatically open another one with the e-mail service provider. When the issue is closed there, a notification is sent to Big Pharma, which in turn closes the issue with the employee. (See Figure 3.)
Figure 3. Big Pharma monitoring and controlling cloud services through management APIs
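The ticket round trip described above can be sketched in a few lines of Python. This is an illustration under stated assumptions: `ProviderManagementAPI` is a hypothetical in-memory stand-in for whatever management API an e-mail provider might expose, and its method names (`open_ticket`, `close_ticket`, `subscribe`) are invented for the example, not taken from any real product.

```python
import itertools
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Ticket:
    ticket_id: int
    summary: str
    status: str = "open"
    provider_ticket_id: Optional[int] = None  # mirrored ticket at the hosting provider, if any


class ProviderManagementAPI:
    """Hypothetical stand-in for a hosted e-mail provider's ticketing API."""

    def __init__(self) -> None:
        self._ids = itertools.count(1000)
        self.tickets: Dict[int, str] = {}
        self._subscribers: List[Callable[[int], None]] = []

    def subscribe(self, callback: Callable[[int], None]) -> None:
        self._subscribers.append(callback)

    def open_ticket(self, summary: str) -> int:
        ticket_id = next(self._ids)
        self.tickets[ticket_id] = "open"
        return ticket_id

    def close_ticket(self, ticket_id: int) -> None:
        self.tickets[ticket_id] = "closed"
        for notify in self._subscribers:  # notify the customer that the issue was resolved
            notify(ticket_id)


class Helpdesk:
    """Big Pharma's in-house helpdesk, integrated with the provider's management API."""

    def __init__(self, provider: ProviderManagementAPI) -> None:
        self.provider = provider
        self.tickets: Dict[int, Ticket] = {}
        self._ids = itertools.count(1)
        provider.subscribe(self.on_provider_ticket_closed)

    def report_issue(self, summary: str, hosted: bool) -> Ticket:
        ticket = Ticket(next(self._ids), summary)
        if hosted:  # the service runs at the provider, so open a mirrored ticket there
            ticket.provider_ticket_id = self.provider.open_ticket(summary)
        self.tickets[ticket.ticket_id] = ticket
        return ticket

    def on_provider_ticket_closed(self, provider_ticket_id: int) -> None:
        for ticket in self.tickets.values():
            if ticket.provider_ticket_id == provider_ticket_id:
                ticket.status = "closed"  # close the employee's original ticket


provider = ProviderManagementAPI()
helpdesk = Helpdesk(provider)
ticket = helpdesk.report_issue("Cannot send e-mail", hosted=True)
provider.close_ticket(ticket.provider_ticket_id)
print(ticket.status)
```

The employee only ever deals with the internal ticket; the link to the provider's ticket is the integration point that the management API has to make possible.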
Now let’s examine the systems that Big Pharma wants to invest in the most: molecular research and clinical trials. Molecular research represents the core of Big Pharma’s business: successful drugs. Typical requirements of molecular research are modeling and simulation. Modeling requires high-end workstations with highly specialized software. Big Pharma chooses to run this software on-premise; due to its very complex visualization requirements and high interactivity with its scientists, it would be next to impossible to run that part of the software in the cloud.
However, simulations of these abstract models are tasks that demand highly intensive computational resources. Not only are these demands high, but they are also very variable. Big Pharma’s scientists might come up with a model that requires the computational equivalent of two thousand machine-days and then nothing for a couple of weeks while the results are analyzed. Big Pharma could decide to invest at peak capacity but, because of the highly elastic demand, it would just acquire lots of CPUs that would, on average, have a very low utilization rate. It could decide to invest at median capacity, which would not be very helpful for peak requests. Instead, Big Pharma chooses to subscribe to a service for raw computing resources. Allocation and configuration of these machines are done on demand by Big Pharma’s modeling software, and the provider guarantees state-of-the-art hardware and networking. Because each simulation generates an enormous amount of information, Big Pharma also subscribes to a storage service that handles this unpredictably large amount of data at a much better cost than an internal high-end storage system. Part of the raw data generated by the cluster is then uploaded as needed into Big Pharma’s on-premise systems for analysis and feedback into the modeling tools.
This hybrid approach offers the best of both worlds: the simulations can run using all the computational resources needed, while costs are kept under control because only what is used is billed. (See Figure 4.)
Figure 4. Big Pharma scientist modeling molecules on the on-premise software and submitting a simulation job to an on-demand cloud compute cluster
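The choice between peak, median, and on-demand capacity comes down to a little arithmetic. The sketch below uses a made-up 28-day demand profile: two days needing 1,000 machines each (the 2,000 machine-days of one simulation) plus a small baseline of 20 machines on the other 26 days. These numbers are illustrative assumptions, not Big Pharma data.

```python
from statistics import median


def fixed_capacity_report(daily_demand: list[int], capacity: int) -> dict:
    """Evaluate owning `capacity` machines for the whole period against daily demand."""
    provisioned = capacity * len(daily_demand)  # machine-days paid for
    served = sum(min(d, capacity) for d in daily_demand)  # machine-days actually usable
    return {
        "provisioned": provisioned,
        "utilization": served / provisioned,
        "unserved": sum(daily_demand) - served,  # work that has to wait or be dropped
    }


# Hypothetical 28-day cycle: a 2,000 machine-day simulation burst, then a small baseline.
demand = [1000, 1000] + [20] * 26

peak = fixed_capacity_report(demand, max(demand))
med = fixed_capacity_report(demand, int(median(demand)))
on_demand_machine_days = sum(demand)  # utility model: pay only for what is used

print(f"peak:   {peak['provisioned']} machine-days bought, {peak['utilization']:.0%} utilized")
print(f"median: {med['unserved']} machine-days of work cannot be served on time")
print(f"on-demand: {on_demand_machine_days} machine-days billed")
```

Under these assumptions, provisioning for peak buys 28,000 machine-days at about 9 percent utilization, provisioning for the median (20 machines) leaves 1,960 machine-days of work unserved during the cycle, and on-demand billing covers exactly the 2,520 machine-days used.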
Here, again, for this to be really attractive, transloading costs (in this example, the cost of bridging the cloud-based simulation platform and the in-house modeling tool) must be managed; otherwise, all the benefits of the utility platform would be lost in the crossing.
The other critical system for Big Pharma is clinical trials. After modeling and simulation, test drugs are tried with actual patients for effectiveness and side effects. Patients who participate in these trials must communicate all kinds of health indicators back to Big Pharma, as well as various other pieces of information: what they are doing, where they are, how they feel, and so on. Patients submit the collected information both to the simulation cluster and to the specialized on-premise software that Big Pharma uses to track clinical trials.
Similar to the molecule modeling software, Big Pharma develops this system in-house because of its highly specialized needs. One could argue that Big Pharma has already applied LOtSS many times in the past. It is likely that the communication subsystem of the clinical-trial software, which allows patients to share their trial results, evolved from an automated fax system several years ago into today’s self-service Web portal, which lets patients submit their data directly.
Participating patients use a wide variety of devices to interact with Big Pharma: their phones, the Web, and their personal health-monitoring systems. Therefore, Big Pharma decides to leverage a highly specialized cloud-based messaging system to offer reliable, secure communications with high SLAs. Building this on its own would be costly for the company: Big Pharma doesn’t have the expertise required to develop and operate it, and it would not benefit from the economies of scale that the cloud service enjoys.
This cloud messaging service, an Internet Service Bus (ISB), allows Big Pharma to interact with patients using very sophisticated patterns. For example, it can broadcast updates for the software running on patients’ devices to selected groups of patients. If, for any reason, Big Pharma’s clinical-trial software becomes unavailable, the ISB stores pending messages and forwards them once the system becomes available again. (See Figure 5.)
Figure 5. Big Pharma clinical-trial patients submitting data to the clinical-trial system and the simulation cluster
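The store-and-forward and group-broadcast behavior described for the ISB can be illustrated with a small in-memory sketch. It does not reproduce the API of any actual service bus: the `ServiceBus` class, its methods, and the endpoint and group names are hypothetical, and a real service would persist queued messages durably rather than keeping them in memory.

```python
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, List


class ServiceBus:
    """Toy relay: delivers to registered endpoints and queues messages while one is offline."""

    def __init__(self) -> None:
        self.handlers: Dict[str, Callable[[dict], None]] = {}
        self.online: Dict[str, bool] = {}
        self.pending: Dict[str, Deque[dict]] = defaultdict(deque)
        self.groups: Dict[str, List[str]] = defaultdict(list)

    def register(self, endpoint: str, handler: Callable[[dict], None]) -> None:
        self.handlers[endpoint] = handler
        self.online[endpoint] = True

    def set_online(self, endpoint: str, online: bool) -> None:
        self.online[endpoint] = online
        if online:  # back up: forward everything stored while it was down, in arrival order
            while self.pending[endpoint]:
                self.handlers[endpoint](self.pending[endpoint].popleft())

    def send(self, endpoint: str, message: dict) -> None:
        if self.online.get(endpoint, False):
            self.handlers[endpoint](message)
        else:
            self.pending[endpoint].append(message)  # store until the endpoint is reachable

    def join(self, group: str, endpoint: str) -> None:
        self.groups[group].append(endpoint)

    def broadcast(self, group: str, message: dict) -> None:
        for endpoint in self.groups[group]:
            self.send(endpoint, message)


bus = ServiceBus()
trial_system_inbox: List[dict] = []
bus.register("clinical-trials", trial_system_inbox.append)

bus.set_online("clinical-trials", False)  # in-house system goes down for maintenance
bus.send("clinical-trials", {"patient": "p-17", "heart_rate": 72})
bus.set_online("clinical-trials", True)   # the stored reading is forwarded now
print(trial_system_inbox)

# Broadcasting a software update to every device in one patient group.
device_inbox: List[dict] = []
bus.register("device-p17", device_inbox.append)
bus.join("trial-A", "device-p17")
bus.broadcast("trial-A", {"update": "monitor-app 2.1"})
```

Patients' devices only ever talk to the bus, so an outage of the in-house system does not surface to them as a failed submission.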
Patients, on the other hand, are not Big Pharma employees and therefore are not part of Big Pharma’s directory. Big Pharma uses federation to integrate patients’ identities with its security systems. Patients use one of their existing Web identities, such as Microsoft Live ID, to authenticate themselves; the cloud-identity service translates these authentication tokens into tokens that are understood by the services with which patients interact.
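The translation step performed by the cloud-identity service is essentially a set of claim-mapping rules. A minimal sketch follows; the issuer URIs, claim names, enrollment table, and mapping rules are hypothetical. Signature validation of the incoming token is omitted here to keep the focus on the mapping, although a real federation service would perform it first.

```python
from typing import Dict, List

# Hypothetical identity providers the federation service is configured to trust.
TRUSTED_IDPS = {"urn:liveid", "urn:other-web-idp"}

# Hypothetical enrollment data: which external identities take part in which trial.
ENROLLMENT: Dict[str, str] = {"liveid:8f3a": "trial-A"}


def transform_claims(incoming: Dict[str, str]) -> Dict[str, List[str]]:
    """Map an external Web identity to the claims that Big Pharma's services expect."""
    if incoming.get("issuer") not in TRUSTED_IDPS:
        raise PermissionError("identity provider is not trusted")
    external_id = incoming["name_identifier"]
    trial = ENROLLMENT.get(external_id)
    if trial is None:
        raise PermissionError("identity is not enrolled in any trial")
    return {
        "issuer": ["urn:bigpharma:federation"],  # downstream services trust only this issuer
        "patient_id": [external_id],
        "role": ["TrialPatient"],
        "trial": [trial],
    }


print(transform_claims({"issuer": "urn:liveid", "name_identifier": "liveid:8f3a"}))
```

With this design, the clinical-trial services only need to understand one issuer and one vocabulary of claims, however many Web identity providers patients arrive from.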
In summary, this simplified scenario illustrates optimizations that Big Pharma can achieve in its IT environment by selectively leveraging specialized cloud services that benefit from economies of scale. This selection happens at different levels of abstraction, from finished services (for example, e-mail, CRM, and ERP) to building-block services that are used only in the context of another capability (for example, cloud compute, cloud-identity services, and cloud storage).
Applying LOtSS to LitwareHR
As described in the previous scenario, optimization can happen at many levels of abstraction. After all, it’s about extracting a component from a larger system and replacing it with a cheaper substitute, while making sure that the substitution does not introduce a high transloading cost (which would defeat the benefit of the substitution).
The fact is that ISVs have been doing this for a long time. Very few ISVs develop their own relational database nowadays, because acquiring a commercial, packaged RDBMS is cheaper, and building the equivalent functionality brings only marginal value that cannot justify the cost.
Adopting a commercial RDBMS allowed ISVs to optimize their investments: there is no need to develop and maintain the “plumbing” required to store, retrieve, and query data, so ISVs can focus those resources on higher-value components of their applications. But it does not come for free; there is legacy code to deal with, skills that must be acquired, new programming paradigms to be learned, and so on. These are all, of course, examples of “transloading” costs that result from adopting a specialized component.
The emergence and adoption of standards such as ANSI SQL and common relational models are the equivalent of “containerization” and have contributed to a decrease in these costs.
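One way to see this “containerization” effect: application code written against standard SQL and a generic driver interface depends very little on which engine sits underneath. The sketch uses Python's DB-API 2.0 (PEP 249) with the built-in `sqlite3` module as the engine; the `employees` table and function names are illustrative. SQL dialects and driver parameter styles still differ somewhat between products, which is the residual transloading cost.

```python
import sqlite3
from typing import Any, List


def create_schema(conn: Any) -> None:
    cur = conn.cursor()
    # Plain standard SQL DDL: nothing here is specific to one database product.
    cur.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name VARCHAR(100), dept VARCHAR(50))")


def add_employee(conn: Any, emp_id: int, name: str, dept: str) -> None:
    cur = conn.cursor()
    # The "?" placeholder is sqlite3's DB-API paramstyle; other drivers may use %s instead.
    cur.execute("INSERT INTO employees (id, name, dept) VALUES (?, ?, ?)", (emp_id, name, dept))


def employees_in(conn: Any, dept: str) -> List[str]:
    cur = conn.cursor()
    cur.execute("SELECT name FROM employees WHERE dept = ? ORDER BY name", (dept,))
    return [row[0] for row in cur.fetchall()]


# Any DB-API 2.0 connection could be passed in; sqlite3 is used because it ships with Python.
conn = sqlite3.connect(":memory:")
create_schema(conn)
add_employee(conn, 1, "Grace", "Research")
add_employee(conn, 2, "Alan", "Research")
add_employee(conn, 3, "Ada", "Sales")
conn.commit()
print(employees_in(conn, "Research"))
```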
The market for visual components (for example, ActiveX controls) is just another example of LOtSS: external suppliers created specialized components that resulted in optimizations in the larger solution. “Transloading costs” were decreased by de facto standards that all components and development tools complied with. (See Figure 6.)
Figure 6. Window built upon three specialized visual parts, provided by three different vendors
Cloud services offer yet another opportunity for ISVs to optimize aspects of their applications.
Cloud Services at the UI
“Mashups” are one of the very first examples of leveraging cloud services. Cartographic services such as Microsoft Virtual Earth or Google Maps allow anybody to add complex geographical and mapping features to an application at very low cost. Developing an equivalent in-house system would be prohibitive for most smaller ISVs, which simply could not afford the cost of image acquisition and rendering. Established standards such as HTML, HTTP, JavaScript, and XML, together with emerging ones such as GeoRSS, help lower the barrier to entry to these technologies. (See Figure 7.)
Figure 7. Web mashup using Microsoft Virtual Earth, ASP.NET, and GeoRSS feeds
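Part of what keeps mashup transloading costs low is that the data travels in simple, standard formats. As an illustration, the sketch below reads locations from a GeoRSS-Simple feed (an RSS 2.0 feed whose items carry a `georss:point` element holding latitude and longitude) using only Python's standard library. The feed contents are made up, and the step of handing the coordinates to a mapping control such as Virtual Earth is left out.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

GEORSS_NS = {"georss": "http://www.georss.org/georss"}

SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0" xmlns:georss="http://www.georss.org/georss">
  <channel>
    <title>Office locations (sample)</title>
    <item><title>Seattle office</title><georss:point>47.6062 -122.3321</georss:point></item>
    <item><title>Perth office</title><georss:point>-31.9505 115.8605</georss:point></item>
  </channel>
</rss>"""


def read_points(feed_xml: str) -> List[Tuple[str, float, float]]:
    """Return (title, latitude, longitude) for every item that carries a georss:point."""
    root = ET.fromstring(feed_xml)
    points = []
    for item in root.iter("item"):
        point = item.find("georss:point", GEORSS_NS)
        if point is None or not point.text:
            continue  # items without a location are simply skipped
        lat, lon = (float(v) for v in point.text.split())
        points.append((item.findtext("title", default=""), lat, lon))
    return points


for title, lat, lon in read_points(SAMPLE_FEED):
    print(f"{title}: {lat}, {lon}")
```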
Cloud Services at the Data Layer
Storage is a fundamental aspect of every nontrivial application. There are mainly two options today for storing information: a file system and a relational database. The former is appropriate for large datasets, unstructured information, documents, or proprietary file formats. The latter is normally used for managing structured information using well-defined data models. There are already hybrids even within these models. Many applications use XML as the format to encode data and then store these datasets in the file system. This approach provides database-like capabilities, such as retrieval and search, without the cost of a fully fledged RDBMS that might require a separate server and extra maintenance.
The costs of running your own storage system include managing disk space and implementing resilience and fault tolerance.
Current cloud-storage systems come in various flavors, too. Generally speaking, there are two types: blob persistence services that are roughly equivalent to a file system, and simple, semistructured entity persistence services that can be thought of as analogous to an RDBMS. However, pushing these equivalences too far can be dangerous, because cloud-storage systems are fundamentally different from local ones. To begin with, cloud storage apparently violates one of the most common system-design principles: place data close to compute. Because every interaction with the store necessarily involves a request that goes across the network, applications that are too “chatty” with their store can be severely affected by the unavoidable latency.
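A back-of-the-envelope model shows why chattiness matters: if every call to a cloud store pays one network round trip, total time grows with the number of calls rather than with the amount of data. The round-trip and per-record figures below are assumptions chosen for illustration, not measurements of any service.

```python
def access_time_ms(items: int, batch_size: int, round_trip_ms: float, per_item_ms: float) -> float:
    """Estimated time to read `items` records when each request fetches `batch_size` of them."""
    requests = -(-items // batch_size)  # ceiling division: number of network round trips
    return requests * round_trip_ms + items * per_item_ms


# Assumed figures: 1,000 records, 50 ms round trip to the cloud store, 0.1 ms transfer per record.
chatty = access_time_ms(1000, batch_size=1, round_trip_ms=50, per_item_ms=0.1)
chunky = access_time_ms(1000, batch_size=100, round_trip_ms=50, per_item_ms=0.1)
local = access_time_ms(1000, batch_size=1, round_trip_ms=0.5, per_item_ms=0.1)

print(f"one record per call: {chatty / 1000:.1f} s")
print(f"100 records per call: {chunky / 1000:.1f} s")
print(f"same chatty pattern against a local store: {local / 1000:.1f} s")
```

Under these assumptions, the chatty pattern takes about 50 seconds, while batching 100 records per request brings it down to about 0.6 seconds, roughly what the same chatty code would cost against a local store.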
Fortunately, data comes in different types. Some pieces of information never or very seldom change (reference data, historical data, and so on), whereas others are very volatile (the current value of a stock, and so on). This differentiation creates an opportunity for optimization.
One such optimization opportunity for leveraging cloud storage is the archival scenario. In this scenario, the application keeps locally the data that is frequently accessed or too volatile to be stored remotely. All information that is seldom used but cannot be deleted can be pushed to a cloud-storage service. (See Figure 8.)
Figure 8. Application storing reference and historical data on a cloud-storage service
With this optimization, the local store only needs to hold the most active data, reducing disk capacity, server capacity, and maintenance.
There are costs with this approach, though. Most programming models of cloud-storage systems are considerably different from traditional ones. This is one transloading cost that an application architect will need to weigh in the context of the solution. Most cloud-storage services expose some kind of SOAP or REST interface with very basic operations. There are no such things as complex operators or query languages as of yet.
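The archival scenario can be sketched under the limitation just described: the cloud store offers only basic key-based operations (put, get, and list keys), so anything resembling a query has to be done by the application. `BasicCloudStore` is a hypothetical in-memory stand-in for such a service. Records are kept as Python dictionaries for brevity, whereas a real service would store serialized data. The 90-day threshold is an arbitrary assumption.

```python
from datetime import datetime, timedelta
from typing import Dict, Iterator, List


class BasicCloudStore:
    """Hypothetical stand-in for a cloud blob/entity service with only basic key operations."""

    def __init__(self) -> None:
        self._items: Dict[str, dict] = {}

    def put(self, key: str, value: dict) -> None:
        self._items[key] = value

    def get(self, key: str) -> dict:
        return self._items[key]

    def list_keys(self, prefix: str = "") -> Iterator[str]:
        return (k for k in sorted(self._items) if k.startswith(prefix))


def archive_cold_records(local: Dict[str, dict], cloud: BasicCloudStore, now: datetime,
                         max_age: timedelta = timedelta(days=90)) -> List[str]:
    """Push records not accessed within `max_age` to the cloud, keeping only active data locally."""
    cold = [key for key, rec in local.items() if now - rec["last_access"] > max_age]
    for key in cold:
        cloud.put(f"archive/{key}", local.pop(key))
    return cold


def find_archived(cloud: BasicCloudStore, field: str, value: object) -> List[dict]:
    """No query language: list keys, fetch each record, and filter on the client."""
    records = (cloud.get(k) for k in cloud.list_keys("archive/"))
    return [rec for rec in records if rec.get(field) == value]


local_store = {
    "order-1": {"customer": "Contoso", "last_access": datetime(2008, 1, 15)},
    "order-2": {"customer": "Contoso", "last_access": datetime(2008, 9, 28)},
}
cloud_store = BasicCloudStore()
print(archive_cold_records(local_store, cloud_store, now=datetime(2008, 10, 1)))
print(len(find_archived(cloud_store, "customer", "Contoso")))
```

The client-side filter in `find_archived` fetches every archived record over the network, which is exactly the kind of cost the architect has to weigh before pushing data that will still be searched.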
Authentication and Authorization in the Cloud
Traditional applications typically either have a user repository that they own (often stored as rows in database tables) or rely on a corporate user directory such as Microsoft Active Directory.
Either of these very common ways of authenticating and authorizing users has significant limitations in a hybrid in-house and cloud scenario. The most obvious one is the proliferation of identity and permissions databases as companies continue to apply LOtSS. The likelihood of crossing organizational boundaries increases dramatically, and it is impractical and unrealistic to expect all these databases to be kept synchronized with ad hoc mechanisms. Additionally, identity, authorization rules, and life cycle (user provisioning and decommissioning) have to be integrated.
Companies today have been able to afford multiple identities within their organizations because they have pushed that cost onto their employees. They are the ones who have to remember 50 passwords for 50 different, nonintegrated systems.
As mentioned in the Big Pharma scenario, identity federation and claims-based authorization are two technologies that are both standards-based (fostering wide adoption) and easier to implement today, thanks to platform-level frameworks built by major providers such as Microsoft. In addition to frameworks, ISVs can leverage cloud-identity services, which let them optimize their infrastructure by pushing this entire piece of it to a specialized provider. (See Figure 9.)
Figure 9. Big Pharma consuming two services. CRM leverages cloud identity for claim mapping, and ERP uses a host-based STS.
Decompose-Optimize-Recompose
The concept introduced in this article is fairly straightforward and can be summarized in the following three steps: decompose a system into smaller subsystems; identify optimization opportunities at the subsystem level; and recompose while minimizing the transloading cost introduced by the new optimized subsystem. Unfortunately, in typical IT systems, there are too many variables to allow a magic formula for discovering the candidates for optimization. Experience gained from previous optimizations, along with tinkering, will likely be the best weapons to approach LOtSS. That said, there are high-level heuristics that can be used. For example, not all data is equal. Data that is read-only, public, and benefits from geographic distribution, such as blog entries, product catalogs, and videos, is likely to be very cloud friendly; on the other hand, volatile, transactional, regulated, or personally identifying data will be more on-premise friendly. Similarly, not all computation is equal; constant, predictable computation loads are less attractive for the cloud than variable, unpredictable loads, which are better suited to a utility-type underlying platform.
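The heuristics above can be encoded as a rough first-pass screen for candidates. The sketch below is one hypothetical way to do it: the attribute names mirror the traits listed in the text, but the equal weighting, the tie rule, and the peak-to-average threshold of 2.0 are assumptions, not rules from this article. A real assessment would weigh cost, contracts, and regulation in far more detail.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    name: str
    read_only: bool = False
    public: bool = False
    benefits_from_geo_distribution: bool = False
    volatile: bool = False
    transactional: bool = False
    regulated: bool = False
    personally_identifying: bool = False


def data_placement_hint(d: Dataset) -> str:
    """Compare cloud-friendly traits with on-premise-friendly ones (equal weights assumed)."""
    cloud_score = sum([d.read_only, d.public, d.benefits_from_geo_distribution])
    on_prem_score = sum([d.volatile, d.transactional, d.regulated, d.personally_identifying])
    if cloud_score > on_prem_score:
        return "cloud-friendly"
    if on_prem_score > cloud_score:
        return "on-premise-friendly"
    return "needs closer analysis"


def compute_placement_hint(peak_to_average: float, threshold: float = 2.0) -> str:
    """Variable loads (high peak-to-average ratio) favor a utility platform; threshold is arbitrary."""
    return "utility platform" if peak_to_average >= threshold else "owned capacity"


catalog = Dataset("product catalog", read_only=True, public=True, benefits_from_geo_distribution=True)
patient_records = Dataset("patient records", volatile=True, transactional=True,
                          regulated=True, personally_identifying=True)
print(data_placement_hint(catalog), data_placement_hint(patient_records))
print(compute_placement_hint(peak_to_average=11.0), compute_placement_hint(peak_to_average=1.2))
```

A screen like this only narrows the list; the decompose and recompose steps, and the transloading cost of each substitution, still need case-by-case judgment.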
Conclusion
LOtSS uses the cloud as an optimization opportunity, driving overall costs down. Although this “faster, cheaper, better” view of the cloud is very important and is likely to be a key driver of cloud utilization, it is important to realize that it is only a partial view of what the cloud can offer. The cloud also presents opportunities for “not previously possible” scenarios.
IT history has shown that, over time, what was unique and strategic becomes a commodity. The only thing that seems to stay constant is the need to continuously identify new optimization opportunities by selectively replacing subsystems with more cost-effective ones, while keeping the cost introduced by the substitute low. In other words, it seems that one of the few sustainable competitive advantages in IT is, in fact, the ability to master LOtSS.
Resources
“Data on the Outside vs. Data on the Inside,” by Pat Helland (http://msdn.microsoft.com/en-us/library/ms954587.aspx)
“Identity Management in the Cloud” (http://msdn.microsoft.com/en-us/arcjournal/cc836390.aspx)
Microsoft Identity Framework, code-named “Zermatt” (https://connect.microsoft.com/site/sitehome.aspx?SiteID=642)
In his current role, Gianpaolo Carraro (Senior Director, Architecture Strategy) leads a team of architects driving thought leadership and architectural best practices in the area of Software + Services, SaaS, and cloud computing. Prior to Microsoft, Gianpaolo helped inflate and then burst the .com bubble as cofounder and chief architect of a SaaS startup. Gianpaolo started his career in research as a member of the technical staff at Bell Laboratories. You can learn more about him through his blog at http://blogs.msdn.com/gianpaolo/.
Eugenio Pace (Senior Architect, Architecture Strategy) is responsible for developing architecture guidance in the area of Software + Services, SaaS, and cloud computing. Before joining the Architecture Strategy group, he worked in the patterns & practices team at Microsoft, where he was responsible for delivering client-side architecture guidance, including Web clients, smart clients, and mobile clients. During that time, his team shipped the Composite UI Application Block and three software factories for mobile and desktop smart clients and for Web development. Before joining patterns & practices, he was an architect at Microsoft Consulting Services. You can find his blog at http://blogs.msdn.com/eugeniop.
This article was published in the Architecture Journal, a print and online publication produced by Microsoft. For more articles from this publication, please visit the Architecture Journal Web site.