Eugenio Pace and Gianpaolo Carraro
Summary: Building on the Software as a
Service (SaaS) momentum, distributed computing is evolving towards a model
where cloud-based infrastructure becomes a key player, to the level of having a
new breed of companies deploying the entirety of their assets to the cloud. Is
this reasonable? Only time will tell. In this article, we will take an
enthusiastic yet pragmatic look at the cloud opportunities. In particular, we
will propose a model for better deciding what should be pushed to the cloud and
what should be kept in-house, explore a few examples of cloud-based
infrastructure, and discuss the architectural trade-offs resulting from that
usage.
Contents
Introduction
LOtSS in the Enterprise: Big Pharma
Applying LOtSS to LitwareHR
Cloud Services at the UI
Cloud Services at the Data Layer
Authentication and Authorization in the Cloud
Decompose-Optimize-Recompose
Conclusion
Resources
Introduction
From a purely economic perspective, driving a car makes very
little sense. It is one of the most expensive ways of moving a unit of mass
over a unit of distance. If this is so expensive, compared to virtually all
other transportation means (public transport, trains, or even planes), why are
so many people driving cars? The answer is quite simple: control. Setting aside
the status-symbol element of a car, by choosing the car as a means of
transportation, a person has full control over when to depart, which route to
travel (scenic or highway), and, maybe most appealing, the journey can run
from the garage to the final destination without relying on external
parties. Let’s contrast that with taking the train. Trains have strict
schedules and finite routes, depart and arrive only at train stations, might have
loud fellow passengers, and are prone to strikes. But, of course, the train is
cheaper, and you don’t have to drive. So, which one is better? It depends on whether
you are optimizing for cost or for control.
Let’s continue this transportation analogy and look at freight
trains. Under the right circumstances (typically, long distances and bulk
freight), transport by rail is more economical and energy efficient than pretty much
anything else. For example, on June 21, 2001, a train more than seven
kilometers long, comprising 682 ore cars, made the Newman-Port Hedland trip in
Western Australia. It is hard to beat such economy of scale. It is important to
note, however, that this amazing feat was possible only at the expense of two
particular elements: choice of destination and type of goods transported.
Newman and Port Hedland were the only two cities that this train was capable of
transporting ore to and from. Trying to transport ore to any other cities or
transporting anything other than bulk would have required a different
transportation method.
By restricting the cities that the train can serve and
restricting the type of content it can transport, this train was able to carry
more than 650 wagons. If the same train were asked to transport both bulk and
passengers as well as to serve all the cities of the western coast
of Australia, the wagon count would likely have been reduced by at least an order
of magnitude.
The first key point we learn from the railroads is that high
optimization can be achieved through specialization. Another way to think about
it is that economy of scale is inversely proportional to the degree of freedom
a system has. Restricting the degrees of freedom (specializing the system)
achieves economy of scale (optimization).
- Lesson 1—The cloud can be seen as a specialized
system with fewer degrees of freedom than the on-premise alternative, but can
offer very high economy of scale.
Of course, moving goods between only two places is not very
interesting, even if you can do it very efficiently. This is why less optimized
but more agile means of transport such as trucks are often used to transport
smaller quantities of goods to more destinations.
Post offices around the globe demonstrate this: their delivery
network is a hybrid model combining high economy of scale (point-to-point
transport, for example, a train between two major cities); medium economy of
scale (trucks dispatching mail from the train station to the multiple regional
centers); and very low economy of scale, but super-high flexibility (delivery
people using car, bike, or foot who are capable of reaching the most remote
dwelling).
- Lesson 2—By adopting a hybrid strategy, it is
possible to tap into economy of scale where possible while maintaining
flexibility and agility where necessary.
The final point we can learn from the railroads is the notion of
efficient transloading. Transloading happens when goods are transferred from
one means of transport to another; for example, from train to a truck as in the
postal service scenario. Since a lot of cost can be involved in transloading
when not done efficiently, it is commonly done in transloading hubs, where
specialized equipment can facilitate the process. Another important innovation
that lowered the transloading costs was the standard shipping container. By
packaging all goods in a uniform shipping container, the friction between the
different modes of transport was virtually removed and finer grained hybrid
transportation networks could be built.
- Lesson 3—Lowering the transloading costs through
techniques such as transloading hubs and containerization allows a much
finer-grained optimization of a global system.
In a world of low transloading costs, decisions no longer have to
be based on the constraints of the global system. Instead, decisions can be
optimized at the local subsystem without worrying about the impact on other
subsystems.
We call the application of these three key lessons
- Optimization through specialization
- Hybrid strategy maximizing economy of scale where possible while maintaining flexibility and agility where necessary
- Lowering transloading cost
in the context of software architecture: localized optimization through selective specialization (or LOtSS).
The rest of this article is about applying LOtSS in an enterprise
scenario (Big Pharma) and an ISV scenario (LitwareHR).
LOtSS in the Enterprise: Big Pharma
How does LOtSS apply to the world of an enterprise? Let’s examine
it through the example of Big Pharma, a fictitious pharmaceutical company.
Big Pharma believes it does two things better than its
competition: clinical trials and molecular research. However, it found that 80
percent of its IT budget is allocated to less strategic assets such as e-mail,
CRM, and ERP. Of course, these capabilities are required to run the business,
but none of them really help differentiate Big Pharma from its competitors. As
Dr. Thomas (Big Pharma’s fictitious CEO) often says, “We have never beaten the competition
because we had a better ERP system… It is all about our molecules.”
Therefore, Dr. Thomas would be happy to get industry standard
services for a better price. This is why Big Pharma is looking into leveraging
“cloud offerings” to further optimize Big Pharma IT. The intent is to embrace a
hybrid architecture, where economy of scale can be tapped for commodity
services while retaining full control of the core capabilities. The challenge
is to smoothly run and manage IT assets that span across the corporate
boundaries. In LOtSS terms, Big Pharma is interested in optimizing its IT by
selectively using the cloud where economy of scale is attainable. For this to
work, Big Pharma needs to minimize the “transloading” costs, which
in this case are the costs of crossing the corporate firewall.
After analysis, Big Pharma is ready to push out of its data
center two capabilities that vendors offer as a service at an
industry-standard level: CRM and e-mail. Notice that “industry standard” doesn’t necessarily mean
“low quality.” These are simply commoditized systems with features adequate for
Big Pharma’s needs. Because these providers offer these capabilities to a large
number of customers, Big Pharma can tap into the economies of scale of these
service providers, lowering TCO for each capability.
Big Pharma’s ERP, on the other hand, was heavily customized, and
the requirements are such that no SaaS ISV offering matches the need. However,
Big Pharma chooses to optimize this capability at a different level, by hosting
it outside its data center. It still owns the ERP software itself, but now
operations, hardware, A/C, and power are all the responsibility of the hoster.
(See Figure 1.)
Figure 1. Big Pharma on-premise vs. Big Pharma leveraging hosted
“commodity” services for noncore business capabilities and leveraging cloud
services for aspects of their critical software
- Lesson 4—Optimization can happen at different
levels. Companies can selectively outsource entire capabilities to highly
specialized domain-specific vendors (CRM and e-mail in this example) or just
aspects of an application (for example, the operations and hardware as in the
ERP).
The consequences of moving these systems outside of Big Pharma
corporate boundaries cannot be ignored. Three common aspects of IT—security,
management, and integration across these boundaries—potentially could introduce
unacceptable transloading costs.
From an access control perspective, Big Pharma does not want to
keep separate user name/passwords on each hosted service but wants to retain a
centralized control of authorization rules for its employees by leveraging the
already existing hierarchy of roles in its Active Directory. In addition, Big
Pharma wants a single sign-on experience for all its employees, regardless of
who provides the service or where it is hosted. (See Figure 2.)
Figure 2. Single sign-on through identity federation and a cloud STS
One proven solution for decreasing the “transloading costs” of cross-domain
authentication and authorization is federated identity
and claims-based authorization. There are well-known, standards-based
technologies to implement this efficiently.
- Lesson 5—Security systems today are often a
patchwork of multiple ad-hoc solutions. It is common for companies to have multiple
identity management systems in place. Companies like Big Pharma will favor
solutions that implement standards-based authentication and authorization
systems because they decrease transloading costs.
From a systems management perspective, Big Pharma employs a
dedicated IT staff to make sure everything is working within agreed SLAs. Also,
employees experiencing issues with the CRM, ERP, or e-mail will still call Big
Pharma’s helpdesk to get them resolved. It is therefore important that the IT
staff can manage the IT assets regardless of whether they are internal or
external. Therefore, each hosted system has to provide and expose a management
API that Big Pharma IT can integrate into its existing management tools. For
example, a ticket opened by an employee at Big Pharma for a problem with e-mail
will automatically open another one with the e-mail service provider. When the
issue is closed there, a notification is sent to Big Pharma that will in turn
close the issue with the employee. (See Figure 3.)
Figure 3. Big Pharma monitoring and controlling cloud services through management APIs
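The ticket flow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `ProviderManagementApi`, `Helpdesk`, and all of their methods are hypothetical names invented for this sketch, not the API of any real hosted service or management standard.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Ticket:
    ticket_id: str
    summary: str
    status: str = "open"
    linked_id: Optional[str] = None  # id of the mirrored ticket at the provider


class ProviderManagementApi:
    """Hypothetical stand-in for the management API exposed by a hosted service."""

    def __init__(self):
        self.tickets = {}
        self._close_listeners = []

    def open_ticket(self, summary):
        ticket = Ticket(f"P-{len(self.tickets) + 1}", summary)
        self.tickets[ticket.ticket_id] = ticket
        return ticket.ticket_id

    def subscribe_closed(self, callback):
        # The customer registers to be notified when a provider ticket closes.
        self._close_listeners.append(callback)

    def close_ticket(self, ticket_id):
        self.tickets[ticket_id].status = "closed"
        for callback in self._close_listeners:
            callback(ticket_id)


class Helpdesk:
    """Hypothetical in-house helpdesk that mirrors tickets to the provider."""

    def __init__(self, provider):
        self.provider = provider
        self.tickets = {}
        provider.subscribe_closed(self._on_provider_closed)

    def open_ticket(self, summary):
        ticket = Ticket(f"H-{len(self.tickets) + 1}", summary)
        # Opening a local ticket automatically opens one with the provider.
        ticket.linked_id = self.provider.open_ticket(summary)
        self.tickets[ticket.ticket_id] = ticket
        return ticket

    def _on_provider_closed(self, provider_ticket_id):
        # Closing at the provider closes the matching employee-facing ticket.
        for ticket in self.tickets.values():
            if ticket.linked_id == provider_ticket_id:
                ticket.status = "closed"
```

In this sketch the employee only ever sees the helpdesk ticket; the provider ticket is an implementation detail of the integration, which is the point of exposing a management API across the boundary.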
- Caveat—Standards around systems management,
across organization boundaries, are emerging (for example, WS-Management), but
they are not fully mainstream yet. In other words, “containerization” has not
happened completely, but it is happening.
Now let’s examine the systems that Big Pharma wants to invest in
the most: molecular research and clinical trials. Molecular research represents
the core of Big Pharma business: successful drugs. Typical requirements of
molecular research are modeling and simulation. Modeling requires high-end
workstations with highly specialized software. Big Pharma chooses to run this
software on-premise; due to its very complex visualization requirements and
high interactivity with its scientists, it would be next to impossible to run
that part of the software in the cloud. However, simulations of these abstract
models are tasks that demand highly intensive computational resources. Not only
are these demands high, but they are also very variable. Big Pharma’s
scientists might come up with a model that requires the computational
equivalent of two thousand machine-days and then nothing for a couple of weeks
while the results are analyzed. Big Pharma could decide to invest at peak
capacity but, because of the highly elastic demand, it would just acquire lots
of CPUs that would, on average, have a very low utilization rate. It could
decide to invest at median capacity, which would not be very helpful for peak requests.
Instead, Big Pharma chooses to subscribe to a service for raw computing
resources. Allocation and configuration of these machines is done on demand by
Big Pharma’s modeling software, and the provider it uses guarantees state-of-the-art
hardware and networking. Because each simulation generates an
incredible amount of information, it also subscribes to a storage service that
will handle this unpredictably large amount of data at a much better cost than
an internal high-end storage system. Part of the raw data generated by the
cluster is then uploaded as needed into Big Pharma’s on-premise systems for
analysis and feedback into the modeling tools.
This hybrid approach offers the best of both worlds: The
simulations can run using all the computational resources needed, while costs
are kept under control as only what is used is billed. (See Figure 4.)
Figure 4. Big Pharma scientist modeling molecules on the
on-premise software and submitting a simulation job to an on-demand cloud
compute cluster
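The trade-off between buying for peak capacity and paying per use can be made concrete with a little arithmetic. The sketch below is illustrative only: the demand profile (one burst of 2,000 machines followed by 14 idle days) and the alternative capacity of 500 machines are assumptions chosen to mirror the scenario, and the function names are hypothetical.

```python
def utilization(daily_demand, capacity):
    """Fraction of owned machine-days that are actually used over the period."""
    used = sum(min(demand, capacity) for demand in daily_demand)
    return used / (capacity * len(daily_demand))


def unmet_demand(daily_demand, capacity):
    """Machine-days of work that owned capacity could not serve on the day requested."""
    return sum(max(demand - capacity, 0) for demand in daily_demand)


# Assumed profile: one burst day needing 2,000 machines, then two idle weeks.
demand = [2000] + [0] * 14

peak_utilization = utilization(demand, capacity=2000)   # owning for the peak
shortfall_at_500 = unmet_demand(demand, capacity=500)   # owning a smaller cluster
on_demand_machine_days = sum(demand)                    # paying only for what is used

print(f"utilization when owning peak capacity: {peak_utilization:.1%}")
print(f"unmet machine-days with 500 machines: {shortfall_at_500}")
print(f"machine-days billed on demand: {on_demand_machine_days}")
```

Under these assumed numbers, a cluster sized for the peak sits idle most of the time, while a smaller cluster leaves most of the burst unserved; the on-demand option bills only the machine-days consumed.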
Here, again, for this to be really attractive, transloading costs
(in this example, the ability to bridge the cloud-based simulation platform and
the in-house modeling tool) must be managed; otherwise, all the benefits of the
utility platform would be lost in the crossing.
The other critical system for Big Pharma is clinical trials.
After modeling and simulation, test drugs are tried with actual patients for
effectiveness and side effects. Patients who participate in these trials must
communicate back to Big Pharma all kinds of health indicators as well as
various other pieces of information: what they are doing, where they are, how
they feel, and so on. Patients submit the collected information to both the
simulation cluster and the on-premise, specialized software that Big Pharma
uses to track clinical trials.
Similar to the molecule modeling software, Big Pharma develops
this system in-house, because of its highly specialized needs. One could argue
that Big Pharma has already applied LOtSS many times in the past. It is likely that
the clinical-trial software communication subsystem allowing patients to share
their trial results evolved from an automated fax system several years ago to a
self-service Web portal nowadays, allowing the patients to submit their data
directly.
Participating patients use a wide variety of devices to interact
with Big Pharma: their phones, the Web, their personal health monitoring
systems. Therefore, Big Pharma decides to leverage a highly specialized
cloud-based messaging system to offer reliable, secure communications with high
SLAs. Building this on its own would be costly for the company. Big Pharma
doesn’t have the expertise required to develop and operate it, and it would
not enjoy the economies of scale available to the cloud service.
This messaging system, an Internet Service Bus (ISB), allows Big Pharma to interact with
patients using very sophisticated patterns. For example, it can broadcast
updates to the software running on their devices to selected groups of patients.
If, for any reason, Big Pharma’s clinical-trial software becomes unavailable,
the ISB will store pending messages and forward them as the system becomes
available again. (See Figure 5.)
Figure 5. Big Pharma clinical-trial patients submitting data to
clinical-trial system and simulation cluster
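The store-and-forward behavior described for the ISB can be sketched as a small in-memory relay. This is a minimal illustration, assuming a hypothetical `deliver` callable that raises `ConnectionError` while the clinical-trial system is unavailable; a real service bus would add durability, security, and retry policies that are out of scope here.

```python
from collections import deque


class StoreAndForwardRelay:
    """Hypothetical relay that queues messages while the receiver is down."""

    def __init__(self, deliver):
        # deliver: callable that raises ConnectionError when the target is unavailable
        self._deliver = deliver
        self._pending = deque()

    def send(self, message):
        # Always enqueue first so ordering is preserved, then try to drain.
        self._pending.append(message)
        self.flush()

    def flush(self):
        """Deliver queued messages in order; stop at the first failure."""
        while self._pending:
            try:
                self._deliver(self._pending[0])
            except ConnectionError:
                return False  # keep the message for a later attempt
            self._pending.popleft()
        return True

    @property
    def pending_count(self):
        return len(self._pending)
```

A message leaves the queue only after delivery succeeds, so nothing submitted by a patient is lost while the on-premise system is offline.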
Patients, on the other hand, are not Big Pharma employees, and
therefore, are not part of Big Pharma’s directory. Big Pharma uses federation
to integrate patients’ identities with its security systems. Patients use one
of their existing Web identities such as Microsoft Live ID to authenticate
themselves; the cloud-identity service translates these authentication tokens
into tokens that are understood by the services with which patients interact.
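The token translation step can be pictured as a mapping from the claims an external identity provider issues to the claims a relying service expects. The sketch below is an assumption-laden illustration: the token layout, the issuer name `live-id`, the claim names, and `translate_claims` itself are all hypothetical, and signature validation, which any real security token service must perform, is deliberately omitted.

```python
def translate_claims(incoming, issuer_rules):
    """Map claims from a trusted external issuer to claims the relying service understands.

    incoming: {"issuer": str, "claims": {name: value}}   (hypothetical token shape)
    issuer_rules: {issuer: {source_claim_name: target_claim_name}}
    """
    rules = issuer_rules.get(incoming["issuer"])
    if rules is None:
        # Tokens from issuers without a federation agreement are rejected.
        raise PermissionError(f"untrusted issuer: {incoming['issuer']}")

    outgoing = {"issuer": "cloud-identity-service"}
    for source_name, target_name in rules.items():
        if source_name in incoming["claims"]:
            outgoing[target_name] = incoming["claims"][source_name]
    # Claims that have no mapping rule are dropped rather than passed through.
    return outgoing


# Hypothetical federation rules for one external identity provider.
RULES = {"live-id": {"email": "patient_email", "puid": "patient_id"}}

token = {
    "issuer": "live-id",
    "claims": {"email": "pat@example.com", "puid": "123", "locale": "en"},
}
mapped = translate_claims(token, RULES)
```

Dropping unmapped claims is a design choice in this sketch: the relying service receives only what the federation rules explicitly allow.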
In summary, this simplified scenario illustrates optimizations
that Big Pharma can achieve in its IT environment by selectively leveraging
economy-of-scale–prone specialized cloud services. This selection happens at
different levels of abstraction, from finished services (for example, e-mail,
CRM, and ERP) to building block services that happen only in the context of
another capability (for example, cloud compute, cloud-identity services, and
cloud storage).
Applying LOtSS to LitwareHR
As described in the previous scenario, optimization can happen at
many levels of abstraction. After all, it’s about extracting a component from a
larger system and replacing it with a cheaper substitute, while making sure
that the substitution does not introduce high transloading cost (defeating the
benefit of the substitution).
The fact is that ISVs have been doing this for a long time. Very
few ISVs develop their own relational database nowadays. This is because
acquiring a commercial, packaged RDBMS is cheaper, and building the equivalent
functionality brings marginal value and cannot be justified.
Adopting commercial RDBMS allowed ISVs to optimize their
investments, so there is no need to develop and maintain the “plumbing” that is
required to store, retrieve, and query data. ISVs could then focus those
resources on higher-value components of their applications. But it does not
come for free; there is legacy code, skills that must be acquired, new
programming paradigms to be learned, and so on. These are all, of course,
examples of “transloading” costs that result as a consequence of adopting a
specialized component.
The emergence and adoption of standards such as ANSI SQL and
common relational models are the equivalent of “containerization” and have
contributed to a decrease in these costs.
The market of visual components (for example, ActiveX controls)
is just another example of LOtSS: external suppliers creating specialized
components that resulted in optimizations in the larger solution. “Transloading
costs” were decreased by de facto standards that every tool complied with.
(See Figure 6.)
Figure 6. Window built upon three specialized visual parts,
provided by three different vendors
Cloud services offer yet another opportunity for ISVs to optimize
aspects of their applications.
Cloud Services at the UI
“Mashups” are one of the very first examples of leveraging cloud
services. Cartographic services such as Microsoft Virtual Earth or Google Maps
allow anybody to add complex geographical and mapping features to an
application at very low cost. Developing an equivalent in-house system would
be prohibitive for most smaller ISVs, which simply could not afford the cost of
image acquisition and rendering. Established standards such as HTML, HTTP,
JavaScript, and XML, together with emerging ones such as GeoRSS, contribute
toward lowering the barrier of entry to these technologies. (See Figure 7.)
Figure 7. Web mashup using Microsoft Virtual Earth, ASP.NET, and
GeoRSS feeds
Cloud Services at the Data Layer
Storage is a fundamental aspect of every nontrivial application.
There are mainly two options today for storing information: a file system and a
relational database. The former is appropriate for large datasets, unstructured
information, documents, or proprietary file formats. The latter is normally used
for managing structured information using well-defined data models. There are
already hybrids even within these models. Many applications use XML as the
format to encode data and then store these datasets in the file system. This
approach provides database-like capabilities, such as retrieval and search
without the cost of a fully fledged RDBMS that might require a separate server
and extra maintenance.
The costs of running your own storage system include managing
disk space and implementing resilience and fault tolerance.
Current cloud-storage systems come in various flavors, too.
Generally speaking, there are two types: blob persistence services that are
roughly equivalent to a file system and simple, semistructured entities
persistence services that can be thought of as analogous to RDBMS. However,
pushing these equivalences too much can be dangerous, because cloud-storage
systems are essentially different from local ones. To begin with, cloud storage
apparently violates one of the most common system-design principles: Place data
close to compute. Because every interaction with the store
necessarily makes a request that goes across the network, applications that are
too “chatty” with their store can be severely affected by the unavoidable latency.
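A back-of-the-envelope calculation shows why chattiness matters. The numbers below are assumptions picked for illustration (0.5 ms per round trip to a local store, 50 ms to a cloud store, 200 reads per user action); actual latencies vary widely by network and provider.

```python
def network_wait_ms(round_trips, round_trip_ms):
    """Total time spent waiting on the network, ignoring processing time."""
    return round_trips * round_trip_ms


READS_PER_ACTION = 200   # assumed: a "chatty" screen issuing one request per item
LOCAL_RTT_MS = 0.5       # assumed round trip to a store in the same data center
CLOUD_RTT_MS = 50        # assumed round trip to a cloud-storage service

local_wait = network_wait_ms(READS_PER_ACTION, LOCAL_RTT_MS)
cloud_wait = network_wait_ms(READS_PER_ACTION, CLOUD_RTT_MS)

# Same data fetched in 4 batched requests of 50 items each.
batched_cloud_wait = network_wait_ms(4, CLOUD_RTT_MS)

print(local_wait, cloud_wait, batched_cloud_wait)
```

With these assumed figures, the access pattern that was harmless against a local store costs seconds against a remote one, while batching brings it back to a fraction of a second.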
Fortunately, data comes in different types. Some pieces of
information never or very seldom change (reference data, historical data, and
so on) whereas some is very volatile (the current value of a stock, and so on).
This differentiation creates an opportunity for optimization.
One such optimization opportunity for leveraging cloud storage is
the archival scenario. In this scenario, the application stores locally data
that is frequently accessed or too volatile. All information that is seldom
used but cannot be deleted can be pushed to a cloud-storage service. (See
Figure 8.)
Figure 8. Application storing reference and historical data on a
cloud-storage service
With this optimization, the local store needs are reduced to the
most active data, thus reducing disk capacity, server capacity, and
maintenance.
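The archival decision can be expressed as a simple partitioning rule. The following sketch assumes each record carries a `volatile` flag and a `last_accessed` date, and that 90 days without access counts as "seldom used"; the record shape, the threshold, and the function name are all hypothetical.

```python
from datetime import date, timedelta


def split_for_archival(records, today, max_idle_days=90):
    """Partition records into those kept locally and those pushed to cloud storage."""
    cutoff = today - timedelta(days=max_idle_days)
    keep_local, push_to_cloud = [], []
    for record in records:
        # Volatile or recently used data stays close to the application.
        if record["volatile"] or record["last_accessed"] >= cutoff:
            keep_local.append(record)
        else:
            push_to_cloud.append(record)
    return keep_local, push_to_cloud


records = [
    {"id": "open-order", "volatile": True, "last_accessed": date(2008, 1, 5)},
    {"id": "recent-report", "volatile": False, "last_accessed": date(2008, 8, 20)},
    {"id": "2005-invoices", "volatile": False, "last_accessed": date(2007, 3, 1)},
]
local, archived = split_for_archival(records, today=date(2008, 9, 1))
```

Only the old, stable invoices are selected for the cloud store here; the volatile order stays local regardless of when it was last touched.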
There are costs to this approach, though. Most programming
models of cloud-storage systems are considerably different from traditional
ones. This is one transloading cost that an application architect will need to
weigh in the context of the solution. Most cloud-storage services expose some
kind of SOAP or REST interface with very basic operations. There are no such
things as complex operators or query languages as of yet.
Authentication and Authorization in the Cloud
Traditional applications typically either have a user repository
that they own (often, stored as rows in database tables) or rely on a corporate
user directory such as Microsoft Active Directory.
Either of these very common ways of authenticating and
authorizing users to a system has significant limitations in a hybrid in-house/cloud
scenario. The most obvious one is the proliferation of identity and
permissions databases as companies continue to apply LOtSS. The likelihood of
crossing organization boundaries increases dramatically, and it is impractical
and unrealistic to expect all these databases to be synchronized with ad-hoc
mechanisms. Additionally, identity, authorization rules, and life cycle (user
provisioning and decommission) have to be integrated.
Companies today have been able to afford dealing with multiple identities
within their organization because they have pushed that cost to their
employees. The employees are the ones who have to remember 50 passwords for 50 different,
nonintegrated systems.
- Consider this: When a company fires an employee,
access to all the systems is traditionally restricted by simply preventing
him or her from entering the building and therefore from reaching the corporate network. In a
company leveraging a cloud service that manages its own user names and passwords, as
opposed to federated identity, the (potentially annoyed and resentful) fired
employee is theoretically capable of opening a browser at home and logging on
to the system. What could happen then is left as an exercise to the reader.
As mentioned in the Big Pharma scenario, identity federation and
claims-based authorization are two technologies that are both standards-based
(fostering wide adoption) and easier to implement today thanks to platform-level
frameworks built by major providers such as Microsoft. In addition to
frameworks, ISVs can leverage the availability of cloud-identity services,
which enable ISVs to optimize by pushing this entire piece of
infrastructure to a specialized provider. (See Figure 9.)
Figure 9. Big Pharma consuming two services. CRM leverages cloud
identity for claim mapping, and ERP uses a host-based STS.
Decompose-Optimize-Recompose
The concept introduced in this article is fairly straightforward
and can be summarized in the following three steps: Decompose a system into
smaller subsystems; identify optimization opportunities at the subsystem level;
and recompose while minimizing the transloading cost introduced by the new
optimized subsystem. Unfortunately, in typical IT systems, there are too many
variables to allow a magic formula for discovering the candidates for
optimization. Experience gained from previous optimizations, along with
tinkering, will likely be the best weapons to approach LOtSS. That said, there
are high-level heuristics that can be used. For example, not all data is equal.
Read-only, public data that benefits from geo-distribution, such as blog entries,
product catalogs, and videos, is likely to be very cloud-friendly; on the other
hand, volatile, transactional, regulated, or personally identifying data
will be better suited to staying on-premise. Similarly, not all computation is equal:
constant, predictable computation loads are less attractive for the cloud than variable,
unpredictable loads, which are better suited to an underlying utility-type
platform.
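The data heuristics above can be turned into a rough first-pass filter. The sketch below simply counts traits on each side; the trait names, the equal weighting, and the `placement_hint` function are assumptions made for illustration, and a real decision would weigh regulation and transloading cost far more carefully.

```python
# Trait names are hypothetical labels for the characteristics named in the text.
CLOUD_FRIENDLY = {"read_only", "public", "geo_distributed"}
ON_PREMISE_FRIENDLY = {"volatile", "transactional", "regulated", "personally_identifying"}


def placement_hint(traits):
    """Return a coarse hint ("cloud", "on-premise", or "undecided") for a dataset."""
    score = len(traits & CLOUD_FRIENDLY) - len(traits & ON_PREMISE_FRIENDLY)
    if score > 0:
        return "cloud"
    if score < 0:
        return "on-premise"
    return "undecided"


product_catalog = {"read_only", "public", "geo_distributed"}
patient_records = {"regulated", "personally_identifying", "transactional"}
price_list = {"public", "volatile"}

hints = {
    "product_catalog": placement_hint(product_catalog),
    "patient_records": placement_hint(patient_records),
    "price_list": placement_hint(price_list),
}
```

An "undecided" result is the useful part of such a filter: it marks the subsystems where experience and tinkering, rather than a formula, have to make the call.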
Conclusion
LOtSS uses the cloud as an optimization opportunity driving
overall costs down. Although this “faster, cheaper, better” view of the cloud
is very important and is likely to be a key driver of cloud utilization, it is
important to realize that it is only a partial view of what the cloud can
offer. The cloud also presents opportunities for “not previously possible”
scenarios.
IT history has shown that over time, what was unique and
strategic becomes commodity. The only thing that seems to stay constant is the
need to identify new optimization opportunities continuously by selectively
replacing subsystems with more cost-effective ones, while keeping the cost
introduced by the substitute low. In other words, it seems that one of the few
sustainable competitive advantages in IT is in fact the ability to master
LOtSS.
Resources
“Data on the Outside vs. Data on the Inside,” by Pat Helland (http://msdn.microsoft.com/en-us/library/ms954587.aspx)
Identity Management in the Cloud (http://msdn.microsoft.com/en-us/arcjournal/cc836390.aspx)
Microsoft Identity Framework, code-named “Zermatt” (https://connect.microsoft.com/site/sitehome.aspx?SiteID=642)
About the authors
In his current role, Gianpaolo
Carraro (Senior Director, Architecture Strategy) leads a team of architects
driving thought leadership and architectural best practices in the area of
Software + Services, SaaS, and cloud computing. Prior to Microsoft, Gianpaolo
helped inflate and then burst the .com bubble as cofounder and chief architect
of a SaaS startup. Gianpaolo started his career in research as a member of the
technical staff at Bell Laboratories. You can learn more about him through his
blog at http://blogs.msdn.com/gianpaolo/.
Eugenio Pace (Senior
Architect, Architecture Strategy) is responsible for developing architecture
guidance in the area of Software + Services, SaaS, and cloud computing. Before
joining the Architecture Strategy group, he worked in the patterns &
practices team at Microsoft, where he was responsible for delivering
client-side architecture guidance, including Web clients, smart clients, and
mobile clients. During that time, his team shipped the Composite UI Application
Block, and three software factories for mobile and desktop smart clients and
for Web development. Before joining Patterns and Practices, he was an architect
at Microsoft Consulting Services. You can find his blog at http://blogs.msdn.com/eugeniop.
This article was published in the Architecture Journal, a print
and online publication produced by Microsoft. For more articles from this
publication, please visit the Architecture Journal Web site.