The Impact of Virtualization on Software Architecture


Nishant Thorat

Arvind Raghavendran

June 2010

 

Summary: This article examines the impact of virtualization on software architecture and attempts to identify enabling architecture patterns.

Any problem in computer science can be solved with another layer of indirection. But that usually will create another problem.
–David Wheeler

 

Introduction

Virtualization has a profound impact on IT environments, as it abstracts the physical characteristics of computing resources from users. Virtualization techniques are mainly used for enhancing capacity (using excess computer resources efficiently), compatibility (running legacy applications on modern platforms), and manageability (easy patch management and deployment). However, one must not confuse virtualization technologies—such as OS virtualization and application virtualization—with virtualization itself. In principle, virtualization can be applied to provide an abstraction layer that helps solve complex problems in software architecture. As Table 1 shows, virtualization technologies are of different types and can be complementary in different architectural styles.

In this article, we examine the impact of virtualization on software architecture to support desired attributes—such as high dynamic scalability, high availability, and geographic distribution—in modern composite applications, as well as the benefits of extending a virtualized environment to traditional applications.

Traditionally, enterprise software was designed and developed to address specific enterprise problems. Hence, the architecture of the enterprise-application software stack would be confined to the enterprise boundary and often restricted to the n-tier architectural pattern. With the ever-increasing volume of unstructured data and an unprecedented increase in transactional data volume, scalability is becoming an important factor for enterprise applications. Classic n-tier applications do not scale linearly; in keeping with Amdahl's law [1], the portion of the work that cannot be parallelized caps the achievable speedup.
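To make the scaling limit concrete, the following minimal sketch evaluates Amdahl's law in Python. The function name and the sample figure (90 percent of the work being parallelizable) are illustrative assumptions of ours, not measurements of any particular n-tier application.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Speedup predicted by Amdahl's law: 1 / ((1 - p) + p / n)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    # Assumed workload: 90 percent of the work can run in parallel.
    for n in (1, 10, 100, 1000):
        print(n, round(amdahl_speedup(0.9, n), 2))
```

Under that assumption, the speedup stays below 10 no matter how many workers are added, because the serial 10 percent dominates.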

The scalability problem can be overcome by applying virtualization principles to the complete application stack. In this approach, each layer of the application stack is mapped to multiple physical resources through virtualization. This leads to logical separation at every level (data, components, and actual devices), which enables scaling out without adding complexity.

 

Virtualization Attributes for Architects

Virtualization demands changes in the way in which we approach various attributes of software architecture, such as security, deployment, multitenancy, and communication.

Security

From an architectural perspective, there is some confusion about how virtualization differs from existing models of computing and how these differences affect the technological approaches to network- and information-security practices.

An application-delivery infrastructure that is based on virtualization is a new approach that is made possible by advances in data-center virtualization, the availability of high-speed bandwidth, and innovations in endpoint sophistication. It raises the following security questions for architects:

Table 1. Virtualization technologies

  • Should data be kept in the data center? Centralizing the data into a data center facilitates easier maintenance of applications and better protection of confidential data. Also, operating costs are reduced, as less time is spent on managing employee endpoint resources.
  • Should application streaming be used? Application streaming provides business value to remote offices and retains control of confidential data. Virtualization of desktop applications reduces the amount of time that an application is exposed to potential vulnerabilities, as it transparently streams compliant images from the data center and reduces the risk of malicious code that might be lingering on corporate endpoints.
  • Should a wide variety of endpoint devices be supported? Application virtualization can reach more users when a wide variety of endpoint access methods is supported, such as browser, remote-display, or streamed-application access.

Deployment

Application virtualization has a huge impact on how most enterprise solutions are packaged and deployed:

  • Handling compatibility issues—Software appliances can provide the answer, as they have several benefits over traditional software applications. A software appliance dramatically simplifies software deployment by encapsulating an application's dependencies in a pre-integrated, self-contained unit. It frees users from having to worry about resolving potentially complex OS-compatibility issues, library dependencies, and undesirable interactions with other applications.
  • Impact on current systems—Do the virtualized applications integrate seamlessly with the existing products that are used for testing, deployment, and troubleshooting? If not, contingency plans for this new way of packaging applications must be incorporated. Ensure that the choice of virtualization technology allows for easy integration with the existing enterprise-application integration (EAI) strategies.
  • Additional hardware requirements—When we deploy multiple versions side by side on a machine, we must consider issues such as port conflicts, network capacity, and other hardware-related questions.

Multitenancy

Multitenancy—in which a single instance of the software, running on a service provider's premises, serves multiple organizations—is an important element of virtualization and implies a need for policy-driven enforcement, segmentation, isolation, governance, service levels, and chargeback/billing models for different consumer constituencies:

  • Isolation and security—Because tenants share the same instance of the software and hardware, it is possible for one tenant to affect the availability and performance of the software for other tenants. Ensure that application resources such as virtual portals, database tables, workflows, and Web services are partitioned between tenants, so that only users who belong to a tenant can access the instances that belong to that tenant.
  • Database customizability—Because the software is shared between tenants, it becomes difficult to customize the database for each tenant. Virtualization of the data layer and effective use of data partitioning—coupled with selection of the best approach among separate databases, a shared database with separate schemas, and a shared database with shared schema—will enable you to achieve your database-customization goals.
  • Usage-based metering—There are several methods for associating usage-metering data with a corresponding tenant. Authentication-based mapping, general HTTP- or SOAP-request parameter-based mapping, and application separation are some of the methods that can be used.
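As a minimal sketch of two of the points above, tenant isolation and authentication-based metering, consider the following Python class. The class name, its methods, and the tenant identifiers are hypothetical and chosen only for illustration; a real system would obtain the user identity from its authentication layer.

```python
from collections import Counter


class TenantRegistry:
    """Maps authenticated users to tenants and meters usage per tenant."""

    def __init__(self, user_to_tenant: dict):
        self._user_to_tenant = dict(user_to_tenant)
        self.usage = Counter()  # tenant id -> number of metered requests

    def tenant_of(self, user: str) -> str:
        try:
            return self._user_to_tenant[user]
        except KeyError:
            raise PermissionError(f"unknown user: {user}") from None

    def access(self, user: str, resource_tenant: str) -> str:
        """Allow access only to resources owned by the caller's tenant."""
        tenant = self.tenant_of(user)
        if tenant != resource_tenant:
            raise PermissionError(f"{user} may not access tenant {resource_tenant}")
        self.usage[tenant] += 1  # authentication-based metering
        return tenant
```

Because the tenant is derived from the authenticated user rather than from a request parameter, the same lookup serves both the isolation check and the usage counter.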

Figure 1. Virtualization–SOA–composite-application relationship

Communication

Communication plays a key role in the new world of virtualization. With more and more applications acting as services, and composition coming into play, it is very important that there be a common integration platform to help these disparate services communicate with each other.

In traditional architectures, synchronous communication patterns are used. Synchronous communication requires both endpoints to be up and running. However, because of load balancing or operational reasons such as patching, virtual endpoints could become unavailable at any time. Thus, to ensure high availability, asynchronous communication patterns should be used.
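The difference can be sketched with an in-memory queue that stands in for a durable message queue. The `Mailbox` class and the message names are our own illustrative assumptions; a production system would use a persistent messaging product instead.

```python
from collections import deque


class Mailbox:
    """In-memory stand-in for a durable message queue (hypothetical)."""

    def __init__(self):
        self._messages = deque()

    def send(self, message: str) -> None:
        # The sender returns immediately; it never waits for the receiver.
        self._messages.append(message)

    def drain(self, handler) -> int:
        """Deliver all pending messages to handler; return the count."""
        delivered = 0
        while self._messages:
            handler(self._messages.popleft())
            delivered += 1
        return delivered


mailbox = Mailbox()
mailbox.send("order-1")  # receiving endpoint is being patched
mailbox.send("order-2")  # sender is not blocked
received = []
mailbox.drain(received.append)  # endpoint is back and catches up
```

The sender completes both calls while the receiving endpoint is unavailable, and the receiver processes the backlog in order once it returns.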

In recent years, the enterprise service bus (ESB) has permeated the IT vernacular of many organizations that are looking for a new "magic bullet" to deal with the ongoing challenge of connecting systems. In computing, an ESB consists of a software-architecture construct that provides fundamental services for complex architectures via an event-driven and standards-based messaging engine (the bus):

  • Brokered communication—ESBs send data between processes on the same or different computers. As with message-oriented middleware, ESBs use a software intermediary between the sender and the receiver to provide a brokered communication between them.
  • Address indirection and intelligent routing—ESBs typically resolve service addresses at runtime by maintaining some type of repository. They usually are capable of routing messages, based on a predefined set of criteria.
  • Basic Web-services support—ESBs support basic Web-services standards that include the Simple Object Access Protocol (SOAP) and Web Services Description Language (WSDL), as well as foundational standards such as TCP/IP and XML.
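Address indirection and rule-based routing, as described in the list above, can be sketched with a toy broker. The `ServiceBus` class, its method names, and the message fields are hypothetical; they do not correspond to the API of any real ESB product.

```python
class ServiceBus:
    """Toy broker: resolves logical service names at runtime and routes messages."""

    def __init__(self):
        self._registry = {}  # logical name -> handler (the "repository")
        self._rules = []     # (predicate, logical name) pairs, checked in order

    def register(self, name, handler):
        self._registry[name] = handler

    def add_route(self, predicate, name):
        self._rules.append((predicate, name))

    def publish(self, message: dict):
        for predicate, name in self._rules:
            if predicate(message):
                handler = self._registry[name]  # late binding: looked up per message
                return handler(message)
        raise LookupError("no route matches message")
```

Because the handler is looked up on every call, re-registering a logical name redirects traffic to a new endpoint without any change on the sender's side.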

 

Virtualization, SOA, and Composite Applications

Composite applications are built from servicelets or microservices, which by nature are distributed, federated, and replicated. Servicelets are self-contained hardware and software building blocks. They could possibly communicate by embracing service-oriented architecture (SOA). It would be wrong to assume that composite applications are by definition SOA, although SOA is a major technology for implementing them. However, it would be interesting to infer that virtualization is the platform that could enable SOA-based composite applications. The inherent nature of virtualization provides a necessary indirection that is needed both for SOA and to build composite applications (see Figure 1).

Composite-Application Life Cycle

In a typical scenario, composite applications follow the Model-Assemble-Deploy-Manage [2] SOA life-cycle pattern. However, for the purposes of our discussion, we propose a modified version of the pattern: Model [with optional Wrapper for legacy applications]-Assemble [through Service Discovery/Publish/Negotiation]-Deploy-Execute-Manage. We use this model to identify the patterns for each phase of the composite-application life cycle, as Table 2 shows:

Table 2. Application life-cycle patterns

 

Virtualization Patterns

For the purposes of this discussion, we consider access virtualization, workload virtualization, information virtualization, and platform virtualization (see Figure 2).

Access Virtualization & Related Patterns

Access virtualization is responsible for authentication and makes it possible for any type of device to be used to access any type of application. This gives developers an opportunity to develop applications without being restricted to any specific device.

Identity management and resource access are the main aspects of access virtualization.

 

Federated Identity Pattern

Identity in a virtualized environment must be implemented based on standards and should be federated across unrelated security domains. In a virtualized environment, the application components can be anywhere; hence, the identity should be context-aware. For example, for load-balancing purposes, your application component might move into a different security domain; for it to work seamlessly in the new environment, the original identity should be federated into that environment.

Figure 2. Virtualization patterns

 

Endpoint Virtualization Pattern

The Endpoint Virtualization pattern is used to hide client-device details, delivering a robust operating environment irrespective of equipment, connectivity, or location. This pattern provides application-streaming capabilities, application isolation, and licensing control.

Application installation and configuration can be captured and streamed back in virtualized layers. The deployment process should complement the capturing process, identify the component dependencies, and so on. Complete isolation is an ideal, however; in practice, an application might need to share OS resources and might also link to coexisting applications and resources. For example, it is unreasonable to expect that every .NET application would be virtualized with its own copy of .NET.

Also, while we are designing applications, we should refrain from making any specific assumptions about endpoint capabilities.

The Endpoint Virtualization pattern can also be a key component to various thin-client approaches, as it allows for the distribution of application layers and the capture of user sessions to maintain the personal preferences of the individual user.

Workload Virtualization & Related Patterns

Workload virtualization is responsible for selecting the most appropriate system resources to execute the workload tasks. This selection could be policy-driven and based on different criteria, such as resource availability and priority.

Workload virtualization uses the scheduler to optimize resource usage and introduces multiple patterns for resource management and isolation.

 

Virtual Scaling Pattern

Application scaling is the ability of an application to adapt to enterprise growth in terms of IO throughput, number of users, and data capacity. The software does not need to change to handle these increases. Increased capacity demands can be met by creating a new virtual machine (VM) and running a new instance of the application. The load-balancing component chooses the appropriate VM that will execute the tasks.

The Virtual Scaling pattern (see Figure 3) allows on-demand applications to be scaled up and down, based on policies. These policies could start new VMs to maintain service quality during peak usage and decommission them when demand subsides. The application should be able to communicate with the virtual-machine monitor, or Virtual Machine Manager (VMM), to report its status. This pattern simplifies scalable application design, because the VMM hides all of the resource details from the application. The application resource status could be monitored by analyzing the usage statistics of queues, threads, response times, and memory in real time. This strategy is mostly adopted by enterprise applications that are being targeted for the Software-as-a-Service (SaaS) model.
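A scaling policy of this kind can be sketched as a pure function from a reported metric to a desired VM count. The policy fields, their default values, and the choice of queue length as the only metric are assumptions made for illustration; an actual VMM would expose its own policy model.

```python
from dataclasses import dataclass
import math


@dataclass(frozen=True)
class ScalingPolicy:
    max_queue_per_vm: int  # backlog that a single VM is expected to absorb
    min_vms: int = 1       # never scale below this floor
    max_vms: int = 10      # never scale above this ceiling


def desired_vm_count(queue_length: int, policy: ScalingPolicy) -> int:
    """Return how many VMs the policy asks for, given the reported backlog."""
    needed = math.ceil(queue_length / policy.max_queue_per_vm)
    return max(policy.min_vms, min(policy.max_vms, needed))
```

With a policy of 100 queued requests per VM, a backlog of 250 asks for three VMs, an empty queue falls back to the floor of one VM, and a very large backlog is capped at the ceiling.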

Figure 3. Virtual Scaling pattern

 

MapReduce Pattern

The MapReduce pattern divides processing into many small blocks of work, distributes them throughout a cluster of computing nodes, and collects the results.
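The divide, distribute, and collect steps can be sketched with a word count, the customary MapReduce example. In this sketch a local thread pool stands in for the cluster of computing nodes, and the function names are our own; a real deployment would dispatch the blocks to separate VMs.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_block(block: str) -> Counter:
    """Map step: count the words in one block of work."""
    return Counter(block.split())


def word_count(blocks: list, workers: int = 4) -> Counter:
    """Distribute blocks to workers, then reduce the partial counts."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(map_block, blocks)  # one task per block
        return reduce(lambda a, b: a + b, partials, Counter())
```

Each block is processed independently, so the map step needs no shared state; only the reduce step combines results.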

Applications can be encapsulated into VMs, and—because VM images are used as templates to deploy multiple copies, when needed—only the template images must be maintained. The nodes can communicate through virtual networks. Typically, in a virtualized environment that employs the MapReduce pattern, a VM image that has the necessary software is prepared and uploaded to the virtual-workspace factory server. When a new node is requested, the virtual-workspace factory server in each site selects the available physical nodes and starts the VM cloning process. VM clones are booted and configured by using the appropriate network addresses.

Information Virtualization & Related Patterns

Information virtualization ensures that appropriate data is available to processing units (PUs). The data could be cached on the system, if required. Storage virtualization and I/O virtualization, together with a sophisticated distributed-caching mechanism, form the information-virtualization component.

Information virtualization could provide information from heterogeneous storage and I/O technologies. In a virtualized environment, applications are not bound to specific resources or endpoints. Applications could be provisioned on different endpoints, based on business requirements. To ensure that computation resources are utilized properly and are not wasted due to network-latency problems, the proper data should be made available to applications in a ubiquitous manner.

With the growth of Web 2.0 applications, there is an exponential rise in unstructured data, in addition to classic structured data. Unstructured data is by nature not well suited to a relational database management system (RDBMS). Different types of storage facilities must be used, so that even unstructured data is indexed and quickly available. Also, the data must be replicated across systems, and because of network latency, replicated data could become stale over a period of time.

 

Federated Virtualized Storage Pattern

The goal of the Federated Virtualized Storage pattern (see Figure 4) is to make data available across disparate systems through a pool of storage facilities. In this pattern, each data source is represented by its proxy data agent, which acts as an intermediary. The data agent can federate multiple data sources. It is also responsible for replication, the staging of data, caching, transformation, and encryption/decryption. The data agent has its own low-level metadata, which can contain information about the underlying data sources. The application maintains a data catalog that contains the higher level of metadata, such as replica locations. The data that is provided by the data agent can be stored in a local cache.

However, performance characteristics should be optimized and closely monitored, as the aggregation logic takes place in a middle-tier server, instead of in the database.
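A minimal sketch of a data agent that federates several sources and keeps a local cache might look as follows. The `DataAgent` class is hypothetical, plain dictionaries stand in for the underlying data sources, and replication, transformation, and encryption are deliberately left out.

```python
class DataAgent:
    """Proxy that federates several data sources behind one lookup call."""

    def __init__(self, sources):
        self._sources = list(sources)  # each source is a dict-like store
        self._cache = {}               # local cache of values already fetched
        self.source_reads = 0          # how often a backing source was consulted

    def get(self, key):
        if key in self._cache:
            return self._cache[key]
        for source in self._sources:
            self.source_reads += 1
            if key in source:
                self._cache[key] = source[key]
                return source[key]
        raise KeyError(key)
```

The `source_reads` counter makes the cost of the middle-tier lookup visible, which is the kind of figure that should be monitored when aggregation happens outside the database.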

Figure 4. Federated Virtualized Storage pattern

 

Distributed Caching Pattern

For information virtualization, caching is an important aspect, as the performance is affected by network latency. Latency could be attributed to three factors: network, middleware, and data sources. When these factors are within the network boundary, the solution could be designed by considering the maximum latency.

The Distributed Caching pattern is ideal when applications are distributed remotely. In this pattern, a central cache repository that is synched with local caches could be maintained. Local caches do not need to maintain the full copy.

Another variant, in which a cache is maintained with each application instance, is more complicated because of high transaction costs, especially as the number of application instances grows.
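The first arrangement, a central repository with partial local caches, can be sketched as follows. The `LocalCache` class, the least-recently-used eviction rule, and the explicit `invalidate` call are assumptions of this sketch; a dictionary stands in for the central cache repository.

```python
from collections import OrderedDict


class LocalCache:
    """Bounded local cache that falls back to a central repository."""

    def __init__(self, central: dict, capacity: int):
        self._central = central
        self._capacity = capacity
        self._entries = OrderedDict()

    def __len__(self):
        return len(self._entries)

    def get(self, key):
        if key in self._entries:
            self._entries.move_to_end(key)     # mark as most recently used
            return self._entries[key]
        value = self._central[key]             # miss: fetch from the central cache
        self._entries[key] = value
        if len(self._entries) > self._capacity:
            self._entries.popitem(last=False)  # evict the least recently used entry
        return value

    def invalidate(self, key):
        """Called when the central repository changes a value."""
        self._entries.pop(key, None)
```

The local cache never holds more than `capacity` entries, and it serves a stale value until it is told to invalidate the key, which illustrates why synchronization with the central repository matters.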

 

Virtual Service Pattern

The Virtual Service pattern uses the service intermediary to mask the actual service. The service metadata is used to model the virtual services. Service models are executed by the service-virtualization engine to provide different service behaviors at runtime.

Because all communication travels through the service intermediary, it is possible for this central node to intercept each service invocation and introduce additional service behaviors (for example, operation versioning, protocol mapping, monitoring, routing, run-time-policy enforcement, and custom behaviors), without having any impact on the client or service code.

Thus, it is possible to provide service isolation by using this pattern. Machine-level isolation is ensured through the use of virtual machines. Machine-level and service-level isolations are in fact complementary. For example, all services from an organization can be configured to run at the service-isolation level, while the actual service implementations (which might be from different applications) could be isolated at the VM level.
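As a minimal sketch of a service intermediary, the following class adds two of the behaviors mentioned above, operation versioning and monitoring, without touching the service itself. The class, the `version_map` parameter, and the `Pricing` service are hypothetical names introduced for illustration.

```python
import time


class ServiceIntermediary:
    """Wraps a real service and applies extra behaviors to every call."""

    def __init__(self, service, version_map=None):
        self._service = service
        self._version_map = version_map or {}  # old operation name -> current name
        self.call_log = []                     # (operation, seconds) for monitoring

    def invoke(self, operation: str, *args, **kwargs):
        target = self._version_map.get(operation, operation)  # operation versioning
        started = time.perf_counter()
        try:
            return getattr(self._service, target)(*args, **kwargs)
        finally:
            self.call_log.append((target, time.perf_counter() - started))


class Pricing:
    """Example service whose operation was renamed in a newer version."""

    def quote_v2(self, quantity: int) -> int:
        return quantity * 2


proxy = ServiceIntermediary(Pricing(), version_map={"quote": "quote_v2"})
result = proxy.invoke("quote", 3)  # an old client still calls "quote"
```

The client keeps using the old operation name while the intermediary maps it to the current one and records the call duration.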

Platform Virtualization & Related Patterns

Platform virtualization is made up of the actual virtualization layer and its monitoring component. It is responsible for management and orchestration of the workload on different virtual systems. This component ensures that the proper OS, middleware, and software stack is provisioned on the end systems that are used for execution.

The software architecture should complement the orchestration by providing application resource-usage metrics (feedback loop), as well as supporting deployment technologies for dynamic (de)provisioning.

 

Dynamic Resource Manage Pattern

In the Dynamic Resource Manage pattern, service level agreement (SLA)-based provisioning is supported, based on application characteristics (such as application criticality, and current and historic resource usage). The Provisioner and Orchestrator components work with workload virtualization to ensure that the business requirements are met. The Provisioner builds the software stack and allocates storage and network bandwidth, based on the attribute information—such as OS, middleware, and storage—that is provided by the Application Profile. The Orchestrator uses business policies to acquire new endpoints, based on the workload metrics—such as current load and queue sizes—that are provided by the applications.

 

Architectural Styles for Virtualization

Traditionally, we have been using the n-tier model for designing enterprise applications. This model no longer meets our needs, due to the virtualized nature of modern systems, which introduces a high degree of agility and complexity. To address the new challenges—such as demand for dynamic linear scaling, high availability, and model-driven applications—we must explore new architectural styles. We examine two prominent styles: event-based architecture and space-based architecture.

Event-Based Architecture

In event-based architecture, services that constitute applications publish events that might or might not be consumed by other services. The subscribing services can use the event to process some task. The data to process this task might be made available through the Information Virtualization layer. The publishing service is decoupled from the services that are dependent on it. This makes the architecture so resilient that any part of the system can go down without a disruption to the overall availability. To ensure proper distributed transaction support without the addition of much overhead, apology-based computing could be used.

By making heavy use of asynchronous communication patterns, durable queues, and read-only data caches, event-based architecture allows us to cope with the temporal unavailability of parts of the system. For example, Windows Azure provides the .NET Service Bus for queue-based communication.

For additional information, please refer to http://msdn.microsoft.com/en-us/library/dd129913.aspx [4].
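The decoupling between publishing and subscribing services can be sketched with a small in-process event hub. The `EventBus` class and the topic names are our own assumptions; this sketch does not use or imitate the .NET Service Bus API, and it delivers events synchronously for brevity.

```python
from collections import defaultdict


class EventBus:
    """Minimal publish/subscribe hub; publishers never reference subscribers."""

    def __init__(self):
        self._subscribers = defaultdict(list)
        self.failures = []  # (topic, exception) pairs from failing subscribers

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, payload) -> int:
        """Deliver payload to each subscriber; return the number that succeeded."""
        delivered = 0
        for handler in self._subscribers.get(topic, []):
            try:
                handler(payload)
                delivered += 1
            except Exception as exc:  # one failing subscriber must not stop the rest
                self.failures.append((topic, exc))
        return delivered
```

An event with no subscribers is simply dropped, and a subscriber that raises an error is recorded without affecting the publisher or the other subscribers.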

Figure 5. Linear scalability in SBA

Space-Based Architecture (SBA)

The space-based architecture style is useful for achieving linear scalability of stateful, high-performance enterprise applications by building self-sufficient PUs. Unlike traditional tier-based applications, scalability is easily achieved by combining PUs.

Each of the PUs has its own messaging and storage facilities, which minimizes external dependencies. Because each PU is an independent component, the sharing of state information across different software components is no longer needed. To ensure high availability and reliability, each of the PUs can have its failover instance running on different machines. Messages and data can be replicated asynchronously between PUs (see Figure 5).

A built-in SLA-driven deployment framework automatically deploys the PUs on as many machines as is necessary to meet the SLA. Deployment descriptors define the services that each PU comprises, their dependencies, their software and hardware requirements, and their SLAs.
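The idea of self-sufficient PUs that do not share state can be sketched by routing each key to exactly one unit. The class names and the hash-based routing rule are assumptions of this sketch; it omits failover instances, replication, and the SLA-driven deployment framework.

```python
import zlib


class ProcessingUnit:
    """Self-sufficient unit with its own private state store."""

    def __init__(self, name: str):
        self.name = name
        self.store = {}

    def handle(self, key: str, value):
        self.store[key] = value
        return self.name


class Space:
    """Routes each key to exactly one PU, so PUs never share state."""

    def __init__(self, unit_count: int):
        self.units = [ProcessingUnit(f"pu-{i}") for i in range(unit_count)]

    def route(self, key: str) -> ProcessingUnit:
        # crc32 is stable across runs, unlike Python's built-in hash() for str.
        return self.units[zlib.crc32(key.encode("utf-8")) % len(self.units)]

    def write(self, key: str, value):
        return self.route(key).handle(key, value)
```

Every request for a given key lands on the same PU, so capacity grows by adding units rather than by coordinating shared state. Note that this simple modulo routing would remap keys if the number of units changed, which a real implementation would have to handle.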

SBA could be used to extend tier-based applications to support linear scalability.

 

Conclusion

There is plenty more to be said about each of the topics that this article addresses. By this point, however, we hope that you have read enough to begin developing a software architectural stack on the constructs of virtualization, and see how you and your customers might benefit from it. Virtualization represents a new paradigm in software architecture—a model that is built on the principles of multitenant efficiency, massive scalability, and metadata-driven configurability to deliver good software inexpensively to both existing and potential customers.

This article is by no means the last word in software patterns that are based on virtualization. However, these approaches and patterns should help you identify and resolve many of the critical questions that you will face.

 

References

  1. Amdahl, G. M. "The Validity of the Single-Processor Approach to Achieving Large-Scale Computing Capabilities." In Proceedings of AFIPS Spring Joint Computer Conference (April 1967). Atlantic City, NJ: AFIPS Press, 483–485.
  2. Sanchez, Matt, and Naveen Balani. "Creating Flexible Service-Oriented Business Solutions with WebSphere Business Services Fabric, Part 1: Overview." developerWorks (IBM website), August 2007.
  3. Jennings, N. R., et al. "Autonomous Agents for Business Process Management." Applied Artificial Intelligence. Volume 14, no. 2 (2000): 145–189.
  4. Chou, David. "Using Events in Highly Distributed Architectures." The Architecture Journal (Microsoft website), October 2008.

 

About the Authors

Nishant Thorat (Nishant.thorat@ca.com) is a Principal Software Engineer at CA in the Virtualization and Service Automation group. Nishant has filed several patents with the United States Patent and Trademark Office (USPTO) in the fields of desktop management, virtualization, and green IT. He is an active proponent of open standards and is a member of DMTF Cloud Incubator. His interests include cloud computing, virtualization, service management, and software architecture. Nishant received a bachelor's degree in Computer Engineering from the University of Pune, India, in 2001.

Arvind Raghavendran (Arvind.Raghavendran@gmail.com) is a Principal Software Engineer with CA and specializes in Microsoft technologies in Hyderabad, India. He has filed patents with the USPTO in the field of desktop IT management. His interests include virtualization and software-design patterns. Arvind received a bachelor's degree in Computer Science from Bangalore University, India, in 1998.