Problems and Innovations
Summary: The second in a four-part series of articles about software factories, focusing on the chronic problems that are slowing the inevitable transition from craftsmanship to manufacturing, and the critical innovations that will help us overcome them. The previous article, published in the July 2004 edition of the Microsoft Architects Journal, described the forces that will drive the transition. (21 printed pages)
What's Holding Us Back?
The Current Economic Model
Chronic Software Development Problems
How Do We Move Forward?
Development by Assembly
In the previous article, we concluded that we know how to increase the level of automation in software development. Why, then, aren't we doing it? There are two reasons.
The first reason is the current economic model for commercially sustainable reuse. In the previous article, we discussed the Language Framework pattern, which describes the progression toward automation:
- After developing a number of systems in a given problem domain, we identify a set of reusable abstractions for that domain, and then we document a set of patterns for using those abstractions.
- We then develop a runtime, such as a framework or server, to codify the abstractions and patterns. This lets us build systems in the domain by instantiating, adapting, configuring, and assembling components defined by the runtime.
- We then define a language and build tools that support the language, such as editors, compilers and debuggers, to automate the assembly process. This helps us respond faster to changing requirements, since part of the implementation is generated, and can be easily changed.
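The progression above can be illustrated with a deliberately small sketch. Assume a hypothetical text-processing domain: a framework codifies two reusable abstractions, and a tiny assembly language automates their composition (the names `Pipeline`, `trim` and `upper` are invented for illustration):

```java
import java.util.*;

// Step two of the pattern: a runtime framework codifying the domain's
// reusable abstractions as components.
interface Step { String run(String input); }

class Trim implements Step {
    public String run(String input) { return input.trim(); }
}

class Upper implements Step {
    public String run(String input) { return input.toUpperCase(); }
}

// Step three: a tiny language ("trim|upper") plus a tool that assembles
// framework components from expressions written in it.
public class Pipeline {
    static final Map<String, Step> vocabulary =
        Map.of("trim", new Trim(), "upper", new Upper());

    public static String assembleAndRun(String program, String input) {
        String result = input;
        for (String word : program.split("\\|")) {
            Step step = vocabulary.get(word);
            if (step == null)
                throw new IllegalArgumentException("unknown concept: " + word);
            result = step.run(result);  // instantiate, configure, assemble
        }
        return result;
    }
}
```

A real implementation of the third step would add editors, debuggers and richer generation, but even this toy shows why changing the "program" is cheaper than changing hand-written code.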
The first two parts of this pattern are within reach for most organizations, but the third part is not. Developing language-based tools is currently quite expensive, and therefore makes economic sense only in broad horizontal domains, such as user interface construction and data access, where the necessary investments can be amortized over a large customer base. For this reason, commercial implementations of the Language Framework pattern are supplied almost exclusively by platform vendors.
This economic model is threatened by the trade-off between scope and value for abstractions. As Jackson puts it, the value of an abstraction increases with its specificity to the problem domain [Jac00]. The higher the level of an abstraction, the narrower the domain to which it applies, but the more value it provides in solving problems in that domain, as illustrated in Figure 1.
Figure 1. Scope versus value for abstractions
We are now at a point in the evolution of the software industry where frameworks and tools must become more vertically focused, in order to realize further productivity gains. They must provide more value for smaller classes of problems. This breaks the current economic model because markets for vertically focused frameworks and tools are too small to support platform plays, and because the domain knowledge required to develop them is housed mainly in end user organizations that have not traditionally been software suppliers.
The second reason is chronic problems with current software development methods and practices. Four major problems have resisted solution for at least two decades: monolithic construction, gratuitous generality, one-off development, and process immaturity.
Monolithic Construction
We have not been able to implement development by assembly on a commercially significant scale. The problem is not that we fail to recognize the opportunity. Development by assembly has been part of the vision of industry pioneers for decades, certainly since the advent of object orientation, if not earlier. Nor is it that we have underinvested in trying to realize it. Development by assembly has been the object of sustained academic and commercial investment in object-oriented and component-based development methods. Garlan and others have suggested the following reasons [GAO95].
- Communications protocols closely tied to component implementation technologies have made assembly across platform boundaries difficult, including boundaries between different versions or configurations of the same products.
- Weak component specification and packaging technologies have made it difficult for suppliers to communicate information about components and their interactions, making it hard for consumers to predict the properties of the software that results from assembling them. For example, it is generally quite difficult, given an arbitrary set of components, to determine what they do, what dependencies they have, what resources they consume, what performance they exhibit, or what security risks they introduce. Unstated and conflicting assumptions create architectural mismatches, making assembly difficult or impossible, or degrading the operational qualities of the resulting system.
- Encapsulation technologies that fix component boundaries at development time make it difficult to adapt components during assembly, deployment or execution in order to fit them into contexts that differ even slightly from the ones for which they were designed.
- Consumer-supplier relationships in the software industry are immature compared to similar relationships in other industries, due to a lack of technologies and practices that make reuse predictable, sustainable and mutually beneficial for consumers and suppliers.
Gratuitous Generality
Current software development methods and practices generally offer more degrees of freedom than most applications require. Most business applications, for example, consist of a few basic patterns. They read information from a database, wrap it in business rules, present it to a user, let the user act on it under the governance of the rules, and then write the results back to the database. Of course, this is an oversimplification, since real world business applications generally contain a variety of challenges, such as interactions with legacy systems, large amounts of data, large numbers of concurrent users, or stringent quality of service constraints, which make the implementation of these basic patterns harder than it sounds. Having developed many business applications, one comes to realize that while there are always wrinkles that make each project unique, and specifics that vary, the work is mostly the same from project to project. Do we need to use general purpose, third-generation languages (3GLs) like C# and Java to build more than a small part of the typical business application? Probably not. Why, then, do we rely on them almost exclusively?
- One reason is the premature standardization of modeling languages. The Unified Modeling Language (UML), in particular, is quite suitable for modeling for documentation, but not for model-driven development, since it lacks the focus, extensibility and semantic rigor required to generate code effectively or to perform other kinds of automation. Most of the progress in automating development over the last decade exploits programming languages and XML. A detailed discussion of the issues preventing the use of UML as a language for model-driven development is beyond the scope of this article. Cook provides a well-grounded discussion [Coo04], however, and Fowler adds some key insights on his blog [Fow04].
- Another reason is poor code generation and round-trip engineering, and the failure to integrate modeling with other aspects of the software life cycle. A generation of products, called Computer Aided Software Engineering (CASE) tools, tried to automate software development using models. These products failed miserably for many reasons. Perhaps the most significant reason was their failure to deliver on promises to generate code for multiple platforms from platform independent models, allowing customers to preserve investments in the models as the platform technologies changed underneath them. They tried to fill the gap between the abstractions and the platform by generating large amounts of source code, as illustrated in Figure 2. Unlike programming language compilers, they did not exploit platform features effectively, producing naive, inefficient, least common denominator code. Also, their round-trip engineering facilities were weak, making it difficult to tune the generated code, or to integrate it with hand-written code required to obtain complex behavior. These problems caused developers to abandon the models and to write the code by hand, when they got far enough along in the development process. Iterative development practices exacerbated these problems, since each iteration added new requirements, or changed existing ones, requiring changes to models, and to the generated code and hand-written code.
Figure 2. Naive code generation
- CASE tools also failed to deliver on the promise to integrate the software life cycle using metadata captured by models. The vision they offered was appealing. Information captured by requirements, for example, would be used to drive test case and test harness development. Test results would be used to generate defect records, and to scope defects to components. Information about infrastructure dependencies would be used to validate deployment, and to automatically provision resources in the execution environment. Efforts like the AD/Cycle initiative tried to standardize information models for the software life cycle, but because they were closely tied to implementation, even minor changes required consensus among multiple vendors, which eventually proved untenable. Information captured by development tools is still poorly integrated across the software life cycle. One source of difficulty is the tension between file and database resident information. Until models are completely at home in the file-based world of software development, they cannot be first-class development artifacts. Another source of difficulty is a misplaced focus on standardizing tool architectures rather than information interchange formats. Tool integration should require only that the vendors agree on interchange formats. Architecture standardization is not required, as shown by the success of XML-based information interchange among opaque service components.
One-Off Development
We are unable to achieve commercially significant levels of reuse beyond platform technology. The primary cause of this problem is that we develop most software products as individual products, in isolation from other software products. We treat every software product as unique, although most are more similar to others than they are different from them. Return on investment in software development would be far higher if multiple versions or multiple products were taken into account during software product planning. Consequently, we rarely make commercially significant investments in identifying, harvesting, packaging and distributing reusable assets. The reuse that does occur is ad hoc, rather than systematic. Given the low probability that a component can be reused in a context other than the one for which it was designed, ad hoc reuse is almost an oxymoron. Reuse rarely occurs unless it is explicitly planned in advance.
Process Immaturity
We are unable to consistently develop software products on schedule and within budget. This suggests that software development processes are immature. Most tend toward one of two extremes: they are either overly prescriptive, optimizing for managing complexity at the expense of managing change, or overly permissive, optimizing for managing change at the expense of managing complexity. A mature development process is a prerequisite for automation, since tools cannot automate poorly defined tasks.
How Do We Move Forward?
The economic and technical problems preventing a transition from craftsmanship to manufacturing can be overcome by applying critical innovations that take new ground in the assault on complexity and change. All of these innovations exist today, and have displayed notable potential in commercial products, although most of them are not yet fully mature. They fall into four areas: systematic reuse, development by assembly, model-driven development, and process frameworks. Let's consider each of these areas in turn.
Software Product Families
One of the most important innovations in software development is defining a family of software products, whose members vary, while sharing many common features. According to Parnas, such a family provides a context in which the problems common to the family members can be solved collectively [Par76]. This enables a more systematic approach to reuse, by letting us identify and differentiate between features that remain more or less constant over multiple products and those that vary.
A software product family may consist of either components or whole products. For example, one family might contain variants of a portfolio management application, and another might contain variants of a user management framework used by portfolio management and customer relationship management applications. Software product families are created by Systems Integrators (SIs) when they migrate an application from customer to customer, or build new applications by customizing existing ones. They are also created by Independent Software Vendors (ISVs) when they develop multiple applications in a domain like Customer Relationship Management (CRM), or multiple versions of one application through maintenance and enhancement. They are also created by IT organizations when they customize applications, or develop multiple related applications or multiple versions of one application through maintenance and enhancement.
Software Product Line Practices
Software product lines exploit product families, identifying the common features and recurring forms of variation in specific domains to make the production of family members faster, cheaper, and less risky. Rather than hope naively for ad hoc reuse opportunities to arise serendipitously under arbitrary circumstances, they systematically capture knowledge of how to produce the family members, make it available in reusable assets, and then apply those assets during the production of family members. Products developed as family members reuse requirements, architectures, frameworks, components, tests and many other assets.
Of course, there is a cost to developing a product line; in other words, product lines exhibit a classical cost-benefit trade-off. While the benefit side of the equation cannot be increased by selling many copies in markets that support only limited distribution, it can be increased by producing many related but unique variants, as documented by many case studies [CN01]. Using software product lines is the first step toward industrialization. Making them cheaper to build and operate is the second step. Figure 3 describes the major tasks performed and artifacts produced and consumed in a product line.
Figure 3. A software product line
Product line developers build production assets applied by product developers to produce family members in much the same way that platform developers build device drivers and operating systems used by application developers. A key step in developing the production assets is to produce one or more domain models that describe the common features of problems in domains addressed by the product line, and the recurring forms of variation. These models collectively define the scope of the product line, and are used to qualify prospective family members. Requirements for family members are derived from them, providing a way to map variations in requirements to variations in architecture, implementation, executables, development process, project environment, and many other parts of the software life cycle.
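A minimal sketch of how a domain model might scope a product line and qualify prospective family members follows; all feature names, and the class `ProductLine` itself, are hypothetical:

```java
import java.util.*;

// A production asset: the domain model records features common to all
// family members and the recurring forms of variation.
public class ProductLine {
    static final List<String> commonFeatures = List.of("login", "audit-log");
    static final Set<String> variationPoints = Set.of("reporting", "multi-currency");

    // Qualify a prospective family member: it may only vary where the
    // domain model says variation recurs.
    public static List<String> deriveMember(Set<String> selectedVariants) {
        for (String v : selectedVariants)
            if (!variationPoints.contains(v))
                throw new IllegalArgumentException(
                    v + " is outside the product line scope");
        List<String> features = new ArrayList<>(commonFeatures);
        features.addAll(selectedVariants);
        return features;
    }
}
```

Requirements for a family member are thus derived from the domain model, and anything outside its scope is rejected rather than hand-crafted.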
Model-Driven Development
In the previous article, we saw that raising the level of abstraction is one of the keys to progress. We have also seen that it reduces the scope of abstractions, and therefore the amount of control given to developers over the implementation. In exchange for this loss of control comes a corresponding increase in power. Most business application developers, for example, would rather use higher level abstractions like those provided by C# and the Microsoft .NET Framework than assembly language and system calls. Moving to higher levels of abstraction yields many benefits, including higher productivity, fewer defects, and easier maintenance and enhancement.
Unfortunately, raising the level of abstraction with tools is expensive, as we have seen. If we can find ways to make it faster, cheaper and easier, however, we can provide higher levels of automation for narrower problem domains. This is the goal of model-driven development (MDD). MDD uses models to capture high level information, usually expressed informally, and to automate its implementation, either by compiling models to produce executables, or by using them to facilitate the manual development of executables. This is valuable because this information is currently lost in low level artifacts, such as source code files, where it is difficult to track, maintain or enhance consistently.
Some development activities, such as building, deploying and debugging, are partially or completely automated today using information captured by source code files and other implementation artifacts. Using information captured by models, MDD can also provide more extensive automation of these activities, and more advanced forms of automation, such as debugging with models and automatically configuring tools. Here are some more examples.
- Routine tasks, such as generating one kind of artifact from another, can often be fully automated. For example, test harnesses can often be generated automatically from user interface models, to confirm that appropriate page transitions occur in response to simulated user actions.
- Other tasks, such as resolving differences between artifacts, can be partially automated. For example, differences between columns in a table and fields on a form might be flagged as issues for the user to resolve, and then automatically corrected based on the user decision.
- Adapters, such as Web service wrappers, can be generated automatically from models to bridge differences in implementation technologies. Models can also be used for reflection, protocol configuration, and other adaptive integration mechanisms.
- Models can be used to define the configurations of artifacts that comprise deployment units, and to automate the deployment process. Models of deployment environments can be used to constrain designs, so that implementations of those designs deploy to them correctly.
- Models can be used to describe configurations of deployed components, capturing information about operational characteristics, such as load balancing, fail over, and resource allocation policies, and to automate management activities, such as data collection and reporting.
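The first of these examples, generating test-harness checks from a user interface model, might be sketched as follows; the page-transition model and all names are invented for illustration:

```java
import java.util.*;

// A toy user interface model: pages and the transitions between them.
public class PageModel {
    final Map<String, Set<String>> transitions = new HashMap<>();

    public void addTransition(String from, String to) {
        transitions.computeIfAbsent(from, k -> new HashSet<>()).add(to);
    }

    // Routine task, fully automated: a generated harness check confirming
    // that only modeled page transitions occur in response to user actions.
    public boolean simulate(List<String> observedPath) {
        for (int i = 0; i + 1 < observedPath.size(); i++)
            if (!transitions.getOrDefault(observedPath.get(i), Set.of())
                            .contains(observedPath.get(i + 1)))
                return false;  // unmodeled transition: test fails
        return true;
    }
}
```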
Domain-Specific Languages
For the purposes of MDD, we are not interested in closed-ended languages like 4GLs, nor in using a single high level language for all aspects of development. The weakness of those strategies has been amply demonstrated. We are also not interested in models rendered by hand on whiteboards or notepads. Unfortunately, models have been used to date mainly as documents intended more for human consumption than for computation. This has created the widely held impression that models are not first-class development artifacts with the significance of source code. We are interested in models that can be processed by tools, and we propose to use them in the same way that we currently use source code. Models used in this way cannot be written in languages designed for documentation. They must be precise and unambiguous. Also, in order to raise the level of abstraction, a modeling language must target a narrower domain than a general purpose programming language. This has several consequences:
- The purpose for which the language is designed must be explicitly stated, so that an observer familiar with the domain can evaluate the language and determine whether or not it fulfills its purpose.
- The language must capture concepts used by people who work in the domain. A language for developing and assembling Web services, for example, must contain concepts like Web services, Web methods, protocols and connections between Web services based on protocols. Similarly, a language used to visualize and edit C# source code must contain the concepts that appear in C#, such as classes and their members: fields, methods, properties, events and delegates.
- The language must use names for those concepts that are familiar to the people who use them. For example, a C# developer will find a model of a class that has fields and methods more natural than a model of a class that has attributes and operations.
- The notation for the language, whether graphical or textual, must be easy to use for the purpose in question. Things that people often do with the concepts must be easy to do. For example, it must be easy to manipulate an inheritance hierarchy in a language for visualizing and editing C# source code.
- The language must have a well defined set of rules, called a grammar, governing the way the concepts can be combined to form expressions. This makes it possible to use tools to validate the expressions, and to help the user write them.
- The meaning of every well formed expression must be well defined, so that users can build models that other users understand, so that tools can generate valid implementations from models, and so that the metadata captured by models does what is expected when used for tasks like configuring a server.
A language that meets these criteria is called a domain-specific language (DSL), because it models concepts found in a specific domain. A DSL is defined with much greater rigor than a general purpose modeling language. Like a programming language, it may have either textual or graphical notation.
Two examples of textual DSLs are SQL and HTML, which serve the relational data definition and Web page definition domains, respectively.
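A toy textual DSL for the Web service domain discussed above might look like the following sketch. The two-keyword grammar, the class `ServiceDsl`, and all other names are invented for illustration; the point is that familiar concepts, a grammar, and tool-checkable rules fit in very little machinery:

```java
import java.util.*;

// Concepts from the domain (services, methods), familiar names, and one
// grammar rule: a method declaration must appear inside a service.
public class ServiceDsl {
    public static Map<String, List<String>> parse(String source) {
        Map<String, List<String>> model = new LinkedHashMap<>();
        String current = null;
        for (String line : source.split("\n")) {
            String[] words = line.trim().split("\\s+");
            if (words[0].equals("service")) {
                current = words[1];
                model.put(current, new ArrayList<>());
            } else if (words[0].equals("method")) {
                if (current == null)  // grammar violation a tool can flag
                    throw new IllegalStateException("method outside a service");
                model.get(current).add(words[1]);
            }
        }
        return model;
    }
}
```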
Two examples of graphical DSLs are illustrated in Figure 4, which is a screen shot from the Microsoft Visual Studio 2005 Team System. The DSL on the left describes components, such as Web services. It is used to automate component development and configuration. The DSL on the right describes logical server types in a data center. It is used to design and implement data center configurations. Web service deployments are described by dragging service components onto logical servers. Differences between the resources the services require and the resources available on the logical servers onto which they are deployed are flagged as validation errors, as illustrated.
Figure 4. Domain-specific languages
Incremental Code Generation
The key to effective code generation is to generate small amounts of code that span only small abstraction gaps. This allows the tool to take advantage of platform features, and therefore to produce focused, efficient, platform specific implementations.
One way to make code generation more incremental is to move the models closer to the platform, as illustrated in Figure 5. For example, a modeling tool for a specific programming language can achieve higher fidelity using the programming language type system than it can using a type system defined by a general purpose modeling language. The model can now become a view of the code, where the developer manipulates program structures like class and method declarations graphically. The tool exposes relationships and dependencies that are hard to see in the code, and saves time and effort by generating the code for program structures. It may also implement programming idioms like collection-based relationships, or provide advanced features like refactoring, and pattern authoring, application and evaluation.
Figure 5. Modeling a programming language
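A minimal sketch of a model that acts as a view of the code, generating the collection-based relationship idiom mentioned above, might look like this (the class `ClassModel` and its methods are hypothetical):

```java
import java.util.*;

// A model element mirrors a program structure closely, so "generation"
// spans only a small abstraction gap.
public class ClassModel {
    final String name;
    final List<String> relatedTypes = new ArrayList<>();  // 1..* relationships

    public ClassModel(String name) { this.name = name; }
    public ClassModel relatesToMany(String type) { relatedTypes.add(type); return this; }

    // Emit the collection-based relationship idiom for each relationship
    // drawn graphically in the model.
    public String generate() {
        StringBuilder code = new StringBuilder("public class " + name + " {\n");
        for (String t : relatedTypes)
            code.append("    private List<").append(t).append("> ")
                .append(t.toLowerCase()).append("s = new ArrayList<>();\n");
        return code.append("}\n").toString();
    }
}
```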
Of course, this reduces the power of the model by limiting it to abstractions already available on the platform, or only slightly more powerful ones, such as programming idioms. How, then, do we work at higher levels of abstraction? We use more abstract models, and move the platform closer to the models with either frameworks or transformations, as illustrated in Figure 6. Let's look at each of these in turn.
Figure 6. Using higher-level abstractions
- We can use a framework to implement higher level abstractions that appear in the models, and use the models to generate snippets of code at framework extension points. Conversely, the models help users complete the framework by visualizing framework concepts, and exposing its extension points in intuitive ways. When Microsoft Windows was first introduced, for example, building graphical applications was difficult. Later, Microsoft Visual Basic made the task much easier using the Form and Control abstractions. Forms and Controls are part of a DSL implemented by the .NET Framework and the Windows Forms Designer. Some hand coding is usually required to complete the generated implementations. A pattern language can be used instead of a framework, as described by Hohpe and Woolf [HW03]. This requires the tool to generate the pattern implementations. This approach is illustrated in Figure 6 (a).
- Instead of a framework or pattern language, we can generate to a lower level DSL. We can also use more than two DSLs to span a wide gap, leading to progressive transformation, where models written using the highest level DSLs are transformed into executables through a series of refinements, as shown in Figure 6 (b). This is how compilers work, transforming expressions written in a relatively high level language like Java or C# first into an intermediate representation like byte code or IL, and then into a binary format for a target platform using just-in-time (JIT) compilation.
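Progressive transformation can be sketched as a chain of small refinements, each spanning a narrow gap. The two toy DSLs below (a "logical" and a "relational" one) are invented for illustration:

```java
import java.util.function.Function;

// Progressive transformation: each refinement spans a small gap, so each
// transform stays simple and can exploit its target's features.
public class Refinement {
    // Highest-level DSL to intermediate DSL: "entity X" becomes "table X".
    static final Function<String, String> logicalToRelational =
        model -> model.replace("entity", "table");

    // Intermediate DSL to a platform artifact: "table X" becomes SQL.
    static final Function<String, String> relationalToSql =
        model -> "CREATE TABLE " + model.substring("table ".length()) + " ();";

    // The "compiler" is just the composition of the refinements.
    public static String compile(String model) {
        return logicalToRelational.andThen(relationalToSql).apply(model);
    }
}
```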
Of course, hand-written code must often be combined with generated or predefined framework code to produce the complete implementation. Several different mechanisms can be used for this purpose. The primary distinction between them is binding time.
- Design-time binding merges hand-written and framework code in the same artifacts before compilation, as illustrated in Figure 7. This may involve restricting the editing experience to prevent the user from modifying the framework code (for example, using an editor that supports read-only regions). In some tools, the user adds hand-written code in a special window.
Figure 7. Design-time composition
- Run-time binding merges hand-written and framework code using callbacks. A variety of run-time binding mechanisms based on delegation are described by design patterns, such as the following from Gamma, et al.: events (Observer), adapters (Adapter), policy objects (Strategy), factories (Abstract Factory), orchestration (Mediator), wrappers (Decorator), proxies (Proxy), commands (Command) and filters (Chain of Responsibility) [GHJV95]. Two advantages of run-time binding are formalizing the contract between the hand-written and framework code using interfaces, and allowing dynamic configuration of the resulting implementation at run time by object substitution. Also, delegated classes allow hand-written code to be preserved across regeneration. One minor disadvantage is the run-time overhead of method invocation. Several mechanisms based on run-time binding are popular in component programming models, as illustrated in Figure 8. All of them have been used successfully on a large scale in commercial products.
- Hand-written subclass. The user supplies hand-written code in a subclass of the framework code. Abstract methods in the framework code define explicit override points. For example, the user subclasses a framework entity, fields incoming calls in hand-written code, and brackets calls to the superclass with pre- and post-condition code in the subclass using the Template Method pattern.
- Framework subclass. The user supplies hand-written code in a superclass of the framework code. Abstract methods in the hand-written code are overridden in the framework code. For example, a framework entity fields all incoming calls, and brackets calls to a hand-written superclass with pre- and post-condition code.
- Hand-written delegate class. The user supplies hand-written code in a delegate class called by the framework class. For example, a framework entity calls a hand-written entity at designated points, such as before and after setting a property value. This is actually a variant of the Proxy pattern.
- Framework delegate class. The user supplies hand-written code that delegates to a framework class to obtain services. For example, a hand-written entity calls a framework entity to set and get property values.
Figure 8. Run-time composition
- Compile-time binding merges hand-written and framework code during compilation, as illustrated in Figure 9. One technique is to merge before the compiler is called (for example, with the C macro preprocessor). A better way is to use partial specifications and to merge in compiled form during compilation. An example of partial specifications is partial classes, as supported by the Visual Basic and C# language compilers in Visual Studio 2005.
Figure 9. Compile-time composition
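As a sketch of the first run-time binding mechanism above, here is a hand-written subclass bound to framework code through the Template Method pattern; the entity names are hypothetical:

```java
// Framework code: fields the call and delegates to an abstract override point.
abstract class FrameworkEntity {
    public final String setName(String value) {
        String v = beforeSet(value);   // pre-condition hook, bound at run time
        return "stored:" + v;          // framework-owned storage logic
    }
    protected abstract String beforeSet(String value);  // explicit override point
}

// Hand-written code in a subclass of the framework code.
public class CustomerEntity extends FrameworkEntity {
    @Override protected String beforeSet(String value) {
        if (value.isEmpty()) throw new IllegalArgumentException("name required");
        return value.trim();           // hand-written validation and cleanup
    }
}
```

Because the hand-written code lives in its own class, it survives regeneration of the framework code, which is the advantage the text notes above.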
Development by Assembly
Critical innovations in the areas of platform independent protocols, self description, variable encapsulation, assembly by orchestration and architecture-driven development are required to support development by assembly.
Platform Independent Protocols
Web services technologies succeed where earlier component assembly technologies failed by separating the technologies used to specify and assemble components from the technologies used to implement them. Because XML is a technology for managing information, not a technology for constructing components, Web services require the use of wrappers to map Web method invocations onto local method invocations based on the underlying component implementation technology. While CORBA attempted to use a similar strategy, its complexity required major investment by platform vendors, which limited its scope. The simplicity of XML-based protocols significantly lowers the barrier to implementation, ensuring their greater ubiquity. By encoding remote method invocation requests as XML, they avoid interoperability problems caused by platform specific remote procedure call encoding and argument marshalling. Also, by obtaining broad industry agreement on standards, they have designed platform interoperability in from the beginning.
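The wrapper idea can be sketched as follows. The wire format shown is a drastically simplified stand-in for a real XML protocol such as SOAP, and the class and method names are invented:

```java
// A toy wrapper: the request arrives as platform-neutral text and is
// mapped onto a local method call in the component implementation
// technology, with the result encoded back onto the wire.
public class ServiceWrapper {
    // The local component, in some platform-specific technology.
    static int add(int a, int b) { return a + b; }

    // Decode "<add><a>2</a><b>3</b></add>" and dispatch locally.
    public static String invoke(String request) {
        int a = Integer.parseInt(extract(request, "a"));
        int b = Integer.parseInt(extract(request, "b"));
        return "<result>" + add(a, b) + "</result>";
    }

    static String extract(String xml, String tag) {
        int start = xml.indexOf("<" + tag + ">") + tag.length() + 2;
        return xml.substring(start, xml.indexOf("</" + tag + ">"));
    }
}
```

Because both sides agree only on the text format, neither needs to know the other's component implementation technology, which is the interoperability property described above.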
Self Description
Self description reduces architectural mismatch by improving component packaging to make assumptions, dependencies, behavior, resource consumption, performance, and certifications explicit. The metadata it provides can be used to automate component discovery, selection, licensing, acquisition, installation, adaptation, assembly, testing, configuration, deployment, monitoring and management.
The most important form of self description is describing component assumptions, dependencies and behavior, so that developers can reason about interactions between components, and so that tools can validate assemblies. The most widely used forms of specification in object orientation are class and interface declarations. They declare behaviors offered by a class, but only imply significant assumptions and dependencies by naming other classes and interfaces in method signatures. Contracts are a richer form of specification. A contract is an agreement that governs the interactions between components. It is not enough to know how to invoke a component. We must also know which sequences of invocations are valid interactions. A contract therefore describes interaction sequences called protocols, and responses to protocol violations and other unexpected conditions.
Of course, contracts are worthless unless enforced. There are two ways to enforce contracts:
- Refuse to assemble components with mismatched contracts.
- Use the information supplied by the contracts to either provide adapters that let the components interact directly, or to mediate their interactions.
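A contract that describes a protocol, enforced by mediating interactions, might be sketched like this; the open/write/close protocol and the class `Protocol` are made-up examples:

```java
import java.util.*;

// A contract describing a valid interaction sequence (a protocol), and a
// mediator that enforces it by refusing out-of-order invocations.
public class Protocol {
    // open -> write* -> close, expressed as allowed successor operations.
    static final Map<String, Set<String>> allowed = Map.of(
        "start", Set.of("open"),
        "open",  Set.of("write", "close"),
        "write", Set.of("write", "close"));

    private String last = "start";

    public void invoke(String operation) {
        if (!allowed.getOrDefault(last, Set.of()).contains(operation))
            throw new IllegalStateException(
                "protocol violation: " + operation + " after " + last);
        last = operation;  // the interaction advances through the protocol
    }
}
```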
Garlan proposes using cookbooks of standard adaptation techniques, and tools that offer wrapper generation and data transformation [Gar96]. One of the most promising adaptation strategies is to publish partial components that can be completed at assembly time with encapsulated aspects that provide the code required for assembly. This strategy, called variable encapsulation, is described below.
Another important aspect of self description is certification. If components can be certified to have only specified dependencies, to consume only specified resources, to exhibit specified performance characteristics under certain conditions, or to be free of specified vulnerabilities, then it becomes possible to reason about the functional and operational characteristics of an assembly consisting of those components. This is being studied at the Software Engineering Institute (SEI) at Carnegie Mellon University (CMU), where it is known as Predictable Assembly From Certifiable Components (PACC) [SEI04].
We have seen that static encapsulation decreases the probability that a component can be used in a given assembly by statically binding its functional or intrinsic aspects with non-functional or contextual ones. Variable encapsulation reduces architectural mismatch by enabling the publication of partially encapsulated components that can be adapted to new contexts by selecting and weaving appropriate non-functional aspects with their functional ones, as illustrated in Figure 10. The form of a component in a specific assembly can then be determined by the context in which it is placed. This improves agility by making component boundaries more flexible, and reduces architectural mismatch by making it easy to adapt mismatched components. It also allows functional aspects to be refactored across component boundaries by removing non-functional assumptions from them. Valid adaptations can be identified in advance, and in some cases can even be performed automatically by tools, given a formal description of the context into which the finished component will be placed.
Figure 10. Variable encapsulation
Variable encapsulation is an adaptation of Aspect Oriented Programming (AOP). AOP is a method for separating and then recombining the different aspects of a system [KLM97]. Variable encapsulation differs from AOP in three ways:
- Variable encapsulation weaves encapsulated aspects, while AOP, as commonly practiced, weaves unencapsulated lines of source code. Weaving unencapsulated aspects produces the same problems as assembling poorly packaged components, namely architectural mismatch and unpredictable results. Indeed, source code-based aspect weaving is even more prone to these problems than component assembly, because components at least have interfaces that describe their behavior and some packaging that helps to prevent undocumented dependencies. The lack of packaging in AOP makes it hard for developers to reason about the compatibility of the aspects and the functional features into which they are woven, or about the characteristics of the resulting implementations, and almost impossible for tools to validate aspect weavings.
- AOP weaves aspects during component development, while variable encapsulation weaves them later, such as during component assembly or deployment. This is important because the contexts into which a component may be placed may not be known until long after the component has been published. In fact, in order to support development by assembly, as described by this article, third parties must be able to predictably assemble and deploy independently developed components. This requires a formal approach to aspect partitioning, encapsulation, specification and packaging. Variable encapsulation can also be progressive, meaning that it can occur in stages, with some aspects woven in each stage and some left unbound to be woven in later stages. For example, we might bind some aspects at assembly time, some at deployment time, and some at run time.
- Variable encapsulation is architecture driven, while AOP is not. The aspects separated from the functional kernel must be defined explicitly using an interface, an abstract class, a WSDL file, or some other form of contract.
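The three differences above can be illustrated with a small, hypothetical Python sketch: the aspect is defined by an explicit contract (an abstract class), the functional kernel is published partially encapsulated, and the weaving happens at assembly time rather than at development time. The class names are invented for illustration.

```python
from abc import ABC, abstractmethod

class LoggingAspect(ABC):
    """Architecture driven: the aspect is defined by an explicit contract."""
    @abstractmethod
    def before(self, operation: str) -> None: ...

class ConsoleLogging(LoggingAspect):
    """A contextual, non-functional aspect, encapsulated behind the contract."""
    def __init__(self):
        self.log = []
    def before(self, operation):
        self.log.append(f"calling {operation}")

class PartialOrderService:
    """Functional kernel published without a bound logging aspect."""
    def __init__(self):
        self._logging = None
    def bind_logging(self, aspect: LoggingAspect):
        # Weaving happens at assembly time, not at development time.
        self._logging = aspect
        return self
    def place_order(self, item: str) -> str:
        if self._logging:
            self._logging.before("place_order")
        return f"ordered {item}"

# Assembly time: select a contextual aspect and weave it with the kernel.
logging = ConsoleLogging()
service = PartialOrderService().bind_logging(logging)
assert service.place_order("widget") == "ordered widget"
assert logging.log == ["calling place_order"]
```

Because the aspect is woven through a declared contract rather than as unencapsulated source code, a tool can validate the weaving, and the binding could equally well be deferred to deployment or run time.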
Assembly by Orchestration
Given adequate contractual mechanisms, services can be assembled using an orchestration engine, such as Microsoft BizTalk Server, to manage the sequences of messages exchanged between them, as illustrated in Figure 11. Assembly by orchestration makes development by assembly much easier, since services have far fewer dependencies on one another than binary components. Unlike classes, they do not have to reside within the same executable. Unlike components that use platform specific protocols, they can be assembled across platform boundaries. Two services can interact if their contracts are compatible. They can be developed and deployed independently, and then assembled later by orchestration. If appropriate interception and redirection services are available, they can even reside in different administrative or organizational domains. In other words, assembly by orchestration eliminates design, compile and deployment time dependencies between components.
Figure 11. Assembly by orchestration
Assembly by orchestration is essentially mediation, as described by the Mediator pattern from Gamma, et al. [GHJV95]. A mediator orchestrates interactions among components that could not otherwise interact. A mediator has powerful properties. One of these is the ability to filter or translate information being passed between the interacting components. Another is the ability to monitor an interaction, maintaining state across multiple calls, if necessary. This allows mediators to reason about interactions, and to change them if necessary, using conditional logic. Mediators can also perform a variety of useful functions, such as logging, enforcing security policy, and bridging between different technologies or different versions of the same technology. A mediator can also become part of the functional portion of an assembly, enforcing business rules or performing business functions, such as brokering commercial transactions.
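A minimal sketch of these mediator properties, with invented names: the mediator sequences calls to two services that never reference each other, translates the data passed between them, and maintains state across the interaction.

```python
class Mediator:
    """Orchestrates two services that never reference each other directly."""
    def __init__(self, pricing, billing):
        self.pricing = pricing
        self.billing = billing
        self.audit = []                        # state maintained across calls
    def purchase(self, item, currency):
        price = self.pricing(item)             # step 1 of the orchestration
        if currency == "EUR":                  # translate between components
            price = round(price * 0.9, 2)
        self.audit.append((item, price))       # monitor the interaction
        return self.billing(price)             # step 2 of the orchestration

# Two independently developed "services", assembled only by the mediator.
pricing = {"widget": 10.0}.get
invoices = []
def billing(amount):
    invoices.append(amount)
    return f"invoiced {amount}"

m = Mediator(pricing, billing)
assert m.purchase("widget", "EUR") == "invoiced 9.0"
assert m.audit == [("widget", 9.0)]
```

An orchestration engine plays this role at much larger scale, with the message sequences expressed declaratively rather than in code.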
While preventing the assembly of mismatched components is better than constructing invalid assemblies, it does not necessarily promote the availability of well matched components. This is the purpose of architecture. According to Shaw and Garlan, a software architecture describes the components to be assembled, the interactions that can occur among them, and the acceptable patterns of composition [SG96]. This reduces the risk of architectural mismatch by imposing common assumptions and constraining design decisions.
Of course, developing software architectures is challenging. It takes most architects many years to become proficient with even a limited number of architectural styles or application domains. Development by assembly will not be realized on an industrial scale without significant progress in architectural practice, and much greater reliance on architecture in software development. These are the goals of architecture-driven development (ADD), which includes:
- Standards for describing, reviewing and using architectures that make them more portable across organizations, projects and technologies.
- Methods for predicting the effects of design decisions.
- Patterns or architectural styles that codify design expertise, helping designers to exploit recurring patterns of component partitioning.
An architectural style is a coarse grained pattern that provides an abstract framework for a family of systems. It defines a set of rules that specify the kinds of components that can be used to assemble a system, the kinds of relationships that can be used in their assembly, constraints on the way they are assembled, and assumptions about the meaning of an assembly. For example, a Web service composition style might specify components offering ports defined by Web services, and related by wires connecting ports, with the constraint that two ports may not be connected by a wire unless they are compatible, and with the assumption that communication over a wire uses SOAP over HTTP. Other examples of architectural styles include the dataflow, layered and model view controller styles.
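The Web service composition style just described can be sketched as executable rules: ports typed by the messages they carry, and a wiring operation that enforces the style's compatibility constraint. The names and the notion of message-type compatibility are simplifications for illustration.

```python
class Port:
    """A connection point typed by the kind of message it carries."""
    def __init__(self, name, message_type):
        self.name = name
        self.message_type = message_type

def wire(provided: Port, required: Port):
    """Style constraint: two ports may be wired only if compatible."""
    if provided.message_type != required.message_type:
        raise ValueError(
            f"cannot wire {provided.name} to {required.name}: "
            f"{provided.message_type} != {required.message_type}")
    return (provided.name, required.name)

orders_out = Port("OrderService.out", "OrderMessage")
orders_in = Port("FulfillmentService.in", "OrderMessage")
billing_in = Port("BillingService.in", "InvoiceMessage")

assert wire(orders_out, orders_in) == ("OrderService.out", "FulfillmentService.in")
try:
    wire(orders_out, billing_in)   # violates the style's constraint
    assert False, "expected the style to reject this wire"
except ValueError:
    pass
```

Tools built on such rules can refuse invalid assemblies mechanically, which is exactly the leverage the style is meant to provide.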
An architectural style improves partitioning and promotes design reuse by providing solutions with well known properties to frequently recurring problems. It also promotes
- Implementation reuse by identifying common structural elements shared by systems based on the style.
- Understanding by defining standard architectures (for example, client server).
- Interoperability by defining standard communication mechanisms (for example, Web services).
- Visualization by defining standard representations (for example, boxes and lines for Web service assembly).
- Tool development by defining constraints that must be enforced.
- Analysis by identifying salient characteristics of systems based on the style (for example, performance in a Web service assembly).
An architectural description is a document that defines a software architecture. An IEEE standard, 1471, Recommended Practice for Architectural Description of Software-Intensive Systems, provides guidelines for architectural description [IEEE1471]. According to these guidelines, a system has one or more stakeholders. A stakeholder has specific concerns or interests regarding some aspect of the system. In order to be useful to a community of stakeholders, an architectural description must have a format and organization that the stakeholders understand. An Architectural Description Standard (ADS) is a template for describing the architecture of a family of systems.
A viewpoint formally defines a perspective from which a given aspect of a software product can be described. It provides a pattern for producing such a description, defining its scope, purpose, and audience, and the conventions, languages, and methods used to produce it. The salient elements used to specify a viewpoint include:
- An identifier and other introductory information (for example, author, date, document reference, and so on).
- The concerns addressed by the viewpoint.
- The conventions, languages, and methods used to produce views based on the viewpoint.
- Consistency and completeness tests for conforming views.
A view describes a software product from a given viewpoint. A view is semantically closed, meaning that it describes the entire software product from that perspective. A view consists of one or more artifacts, each developed according to the requirements of the viewpoint. The view is an instance of the viewpoint, and must conform to the viewpoint in order to be well formed. A view that conforms to a Web page layout viewpoint, for example, should describe the Web page layout of a specific software product, and should use the notation defined by the viewpoint for describing Web page layouts. The salient elements used to specify a view include:
- An identifier and other introductory information (for example, author, date, document reference, and so on).
- The identifier of the viewpoint to which the view conforms.
- A description of the software product constructed using the conventions, languages, and methods defined by that viewpoint.
To understand the distinction between a view and its viewpoint, consider a logical database design for a business application. The logical database design is a view of the application, or more precisely, a view of one of its components. The aspect of the application and the language used to describe it are both prescribed by the logical database design viewpoint. Many different business applications could be described using the same viewpoint, yielding many different views, each describing a logical database design for some business application. The views would describe the same aspect and would use the same language, but they would have different content, since each would describe a different application. A view of an assembly can be decomposed into views of its components from the same viewpoint.
According to IEEE 1471, an architectural description must identify the viewpoints it uses and the rationale for using them. An architectural description standard (ADS) for a specific purpose can be defined by enumerating the set of viewpoints to be used. For example, an ADS for consumer to business Web applications might require one viewpoint for Web page layout and one for business data layout. Each view within an architectural description must conform to one of the viewpoints defined by the ADS on which it is based.
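These conformance rules can be sketched directly: an ADS enumerates viewpoints, each viewpoint carries its own conformance test, and a validator rejects any view that does not conform to a viewpoint in the ADS. The classes and tests below are invented for illustration, not taken from IEEE 1471 itself.

```python
class Viewpoint:
    """A perspective plus the rules for producing conforming views."""
    def __init__(self, identifier, concerns, conformance_test):
        self.identifier = identifier
        self.concerns = concerns
        self.conformance_test = conformance_test  # completeness/consistency

class View:
    """A description of one product from one viewpoint."""
    def __init__(self, viewpoint_id, artifacts):
        self.viewpoint_id = viewpoint_id
        self.artifacts = artifacts

def validate(ads, views):
    """Every view must conform to a viewpoint enumerated by the ADS."""
    viewpoints = {vp.identifier: vp for vp in ads}
    for view in views:
        vp = viewpoints.get(view.viewpoint_id)
        if vp is None or not vp.conformance_test(view):
            return False
    return True

# An ADS for consumer to business Web applications, as in the example above.
page_layout = Viewpoint(
    "web-page-layout", ["usability"],
    lambda v: all(a.endswith(".layout") for a in v.artifacts))
data_layout = Viewpoint(
    "business-data-layout", ["persistence"],
    lambda v: len(v.artifacts) > 0)

ads = [page_layout, data_layout]
views = [View("web-page-layout", ["checkout.layout"]),
         View("business-data-layout", ["orders.schema"])]
assert validate(ads, views)
assert not validate(ads, [View("deployment", ["topology.xml"])])  # not in ADS
```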
Process Frameworks
The key to process maturity is preserving agility while scaling up to the high complexity created by project size, geographical distribution, or the passage of time. Experience suggests that a small amount of structure increases agility by reducing the amount of work that must be done. This principle can be applied in the context of a software product family to manage complexity without reducing agility, using process frameworks.
One of the complaints against formal processes is that they are too abstract. The guidance they offer is obvious to a seasoned developer, but not concrete enough for a novice to apply. In order to add value, a process must be brought down to the specifics of the current project. Of course, since every project is unique in many ways, it is impossible to publish a process that meets the needs of all projects.
We know how to solve this kind of problem. Rather than throw out the baby with the bath water, we can specialize and tailor a formal process for a specific product family. The Unified Process (UP) would not have succeeded in the marketplace without customization vendors. Some of the vendors customize it for specific customers, often adding ingredients from other processes like XP. Others, especially system integrators and ISVs, tailor it to support a specific product or consulting practice. Either way, the key to using any process effectively is making it highly specific to a given project, so that it contains only immediately usable material, organized according to the needs of the project. The change produced by this type of customization is often quite profound, yielding a result that looks very little like the original. A highly focused process contains detailed information about the project, such as tool configuration, paths to network shares, developer start up instructions, pointers to documentation for APIs to be used by the team, names of key contact people for important process steps like CM, defect tracking or build, team policies regarding check-in, coding style and peer review, and many other details specific to the project and to the project team.
As with other forms of systematic reuse, this kind of customization makes sense only when we can use it more than once. When that is the case, however, it can be highly cost effective. Also, reusing highly focused process assets increases agility by eliminating work, just as reusing any other asset does. As Jacobson likes to say, the fastest way to build something is to reuse something that already exists, especially when the reusable asset is designed to be customizable and extensible. This is true for any wheel that can be systematically reused, rather than reinvented, including development processes.
A process framework decomposes a process into micro processes attached to the viewpoints of an ADS. Each micro process describes the requirements for producing a view based on the viewpoint to which it is attached. It can enumerate key decision points, identify the trade-offs associated with each decision point, describe required or optional activities, and enumerate the resources required and the results produced by each activity. Constraints are added to describe the preconditions that must be satisfied before an artifact is produced, the postconditions that must be satisfied after it is produced, and the invariants that must hold when the artifacts have stabilized. For example, we might require scoping and requirements capture to be performed before an iteration begins, and exit criteria to be satisfied before it can end. We might also require that all source code checked into source control build and unit test correctly. We call this structure a process framework because it defines a space of possible processes that could emerge, depending on the needs and circumstances of a given project, instead of prescribing one general purpose process for all projects.
Once a process framework is defined, micro processes can be stitched together to support any work flow that the project requires, including top down, bottom up, outside in, test then code, code then test, in any combination, or a totally ad hoc work flow, as long as the constraints are satisfied. The work flow can also be resource driven, allowing work to be scheduled to maximize overall throughput using methods like PERT and CPM, and to be started when the required resources become available. Many different kinds of resources can drive the scheduling, including artifacts like requirements and source code, people like developers or program managers, products like configuration management or defect tracking systems, and facilities like the opening of a port or the allocation of storage on a server. This is called constraint-based scheduling.
Constraint-based scheduling balances the need for agility with the need for a small amount of structure. Instead of prescribing a process, constraint-based scheduling provides guidance, and imposes constraints on development artifacts. This fosters agility by letting a work flow arise dynamically and opportunistically within the constraints, adapting to the innumerable variations in circumstances that will inevitably occur, while at the same time incorporating lessons learned to minimize the amount of time and money wasted on rediscovering existing knowledge.
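As a minimal sketch of constraint-based scheduling, consider activities declared only by their preconditions (resources consumed) and postconditions (resources produced). The work flow is not prescribed; any activity whose constraints are satisfied may run, and an ordering emerges dynamically. The activity names and resource labels are hypothetical.

```python
def schedule(activities, available):
    """Run any activity whose preconditions are met, in any order,
    until no more can run. The work flow emerges from the constraints."""
    done, remaining = [], dict(activities)
    progress = True
    while remaining and progress:
        progress = False
        for name, (preconditions, produces) in list(remaining.items()):
            if preconditions <= available:      # constraints satisfied
                available |= produces           # postconditions now hold
                done.append(name)
                del remaining[name]
                progress = True
    return done

# Each activity: (required resources, produced resources).
activities = {
    "capture requirements": (set(),               {"requirements"}),
    "write code":           ({"requirements"},    {"source"}),
    "write tests":          ({"requirements"},    {"tests"}),
    "run build":            ({"source", "tests"}, {"build"}),
}

order = schedule(activities, set())
# Any ordering consistent with the constraints is valid; the build runs last.
assert order[-1] == "run build"
assert set(order) == set(activities)
```

Coding and testing may happen in either order, or in parallel on a real team; only the declared constraints, not a prescribed sequence, shape the result.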
Note that a process framework need not be large or elaborate, and may contain as much or as little detail as desired. This provides a way to scale a process up or down, depending on the circumstances. For example, a small and highly agile team might use a minimal framework that provides only a few key practice recommendations, such as XP. A larger organization, on the other hand, might add quite a lot of detail around the build process, check in procedures, testing procedures or rules governing shared components.
This article has described chronic problems that are slowing the inevitable transition from craftsmanship to manufacturing, and critical innovations that can help us overcome these problems. While each of these critical innovations holds significant promise, none of them is sufficient, on its own, to catalyze the transition. The key is to integrate the critical innovations into a coherent methodology. This is the goal of Software Factories, as described in the next article in this series.
A great many issues are discussed only briefly in this article, probably leaving the reader wanting evidence or more detailed discussion. This subject is covered, with much more detailed discussion, in the book Software Factories: Assembling Applications with Patterns, Models, Frameworks and Tools, by Jack Greenfield and Keith Short, from John Wiley and Sons.
Copyright © 2004 by Jack Greenfield. Portions copyright © 2003 by Jack Greenfield and Keith Short, and reproduced by permission of Wiley Publishing, Inc. All rights reserved.
References
[CN01] P. Clements and L. Northrop. Software Product Lines : Practices and Patterns. Addison-Wesley, 2001. ISBN: 0201703327
[Coo04] S. Cook. Domain-Specific Modeling and Model Driven Architecture. http://www.bptrends.com/publicationfiles/01-04%20COL%20Dom%20Spec%20Modeling%20Frankel-Cook.pdf.
[Fow04] M. Fowler. Unwanted Modeling Language. http://martinfowler.com/bliki/UnwantedModelingLanguage.html.
[GAO95] D. Garlan, R. Allen, J. Ockerbloom. Architectural Mismatch: Why Reuse Is So Hard. IEEE Software 12, 6, 1995.
[Gar96] D. Garlan. Style-Based Refinement for Software Architecture. Proceedings of the Second International Software Architecture Workshop, 1996.
[GHJV95] E. Gamma, R. Helm, R. Johnson and J. Vlissides. Design Patterns, Elements of Reusable Object-Oriented Software. Addison-Wesley, 1995. ISBN 0-201-63361-2.
[HW03] G. Hohpe and B. Woolf. Enterprise Integration Patterns. Addison-Wesley, 2003.
[IEEE1471] 1471-2000 IEEE Recommended Practice for Architectural Description of Software-Intensive Systems. ISBN 0-7381-2519-9.
[Jac00] M. Jackson. Problem Frames: Analyzing and Structuring Software Development Problems. Addison-Wesley, 2000. ISBN: 020159627X.
[KLM97] G. Kiczales, J. Lamping, A. Mendhekar, C. Lopes, J. Loingtier, J. Irwin. Aspect Oriented Programming. Proceedings of the European Conference on Object-Oriented Programming. M. Aksit and S. Matsuoka (eds.). Springer-Verlag, 1997.
[MAJ3] Microsoft Architects Journal, Issue 3, July 2004. http://msdn.microsoft.com/architecture/journal/default.aspx.
[Par76] D. Parnas. On the Design and Development of Program Families. IEEE Transactions on Software Engineering, March 1976.
[SEI04] Predictable Assembly from Certifiable Components (PACC). http://www.sei.cmu.edu/pacc/index.html.
[SG96] M. Shaw and D. Garlan. Software Architecture: Perspectives On An Emerging Discipline. Prentice Hall, 1996.
About the author
Jack Greenfield is an Architect for Enterprise Frameworks and Tools at Microsoft. He was previously Chief Architect, Practitioner Desktop Group, at Rational Software Corporation, and Founder and CTO of InLine Software Corporation. At NeXT, he developed the Enterprise Objects Framework, now part of Apple WebObjects. A well known speaker and writer, he also contributed to UML, J2EE, and related OMG and JSP specifications. He holds a B.S. in Physics from George Mason University. Jack can be reached at email@example.com.