Software Factories: Assembling Applications with Patterns, Models, Frameworks, and Tools
Summary: The third in a four-part series of articles about software factories, focusing on overcoming the chronic problems explored in the last article, and integrating critical innovations into a coherent methodology for Software Factories. (14 printed pages)
This paper is about a methodology developed at Microsoft that we call Software Factories. In a nutshell, a Software Factory is a development environment configured to support the rapid development of a specific type of application. While Software Factories are really just the logical next step in the continuing evolution of software development methods and practices, they promise to change the character of the software industry by introducing patterns of industrialization.
This is the third in a four-part series of articles about software factories, focusing on the Software Factories methodology. The first article described the forces that will drive a transition from craftsmanship to manufacturing. The second described the chronic problems that are preventing the transition and the critical innovations that will help us overcome them. The fourth and final article answers some frequently asked questions about software factories.
In the previous article, we looked at chronic problems that are preventing the transition and critical innovations that will help us overcome them. While each of these critical innovations holds significant promise, none of them is sufficient, on its own, to catalyze a transition from craftsmanship to manufacturing. The key is to integrate the critical innovations into a coherent methodology. This is the goal of Software Factories. Before we can talk about Software Factories, we must introduce software factory schemas and software factory templates.
Software Factory Schemas
A software factory schema is a document that categorizes and summarizes, in an orderly way, the artifacts used to build and maintain a system, such as XML documents, models, configuration files, build scripts, source code files, SQL files, localization files, deployment manifests and test case definitions. It also defines relationships between these artifacts, so that we can maintain consistency among them.
A common approach is to use a grid, like the one illustrated in Figure 1. The columns define concerns, while the rows define levels of abstraction. Each cell defines a perspective or viewpoint from which we can build some aspect of the software. For example, for a 3-tiered application, one cell might define an abstract viewpoint on the presentation tier, a second might define an abstract viewpoint on the data tier, and a third might define a concrete viewpoint on the data tier, used for developing database schemas. Once the grid has been defined, we can populate it with views that comprise the development artifacts for a specific software product.
Figure 1. A Grid Describing A Product Family
Of course, a grid can be used to build more than one software product. Before it has been populated with specific views, it defines the bill of materials required to build the members of a software product family. Taking a cue from software product lines, we can go one step further and add information to each cell, identifying production assets used to build the views required by each viewpoint, including DSLs, patterns, frameworks, tools, and micro processes.
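To make the grid concrete, here is a minimal Python sketch, assuming a grid keyed by (concern, level of abstraction) pairs. The cell labels, viewpoint names and asset names are hypothetical. Before any views exist, the grid yields the bill of materials for the family; populating it for one product maps each cell to that product's artifacts.

```python
from dataclasses import dataclass, field


@dataclass
class Viewpoint:
    """One cell of the grid: a perspective from which some aspect of the software is built."""
    name: str
    # Production assets used to build conforming views (DSLs, patterns, frameworks, tools).
    production_assets: list = field(default_factory=list)


# Columns are concerns, rows are levels of abstraction (hypothetical labels).
grid = {
    ("presentation", "abstract"): Viewpoint("User Interaction", ["UI flow DSL"]),
    ("data", "abstract"): Viewpoint("Logical Data", ["entity DSL"]),
    ("data", "concrete"): Viewpoint("Database Schema", ["SQL DDL", "schema designer"]),
}


def bill_of_materials(grid):
    """Before the grid is populated, it lists the assets needed to build any family member."""
    return sorted({asset for vp in grid.values() for asset in vp.production_assets})


def populate(grid, views):
    """Populate the grid for one product: map every cell to the views built from it."""
    unknown = set(views) - set(grid)
    if unknown:
        raise KeyError(f"views supplied for undefined cells: {sorted(unknown)}")
    return {cell: views.get(cell, []) for cell in grid}


print(bill_of_materials(grid))
# ['SQL DDL', 'UI flow DSL', 'entity DSL', 'schema designer']
```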
The grid itself is not an innovation. What is novel is applying the grid to a product family, identifying production assets for each cell, and defining mappings between and within the cells that can be used to fully or partially automate model transformations, code generation, pattern application, test harness construction, user interface layout, database schema design and many other development tasks. As we have seen, to provide this automation we must use first-class development artifacts based on high-fidelity languages, like XML, C# and SQL. For models, this means using DSLs, not general-purpose modeling languages designed for documentation. In some cases the artifacts described by viewpoints are models, but frequently they are not. They can be any source artifacts based on a formal language, such as high-level workflow scripts, general-purpose programming language source code files, WSDL files, or SQL DDL files.
Note that the viewpoints define not only the languages used to develop the views, but also requirements on the views, usually expressed as constraints or patterns. For example, a software factory schema might contain two viewpoints that both use the same class modeling DSL. We might require all of the classes modeled from one of these viewpoints to inherit, either directly or indirectly, from classes supplied by a user interface widget framework associated with the viewpoint. Similarly, we might require all of the classes modeled from the other viewpoint to play defined roles in one of several business entity implementation patterns associated with the viewpoint. In a real software factory schema, the same languages are typically used by multiple viewpoints, especially viewpoints close to the platform that are based on general-purpose programming languages.
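A constraint of the first kind can be checked mechanically. The sketch below is a simplification: Python classes stand in for model elements of the class modeling DSL, and `Widget` and `Button` are hypothetical framework classes, not part of any real library. `issubclass` covers both direct and indirect inheritance.

```python
class Widget:          # hypothetical base class from a UI widget framework
    pass


class Button(Widget):  # hypothetical framework subclass
    pass


def check_framework_inheritance(classes, framework_bases):
    """Return names of modeled classes that violate the viewpoint's constraint:
    every class must inherit, directly or indirectly, from a framework class."""
    return [cls.__name__ for cls in classes
            if not any(issubclass(cls, base) for base in framework_bases)]


class SubmitButton(Button):  # indirect subclass of Widget: conforms
    pass


class OrderSummary:          # does not derive from the framework: violates
    pass


print(check_framework_inheritance([SubmitButton, OrderSummary], [Widget]))
# ['OrderSummary']
```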
Software Factory Schemas as Graphs
Of course, even a highly detailed grid is a gross oversimplification of reality. A software factory schema is better represented as a directed graph whose nodes are viewpoints and whose edges are computable relationships between viewpoints called mappings. This allows nodes that would not be adjacent in a grid representation to be related. It also relaxes the artificial constraint, imposed by a grid, that the viewpoints must fit into neat classification schemes of rows and columns. Finally, and most importantly, it allows the schema to reflect the software architecture. For example, a schema for a family of business applications might contain several clusters of viewpoints, one for each subsystem, such as customer management, catalog management, or order fulfillment. The viewpoints in each cluster might then be further grouped into subsets reflecting the layered architecture of each subsystem, as illustrated in Figure 2.
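The graph form can be sketched in a few lines of Python. This is a minimal illustration, assuming mappings are plain functions from one view (a dictionary) to another; the subsystem and layer labels, viewpoint names and the one-table-per-entity mapping are all hypothetical.

```python
from typing import Callable, Dict, Tuple

# A mapping is a computable relationship: it derives (part of) a target view from a source view.
Mapping = Callable[[dict], dict]


class SchemaGraph:
    """Directed graph: nodes are viewpoints, edges are mappings between them."""

    def __init__(self):
        self.viewpoints: Dict[str, Tuple[str, str]] = {}  # name -> (subsystem, layer)
        self.mappings: Dict[Tuple[str, str], Mapping] = {}

    def add_viewpoint(self, name, subsystem, layer):
        self.viewpoints[name] = (subsystem, layer)

    def add_mapping(self, source, target, fn):
        # Any two viewpoints may be related, whether or not they would be adjacent in a grid.
        if source not in self.viewpoints or target not in self.viewpoints:
            raise KeyError("both ends of a mapping must be viewpoints in the schema")
        self.mappings[(source, target)] = fn

    def apply(self, source, target, view):
        """Compute a target view from a source view along one edge."""
        return self.mappings[(source, target)](view)

    def cluster(self, subsystem):
        """Viewpoints grouped under one subsystem, reflecting the architecture."""
        return sorted(n for n, (s, _) in self.viewpoints.items() if s == subsystem)


schema = SchemaGraph()
schema.add_viewpoint("Order Entities", "order fulfillment", "business logic")
schema.add_viewpoint("Order Tables", "order fulfillment", "data access")
schema.add_viewpoint("Catalog Pages", "catalog management", "presentation")
# Hypothetical mapping: one table per business entity.
schema.add_mapping("Order Entities", "Order Tables",
                   lambda view: {"tables": [e.lower() + "s" for e in view["entities"]]})

print(schema.apply("Order Entities", "Order Tables", {"entities": ["Order", "LineItem"]}))
# {'tables': ['orders', 'lineitems']}
```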
A software factory schema describes the artifacts that comprise a software product, just as an XML schema describes the elements and attributes that comprise a document, and a database schema describes the tables and columns that comprise a database. Like an Architectural Description Standard (ADS), a software factory schema is a template for describing the members of a software product family. Despite this similarity, however, there are several major differences between a software factory schema and an ADS:
- While an ADS deals only with architecture, a software factory schema deals with many other aspects of a software product family, such as requirements, executables, source code, test harnesses and deployment artifacts.
- While an ADS organizes design documentation, a software factory schema organizes development artifacts.
- While an ADS implies a software product family, it does not explicitly identify one, or incorporate mechanisms to support family-based development, such as a way to express how the members of the family differ from a family archetype. A software factory schema, on the other hand, targets a specific software product family and can be instantiated and customized to describe a specific family member in terms of its differences from the family archetype.
- While an ADS does not necessarily support automation, a software factory schema can be implemented by a software factory template to automate software development tasks, as we shall see shortly.
Of course, the essential property of a software factory schema is that it provides a multidimensional separation of concerns based on various aspects of the artifacts being organized, such as their level of abstraction, position within the architecture, functionality or operational qualities. According to Coplien:
We can analyze the application domain using principles of commonality and variation to divide it into subdomains, each of which may be suitable for design under a specific paradigm. [Cop99]
We can now see the grid as a two-dimensional projection of the graph that plots one or more aspects on the horizontal axis and different levels of abstraction on the vertical axis. Another two-dimensional projection is an aspect plane, which projects related viewpoints onto part of the product architecture, providing a consolidated view from that viewpoint. Examples of aspect planes include logical data, security policy and transaction planes.
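Both projections can be sketched over a set of tagged viewpoints. This is a simplified Python illustration under stated assumptions: each hypothetical viewpoint carries a "function" aspect, a "level" of abstraction and the aspect planes it contributes to, and an aspect plane is reduced to the set of viewpoints tagged with it. The projection keeps only where each viewpoint lands; mappings between viewpoints are not represented.

```python
from collections import defaultdict

# Hypothetical viewpoints, each tagged with several aspects of the artifacts it organizes.
viewpoints = {
    "Business Entities": {"function": "catalog", "level": "conceptual", "planes": {"logical data"}},
    "Catalog Tables": {"function": "catalog", "level": "physical", "planes": {"logical data"}},
    "Checkout Pages": {"function": "ordering", "level": "physical", "planes": {"security policy"}},
    "Payment Service": {"function": "ordering", "level": "logical",
                        "planes": {"security policy", "transaction"}},
}


def project_to_grid(viewpoints, column_aspect, row_aspect="level"):
    """Two-dimensional projection: (row, column) cell -> viewpoints that fall in it."""
    grid = defaultdict(list)
    for name, tags in viewpoints.items():
        grid[(tags[row_aspect], tags[column_aspect])].append(name)
    return dict(grid)


def aspect_plane(viewpoints, plane):
    """Simplified aspect plane: the viewpoints that contribute to one cross-cutting concern."""
    return sorted(name for name, tags in viewpoints.items() if plane in tags["planes"])


print(aspect_plane(viewpoints, "security policy"))
# ['Checkout Pages', 'Payment Service']
```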
A software factory schema essentially defines a recipe for building members of a software product family. The viewpoints describe the ingredients and the tools used to prepare them, but where is the process of preparing them described? Recall that a process framework is constructed by attaching a micro process to each viewpoint, describing the development of conforming views, and by defining constraints, such as preconditions that must be satisfied before a view is produced, postconditions that must be satisfied after it is produced, and invariants that must hold when the views have stabilized. This framework defines the space of possible processes that could emerge, depending on the needs and circumstances of a given project. There is a clear resemblance between a process framework and a software factory schema. The viewpoints of a software factory schema already define micro processes for producing the artifacts they describe. Adding constraints that govern the order of execution makes a software factory schema a process framework as well. We now have a recipe for the members of a product family. It defines the ingredients, the tools used to prepare them and the process of preparing them.
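The constraints that turn a schema into a process framework can be sketched as checks around view production. This is a minimal Python sketch, assuming preconditions are expressed as the set of viewpoints whose views must already exist, postconditions and invariants as predicates, and views as plain dictionaries; all viewpoint names and the example invariant are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Set


@dataclass
class ProcessStep:
    """A viewpoint's micro process plus constraints on when and how its view is produced."""
    viewpoint: str
    requires: Set[str] = field(default_factory=set)            # precondition
    postcondition: Callable[[dict], bool] = lambda view: True  # must hold afterwards


def ready(steps: Dict[str, ProcessStep], produced: dict):
    """Viewpoints whose preconditions hold and whose views are not yet produced."""
    return sorted(s.viewpoint for s in steps.values()
                  if s.viewpoint not in produced and s.requires <= set(produced))


def produce(steps, produced, viewpoint, view):
    """Record a view only if the viewpoint's pre- and postconditions are satisfied."""
    step = steps[viewpoint]
    missing = step.requires - set(produced)
    if missing:
        raise RuntimeError(f"precondition failed for {viewpoint}: missing {sorted(missing)}")
    if not step.postcondition(view):
        raise ValueError(f"postcondition failed for {viewpoint}")
    produced[viewpoint] = view


def stable(produced, invariants):
    """Check the invariants that must hold once the views have stabilized."""
    return all(check(produced) for check in invariants)


def every_entity_has_table(views):
    """Example invariant (hypothetical): every business entity has a table."""
    return set(views["Business Entities"]["entities"]) <= set(views["Logical Data"]["tables"])


steps = {
    "Business Entities": ProcessStep("Business Entities",
                                     postcondition=lambda v: len(v["entities"]) > 0),
    "Logical Data": ProcessStep("Logical Data", requires={"Business Entities"}),
}
produced = {}
print(ready(steps, produced))  # ['Business Entities']
produce(steps, produced, "Business Entities", {"entities": ["Order"]})
print(ready(steps, produced))  # ['Logical Data']
produce(steps, produced, "Logical Data", {"tables": ["Order"]})
print(stable(produced, [every_entity_has_table]))  # True
```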
Figure 2. A Software Schema
Customizing A Software Factory Schema
An important aspect of using a software factory schema is customization. A raw software factory schema is a recipe for building a family of software products. However, for any given project, of course, we only care about one software product. We must therefore customize the software factory schema to create a recipe for building that specific member of the product family. This is similar to what cooks do when they modify a recipe before using it, taking into account special requirements, such as the number of servings to be made, the actual ingredients or tools available, or the amount of time that will elapse before the meal is served.
Software factory schema customization is actually quite straightforward. In order to support it, a software factory schema must contain both fixed and variable parts. The fixed parts remain the same for all members of the product family, while the variable parts change to accommodate the unique requirements of each specific member. A software factory schema for Online Order Entry Systems, for example, might include viewpoints that describe the artifacts and tools used to build personalization subsystems. For some members of the family, however, a personalization subsystem may not be required. Before we set out to build such a system, we would first remove the viewpoints related to personalization, configuring our project structure and tools accordingly. Different parts of the customization can be performed at different times, according to the needs of the project. Some parts, such as adding or dropping whole viewpoints and the relationships between them, might be performed up front, based on major variations in requirements, such as dropping personalization. Other parts, such as modifying viewpoints and the relationships between them, might be performed incrementally as the project progresses, based on more fine-grained variations in requirements, such as deciding what mechanisms to use to drive the user interface.
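The up-front part of this customization can be sketched as filtering the schema. In this minimal Python sketch, each hypothetical viewpoint is either fixed (no feature) or variable (tied to a feature such as personalization); dropping a variable viewpoint also drops every relationship that touches it. Incremental modification of viewpoints is not shown.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Viewpoint:
    name: str
    feature: Optional[str] = None  # None marks a fixed part shared by every family member


def customize(viewpoints, relationships, selected_features):
    """Up-front customization: drop variable viewpoints whose feature is not selected,
    together with every relationship that touches a dropped viewpoint."""
    kept = [vp for vp in viewpoints
            if vp.feature is None or vp.feature in selected_features]
    names = {vp.name for vp in kept}
    kept_relationships = [(a, b) for a, b in relationships if a in names and b in names]
    return kept, kept_relationships


# Hypothetical fragment of a schema for Online Order Entry Systems.
family = [
    Viewpoint("Order Entry UI"),
    Viewpoint("Order Processing"),
    Viewpoint("Personalization Rules", feature="personalization"),
    Viewpoint("User Profiles", feature="personalization"),
]
relationships = [
    ("Order Entry UI", "Order Processing"),
    ("Personalization Rules", "Order Entry UI"),
    ("User Profiles", "Personalization Rules"),
]

member, member_relationships = customize(family, relationships, selected_features=set())
print([vp.name for vp in member])  # ['Order Entry UI', 'Order Processing']
print(member_relationships)        # [('Order Entry UI', 'Order Processing')]
```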
Now, all of this is quite appealing, but at this point, it is all on paper. If all we have is the software factory schema, then we can describe the assets used to build family members, but we do not actually have the assets. Before we can build any family members, we must implement the software factory schema, defining the DSLs, patterns, frameworks and tools it describes, packaging them, and making them available to product developers. Collectively, these assets form a software factory template.
A software factory template includes code and metadata that can be loaded into extensible tools, like an Integrated Development Environment (IDE) or an enterprise life cycle tool suite, to automate the development and maintenance of family members. We call it a software factory template because it configures the tools to produce a specific type of software, just as a document template loaded into a tool like Microsoft Word or Excel configures it to produce a specific type of document.
Like a software factory schema, a software factory template must be customized for a specific family member. However, while customizing a software factory schema customizes the description of the software factory for the family member, customizing a template customizes the assets used to build the family member. The software factory template is generally customized at the same time as the software factory schema, by adding, dropping or modifying the assets associated with the viewpoints that are added, dropped or modified.
Examples of software factory template customization include creating projects for the subsystems and components to be developed, populating a palette with patterns to be applied, setting up references to libraries to be used, and configuring builds. For example, enabling content personalization for an online commerce application might cause the following things to happen:
- The viewpoint used to configure personalization is added to the schema.
- A personalization configuration tool appears in the IDE menu.
- A folder for the personalization subsystem is added to the project for the application.
- A framework and patterns for personalization are imported into the project.
- The Front Controller pattern is applied automatically in the transformation between the user interaction model and the web front end design model, and appears in the web front end designer, instead of the Page Controller pattern, so that the application will vector to different pages for different users, instead of showing the same content to all users.
- Policy on the project folder for the presentation layer is modified so that we cannot create classes that derive from PageController.
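The effects listed above can be modeled as actions registered against a feature and run when it is enabled. This Python sketch is tool-neutral and hypothetical: `Project` is a stand-in for IDE and project state, not the API of Visual Studio or any other real tool, and the asset names are illustrative only.

```python
from dataclasses import dataclass, field


@dataclass
class Project:
    """Hypothetical, tool-neutral stand-in for the IDE and project state a template configures."""
    viewpoints: set = field(default_factory=set)
    menu_items: list = field(default_factory=list)
    folders: list = field(default_factory=list)
    references: list = field(default_factory=list)
    controller_pattern: str = "PageController"
    forbidden_base_classes: set = field(default_factory=set)


# Each feature maps to the template actions performed when it is enabled,
# mirroring the personalization example in the text (all names hypothetical).
TEMPLATE_ACTIONS = {
    "personalization": [
        lambda p: p.viewpoints.add("Personalization Configuration"),
        lambda p: p.menu_items.append("Personalization Configuration Tool"),
        lambda p: p.folders.append("Personalization"),
        lambda p: p.references.append("PersonalizationFramework"),
        lambda p: setattr(p, "controller_pattern", "FrontController"),
        lambda p: p.forbidden_base_classes.add("PageController"),
    ],
}


def enable_feature(project, feature):
    """Run every template action registered for a feature, in order."""
    for action in TEMPLATE_ACTIONS.get(feature, []):
        action(project)
    return project


project = enable_feature(Project(), "personalization")
print(project.controller_pattern)  # FrontController
```

Keeping the actions in a table keyed by feature keeps the template declarative, so new features can be supported by registering actions rather than editing the enabling logic.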
We can now define a software factory as:
A software product line that provides a production facility for the product family by configuring extensible tools using a software factory template based on a software factory schema.
A schematic of a Software Factory is shown in Figure 3.
Figure 3. A Software Factory
Of course, the goal of building a Software Factory is to rapidly develop members of a product family. Building a product using a Software Factory involves the following activities:
- Problem Analysis. This determines whether the product is in scope for the Software Factory. Depending on the fit, we may choose to build some or all of it outside the Software Factory. We may still choose to build a product that fits poorly within the Software Factory; in some cases, doing so can lead us to change the software factory schema and template so that they better accommodate such parts in future products.
- Product Specification. This defines the product requirements in terms of differences from the product line requirements. A range of product specification mechanisms can be used, depending on the extent of the differences in requirements, including property sheets, wizards, feature models, visual models and structured prose.
- Product Design. This maps the differences in requirements to differences in the product line architecture and the product development process, producing a product architecture and a customized product development process.
- Product Implementation. This involves familiar activities, such as component and unit test development, builds, unit test execution, and component assembly. A range of mechanisms can be used to develop the implementation, depending on the extent of the differences, such as property sheets, wizards and feature models that configure components, visual models that assemble components and generate other artifacts like models, code and configuration files, and source code that completes framework extension points, or that creates, modifies, extends or adapts components.
- Product Deployment. This involves creating or reusing default deployment constraints, logical host configurations and executable-to-logical-host mappings; provisioning facilities; validating host configurations; reconfiguring hosts by installing and configuring the required resources; and installing and configuring the executables being deployed.
- Product Testing. This involves creating or reusing test assets, including test cases, test harnesses, test data sets and test scripts, and applying instrumentation and measurement tools.
The production assets for the product, including the requirements, development process, architecture, components, deployment configuration and tests, are specified, reused, managed and organized from the viewpoints defined by the customized software factory schema.
A range of mechanisms can be used for product specification and implementation, depending on the extent of the variation required. This lets us move from the totally open-ended hand coding that forms the bulk of software development today to more constrained activities. Czarnecki, Helsen and Eisenecker [CHE04] describe a spectrum of variability mechanisms that ranges from routine configuration, through staged configuration based on feature models, to creative construction using models, as illustrated in Figure 4. Hand coding, which is not shown, would appear to the right of models on this spectrum. Patterns, which are also not shown, would appear between models and hand coding.
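Staged configuration can be illustrated with a deliberately simplified Python sketch. It is not the formalism of [CHE04]: it assumes a flat set of optional features with "requires" and "excludes" constraints (all names hypothetical). Each stage binds some features on or off and leaves the rest open for later stages; constraints are checked only among features that are already bound.

```python
# Hypothetical feature model: each optional feature may require or exclude others.
FEATURES = {
    "personalization": {"requires": {"user_profiles"}, "excludes": set()},
    "user_profiles": {"requires": set(), "excludes": set()},
    "anonymous_checkout": {"requires": set(), "excludes": {"user_profiles"}},
}


def configure_stage(bindings, decisions):
    """Apply one configuration stage and return the new bindings.

    Raises ValueError if a decision contradicts an earlier binding, or if a
    feature bound on has a required feature bound off or an excluded feature bound on.
    """
    new = dict(bindings)
    for feature, selected in decisions.items():
        if feature in new and new[feature] != selected:
            raise ValueError(f"{feature} already bound to {new[feature]}")
        new[feature] = selected
    for feature, on in new.items():
        if not on:
            continue
        for req in FEATURES[feature]["requires"]:
            if new.get(req) is False:
                raise ValueError(f"{feature} requires {req}")
        for exc in FEATURES[feature]["excludes"]:
            if new.get(exc) is True:
                raise ValueError(f"{feature} excludes {exc}")
    return new


def open_features(bindings):
    """Features still variable after the stages applied so far."""
    return sorted(set(FEATURES) - set(bindings))


stage1 = configure_stage({}, {"personalization": True})
print(open_features(stage1))  # ['anonymous_checkout', 'user_profiles']
stage2 = configure_stage(stage1, {"user_profiles": True, "anonymous_checkout": False})
print(open_features(stage2))  # [] -- fully configured
```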
Figure 4. Variability Mechanisms
Product development is organized and constrained by the development process that was defined when the Software Factory was developed. This process is customized during the customization of the software factory schema for a specific family member.
Since a software factory schema is a process framework, workflow is not prescribed. We can work in any way we choose, according to the needs of the project, the environment and the circumstances that arise, provided that the process constraints defined in the software factory schema are satisfied.
- We can work top down from requirements to executables, keeping related artifacts synchronized. For example, we might develop a business entity model, use it to produce a logical data model and then use that to produce an optimized database schema.
- We can leverage constraints defined by horizontal mappings between viewpoints. For example, we might use information about the deployment environment, such as the available protocols, to constrain designs of service interactions, to ensure that implementations of the services will deploy correctly into the execution environment.
- We can work bottom up, developing test harnesses for various parts of the product first, and testing components as we develop them.
When the software factory schema is completely populated, the process is complete. Note that most of the work that would ordinarily be bottom-up development occurs during the development of the Software Factory, as languages, patterns, frameworks and components are created to supply abstractions for product development. A certain amount of bottom-up work will always occur in product development, however, since each product has unique requirements that may not be addressed by the preexisting abstractions. Once a customized software factory schema has been adequately populated, meaning that we have developed artifacts for a working subset of the viewpoints, we should be able to rapidly modify the implementation to accommodate changing requirements. This rapid feedback allows us to scale up to significant levels of complexity without losing agility.
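The freedom to choose a workflow, subject to the schema's constraints, can be sketched as a check on a proposed order of work. This Python sketch simplifies the process constraints to hypothetical ordering pairs; both a top-down order and a tests-first order are accepted, and the process is reported complete only when every viewpoint in the customized schema has a view.

```python
def check_workflow(order, viewpoints, precedes):
    """Check a proposed order of work against a schema's ordering constraints.

    `precedes` holds (a, b) pairs: the view for `a` must exist before work on `b`
    starts. Any order that respects the pairs is acceptable; the process is
    complete when every viewpoint in the customized schema has a view.
    """
    done = set()
    for vp in order:
        if vp not in viewpoints:
            raise KeyError(f"{vp} is not a viewpoint in the customized schema")
        unmet = sorted(a for a, b in precedes if b == vp and a not in done)
        if unmet:
            return False, f"{vp} started before {unmet}"
        done.add(vp)
    remaining = sorted(set(viewpoints) - done)
    if remaining:
        return True, f"incomplete: {remaining}"
    return True, "complete"


# Hypothetical customized schema and ordering constraints.
viewpoints = {"Business Entities", "Logical Data", "Database Schema", "Test Harness"}
precedes = {("Business Entities", "Logical Data"), ("Logical Data", "Database Schema")}

top_down = ["Business Entities", "Logical Data", "Database Schema", "Test Harness"]
tests_first = ["Test Harness", "Business Entities", "Logical Data", "Database Schema"]
print(check_workflow(top_down, viewpoints, precedes))     # (True, 'complete')
print(check_workflow(tests_first, viewpoints, precedes))  # (True, 'complete')
```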
In a mature software factory, application development consists primarily of component selection, customization, adaptation, extension and assembly. Instead of writing large amounts of new code, application developers obtain most of the required functionality from existing components. Case studies documented by Clements and Northrop [CN01], Bosch [Bos98], Weiss and Lai [WL99], and others suggest that less than a third of each product is developed from scratch, and that even higher levels of reuse can be sustained in some cases.
Looking around the industry today, however, we see only a few cases of systematic family-based product development, most notably packaged enterprise applications. Why don't we see more? One reason may be lack of knowledge. Product line practices are not well understood in the software industry. Another reason may be inertia. Established practices are often hard to replace, especially when not only technical, but business and organizational changes are required. Perhaps the most significant reason, however, is the high cost of developing the reusable assets, especially the tools. One of the keys to reducing the cost is avoiding the mistake of trying to build everything in-house.
As product developers increasingly depend on external suppliers, supply chains will emerge, as they have in other industries. According to Lee and Billington [LB94], a supply chain is a network that starts with raw materials, transforms them into intermediate goods, and then into final products delivered to customers through a distribution system. Each participant consumes logical and/or physical products from one or more upstream suppliers, adds value, usually by applying them to develop more complex products, and supplies the results to downstream consumers.
In other words, suppliers are connected, so that outputs from upstream suppliers become inputs to downstream suppliers. Upstream suppliers can supply implementation assets, such as components, or process assets, such as tools and process documentation, to downstream ones. There are important differences between these two types of assets in a supply chain. Since products produced by downstream suppliers incorporate implementation assets supplied by upstream ones, the downstream suppliers have narrower scope and produce larger products than the upstream ones. For example, a supplier of Online Commerce Systems that consumes a user interface framework has a narrower focus and produces larger products than the supplier of the user interface framework. This may not hold for process assets. The user interface framework supplier, for example, may consume configuration management and defect tracking tools that are both larger and narrower in scope than the user interface frameworks it supplies.
Products developed by a supply chain span organizations, separating consumers from suppliers. Common components are commoditized and custom suppliers specializing in a wide variety of domains appear, making products that would be too expensive for a single supplier to build readily available. Requirements become the focus of customer relationship management. Service level agreements govern transactions between consumers and suppliers. Repairs and assistance are provided under warranty.
One of the keys to success in a software supply chain is aligning the architectures and processes of suppliers. As we have seen, one of the main impediments to component composition is architecture mismatch. If components are not designed with compatible assumptions about how they will interact with each other and with the platforms on which they execute, then they cannot be assembled without extensive modification, adaptation or mediation. The less compatible these assumptions, the greater the level of architecture mismatch, and the greater the amount of adaptation required to assemble the components. At some level of architecture mismatch, it may not be possible to assemble the components successfully. Similar alignment issues apply to the development process for suppliers that seek to consume or supply process assets. For example, a supplier that uses testing tools from one supplier may not be able to consume defect tracking tools from another supplier, if they cannot be integrated with the testing tools. Integration may require the tools to make compatible assumptions about the development process.
One way to ensure the required alignment is to enable constraints to propagate downstream, by including them explicitly as metadata in component packages. Examples of this kind of constraint propagation already exist between development and deployment tools, where a development tool places constraints on the configuration of the target platform as metadata in deployable component packages. The deployment tool then picks up these constraints and either configures the target platform as necessary to satisfy the constraints, or validates that the target platform is correctly configured and disallows deployment if the constraints are violated.
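The development-to-deployment handoff described above can be sketched as follows. This is a minimal Python illustration, assuming the package metadata is a dictionary of constraints with two hypothetical kinds, required protocols and minimum memory. The deployment step either reconfigures the host (which can install protocols but cannot add memory) or only validates it, and in both cases disallows deployment if a constraint is still violated.

```python
def check_host(constraints, host):
    """Return the ways a host configuration violates a package's deployment constraints."""
    problems = []
    missing = constraints.get("protocols", set()) - host.get("protocols", set())
    if missing:
        problems.append(f"missing protocols: {sorted(missing)}")
    if host.get("memory_mb", 0) < constraints.get("min_memory_mb", 0):
        problems.append("insufficient memory")
    return problems


def reconfigure(constraints, host):
    """Install required protocols; hardware limits such as memory cannot be fixed this way."""
    updated = dict(host)
    updated["protocols"] = set(host.get("protocols", set())) | constraints.get("protocols", set())
    return updated


def deploy(package, host, allow_reconfigure=False):
    """Configure or just validate the host, then disallow deployment on any violation."""
    constraints = package["constraints"]
    if allow_reconfigure:
        host = reconfigure(constraints, host)
    problems = check_host(constraints, host)
    if problems:
        raise RuntimeError(f"deployment of {package['name']} disallowed: {problems}")
    return host


# Hypothetical metadata written into the package by a development tool.
package = {"name": "OrderService",
           "constraints": {"protocols": {"https"}, "min_memory_mb": 512}}
host = {"protocols": {"http"}, "memory_mb": 1024}

print(check_host(package["constraints"], host))  # ["missing protocols: ['https']"]
print(sorted(deploy(package, host, allow_reconfigure=True)["protocols"]))  # ['http', 'https']
```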
Constraint propagation affects requirements, architecture and implementation.
- Downstream suppliers inherit both fixed and variable requirements from upstream suppliers. They then bind some of the variable requirements, turning them into fixed requirements for suppliers further downstream. For example, the Online Commerce System supplier may inherit fixed user interface requirements from the user interface framework supplier, such as the requirement that all Online Commerce Systems be web based.
- Downstream suppliers inherit or aggregate some parts of the architectures of their suppliers. For example, the Online Commerce System supplier inherits the user interface portion of its architecture from the user interface framework supplier.
- Downstream suppliers may adapt or aggregate components from their suppliers. For example, the Online Commerce System supplier may extend the user interface framework with a custom control for browsing product catalogs.
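The inheritance of requirements along the chain can be sketched as repeated binding. In this minimal Python sketch, the requirement names and allowed values are hypothetical: each supplier receives fixed and variable requirements, binds some of the variable ones, and passes the result downstream, where the newly bound requirements are fixed and can no longer be changed.

```python
from dataclasses import dataclass, field


@dataclass
class Requirements:
    fixed: dict = field(default_factory=dict)     # requirement -> bound value
    variable: dict = field(default_factory=dict)  # requirement -> set of allowed values

    def bind(self, **choices):
        """Bind some variable requirements; the result is what the next supplier inherits."""
        fixed, variable = dict(self.fixed), dict(self.variable)
        for name, value in choices.items():
            if name not in variable:
                raise KeyError(f"{name} is not a variable requirement")
            if value not in variable[name]:
                raise ValueError(f"{value!r} is not an allowed value for {name}")
            fixed[name] = value
            del variable[name]
        return Requirements(fixed, variable)


# Hypothetical requirements published by the user interface framework supplier.
ui_framework = Requirements(
    fixed={"delivery": "web"},
    variable={"layout": {"fixed", "fluid"}, "localization": {"single", "multi"}},
)

# The Online Commerce System supplier binds layout; localization stays open downstream.
commerce = ui_framework.bind(layout="fluid")
print(commerce.fixed)             # {'delivery': 'web', 'layout': 'fluid'}
print(sorted(commerce.variable))  # ['localization']
```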
Some of the most advanced examples of software supply chains can be found in embedded software development in the automotive and telecommunications industries. The alignment issues described above have been significant challenges for these pioneers. DaimlerChrysler, for example, works as a systems integrator, prescribing the overall architecture for the systems it places in its automotive products, while relying on commissioned suppliers like Bosch and Siemens for most of the software components. Managing the relationship between the systems integrator and the component suppliers can be challenging. The challenges include specifying commissioned components without seeing their source code, while keeping the number of iterations and revisions low to manage costs; deciding what knowledge to keep in-house and what to outsource; and protecting the intellectual property of each of the parties, some of whom may be competing, rather than cooperating, in other contexts.
Because of these relationships, a supply chain creates pressure to standardize:
- Product specification formats, to streamline negotiations between consumers and suppliers.
- Assets for common domains, especially architectures and patterns, to make independently developed components easier to assemble across platform boundaries with predictable results.
- Packaging formats containing component metadata, to make components easier to discover, select, license, configure, install, adapt, assemble, deploy, monitor and manage.
Fine, you say, these kinds of relationships already exist. While this is certainly true, the level of interdependency in the software industry is much lower than in more mature industries. This is good news, however, since it means that we have an opportunity to significantly reduce costs, reduce time to market, and improve the quality of software products by increasing the level of interdependency, and by taking advantage of lessons learned in other industries about how to make supply chains more efficient. Exploring this opportunity further is unfortunately beyond the scope of this article.
How Supply Chains Form
Software Factories promote the formation of supply chains by partitioning software factory schemas, either vertically or horizontally, to shift responsibility to external suppliers.
- A vertical partition lets a Software Factory assemble components provided by upstream suppliers. Imagine, for example, that the entity framework in the example above came from an Independent Software Vendor (ISV).
- A horizontal partition separates product line and product developers, enabling product developers to use production assets provided by external product line developers at the same level of the supply chain. This can take one of two forms:
- Product line development is outsourced. Imagine, for example, that instead of working for the ISV in the example above, the product line developers who developed the Software Factory worked for an outside Systems Integrator (SI). Instead of building Software Factories for in-house developers, they build them for developers in customer organizations.
- Product development is outsourced. Imagine, for example, that the product developers in the example above work for the SI, using the Software Factory developed by the product line developers working for the ISV. The product developers may be located offshore, in lower cost labor markets.
Of course, both types of partition can appear anywhere in the schema, and they can be combined in arbitrary ways. They can also disappear and reappear, as business conditions change. In mature industries, many levels of suppliers contribute to finished products, and there is constant churn in the supply chain, as suppliers enter or leave relationships.
Some industries, such as the web-based PC business, already produce product variants on demand, cheaply and quickly, for individual customers. While software product mass customization will not appear for some time, the broad adoption of Software Factories will make it possible. In other industries, mass customization occurs when businesses optimize their internal value chains. According to Porter [Por95], a value chain is a sequence of value adding activities within a single organization, connecting its supply side with its demand side. Value chain optimization requires integrating business processes like customer relationship management, demand management, product definition, product design, product assembly and supply chain management, as illustrated in Figure 5. Software product mass customization will appear when software suppliers optimize their value chains. Imagine ordering a customized financial services application from a web site, the same way customized PCs are ordered today, with the same level of confidence and comparable delivery times.
Figure 5. Value Chain Integration
Software Factories are based on the convergence of key ideas in systematic reuse, model driven development, development by assembly and process frameworks. Many of these ideas are not new. What is new is their synthesis into an integrated approach that lets organizations with domain expertise implement the Software Factory pattern, building languages, patterns, frameworks and tools to automate development in narrow domains.
We think some key pieces of the Software Factory vision will be realized quickly, and others over several years. Commercial tools that can host Software Factories are available now, including Microsoft Visual Studio .NET and IBM WebSphere Studio. DSL technology is much newer than most of the other technologies, and relies on families of extensible languages. DSL development tools and frameworks are currently under development, however, and have already started to appear in commercial form.
This article has described a methodology called Software Factories that integrates critical innovations to promote the transition from craftsmanship to manufacturing. The next article in this series answers frequently asked questions about software factories.
A great many issues are discussed only briefly in this article, probably leaving the reader wanting evidence or more detailed discussion. This subject is covered, with much more detailed discussion, in the book "Software Factories: Assembling Applications with Patterns, Models, Frameworks and Tools", by Jack Greenfield and Keith Short, from John Wiley and Sons.
Copyright © 2004 by Jack Greenfield. Portions copyright © 2003 by Jack Greenfield and Keith Short, and reproduced by permission of Wiley Publishing, Inc. All rights reserved.
[Bos98] J. Bosch. "Design Patterns as Language Constructs." Journal of Object-Oriented Programming (JOOP), May 1998.
[CHE04] K. Czarnecki, S. Helsen, and U. Eisenecker. "Staged Configuration Using Feature Models." Proceedings of the Software Product Line Conference, 2004.
[CN01] P. Clements and L. Northrop. Software Product Lines: Practices and Patterns. Addison-Wesley, 2001. ISBN 0201703327.
[Cop99] J. Coplien. "Multi-Paradigm Design." Proceedings of GCSE '99 (co-hosted with STJA '99), 1999.
[LB94] H. Lee and C. Billington. "Designing Products and Processes for Postponement." In S. Dasu and C. Eastman (eds.), Management of Design: Engineering and Management Perspectives, pp. 105-122. Kluwer, Boston, 1994.
[Por95] M. Porter. Competitive Advantage, Ch. 1, pp. 11-15. The Free Press, New York, 1985.
[WL99] D. M. Weiss and C. T. Robert Lai. Software Product-Line Engineering. Addison-Wesley, 1999.
About the author
Jack Greenfield is an Architect for Enterprise Frameworks and Tools at Microsoft. He was previously Chief Architect, Practitioner Desktop Group, at Rational Software Corporation, and Founder and CTO of InLine Software Corporation. At NeXT, he developed the Enterprise Objects Framework, now part of Apple WebObjects. A well-known speaker and writer, he also contributed to UML, J2EE, and related OMG and JSP specifications. He holds a B.S. in Physics from George Mason University.