Bare-Naked Languages or What Not to Model


by Jack Greenfield

Summary: Domain-specific language (DSL) technology was introduced at Microsoft as part of the software factories methodology. While DSLs are useful as stand-alone languages, placing them in the context of a software factory makes them more powerful. This article explains how DSLs fit into the software factories methodology and how using the methodology can help DSL developers avoid some common pitfalls in applying the technology.


When Not to Use DSLs
Best Practices for MDD
Generalizing Problems
When to Use DSLs
Employing a Maturity Curve
Supporting a Development Process
Profile of an Architect: Jack Greenfield
Jack Greenfield's Resume
About the Author

If you have been involved in the Visual Studio community over the last two years, you're aware of the excitement around domain-specific languages (DSLs) (see Resources). DSLs are visual, powerful, and easy to use. More important, there are compelling examples of DSLs already in use or under development, including DSLs for defining work item types, object relational mappings, and service contracts.

Unfortunately, in all the excitement, we sometimes see people misapplying the technology. The problem is not that the motivation for using DSLs is poorly understood. On the contrary, the community seems to clearly understand the benefits they offer over general-purpose modeling languages like the Unified Modeling Language (UML), which is gratifying, since we have devoted a lot of article, podcast, panel, and keynote space to that topic over the last two years. Here is a quick recap.

A general-purpose modeling language is designed to support the development of models that serve primarily as documentation. Such a language can describe any domain, but only imprecisely. The imprecision is a consequence of using generic abstractions, such as class, activity, use case, or state. The use of such generic abstractions is what makes a general-purpose modeling language broadly applicable. In the case of UML, the generic abstractions are defined using informal, natural language rather than formal or semiformal semantic definition techniques, such as translation, execution, or denotation. The combination of generic abstractions and informal semantics prevents such a language from describing any particular domain precisely.

A DSL, by contrast, is designed to describe precisely a specific domain, such as a task, platform, or process. Instead of generic abstractions, it uses concepts taken directly from the target domain. The precision offered by such a language is further improved by defining its semantics using either translation or execution. This level of formality is possible because the language designer knows how models based on the language will be used for computation. Examples of the kinds of computations performed with DSL-based models include: generating code or other artifacts; creating relationships among model elements and code or other artifacts; validating relationships among model elements and code or other artifacts; configuring a run-time component, such as a Web server; and analyzing models and related code or other artifacts (for example, computing the impact of a given change).

At least that's the theory. The problem is that in practice people don't always seem to know when to use or not to use a DSL, what to model, or how to develop a set of related models that can be stitched together to support a development process. These problems can be avoided by building software factories instead of bare-naked languages, that is, DSLs that are not defined in the context of software factories. A software factory usually supplies a variety of reusable assets, including DSLs, to support software development. More importantly, it helps developers build useful assets by placing them in the context of a product architecture and a development process for a specific type of deliverable.

When Not to Use DSLs

Let us look more closely at the most common pitfalls in applying DSL technology. We see people using DSLs to solve problems that are not well suited to modeling, such as problems that can be solved more easily with other tools, or problems in poorly understood or rapidly evolving domains, where reusable patterns and practices have not yet emerged or stabilized. For example, we have seen DSLs for entity relationship mapping and business application development become obsolete while they were still being developed because the platform technology underneath them was still evolving.

We also see people who are confused about what to model. This confusion is usually manifested by scoping problems. The two most common scoping problems are using a single DSL to capture information about too many different kinds of things, or to capture too many different kinds of information about the same thing, and factoring concepts poorly among multiple DSLs. For example, we have seen one DSL used to describe the decomposition of systems into collaborating service-based components, and the messages exchanged by the components, and the payloads carried by the messages, and the structure of the component implementations, and the configuration of the components for deployment.

Industry experience with patterns and aspects has clearly demonstrated that tangling many different concerns in a single piece of code creates problems. This principle applies to models as well as to code. For example, I might want to change a message payload definition for a component, while someone else changes the component implementation structure using the DSL described previously. We will face a complex merge problem when we check in our changes because two different concerns have been tangled in a single model. The problem is that the model is poorly factored. A poorly factored model is like poorly factored code. It fails to separate concerns, making change difficult and error prone. We can improve poorly factored models the same way we improve poorly factored code, by refactoring to patterns that cleanly separate pieces of code or concepts that represent different concerns. A good clue that two concepts in a language represent different concerns is that they change at different rates or for different reasons.

As for stitching models together to support a development process, we see a lot of interest in the Visual Studio community, but not many working examples based on DSLs. Object-oriented analysis and design (OOA&D) popularized the idea of using a set of interrelated models to span gaps in the development process, especially the gap between requirements and design (see Resources). Unfortunately, it never quite lived up to the promise, since the models involved were merely documentation based on general-purpose modeling languages that only loosely described the requirements or the design.

Also, the relationships between the models were defined only informally, with the goal of helping humans perform the transformations between them by hand. Perhaps the most serious problem, however, is that the gap between requirements and design must be spanned in different ways for different types of applications. A lot of important detail was lost as people tried to apply a one-size-fits-all approach using generic models to complex problems.

Despite these shortcomings, the vision remains powerful, and we see many people attempting to realize it in a variety of ways. Most of them do it manually, some have developed DSL integrations, and some have tried model-driven architecture (MDA) from the Object Management Group (OMG). Of course, MDA never quite lived up to the promise, either, as has been explained in several podcasts and articles. I will briefly recap the reasons here to save you the effort of looking them up. You can still refer to those sources for more detailed discussion (see Resources).

MDA emphasizes platform independence. In practice, platform independence is not an absolute quality of a model. It's a quality that a model can have to varying degrees. For example, an entity relationship model may not depend on the features of a specific version of a database management system from a specific vendor, but it still assumes the existence of a relational platform with certain capabilities. I have yet to see a model that is truly independent of all assumptions regarding the capabilities of the underlying computational platform.

MDA assumes that the world consists entirely of models. Of course, we know that it contains many other types of artifacts manipulated by many techniques and technologies other than modeling. A model-driven development methodology should embrace those other techniques and technologies and should explain how models integrate with those other types of artifacts.

MDA relies on a single general-purpose modeling language: UML. I have already explained the motivation for using DSLs rather than a general-purpose modeling language like UML. A model-driven development (MDD) methodology should use modeling languages that are powerful enough to support application development in the real world.

MDA assumes that only the same three kinds of models are needed regardless of what is being modeled. One is called a computation-independent model, one is called a platform-independent model, and one is called a platform-specific model. This assumption is really another manifestation of a generic approach. It does not make sense to define a specific set of models for a specific type of software if the modeling language is not capable of describing that type of software any differently than it would describe some other type of software. In practice, many different kinds of models are needed to describe any given piece of software, and the kinds of models needed are different for different types of software.

MDA focuses entirely on transformation. From the platform-independent model, we push a magic button that generates the platform-specific model. From there, we push another one to generate the code. Computer-aided software engineering (CASE) demonstrated in the late 1980s and early 1990s that there are no magic buttons. MDA is trying to go one better than CASE by relying on two of them. In practice, transformation is usually the last computation to be supported effectively between two models or between a model and code or other artifacts, simply because it requires so much knowledge about the two domains. Also, the engineering challenges of managing changes to one or both of the artifacts are daunting. Long before transformation can be supported, other forms of model-based computation are possible.

Best Practices for MDD

At this point, you may be wondering if there are some best practices you could follow to avoid these problems. Not only are there best practices, there is a methodology that embodies a set of integrated best practices for MDD. I am talking about the methodology we call software factories. If you have been involved in the Visual Studio community over the last two years, you are also aware of the excitement around software factories. If you're not familiar with software factories, their home page on the MSDN site is a good starting point for finding more information (see Resources). You may also have noticed that DSLs and software factories are often mentioned in the same breath, and you may be wondering how they are related. Let us look at the answer to that question.

Let us start our discussion of software factories by looking at the root cause behind the pitfalls described previously. All of the problems boil down to the desire for a universal solution technology that can be applied to all problems. In the case of DSL technology, this desire manifests itself in the assumption that a DSL is the best solution, no matter what the problem. This tendency, which I call "the hammer syndrome," is not unique to DSLs, of course. It afflicts any technique or technology that has proven useful for solving a frequently encountered class of problems. What causes the hammer syndrome? Michael Jackson offers this diagnosis (see Resources):

Because we don't talk about problems, we don't analyze or classify them, and so we slip into the childish belief that there can be Universal Development Methods, suitable for solving all problems.

Despite the marketing hype, we know that there cannot be a universal modeling language that describes every domain precisely enough to support computation. By the same reasoning, we should also know that a DSL cannot be the best solution to every problem. Given the power and elegance of DSL technology, however, it's easy to start thinking that every problem looks like a nail.

If we could describe the classes of problems that DSLs are good at solving, then we should be able to codify some guidance for using them appropriately. But why stop there? Why not develop a systematic way to analyze and classify problems? If we could describe frequently encountered classes of problems, then we should be able to work on developing techniques or technologies that are good at solving the problems in each class. This approach is the basic thinking behind patterns and frameworks. It is also the basic thinking behind software factories, which are based on the same underlying principles and assumptions as patterns and frameworks.

A factory describes one or more classes of problems that are encountered frequently when building a specific type of deliverable, such as a smart client, a mobile client, a Web client, a Web service, a Web portal, or a connected system. These examples are horizontal, meaning that they are based on specific technologies and architectural styles. A software factory can also target vertical deliverables, such as online auctions, commerce sites, banking portals, or even retail banking portals. As we shall see, the description of the frequently encountered classes of problems supplied by a factory identifies one or more domains that may be candidates for DSLs.

Why target a specific type of deliverable? Why not use a generic problem analysis and classification mechanism like an enterprise architecture framework (EAF)? There are several well-known EAFs, such as the Zachman Framework, The Open Group Architecture Framework (TOGAF), and the pattern frame from Microsoft patterns & practices (see Resources). As you might have already guessed from these examples, EAFs are often expressed as grids with rows representing different levels of abstraction and columns representing different aspects of the development process.

The motivation for using a factory instead of an EAF is the same as the motivation for using a DSL instead of a general-purpose modeling language. Like a general-purpose modeling language, an EAF is merely documentation that imprecisely describes almost any deliverable. By contrast, a factory contains a schema that describes a specific type of deliverable precisely enough to support many useful forms of computation.

Clearly, different classes of problems are frequently encountered when building different types of deliverables. A mobile client presents different problems than a smart client, and a Web service presents different problems than a Web portal. Such contrasts do not mean that we should think in terms of building two different DSLs, one for each type of deliverable. We need to think more broadly than a single language when dealing with problems as large as building an entire application, to avoid the kinds of scoping problems described previously.

We should therefore think in terms of building two different factories. Those two factories may indeed contain two different DSLs. More likely, however, each factory will contain multiple DSLs, since multiple DSLs will be needed to support the development of the entire deliverable. How many of those DSLs will appear in both factories? The answer would require a bit more analysis than space allows here, but we can safely assume that there will be some DSLs in common, and some DSLs unique to each factory. In other words, the differences between the two factories will probably go beyond two different DSLs.

There will be many different DSLs and they will be integrated in different ways. The two factories will have different schemas. Also, because a factory schema is not obliged to be rectangular, we can define as many classes of problems as we need to accurately analyze a given type of deliverable. Relationships between different classes of problems are also easier to express, since they do not rely on adjacency between cells arranged in rows and columns.

To develop a factory schema we analyze the deliverables of a given type to identify and classify the problems we frequently encounter when building them. We start by breaking down the problem of building a typical deliverable into smaller problems, and we continue until the leaves of the tree are bite-sized problems dealing with just one part of the deliverable, such as data contracts, service agents, workflows, or forms. In practice, we often end up with multiple, bite-sized problems working with different aspects of the same part, such as its structure, its behavior, or its security policy. By decomposing the problem, we separate concerns and identify discrete problems that can be worked relatively independently.

Generalizing Problems

The next step is to generalize from specific problems encountered when building several typical deliverables to classes of problems that we can reasonably expect to encounter when building any deliverable of the same type. The specifics will vary from one deliverable to the next, of course, but the classes of problems will remain the same. The specific services accessed will vary from one smart client to the next, for example, but we can reasonably expect to encounter service access problems when building any smart client.

Let us look more closely at the smart client example to see the kinds of problems we can define. A smart client has a composite user interface that displays multiple views. Each view reflects the state of some underlying model. The state of the view and the state of the model are synchronized by a presenter. Changes in one view may affect another view through events. Presenters collaborate by publishing and subscribing to events, enabling scenarios like master-detail, where the contents of a detail view, such as a form, track the selection in a master view, such as a tree or list view.
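The collaboration between presenters described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `EventBus`, `MasterPresenter`, and `DetailPresenter` names, the `selection_changed` topic, and the sample data are hypothetical and are not taken from any particular smart client framework.

```python
from collections import defaultdict
from typing import Callable, Dict, List


class EventBus:
    """Hypothetical publish/subscribe hub shared by the presenters."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callable[[str], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[str], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, payload: str) -> None:
        # Copy the list so handlers may subscribe during delivery.
        for handler in list(self._subscribers[topic]):
            handler(payload)


class MasterPresenter:
    """Publishes an event whenever the selection in the master view changes."""

    def __init__(self, bus: EventBus) -> None:
        self._bus = bus

    def select(self, item_id: str) -> None:
        self._bus.publish("selection_changed", item_id)


class DetailPresenter:
    """Keeps the detail view synchronized with the underlying model."""

    def __init__(self, bus: EventBus, model: Dict[str, str]) -> None:
        self._model = model
        self.view_text = ""  # stands in for the state of a form view
        bus.subscribe("selection_changed", self._on_selection_changed)

    def _on_selection_changed(self, item_id: str) -> None:
        # Unknown items clear the detail view.
        self.view_text = self._model.get(item_id, "")


bus = EventBus()
model = {"c1": "Contoso order", "c2": "Fabrikam order"}  # illustrative data
master = MasterPresenter(bus)
detail = DetailPresenter(bus, model)
master.select("c2")
print(detail.view_text)
```

The two presenters never reference each other directly; they share only the event topic, which is what lets views be developed and loaded independently.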

Models, views, presenters, and events are grouped into modules that can be loaded on demand. Smart clients also often interact with databases through data access layers or with remote services through service agents. These components can cache the results of database or service interactions for efficiency or let an application continue working offline when a database or service goes offline, synchronizing changes when it comes back online. Security is another key concern, as a smart client may interact with multiple databases or remote services using different authorization or authentication mechanisms.

As the example suggests, a good way to find frequently encountered classes of problems for any deliverable type is to follow the architecture. We start with the largest components of the architecture and break them down into smaller ones. We then identify the problems we face in designing, implementing, testing, configuring, or changing each component. The example also illustrates that different parts of a deliverable present different kinds of problems in different phases of the life cycle, and at different levels of abstraction. Building service interfaces presents different kinds of problems than building service proxies, and testing the data access classes presents different kinds of problems than designing the business logic.

To put this discussion into the vocabulary of DSLs, we can think of each class of problems as a problem domain. Building an entire deliverable is a large problem domain containing many concerns. We therefore decompose it into smaller problem domains until the leaves of the tree are bite-sized problem domains that can be worked relatively independently. For a smart client, these independent problem domains might include:

  • User interface design, consisting of view development, model development, presenter development, and view integration
  • Service design, consisting of contract design and service interface construction
  • Business logic construction, consisting of business logic workflow and business logic classes
  • Data access construction, consisting of service proxy construction and data access classes

Some of these domains may be good candidates for DSLs.
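The decomposition listed above can be written down as a small tree, which makes the idea of "leaves" concrete. This is only a sketch: the nested-dictionary representation and the `leaf_domains` helper are assumptions made for illustration, while the domain names come from the list above.

```python
from typing import Dict, List, Tuple

# Each key is a problem domain; each value holds its sub-domains.
# An empty dict marks a bite-sized leaf domain.
SMART_CLIENT_DOMAINS: Dict[str, dict] = {
    "Smart client": {
        "User interface design": {
            "View development": {},
            "Model development": {},
            "Presenter development": {},
            "View integration": {},
        },
        "Service design": {
            "Contract design": {},
            "Service interface construction": {},
        },
        "Business logic construction": {
            "Business logic workflow": {},
            "Business logic classes": {},
        },
        "Data access construction": {
            "Service proxy construction": {},
            "Data access classes": {},
        },
    }
}


def leaf_domains(tree: Dict[str, dict], path: Tuple[str, ...] = ()) -> List[str]:
    """Return the full path of every leaf domain, in declaration order."""
    leaves: List[str] = []
    for name, children in tree.items():
        here = path + (name,)
        if children:
            leaves.extend(leaf_domains(children, here))
        else:
            leaves.append(" / ".join(here))
    return leaves


for leaf in leaf_domains(SMART_CLIENT_DOMAINS):
    print(leaf)
```

Each printed leaf is a candidate problem domain that can be worked relatively independently, and therefore a candidate for its own asset, whether a DSL or something simpler.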

This conclusion is important. We can now answer one of the questions posed earlier: How do we know what to model? The first step in answering that question is to identify the type of deliverable we are trying to build. A fundamental tenet of DSL technology is that a modeling language must focus on a specific domain to provide precision. We cannot focus on a specific domain outside the context of a specific type of deliverable. The second step is to decompose the target deliverable type into bite-sized problem domains that can be worked relatively independently.

A factory schema captures the results of this analysis in a computable model that identifies those problem domains. (The factory schema is a model based on a DSL supplied by the factory-authoring environment.) If we do a good job of defining the factory schema, then DSLs developed for those problem domains will be scoped in a way that effectively separates concerns.

Now we will look more closely at the factory schema to see exactly how those problem domains are described, how DSLs map onto them, and how they are organized in the factory schema in a way that makes the DSLs easy to stitch together to support a development process.

When to Use DSLs

Of course, DSLs are not the only choice of technology available, as noted earlier. DSL-based designers are just one type of asset that can be supplied by a factory. Other asset types include Visual Studio templates, text generation templates (known as T4 templates in the DSL Tools), recipes defined using the Guidance Automation Toolkit (GAT), code snippets, documentation describing patterns and practices, class libraries, and frameworks (see Resources).

We have already seen that different parts of a deliverable may present different kinds of problems. Building them may therefore require different techniques or technologies. For example, we might use .NET Framework classes and code snippets to build data access classes, but we might use a wizard to expand a template that creates a starting point for a service agent, and then complete the coding by hand using guidance provided by documentation. Note that in this example, we are not using any DSLs. It's quite legitimate to build a factory that provides only passive assets like patterns and practices.

We are now ready to look at another question posed earlier: How do we know when to use or not to use a DSL? I will rephrase the question based on this discussion. How do we know that we should use a DSL, instead of some other type of asset, to solve the problems in a given domain?

To answer that question, we need to look more closely at the factory schema. We have already said that the factory schema is a model designed to support computation, and that it describes one or more problem domains representing the classes of problems that are encountered frequently when building a specific type of deliverable. We have also talked about decomposing large problem domains into smaller ones that can be worked relatively independently. We are now ready to put these ideas together to explain how a factory schema works, and how it helps us determine when to use or not to use a DSL.

A factory schema is a tree. Each node in the tree is called a viewpoint. A viewpoint corresponds to a problem domain. The viewpoint at the root of the tree corresponds to building an entire deliverable. The viewpoints below the root are derived by decomposition. A viewpoint describes its corresponding problem domain by describing the problems we may encounter and telling us how to solve them. It describes the solutions in terms of activities we may perform, explaining how to perform each activity, and how to recognize when the activity is required. The activities are described in terms of the work products they manipulate. Work products are the artifacts we build to produce a deliverable. Finally, a viewpoint describes a set of assets supplied by the factory to help us solve the problems we may encounter in the domain. The assets are designed to support the activities, often by fully or partially automating them.

Take the contract design viewpoint as an example. The work products are data contracts and message contracts. The activities include creating or destroying a data contract; adding, removing, or modifying data elements in a data contract; creating or destroying a message contract; adding, removing, or modifying methods in a message contract; and connecting data contracts to method arguments in message contracts. The assets supplied by the factory for the contract design viewpoint include a DSL-based Contract Designer, including a set of T4 templates for generating the contract implementations; a set of documents explaining how to build a data contract and how to build a message contract; a checklist for validating contract designs; a set of common message exchange patterns; and a set of patterns describing the message contracts and data contracts used by each message-exchange pattern.
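The structure of a factory schema, and of the contract design viewpoint in particular, can be sketched as a small data model. This is an illustration under stated assumptions, not the schema format used by any factory-authoring environment: the `Viewpoint` and `Asset` classes, the `kind` labels, and the placement of the viewpoint under "Service design" in a smart client factory are hypothetical, while the work products, activities, and assets are the ones named above.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Asset:
    name: str
    kind: str  # hypothetical labels: "dsl", "template", "document", ...


@dataclass
class Viewpoint:
    """One node of the factory schema tree; corresponds to a problem domain."""

    name: str
    work_products: List[str] = field(default_factory=list)
    activities: List[str] = field(default_factory=list)
    assets: List[Asset] = field(default_factory=list)
    children: List["Viewpoint"] = field(default_factory=list)

    def walk(self) -> Iterator["Viewpoint"]:
        """Yield this viewpoint and all viewpoints below it, depth first."""
        yield self
        for child in self.children:
            yield from child.walk()

    def has_dsl(self) -> bool:
        return any(asset.kind == "dsl" for asset in self.assets)


contract_design = Viewpoint(
    name="Contract design",
    work_products=["data contract", "message contract"],
    activities=[
        "create or destroy a data contract",
        "add, remove, or modify data elements in a data contract",
        "create or destroy a message contract",
        "add, remove, or modify methods in a message contract",
        "connect data contracts to method arguments in message contracts",
    ],
    assets=[
        Asset("Contract Designer", "dsl"),
        Asset("T4 templates for contract implementations", "template"),
        Asset("How to build data and message contracts", "document"),
        Asset("Contract design validation checklist", "checklist"),
        Asset("Common message exchange patterns", "pattern"),
        Asset("Contracts used by each message exchange pattern", "pattern"),
    ],
)

# The root viewpoint corresponds to building the entire deliverable.
schema = Viewpoint(
    name="Smart client",
    children=[Viewpoint(name="Service design", children=[contract_design])],
)

dsl_viewpoints = [v.name for v in schema.walk() if v.has_dsl()]
print(dsl_viewpoints)
```

Because assets hang off viewpoints rather than the other way around, a viewpoint can start with only documents and gain a DSL later without changing the shape of the schema.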

Clearly, this viewpoint was a good candidate for a DSL. We can now codify some guidance regarding when to use or not to use a DSL: A DSL should be used instead of some other type of asset when it is the best mechanism available for supporting the activities of a specific viewpoint.

Employing a Maturity Curve

This answer raises another question, of course. How do we know when a DSL is the best mechanism available?

The best way I know to answer that question is with a maturity curve. We start by making modest investments in simple assets for a given viewpoint and then gradually increase our level of investment over time, building increasingly sophisticated assets as we gain a deeper understanding of the problem domain targeted by the viewpoint and greater confidence that we have scoped and defined the viewpoint correctly and placed it correctly in the context of enclosing and neighboring viewpoints as part of a factory schema.

At the low end of the maturity curve are simple assets like guidelines and other unstructured documents that help the reader know what to do and how to do it. After using and refining the documents, we may decide to formalize them into patterns. Over time, we may decide to implement the patterns as templates that can be instantiated rapidly using wizards or recipes developed with the GAT. At some point, we may decide to replace the templates with class libraries delivered as example code or as assemblies.

The class libraries, in turn, may become frameworks, servers, or other platform components, as we connect the classes to form mechanisms that embody the underlying patterns. At this point, we have reached the high end of the maturity curve, where DSLs are most effective. We can use DSLs to wrap a framework, a server, or other platform components. They are particularly effective when the metadata captured by the designer can be consumed directly by the framework, a server, or other platform components, or used to generate configuration and completion code. DSLs are very good at solving these kinds of problems:

  • Providing a graphical editor for an existing XML document format
  • Rapidly generating a tree view and form-based user interface
  • Capturing information used to drive the generation of code or other artifacts
  • Expressing a solution in a form that is easier for people to understand
  • Describing abstractions that cannot be discovered easily from the code
  • Describing domains that are easy to represent diagrammatically, such as flows, conceptual maps, or component wirings

In general, they are good at handling domains containing moderate amounts of variability. On the one hand, using a DSL to configure a run time that has a few simple variables with fixed parameter values would be overkill. A form containing a few drop-down choices, check boxes, or radio buttons would be a better choice. On the other hand, a DSL may not be powerful enough to support the construction of complex procedural logic. A conventional, textual programming language would be a better choice.
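The maturity curve described in this section can be pictured as an ordered sequence of asset types. The sketch below is one possible encoding: the `AssetMaturity` enumeration and the `next_stage` helper are hypothetical names, and the strict linear ordering is a simplification of the gradual progression described in the text.

```python
from enum import IntEnum
from typing import Optional


class AssetMaturity(IntEnum):
    """Stages of the maturity curve, from low end to high end."""

    GUIDELINES = 1       # unstructured documents
    PATTERNS = 2         # formalized guidance
    TEMPLATES = 3        # instantiated with wizards or recipes
    CLASS_LIBRARIES = 4  # example code or assemblies
    FRAMEWORKS = 5       # frameworks, servers, or other platform components
    DSL = 6              # designers that wrap the framework


def next_stage(current: AssetMaturity) -> Optional[AssetMaturity]:
    """Return the next investment step, or None at the top of the curve."""
    if current is AssetMaturity.DSL:
        return None
    return AssetMaturity(current + 1)


curve = [stage.name for stage in AssetMaturity]
print(" -> ".join(curve))
```

In practice a viewpoint may stay at one stage indefinitely; the ordering only records which investments tend to build on which.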

It is also possible to use a DSL to implement a pattern language without the assistance of a framework, a server, or other platform components. One situation in which this approach makes sense is generating from models based on the DSL to models based on less abstract DSLs. For example, we might generate models describing message contracts from models describing business collaborations. Another situation in which this approach makes sense is generating from models based on a DSL to code that effectively implements a framework, a server, or other platform components. For example, we might generate a device-independent platform used by high-level code by generating services that run on a target device from a model describing the device characteristics.

Supporting a Development Process

Of course, it sometimes makes sense to build a DSL without climbing the maturity curve described previously. Building simple DSLs can be quite inexpensive, and a small team may be able to rapidly refine their understanding of a target domain by iterating over the definition and implementation of a DSL. However, I would not recommend attempting to build a complex designer or attempting to deliver even a simple designer into a large user base that requires significant training, such as a field organization, without first making sure you understand the target domain and its relationships to enclosing and neighboring domains.

Some factories may consist of nothing more than documents describing patterns and practices organized around the factory schema. Other factories may have a DSL for every viewpoint. In practice, we tend to see a mix of asset types in the typical factory, with a large number of viewpoints served by code-oriented assets, such as patterns, templates and libraries, and a modest number of viewpoints, usually the most stable and well understood, served by powerful DSL-based designers.

We have addressed two of the three questions introduced earlier. Can we answer the third? How do we stitch DSLs together to support a development process? As it turns out, we have already covered most of the ground required to answer that question. Given a good factory schema that decomposes the problem of building a specific type of deliverable into relatively independent viewpoints, the next step is to discover relationships between the viewpoints.

We start by looking at how the work products in one viewpoint relate to the work products in other viewpoints. Since we are working with DSL-based viewpoints, the work products are defined in terms of model elements on diagrams. There are many ways in which they can relate. Here are just a few:

  • Work products in one viewpoint may provide details about work products in another. For example, a class diagram may provide details about the implementation of an application described in a system diagram.
  • Work products in one viewpoint may use work products in another. For example, a business collaboration diagram may use message types defined in a contract design diagram.
  • Work products in one viewpoint may be wholly or partially derived from work products in another. For example, a business collaboration diagram may be partially derived from a use case on a use case diagram.
  • Two viewpoints may provide different information about the same work products. For example, a class diagram describes the structure of a group of classes, while a sequence diagram describes their behavior in terms of interactions.
  • Work products in one viewpoint may describe the relationship between work products in others. For example, a deployment diagram contains elements that describe the relationship between an element in a logical data-center diagram and one or more elements in a system diagram.
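
The relationship kinds listed above can be captured as plain data. The following is a minimal sketch in Python, not part of any Microsoft tooling; the names `RelationKind`, `Relationship`, and `related_to` are hypothetical and introduced only for illustration. It encodes the five examples from the list and looks up every relationship a given viewpoint takes part in.

```python
from dataclasses import dataclass
from enum import Enum


class RelationKind(Enum):
    """Kinds of relationship between work products, taken from the list above."""
    DETAILS = "provides details about"
    USES = "uses"
    DERIVED_FROM = "is derived from"
    SAME_PRODUCTS = "describes the same work products as"
    RELATES = "describes the relationship between"


@dataclass(frozen=True)
class Relationship:
    source: str         # viewpoint (diagram) whose work products do the relating
    kind: RelationKind
    targets: tuple      # one or more related viewpoints


# The examples from the list, encoded as data (hypothetical representation).
RELATIONSHIPS = [
    Relationship("class diagram", RelationKind.DETAILS, ("system diagram",)),
    Relationship("business collaboration diagram", RelationKind.USES,
                 ("contract design diagram",)),
    Relationship("business collaboration diagram", RelationKind.DERIVED_FROM,
                 ("use case diagram",)),
    Relationship("class diagram", RelationKind.SAME_PRODUCTS, ("sequence diagram",)),
    Relationship("deployment diagram", RelationKind.RELATES,
                 ("logical data-center diagram", "system diagram")),
]


def related_to(viewpoint):
    """Return every relationship in which the viewpoint appears as source or target."""
    return [r for r in RELATIONSHIPS
            if viewpoint == r.source or viewpoint in r.targets]


for r in related_to("system diagram"):
    print(f"{r.source} {r.kind.value} {', '.join(r.targets)}")
```

Keeping the relationships as data rather than code is one possible design choice; it makes it straightforward to ask which viewpoints are affected when the work products in another viewpoint change.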

Once we have relationships between viewpoints based on work products, the next step is to look at how activities in the related viewpoints affect each other. For example, take the relationship between the logical data-center diagram, the deployment diagram, and the system diagram. In the system design viewpoint, supported by the system diagram, the primary activity is configuring systems, applications, and endpoints. In the logical data-center design viewpoint, supported by the logical data-center diagram, the primary activity is describing logical host types and their configurations and settings. In the deployment viewpoint, supported by the deployment diagram, the primary activity is mapping systems onto logical host types.

The relationships between the viewpoints let us combine these activities into a workflow called design for deployment: a solutions architect configures systems, maps the system configurations onto logical host types captured by an infrastructure architect to form a trial deployment, and then validates the trial deployment to see whether the system configurations will deploy into the data center correctly.
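
To make the design for deployment workflow concrete, here is a minimal sketch in Python. It is an illustration only and does not reflect how the Distributed System Designers are implemented; the classes `System` and `LogicalHostType`, the function `validate_trial_deployment`, and the sample settings are all assumptions. Validation is reduced to comparing the settings a system requires against the settings its host type provides.

```python
from dataclasses import dataclass, field


@dataclass
class System:
    """A system configured by the solutions architect (hypothetical model)."""
    name: str
    required_settings: dict = field(default_factory=dict)


@dataclass
class LogicalHostType:
    """A logical host type described by the infrastructure architect (hypothetical model)."""
    name: str
    settings: dict = field(default_factory=dict)


def validate_trial_deployment(mapping):
    """Check each (system, host type) pair and return a list of readable problems."""
    problems = []
    for system, host in mapping:
        for key, wanted in system.required_settings.items():
            actual = host.settings.get(key)
            if actual != wanted:
                problems.append(
                    f"{system.name} on {host.name}: {key} is {actual!r}, needs {wanted!r}"
                )
    return problems


# 1. The solutions architect configures a system.
orders = System("OrderService", {"auth": "windows", "protocol": "https"})
# 2. The infrastructure architect describes a logical host type.
web_host = LogicalHostType("WebHost", {"auth": "windows", "protocol": "http"})
# 3. The trial deployment maps systems onto logical host types.
trial = [(orders, web_host)]
# 4. Validation reports mismatches before anything reaches the data center.
for problem in validate_trial_deployment(trial):
    print(problem)
```

An empty result means the trial deployment passed this simplified check; a real validator would cover far more than key-value settings.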

We can now see how DSLs fit into the larger picture of software factories. We introduced DSLs at Microsoft to support the software factories methodology, and the technologies are highly complementary. A factory schema breaks a large problem domain into smaller ones. The viewpoints at the leaves of the schema target relatively independent components or aspects of the target deliverable type. Some of the viewpoints may be candidates for DSLs from day one. Others may start with assets further down the maturity curve and gradually evolve toward DSLs over time as experience is gained. Some may remain volatile and challenging, relying on relatively unsophisticated assets, such as documentation, for the life of the factory.
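
One way to picture a factory schema is as a tree whose leaves are viewpoints, each served by some kind of asset. The sketch below, in Python, assumes that shape; the `Node` class, the asset labels, and the sample schema for an online banking application are hypothetical and chosen only to illustrate the idea. It collects the leaf viewpoints and picks out those currently served by a DSL.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """A node in a factory schema; leaves are viewpoints (hypothetical model)."""
    name: str
    asset: Optional[str] = None          # asset kind serving a leaf viewpoint
    children: list = field(default_factory=list)


def leaf_viewpoints(node):
    """Walk the schema depth first and return its leaf viewpoints in order."""
    if not node.children:
        return [node]
    leaves = []
    for child in node.children:
        leaves.extend(leaf_viewpoints(child))
    return leaves


# A made-up schema: the broad domain is decomposed into smaller viewpoints.
schema = Node("online banking application", children=[
    Node("presentation", children=[
        Node("page flow", asset="dsl"),
        Node("page layout", asset="templates"),
    ]),
    Node("services", children=[
        Node("service contracts", asset="dsl"),
        Node("business rules", asset="documentation"),
    ]),
])

dsl_backed = [v.name for v in leaf_viewpoints(schema) if v.asset == "dsl"]
print(dsl_backed)
```

In this picture, a viewpoint evolving along the maturity curve is simply a leaf whose asset label changes over time while the shape of the schema stays the same.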

DSLs that target the factory viewpoints will be well scoped, if the factory is well designed, and will have well-defined relationships to other DSLs that make it easy to stitch them together to support development processes. At some point, we may offer technology that generates DSL integration code from the factory schema, and run-time technology that supports the generated integrations.

The next time someone asks you to build an isolated DSL for a broad domain like banking applications, ask them what aspect of banking they want you to target. When you get a blank stare in response, you will know that what they really need is not a bare-naked language but a factory that breaks the domain into bite-sized nuggets, some of which may be supported by DSLs and some of which may be supported by other types of assets.

Profile of an Architect: Jack Greenfield

Who are you, and where do you work?
I am Jack Greenfield. I work as an architect in the Enterprise Frameworks and Tools (EFT) group in the Visual Studio Team System organization, which is part of the server and tools business at Microsoft.

Can you tell us a little about your career at Microsoft?
When I joined EFT, the group was already building the set of tools that we now call the Distributed System Designers in Visual Studio 2005 under the leadership of Anthony Bloesch and Kate Hughes. Keith Short saw the tools as a "down payment" on a vision of model-driven development (MDD), where organizations would use models as source artifacts for automating application development, not just as documentation.

EFT was a natural fit for the vision of MDD presented in my book, which was partially completed when I joined Microsoft. My vision was similar to Keith's, in seeking to leverage MDD, but it brought a new twist into the picture by placing MDD in the context of software product line engineering, which emphasizes the development of reusable assets supporting a custom process for building a specific type of deliverable. Keith and I worked together to create a unified vision and to describe it in the book. The result was the methodology we now call software factories.

While we were writing the book we saw an opportunity to use the modeling framework used by the Distributed System Designers as a basis for domain-specific languages (DSLs), which constitute an important component of the methodology. We put together a business case for productizing the framework and for building tools to support it as the first step toward implementing software factories.

We took the proposal to Eric Rudder, supported by a demo, and asked for resources to build a team. We were able to show that we had a good foundation to build on. Eric agreed and provided the funding. We were fortunate to be able to hire an all-star team led by Steve Cook and Stuart Kent to do the work. The result is what we now call the DSL Tools, which were just released in the Visual Studio SDK. Steve and Stuart also joined us in writing the book, contributing chapters on language design and implementation.

During that time, I also led an effort to apply DSL technology to the tools being developed for the Microsoft Business Framework, working with a team led by Steve Anonsen, Lars Hammer, and Christian Heide-Damm. We used some of the ideas from the book about model references and model management on that project. I also did a lot of speaking, writing, and customer interviews to evangelize software factories. Several people helped us promote, refine, and implement the methodology. Michael Lehman, Mauro Regio, and Erik Gunvaldson, in particular, were instrumental. Mauro and I are writing a new book called Software Factories Applied that shows how to build factories using the prototypes he developed for HL7 and UMM as examples.

We are now building additional products to support the software factories vision. One of the most important steps enabling our current work was Mike Kropp becoming the product unit manager for EFT and integrating it with the patterns & practices group, another all-star cast led by Wojtek Kozaczynski and Tom Hollander. The merger of the two groups has put the wood behind the arrow to build the vision.

What kind of advice would you give to someone who wants to become an architect?
Good question. In general, you do not apply for a job as an architect; it just kind of happens to you, and I think it happens because of the way you think, the way you talk, and the kind of work you do. For me, an architect is someone who goes beyond the fine details of programming and thinks about the big picture.

I like a description that was offered by someone I met recently from the Netherlands: an architect mediates communication among people who think about end-user concerns, people who think about technology, and people who think about the business case. They see how the thing can come together in much the same way that an architect in the construction trade sees how a building comes together to serve the needs of its users in an aesthetically pleasing way within the owner's budget and schedule using available building technologies.

At some point someone will say about you, "he or she is our architect," and that is how you will know you are functioning in that role. Over the last few years a lot of good material has been written about architecture and the role of the architect, and an industry consensus regarding best practices is emerging. In addition to development and organizational skills, an architect needs to know how to find and apply relevant research in product designs.

How do you keep up to date?
First, I spend a lot of time with folks inside Microsoft. I have a lot of interactions with other teams that help me see the bigger picture and where my work fits within it. I think being familiar with your company's products is one of the most important obligations of an architect. Of course, Microsoft is huge, and it is hard to get your head around even a small chunk of it, but the more you can learn the better.

From there, I tend to follow movements in industry and academia, such as the design patterns and agile development movements. I read a lot of books and articles by other folks who push the envelope, such as Martin Fowler and Rohan McAdam. On the academic side, I follow the Software Engineering Institute (SEI), especially the work being done by Linda Northrop, Len Bass, Paul Clements, and others on the practice of architecture, which contributed to the design of software factories. I also stay in touch with colleagues like Krzysztof Czarnecki at Waterloo and Don Batory at Austin. I serve on conference program committees and peer review boards for journals, which give me opportunities to see a lot of good papers, and I do my share of speaking and writing to stay engaged in the community.

Who is the most important person you have ever met, and why?
Apart from sitting on Lyndon Johnson's knee as a five-year-old, the most important person I have ever met has to be Bill Gates. I have presented to Bill on a couple of occasions, sitting right across the table from him. On every occasion he has played the same kind of role, which was to ask a lot of questions and then to point us at related work taking place in other parts of Microsoft. I think this really demonstrates the point about knowing your company. Bill is always concerned about leveraging as much as possible of what we are doing around the company.

Another person I would put on my list is Steve Jobs. He is a charismatic and visionary leader. He creates a tremendous amount of excitement in people. Working at NeXT under his leadership gave me the feeling of doing something world changing.

What is the one thing that you regret most in your career?
I have done more than my share of things that I would do differently in hindsight, but the one thing I regret most is not a single thing, but rather a trend or pattern that, if I could, I would correct from the start. In the past, I have often thought, "I can do it!" and have just plowed ahead. Instead, I should have stepped back and spent more time working with others as part of a team.

It is hard to overemphasize the value of being part of a team. No one has all the cards as an individual. There have been many times in my career when I have held nine of the ten cards I needed to win, but could not get the tenth card without a team. Unless you work in a way that brings out and synthesizes the contributions of other people, you will not have all the cards when you need them. Thinking you can do it alone is an affliction that plagues many technically gifted people. Look around some time and notice how many B students who work well with teams end up being more successful than A students who know most of the answers. It should be clear from the number of people I have named in this interview that the success of my work depends on the work of many other people and vice versa, and there are many people who have made key contributions to the software factories vision. Some of them are named in the acknowledgements in the first book, and a whole new set will be named in the second book.

What does Jack Greenfield's future look like?
I want to see software factories change the industry. This goal relates to the comment I just made about teams. To have industry-wide impact, software factories will have to be owned and developed by many people. Object orientation emerged in the late 70s and early 80s, but in the course of being scaled out to the masses in the 90s a lot was lost along the way. A lot of what we call object-oriented code today is not object oriented; it just happens to be written in an object-oriented language. I think the same thing is happening with service orientation.

The key to fixing these kinds of problems is changing software development from a craft to an engineering discipline. A lot of people have said it cannot be done, and for some time our industry has been in a phase of focusing on craftsmanship at the expense of developing and applying better engineering practices. Unfortunately, craftsmanship does not scale, and there is a tidal wave coming, driven by changes in the global economy, which will require approaches that do scale.

I think the key to scaling up and scaling out is to leverage experience, which is the goal of software factories. In the simplest terms, a factory is experience codified and packaged in such a way that others can apply it. As much as I appreciate "following design smells," if you are still smelling your way through the fifth or sixth Web application, something is wrong. You may smell your way through a part of it, through some new requirement or some new algorithm, but by and large most of the work we do is similar to work that has already been done, and we need to get much better at leveraging that experience. We are getting there slowly but surely. If I can help bring about that transition, I will have achieved my professional goals.

Jack Greenfield's Resume


Education
B.S. Physics, 1981; George Mason University; GPA, 3.15/4.00
Graduate Course Work
1988, George Mason University
Continuing Education
1991, Oregon Graduate Institute


Experience
Microsoft Corporation (Redmond, WA)
10/2002–present: architect, enterprise tools
Rational Software Corporation (Redmond, WA)
11/2000–7/2002: chief architect, practitioner desktop group
Inline Software Corporation (Sterling, VA)
5/1997–11/2000: chief technical officer
Objective Enterprise LLC (Reston, VA)
4/1995–4/1997: principal
Personal Library Software Inc. (Rockville, MD)
6/1994–4/1995: manager, core technology
3/1994–6/1994: senior architect
NeXT Computer Inc. (Redwood City, CA)
12/1992–3/1994: senior software engineer
3/1989–12/1992: software engineer
Micro Computer Systems Inc. (Silver Spring, MD)
10/1986–3/1989: senior architect
RDP Inc. (Manassas, VA)
3/1986–10/1986: senior systems analyst
TransAmerican Computer Systems Inc. (Fairfax, VA)
6/1985–3/1986: senior software engineer
Business Management Systems Inc. (Fairfax, VA)
10/1981–6/1985: systems programmer/analyst


Patents
Dynamic object communication protocol, 1994
Method for providing stand-in object, 1994
Method and apparatus for mapping objects to multiple tables of a database, 1994
Method for associating data-bearing objects with user-interface objects, 1994
Systems and methods that synchronize data with representations of the data, 2004
Partition-based undo of partitioned object graph, 2005


Publications
Software Factories: Assembling Applications with Patterns, Models, Frameworks, and Tools. Indianapolis, IN: Wiley, 2004.
Software Factories Applied. Indianapolis, IN: Wiley, 2007.

About the Author

Jack Greenfield is an architect for enterprise frameworks and tools at Microsoft. He was previously chief architect, practitioner desktop group, at Rational Software Corporation, and founder and CTO of InLine Software Corporation. At NeXT Computer, he developed the Enterprise Objects Framework, now a part of WebObjects from Apple Computer. A well-known speaker and writer, he is a coauthor of the bestselling and award-winning book Software Factories: Assembling Applications with Patterns, Models, Frameworks, and Tools, with Keith Short, Steve Cook, and Stuart Kent. Jack has also contributed to UML, J2EE, and related OMG and JSP specifications. He holds a B.S. in Physics from George Mason University.


Resources

MSDN: Enterprise Solution Patterns Using Microsoft .NET
Microsoft patterns & practices

MSDN: Microsoft Visual Studio Developer Center
Domain-Specific Language Tools

Guidance-Automation Extensions and Guidance-Automation Toolkit

Jack Greenfield's Blog

Software Factories

Booch, Grady. Object-Oriented Analysis and Design with Applications (2nd Edition). Boston, MA: Addison-Wesley Professional, 1993.

OMG Model-Driven Architecture

The Open Group
Architecture Forum

Jackson, Michael. Problem Frames: Analyzing and Structuring Software Development Problems. Boston, MA: Addison-Wesley Professional, 2000.

Greenfield, Jack, Keith Short, Steve Cook, and Stuart Kent. Software Factories: Assembling Applications with Patterns, Models, Frameworks, and Tools. Indianapolis, IN: Wiley, 2004.

Enterprise Architecture

The Zachman Institute for Framework Advancement

This article was published in the Architecture Journal, a print and online publication produced by Microsoft. For more articles from this publication, please visit the Architecture Journal website.