James Lapalme
December 2007
Revised February 2008
Summary: This article will discuss the need for separation
between the semantics of an internal DSL and its implementation, and how to
achieve it. (13 printed pages)
"All problems in computer science can be solved by
another level of indirection (abstraction)."
–Butler Lampson
Contents
Introduction
A Simple Look at DSLs
Internal and External Domain-Specific Languages
A Quick Look at NMock2
Programming and Modeling
Separation of Concerns Between Modeling and Simulation
The Rules of the Game
A Simple Example
Implementing a Simulation Framework
Putting Everything Together
Benefits of the Approach
Conclusion
Acknowledgements
References
Introduction
Many people believe that by using Domain-Specific Languages (DSLs),
it is possible to be more productive when developing applications, because of
the higher level of abstraction and the richer semantics. Many people have also
debated the best approach to use when implementing a DSL; the two basic
schools of thought are embedded (or internal) and stand-alone (or external).
The question now is, "Should an internal DSL be separated
into two parts when implemented: (1) its semantics (or modeling semantics) and
(2) the implementation of those semantics?" With Microsoft .NET Framework
2.0/C#, it is possible to create a clean separation of concerns between the
semantics of an internal DSL and the implementation of those semantics. A clean
separation can be achieved by creating internal DSLs that are based on a few simple
rules. Moreover, because of the reflective capabilities of .NET Framework 2.0
(especially on generics), it becomes possible to bind a solution that is
written with an internal DSL to an implementation for the DSL by using modern
software patterns such as Inversion of Control (IoC) and Proxies.
A Simple Look at DSLs
"What is a DSL?" might be such a simple question, but
the answer is, well, not so simple. No official and universally accepted
definition exists for the moment. Because many people are still arguing over
the correct answer, I propose to jump in and offer a definition of my own.
Simply put, a DSL can be defined by the combination of semantics and syntax in
order to provide a tool to define solutions for a specific group of problems. A
solution that is expressed with a DSL defines the "what" of the
solution, but not the "how" of its implementation.
Because nothing is better than a concrete example, let's look at
the SQL language. The SQL language allows the definition of solutions to data
problems that can be solved by using relational calculus, the backbone of
modern databases. When we write a SQL program, we are basically expressing the
solution of a certain data problem by using a certain number of standardized
semantic and syntactical elements. However, our SQL program abstracts how the
solution will be executed or implemented. The implementation is left to the
relational database engine, which will compile our solution in order to create
and execute a specific execution plan. The SQL programming language allows us
to be more productive by permitting us to concentrate on the solution of a data
problem, and not on the details of its implementation.
If one is ready to accept my simple DSL definition, DSLs are not
a new concept; we have been creating them for many years now. We can view the
various generations of programming languages as a stack of DSLs, in which each
layer of the stack is the "how" of the next:
· First-generation programming languages (1GL), or binary languages: allow the
definition of solutions that are based on the semantics of a specific
instruction-set architecture, while abstracting how the instruction set is
implemented with transistors.
· Second-generation programming languages (2GL), or assembler languages: allow
the definition of solutions that are based on the semantics of a specific
microarchitecture, while abstracting the encoding of the instructions.
· Third-generation languages (3GL), or general-purpose programming languages
(such as C and C#): allow the definition of solutions to problems that can be
solved by algorithmic approaches, by abstracting the microarchitecture that
will execute the solution.
· Fourth-generation languages (4GL), such as SQL and Mathematica: allow the
definition of a solution by using semantics that permit the declaration of the
solution, without specifying the individual steps; hence, abstracting the
algorithmic nature of the solution implementation.
· Fifth-generation languages (5GL), such as Prolog and Mercury: allow the
definition of a solution by specifying the constraints of the problem; hence,
abstracting the solution definition altogether.
Figure 1. DSL layering
Internal and External Domain-Specific Languages
Most people recognize two approaches to the definition of DSLs:
external and internal. External DSLs are defined by creating the necessary
semantic and syntax elements from scratch. After the language has been defined,
it is necessary to create a compiler or an interpreter to implement the
execution semantics of the language. A simple example of a language of this
type would be HTML.
In contrast, internal DSLs are defined by using as a starting
point the semantics and syntax of a host language, such as a (usually)
general-purpose programming language. Hence, the host language becomes both a
basis and a limit for the definition of the DSL. A simple example of a language
of this type would be Ruby on Rails.
Many people have debated which approach is better, internal or external
DSLs. However, internal DSLs are rapidly gaining popularity,
because of strong Ruby, Smalltalk, and Lisp communities.
These communities have demonstrated the effectiveness of their respective host
languages to support elegant internal DSLs that can be developed rapidly.
Despite all of the debate about internal versus external and which language is
best suited for hosting, I have seen very little debate about the separation between
the semantics of an internal DSL and its implementation. This article will
discuss the need for this separation and how to achieve it.
A Quick Look at NMock2
Before discussing the core of this article, I want to take a look
at and discuss a real internal DSL.
The official NMock Web site defines NMock as a "dynamic
mock-object library for .NET." A Mock object is like a proxy for a type
instance that does not delegate to the type instance, but instead impersonates
it. Mock objects are very useful when testing software, because they allow the
isolation of a software component from its dependencies in order to test the
component independently of its dependencies.
Although the official site defines NMock as an object library, it
should be viewed as an internal DSL that permits the definition of the behavior
of a mock object, and I will make that case. The following is some code from
the official cheat sheet:
1. Mockery mocks = new Mockery();
2. InterfaceToBeMocked aMock = mocks.NewMock<InterfaceToBeMocked>();
3. Expect.Once.On(aMock).Method(...).With(...).Will(Return.Value(...));
4. Expect.Once.On(aMock).GetProperty(...).Will(Return.Value(...));
5. Expect.Once
6. Expect.Never
7. Expect.AtLeastOnce
8. Expect.AtLeast(<# times>)
9. Expect.Exactly(<# times>)
10. Expect.Between(<# times>, <# times>)
11. mocks.VerifyAllExpectationsHaveBeenMet();
The preceding lines of code are elements of the library. The
NMock2 library is implemented as an object-oriented framework. Lines 1-2 are
necessary to create the mock. Lines 3-10 are the method calls that are used to specify
the behavior of the mock object. I want to emphasize the use of the word specify
in the last sentence. Method calls, such as those on lines 3-10, enable developers
to declare/define the behavior that they want the mock to exhibit, but they do
NOT define the implementation of the behavior.
If we return to the definition of a solution expressed with a DSL
that I proposed earlier—"A solution that is expressed with a DSL defines
the 'what' of the solution, but not the 'how' of its implementation"—a
program that used NMock2 would definitely fall into this category.
NMock2 uses the syntax and semantics of method calls of the .NET
platform to define new semantic elements that are the specific method
definitions in the framework.
Programming and Modeling
When we work on a problem, we often use tools such as UML to
create models. We also use UML to model solutions to problems. Again, I
want to emphasize the use of the word model. When we use UML (no matter
the level of the diagrams), we consider that we are modeling a solution and not
implementing the solution, because we are not writing code. Through
mapping/interpretation, most modern UML tools will allow the generation of code
for a specific technology. But, why are we modeling and not coding? Or, better
yet, is there a difference between modeling and coding, or is it just a
question of perspective?
I would like to point out two things. Firstly, if we replace the
word model with define in the second sentence of the last
paragraph we get: "We also use UML to define solutions to
problems." It seems that UML would also be a DSL for defining solutions,
while abstracting the final implementation that is usually a 3GL language.
Secondly, it seems that a DSL allows us to program, specify, or model a
solution—all three words being interchangeable.
If we accept the last two points, and accept that defining or
programming a solution is, in fact, modeling a solution, the next question is:
"What is the role of an interpreter or compiler towards a model?" One
possible point of view is that both interpreters and compilers have the role of
understanding a model that is defined with the semantics and syntax of a DSL.
Then, they create a representation of that model that can be executed. The main
difference between an interpreter and a compiler is that the interpreter
executes the executable representation, while the compiler generates it but
does not execute it. In some communities, the execution of a model is often
referred to as the simulation of the model.
If we take a moment to summarize what was just discussed, we
started the article by proposing a definition for DSLs. We then worked our way
to creating a parallel between programming with a DSL and creating models. We
then finished with the conclusion that executing programs that are written with
DSLs can be viewed as simulating models.
So, if we come back to our discussion on NMock2 with our new
point of view, we can state that creating an NMock2 program is, in fact, the
activity of creating a model of a behavior that we want to define; and
executing the program is, in fact, simulating that behavior. So, now the
question is: "What does this have to do with this article?"
Separation of Concerns Between Modeling and Simulation
Frameworks such as NMock2 and Ruby on Rails actually define two
things: a modeling language (or domain-specific language) and an infrastructure
that is capable of simulating the modeling language. When working with external
DSLs, there exists a clear separation of concerns between both elements: The
modeling language is defined by a language specification (usually textual), and
the simulation infrastructure that implements the semantics of the language is
implemented in a compiler or interpreter. This separation of concerns is NOT
present in most internal DSLs. The rest of this article will discuss how it is
possible to use the .NET Framework in order to create pairs of modeling and
simulation frameworks, to achieve internal DSLs that possess a clear separation
of concerns between modeling and simulation aspects.
By separating the modeling and simulation aspects of an internal
DSL, we are creating a true language, because no implementation details are
present. However, it is clear that, to execute (simulate) a model that is created
with a modeling framework, it is necessary to have the model interpreted by the
associated simulation framework. Figure 2 and Figure 3 contrast the typical
approach with the ideal separation of concerns that we should try to
achieve.
Figure 2. Typical separation
Figure 3. Ideal separation
The Rules of the Game
To keep a modeling framework "clean" of all simulation
semantics and artifacts, it is necessary to eliminate all traces of
implementation details. If a framework does not contain implementation aspects,
it necessarily contains only semantic and syntactical elements.
Object-oriented platforms such as .NET offer a well-defined set
of semantic elements, such as methods, properties, inheritance, interfaces, and
so forth. The objective is to use the right combination of these elements as a
basis for defining a modeling framework. Because our objective is to eliminate
all implementation artifacts, we must use only elements that by definition do
not contain implementation, such as interfaces, abstract classes, abstract
methods (and properties), inheritance, attributes, and class variable
declarations (not instantiation). We can consider that the aforementioned
elements do not contain implementation aspects, because they are by definition
not executable; they do not add any behavioral information to the framework.
All other elements (such as static/virtual/instance members and
class variable instantiations) must have implementation details. A modeling
framework that is defined with only non-implementation-oriented semantic
elements of the .NET platform will be inherently non-executable; hence, it will
contain only the semantics of the modeling language that is represented by the
framework. By our rules, a modeling-framework design basically defines a
modeling-language specification—but in binary form, instead of textual.
NMock2 is designed almost entirely according to these rules; only a
couple of slight modifications would be needed, such as making the Expect class
abstract, and making all of the methods of the class protected abstract.
To get access to the methods, a programmer would derive a class from the Expect
class. In this context, the Expect class can be compared to a scope; the derived
class in a model would be like a scope instantiation.
The modeling-framework design approach that we just discussed was
successfully used in SoCML, a software-/hardware-modeling framework. For more
information about this DSL, see the "References"
section of this article.
A Simple Example
To demonstrate the design approach and the separation of concerns
that I propose, we will create a very simple DSL. All traditional programming
languages have three basic concepts: scopes, operational semantics, and
symbols. A scope permits the definition of an enclosing context. This context
allows the declaration of identifiers, and determines the possible operational
semantics that are available. Symbols are used to represent operational
semantics in a concrete way in programs. For example, in a language such as C#,
a method body is a scope, the "+" token is a symbol, and the
operational semantics of "+" are either addition or concatenation.
Let's use the NMock2 API as a starting point for our language,
which we will call NMock3. We first must create an implementation for the
concept of a scope. Our scope will define a context in which the usage of an
"Expect" operational semantic will be possible. The semantics of
"Expect" will be the same as in NMock2.
To begin, we will create our scope. To do this, we will use an
abstract class to create the definition of a scope, and use inheritance from
this class to instantiate scopes. We will call our scope
"MockingContext".
public class MockingContext{}
In this scope definition, we will make our "Expect"
operational semantic available by defining an Expect method within the scope
definition.
public class MockingContext{
    protected something Expect(){body}
}
The preceding code has implemented our scope definition, our
"Expect" operational semantic, and the "Expect" symbol. The
preceding example also has the implementation of the "Expect"
operational semantic. In the example, the something and body are
left intentionally vague, because they are not important, for the moment.
Now, if we wanted to instantiate our scope and use the
"Expect" operational semantic, we might do something like the
following:
public class MyMockingScope: MockingContext{
    void MyTest(){
        ...
        Expect()...;
        ...
    }
}
If we wanted to achieve a chaining of symbols, such as the
following:
Expect.Once.On(aMock).Method(...).With(...).Will(Return.Value(...));
from the NMock2 API, we could use properties instead of methods
to eliminate unwanted parentheses, and then use a chaining of compatible
properties and return values to chain the symbols.
public class MockingContext{
    protected MockingContext Expect{get{body}}
    protected MockingContext Once{get{body}}
    protected MockingContext On(Mock mock){body}
    protected MockingContext Method(...){body}
    protected MockingContext With(...){body}
}
The preceding class definition would permit the same chaining of
symbols as in NMock2. It would also permit a lot of unwanted chains, such as
the following:
Once.Expect.With(...)
The sequencing of the symbols (of the grammar) can be achieved by
designing the types of the properties and the return values of the methods, to
create constraints.
public abstract class ExpectContext{
    // Entry point of the grammar: only Once may follow Expect.
    protected abstract OnceExpression Expect{get;}
}
public interface OnceExpression{
    OnExpression Once{get;}
}
public interface OnExpression{
    MethodExpression On(Mock mock);
}
public interface MethodExpression{
    WithExpression Method(...);
}
public interface WithExpression{
    // With ends the chain, so it returns nothing.
    void With(...);
}
Our new API achieves two things: (1) It implements a basic
grammar for the symbols by designing the return types of the properties, and
(2) it eliminates all of the implementation of the operational semantics by
using only interfaces and abstract classes. In our first design, the Expect
method had an implementation, so that the framework combined both modeling and
simulation (implementation) semantics. Because our new design does not have
implementation, it achieves perfect separation of concerns between modeling and
simulation aspects.
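As a minimal sketch of how a model could be written against such an API, the following compilable variant fills in the elided parameter lists with assumed signatures: `Method(string name)`, `With(params object[] args)`, and `object` in place of the Mock type are illustrative choices, not part of the original design. The model class stays abstract, because it never implements Expect; supplying that implementation is left to a simulation framework.

```csharp
// Modeling framework: declarations only, nothing executable.
public interface WithExpression   { void With(params object[] args); }
public interface MethodExpression { WithExpression Method(string name); }
public interface OnExpression     { MethodExpression On(object mock); }
public interface OnceExpression   { OnExpression Once { get; } }

public abstract class ExpectContext
{
    protected abstract OnceExpression Expect { get; }
}

// A model written against the framework. It must remain abstract,
// because it does not implement the Expect property.
public abstract class MyMockingScope : ExpectContext
{
    public void MyTest(object aMock)
    {
        // Accepted: each step returns the only type that may follow it.
        Expect.Once.On(aMock).Method("Add").With(1, 2);

        // Rejected by the compiler, because OnceExpression has no With member:
        // Expect.With(1, 2);
    }
}
```

The return types alone enforce the ordering of the symbols, so an out-of-order chain is a compile-time error rather than a run-time failure.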
Implementing a Simulation Framework
Now that we have defined the rules of the game for our modeling
language and we have a simple modeling language with which to work, the
question becomes: "How do we put the implementation aspects back in?"
There exist many modern software-design approaches and technologies that can
come to our aid, such as inversion of control and reflection.
Class Instantiation Problem
The first implementation challenge is how to instantiate the
implementation of the variable declaration that might be present in the modeled
solution. The problem of instantiating the implementation of a variable of a
given type is basically the problem of class instantiation. The software-design
pattern that is called Inversion of Control (IoC) is a perfect fit for the
problem.
IoC is a design pattern that enables the decoupling between types.
A type instance that is designed according to IoC does not instantiate objects
that fulfill its dependency needs. Instead, it delegates the instantiation
responsibility to an execution environment, and consumes the dependency through
an interface contract that the dependency instance implements. The execution
environment—through the use of a defined dependency-need declaration between it
and the requesting type or type instance—locates an implementation for each
required dependency and instantiates it. After the implementation has been
instantiated, the environment gives the requester access to it through another
defined convention.
Figure 4 represents a typical UML diagram that depicts the
players in an IoC scenario.
Figure 4. Inversion of Control
Inversion of Control with Reflection
The mainstream techniques that are used to implement IoC, such as
constructor injection and setter injection, are usually satisfactory
in the context of business applications, but they are not sufficiently
transparent to be used in the context of a modeling framework. It would
be necessary to "pollute" the modeling framework with IoC implementation
mechanisms that have nothing to do with modeling with the DSL. An example of
this "pollution" would be the use of superfluous setters when using
setter injection; the setters would add value only to the IoC implementation.
It is possible to use model analysis through reflection in order to act as a
container. By using reflection to traverse the declaration hierarchy of a model
recursively, it is possible to detect variable declaration, and then set those
variables with the necessary implementations.
The .NET Framework 2.0 offers very flexible reflective
capabilities when manipulating generic types, capabilities that are not present
in Java or in earlier versions of .NET. Because .NET generics are resolved at run
time, through the use of reflection, it is possible to determine whether an
object is an instance of a generic type, as well as the bound types of the
generic instance. It is possible also to bind a generic type dynamically and
create an instance of that binding. These capabilities enhance the ease of
creating modeling and simulation frameworks.
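The reflective operations described above can be sketched with standard System.Type members. The class and method names here (GenericReflection, BoundTypes, CreateBound) are illustrative, and the sketch assumes the bound type has a public parameterless constructor.

```csharp
using System;
using System.Collections.Generic;

public static class GenericReflection
{
    // Returns the bound type arguments of a generic instance, or an empty
    // array when the object's type is not generic.
    public static Type[] BoundTypes(object instance)
    {
        Type type = instance.GetType();
        return type.IsGenericType ? type.GetGenericArguments() : Type.EmptyTypes;
    }

    // Binds an open generic type such as typeof(List<>) at run time and
    // instantiates the result through its parameterless constructor.
    public static object CreateBound(Type openGeneric, params Type[] typeArguments)
    {
        Type bound = openGeneric.MakeGenericType(typeArguments);
        return Activator.CreateInstance(bound);
    }
}
```

For example, `GenericReflection.CreateBound(typeof(List<>), typeof(string))` yields a `List<string>`, and passing a `Dictionary<string, int>` to `BoundTypes` yields the two types `string` and `int`.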
Replacing Constructors
By not allowing a programmer to instantiate an instance variable,
the problem of variable initialization arises. This can be solved simply by
defining an abstract method in a base modeling class that will serve as an
initialization method. The programmer will have to derive from the base class
and provide an implementation of this method. Within the method, the programmer
will be able to set the properties of the instance variables.
public abstract class Base{
    public abstract void Init();
}

public abstract class aClass : Base{
    // Declared only; the simulation framework instantiates it.
    AType instance;

    public override void Init(){
        instance.AProperty = aValue;
    }
}
The following section will address the problem of instantiating
the instance variable, so I won't discuss it here. However, for the preceding
approach to work, it is necessary for the instantiation mechanism to
instantiate all variables, before calling recursively in a top-down manner the
Init methods of the declaration hierarchy.
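A minimal sketch of such an instantiation mechanism follows, assuming a hypothetical `resolve` callback that maps a declared field type to an instance (or returns null for types it does not handle). All names are illustrative. The sketch populates every declared field first and only then calls Init from the top of the hierarchy downward; it does not guard against cyclic declarations and does not see private fields declared in base classes.

```csharp
using System;
using System.Reflection;

public abstract class ModelBase
{
    public abstract void Init();
}

public static class ReflectiveContainer
{
    const BindingFlags Fields =
        BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic;

    public static void Build(ModelBase model, Func<Type, object> resolve)
    {
        Populate(model, resolve);   // phase 1: instantiate every declaration
        Initialize(model);          // phase 2: run Init top-down
    }

    static void Populate(object node, Func<Type, object> resolve)
    {
        foreach (FieldInfo field in node.GetType().GetFields(Fields))
        {
            if (field.GetValue(node) != null) continue;   // already has a value
            object value = resolve(field.FieldType);
            if (value == null) continue;                  // type not handled
            field.SetValue(node, value);
            Populate(value, resolve);                     // nested declarations
        }
    }

    static void Initialize(ModelBase node)
    {
        node.Init();   // parent first, then its children
        foreach (FieldInfo field in node.GetType().GetFields(Fields))
        {
            ModelBase child = field.GetValue(node) as ModelBase;
            if (child != null) Initialize(child);
        }
    }
}

// Example model: declares a Counter but never instantiates it.
public class Counter { public int Start; }

public class CounterModel : ModelBase
{
    Counter counter;
    public int Observed;

    public override void Init()
    {
        counter.Start = 5;
        Observed = counter.Start;
    }
}
```

Calling `ReflectiveContainer.Build(new CounterModel(), t => t == typeof(Counter) ? new Counter() : null)` sets the `counter` field before Init runs, so the model needs neither a constructor nor a setter.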
Abstract-Method Implementation Problem
The problem of implementing an abstract method or class without
the user being aware is a problem that arises often in the context of
distributed applications. In traditional distributed applications, a designer
uses an object that impersonates a remote object. The responsibility of the
impersonating object is to offer a simple interface to the user and marshal the
calls to the remote object. The same basic technique can be used for an
abstract-method implementation problem. The technique is based on the Proxy
design pattern.
Combining IoC and Proxy Design Pattern
To solve the problem, we can use a combination of the IoC and the
Proxy design pattern. By creating a class that inherits from a class that
contains abstract methods, it is possible to implement the methods and assign
variables of the original class type with instances of our proxy, because of
polymorphism. The same approach is often used in aspect-oriented containers to
intercept method calls and inject aspects.
We can achieve the creation of the proxy classes and instances by
using the DynamicProxy.NET Framework that is distributed by the Castle Project.
The framework supports the creation of proxy types for generic types. The
DynamicProxy.NET Framework works by using reflection to analyze an abstract
class or interface that is to be "proxied." The framework then emits
the necessary IL code in order to create a proxy. All of the methods of the
proxy are implemented by delegating the calls to an interceptor, which is
provided by the code that requests the proxy. The implementation of the
interceptor mechanism that is used by the framework resembles the one that is
used by the CLR to intercept calls on context-bound objects.
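The interceptor idea can be sketched without a third-party library by using System.Reflection.DispatchProxy, which ships with current versions of .NET. This is a swapped-in technique and not the Castle framework described above: DispatchProxy can proxy interfaces only, whereas the approach in this article also needs proxies for abstract classes. ICalculator, InterceptingProxy, and Build are hypothetical names chosen for illustration.

```csharp
using System;
using System.Reflection;

// Stand-in for an interface of a modeling framework (hypothetical).
public interface ICalculator { int Add(int a, int b); }

public class InterceptingProxy : DispatchProxy
{
    // Supplied by the code that requests the proxy.
    Func<MethodInfo, object[], object> interceptor;

    // Every call made on the proxied interface arrives here.
    protected override object Invoke(MethodInfo targetMethod, object[] args)
    {
        return interceptor(targetMethod, args);
    }

    public static T Build<T>(Func<MethodInfo, object[], object> interceptor)
        where T : class
    {
        // The generated type derives from InterceptingProxy and implements T.
        T proxy = DispatchProxy.Create<T, InterceptingProxy>();
        ((InterceptingProxy)(object)proxy).interceptor = interceptor;
        return proxy;
    }
}
```

A caller could then write `ICalculator calc = InterceptingProxy.Build<ICalculator>((method, args) => (int)args[0] + (int)args[1]);`, after which `calc.Add(2, 3)` is answered entirely by the interceptor, with no hand-written class implementing ICalculator.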
Putting Everything Together
By using the techniques that we have just discussed, we can
create an implementation for our NMock3 language by creating a Mocking engine
class that is capable of:
· Taking a subclass of ExpectContext.
· Creating a dynamic subclass of the original subclass by using a
DynamicProxy.NET-style tool.
· Intercepting the method calls to Expect, to implement the necessary
operational semantics.
· Implementing all of the interface definitions of the API.
Our engine is basically an interpreter for the language; instead
of consuming source code, it consumes the compiled form of our program (or
model) through the type class of our program, that is, our subclass of
ExpectContext. To create an executable representation of our model that is
created with our NMock3 language, we must create a third program that
instantiates our engine with the type class of our program, which is our
subclass of ExpectContext: MyMockingScope.
Benefits of the Approach
The modeling/simulation framework approach has many subtle but
very important benefits.
Possible Interception for Verification
The utilization of design patterns such as IoC and Proxy enables
a simulation framework to create chains of interceptors that can monitor
different aspects of a model under simulation, without having to add the monitoring
elements in the model itself. The combination of a flexible framework design
and the reflective capabilities of the .NET Framework offers many possibilities
for the creation of simulation framework enhancements.
Transparency to Alternative Simulation Implementation
Because all models that are created with the modeling framework
are separated from the implementation of the simulation framework, it becomes
possible to use a model with different simulators in order to take advantage of
alternative implementations. An alternative implementation scenario could be as
simple as a new version of the same simulation framework. It would be nice to
be able to take compiled NMock2 programs and use them with another, better
implementation of the NMock2 framework without recompiling. This might sound
fairly trivial, because of the versioning support of the .NET Framework; but
other platforms or languages such as Java and Ruby do not have this support.
Conclusion
So, what was this article about? My first objective was to give a
simple but clear definition of the concept of domain-specific languages, and
then demonstrate that we have been using them for a long time. My second
objective was to propose a new perspective on the relationship between programming
with a domain-specific language and modeling. My third objective was to make
the case for the separation of modeling and execution aspects in current
internal domain-specific languages based on object-oriented frameworks. My
final objective was to demonstrate an approach to the design and implementation
of modeling and execution frameworks based on separation of concerns.
Acknowledgements
I would like to thank Christian Dubé and Gia-Hao Banh of PSP
Investments for their invaluable help and feedback on the material in this
article.
References
Fowler, Martin. "Language Workbenches:
The Killer-App for Domain Specific Languages?"
MartinFowler.com, June, 2005.
Fowler, Martin. "Inversion of Control
Containers and the Dependency Injection Pattern."
MartinFowler.com, January, 2004.
Freeman, Steve, and Nat Pryce. "Evolving an Embedded
Domain-Specific Language in Java." International Conference on
Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA),
Portland, Oregon (U.S.), October 22-26, 2006.
Gamma, Erich, Richard Helm, Ralph Johnson, and John Vlissides. Design
Patterns: Elements of Reusable Object-Oriented Software. Reading, MA:
Addison-Wesley, 1995.
Lapalme, James, El Mostapha Aboulhamid, Gabriela Nicolescu, and
F. Rousseau. "Separating Modeling and Simulation Aspects in
Hardware/Software System Design." In Proceedings of the 18th IEEE
International Conference on Microelectronics (ICM '06), Dhahran, Saudi
Arabia, December 16-19, 2006. (6 pages)
Wikipedia, the Free Encyclopedia
About the author
James Lapalme currently works as a solution (Microsoft
technologies) and enterprise-architecture consultant for CGI. He possesses a
recognized expertise in the fields of system modeling and design. James has
published a number of articles in international IEEE/ACM peer-reviewed
conferences, and has been invited to present his research during international
conferences. He has extensive knowledge of OO technologies and software
engineering. Currently, James
is a PhD candidate at l'Université de Montréal.