Textual Domain Specific Languages for Developers - Part 1

[This content is no longer valid. For the latest information on "M", "Quadrant", SQL Server Modeling Services, and the Repository, see the Model Citizen blog.]

 

Shawn Wildermuth
Agilitrain

Last Updated: February 2010 (for 2009 PDC CTP)

Developing software is hard. Actually, that is not entirely correct. Developing software that solves real-world problems is hard. The problem is not necessarily technical. Early in my career I talked with a lot of people outside of the IT world who thought that software development was a very solitary existence. In fact, I was often pegged as an introvert and loner because of my chosen profession. For anyone who has spent much time in software development it should be clear that this job entails an extraordinary amount of interpersonal communication. It may seem obvious that language is important because we need to be able to communicate our ideas to our co-workers and perhaps to our bosses, but it's not nearly that simple. Being able to communicate with non-technical people is even more important because our job is often to create software that solves non-technical problems. It is our job to program, or 'code', their intent into the magical series of zeros and ones that make a system work.

One of the most important parts of communicating in software development is requirements gathering. In fact, the term "requirements gathering" is part of the problem. Information Technology has its own language. If you don't believe me, ask your significant other. They know that you have your own language that they likely don't understand entirely. All professions and businesses have their own languages as well. Each of these areas of expertise is called a domain. To be successful, we must be able to communicate using the conversational vocabulary of a business; that is, to communicate in its domain specific language.

The Problem of Intent

In most organizations, technology is used to help improve the way that organization works. Whether it is development of a website to better market a company's products or creation of enterprise software that enables complex business processes to happen, the ultimate goal is to improve the way business happens. That's where we come in. Our job as developers is to use technology to solve problems.

Today this happens by talking to the organization's personnel, or stakeholders, who have a problem to solve. As the stakeholders discuss the problem they are trying to solve, it is our job as developers to understand their intent. Once we believe we understand their intent, we go and document what we heard from the stakeholders. The level of this documentation varies greatly depending on the process your organization uses, but in all cases there is a step of writing down what you thought you heard the stakeholders say. This documentation is then presented to the stakeholders to ensure that it describes their intent. If it matches their intent, the documentation is handed to developers to code that intent. This process usually involves a lot of iteration to ensure both that the documentation reflects the intent and that the code reflects the documentation.

This middle step of documentation of the intent is at the heart of the problem. Much like the typical grapevine story, every hand the stakeholder's intent passes through introduces the possibility of confusion or obscuring of the original intent. An example of this workflow is shown in Figure 1.

Figure 1: Common Intent Workflow

This workflow does not work as well as it should. Preserving the real intent depends on the stakeholder and the developer understanding the documentation (and its ambiguities) in the same way.

For example, imagine a company that holds public technology classes. In talks with the sales department, it was determined that they need to price a class based on three pieces of data: a default price per class, a coupon-based price and company-based contracted prices. If a coupon is presented, that price should be used; otherwise a company contract price should be searched for. The lowest price should be used if more than one price is determined. A developer writes up a small bit of code to find the price for a class like so:

public static decimal CalculateEventPrice(int workshopId,

                                          string couponCode,

                                          string companyName)

{

  StudentEntities ctx = new StudentEntities();

  var wsQry = from workshop in ctx.Workshop

              where workshop.WorkshopId == workshopId

              select workshop.DefaultPrice;

  decimal defaultPrice = wsQry.FirstOrDefault();

  decimal couponPrice = decimal.MaxValue;

  decimal contractPrice = decimal.MaxValue;

  // Determine if Coupon is used

  if (string.IsNullOrEmpty(couponCode) != true)

  {

    var qry = from coup in ctx.Coupon

              where coup.Workshop.WorkshopId == workshopId &&

                    coup.Code.ToLower() == couponCode.ToLower() &&

                    coup.Expiration >= DateTime.Today

              orderby coup.Expiration

              select coup;

    // Retrieve the first coupon that qualifies

    Coupon discountCoupon = qry.FirstOrDefault();

    // Return the coupon price if it is found

    if (discountCoupon != null)

    {

      couponPrice = discountCoupon.Price;

    }

  }

  // Determine if Company Price applies

  if (string.IsNullOrEmpty(companyName) != true)

  {

    var qry = from contract in ctx.CompanyContract

              where contract.Workshop.WorkshopId == workshopId &&

                    contract.Company.CompanyName.ToLower() == companyName.ToLower() &&

                    contract.Expiration >= DateTime.Today

              orderby contract.Expiration

              select contract;

    // Try and find the first one that qualifies

    CompanyContract validContract = qry.FirstOrDefault();

    // If so, return that price

    if (validContract != null)

    {

      contractPrice = validContract.Price;

    }

  }

  // Return the lowest of the default, coupon and contract prices

  return Math.Min(defaultPrice, Math.Min(couponPrice, contractPrice));

}

For the stakeholder, reading the code is rarely the path of least resistance. This code uses data from an Entity Framework model and LINQ to load the prices from the database and then calculates the price, but it is not easy for stakeholders to judge whether it fulfills their intent. For a developer this code is fairly clear, but it is still code. The developer needs to understand the domain as well as the code to make sure it fulfills the intent of the stakeholders. The developer could take these business rules and move them out of code. Maybe that would be easier for the stakeholders to understand:

<?xml version="1.0" encoding="utf-8" ?>

<PriceCalculator xmlns="https://litwareinc.com/PriceCalculator" >

  <LowestWins>

    <CheckCoupon>

      <Properties>

        <Property Name="workshopId"

                  Type="System.Int32" />

        <Property Name="couponCode"

                  Type="System.String" />

      </Properties>

    </CheckCoupon>

    <CheckCompany>

      <Properties>

        <Property Name="workshopId"

                  Type="System.Int32" />

        <Property Name="companyName"

                  Type="System.String" />

      </Properties>

    </CheckCompany>

    <UseDefault />

  </LowestWins>

</PriceCalculator>

This should allow the developer to write code to perform the business rules, but this artifact is still not clear enough for the stakeholders: XML is not something non-technical people read comfortably. If the XML is too cryptic for non-technical people, perhaps a graphical version of these rules would work? So the developer refactors again, changing this XML file into a Windows Workflow Foundation (WF) activity as shown in Figure 2.

Figure 2: Business Rules Activity

This is better, but it is still not that clear to the non-technical folks. The workflow's visual representation uses its own visual vocabulary to define how the process flows. In addition, a workflow often has part of its implementation hidden from this view (e.g. in code).

All of these solutions are technical in nature. It is still hard for non-technical stakeholders to tell whether their intent is captured in these artifacts. At the end of the day it is still someone's responsibility to confirm that the documentation of the stakeholder's intent matches the implementation. We need a solution that bridges this gap between intent and runtime information. That is where Domain Specific Languages (or DSLs) come in.

Domain Specific Languages

What is a Domain Specific Language (DSL)? In a broad sense, a domain specific language is simply a way to represent information for an area of knowledge. For example, music notation is a domain specific language that tells musicians which notes, tempo and even lyrics to use. The vocabulary of music notation does not try to be general purpose; it is finely tuned for the people who need it. For example, Figure 3 shows the opening notes of Für Elise by Beethoven.

Figure 3: Music Notation DSL

In case you're not familiar with musical notation, there is a lot of information conveyed in just the first two bars of the music notation shown in Figure 3:

1. Speed

2. Clef (i.e. which notes the five lines represent; different clefs cover different areas of the sound range)

3. Key of the song (e.g. no sharps or flats in this particular example, the key signature shared by C major and A minor)

4. Beats per measure (i.e. the time signature)

5. Pitch and length of tones

6. Volume

7. Volume enhancements (e.g. an increase followed by a decrease in volume)

In contrast, there are general purpose languages (e.g. English, or even C# and VB) that can be used to describe a wide variety of subjects. If you used a general purpose language such as English to convey just the information contained in Figure 3, you might find it difficult to capture it all. This is especially true if you want to explain the music in a way that musicians can consume easily. That's where a domain specific language can replace the general purpose language and be much more concise and consumable.

While a graphical domain specific language can be of great use to developers in some scenarios, the more important type of DSL is the textual domain specific language. Textual DSLs are important to development because they provide a bridge between the textual nature of code and the need to be comprehended by non-technical stakeholders. In addition, textual DSLs are much simpler to define than a visual representation of the same data. A graphical DSL (like music notation) is very useful, but creating the notation takes time and skill. Creating a textual notation that is concise, clear and machine-readable is much more straightforward than inventing a graphical version of the same data.

Back in 1986, Jon Bentley introduced a concept of "Little Languages". These are small textual languages that are used as input into software. These languages should be simple enough to type by hand by ordinary people. These "Little Languages" are textual domain specific languages. In other words, they are ways of communicating information into software (as input) and should be able to be created or at least understood by non-technical people.

A Brief Example

Let's continue our Litware, Inc. example. The business rules can be defined as a domain specific language. Our pricing rules, written in such a language, might look something like this:

Start Rule (Smallest Wins)

  Get Coupon Price

  Get Company Price

  Get Workshop DefaultPrice

End Rule

This simple language tells the story of what the actual intent is. It is something that could be handed to the stakeholder to confirm the intent. That's the essence of what domain specific languages can do for developers: they give developers human-readable metadata. For us to use this DSL, we need to define the structure and makeup of the language. This structure and makeup (called the grammar) is the hard part of the equation, because if our DSL is not structured and well defined, it is much harder to consume as input for software. The grammar must be defined, but it does not need to be complex or difficult. Defining a grammar for a general purpose language is notoriously complex, but for domain specific languages it does not need to be that hard. In fact, the grammar of your language should be simple. If it is not simple, then how is it going to be "simple enough to type by hand by ordinary people"?

So our example really defines a simple grammar. It starts with "Start Rule" and ends with "End Rule". It requires that the "Start Rule" specify how the rule determines the winner ("Smallest Wins") and then contains a list of instructions for finding a price. Each of these instructions is simply a "Get" phrase with a source and a price type. Writing code that parses the file to follow these instructions is not as easy as it should be. That is where the Oslo toolset comes in.
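To give a feel for what hand-written parsing involves, here is a minimal sketch in C# that reads the little language above without any Oslo tooling. It is only an illustration under stated assumptions: the type names (ParsedRule, ParsedOperation, HandWrittenPricingParser) are hypothetical, each instruction is assumed to sit on its own line, and the "First Wins" alternative is borrowed from the grammar shown later in this article.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical types for illustration only; they are not part of the Oslo SDK.
public sealed class ParsedOperation
{
    public string Source { get; set; }   // e.g. "Coupon"
    public string Noun { get; set; }     // "Price" or "DefaultPrice"
}

public sealed class ParsedRule
{
    public string WinnerRule { get; set; }
    public List<ParsedOperation> Operations { get; } = new List<ParsedOperation>();
}

public static class HandWrittenPricingParser
{
    private static readonly Regex StartLine =
        new Regex(@"^Start Rule\s*\(\s*(Smallest Wins|First Wins)\s*\)$");

    private static readonly Regex GetLine =
        new Regex(@"^Get\s+([A-Za-z0-9]+)\s+(Price|DefaultPrice)$");

    public static ParsedRule Parse(string text)
    {
        // Assumption: one instruction per line; blank lines are ignored.
        var lines = new List<string>();
        foreach (string raw in text.Split('\n'))
        {
            string line = raw.Trim();
            if (line.Length > 0) lines.Add(line);
        }

        if (lines.Count < 2)
            throw new FormatException("Expected 'Start Rule (...)' and 'End Rule'.");

        Match start = StartLine.Match(lines[0]);
        if (!start.Success)
            throw new FormatException("Expected 'Start Rule (Smallest Wins)' or 'Start Rule (First Wins)'.");

        if (lines[lines.Count - 1] != "End Rule")
            throw new FormatException("Expected 'End Rule' as the last line.");

        var rule = new ParsedRule { WinnerRule = start.Groups[1].Value };

        // Every line between the start and end markers must be a "Get" phrase.
        for (int i = 1; i < lines.Count - 1; i++)
        {
            Match get = GetLine.Match(lines[i]);
            if (!get.Success)
                throw new FormatException("Unrecognized instruction: " + lines[i]);

            rule.Operations.Add(new ParsedOperation
            {
                Source = get.Groups[1].Value,
                Noun = get.Groups[2].Value
            });
        }

        if (rule.Operations.Count == 0)
            throw new FormatException("A rule needs at least one 'Get' instruction.");

        return rule;
    }
}
```

Even for a language this small, the sketch has to make decisions about whitespace, line structure and error reporting by hand, and every change to the language means changing the parsing code as well.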

How Oslo Empowers Textual DSLs

Textual DSLs are a part of the Oslo strategy. In fact, much of the tooling that comes with the Oslo SDK is specifically meant for working with DSLs. These tools include:

  • M: A set of languages for defining DSLs and structured data.
  • Intellipad: An editor for developing MGrammar (and other M languages).
  • Tools: Command-line tools for compiling and parsing DSLs into structured data.
  • API: Managed APIs for consuming grammars and data.

The "M" languages are a great place to start. Creating a DSL has two parts: creating a grammar for the DSL and being able to parse documents written in it. The M language for creating grammars is called "MGrammar". This language allows you to set up the semantics of a language and determine how parsing will create a structured set of data based on input. For example, the MGrammar for the little DSL shown above is:

// mprice.mg

module Training.Events

{

  language PricingLanguage

  {

    // Syntax for Rules

    syntax Main  = Rule "(" WinnerRule ")"  Operations EndRule;

    syntax Operation = Get Source Noun;

    syntax Operations = OperationList;

    syntax WinnerRule = SmallestWins | FirstWins;

    syntax OperationList =

             o:Operation => { o }

             |

             list:OperationList o:Operation => { valuesof(list), o };

    syntax Noun = Price | DefaultPrice;

    // Keywords

    @{Classification["Keyword"]}

    token Rule = "Start Rule";

    @{Classification["Keyword"]}

    token EndRule = "End Rule";

    @{Classification["Keyword"]}

    token Get = "Get";

    // Literals

    @{Classification["Literal"]}

    token SmallestWins = "Smallest Wins";

    @{Classification["Literal"]}

    token FirstWins = "First Wins";

    @{Classification["Literal"]}

    token Price = "Price";

    @{Classification["Literal"]}

    token DefaultPrice = "DefaultPrice";

    @{Classification["Literal"]}

    token Source = ('A'..'Z' | 'a'..'z' | '0'..'9')+;

    // Non-Language Elements

    @{Classification["Comment"]}

    token Comment = "//" !("\r" | "\n")+;

    interleave skippable = ' ' | '\r' | '\n' | Comment;

  }

}

In MGrammar you specify the structure of your DSL by creating a language. The MGrammar file is the set of instructions for parsing files written in your language. In fact, the editor included with Oslo (Intellipad) lets you write your grammars interactively in its MGrammar mode, which provides a three-pane view as shown in Figure 4.

Figure 4: Intellipad's MGrammar mode (MGMode)

The first pane (marked with #1) holds the exemplar of our language. The middle pane (#2) is the MGrammar file, and the right pane (#3) shows the results of parsing the exemplar with the MGrammar file. The format of the tree in #3 is actually an M language called MGraph. MGraph is a way of describing structured data using a Labeled Directed Graph (LDG). If you are familiar with XML, you can make some comparisons: MGrammar plays the role of XML Schema documents for DSLs and MGraph the role of the XML Infoset, though the LDG is much more powerful than the simple tree structure of XML.

Once you have built a grammar, you can use the command-line tools to work with your grammar and language files. The m.exe tool compiles grammars into a reusable form. This compiled version of the grammar is similar to the concept of an assembly in the CLR. It is a reusable unit of work that contains a securable version of the rules in the parser.

The other tool is mgx.exe, which parses a language file against a grammar and outputs either MGraph or XAML. The power here is not in the XAML format (or a future XML format) but in the fact that the grammar can be used to create formats that are easy to parse and understand. For example, our small exemplar of our language parsed to create XAML (Extensible Application Markup Language) would look like this:

<?xml version="1.0" encoding="utf-8"?>

<Sequence label="Main">

  <Atom>Start Rule</Atom>

  <Atom>(</Atom>

  <Sequence label="WinnerRule">

    <Atom>Smallest Wins</Atom>

  </Sequence>

  <Atom>)</Atom>

  <Sequence label="Operations">

    <Node>

      <Sequence label="Operation">

        <Atom>Get</Atom>

        <Atom>Coupon</Atom>

        <Sequence label="Noun">

          <Atom>Price</Atom>

        </Sequence>

      </Sequence>

      <Sequence label="Operation">

        <Atom>Get</Atom>

        <Atom>Company</Atom>

        <Sequence label="Noun">

          <Atom>Price</Atom>

        </Sequence>

      </Sequence>

      <Sequence label="Operation">

        <Atom>Get</Atom>

        <Atom>Workshop</Atom>

        <Sequence label="Noun">

          <Atom>DefaultPrice</Atom>

        </Sequence>

      </Sequence>

    </Node>

  </Sequence>

  <Atom>End Rule</Atom>

</Sequence>

This XAML is not the user-interface XAML that is used in WPF or Silverlight, or even the XAML that is used in Workflow Foundation. It is just a serialization format for an in-memory representation of an object graph; in this case, a graph of Terms and Atoms (instead of Windows and Controls).
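Because the output is ordinary XML, it can be read with the standard LINQ to XML classes. The following is a small sketch, not part of the Oslo SDK: the class name ParsedOutputReader is hypothetical, and it assumes the element and attribute names appear exactly as in the listing above, with no XML namespace. Output from other versions of the tools may be shaped differently.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper for illustration; it only relies on System.Xml.Linq.
public static class ParsedOutputReader
{
    // Returns (source, noun) pairs, one for each "Operation" sequence in the XML.
    public static List<KeyValuePair<string, string>> ReadOperations(string xml)
    {
        XDocument doc = XDocument.Parse(xml);

        return doc.Descendants("Sequence")
                  .Where(s => (string)s.Attribute("label") == "Operation")
                  .Select(op => new KeyValuePair<string, string>(
                      // The Atoms directly under an Operation are "Get", then the source.
                      op.Elements("Atom").ElementAt(1).Value,
                      // The noun is the single Atom inside the nested "Noun" sequence.
                      op.Elements("Sequence")
                        .Single(n => (string)n.Attribute("label") == "Noun")
                        .Element("Atom").Value))
                  .ToList();
    }
}
```

For the listing above, this yields the pairs Coupon/Price, Company/Price and Workshop/DefaultPrice, in document order.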

As you look at DSLs in the larger Oslo context, a standard methodology here will be to translate your DSL files to MGraph then store them in the Oslo Repository. Oslo-based DSLs are a natural way to build languages that drive your model-driven applications.

The Oslo toolset is built with the same APIs that are being exposed by the Oslo team for use in your own applications. While the toolset seems to imply that parsing DSLs is a build-time operation, there are situations where reading DSLs at runtime is appropriate. Again, you can think of this much like the XML space, where you can parse and interpret the XML at runtime but there may be tools that perform actions at build-time as well. For example, the DynamicParser is a managed class exposed directly by the Oslo team to allow you to use your DSL at runtime (shown first in Visual Basic, then in C#):

Dim mgStream As Stream = ...

Dim results As CompilationResults = _

  Compiler.Compile(New CompilerOptions With {.Sources = _

    { New TextItem With _

      {.Reader = New StreamReader(mgStream), _

       .ContentType = TextItemType.MGrammar} }})

Dim theParser As DynamicParser = _

  results.ParserFactories("Training.Events.PricingLanguage").Create()

theParser.GraphBuilder = New NodeGraphBuilder()

' ...

Dim topNode As Node = _

  CType(theParser.Parse(New StringTextStream(rulesDefinition), Nothing), Node)

Stream mgStream = ...;

CompilationResults results = Compiler.Compile(

  new CompilerOptions

  {

    Sources = {

      new TextItem {

        Reader = new StreamReader(mgStream),

        ContentType = TextItemType.MGrammar }

    }

  });

DynamicParser theParser =

           results.ParserFactories["Training.Events.PricingLanguage"].Create();

theParser.GraphBuilder = new NodeGraphBuilder();

// ...

Node topNode = (Node)theParser.Parse(new StringTextStream(rulesDefinition), null);

With the API, you can not only load a parser but also walk the tree of elements at runtime (again in Visual Basic, then in C#):

' Go through each of the sequences and make sense of the

' rules

For Each node As Node In topNode.ViewOrderedNodes()

  If node.Brand.Text = "WinnerRule" Then

       ' Get the determinator

       Dim determinator As String = CStr(node.ViewAllNodes().Single().AtomicValue)

       Select Case determinator

         Case "First Wins"

              container.WinningRule = PriceRuleContainer.WinningBasis.FirstWins

         Case "Smallest Wins"

              container.WinningRule = PriceRuleContainer.WinningBasis.SmallestWins

       End Select

  ElseIf node.Brand.Text = "Operations" Then

       ' Get the operations (nested ViewAllNodes calls dig into the structure)

       For Each operations As Node In node.ViewAllNodes()

         For Each operation As Node In operations.ViewAllNodes()

              Dim getter As New PriceRuleGetter()

              getter.Source = CStr(operation.ViewAllNodes().ElementAt(1).AtomicValue)

              getter.PriceType = CStr(operation.ViewAllNodes() _

                                              .ElementAt(2) _

                                              .ViewAllNodes()  _

                                              .Single() _

                                              .AtomicValue)

              container.Getters.Add(getter)

         Next operation

       Next operations

  End If

Next node

// Go through each of the sequences and make sense of the

// rules

foreach (Node node in topNode.ViewOrderedNodes())

{

  if (node.Brand.Text == "WinnerRule")

  {

    // Get the determinator

    string determinator = (string)node.ViewAllNodes().Single().AtomicValue;

    switch (determinator)

    {

      case "First Wins":

        container.WinningRule = PriceRuleContainer.WinningBasis.FirstWins;

        break;

      case "Smallest Wins":

        container.WinningRule = PriceRuleContainer.WinningBasis.SmallestWins;

        break;

    }

  }

  else if (node.Brand.Text == "Operations")

  {

    // Get the operations (nested ViewAllNodes calls dig into the structure)

    foreach (Node operations in node.ViewAllNodes())

    {

      foreach (Node operation in operations.ViewAllNodes())

      {

        PriceRuleGetter getter = new PriceRuleGetter();

        getter.Source = (string)operation.ViewAllNodes().ElementAt(1).AtomicValue;

        getter.PriceType = (string)operation.ViewAllNodes()

                                            .ElementAt(2)

                                            .ViewAllNodes()

                                            .Single()

                                            .AtomicValue;

        container.Getters.Add(getter);

      }

    }

  }

}

While this code works, how you use the resulting data created by the DSL is completely up to you. You may also get a parsed tree from other sources, such as a generated XAML document or the Repository.
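As one possible use of that data, the sketch below evaluates a populated rule container to arrive at a price. The article uses the names PriceRuleContainer, PriceRuleGetter and WinningBasis but does not show their definitions, so the shapes here are assumptions, as are the PriceRuleEvaluator class and the lookup delegate that stands in for the database queries. The reading of "First Wins" as "the first source that has a price" is also an assumption.

```csharp
using System;
using System.Collections.Generic;

// Assumed shapes: the article names these types but does not define them.
public class PriceRuleGetter
{
    public string Source { get; set; }
    public string PriceType { get; set; }
}

public class PriceRuleContainer
{
    public enum WinningBasis { FirstWins, SmallestWins }

    public WinningBasis WinningRule { get; set; }
    public List<PriceRuleGetter> Getters { get; } = new List<PriceRuleGetter>();
}

// Hypothetical evaluator for illustration only.
public static class PriceRuleEvaluator
{
    // 'lookup' is a stand-in for the real data access. It returns null when a
    // source (for example an unknown coupon) has no price to offer.
    public static decimal? Evaluate(PriceRuleContainer container,
                                    Func<string, string, decimal?> lookup)
    {
        decimal? best = null;

        foreach (PriceRuleGetter getter in container.Getters)
        {
            decimal? price = lookup(getter.Source, getter.PriceType);
            if (price == null) continue;   // this source does not apply

            if (container.WinningRule == PriceRuleContainer.WinningBasis.FirstWins)
                return price;              // first available price wins

            if (best == null || price < best)
                best = price;              // keep the smallest price seen so far
        }

        return best;                       // null if no source produced a price
    }
}
```

Keeping the data access behind a delegate means the same evaluator can be exercised with fixed values, without a database, when checking a rule file with stakeholders.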

The Oslo stack allows you to not only build capable DSLs but also use models built with your DSLs at any point in your applications you need, whether it is at compile-time, build-time, deployment-time or run-time. Learning the full breadth of the DSL stack in Oslo is not a trivial task, but it is time well invested as you start to change the way you think about metadata in your .NET applications.

Where we are…

Encoding the intent of stakeholders is a challenge, no matter the size of the project. The goal is not only to collect and understand the intent but also to keep the ability to certify that the code matches it. One way to make this happen is to use domain specific languages to express the rules and logic in a simple form that the stakeholders can understand and validate, if not edit themselves.

Oslo makes building these DSLs easier, but how do these little languages affect the everyday work of developers? Instead of replacing the work of developers, DSLs present the opportunity to work on the interesting part of the problem space, not just the rote code that many developers are forced to hand-code or generate from metadata. Because Oslo empowers developers to use metadata driven from DSLs anywhere in their applications, developers can be more productive and create agile, scalable systems that are a better representation of what the customer actually wants, not what the developer thought the customer wanted.

This article has introduced the basic concepts behind building DSLs in Oslo. In the next two parts of this series, we will build our own DSL using the Oslo toolset and use the DSL-driven model in an application.

Resources

  • Microsoft’s Data Developer Center
  • On Little Languages: Jon Bentley, "Little Languages", Communications of the ACM, 29(8):711-721, August 1986.
  • My Blog