Tokens: Handling Variable Fields (MGrammar)

Article
09/22/2010

[This content is no longer valid. For the latest information on "M", "Quadrant", SQL Server Modeling Services, and the Repository, see the Model Citizen blog.]

This tutorial builds on the preceding ones, Hello World (MGrammar) and Handling Spaces (MGrammar). In this tutorial, you tokenize the input: you use the token keyword to identify the fixed parts of the input text (keywords or field names) and to specify the conditions that the variable parts of the text (field values) must satisfy. You also learn how to make the domain-specific language (DSL) accept more than a single line of input text.

This tutorial builds a DSL that parses a text file that consists of object model information that can be generated by using the .NET Framework Reflection API against an assembly. The text file obeys the following rules:

Each line starts with the string TYPE.
The line contains the string Name= followed by the name of the type.
After the Name field, there is a string Access= followed by one of the following strings: public, private, protected, internal.
The Access field is followed by a string Email= followed by an e-mail address.

A typical file might look like this.

```
TYPE Name=System.String Access=public Email=janedoe@contoso.com
TYPE Name=System.Integer32 Access=private Email=bbrown@contoso.com
TYPE Name=System.Byte Access=public Email=johndoe@contoso.com
TYPE Name=System.Boolean Access=public Email=janedoe@contoso.com
```

You will learn to do the following in this tutorial:

Add token rules that enable you to identify the keywords in the input.
Add token rules that define the allowable characters for variable fields.
Modify the grammar rules so that the DSL parses multiple input lines.

To show a DSL that recognizes one line of input

The following DSL was created in the preceding tutorial, Handling Spaces (MGrammar). It parses a single line of input and allows an arbitrary number of spaces between the fields. Open Intellipad and add the following code.

```
module Types
{
   language Parser
   {
        syntax Main = Type  Name  Access  Email;
        syntax Type = "TYPE";
        syntax Name = "Name=System.String";
        syntax Access = "Access=public";
        syntax Email = "Email=janedoe@contoso.com";
        syntax Space = "\u0020";
        interleave Whitespace = Space;
   }
}
```

To test this language, enter the following line into the left "DSL Input Mode" pane.
```
TYPE Name=System.String Access=public Email=janedoe@contoso.com 
```
Change the input (in the left-most pane) by changing the type name to System.Integer32. Note that errors are generated. Press CTRL+Z to restore the previous valid input.

To handle different field values

First, split the Name rule into two parts: a token rule that identifies the "Name=" string, and a second rule that specifies what a name should look like. Add the first token rule with the following code.
```
        token NameLit = "Name=";
```
Next, define what a type name must look like by defining which characters are allowed. Type names contain alphabetic and numeric characters, and also the "." character. The following rule says that a Char is any one of those characters.
```
        token Char = 
                    "A".."Z" 
                    | "a".."z" 
                    | "0".."9" 
                    | ".";
```
Note the use of the "|" character to specify "or", and the use of the range operator "..". We can specify a string of these characters by using the "+" operator in the following code.
```
        token chs = Char+;
```
The chs rule recognizes a string of one or more of the characters allowed by the Char rule.
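To make the effect of the Char and chs rules concrete, the following Python sketch mirrors them as a regular expression. It is an illustration only, not "M" code: the names CHAR, CHS, and is_chs are hypothetical, and the regular expression is assumed to accept the same strings as the token rules.

```python
import re

# Mirrors: token Char = "A".."Z" | "a".."z" | "0".."9" | ".";
CHAR = r"[A-Za-z0-9.]"

# Mirrors: token chs = Char+;  (one or more allowed characters)
CHS = re.compile(CHAR + r"+")


def is_chs(text: str) -> bool:
    """Return True when the whole string consists of one or more Char characters."""
    return CHS.fullmatch(text) is not None
```

Under these assumptions, is_chs("System.Integer32") succeeds, while an empty string or a string that contains a space or "@" is rejected.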
Next, specify that the value of a name must conform to the chs rule, with the following code.
```
        syntax NameValue = chs;
```
Finally, change the Name rule to the following.
```
        syntax Name = NameLit NameValue;
```
Note that you can now change the value of the Name field without generating errors.
Now apply this procedure to the Access field. However, instead of allowing the Access field to be as unconstrained as the Name field, restrict it to being one of a set of values. The result is the following code.
```
        syntax Access = AccessLit AccessValue;
        token AccessLit = "Access=";
        token AccessValue = "public" | "private" | "internal" | "protected";
```
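The difference between an open rule such as chs and a closed set of alternatives can be sketched in Python as follows. This is an illustration only, not "M" code: ACCESS_VALUE and is_access_value are hypothetical names, and the regular expression is assumed to accept the same strings as the AccessValue token.

```python
import re

# Mirrors: token AccessValue = "public" | "private" | "internal" | "protected";
ACCESS_VALUE = re.compile(r"public|private|internal|protected")


def is_access_value(text: str) -> bool:
    """Return True only when the whole string is one of the four listed values."""
    return ACCESS_VALUE.fullmatch(text) is not None
```

A value such as "friend" consists only of characters that chs would allow, but it is rejected here because it is not one of the listed alternatives.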

Finally, do the same thing to the Email field, resulting in the following code.

```
        syntax Email = EmailLit EmailValue;
        token EmailLit = "Email=";
        syntax EmailValue = chs;
```

Note that this code generates errors because the chs rule does not allow the "@" character. That character does not appear in type names, so leave the Char rule unchanged and instead replace the EmailValue rule in the preceding fragment with the following code. Note that this is not a general parser for e-mail addresses, which can be considerably more complex.
```
token echs = chs "@" chs;
syntax EmailValue = echs;
```
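The following Python sketch shows which strings a rule of the form chs "@" chs accepts, and why it is not a general e-mail parser. It is an illustration only, not "M" code: CHS, ECHS, and is_echs are hypothetical names, and the regular expression is assumed to accept the same strings as the echs token.

```python
import re

# Mirrors: token chs = Char+;
CHS = r"[A-Za-z0-9.]+"

# Mirrors: token echs = chs "@" chs;  (exactly one "@" between two chs runs)
ECHS = re.compile(CHS + "@" + CHS)


def is_echs(text: str) -> bool:
    """Return True when the whole string has the shape chs "@" chs."""
    return ECHS.fullmatch(text) is not None
```

Under these assumptions, "janedoe@contoso.com" is accepted and a string with no "@" or with two "@" characters is rejected. A string such as "..@.." is also accepted, which shows how loose the rule is compared with real e-mail address syntax.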

To handle multiple lines of input

Replace the input in the left pane with the following two lines. Note the errors that are generated: the parser does not recognize the "return" character ("\r").
```
TYPE Name=System.String Access=public Email=janedoe@contoso.com 
TYPE Name=System.Integer32 Access=private Email=bbrown@contoso.com 
```
Now change the interleave statement to handle returns, and also line feeds ("\n"), with the following code, which replaces the existing interleave statement.
```
        token LF = "\u000A";
        token CR = "\u000D";

        interleave Whitespace = Space | LF | CR;
```
Now the error panel says that the text "TYPE" is unexpected in the second line of input text. This is because the Main rule defines a single type, whereas you really want to specify a collection of one or more types. Replace the Main and Type rules with the following code.
```
        syntax Main = Types;
        syntax Types =
                    Type
                    | Types Type;
        syntax Type = TypeLit Name Access Email;
        token TypeLit = "TYPE";
```
Note the Types rule: this is a common grammar usage for specifying one or more of something.
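The "one or more" reading of the Types rule, together with the extended interleave rule, can be sketched in Python as a loop that matches one Type after another. This is an illustration only, not "M" code: WS, CHS, TYPE_RE, and parse_types are hypothetical names, and the regular expression only approximates the grammar (for example, it backtracks in ways that a tokenizing parser might not).

```python
import re

WS = r"[ \n\r]*"        # mirrors: interleave Whitespace = Space | LF | CR
CHS = r"[A-Za-z0-9.]+"  # mirrors: token chs = Char+

# Mirrors: syntax Type = TypeLit Name Access Email, with optional
# whitespace allowed between every pair of tokens.
TYPE_RE = re.compile(
    WS + "TYPE"
    + WS + "Name=" + WS + "(" + CHS + ")"
    + WS + "Access=" + WS + "(public|private|internal|protected)"
    + WS + "Email=" + WS + "(" + CHS + "@" + CHS + ")"
    + WS
)


def parse_types(text):
    """Return one dict per TYPE entry; raise ValueError on invalid input."""
    records = []
    pos = 0
    # Types = Type | Types Type: keep matching one more Type until the input ends.
    while pos != len(text):
        match = TYPE_RE.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected input at offset {pos}")
        name, access, email = match.groups()
        records.append({"name": name, "access": access, "email": email})
        pos = match.end()
    if not records:
        raise ValueError("expected at least one TYPE entry")
    return records
```

For the two-line input used in this section, parse_types returns two records, one for System.String and one for System.Integer32. Empty input raises ValueError, because the Types rule requires at least one Type.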

Example

The following is the complete “M” code used in this tutorial.

```
module Types
{
   language Parser
   {
        syntax Main = Types;
        syntax Types =
                    Type
                    | Types Type;
        syntax Type     = TypeLit Name Access Email;
        token TypeLit   = "TYPE";
        syntax Name     = NameLit NameValue;
        token NameLit   = "Name=";
        syntax NameValue = chs;
        syntax Access   = AccessLit AccessValue;
        token AccessLit = "Access=";
        token AccessValue =
                            "public"
                          | "private"
                          | "internal"
                          | "protected";
        syntax Email      = EmailLit EmailValue;
        token EmailLit    = "Email=";
        syntax EmailValue = echs;

        token Char  =
                    "A".."Z"
                    | "a".."z"
                    | "0".."9"
                    | ".";
        token chs   = Char+;
        token echs  = chs "@" chs;

        token LF                = "\u000A";
        token CR                = "\u000D";
        token Space             = "\u0020";
        interleave Whitespace   = Space | LF | CR;
   }
}
```
