The material on this page is out of date. For the latest information on SQL Server Modeling-related technologies, i.e. “M”, “Quadrant”, SQL Server Modeling Services and the Repository, read this update.
Modeling in Text
(Edited) Transcript of
Five-Part Video Series by David Langworthy
Editor’s note: in the video series, David Langworthy intentionally talks through every keystroke in his presentations. For brevity, then, this transcript doesn’t follow the video word for word, but generally condenses those coding segments into something more appropriate for reading.
This transcript has been updated for the November 2009 SQL Server Modeling CTP; an update of the videos themselves will be coming soon.
Also note that "M" and "Intellipad" in this document refer to Microsoft code name "M" and Microsoft code name "Intellipad", both part of the SQL Server Modeling CTP technologies.
Hello, my name is Dave Langworthy. I’m an engineer on the SQL Server Modeling team and this is Modeling in Text.
Part 1: Modeling a Language
We’ll start by running "Intellipad" with a new untitled buffer in which we enter a statement defined in some domain:
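[The statement used in the video is along these lines; the exact name and age are illustrative:]

```
Chris is 26 years old
```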
Now we’re going to model this statement, switching to "three-pane" mode with Ctrl+Shift+T. [To do this, you’ll need a blank grammar file called contacts.mg. This can be easily created in Notepad or "Intellipad" by saving a blank file/buffer as contacts.mg.] When prompted for a grammar file, select the empty contacts.mg.
In three-pane mode we have input statements on the left, the model in the middle, and the output on the right. In the middle pane, all models start with a module statement, and we’re going to model a language called Contacts:
A language consists of a collection of rules for recognizing syntax, and it always begins with the distinguished syntax rule called Main:
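[Editor's reconstruction of the middle pane at this point; the Main rule simply matches the literal statement typed in the input pane:]

```
module Contacts {
    language Contacts {
        syntax Main = "Chris is 26 years old";
    }
}
```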
So now I have a statement, a model that recognizes that statement, and some data in the output pane that is extracted from that statement:
This works, but it is very brittle: if we change anything in the statement (like "Chris" to "Pat") we get tons of errors because the model I’ve created recognizes only one exact statement.
So we want to parameterize the model to accept many statements of this form (different names and different ages). First we want to break the statement up into different pieces called tokens [the technical term for this is "tokenizing"—a token is something that’s indivisible, that is, can’t be broken into smaller parts]:
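[Reconstructed from the narration, the tokenized rule looks something like this:]

```
syntax Main = "Chris" "is" "26" "years" "old";
```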
The data (in the output pane) gets broken apart, but we have errors showing in the input pane because of whitespace. [And note that just by creating a model of the language we automatically get a syntax-directed editor in the input pane as you’d expect in a modern integrated development environment.] What we need is an interleave rule that just says to ignore the whitespace:
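[As described, the interleave rule ignores a single space for now:]

```
interleave Whitespace = " ";
```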
With that we should get our data back, now broken up into little pieces. We’re still brittle, though, as changing Chris to Pat produces errors. So let’s parameterize to accept any name and any age, creating token rules for them:
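[Reconstructed; the two token rules and the updated Main rule:]

```
token Age = "0".."9"+;
token Name = ("a".."z" | "A".."Z")+;
syntax Main = Name "is" Age "years" "old";
```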
Age is just one or more numbers [".." is the range operator meaning "through" and "+" means one or more; zero or more is "*"]. Name is much the same, one or more ("+") of lowercase ‘a’ through ‘z’ or (|) uppercase ‘A’ through ‘Z’. [Obviously this is not robust for names that use extended characters, but it shows the idea.]
Now we can go and use any name and any age in the input pane, such as "Pat is 23 years old" and it flows to the output.
We have a model now that’s somewhat flexible but if we input multiple statements we don’t get any data out and we have some errors:
The reason (as shown in the error pane) is we’re not ignoring other kinds of whitespace, like carriage returns and line feeds, so we need to add those to the Whitespace interleave:
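[The updated interleave rule, reconstructed:]

```
interleave Whitespace = " " | "\r" | "\n";
```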
The remaining error is because the model is written to only accept one statement. So I’m going to create another rule to factor the Main syntax rule into two, one called People so that Main is then a collection (*) of People:
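[Reconstructed; Main becomes a collection of the new People rule:]

```
syntax Main = People*;
syntax People = Name "is" Age "years" "old";
```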
Now the model recognizes all of those statements and produces all the appropriate data in the output pane:
Now much of the output data is rather noisy, and we want to get rid of the extra repeated information (leaving just the name and the age). We’ll do that with a projector (=>):
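[Reconstructed from the narration; the exact projection syntax may differ slightly in your CTP:]

```
syntax People = n:Name "is" a:Age "years" "old"
                => { Name => n, Age => a };
```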
where n and a are variables that we bind to the tokens as with n:Name and a:Age. This gives us a cleaner set of data that looks a lot more like records [entities or name-value pairs]:
One more thing now, the collection has a somewhat arbitrary name, so I’m going to name that People, where valuesof is simply how we reference a collection:
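[Reconstructed:]

```
syntax Main = p:People* => People { valuesof(p) };
```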
So we have our statements in a domain, a model of that domain, and standardized data output that’s “M” instance data with which we can do all sorts of things. And that’s what we’ll do next, after we save the input as contacts.dsl and save contacts.mg.
[Here also is the completed contacts.mg:]
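[Editor's reconstruction, assembled from the steps above; details of the projection syntax may vary by CTP:]

```
module Contacts {
    language Contacts {
        syntax Main = p:People* => People { valuesof(p) };
        syntax People = n:Name "is" a:Age "years" "old"
                        => { Name => n, Age => a };
        token Age = "0".."9"+;
        token Name = ("a".."z" | "A".."Z")+;
        interleave Whitespace = " " | "\r" | "\n";
    }
}
```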
Part 2: Data
In Part 1 we modeled a domain-specific language. Now we’ll go to the command line tools and work with that data. We have two files at this point, contacts.mg (the grammar) and contacts.dsl (the language input code). Note that title.txt is the title slide for the beginning of this video.
The first thing we need to do is compile the language with the m.exe compiler from a standard command shell:
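[From a command shell with the CTP tools on the path; this produces contacts.mx in the current directory:]

```
m.exe contacts.mg
```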
This will produce an image file (contacts.mx) that we can then use with the image utility, mgx.exe, to translate that .dsl file into different forms that we might be interested in:
This compiles the input file (contacts.dsl) into XAML, referencing the compiled image (contacts.mx). [The XAML is omitted here for brevity.]
We can also send this same data to SQL via "M", compiling it again using the defaults.
This produces a contacts.m file:
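[The generated file looks roughly like this; the second name and the ages are illustrative. Note that the Age values are quoted, since the grammar extracted them as text tokens; this matters in Part 4:]

```
module Contacts {
    People {
        { Name = "Chris", Age = "26" },
        { Name = "Pat", Age = "23" }
    }
}
```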
Here we see an “M” program that’s been generated from the DSL. It has a module, Contacts, and People, an extent containing records with the names and ages. We can send this on to SQL using the “M” compiler:
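[Again from the command shell:]

```
m.exe contacts.m
```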
This produces another image file (contacts.mx) that we can then load into a new database using the mx.exe tool: [Note: to run this command with SQL Server Express add "/s:.\SQLEXPRESS" at the end of the line.]
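[Reconstructed from the narration; check mx.exe's help for the exact argument spelling and order:]

```
mx.exe /c /d:mayctp contacts.mx
```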
where /c instructs mx.exe to create a database using the name specified with /d. Switching over to SQL Management Studio (or any tool of your choice) we can see what’s there; we see our table under Databases\mayctp\Tables\Contacts.People (along with some other catalog tables that we’ll discuss later).
Selecting the Edit Top 200 Rows menu item on this table we see all our data in SQL now, which is great! We can then write a little query over that to extract the names:
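[Something like the following in a SQL query window:]

```
select [Name] from [Contacts].[People]
```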
[In code name "Quadrant", you can open the NovCTP database using the File | New | Session command, then select File | New | Workpad and just enter "contacts.People" in the query bar for the same results.]
Remember here that all we typed in "Intellipad" was the contacts.dsl file and the small contacts.mg grammar: just the domain statements and the model. We then simply utilized the command line tools to bring it all into SQL where we could write transformations. Turns out we can also write these transformations in "M" itself, which we’ll do in Part 3.
Part 3: Transformation
In Parts 1 and 2 we modeled a language, got some data out (using an input file in that language), and sent it off to SQL. Now we’ll write some transformations in "M" itself.
[After closing all other open files in "Intellipad"] I’ll first open contacts.m, which we generated in Part 2, and I’m going to write the same query we saw in SQL directly in “M”, using LINQ-style syntax [adding it just before the last curly brace]:
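[Reconstructed; the function name Names is illustrative:]

```
Names()
{
    from p in People
    select p.Name
}
```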
[In a real project, by the way, you’d usually have a project file, with the language in one file, the transformations in another, and the constraints in yet another. But for this demonstration we’ll just have the transformation directly in this one file.]
If we were to take this and run it all the way through the toolchain, it would show up in SQL as a view. Fortunately we don’t have to do that. In "Intellipad" we can go to the "M Mode" menu and select "T-SQL Preview", and that opens a pane showing all the SQL that the module would generate.
[Note: in the May 09 CTP (and the video), there are multiple insert into clauses each with a single value; in the November 09 CTP the generated T-SQL contains a single insert into clause with multiple values as shown here. The create schema clause also appears differently.]
Let’s take a second to see how things map from "M" to SQL. The module name Contacts projects as the schema name, the extent name People projects as a table name, and each field in an entity projects as a column in the table, as you would expect. Down below you see the insert statements for each value in the "M" file, and we see that the "computed value" (the transformation or function) becomes a view with exactly the SQL you’d expect.
Now "M" support a more compact form for common queries, so we can write the transformation like this:
where value (called "infix select") ranges over each element of the collection and produces almost exactly the same SQL as before. [After the select statement you can also have any complex expression you want.] [Note: in the Nov 09 CTP the select statement in the view changes from select [p].[Name] as [Item] to select [$value].[Name] as [Item].]
Functions can also take parameters:
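[Reconstructed; the function name PeopleNamed is illustrative:]

```
PeopleNamed(n : Text)
{
    from p in People
    where p.Name == n
    select p
}
```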
This becomes a table-valued function in the generated SQL: [there are slight differences here between the May 09 CTP and the Nov 09 CTP, but they are cosmetic.]
Again we have a compact form for this:
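[Reconstructed:]

```
PeopleNamed(n : Text) { People where value.Name == n }
```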
and the function is the same as before with just a slight variance in variable names: [with slight cosmetic differences between the May and Nov CTPs]
And we can go even further with this: for these common cases, value can be put directly on the collection:
which will pull out the values from People where Name is equal to the value of the parameter n. We can do the same with extracting the whole column:
This results in the same view as before, just again with slightly different variable names. In these compact forms, the final SQL is as follows: [again with slight cosmetic differences between the May and Nov CTPs]
This is the core of transformations, and in Part 4 we’ll talk about constraints.
Part 4: Constraints
Now let’s look at constraints. The data we have so far is just data in the wild, coming to us in whatever form it has (and we’ve transformed it and generated SQL without question). But there are many reasons why we might constrain the data: perhaps to get a more efficient storage representation, or just to say that we expect all the information to be in a certain form.
For example, right now our model just loves all data, so we can have:
It’s all data, it’s fine, we love it, but Age shows up in SQL as a sql_variant type which is inefficient and can be difficult to work with. So let’s put a constraint on it that says the Name field has to be text and the Age field an integer. At the top of the module, then, we add this: [The second set of curly braces between the first colon and the last semicolon are needed in the Nov 09 CTP but not the May 09 CTP]
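[Reconstructed; Text and Integer32 are “M” intrinsic types:]

```
People : {{
    Name : Text;
    Age : Integer32;
}*};
```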
where the * means that People is zero or more of the Name/Age value pairs. With just this we’ll have some errors because of the quote marks around the Age values: the data does not "fit" that constraint. Fixing those will produce the SQL again. Now, instead of sql_variant, the Name and Age columns project with specific text and integer types, and the insert statements have all been updated appropriately. And that’s a simple constraint.
[For reference, here is the full contacts.m at this point; the Nov 09 CTP version]
Part 5: Identity and Relationships
So far we’ve been dealing with isolated facts, all of which is nice, but what’s really interesting about data is the relationships within that data.
In order to refer to a piece of data we need an identity, so we’ll add an identity field to People with a constraint: [second set of curly braces added for Nov 09 CTP]
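[Reconstructed; the exact spelling of the identity clause varied across CTPs:]

```
People : {{
    Id : Integer64 = AutoNumber();
    Name : Text;
    Age : Integer32;
}*} where identity Id;
```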
Note that the where clause can be used in constraints just as in queries, and AutoNumber() creates an auto-incrementing value, as you’d often use for an identity column.
[In SQL we see this projecting as a primary key in the last line of the create table statement:]
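[The create table statement is roughly as follows; the column types and constraint name are approximate:]

```
create table [Contacts].[People]
(
    [Id] bigint not null identity,
    [Age] int not null,
    [Name] nvarchar(max) not null,
    constraint [PK_People] primary key clustered ([Id])
);
```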
We can now create a relationship for these to participate in. We’ll call this relationship Marriages: [extra curly braces needed for Nov 09 CTP]
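[Reconstructed:]

```
Marriages : {{
    SpouseA : People;
    SpouseB : People;
}*};
```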
and out of this we get a table whose fields have the identity type, plus foreign key constraints, because the SpouseA and SpouseB fields must come from the People extent.
(Notice that I used an extent name as a type which says both what needs to go in there and also where it is.)
With that we can start putting values into Marriages, and I’d like to say that Pat is married to Chris. In "M" we can label fields:
and so on, so we can then type this:
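[Reconstructed; the labeling and reference syntax here is approximate, and the ages are illustrative:]

```
People {
    Chris { Name = "Chris", Age = 26 },
    Pat { Name = "Pat", Age = 23 }
}
Marriages {
    { SpouseA = People.Chris, SpouseB = People.Pat }
}
```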
The compiler will figure out the order of the SQL insert statements, by the way, so despite the fact that we’ve initialized the Marriages extent before People, the insertions into People occur in the generated SQL before this insert into Marriages: [exact T-SQL differs from May to Nov CTP; Nov CTP version shown here]
Now I want to do one more thing, which is to use more sophisticated constraints to ensure that the people participating in a marriage are not married to themselves:
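[Reconstructed:]

```
Marriages : {(
    { SpouseA : People; SpouseB : People; }
    where value.SpouseA != value.SpouseB
)*};
```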
Note the use of parentheses to make the * bind to the entire expression. In the SQL we get a function that checks the constraint for us and adds that check constraint to the table.
We could, of course, put other constraints on the relationship as well, but for now, that's the end of our overview of Modeling in Text.