Microsoft Corporation
May 2009
[This documentation targets the Microsoft "Oslo" May
2009 CTP and is subject to change in future releases. Blank topics are included
as placeholders.]
Sections:
1: Introduction to "M"
2: Lexical Structure
3: Text Pattern Expressions
4: Productions
5: Rules
6: Languages
7: Types
8: Computed and Stored Values
9: Expressions
10: Module
11: Attributes
12: Catalog
13: SQL Mapping
14: Glossary
>>>>>>>1 Introduction
The "Oslo" Modeling Language ("M") is a language for modeling
domains using text. A domain is any collection of related concepts or objects.
Modeling a domain consists of selecting certain characteristics to include in the
model and implicitly excluding others deemed irrelevant. Modeling using text
has some advantages and disadvantages over modeling using other media such as
diagrams or clay. A goal of the M language is to exploit these advantages and
mitigate the disadvantages.
A key advantage of modeling in text is the ease with which both computers and
humans can store and process text. Text is
often the most natural way to represent information for presentation and
editing by people. However, the ability to extract that information for use by
software has been an arcane art practiced only by the most advanced
developers. The language feature of M
enables information to be represented in a textual form that is tuned for both
the problem domain and the target audience. The M language provides simple
constructs for describing the shape of a textual language – that shape includes
the input syntax as well as the structure and contents of the underlying
information. To that end, M acts both as a schema language that can validate
that textual input conforms to a given language and as a transformation
language that projects textual input into data structures that are amenable to
further processing or storage.
M builds on four basic concepts:
- Language – a collection of rules that recognize free text and produce a
structured representation of relevant concepts in the text.
- Data – a sparse textual representation of information amenable to automated
storage, transformation and communication.
- Constraint – a rule that recognizes specific structure and relationships
within data.
- Transformation – a mapping between source data and result data.
>>>>>1.1 Language
>>1.1.1 Basics
An M language definition consists of one or more named rules, each of which
describes some part of the language. The following fragment is a simple language
definition:
language HelloLanguage {
syntax Main = "Hello, World";
}
The
language being specified is named HelloLanguage and it is described by one rule named Main. A language may contain
more than one rule; the name Main is used to designate the initial rule that all
input documents must match in order to be considered valid with respect to the
language.
Rules use patterns to describe the set of input values that the rule applies
to. The Main rule above has only one pattern, "Hello, World", that describes
exactly one legal input value:
Hello, World
If
that input is fed to the M processor for this language, the processor will
report that the input is valid. Any other input will cause the processor to
report the input as invalid.
Typically,
a rule will use multiple patterns to describe alternative input formats that
are logically related. For example, consider this language:
language PrimaryColors {
syntax Main = "Red" | "Green" | "Blue";
}
The
Main
rule has three patterns – input must conform to one of these patterns in order
for the rule to apply. That means that the following is valid:
Red
as
well as this:
Green
and
this:
Blue
No
other input values are valid in this language.
Most patterns in the wild are more expressive than those we’ve seen so far –
most patterns combine multiple terms. Every pattern consists of a sequence of
one or more grammar terms, each of which describes a set of legal text values.
Pattern matching has the effect of consuming the input as it sequentially
matches the terms in the pattern. Each term in the pattern consumes zero or more
initial characters of input – the remainder of the input is then matched against
the next term in the pattern. If any term in a pattern cannot be matched,
the consumption is “undone” and the original input will be used as a candidate
for matching against other patterns within the rule.
A
pattern term can either specify a literal value (like in our first example) or
the name of another rule. The following language definition matches the same
input as the first example:
language HelloLanguage2 {
syntax Main = Prefix ", " Suffix;
syntax Prefix = "Hello";
syntax Suffix = "World";
}
Like
functions in a traditional programming language, rules can be declared to
accept parameters. A parameterized rule declares one or more “holes” that must
be specified to use the rule. The following is a parameterized rule:
syntax Greeting(salutation, separator) = salutation separator "World";
To
use a parameterized rule, one simply provides actual rules as arguments to be
substituted for the declared parameters:
syntax Main = Greeting(Prefix, ", ");
A
given rule name may be declared multiple times provided each declaration has a
different number of parameters. That is, the following is legal:
syntax Greeting(salutation, sep, subject) = salutation sep subject;
syntax Greeting(salutation, sep) = salutation sep "World";
syntax Greeting(sep) = "Hello" sep "World";
syntax Greeting = "Hello" ", " "World";
The
selection of which rule is used is determined based on the number of arguments
present in the usage of the rule.
A
pattern may indicate that a given term may match repeatedly using the standard
Kleene operators (e.g., ?, *, and +). For example, consider this language:
language HelloLanguage3 {
syntax Main = Prefix ", "? Suffix*;
syntax Prefix = "Hello";
syntax Suffix = "World";
}
This
language considers the following all to be valid:
Hello
Hello,
Hello, World
Hello, WorldWorld
HelloWorldWorldWorld
Terms
can be grouped using parentheses to indicate that a group of terms must be
repeated:
language HelloLanguage3 {
syntax Main = Prefix (", " Suffix)+;
syntax Prefix = "Hello";
syntax Suffix = "World";
}
which
considers the following to all be valid input:
Hello, World
Hello, World, World
Hello, World, World, World
The
use of the + operator indicates that the group of terms must match at least
once.
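The repetition operators behave much like their counterparts in regular expressions. As a rough illustration only (this is Python rather than M, and it works only because these two particular grammars happen to be expressible as regular expressions), the two HelloLanguage3 definitions can be approximated as follows. The names OPTIONAL_AND_STAR, GROUP_PLUS, and is_valid are invented for this sketch.

```python
import re

# Python regular expressions standing in for the two HelloLanguage3 grammars.
# (?: ... ) groups terms the way parentheses do in an M pattern.
OPTIONAL_AND_STAR = re.compile(r"Hello(?:, )?(?:World)*")  # Prefix ", "? Suffix*
GROUP_PLUS = re.compile(r"Hello(?:, World)+")              # Prefix (", " Suffix)+


def is_valid(pattern, text):
    """Return True when the whole input matches the pattern."""
    return pattern.fullmatch(text) is not None
```

With these definitions, is_valid(GROUP_PLUS, "Hello") is false because the + operator requires the group to match at least once, whereas OPTIONAL_AND_STAR accepts "Hello" on its own.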
>>>>>1.1.2 Character Processing
In
the previous examples of the HelloLanguage, the pattern term for the comma
separator included a trailing space. That trailing space was significant, as it
allowed the input text to include a space after the comma:
Hello, World
More
importantly, the pattern indicates that the space is not only allowed, but is
required. That is, the following input is not valid:
Hello,World
Moreover, exactly one space is required, making this input (which has two
spaces after the comma) invalid as well:
Hello,  World
To
allow any number of spaces to appear either before or after the comma, we could
have written the rule like this:
syntax Main = 'Hello' ' '* ',' ' '* 'World';
While
this is correct, in practice most languages have many places where secondary
text such as whitespace or comments can be interleaved with constructs that are
primary in the language. To simplify specifying such languages, a language may
specify one or more named interleave patterns.
An
interleave pattern specifies text streams that are not considered part of the
primary flow of text. When processing input, the M processor implicitly injects
interleave patterns between the terms in all syntax patterns. For example,
consider this language:
language HelloLanguage {
syntax Main = "Hello" "," "World";
interleave Secondary = " "+;
}
This
language now accepts any number of whitespace characters before or after the
comma. That is,
Hello,World
Hello, World
Hello , World
are
all valid with respect to this language.
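One way to picture interleave injection is as a rewrite that places an optional gap between every pair of adjacent terms. The Python sketch below models that idea with regular expressions. It is an illustration under assumptions, not a description of how an M processor is implemented: with_interleave is a hypothetical helper, the terms are treated as plain literals, and gaps are inserted only between terms, as the text describes.

```python
import re


def with_interleave(terms, interleave=r" +"):
    """Build a regex allowing optional interleave text between literal terms."""
    gap = f"(?:{interleave})?"  # the interleave pattern may or may not appear
    return re.compile(gap.join(re.escape(term) for term in terms))


# syntax Main = "Hello" "," "World";  interleave Secondary = " "+;
MAIN = with_interleave(["Hello", ",", "World"])
```

Because the gap is injected only between terms, text inside a single literal (for example "Hel lo") is still rejected.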
Interleave
patterns simplify defining languages that have secondary text like whitespace
and comments. However, many languages have constructs in which such
interleaving needs to be suppressed. To specify that a given rule is not
subject to interleave processing, the rule is written as a token rule rather
than a syntax rule.
Token
rules identify the lowest level textual constructs in a language – by analogy
token rules identify words and syntax rules identify sentences. Like syntax
rules, token rules use patterns to identify sets of input values. Here’s a
simple token rule:
token BinaryValueToken = ("0" | "1")+;
It
identifies sequences of 0 and 1 characters much like this similar syntax rule:
syntax BinaryValueSyntax = ("0" | "1")+;
The main distinction between the two rules is that interleave patterns do not
apply to token rules. That means that if the following interleave rule were in
effect:
interleave IgnorableText = " "+;
then
the following input value:
0 1011 1011
would
be valid with respect to the BinaryValueSyntax rule but not with respect to the
BinaryValueToken rule, as interleave patterns do not apply to token rules.
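The difference can be sketched with two Python regular expressions. This is only a model of the behavior described above: the token rule sees the raw characters with no interleaving, while the syntax rule is written here with optional spaces between its terms. The constant names are invented for the sketch.

```python
import re

# token BinaryValueToken = ("0" | "1")+;   interleave does not apply
BINARY_VALUE_TOKEN = re.compile(r"[01]+")

# syntax BinaryValueSyntax = ("0" | "1")+; with interleave IgnorableText = " "+
# modeled as optional spaces between successive digit terms
BINARY_VALUE_SYNTAX = re.compile(r"[01](?: *[01])*")
```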
M
provides a shorthand notation for expressing alternatives that consist of a
range of Unicode characters. For example, the following rule:
token AtoF = "A" | "B" | "C" | "D" | "E" | "F";
can be rewritten using the range operator as follows:
token AtoF = "A".."F";
Ranges and alternation can compose to specify multiple non-contiguous ranges:
token AtoGnoD = "A".."C" | "E".."G";
which is equivalent to this longhand form:
token AtoGnoD = "A" | "B" | "C" | "E" | "F" | "G";
Note
that the range operator only works with text literals that are exactly one
character in length.
The
patterns in token rules have a few additional features that are not valid in
syntax rules. Specifically, token patterns can be negated to match anything not
included in the set, by using the difference operator (-). The following
example combines difference with any. Any matches any single character. The
expression below matches any character that is not a vowel:
any - ('A'|'E'|'I'|'O'|'U')
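Ranges, alternation, and difference can all be read as operations on sets of single characters. The following Python sketch models them that way. The helper names char_range and any_except are hypothetical, and treating any as "every single character" is an assumption of the sketch.

```python
def char_range(first, last):
    """Model "A".."F": every character from first to last, inclusive."""
    return {chr(code) for code in range(ord(first), ord(last) + 1)}


A_TO_F = char_range("A", "F")
A_TO_G_NO_D = char_range("A", "C") | char_range("E", "G")  # alternation as union


def any_except(excluded):
    """Model `any - (...)`: accept any single character not in excluded."""
    return lambda ch: len(ch) == 1 and ch not in excluded


not_vowel = any_except(set("AEIOU"))
```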
Token
rules are named and may be referred to by other rules:
token AorBorCorEorForG = (AorBorC | EorForG)+;
token AorBorC = 'A'..'C';
token EorForG = 'E'..'G';
Because
token rules are processed before syntax rules, token rules cannot refer to
syntax rules:
syntax X = "Hello";
token HelloGoodbye = X | "Goodbye"; // illegal
However, syntax rules may refer to token rules:
token X = "Hello";
syntax HelloGoodbye = X | "Goodbye"; // legal
The M processor treats all literals in syntax patterns as anonymous token
rules. That means that the previous example is equivalent to the following:
token X = "Hello";
token temp = "Goodbye";
syntax HelloGoodbye = X | temp;
Operationally,
the difference between token rules and syntax rules is when they are processed.
Token rules are processed first against the raw character stream to produce a
sequence of named tokens. The M processor then processes the language’s syntax
rules against the token stream to determine whether the input is valid and
optionally to produce structured data as output. The next section describes how
that output is formed.
>>>>>1.1.3 Output
M processing transforms text into structured data. The shape and content of
that data are determined by the syntax rules of the language being processed.
Each syntax rule consists of a set of productions, each of which consists of a
pattern and an optional projection. Patterns were discussed in the previous
sections and describe a set of legal character sequences that are valid input.
Projections describe how the information represented by that input should be
produced.
Each
production is like a function from text to structured data. The primary way to
write projections is to use a simple construction syntax that produces
graph-structured data suitable for programs and stores. For example, consider
this rule:
syntax Rock = "Rock" => Item { Heavy { true }, Solid { true } };
This
rule has one production that has a pattern that matches "Rock" and a
projection that produces the following value (using a notation known as M graphs):
Item
{
Heavy { true },
Solid { true }
}
Rules
can contain more than one production in order to allow different input to
produce very different output. Here’s an example of a rule that contains three
productions with very different projections:
syntax Contents
= "Rock" => Item { Heavy { true }, Solid { true } }
| "Water" => Item { Consumable { true }, Solid { false } }
| "Hamster" => Pet { Small { true }, Legs { 4 } } ;
When
a rule with more than one production is processed, the input text is tested
against all of the productions in the rule to determine whether the rule
applies. If the input text matches the pattern from exactly one of the rule’s
productions, then the corresponding projection is used to produce the result.
In this example, when presented with the input text "Hamster", the rule
would yield:
Pet
{
Small { true },
Legs { 4 }
}
as
a result.
To
allow a syntax rule to match no matter what input it is presented with, a
syntax rule may specify a production that uses the empty pattern, which will be
selected if and only if none of the other productions in the rule match:
syntax Contents
= "Rock" => Item { Heavy { true }, Solid { true } }
| "Water" => Item { Consumable { true }, Solid { false } }
| "Hamster" => Pet { Small { true }, Legs { 4 } }
| empty => NoContent { } ;
When
the production with the empty pattern is chosen, no input is consumed as part
of the match.
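The idea that each production is a function from text to structured data, with the empty pattern as a fallback, can be sketched in Python. Nested dictionaries stand in for M graph nodes, and project_contents is a hypothetical helper. The sketch tries the literal patterns in order, which gives the same result as the rule above only because the three literals cannot match the same input.

```python
# Each production pairs a literal pattern with the projection it yields.
CONTENTS = [
    ("Rock", {"Item": {"Heavy": True, "Solid": True}}),
    ("Water", {"Item": {"Consumable": True, "Solid": False}}),
    ("Hamster", {"Pet": {"Small": True, "Legs": 4}}),
]
EMPTY_PROJECTION = {"NoContent": {}}


def project_contents(text):
    """Return (projection, remaining input) for the Contents rule."""
    for literal, projection in CONTENTS:
        if text.startswith(literal):
            return projection, text[len(literal):]
    # The empty pattern is chosen only when nothing else matched,
    # and it consumes no input.
    return EMPTY_PROJECTION, text
```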
To allow projections to use the input text that was used during pattern
matching, a production can associate a variable name with an individual pattern
term by prefixing the term with an identifier separated by a colon. These
variable names are then made available to the projection. For example, consider
this language:
language GradientLang {
syntax Main
= from:Color ", " to:Color => Gradient { Start { from }, End { to } } ;
token Color
= "Red" | "Green" | "Blue";
}
Given
this input value:
Red, Blue
The
M processor would produce this output:
Gradient
{
Start { "Red" },
End { "Blue" }
}
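Variable binding is loosely comparable to named capture groups in a regular expression. The Python sketch below models GradientLang on that basis. It is an analogy rather than M itself: the group names start and end stand in for the M variables from and to, and project_gradient is a hypothetical helper that builds the output as nested dictionaries.

```python
import re

COLOR = r"Red|Green|Blue"
# from:Color ", " to:Color, with named groups playing the role of the variables
MAIN = re.compile(rf"(?P<start>{COLOR}), (?P<end>{COLOR})")


def project_gradient(text):
    """Return the Gradient node for valid input, or None when input is invalid."""
    match = MAIN.fullmatch(text)
    if match is None:
        return None
    return {"Gradient": {"Start": match["start"], "End": match["end"]}}
```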
As in all of the projection expressions we’ve looked at, literal values may
appear in the output graph. The set of literal types supported by M and a couple
of examples follow:
- Text literals – "ABC", 'ABC'
- Integer literals – 25, -34
- Real literals – 0.0, -5.0E15
- Logical literals – true, false
- Null literal – null
The
projections we’ve seen so far all attach a label to each graph node in the
output (e.g., Gradient, Start, etc.). The label is optional and can be omitted:
syntax Naked = t1:First t2:Second => { t1, t2 };
The
label can be an arbitrary string – to allow labels to be escaped, one uses the id operator:
syntax Fancy = t1:First t2:Second => id("Label with Spaces!"){ t1, t2 };
The
id
operator works with either literal strings or with variables that are bound to
input text:
syntax Fancy = name:Name t1:First t2:Second => id(name){ t1, t2 };
Using
id
with variables allows the labeling of the output data to be driven dynamically
from input text rather than statically defined in the language. This example
works when the variable name is bound to a literal value. If the variable was
bound to a structured node that was returned by another rule, that node’s label
can be accessed using the labelof
operator:
syntax Fancier = p:Point => id(labelof(p)) { 1, 2, 3 };
The labelof operator returns a string that can be used both in the id operator
and as a node value.
The
projection expressions shown so far have no notion of order. That is, this
projection expression:
A { X { 100 }, Y { 200 } }
is semantically equivalent to this:
A { Y { 200 }, X { 100 } }
and implementations of M are not required to preserve the order specified by the
projection. To indicate that order is significant and must be preserved,
brackets are used rather than braces. This means that this projection
expression:
A [ X { 100 }, Y { 200 } ]
is not semantically equivalent to this:
A [ Y { 200 }, X { 100 } ]
The
use of brackets is common when the sequential nature of information is
important and positional access is desired in downstream processing.
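The contrast between braces and brackets resembles the contrast between an unordered mapping and a list in a general-purpose language. As a loose Python analogy (dictionaries assume unique labels, which M graphs are not stated to require):

```python
# Braces: order is not significant, modeled here with dictionaries.
unordered_a = {"X": 100, "Y": 200}
unordered_b = {"Y": 200, "X": 100}

# Brackets: order is significant, modeled here with lists of (label, value).
ordered_a = [("X", 100), ("Y", 200)]
ordered_b = [("Y", 200), ("X", 100)]

same_when_unordered = unordered_a == unordered_b
same_when_ordered = ordered_a == ordered_b
first_of_ordered = ordered_a[0]  # positional access is meaningful only here
```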
Sometimes it is useful to splice the nodes of a value together into a single
collection. The valuesof operator will return the values of a node (labeled or
unlabeled) as top-level values that are then combinable with other values as
values of a new node.
syntax ListOfA
= a:A => [a]
| list:ListOfA "," a:A => [ valuesof(list), a ];
Here, valuesof(list) returns all the values of the list node, combinable with a
to form a new list.
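The effect of valuesof is similar to unpacking one list into another. The Python sketch below folds a sequence of items the way the left-recursive ListOfA rule would, once with splicing and once without, to show why the operator is needed. Both function names are invented for the sketch.

```python
def list_of_a(items):
    """Build the list with splicing, as [ valuesof(list), a ] does."""
    result = [items[0]]            # a:A => [a]
    for a in items[1:]:
        result = [*result, a]      # splice the earlier values, then append a
    return result


def nested_without_splice(items):
    """Build the list without splicing, as [ list, a ] would."""
    result = [items[0]]
    for a in items[1:]:
        result = [result, a]       # the earlier list becomes a single nested value
    return result
```

Without splicing, each step wraps the previous result in another level of nesting instead of producing one flat list.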
Productions
that do not specify a projection get the default projection.
For example, consider this simple language that does not specify projections:
language GradientLanguage {
syntax Main = Gradient | Color;
syntax Gradient = from:Color " on " to:Color;
token Color = "Red" | "Green" | "Blue";
}
When presented with the input "Blue on Green", the language processor returns
the following output:
Main[ Gradient[ "Blue", " on ", "Green" ] ]
These default semantics allow grammars to be authored rapidly while still
yielding understandable output. However, in practice explicit projection
expressions provide language designers complete control over the shape and
contents of the output.
>>1.2 Expressions
The easiest way to get started with M is to look at some
values. M has intrinsic support for constructing values. The following is a
legal value in M:
"Hello, world"
The quotation marks tell M that this is the text value Hello,
world. M literals can also be numbers. The following literal:
1
is the numeric value one. Finally, there are two literals
that represent logical values:
true
false
We’ve just seen examples of using literals to write down
textual, numeric, and logical values. We can also use expressions to write down
values that are computed.
An M expression applies an operator to zero or more operands
to produce a result. An operator is
either a built-in operator (e.g., +) or a user-defined function
(which we’ll look at in Section 1.3.5).
An operand is a value that is used by
the operator to calculate the result
of the expression, which is itself a value. Expressions nest, so the operands
themselves can be expressions.
M defines two equality operators: equals, ==,
and not equals, !=, both of which result in either true
or false
based on the equivalence/nonequivalence of the two operands. Here are some
expressions that use the equality operators:
1 == 1
"Hello" != "hELLO"
true != false
All of these expressions will yield the value true
when evaluated.
M defines the standard four relational operators less-than <,
greater-than >, less-than-or-equal <=, and
greater-than-or-equal >=, which work over
numeric and textual values. M also defines the standard three logical
operators: and &&, or ||, and not !
that combine logical values.
The following expressions show these operators in action:
1 < 4
1 == 1
1 < 4 != 1 > 4
!(1 + 1 == 3)
(1 + 1 == 3) || (2 + 2 < 10)
(1 + 1 == 2) && (2 + 2 < 10)
Again, all of these expressions yield the value true when evaluated.
>>>>>1.2.1 Collections
All of the values we saw in the previous section were simple values. In M, a simple value is a
value that has no uniform way to be decomposed into constituent parts. While
there are textual operators that allow you to extract substrings from a text
value, those operators are specific to textual data and don’t work on numeric
data. Similarly, any “bit-level” operations on binary values don’t apply to
text or numeric data.
An M collection is a value that groups together zero or more elements
which themselves are values. We can write down collections in expressions
using an initializer, { }.
The following expressions each use an initializer to yield a
collection value:
{ 1, 2 }
{ 1 }
{ }
As with simple values, the equivalence operators ==
and !=
are defined over collections. In M, two collections are considered equivalent
if and only if each element has a distinct equivalent element in the other
collection. That allows us to write the following equivalence expressions:
{ 1, 2 } == { 1, 2 }
{ 1, 2 } != { 1 }
both of which are true.
The elements of a collection can consist of different kinds
of values:
{ true, "Hello" }
and these values can be the result of arbitrary calculation:
{ 1 + 2, 99 - 3, 4 < 9 }
which is equivalent to the collection:
{ 3, 96, true }
The order of elements in a collection is not significant.
That means that the following expression is also true:
{ 1, 2 } == { 2, 1 }
Finally, collections can contain duplicate elements, which
are significant. That makes the following expression:
{ 1, 2, 2 } != { 1, 2 }
also true.
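Taken together, these rules describe multiset (bag) equality: order is ignored but the number of occurrences of each element matters. A minimal Python model, assuming the elements are hashable and using the hypothetical helper m_equal, is:

```python
from collections import Counter


def m_equal(left, right):
    """Compare two collections as multisets: order ignored, duplicates counted."""
    return Counter(left) == Counter(right)
```

The sketch avoids mixing numbers and logical values in one collection, because Python treats True and 1 as equal, which the M text does not claim.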
M defines a set of built-in operators that are specific to
collections. The most important of which is the in
operator which tests whether a given value is an element of the collection. The
result of the in operator is a logical value that
indicates whether the value is or is not an element of the collection. For
example, these expressions:
1 in { 1, 2, 3 }
!(1 in { "Hello", 9 })
both result in true.
M defines a Count member on
collections that calculates the number of elements in a collection. This use of
that member:
{ 1, 2, 2, 3 }.Count
results in the value 4. The postfix #
operator returns the count of a collection, so
{ 1, 2, 2, 3 }# == { 1, 2, 2, 3 }.Count
returns true.
As noted earlier, M collections may contain duplicates. You
can apply the Distinct member to get a version of the collection
with any duplicates removed:
{ 1, 2, 3, 1 }.Distinct == { 1, 2, 3 }
The result of Distinct is not just a
collection but is also a set, i.e. a collection of distinct elements.
M also defines set union "|" and set
intersection "&" operators, which
also yield sets:
({ 1, 2, 3, 1 } | { 1, 2, 4 }) == { 1, 2, 3, 4 }
({ 1, 2, 3, 1 } & { 1, 2, 4 }) == { 1, 2 }
Note that union and intersection always return collections
that are sets, even when applied to collections that contain duplicates.
M defines the subset and superset operators using <= and >=.
Again, these operations convert collections to sets. The following expressions
evaluate to true:
{ 1, 2 } <= { 1, 2, 3 }
{ "Hello", "World" } >= { "World" }
{ 1, 2, 1 } <= { 1, 2, 3 }
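The collection operators above can be modeled in Python by treating a collection as a list (so that duplicates survive) and converting to a set wherever the text says the result is a set. The function names are invented for this sketch and are not M identifiers.

```python
def count(collection):
    """Model .Count (and the postfix # operator): duplicates are counted."""
    return len(collection)


def distinct(collection):
    """Model .Distinct: the result is a set."""
    return set(collection)


def union(left, right):
    """Model the | operator: always yields a set."""
    return set(left) | set(right)


def intersection(left, right):
    """Model the & operator: always yields a set."""
    return set(left) & set(right)


def is_subset(left, right):
    """Model <= on collections: both operands are converted to sets first."""
    return set(left) <= set(right)
```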
Arguably the most commonly used collection operator is the where operator. The where
operator applies a logical expression (called the predicate) to each element in a collection (called the domain) and results in a new collection
that consists of only the elements for which the predicate holds true. To allow
the element to be used in the predicate, the where
operator introduces the symbol value to stand
in for the specific element being tested.
For example, consider this expression that uses a where operator:
{ 1, 2, 3, 4, 5, 6 } where value > 3
In this example, the domain is the collection { 1, 2, 3, 4, 5, 6 } and the predicate is the
expression value > 3. Note that the identifier value is available only within the scope of the
predicate expression. The result of this expression is the collection { 4, 5, 6 }.
M supports a richer set of query comprehensions using a
syntax similar to that of Language Integrated Query (LINQ). For example, the where example just shown can be written in long
form as follows:
from value in { 1, 2, 3, 4, 5, 6 } where value > 3 select value
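For readers more familiar with general-purpose languages, the where operator and its long form correspond closely to a filter and a comprehension. The Python sketch below shows both. The helper where is hypothetical, and a list is used for the domain even though M collections carry no order.

```python
domain = [1, 2, 3, 4, 5, 6]


def where(collection, predicate):
    """Keep only the elements for which the predicate holds."""
    return [value for value in collection if predicate(value)]


# { 1, 2, 3, 4, 5, 6 } where value > 3
short_form = where(domain, lambda value: value > 3)

# from value in { 1, 2, 3, 4, 5, 6 } where value > 3 select value
long_form = [value for value in domain if value > 3]
```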
In general, M supports the LINQ operators with these
significant exceptions:
1. ElementAt/First/Last/Range/Skip are not supported – M collections are
unordered and do not support positional access to elements.
2. Reverse is not supported – again, position is not significant on M
collections.
3. Take/TakeWhile/Single – these operators do not exist in M.
4. Choose selects an arbitrary element.
5. ToArray/ToDictionary/ToList – there are no corresponding CLR types in M.
6. Cast – typing works differently in M – you can achieve the same effect using
a where operator.
While the where operator
allows elements to be accessed based on a calculation over the values of each
element, there are situations where it would be much more convenient to simply
assign names to each element and then access the element values by their
assigned names. M defines a distinct kind of value called an entity for just
this purpose.
>>>>>>>>1.2.2 Entities
An entity consists
of zero or more name-value pairs called fields.
Entities can be constructed in M using an initializer. Here’s a simple entity
value:
{ X => 100, Y => 200 }
This entity has two fields: one named X with the value of 100,
the other named Y with the value of 200.
Entity initializers can use arbitrary expressions as field
values:
{ X => 50 + 50, Y => 300 - 100 }
And the names of members can be arbitrary Unicode text:
{ @[Horizontal Coordinate] => 100, @[Vertical Coordinate] => 200 }
If the member name matches the Identifier
pattern, it can be written without the surrounding @[ ]. An identifier must
begin with an uppercase or lowercase letter or "_" and be followed
by a sequence of letters, digits, "_", and "$".
Here are a few examples:
HelloWorld => 1 // matches the Identifier pattern
@[Hello World] => 1 // doesn’t match the Identifier pattern – escape it
_HelloWorld => 1 // matches the Identifier pattern
A => 1 // matches the Identifier pattern
@[1] => 1 // doesn’t match the Identifier pattern – escape it
It is always legal to use @[ ] to escape symbolic names;
however, most of the examples in this document use names that don’t require
escaping and therefore do not use escaping syntax for readability.
M imposes no limitations on the values of entity members. It
is legal for the value of an entity member to refer to another entity:
{ TopLeft { X => 100, Y => 200 }, BottomRight { X => 400, Y => 100 } }
or a collection:
{ LotteryPicks { 1, 18, 25, 32, 55, 61 }, Odds => 0.00000001 }
or a collection of entities:
{
Color => "Red",
Path {
{ X => 100, Y => 100 },
{ X => 200, Y => 200 },
{ X => 300, Y => 100 },
{ X => 300, Y => 100 },
}
}
This last example illustrates that entity values are legal
for use as elements in collections.
Entity initializers are useful for constructing new entity
values. M defines the dot, ".", operator over
entities for accessing the value of a given member. For example, this
expression:
{ X => 100, Y => 200 }.X
yields the value of the X member, which in
this case is 100. The result of the dot operator is just a value
that is subject to subsequent operations. For example, this expression:
{ Center { X => 100, Y => 200 }, Radius => 3 }.Center.Y
yields the value 200.
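Entities and the dot operator behave much like nested mappings with chained lookups. The following Python sketch models them with dictionaries. The helper dot is hypothetical, and member names that would need @[ ] escaping in M are simply ordinary string keys here.

```python
def dot(entity, *names):
    """Follow a chain of member names, like chained '.' access on an entity."""
    value = entity
    for name in names:
        value = value[name]
    return value


point = {"X": 50 + 50, "Y": 300 - 100}  # field values may be computed
circle = {"Center": {"X": 100, "Y": 200}, "Radius": 3}
escaped = {"Horizontal Coordinate": 100, "Vertical Coordinate": 200}
```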
>>>>>1.3 Types
Expressions give us a great way to write down how to calculate values based on other values.
Often, we want to write down how to categorize
values for the purposes of validation or allocation. In M, we categorize values
using types.
An M type describes a collection of acceptable or conformant values. We use types to
constrain which values may appear in a particular context (e.g., an operand, a storage location).
With a few notable exceptions, M allows types to be used as
collections. For example, we can use the in
operator to test whether a value conforms to a given type. The following
expressions are true:
1 in Number
"Hello, world" in Text
Note that the names of the built-in types are available
directly in the M language. We can introduce new names for types using type
declarations. For example, this type declaration introduces the type name
My Text as a synonym for the Text simple type:
type @[My Text] : Text;
With this type name now available, we can write the
following:
"Hello, world" in @[My Text]
Note that the name of the type @[My Text] contains
spaces and is subject to the same escaping rules as the member names in
entities.
While it is moderately useful to introduce your own names
for an existing type, it’s far more useful to apply a predicate to the
underlying type:
type SmallText : Text where value.Count < 7;
In this example, we’ve constrained the universe of possible Text
values to those in which the value contains fewer than seven characters. That
means that the following holds true:
"Terse" in SmallText
!("Verbose" in SmallText)
Type declarations compose:
type TinyText : SmallText where value.Count < 6;
The preceding is equivalent to the following:
type TinyText : Text where value.Count < 6;
It’s important to note that the name of the type exists so
an M declaration or expression can refer to it. We can assign any number of
names to the same type (e.g., Text where value.Count < 7)
and a given value either conforms to all of them or to none of them. For
example, consider these declarations:
type A : Number where value < 100;
type B : Number where value < 100;
Given these two type definitions, both of the following
expressions will evaluate to true:
1 in A
1 in B
If we introduce the following third type:
type C : Number where value > 0;
we can also state this:
1 in C
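One way to think about these declarations is that a type is a predicate over values, and a where clause narrows an existing predicate. The Python sketch below follows that reading. The helper refine and the base predicates is_text and is_number are assumptions of the sketch, and Python's len stands in for the Count member on text.

```python
def is_text(value):
    return isinstance(value, str)


def is_number(value):
    # bool is excluded because Python treats it as a kind of int
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def refine(base, predicate):
    """Model `type T : Base where ...`: conform to Base and satisfy the predicate."""
    return lambda value: base(value) and predicate(value)


small_text = refine(is_text, lambda value: len(value) < 7)
tiny_text = refine(small_text, lambda value: len(value) < 6)  # composes
type_a = refine(is_number, lambda value: value < 100)
type_b = refine(is_number, lambda value: value < 100)
type_c = refine(is_number, lambda value: value > 0)
```

Calling one of these predicates plays the role of the in operator: the value 1 satisfies type_a, type_b, and type_c at once, which mirrors the point that a value may conform to any number of types.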
In M, types are sets of values, and it is possible to define a
new type by explicitly enumerating those values:
type PrimaryColors { "Red", "Blue", "Yellow" }
This is how an enumeration is defined in M. Any type in M is
a collection of values. For example the types Logical and Integer8
defined below could be defined as the collections:
{ true, false }
{-128, -127, ..., -1, 0, 1, ..., 127}
A general principle of M is that a given value may conform
to any number of types. This is a departure from the way many object-based
systems work, in which a value is bound to a specific type at
initialization-time and is a member of the finite set of supertypes that were
specified when the type was defined.
One last type-related operation bears discussion – the type
ascription operator ":". The type ascription
operator asserts that a given value conforms to a specific type.
In general, when we see values in expressions, M has some
notion of the expected type of that value based on the declared result type for
the operator or function being applied. For example, the result of the logical
and operator "&&" is declared
to be conformant with type Logical.
It is occasionally useful (or even required) to apply
additional constraints to a given value – typically to use that value in
another context that has differing requirements.
For example, consider the following simple type definition:
type SuperPositive : Number where value > 5;
And let’s now assume that there’s a function named CalcIt
that is declared to accept a value of type SuperPositive as an
operand. We’d like M to allow expressions like this:
CalcIt(20)
CalcIt(42 + 99)
and prohibit expressions like this:
CalcIt(-1)
CalcIt(4)
In fact, M does exactly what we want for these four
examples. This is because these expressions express their operands in terms of
simple built-in operators over constants. All of the information needed to
determine the validity of the expressions is readily and cheaply available the
moment the M source text for the expression is encountered.
However, if the expression draws upon dynamic sources of
data or user-defined functions, we must use the type ascription operator to
assert that a value will conform to a given type.
To understand how the type ascription operator works with
values, let’s assume that there is a second function, GetVowelCount, that
is declared to accept an operand of type Text and return a
value of type Number that indicates the number of vowels in the
operand.
Since we can’t know based on the declaration of GetVowelCount
whether its results will be greater than five or not, the following expression
is not a legal M expression:
CalcIt( GetVowelCount(someTextVariable) )
Because GetVowelCount’s declared
result type Number includes values that do not conform to the
declared operand type of CalcIt which is SuperPositive,
M assumes that this expression was written in error and will refuse to even
attempt to evaluate the expression.
When we rewrite this expression to the following legal
expression using the type ascription operator:
CalcIt( GetVowelCount(someTextVariable) : SuperPositive )
we are telling M that we have enough understanding of the GetVowelCount function to know that we’ll always get a
value that conforms to the type SuperPositive. In short,
we’re telling M we know what we’re doing.
But what if we don’t? What if we misjudged how the GetVowelCount
function works and a particular evaluation results in a value of five or less?
Because the CalcIt function was declared to only accept values
that conform to SuperPositive, the system will ensure that all
values passed to it are greater than five. To ensure this constraint is never
violated, the system may need to inject a dynamic constraint test that has a
potential to fail when evaluated. This failure will not occur when the M source
text is first processed (as was the case with CalcIt(-1))
– rather it will occur when the expression is actually evaluated.
Here’s the general principle at play.
M implementations will typically attempt to report any
constraint violations before the first expression is evaluated. This is called static enforcement and implementations
will manifest this much like a syntax error. However, as we’ve seen, some
constraints can only be enforced against live data and therefore require dynamic enforcement.
In general, the M philosophy is to make it easy for the user
to write down their intention and put the burden on the M implementation to
“make it work.” However, to allow a particular M program to be used in diverse
environments, a fully featured M implementation should be configurable to
reject M programs that rely on dynamic enforcement for correctness in order to
reduce the performance and operational costs of dynamic constraint violations.
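The difference between a value that is known to conform and one that must be tested when the expression is evaluated can be sketched outside of M. The following Python fragment is only an illustration under stated assumptions: is_super_positive, ascribe, get_vowel_count, and calc_it are hypothetical stand-ins for the SuperPositive type, the type ascription operator, and the two functions discussed above. It does not describe how any M implementation works.

```python
def is_super_positive(value):
    """Model of: type SuperPositive : Number where value > 5;"""
    is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
    return is_number and value > 5


def ascribe(value, conforms, type_name):
    """Model of `value : Type` backed by a dynamic constraint test."""
    if not conforms(value):
        # This failure surfaces only when the expression is evaluated.
        raise ValueError(f"{value!r} does not conform to {type_name}")
    return value


def get_vowel_count(text):
    """Hypothetical GetVowelCount: number of vowels in a Text value."""
    return sum(1 for ch in text.lower() if ch in "aeiou")


def calc_it(value):
    """Hypothetical CalcIt: its body is a placeholder that echoes the operand."""
    return ascribe(value, is_super_positive, "SuperPositive")


# Ascription succeeds: "onomatopoeia" contains eight vowels.
ok = calc_it(ascribe(get_vowel_count("onomatopoeia"),
                     is_super_positive, "SuperPositive"))

# Ascription fails at evaluation time: "cat" contains one vowel.
try:
    calc_it(ascribe(get_vowel_count("cat"), is_super_positive, "SuperPositive"))
    failed = False
except ValueError:
    failed = True
```

In this sketch every check is dynamic. A static enforcer would additionally reject calls such as calc_it(4) before anything runs, which plain Python does not attempt.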
>>>>>1.3.1 Collection types
M defines a type constructor for specifying collection
types. The collection type constructor restricts the type and count of
elements a collection may contain. All collection types are restrictions over
the intrinsic type Collection, which all
collection values conform to:
{ } in Collection
{ 1, false } in Collection
! ("Hello" in Collection)
The last example is interesting, in that it illustrates that
the collection types do not overlap with the simple types. There is no value
that conforms to both a collection type and a simple type.
A collection type constructor specifies both the type of
element and the acceptable element count. The element count is typically
specified using one of the three operators:
T* - zero or more Ts
T+ - one or more Ts
T#m..n - between m and n Ts
The collection type constructors can either use operators or
be written longhand as a constraint over the intrinsic type Collection:
type SomeNumbers : Number+;
type TwoToFourNumbers : Number#2..4;
type ThreeNumbers : Number#3;
type FourOrMoreNumbers : Number#4..;
These types describe the same sets of values as these
longhand definitions:
type SomeNumbers : Number* where value.Count >= 1;
type TwoToFourNumbers : Number* where value.Count >= 2
&& value.Count <= 4;
type ThreeNumbers : Number* where value.Count == 3;
type FourOrMoreNumbers : Number* where value.Count >= 4;
Independent of which form is used to declare the types, we
can now assert the following hold:
!({ } in TwoToFourNumbers)
!({ "One", "Two", "Three" } in TwoToFourNumbers)
{ 1, 2, 3 } in TwoToFourNumbers
{ 1, 2, 3 } in ThreeNumbers
{ 1, 2, 3, 4, 5 } in FourOrMoreNumbers
The collection type constructors compose with the where operator, allowing the following type
check to succeed:
{ 1, 2 } in (Number where value < 3)* where value.Count % 2 == 0
Note that the where operator inside the parentheses applies to the
elements of the collection, and the where operator outside the parentheses
applies to the collection itself.
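One way to build intuition for the collection type constructors is to model each type as a predicate over values. The Python sketch below does this under several assumptions: M collections are represented as Python lists (which, unlike M collections, are ordered), and the names is_number, collection_of, and small_even_sized are hypothetical helpers invented for the illustration.

```python
def is_number(value):
    # bool is excluded: the examples in this section treat true as a
    # Logical value that does not conform to Number.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def collection_of(element_conforms, minimum=0, maximum=None):
    """Build a predicate modeling T*, T+, and T#m..n."""
    def conforms(value):
        if not isinstance(value, list):
            return False  # simple values never conform to a collection type
        if len(value) < minimum:
            return False
        if maximum is not None and len(value) > maximum:
            return False
        return all(element_conforms(item) for item in value)
    return conforms


some_numbers = collection_of(is_number, minimum=1)          # Number+
two_to_four_numbers = collection_of(is_number, 2, 4)        # Number#2..4
three_numbers = collection_of(is_number, 3, 3)              # Number#3
four_or_more_numbers = collection_of(is_number, minimum=4)  # Number#4..


def small_even_sized(value):
    """Model of: (Number where value < 3)* where value.Count % 2 == 0"""
    inner = collection_of(lambda n: is_number(n) and n < 3)
    return inner(value) and len(value) % 2 == 0
```

The inner lambda plays the role of the where inside the parentheses (a constraint on each element), while the len(value) % 2 test plays the role of the outer where (a constraint on the collection as a whole).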
>>>>>1.3.2 Nullable types
We have seen many useful values: 42,
"Hello", {1,2,3}.
The distinguished value null serves as a placeholder
for some other value that is not known. A type with null in its value space is
called a nullable type. The value null can be added to the
value space of a type with an explicit union of the type and a collection
containing null or using the postfix operator ?.
The following expressions are true:
! (null in Integer)
null in Integer?
null in (Integer | { null } )
The ?? operator replaces a null value with a known value. It yields its
left operand unless that operand is null, in which case it yields its right
operand:
null ?? 1 == 1
3 ?? 1 == 3
Arithmetic operations on a null operand
return null:
1 + null == null
null * 3 == null
Logical operators, conditionals, and
constraints require non-nullable operands.
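The behavior of null described above can be mirrored in a few lines of Python, using None as a stand-in for null. This is a sketch under that assumption only. The helper names nullable, coalesce, add, and multiply are hypothetical and exist purely to mirror the ? operator, the ?? operator, and null-propagating arithmetic.

```python
def is_integer(value):
    """Stand-in for the M type Integer."""
    return isinstance(value, int) and not isinstance(value, bool)


def nullable(conforms):
    """Model of the postfix ? operator: add null to a type's value space."""
    return lambda value: value is None or conforms(value)


def coalesce(left, right):
    """Model of `left ?? right`: yield left unless it is null."""
    return right if left is None else left


def add(left, right):
    """Model of `+` where a null operand makes the result null."""
    if left is None or right is None:
        return None
    return left + right


def multiply(left, right):
    """Model of `*` where a null operand makes the result null."""
    if left is None or right is None:
        return None
    return left * right
```

Note that Python itself raises a TypeError for 1 + None, which is why the arithmetic is wrapped in explicit helper functions here.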
>>>1.3.3 Entity types
Just as we can use the collection type constructors to
specify what kinds of collections are valid in a given context, we can do the
same for entities using entity types.
An entity type declares the expected members for a set of
entity values. The members of an entity type can be declared either as fields or as computed values. The value of a field is stored; a computed value
is evaluated. All entity types are restrictions over the Entity type.
Here is the simplest entity type:
type MyEntity : Language.Entity;
The type MyEntity does not declare any
fields. In M, entity types are open
in that entity values that conform to the type may contain fields whose names
are not declared in the type. That means that the following type test:
{ X => 100, Y => 200 } in MyEntity
will evaluate to true, as the MyEntity type says
nothing about fields named X and Y.
Most entity types contain one or more field declarations. At
a minimum, a field declaration states the name of the expected field:
type Point { X; Y; }
This type definition describes the set of entities that
contain at least fields named X
and Y
irrespective of the values of those fields. That means that the following type
tests will all evaluate to true:
{ X => 100, Y => 200 } in Point
{ X => 100, Y => 200, Z => 300 } in Point // more fields than expected OK
! ({ X => 100 } in Point) // not enough fields – not OK
{ X => true, Y => "Hello, world" } in Point
The last example demonstrates that the Point type does not
constrain the values of the X and Y fields – any value
is allowed. We can write a new type that constrains the values of X
and Y
to numeric values:
type NumericPoint {
X : Number;
Y : Number where value > 0;
}
Note that we’re using type ascription syntax to assert that
the value of the X and Y fields must conform to the
type Number.
With this in place, the following expressions all evaluate to true:
{ X => 100, Y => 200 } in NumericPoint
{ X => 100, Y => 200, Z => 300 } in NumericPoint
! ({ X => true, Y => "Hello, world" } in NumericPoint)
! ({ X => 0, Y => 0 } in NumericPoint)
As we saw in the discussion of simple types, the name of the
type exists only so that M declarations and expressions can refer to it. That
is why both of the following type tests succeed:
{ X => 100, Y => 200 } in NumericPoint
{ X => 100, Y => 200 } in Point
even though the definitions of NumericPoint and Point
are independent.
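Because entity types are open and conformance depends only on the fields a value carries, the type tests above can be modeled as checks over dictionaries. The Python sketch below is an illustration only: entities are represented as dicts, and entity_type, anything, and is_number are hypothetical helpers, not part of M.

```python
def is_number(value):
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def anything(value):
    """A field declared with no type accepts any value."""
    return True


def entity_type(**fields):
    """Build a predicate for an open entity type.

    Each keyword names a required field and maps it to a predicate that the
    field's value must satisfy. Undeclared fields are always permitted.
    """
    def conforms(value):
        if not isinstance(value, dict):
            return False
        return all(name in value and check(value[name])
                   for name, check in fields.items())
    return conforms


# type Point { X; Y; }
point = entity_type(X=anything, Y=anything)

# type NumericPoint { X : Number; Y : Number where value > 0; }
numeric_point = entity_type(X=is_number,
                            Y=lambda v: is_number(v) and v > 0)
```

The same dictionary can satisfy both predicates even though point and numeric_point were built independently, which mirrors the observation that values carry no record of type names.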
>>>>>1.3.4 Declaring fields
Fields are named units of storage that hold values. M allows
you to initialize the value of a field as part of an entity initializer.
However, M does not specify any mechanism for changing the value of a field
once it is initialized. In M, we assume that any changes to field values happen
outside the scope of M.
A field declaration can indicate that there is a default
value for the field. Field declarations that have a default value do not
require conformant entities to have a corresponding field specified (we
sometimes call such field declarations optional
fields). For example, consider this type definition:
type Point3d {
X : Number;
Y : Number;
Z => -1 : Number; // default value of negative one
}
Because the Z field has a default value,
the following type test will succeed:
{ X => 100, Y => 200 } in Point3d
Moreover, if we apply a type ascription operator to the
value:
({ X => 100, Y => 200 } : Point3d)
we can now access the Z field like this:
({ X => 100, Y => 200 } : Point3d).Z
This expression will yield the value -1.
If a field declaration does not have a corresponding default
value, conformant entities must specify a value for that field. Default values
are typically written down using the explicit syntax shown for the Z
field of Point3d. If the type of a field is either nullable or a zero-to-many
collection, then the field has an implicit default value: null for a nullable
type and { } for a collection.
For example, consider this type:
type PointND {
X : Number;
Y : Number;
Z : Number?; // Z is optional
BeyondZ : Number*; // BeyondZ is optional too
}
Again, the following type test will succeed:
{ X => 100, Y => 200 } in PointND
and ascribing the type PointND to the value
allows us to get these defaults:
({ X => 100, Y => 200 } : PointND).Z == null
({ X => 100, Y => 200 } : PointND).BeyondZ == { }
The choice of using a nullable type vs. an explicit default
value to model optional fields typically comes down to style.
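The way ascription fills in default values can be approximated in Python. The sketch below assumes entities are dicts, uses None for null and an empty list for { }, and ignores the declared field types entirely. REQUIRED, POINT_3D, POINT_ND, and ascribe are hypothetical names introduced for the illustration.

```python
REQUIRED = object()  # marks a field declaration with no default value

# type Point3d { X : Number; Y : Number; Z => -1 : Number; }
POINT_3D = {"X": REQUIRED, "Y": REQUIRED, "Z": -1}

# type PointND { X : Number; Y : Number; Z : Number?; BeyondZ : Number*; }
POINT_ND = {
    "X": REQUIRED,
    "Y": REQUIRED,
    "Z": None,      # nullable field: implicit default of null
    "BeyondZ": [],  # zero-to-many collection: implicit default of { }
}


def ascribe(entity, declared):
    """Return a copy of entity with missing optional fields defaulted."""
    result = dict(entity)
    for name, default in declared.items():
        if name in result:
            continue
        if default is REQUIRED:
            raise ValueError(f"missing required field {name!r}")
        # Copy list defaults so separate results never share one list.
        result[name] = list(default) if isinstance(default, list) else default
    return result
```

For example, ascribe({"X": 100, "Y": 200}, POINT_3D)["Z"] evaluates to -1, matching the Point3d example above.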
>>>>>>>>1.3.5 Declaring computed values
Computed values are named expressions whose values are
computed rather than stored.
>>>>>1.3.6 Constraints on entity types
As with all types, a constraint may be applied to an entity
type using the where operator. Consider the following
type definition:
type HighPoint {
X : Number;
Y : Number;
} where X < Y;
In this example, all values that conform to the type HighPoint
are guaranteed to have an X value that is less
than the Y value. That means that the following expressions:
{ X => 100, Y => 200 } in HighPoint
! ({ X => 300, Y => 200 } in HighPoint)
both evaluate to true.
Now consider the following type definitions:
type Point {
X : Number;
Y : Number;
}
type Visual {
Opacity : Number;
}
type VisualPoint {
DotSize : Number;
} where value in Point && value in Visual;
The third type, VisualPoint, names the set of
entity values that have at least the numeric fields X, Y, Opacity, and DotSize.
Because it is a common desire to factor member declarations
into smaller pieces that can be easily composed, M provides explicit syntax
support for this. We can rewrite the VisualPoint type
definition using that syntax:
type VisualPoint : Point, Visual {
DotSize : Number;
}
To be clear, this is just shorthand for the long-hand
definition above that used a constraint expression. Both of these definitions
are equivalent to this even longer-hand definition:
type VisualPoint {
X : Number;
Y : Number;
Opacity : Number;
DotSize : Number;
}
Again, the names of the types are just ways to refer to
types – the values themselves have no record of the type names used to describe
them.
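The equivalence between the composed and the fully written-out definitions of VisualPoint can be checked with a small model in which each type is a predicate and composition is logical conjunction. This Python sketch is an illustration under stated assumptions: entities are dicts, and numeric_fields and all_of are hypothetical helpers.

```python
def is_number(value):
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def numeric_fields(*names):
    """Predicate for an open entity type whose named fields are all Numbers."""
    def conforms(value):
        return isinstance(value, dict) and all(
            name in value and is_number(value[name]) for name in names)
    return conforms


def all_of(*predicates):
    """Intersection of types; predicates are tested left to right."""
    return lambda value: all(check(value) for check in predicates)


point = numeric_fields("X", "Y")
visual = numeric_fields("Opacity")

# type VisualPoint { DotSize : Number; }
#     where value in Point && value in Visual;
visual_point = all_of(numeric_fields("DotSize"), point, visual)

# The longer definition that lists every field directly.
visual_point_flat = numeric_fields("X", "Y", "Opacity", "DotSize")

# type HighPoint { X : Number; Y : Number; } where X < Y;
high_point = all_of(point, lambda value: value["X"] < value["Y"])
```

In high_point the field check is listed first so that the comparison only runs on values that already have numeric X and Y fields.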
>>>>>1.4 Queries
M extends LINQ query comprehensions with several features to make
authoring simple queries more concise. The keywords where and select
are available as binary infix operators. Also, indexers are
automatically added to strongly typed collections. These features allow
common queries to be authored more compactly, as illustrated below.
>>>>1.4.1 Filtering
Filtering extracts
elements from an existing collection. Consider the following collection:
People {
{ First => "Mary", Last => "Smith",
Age => 24 },
{ First => "John", Last => "Doe",
Age => 32 },
{ First => "Dave", Last => "Smith",
Age => 32 },
}
This query extracts people with Age == 32 from the People collection:
from p in People
where p.Age == 32
select p
An equivalent query can be written with either of the following expressions:
People where value.Age == 32
People.Age(32)
The where operator takes a collection on the left and a Logical
expression on the right. The where operator introduces a keyword
identifier value into the scope of the Logical expression; value is bound
to each member of the collection in turn. The resulting collection
contains the members for which the expression is true. The expression:
Collection where Expression
is exactly equivalent to:
from value in Collection
where Expression
select value
Collection types gain indexer members that correspond
to the fields of their corresponding element type. That is, this:
Collection . Field ( Expression )
is equivalent to:
from value in Collection
where value.Field == Expression
select value
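Readers who know list comprehensions may find it helpful to see the three filtering forms side by side in another language. The Python sketch below is only an analogy: the People collection is represented as a list of dicts, and where and index_by are hypothetical helpers standing in for the infix where operator and the generated indexer.

```python
people = [
    {"First": "Mary", "Last": "Smith", "Age": 24},
    {"First": "John", "Last": "Doe", "Age": 32},
    {"First": "Dave", "Last": "Smith", "Age": 32},
]


def where(collection, predicate):
    """Model of `Collection where Expression`."""
    return [value for value in collection if predicate(value)]


def index_by(collection, field, expected):
    """Model of `Collection.Field(Expression)`."""
    return where(collection, lambda value: value[field] == expected)


# from p in People where p.Age == 32 select p
comprehension = [p for p in people if p["Age"] == 32]

# People where value.Age == 32
infix = where(people, lambda value: value["Age"] == 32)

# People.Age(32)
indexed = index_by(people, "Age", 32)
```

All three results contain the same two entities, John Doe and Dave Smith.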
>>>>1.4.2 Selection
Select is also available as an infix operator. Consider the following
simple query:
from p in People
select p.First + p.Last
This computes the select expression over each member of the collection
and returns the result. Using the infix select it can be written
equivalently as:
People select value.First + value.Last
The select operator takes a collection on the left and an arbitrary
expression on the right. As with where, select introduces the keyword
identifier value that ranges over each element in the collection. The
select operator maps the expression over each element in the collection
and returns the result. The expression:
Collection select Expression
is exactly equivalent to:
from value in Collection
select Expression
A trivial use of the select operator is to extract a single field:
People select value.First
Collections are augmented with accessors to fields, which can be
extracted directly. For example, People.First yields a new collection
containing all the first names and People.Last yields a collection with
all the last names.
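The select forms have a similarly direct analogy in Python. As before, this is a sketch rather than a description of M itself: the collection is a list of dicts, and select and field are hypothetical helpers standing in for the infix select operator and the field accessors on collections.

```python
people = [
    {"First": "Mary", "Last": "Smith", "Age": 24},
    {"First": "John", "Last": "Doe", "Age": 32},
    {"First": "Dave", "Last": "Smith", "Age": 32},
]


def select(collection, expression):
    """Model of `Collection select Expression`."""
    return [expression(value) for value in collection]


def field(collection, name):
    """Model of a field accessor on a collection, such as People.First."""
    return select(collection, lambda value: value[name])


# People select value.First + value.Last
full_names = select(people, lambda value: value["First"] + value["Last"])

# People.First and People.Last
first_names = field(people, "First")
last_names = field(people, "Last")
```

Here field is defined in terms of select, which reflects the text's description of field extraction as a trivial use of the select operator.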
>>>>1.5 Modules
All of the examples shown so far have been “loose M” that is
taken out of context. To write a legal M program, all source text must appear
in the context of a module definition.
A module defines a top-level namespace for any type names that are defined. A
module also defines a scope for defining extents that will store actual values,
as well as computed values.
Here is a simple module definition:
module Geometry {
// declare a type
type Point {
X : Integer; Y : Integer;
}
// declare some extents
Points : Point*;
Origin : Point;
// declare a computed value
TotalPointCount { Points.Count + 1; }
}
In this example, the module defines one type named Geometry.Point.
This type describes what point values will look like, but doesn’t mention any
locations where those values can be stored.
This example also includes two module-scoped extents (Points
and Origin). Module-scoped field declarations are identical in syntax to those
used in entity types. However, fields declared in an entity type simply name
the potential for storage once an extent has been determined; in contrast,
fields declared at module scope name actual storage that must be mapped by an
implementation in order to load and interpret the module.
Modules may refer to declarations in other modules by using
an import directive to name the
module containing the referenced declarations. For a declaration to be
referenced by other modules, the declaration must be explicitly exported using
an export directive.
Consider this module:
module MyModule {
import HerModule; // declares HerType
export MyType1;
export MyExtent1;
type MyType1 : Logical*;
type MyType2 : HerType;
MyExtent1 : Number*;
MyExtent2 : HerType;
}
Note that only MyType1 and MyExtent1
are visible to other modules. This makes the following definition of HerModule
legal:
module HerModule {
import MyModule; // declares MyType1 and MyExtent1
export HerType;
type HerType : Text where value.Count < 100;
type Private : Number where !(value in MyExtent1);
SomeStorage : MyType1;
}
As this example shows, modules may have circular
dependencies.