Alphabet Soup

A Survey of .NET Languages And Paradigms

Joel Pobar

This article discusses:

  • Object-oriented programming
  • Functional programming
  • Dynamic programming
  • New paradigms for .NET languages
This article uses the following technologies:
C#, C++, F#, IronPython, IronRuby, Visual Basic

Contents

Object Orientation
Functional Programming
Dynamic Languages
Safe and Useful
LINQ
Inline XML in Visual Basic 9.0
More to Come

The Windows operating system has never been a better platform for programmers. Hundreds of languages target Windows® either directly through the Win32® API or through the CLR, and more are being built as you read.

One of the goals of the CLR was to enable a thousand flowers to bloom in one compatible ecosystem, where languages and APIs seamlessly integrate within the same runtime. Thus far it has succeeded fantastically: new languages are popping up everywhere. Dynamic languages like the Microsoft implementations of Ruby (IronRuby), Python (IronPython), and PHP (Phalanger) are now first-class citizens among Microsoft® .NET Framework languages. A functional language called F# was recently inducted, too. And while you've probably heard a lot of talk about all of these new languages and language paradigms, you may be wondering what it all means.

A tour of these language paradigms and a look at some of the important implementation aspects will answer your questions and will also help to illustrate how some of these new languages and old paradigms are influencing the design and implementation of future iterations of the C# and Visual Basic® languages.

To understand the kinds of changes the new designs represent, you need to understand the differences between traditional languages (such as C# and Visual Basic) and new languages (such as F#, IronPython, and IronRuby). There are three main themes in this space: object-oriented programming (a model that C# and Visual Basic both leverage), functional programming (F#), and dynamic programming (IronPython and IronRuby). Let's take a look at these paradigms and explore their distinguishing features.

Object Orientation

Object orientation (OO) is the paradigm with which you're probably most familiar. It enables you to describe the world in terms of objects and the contracts that bind the interactions between them. OO leverages features such as type contracts, polymorphism, and granular visibility to provide excellent reuse and encapsulation.

Typically, object-oriented languages employ a static type system and, as a result, are referred to as statically typed languages. This means that all types created and used by a program are checked at compile time; this prevents you from calling a method, Moo, on an object of type Duck if the method doesn't exist. Broken and misused contracts between types will be detected by the compiler before the code runs, which in theory leads to fewer runtime errors.

OO has its drawbacks, however. Type safety can lead programmers to rely on the compiler to catch errors instead of creating the necessary test infrastructure. OO also persuades programmers to define their contracts up front, which tends to be the antithesis of rapid prototyping and multi-person programming environments. In large software projects, contracts between components (usually owned by separate teams) tend to evolve throughout the development cycle, requiring the contract consumers to constantly update their code. OO languages can be complicated and verbose as a result of these problems.

Functional Programming

Functional programming treats program computation as the evaluation of mathematical functions. Adding two numbers together is a function. Given two input values, say 5 and 10, the value 15 is output. To solve a problem, a functional programmer breaks the problem down into smaller chunks that can be represented by functions and then combines these functions to produce the intended output.

Functional programming generally avoids state (things like variables and objects) and the mutation of state. This is practically the opposite of OO, whose main purpose is to create and manipulate state (objects). By avoiding state, functional programs tend to be more precise, more accurate, and provable. This is due to the lack of unpredicted side effects that can occur when the internal state of a program (such as a variable) changes in between operations, thus producing an unexpected result in one or more of the operations. Object-oriented and dynamic languages rely on the programmer to encapsulate and protect state to reduce the unpredicted side effects that lead inevitably to more errors.

Functional languages can be pure, meaning they have no side effects or state. However, most popular functional languages have state manipulation features to facilitate interoperability with external APIs, the operating system, and other languages. They are also mindful of the fact that some programs must represent problems with some amount of state information.

There are a number of useful features that are common to functional languages. A personal favorite of mine is the higher-order function, which is a function that takes another function as an argument or returns a function as its result (or both), leading to extremely powerful code reuse. And while most languages have this mechanism built in, functional languages tend to elevate it as a first-class feature.

A nice example of the power of higher-order functions is the classic Map API, which executes (or maps) a function over every element in an array or list of data. To start with, let's write this in JavaScript, a commonly used Web scripting language. Given this array

var data = [1, 2, 3, 4, 5];

let's write a function that will execute code over every element in the array, like this:

function map (func, array)
{
    var returnData = new Array(array.length);
    for (var i = 0; i < array.length; i++)
    {
        returnData[i] = func(array[i]);
    }
        
    return returnData;        
}

Here's a function that increments a number by one:

function increment(element)
{
    return element + 1;
}

And now let's use it:

print (map (increment, data));

output: [2,3,4,5,6]

This example of the Map API is simplistic, but it shows the power of higher-order functions. Map takes a function and an array and executes that function over each array element, returning the result in a new array.
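
The same idea carries over to other languages. Here's a minimal sketch in Python that shows both halves of the higher-order story: a function that takes a function, and a function that returns one. The names apply_to_all and make_adder are my own illustrations, not part of any library.

```python
# A higher-order function that takes a function, and one that returns a function.
# apply_to_all and make_adder are illustrative names, not library APIs.

def apply_to_all(func, items):
    """Call func on every element and collect the results in a new list."""
    return [func(item) for item in items]


def make_adder(amount):
    """Return a new function that adds amount to its argument."""
    def adder(value):
        return value + amount
    return adder


data = [1, 2, 3, 4, 5]
increment = make_adder(1)

print(apply_to_all(increment, data))       # [2, 3, 4, 5, 6]
print(apply_to_all(make_adder(10), data))  # [11, 12, 13, 14, 15]
print(data)                                # [1, 2, 3, 4, 5], the input is untouched
```

Because make_adder builds the increment function on demand, the same mapping code can be reused with any amount without writing a new function by hand.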

Given that there's no mutated state (you create a new array for the return result), and that no function result depends on a previous result, you can now start to think about extending the map example across multiple processor cores (by splitting the array in half and joining the two resulting arrays), and even across multiple machines (split the serialized data across n machines, pass the code function to the machine, execute and join the serialized results back at a single host) without having to worry about concurrency issues like state management. Extremely powerful stuff for something so simple!
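
To make the split-and-join idea concrete, here's a rough sketch in Python. It assumes a thread pool from the standard library as the stand-in for "multiple cores"; in CPython, threads won't actually run this arithmetic in parallel, so treat it as an illustration of the structure rather than a performance recipe. The parallel_map and map_chunk names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def increment(x):
    return x + 1


def map_chunk(func, chunk):
    """Map func over one chunk; no state is shared with other chunks."""
    return [func(item) for item in chunk]


def parallel_map(func, items, workers=2):
    """Split items into one chunk per worker, map each chunk, then join in order."""
    size = max(1, -(-len(items) // workers))  # ceiling division
    chunks = [items[i:i + size] for i in range(0, len(items), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = pool.map(lambda chunk: map_chunk(func, chunk), chunks)
        # pool.map yields results in submission order, so joining is a flatten.
        return [value for chunk in mapped for value in chunk]


print(parallel_map(increment, [1, 2, 3, 4, 5]))  # [2, 3, 4, 5, 6]
```

Nothing in map_chunk needs a lock, because each chunk is read-only input and each result is a fresh list. That is exactly the property the paragraph above relies on.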

Initially developed by Don Syme at Microsoft Research, the F# programming language is the Microsoft answer to functional programming on the .NET Framework. When released, it will be a fully supported language within Visual Studio®. F# and the Visual Studio integration package are both available for download at go.microsoft.com/fwlink/?LinkID=261287. In addition, Foundations of F# by Robert Pickering is a great introduction to the language.

OK, let's use F# to explore some of the functional language world. Take the Map example again, this time in F#:

// increment function
let increment x = x + 1

// data
let data = [1 .. 5]

// map (func, myList)
let map func myList = 
  { for x in myList -> func x }

print_any (map increment data)

Yes, it's beautifully succinct. The very first line defines a simple function, increment, which takes an argument, x, and evaluates to x + 1. I then define data as an immutable list that contains the integers 1 through 5. The map function takes a function and a list as arguments and uses a little sprinkling of traditional programming to loop through the list and execute the func function argument, passing to it the current element in the list. The last line calls the map function with the increment function and the data list as arguments, and print_any then prints the result to the screen.

Where are the types? Nowhere in this example. Under the hood, the variable, data, is actually typed as a list of System.Int32, but I didn't have to tell the compiler that—it simply infers it using type inference. The compiler works a little harder so that you can type a little less. Sounds good to me. You could easily rewrite some of the previous code in just one line, using a feature called list comprehension:

print_any { for x in 1..5 -> x + 1 }

Another cool F# feature is pattern matching:

let booleanToString x = 
   match x with false -> "False" | _ -> "True"

This function takes a Boolean value and matches it against false or everything else, returning the appropriate string. What happens if I pass a string to the booleanToString function? The call is rejected at compile time. Again, the compiler has done the heavy lifting and inferred that the x argument has type bool from the way x is used in the function (it's only matched against a bool value in this case).

Pattern matching can also be used to build powerful function dispatch mechanisms, easily reproducing virtual method dispatch in the OO world and more. Virtual methods really start to strain when you need to vary behavior based on both the receiver (the object on which the virtual method is invoked) and the method's parameters. The Visitor Pattern was designed to help out in this situation but is basically unnecessary in F#, because pattern matching subsumes it.

User-defined types are supported and usually exposed via records (similar to classes in the OO world) or tuples (usually a type that is an ordered sequence). Here is a user-defined player and soccer team record:

type player = 
  { firstName : string; 
    lastName : string;
  }

type soccerTeam = 
  { name : string;
    members : player list;
    location : string;
  }  
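
For comparison, a rough Python counterpart to these records can be sketched with frozen dataclasses. This is only an approximation of F# records, and the sample team and player names below are invented.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class Player:
    first_name: str
    last_name: str


@dataclass(frozen=True)
class SoccerTeam:
    name: str
    members: Tuple[Player, ...]  # a tuple keeps the member list immutable too
    location: str


# Sample data, invented for this sketch.
team = SoccerTeam(
    name="Example United",
    members=(Player("Ada", "Example"), Player("Alan", "Sample")),
    location="Gold Coast",
)

# "Updating" a frozen record builds a new value; the original is unchanged.
moved = replace(team, location="Brisbane")
print(team.location)   # Gold Coast
print(moved.location)  # Brisbane
```

Assigning to team.location directly raises an error, which is the closest Python gets to the immutability that F# records give you by default.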

Lazy evaluation is another common functional language feature whose power is derived from the theory that there are no explicit side effects in functional programming. Lazy evaluation relies on both the compiler and programmer's ability to choose the evaluation order of expressions, which allows the computation to be delayed until needed. Both the compiler and programmer can use lazy evaluation techniques as a very precise performance optimization that works by avoiding unnecessary calculations. It's also useful as a technique to deal with computation over infinite (or extremely large) datasets. In fact, the Applied Games research group at Microsoft Research uses this technique in F# to parse terabytes of Xbox LIVE® log data.

Here's lazy evaluation in F#:

    let lazyTwoTimesTwo = lazy (2 * 2)
 
    let actualValue = Lazy.force lazyTwoTimesTwo

When this code executes, lazyTwoTimesTwo simply sits in the runtime as a lightweight pointer to a function that computes 2 * 2. Only when you actually force the function to execute do you get a result.
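
If you want to see the mechanics spelled out, here's a sketch of the same idea in Python. The Lazy class is a hypothetical helper I've written for this example, not a standard library type; the second half uses a generator to show lazy computation over an infinite sequence.

```python
from itertools import count, islice


class Lazy:
    """Hold a zero-argument function and run it at most once, on first force()."""

    def __init__(self, thunk):
        self._thunk = thunk
        self._done = False
        self._value = None

    def force(self):
        if not self._done:
            self._value = self._thunk()
            self._done = True
            self._thunk = None  # the function is no longer needed
        return self._value


calls = []  # records each time the delayed computation actually runs
lazy_two_times_two = Lazy(lambda: calls.append("ran") or 2 * 2)

print(calls)                       # [], nothing has been computed yet
print(lazy_two_times_two.force())  # 4
print(lazy_two_times_two.force())  # 4, served from the cached value
print(calls)                       # ['ran'], the computation ran exactly once

# A generator over every natural number; only the five requested squares are computed.
squares = (n * n for n in count(1))
print(list(islice(squares, 5)))    # [1, 4, 9, 16, 25]
```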

While the functional programming model can be hard to get your head around (expect around 30-40 hours of ramp up), it rewards you handsomely as a programmer. You can solve problems with less code and fewer errors. The paradigm keeps your code safe by minimizing unintended side effects and keeps it running fast by performing aggressive optimizations. Besides, simplicity tends to scale well. You just need to peek at the Map code and think about how that code could easily be distributed across thousands of machines to get a feeling for that.

So with that, you've seen some of the nicer features of functional programming such as type inference, higher-order functions, pattern matching, and user-defined types. Now, let's take a tour of dynamic languages.

Dynamic Languages

Dynamic programming became popular back in the mid 1990s when Web-based applications became all the rage. Perl and PHP were two dynamic languages used heavily on the server to add interactive, dynamic elements to the Web sites they drove. Since then, dynamic languages have gained even more traction, providing a technical answer to new software development methodologies like agile.

A dynamic programming language differs from a static language (C# and Visual Basic .NET are both static object-oriented languages) in the way it compiles and executes program code. With dynamic languages, code compilation is typically delayed until run time, when the program is actually running. In other cases, the code is simply interpreted. These delays in compilation enable the program code to include and execute behaviors such as extending objects on the fly and allowing the programmer to bend the type system at will.

In fact, the just-in-time (JIT) compilation and execution technique that delays the process until the last second is what gives dynamic languages their power and rich feature set. As a result, dynamic languages are often referred to as late-bound because all action binding (such as calling a method or getting a property) is done at run time instead of at compile time. To illustrate some of the features of dynamic languages, let's take a closer look at IronPython and IronRuby.

IronPython is the Microsoft implementation of the Python programming language for the .NET Framework. It is the brainchild of Jim Hugunin, who started working on the original prototype in his garage. Six months later, he joined Microsoft and built the IronPython compiler. You can download the full source and installer for IronPython at go.microsoft.com/fwlink/?LinkId=112377.

IronRuby is the name given to the Microsoft implementation of the Ruby programming language. John Lam is the active Ruby hacker who wrote the original Ruby-to-CLR bridge, called RubyCLR. He later joined Microsoft to work on the IronRuby team. You can download the full IronRuby source at ironruby.net.

Now let's start IronPython and take a peek at a nice feature called a Read Eval Print Loop (REPL). A REPL loop is designed to take code input one line at a time and execute it. It really shows off the late-bound nature of dynamic languages like Python. Figure 1 shows the IronPython command-line REPL loop.

Figure 1 IronPython REPL Loop

The REPL loop accepts Python code as input and executes it immediately. In this case, I ask it to print "Hello, World!" to the screen. As soon as the programmer hits the enter key, the IronPython compiler compiles the statement into intermediate language (IL) and executes it—all at run time.

The REPL loop also adheres to the full scoping rules of the language—in this case, I assigned the expression 1 + 2 to the variable "a" and was able to reference that variable in the very next line. The compilation of REPL input happens in-process. There's no need to emit a .dll or .exe file; the IronPython compiler simply leverages an in-memory code generation feature of the CLR called lightweight code generation (LCG).
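
The core of a REPL is small enough to sketch. The following uses the compile, eval, and exec built-ins of standard Python, which compile to Python bytecode in memory; it is an analogy for what IronPython does with IL and LCG, not IronPython's actual implementation. The run_line helper is a name I made up.

```python
def run_line(source, namespace):
    """Compile one line in memory and run it against a shared namespace."""
    try:
        code = compile(source, "<repl>", "eval")  # an expression: it has a value
    except SyntaxError:
        code = compile(source, "<repl>", "exec")  # a statement: run it for effect
        exec(code, namespace)
        return None
    return eval(code, namespace)


session = {}                        # variables live here between lines
run_line("a = 1 + 2", session)
print(run_line("a", session))       # 3
print(run_line("a * 10", session))  # 30
```

The shared session dictionary is what lets the second line see the variable assigned by the first, and no file is ever written to disk.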

The type systems of dynamic languages are very flexible and relaxed. Object-oriented concepts such as classes and interfaces are usually supported, but with a twist: strict contract enforcement such as that found in static languages isn't generally seen. A classic example of this more relaxed type system is found in the concept of duck typing, a commonly supported dynamic language type system feature. With duck typing, if a type looks like a duck and quacks like a duck, the compiler assumes it must be a duck. Figure 2 shows this concept in action using Ruby—you can run this code in IronRuby if you like.

Figure 2 Duck Typing in Ruby

class Duck
  def Quack()
     puts "Quack!"
  end
end

class Cow
  def Quack()
    puts "Cow's don't quack, they Mooo!"
  end
end

def QuackQuack(duck)
  duck.Quack()
end

animal = Duck.new()
QuackQuack(animal)
animal = Cow.new()
QuackQuack(animal) 

Figure 2 shows two class definitions (Duck and Cow), a method called QuackQuack, and code to instantiate those objects, passing them to the QuackQuack method. Once loaded and run, the code outputs the following:

Quack!
Cow's don't quack, they Mooo!

The QuackQuack method invokes the method Quack on the method argument, not caring what the type of that argument is—it only cares whether the argument has the method Quack. The lookup and invocation of the Quack method occurs on the argument at run time instead of compile time, allowing the language to make sure the argument looks like a duck before it makes it quack.

The closest analog to duck typing in the static typing world is the interface. If a type implements an interface in the static world, the static type system enforces its adherence to the full and complete interface structure, so invocation of an interface method on any implementing type is guaranteed. Duck typing only concerns itself with part of the type's structure: the name of the method being invoked. It achieves this through late binding. The compiler simply emits code that performs a method lookup on the object (usually using the string-based name of the method) and then calls that method if the lookup succeeds.
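
Here's what that lookup-then-call step looks like when written out by hand in Python. The call_by_name helper and the Rock class are hypothetical, added so you can see what happens when the lookup fails.

```python
class Duck:
    def quack(self):
        return "Quack!"


class Cow:
    def quack(self):
        return "Cows don't quack, they Mooo!"


class Rock:
    pass  # no quack method at all


def call_by_name(target, method_name):
    """Look up method_name on target at run time, then call it if found."""
    method = getattr(target, method_name, None)  # string-based lookup
    if not callable(method):
        return f"{type(target).__name__} has no method {method_name!r}"
    return method()


for thing in (Duck(), Cow(), Rock()):
    print(call_by_name(thing, "quack"))
# Quack!
# Cows don't quack, they Mooo!
# Rock has no method 'quack'
```

None of the three classes shares a base class or an interface. The only contract is the method name, and it is checked at the moment of the call.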

Language features aren't the only cool features in dynamic languages such as IronPython and IronRuby. These implementations also provide hosting APIs, which means you can host the language, compiler, and even the REPL interactive feature in your own applications. Say you're building a large, complicated application, maybe a photo manipulation tool, and you want to expose an automation or scripting facility to your users. You could host IronPython or IronRuby and allow users to create Python or Ruby scripts that manipulate your application programmatically. Simply expose your application's internal objects and APIs to the language hosting API, create a text window to host the REPL loop, and you'll be enabling scripting in your application.
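
The shape of that arrangement can be sketched without any hosting library at all. The example below uses Python's built-in exec as a stand-in for a real hosting API, and the PhotoApp and Photo classes are invented placeholders for your application's object model. Note that exec is not a security sandbox, so a real host would need more care than this.

```python
class Photo:
    def __init__(self, name, brightness=0):
        self.name = name
        self.brightness = brightness


class PhotoApp:
    """Hypothetical stand-in for the host application's object model."""

    def __init__(self):
        self.photos = [Photo("beach.jpg"), Photo("sunset.jpg")]

    def brighten(self, photo, amount):
        photo.brightness += amount


def run_user_script(app, script_text):
    """Run a user script that sees only the names the host chooses to expose."""
    exposed = {"app": app}
    exec(compile(script_text, "<user-script>", "exec"), exposed)


# What a user might type into the application's scripting window.
user_script = """
for photo in app.photos:
    app.brighten(photo, 10)
"""

app = PhotoApp()
run_user_script(app, user_script)
print([(p.name, p.brightness) for p in app.photos])
# [('beach.jpg', 10), ('sunset.jpg', 10)]
```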

Having features like command-line-driven REPL loops has a really nice productivity side effect: you can easily cook up a working prototype without the save-compile-run cycle. Modifying and extending types on the fly is encouraged, allowing you to get the solution from your head to code quickly without worrying about what interface or class definition contracts look like. Once you're in a good place, you can move from your dynamic language prototype to the stricter and safer world of static languages like C#.

Flexible type systems are also really good at consuming unstructured data (like HTML on the Web) and can easily be used as change-resistant glue to strict interfaces that are likely to change versions over time. With new application paradigms like Web-based computing, dynamic languages provide a great solution for dealing with uncertainty.

Hosting APIs give your applications an extensibility point that's second to none; you can offer an application experience similar to that of Microsoft Excel® with its Visual Basic for Applications (VBA) macros.

Safe and Useful

Safety and usefulness are also important considerations in language selection. The usefulness of a language is measured by its ease of use, productivity benefits, and so forth. The safety of a language is measured by its type safety (whether it allows invalid casts), programmatic barriers (whether you can walk over an array boundary and trample on other memory in the stack), security features, and more. Sometimes you trade usefulness for safety and vice versa.

Interestingly, you find C#, C++, Visual Basic .NET, Python, and the like in the useful but less safe category. While it's true that managed code languages like C# are safer than traditional native code, you can still get in trouble. Because these languages focus on state manipulation, they generally have unknown side effects—and when working in a concurrent world, this becomes a problem.

A C#/Visual Basic .NET multithreaded program that runs fine on a single-core machine might crash on a multicore machine due to a race condition that exposes a side effect that the programmer didn't know about. Also, while these languages have a good general level of type safety, they still can't prevent an unknowing programmer from casting an object to void*.
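
To see where such a race comes from, consider a shared counter. The sketch below, in Python with a made-up Counter class, shows the guarded version: the increment is a read followed by a write, and the lock keeps another thread from slipping in between the two. Remove the lock and the final total may come up short, depending on how the threads happen to interleave.

```python
import threading


class Counter:
    """Shared mutable state, protected by a lock."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # Read-modify-write is two steps; the lock makes them one atomic unit.
        with self._lock:
            current = self.value
            self.value = current + 1


def hammer(counter, times):
    for _ in range(times):
        counter.increment()


counter = Counter()
threads = [threading.Thread(target=hammer, args=(counter, 10_000)) for _ in range(4)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(counter.value)  # 40000
```

A functional version would avoid the problem altogether by having each thread return its own count and summing the results at the end.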

Haskell (a pure functional language used in academia) is in the safe-but-not-as-useful category. Because Haskell has no side effects (code blocks are functions and variables never change), and because it forces the programmer to define types and intentions up front, it is considered a safe language. Unfortunately, Haskell is incredibly hard to get your head around, even for ninja programmers, and it's generally hard to read Haskell code you haven't written yourself.

In my opinion, F# is more useful than Haskell with its .NET Framework integration, opt-in procedural syntax, and features to deal with objects that come from the object-oriented world. Clearly, the next-generation languages would do well to borrow the best features of all the languages I've discussed. Who wouldn't like a language that could move from a rigid, enforceable static type system to more dynamic and flexible types on the fly, had the power of functional programming, and avoided concurrency problems?

Thankfully, it's easy to go from C# to F# to IronPython and back again thanks to the common type system (CTS). And the designers of C# and Visual Basic .NET are borrowing from the best features of the dynamic and functional paradigms and integrating them straight into the languages as first-class features. So now it's time to take a look at a few of these features and see how they relate back to what you've learned about the various languages and their paradigms.

LINQ

LINQ is a new feature in the .NET Framework that can be used directly as an API from any language in .NET or actually exposed as a language feature through the set of LINQ language extensions. LINQ can be used to query data in a storage-agnostic manner by exposing a set of query operators similar to that of the T-SQL language. Visual C#® 2008 and Visual Basic 2008 in the .NET Framework 3.5 both support LINQ as a first-class citizen.

Your classic LINQ query using Visual C# 2008 looks like this:

List<Customer> customers = new List<Customer>();
// ... add some customers
var q = from c in customers
        where c.Country == "Australia"
        select c;

This code walks the list of customers, looking for Customer objects whose Country property equals "Australia", and yields them as a new sequence (the query is evaluated lazily, when its results are enumerated). The first thing you'll notice is the "var" keyword, a new keyword added to Visual C# 2008 that tells the compiler to perform type inference. Then you see the C# LINQ extensions in action, with the from, where, and select keywords. These form a comprehension of the list of customers (list comprehension), just as you saw in the F# language and as is commonly featured in dynamic languages.

Because this LINQ query produces a new sequence rather than modifying its source, there are effectively no side effects, and the original list of customers is untouched. Side effects are the thorn in the side of every concurrent program, but this style of programming gives you safety from side effects while maintaining readability and overall usefulness. And given there are no side effects, you can scale out this query as wide as you like.
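
Since comprehensions are common in dynamic languages, it's worth seeing the same query written as one. This Python sketch uses an invented Customer type and sample data; the generator expression mirrors the from, where, and select clauses and, like the LINQ query, does no work until its results are pulled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:
    name: str
    country: str


# Sample data, invented for this sketch.
customers = [
    Customer("Ann", "Australia"),
    Customer("Bob", "USA"),
    Customer("Cai", "Australia"),
]

# from c in customers where c.Country == "Australia" select c
query = (c for c in customers if c.country == "Australia")

# Nothing has been filtered yet; the work happens as results are pulled.
australians = list(query)

print([c.name for c in australians])  # ['Ann', 'Cai']
print(len(customers))                 # 3, the source list is untouched
```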

In fact, that's the very purpose of Parallel LINQ (PLINQ), an extension to LINQ that is available as a community technology preview through the Parallel FX library. PLINQ executes LINQ queries in parallel across multiple CPU cores, which can give an almost linear increase in speed with almost no added programming cost. This looks a lot like the Map API example you saw in the functional programming section. Overall, the integration of LINQ into C# and Visual Basic has added a functional flavor, with a touch of dynamism, making these languages much more expressive and a lot less verbose.

Inline XML in Visual Basic 9.0

Some call inline XML the best feature to be added to Visual Basic—I tend to agree. And while it doesn't look like something out of the functional or dynamic paradigms, its implementation comes directly from the dynamic language world. Figure 3 shows XML inline in Visual Basic.

Figure 3 Inline XML in Action

Dim rss = From cust In customers _
          Where cust.Country = "USA" _
          Select <item>
                     <title><%= cust.Name %></title>
                     <link><%= cust.WebSite %></link>
                     <pubDate><%= cust.BirthDate %></pubDate>
                 </item>

Dim rssFeed = <?xml version="1.0" encoding="utf-8"?>
    <rss version="2.0"><channel></channel></rss>

rssFeed...<channel>(0).ReplaceAll(rss)
rssFeed.Save("customers.xml")

This code uses the Visual Basic LINQ to XML implementation to query a list of customers for those who live in the USA and returns a list of XML elements that form the basis of an RSS Feed. The code then defines the RSS Feed schema and uses the rather curious Visual Basic "..." operator to traverse the XML document to find the "<channel>" element. Then the code calls ReplaceAll on that <channel> element to include all of the XML elements created above. Phew.

The ... operator plus the use of an inline XML element is what is interesting here. You see, Visual Basic doesn't know about the XML <channel> element—in fact, you can change that to <foo> and it will compile fine—so if it doesn't know about it, it can't possibly be statically checking that call site for correctness. Instead, it's compiling code that will do a dynamic check at run time—the kind of check you saw in the dynamic paradigm. It's leveraging the dynamic language concept of late binding.

For those who eat IL for breakfast, let's crack open the assembly and take a look at the code for the "...<channel>" statement:

ldloc.1
ldstr      "channel"
ldstr      ""
call       class [System.Xml.Linq]System.Xml.Linq.XName
     [System.Xml.Linq]System.Xml.Linq.XName::Get(string,
                                             string) 
callvirt   instance class Generic.IEnumerable`1<XElement>
           XContainer::Descendants(System.Xml.Linq.XName)

The ldloc.1 instruction loads the rssFeed variable, and the two ldstr instructions load the string "channel" and an empty namespace string. The code then calls the XName.Get method, which returns an XName. This XName result gets passed to the Descendants method, which returns all the descendants of the XML document that match the <channel> name. This mechanism is classic, late-bound dynamic dispatch, borrowed straight from the dynamic languages playbook.
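
The same string-based lookup is easy to demonstrate in Python with the standard library's ElementTree module. This is an analogy for the Descendants call rather than a port of it, and the descendants helper and the sample title are my own inventions.

```python
import xml.etree.ElementTree as ET

rss_feed = ET.fromstring('<rss version="2.0"><channel></channel></rss>')


def descendants(root, name):
    """Find elements by string name; nothing is checked until this runs.

    Note that root.iter() also considers the root element itself.
    """
    return list(root.iter(name))


channels = descendants(rss_feed, "channel")
print(len(channels))                      # 1
print(len(descendants(rss_feed, "foo")))  # 0, a wrong name is simply no match

# Add an item under the channel element that was found at run time.
item = ET.SubElement(channels[0], "item")
ET.SubElement(item, "title").text = "Example Customer"
print(rss_feed.find("channel/item/title").text)  # Example Customer
```

Just as with the Visual Basic version, misspelling the element name produces an empty result at run time rather than an error at compile time.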

But wait, there's more. If you drop the RSS schema definition into Visual Studio and add an "Imports" reference to the schema, the Visual Basic .NET compiler will pick that up, parse it, and then enable full IntelliSense® over any XML that adheres to the schema. So if you dropped in the RSS XML Schema, the "...<channel>" statement would immediately show an IntelliSense window with the possible descendant options available as per the schema definition. Very cool!

More to Come

I've only touched on a few examples of how languages like C# and Visual Basic are combining the best elements from other paradigms and languages. Right now, the language design teams are already adding new flavor to their languages to cope with tomorrow's programming challenges.

While I have looked at a few high-level categories of programming paradigms, there are others. Declarative programming (the art of describing what something is) using XML as the descriptor has had a huge impact on development through frameworks such as Windows Presentation Foundation and Windows Workflow Foundation. Declarative languages are also excellent at describing and manipulating data. And with the ever-growing data assets that organizations have created over the years, I suspect that declarative languages will be a future driving force in trying to programmatically harness those assets. There's also logic programming (leveraging logic for computer programming), which I haven't touched on but is used widely in the artificial intelligence and data-mining spaces with languages like Prolog.

As new features are added from all these languages and paradigms, you benefit. Your favorite language gets better through cross-pollination, and you get more flexibility and productivity. And if you think there are some cool features in another language that you want to use, no problem; your favorite runtime and framework support that interoperability scenario, too. It's great to be a .NET programmer.

Joel Pobar is a former Program Manager on the CLR team at Microsoft. He now hangs out at the Gold Coast in Australia, hacking away on compilers, languages, and other fun stuff. Check out his latest .NET ramblings at callvirt.net/blog.