VISUAL BASIC

Unleash The Power Of Query In Visual Studio "Orcas"

Ting Liang and Kit George

This article is based on a prerelease version of Visual Studio code-named “Orcas.” All information herein is subject to change.

This article discusses:
  • Visual Basic and query integration
  • Supported clauses such as Where, OrderBy, and Select
  • Understanding and using query results
This article uses the following technologies:
Visual Basic, LINQ

Contents

LINQ Syntax
The From Clause
The Select Clause
Using the Results of a Query
Filtering with the Where Clause
Sorting with Order By
More Clauses
Summary

For a long time, developers have wanted to be able to use data querying techniques in their code, much as they might in their SQL queries. Now, with the upcoming release of Microsoft® Visual Studio,® code-named "Orcas," this kind of facility is becoming a reality. The next version of Visual Studio includes a new set of language features known collectively as Language Integrated Query (LINQ), which adds data querying capabilities directly to Visual Basic® and Visual C#®.

With LINQ, you can write your data access code directly in Visual Basic. You get compile-time syntax- and schema-checking, and better tools support (such as IntelliSense®) for authoring your queries. No longer must you write a query in a string and not even know until runtime if it’s well-formed.

With LINQ, the data-access code is abstracted from the underlying storage facility. This means you can use the same code constructs for querying relational databases, XML, and in-memory objects, and you can easily join information across different source domains. For example, you can create a union between a list of customer names stored in XML and one stored in a database table. In addition, LINQ allows deferred execution, which lets you break up a query into logical parts and still get the performance benefits of running the end result only once.

Finally, Visual Basic/XML integration virtually eliminates the conceptual barrier between the code you write and the document you’re trying to express or manipulate. Now let’s start learning how to write some queries.

LINQ Syntax

Using LINQ you can execute queries on a variety of sources, including Microsoft .NET Framework object collections, relational databases, or XML data sources. The general format of a LINQ query in Visual Basic looks like this:

From ... < Where ... Order By ... Select ... >

Of course, if you’re a SQL programmer, you’ll wonder why the query begins with From instead of Select. The reason is simple: IntelliSense. By indicating the data source first, IntelliSense can show appropriate lists of type members on the objects in your collection. This is important when you’re typing the Select clause.

In the following sections, we look at some of the standard LINQ clauses and see how they build on each other to allow the creation of robust, flexible queries. We’ll walk through each of the following in turn: the From clause, the Select clause, using the result of a query, the Where clause, and the Order By clause.

In the following examples, we sometimes refer to a Customer or an Order. For those examples, imagine they are defined as in Figure 1. Each Customer has a reference to its own collection of orders via the Orders property. This allows for a simple relationship between the two entities. The association is formed by the CustomerID foreign key’s presence in each Order. In addition to these type definitions, there’s a collection of Customers stored in the variable customers.

Figure 1 Customers and Orders

Public customers As List(Of Customer)

Public Class Customer
    Public Property CustomerID As Integer
    Public Property FirstName As String
    Public Property LastName As String
    Public Property City As String
    Public Property Orders As List(Of Order)
End Class

Public Class Order
    Public Property OrderID As Integer
    Public Property CustomerID As Integer
    Public Property OrderDate As DateTime
End Class

The From Clause

The From clause is the heart of LINQ, since it’s required in every query. A query must always start with the From clause, and it’s the only clause that must be supplied explicitly. The From clause follows this basic pattern:

From <query variable> In <datasource>

Essentially, From represents a walk through a set of data. For that to work, it needs a variable that corresponds to each item of data in the source, much like an iteration variable used in a For Each statement. In clauses following the From statement, the query variable can be referenced to filter, sort, or otherwise manipulate data.

Here’s a really simple example of a From clause:

Dim numbers As Integer = {1, 7, 4, 2, 91, 12, 23, 37, 42}
Dim allNumbers = From number In numbers
For Each num In allNumbers
    Console.Write(num & “ “)
Next

In this example, the query returns a collection typed as IEnumerable(Of Integer). There are a couple of things to notice about this query as well. First, the Select is implied, which is why you don’t see it in the example. A Select statement is optional; if you don’t supply one, then the set of variables in scope is returned. The second thing is that this query is really not very useful. You could remove the query and simply iterate over the numbers variable using a For Each loop. The real power of LINQ is not evident just yet. But look at this query:

From <query variable> As Type In <datasource>

Dim checkboxes = From checkbox As Checkbox In groupJobType.Controls

This variant of the From statement allows you to specify a type statement following the query variable. This is useful if you want to retrieve types from a particular source, and retype them as you retrieve them, to a more (or perhaps less) concrete type. We happen to know that all of the controls in the groupJobType GroupBox are Checkbox controls, so we can safely type them to Checkbox as we retrieve them.

There is another variant of the From clause that uses the From keyword twice:

From <query variable> In <datasource> From <let variable> = <value>

From <query variable> In <datasource>, <let variable> = <value>

Dim evens = From Number In numbers From IsEven = (Number Mod 2 = 0)
Dim evens = From Number In numbers, IsEven = (Number Mod 2 = 0)

This query syntax is actually all one From clause, with the latter part known as the "Let" part of the From. Essentially we’re defining a second variable and assigning it to the specified value. In this case, we’re defining a second value on our resultset called "IsEven," which is a Boolean value—the result of whether or not our selected number is even. This of course means that what is returned is changed: it’s no longer just an IEnumerable of Integers, as each item in the returned enumerable contains both a number and a Boolean. We’ll see how to use the results of a query below. Notice in the general syntax expression provided that it’s possible to omit the second use of the From keyword and use a comma instead; both examples above are equivalent.

What distinguishes a Let variable from a standard query variable is that the Let variable is always assigned a value using =, while a query variable is followed by the In keyword. However, the Visual Basic team is actively exploring the way the Let variable is exposed, so there’s a good chance the supported syntax for Let will change between now and the time the product is released in final form:

From <queryVar1> In <datasource1>, <queryVar2> In <datasource2>, ...

Dim customerOrders = From cust In customers, ord In cust.Orders

This syntax allows you to specify multiple data sources. In the above example, we get a collection of customers and orders; essentially, this is an implicit Join on the two sources. Much of the time, the sources won’t be so easily related (your first query variable won’t have a collection of the second query variable on it), and therefore you’ll have to associate them with a Where clause. But in this case, our Orders collection is actually on each cust instance; therefore we’re only getting the orders that apply to the given customer.

The Select Clause

So far you’ve seen how to specify a data source. Note that in every example we’ve omitted the Select keyword. This simply means that we used the default Select behavior, which is to Select every query or Let variable specified (logically equivalent to Select *). But we also have the power to specify a Select clause, and we get considerably more flexibility when we do:

Select <varA>, <varB>, ...

Dim allNumbers = From number In numbers _
                 Select number

Dim evens = From Number In numbers From IsEven = (Number Mod 2 = 0) _
            Select Number, IsEven

Dim customerOrders = From cust In customers, ord In cust.Orders _
                     Select cust, ord

This is Select at its simplest. Compare the earlier queries to their equivalent queries in the From section here. Every query in these three examples with an explicit Select clause will have the same behavior as the earlier queries where Select was implicit.

Using Select gets more interesting, however, when you try variants on this theme:

Dim customerInfos1 = From cust In customers _
                     Select cust.FirstName, cust.LastName

Dim customerInfos2 = From cust In customers, ord In cust.Orders _
                     Select cust.LastName, ord.OrderID, ord.OrderDate

Notice first of all that a new general syntax is not specified (there’s no comment above the example). This is because if you look at these examples, the previous code is in the same style: FirstName and LastName are simply variables after all. But unlike the previous examples, we dereference the query variable and return only a couple of properties. This is a very powerful capability.

Another great feature of LINQ is that when you don’t need to rename fields, Visual Basic infers the names based on the names of the properties. Sometimes, however, you may want to rename fields, or merge data together, like so:

Select <aliasA> = <varA>, <aliasB> = <varB>, ...

Dim customerInfos1 = From cust In customers, ord In cust.Orders _
                     Select Date = ord.OrderDate, _
                            Name = cust.FirstName & cust.LastName

In some scenarios (like Date = ord.OrderDate), you may be doing it for personal preference. In another scenario (as in Name = cust.FirstName & cust.LastName) you actually want to create a new piece of data—a combination of fields. There is also a third scenario for using aliases. Imagine you’re selecting two fields from different sources that happen to have the same name. Name inference would give those variables the same property name on the resulting type, and this would fail to compile due to the conflict. You can use an alias for one (or both) conflicting field names to resolve the conflict.

One final point to note about the Select clause is that it rescopes your variables, so any clauses in a query can only see what Select exposes. We discuss this in more detail when we talk about the Where clause.

Using the Results of a Query

Now let’s see how to use the results of the query. To do so you’ll need to understand what’s formed when a query is executed:

Dim allNumbers = From number In numbers _
                 Select number

The result of any query will always be an IEnumerable(Of T). This is one of the most reassuring and powerful aspects of queries, since it means you can always iterate over the data. Sometimes you may know what the type of T is, and sometimes you may not, such as in the case of anonymous types.

Anonymous types are new with LINQ and are used to represent any result of a query with multiple returned field columns. An anonymous type is a type that you can’t directly name or reference in code but that the compiler treats just like other named types (the compiler gives the type a name, you just don’t know what it is, nor do you need to). The field name of an anonymous type can be specified directly in code or inferred by the compiler from the query expression automatically. For example, the use of anonymous types enables code like you saw earlier in which the pairings of {Num, IsEven} and {cust, ord} are effectively new types, each with two pieces of data (with Num, IsEven and Cust, Ord as respective, accessible field names):

Dim customerInfos1 = From cust In customers _
                     Select cust.FirstName, cust.LastName

Dim customerInfos2 = From cust In customers, ord In cust.Orders _
                     Select cust.LastName, ord.OrderID, ord.OrderDate

As you can see, anonymous types allow you to specify a varying number of fields in the Select clause without requiring you to specify an exact type beforehand that represents each particular combination of fields.

While we say that each query is an IEnumerable(Of T), you may have noticed that none of the examples actually specify a type for the variable representing the query. You may have become concerned that the variable is getting typed as System.Object and that you’re losing all strong typing support in the process.

Not so. A new compiler feature called type inference means that the compiler can infer the type of a variable declared with no type specifier, based on its initialization value (if you include the statement Option Infer On at the top of your code). For example, the following variables are statically typed based on the initialization value:

Dim x = 5             ‘ x is typed as an Integer
Dim y = DateTime.Now  ‘ x is typed as a DateTime
Dim z = GetCustomer() ‘ assuming GetCustomer returns a customer
                      ‘ type, z is typed as a Customer

x = “hello”           ‘ COMPILE-TIME ERROR: x can only store Integers

In the same way, a variable representing a query is strongly typed. This enables you to actually use members directly on that variable, without fear of those members being late-bound. Plus, you retain other benefits of strong typing such as IntelliSense support. The same kind of type inference happens in a For Each statement, when a new variable is declared with no type specifier:

 ‘ assuming allNumbers is a collection of Integers, 
 ‘ the type of num is inferred to be an Integer
For Each num In allNumbers

In general, what you can’t easily know when using queries is what specific kind of object is being returned. This is where you have to put your trust in queries. Under the covers, different kinds of objects may be used to store your query results. All that’s important to know is that they will always be an IEnumerable(Of T), therefore you can iterate through them with a For Each or pass them to an API that accepts an IEnumerable object.

Having said that, there is a rule of thumb that will help you know what is returned, and this rule is important (you’ll see why in a second). When the results you’re returning consist of a single named type, the query returns an IEnumerable(Of <your type>). When the results you’re returning are of more than one type, the query returns an IEnumerable(Of <anonymous type>). So in the example above, which just returns an integer from an array of Integers, you’ll get an IEnumerable(Of Integer).

When the query result is an IEnumerable of a single type, using that result is easy. You simply use a For Each clause:

For Each num In allNumbers
    Console.WriteLine(num)
Next

There is nothing here that should surprise you since we have an IEnumerable of a single type. But now let’s look at a scenario that involves anonymous types:

Dim evens = From Number In numbers From IsEven = (Number Mod 2 = 0) _
            Select Number, IsEven

For Each numInfo In evens
    Console.WriteLine(“Is “ & numInfo.Number & “ even? “ & _
                      numInfo.IsEven)
Next

Because the query returns an object that has two pieces of data in it, the variable "evens" is an IEnumerable(Of <anonymous type>). Each entity we get out has fields named Num and IsEven; therefore, when we iterate over our evens collection, we need to dereference each of these pieces of data to retrieve them (numInfo.Num for example). Notice a couple of things here. First, we named the iteration variable numInfo. When iterating over collections of anonymous types, it’s a good idea to use a general name to indicate that it has multiple pieces of data. Second, we capitalized the query variables in the example, so when we dereference and use them inside the For Each loop, they look like standard properties. We didn’t need to do this for "number" in the previous example, since an anonymous type was not introduced.

Just to show how easy this is, let’s see one more example:

Dim customerInfos = From cust In customers, ord In cust.Orders _
                    Select Name = cust.FirstName & cust.LastName, _
                           Date = ord.OrderDate

For Each custInfo In customerInfos
    Console.WriteLine(“Customer Name: “ & custInfo.Name & _
                      “ Order Date: “ & custInfo.OrderDate)
Next

It really doesn’t get much easier. However, sometimes you may want to retrieve your first name and last name fields separately. Here’s a variant on the above that retains all of the data in our Customers and Orders (we want to use the other fields later) but that achieves the same for this specific iteration.

Dim customerInfos = From cust In customers, ord In cust.Orders _
                    Select Customer = cust, Order = ord

For Each custInfo In customerInfos
    Console.WriteLine(“Customer Name: “ & _
                       custInfo.Customer.LastName & “, “ & _
                       custInfo.Customer.FirstName)
    Console.WriteLine(“Order Date: “ & custInfo.Order.OrderDate)
Next

Again, nothing unexpected. By including each entire object in our results, we have to dereference the objects to get to the specific fields we want to consume. Note that we concatenate the last and first names inside the For Each loop.

We’ll leave you with one further tip. You can do some pretty cool things if you look at the members available on your query. For example, consider this code:

Dim customerInfos = From cust In customers, ord In cust.Orders _
                    Select Customer = cust, Order = ord
Console.WriteLine(“There are “ & customerInfos.Count & _
                  “ customer orders”)

This example uses Count directly on the result of the query to see how many items it has. We won’t get into all of the members you can find on the collection, but don’t forget to check out what kinds of things you can do with your resultset. Also note that these members are available on all collections resulting from a query. If you surmised that this means these members must be on IEnumerable(Of T), you’re right; even their implementations must be. This is a new capability called extension methods.

One concept you must understand about how queries execute is deferred execution. In deferred execution, an attempt is not made to retrieve the data from a query until it is actually used. This is important because it allows you to write the query in different parts, without multiple calls to the underlying data source, and there are significant performance benefits, especially when accessing information against a database (multiple calls in that situation should be especially avoided).

What this means is that the declaration of a query does not actually retrieve the data; only taking an action that needs the data causes it to be retrieved. The action you’ll probably take most often is to iterate over the query. But other possible actions include displaying the data or trying to use a method such as Count to find the number of entries in the query. Figure 2 shows where data retrieval is performed.

Figure 2 Data Retrieved in the For Each Clause

Dim seattleCustomers = From cust In customers _
                       Where cust.City = “Seattle”

Dim recentOrders = From ord In orders _
                   Where ord.OrderDate.Year = 2007

Dim custInfos = From cust In seattleCustomers, ord in recentOrders _
                Where cust.CustomerID = ord.OrderID _
                Select Name = cust.LastName & “, “ & cust.FirstName, _
                              cust.CustomerID, ord.OrderDate _
                Order By OrderDate Descending, Name

‘ None of the above queries are run until this point.
‘ This means that the information is only retrieved once.
For Each custInfo In custInfos
    ‘ take action on the data here
Next

Filtering with the Where Clause

The Where clause makes your query more powerful by letting you filter on specific conditions. The Where clause operates just like an If statement. That is, whatever is possible in an If statement is possible in a Where clause:

Where <condition>

Dim evens = From num In numbers _
            Where num Mod 2 = 0 _
            Select num

Notice that this collection is called "evens," as in the previous examples. Unlike previous examples, however, it consists only of the even numbers. When you iterate over the results, you’ll only get the even numbered results:

Dim mids = From num In numbers _
           Where num > 10 And num < 50 _
           Select num

The beauty of the Where clause is its simplicity. Your <condition> is any condition you choose. Let’s see how it works on a more involved query:

Dim seattleCustomers = From cust In customers, ord In cust.Orders _
                       Where cust.City = “Seattle” _ 
                       Select Customer = cust, Order = ord

Dim seattle2003Orders = From cust In customers, ord In cust.Orders _
                        Where cust.City = “Seattle” _ 
                        And ord.OrderDate.Year = 2003 _ 
                        Select Customer = cust, Order = ord

Sometimes you might want to filter by one field, and Select a different field (or fields). This is very straightforward:

Dim dallas2005Orders = From cust In customers, ord In cust.Orders _
                        Where cust.City = “Dallas” _ 
                        And ord.OrderDate.Year = 2005 _ 
                        Select cust.CustomerID, ord.OrderID

Once you’ve declared your From clause, the order of the other clauses are optional—you can use them in any order you prefer:

Dim seattle2003Orders = From cust In customers, ord In cust.Orders _
                        Select Customer = cust, Order = ord _
                        Where Customer.City = “Seattle” _                     
                              And Order.OrderDate.Year = 2003               

Compare this example to the example of the same name above. Notice that we’ve moved the Select clause to before the Where clause, which is perfectly acceptable. But notice also that the Where clause must change slightly and now refers to Customer and Order rather than cust and ord. Be very aware that the Select clause is a rescoping clause, thus any clauses after the Select clause can only see what the Select exposes, which means that this code will be invalid:

‘ This won’t compile
Dim dallas2005Orders = From cust In customers, ord In cust.Orders _
                       Select cust.CustomerID, ord.OrderID _
                       Where cust.City = “Dallas” _ 
                              And ord.OrderDate.Year = 2005 

We refer to ord and cust inside the Where clause, but Where comes after Select and, therefore, those two variables have disappeared. After Select, all we’re left with is an anonymous type with two properties, CustomerID and OrderID. The code can be changed like so:

Dim dallas2005Orders = From cust In customers, ord In cust.Orders _
                        Select cust.CustomerID, ord.OrderID, _
                               cust.City, ord.OrderDate _
                        Where City = “Dallas” _ 
                              And OrderDate.Year = 2005 

Note that many clauses, including Where and Order By, do not rescope the available variables.

Sorting with Order By

Order By is one of the fundamental clauses of any query. Order By sorts the results of your query by the specified field or fields:

Order By <query variable>

This is Order By at its simplest. Without any order direction specified, it’s assumed that you want the results in ascending order. Here’s the clause in action:

Dim nums = From num In numbers _
           Order By num _
           Select num

The next variant of Order By is almost identical to the first. It just assumes that you want to Order By a field inside a variable, rather than the variable itself:

Order By <field1>, <field2>, ...

Dim customerInfos1 = From cust In customers, ord In cust.Orders _
                     Order By cust.City _
                     Select Customer = cust, Order = ord

Dim customerInfos2 = From cust In customers, ord In cust.Orders _
                     Order By cust.City, ord.OrderDate _
                     Select Customer = cust, Order = ord

It’s fine to select either one or more than one field. Also, because Order By does not rescope the available variables, the following Select clause can use any variables available from the previous scope:

Order By <var1> [Ascending/Descending], _
         <var2> [Ascending/Descending], ...

Dim customerInfos = From cust In customers, ord In cust.Orders _
                    Order By cust.City Descending, ord.OrderDate _
                    Select Customer = cust, Order = ord

This is a powerful use of Order By. We have omitted the direction from the second column but included it on the first, which means we will sort first by city in descending order and then by order date in ascending order.

Of course, you can combine Order By with Where, so let’s finish with a couple of examples:

Dim orderedEvens = From num In numbers _
                   Where num Mod 2 = 0 _
                   Order By num _
                   Select num

Dim seattleCustomers = From cust In customers, ord In cust.Orders _
                       Order By cust.City Descending, ord.OrderDate _
                       Where City = “Dallas” _                
                                     And OrderDate.Year = 2005 _
                       Select Name = cust.LastName & “, “ _
                                     cust.FirstName, ord.OrderDate

More Clauses

Discussing the other available clauses is a topic for another article, but many of the standard querying capabilities are supported, usually as clauses like the ones covered in this article. For example, joins, aggregations, and grouping are supported. You’ve seen implicit joins in examples above, but explicit joins can be executed using the Join keyword. Aggregating information is supported with the Aggregate clause, and grouping is supported with a combination of the Group By clause and pre-supplied aggregation methods such as Sum, or Average (though you can always create and consume your own as well). And standard query capabilities such as inserting, updating, and deleting are supported, just not directly from the query syntax.

Summary

LINQ provides basic query capabilities, including specifying a source (From), identifying data to return (Select), filtering (Where) and sorting (Order By). Advanced query capabilities to group, join, or aggregate your sets of data are also available. With these capabilities, supported by other features such as anonymous types or type inference, you have everything you need to write SQL-like queries directly in your Visual Basic code.

Ting Liang is a Software Design Engineer with the Microsoft Visual Basic Compiler Team. He has been involved extensively with the prototyping and implementation of the Visual Basic LINQ project.

Kit George is a Program Manager with the Microsoft Visual Basic Team. Kit helped design and deliver the Visual Basic LINQ features.