Programming With XML Using Visual Basic 9
Microsoft Visual Basic 9.0
Microsoft Visual Studio 2008
Summary: In this paper, using a realistic application, we will look at the new features and capabilities that are available in Microsoft Visual Basic 9.0 that relate to programming with XML. (18 printed pages)
LINQ to XML in Visual Basic 9.0: Key Features
Getting Home Sales Data from Windows Live Expo
Using XML Literals and Embedded Expressions to Create the New Worksheet
Modifying the Office Excel Workbook Container
Putting It All Together
Productivity Enhancements Within Visual Basic 9.0 and Visual Studio 2008
Programming with XML has traditionally been done using the document object model (DOM) API, XSLT, and XPath. Using this approach, the developer not only must understand XML, but also must become proficient in these additional technologies in order to be productive.
Microsoft Visual Basic 9.0 provides an alternative to the traditional approach through the implementation of the new LINQ to XML (formerly known as XLinq) features and incorporating XML literals and "axis" properties that simplify accessing and navigating XML structures. One of these new features, specifically XML literals, is unique to the Visual Basic language in Microsoft Visual Studio 2008 and provides a distinctive approach to programming with XML. In addition, enhanced user experience and debugging support within Visual Studio helps improve a developer's productivity while working with XML.
In this paper, using a realistic application, we will look at the new features and capabilities available in Visual Basic 9.0 that relate to programming with XML.
We will be looking at an application where information on homes for sale is extracted from an RSS feed and placed into a Microsoft Office Excel workbook. This application runs on a server, builds the Office Excel workbook, and then makes it available for downloading to the client. Online listings are fine, but sometimes you want to just download all that into Office Excel, so that you can do some more processing of your own offlinesuch as which homes you've visited, tracking attributes that interest you, or performing such analyses as computing the average cost per square foot.
In this application, we will be using a LINQ to XML query to extract XML from Microsoft Windows Live Expo (expo.live.com) using the Windows Live Expo API and the Expo Web service that exposes an HTTP/GET request interface with XML response. (Click here for information on getting started with Windows Live API.) The user will provide a ZIP code as the basis of what home information to extract. They will also provide a distance (in miles) to define the area to be searched for houses for sale. After this information is extracted from the Expo Web service, it will be used to create a new Office Excel workbook on the server by creating an Office Open XML file. This process will demonstrate the use of embedded expressions to insert the data into the workbook. Finally, the user will be given the option to download the workbook for use on the client computer.
XML literals. In Visual Basic 9.0, one can treat XML as a literal value. The developer experience is enhanced when working with XML literals with autocompletion and outlining. You can create a variable and associate it to an XML literal by either typing the literal or, more likely, pasting some XML directly into the code editor. This feature is unique to Visual Basic 9.0. In the sample application, we will create an XML literal by pasting Office Open XML that represents an Office Excel worksheet.
XML literals can also contain embedded expressions. An embedded expression is evaluated at run time and provides the means to modify the XML dynamically, based on values obtained while the program is executing. In our sample application, we will be using embedded expressions to insert values taken from an RSS feed, and inserting them into the XML that represents an Office Excel worksheet.
Figure 1 shows an XML literal and embedded expressions.
Figure 1. XML literal with an LINQ to XML query and embedded expressions
XML axis properties. XML axis properties, also known simply as XML properties, are used to reference and identify child elements, attributes, and descendent elements. These provide a shorter, more readable syntax than using equivalent LINQ to XML methods. Table 1 provides a summary of XML properties. Intellisense is provided for axis properties when an appropriate XML schema is added to the project. Details on this feature will be explored in the last section of this article.
Table 1. XML axis properties
Uses the <> notation to return the child elements with the supplied name
Uses the @ notation to return the string value of the attribute with the supplied name
Uses the ...<> notation to return the descendents elements with the supplied name
XML namespaces. To define a namespace that can be used in XML literals or a LINQ to XML query or embedded expression, one uses the Imports statement. In our application, we will have to define a specific namespace that relates to the RSS feed that is supplied by the Windows Live Expo API. To define this namespace within our code, we will use the following Imports statement:
This statement creates an identifier named expo that can be used to qualify an element within queries and literals. The following line of code returns all the category child elements in the expo namespace:
To define a default namespace, one also uses the Imports statement, but no identifier is provided. An example definition of a default namespace is the follow:
By default, the empty namespace is the "default" namespace. If you define another namespace to be the default and you need the use of the empty namespace, you can create a prefix for the empty namespace as in the following example.
Type inference. In previous versions of Visual Basic, dropping the As clause from a variable declaration resulted in the variable being typed as the Object type, where late binding was used to deal with the contents of the variable. In Visual Basic 9.0, the type of local variables is "inferred" by the type of the initialize expression on the right-hand side. For instance, in the following statement:
Dim x = 1
the x variable would be inferred as Integernot Object, as in earlier versions of Visual Basic. If you look at the code in Figure 1, you will see that type inference is being used for the sheetTemplate variable. Because the variable is being assigned an XML literal, its type is implied as XElement, because that is the type of the XML literal on the right-hand side. Visual Basic 9.0 supports a new option named Option Infer. This option is used to turn type inference on or off, and is on by default. It is important to note that type inference applies only to local variables. Type inference does not apply to class-level and module-level variables. This means that in the class definition that follows, the status variable will be an Object type, not a String type.
Public Class DemoClass
Dim status = "Default"
We are using the XML over HTTP interface named http://msdn2.microsoft.com/en-us/library/bb246477.aspxListingsByCategoryKeywordLocation_V2 to get the data in the form of an RSS feed. The syntax of this request is the following:
When making the call to the service, the user may provide a number of parameters. For this application, the parameters that will be passed to the service are the following:
Application key (see http://expo.live.com/myapikeys.aspx)
Using this Web service, we obtain the RSS feed and store it as an XElement type. The code to do this is straightforward and is shown in Figure 2. Notice that in defining the feed variable, we are using "type inference," as described earlier.
Private Function GetSheetXML() As XDocument
' get the "xml over http" rss feed url
Dim url As String = BuildURL()
' get the xml feed - note that type inference is used here
Dim feed = XElement.Load(url)
Figure 2. Getting the RSS feed
The application uses a helper function named BuildURL to define the service call and its parameters. This function is shown in Figure 3. Note that there is little error checking in this application. This is not because error checking is not needed, but instead because the error checking could make it harder to focus on the technologies that are the subject of this article. Also, be aware that MyAppID is a numerical identifier that is available to developers from the Windows Live Expo API site.
Figure 3. Building the Web service call URL (Click on the picture for a larger image)
The XML that is returned from the Web service is in the form of an RSS feed. A sample of this XML is shown in Figure 4.
Figure 4. The RSS feed from expo.live.com
An important attribute in the <rss> element is the namespace definition:
We will have to define this namespace in our code in order to identify correctly elements in the document that are qualified by the namespace (such as the <classifieds:totalListings> element in Figure 4).
The key elements for our application in the RSS feed are the <item> elements. Figure 4 shows one <item> element; but, in reality, there are many returned from the service. A closer look at an <item> element reveals the elements and attributes with which we will be working. Figure 5 shows the XML.
Figure 5. Details of the <item> element (Click on the picture for a larger image)
We are interested in data about the homes for sale including the price ; the ZIP code (postcode) ; and details on the home, including number of bedrooms and bathrooms, the year the home was built, and the size of the home . Note that the element named <classifieds:LOT_SIZE> is actually storing the square footage of the home.
You can see how we will be using these elements in the worksheet that is shown in Figure 6.
Figure 6. Spreadsheet created from the RSS feed (Click on the picture for a larger image)
The other thing that is noted in the <item> element in Figure 5 is the classifieds:transactionType attribute .
We will now look at the code that will process the XML RSS feed data. The RSS feed includes homes both for sale and for rent, so the first thing that we must do is to get only elements that are for sale. To do this, we must query the RSS XElement holding the feed; therefore, we use a LINQ to XML query, as seen in Figure 7 (which is the continuation of the GetSheetXML() function that is shown in Figure 2). Again, note the use of type inference in the definition of itemList.
Figure 7. LINQ to XML query to get selected <item> elements (Click on the picture for a larger image)
In addition to this code, namespaces must be defined. This is done at the beginning of the code using Imports statements, as shown in Figure 8. We will discuss these namespaces as they are used in the code.
' define an expo.live namespace
' define an empty namespace
' define the default namespace
Figure 8. Defining the namespace
Let's look at the LINQ to XML query in Figure 7 in some detail. The From clause identifies an iterator named itemElement that refers to the element that is in scope for each iteration. The expression feed...<empty_ns:item> identifies the IEnumerable(Of XElement) "feed" reference and uses the "descendants axis" (...) to get all the <item> elements within the feed reference, no matter how deeply they occur. The XML axis property must be qualified with the empty namespace. Otherwise, the default namespace—in this case, http://schemas.openxmlformats.org/spreadsheetml/2006/main—would be applied to the axis property. The Where clause filters these <item> elements to only those that have "For Sale" in the <category> item's transactionType attribute. Note the use of the namespace identifier (expo) and attribute axis (@) in the query syntax.
Because our ultimate objective is to place data from each "For Sale" item into a separate row of our worksheet, we need a way to identify easily the row in the worksheet in which each selected item will be stored. In this application, we do this by converting the itemList—which is an IEnumerable—into a List(Of T), so that we can later use the list's IndexOf() method to get each <item> index and use it to determine a row number in the worksheet.
Now that we have our XML extracted from the Live Expo site, we must get it into an Office Excel 2007 worksheet. To do this, we will need to understand the Office Open XML File Format. The specification for this format is quite extensive; we will touch only the surface, as far as our understanding is concerned. In addition, an excellent reference that is specific to the Office Open Excel File format is available on the Web. (Please see Standard ECMA-376 Office Open XML File Formats.) Note that when we talk about Office Excel in this article, we are referring to Office Excel 2007.
An Office Excel file (.xlsx) is a container file or package that is actually in an industry-standard ZIP file format. Each file comprises a collection of parts, and this collection defines the document. For an Office Excel file, these various parts and their relationships are shown in Figure 9.
Figure 9. The various parts and their relationships in an Office Excel document (Click on the picture for a larger image)
You can see these parts if you open an Office Excel document using a ZIP application. Figure 10 shows such a view. Notice the paths that are shown in the figure; they give you a sense of the file and directory structure within the document. To work with an Office Excel document as a ZIP archive, just change the file extension from ".xlsx" to ".zip".
Figure 10. Files (parts) stored within the ZIP container of an XML document (Click on the picture for a larger image)
In our application, we will store an existing Office Excel document file (named baseWorkbook.xlsx) on the server. We will then build a new worksheet (such as sheet1.xml, shown in Figure 10) using the XML <item> elements we extracted from the RSS feed. We will then delete the existing worksheet from the Office Excel document and then add our new one. Finally, we will offer the user the opportunity to download the newly modified workbook with the newly added worksheet.
We are ready to write the code to create the new worksheet using the <item> elements from the RSS feed. It must be restated that we have just touched the surface of Office Open XML Format. In fact, there are things that we might want to add to our workbook that might cause "issues" when we open the workbook. These issues deal with the many parts of the document and their relationships. If you add XML that does not include all the relationships, an informational dialog box might be displayed indicating that there are issues that must be resolved. You are given the option to have Office Excel try to fix these issueswhich really means that it attempts to resolve the references, update shared-value tables, and so on.
The code that we will see next is a continuation of the GetSheetXML() function that was shown earlier. We will be working with a large XML literal that was created initially by copying and pasting the complete XML definition of the worksheet from an existing Office Excel workbook. This XML literal is then modified by placing embedded expressions at the appropriate locations.
We begin by looking at a few lines of code that define some parameters for modifying the XML literal that represents the new worksheet. This code (again, a continuation of the GetSheetXML() function) is shown in Figure 11.
Figure 11. Continuation of the GetSheetXML() function that creates the new worksheet (Click on the picture for a larger image)
The code in Figure 11 determines what the last row will be by first determining the number of <item> elements in the RSS feed. Because there will be one new row in the worksheet for each <item> element, we can determine the last row (because the first rowthe headingsare in row 2, we calculate the last row by adding 2 to the number of <item> elements). The last line of code in Figure 11 sets the value of a String variable that defines the cell range for our worksheet.
Now, we start working with the XML literal. Figure 12 shows the first few lines of the literal. We started by writing the code:
Dim sheetTemplate = _
and then just pasting in the XML definition from the existing Office Excel worksheet. This is one of the great features in Visual Basic 9.0: Instead of having to create a document using the DOM API, we just take the XML that we want to manipulate and paste it into our code as an XML literal. Then, we replace the original "ref" attribute value from:
ref=<%= cellRange %>
This embedded expression uses the previously defined cellRange variable to take into account the new rows of data to be added.
' finally we go into the actual XML literal and insert the new range
' and the appropriate data from the RSS feed
Dim sheetTemplate = _
<dimension ref=<%= cellRange %>/>
Figure 12. First part of the XML literal
Also note that the object reference named "sheetTemplate"—used to store the XML literalis defined using type inference. In this case, the type will become an XDocument, as opposed to an XElement. The difference between the two is that XDocuments may contain processing instructions (PI) and comments before the root element definition, while XElement types cannot. The following special XML declaration:
in our document will be seen by the type-inference engine and, therefore, will type "sheetTemplate" as an XDocument.
The final step that we must perform is to define the rows using values from each <item> element in the RSS feed. Figure 13 shows this code. (There is additional XML in the literal between the <dimension> element in Figure 12 and the start of the embedded expression in Figure 13. See the code download for this article to view this XML.) There is a lot going on in these lines of code. First, note that we have a LINQ to XML query that queries across the List(Of XElement) RSS feed named rssItems:
<%= From item In rssItems Let rowNum = rssItems.IndexOf(item) + startRow _
As mentioned earlier, we must identify the row number for the row that we are inserting; we use the IndexOf method of the List to compute this. This computed value is stored in a local variable named rowNum that will be computed for each iteration of the query.
For each item in the collection, we select a number of values and use them in embedded expressions. These embedded expressions are fairly straightforward.
Figure 13. Creating new rows in the worksheet using the RRS feed data (Click on the picture for a larger image)
We are accessing specific data items from the feed. For example, in column D of each row, we are adding the value of the YEAR_BUILT element (item.<expo:details>.<expo:YEAR_BUILT>.Value). Note that we have used the namespace that we defined in Figure 8. As you will see in a later section, XML Intellisense is a big help in entering element references. We can easily create the formula found in column H by using an embedded expression that includes string constants, row values, and string concatenation. Something that Visual Basic programmers must remember is that XML is case sensitive; this means that XML properties, like attribute names, are case sensitive.
The issue of namespaces and how they are applied within the XML literals and XML axis properties can be confusing, and it is helpful to review what we have done here. Figure 14 summarizes what is happening by showing the namespace definitions and Tool Tip–enhanced segments of code. In this figure, we see <expo:category>, which resolves to fully qualified name:
This resolution is the result of applying the <expo> prefix. Similarly, we see <row>, which resolves to fully qualified name:
This resolution is a result of the fact that the default namespace is being applied. Finally, <empty_ns:item> resolves to item, because the empty namespace is applied (the empty_ns prefix).
Figure 14. Namespaces applied to code (Click on the picture for a larger image)
Finally note the entire new XDocument contents are returned by the function. In the next step, we will take this XDocument and place it into our workbook document container.
As mentioned previously, the Office Excel workbook is a container stored in the standard ZIP archive format. Microsoft introduced a new API, known as the Packaging API, with the introduction of Microsoft .NET 3.0. This API, which is found in WindowsBase.dll, must be added as a reference to the application in order to get access to the packaging API.
For this application, a small class named SpreadSheet has been created to manage the workbook. It includes a constructor that opens the package and establishes a reference to the workbook part (sheet1.xml) that will be replaced with the new worksheet that is stored in the XDocument object. The original worksheet must be removed, so there is a RemoveOldSheet method. The new worksheet must then be added, so an AddNewSheet method is included for the class.
The complete code for this SpreadSheet class is available in the code download for this article.
With the background of the code presented previously, it is time to look at the main Web application and code that orchestrates the fetching of the RSS feed, converts it to a new Office Open XML file, and then replaces an existing worksheet with the newly created one.
Figure 15 shows the user interface for the application prototype. The user supplies a ZIP code as the center of the home search and a distance in miles around that ZIP code. A simple click event is defined for the Get Spreadsheet button.
Figure 15. The Web interface for the prototype application
When the click event finishes executing, a hyperlink pointing to the newly modified Office Excel workbook is made visible as shown in Figure 16. This allows the user to download the workbook to the client machine.
Figure 16. The Web interface with the worksheet download link active
The code for the Web application and the first part of the click event is shown in Figure 17. Note the Imports of the necessary namespaces. Regarding the system-derived namespaces, this application was built using Beta 2 of Visual Studio 2008; as later betas and release candidates are released, there might be a need to use a different set of Imports. Also note the Import statements that define the XML namespaces used within the Expo Live RSS feed, an empty namespace, and the default namespace used within the Office Open Excel worksheet XML document.
Figure 17. The Imports and first few lines of the Web application (Click on the picture for a larger image)
Note As of Beta 2, the Web application template does not include the reference to System.Xml.Linq.dll. In Beta 2, this reference is located at C:\Program Files\Reference Assemblies\Microsoft\Framework\v3.5.
We looked at the GetSheetXML() function in Figures 2, 7, and 11 through 13. This function got the RSS feed from Expo Live by using the user-supplied parameters; then, it took the RSS feed, and used an XML literal and embedded expressions to build the new worksheet.
The final steps involve using the SpreadSheet class and the Packaging API to replace an existing worksheet with our new one. Figure 18 shows this code as the continuation of the click event.
Figure 18. Continuation of the click-event code (Click on the picture for a larger image)
Note that the first part of the code works with getting some configuration settings from the web.config file. These settings define the location of the template workbook as well as the relative location of the workbook part that will be replaced (sheet1.xml). The relevant section of web.config is shown in Figure 19.
Figure 19. Application settings from the web.config file (Click on the picture for a larger image)
The new worksheet first must be saved, because the Packaging API can add only a part from a file. Following this, a new SpreadSheet object is created and used to remove the old worksheet part and replace it with the new worksheet part. Finally, the hyperlink that points to the updated workbook is made visible.
This concludes the in-depth description of our sample application. It shows how Visual Basic 9.0, with its LINQ to XML and embedded XML features, provides an extremely powerful way to work with XML. Note in particular how important the use of XML literals and XML axis properties—which are unique to Visual Basic 9.0—were to the application solution. Next, we look at some of the productivity features in Visual Basic 9.0 that make it much easier for the developer, working within Visual Studio, to use the new language features.
When dealing with XML literals, there are two important features that are available within Visual Basic 9.0. The first is autocompletion. With autocompletion, when an opening element is entered into the code, the closing element is entered automatically. In addition, if you change the spelling of the opening tag, the system will automatically change the spelling of the matching closing tag. The second is outlining where the literal can easily be collapsed or expanded based on parent/child relationships.
In addition to outlining and autocompletion, XML literals are checked for syntax. Figure 20 shows an XML literal that contains two errors. In the top image, an attribute value is shown without being enclosed in quotes. When that error is corrected, the lack of the closing ">" character in the </phone> element is highlighted.
Figure 20. Syntax error in an XML literal (Click on the picture for a larger image)
Arguably, however, the most significant productivity enhancement in Visual Basic 9.0 deals with Intellisense and XML axis properties. If a schema is available within the project for the XML, the information from the schema is used within the context of Intellisense to provide the developer with a set of choices while entering LINQ queries and embedded expressions.
For our application, we do not have a schema available, so we must create one. Fortunately, Visual Studio provides a tool to do this, if we have the XML available. To get the XML, open the URL that was created in the application (see Figure 3) in Visual Studio by using the File menu and selecting Open. Visual Studio will open the XML editor with the query result (you might want to use the Save As command to shorten the file name). You can now create the schema using the XML menu item and selecting Create Schema. For the RSS feed in our application, three schema files will be created. Be sure to save these schemas and add them to the project. You can see these schema files in the Solution Explorer that is shown in Figure 21.
Figure 21. Solution Explorer highlighting the new schema files
You can then see the namespaces defined in the RSS feed schema in the Imports statement. Figure 22 shows the Intellisense (which is called schema discovery).
Figure 22. Schema-enhanced Intellisense on Imports
Now that we have imported the namespace that is backed up with the schema information, we can see the enhanced Intellisense when we enter code. The first code that we will enter is the LINQ query shown in Figure 7. Figure 23 shows the Intellisense list that is displayed as possible values for the descendant's axis of the feed list variable.
Figure 23. Intellisense applied within a LINQ query
Note that there are many choices for descendant attributes and the Intellisense engine cannot identify which ones are known with certainty. Items in which the XSD type is not known with certainty are placed in what is called the "candidate" list. To indicate that an item is in this candidate list, a question-mark glyph is added to the item. Figure 24 shows this glyph.
Figure 24. Glyph to indicate item is in the "candidate" list
Figure 25 shows the Intellisense list within an embedded expression. Intellisense matches not only on the prefix, but also on the local names of the element or attribute. Looking at Figure 25, you can start typing "cou...", and the match on "expo:country" will be found.
Figure 25. Intellisense applied within an embedded expression
The choices available as descendants of the <location> element are well defined in the schema and thus are added to what is called the "matched" list by Intellisense. The fact that a choice is a member of the matched list is indicated by the use of the green check-mark glyph as shown in Figure 26.
Figure 26. Glyph to indicate item is in the "matched" list
In addition to the great Intellisense and compile-time support for XML, Visual Basic 9.0 also supports enhanced information while debugging. Figure 27 shows the Locals window while the application is in Break mode. The breakpoint has been set right after the following statement:
Dim rssItems As List(Of XElement) = itemList.ToList
has been executed. This statement causes the LINQ query to be executed. In Figure 27, we are looking at the in-memory results of the query for the itemList variable. You can see how the contents of this variable can be expanded to see the XML elements that are returned from the query. The value column shows the XML content of the variable which makes it extremely easy to examine the results.
Figure 27. Run-time information available for debugging (Click on the picture for a larger image)
In this article, we have seen a number of new features that are available in Visual Basic 9.0 and Visual Studio 2008. The processing of XML has been improved significantly with the addition of LINQ to XML, XML literals, XML axis properties, and improved Intellisense and debugging support. With these new features, Visual Basic 9.0 has raised the bar, as far as processing XML is concerned. The realistic prototype application demonstrates the value of these new features, in addition to a brief look at the new Office Open XML format.