November 2010

Volume 25 Number 11

Microsoft Azure Marketplace DataMarket - Introducing DataMarket

By Elisa Flasko | November 2010

Microsoft Azure Marketplace DataMarket, which was first announced as Microsoft Project Codename “Dallas” at PDC09, changes the way information is exchanged by offering a wide range of content from authoritative commercial and public sources in a single marketplace. This makes it easier to find and purchase the data you need to power your applications and analytics.

If I were developing an application to identify and plan stops for a road trip, I would need a lot of data, from many different sources. The application might first ask the user to enter a final destination and any stops they would like to make along the way. It could pull the current GPS location or ask the user to input a starting location and use these locations to map the best route for the road trip. After the application mapped the trip, it might reach out to Facebook and identify friends living along the route whom I may want to visit. It might pull the weather forecasts for the cities that have been identified as stops, as well as identify points of interest, gas stations and restaurants as potential stops along the way.

Before DataMarket, I would’ve had to first discover sources for all the different types of data I require for my application. That could entail visiting numerous companies’ Web sites to determine whether or not they have the data I want and whether or not they offer it for sale in a package and at a price that meets my needs. Then I would’ve had to purchase the data directly from each company. For example, I might have gone directly to a company such as Infogroup to purchase the data enabling me to identify points of interest, gas stations and restaurants along the route; to a company such as NavTeq for current traffic reports; and to a company such as Weather Central for my weather forecasts. It’s likely that each of these companies would provide the data in a different format, some by sending me a DVD, others via a Web service, Excel spreadsheet and so on.

Today, with DataMarket, building this application becomes much simpler. DataMarket provides a single location—a marketplace for data—where I can search for, explore, try and purchase the data I need to develop my application. It also provides the data to me through a uniform interface, in a standard format (OData—see OData.org for more information). By exposing the data as OData, DataMarket ensures I'm able to access it on any platform (at a minimum, all I need is an HTTP stack) and from any of the many applications that support OData, including applications such as Microsoft PowerPivot for Excel 2010, which natively supports OData.
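Because an OData feed is plain HTTP, composing a request takes nothing more than the platform's HTTP stack. The following sketch builds (but doesn't send) a GET against the Data.gov Crimes feed used later in this article; the account-key placeholder is, of course, a stand-in for a real key:

```csharp
using System;
using System.Net;

class ODataHttpSketch
{
    // Build a plain HTTP GET against a DataMarket OData feed. The URI is the
    // Data.gov Crimes root used later in this article; the CityCrime segment
    // is the entity set that offering exposes.
    public static HttpWebRequest BuildRequest()
    {
        var uri = new Uri(
            "https://api.datamarket.azure.com/Data.ashx/data.gov/Crimes/CityCrime");
        var request = (HttpWebRequest)WebRequest.Create(uri);
        request.Method = "GET";
        // Ask for the Atom serialization that DataMarket returns by default.
        request.Accept = "application/atom+xml";
        // Basic Authentication: blank username, account key as the password.
        // "<account key>" is a placeholder, not a real credential.
        request.Credentials = new NetworkCredential(" ", "<account key>");
        return request;
    }
}
```

Sending the request with `request.GetResponse()` would return the Atom feed for the entity set.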

DataMarket provides a single marketplace for various content providers to make their data available for sale via a number of different offerings (each offering may make a different subset or view of the data available, or make the data available with different terms of use). Content providers specify the details of their offerings, including the terms of use governing a purchase, the pricing model (in version 1, offerings are made available via a monthly subscription) and the price.

Getting Started with DataMarket

To get started, I register on DataMarket using my Windows Live ID and log on to the site. From here, I can search through the datasets currently available on the site, viewing the details of each (description, pricing, visualizations, terms of use and so on) to determine which publishers deliver the type of data I’m looking for and which offers best suit my needs.

Although I need to find a dataset that meets the technical requirements of my application, it’s important to also verify that the terms of use set forth by the publisher allow the data to be used in the way required by my application. Terms of use can vary widely across datasets and across publishers. I can view the terms of use for a specific dataset within the offering details.

After digging around a bit, I find that the Infogroup Business Database service looks like it could be interesting for my road trip application. After looking through the available visualizations of the data (a sample of the data in table format, or in some cases a visualization of the data on a map or in a chart), it looks like it should provide me with points of interest including hotels, restaurants and gas stations along the route, and it should meet my requirements. After I’ve determined which datasets fit my needs, I can choose to purchase a monthly subscription to the service, allowing me an unlimited number of queries against the service per month, and I can simply pay with a credit card. After I’ve purchased something, I can manage my account and view all my current subscriptions in the Account section of the Marketplace. The Account section can also be used to create and manage account keys, which I use to access my subscriptions. When accessing data from an application such as PowerPivot, I’ll need to provide my account key. Similarly, when developing an application, it will authenticate with an account key.

After I’ve purchased a data offering, I can use the DataMarket Service Explorer, shown in Figure 1, to explore a dataset by creating queries and previewing the results, effectively learning more about the data and the schema that each dataset makes available.

Figure 1 DataMarket Service Explorer

If I open up the Infogroup Business Database service in the Service Explorer, I can create queries that filter by city, state and ZIP code to find places along the planned route. For example, if I specify Seattle as the city and WA as the state and click on “Execute query,” I get back a preview including the first 100 results from the service. I can see from the preview that there’s a Westin Hotel in Seattle that I’ll want my application to include in the list of stops. By default, when I click Preview, the results show up on the right side of the explorer in a basic tabular format to make it easy to read through. But I can also view the OData Atom format that I’ll use in my application right there in the Service Explorer, or view the raw data in the same format originally provided by Infogroup.
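Queries like the one Service Explorer builds are just OData query options appended to the service URI. The sketch below composes such a URI; note that "BusinessRecord", "City" and "State" are assumed names chosen for illustration, and the real entity set and property names should be checked in Service Explorer:

```csharp
using System;

class ODataFilterSketch
{
    // Compose an OData query URI that filters by city and state, the way
    // Service Explorer does, and caps the result at the first 100 rows.
    // "BusinessRecord", "City" and "State" are assumed names; verify them
    // against the offering's schema in Service Explorer.
    public static string BuildQuery(string serviceRoot, string city, string state)
    {
        string filter = Uri.EscapeDataString(
            string.Format("City eq '{0}' and State eq '{1}'", city, state));
        return string.Format("{0}/BusinessRecord?$filter={1}&$top=100",
            serviceRoot, filter);
    }
}
```

For example, `BuildQuery(root, "Seattle", "WA")` yields a URI whose `$filter` restricts results to Seattle, WA, mirroring the preview described above.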

If I weren’t a developer, or if my goal weren’t to build an application with this data but rather to pull it into an existing tool that understands OData, such as PowerPivot for Excel 2010, I could also do that directly from DataMarket simply by signing in, opening my current subscriptions list and clicking on the data offering. From there, I’m given the option to open the data offering in the application of my choice. (For more information on using DataMarket with PowerPivot, including a step-by-step tutorial, be sure to check out the DataMarket team blog at blogs.msdn.com/b/dallas.)

Consuming OData

Have you previously built an application that consumes data from an OData service? If the answer is yes, you’re already well on your way to consuming data from DataMarket in your app, because most DataMarket services are exposed using OData. I’ll quickly go through the basics of consuming OData.

Whether I’m adding new features that use DataMarket data to an existing application or creating a new application from scratch, the first step in pulling data from a DataMarket service is to define the classes I’ll use to represent the data in my application. I can do this by coding my own Plain Old C# Object (POCO) classes or by using the Add Service Reference wizard in Visual Studio to generate the necessary classes.
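A hand-coded POCO for the CityCrime entity used later in this article might look like the following sketch. Only the fields this article actually references (City, State, AggravatedAssault) are shown; the full property list would need to be checked against the offering's schema in Service Explorer:

```csharp
// Hand-written POCO sketch for the CityCrime entity. Property names must
// match the names in the service's schema exactly for materialization to work;
// only a subset of the offering's fields is shown here.
public class CityCrime
{
    public string City { get; set; }
    public string State { get; set; }
    public int AggravatedAssault { get; set; }
}
```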

As described earlier, the publisher (the owner of the data) specifies how the data for each dataset is exposed, as well as the terms of use dictating how the data may be used after purchase. This includes specifying whether and how a user can create ad hoc queries against a dataset, which fields a user can query on to retrieve the data and which fields are returned. In some cases, a publisher may specify that users can’t create ad hoc queries and must instead use fixed Web methods to request data from the service.

As a developer, before beginning to write an application, I’ll need to look at the Atom Service document for a particular offering to determine whether the offering exposes any entity sets for query. I can do this by pointing a browser at the root URI of the offering. If the service document doesn’t contain any collections, the offering doesn’t expose entity sets for query and I’ll be required to access the dataset using fixed Web methods. For example, let’s examine the Atom Service documents for two separate datasets: one that exposes an entity set for query (Data.gov Crime Data) and one that doesn’t allow query at all (AP Data). Notice in the following code samples that the AP offering has no <collection> nodes (which represent exposed entity sets) in its service document.

Here’s an Atom Service document for the Data.gov Crime data offering, which exposes entity sets for query. This particular offering exposes a single entity set, CityCrime:

<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
<service xml:base=
  "https://api.datamarket.azure.com/Data.ashx/data.gov/Crimes"
  xmlns:atom="http://www.w3.org/2005/Atom"
  xmlns:app="http://www.w3.org/2007/app"
  xmlns="http://www.w3.org/2007/app">
 <workspace>
   <atom:title>Default</atom:title>
   <collection href="CityCrime">
     <atom:title>CityCrime</atom:title>
   </collection>
 </workspace>
</service>

Here’s an Atom Service document for the AP offering, which doesn’t allow ad hoc query:

<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
<service xml:base=
  "https://api.datamarket.azure.com/Data.ashx/data.gov/Crimes"
  xmlns:atom="http://www.w3.org/2005/Atom"
  xmlns:app="http://www.w3.org/2007/app"
  xmlns="http://www.w3.org/2007/app">
 <workspace>
   <atom:title>Default</atom:title>
 </workspace>
</service>
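A service document like the two above can also be checked programmatically. The following helper is a sketch invented for illustration; it parses a service document and returns the exposed entity sets, matching elements by local name so it works regardless of the exact namespace URI. An empty result means the offering allows no ad hoc queries:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

static class ServiceDocumentSketch
{
    // Lists the entity sets exposed by an Atom service document: the href of
    // each <collection> element. Matching on the local element name keeps the
    // sketch independent of the namespace declarations in the document.
    public static List<string> GetEntitySets(string serviceDocumentXml)
    {
        return XDocument.Parse(serviceDocumentXml)
            .Descendants()
            .Where(e => e.Name.LocalName == "collection")
            .Select(e => (string)e.Attribute("href"))
            .ToList();
    }
}
```

Run against the Data.gov service document this returns a single entry, CityCrime; run against the AP document it returns nothing, which tells me to plan on fixed Web methods instead.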

If the dataset I’m using exposes entity sets for query, I can use Add Service Reference in Visual Studio, just as I do for a Windows Communication Foundation (WCF) service. I simply right-click on the project and choose Add Service Reference. In the Add Service Reference dialog box, I enter the URI for the entry point of the service (I can get this from the Service Explorer page in DataMarket) as the Address and then click OK.

If the dataset I’m using doesn’t allow ad hoc query, however, I can’t use Add Service Reference. In this case, I code my own POCO classes representing the entities (or objects) returned by the fixed Web methods that I call. Either way, DataMarket uses Basic Authentication: I pass a Windows Live ID as the username and an associated DataMarket account key as the password.

For the purpose of the console application example (see Figure 2), I used Add Service Reference in Visual Studio to generate the classes. Visual Studio accesses the service, generates the associated classes based on the data service definition and adds them to the project. After my classes have been generated, I continue to develop my application in the same way I would any application that consumes an OData service, using either Basic Authentication or the Access Control Service (ACS) to authenticate with the service. The Figure 2 example sets up a simple console application using Basic Authentication.

Figure 2 A Simple Console Application Accessing a Data Service and Printing the Results

public class Program
{
  static void Main(string[] args)
  {
    GetCityCrime X = new GetCityCrime();
    IList<CityCrime> stats = X.getStats();

    foreach (CityCrime c in stats)
    {
      Console.WriteLine(c.City + " : " + c.AggravatedAssault);
    }
    Console.ReadLine();
  }
}

public class GetCityCrime
{
  Uri serviceUri;
  datagovCrimesContainer context;

  public GetCityCrime()
  {
    serviceUri =
      new Uri("https://api.datamarket.azure.com/Data.ashx/data.gov/Crimes");
    context = new datagovCrimesContainer(serviceUri);
    context.Credentials =
      new NetworkCredential(" ", "<my account key as copied from DataMarket>");
  }

  public IList<CityCrime> getStats()
  {
    IEnumerable<CityCrime> query =
      from c in context.CityCrime
      where c.State == "Alaska"
      select c;
    return query.ToList();
  }
}

Note: To run the sample code in Figure 2, you’ll need to register as a DataMarket customer and subscribe to the Data.gov Crime Statistics offering. Also note that the URI used in this sample refers to a pre-release version of the service; you can find the correct URI by logging in to DataMarket, going to My Datasets and accessing your subscription to the Data.gov Crime Statistics offering.
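For offerings that only expose fixed Web methods, the same credentials can be attached by hand as a standard Basic Authentication header. The helper below is a sketch invented for illustration; it simply base64-encodes "username:password", with the DataMarket account key serving as the password:

```csharp
using System;
using System.Text;

class BasicAuthSketch
{
    // Compose an HTTP Basic Authentication header value by hand. DataMarket
    // expects the account key in the password position; the helper name is
    // invented for this sketch.
    public static string BuildAuthorizationHeader(string username, string accountKey)
    {
        string credentials = username + ":" + accountKey;
        return "Basic " + Convert.ToBase64String(Encoding.UTF8.GetBytes(credentials));
    }
}
```

The returned value goes into the request's Authorization header; setting `Credentials` on an `HttpWebRequest`, as in Figure 2's data service context, accomplishes the same thing.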

For more details on writing applications that consume data services, check out “Entity Framework 4.0 and WCF Data Services 4.0 in Visual Studio 2010” (msdn.microsoft.com/magazine/ee336128) and “Expose and Consume Data in a Web Services World” (msdn.microsoft.com/magazine/cc748663).

Selling Data on DataMarket

Many popular apps and Web sites generate, store and consume large amounts of valuable data. But typically that data is only leveraged within the app for which it was created. With the introduction of WCF Data Services and OData, developers gained a simple way to expose their data for broader use, with a data services platform that makes it easier for that data to be used not only within the initially intended application, but within third-party apps. Now, with DataMarket, there’s a simple opportunity for developers to not only expose that data to apps they build, but also to generate profit by selling data they’re already required to store and maintain.

DataMarket is built on Azure and SQL Azure and allows publishers to create datasets for data that they host in SQL Azure. If my data is already stored in SQL Azure, I’m well on my way to being able to offer data for sale in DataMarket. To learn more about how you can become a DataMarket publisher, check out blogs.msdn.com/b/dallas.


Elisa Flasko is a program manager in the Microsoft Azure Marketplace DataMarket team at Microsoft. She can be reached at blogs.msdn.com/elisaj.

Thanks to the following technical experts for reviewing this article: Rene Bouw, Moe Khosravy and Adam Wilson