Project Astoria

The Architecture Journal

by Pablo Castro

Summary: Project Astoria delivers a set of patterns as well as a concrete infrastructure for creating and consuming data services using Web technologies. This article explores how modern data-centric Web applications and services are changing and the role Astoria can play in their new architectures.

Data is becoming increasingly available as a first-class element on the Web. The proliferation of new data-driven application types such as mashups clearly indicates that the broad availability of standalone data independent of any user interface is changing the way systems are built and the way data can be leveraged. Furthermore, technologies such as Asynchronous JavaScript and XML (AJAX) and Microsoft Silverlight are introducing the need for mechanisms to exchange data independently from presentation information in order to support highly interactive user experiences.

Project Astoria provides architects and developers with a set of patterns for interacting with data services over HTTP using simple formats such as POX (Plain Old XML) and JavaScript Object Notation (JSON). Closely following the HTTP protocol results in excellent integration with the existing Web infrastructure, from authentication to proxies to caching. In addition to the patterns and formats, we provide a concrete implementation that can automatically surface an Entity Data Model schema or other data sources through the HTTP interface.

While the software infrastructure provided by Astoria can be a useful piece of the puzzle when building new Web applications and services, it is just one piece. Other elements in these applications need to be organized in a way that enables interaction with data across the Web.

In this article, I will discuss the architectural aspects that are impacted by modern approaches for Web-enabled application development, focusing on how data services integrate into the picture.

Contents

Separating Data from Presentation
Flexible Data Interfaces
Interacting with Data in the Presentation Layer
Introducing Business Logic
Extensibility and Alternate Data Sources
Deployment Scenarios: Applications and Services
Security
Closing Notes: Scope and Plans for Project Astoria
Resources
About the Author

Separating Data from Presentation

The new generation of Web applications utilizes technologies such as AJAX, Microsoft Silverlight, or other rich-presentation infrastructures. One common characteristic of all these technologies is that they impose a change with respect to the way Web applications are built.

Figure 1 compares the flow of content in traditional and modern Web applications. Traditional data-driven Web applications typically consist of a set of server-side pages or scripts that run when a request arrives; during execution they execute a few queries against a database and then render HTML that contains both presentation information and data embedded in it.

Figure 1: Traditional flow of content (left) and how it changes in modern Web applications (right)

An AJAX-based or Silverlight-based application does not follow that interaction model. Presentation information is shipped to the Web browser along with code to drive the user interface, but without actual data. That presentation information is typically a combination of HTML, Cascading Style Sheets (CSS), and JavaScript for AJAX applications, and Extensible Application Markup Language (XAML) and DLLs for Silverlight. Once the code is up and running in the client, the initial user interface is presented and data is retrieved as the user interacts with the interface.

Interestingly, this new round of technologies is acting as a forcing function to push application organization toward a goal that is common in many application architectures: strict separation of data and presentation.

Serving user-interface elements is relatively straightforward from the server perspective. Most of the time these are simple file resources on the server, such as HTML, CSS, or media files. Serving data is another story. Until now, interaction with data was something that happened between the Web server and the database server; there was no need to expose entry points accessible from code running across the Web in a Web browser or some other software agent. This is where Project Astoria kicks in.

Flexible Data Interfaces

There are a number of ways to expose data to clients that will consume it from across the Web. One option that existing technologies already enable is an approach similar to Remote Procedure Call (RPC), where “functions” are exposed through an interface such as Web services (for example, a SOAP-based interface or a simple URI-based convention for invoking methods and passing parameters). Microsoft Visual Studio has a mature set of tools that makes it straightforward to both create and consume interfaces built this way. The ASP.NET AJAX toolkit takes this to the next level by enabling Visual Studio-created Web services to work with AJAX clients.

The main issue with this approach is flexibility. If interaction with data only happens through fixed, predefined entry points then every new scenario or every variation of existing ones with slightly different data will typically require the creation of new entry points. While this level of control is occasionally desired, in many cases, more flexibility would increase development productivity and contribute to a more dynamic application.

Project Astoria introduces an alternative to the RPC approach that is based on the simple semantics of HTTP. Astoria takes a schema definition that describes each one of the entities that your application deals with, along with the associations between entities, and exposes them over an HTTP interface. Each entity is addressable with a Uniform Resource Identifier (URI) and a URI convention allows applications to traverse associations between entities, search entities, and perform other commonly needed operations on data.

The schema definition used by Astoria is an Entity Data Model (EDM) schema, which is supported directly by the ADO.NET Entity Framework. The Entity Framework also includes a powerful mapping engine that allows developers to map the EDM schema to a relational database for actual storage.

In order to show interaction with Astoria services, I will use an example based on the well-known Northwind sample database. A data service on top of Northwind can be set up by creating an ASP.NET application, importing the Northwind schema from the database into an EDM schema using the EDM wizard, and then creating an Astoria data service pointing to that EDM schema.

Detailed steps for creating Astoria data services, as well as extensive documentation on the Astoria URI and payload formats, can be found in the “Using Microsoft Codename Astoria” document available on the Astoria Web site at http://astoria.mslivelabs.com.

With the service up and running, URIs can be used to browse the resources exposed by the data interface. The following examples use the experimental Astoria online service that hosts a few read-only data services. For instance: http://astoria.sandbox.live.com/northwind/northwind.rse/Customers would return all of the resources in the Customers resource container. In the default XML format, it would look like this:

    <DataService xml:base="http://astoria.sandbox.live.com/northwind/northwind.rse">
      <Customers>
        <Customer uri="Customers[ALFKI]">
          <CustomerID>ALFKI</CustomerID>
          <CompanyName>Alfreds Futterkiste</CompanyName>
          <ContactName>Maria Anders</ContactName>
          <ContactTitle>Sales Representative</ContactTitle>
          <Address>Obere Str. 57</Address>
          <City>Berlin</City>
          <Region />
          <PostalCode>12209</PostalCode>
          <Country>Germany</Country>
          <Phone>030-0074321</Phone>
          <Fax>030-0076545</Fax>
          <Orders href="Customers[ALFKI]/Orders" />
        </Customer>
        <Customer uri="Customers[ANATR]">
          ...properties...
        </Customer>
        ...more customer entries...
      </Customers>
    </DataService>

It is also possible to point to a particular entity by using its keys: http://astoria.sandbox.live.com/northwind/northwind.rse/Customers[ALFKI] would return a particular Customer resource; again, using the XML format in the example:

    <DataService xml:base="http://astoria.sandbox.live.com/northwind/northwind.rse">
      <Customers>
        <Customer uri="Customers[ALFKI]">
          <CustomerID>ALFKI</CustomerID>
          <CompanyName>Alfreds Futterkiste</CompanyName>
          <ContactName>Maria Anders</ContactName>
          <ContactTitle>Sales Representative</ContactTitle>
          <Address>Obere Str. 57</Address>
          <City>Berlin</City>
          <Region />
          <PostalCode>12209</PostalCode>
          <Country>Germany</Country>
          <Phone>030-0074321</Phone>
          <Fax>030-0076545</Fax>
          <Orders href="Customers[ALFKI]/Orders" />
        </Customer>
      </Customers>
    </DataService>

When a URI points to a specific resource such as the one just discussed, it is possible not only to retrieve the resource using an HTTP GET verb, but also to update it, using HTTP PUT, or delete it, using HTTP DELETE.
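The mapping from operations to HTTP verbs can be sketched in a few lines of Python. This is only an illustration built on the standard `urllib.request` module: the helper name `build_request` and the action names are assumptions made for this sketch, not part of Astoria, and the request is built but deliberately not sent.

```python
import urllib.request
from typing import Optional

# Mapping described in the text: retrieve -> GET, update -> PUT, delete -> DELETE.
VERBS = {"retrieve": "GET", "update": "PUT", "delete": "DELETE"}


def build_request(uri: str, action: str,
                  body: Optional[bytes] = None) -> urllib.request.Request:
    """Build (but do not send) an HTTP request for a single resource URI."""
    method = VERBS[action]
    if method == "PUT" and body is None:
        raise ValueError("an update needs a payload")
    # The payload is passed through untouched; its format is up to the service.
    return urllib.request.Request(uri, data=body, method=method)


CUSTOMER = "http://astoria.sandbox.live.com/northwind/northwind.rse/Customers[ALFKI]"
delete_request = build_request(CUSTOMER, "delete")
```

The same URI serves all three operations; only the verb (and, for an update, the body) changes.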

Since the schema description provided to Astoria includes associations between entities, those can also be leveraged in the HTTP interface. Continuing with the example, if each Customer resource is associated with a set of Sales Order resources, then the following URI represents the set of Sales Orders related to a particular Customer: http://astoria.sandbox.live.com/northwind/northwind.rse/Customers[ALFKI]/Orders.

In addition to being able to point to specific resources and list resources in a container, it is also possible to filter, sort, and page over data to facilitate the creation of user interfaces on top of the data service. For example, to list all the Customer resources in the city of London, sorted by contact name, an application can use this URI: http://astoria.sandbox.live.com/northwind/northwind.rse/Customers[City eq 'London']?$orderby=ContactName.

The actual format of Astoria URIs is still subject to change, but the semantics and capabilities should remain relatively stable.
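To make the URI convention concrete, here is a small Python sketch that assembles the URI shapes shown so far. The function names are hypothetical helpers introduced for illustration; the patterns themselves are copied from the examples above and, as just noted, may change.

```python
from typing import Optional

# Service root used by the article's examples.
BASE = "http://astoria.sandbox.live.com/northwind/northwind.rse"


def container_uri(container: str) -> str:
    """All resources in a container, e.g. .../Customers."""
    return f"{BASE}/{container}"


def entity_uri(container: str, key: str) -> str:
    """One resource addressed by its key, e.g. .../Customers[ALFKI]."""
    return f"{container_uri(container)}[{key}]"


def association_uri(container: str, key: str, association: str) -> str:
    """Resources related to one entity, e.g. .../Customers[ALFKI]/Orders."""
    return f"{entity_uri(container, key)}/{association}"


def filtered_uri(container: str, predicate: str,
                 orderby: Optional[str] = None) -> str:
    """A filtered container, optionally sorted with the $orderby option."""
    uri = f"{container_uri(container)}[{predicate}]"
    if orderby:
        uri += f"?$orderby={orderby}"
    return uri


london = filtered_uri("Customers", "City eq 'London'", "ContactName")
```

The sketch leaves spaces and quotes unescaped so that the result matches the readable form used in the text; a real client would percent-encode them before sending the request.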

In all cases, data is exchanged in simple formats such as XML or JSON. The actual format can be controlled by the client-agent using regular HTTP content type negotiation. Data is encoded as resources that simply map properties in EDM entities into XML elements or JSON properties, and they are hyperlinked to other resources that they have associations with. For example, the URI from the previous example: http://astoria.sandbox.live.com/northwind/northwind.rse/Customers[ALFKI] would result in the following response (in XML):

    <DataService xml:base="http://astoria.sandbox.live.com/northwind/northwind.rse">
      <Customers>
        <Customer uri="Customers[ALFKI]">
          <CustomerID>ALFKI</CustomerID>
          <CompanyName>Alfreds Futterkiste</CompanyName>
          <ContactName>Maria Anders</ContactName>
          <ContactTitle>Sales Representative</ContactTitle>
          <Address>Obere Str. 57</Address>
          <City>Berlin</City>
          <Region />
          <PostalCode>12209</PostalCode>
          <Country>Germany</Country>
          <Phone>030-0074321</Phone>
          <Fax>030-0076545</Fax>
          <Orders href="Customers[ALFKI]/Orders" />
        </Customer>
      </Customers>
    </DataService>

You can see that the response includes simple properties and hyperlinks to other resources (“Orders” in the example). Every resource also includes a URI that represents the canonical location for it (in the uri attribute of the Customer element above).
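A client without a dedicated library can consume this payload with any XML parser. The Python sketch below, using only the standard library, separates plain properties from hyperlinks and turns relative URIs into absolute ones. Two things are assumptions of the sketch rather than documented Astoria behavior: the function names, and the rule of joining relative URIs to xml:base with a "/" (chosen because it reproduces the URIs used throughout this article).

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:base under the predefined XML namespace.
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"

SAMPLE = """<DataService xml:base="http://astoria.sandbox.live.com/northwind/northwind.rse">
  <Customers>
    <Customer uri="Customers[ALFKI]">
      <CustomerID>ALFKI</CustomerID>
      <City>Berlin</City>
      <Region />
      <Orders href="Customers[ALFKI]/Orders" />
    </Customer>
  </Customers>
</DataService>"""


def resolve(base: str, relative: str) -> str:
    # Assumption: relative URIs are appended to the service root.
    return base.rstrip("/") + "/" + relative


def parse_resources(xml_text: str) -> list:
    """Return one dict per resource with its URI, properties, and links."""
    root = ET.fromstring(xml_text)
    base = root.get(XML_BASE, "")
    resources = []
    for container in root:
        for entity in container:
            properties, links = {}, {}
            for child in entity:
                href = child.get("href")
                if href is not None:
                    links[child.tag] = resolve(base, href)  # hyperlink
                else:
                    properties[child.tag] = child.text  # None when empty
            resources.append({
                "type": entity.tag,
                "uri": resolve(base, entity.get("uri", "")),
                "properties": properties,
                "links": links,
            })
    return resources


customers = parse_resources(SAMPLE)
```

Following the link stored under "Orders" would retrieve the orders associated with that customer.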

Interacting with Data in the Presentation Layer

There are a number of options for interacting with data sources from the presentation layer. The first aspect that will scope the available options for a given scenario is the nature of the client (for example, browser versus rich client).

Since the Astoria interface is just plain HTTP, pretty much every environment with an HTTP client library can be used to consume data services. The interface has been specifically designed to be easy to use at the HTTP level; the URI patterns are simple and human-readable, and the payload formats use JSON or a subset of XML that keeps it straightforward.

For .NET applications, the Astoria toolkit includes a client library that runs in the .NET Framework environment and presents results coming from Astoria services as .NET objects; not only is that easier for developers to use within the codebase of the client application, but it also integrates well with components that already operate on top of regular .NET objects. The library provides rich services such as graph management, change tracking, and handling of updates. (See Example 1.)

Example 1: Accessing an Astoria service from .NET code using the Astoria client library

The Astoria client library runs both on the .NET Framework and inside Microsoft Silverlight. This enables the creation of desktop applications and Silverlight-based Web applications using the same API, just by referencing the corresponding Astoria assembly for the target environment.

In the case of AJAX-style Web applications, most AJAX frameworks include easy-to-use wrappers for HTTP access to external resources. Those wrappers even support materializing the response into JavaScript objects if you indicate that the response will be in JSON format. (See Example 2.)

Example 2: Accessing an Astoria service from an AJAX application (example uses the ASP.NET AJAX library)

Regardless of the kind of application, clients will typically run in environments with relatively high latency between the client and the data service, so asynchronous execution techniques should be the norm. The Astoria client API has built-in support for asynchronous request execution. For AJAX applications, the XMLHTTP interface, along with most wrappers, supports submitting requests asynchronously. (See Example 3.)

Example 3: Using the asynchronous API in the Astoria client for .NET
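The general idea of overlapping requests instead of waiting for each one in turn is not specific to .NET or AJAX. As a language-neutral illustration (this is not the Astoria client API), the Python sketch below issues several requests concurrently with a thread pool. The `fetch` callable is an assumption of the sketch; it is injected so that any HTTP client can be plugged in.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def fetch_all(uris: List[str], fetch: Callable[[str], bytes],
              max_workers: int = 4) -> Dict[str, bytes]:
    """Start every request up front, then collect the responses by URI."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # submit() returns immediately, so slow responses overlap.
        futures = {uri: pool.submit(fetch, uri) for uri in uris}
        # result() blocks only until that particular response is ready.
        return {uri: future.result() for uri, future in futures.items()}
```

A caller could pass, for example, a function that wraps `urllib.request.urlopen(uri).read()`; with high latency, the total wait then approaches that of the slowest request rather than the sum of all of them.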

Introducing Business Logic

Astoria enables developers to simply point to a database or a prebuilt EDM schema and it will automatically generate an HTTP view of it. While this is great for the initial iterations of an application, this wide-open interface to the data will often not be appropriate for production applications.

In most databases, there is a relatively clear split between two kinds of data (most typically, two kinds of tables). One part of the data carries enough implied semantics in itself; this is true for simpler concepts such as “product category.” In those cases, business logic is usually thin or nonexistent, and the direct interface to the data is good enough. The other part of the data only makes sense with some business logic around it; for example, the data that is shown needs to be restricted based on the context, or modifications need to pass external validations, or a change to a given value needs to trigger another side effect, and so on.

To address the requirement of being able to introduce business logic that is tightly bound to certain pieces of data, Astoria supports two customization mechanisms: service operations and interceptors.

A default Astoria service consists entirely of resource containers that are the entry points to the resource graph, such as /Customers or /Products. In addition to those, developers can define service operations that encapsulate both business logic and queries. For example, in a given application, it may not be desirable to list all customers; instead, the application could provide an entry point to list “my customers,” and even then there could be a requirement that clients retrieve their customers for one city at a time. The developer can define a “MyCustomersByCity” service operation that obtains the user’s identity from the context (from ASP.NET’s HttpContext.User property, for example) and can then formulate a query that factors in both the user identity and the city name that is passed as an argument. For example:

    [WebGet]
    public static IQueryable<Customer> MyCustomersByCity(NorthwindEntities db, string city)
    {
        // Reject missing or implausibly short city names before querying.
        if (city == null || city.Length < 3) throw new Exception("bad city");

        // Build the query; it is returned to the caller, not executed here.
        var q = db.Customers.Where("it.City = @city",
                                   new ObjectParameter("city", city));

        // add user-based filter condition to q

        return q;
    }

This service operation would then be callable with a URI following this pattern: /MyCustomersByCity?city=Seattle

An interesting feature of Astoria is that service operations can opt for returning a query instead of the actual results, as shown in the previous example. When a query object is returned, the rest of the URI pattern can still be used; so, for instance, a client could still add an “orderby” option to the URI: /MyCustomersByCity?city=Seattle&$orderby=CompanyName
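The reason a returned query keeps the URI options usable is that nothing has executed yet, so the runtime can keep composing. The following Python sketch imitates that behavior in miniature. Everything in it is hypothetical (the `Query` class, the `handle` dispatcher, and the sample rows); it shows the idea of deferred composition, not how Astoria is implemented, and it omits the user-based filter.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional
from urllib.parse import parse_qsl, urlsplit


@dataclass
class Query:
    """A deferred query: nothing runs until execute() is called."""
    source: List[dict]
    filters: List[Callable[[dict], bool]] = field(default_factory=list)
    sort_key: Optional[str] = None

    def where(self, predicate: Callable[[dict], bool]) -> "Query":
        return Query(self.source, self.filters + [predicate], self.sort_key)

    def order_by(self, key: str) -> "Query":
        return Query(self.source, self.filters, key)

    def execute(self) -> List[dict]:
        rows = [r for r in self.source if all(f(r) for f in self.filters)]
        if self.sort_key is not None:
            rows.sort(key=lambda r: r[self.sort_key])
        return rows


# Hypothetical rows standing in for the Customers container.
CUSTOMERS = [
    {"CompanyName": "Zeta Traders", "City": "Seattle"},
    {"CompanyName": "Alpha Goods", "City": "Seattle"},
    {"CompanyName": "Mid Market", "City": "Portland"},
]


def my_customers_by_city(city: str) -> Query:
    """Service operation: validates input and returns a query, not results."""
    if city is None or len(city) < 3:
        raise ValueError("bad city")
    return Query(CUSTOMERS).where(lambda c: c["City"] == city)


OPERATIONS = {"MyCustomersByCity": my_customers_by_city}


def handle(uri: str) -> List[dict]:
    """Call the operation, then compose $-prefixed options onto its query."""
    parts = urlsplit(uri)
    args, options = {}, {}
    for name, value in parse_qsl(parts.query):
        (options if name.startswith("$") else args)[name] = value
    query = OPERATIONS[parts.path.strip("/")](**args)
    if "$orderby" in options:
        query = query.order_by(options["$orderby"])
    return query.execute()
```

Had the operation returned a finished list instead, the dispatcher would have had to sort the materialized results itself, and a database-backed source could not push the sort down into its own query.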

The service operation can also contain code to perform validations, log activity, or any other need. This provides a good middle-ground between strict RPC, which makes building flexible UIs hard, and wide-open data interfaces that do not allow for control of the data that is flowing through the system.

For scenarios where preserving the resource-centric interface is desired, interceptors can be used. An interceptor is a method that is called whenever a certain action happens on a resource within a given resource container. For example, a developer could register an interceptor to be called whenever a change (POST/PUT/DELETE) is made to the “Products” resource container. The interceptor can perform validations, modify values, and even choose to abort the request.
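The interceptor pattern itself is easy to sketch. The Python below is a hypothetical illustration of the mechanism described above (register a callback for a container, run it on every change, let it modify values or abort); the decorator name, the exception type, and the sample validation rule are all assumptions and do not reflect Astoria's actual registration API.

```python
from typing import Callable, Dict, List

CHANGE_METHODS = {"POST", "PUT", "DELETE"}


class RequestAborted(Exception):
    """Raised by an interceptor to reject the change."""


Interceptor = Callable[[str, dict], dict]
_interceptors: Dict[str, List[Interceptor]] = {}


def on_change(container: str) -> Callable[[Interceptor], Interceptor]:
    """Register a function to run on every change to the given container."""
    def register(func: Interceptor) -> Interceptor:
        _interceptors.setdefault(container, []).append(func)
        return func
    return register


@on_change("Products")
def check_product(method: str, resource: dict) -> dict:
    # Validation: abort the request when the (hypothetical) rule is violated.
    if method != "DELETE" and resource.get("UnitPrice", 0) < 0:
        raise RequestAborted("UnitPrice must not be negative")
    # Modification: normalize a value before it is stored.
    cleaned = dict(resource)
    cleaned["ProductName"] = cleaned.get("ProductName", "").strip()
    return cleaned


def apply_change(container: str, method: str, resource: dict) -> dict:
    """Run the registered interceptors; a real service would then persist."""
    if method in CHANGE_METHODS:
        for interceptor in _interceptors.get(container, []):
            resource = interceptor(method, resource)
    return resource
```

Because the interceptor is attached to the container rather than exposed as a new entry point, clients keep using the ordinary resource URIs and verbs.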

Extensibility and Alternate Data Sources

So far I have discussed Astoria in the context of EDM and the ADO.NET Entity Framework. When using Astoria for surfacing data in a database, this is most likely the best choice; however, not all data is in a database.

In the first public CTP of Astoria, we have targeted only the Entity Framework. As we iterate on the design of the product, we are changing that to provide more options. Specifically, to enable scenarios where you want to expose data sources that are not databases, we will support exposing any LINQ-enabled data source through the HTTP interface.

LINQ defines a general interface called IQueryable that allows consumers to dynamically compose queries without having to know any details about the nature of the target for the query. It is up to the actual implementation of IQueryable in each source to interpret or translate queries appropriately. This allows the Astoria runtime to take a “base query” and compose it with operations such as sorting and paging. (See Figure 2.)

Figure 2: Astoria architecture diagram illustrating IQueryable-based layering
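The layering can be illustrated outside of .NET as well. In the Python sketch below, a `page` function plays the role of the runtime: it composes sorting and paging onto a base query while knowing nothing about the source. One source interprets the composed query over an in-memory list; the other translates it into SQL text. All names are hypothetical, the `Queryable` protocol is only a loose analogue of IQueryable, and the LIMIT/OFFSET syntax assumes a SQLite- or PostgreSQL-style dialect.

```python
from typing import List, Optional, Protocol


class Queryable(Protocol):
    """The only operations the composing layer relies on."""
    def order_by(self, key: str) -> "Queryable": ...
    def skip(self, n: int) -> "Queryable": ...
    def take(self, n: int) -> "Queryable": ...


class ListQuery:
    """Interprets each composed step directly over an in-memory list."""
    def __init__(self, rows: List[dict]):
        self._rows = list(rows)

    def order_by(self, key: str) -> "ListQuery":
        return ListQuery(sorted(self._rows, key=lambda r: r[key]))

    def skip(self, n: int) -> "ListQuery":
        return ListQuery(self._rows[n:])

    def take(self, n: int) -> "ListQuery":
        return ListQuery(self._rows[:n])

    def results(self) -> List[dict]:
        return self._rows


class SqlQuery:
    """Translates the composed steps into SQL text (identifiers unescaped)."""
    def __init__(self, table: str, order: Optional[str] = None,
                 offset: int = 0, limit: Optional[int] = None):
        self.table, self.order = table, order
        self.offset, self.limit = offset, limit

    def order_by(self, key: str) -> "SqlQuery":
        return SqlQuery(self.table, key, self.offset, self.limit)

    def skip(self, n: int) -> "SqlQuery":
        return SqlQuery(self.table, self.order, n, self.limit)

    def take(self, n: int) -> "SqlQuery":
        return SqlQuery(self.table, self.order, self.offset, n)

    def to_sql(self) -> str:
        sql = f"SELECT * FROM {self.table}"
        if self.order:
            sql += f" ORDER BY {self.order}"
        if self.limit is not None:
            sql += f" LIMIT {self.limit}"
        if self.offset:
            sql += f" OFFSET {self.offset}"
        return sql


def page(base: Queryable, orderby: str, page_number: int,
         page_size: int) -> Queryable:
    """Compose sorting and paging onto any base query (pages start at 0)."""
    return base.order_by(orderby).skip(page_number * page_size).take(page_size)
```

The same `page` call works against both sources, which is the property that lets a runtime add sorting and paging to whatever base query a data source provides.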

With this extensibility point in place, developers will be able to bring a broad set of data sources into the picture and expose them through the HTTP interface. This ranges from access to specialized data stores to using LINQ-based access libraries for online services (for example, there are informal implementations of LINQ to Amazon and LINQ to Flickr that provide limited query capabilities to those Internet sites).

Deployment Scenarios: Applications and Services

A typical data-driven Web application today will have its own database in addition to one or more Web servers. For enterprise applications, these servers are part of the IT infrastructure and for applications hosted in ISPs, most ISPs provide database services.

Along the same lines, you can expect many applications that use Astoria as their data services and are built around AJAX or Silverlight to still have their own database. In those scenarios, the Astoria data services will be part of the Web application itself, and will be deployed together with the rest of the application components. This is one of the scenarios we target with Astoria, but it is not the only one we envision.

Another way of looking at Astoria is as a technology for building data services for other systems to consume. Data providers can set up Astoria servers that other applications can interact with, both consuming and updating data as required and as allowed by the security policies. The provider of the data could be the same party that owns the consuming application, or it could offer the data as a service for others to consume.

Yet another way of looking at Astoria is as a general-purpose service for data storage. To explore this idea we have set up an experimental online service that hosts several sample data sets, including the Northwind and AdventureWorks sample databases, a subset of the Microsoft Encarta articles and a snapshot of data that supports the Microsoft TagSpace social bookmarking site. Turning this into a real-world service requires technology well beyond just the HTTP interface and patterns, and that is a space outside of the scope of the Astoria project, but we still find the experimental service valuable as a learning tool, to see what applications developers would build on top of it.

Security

Exposing a Web-facing data interface requires careful thinking around securing access to make sure only the data that is meant to be accessible is effectively accessible over the HTTP interface. This involves both an authentication infrastructure and proper authorization policies.

While a lot of the design points in Astoria equally apply to Astoria as a component of an application and to Astoria as a service, authentication is one of the areas where this is not the case.

When using Astoria data services as part of a custom Web application, authentication typically applies to all resources within a given boundary, including access to data. Users would authenticate once with the Web site and the system needs to be able to apply the credentials to the data service as well as to the rest of the application. Astoria looks into the ASP.NET API to find out whether a user is authenticated and to find out further details, so that an application that uses any authentication scheme properly integrated with ASP.NET will automatically work with Astoria.

Figure 3: REST and Astoria

Integration with ASP.NET authentication means that in typical cases common authentication-over-HTTP mechanisms will work. This includes “forms authentication”, integrated authentication (useful inside corporate networks), and custom authentication schemes; it is even straightforward to roll a custom implementation of HTTP “Basic” authentication, which may be good enough when used over SSL connections, depending on the nature of the application.
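To give a sense of how little is involved in the “Basic” scheme, the Python sketch below decodes the credentials from an Authorization header value. It covers only the parsing step defined by the HTTP Basic scheme; the function name is an assumption, and checking the credentials against a user store and integrating with the host environment (ASP.NET in the article's case) are left out.

```python
import base64
import binascii
from typing import Optional, Tuple


def parse_basic_auth(header: Optional[str]) -> Optional[Tuple[str, str]]:
    """Return (user, password) from a 'Basic ...' header value, else None."""
    if not header:
        return None
    scheme, _, token = header.partition(" ")
    if scheme.lower() != "basic" or not token:
        return None
    try:
        decoded = base64.b64decode(token.strip(), validate=True).decode("utf-8")
    except (binascii.Error, UnicodeDecodeError):
        return None  # not valid base64 or not valid text
    user, separator, password = decoded.partition(":")
    if not separator:
        return None  # credentials must have the form user:password
    return user, password
```

Base64 is an encoding, not encryption, which is why the scheme is only reasonable over SSL connections, as noted above.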

For online services this becomes a much bigger challenge. Besides the actual technological question of how authentication happens, there are higher level differences that need to be addressed first: If an application and the data service are from different sources, is the data in the data service owned by the user of the application? If it is owned by the user, then the application should not have access to the user’s credentials to the data service. What is required in this scenario is a scheme where the user authenticates with the data service independent of the application (which may require authentication as well). We are exploring this space as part of the Astoria design effort, and although we have a reasonable understanding of the scenarios, we have not yet designed concrete technology to support them.

Once the authentication scheme is in place, a proper authorization model is required. The currently released version of Astoria (May 2007 CTP) has an overly simplistic implementation, where authentication policies can be set at the resource container level (/Customers, for example); for each container, the configuration can indicate whether authentication is required to read the resources in the container and whether it is required to write to them. This is not flexible enough for most applications, so clearly a more sophisticated scheme needs to be provided.
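The container-level model described above amounts to two flags per container. The Python sketch below shows that shape and also makes its limits visible: the decision depends only on the container, the kind of access, and whether the caller is authenticated, never on who the caller is or which resource is touched. The names, the sample policies, and the deny-by-default rule for unlisted containers are assumptions of the sketch, not the CTP's configuration format.

```python
from dataclasses import dataclass

WRITE_METHODS = {"POST", "PUT", "DELETE"}


@dataclass(frozen=True)
class ContainerPolicy:
    auth_required_to_read: bool
    auth_required_to_write: bool


# Hypothetical configuration: one policy per resource container.
POLICIES = {
    "Customers": ContainerPolicy(auth_required_to_read=True,
                                 auth_required_to_write=True),
    "Products": ContainerPolicy(auth_required_to_read=False,
                                auth_required_to_write=True),
}


def is_allowed(container: str, method: str, authenticated: bool) -> bool:
    """Decide access from the container's two flags only."""
    policy = POLICIES.get(container)
    if policy is None:
        return False  # assumption: containers without a policy are closed
    if method in WRITE_METHODS:
        needs_auth = policy.auth_required_to_write
    else:
        needs_auth = policy.auth_required_to_read
    return authenticated or not needs_auth
```

Rules such as “a user may only read their own customers” cannot be expressed with these flags, which illustrates why a richer scheme is needed.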

Closing Notes: Scope and Plans for Project Astoria

The manner in which applications are being written is changing. One of the key traits of emerging Web applications is the new ways they interact with their data. There is a clear opportunity here to introduce base technology to help the development community take on this new space.

With Project Astoria, we aim to enable the use of data as a first-class construct on the Web and across the application stack. We want to provide the infrastructure for creating Web data services, and also contribute to the creation of an ecosystem where service providers and service consumers use a uniform interface for data. We would like to see UI control vendors, client library writers, and other players take advantage of the reusability of these data interfaces to build better tools for Web application creation.

We are starting with this vision at home. Within Microsoft we are closely collaborating with various groups to explore different aspects where Astoria can play.

On the services side, we are working together with the Windows Live organization, the Web3S folks in particular (they are responsible for the data interfaces for Live properties), to explore a world where every data interface is a Web3S/Astoria interface and can be consumed by the various tools and controls out there.

On the tools side, the ASP.NET, WCF, and Astoria teams, as well as the various folks involved in Silverlight, are working together to provide a solid end-to-end story for Web application development, which includes first-class tools and libraries for interacting with data from Web applications and services.

Project Astoria moves fast and focuses on solving real-world challenges around the Web and data. We plan to go through the design process in a transparent way, so everyone can see what we are up to. It is an exciting time to be working in this space; I would encourage anyone who has an interest in the topic to check out the Astoria Team blog, follow our design discussions, and jump in whenever you have an opinion.

Resources

· Astoria Team Blog: http://blogs.msdn.com/astoriateam

· Pablo’s Blog: http://blogs.msdn.com/pablo

· Project Astoria: http://astoria.mslivelabs.com

About the Author

Pablo Castro is a technical lead in the SQL Server team. He has contributed extensively to several areas of SQL Server and the .NET Framework including SQL-CLR integration, type-system extensibility, the TDS client-server protocol, and the ADO.NET API. Pablo is currently involved with the development of the ADO.NET Entity Framework and also leads the Astoria project, looking at how to bring data and Web technologies together. Before joining Microsoft, Pablo worked in various companies on a broad set of topics that range from distributed inference systems for credit scoring/risk analysis to collaboration and groupware applications.

 

This article was published in the Architecture Journal, a print and online publication produced by Microsoft. For more articles from this publication, please visit the Architecture Journal Web site.
