by Pablo Castro
Summary: Project Astoria delivers a set of patterns as well as a concrete infrastructure for creating and
consuming data services using Web technologies. This article explores how
modern data-centric Web applications and services are changing and the role Astoria can play in their new architectures.
Data is becoming increasingly available as a
first-class element in the Web. The proliferation of new data-driven
application types such as mashups clearly indicates that the broad availability
of standalone data independent of any user interface is changing the way
systems are built and the way data can be leveraged. Furthermore, technologies
such as Asynchronous JavaScript and XML (AJAX) and Microsoft Silverlight are
introducing the need for mechanisms to exchange data independently from
presentation information in order to support highly interactive user
experiences.
Project Astoria provides architects and
developers with a set of patterns for interacting with data services over HTTP
using simple formats such as POX (Plain Old XML) and JavaScript Object Notation
(JSON). Closely following the HTTP protocol results in excellent integration
with the existing Web infrastructure, from authentication to proxies to
caching. In addition to the patterns and formats, we provide a concrete
implementation that can automatically surface an Entity Data Model schema or
other data sources through the HTTP interface.
While the software infrastructure provided by Astoria can be a useful piece of the puzzle when building new Web applications and
services, it is just one piece. Other elements in these applications need to be
organized in a way that enables interaction with data across the Web.
In this article, I will discuss the
architectural aspects that are impacted by modern approaches for Web-enabled
application development, focusing on how data services integrate into the
picture.
Contents
Separating Data from Presentation
Flexible Data Interfaces
Interacting with Data in the Presentation Layer
Introducing Business Logic
Extensibility and Alternate Data Sources
Deployment Scenarios: Applications and Services
Security
Closing Notes: Scope and Plans for Project Astoria
Resources
About the Author
Separating Data from Presentation
The new generation of Web applications utilizes
technologies such as AJAX, Microsoft Silverlight, or other rich-presentation
infrastructures. One common characteristic of all these technologies is that
they impose a change with respect to the way Web applications are built.
Figure 1 compares the flow of content in
traditional and modern Web applications. Traditional data-driven Web
applications typically consist of a set of server-side pages or scripts that
run when a request arrives; during execution they execute a few queries against
a database and then render HTML that contains both presentation information and
data embedded in it.
Figure 1: Traditional flow of content (left) and how it
changes in modern Web applications (right)
An AJAX-based or Silverlight-based application
does not follow that interaction model. Presentation information is shipped to
the Web browser along with code to drive the user interface, but without actual
data. That presentation information is typically a combination of HTML,
Cascading Style Sheets (CSS), and JavaScript for AJAX applications, or Extensible
Application Markup Language (XAML) and DLLs for Silverlight. Once the code is up
and running in the client, the initial user interface is presented and data is
retrieved as the user interacts with the interface.
Interestingly, this new round of technologies is
acting as a forcing function to push application organization toward a goal
that is common in many application architectures: strict separation of data and
presentation.
Serving user-interface elements is relatively
straightforward from the server perspective. Most of the time these are simple
file resources on the server, such as HTML, CSS, or media files. Serving
data is another story. Until now, interaction with data was something that
happened between the Web server and the database server; there was no need to
expose entry points accessible from code running across the Web in a Web
browser or some other software agent. This is where Project Astoria kicks in.
Flexible Data Interfaces
There are a number of ways to expose data to
clients that will consume it from across the Web. One option that is enabled
by existing technologies is to use an approach similar to Remote Procedure Call
(RPC), where “functions” are exposed through an interface such as Web services
(for example, a SOAP-based interface or a simple URI-based convention for
invoking methods and passing parameters). Microsoft Visual Studio has a mature
set of tools that makes it straightforward to both create and consume
interfaces built this way. The ASP.NET AJAX toolkit takes this to the next
level by enabling Visual Studio-created Web services to work with AJAX clients.
The main issue with this approach is
flexibility. If interaction with data only happens through fixed, predefined
entry points then every new scenario or every variation of existing ones with
slightly different data will typically require the creation of new entry
points. While this level of control is occasionally desired, in many cases,
more flexibility would increase development productivity and contribute to a more
dynamic application.
Project Astoria introduces an alternative to the
RPC approach that is based on the simple semantics of HTTP. Astoria takes a
schema definition that describes each one of the entities that your application
deals with, along with the associations between entities, and exposes them over
an HTTP interface. Each entity is addressable with a Uniform Resource
Identifier (URI) and a URI convention allows applications to traverse
associations between entities, search entities, and perform other commonly
needed operations on data.
The schema definition used by Astoria is an
Entity Data Model (EDM) schema, which is supported directly by the ADO.NET
Entity Framework. The Entity Framework also includes a powerful mapping engine
that allows developers to map the EDM schema to a relational database for
actual storage.
In order to show interaction with Astoria services, I will use an example based on the well-known Northwind sample database.
A data service on top of Northwind can be set up by creating an ASP.NET
application, importing the Northwind schema from the database into an EDM
schema using the EDM wizard, and then creating an Astoria data service pointing
to that EDM schema.
Detailed steps for creating Astoria data
services, as well as extensive documentation on the Astoria URI and payload
formats, can be found in the “Using Microsoft Codename Astoria” document
available on the Astoria Web site at http://astoria.mslivelabs.com.
With the service up and running, URIs can be
used to browse the resources exposed by the data interface. The following
examples use the experimental Astoria online service that hosts a few read-only
data services. For instance: http://astoria.sandbox.live.com/northwind/northwind.rse/Customers
would return all of the resources in the Customers resource container. In the
default XML format, it would look like this:
<DataService xml:base="http://astoria.sandbox.live.com/northwind/northwind.rse">
  <Customers>
    <Customer uri="Customers[ALFKI]">
      <CustomerID>ALFKI</CustomerID>
      <CompanyName>Alfreds Futterkiste</CompanyName>
      <ContactName>Maria Anders</ContactName>
      <ContactTitle>Sales Representative</ContactTitle>
      <Address>Obere Str. 57</Address>
      <City>Berlin</City>
      <Region />
      <PostalCode>12209</PostalCode>
      <Country>Germany</Country>
      <Phone>030-0074321</Phone>
      <Fax>030-0076545</Fax>
      <Orders href="Customers[ALFKI]/Orders" />
    </Customer>
    <Customer uri="Customers[ANATR]">
      ...properties...
    </Customer>
    ...more customer entries...
  </Customers>
</DataService>
It is also possible to point to a particular
entity by using its keys: http://astoria.sandbox.live.com/northwind/northwind.rse/Customers[ALFKI]
would return a particular Customer resource; again, using the XML format in the
example:
<DataService xml:base="http://astoria.sandbox.live.com/northwind/northwind.rse">
  <Customers>
    <Customer uri="Customers[ALFKI]">
      <CustomerID>ALFKI</CustomerID>
      <CompanyName>Alfreds Futterkiste</CompanyName>
      <ContactName>Maria Anders</ContactName>
      <ContactTitle>Sales Representative</ContactTitle>
      <Address>Obere Str. 57</Address>
      <City>Berlin</City>
      <Region />
      <PostalCode>12209</PostalCode>
      <Country>Germany</Country>
      <Phone>030-0074321</Phone>
      <Fax>030-0076545</Fax>
      <Orders href="Customers[ALFKI]/Orders" />
    </Customer>
  </Customers>
</DataService>
When a URI points to a specific resource such as
the one just discussed, it is possible not only to retrieve the resource using
an HTTP GET verb, but also to update it, using HTTP PUT, or delete it, using
HTTP DELETE.
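The mapping from verbs to operations on a single resource can be sketched as follows. This is only an illustration in Python: the `PlannedRequest` type and the `plan` helper are hypothetical names, nothing is sent over the network, and the update payload is a placeholder because the article does not show the exact PUT body.

```python
from dataclasses import dataclass
from typing import Optional

# Base URI of the experimental read-only service used in the article.
BASE = "http://astoria.sandbox.live.com/northwind/northwind.rse"


@dataclass(frozen=True)
class PlannedRequest:
    """A description of an HTTP request; hypothetical helper type."""
    method: str
    uri: str
    body: Optional[bytes] = None


def plan(resource: str, method: str, body: Optional[bytes] = None) -> PlannedRequest:
    """Describe a request against one specific resource without sending it."""
    if method not in {"GET", "PUT", "DELETE"}:
        raise ValueError(f"unsupported verb for a single resource: {method}")
    if method == "PUT" and body is None:
        raise ValueError("PUT needs a payload carrying the new resource state")
    return PlannedRequest(method, f"{BASE}/{resource}", body)


read = plan("Customers[ALFKI]", "GET")        # retrieve the resource
update = plan("Customers[ALFKI]", "PUT", b"<Customer>...</Customer>")  # placeholder payload
delete = plan("Customers[ALFKI]", "DELETE")   # remove the resource
```

All three requests target the same URI; only the verb (and, for PUT, the payload) changes.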
Since the schema description provided to Astoria includes associations between entities, those can also be leveraged in the HTTP
interface. Continuing with the example, if each Customer resource is associated
with a set of Sales Order resources, then the following URI represents the set
of Sales Orders related to a particular Customer: http://astoria.sandbox.live.com/northwind/northwind.rse/Customers[ALFKI]/Orders.
In addition to being able to point to specific
resources and list resources in a container, it is also possible to filter,
sort, and page over data to facilitate the creation of user interfaces on top
of the data service. For example, to list all the Customer resources in the
city of London, sorted by contact name, an application can use this URI: http://astoria.sandbox.live.com/northwind/northwind.rse/Customers[City eq 'London']?$orderby=ContactName.
The actual format of Astoria URIs is still
subject to change, but the semantics and capabilities should remain relatively
stable.
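The URI patterns shown so far can be assembled with a few string helpers. The following Python sketch only mirrors the example URIs in this article; the helper names are hypothetical, and since the URI format is subject to change, the exact syntax should be treated as illustrative.

```python
from typing import Optional
from urllib.parse import quote

BASE = "http://astoria.sandbox.live.com/northwind/northwind.rse"


def container(name: str) -> str:
    """All resources in a resource container, e.g. /Customers."""
    return f"{BASE}/{name}"


def by_key(name: str, key: str) -> str:
    """One resource addressed by its key, e.g. /Customers[ALFKI]."""
    return f"{BASE}/{name}[{key}]"


def related(name: str, key: str, association: str) -> str:
    """Resources reached through an association, e.g. /Customers[ALFKI]/Orders."""
    return f"{by_key(name, key)}/{association}"


def filtered(name: str, predicate: str, orderby: Optional[str] = None) -> str:
    """Resources matching a predicate, optionally sorted with $orderby."""
    uri = f"{BASE}/{name}[{predicate}]"
    return f"{uri}?$orderby={orderby}" if orderby else uri


def encode_for_wire(uri: str) -> str:
    """Percent-encode characters such as spaces; keep the URI punctuation."""
    return quote(uri, safe=":/?=&$[]'")


london = filtered("Customers", "City eq 'London'", orderby="ContactName")
print(encode_for_wire(london))
```

Most HTTP client libraries perform the percent-encoding step themselves; it is shown here only to make clear that the space characters in the filter expression do not travel literally.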
In all cases, data is exchanged in simple
formats such as XML or JSON. The actual format can be controlled by the
client agent using regular HTTP content negotiation. Data is encoded as
resources that simply map properties in EDM entities into XML elements or JSON
properties, and they are hyperlinked to other resources that they have
associations with. For example, the URI from the previous example: http://astoria.sandbox.live.com/northwind/northwind.rse/Customers[ALFKI] would result in the following response (in XML):
<DataService xml:base="http://astoria.sandbox.live.com/northwind/northwind.rse">
  <Customers>
    <Customer uri="Customers[ALFKI]">
      <CustomerID>ALFKI</CustomerID>
      <CompanyName>Alfreds Futterkiste</CompanyName>
      <ContactName>Maria Anders</ContactName>
      <ContactTitle>Sales Representative</ContactTitle>
      <Address>Obere Str. 57</Address>
      <City>Berlin</City>
      <Region />
      <PostalCode>12209</PostalCode>
      <Country>Germany</Country>
      <Phone>030-0074321</Phone>
      <Fax>030-0076545</Fax>
      <Orders href="Customers[ALFKI]/Orders" />
    </Customer>
  </Customers>
</DataService>
You can see that the response includes simple
properties and hyperlinks to other resources (“Orders” in the example). Every
resource also includes a URI that represents the canonical location for it (in
the uri attribute of the Customer element above).
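To illustrate how a client might separate simple properties from hyperlinks in such a response, here is a minimal Python sketch using the standard library XML parser on a shortened copy of the payload above. The `parse_customers` function is a hypothetical helper, and joining `xml:base` and the relative references with a slash is an assumption inferred from the example URIs in this article.

```python
import xml.etree.ElementTree as ET

# The xml: prefix is bound to this namespace by the XML specification.
XML_NS = "{http://www.w3.org/XML/1998/namespace}"

SAMPLE = """<DataService xml:base="http://astoria.sandbox.live.com/northwind/northwind.rse">
  <Customers>
    <Customer uri="Customers[ALFKI]">
      <CustomerID>ALFKI</CustomerID>
      <CompanyName>Alfreds Futterkiste</CompanyName>
      <City>Berlin</City>
      <Region />
      <Orders href="Customers[ALFKI]/Orders" />
    </Customer>
  </Customers>
</DataService>"""


def parse_customers(xml_text: str) -> list:
    """Return one dict per Customer with its URI, properties, and links."""
    root = ET.fromstring(xml_text)
    base = root.get(XML_NS + "base")
    resources = []
    for customer in root.iter("Customer"):
        properties, links = {}, {}
        for child in customer:
            href = child.get("href")
            if href is not None:
                # Elements with an href point at associated resources.
                links[child.tag] = f"{base}/{href}"
            else:
                # Empty elements such as <Region /> yield None.
                properties[child.tag] = child.text
        resources.append({
            "uri": f"{base}/{customer.get('uri')}",
            "properties": properties,
            "links": links,
        })
    return resources


customers = parse_customers(SAMPLE)
```

A client can then follow `customers[0]["links"]["Orders"]` to retrieve the related orders without knowing the URI convention in advance.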
Interacting with Data in the Presentation Layer
There are a number of options for interacting
with data sources from the presentation layer. The first aspect that will scope
the available options for a given scenario is the nature of the client (for
example, browser versus rich client).
Since the Astoria interface is just plain HTTP,
pretty much every environment with an HTTP client library can be used to consume
data services. The interface has been specifically designed to be easy to use
at the HTTP level; the URI patterns are simple and human-readable, and the
payload formats use JSON or a subset of XML that keeps it straightforward.
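Because the format is selected through HTTP content negotiation, a client can ask for JSON or XML with an Accept header and pick a parser based on the Content-Type of the response. The Python sketch below assumes the standard `application/json` and `application/xml` media types; the sample payloads are hypothetical and flattened for brevity, since the article does not show the JSON shape Astoria produces.

```python
import json
import xml.etree.ElementTree as ET

# Assumed media types; the exact values an Astoria service honors may differ.
ACCEPT = {"json": "application/json", "xml": "application/xml"}


def accept_header(fmt: str) -> dict:
    """Headers that ask the service for a given representation."""
    return {"Accept": ACCEPT[fmt]}


def decode(body: str, content_type: str) -> dict:
    """Turn a response body into a property dictionary based on its media type."""
    media_type = content_type.split(";")[0].strip().lower()
    if media_type == "application/json":
        return json.loads(body)
    if media_type in ("application/xml", "text/xml"):
        root = ET.fromstring(body)
        return {child.tag: child.text for child in root}
    raise ValueError(f"unexpected content type: {content_type}")


# Hypothetical payloads for the same resource in both formats.
as_json = decode('{"CustomerID": "ALFKI", "City": "Berlin"}',
                 "application/json; charset=utf-8")
as_xml = decode("<Customer><CustomerID>ALFKI</CustomerID><City>Berlin</City></Customer>",
                "application/xml")
```

Either way the client ends up with the same property values, which is what makes the choice of wire format a per-client decision.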
For .NET applications, the Astoria toolkit
includes a client library that runs in the .NET Framework environment and
presents results coming from Astoria services as .NET objects; not only is that
easier for developers to use within the codebase of the client application, but
it also integrates well with components that already operate on top of regular
.NET objects. The library provides rich services such as graph management,
change tracking, and handling of updates. (See Example 1.)
Example 1: Accessing an Astoria service from .NET code
using the Astoria client library
The Astoria client library runs both on the .NET
Framework and inside Microsoft Silverlight. This enables the creation of
desktop applications and Silverlight-based Web applications using the same API,
just by referencing the corresponding Astoria assembly for the target
environment.
In the case of AJAX-style Web applications, most
AJAX frameworks include easy-to-use wrappers for HTTP access to external
resources. Those wrappers even support materializing the response into
JavaScript objects if you indicate that the response will be in JSON format.
(See Example 2.)
Example 2: Accessing an Astoria service from an AJAX application (example uses the ASP.NET AJAX library)
Regardless of the kind of application, clients
will typically run in environments with relatively high latency between the
client and the data service, so the use of typical asynchronous execution
techniques should be the norm. The client API has built-in support for
asynchronous request execution. For the case of AJAX applications, the XMLHTTP
interface, along with most wrappers, provides support for asynchronously
submitting requests. (See Example 3.)
Example 3: Using the asynchronous API in the Astoria client for .NET
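The benefit of asynchronous execution over a high-latency link can be illustrated independently of any particular client library. The Python sketch below is not the Astoria client API: `fetch` is a hypothetical stand-in that simulates a network round trip with a short sleep, so that several requests can be seen to overlap instead of running one after another.

```python
import asyncio

BASE = "http://astoria.sandbox.live.com/northwind/northwind.rse"


async def fetch(uri: str, log: list, latency: float = 0.01) -> str:
    """Simulated request: the sleep stands in for the network round trip."""
    log.append(("start", uri))
    await asyncio.sleep(latency)
    log.append(("done", uri))
    return f"response for {uri}"


async def fetch_all(uris: list, log: list) -> list:
    """Issue all requests at once and wait for every response."""
    return await asyncio.gather(*(fetch(uri, log) for uri in uris))


def run_demo():
    """Run three overlapping requests; return the responses and the event log."""
    log = []
    uris = [f"{BASE}/Customers", f"{BASE}/Customers[ALFKI]",
            f"{BASE}/Customers[ALFKI]/Orders"]
    responses = asyncio.run(fetch_all(uris, log))
    return responses, log
```

In the event log returned by `run_demo()`, all three requests start before any of them completes, so the total wait is close to one round trip instead of three.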
Introducing Business Logic
Astoria enables
developers to simply point to a database or a prebuilt EDM schema and it will
automatically generate an HTTP view of it. While this is great for the initial
iterations of an application, this wide-open interface to the data will often
not be appropriate for production applications.
In most databases, there is a relatively clear
split between two kinds of data (most typically, two kinds of tables): There is
a part of the data that has enough implied semantics in itself; this is true
for simpler concepts such as “product category.” In those cases, business logic
is usually thin or nonexistent, and the direct interface to the data is good
enough. The other part of the data only makes sense with some business logic
around it; for example, the data that is shown needs to be restricted based on
the context, or modifications need to pass external validations, or when a
given value is changed, another side-effect needs to take place, and so on.
To address the requirement of being able to
introduce business logic that is tightly bound to certain pieces of data, Astoria supports two customization mechanisms: service operations and interceptors.
A default Astoria service consists entirely of
resource containers that are the entry points to the resource graph, such as
/Customers or /Products. In addition to those, developers can define service
operations that encapsulate both business logic and queries. For example, in a
given application, it may not be desirable to list all customers; instead, the
application could provide an entry point to list “my customers” and even then
there could be a requirement where clients retrieve their customers for a given
city at a time. The developer can define a “MyCustomersByCity” service
operation that obtains the user’s identity from the context (from ASP.NET’s
HttpContext.User property, for example) and can then formulate a query that
factors in both the user identity and the city name that is passed as an
argument. For example:
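using System;
using System.Data.Objects;   // ObjectParameter
using System.Linq;           // IQueryable<T>

// Service operation; callable as /MyCustomersByCity?city=...
[WebGet]
public static IQueryable<Customer> MyCustomersByCity(NorthwindEntities db, string city)
{
    // Reject missing or implausibly short city names before querying.
    if (city == null || city.Length < 3)
        throw new Exception("bad city");

    // Parameterized query; "it" refers to the current Customer entity.
    var q = db.Customers.Where("it.City = @city", new ObjectParameter("city", city));

    // add user-based filter condition to q
    return q;
}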
This service operation would then be callable with a URI following this
pattern: /MyCustomersByCity?city=Seattle
An interesting feature of Astoria is that
service operations can opt for returning a query instead of the actual results,
as shown in the previous example. When a query object is returned, the rest of
the URI pattern can still be used; so, for instance, a client could still add an
“$orderby” option to the URI: /MyCustomersByCity?city=Seattle&$orderby=CompanyName
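The way a runtime might split such a URI into operation arguments and query options can be sketched in Python. Everything here is hypothetical: the in-memory `CUSTOMERS` rows are made up, the dispatcher is not Astoria's, and the sort runs in memory, whereas a real runtime would presumably compose it into the query returned by the operation.

```python
from urllib.parse import parse_qsl, urlsplit

# Made-up rows standing in for the database; "Owner" models "my customers".
CUSTOMERS = [
    {"CompanyName": "Fabrikam", "City": "Seattle", "Owner": "alice"},
    {"CompanyName": "Contoso", "City": "Seattle", "Owner": "alice"},
    {"CompanyName": "Adatum", "City": "Seattle", "Owner": "bob"},
    {"CompanyName": "Tailspin", "City": "London", "Owner": "alice"},
]


def my_customers_by_city(user: str, city: str):
    """Service operation: validate input, then return a lazy query."""
    if city is None or len(city) < 3:
        raise ValueError("bad city")
    return (c for c in CUSTOMERS if c["City"] == city and c["Owner"] == user)


OPERATIONS = {"MyCustomersByCity": my_customers_by_city}


def handle(uri: str, user: str) -> list:
    """Dispatch to a service operation, then apply $-prefixed query options."""
    parts = urlsplit(uri)
    args, options = {}, {}
    for key, value in parse_qsl(parts.query):
        (options if key.startswith("$") else args)[key] = value
    query = OPERATIONS[parts.path.strip("/")](user, **args)
    if "$orderby" in options:
        query = sorted(query, key=lambda row: row[options["$orderby"]])
    return list(query)


rows = handle("/MyCustomersByCity?city=Seattle&$orderby=CompanyName", "alice")
```

The operation itself never sees `$orderby`; it only enforces its own rules and hands back a query that the caller's options are layered on.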
The service operation can also contain code to
perform validations, log activity, or address any other need. This provides a good
middle-ground between strict RPC, which makes building flexible UIs hard, and
wide-open data interfaces that do not allow for control of the data that is
flowing through the system.
For scenarios where preserving the
resource-centric interface is desired, interceptors can be used. An interceptor
is a method that is called whenever a certain action happens on a resource
within a given resource container. For example, a developer could register an
interceptor to be called whenever a change (POST/PUT/DELETE) is made to the
“Products” resource container. The interceptor can perform validations, modify
values, and even choose to abort the request.
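The interceptor idea can be illustrated with a small registry keyed by resource container and HTTP method. This Python sketch is a model of the concept only, not the Astoria registration API; the `intercept` decorator, `RequestAborted` exception, and `apply_change` function are hypothetical names.

```python
from collections import defaultdict


class RequestAborted(Exception):
    """Raised by an interceptor to reject the incoming change."""


_interceptors = defaultdict(list)


def intercept(container: str, *methods: str):
    """Register a function to run for the given container and HTTP methods."""
    def register(func):
        for method in methods:
            _interceptors[(container, method)].append(func)
        return func
    return register


@intercept("Products", "POST", "PUT", "DELETE")
def check_product(method: str, resource: dict) -> dict:
    # Validation: abort the request when the incoming data is unacceptable.
    if method != "DELETE" and resource.get("UnitPrice", 0) < 0:
        raise RequestAborted("UnitPrice must not be negative")
    # Modification: normalize a value before it reaches the store.
    if "ProductName" in resource:
        resource = {**resource, "ProductName": resource["ProductName"].strip()}
    return resource


def apply_change(container: str, method: str, resource: dict) -> dict:
    """Run every registered interceptor before the change is applied."""
    for interceptor in _interceptors[(container, method)]:
        resource = interceptor(method, resource)
    return resource


accepted = apply_change("Products", "PUT", {"ProductName": " Chai ", "UnitPrice": 18})
```

Clients keep using the plain resource URIs; the logic runs on the server as a side effect of the verb and container they target.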
Extensibility and Alternate Data Sources
So far I have discussed Astoria in the context
of EDM and the ADO.NET Entity Framework. When using Astoria for surfacing data
in a database, this is most likely the best choice; however, not all data is in
a database.
In the first public CTP of Astoria, we have only
targeted the Entity Framework. As we iterate on the design of the product, we
are changing that to provide more options. Specifically, in order to enable
scenarios where you want to expose data sources that are not databases, we will
support exposing any LINQ-enabled data source through the HTTP
interface.
LINQ defines a general interface called
IQueryable that allows consumers to dynamically compose queries without having
to know any details about the nature of the target for the query. It is up to
the actual implementation of IQueryable in each source to interpret or
translate queries appropriately. This allows the Astoria runtime to take a
“base query” and compose it with operations such as sorting and paging. (See
Figure 2.)
Figure 2: Astoria architecture diagram illustrating
IQueryable-based layering
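The layering described here, where a runtime composes sorting and paging onto a base query it knows nothing about, can be approximated in Python. This is a loose analogy: the real IQueryable carries an expression tree that each source translates for itself, while the hypothetical `Query` class below simply chains in-memory steps and defers execution until the results are iterated.

```python
import itertools


class Query:
    """Deferred query: steps accumulate and run only on iteration."""

    def __init__(self, source, steps=()):
        self._source = source          # callable returning the raw rows
        self._steps = tuple(steps)

    def _with(self, step):
        return Query(self._source, self._steps + (step,))

    def where(self, predicate):
        return self._with(lambda rows: (r for r in rows if predicate(r)))

    def order_by(self, key):
        return self._with(lambda rows: iter(sorted(rows, key=key)))

    def skip(self, count):
        return self._with(lambda rows: itertools.islice(rows, count, None))

    def take(self, count):
        return self._with(lambda rows: itertools.islice(rows, count))

    def __iter__(self):
        rows = iter(self._source())
        for step in self._steps:
            rows = step(rows)
        return rows


loads = []  # records each time the underlying source is actually read


def load_people():
    loads.append("read")
    return [{"Name": "Dana", "City": "London"}, {"Name": "Avery", "City": "London"},
            {"Name": "Casey", "City": "London"}, {"Name": "Blake", "City": "London"},
            {"Name": "Eli", "City": "Berlin"}]


# "Base query" supplied by the data source or a service operation.
base = Query(load_people).where(lambda p: p["City"] == "London")
# Operations a runtime could add from URI options such as sorting and paging.
page = base.order_by(lambda p: p["Name"]).skip(1).take(2)
```

Nothing is read from the source while the query is being composed; the data is touched only when `page` is iterated, for example with `list(page)`.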
With this extensibility point in place,
developers will be able to bring a broad set of data sources into the picture
and expose them through the HTTP interface. This ranges from access to
specialized data stores to using LINQ-based access libraries for online
services (for example, there are informal implementations of LINQ to Amazon and
LINQ to Flickr that provide limited query capabilities to those Internet
sites).
Deployment Scenarios: Applications and Services
A typical data-driven Web application today will
have its own database in addition to one or more Web servers. For enterprise
applications, these servers are part of the IT infrastructure; for
applications hosted by ISPs, most ISPs provide database services.
Along the same lines, you can expect many
applications that use Astoria as their data services and are built around AJAX or Silverlight to still have their own database. In those scenarios, the Astoria data services will be part of the Web application itself, and will be deployed
together with the rest of the application components. This is one of the
scenarios we target with Astoria, but it is not the only one we envision.
Another way of looking at Astoria is as a
technology for building data services for other systems to consume. Data
providers can set up Astoria servers that other applications can interact with,
both consuming and updating data as required and as allowed by the security
policies. The provider of the data could be the same as the owner of the
application that consumes the data, or it could be a separate service offered for others to
consume.
Yet another way of looking at Astoria is as a
general-purpose service for data storage. To explore this idea we have set up
an experimental online service that hosts several sample data sets, including
the Northwind and AdventureWorks sample databases, a subset of the Microsoft
Encarta articles and a snapshot of data that supports the Microsoft TagSpace
social bookmarking site. Turning this into a real-world service requires
technology well beyond just the HTTP interface and patterns, and that is a
space outside of the scope of the Astoria project, but we still find the
experimental service valuable as a learning tool, to see what applications
developers would build on top of it.
Security
Exposing a Web-facing data interface requires
careful thinking around securing access to make sure only the data that is
meant to be accessible is effectively accessible over the HTTP interface. This
involves both an authentication infrastructure and proper authorization
policies.
While a lot of the design points in Astoria equally apply to Astoria as a component of an application and to Astoria as a
service, authentication is one of the areas where this is not the case.
When using Astoria data services as part of a
custom Web application, authentication typically applies to all resources
within a given boundary, including access to data. Users would authenticate
once with the Web site and the system needs to be able to apply the credentials
to the data service as well as to the rest of the application. Astoria looks
into the ASP.NET API to find out whether a user is authenticated and to find
out further details, so that an application that uses any authentication scheme
properly integrated with ASP.NET will automatically work with Astoria.
Figure 3: REST and Astoria
Integration with ASP.NET authentication means
that in typical cases common authentication-over-HTTP mechanisms will work.
This includes “forms authentication”, integrated authentication (useful inside
corporate networks), and custom authentication schemes; it is even
straightforward to roll a custom implementation of HTTP “Basic” authentication,
which may be good enough when used over SSL connections, depending on the
nature of the application.
For online services this becomes a much bigger
challenge. Besides the actual technological question of how authentication
happens, there are higher level differences that need to be addressed first: If
an application and the data service are from different sources, is the data in
the data service owned by the user of the application? If it is owned by the
user, then the application should not have access to the user’s credentials to
the data service. What is required in this scenario is a scheme where the user
authenticates with the data service independent of the application (which may
require authentication as well). We are exploring this space as part of the Astoria design effort, and although we have a reasonable understanding of the scenarios,
we have not yet designed concrete technology to support them.
Once the authentication scheme is in place, a
proper authorization model is required. The currently released version of Astoria (May 2007 CTP) has an overly simplistic implementation, where authentication policies
can be set at the resource container level (/Customers, for example); on each
one of them the configuration can indicate whether authentication is required
to read the resources in the container, and to write to them. This is not
flexible enough for most applications, so clearly a more sophisticated scheme
needs to be provided.
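The container-level model described above amounts to a small lookup table. The Python sketch below is an illustration of that model only; the `POLICIES` table, the treatment of POST, PUT, and DELETE as writes, and the choice to deny access to unlisted containers are assumptions, not the CTP's configuration format.

```python
# True means authentication is required for that kind of access.
POLICIES = {
    "Customers": {"read": True, "write": True},
    "Products": {"read": False, "write": True},
}

WRITE_METHODS = {"POST", "PUT", "DELETE"}


def is_allowed(container: str, method: str, authenticated: bool) -> bool:
    """Decide access using only the container, the verb, and the login state."""
    policy = POLICIES.get(container)
    if policy is None:
        return False  # assumption: containers without a policy are closed
    action = "write" if method in WRITE_METHODS else "read"
    return authenticated or not policy[action]
```

The sketch also makes the limitation visible: the decision never looks at which user is asking or which resource inside the container is targeted, so rules such as "users may only see their own customers" cannot be expressed.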
Closing Notes: Scope and Plans for Project Astoria
The manner in which applications are being
written is changing. One of the key traits of emerging Web applications is the
new ways they interact with their data. There is a clear opportunity here to
introduce base technology to help the development community take on this new
space.
With Project Astoria, we aim to enable the use
of data as a first-class construct on the Web and across the application stack.
We want to provide the infrastructure for creating Web data services, and also
contribute to the creation of an ecosystem where service providers and service
consumers use a uniform interface for data. We would like to see UI control
vendors, client library writers, and other players take advantage of the reusability
of these data interfaces to build better tools for Web application creation.
We are starting with this vision at home. Within
Microsoft we are closely collaborating with various groups to explore different
aspects where Astoria can play.
On the services side, we are working together
with the Windows Live organization, the Web3S folks in particular (they are
responsible for the data interfaces for Live properties), to explore a world
where every data interface is a Web3S/Astoria interface and can be consumed by
the various tools and controls out there.
On the tools side, the ASP.NET, WCF, and Astoria
teams, as well as the various folks involved in Silverlight, are working
together to provide a solid end-to-end story for Web application development,
which includes first-class tools and libraries for interacting with data from
Web applications and services.
Project Astoria moves fast and focuses on
solving real-world challenges around the Web and data. We plan to go through
the design process in a transparent way, so everyone can see what we are up to.
It is an exciting time to be working in this space; I would encourage anyone
who has an interest in the topic to check out the Astoria Team blog, follow
our design discussions, and jump in whenever they have an opinion.
Resources
· Astoria Team Blog: http://blogs.msdn.com/astoriateam
· Pablo’s Blog: http://blogs.msdn.com/pablo
· Project Astoria: http://astoria.mslivelabs.com
About the Author
Pablo Castro is a technical lead in the SQL
Server team. He has contributed extensively to several areas of SQL Server and
the .NET Framework including SQL-CLR integration, type-system extensibility,
the TDS client-server protocol, and the ADO.NET API. Pablo is currently involved
with the development of the ADO.NET Entity Framework and also leads the Astoria project, looking at how to bring data and Web technologies together. Before
joining Microsoft, Pablo worked in various companies on a broad set of topics
that range from distributed inference systems for credit scoring/risk analysis
to collaboration and groupware applications.
This article was published in the Architecture Journal, a print
and online publication produced by Microsoft. For more articles from this
publication, please visit the Architecture Journal Web site.