by Vittorio Bertocci
Summary: Today’s identity-management practices are often a patchwork of partial solutions that somehow accommodate, but never really integrate, applications and entities separated by technology and organizational boundaries. The rise of Software as a Service (SaaS) and cloud computing, however, will force organizations to cross such boundaries so often that ad hoc solutions will simply be untenable. A new approach that tears down identity silos and supports a de-perimeterized IT by design is in order.
The Sky Is the Limit
Architectural Patterns of Claims-Aware Solutions
The Canonical Pattern: Subject-IP-RP
The Claims Transformer Variation: Subject-IP-Claims Transformer-RP
Claims and Cloud
Access Control via Claims Transformation
Call to Action
This article will walk you through the principles of claims-based identity management, a model that addresses both traditional and cloud scenarios with the same efficacy. We will explore the most common token-exchange patterns, highlighting the advantages and opportunities they offer when applied to cloud computing solutions and generic distributed systems.
When you look at a cloudy sky, your inner child probably sees dragons and castles; don’t be surprised if your inner architect, after having read this article, sees dollar signs. Cloud computing promises to bring radical advantages to the way in which we think of IT: Its basic idea is that companies can host assets outside of their own premises, reaping the benefits of those assets without the burden of maintaining the necessary infrastructure. This is somewhat similar to the idea of SaaS, where companies can avoid the burden of maintaining on-premise applications that are not specific to their core business, buying the corresponding functionality as a service. Cloud computing, however, pushes the bar further. Instead of buying complete applications provided by third parties, such as the classic CRM and HR packages, the cloud offers the possibility of hosting your own resources in data centers that are exposed to you as a platform. You have all the advantages of retaining control of the resource, without the pain of provisioning CPU and bandwidth, dealing with the hardware, or cooling the room; you don’t even need to worry about patching your system. If your Web application produces new data every day, using a data store in the cloud saves you from constantly buying hardware to accommodate growth. The best part is that you can expect to be charged an amount proportional to the usage you actually make of the resource, instead of having to invest in hardware and infrastructure beforehand. This “pay-per-use” pattern is one of the reasons you will often hear the term “utility computing” instead of “cloud computing,” and it is even more evident in CPU-intensive tasks.
Imagine if, instead of sizing your data center to handle its maximum forecasted peak and underutilizing it most of the time, you could deploy your most CPU-hungry processes in a data center of monstrous proportions: The CPU utilization could grow as much as requested, and you would pay your cloud provider in proportion. Those are some of the advantages that will light a spark in the eyes of your IT managers, but the cloud holds even more interesting properties for architects. Since the cloud provider hosts resources on a common infrastructure, it is in the position of offering services that can be leveraged by every resource, simplifying development and maintenance. Obvious candidates are naming, message dispatching, logging, and access control. Once a resource uses the cloud infrastructure, implementing those functionalities can be factored out from the resource itself.
The literature on cloud computing is vast and grows every day: If the introduction above was not enough to convince you of its disruptive potential, simply search for the term with your favorite search engine to get a feel for how seriously the industry is taking it. The diligent architect, at this point, is likely to wonder, “Is my company ready for this?” Not surprisingly, answering this question is a complex task and requires considering many aspects of your architecture and your practices. In extreme simplification: If you run your business according to solid service orientation (SO) principles, you are in the ideal position to take advantage of the new wave. After all, if you respected autonomy, exposed policies, and used standards, who cares where your services run? If you are in that position, you have my congratulations. In my experience, however, nobody ever applies SO principles in excruciating detail. For example, services developed with the same technology offer special features when talking with each other, and there are situations in which it makes perfect sense to take advantage of those.
Identity management and access control are most likely to be affected by this phenomenon. Enterprises typically have their directory software, and they rightfully leverage it for many aspects of resource access control; sometimes it works so well that developers are not exposed to identity concepts at all, which is actually a good thing, but that rarely happens. When faced with tasks involving some form of access-control management, such as federating with partners outside the directory or using different credential types, you can expect developers to come up with the worst swivel-chair integration solutions. If identity brings out the worst in development practices, why do we get away with it? The easy answer is that sometimes we don’t. I am sure you have heard your share of horror stories of access control gone wrong. The subtler answer is that we get away with it because, as long as we own the majority of the infrastructure and exercise iron-fist governance, we can somehow handle it: We may use more resources than needed, we may deal with emergencies more often than needed, but somehow we go on. The assumption that we own the majority of the infrastructure, however, is challenged by growing market pressure. When a lot of your business requires you to continuously connect and onboard new partners, where does your infrastructure end and theirs begin? Cloud computing is going to snowball this: Once the cloud is just another deployment option, crafting custom access code for every resource will simply not be sustainable.
The good news is that there is an architectural approach that can help manage identities and access control for generic distributed systems, and it works for on-premise, cloud, and hybrid systems alike. The core idea is modeling almost everything as an exchange of claims, which lets you model transactions in a much more natural fashion.
This article is an introduction to this new approach. Special attention will be given to the aspects that are especially relevant for the cloud, but the vast majority of the concepts and patterns presented can be applied regardless of the nature of the distributed system. Note that the problems we are trying to tackle here go beyond the simple single-site, membership-provider-based application. While the principles laid down here apply to any system, hence also to simple cases, their expressive power is best utilized in scenarios including partnerships, complex access rules, and structured identity information.
If the following definitions are hard to digest at first, think of how difficult it must have been to learn to use the digit “0” for Europeans of the Middle Ages, accustomed to adding up numbers in Roman notation. In that case, there was an established practice to go against, but the payoff of adopting a better model was exceptional!
The issue with classic identity-management solutions can be summarized as follows: They presume too much.
The most common assumption is that every entity participating in a transaction is well known by some central, omnipresent authority that can decide who can access what, and on what terms. This is usually true in self-contained systems, such as enterprise networks managed via directories, but fails when business processes begin to require alien participants such as software packages with their own identity stores, partners and customers accessing your extranet, and consultants. Tactical solutions, like using shadow accounts, often amount to pretending to be able to manage something we don’t own; and as such, they are very brittle.
Another common assumption is that every participant in a transaction uses a consistent identity-management technology. Again, this is a fair assumption for self-contained systems (think network software), but it fails as soon as you let aliens into the process. The common practice in accommodating different technologies is treating those cases as exceptions. As a result, the resources themselves end up embedding a lot of identity-management plumbing code, written by developers who are usually anything but identity experts. This is every bit as bad as the old taboo of embedding business logic in the presentation layer, perhaps even worse. Handling identity plumbing directly inside the resource not only makes the system brittle and hard to maintain, it also makes the life of system administrators miserable. How can you manage access control at deployment time if the logic is locked inside the resource itself?
The claims-based approach defuses these issues by assigning each task to the entities that are its natural owners, and by respecting the autonomy of all participants rather than introducing artificial dependencies and expectations: nothing but good old SO architectural principles.
To follow, I will give a quick description of the claims-based approach. The topic in itself is huge; in fact, I coauthored a book on the subject. In order to fully understand the implications in this article, I recommend reading Chapter 2, freely available online on MSDN (see Resources).
Here I will present a bestiary of the various concepts and constructs you will encounter while exploring claims-based approaches.
A claim is a fact about an entity (the “subject”), stated by another entity (the “authority”).
A claim can be literally anything that describes one aspect of a subject, be it an actual person or an abstract resource. Classic examples of claims are “Bob is older than 21,” “Bob is in the group ‘remote debuggers’ for the domain Contoso.com,” and “Bob is a Silver Elite member with one Star Alliance airline.” A claim is endorsed by an authority; hence an observer can decide whether the fact the claim represents should be considered true, according to the authority’s trustworthiness.
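To make the structure concrete, here is a minimal sketch of a claim as a data record in Python. The field names (subject, claim_type, value, issuer) are illustrative, not taken from any token format or specification.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Claim:
    subject: str      # the entity the fact is about
    claim_type: str   # which aspect of the subject is described
    value: str        # the asserted value
    issuer: str       # the authority endorsing the fact

# The three classic examples from the text, expressed as records
claims = [
    Claim("Bob", "over21", "true", "Contoso.com"),
    Claim("Bob", "group", "remote debuggers", "Contoso.com"),
    Claim("Bob", "frequent_flyer_status", "Silver Elite", "AirlineA"),
]
```

Note that the issuer is part of the claim itself: whether an observer believes the fact depends entirely on how much it trusts that issuer.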
An entity A is said to trust an entity B if A will consider true the claims issued by B. While very simplistic, this definition serves our purposes here. Trusting what B says about a subject saves A from the hassle of verifying the claim directly. Entity A still needs to make sure that the claim is actually coming from B and not a forgery.
A security token is an XML construct signed by an authority, containing claims and (possibly) credentials information.
Security tokens are XML fragments, described in WS-Security (see Resources), which can fulfill two distinct functions:
· They provide a means to propagate claims
· They can support cryptographic operations and/or play a part in credentials authentication
Thanks to the properties of asymmetric cryptography, the fact that a token is signed makes it easy to verify the source of the claims it contains.
Tokens can also contain cryptographic material, such as keys and references to keys, which can be referenced in encryption and signatures in SOAP messages; those operations can be used as part of credentials verification processes. In this context, we consider a “credential” any material that can be used as part of some mechanism for verifying that the caller is a returning user: Passwords and certificates are good examples (for more details, see Resources: Vittorio Bertocci’s blog, The Tao of Authentication).
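To illustrate why a signed token lets a receiver verify the source of its claims, here is a deliberately simplified Python sketch. Real security tokens use XML signatures as described in WS-Security, typically with asymmetric keys; this sketch uses a shared-key HMAC purely to keep the example self-contained, and the key name is invented.

```python
import hashlib
import hmac
import json

def sign_token(claims: dict, key: bytes) -> dict:
    """Package claims with a signature over their serialized form."""
    payload = json.dumps(claims, sort_keys=True).encode()
    sig = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return {"claims": claims, "signature": sig}

def verify_token(token: dict, key: bytes) -> bool:
    """Recompute the signature; any tampering with the claims fails."""
    payload = json.dumps(token["claims"], sort_keys=True).encode()
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, token["signature"])

ip_key = b"hypothetical-key-shared-by-ip-and-rp"
token = sign_token({"subject": "Bob", "over21": True}, ip_key)
assert verify_token(token, ip_key)       # genuine token passes
token["claims"]["over21"] = False
assert not verify_token(token, ip_key)   # tampered token fails
```

The point is the shape of the check, not the mechanics: any modification of the claims invalidates the signature, so the receiver can attribute the claims to the holder of the key.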
Tokens can be “projections” of specific authentication technologies, such as X.509 certificates, or they can be issued (SAML, a popular token format you may have heard mentioned in the context of Web services security, is one example of an issued token). The system is future-proof: As new technologies emerge, suitable token “projections” can be documented in profile specifications.
A Security Token Service is a Web service that issues security tokens as described by WS-Trust (see Resources: WS-Trust).
An STS (see Figure 1) can process request for security token (RST) messages and issue tokens via request for security token response (RSTR) messages. Processing the RST usually entails authenticating the caller and issuing a token that contains claims describing the caller itself. In some cases, the STS will issue claims that are the result of transformations of claims it received in the RST. (For more details, see Resources: Vittorio Bertocci’s blog, R-STS.)
Figure 1. Anatomy of a security token
The Identity Metasystem (IdM) is a model that provides a technology-agnostic abstraction layer for obtaining claims.
This is a very simplified definition that does not do justice to the model; for example, it does not mention policy distribution and systems management. (See Resources: WS-Security, WS-Trust.)
Every identity technology tends to accomplish the same tasks, following common patterns and dealing with more or less the same functional roles. The IdM describes those patterns and roles in an abstract fashion, modeling the behavior of any system in terms of claims exchanges but leaving the details of how those are implemented to the specific technologies. The necessary level of abstraction is achieved by taking advantage of open, interoperable protocols such as the WS-* family of specifications.
The IdM describes three roles:
· Subject. The subject is the entity we mentioned in the claim definition. It is whoever (or whatever) needs to be identified in a transaction.
· Relying party (RP). The RP is the resource that requires authentication before being accessed. Examples of RPs include Web sites and Web services. RPs derive their name from the fact that they rely on IPs for obtaining claims about the subject they need to identify.
· Identity provider (IP). The IP is the authority entity we mentioned in the claim definition. An IP possesses knowledge about the subject, and can express it in the form of claims: Any RP that trusts that particular IP will be able to rely on those claims for making authentication and authorization decisions on subjects. Note: Often the IP will use an STS for issuing claims in the form of tokens; that does not mean that every STS is an IP. An STS is a tool that the IP uses to get its job done.
The basic concepts above constitute the basis for a new physics of authentication, and as such they can be combined to describe any system and transaction. In this new world, we can finally separate two functions traditionally conflated into one: obtaining information about the caller (via claims) and verifying that it is a returning user (via credentials). Thanks to this separation, we can now choose which function best fits our scenario; for more details, see Resources: Vittorio Bertocci’s blog, The Tao of Authentication.
There are certain patterns that are especially useful for describing common scenarios. To follow, I will describe the most common of those patterns, along with some of the advantages that the claims-based approach brings to those scenarios with respect to traditional practices. All the patterns are suitable for both on-premise and cloud architectures.
This pattern describes the minimal situation in authentication management: A subject wants to access a restricted resource (the RP). This happens when a Windows user tries to access a network share, when a smart client invokes a secure Web service, when a Web surfer wants to browse restricted content, you name it (see Figure 2). The key steps are as follows:
Figure 2. The Subject-IP-RP canonical pattern. The numbers refer to the text.
1. (out of band) The RP publishes its requirements in the form of a policy. Those requirements include
a. A list of claims
b. The IPs that the RP trusts as sources of those claims
c. Details about the specific authentication technologies that the RP can use; for example, the token formats that the RP understands
2. The subject reads the RP policy and verifies whether it can comply with it. In practice, this means verifying whether the subject can obtain a suitable token from any of the IPs that the RP trusts.
3. The subject invokes a suitable IP, requesting a token that complies with the RP policy. In practice, this usually means sending an RST message to the IP’s STS.
4. The IP receives the RST and authenticates the subject. If the subject is known, the IP will retrieve the required claim values, package them in a token, sign it, and send it back to the subject.
5. The subject receives the token and uses it to invoke the RP.
6. The RP extracts the token from the invocation message and examines it: Is it signed by a trusted IP? Does it contain the right claims? If the checks are positive:
a. The claim values are used to feed some access-control logic
b. If the access-control logic is satisfied, the claim values are made available to the resource itself and access is granted
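The steps above can be sketched in a few lines of Python. Everything here is illustrative (the policy fields, the user store, the issuer-name comparison standing in for signature verification); a real implementation would exchange WS-Trust RST/RSTR messages and verify cryptographic signatures.

```python
# Step 1: the RP publishes its requirements as a policy (all names invented)
RP_POLICY = {
    "required_claims": ["name", "over21"],   # 1a: list of claims
    "trusted_ips": ["Contoso-IP"],           # 1b: trusted IPs
    "token_formats": ["SAML"],               # 1c: accepted token formats
}

# The IP's knowledge about its subjects (a stand-in for a directory)
USER_STORE = {"bob": {"password": "secret", "name": "Bob", "over21": True}}

def ip_issue_token(user, password, required_claims):
    """Steps 3-4: the IP authenticates the subject and issues a token."""
    record = USER_STORE.get(user)
    if record is None or record["password"] != password:
        return None
    claims = {c: record[c] for c in required_claims if c in record}
    return {"issuer": "Contoso-IP", "format": "SAML", "claims": claims}

def rp_authorize(token):
    """Step 6: the RP checks issuer trust, format, and required claims."""
    if token is None or token["issuer"] not in RP_POLICY["trusted_ips"]:
        return False
    if token["format"] not in RP_POLICY["token_formats"]:
        return False
    return all(c in token["claims"] for c in RP_POLICY["required_claims"])

# Steps 2 and 5: the subject reads the policy, gets a token, invokes the RP
token = ip_issue_token("bob", "secret", RP_POLICY["required_claims"])
print(rp_authorize(token))  # True
```

Notice that the RP never touches a password: authentication happens at the IP, and the RP only decides whether to trust the result.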
For the purposes of this explanation, let’s assume that the resource is a Web service. The pattern above exhibits many good properties.
Using tokens and claims lifts from the resource the burden of writing any explicit identity-management code. The same infrastructure that takes care of publishing the RP policy can also perform operations such as deserializing tokens, checking signatures, and making claim values readily available to the resource developer, regardless of the token format that was actually used on the wire. The infrastructure can also perform some authorization decisions based on claim values, further reducing the tasks that the developer needs to worry about.
This is an advantage that holds for any scenario, but especially for the cloud where the hosting technology is one of the variables. Using claims makes resources truly portable, by decoupling them from the details of the infrastructure that will host them and the authentication technology that will be available at runtime.
Being able to specify policies at the resource level allows for very agile deployments, fine-grained control, and dynamic negotiation of the authentication technology of choice. It allows for establishing a management strategy that will work regardless of the hosting environment, thanks to the use of interoperable standards. It decouples the resource itself from the execution environment, making it much easier to move the resource around (including to the cloud).
Every resource specifies its requirements in an autonomous fashion: It is the subject that matches those with its own capabilities. The interaction is an emergent property of the combination of the two. Both parties can change independently, and the set of subjects that can access the resource is defined solely by the ability to comply with the requirements as opposed to some out of band or infrastructural dependency. The system is very robust, since everybody needs to worry only about its own requirements and capabilities. Onboarding users is very easy, since it does not require any explicit negotiation or arrangement.
In this pattern, the IP performs all the attribute retrieval. The RP receives what it needs in the form of a token, without needing to query other systems. This is obviously an advantage for those attributes to which the RP would not otherwise have access (e.g., Airline A asks for the subject’s frequent-flyer status with partner Airline B), but it is also a great means of centralizing attribute-retrieval logic in a single place. Not only does it reduce the number of endpoints that need access to attribute stores, with advantages for manageability and security, it can also reduce the number of accesses themselves, since once the subject obtains a token it can cache and reuse it until it expires without going back to the IP. The IP can also be used for authorization decisions, which can be expressed via claims as well; however, this can happen only when the IP has a tight relationship with both the subject and the resource. While this happens in important scenarios, such as when the IP represents a directory and the RP is one of the resources it manages, in the generic case no relationship can be assumed between RP and IP.
This is key to many cloud scenarios, in which resources need to rely on external IPs for verifying information about possible users; however, it is also very useful in more agile versions of classic federation, in which users of a partner organization can be represented by technology- and platform-agnostic tokens, and for crossing boundaries of any kind.
The pattern just described can be applied to a wide range of scenarios, from consumer (frequent flyer with Airline A accessing the Web site of partner Airline B) to enterprise (almost any Kerberos scenario can be modeled that way, with the added bonus of technology independence).
However, there are situations in which certain constraints may prevent the application of the pattern as it is:
1. Indirect trust—An RP may not directly trust the IPs that are in a relationship with the subject, but there could be a chain of intermediate authorities that can be traversed for brokering trust. If you want to board a plane, you won’t be able to do so using just the documents already in your wallet; the employee at the gate will trust only a boarding pass issued by the proper airline. However, you can use the documents already in your possession (passport or driver’s license, issued by the government) at the check-in counter to obtain the boarding pass (issued by the airline).
2. Claims format—The RP may not be able to understand the claims in the format issued by the IPs that are directly available to the subject. Sometimes it is a simple matter of format (my Italian ID says “nato il” instead of “birth date,” so a bartender in the U.S. would not be able to extract the information he needs even if it’s there); other times the claims required by the RP are the result of some processing of the raw information received (the RP may need a “CanDrink” claim, which may be the result of processing “age” and “nationality”).
These new constraints can be easily addressed by adding a new element, which we will call a claims transformer, that can process a token before it reaches the resource and can be leveraged by the RP to broker trust and convert the incoming claims into a more suitable format. Between the subject and the RP, there could be an arbitrarily long chain of claims transformers. Dynamic negotiation of policy can automatically “choose” a path, if available, without any need to plan it at a high level.
The best artifact for implementing a claims transformer is, again, the STS. Instead of populating the issued claims from its own data sources, the STS used by the claims transformer will mainly manipulate the incoming information. Since it is often run by the same entity that owns the RP, as in the classic federation case, this kind of STS is usually referred to as a Resource STS (R-STS or RP-STS). I recommend reading my blog for more details (see Resources).
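As a sketch of what a claims transformer does, the following Python fragment renames an incoming claim into the vocabulary the RP expects and derives a new claim from raw ones, along the lines of the “nato il” and “CanDrink” examples above. The claim names and the drinking-age table are illustrative, not part of any specification.

```python
# Minimum drinking age per nationality (illustrative values)
DRINKING_AGE = {"US": 21, "IT": 18}

def transform(incoming: dict) -> dict:
    """Re-issue incoming claims in the vocabulary the RP expects."""
    out = {}
    # Format translation: rename the Italian claim into the RP's term
    if "nato il" in incoming:
        out["birth_date"] = incoming["nato il"]
    # Derived claim: compute "CanDrink" from raw "age" and "nationality"
    age = incoming.get("age")
    nationality = incoming.get("nationality")
    if age is not None and nationality in DRINKING_AGE:
        out["CanDrink"] = age >= DRINKING_AGE[nationality]
    return out

token_from_ip = {"nato il": "1975-05-01", "age": 33, "nationality": "US"}
print(transform(token_from_ip))
# {'birth_date': '1975-05-01', 'CanDrink': True}
```

In a real chain, the R-STS would also re-sign the result, so the RP only ever needs to trust the transformer, not every upstream IP.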
This variant presents some powerful advantages and, in my opinion, will emerge as the dominant pattern for identity management in distributed systems (see Figure 3).
Figure 3. Basic claims transformer pattern. The Subject (1) obtains a token from the IP and (2) uses it to get a token from the claims transformer. The R-STS of the claims transformer processes the incoming claims and (3) issues the subject a new token. The subject (4) uses the new token to invoke the resource.
Claims transformers can offer a granularity continuum between single resources and the directory at the enterprise level. An R-STS can be used for corralling multiple resources with similar requirements. It simplifies the policy management of single resources, moving the complexity of brokering trust and processing raw claims into a single point, but without requiring it at the enterprise level. An R-STS can be a virtual boundary protecting legacy resources that may not play too well with the rest of the network. Those are just a few examples of the expressive power of claims transformers (see Figure 4).
Figure 4. R-STSes can be used to cluster multiple resources with similar policy requirements, simplifying trust management and providing a means to control the granularity of externalized authentication and authorization logic.
While an IP is usually an independent entity, very often the R-STS has some knowledge of the resources it fronts. Such knowledge may enable the R-STS to perform authorization decisions. The incoming claims can be processed together with the requirements of the resource the subject is trying to access, and the result may be a decision about whether the subject holds the necessary privileges. Such a decision can be expressed in different ways:
· If the subject does not hold any privileges, the R-STS may simply refuse to issue a token and block the invocation. In this case, it would enforce authorization.
· If the subject holds some privileges, the R-STS may formalize those authorization decisions in the form of claims. The RP is hence relieved from the need to run authorization logic, and will just enforce the directives it receives from the R-STS in the form of claims.
The use of tokens, claims, and IPs made authentication externalization possible; the R-STS can, in certain cases, help to do the same with authorization decisions.
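The two behaviors just described can be sketched as follows: the R-STS either refuses to issue a token (enforcing authorization) or expresses the decision as claims that the RP merely honors. The resource names and the privilege table are hypothetical.

```python
# Hypothetical knowledge the R-STS holds about the resources it fronts
PRIVILEGES = {"bob": {"orders-service": ["read", "write"]}}

def rsts_issue(subject: str, resource: str):
    """Issue a token carrying authorization claims, or refuse."""
    ops = PRIVILEGES.get(subject, {}).get(resource)
    if not ops:
        return None  # refusal: no token, the invocation is blocked here
    # Decision expressed as claims; the RP just enforces the directives
    return {"issuer": "R-STS", "claims": {"allowed_operations": ops}}

assert rsts_issue("bob", "orders-service")["claims"]["allowed_operations"] == ["read", "write"]
assert rsts_issue("alice", "orders-service") is None  # blocked at the R-STS
```

Either way, the authorization logic lives outside the resource, which only needs to read the claims it receives.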
There are many resources that are not claims-aware: legacy systems, resources managed directly at the directory level, and so on. Often some investments have been already made to manage access to those resources, such as assigning ACLs and creating groups. The identity of a subject calling a front-end application via claims-based security would not be suitable to leverage those investments; if the front-end application could obtain a directory identity on behalf of the caller, however, the problem would be solved. In fact, the value of obtaining tokens on behalf of somebody else is clear even when the issued tokens remain in the realm of claims.
The beauty of what I’ve described so far is that it adapts to any loosely connected system; and what is the cloud, if not the ultimate distributed system? When you reason at cloud scale, you deal with a world where every entity is managed independently: claims, tokens, trust, and identity metasystem roles can help you make the best of whatever little information you may have about the relationships among the parties you deal with. By describing a claims-based solution, we killed two birds with one stone, giving you tools that can be applied on premise and on the cloud alike.
We still know too little of how things will evolve to give detailed guidance on cloud scenarios. However, with some working hypotheses, we can certainly highlight the aspects of claims-aware identity management that are especially well suited for the cloud.
We have seen that a cloud provider may offer services such as storage and compute, messaging, and integration. If you recall what we described in the claims transformer section, you will see that an obvious access control strategy for the cloud provider is to offer an R-STS and have the resources trust it. Every tenant would then be able to govern access control simply by manipulating the claim transformation rules of the R-STS.
Let’s say that you are an ISV, and you want to deploy your services at a certain cloud provider. You decide on the set of claims your services will accept, and you set your services to trust the provider’s R-STS. Your access control strategy is already in place, regardless of who calls your services! When onboarding a new customer, you simply have to define how to map his claims to yours, by setting some rules at the R-STS level. This can be incredibly agile and easy to maintain (see Figure 5).
Figure 5. An R-STS in the cloud is a very powerful tool for managing access control. The resource always handles tokens and claims in the right format, from the same trusted source; the R-STS takes care of handling trust management and credential verification from different sources, using rules to perform the necessary transformations and isolating the resource from changes and unnecessary complexity.
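To show how onboarding reduces to configuration in this model, here is a sketch in which each customer is represented by a rule table mapping the claims their IP issues to the claims the ISV’s services accept. All tenant and claim names are invented for illustration.

```python
# One rule table per customer: adding a customer = adding an entry.
# Keys are the claim types the customer's IP issues; values are the
# claim types the ISV's services understand.
TENANT_RULES = {
    "Fabrikam": {"EmployeeLevel": "purchase_limit"},
    "Litware":  {"JobTitle": "purchase_limit"},
}

def map_claims(tenant: str, incoming: dict) -> dict:
    """Apply the tenant's rules; claims without a rule are dropped."""
    rules = TENANT_RULES.get(tenant, {})
    return {rules[k]: v for k, v in incoming.items() if k in rules}

# Onboarding a new customer is one line of configuration, not new code
TENANT_RULES["Northwind"] = {"Grade": "purchase_limit"}
print(map_claims("Northwind", {"Grade": "senior", "Dept": "sales"}))
# {'purchase_limit': 'senior'}
```

The services themselves never change: they always see the same claim types, while the rule tables absorb the variety of the customers’ identity systems.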
We have seen how an R-STS can conveniently externalize and aggregate authorization decisions; this is, of course, valid in cloud scenarios as well. In fact, there are cases in which this can be pushed further to include authorization enforcement. Getting back to the ISV example:
If the services are exposed by taking advantage of some messaging offering, the dispatch infrastructure itself can take care of enforcing authorization decisions by examining the authorization claims even before the message is routed to its destination, relieving the ISV from yet another burden. If the services are not exposed using the provider’s messaging infrastructure, the authorization-decision claims have to be enforced by adding some processing pipeline in front of the resource. (For more details, see Resources: Vittorio Bertocci’s blog, claims types.)
With the model described, an ISV can represent a customer relationship as a set of rules. This is much more agile than creating a federation relationship by traditional means, directory to directory. While coarser-grained, it also avoids the burden of extra management that is necessary in peer-partner relationships but would often be overkill for vendor-customer relationships.
The model also allows for accommodating individuals and customers without advanced identity capabilities. The ISV services will always see tokens issued by the R-STS, regardless of whether the customer authenticated with a token issued from his own IP or with a simple, one-off username and password.
The shift toward the cloud will happen: The industry is fairly quick to acknowledge that it is not a matter of if, but when. Predicting the future is always a dangerous game, but the cloud is just too exciting not to venture a development that, despite looking like an identity utopia today, is in fact perfectly plausible thanks to the principles described so far.
Today, access control is often constrained by the rails of organizational structure, with privileges reflecting rank rather than function. The number and the caliber of companies and open-source initiatives participating in interop events (see Resources: RSA 2008 OSIS Interop event) and integrating support for claims in their leading products suggest that claims-based programming is on its way to becoming mainstream. Once that happens, it is easy to imagine how identities and access-control structures may be corralled around tasks rather than organizations. Subjects, resources, roles, and authorities from many different companies could all be described in terms of a specific project, with privileges assigned not according to organizational rank but reflecting the function performed in the context of the project itself. Virtual organizations could emerge for the duration of the task, and dissolve once the mission is accomplished. That would also allow for templating of cross-company efforts, enforcing best practices and improving predictability (see Figure 6).
Figure 6. Policies, claims exchanges, and virtual directory management make possible the semi-spontaneous emergence of virtual organizations (center).
For this to happen, claims are a necessary, but not sufficient, condition: Emerging technologies such as virtual directories will play a key role as well (see Kim Cameron’s blog, www.identityblog.com, for more details).
The claims-based approach is great for traditional and cloud scenarios alike. The difference is that while in traditional scenarios you can sometimes use a great deal of extra resources to get away with bad strategies, once the cloud is upon us, it won’t be that easy. If you want to prepare yourself and your organization for the upcoming wave, here are some things you can do:
· Experiment with Web services and security, if you are not already doing it
· Experiment with claims-based programming, taking advantage of the Beta release of “Zermatt” Developer Identity Framework ( http://go.microsoft.com/fwlink/?LinkId=122266)
· Consider running pilots within your organizations. For example: create an IP for your employees
Here I could offer the proverbial “we barely scratched the surface of the topic,” and that would be true. Instead, I will say that claims are the cornerstone of exciting developments in identity management for traditional and cloud scenarios alike. I invite the geeks among you to enjoy this special transition moment in which many of those interactions are still evident, and I reassure all the others by foreseeing that most of the details will likely sink into the infrastructure and be made invisible by new breeds of tools.
Vittorio Bertocci’s blog:
· “The Tao of Authentication” I, II and III:
Kim Cameron’s blog:
· “virtual directories”: http://www.identityblog.com/?p=983
“Understanding Windows CardSpace”, Chapter 2 (Addison-Wesley, 2008), on MSDN
RSA 2008 OSIS Interop event:
· OSIS interop: http://osis.idcommons.net/wiki/Main_Page
Vittorio Bertocci, senior architect evangelist, Cloud Services Evangelism, Microsoft Corp., helps large enterprises leverage new technologies. After a few years in the Italian Microsoft Services, he moved to the U.S. headquarters, where he has spent the past three years helping customers deploy solutions based on identity and access management, SOA, and services. He currently focuses on the identity and access aspects of cloud computing; Understanding Windows CardSpace, the book on user-centered identity he coauthored, was published last January. Vittorio maintains a popular blog at http://blogs.msdn.com/vbertocci.
This article was published in the Architecture Journal, a print and online publication produced by Microsoft. For more articles from this publication, please visit the Architecture Journal Web site.