by Fernando Gebara Filho
Summary: The purpose of this article is to show the readers how digital identities are evolving over time, from the needs of a single user running within a single computing platform to today’s world of multiplatform identity federation and privacy concerns. It can be used to help infrastructure architects design effective identity infrastructures that are not over-engineered, but instead based on current and projected needs of the organization.
A (Very) Brief (and Simplified) History of Identities
A Nontechnical View of the Digital-Identity Anatomy
The Managed and the Unmanaged User Spaces
Competencies, Commoditization, and Real Value
The Challenges of Identity Strategies
There was a time when very few people needed a digital identity. They were specialized workers in a brand new profession, nowadays known as Information Technology (IT). They used a very limited set of devices, accessing a very limited set of applications. There was no online access; the only way to make the machines understand what to do was to feed them tremendous quantities of punched cards, loaded with machine-like language constructs. There were very few bad guys those days; the worst were the ones who hustled you, so that all of those carefully ordered punched cards were spread out over the floor.
This is not the reality today. We live in a globally connected world; the vast majority of people who use the technologies that once were reserved for scientists and hardcore engineers are now lay users, with little to no understanding of the processes that are running inside their computers. There are unknown numbers of very bad guys who want to spy on you in order to learn the password to your bank account, so that they are free to steal from you without the need for proximity. These malicious users can live far away from our homes, but it takes just a few seconds for them to track all of our activities. Some of them don’t even want our passwords; they want to steal our home-processing power to run complex hacking algorithms or distributed attacks against carefully chosen targets on the Internet. And, so, computer programmers must constantly work on digital-identity technologies in order to keep up with this changing landscape.
In the beginning...
… there was almost no interest in creating and managing identities and their security contexts. Why? We lived in a world of mainframes and mini-computers, submitting huge computational jobs through punched cards and printing stacks and stacks of paper on mechanical printers (but only if we were IT professionals or attending university classes at that time). Our identity was nothing more than an identifier, determining who submitted the job and who owned the resulting stack of paper (the identifier was usually printed on the first page of the stack).
There was no security context at all in our identities. The user name/password pair was even printed in the punched card set, so that there was absolutely no secrecy involved. However, there was no need for it, especially in the commercial/academic world; except for a few individuals, there was no interest in stealing other people’s jobs (JCL jobs, that is). The only necessary secrets were in the realm of military installations. Identities were used only in the context of a single machine. If you wanted to use another computer, another user name/password pair had to be created, and there was no connection among the identities in the machines that you were allowed to use.
Basically, identities were not used to really identify you. Their only purpose was to generate an identity under which a process was run and the results could be sent to you. There was a very weak connection between you and your digital identity.
With the advent of distributed computing, network logon became a necessity, and technologies and protocols were specially created to handle those needs. But they evolved from the context of the so-called workgroup computers to the full domain-based central directory. Workgroup computers were really a set of workstations with a “master” element that took care of presenting the individual members as a cohesive entity. However, this was only a view of the reality: You could enumerate workgroup members and resources but, when trying to access one of them, you had to be registered in the local identity database of the member that held the resource; and this workgroup member was responsible for checking if the credentials you used were correct.
The workgroup concept evolved alongside the network. When file and print servers became popular, they were also responsible for holding the user identity database and running the algorithms that checked the presented credentials’ validity. Initially, one server was enough for daily jobs, but as quickly as we could spell “network,” the need for networked servers showed up in our lives. This brought the challenge of presenting the user identity as a unique entity among all of those computational resources: If I wanted to use a printer, no matter which server held the printer queue, I had to identify myself using my single set of network credentials and get the job done. In the first years of the networked servers, a simple and effective (at that time) artifact was used: identity database replication. All servers that were part of a known and trusted set of servers replicated their user databases, effectively implementing the concept of a single sign-on to network resources. Obviously, this mechanism had its limitations. When dealing with a large number of servers, replication delays and even inconsistencies were commonplace.
This may have been one of the first times when there was a clear relationship between identities and the individual who held them, because the same set of credentials (user name/password) was used to access a set of network resources.
Then came the concept of the network domain. In it, a set of workstations and servers are managed under a central credential database, effectively allowing the creation of a common security context among all domain network resources and processes. In the network domain model, not only users but printers, workstations, and services are assigned a set of credentials that allow the execution of processes and communications among them. A user’s credentials will validate only on a workstation that is part of the same domain.
This model also allowed the creation of trust relationships between disparate domains. With it, users who are controlled by domain A can access resources from domain B, if and only if domain B is set to trust the credentials from domain A. This allowed for more flexible identity management, because there was no need to replicate or clone identities from domain A to domain B once the trust relationship had been established.
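The one-way nature of such a trust can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the names `Domain`, `trust`, and `authenticate` are hypothetical and do not correspond to any particular directory product, and passwords are compared in plain text purely to keep the sketch short.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Domain:
    """A toy network domain with its own credential database (hypothetical model)."""
    name: str
    users: dict[str, str] = field(default_factory=dict)       # user name -> password
    trusted: dict[str, Domain] = field(default_factory=dict)  # domains this one trusts

    def trust(self, other: Domain) -> None:
        # One-way trust: this domain accepts identities vouched for by `other`.
        self.trusted[other.name] = other

    def verify_local(self, user: str, password: str) -> bool:
        return self.users.get(user) == password

    def authenticate(self, qualified_name: str, password: str) -> bool:
        # `qualified_name` is written as DOMAIN, a backslash, then the user name.
        domain_name, _, user = qualified_name.partition("\\")
        if domain_name == self.name:
            return self.verify_local(user, password)
        home = self.trusted.get(domain_name)
        # The identity is never copied: the trusted home domain checks it.
        return home is not None and home.verify_local(user, password)


domain_a = Domain("A", users={"alice": "s3cret"})
domain_b = Domain("B", users={"bob": "hunter2"})
domain_b.trust(domain_a)  # B trusts A, but A does not trust B

print(domain_b.authenticate("A\\alice", "s3cret"))  # True
print(domain_a.authenticate("B\\bob", "hunter2"))   # False
```

Note that domain B never stores a copy of alice’s account; it only knows which domain to ask.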
Unfortunately, this model requires that all domains that are part of the same trust mesh use the same set of technologies, making it very difficult to share resources among loosely coupled directories or directories from different technology platforms.
New sets of technologies were created and standardized to handle the transmission of user identities among loosely coupled network domains. They are collectively called identity-federation systems: A predefined, cross-platform, standardized set of protocols designed exclusively to transmit user security contexts to allow one network domain to share resources with another network domain. These sets of standards-based protocols are friendly to the Internet infrastructure, allowing the sharing of resources even in the absence of dedicated network links.
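To make the idea of transmitting a security context more concrete, here is a deliberately simplified Python sketch. It is not an implementation of any real federation protocol: real systems use standardized token formats and usually public-key signatures, whereas this sketch assumes a shared secret key and a JSON payload. The function names `issue_token` and `accept_token` and the domain names are made up for the illustration.

```python
import hashlib
import hmac
import json

# Assumption for the sketch: both domains agreed on this key when the
# federation was set up. Real deployments typically use certificates instead.
SHARED_KEY = b"demo-key-agreed-when-federation-was-set-up"


def issue_token(issuer: str, claims: dict, key: bytes = SHARED_KEY) -> dict:
    """Identity provider side: package the user's claims and sign them."""
    payload = json.dumps({"issuer": issuer, "claims": claims}, sort_keys=True)
    signature = hmac.new(key, payload.encode("utf-8"), hashlib.sha256).hexdigest()
    return {"payload": payload, "signature": signature}


def accept_token(token: dict, trusted_issuers: set, key: bytes = SHARED_KEY):
    """Resource side: return the claims, or None if the token is rejected."""
    expected = hmac.new(key, token["payload"].encode("utf-8"), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, token["signature"]):
        return None  # tampered with, or signed with a different key
    body = json.loads(token["payload"])
    if body["issuer"] not in trusted_issuers:
        return None  # valid signature, but not a federation partner
    return body["claims"]


token = issue_token("domain-a.example", {"name": "alice", "department": "sales"})
print(accept_token(token, {"domain-a.example"}))
```

The resource domain never sees alice’s password. It receives only the claims that her home domain chose to assert, and it can check that they were not altered in transit.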
As can be inferred from the preceding paragraphs, digital identities had to evolve from a single pair of user name/password to a very complex set of protocols that transport lots of user-related claims and attributes.
Because of this evolution of the use and sharing of identities, a new understanding of digital identities was needed. In a simplified, nontechnical view, a digital identity can be seen as a set of at least four layers (see Figure 1):
Figure 1. Layered view of a digital identity
1. Unique identifier
2. Credentials
3. Main profile
4. Context-based profile
In this model, the innermost layer is the unique identifier of this digital identity. There should be no identical unique identifiers in the same security domain. Users must have unique identifiers to be easily recognizable by any system to which they have access.
To have access to this identifier, the user must present to the security system a set of credentials that will be checked against the computing secrets stored in the identity database. This is the nearest that we can get in terms of proof of possession. A user is entitled to access their own unique identifier if and only if that user presents the correct set of credentials to unlock this magic number.
After the user unlocks the unique identifier, a set of common attributes (called the main profile) is accessible to a system that may require a little more knowledge than the unique identifier itself. This can be, for example, the user name, department, Social Security number, company name, and so on. These attributes do not vary from system to system; they are the same wherever the user logs on. They are a fixed set of values that are tied to the user during the lifetime of the logon session.
But not all systems need to share information. There are sets of information that are only meaningful in the context of a system or related systems. For example, frequent-flier miles are meaningful only under the context of an airline carrier, losing most of their semantic meaning if they are moved from one carrier to another. However, they share the same semantic context when they are used by the mileage program associates (restaurants, hotels, credit cards, and so on). These sets of attributes are stored in the context-based profile.
One digital identity can have only one unique identifier, a limited set of credentials (user name/password pair, digital certificate/PIN pair, biometric data), a unique set of main profile attributes, and an unlimited set of context-based profile attributes.
The separation of the unique identifier from the set of credentials that are used to access it allows us to evolve the security mechanisms without the need to create a different ID for the same user over time. It makes a unique identifier last for the lifetime of the individual who is the subject of the identity. This is also the base of the current identity-federation systems: No matter what protocols and credentials are used to unlock the unique identifier, the same value will be presented any time that it is required.
The separation of the unique identifier from the multiple sets of profile attributes also helps us evolve the semantic and syntactic meaning of these attributes without affecting the user ID. One can use binary or XML-encoded data to store and/or present user-profile attributes without interfering in the now stable unique identifier.
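The four layers can be expressed as a small data structure. The Python sketch below is one possible reading of the model, not a reference implementation: the class name `DigitalIdentity`, its field names, and the choice of a salted PBKDF2 hash for the password credential are all assumptions made for the example.

```python
from __future__ import annotations

import hashlib
import hmac
import os
import uuid
from dataclasses import dataclass, field


def _derive(password: str, salt: bytes) -> bytes:
    # Store a salted, derived secret rather than the password itself.
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


@dataclass
class DigitalIdentity:
    # Layer 1: unique identifier, stable for the lifetime of the identity.
    identifier: str = field(default_factory=lambda: str(uuid.uuid4()))
    # Layer 2: credentials, kept as credential type -> (salt, derived secret).
    credentials: dict[str, tuple[bytes, bytes]] = field(default_factory=dict)
    # Layer 3: main profile, the same wherever the user logs on.
    main_profile: dict[str, str] = field(default_factory=dict)
    # Layer 4: context-based profiles, keyed by the system they belong to.
    context_profiles: dict[str, dict[str, object]] = field(default_factory=dict)

    def set_password(self, password: str) -> None:
        salt = os.urandom(16)
        self.credentials["password"] = (salt, _derive(password, salt))

    def unlock(self, password: str) -> str | None:
        """Return the unique identifier only if the credential checks out."""
        stored = self.credentials.get("password")
        if stored is None:
            return None
        salt, secret = stored
        if hmac.compare_digest(secret, _derive(password, salt)):
            return self.identifier
        return None


user = DigitalIdentity(main_profile={"name": "Alice", "department": "Sales"})
user.set_password("first-password")
original_id = user.unlock("first-password")

user.set_password("rotated-password")                # the credentials evolve...
user.context_profiles["airline"] = {"miles": 12000}  # ...and so do the profiles...
print(user.unlock("rotated-password") == original_id)  # ...the identifier does not: True
```

Replacing the password, or adding a new context-based profile, leaves the identifier untouched, which is the property the two preceding paragraphs describe.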
Today, digital identities fall into two different, and sometimes incompatible, worlds: the managed and the unmanaged user spaces. What are the differences between them, and how can we build an effective digital-identity strategy?
Managed users are the ones to whom you can apply corporate security policies during the lifetime of their interactions with your network resources. They are full-time and part-time employees, vendors, trainees, and so on. They normally have signed a labor (or equivalent) contract with your company, and so they are subject to all corporate policies regarding working hours, safe Internet browsing, workstation and laptop security procedures, and so forth. The internal IT staff controls every device that they use and is able to block access to any network resource when it detects misbehavior. Also, managed users are subject to full auditing of their activities and are not the owners of the data that they store on local and network storage.
Unmanaged users live in a different world and have completely different security and privacy requirements. The unmanaged user is your customer or your business partner; it can be you, a managed user in your company’s internal network, but a customer (unmanaged user) at an online bookseller site. Unmanaged users own the information they produce, including their financial data, and your IT staff has limited room to act, except in cases of resource misuse or fraud. Essentially, they use your systems when they are connected to the Internet and do not have direct access to the resources in your corporate network. They can imagine what your network looks like, but you have to create and manage all necessary resources (or views) that hide the internal network’s topology and functional specifications from their eyes.
Unmanaged users pose a different set of challenges from managed users, centered on privacy. Privacy issues are difficult to handle, and exposing user data to theft can put your company’s reputation at risk. These challenges resulted in the proposal of Kim Cameron’s seven laws of identity (see http://www.identityblog.com/stories/2005/05/13/TheLawsOfIdentity.pdf for more information):
Law #1 User Control and Consent
Technical identity systems must only reveal information identifying a user with the user’s consent.
Law #2 Minimal Disclosure for a Constrained Use
The solution that discloses the least amount of identifying information and best limits its use is the most stable long-term solution.
Law #3 Justifiable Parties
Digital-identity systems must be designed so that the disclosure of identifying information is limited to parties that have a necessary and justifiable place in a given identity relationship.
Law #4 Directed Identity
A universal-identity system must support both “omnidirectional” identifiers for use by public entities and “unidirectional” identifiers for use by private entities—thus, facilitating discovery while preventing unnecessary release of correlation handles.
Law #5 Pluralism of Operators and Technologies
A universal-identity system must channel and enable the interworking of multiple identity technologies run by multiple identity providers.
Law #6 Human Integration
The universal-identity metasystem must define the human user to be a component of the distributed system integrated through unambiguous human-machine communication mechanisms offering protection against identity attacks.
Law #7 Consistent Experience Across Contexts
The unifying identity metasystem must guarantee its users a simple and consistent experience, while enabling separation of contexts through multiple operators and technologies.
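Laws 1, 2, and 4 lend themselves to a small illustration. The Python sketch below is not taken from the laws themselves; it is one possible way to approximate them, under the assumption that the identity provider holds a per-user secret. The function names `pairwise_identifier` and `disclose`, the site names, and the profile values are hypothetical.

```python
import hashlib
import hmac


def pairwise_identifier(user_secret: bytes, relying_party: str) -> str:
    """Derive a "unidirectional" identifier that differs for every relying party."""
    return hmac.new(user_secret, relying_party.encode("utf-8"), hashlib.sha256).hexdigest()


def disclose(profile: dict, requested: set, consented: set) -> dict:
    """Release only attributes that were both requested and approved by the user."""
    return {k: v for k, v in profile.items() if k in requested and k in consented}


secret = b"per-user secret held by the identity provider"  # placeholder value
id_for_bookseller = pairwise_identifier(secret, "bookseller.example")
id_for_airline = pairwise_identifier(secret, "airline.example")
print(id_for_bookseller != id_for_airline)  # True: no shared correlation handle

profile = {"name": "Alice", "email": "alice@example.com", "birth_date": "1980-01-01"}
released = disclose(profile, requested={"email", "birth_date"}, consented={"email"})
print(released)  # {'email': 'alice@example.com'}
```

Each site sees a stable identifier for the returning user, but two sites cannot compare notes to discover that they are dealing with the same person, and neither receives an attribute that the user did not approve.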
Faced with so many choices and challenges in identity management, some users are asking: Should this be at the core of my IT services, or should I delegate all of these activities to a third party? This question leads to deeper business questions: What is the value of all of these identities for my company? How much time and money do I spend on identity management, and what is the corresponding return on investment? In the world of Software + Services and Software as a Service, how do I create value from identity management? Is there value in outsourcing it? We must choose solutions to these problems carefully, because (as expected) they are highly dependent on each company’s business model and current technology infrastructure.
When drilling down the business justifications and current technology trends, one can reach the following conclusions:
1. Identity management is not a core business activity for the company.
2. There is no value that can be obtained from identity management.
3. Identity management is still a nontrivial technology, so that specific competencies must be nurtured inside the IT department.
4. Identities are becoming a commoditized IT function.
You have to understand how digital identities are being used by your systems and what kind of resources you are providing, allowing, or denying access to. When protecting internal network resources, it makes sense to retain the identity-management competency in-house, mainly because it is being used to build and maintain strict access-control lists (ACLs) and is generating tons of logged data that will be used later when required by law or by internal auditing procedures. On the other hand, customer access to internal resources is rare, and you can get much more value from consistent and up-to-date user-profile and historical interaction data than from the identity-management activities. Customers usually have access only to application-generated views of data and business logic, and do not interact directly with internal IT data or equipment.
There is another aspect when dealing with customer-related data. Some countries restrict the storage location of their citizens’ data to in-country hosted servers, demanding that you build and manage a local server infrastructure to handle that data and its associated tasks.
This brings us back to the digital-identity anatomy presented earlier. Companies can extract more value from the information contained in the main and context-based profiles than from the identifier and credentials layers. Only companies that are dedicated to the business of generating and managing digital identities (like digital-certificate issuers) generate value from the two innermost layers. The handling of the processes of authentication is being commoditized, with little value coming from them. The value lies in the information that can be generated from users, and not the processes that authenticate the user.
If you are building a new digital-identity strategy for your company, first think about the audiences with which you will be dealing and the infrastructure that your company will be using.
There are two audiences you will be exposed to: internal users and external customers. You cannot get rid of your internal users: They will be the ones who will help the company be successful, the ones who will create value for the business. They are part-time and full-time employees, vendors, and trainees. Depending on your business, you might not have a significant number of external online customers. Customers will always exist, but they might not always be represented by digital identities and might not be worth tracking online. However, if you are part of the IT team of an online bookseller, for example, you will have to think about how you will handle the IDs for external customers.
For internal users, you might build an identity-management infrastructure to handle all authentication, authorization, and internal network resource accesses, providing the correct auditing functions for future behavior and fraud analysis. However, if your company is a start-up and wants to minimize the costs of building and maintaining its own infrastructure (and if the regulations allow), you will probably use cloud services to host your computing needs. If so, use either state-of-the-art identity-management systems that are able to federate with external systems or identity cloud services (like Windows Live ID) to handle all of your internal users’ needs.
For external users, I tend to recommend the use of identity cloud services for managing IDs. As explained earlier, the value lies in the main and context-based profiles, and not in the identifier and credential layers. Managing these IDs yourself would mean building an infrastructure big enough to handle current and future customer audiences and dedicated solely to customer identity management. Also, think about the benefits of attracting more users, because you are using a system that already has millions of preconfigured users who will not have to register a new set of credentials and will not have to learn new procedures for authentication.
Digital identities and their use are still evolving, based on the evolution of online services that are provided by the ubiquity of the Internet. Identity management is no longer merely a set of procedures for authentication, authorization, and provisioning of user accounts. If you understand how a digital identity works and how the protocols are evolving over time, you will be able to build for your company a much stronger and lasting identity architecture. Today, building a new identity architecture means working with systems and protocols that allow identity federation—further enhancing the use of your internal and/or external identities.
Fernando Gebara Filho is an infrastructure architect in the Development and Platform team at Microsoft Brazil. He joined Microsoft in 1999 as an infrastructure consultant at the local Microsoft Consulting Services team, where he specialized in Active Directory and Microsoft Exchange projects. In 2004, Fernando joined the Development and Platform group in the infrastructure-architecture role, where he had a chance to get more involved in the identity-management space and get a more agnostic view of the subject.
This article was published in the Architecture Journal, a print and online publication produced by Microsoft. For more articles from this publication, please visit the Architecture Journal Web site.