February 14, 2001
In my last column, I discussed defining the vision for the Web Services guidance team's first sample project, the Favorites Service. I apologize for the long delay between columns; I've been out for the better part of a month with a nasty cold. I hope things are now back on track for regular weekly columns over the next couple of months.
To recap, the goal of the Favorites Service is to provide a way for applications to store end users' favorite links to Web sites in a safe, secure, central location, so that a user can access his or her favorites through these applications, regardless of which machine the user happens to be using. From a technical perspective, this seems like a pretty straightforward service to implement. It's basically just a specialized data store.
About the same time we started looking at the Favorites Service, there was a flurry of news articles about user privacy—specifically about the information third parties could collect through advertisements on Web pages. This got us thinking: The whole Web Services model is based on a Web page that uses third-party services, most likely without the knowledge of the end user. Were there privacy issues to worry about?
Even without a good definition of user privacy, we were able to come up with a few possible scenarios our proposed Favorites Service would enable that seemed questionable. Based on our initial research into the issues, we decided to implement the Favorites Service in phases, deferring the questionable scenarios to later phases. In this column, I'll discuss what we discovered during our initial research, the hard problems we've deferred, the privacy issues that remain in phase one of the project, and the impact of these on our design and implementation.
Let's start by looking at what we mean by user privacy, focusing on the Web. Whenever you use the Web, there are three kinds of information that might be exchanged between the application you are using (such as a Web browser) and the Web sites the application is connected to (such as the pages displayed in the browser):
- Information you create using some combination of applications, such as the e-mail you write, your vacation photos, your financial records, and so on.
- Information about you, such as your name, address, personal interests, and so on, collected by an application in order to provide services to you.
- Information about the machine and/or network connection you are using, such as an IP address, collected by an application in order to provide services to you.
The issue with user privacy is how this information is collected, used, and distributed. If you buy a book from an online bookstore, of course you will need to provide your name, address, and credit card number in order for the bookstore to complete the order. But what if the bookstore dumps this information in a database, along with records of the specific books you've purchased? On one hand, it could use the information to provide useful services, such as notifying you when new books by your favorite authors are published. On the other hand, it could sell your personal information, resulting in a flood of unwanted junk mail. What constitutes fair use of this information by the companies that provide the applications you use?
The situation becomes even more uncertain with Web Services. The end user probably doesn't even know that any Web Services are being used. If a Web Service collects information that can be tied to a specific end user (known as personally identifiable information), how does the service provider inform the user about what information is collected and how it may be used or distributed? Do applications that distribute personal information to Web Services need to disclose this to end users? Traditionally, businesses have not disclosed that they outsource particular aspects of their business processes. For example, a company might not disclose that order fulfillment or customer support is outsourced, although both the order fulfillment company and the customer support organization have access to personal information about customers. But the rules may be different online. Only time will tell…
Fair Information Practices
Fair information practices keep customers informed and in control of their personal information. Such information is protected from unwanted use, access, or distribution, so that customers are confident and satisfied when using a company's products. Our first step toward understanding what user privacy meant to the Favorites Service was to read up on Microsoft's fair information practices. Microsoft's Corporate Privacy Group defines five elements of fair information practices:
- Notice. Your company should define a clear policy regarding the collection, use, and distribution of personal information. This policy should include primary and secondary use of data, distribution of data across business divisions within the company, data sharing with affiliate and non-affiliate businesses, and contract obligations with vendors who support business transactions. The company should establish guidelines for policy changes and the impact of changes on data collected prior to the change. You'll want to work with your legal advisors to make sure the policy is something you can enforce in your Web sites and Web Services. Make the policy available to customers and users through multiple distribution channels, including online and offline.
- Consent. You should provide flexible and accessible mechanisms for users to manage their preferences for data collection, use, and distribution. You'll need to categorize information into reasonable and meaningful groupings so that users can figure out what they are consenting to, and so that it doesn't take too long for the user to set up preferences. It's important to think about the default values for user preferences. Does the user need to explicitly enable a particular use of personal information (known as opting in) or explicitly disable the use (known as opting out)?
- Access. The user should be able to view and/or edit any personal information you store, to ensure that it is kept up to date and to manage usage preferences. You'll need to figure out which information the user can edit and which information can only be viewed. For example, the user might not be allowed to edit a unique user identifier, but could be allowed to edit a password. Ideally the tools to manage personal information would be available to both online and offline users.
- Security. You should implement appropriate security measures to protect users' personal information. This includes authentication and authorization mechanisms to protect access to stored data. It also may include mechanisms to protect data during transmission between machines. Security measures should be proportional to the sensitivity of the information. For example, you'll be a lot more concerned about security if you're working with a user's bank account or medical records than if you're working with a list of his favorite authors.
- Enforcement. You should have procedures in place, such as auditing how personal information is accessed and used, to verify that your company actually follows its stated policy, along with a way to handle violations.
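To make the opt-in versus opt-out question from the Consent element concrete, here is a minimal Python sketch of per-user consent preferences with category-level defaults. The category names and the rule that unknown uses are denied are illustrative assumptions, not part of Microsoft's guidance.

```python
from dataclasses import dataclass, field
from typing import Dict

OPT_IN = "opt-in"    # disabled until the user explicitly enables it
OPT_OUT = "opt-out"  # enabled until the user explicitly disables it

# Hypothetical usage categories; a real service would define its own groupings.
CATEGORY_DEFAULTS: Dict[str, str] = {
    "store_favorites": OPT_OUT,               # the core function the user asked for
    "share_with_other_sites": OPT_IN,         # secondary use, off by default
    "anonymous_aggregate_analysis": OPT_IN,   # profiling-style use, off by default
}


@dataclass
class ConsentPreferences:
    user_id: str
    # Explicit choices the user has made; unlisted categories fall back to defaults.
    choices: Dict[str, bool] = field(default_factory=dict)

    def allows(self, category: str) -> bool:
        if category in self.choices:
            return self.choices[category]
        if category not in CATEGORY_DEFAULTS:
            # A use the user was never told about is never allowed implicitly.
            return False
        return CATEGORY_DEFAULTS[category] == OPT_OUT

    def set_choice(self, category: str, allowed: bool) -> None:
        if category not in CATEGORY_DEFAULTS:
            raise ValueError(f"unknown category: {category}")
        self.choices[category] = allowed
```

With this shape, a new user can store favorites immediately, but sharing stays off until `set_choice("share_with_other_sites", True)` is called on the user's behalf. Keeping the default policy in one table also makes it easy to state in the published privacy policy.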
Fair Information Practices and Favorites
This all sounded reasonable in theory, but it still wasn't completely clear to us how it applied to our Web Service, or how you would implement all these elements for Web Services in general. So I spent a few hours discussing the issues with a member of the Corporate Privacy Group. We started with a list of scenarios the Favorites Service could potentially enable (based on our initial vision statement):
1. A user installs an add-in to Internet Explorer that provides a set of menu options similar to the Internet Explorer Favorites menu, except that the favorites are actually stored at coldrooster.com. (If you read the last column, you know that we defined a business scenario around a consulting firm. We can now reveal that the name of this fictitious consulting firm is Cold Rooster Consulting, in honor of the rooster that has been hanging around our building on the Microsoft campus. Hence coldrooster.com.)
2. Coldrooster.com provides a Web application that lets users manage their favorites.
3. A Web site, say msdn.microsoft.com, provides a button on each of its pages that a user can click to add that page to the user's favorites stored at coldrooster.com.
4. Msdn.microsoft.com provides a Web page that displays a user's favorites, which were originally stored by msdn.microsoft.com on the user's behalf.
5. Msdn.microsoft.com provides a Web application that lets a user manage the favorites that were originally stored by msdn.microsoft.com on the user's behalf.
6. Cold Rooster Consulting periodically takes all the stored favorites, stripped of any information that links them back to a particular user, and dumps them into a separate database for analysis.
7. Msdn.microsoft.com provides a Web page that displays all of the favorites stored by a user, regardless of the Web site that originally stored the favorite on the user's behalf.
8. Msdn.microsoft.com provides a Web application that lets users manage all of their favorites.
9. Cold Rooster Consulting provides a separate Web Service that msdn.microsoft.com can license. This service lets licensees retrieve information such as "favorite favorites" or "people who saved this page also saved these pages," but only for the msdn.microsoft.com domain.
10. Cold Rooster Consulting provides the Web Service described in scenario 9, except that the recommendations returned to msdn.microsoft.com can include favorites from other domains.
Since we would need to link a user's favorites to their personal identification, such as an e-mail address or Microsoft Passport identifier, in order to make all the user's favorites available through any application and any machine, user favorites data definitely fell within the category of personally identifiable information. If we stuck with this definition of the Favorites Service, we would need to implement fair information practices through a combination of policy, procedures, and code.
From a security standpoint, user favorites don't fall into the same category as medical records, but a user will still want to have some control over who can access them. For example, by looking at the favorites I have stored on my home machine, you could find out which sports teams I support, what kinds of books I like to read, what kinds of music I like to listen to, and where I have my bank accounts—not information I want everyone in the world to have access to. And if anyone could modify my favorites, they could replace the links I've selected with other sites (possibly for nefarious purposes, such as intercepting confidential information) or add new links to my favorites. So we would definitely want to secure access to user favorites. And we'd probably want to let users specify which applications could read or write which favorites. For example, I might let MSDN modify my favorites for the msdn.microsoft.com domain, but I wouldn't want MSDN to even see the links for my favorite sports teams. Why should MSDN care about those?
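One way to model the per-application, per-domain control described above is a table of grants keyed by application and domain. This Python sketch assumes exact host-name matching (no subdomain rules) and hypothetical application identifiers such as "msdn"; it illustrates the idea rather than anything the team built.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple
from urllib.parse import urlparse

READ = "read"
WRITE = "write"


@dataclass
class FavoritesGrants:
    # Maps (application id, domain) to the permissions the user granted for it.
    grants: Dict[Tuple[str, str], Set[str]] = field(default_factory=dict)

    def grant(self, app_id: str, domain: str, *perms: str) -> None:
        self.grants.setdefault((app_id, domain.lower()), set()).update(perms)

    def is_allowed(self, app_id: str, url: str, perm: str) -> bool:
        # Exact host match only; "www.msdn.microsoft.com" would need its own grant.
        host = (urlparse(url).hostname or "").lower()
        return perm in self.grants.get((app_id, host), set())

    def visible_to(self, app_id: str, favorites: List[str]) -> List[str]:
        # Filter before returning, so an application never even sees
        # favorites outside the domains it was granted.
        return [url for url in favorites if self.is_allowed(app_id, url, READ)]
```

Filtering on the service side, rather than trusting the application to ignore what it shouldn't see, is what keeps MSDN from learning about the sports-team links at all.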
To let users control which applications could read or write which favorites, we would need to implement the consent and access elements of fair information practices. We'd also probably want to implement auditing code to support the enforcement element.
Suddenly our simple little Web Service doesn't sound so simple! What level of control should we give users? Should we let them specify exactly which applications can read or write favorites from each domain? Or should we group applications and domains into zones to simplify configuration? And which of the scenarios listed above should be enabled by default?
The remaining scenarios (6 through 10) become increasingly dicey from a privacy perspective. That's not to say that they shouldn't be implemented, just that it would be harder to write an accurate yet understandable policy statement, and users might not be comfortable with them, so they should probably be disabled by default (the user must opt in).
Scenario 7 initially sounds pretty innocuous, but what it really means from a Web Service perspective is that an application can get a copy of all of a user's favorites from the Favorites Service. Once the application has a copy of the data, it can do whatever it wants with it. If we supply a Web Service that supports this scenario, we'd probably want to restrict access to the Web Service to known clients with privacy policies that meet some minimal criteria.
Scenario 8 is even more problematic. Once an application has the ability to modify a user's favorites, what's to prevent the application from adding random pages to the user's list or deleting a favorite that points to a competitor's site? In other words, how can the Web Service distinguish valid service requests made by an application on behalf of an end user from service requests made by an application that the end user is unaware of? The available security mechanisms that work with HTTP and XML don't really support this kind of client/server/service scenario directly—we'd need to implement some custom security solution. Even with the custom security mechanism, there would probably be additional work required to provide a way for users to specify which applications could edit which favorites.
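One possible shape for such a custom mechanism, sketched here purely as an assumption: the end user authenticates to the Favorites Service and approves a grant, the service returns a signed token binding user, application, and domain, and the application must present that token with each write. The secret, token format, and function names are hypothetical; a production design would also need revocation and an unambiguous encoding of the signed fields.

```python
import hashlib
import hmac
import time
from typing import Optional

# Hypothetical secret known only to the Favorites Service.
SERVICE_KEY = b"replace-with-a-real-secret"


def _payload(user_id: str, app_id: str, domain: str, expires: int) -> bytes:
    # Simplified encoding; fields containing "|" would make this ambiguous.
    return f"{user_id}|{app_id}|{domain}|{expires}".encode("utf-8")


def issue_token(user_id: str, app_id: str, domain: str,
                ttl_seconds: int = 3600, now: Optional[int] = None) -> str:
    """Issued only after the end user has authenticated and approved the grant."""
    now = int(time.time()) if now is None else now
    expires = now + ttl_seconds
    sig = hmac.new(SERVICE_KEY, _payload(user_id, app_id, domain, expires),
                   hashlib.sha256).hexdigest()
    return f"{expires}:{sig}"


def verify_request(token: str, user_id: str, app_id: str, domain: str,
                   now: Optional[int] = None) -> bool:
    """True only if this app holds an unexpired grant from this user for this domain."""
    now = int(time.time()) if now is None else now
    try:
        expires_text, sig = token.split(":", 1)
        expires = int(expires_text)
    except ValueError:
        return False  # malformed token
    if now > expires:
        return False
    expected = hmac.new(SERVICE_KEY, _payload(user_id, app_id, domain, expires),
                        hashlib.sha256).hexdigest()
    return hmac.compare_digest(sig, expected)
```

Because only the service can produce a valid signature, an application cannot fabricate a write on behalf of a user who never approved it; it can only replay a grant the user actually gave.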
Finally, scenarios 9 and 10 go even further into the realm of online profiling than scenario 6 does. The technical issues really aren't any different from those already mentioned, but the user discomfort level would be even higher.
Based on this analysis of the scenarios, we decided to step back and rethink the vision for the initial delivery of the Favorites Service. The new vision for phase one focuses on scenarios 3 through 5 above. Essentially, each application has its own private store for user favorites. If I go to msdn.microsoft.com and store a link to this column, I can only view or edit that link through the user interface msdn.microsoft.com provides.
This approach eliminates several hard problems. In fact, it eliminates the entire user privacy issue as it relates to user favorites! Since each application that uses the Favorites Service effectively has a separate store of user favorites, there's no need for a global user identification scheme that the Favorites Service understands. Each application can use whatever kind of identifier it wants. The Favorites Service has no way of interpreting these identifiers or correlating information stored by different applications. Because the data can only be accessed by a single application (or, more precisely, a single licensee of the Favorites Service), we don't need to worry about providing a way for users to opt in or out of various scenarios. We've effectively delegated the issue of user privacy back to the calling application.
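The phase-one design can be sketched as a store whose key is the pair (licensee, user identifier), with the user identifier treated as an opaque string. This Python sketch uses an in-memory dictionary and hypothetical method names in place of the real data tier; the point is that the same identifier supplied by two licensees refers to two unrelated lists.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class FavoritesStore:
    """Each licensee gets its own namespace; user identifiers are opaque strings."""

    def __init__(self) -> None:
        # Key is (licensee id, licensee-supplied user key); the service never parses either.
        self._data: Dict[Tuple[str, str], List[str]] = defaultdict(list)

    def add(self, licensee_id: str, user_key: str, url: str) -> None:
        favorites = self._data[(licensee_id, user_key)]
        if url not in favorites:
            favorites.append(url)

    def favorites_for(self, licensee_id: str, user_key: str) -> List[str]:
        # .get() avoids creating empty entries; return a copy so callers
        # cannot mutate the stored list.
        return list(self._data.get((licensee_id, user_key), []))

    def remove(self, licensee_id: str, user_key: str, url: str) -> bool:
        favorites = self._data.get((licensee_id, user_key))
        if favorites and url in favorites:
            favorites.remove(url)
            return True
        return False
```

Note that there is deliberately no operation that takes a user key without a licensee ID, so there is no way to ask "what has this person saved everywhere?"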
That's not to say we don't care about solving the technical challenges raised in our analysis of the scenarios above. We do want to address these in a future phase of the Favorites Service. We just want to take a bit more time to think things through and come up with a solution we feel comfortable recommending to the developer community.
The Remaining Privacy Issues
Although we don't need to worry about user privacy with respect to user favorites in phase one, there are still some privacy issues to think about. We decided to license access to the Favorites Service. This means that we'll need to maintain some contact information about licensees. That information falls into the category of personally identifiable information. So we have the standard privacy issues that any application that maintains account information faces.
We've addressed these issues using a combination of policy and code. The following diagram provides a high-level view of our system architecture:
Figure 1. Favorites Service architecture in phase one
Our service is implemented with a layered architecture and is deployed on two physical tiers, the Web farm and the data cluster. Licensee account information is stored in a database on the data cluster. Our Web Service and the Web site through which licensees manage their account information are deployed on the Web farm. There are several layers of protection for licensee information:
- The data cluster is not accessible from machines outside Cold Rooster Consulting.
- The Favorites Service does not need to access licensee contact information, so it uses a Logon component to authenticate licensees. The Logon component only retrieves the information it needs.
- On the other hand, the license management Web site does need to access licensee contact information. How else can it let the licensee edit the data? The Web site performs all data access through the Licensing component. Access controls on the Licensing component prevent anything other than the license management Web site from calling the component.
- Access controls on the licensee database prevent anything other than the Logon component and Licensing component from accessing the database.
- Confirmation e-mails are sent to addresses specified in contact information whenever contact information is modified.
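The separation between the Logon and Licensing components can be sketched in Python as follows. The caller-name checks are only a stand-in for the platform and database access controls the column describes (a string argument is not a security boundary), and the class names, caller names, and PBKDF2 password hashing are assumptions for illustration.

```python
import hashlib
import hmac
import os
from dataclasses import dataclass
from typing import Dict, Optional, Set, Tuple


def hash_password(password: str, salt: bytes) -> bytes:
    # Store a salted PBKDF2 hash rather than the password itself.
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


@dataclass
class LicenseeRow:
    salt: bytes
    password_hash: bytes
    contact_email: str


def _require(caller: str, allowed: Set[str]) -> None:
    # Stand-in for real access controls; do not rely on caller names in production.
    if caller not in allowed:
        raise PermissionError(f"caller {caller!r} is not permitted")


class LicenseeDatabase:
    def __init__(self) -> None:
        self._rows: Dict[str, LicenseeRow] = {}

    def credentials(self, caller: str, licensee_id: str) -> Optional[Tuple[bytes, bytes]]:
        _require(caller, {"Logon"})  # Logon sees credentials only, never contact data
        row = self._rows.get(licensee_id)
        return (row.salt, row.password_hash) if row else None

    def contact(self, caller: str, licensee_id: str) -> Optional[str]:
        _require(caller, {"Licensing"})
        row = self._rows.get(licensee_id)
        return row.contact_email if row else None

    def save(self, caller: str, licensee_id: str, row: LicenseeRow) -> None:
        _require(caller, {"Licensing"})
        self._rows[licensee_id] = row


class LogonComponent:
    """Used by the Favorites Service; answers yes or no and exposes nothing else."""

    def __init__(self, db: LicenseeDatabase) -> None:
        self._db = db

    def authenticate(self, licensee_id: str, password: str) -> bool:
        creds = self._db.credentials("Logon", licensee_id)
        if creds is None:
            return False
        salt, stored = creds
        return hmac.compare_digest(stored, hash_password(password, salt))


class LicensingComponent:
    """Callable only by the license management Web site."""

    def __init__(self, db: LicenseeDatabase) -> None:
        self._db = db

    def register(self, caller: str, licensee_id: str, password: str, email: str) -> None:
        _require(caller, {"LicenseManagementSite"})
        salt = os.urandom(16)
        self._db.save("Licensing", licensee_id,
                      LicenseeRow(salt, hash_password(password, salt), email))

    def contact_email(self, caller: str, licensee_id: str) -> Optional[str]:
        _require(caller, {"LicenseManagementSite"})
        return self._db.contact("Licensing", licensee_id)
```

Each layer narrows what the next one can see: the Favorites Service only ever receives a Boolean from the Logon component, and only the license management Web site can reach contact data.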
Net effect: It should be very difficult for unauthorized users to access or modify the licensee contact information, unless a licensee's identifier and password are compromised. Even in that situation, if someone attempted to change contact information, the current contact would be informed.
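The confirmation e-mail rule can be sketched as a function that notifies the address on file before the change, plus the new address if it differs. The `send_mail` callback, the field names, and the decision to notify both addresses are assumptions; the column says only that confirmation e-mails go to the contact addresses and that the current contact is informed.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical mail hook: (to, subject, body).
SendMail = Callable[[str, str, str], None]


@dataclass(frozen=True)
class ContactInfo:
    name: str
    email: str
    phone: str


def update_contact(current: ContactInfo, updated: ContactInfo,
                   send_mail: SendMail) -> ContactInfo:
    if updated == current:
        return current  # nothing changed, nothing to confirm
    changed = [f for f in ("name", "email", "phone")
               if getattr(current, f) != getattr(updated, f)]
    body = ("The following contact fields were changed: " + ", ".join(changed)
            + ". If you did not make this change, please contact us.")
    # Always notify the address on file before the change, so someone who
    # swaps in their own address cannot suppress the warning.
    recipients = [current.email]
    if updated.email != current.email:
        recipients.append(updated.email)
    for to in recipients:
        send_mail(to, "Your contact information was changed", body)
    return updated
```

Notifying the old address is what limits the damage from a compromised password: the attacker can change the record, but not silently.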
User privacy is a thorny issue for developers of Web Services and the applications that use them. Our analysis of the problem for the Favorites Service caused us to rethink the entire objective for the service. Even with the reduced scope, a significant number of requirements were added in order to ensure that user information was protected from inappropriate use. The most important requirement was the need to restrict access to licensed applications. Next week, we'll look at licensing in more detail: the business models we considered, the model we selected, and the impact of the model on our design and implementation.
If your Web Service needs to maintain personally identifiable information, you have a lot of work to do beyond implementing the core functionality of your service. You need to address all five elements of fair information practices: notice, consent, access, security, and enforcement. You'll need to determine when you must address these directly with users, and when you can defer the privacy issues to the applications using your Web Service. I highly recommend involving your legal advisors in discussions regarding these issues to ensure that you are up to date regarding user privacy laws wherever your users are located. The following resources provide additional information about user privacy: