Licensee Requirements from Dev, Test, and Ops
March 14, 2001
In the past few columns, I've been looking at the kinds of requirements that your choice of business model can impose on your Web Services. In this column, we'll look at requirements from another perspective—your customer's. In particular, we'll look at the sorts of things that your customer's developers, testers, and operations staff need to make the most effective use of your Web Service.
Feedback on Your Comments
A few of you managed to find the column on authentication and authorization, and one reader had a question about the hashed logon key returned to the client application: "…as a direct comparison can be made against the clear and encrypted portions of the string, it is easily crackable. What measures would you propose to make this even more secure, and what would the performance overheads be?"
Recall that the purpose of hashing the key is to provide a way to detect most invalid keys without hitting the database of logon keys, mitigating the threat of a denial of service attack in which the attacker sends zillions of service requests to our Web Service with some random value for the logon key. To attack database availability, the attacker would need to generate a key that looked valid. Our keys have the form:
UuidToString(key) & Hex(Hash(key, secret))
Thus to generate a valid looking key, the attacker would need to guess the secret. One way to guess the secret is to intercept a valid key and crack it. However, if you're able to intercept valid keys, you don't need to crack them—you can just pass in the valid key you've intercepted. Since the key was generated by the Favorites Service, it will look valid even after it expires.
To mitigate the risk that an attacker will either guess the secret or use expired keys to launch denial of service attacks, we could periodically change the secret and propagate it to all the servers in our Web farm. This would have minimal overhead, assuming a reasonable interval for changing the secret. One issue is what to do about currently logged on clients. If we just change the secret, all current logon keys become invalid. We could either return an error and force all clients to log on again or validate against hashes computed with both the old and new secrets. Another approach would be to compute a more complex key that included a timestamp. If the timestamp were within the logon interval, the key would be considered valid enough to check against the database. There would be additional performance overhead on every operation due to the more complicated key calculation and validation procedure.
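To make the scheme concrete, here is a minimal sketch in Python. The hash algorithm, key layout, and function names are my assumptions for illustration, not the Favorites Service implementation; it shows key generation and the cheap validity check against both the current and previous secret:

    import hashlib
    import uuid

    def make_logon_key(secret: bytes) -> str:
        # UuidToString(key) & Hex(Hash(key, secret)); SHA-256 is an
        # assumption, any keyed hash shared by the servers would do.
        key = str(uuid.uuid4())
        return key + hashlib.sha256(key.encode() + secret).hexdigest()

    def looks_valid(logon_key: str, current: bytes, previous: bytes) -> bool:
        # A UUID string is 36 characters; the remainder is the hex hash.
        key, digest = logon_key[:36], logon_key[36:]
        # Accepting hashes made with either secret lets us rotate the
        # secret without forcing every logged-on client to log on again.
        return any(
            hashlib.sha256(key.encode() + s).hexdigest() == digest
            for s in (current, previous))

Only keys that pass this cheap check would be looked up in the logon-key database.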
A couple of people suggested that we could reduce vulnerability to replay or denial of service attacks by incorporating the client's IP address into the logon key. The problem with using the client's IP address is that if the client application is sitting behind a bank of proxy servers, the IP address might change on every request. That could result in a scenario where the client application logs on, then the next request is rejected as containing an invalid key. The client application logs on again, the next request is rejected, the client application logs on again, the next request is rejected…you get the idea.
If you believe there is a high risk of attacks against your Web Service, you should consider a system-level technique or third-party technique instead of writing your own application-level technique. Most of these techniques (and platform-specific implementations) have been thoroughly analyzed for vulnerabilities and have known best practices to mitigate the effect of attacks. Each of the techniques discussed in the column has strengths and weaknesses that need to be weighed against anticipated threats, consequences of attack, and compromises to usability and performance in order to select the technique that's right for your scenario.
Now let's turn our attention to this week's topic. There doesn't seem to be a standard term for the collection of stuff that your customer's developers, testers, and operations staff need to make the most effective use of your Web Service, so we'll just call it "licensee requirements" for now.
One key developer requirement is likely to be an easy-to-use programming model. A particular difficulty for Web Service developers is that client applications can be implemented on many different platforms using many different toolsets and programming languages. It's surprisingly easy for platform or toolset dependencies to creep into a Web Service interface (especially in the realm of error handling), to the extent that developers may not be able to easily construct requests or interpret responses using certain toolsets. We'll discuss the issue of API design in more detail in next week's column. For now, unless you know that all client applications will be written using a particular toolset or implemented for a particular platform, keep in mind that there's a potential issue here. One way to detect platform or toolset dependencies is to write your test applications using different toolsets. If possible, you should test using client applications running on a different platform as well.
Developers, testers, and operations staff will also want detailed error responses if a request to the service fails. If a parameter passed to the service is invalid, the error response should indicate which parameter is at fault. On the other hand, if an operation failed due to a problem at your server that the client application can't do anything about, there's no need to report the internal details of the error. The error response should make clear that the problem is with the server and provide sufficient information for the licensee's operations staff to initiate a troubleshooting conversation with your Web Service's operations staff.
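To illustrate (a sketch assuming SOAP 1.1 fault conventions; the faultstring wording and the reference ID idea are mine, not prescribed here), a helper that builds the two kinds of error response might look like this:

    # soap:Client faults name the offending parameter; soap:Server
    # faults hide internal details behind an opaque reference ID that
    # operations staff can quote when opening a support request.
    FAULT_TEMPLATE = """<soap:Envelope
      xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
      <soap:Body><soap:Fault>
        <faultcode>soap:{code}</faultcode>
        <faultstring>{message}</faultstring>
      </soap:Fault></soap:Body>
    </soap:Envelope>"""

    def client_fault(parameter: str, reason: str) -> str:
        return FAULT_TEMPLATE.format(
            code="Client",
            message="Invalid parameter '%s': %s" % (parameter, reason))

    def server_fault(reference_id: str) -> str:
        return FAULT_TEMPLATE.format(
            code="Server",
            message="Internal error; quote reference ID %s to support."
                    % reference_id)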
Assuming that your Web Service has an acceptable API, developers will need some documentation that tells them how to use it. The minimum documentation, of course, is the service description file. This file tells developers what messages your Web Service accepts and what messages, if any, it generates in response. It also indicates what communication protocols can be used to communicate with the Web Service and where to locate the Web Service's endpoints.
The service description file tells the developer a great deal about the syntax of messages, but very little about the semantics. In essence, the service description file is no different from the header files used by C and C++ developers. If you want developers to call your Web Service operations in the proper order with good values for parameters, you'll need to supply reference pages for each operation. Reference pages should describe the expected values for inputs, the kinds of outputs that may be generated, and any assumptions about the order that operations are called. Reference pages also help client application testers define test cases to cover error paths as well as normal execution paths.
A nice addition to the documentation would be a developer's guide suggesting how to call the Web Service efficiently. For example, instead of querying the Web Service over and over again, perhaps data can be cached by the client application. This improves performance for both your Web Service (because there are fewer incoming requests) and the client application (because it isn't making as many remote requests). If certain operations are time-consuming or return potentially large amounts of data, the guide could suggest ways to reduce the volume of data returned or ways to call operations asynchronously. You should explain how to handle errors client applications could normally expect to see. You should also think about issues the client application developer may encounter if their application is hosted on a Web farm. As you write applications to test your Web Service, you'll probably collect quite a bit of information to pull into a developer's guide.
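For example, the guide might show a simple time-bounded client-side cache along these lines (a sketch; the proxy object and get_favorites operation are hypothetical):

    import time

    class CachingClient:
        """Wraps a Web Service proxy and caches responses for a fixed
        lifetime, so repeated queries don't become repeated remote calls."""

        def __init__(self, proxy, ttl_seconds=300):
            self.proxy = proxy          # generated Web Service proxy object
            self.ttl = ttl_seconds
            self._cache = {}            # user_id -> (fetched_at, value)

        def get_favorites(self, user_id):
            hit = self._cache.get(user_id)
            if hit and time.time() - hit[0] < self.ttl:
                return hit[1]           # fresh enough: skip the remote call
            value = self.proxy.get_favorites(user_id)
            self._cache[user_id] = (time.time(), value)
            return value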
One good way to illustrate techniques discussed in your developer's guide is to provide sample code. You won't be able to provide sample code in every programming language for every toolset and platform, but you should try to cover the major toolsets you expect your customers to use. Consider whether you expect licensees to call your Web Service from server-side Web applications, client-side browser-based applications, or standalone client applications. There may be specific issues for each kind of application that you want to illustrate with sample code. For example, there may be security issues calling a Web Service from client-side script code. If you expect your licensees to do that, you might want to provide sample code to show them how it's done.
After the client application developer has written some code, he'll want a way to test his application. Here's another challenge. Do you really want pre-release client applications hitting the production Web Service and data stores? What if you charge money to access your Web Service—does every call to the service during client application development cost money? There are several approaches to addressing this issue. Each has strengths and weaknesses you'll need to consider before deciding which one(s) to supply.
One option is to supply a stub service that can be deployed locally. The stub service should perform exactly the same parameter checking as your production service. It should also provide a way to generate all the error responses your service generates. However, it should be easy to install on a typical developer or test machine. The advantage of this approach is that it lets the developer find and fix coding errors in the application without hitting your Web servers. However, this approach does not help identify communication problems across the Internet. Some additional development and test time will be required to implement the stub service, but there is no design or implementation impact on your production service.
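A sketch of what one stubbed operation might look like (the operation, its parameter rules, and the error-triggering input are all invented for illustration):

    # Local stand-in for the production operation: identical parameter
    # checks, plus a magic input that forces the server-error response
    # so client error-handling paths can be exercised offline.
    def stub_add_favorite(user_id: str, url: str) -> dict:
        if not user_id:
            raise ValueError("Invalid parameter 'user_id': must not be empty")
        if not url.startswith(("http://", "https://")):
            raise ValueError("Invalid parameter 'url': must be an absolute URL")
        if url == "http://trigger-server-error.test/":
            raise RuntimeError("Simulated internal server error")
        return {"status": "ok"}         # canned success response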
A second option is to deploy a test service in parallel to your production service. The test service can be exactly the same as your production service, but is deployed on different servers. A test service can be used to troubleshoot most communication problems, as well as coding errors in the client application. One drawback to this approach is that your operations staff needs to maintain a second installation of your service—one that could be brought down by buggy applications. Another potential problem is that licensees could use the test service from their production applications, instead of using your production service. To avoid this, you might need to define some limits on the test service behavior. For example, the test mode for a Web Service that searched a store catalog might operate against a database with 100 items instead of the 1,000,000 in the production data store. The test mode for a Web Service that retrieves weather information might operate against a database of week-old data. The test mode for a Web Service that stores information might limit the amount of data each client application can store. Depending on what kind of limits you define, additional development, test, or operations work may be required. Finally, you may also want to use a different billing model for the test service than your production service. This might also require additional work to implement and manage.
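Enforcing such limits need not be elaborate; a sketch (the specific caps are arbitrary):

    # Caps that keep the test service useful for testing but
    # unattractive as a free substitute for the production service.
    MAX_TEST_USERS_PER_LICENSEE = 25
    MAX_TEST_FAVORITES_PER_USER = 50

    def check_test_limits(user_count: int, favorites_count: int) -> None:
        if user_count >= MAX_TEST_USERS_PER_LICENSEE:
            raise PermissionError(
                "Test service limit: at most %d users per licensee"
                % MAX_TEST_USERS_PER_LICENSEE)
        if favorites_count >= MAX_TEST_FAVORITES_PER_USER:
            raise PermissionError(
                "Test service limit: at most %d favorites per user"
                % MAX_TEST_FAVORITES_PER_USER)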
A third option is to define special accounts or parameters to identify test mode requests. The advantage of this approach is that it can be used to troubleshoot problems communicating with your production service. A major disadvantage, of course, is that pre-release client applications are hitting the production service. In addition, developers may not be able to find and fix errors that occur with real accounts or parameter values. If you support this kind of test mode, you need to anticipate the additional load that developers will generate as they debug their applications. You must also implement the Web Service to anticipate receiving incorrectly formed messages or messages with invalid parameter values. You will need to protect against denial of service attacks launched by sending test mode requests. In addition, you will need to add logic to your Web Service to detect and handle test mode accounts or parameters.
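That detection logic might look something like this sketch (the account prefix and header name are hypothetical conventions):

    SCRATCH_STORE: dict = {}            # throwaway data for test-mode calls
    PRODUCTION_STORE: dict = {}         # real licensee data

    def route_request(account: str, headers: dict) -> dict:
        # A reserved account prefix or an explicit header marks the call
        # as test mode; such calls aren't billed and write to scratch data.
        is_test = (account.startswith("test-")
                   or headers.get("X-Test-Mode") == "1")
        store = SCRATCH_STORE if is_test else PRODUCTION_STORE
        return {"test_mode": is_test,
                "billable": not is_test,
                "uses_scratch_store": store is SCRATCH_STORE}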
Regardless of which option(s) you supply, you will also need to document how to install, configure, and/or use the test mode.
Licensees will also want to see documentation about known issues with your Web Service. The most important category of known issues is discrepancies between the documented behavior and your implementation. This helps developers and testers determine whether issues they have found are in their application or in your Web Service. Of course the list of issues will change over time, so you'll probably want to maintain this list on your Web site, rather than in a help file or printed documentation.
A second category of known issues is interoperability issues with specific toolsets that might be used to implement client applications. Developers can use this information to identify issues they need to work around with the toolset they are using. Testers will also find this information helpful while troubleshooting issues communicating with your Web Service. Since most toolsets are still in beta or have frequent Web releases and since you can't test with every toolset on every platform, this is another list that will change over time. Again, it would be a good idea to maintain this list on your Web site, not just in a help file or printed documentation.
In addition to guide material for developers writing client applications, testers may want some guidance regarding how to test applications that use Web Services. For example, you could provide a set of suggested test cases covering use of your Web Service to ensure that client applications handle both routine and catastrophic errors properly. You could also provide a set of troubleshooting procedures to help determine whether communication errors are due to problems at the client site, the server site, or somewhere in-between. This might include information about using your test mode, as well as information about how to monitor and interpret SOAP messages on the wire.
If there is anything that must be installed or configured on client machines in order for them to use your Web Service, licensee operations staff will want clear procedures that describe how to set things up. For example, if you use client certificates for authentication, the operations staff needs to know where to install and configure the client certificate. If you are using IP security, the operations staff may need to contact someone with a list of IP addresses to associate with a license before the client application can access the Web Service.
One challenge with providing deployment procedures is that the exact procedures will vary from one licensee to another. However, you should at least be able to supply checklists or baseline procedures that can be customized by the licensee.
Similarly, licensee operations staff will want clear troubleshooting procedures to follow if calls to your Web Service start failing. Again, it can be difficult to supply exact procedures, but you should be able to come up with baseline procedures that can be customized by the licensee.
The troubleshooting procedures for operations staff are somewhat different than the procedures mentioned earlier for testing. The goal of these procedures should be to help the operations staff determine as quickly as possible whether they can resolve the problem themselves (using a supplied recovery procedure), whether they need to call in someone to troubleshoot the client application, or whether they need to initiate a support request with your Web Service support staff. The procedures you supply should specify exactly how to initiate support requests, including how to contact support (phone number, e-mail, fax, URL, etc.), what account information must be supplied (licensee name, password, etc.), and what information to supply with the problem description (error messages, etc.).
In addition to the procedures you define, you may want to supply diagnostic tools such as wizards to guide operations staff through the troubleshooting procedure or tools to capture message traces to send to your support staff.
Testers and operations staff may also want information about the expected response time of the various operations provided by your Web Service. This is difficult to provide, because many of the variables affecting response time are out of your control.
If you decide to provide expected response time information, you should clearly state that the values you supply measure the expected processing time within the Web Service and do not include the time required to send the request and receive the response.
One way to help testers and operations staff monitor performance variations would be to supply a "null" Web Service. This service would be deployed on the same Web farm as your production service. The operations exposed by the service would not do any work; they would simply return a response as quickly as possible, without accessing any remote resources or data stores. Licensees could monitor response times for calls to the null service to analyze network speed.
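On the licensee's side, timing calls to the null service is straightforward; a sketch (the endpoint URL and sample count are hypothetical):

    import time
    import urllib.request

    def average_round_trip(null_url: str, samples: int = 5) -> float:
        """Average round-trip time, in seconds, to the do-nothing
        operation. Because the null service does no work, this is
        essentially pure network and HTTP overhead to the Web farm."""
        total = 0.0
        for _ in range(samples):
            start = time.time()
            urllib.request.urlopen(null_url).read()
            total += time.time() - start
        return total / samples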
Another approach would be to supply a profiling mode for your Web Service that could return the processing time for the call in addition to the normal response values. This could be implemented using the SOAP Header mechanism: a client would indicate it wanted profiling information by including a specified Header in the SOAP request, and the processing time would be returned in an equivalent Header in the SOAP response. This would enable someone on the client side to compare overall response time to the processing time on the server. If there's a huge difference, the performance problem is somewhere in the network.
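The client-side arithmetic is then simple; a sketch (the SOAP plumbing is elided, and the assumption is that invoke() returns the server time read from the profiling Header):

    import time

    def profile_call(invoke):
        """invoke() makes the SOAP call with the profiling Header set and
        returns (result, server_seconds), where server_seconds comes from
        the profiling Header in the response."""
        start = time.time()
        result, server_seconds = invoke()
        overall = time.time() - start
        # A large gap between overall and server time points at the network.
        return result, overall, server_seconds, overall - server_seconds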
Even if you choose not to provide information about expected performance, you should let licensee operations staff and testers know about changes to expected performance. At a minimum, there should be an easy way for the licensee to determine whether your service is online. If your Web Service is mission critical, you should provide both online and offline access to server status information. If the licensee can't reach your Web Service, they may not be able to reach your Web site either. An alternative would be to provide a phone number to call for current server status information.
You might also want to provide information about temporary changes to expected performance. For example, if you experience a rapid increase in average load, you might let licensees know they may see decreased performance until you are able to bring additional hardware online. If you have operations that are handled asynchronously and a backlog of pending requests has queued up, you could let licensees know that response times will be slow until the backlog is handled.
Finally, licensees will want to know in advance about scheduled downtime for server maintenance. You might want to send notices to licensee operations staff via e-mail, rather than relying on the operations staff visiting your Web site to read the latest announcements.
I've already mentioned customer support a few times, but it's worth pointing out separately. The more mission critical—or expensive—your Web Service is, the more support licensees will expect. At a minimum this would include support for operations issues. You might also want to provide developer support.
Setting up the infrastructure to track support requests and staffing your support mechanisms can be an expensive, time-consuming effort. Figure out what kind of support you want to provide and how licensees can submit requests early in the project lifecycle, so customer support is operational when your Web Service goes live.
To summarize, a Web Service should be accompanied by a "software development kit" (SDK). The SDK should explain how to use the Web Service and possibly provide tools to help developers and testers troubleshoot their applications before they start hammering your production Web Service. Contents of an SDK might include:
- A copy of the service description file, and a reference to the location of the latest service description file.
- Reference pages for each operation supported by the Web Service.
- A Developer's Guide describing best practices for writing client applications.
- Sample code that shows how to write client applications.
- Documentation describing how to install and/or use a test mode service.
- Additional guide material about how to test client applications.
- Information on expected performance. You might also provide a null service or a profiling mode to help analyze performance of specific calls.
A Web Service should also be accompanied by a "user's manual." The user's manual should include:
- Deployment procedures to ensure client applications are recognized as valid licensees.
- Troubleshooting procedures for operations staff.
- Procedures for contacting customer support if local troubleshooting is unable to resolve a problem.
In addition, you should provide information on your Web site such as:
- Lists of known issues, including bugs and interoperability issues with specific toolsets.
- Information about current server status.
Finally, you should consider providing customer support for operations issues and potentially application development.
Most of these requirements will not impact the design of your Web Service. However, they do represent a substantial amount of work that needs to be accounted for in your project schedule. It's a good idea to do this work in parallel with development of your Web Service. You'll find that your testers will need the reference pages in order to construct good test cases, and that the test team will be an excellent source of guide material, sample applications, and troubleshooting procedures that can be cleaned up for release to your customers.
Licensee Requirements for the Favorites Service
When we first started thinking about licensee requirements for the Favorites Service, we really didn't have any example Web Services to look at for ideas. The preceding discussion summarizes the ideas we came up with based on experiences with traditional APIs and components, as well as conversations with other teams at Microsoft.
Cold Rooster Consulting's Approach
The key decision that might impact the design of our Web Service was what kind of test mode and diagnostic services to provide. As Cold Rooster Consulting, we decided to use the parallel test service approach. Client applications would still need to log on to access the test service (using the regular licensee credentials); however, they would connect to a different endpoint, and usage would not have any impact on licensing fees. To keep client applications from using the test service as a production service, we would arbitrarily limit the number of users per licensee and the number of favorites per user. We decided not to provide a null service or profiling mode in the initial release. This approach has minimal impact on the design and implementation of the Favorites Service, at the expense of additional operations work.
We also decided to provide operations support by phone and ad hoc developer support through a newsgroup. Support information, such as known issue lists and current server status, would be provided on our Web site. Current server status would also be available by telephone.
The functional specification calls for a developer kit including the following items:
- Reference page listing the locations of service description files for all Web Services.
- Reference pages describing each operation exposed by all Web Services.
- A developer's guide explaining how to use the Web Services from a client application.
- A test guide explaining how to test usage of the Web Services in a client application.
- Instructions for accessing the test service and a description of the limitations of the test service.
- Lists of known issues, including interoperability issues with specific toolsets.
- Sample client application illustrating user favorites management operations, implemented using the SOAP Toolkit 2.0, with full source code.
- Sample client application illustrating report operations, implemented using the SOAP Toolkit 2.0, with full source code.
- Troubleshooting procedures for operations staff.
- Information about customer support offerings.
We did not separate out a user's manual, but that information is included in the developer kit. In addition, the functional specification calls for the following information to be available on Cold Rooster Consulting's Favorites Web site:
- Discovery document for all Web Services.
- Service description files for all Web Services.
- Online version of all documentation from the developer kit, including current lists of known issues.
- Information about current server status.
None of these requirements impact the design or implementation of the Favorites Web Services. However, there is a considerable amount of documentation to create and get online. We expect that some of the documentation will be fairly sketchy in the initial release. For example, we may not have a lot of information about interoperability issues. However, since the documentation is online, we will be able to update it as we acquire new information.
Differences in the MSDN Sample
Although we approached requirements analysis and design from the perspective of Cold Rooster Consulting, the reality is that the Favorites Service is being implemented as a sample for MSDN. We do not have a large operations staff or the ability to provide phone support. And since we aren't charging money to access the production service, we don't see any need to provide a test mode. So while our design is compatible with deploying a test service, we won't actually do it. On the other hand, we will provide a single machine install of the complete Favorites Service so you can try it out locally.
We won't be providing phone-based support or phone-based server status information, but we will respond to issues posted on the MSDN Web Services Guidance Feedback newsgroup. Since the Favorites Web site will be deployed on the same Web farm as the Web Services, you'll just have to assume that if you can't reach the site, the service is down.
It can be easy to overlook the supporting tools and documentation your customer's developers, testers, and operations staff need to make the most effective use of your Web Service. While their requirements may not have a significant impact on your Web Service design or implementation, fulfilling the requirements can take a significant amount of work. If you're not sure exactly who your customers will be, try thinking about what tools and documentation you would want at each stage of the project lifecycle if you were writing an application that used your Web Service. The better job you do of providing these tools and documentation up front, the lower your support costs should be.
Next week we'll turn our attention to the most important requirements of your Web Service—the functionality you want to expose to other applications. In particular, we'll look at issues related to defining a message-based programmatic interface that will be used by client applications hosted on different platforms, using different implementations of the SOAP specification, running on machines located around the world, communicating over potentially slow or unreliable connections to the Internet.