ACS Retry Guidelines
Published: January 9, 2013
Updated: February 21, 2014
Applies To: Windows Azure
Windows Azure Active Directory Access Control (also known as Access Control Service or ACS) supports a number of different token issuance and management endpoints to which clients can send token requests. This topic defines guidelines for implementing retry logic when token requests fail.
Token request failures that return HTTP 500-series error codes typically succeed when retried. In some scenarios, the client is an application or service that makes automated requests to ACS. In other scenarios, such as web-based federation that uses the WS-Federation protocol, the client is a web browser and the end user must retry the operation manually. This topic covers error-handling scenarios in which the client is an application or service.
These scenarios include:
Management operations that use the ACS Management Service
Token requests for Web Services using the OAuth WRAP protocol (see How to: Request a Token from ACS via the OAuth WRAP Protocol)
Token requests for Web Services using the OAuth 2.0 protocol (see Code Sample: OAuth 2.0 Certificate Authentication)
The following guidelines explain how to implement retry logic in the error-handling scenarios.
Guideline #1: Implement retry logic based on HTTP 500-series error responses
Retry logic is strongly recommended when ACS returns HTTP 500-series errors. The following list includes examples of typical HTTP 500-series errors.
HTTP Error 500 - Internal Server Error
HTTP Error 502 - Bad Gateway
HTTP Error 503 - Service Unavailable
HTTP Error 504 - Gateway Timeout
Although individual HTTP codes can be enumerated in the retry logic, it is sufficient to invoke retry logic if any HTTP 500-series error is returned.
Retry logic should be triggered by HTTP error codes, such as HTTP 504 (Gateway Timeout), and not by ACS error codes, such as ACS90005. ACS error codes are informational and subject to change.
Typically, retry logic is not recommended when HTTP 400-series error codes are returned. A 400-series HTTP error response from ACS means that the request is invalid and must be revised before it is sent again. One exception is HTTP 429 (Too Many Requests), which indicates that the namespace has exceeded the token request rate limit for an extended period. For 429 errors, retrying with a back-off timer can clear the immediate backlog of token requests until the administrator has time to review and revise the namespace workload distribution. For more information, see ACS Service Limitations.
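The decision rule in this guideline can be sketched as a single predicate. This is a minimal Python sketch. It assumes the client already has the numeric HTTP status code of the failed response. The function name `should_retry` is hypothetical and is not part of any ACS library. The sketch deliberately ignores ACS error codes such as ACS90005.

```python
def should_retry(status_code: int) -> bool:
    """Return True if an HTTP status code from ACS warrants a retry.

    Hypothetical helper: the decision uses only the HTTP status code,
    never informational ACS error codes such as ACS90005.
    """
    if 500 <= status_code <= 599:
        # Any 500-series error: a server or gateway problem that may be transient.
        return True
    if status_code == 429:
        # Too Many Requests: retry, but only with a back-off timer.
        return True
    # Other 400-series codes mean the request itself must be revised;
    # success codes need no retry at all.
    return False


print([code for code in (500, 503, 429, 400, 404) if should_retry(code)])  # [500, 503, 429]
```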
Guideline #2: Retries should use a back-off timer for optimal flow control
When a client receives an HTTP 500-series error, it should wait for a specified period of time before retrying the request. For best results, increase this waiting period with each subsequent retry. This approach lets short-lived errors resolve quickly while reducing the request rate during network or server issues that take longer to resolve.
For example, use an exponential back-off timer, in which the delay before each retry doubles: Retry 1: 1 second, Retry 2: 2 seconds, Retry 3: 4 seconds, and so on.
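The exponential schedule above can be computed as in the following minimal sketch. The function `backoff_delay`, its 1-second default base, and its 60-second cap are illustrative assumptions and are not values defined by ACS. The cap simply keeps later delays from growing without bound.

```python
def backoff_delay(retry_number: int, base_seconds: float = 1.0,
                  max_seconds: float = 60.0) -> float:
    """Return the wait in seconds before the given retry (1-based).

    Hypothetical helper: the delay doubles with each retry
    (base, 2*base, 4*base, ...) and never exceeds max_seconds.
    """
    if retry_number < 1:
        raise ValueError("retry_number must be >= 1")
    delay = base_seconds * (2 ** (retry_number - 1))
    return min(delay, max_seconds)


# Delays for the first five retries with the default settings.
print([backoff_delay(n) for n in range(1, 6)])  # [1.0, 2.0, 4.0, 8.0, 16.0]
```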
Adjust the number of retries and the time between them to suit your user experience requirements. As a general guideline, we recommend up to five retries over a period of five minutes. Note that failures caused by a timeout take longer to resolve.
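The following sketch combines these guidelines into a single retry loop. It assumes a hypothetical `send_request` callable that performs one token or management request and returns its HTTP status code. The parameter names and defaults are illustrative: five retries, a 300-second overall budget, and a 1-second base delay. The `sleep` and `clock` parameters exist only so that the loop can be tested without real waiting.

```python
import time
from typing import Callable


def send_with_retries(
    send_request: Callable[[], int],
    max_retries: int = 5,
    total_budget_seconds: float = 300.0,
    base_delay_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> int:
    """Call send_request, retrying 500-series and 429 responses with back-off.

    Makes at most 1 + max_retries attempts and stops early if the next
    delay would push the total elapsed time past total_budget_seconds.
    Returns the last HTTP status code observed.
    """
    start = clock()
    status = send_request()
    for retry in range(1, max_retries + 1):
        retryable = 500 <= status <= 599 or status == 429
        if not retryable:
            return status  # success, or a 400-series error that must be fixed
        # Exponential back-off: 1x, 2x, 4x, ... the base delay.
        delay = base_delay_seconds * (2 ** (retry - 1))
        if clock() - start + delay > total_budget_seconds:
            break  # waiting again would exceed the overall retry budget
        sleep(delay)
        status = send_request()
    return status


# Example: a fake request that fails twice with 503 and then succeeds.
responses = iter([503, 503, 200])
waits = []
result = send_with_retries(lambda: next(responses), sleep=waits.append)
print(result, waits)  # 200 [1.0, 2.0]
```

With a 1-second base delay, five retries wait 1 + 2 + 4 + 8 + 16 = 31 seconds in total. A larger `base_delay_seconds` spreads the retries closer to the five-minute window.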
Guideline #3: Check whether the item exists before retrying a create or delete operation
When you perform create or delete operations with the ACS Management Service, such as creating a new relying party application or deleting a rule, the retry logic should check whether the item exists before it repeats the operation. In some circumstances, such as a transient network failure that occurs while the server response is being delivered, a create or delete operation can succeed even though the client receives an error response.
If a create operation is retried without checking for the existence of the item, duplicate items can be created. Also, if the item must be unique, the system might return an HTTP 400 error.
If a delete operation is retried without checking for the existence of the item, the system might return an HTTP 400 error when it cannot find the item.
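The following sketch illustrates the check-before-retry pattern. `FakeManagementService` is a hypothetical in-memory stand-in and is not the ACS Management Service API. Its `exists`, `create`, and `delete` methods are assumptions for illustration. The first change it applies raises `TransientError` after succeeding, which simulates a response lost in transit. Back-off delays are omitted for brevity; a real client would wait between attempts as described in Guideline #2.

```python
class TransientError(Exception):
    """Stands in for an HTTP 500-series response or a dropped connection."""


class FakeManagementService:
    """Hypothetical in-memory stand-in for the ACS Management Service.

    The first create() or delete() applies the change and then raises
    TransientError, simulating a response that was lost after success.
    """

    def __init__(self):
        self.items = set()
        self.fail_next = True

    def exists(self, name):
        return name in self.items

    def create(self, name):
        if name in self.items:
            raise ValueError(f"HTTP 400: {name} already exists")
        self.items.add(name)
        self._maybe_fail()

    def delete(self, name):
        if name not in self.items:
            raise ValueError(f"HTTP 400: {name} not found")
        self.items.remove(name)
        self._maybe_fail()

    def _maybe_fail(self):
        if self.fail_next:
            self.fail_next = False
            raise TransientError("response lost after the change was applied")


def create_with_retries(service, name, max_retries=5):
    """Create an item; before each retry, check whether it already exists."""
    for attempt in range(max_retries + 1):
        if attempt > 0 and service.exists(name):
            return  # an earlier attempt succeeded despite the error response
        try:
            service.create(name)
            return
        except TransientError:
            if attempt == max_retries:
                raise
            # A real client would wait here with a back-off timer.


def delete_with_retries(service, name, max_retries=5):
    """Delete an item; before each retry, check whether it is already gone."""
    for attempt in range(max_retries + 1):
        if attempt > 0 and not service.exists(name):
            return  # an earlier attempt succeeded despite the error response
        try:
            service.delete(name)
            return
        except TransientError:
            if attempt == max_retries:
                raise
            # A real client would wait here with a back-off timer.


svc = FakeManagementService()
create_with_retries(svc, "rp1")
print(svc.items)  # {'rp1'}
svc.fail_next = True
delete_with_retries(svc, "rp1")
print(svc.items)  # set()
```

Each retry first checks whether the earlier attempt already took effect. As a result, neither helper sends a second create or delete that would fail with an HTTP 400 error.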