
Windows Azure Traffic Manager Overview

[This topic contains preliminary content for the CTP release of Windows Azure Traffic Manager. To begin using the feature, go to the Virtual Network tab located in the Windows Azure Management Portal.]

The Windows Azure Traffic Manager allows you to control the distribution of user traffic to Windows Azure hosted services. The hosted services can run in the same data center or in different data centers across the world. Traffic Manager works by applying an intelligent policy engine to Domain Name System (DNS) queries for your domain names.

The conceptual diagram (Figure 2) demonstrates Traffic Manager routing. A user browses to the company domain, www.contoso.com, and the request is eventually serviced by a hosted service. The Traffic Manager policy dictates which hosted service receives the request. Although Traffic Manager conceptually routes traffic to a given hosted service, the actual process is different because it relies on DNS: no service traffic passes through Traffic Manager. When Traffic Manager resolves the DNS entry for the company domain to the IP address of a hosted service, the user's computer calls that hosted service directly.


Figure 2 - Traffic Manager routing: conceptual diagram



Figure 3 – Traffic Manager routing

The numbers in the above diagram (Figure 3) correspond to the numbered descriptions below:

  1. User traffic to company domain: The user requests information using the company domain name, and the typical process of resolving a DNS name to an IP address begins. Company domains must be reserved through normal Internet domain name registration and are maintained outside of Traffic Manager. In this diagram, the company domain is www.contoso.com.

  2. Company domain to Traffic Manager domain: The DNS resource record for the company domain points to a Traffic Manager domain maintained in Windows Azure Traffic Manager. In the example, the Traffic Manager domain is contoso.trafficmanager.net.

  3. Traffic Manager domain and policy: The Traffic Manager domain is part of the Traffic Manager policy. Traffic enters through the domain. The policy dictates how to route that traffic.

  4. Traffic Manager policy rules processed: The Traffic Manager policy uses the chosen load balance method and monitoring status to determine which Windows Azure hosted service should service the request.

  5. Hosted service domain name sent to user: Traffic Manager returns the DNS name of the chosen hosted service to the user. The user’s local DNS resolver resolves that name to the IP address of the chosen hosted service.

  6. User calls hosted service: The user calls the chosen hosted service directly using the returned IP address. Since the company domain and resolved IP address are cached on the client machine, the user continues to interact with the chosen hosted service until its local DNS cache expires.
    It is important to note that the client resolver in Windows caches DNS host entries for the duration of their time-to-live (TTL). When you evaluate Traffic Manager policies, host entries retrieved from the cache bypass the policy, so you could observe unexpected behavior. After the TTL of a cached DNS host entry expires, new requests for the same host name should cause the client resolver to execute a fresh DNS query. However, browsers typically cache these entries for longer periods, even after their TTL has expired. To observe the behavior of a Traffic Manager policy accurately when accessing the application through a browser, force the browser to clear its DNS cache before each request.

  7. Repeat: The process repeats itself when the client’s DNS cache expires. The user may receive the IP address of a different hosted service depending on the load balancing method applied to the policy and the health of the hosted service at the time of the request.
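To make the caching effect in steps 6 and 7 concrete, here is a minimal Python sketch of a client-side cache that honors TTLs. It is an illustration under simplifying assumptions, not the Windows resolver or any browser's implementation: `TtlDnsCache` and `fake_policy` are hypothetical, and strings such as "hs-a" stand in for IP addresses. Within the TTL, lookups are answered from the cache and never reach the Traffic Manager policy.

```python
import time
from typing import Callable, Dict, Tuple

# Hypothetical resolver signature: takes a host name and returns
# (address, ttl_seconds), standing in for a full DNS query that reaches
# the Traffic Manager policy.
Resolver = Callable[[str], Tuple[str, int]]


class TtlDnsCache:
    """Minimal client-side DNS cache that honors each entry's TTL."""

    def __init__(self, resolve: Resolver,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._resolve = resolve
        self._clock = clock
        # host name -> (cached address, absolute expiry time)
        self._entries: Dict[str, Tuple[str, float]] = {}

    def lookup(self, name: str) -> str:
        now = self._clock()
        cached = self._entries.get(name)
        if cached is not None and now < cached[1]:
            # Cache hit: the answer is reused and the policy is not consulted.
            return cached[0]
        # Miss or expired entry: issue a fresh query and cache the answer.
        address, ttl = self._resolve(name)
        self._entries[name] = (address, now + ttl)
        return address


if __name__ == "__main__":
    # A fake policy that hands out a different hosted service on every query.
    answers = iter(["hs-a", "hs-b", "hs-c"])

    def fake_policy(name: str) -> Tuple[str, int]:
        return next(answers), 300  # 300-second TTL, the default cited in this topic

    now = [0.0]
    cache = TtlDnsCache(fake_policy, clock=lambda: now[0])
    print(cache.lookup("www.contoso.com"))  # hs-a (fresh query)
    now[0] = 299.0
    print(cache.lookup("www.contoso.com"))  # hs-a (served from cache)
    now[0] = 300.0
    print(cache.lookup("www.contoso.com"))  # hs-b (TTL expired, fresh query)
```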

The following items include additional details about this process:

You can choose one load balancing method per policy. You cannot mix different methods in a given policy. The load balancing methods are:

  1. Performance: Select Performance when you have hosted services in different geographic locations and you want to use the "closest" hosted service.

    For more information, see Performance load balancing method in Windows Azure Traffic Manager.

  2. Failover: Select Failover when you have hosted services in the same or different datacenters and want to use a primary service for all traffic, but provide backups in case the primary or the backup hosted services go offline.

    For more information, see Failover load balancing method in Windows Azure Traffic Manager.

  3. Round Robin: Select Round Robin when you want to distribute load equally across a set of hosted services in the same datacenter or across different regions.

    For more information, see Round Robin load balancing method in Windows Azure Traffic Manager.

The following sections outline the details of each of the load balancing methods.

The Performance load balancing method knows the origin of the traffic and routes it to the closest hosted service. “Closeness” is determined by a network performance table that records the round-trip time between various IP addresses and each Windows Azure data center. This table is updated periodically and is not meant to be a real-time reflection of performance across the Internet. This method does not take into account the load on a given hosted service. However, Traffic Manager monitors your hosted services using the monitoring settings you choose and does not include them in DNS responses if they are down.

The following graphic shows the general process for the Performance load balancing method.


Figure 4 - Conceptual flow of the Performance load balancing method for Windows Azure Traffic Manager

The following numbered steps correspond to the numbers in the diagram (Figure 4).

  1. Performance Times Table built periodically: The Traffic Manager infrastructure runs tests to determine the round-trip times between different points in the world and the Windows Azure data centers that run hosted services. These tests are run at the discretion of the Windows Azure system.

  2. Incoming request: Your Traffic Manager domain receives an incoming request from a client computer.

  3. Performance table lookup: Traffic Manager looks up the round trip time between the location of the incoming request and the hosted services that are part of your policy using the table created in step 1.

  4. Best performing service chosen: Traffic Manager determines the location of the hosted service with the best time. In this example, that is HS-D.

  5. DNS name returned: Traffic Manager returns the DNS name of hosted service D (HS-D) to the client machine.

  6. Service call: The client computer resolves the DNS name to the IP address and calls the hosted service.

Points to note:

  • When the performance table is updated, you may notice a difference in traffic patterns and load on your hosted services. These changes should be minimal.

  • The DNS Time-to-Live (TTL) informs the Local DNS resolvers how long to cache DNS entries. Clients will continue to use a given hosted service until its local DNS cache expires.

  • Traffic Manager monitors the state of the hosted services in your policies and will not send traffic to those it sees as down. If all hosted services in the policy are detected to be offline, Traffic Manager acts as if all hosted services in the policy are actually online so it will at least resolve the DNS name to an IP address, even though the client computer will call a hosted service that may not respond. For more information about how hosted services are monitored, see Monitoring hosted services in Windows Azure Traffic Manager.

  • For information about the Service Management API for Traffic Manager, see Operations for Traffic Manager.
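The Performance lookup in steps 3 and 4, together with the offline handling described in the notes above, can be sketched in Python as follows. The table layout, region and data center labels, and round-trip times are hypothetical; Windows Azure builds and refreshes the real performance table itself, and this is not Traffic Manager's implementation.

```python
from typing import Dict, Set

# Hypothetical performance table: client region -> data center -> round-trip
# time in milliseconds. The real table is built and refreshed by Windows Azure.
PerformanceTable = Dict[str, Dict[str, float]]


def choose_by_performance(client_region: str,
                          table: PerformanceTable,
                          service_locations: Dict[str, str],
                          online: Set[str]) -> str:
    """Pick the hosted service whose data center has the lowest round-trip
    time from the client's region, skipping services marked offline."""
    candidates = [s for s in service_locations if s in online]
    if not candidates:
        # Every service is offline: treat them all as online so the name
        # still resolves, as described in the notes above.
        candidates = list(service_locations)
    rtt = table[client_region]
    return min(candidates, key=lambda s: rtt[service_locations[s]])


if __name__ == "__main__":
    table = {"client-eu": {"dc-us": 150.0, "dc-asia": 250.0,
                           "dc-eu-north": 30.0, "dc-eu-west": 20.0}}
    services = {"HS-A": "dc-us", "HS-B": "dc-asia",
                "HS-C": "dc-eu-north", "HS-D": "dc-eu-west"}
    print(choose_by_performance("client-eu", table, services, set(services)))     # HS-D
    print(choose_by_performance("client-eu", table, services, {"HS-A", "HS-C"}))  # HS-C
```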

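With the Failover method, your policy contains an ordered list of hosted services, and all traffic goes to the first (primary) hosted service in the list. If the primary hosted service is offline, traffic is sent to the next one in order. If both the first and second hosted services in the list are offline, traffic goes to the third, and so on.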


Figure 5 - Conceptual flow for the Failover load balancing method for Windows Azure Traffic Manager

The following numbered steps correspond to the numbers in the diagram (Figure 5).

  1. Your Traffic Manager domain receives an incoming request from a client.

  2. Your policy contains an ordered list of hosted services. Traffic Manager checks which hosted service is first in the list. It verifies that the hosted service is online. If the hosted service is unavailable, it proceeds to the next online hosted service. In this case HS-A is unavailable, but HS-B is available.

  3. Traffic Manager returns the DNS entry to the client. The DNS entry points to the IP address of HS-B.

  4. The client calls HS-B.

Points to note:

  • The DNS time-to-live (TTL) informs the Local DNS resolvers how long to cache DNS entries. Clients will continue to use a given hosted service until its local DNS cache expires.

  • Using a failover load balancing method without monitoring is not useful because all traffic will always be sent to the primary hosted service.

  • Traffic Manager monitors the state of your hosted services and will not send traffic to hosted services it thinks are down.

  • If all hosted services in the policy are offline, Traffic Manager returns the DNS name of the primary hosted service.

  • For information about the Service Management API for Traffic Manager, see Operations for Traffic Manager.
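The Failover selection in step 2, plus the all-offline rule from the notes above, reduces to a short Python function. This is a minimal sketch assuming the set of online services comes from monitoring; `choose_by_failover` is a hypothetical name.

```python
from typing import List, Set


def choose_by_failover(ordered_services: List[str], online: Set[str]) -> str:
    """Return the first online hosted service in the policy's ordered list.

    If no service is online, fall back to the primary (first) service,
    matching the behavior described in the notes above."""
    if not ordered_services:
        raise ValueError("the policy contains no hosted services")
    for service in ordered_services:
        if service in online:
            return service
    return ordered_services[0]


if __name__ == "__main__":
    order = ["HS-A", "HS-B", "HS-C"]
    print(choose_by_failover(order, {"HS-B", "HS-C"}))  # HS-B, as in Figure 5
    print(choose_by_failover(order, set()))             # HS-A (all offline)
```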

The Round Robin method splits traffic across multiple hosted services. It keeps track of the last hosted service that received traffic and sends traffic to the next one in your list of hosted services. If you have set up monitoring, this method will not send traffic to hosted services that are down.

For more information, see Monitoring hosted services in Windows Azure Traffic Manager.

For information about the Service Management API for Traffic Manager, see Operations for Traffic Manager.

The following graphic shows the general process for the Round Robin routing method.


Figure 6 - Conceptual flow for the Round Robin load balancing method

The following numbered steps correspond to the numbers in Figure 6.

  1. Your Traffic Manager domain receives an incoming request from a client.

  2. Your policy contains an ordered list of hosted services. Traffic Manager knows which service received the last request.

  3. Traffic Manager sends the DNS entry that points to the next hosted service in the list back to the client computer. In this example, this is hosted service C.

  4. Traffic Manager records that the last request went to hosted service C.

  5. The client computer uses the DNS entry and calls hosted service C.

Points to note:

  • The DNS time-to-live (TTL) informs the Local DNS resolvers how long to cache DNS entries. Clients will continue to use a given hosted service until its local DNS cache expires. When you test the policy from a single client, you need to ensure that the DNS name is resolved for each request to observe the round robin behavior.

  • Traffic Manager monitors the state of hosted services and will not send traffic to hosted services it thinks are unavailable. If all hosted services in the policy are unavailable, Traffic Manager acts as if all hosted services in the policy are actually available so it will at least resolve the DNS name to an IP address, even though the client machine will call a hosted service that may not respond.

  • For information about the Service Management API for Traffic Manager, see Operations for Traffic Manager.
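A minimal Python sketch of the Round Robin steps, assuming monitoring supplies the set of online services: the policy remembers the index of the last service that received a request, skips services that are down, and treats every service as online when all of them are down. `RoundRobinPolicy` is a hypothetical name, not part of any Traffic Manager API.

```python
from typing import List, Optional, Set


class RoundRobinPolicy:
    """Sketch of round-robin selection over an ordered list of hosted services."""

    def __init__(self, services: List[str]) -> None:
        self.services = list(services)
        self.last: Optional[int] = None  # index of the service that got the last request

    def next_service(self, online: Set[str]) -> str:
        if not self.services:
            raise ValueError("the policy contains no hosted services")
        # If every service is offline, act as if all are online so the
        # name still resolves (see the notes above).
        if any(s in online for s in self.services):
            eligible = online
        else:
            eligible = set(self.services)
        n = len(self.services)
        start = 0 if self.last is None else (self.last + 1) % n
        for step in range(n):
            i = (start + step) % n
            if self.services[i] in eligible:
                self.last = i  # remember who received this request
                return self.services[i]
        raise AssertionError("unreachable: eligible always contains a service")


if __name__ == "__main__":
    rr = RoundRobinPolicy(["HS-A", "HS-B", "HS-C"])
    everyone = {"HS-A", "HS-B", "HS-C"}
    print([rr.next_service(everyone) for _ in range(4)])  # ['HS-A', 'HS-B', 'HS-C', 'HS-A']
    print(rr.next_service({"HS-A", "HS-C"}))              # HS-C (HS-B is down)
```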

Windows Azure Traffic Manager can monitor your hosted services to ensure they are available. In order for monitoring to work correctly, you must set it up the same way for every hosted service in the policy.
You may choose to monitor “/”, in which case Traffic Manager will try to access the default directory of the services in the policy to ensure that they are available.

If you want to monitor a specific path and filename, then you must do the following:

  1. Create a file with the same name, at the same path, on each hosted service you plan to include in your policy.

  2. Allow Traffic Manager to perform an HTTP or HTTPS GET on the file.

  3. Specify the monitoring endpoint for the files that you want to monitor in the Specify a monitoring endpoint section of the Create Traffic Manager policy dialog box as shown in the following graphic (Figure 7):


    Figure 7 – Specify a monitoring endpoint

The fields for monitoring are available in the Create Policy and Edit Policy dialog boxes. They are as follows:

  • Protocol – Choose HTTP or HTTPS. NOTE: HTTPS monitoring does not verify that your SSL certificate is valid; it only checks that a certificate is present.

  • Port – Choose the port used for the request. The standard HTTP and HTTPS ports are among the choices.

  • Relative path – Give the path and the name of the file that the monitoring system will attempt to access. Leaving just a forward slash implies that the file is in the root directory (default).

Example

  • Protocol: HTTP

  • Port: 80

  • Relative path: /WATMmonitorfile.htm

    Tip
    / is a valid entry for the relative path.
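The monitoring fields can be read as the parts of a URL that the monitor requests. The following Python sketch builds that URL from the Protocol, Port, and Relative path values and applies the success rule described in the monitoring sequence later in this topic: a 200 OK received within 10 seconds. The function names and the host web-us-contoso.cloudapp.net are illustrative assumptions; this is not Traffic Manager's monitoring code. Note that, unlike Traffic Manager's HTTPS monitoring, `urllib` verifies the server certificate by default, and it also follows redirects.

```python
import http.client
import time
import urllib.request
from typing import Optional
from urllib.parse import urlunsplit

PROBE_TIMEOUT_SECONDS = 10  # a 200 OK must arrive within 10 seconds


def monitoring_url(host: str, protocol: str = "HTTP", port: int = 80,
                   relative_path: str = "/") -> str:
    """Combine the Protocol, Port, and Relative path fields into a URL."""
    scheme = protocol.lower()
    if scheme not in ("http", "https"):
        raise ValueError("protocol must be HTTP or HTTPS")
    if not relative_path.startswith("/"):
        raise ValueError("relative path must start with '/'")
    default_port = 80 if scheme == "http" else 443
    netloc = host if port == default_port else f"{host}:{port}"
    return urlunsplit((scheme, netloc, relative_path, "", ""))


def probe_succeeded(status: Optional[int], elapsed_seconds: float) -> bool:
    """A check passes only on a 200 OK received within the timeout."""
    return status == 200 and elapsed_seconds <= PROBE_TIMEOUT_SECONDS


def probe(url: str) -> bool:
    """Perform one GET against the monitoring URL (requires network access)."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=PROBE_TIMEOUT_SECONDS) as resp:
            status = resp.status
    except (OSError, http.client.HTTPException):
        # Timeouts, connection errors, and non-2xx responses (HTTPError) count as failures.
        return False
    return probe_succeeded(status, time.monotonic() - start)


if __name__ == "__main__":
    print(monitoring_url("web-us-contoso.cloudapp.net", "HTTP", 80, "/WATMmonitorfile.htm"))
    # http://web-us-contoso.cloudapp.net/WATMmonitorfile.htm
```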

The following timeline illustrates the monitoring process for a single hosted service. In this scenario:

  1. The Windows Azure hosted service is available and receiving traffic via this Traffic Manager policy ONLY.

  2. The Windows Azure hosted service becomes unavailable.

  3. The Windows Azure hosted service remains unavailable for a time much longer than the DNS TTL.

  4. The Windows Azure hosted service becomes available again.

  5. The Windows Azure hosted service resumes receiving traffic via this Traffic Manager policy ONLY.


Figure 8 - Monitoring sequence example

The numbers in the diagram correspond to the numbered explanation below.

  1. GET - The Traffic Manager monitoring system performs a GET on the file you specify in the relative path area.

  2. 200 OK - The monitoring system expects a 200 OK message back within 10 seconds. When it receives this response, it assumes that the hosted service is available.

  3. 30 seconds between checks – This check will be performed every 30 seconds.

  4. Hosted service offline - The hosted service becomes unavailable. Traffic Manager will not know until the next monitor check.

  5. Attempts to access monitoring file (4 tries) - The monitoring system performs a GET but does not receive a response within 10 seconds. It then makes three more attempts at 30-second intervals. This means it takes approximately 1.5 minutes from the first failed check for the monitoring system to detect that a service has become unavailable. If any of the attempts succeeds, the count of failed attempts is reset. Although not shown in the diagram, if a 200 OK response arrives more than 10 seconds after the GET, the monitoring system still counts it as a failed check.

  6. Marked offline - After the fourth failure in a row, the monitoring system will mark the hosted service as offline.

  7. Traffic to hosted service decreases – Traffic may continue to flow to the offline hosted service. Clients will experience failures because the hosted service is unavailable. Clients and secondary DNS servers have cached the DNS address of the offline hosted service. They continue to resolve the DNS name of the company domain to the IP address of the hosted service. In addition, secondary DNS servers may still hand out the DNS information of the offline service. As clients and secondary DNS servers are updated, traffic to the offline hosted service IP address will slowly stop. The monitoring system continues to perform checks at 30 second intervals. In this example, the hosted service does not respond and remains offline.

  8. Traffic to hosted service stops – By this time, most DNS servers and clients should be updated, and traffic to the offline hosted service will stop. The maximum amount of time before traffic stops completely depends on the DNS TTL. The default DNS TTL is 300 seconds (5 minutes); with this value, clients stop using the service after 5 minutes. The monitoring system continues to perform checks at 30-second intervals, and the hosted service does not respond.

  9. Hosted service comes back online and receives traffic – The hosted service becomes available, but Traffic Manager does not know until the monitoring system performs its next check. Traffic Manager sends a GET and receives a 200 OK in under 10 seconds. It then begins to hand out the hosted service's DNS name to DNS servers as they request updates. As a result, traffic starts to flow to the hosted service once again.
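The detection rule in the steps above can be simulated with a small state machine. This Python sketch assumes each check has already been reduced to pass or fail (a 200 OK within 10 seconds passes), ignores the time a failing check spends waiting, and uses hypothetical names. With checks every 30 seconds, the fourth consecutive failure arrives 90 seconds after the first one, which matches the roughly 1.5 minutes given in step 5.

```python
from typing import List, Tuple

CHECK_INTERVAL_SECONDS = 30
FAILURES_BEFORE_OFFLINE = 4


class ServiceMonitor:
    """Tracks one hosted service using the consecutive-failure rule."""

    def __init__(self) -> None:
        self.online = True
        self.consecutive_failures = 0

    def record(self, check_passed: bool) -> bool:
        """Record one check result and return whether the service is online."""
        if check_passed:
            # A single successful check resets the count and restores the service.
            self.consecutive_failures = 0
            self.online = True
        else:
            self.consecutive_failures += 1
            if self.consecutive_failures >= FAILURES_BEFORE_OFFLINE:
                self.online = False
        return self.online


def simulate(outage_start: int, outage_end: int,
             duration: int) -> List[Tuple[int, bool]]:
    """Run checks every 30 seconds; checks inside [outage_start, outage_end) fail."""
    monitor = ServiceMonitor()
    timeline = []
    for t in range(0, duration + 1, CHECK_INTERVAL_SECONDS):
        passed = not (outage_start <= t < outage_end)
        timeline.append((t, monitor.record(passed)))
    return timeline


if __name__ == "__main__":
    timeline = simulate(outage_start=45, outage_end=400, duration=480)
    marked_offline = next(t for t, online in timeline if not online)
    back_online = next(t for t, online in timeline if online and t > marked_offline)
    print(marked_offline, back_online)  # 150 420
```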

For more specific information about the DNS address handout process, see Load balancing methods in Windows Azure Traffic Manager.

Windows Azure Traffic Manager displays policy and hosted service health in the Windows Azure Management Portal. Keep in mind that although there is no charge for Traffic Manager, its monitoring transactions are included in monthly data transfer costs.

The poll state column displays the most recent monitor status of your Traffic Manager policies. Use this status to understand the health of your domains according to your Traffic Manager monitoring settings. When a policy is healthy, DNS queries will be distributed to your hosted services based on the policy’s configuration, e.g., Round Robin, Performance or Failover. Once the Traffic Manager monitoring system detects a change in monitor status, it updates the poll state entry in the Management Portal. It can take up to five minutes for the state change to refresh.

The following table shows the combined status of each scenario:

Hosted service status | Hosted service poll state | Policy poll state
--- | --- | ---
Available | Online | Online
Unavailable | Offline | Degraded
Awaiting Response and at least one is available | Unknown and Online | Warning
Awaiting Response and at least one is unavailable | Unknown and Offline | Degraded
All Awaiting Response | Unknown | Unknown
Enabled | Inherits the hosted service poll state | Inherits the policy poll state
Disabled | Unknown | Policy poll status inherits status of enabled hosted services

For example, if the hosted services for a given policy are all responding to Traffic Manager with ‘200 OK’, then each hosted service will display an Online poll state. The overall policy poll state would also be Online. If monitoring fails to receive ‘200 OK’ from a hosted service then that line item will show Offline and the overall policy health will be Degraded. In this case DNS queries continue for the domain but no traffic will be sent to the offline hosted service.

When a hosted service is disabled in Traffic Manager, its poll state changes to Unknown and the disabled service is removed from DNS. All queries route to the remaining online hosted services in the policy. You can disable all but one hosted service per policy. If the last enabled hosted service goes offline, the overall policy is marked Degraded, which means DNS queries continue but end users may be directed to an offline service. If, on the other hand, the remaining online hosted service goes to an Awaiting Response state, the overall policy state is marked Unknown.
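One way to read the poll state table is as a function from the enabled hosted services' poll states to the policy poll state. The following Python sketch is an interpretation, not Traffic Manager code: it treats Awaiting Response as "Unknown", ignores disabled services as described above, and uses hypothetical names.

```python
from typing import AbstractSet, Dict

# Hosted service poll states as shown in the portal; "Unknown" stands for
# a service that is Awaiting Response.
ONLINE, OFFLINE, UNKNOWN = "Online", "Offline", "Unknown"


def policy_poll_state(service_states: Dict[str, str],
                      disabled: AbstractSet[str] = frozenset()) -> str:
    """Derive the policy poll state from the enabled hosted services."""
    states = [state for name, state in service_states.items() if name not in disabled]
    if not states:
        raise ValueError("at least one hosted service must remain enabled")
    if all(state == UNKNOWN for state in states):
        return "Unknown"   # all awaiting response
    if OFFLINE in states:
        return "Degraded"  # any offline service, with or without awaiting ones
    if UNKNOWN in states:
        return "Warning"   # awaiting response, the rest online
    return "Online"


if __name__ == "__main__":
    print(policy_poll_state({"HS-A": ONLINE, "HS-B": OFFLINE}))                    # Degraded
    print(policy_poll_state({"HS-A": ONLINE, "HS-B": OFFLINE}, disabled={"HS-B"}))  # Online
```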

When all hosted services are offline or have an unknown status, Traffic Manager continues to answer DNS queries based on the load balancing method applied to the policy. For more information about how traffic is distributed, see the following: Direct Incoming Traffic to Hosted Services Based on Network Performance by Using Windows Azure Traffic Manager, Create a Failover Policy by Using Windows Azure Traffic Manager, and Load Balance Traffic Equally Across a Set of Hosted Services by Using Windows Azure Traffic Manager.

When you create a Traffic Manager policy, the Create Traffic Manager Policy dialog box lists the Windows Azure hosted services that are available to be included in the policy. Before you create a policy, consider the following guidance for creating and naming hosted services.

  • Services should be in a single subscription – All hosted services must be in the same subscription where you are creating the policy. You cannot add hosted services in different subscriptions to a single policy.

  • Production services only – Only hosted services in a production environment are available for routing. You cannot route to services running in a staging environment. Note that if you perform a VIP swap while a policy is routing traffic, the traffic will now use the hosted service just swapped into the production environment.

  • Name your hosted services so they can be easily found and clustered into policies – The Traffic Manager Create Policy and Edit Policy dialog boxes allow you to perform a "contains" search on the DNS name of the hosted service. Note that the DNS name is different from the hosted service name. The DNS name is used because it is guaranteed to be unique in a subscription, while the name of the hosted service may not be. To avoid confusion, give a hosted service a name and DNS prefix that are the same or similar. If you have more than 20 hosted services, poor naming can make finding the correct hosted service difficult, and badly named hosted services make policies difficult to maintain and prone to errors.

  • All services in a policy should service the same operations and ports – If a policy mixes hosted services that support different operations or ports, it becomes more likely that a client will call a hosted service that cannot service its request.

  • All services in a policy must use the same monitoring method – You can choose only a single path and file to monitor all hosted services in a given policy. If you move services from one policy to another and you have different methods of monitoring those services, some hosted services in the policy will appear to be consistently offline. As a result, these "offline" services will never have their IP addresses handed out to clients and thus never receive traffic. You can enter "/" in the Relative path and filename text box so that monitoring will try to access the default path and filename.

  • Keep monitor charges low – Keep in mind that although there is no charge for Traffic Manager, its monitoring transactions are included in monthly data transfer costs.

  • Storage – How you design the location and distribution of your storage is an important consideration when using Traffic Manager. Think of the end-to-end transaction and how your data will flow when you design and deploy your applications for Traffic Manager.

  • SQL Azure – As with storage, design and analyze your application state and data requirements when you extend your hosted service to multiple geographic regions.

  • Make your prefixes unique and easy to understand – Traffic Manager domains must be unique, and you can control only the first part of the domain. The Traffic Manager domain is used for identification and routing purposes only. Client machines never use the domain name directly, so it does not need to be easily readable. However, policies are identified by this domain, so it is important that you can quickly distinguish it from other domains listed in the Management Portal interface.

  • Use dots to add uniqueness or make domains readable – You can also use periods to separate parts of your domain name prefix. If you plan to create multiple policies in Traffic Manager, use a consistent hierarchy to differentiate one service from another. For example, Contoso has global services for web, billing, and utility management. The three policies would be web.contoso.trafficmanager.net, bill.contoso.trafficmanager.net, and util.contoso.trafficmanager.net. When setting up hosted services, use names that include the location, for example, web-us-contoso.cloudapp.net and web-asia-contoso.cloudapp.net. Your limitations are those imposed by DNS and the size of the edit box in the Create Traffic Manager policy dialog box. Think of a domain name as a sequence of labels separated by dots (label.label.label.label.etc.). At the time of this documentation, the limits for domain names in Traffic Manager are as follows:

    • Each label can be a maximum of 63 characters.

    • You cannot have more than 40 labels total. Since two labels are taken up by trafficmanager.net, that leaves 38 for your prefix.

    • The entire domain can be a maximum of 253 characters. Keep in mind that the .trafficmanager.net suffix, including its leading dot, takes up 19 of those characters.

  • DNS TTL (time-to-live) - The DNS time-to-live (TTL) value controls how often the client’s local caching name server will query the Windows Azure Traffic Manager DNS system for updated DNS entries. Any change that occurs within Traffic Manager, such as policy changes or changes in hosted service availability, will take this period of time to be refreshed throughout the global system of DNS servers. We recommend you leave the setting at the default value of 300 seconds (5 minutes).
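The domain name limits listed above can be checked before you create a policy. This Python sketch applies only those three limits (label length, label count, and total length) to a prefix plus the .trafficmanager.net suffix, and flags empty labels; it does not check which characters DNS allows, and `check_prefix` is a hypothetical helper.

```python
from typing import List

SUFFIX = ".trafficmanager.net"  # 19 characters, including the leading dot
MAX_LABEL_LENGTH = 63
MAX_LABELS = 40
MAX_DOMAIN_LENGTH = 253


def check_prefix(prefix: str) -> List[str]:
    """Return a list of limit violations for a Traffic Manager domain prefix."""
    domain = prefix + SUFFIX
    labels = domain.split(".")
    problems = []
    if any(label == "" for label in labels):
        problems.append("empty label (leading, trailing, or doubled dot)")
    for label in labels:
        if len(label) > MAX_LABEL_LENGTH:
            problems.append(f"label longer than {MAX_LABEL_LENGTH} characters: {label[:10]}...")
    if len(labels) > MAX_LABELS:
        problems.append(f"{len(labels)} labels; the limit is {MAX_LABELS}")
    if len(domain) > MAX_DOMAIN_LENGTH:
        problems.append(f"{len(domain)} characters; the limit is {MAX_DOMAIN_LENGTH}")
    return problems


if __name__ == "__main__":
    for prefix in ("web.contoso", "bill.contoso", "a" * 64):
        print(prefix[:20], check_prefix(prefix) or "OK")
```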


© 2013 Microsoft. All rights reserved.