Investigating Performance Bottlenecks
This topic describes a recommended process for investigating bottlenecks.
The source of the problem can be hardware or software related. Underused resources are usually an indication that a bottleneck exists somewhere in the system. Bottlenecks can be caused by hardware limitations, by inefficient software configurations, or by both.
Identifying bottlenecks is an incremental process: alleviating one bottleneck can lead to the discovery of the next one. The discipline of identifying and alleviating these bottlenecks is the objective of this topic. A system may perform at peak levels for short periods of time, but for sustainable throughput a system can only process as fast as its slowest-performing component.
Bottlenecks can occur at the endpoints (entry/exit) of the system or in the middle (orchestration/database). After a bottleneck has been isolated, use a structured approach to identify its source. After the bottleneck is eased, it is important to measure performance again to ensure that a new bottleneck has not been introduced elsewhere in the system.
The process of identifying and fixing bottlenecks should be done in a serial manner. Vary only one parameter at a time and then measure performance to verify the impact of the single change. Varying more than one parameter at a time could conceal the effect of the change.
For example, changing parameter 1 could improve performance. However, changing parameter 2 in conjunction with changing parameter 1 could have a detrimental effect and negate the benefits of changing parameter 1. This leads to a net zero effect and results in a false negative on the effect of varying parameter 1 and a false positive on the effect of varying parameter 2.
Measuring performance characteristics after changing settings is important to validate the effect of the change.
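The serial, one-parameter-at-a-time approach described above can be sketched as a simple tuning loop. This is an illustrative sketch, not part of any product: `run_load_test` is a hypothetical helper assumed to drive load against the system and return measured throughput, and the configuration keys are made up for the example.

```python
# Sketch of a one-parameter-at-a-time tuning loop (hypothetical helpers).
# run_load_test(config) is assumed to drive load and return measured
# throughput (e.g. messages/sec); parameter names are illustrative only.

def tune_serially(baseline_config, candidate_changes, run_load_test):
    """Apply one change at a time, keeping only changes that help."""
    config = dict(baseline_config)
    best = run_load_test(config)          # measure the baseline first
    for key, value in candidate_changes:
        trial = dict(config)
        trial[key] = value                # vary exactly one parameter
        result = run_load_test(trial)
        if result > best:                 # keep the change only if it helps
            config, best = trial, result
    return config, best
```

Because each candidate change is measured against the current best configuration in isolation, a harmful change can never mask a helpful one, avoiding the false-negative/false-positive scenario described above.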
Hardware: It is important to use consistent hardware, as varying the hardware can produce inconsistent behavior and misleading results. For example, do not use a laptop, because power-management features such as CPU throttling can skew measurements.
Test Run Duration: It is also important to measure performance for a fixed minimum period to ensure that the results are sustainable. Running tests for longer periods also ensures that the system has passed the initial warm-up/ramp-up period, during which caches are populated, database tables reach expected row counts, and throttling has had sufficient time to regulate throughput once predefined thresholds are hit. This approach helps discover optimal sustainable throughput.
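One way to act on this guidance is to discard all samples taken during the warm-up period before computing throughput. The sketch below assumes throughput samples collected as `(elapsed_seconds, messages_processed_in_interval)` pairs; the warm-up cutoff is an assumption you must calibrate for your own system.

```python
# Sketch: compute sustained throughput by discarding the warm-up period.
# Samples are (elapsed_seconds, messages_in_interval) pairs; the
# warm-up cutoff is an assumption to calibrate per system.

def sustained_throughput(samples, warmup_seconds):
    """Average messages/sec over the post-warm-up portion of a run."""
    steady = [(t, n) for t, n in samples if t > warmup_seconds]
    if not steady:
        raise ValueError("run shorter than warm-up period")
    total_msgs = sum(n for _, n in steady)
    duration = steady[-1][0] - warmup_seconds
    return total_msgs / duration
```

Note how a run that includes only warm-up samples would report misleadingly low (or high) throughput; excluding them yields the sustainable figure the text recommends.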
Test Parameters: It is also important not to vary test parameters from test run to test run. For example, varying map complexity or document sizes can produce different throughput and latency results.
Clean State: Once a test is complete, it is important to clean up all state before running the next test. For example, historical data can build up in the database and impact runtime throughput. Recycling the service instances helps release cached resources such as memory, database connections, and threads.
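The clean-state guidance can be captured in a small setup/teardown wrapper. This is a sketch under stated assumptions: `purge_database` and `recycle_services` are hypothetical hooks standing in for whatever your platform uses to clear historical data and recycle service instances.

```python
# Sketch: reset shared state around each test run (hypothetical hooks).
# purge_database() and recycle_services() stand in for platform-specific
# steps that clear historical data and release cached resources.
import contextlib

@contextlib.contextmanager
def clean_test_run(purge_database, recycle_services):
    """Ensure each run starts from, and leaves behind, a clean state."""
    purge_database()        # remove historical data left by prior runs
    recycle_services()      # release cached memory, connections, threads
    try:
        yield
    finally:
        purge_database()    # clean up even if the test run fails
```

Wrapping every run this way keeps database buildup from one test from skewing the throughput measured in the next.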
It is reasonable to expect a certain amount of throughput and/or latency from the deployed system. Attempting to achieve both high throughput and low latency places opposing demands on the system; it is more realistic to expect optimal throughput with reasonable latency. As throughput improves, increased stress (such as higher CPU consumption, higher disk-I/O contention, memory pressure, and greater lock contention) is placed on the system, which can negatively impact latency. To discover the optimal capacity of a system, it is important to identify and lessen all bottlenecks.
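The throughput/latency relationship can be reasoned about with Little's Law, which holds for any system in steady state: average work in flight equals throughput multiplied by average latency. This is a general queueing result, not something specific to the product discussed here.

```python
# Little's Law: in-flight work = throughput * latency, for any system
# in steady state. Given two of the quantities, you can estimate the
# third, which helps sanity-check measured throughput/latency pairs.

def littles_law_concurrency(throughput_per_sec, latency_sec):
    """Average number of requests in flight."""
    return throughput_per_sec * latency_sec
```

For example, a system sustaining 200 messages/sec at 0.5 s average latency must hold about 100 messages in flight; if latency doubles under the same load, in-flight work (and the resource pressure it creates) doubles too.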
Bottlenecks can be caused by completed instances residing in the database. When this occurs, performance can degrade. Giving the system sufficient time to drain can help relieve the problem; however, it is important to discover the cause of the backlog buildup and fix it.
To discover the cause of the backlog, analyze historical data and monitor Performance Monitor counters to discover usage patterns and diagnose the source of the backlog. This situation is common where large volumes of data are processed in batches on a nightly basis. Discovering the capacity of the system and its ability to recover from a backlog is useful: it helps you estimate hardware requirements for handling overdrive scenarios and the amount of buffer room to build into the system to absorb unforeseen spikes in throughput.
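A rough back-of-envelope for backlog recovery follows from the measured rates. This sketch assumes roughly constant processing and arrival rates over the drain window; in practice you would confirm the rates with monitoring (for example, Performance Monitor counters) rather than assume them.

```python
# Sketch: estimate how long a backlog takes to drain, assuming roughly
# constant processing and arrival rates. Real systems need monitoring
# to confirm the rates before trusting the estimate.

def backlog_drain_seconds(backlog, processing_rate, arrival_rate):
    """Seconds to drain `backlog` items, or None if it never drains."""
    surplus = processing_rate - arrival_rate   # spare capacity per second
    if surplus <= 0:
        return None                            # backlog grows or holds steady
    return backlog / surplus
```

For example, a 36,000-message backlog with 150 msg/s of capacity against 100 msg/s of arrivals drains in 720 seconds; if arrivals meet or exceed capacity, the backlog never drains, which is the signal that more buffer room or hardware is needed.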
Tuning the system for optimal sustainable throughput requires an in-depth understanding of the deployed application, the strengths and weaknesses of the system, and the usage patterns of the specific scenario. The only way to discover bottlenecks and predict optimal sustainable throughput with certainty is through thorough testing on a topology that closely matches what will be used in production.
Other topics in this section guide you through the process of defining that topology and provide guidance on how to lessen or avoid bottlenecks.
Bottlenecks can occur at various stages of the deployed topology. Some bottlenecks can be addressed by upgrading hardware. Hardware can be upgraded by scaling up (more CPUs, memory, or cache) or by scaling out (additional servers). The decision to scale up or out depends on the type of bottleneck and the application being configured. The following guidance describes how to change hardware deployment topologies based on the bottlenecks encountered. An application must be built from the ground up to be capable of taking advantage of scaling up or out. For example:
Scaling up a server with additional CPUs and/or memory may not lessen the problem if the application is serialized and dependent on a single thread of execution.
Scaling out a system with additional servers may not help if the additional servers add contention on a common resource that cannot be scaled. However, scaling out provides additional benefits: deploying two dual-processor servers instead of one quad-processor server provides a redundant server that serves the dual purpose of scaling to handle additional throughput and providing a highly available topology.
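The limit on scale-up gains for a partially serialized application, as noted above, is quantified by Amdahl's Law: the achievable speedup is capped by the serial fraction of the workload, no matter how many processors are added. This worked example is illustrative; the serial fraction of a real application must be measured.

```python
# Amdahl's Law: speedup from adding processors is capped by the serial
# fraction of the workload. A mostly serialized application therefore
# gains little from scaling up, as the text above notes.

def amdahl_speedup(parallel_fraction, processors):
    """Theoretical speedup for a given parallelizable fraction."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / processors)
```

For a workload that is 50% serial, going from one CPU to eight yields only about a 1.78x speedup, and even infinite CPUs cannot exceed 2x. This is why a serialized, single-threaded application sees little benefit from scale-up, while a fully parallel workload scales nearly linearly.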