Windows Azure SQL Database and SQL Server -- Performance and Scalability Compared and Contrasted
Updated: January 10, 2014
Authors: Conor Cunningham, Tobias Ternström, Silvano Coriani, Ewan Fairweather
Contributing Authors: Ralph Squillace, Karthika Raman
While SQL Server and Windows Azure SQL Databases (SQL Database for short, formerly SQL Azure) have large and important similarities, they are not identical, and while the differences are relatively small, they affect the way that applications perform on SQL Database compared to SQL Server. As a result, the application architecture and performance evaluation techniques for each platform also differ.
This document explains these performance differences and their causes and includes real-world customer wisdom from experience troubleshooting performance on production customer SQL Databases. This document also examines common SQL Server performance evaluation techniques that do not work on SQL Database. For a broader discussion on the design of high-performance and large scale Windows Azure applications, see Best Practices for the Design of Large-Scale Services on Windows Azure Cloud Services.
It is important to note here that SQL Server in addition to running on-premises, can aslo be run in a Windows Azure virtual machine. The comparisons and approach explained in this document apply to SQL Server regardless of whether it is running in an on-premises environment or in a Windows Azure Virtual machine. However, when SQL Server is running in a Windows Azure virtual machine, there are some additional set of performance considerations. This is explained in detail in the Performance Guidance for SQL Server in a Windows Azure Virtual Machine article.
Windows Azure SQL Database offers the ability to reserve resources for your database through the Premium Edition. For more information, see Premium Edition for Windows Azure SQL Database Guidance.
The structure of this article is as follows.
First, several standard application scenarios and the application patterns and approaches they typically take are described.
Second, several common expectations that customers have for SQL Database are described and the ways in which those expectations can lead less successful approaches are discussed.
Third, the topic focuses first on classic patterns for good performance in shared environments, then diving a little deeper into some performance patterns for SQL Database in particular.
Fourth, the topic outlines several general performance evaluation techniques for relational databases.
Fifth, the topic discusses several techniques to properly evaluate the performance of your SQL Database instances, given that the features available to you are different than those for SQL Server.
Sixth, the appendix provides several SQL scripts that can help you evaluate and troubleshoot your SQL Database instances.
Common Application Scenarios and Patterns
When dealing with performance analysis and optimization, different application patterns require varying approaches due to different architectures and technical requirements. These differences can cause difficulties in defining a common set of best practices and techniques to measure and understand performance characteristics. Nevertheless, there are some common patterns that can be used to categorize applications and generalize guidance. This section describes some common patterns seen on Windows Azure SQL Database in applications to date.
Business Applications Going to a SaaS delivery model (CSV/ISV)
This group contains traditional business applications (Accounting, CRM, ERP, and so on) embrace the Software-as-a-Service (or SaaS) delivery model, and which are often related to the implementation of a new business model or targeting new customer segments. The biggest challenges that a traditional software development company encounters moving to a new software delivery model are issues related to their existing codebase and to employee skills and approaches. Moving to the SaaS model inherently means adopting a multi-tenancy model as a way to host multiple customers on the same set of physical resource. This maximizes hardware investments and reduces management costs, but typically requires a redesign of the application with a special focus on the data layer level to spread data over several databases while maintaining data consistency. A common expectation is that the platform will perform “just like the on-premises equivalent”, even though the environment is hosted and hardware resources are typically shared with other applications.
Batch Processing and Mainframe Application Migration
This scenario typically involves the migration of existing batch-processing applications from an expensive environment to achieve cost savings in both licensing and maintenance. For mainframe applications, a lot of the existing business logic (commonly COBOL) will have to be retained without modification because of complexity or the difficulty of retaining the internal knowledge required to rewrite the code, especially if it was written long ago. These environments have to be carefully evaluated to understand how much tuning and optimization has to be applied to make them run well on Azure. While that can be tedious, experience teaches that often these types of applications do not have strict latency requirements and instead require just enough throughput to complete large batch jobs in the required period of time.
Custom Departmental Applications
Cloud platforms like Windows Azure are typically the ideal environment for custom application projects. These applications are usually developed on a project-by-project basis, implementing new business processes based on frameworks that are developed, refined, and reused across multiple projects. A common requirement for this kind of software development effort is the ability for the cloud platform to support hybrid and connectivity capabilities such as data sync and replication, process orchestration, integration, and so on. Time-to-market and rapid solution development and deployment are critical to these types of projects. This also requires a way to immediately discover, analyze and solve performance bottlenecks and scalability issues.
New Generation Applications
This pattern represents important new software development projects that have been created directly in a cloud environment that involve some degree of business risk. These types of applications are developed, tested, and put into production in the Cloud – and will never be anywhere else. Because of the inherent complexity in running application and services at scale, it is very important to understand the behavior and performance of the application and the cloud platform that is hosting it to understand as completely as possible how it fits with the business requirements, cost model, and performance and scale expectations. Running this kind of applications usually requires some level of tradeoff between development cost and investment risks.
Cloud-only, Massively Scalable Applications
This kind of application is typically a brand-new possibility: it will likely never have existed without the elastic and massively scalable environment that cloud platforms provide. Application providers have often previously tried to implement the application using a traditional architecture in a traditional data center, but failed due to the unpredictable workload generated by their user base and the difficulties and costs of designing a traditional infrastructure for peak load, let alone managing it once it was designed and built. The often high visibility and business-critical requirements require a completely new approach in designing and implementing a fully scalable and partitioned solution both at the application tier and at the data tier. With a higher degree of complexity, the ability to quickly analyze and troubleshoot end-to-end application performance is mandatory starting from the early stages of implementation.
Expectations for Windows Azure SQL Database
A common belief held by new SQL Database users is:
SQL Database is a low cost, elastic, highly available, and low management overhead version of SQL Server provided as a Service.
This is basically true. SQL Database is very similar to SQL Server:
It supports a large subset of the SQL Server programmability surface area (including supporting both TDS & T-SQL, having a master database, using SQL authentication, and so on).
It does re-use a version of the SQL Server engine for its core RDBMS functionality.
Customers should have minimal re-learning required to start working on the new platform.
However, production applications are not built at this level of abstraction. As a result, when designing applications it is critical to go beyond this level of comprehension and concretely understand the key differences between SQL Server and SQL Database, why these differences exist, and what to do differently in SQL Database to make sure your application runs efficiently and well.
Below are three important examples where the experience delivered in SQL Server is different from SQL Database. The following three areas are the three most important areas to master in order to design your application to run at the appropriate scale, performance, and for the best costs both now and into the future.
More details about each item are provided below:
|Concept||SQL Server||SQL Database|
Servers are co-located and exactly how they are connected can be controlled. Typical latency between an application server and the SQL Server is sub-millisecond. For most cases, an application that performs a large number of database calls to complete a single operation will perform acceptably.
In SQL Database typical communication between an Azure service and a SQL Database is in the low milliseconds, often an order of magnitude higher than on-premise.
Companies purchase enterprise-grade hardware and there are IT personnel on staff that make sure the right machine is put in place for the job. Machine failures are very rare and there is little reason to spend time building command or connection resiliency such as automatic retries for failed operations into the application. Upgrades and servicing are performed on a publicized schedule with planned downtime.
Windows Azure is built for the economics of the cloud and uses commodity hardware and similar machines across the entire service. These machines are less powerful and fault tolerant than those typically used on-premises, and servicing of a specific machine can occur at any time. While data is protected through redundancy, connection-related errors can occur.
An application’s databases are commonly hosted on their own machines without sharing hardware with other applications databases.
A single machine hosts up to hundreds of databases, and each of these could potentially be from different applications and will share machine resources. Logic in the service moves databases to different hosting machines to spread out the aggregate load of the service and to provide resource fairness.
With the introduction of Premium Database, you have now the ability to reserve a certain level of capacity for a Windows Azure SQL Database. By reserving a fixed amount of capacity for your SQL Database and its secondary replicas, the Premium Edition will deliver more predictable performance for cloud applications, relative to existing SQL Database Web and Business Editions.
Premium databases also greatly reduce the multi-tenancy issues when compared to other editions of SQL Database.
Given the above examples, what are the most common reasons for applications to be moved to, or built for SQL Database?
Elasticity. We want to get access to capacity faster and without tying up capital in hardware.
Low Management Overhead. Most manageability is built-in to SQL Database. Most importantly just executing CREATE DATABASE immediately provides a highly available database covered by 3-4 replicas on different physical machines in different racks.
Price. We want to run our service at a lower cost.
While the above information covers high level concepts rather than fundamental differences in core RDBMS functionality these concepts are core to how the underlying database engine will perform and as such it is important to take them into account when designing your application.
Understanding Issues and Maximizing Results
It is important to understand how to make the right application design choices to minimize issues and maximize benefits of using SQL Database.
While a typical SQL Server application might place everything associated with an application in one (potentially big) database, a typical Azure SQL Database application works best if the major functions of an application are split into separate databases. These separate databases leverage the resources of multiple machines instead of one, and this approach simplifies splitting single application job functions into multiple databases as they grow.
Next, the use of commodity hardware and servicing model in SQL Database implies that you should build commands to expect that some operations will fail and need to be retried. This requires thinking about how to split up operations into smaller chunks, avoiding long-lived temporary state, and making operations easily restartable. This programming model allows applications to continue working if a database server fails-over in the middle of the execution of your program.
SQL Database-based applications also use different approaches for troubleshooting and logging. Like web servers, the model to troubleshoot these applications is more suited to logging than active troubleshooting (querying DMVs directly, etc.). These log streams then become a part of the management of the application, feeding dashboards that show which parts of an application are healthy or require attention. When implemented, this model enables larger-scale applications with less labor via more automation.
Latency is another important aspect to consider in SQL Database applications. Because some parts of the application are not hosted local to your users, this often requires asynchronous execution/queuing patterns in order to make the application more responsive. In other cases, it requires applications to consider how to make more deliberate choices about how many round-trips are needed to complete an operation when compared to an on-premise SQL Server equivalent application. These differences show up as different patterns and focus areas when developing an application to shine on SQL Database.
Finally, testing the performance of a hosted application is best done on the same production hardware on which it will run, but the details of performance testing on shared, hosted hardware is different than on a dedicated set of machines. This is covered in more detail in section 5 of this document.
Total Cost of Ownership (TCO): Balancing Development Cost Against Service Cost
When designing an application, there are four basic options available for hosting the SQL part of the application:
- SQL Server on raw iron (i.e. non-virtualized)
- SQL Server in on-premise/hosted VM
- SQL Server in a Windows Azure Virtual Machine
- Windows Azure SQL Database
When moving from alternatives 1 through 4 in order, there is a clear movement towards cost saving on OPEX & COGS, but there is reduced ability to solve performance issues using scale-up techniques like specialized hardware as well as specific performance related SQL Server features. If performance problems are found in a virtualized environment, such as SQL Database, these need to be solved through changes in the application rather than by a boost of hardware. Examples of application changes are implementing some type of scale-out pattern or categorizing the application’s various tasks into buckets where some have to be given more resources and be run synchronously (a bank transaction as an example) and other tasks can be queued as well as executed using less resources (an example could be sending out account summary e-mails to the bank’s customers).
The last decade has been focused on making it easier for developers to create applications through the introduction and broad adoption of technologies such as Object Relational Mappers (Entity Framework, Hibernate, NHibernate, and so on). While technologies like these create a faster time-to-solution they also abstract the core tasks that are being performed by the database and the impact that these have on performance and scale away from the developer. In the (pre-cloud) times where typical data center utilization percentage often is in the single digits, even heavily used applications built on these technologies executing highly un-optimized database calls will survive simply due to the massive overprovisioning of hardware.
In the past decade, the actual cost of running an application has been hidden from the developer, largely; the average developer doesn’t need to care much about it. When you ask larger customers about their insights into the cost of running their applications, a pattern emerges. These customers typically know how much they spend on large systems such as their main ERP, of course, but the hundreds of other applications that they run and maintain fall into an “other apps” bucket. They count up the total cost of these applications by simply taking their total applications cost and subtracting the cost of their main applications (such as that ERP system); details about the cost of a single application rarely exists.
Because many customers don’t know the cost of a single application, awareness of this unspecified cost rarely appears on the development team maintaining the application. In this environment, developers have little incentive to reduce the application’s operational cost. Windows Azure changes this notion completely as an invoice from the cloud provider can be broken down into the cost of the individual Azure VMs, Web and Worker roles and SQL Database that run a specific application.
This isn’t just a bill; armed with this knowledge, businesses have the information that they can use to make decisions related to the right choice for fixing application performance problems. For example, a performance problem can be addressed by:
- increasing the resources available in Azure to the application’s worker roles and databases
- using the Premium Edition option to reserve resources like CPU, DISK IO, and Memory.
- re-architect the application to reduce billed resource consumption
- moving it on-premise and buying custom hardware
It is critical to move developers into the cloud mindset of the App TCO = development cost + service cost. Sometimes the right choice is to invest in application development to decrease the service cost, while other times one should increase the application resources instead. This choice may also be impacted by the skillsets of the organization and the scale of the problem they need to solve. If a strong background in application development is in place, the decision of implement app changes to reduce the resource demand of the application can be an easy one, while in other situations it may be a better choice to invest in using more resources available in the platform to achieve required performance targets.
In order to run applications efficiently, it is important to understand application’s use of any cloud service through telemetry. In Windows Azure we suggest evaluating great third party tools like New Relic or Opstera, but it is also worth considering custom solutions based on Windows Azure Diagnostics. Finally, being careful with abstracting away too much complexity, such as in the ORM case above, may lead you down a path where performance or scalability problems are too hard or costly to solve.
Next Level of Detail: Performance Factors
This section will explain common factors that impact performance of database operations. The first sub-section covers the traditional factors used in SQL Server. Note that most of these still apply to SQL Database. The second section covers reasons that are net-new in SQL Database. Finally we will walk through the surface area of troubleshooting and diagnostic tools that Azure SQL Database provides and use a practical example to show you how they can be used.
Classic Performance Factors in SQL Server
Performance analysis in SQL Server has a long history – this section should help give you a basic understanding. For more details please see:http://msdn.microsoft.com/en-us/library/dd672789(v=SQL.100).aspx
Plan Change/Plan Choice Issues
SQL Server’s Query Optimizer makes plan choices that impact system performance. For example, the optimizer may decide to perform an index seek or a table scan to satisfy a given query. The optimizer’s choices are usually right, but there are cases where that choice can be incorrect either because the input data (such as statistics) are out of date or if a query hits a model limitation in the Optimizer (such as not reasoning about data skew across joins in all cases when estimating cardinality).
The optimizer can make poor plan choices individually (impacting a single query) or in combination (where the overall performance of the system is sub-optimal due to a set of plan choices). The optimizer, other than basic intra-plan search space heuristics, does not consider cross-plan optimization.
For more information on the Optimizer, please reference the Optimizer Chapter in SQL Server 2008 Internals (by Conor Cunningham and others) or the upcoming SQL Server 2012 Internals.
Improperly Configured Software or Hardware
SQL Server runs on the user’s hardware, so it is the responsibility of the administrator to purchase and configure that hardware. If the customer does not set up and configure the hardware correctly, this can reduce overall system performance. Common examples include:
Driver choice/version for various components (such as disk controllers) being misconfigured
NUMA configuration/CPU affinity settings
Memory settings (max server memory, etc.)
This situation is usually handled by precise documentation and pre-acceptance testing for new hardware to validate all of the core capabilities of the system (using tools such as SQLIOSim http://www.microsoft.com/en-us/download/details.aspx?id=20163)
Locking and Latching
Locking is used to allow multiple users to make consistent changes to the same data in a controlled manner. While it is a necessary part of a database system, all locks are not created equal. For example, the optimizer can create plans for multiple queries that conflict, slowing overall system throughput compared to other possible plan choices. Occasionally, the optimizer can create combinations of plans that can cause deadlocks (for example, when different locks are acquired in opposite order across two or more plans). Different plan choices (or workload calling patterns) are often needed to address this kind of situation.
Latching is similar to locking in that it adjudicates multi-user access to structures within SQL Server. However, latching deals with the internal physical page structures. Latching is somewhat orthogonal to locking, but both issues can apply and the core resolution is often similar: If there is a hot latch, then the typical answer is to change the calling pattern (either by forcing different plan shapes or re-arranging the calling code) to mitigate the impact of the blocking latch on overall throughput.
Multi-User Operations and Blocking
Even when locking is not an issue with multi-user operations in a database, it is still possible for performance or throughput to be reduced because some operations are denied sufficient resources while another one has more than enough. While the first manifestation of this condition is an operation taking longer than usual, the internal cause is that one or more resources are under pressure. Common factors include the wait type SOS_SCHEDULER_YIELD (which indicates that the operation is waiting on the CPU to become free), long I/O waits (which indicate that SQL Server is generating I/O requests faster than the I/O system can process them), and RESOURCE_SEMAPHORE (memory).One common technique to deal with the blocking issue is to just buy a larger machine or faster components.
Checkpoint and System Operations
Checkpoint is the process by which the DBMS flushes dirty pages from memory to disk to minimize the time that recovery would take if the database server crashed. This process can cause a spike in I/O operations that can slow the performance of queries and impact both latency and throughput. Applications typically would need to be load tested for a period longer than the checkpoint frequency so that the impact of checkpoint could be measured and included in capacity planning and certification.
|Note that SQL Server 2012 contains a feature called “indirect checkpoint” (for more information, see http://msdn.microsoft.com/en-us/library/ms189573.aspx#IndirectChkpt) which can reduce the point-in-time I/O overhead of checkpoints by spreading the I/O load of checkpoint more evenly over a longer period of time. This can lower the I/O impact on performance and capacity troubleshooting.|
Performance Factors in Azure SQL Database
Just to restate: It should be clear that Windows Azure SQL Database is not identical to SQL Server. SQL Database is a hosted service run in a different data center. It is multitenant, meaning that multiple customers share the same physical machine. It contains automatic features such as automatic high-availability (HA) and automatic backups. It runs on commodity hardware instead of large servers. As a service, it is sold to customers as a service instead of as a license. All of these factors impact the overall business model for SQL Database as well as the way you consider performance and scaling considerations.
In addition, SQL Database does not currently deliver a number of features commonly used in Data Warehouse queries (meaning that the initial focus of SQL Database is on OLTP systems). These features include database compression, parallel queries, ColumnStore indexes, and table partitioning, along with total database size support and I/O subsystem oriented to a data warehouse workload. While it is not impossible to build data warehouse solutions using SQL Database today, please understand that these feature differences will affect comparisons with SQL Server.
This section explains how each of these important factors impacts performance troubleshooting on SQL Database and will assume that customers are focusing on OLTP solutions.
One of the largest differences from SQL Server is that SQL Database exposes a multitenant service. Each physical machine can have hundreds (or more) of user databases stored on one computer. This allows SQL Database to deliver a very low price point (starting at a few dollars per month in the U.S.A.). This difference is significant: By default, SQL Database has different users sharing the resources of a single machine in a cluster and trying to balance overall load on the cluster by moving customers around to different machines within a cluster to try to maximize fairness and balance load.
Therefore, a customer running a performance test against their hosted database may be running against a system that contains multiple other active databases from other users. Just like normal SQL Server, having other customers run on the same hardware impacts performance test results. Therefore, care must be taken to avoid making final conclusions about SQL Database performance based on a single performance test since other users can make those conclusions invalid.
You have the ability however to reserve resources for your database and the replicas by using the Premium Edition option. With this you have a guaranteed CPU, Disk IO and memory reserved for your database. You can also use this option when you need it by upgrading to a Premium Edition and downgrading to a Business or Web Edition when the capacity is no longer required.
Using the Premium database removes cases where performance variance can cuase small queries to take longer than expected in latency-sensitive operations. It can also make it possible to migrate some on-premises applications without significant changes since it is an experience that is closer to the traditional, isolated experience assumed in those applications. For more information, see Premium Database for Windows Azure SQL Database Guidance.
SQL Database is built using large clusters of machines. Individually, these machines are not massive; they are standard, rack-based server machines that maximize a balance of price and performance instead of overall performance alone. This difference is important, as many SQL Server applications are built on the assumption they run on a machine that is “fast enough or “big enough” (meaning customers would buy a larger machine if they ever ran out of resources when running their application). SQL Database does not deliver extremely high-end hardware in its hosted offering. Instead, it is a “scale-out” model instead of a “scale-up” model. When a customer application exceeds the limits of a single machine, the expectation is that the customer will rewrite their database to use multiple databases (spread over multiple machines) instead of a single machine.
Commodity hardware also fails at a higher rate than traditional on-premise solutions. The hardware does not have redundant power supplies, ECC memory, or other features to maximize their uptime in all cases. Additionally, there are multiple machines that make up the cluster that need to work together to deliver a complete solution, both at the data tier and in the application tier. The current SLA for SQL Database is that a database will be available 99.9% of the time. In other words, the average SQL Database instance can be unavailable for about 42 minutes each month and still meet that goal. While not every single database actually meets that goal, the overall availability numbers for most databases do. This is still often lower than a traditional SQL Server system (many SQL Server users do not formally measure their downtime, but often downtime events are planned and notifications are sent -- SQL Database failures are more random since they are generally unplanned from the perspective of the application). This downtime implies that the design of the application should account for downtime of each database in a solution by using techniques that avoid single-points of failure (focusing on removing critical paths that always rely on a single database being “up”), including caching in other tiers when appropriate, and using retry logic on connection and commands to be resilient to failure in the application and service.
Larger services or solutions should formally separate each function into one or more separate databases. For example, if a service has a feature that sends mail to customers to perform registration for the service, it may be necessary to isolate the data for this capability from other data in the service and to even spread the mail data across multiple databases in order to handle high load (if it exceeds the capability of a single machine). It is not implausible to have multiple different sets of databases, each serving a different core function, to manage the capability limits of machines both in size and availability.
Since there are fundamental differences in how large solutions are architected on Windows Azure, the definition of “performance” used to evaluate SQL Server solutions likely requires adjustment. If a SQL Server solution performed better previously after purchasing a SAN, the performance numbers for I/O latency may not be as good when you do not have one in SQL Database. While often most of the same core business requirements can be met without issue, you should not evaluate the raw performance of the hardware as a method of determining whether SQL Database can solve a particular business problem. The hardware is slower, but you can often use multiple machines to get the same business result with the desired performance targets.
SQL Database does not contain the exact same feature set as SQL Server. SQL Database instances are programmatic concepts – they represent a logical database, rather than a specific instance on a server, for example. Additionally, some features in SQL Server have been blocked from SQL Database’s programming surface. SQL Database does not expose file groups or intra-database partitioning today, for example, as these are not needed for the standard SQL Database programming paradigms. Additionally, some use cases related to high-end Data Warehousing (ColumnStore indexes) are also not exposed at this time. These differences enable SQL Database innovations related to the hosting model, and most of these same customer benefits will be handled through other approaches over time in SQL Database.
SQL Database also aims to reduce labor costs associated with running, managing, and tuning databases. As part of this goal, it includes automatic mechanisms that handle some of the more regular-but-complicated-to-do-perfectly tasks such as:
Database Backups (taken every few minutes for each active database)
High-Availability (SQL Database delivers this with at least three replicas for each database without any user action)
Database consistency checks
Automatic Upgrades – SQL Database will automatically roll out new versions of the service in a mostly transparent fashion.
These differences mean that there is some fraction of each machine’s capacity that is reserved to perform these operations. This impacts the methods that can used to determine peak performance limits for a system, which is discussed more below. Formally defining performance goals for each business operation helps isolate performance evaluations from the impact of these automatic mechanisms.
No Downtime (Service Mantra)
Windows Azure makes it possible to build very large, Internet-facing services. While this is not completely unique to Windows Azure, the expertise required to build such a solution with SQL Server often restricted this kind of application to those with deep pockets and strong desire. Those high-end solutions often would also take steps to reduce or eliminate downtime their solution or service by designing their code to avoid taking down databases or other components. For example, a no-downtime solution would involve some High-Availability (HA) solution such as Database Mirroring or always-on to avoid hardware failure and to orchestrate upgrades such that servers were down only minimal amounts of time. This approach is critical for SQL Database-based solutions since a larger fraction of the applications are services and a new set of customers are trying to build such fully-online solutions on this new platform.
SQL Database already provides you high-availability built in to the core solution. Downtime caused by internal upgrades to the SQL Database service is handled by the same retry logic used to handle failures related to commodity hardware. Customer services, however, must also consider how to orchestrate the upgrade process across multiple tiers of an application. For example, if a service includes a web or application tier above a data tier, one common technique used in upgrades is to:
Deploy new stored procedures side-by-side with existing stored procedures (sp_createaccount_v1, sp_createaccount_v2)
Upgrade the application tier machines (using the “VIP Swap” technique in Windows Azure) to switch to the new version of the application. Upgrade in place is another option in case VIP swap would have an impact on correlated services (for example, cache).
Drain the existing work on the “v1” version of the stored procedures
When all prior work is completed, roll out another change that removes the “v1” versions of the stored procedures
Effectively, application upgrades can be made online by decomposing an application upgrade into small steps that can be performed without downtime. If some steps require downtime, then every effort should be taken to minimize that time by scheduling these for a small number of internal customers at a time (during their off hours) or by copying the user’s data to a new database in the background such that they never see the overall service as “down”.
This mental shift requires some changes in engineering practices, but the decomposition also allows you to consider solutions like taking down only part of a service for upgrades instead of the whole thing. For example, you might be able to disable logic that shows customers their prior month bills while keeping the main service up and running for the duration of an upgrade operation. The decomposition required to handle commodity hardware also provides the opportunity to redefine and minimize the customer-visible impact of downtime more precisely.
Customers need to determine if performance goals must be met during upgrade cycles. If so, then the service/application should be provisioned to handle any extra load or disruption caused by the service’s own deployment.
Cross-Tier Latency Differences
Many customers deploy SQL Server and all other components of a solution on a local subnet, meaning that the network latency between tiers is often negligible. When you move a solution to hosted infrastructure, there may be added latency between the customer and some part of your solution. Additionally, the various tiers in a Windows Azure are not going to be on the same network subnet, so there will be some small latency difference between any two tiers. SQL Server solutions that are very “chatty” (they do a lot of small queries or updates) may find that they run more slowly on SQL Database due to these physical network differences. For those familiar with early client-server computing, the same solutions apply here: Think carefully about round-trips between tiers in a solution to handle any visible latency differences.
Here’s two practical methods to measure the impact of this latency:
Implement a simple Web or Worker Role application that continuously executes a simple “SELECT 1” query against a Windows Azure SQL Database instance, and traces begin and end response times.
Remote desktop into that Web/Worker Role instance and execute the Resource Monitor applet. In the Network tab you’ll find a “TCP Connection” grid with a “Latency (ms)” column that gives you an idea of the latency in your connections against SQL Database (you can find your Windows Azure SQL Database virtual server IP Address by execute ping or tracert against its fully qualified name, and look at the connections to Port 1433).
For a complete discussion on best practices and optimizations related to latency, please refer to the following Windows Azure Guidance article: Performance Considerations with Windows Azure SQL Database.
SQL Database Limitations
Finally, there are sometimes issues within the SQL Database service itself that can impact how you do performance analysis. One such issue is that SQL Database does not expose all the features that exist on SQL Server. Some features missing in Windows Azure SQL Database – for example, SQL Profiler -- are commonly used to perform performance troubleshooting for SQL Server on premises. This feature gap is a result of the change in programming surface area to expose databases instead of servers as the primary concept in the SQL Database service. In most cases, the SQL team plans to deliver the same or equivalent functionality in stages to address such limitations.
Another issue that can exist is that SQL Database itself can have bugs. While customers in SQL Server have their own copy of the software, it is shared in SQL Database. Occasionally a bug will exist in our software and the SQL team may need to mitigate an issue or block a feature until that problem is resolved. The SQL team plans features to help make this process more visible and manageable for customers on our service.Due to possible differences between behavior on different SQL Database clusters in operation at any point in time, performance evaluation is best considered in a fashion that can be run at different times and in different data centers to help isolate customers from any temporary limitations in the service. These adjustments will remove most such factors from impacting the conclusions drawn from any single test on the service.
Recommended Performance Evaluation Techniques
This section explains some recommended techniques to validate and optimize a solution for SQL Database and discusses some specific techniques to avoid.
Define Requirements for Each Operation
Given the platform differences that exist, it does not make sense to perform “apples-to-apples” comparisons with SQL Database -- the hardware is different and the platform has differences that suggest a different approach. When migrating existing solutions, customers are making the most progress on the platform using techniques that determine what performance is needed for core operations and then determining how to adjust code to meet those requirements.
Consider the following example:
A customer has an existing on-premise ASP.NET application that makes 10 calls to a local SQL Server database and renders a page in 2 seconds. This web application is being considered for porting to Windows Azure running a web or app tier worker role and using SQL Database as the database. Initially, the customer decides to just move the application over and get it to compile. After spending time working through various issues to get it to deploy on Windows Azure, they notice that the web page now takes 4 seconds to render (2 seconds slower than before). The customer concludes that the cloud must be “slow”.
There are several likely factors which can explain this latency difference (all specific examples taken from the points made earlier in this topic):
There may be latency between the caller and the web server (the caller is far away from the data center and it takes time for the request to travel over the internet)
There may be some additional small latency added for each SQL call from the ASP.NET application because the two tiers are not on the same sub-net
The computers being used locally may be different from those used by SQL Database
The query plans may be different
Again, the most important point is that you should not try to compare SQL Server and SQL Database performance for any existing solution – there are many reasons why it can be different. Remain focused directly on the important question: What level of performance is required to solve the business problem? In this case, if the requirement is that the web page renders for the customer in two seconds, then there are a number of different ways that one might be able to achieve this on the Windows Azure platform (though the exact solution may differ between an on-premise SQL Server solution, a Windows Azure Virtual Machines (IaaS) solution, and SQL Database). Possible changes to consider for this example to improve performance towards the two second goal:
Determine the latency between the client and the web server (build a test web page and measure the response time for a web page that does not need to contact a database). If this is a significant factor, then consider finding a closer data center to reduce the latency and/or consider using a Content Distribution Network (CDN) to reduce the customer-perceived latency by caching content nearer each user.
Consider whether it is possible to combine round-trips from the ASP.NET application to reduce any physical network overhead associated between tiers. This can improve latency somewhat as well.
Examine the time to run each query, and search for slow query plans that can be improved, locking that can be eliminated, or database resources that can be increased. If necessary, define performance goals for each operation and repeat the technique on each sub-problem.
Determine how much you can stress test your database before getting throttled (look at the new sys.event_log and sys.database_connection_stats DMVs in the master database) to understand if you need to consider:
Reducing the number of application level connections
Scaling out the data model adding more databases to get the level of throughput you need to achieve
- Reducing the number of application level connections
The key takeaway is to start with the business problem and define goals instead of trying to determine if SQL Database has the same performance as SQL Server; it will almost certainly be different.
Create and Use Minimal, Repeatable Examples for Key Issues
When you find a performance issue, it is often useful to try to isolate that issue away from all other sources of resource use and variance. Continuing with the preceding example, let’s assume that one specific query ended up accounting for 3 seconds of the total time. Pull that query out and run it directly against the database (perhaps from inside of a stored procedure invoked from SQL Server Management Studio) to isolate the query from the original solution. Perhaps the web application is somehow misconfigured or perhaps the query has a poor plan. Running isolated tests removes potential causes and helps quickly isolate key problems to be addressed. Sometimes, it may even make sense to take that poorly performing operation and pull it apart further (for example: removing key parts of a query text) to find the point where that performance problem stops occurring and the operation “becomes fast” again. This technique is often used by the SQL product team when helping customers to isolate issues. Adding SET STATISTICS IO ON and SET STATISTICS TIME ON settings at the connection level can help in providing additional details on execution performance.
Hosted infrastructure can be more difficult to troubleshoot. It is not possible to directly look at the performance counters or other data that is available in on-premise deployments. Being able to make a repeatable test with logging (perhaps stored in a different database so as to not impact the results) is quite useful because that test can be deployed in different data centers. While the hardware used in each data center today is largely similar, there are some differences and these can show up on specific tests, especially for a customer scenario with tight latency requirements on a specific operation. Once you determine that a given operation is critical, you should validate that isolated example in every data center where the application is planned for deployment. Not all customers have a formal performance test suite, and this step can also help on-premise validations as well.
Capture Test Results over Multiple Runs
In isolated environments, it is not always necessary to consider the performance of an operation over many runs. Shared environments can have greater variance due to background system operations as well as the impact from other tenants in the system. Therefore, it is useful to take the repeatable test and run it multiple times. Collecting the results from many runs will often show some performance variance across calls and you can reason about these by graphing the percentile distribution for a given repeated operation.
Consider the following example:
A customer query has a desired latency requirement for an operation of 20ms and, running the query 1000 times, the customer notices that the average is 26.95ms. By looking at the percentile distribution of the latency over all calls, the data looks like this:
(The x-axis is the percentile for a given test of running the same query N times. The y-axis is the latency in milliseconds.)
Interestingly, the data distribution is not uniform -- there is a large spike at the top end of the data set. It should be clear that looking only at the average can be deceiving. Many latency response curves are exponential, and you can describe these by picking a few key points along the curve and describing the latency at key percentiles:
For this example, the main problem appears at the tail of the data distribution. There is an occasional data point that is far, far worse than the others. Being able to capture and describe the problem in this way will help explain that the problem is not with the normal case -- it is a problem with a few important but rare situations.
This given example can happen in shared environments and it often relates to a dropped TCP packet on one of the routers along the path used in the solution. If the packet is dropped, TCP will eventually retry after about one second. In this case, the likely cause of this issue is network congestion. If this problem occurs with significantly higher frequency, it could imply that there is a problem with the network hardware (meaning you should contact Microsoft Customer Support) or perhaps there is some other issue impacting the results.
You should use data logging and then statistics that characterize the distribution to isolate problems and then decide whether they are critical to the business. This area of analysis is a real problem (though the example above was created to illustrate the issue). When there have been issues like this with real customers, some customers end up just ignoring the anomalous situations (meaning that it is not important to their business and they change their requirements to focus on the 50th percentile time instead of the average), while in other cases (when working on local installations) they have replaced hardware to fix real problems found in network routers using this technique after the issue was isolated.
Troubleshooting and Performance Analysis in SQL Database
In the same way that the architecture and development approach has to adjust to this cloud environment, the operational mindset also has to change. Implementing an effective system in Azure SQL Database requires operational processes that are adapted for the cloud environment. The traditional approach and techniques used to gain insight into resource usage and efficiency and to detect errors needs to be modified and re-evaluated in order to ensure they are effective at making operational decisions in a multitenanted environment.
This section covers through practical examples of techniques which can be used on a Windows Azure SQL database to gain actionable insight. It uses practical examples with which to troubleshoot, isolate, and resolve real customer issues and outlines what tools are available and how to utilize them.
Initial Troubleshooting and Analysis in SQL Database
SQL Database exposes a number of Dynamic Management Views (DMVs) which can be used to understand resource consumption and identify failure conditions.
The current recommended approach is:
Identify Resource Consumers: Use database-level DMVs to identify top resource consumers, and monitor query execution and efficiency (for example, using sys.dm_exec_query_stats and sys.dm_db_missing_index_details)
Use a Delta Approach: Regularly take snapshots of current requests executing on SQL Database to check for locking, blocking, latching and other contention issues (for example, using sys.dm_exec_requests)
Monitor for Errors: Monitor the master database DMV’s to look at connection related issues such as throttling (for example, using sys.event_log and sys.database_connection_stats)
Monitor Database Resource Consumption: Monitor the sys.resource_stats DMV to look at the database resource consumption over time.
Some of the DMV’s maintain cumulative information from the last time that the primary Windows Azure SQL Database was moved (an example is sys.dm_exec_query_stats). Others contain a snapshot from a point in time (sys.dm_exec_requests). The performance of a query is affected by resource availability, concurrency on the primary and secondary nodes on which your SQL Database instance resides at that point in time, as well as application issues such as locking or blocking. In the on-premise environment, sys.dm_os_wait_stats is typically used in conjunction with sys.dm_exec_query_stats to understand query performance and resource constraints, which can be inferred from the system-wait information. The sys.dm_db_wait_stats provides an aggregated view of all the waits encountered by threads that executed during the operations. For example, lock waits indicate data contention by queries and page IO latch waits indicate slow IO response times.
Another technique is also possible:
Take an initial snapshot of query stats at the beginning of your monitoring period. Record this in a temporary table or a physical table.
Take regular snapshots of exec requests during your monitoring period.
Take a second snapshot of query stats.
Compare the delta between the snapshots and use the exec requests information to understand if locking, blocking, or resource waits impacted the query performance.
Understanding Resource Consumption in SQL Database
One customer had an issue in which the customer was experiencing slow user requests from a web application which called a Windows Azure SQL Database.
The script below identifies the top batches executing on your SQL Database instance. This example displays the orders by the total CPU time in order to identify the most CPU-intensive query. The same query pattern, ordering on different attributes, can find other large consumers of each kind of resource (logical I/Os, elapsed time, and so on). In many transactional systems, the top few results are consistent (for example, top aggregate CPU consumers also consume the most I/O resources).
/* Top batches by total CPU time No execution plan **************************************/ SELECT TOP(25) SUBSTRING (st.text,1, 512), SUM(qs.total_worker_time) AS total_cpu_time, SUM(qs.total_elapsed_time) AS total_elapsed_time, CAST((CAST(SUM(qs.total_worker_time) AS decimal) / cast(SUM(qs.total_elapsed_time) AS decimal) * 100) AS int) AS cpu_vs_elapsed_percentage, SUM(qs.execution_count) AS total_execution_count, SUM(qs.total_worker_time)/SUM(qs.execution_count) AS average_cpu_time, SUM(qs.total_elapsed_time)/SUM(qs.execution_count) AS average_elapsed_time, SUM(qs.total_logical_reads) AStotal_logical_reads, SUM(qs.total_logical_reads)/SUM(qs.execution_count) AS average_logical_reads, SUM(qs.total_logical_writes) AS total_logical_writes, COUNT(*) AS number_of_statements, qs.plan_handle, db_name(st.dbid) AS [database name], st.objectid FROM sys.dm_exec_query_stats AS qs CROSS APPLY sys.dm_exec_sql_text(sql_handle) AS st GROUP BY st.text, qs.plan_handle, db_name(st.dbid), st.objectid ORDER BY SUM(qs.total_worker_time) DESC
In this example, the stored procedure dbo.spGetTranslationTable is consuming more than 42 times as much CPU as the next batch.
Executing the SPROC and Determining Plan Efficiency
To determine the resource utilization and efficiency of a given query, a common technique is to execute it in isolation using SQL Server management studio (SSMS). The SET STATISTICS IO and TIME options can be used to provide useful information, and the actual execution plan can also be obtained. This is illustrated below:
SET STATISTICS IO ON; SET STATISTICS TIME ON; exe [dbo].[spGetTranslationTable] @lang='en'
In this example, the query consumed 900 logical reads of the translations table. Looking at the query plan, it is a result of a scan of the translations clustered index to return a relatively small subset of the rows. Creating a new index on the table reduced the logical I/O and resolved the issue.
sys.dm_exec_requests provides data on query performance such as locking, blocking and wait types. (For an example query, see Snapshot of sys.dm_exec_requests with WAIT TYPE and WAIT TIME in the appendix section below). Some additional new wait types have been implemented in SQL Database to represent SQL Database-specific functionality. The SE_REPL_SLOW_SECONDARY_THROTTLE and SE_REPLY_COMMIT_ACK wait types are related to the log replication mechanism that SQL Database uses to meet its availability target and provide elasticity. If these waits are present in the DMV results, it often indicates I/O pressure through the replication channel. Mitigations include:
Minimize DML operations
Manage the size of application batch insert and transaction sizes
Reduce the total number of application inserts
Manage database operations such as index maintenance carefully
|If sustained periods of SE_REPL blocking significantly impact an application, please contact Microsoft Customer Support for assistance.|
If the load on a database causes SE_REPL waits or if that database is throttled by Windows Azure, there are a number of options. For write-intensive workloads, consider sharding the data to partition data into multiple databases. This both reduces load on any one node in the system and reduces the risk that a single database machine can experience a situation where throttling or blocking can impact your whole application. For read-intensive workloads, consider using a distributed cache technology (such as Windows Azure Caching) to cache information from the database and query for it less frequently.
Screenshot showing SE_REPL_SLOW_SECONDARY_THROTTLE:
Screenshot showing SE_REPL_COMMIT_ACK:
Additional DMVs that SQL Database Provides
The surface area of Windows Azure SQL Database is being continuously extended to provide additional DMVs and other troubleshooting information. These index-related DMVs have been enabled in SQL Database:
The query shown in the Missing Index Analysis section (in the appendix, below) identifies missing index recommendations in largely the same way as you would interact with SQL Server. This information is also available in the execution plan which you can obtain from the plan cache or through SSMS. The screenshots below illustrate this.
DMVs available in the Master database at the Virtual Server Level
Each SQL Database is a member of a virtual server (that is, a logical server). This logical server has a master database, which has now been extended to provide the sys.event_log and sys.database_connection_stats DMVs. The sys.database_connection_stats view provides a rollup of how many TDS connections are successful, terminated or throttled in an aggregate five-minute window. The sys.event_log view provides more information on successful SQL Database connections as well as connection failures, deadlocks, and throttling events. These DMVs allow you to quickly troubleshoot application connectivity issues.
For more information, see:
The sys.resource_stats DMV provides a view of the resource consumption over time. This DMV can be used to investigate the resource usage and for capacity planning for your application.
Query to return all the information in sys.resource_stats DMV for the last 7 days:
SELECT * FROM sys.resource_stats WHERE database_name = 'MyTestDB' AND start_time > DATEADD(day, -7, GETDATE())
Screen shots of the results returned:
Query to return average and maximum resource usage for a database in the past 7 days:
SELECT avg(avg_cpu_cores_used) AS 'Average CPU Cores Used', max(avg_cpu_cores_used) AS 'Maximum CPU Cores Used', avg(avg_physical_read_iops + avg_physical_write_iops) AS 'Average Physical IOPS', max(avg_physical_read_iops + avg_physical_write_iops) AS 'Maximum Physical IOPS', avg(active_memory_used_kb / (1024.0 * 1024.0)) AS 'Average Memory Used in GB', max(active_memory_used_kb / (1024.0 * 1024.0)) AS 'Maximum Memory Used in GB', avg(active_session_count) AS 'Average # of Sessions', max(active_session_count) AS 'Maximum # of Sessions', avg(active_worker_count) AS 'Average # of Workers', max(active_worker_count) AS 'Maximum # of Workers' FROM sys.resource_stats WHERE database_name = 'MyTestDB' AND start_time > DATEADD(day, -7, GETDATE())
Screenshot of the results returned:
Query to determine usage of more than a single core:
SELECT (SELECT SUM(DATEDIFF(minute, start_time, end_time)) FROM sys.resource_stats WHERE database_name = 'MyTestDB' AND start_time > DATEADD(day, -7, GETDATE()) AND avg_cpu_cores_used > 1.0) * 1.0 / SUM(DATEDIFF(minute, start_time, end_time) ) AS percenage_more_than_1_core FROM sys.resource_stats WHERE database_name = 'MyTestDB' AND start_time > DATEADD(day, -7, GETDATE())
Screen shot of the results returned:
The queries covered in this section can be used to troubleshoot and find the limits of a single database.
Top Statement by Total CPU Time with Query Plan on the Statement Level
/* Top statements by total CPU time w/ query plan on the statement level note - can be expensive to run this **requires** statement_level_query_plan.sql ************************************************************************/ SELECT TOP(25) SUBSTRING(qt.text,qs.statement_start_offset/2, (CASE WHEN qs.statement_end_offset = -1 THEN len(convert(nvarchar(max), qt.text)) * 2 ELSE qs.statement_end_offset END -qs.statement_start_offset)/2) AS statement_text, SUBSTRING (qt.text , 1, 512) AS batch_text, qs.total_worker_time, qs.total_elapsed_time, CAST((CAST(qs.total_worker_time AS decimal) / cast(qs.total_elapsed_time AS decimal) * 100) AS int) AS cpu_vs_elapsed_percentage, qs.total_worker_time/qs.execution_count AS average_cpu_time, qs.total_elapsed_time/qs.execution_count AS average_elapsed_time, qs.total_logical_reads, qs.execution_count, qs.total_logical_reads/qs.execution_count AS average_logical_reads, db_name(qt.dbid) AS [database name], qt.objectid, pln.statement_plan FROM sys.dm_exec_query_stats qs CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS qt CROSS APPLY master.dbo.statement_level_query_plan(plan_handle) AS pln WHERE statement_text LIKE '%' + REPLACE(LEFT(SUBSTRING((select text from master.sys.dm_exec_sql_text(sql_handle)), statement_start_offset/2, 1 + CASE WHEN statement_end_offset = -1 THEN LEN((SELECT text FROM master.sys.dm_exec_sql_text(sql_handle))) - statement_start_offset/2 ELSE statement_end_offset/2 - statement_start_offset/2 END) ,3000), '[','[') + '%' ORDER BY qs.total_worker_time DESC
Top 50 Query Plans in SQL Azure Database
-- TOP 50 Query Plans in SQL Azure Database SELECT TOP(50) CONVERT(bigint, DATEPART(yyyy,getdate())*10000000000)+(DATEPART(mm,getdate())*100000000)+(DATEPART(dd,GETDATE())*1000000)+(DATEPART(hh,GETDATE())*10000)+(DATEPART(mi,GETDATE())*100)+(DATEPART(ss,GETDATE())) AS timestampKey , GETDATE() AS eventdateUTC, CONVERT(decimal(8,2),(CAST(qs.execution_count AS FLOAT)/(DATEDIFF(mi,qs.creation_time,qs.last_execution_time)+1))*(((CAST(qs.total_worker_time AS FLOAT) / qs.execution_count) / 100000.00)+((CAST(qs.total_elapsed_time AS float) / qs.execution_count) / 1000000.00))) AS Weighting, query_plan.value('declare default element namespace "http://schemas.microsoft.com/sqlserver/2004/07/showplan"; (/ShowPlanXML//Object/@Index)', 'varchar(255)') AS IndexName, qp.query_plan, CASE WHEN query_plan.exist(' declare default element namespace "http://schemas.microsoft.com/sqlserver/2004/07/showplan"; /ShowPlanXML/BatchSequence/Batch/Statements//RelOp/@PhysicalOp[. = "Hash Match"]' )=1 THEN 1 ELSE 0 END AS hash_match , CASE WHEN query_plan.exist(' declare default element namespace "http://schemas.microsoft.com/sqlserver/2004/07/showplan"; /ShowPlanXML/BatchSequence/Batch/Statements//RelOp/@PhysicalOp[. = "Table Scan"]' )=1 THEN 1 ELSE 0 END AS table_scan , CASE WHEN query_plan.exist(' declare default element namespace "http://schemas.microsoft.com/sqlserver/2004/07/showplan"; /ShowPlanXML/BatchSequence/Batch/Statements//RelOp/@PhysicalOp[. = "Clustered Index Scan"]' )=1 THEN 1 ELSE 0 END AS clustered_index_scan ,SUBSTRING(qt.text,qs.statement_start_offset/2+1, (CASE WHEN qs.statement_end_offset = -1 THEN LEN(CONVERT(nvarchar(max), qt.text)) * 2 ELSE qs.statement_end_offset END -qs.statement_start_offset)/2) AS statement_text, qs.total_worker_time/qs.execution_count AS average_cpu_time, qs.total_elapsed_time/qs.execution_count AS average_elapsed_time, qs.total_logical_reads/qs.execution_count AS average_logical_reads, qs.total_logical_writes/qs.execution_count AS average_logical_writes, qs.execution_count, qs.plan_generation_num, qs.total_worker_time, qs.total_elapsed_time, qs.total_logical_reads, qs.total_logical_writes, db_name() AS [db_name], qt.objectid, qs.query_hash, qs.creation_time, qs.last_execution_time FROM sys.dm_exec_query_stats qs CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS qt CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) AS qp WHERE qt.text NOT LIKE '%sys.%' ORDER BY qs.total_elapsed_time DESC
Snapshot of sys.dm_exec_requests with WAIT TYPE and WAIT TIME
-- Snapshot of current sys.dm_exec_requests with WAIT TYPE and WAIT TIME declare @i int = 0 DECLARE @Requests TABLE ( [timestampKey] [bigint] NULL, [eventdateUTC] [datetime] NOT NULL, [statement_text] [nvarchar](max) NULL, [command] [nvarchar](32) NOT NULL, [cpu_time] [int] NOT NULL, [total_elapsed_time] [int] NOT NULL, [wait_type] [nvarchar](60) NULL, [wait_time] [int] NOT NULL, [last_wait_type] [nvarchar](60) NULL, [wait_resource] [nvarchar](60) NOT NULL, [reads] [bigint] NOT NULL, [writes] [bigint] NOT NULL, [logical_reads] [bigint] NOT NULL, [row_count] [bigint] NOT NULL, [granted_query_memory] [int] NOT NULL, [query_hash] [binary](8) NULL, [query_plan] [xml] NULL, [hash_match] [bit] NULL, [table_scan] [bit] NULL, [clustered_index_scan] [bit] NULL, [session_id] [smallint] NOT NULL, [blocking_session_id] [smallint] NULL, [start_time] [datetime] NOT NULL, [status] [nvarchar](30) NOT NULL, [db_name] [nvarchar](128) NULL ) WHILE(@i < 20) BEGIN INSERT INTO @Requests SELECT CONVERT(bigint, datepart(yyyy,getdate())*10000000000)+(datepart(mm,getdate())*100000000)+(datepart(dd,getdate())*1000000)+(datepart(hh,getdate())*10000)+(datepart(mi,getdate())*100)+(datepart(ss,getdate())) as timestampKey , getdate() as eventdateUTC, SUBSTRING(t.text,r.statement_start_offset/2+1,(case when r.statement_end_offset = -1 then len(convert(nvarchar(max), t.text)) * 2 else r.statement_end_offset end -r.statement_start_offset)/2) as statement_text, command, cpu_time, total_elapsed_time, wait_type, wait_time, last_wait_type, wait_resource, reads, writes, logical_reads, row_count, granted_query_memory, query_hash, query_plan, case when query_plan.exist(' declare default element namespace "http://schemas.microsoft.com/sqlserver/2004/07/showplan"; /ShowPlanXML/BatchSequence/Batch/Statements//RelOp/@PhysicalOp[. = "Hash Match"]' )=1 then 1 else 0 end as hash_match , case when query_plan.exist(' declare default element namespace "http://schemas.microsoft.com/sqlserver/2004/07/showplan"; /ShowPlanXML/BatchSequence/Batch/Statements//RelOp/@PhysicalOp[. = "Table Scan"]' )=1 then 1 else 0 end as table_scan , case when query_plan.exist(' declare default element namespace "http://schemas.microsoft.com/sqlserver/2004/07/showplan"; /ShowPlanXML/BatchSequence/Batch/Statements//RelOp/@PhysicalOp[. = "Clustered Index Scan"]' )=1 then 1 else 0 end as clustered_index_scan ,session_id, blocking_session_id, start_time, status, db_name() as [db_name] FROM sys.dm_exec_requests r cross apply sys.dm_exec_sql_text(r.sql_handle) t cross apply sys.dm_exec_query_plan(r.plan_handle) p WHERE @@spid<>r.session_id waitfor delay '00:00:00:100' set @i = @i+1 end select * from @Requests;
Missing Index Analysis
-- Missing indexes stats SELECT CONVERT(bigint,datepart(yyyy,getdate())*10000000000)+(datepart(mm,getdate())*100000000)+(datepart(dd,getdate())*1000000)+(datepart(hh,getdate())*10000)+(datepart(mi,getdate())*100)+(datepart(ss,getdate())) [timestampKey] ,getdate() [eventdateUTC], db_name() as [db_name], mig.index_group_handle, mid.index_handle, 'CREATE INDEX missing_index_' + CONVERT (varchar, mig.index_group_handle) + '_' + CONVERT (varchar, mid.index_handle) + ' ON ' + mid.statement + ' (' + ISNULL (mid.equality_columns,'') + CASE WHEN mid.equality_columns IS NOT NULL AND mid.inequality_columns IS NOT NULL THEN ',' ELSE '' END + ISNULL (mid.inequality_columns, '') + ')' + ISNULL (' INCLUDE (' + mid.included_columns + ')', '') AS create_index_statement, migs.*, mid.database_id, mid.[object_id] FROM sys.dm_db_missing_index_groups mig INNER JOIN sys.dm_db_missing_index_group_stats migs ON migs.group_handle = mig.index_group_handle INNER JOIN sys.dm_db_missing_index_details mid ON mig.index_handle = mid.index_handle WHERE CONVERT(decimal (28,1), migs.avg_total_user_cost * migs.avg_user_impact * (migs.user_seeks + migs.user_scans)) > 10 ORDER BY migs.avg_total_user_cost * migs.avg_user_impact * (migs.user_seeks + migs.user_scans) DESC