Trace-scrubbing Tools

Andrew Zanevsky

Andrew Zanevsky shares his trace-scrubbing procedures that make it easy for you to handle large trace files and aggregate transactions by type–even when captured T-SQL code has variations.

SQL Server Profiler is a veritable treasure trove when it comes to helping DBAs optimize their T-SQL code. But the surfeit of riches (I'm reminded of the Arabian Nights tale of Aladdin) can be overwhelming. I recently had one of those "sinking" feelings when I first tried to make sense of the enormous amount of data collected by traces on a client's servers. At this particular client, the online transaction processing system executes more than 4 million database transactions per hour. That means that even a 30-minute trace that captures "SQL Batch Completed" events results in a table with 2 million rows. Of course, it's simply impractical to process so many records without some automation, and even selecting the longest or most expensive transactions doesn't necessarily help in identifying bottlenecks. After all, short transactions can be the culprits of poor performance when executed thousands of times per minute. [Subscribers may want to re-read Linchi Shea's June 2002 article, "Distill SQL Server Log Files," for his Perl code. Linchi's PDF and Perl code are both included in this article's Download.–Ed.]

Trace-scrubbing rules

My intuition told me to use aggregation. But it didn't take me long to discover that grouping and aggregating trace files by transaction type didn't work, thanks to slight variations in executed queries. Each stored procedure call had different parameters, and queries were executed with different search arguments and slightly varying WHERE clause conditions. Some transactions included comments or had a different number of spaces, tabs, or line breaks within the otherwise identical code. Some queries used synonyms for keywords.

	To solve the problem of large trace analysis, I developed a trace-scrubbing process that strips less significant details of all trace records and allows grouping by transaction type. For example, if my trace file contains 50,000 records about execution of the same stored procedure (each with different parameters), then my process groups them into one line with duration, CPU, reads and writes–all totaled and averaged. This lets me quickly identify the transaction types that are responsible for a large percentage of overall system utilization.

	Implementation of trace scrubbing requires 1) parsing of all T-SQL statements recorded by the trace, and 2) eliminating variable parameters, as in the following example. Imagine that the three transactions listed here are captured by a trace.

1. Proc_A
      123,
      'alpha',
      0x0a9fff8b
2. EXECUTE [Proc_A] -345, 'beta', null /*comment*/
3. EXEC    dbo.Proc_A  9876543, default    -- test

	Although all three transactions execute stored procedure Proc_A, variations in execution, parameters, line breaks, tabs, spaces, and comments make them look very different. My scrubbing algorithm aggregates them into one record with transaction type stored as "Proc_A%".

	In order to recognize that several transactions executed the same stored procedure or, for example, performed INSERTs into the same table with different values, the scrubbing algorithm performs the following operations:

  • Eliminates EXEC and EXECUTE keywords.
  • Replaces all line breaks and tabs with spaces.
  • Treats multiple spaces as one.
  • Eliminates any numeric constants, hexadecimal values, literal strings, and the NULL keyword.
  • Removes comments of both types ("double-dash" and /*...*/).
  • Removes commas between eliminated parameters.
  • Deletes the "dbo" owner name that may be used in front of many object names and in most cases is the default.
  • Reduces the code of each analyzed transaction to the first 800 characters. This allows me to treat long batches as the same transaction type if their first parts are identical (after reduction of less significant elements as described by other rules).
  • Implements special scrubbing for transactions that involve system procedures such as sp_cursoropen, which passes the actual executed query as a literal parameter to SQL Server. For these cases, I extract the actual query from the parameters and discard the sp_cursoropen "wrapper."

	This set of rules substantially reduces the number of transactions that you have to consider and makes it easier to manually process the resulting list. Note that these rules are empirical and may have to be modified for different databases. For example, while working with another client, I had to add logic to my scrubbing procedure to ignore database names, because their server has more than 400 databases with identical schemas. The removal of database names allowed me to aggregate trace results by stored procedure name, regardless of the database context, and to identify the slowest and most expensive ones. Scrubbing in your particular environment may require further customization.
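	To make these rules concrete, here's a minimal sketch in T-SQL of the kind of string normalization involved. This isn't the code from trace.sql–the variable names are mine, and only a few of the rules are shown–but it illustrates the flavor of the transformations applied to each captured statement:

-- Hypothetical fragment illustrating a few of the scrubbing rules;
-- the full rule set lives in Process_Trace (see the Download).
DECLARE @sql varchar(8000), @i int, @j int
SET @sql = 'EXECUTE  [Proc_A]' + CHAR(13) + CHAR(10) + CHAR(9)
         + '-345, ''beta'', null /*comment*/'

-- Remove /*...*/ comments first, so that their contents can't interfere.
SET @i = CHARINDEX('/*', @sql)
WHILE @i > 0
BEGIN
    SET @j = CHARINDEX('*/', @sql, @i)
    IF @j = 0 BREAK
    SET @sql = STUFF(@sql, @i, @j - @i + 2, ' ')
    SET @i = CHARINDEX('/*', @sql)
END
-- Double-dash comments must be stripped per line, before line breaks
-- are collapsed; omitted here for brevity.

-- Replace all line breaks and tabs with spaces.
SET @sql = REPLACE(REPLACE(REPLACE(@sql, CHAR(13), ' '), CHAR(10), ' '), CHAR(9), ' ')

-- Treat multiple spaces as one.
WHILE CHARINDEX('  ', @sql) > 0
    SET @sql = REPLACE(@sql, '  ', ' ')

-- Eliminate EXEC/EXECUTE keywords and the default dbo owner name.
SET @sql = LTRIM(REPLACE(REPLACE(@sql, 'EXECUTE ', ''), 'EXEC ', ''))
SET @sql = REPLACE(@sql, 'dbo.', '')

-- Reduce the code to the first 800 characters.
SET @sql = LEFT(@sql, 800)

SELECT @sql
-- Eliminating numeric constants, hex values, literal strings, the NULL
-- keyword, and the commas between eliminated parameters requires a
-- character-by-character scan (as does stripping the [] delimiters from
-- object names such as [Proc_A]); those rules are omitted here.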

Stored procedures

The trace.sql code in the accompanying Download contains three stored procedures for trace scrubbing and analysis. Stored procedure Process_Trace, whose parameters are listed in Table 1, performs scrubbing on a given trace table. It finds transaction types after parsing all code captured in a trace table and creates two additional tables that allow linking of each transaction type to the original trace records with full SQL code. It assumes that the trace table contains, at a minimum, the following standard columns: RowNumber, with unique row identifiers; TextData, which stores full SQL code of each transaction; and EventClass, which contains numeric trace event types, as documented in BOL. By default, the procedure only analyzes events of types 10 (RPC_Completed) and 12 (TSQL_Batch_Completed), but you can specify any list of event types as a parameter. A sample trace definition template file that I use is available in the Download as performance.tdf.

Table 1. Parameters of stored procedure Process_Trace.

Parameter            Datatype      Description
-------------------  ------------  -----------
@table               varchar(128)  Trace table name.
@debug               tinyint       Reserved for system maintenance.
@drop                bit           0 (default) — normal processing; 1 — drop summary tables and index created by the previous execution of this procedure on the same trace table and quit.
@Event_Class_Filter  varchar(200)  Comma-separated list of trace event IDs to include in the scrubbing. Default — '10,12' (RPC_Completed and TSQL_Batch_Completed). A full list can be found in BOL.

	Before you execute Process_Trace, you need to run a server-side trace, saving data to a file, and then load the trace file into a database table using system function fn_trace_gettable. (An alternative approach is to run Profiler, capturing data directly into a table, but I don't recommend it for heavy-duty production tracing, because your client-side workstation performance may be insufficient to capture all trace events and some of them may escape recording.) Once you have a trace table, pass it as a parameter to Process_Trace. You can think of this procedure as a preliminary step, compiling data for the next one that generates reports.
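	For instance, the load step might look like the following sketch (the file path and table name are placeholders for your environment; since fn_trace_gettable's output doesn't carry the RowNumber column that Process_Trace expects, I generate one with the IDENTITY function):

-- Load a server-side trace file into a table for scrubbing.
-- 'C:\Traces\performance.trc' and MyTrace are placeholders;
-- 'default' tells fn_trace_gettable to read all rollover files.
SELECT IDENTITY(int, 1, 1) AS RowNumber, *
INTO   MyTrace
FROM   ::fn_trace_gettable('C:\Traces\performance.trc', default)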

	Scrubbing of a million trace records takes several minutes and results in a list of transaction types that may still be several thousand rows long, only some of which are likely to be important for performance analysis. In order to identify the bottlenecks in this list, I developed another stored procedure, Calculate_Most_Expensive_Transactions, whose parameters are listed in Table 2. Using the original trace table and additional tables created by Process_Trace, this procedure generates a report about transaction cost by "scrubbed T-SQL type." It assumes that the original trace table contains columns Duration, Reads, Writes, and CPU, in addition to the three columns listed earlier that are required by Process_Trace.

Table 2. Parameters of stored procedure Calculate_Most_Expensive_Transactions.

Parameter      Datatype      Description
-------------  ------------  -----------
@trace_table   varchar(128)  Trace table name.
@output_table  varchar(128)  Report output table name or NULL (default) for standard output.
@topN          int           Include scrubbed types in the report only if they fall within the top N items in at least one ranking category. Default — 50. Show all other types aggregated in one 'All Other' line.
@average_ms    int           Individually list in the report (not as 'All Other') all scrubbed transaction types with average duration over @average_ms milliseconds. Default — 1000 (that is, 1 second).
@debug         tinyint       Reserved for system maintenance.

	Depending on your environment, SQL Server resources, and application requirements, you may be interested in total, average, or maximum values of captured parameters, as well as execution counts for each transaction type. The procedure calculates all of these values, computes totals for the whole trace table and cost percentages of each transaction in the total cost of the whole recorded transaction mix, and ranks transaction types by execution count, total transaction duration, total reads/writes, and total CPU usage. For good measure, I've thrown in the number of times each transaction type executed under 1 second, over 1 second, 2 seconds, 5 seconds, and 10 seconds.
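	Conceptually, the heart of such a report is an aggregation over the scrubbed transaction types, along these lines. The sketch below uses hypothetical table and column names (MyTrace and a ScrubbedTypes linking table)–the real procedure works off the summary tables that Process_Trace creates–but it shows how the totals, averages, and percentages relate:

-- Conceptual sketch only; MyTrace and ScrubbedTypes are hypothetical names.
SELECT   s.SQL_Type,
         COUNT(*)   AS Exec_Count,
         SUM(t.CPU) AS Total_CPU,
         AVG(t.CPU) AS Average_CPU,
         SUM(t.CPU) * 100.0 / (SELECT SUM(CPU) FROM MyTrace) AS Percent_CPU
FROM     MyTrace t
JOIN     ScrubbedTypes s ON s.RowNumber = t.RowNumber
GROUP BY s.SQL_Type
ORDER BY Total_CPU DESC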

	Once you've executed Process_Trace, you can run Calculate_Most_Expensive_Transactions and give it the same trace table name. (The output will appear in the result window of Query Analyzer.) Optional parameters allow storing it in a table for retention and historical trend analysis, but you don't need this option for testing.
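	With the parameter defaults written out, a typical run looks like this (MyTrace is a placeholder for your trace table name):

EXEC Process_Trace @table = 'MyTrace'
EXEC Calculate_Most_Expensive_Transactions
     @trace_table = 'MyTrace',
     @topN        = 50,    -- default ranking cutoff
     @average_ms  = 1000   -- default: list all types averaging over 1 second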

	The result set allows you to identify the most expensive transactions by any parameter. You can find out what stored procedure or query was the most expensive I/O user, quickly find the top five heavy hitters in CPU utilization, identify the transactions that take the longest to execute on average, and so on. I paste the report into Microsoft Excel and sort it in different ways to analyze different performance aspects. A sample report is too wide for this page, but one is included in the Download as file SampleReport.xls. Fragments of this report are shown in Table 3 and Table 4.

Table 3. Sample trace report–top 10 transactions by total CPU resource used.

SQL_Type                                                           Exec_Count  Total_CPU  Average_CPU  Percent_CPU  CPU_Rank
-----------------------------------------------------------------  ----------  ---------  -----------  -----------  --------
getListView%                                                             1705     346890          203       11.709         1
getNext%                                                                 6199     107948           17        3.644         2
SELECT Auc_Number, sum(Sold_Count) FROM vwAuc GROUP BY Auc_Number          64     107107         1673        3.615         3
reportView%                                                                66      71701         1086         2.42         4
getBuyerChargeSummary%                                                    299      64048          214        2.162         5
AssignableWorkers%                                                        162      63393          391         2.14         6
getDocument%                                                               67      61230          913        2.067         7
reassignItem%                                                           54782      57942            1        1.956         8
searchLocations%                                                           27      54264         2009        1.832         9
SELECT * from GetProcessNoteList_vw WHERE Process_ID=%                    640      54219           84         1.83        10
All Other:                                                                                                  66.625

Table 4. Sample trace report–top 10 transactions by average duration.

SQL_Type                                                                                 Exec_Count  Total_Duration  Average_Duration  Max_Duration
---------------------------------------------------------------------------------------  ----------  --------------  ----------------  ------------
SELECT Item_ID, Flags FROM v_DetailedPresale WHERE%                                               1          27,546            27,546        27,546
GetClosing_Approval_Listing;%                                                                     1          24,046            24,046        24,046
ProcessClosingqualifier%                                                                          1          16,656            16,656        16,656
SELECT * FROM v_Caller_Process_Provider_Association WHERE Phone_Number_%LIKE%                     2          26,343            13,171        25,123
getIncompleteProcessDocForms%                                                                     2          25,029            12,514        16,546
NT_CSAUndelivered%                                                                                2          24,093            12,046        17,313
p_MoveBuyerPaymentDocVPRState%                                                                    2          22,829            11,414        20,783
AssociateSPEmployee%                                                                              1          11,296            11,296        11,296
SELECT retVal FROM v_Caller_Process_Provider_Association WHERE phone_number_%_Raw like%           3          32,389            10,796        16,530
delayedProcessClosing%                                                                            3          30,562            10,187        30,530
All Other:                                                                                1,361,157      12,816,138                 9        30,643

	Excel also helps me to create pie charts illustrating "hit parades" of the most expensive transactions, such as the ones shown in Figure 1 and Figure 2. They help me focus instantly on the most critical transactions, the ones whose high execution cost hinders performance. In most cases, I find that just a handful of transaction types, out of the thousands executed, are responsible for the bulk of SQL Server utilization. Narrowing down this list is crucial to any performance analysis and optimization effort. Different servers may have different bottlenecks. Some are short on I/O resources, some on CPU power; for other servers, the capacity is sufficient, but average transaction duration is unacceptable. Depending on the specific challenges of your environment, you may derive different reports and charts from the base report to isolate problems in the most important areas.

	To give you a taste of the data that the report offers for analysis, the SampleReport.xls file provides the following information about stored procedure getListView%. It was executed 1705 times during the trace. Of those executions, 1498 took less than 1 second, 120 took up to 2 seconds, 67 up to 5 seconds, 16 up to 10 seconds, and 4 more than 10 seconds. Total duration of all executions was 961 seconds. All executions combined cost the server 3.6 million page reads, 733 writes, and 347,000 CPU units (cache page hits). Average duration per execution was a little over 0.5 second, with 2125 reads (less than one write operation) and 203 CPU units. Maximum duration was almost 12 seconds, and the maximums for a single execution were 29,000+ reads, 12 writes, and 2235 CPU units. Out of all transactions, this type was responsible for one-eighth of one percent of the total execution count, over 7 percent of the total duration, 4 percent of all reads, 0.8 percent of writes, and 11.7 percent of total CPU usage. In the proverbial hit parade, this transaction type was #61 in execution count, #2 by total duration, #4 in I/O (Reads+Writes), but #1 in CPU utilization.

	Based on this information, I knew that if my SQL Server was hurting on CPU capacity, then optimization of this stored procedure could potentially cut CPU usage by up to 11.7 percent. It also had the potential to yield up to 7 percent of I/O utilization. If my users complained about poor response time, I could see that this procedure made them wait between 1 and 12 seconds on 207 occasions during the trace period. Optimization could potentially decrease the number of such user wait incidents, thus improving their experience. Some of this analysis would have been very difficult without scrubbing and aggregation.

	Note that reports generated by Calculate_Most_Expensive_Transactions contain transaction type IDs. These values can be used to retrieve all original SQL code for a particular transaction type using stored procedure Report_All_Transactions_By_SQL_ID, whose parameters are listed in Table 5. This procedure helps to analyze details of a scrubbed transaction type and find examples of actual unscrubbed T-SQL code.

Table 5. Parameters of stored procedure Report_All_Transactions_By_SQL_ID.

Parameter  Datatype      Description
---------  ------------  -----------
@Table     varchar(128)  Trace table name.
@SQL_ID    int           Scrubbed SQL type ID from a report produced by Calculate_Most_Expensive_Transactions.
@Sort      varchar(20)   Column name to sort by. Possible values: StartTime, Reads, CPU (default), Duration, or spid.
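	For example, to retrieve every captured statement of one scrubbed type, sorted by duration, the call would look like this (MyTrace is a placeholder, and the @SQL_ID value is made up–take yours from the report):

EXEC Report_All_Transactions_By_SQL_ID
     @Table  = 'MyTrace',
     @SQL_ID = 42,          -- example ID from the report
     @Sort   = 'Duration'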

	Implementation of the scrubbing algorithm has helped me improve performance for several companies in a short amount of time, with little or no prior knowledge of the application and database schema. After I narrow down performance bottlenecks to just a few stored procedures or queries, I can focus on analyzing them and, for example, finding a missing index, a cursor that can be replaced with set processing, or a more effective implementation of an algorithm. My trace "treasures" are now neatly organized in their "coffers" and no longer threaten to overflow.

	In a future article, I'll share with you a fully automated system that my colleagues and I built around the stored procedures presented in this article. It executes daily traces on a production server and automatically generates and publishes expensive transaction reports, using trace scrubbing and aggregation.

Download 410ANDREW.ZIP

To find out more about Microsoft SQL Server Professional and Pinnacle Publishing, visit their Web site at http://www.pinpub.com/

Note: This is not a Microsoft Corporation Web site. Microsoft is not responsible for its content.

This article is reproduced from the October 2004 issue of Microsoft SQL Server Professional. Copyright 2004, by Pinnacle Publishing, Inc., unless otherwise noted. All rights are reserved. Microsoft SQL Server Professional is an independently produced publication of Pinnacle Publishing, Inc. No part of this article may be used or reproduced in any fashion (except in brief quotations used in critical articles and reviews) without prior consent of Pinnacle Publishing, Inc. To contact Pinnacle Publishing, Inc., please call 1-800-788-1900.