Tuning is a bit more art than science. No single tuning technique solves every problem, and very often techniques interact. The approach should always be methodical, including baseline and differential testing particularly when justifying capital expenditures.
Performance limitations should be viewed as a series of roadblocks. Removing one roadblock doesn’t clear the road, but simply makes it easier to drive. Not until all of the roadblocks have been removed does performance truly jump. On a recent project a customer went through the entire list of options described below, save one. Performance improved modestly, some 30% or so. But when the last roadblock was removed, the system performance immediately doubled. On the identically configured UAT (User Acceptance Test) environment they did the “obvious”; they started with the last item, the “one that made the most difference”. To their surprise, removing that obstacle made only a small difference in performance. Again, it was not until they executed the entire list that the UAT environment achieved the performance of the production system. While this pattern won’t always be the case, there is a strong argument to implement all of the recommended adjustments, and to the greatest extent possible, to achieve maximum benefit. Similarly, do not be discouraged if the first recommendation implemented doesn’t double the speed of the system. Other roadblocks are likely blocking the way.
Recommended Methodology
Begin with an “inventory” of the current system. Even for a new installation if it was created from (corporate) standard images. Various applications and system utilities (e.g., Anti-Virus software) make changes to the registry to support their operations. An adjustment calculated for performance improvement may have already been made by another installation, in fact, simply applying the adjustment as part of a registry file (.reg) could actually de-tune the system. Remember, too, that there are no “undo” operations for individual registry changes (not withstanding registry checkpoints). A careful inventory creates a rollback position if one or more changes show adverse effects.
Tuning should generally be performed from the bottom up. As engineers, it’s very tempting to attempt to target individual bottlenecks for performance improvement and attack them on an ad hoc basis. And this methodology has its place; a badly written SQL query can’t be tuned performant. But very often performance problems are merely symptoms of underlying platform issues. For example, one organization had written a query joining nineteen tables which performed very poorly. Their developers tried the usual query tuning with little or no effect, and then began a major rewrite effort to “pre-fetch” all of the information into a separate table using batch updates, etc. However, the “real” problem was neither their architecture nor their query (which had actually been well optimized). Instead, it was a single-threaded infrastructure design that serialized their entire database file I/O through a single data file. Simply changing the database implementation (covered below) allowed the very same query to run in 200 milliseconds instead of 20 seconds, irrespective of the number of parallel query executions. The cost was minimal (a SAN layout change) and the result spectacular.
Optimizing deployed applications and/or database content carries significant risk. Modified application bits usually must be recertified (a process that could span weeks of testing). And very often a change in one part of the application leads to problems (e.g., resource starvation) in another area. Changing applications absent a solid, highly performing infrastructure is usually a zero-sum game that ultimately proves futile. The changes merely mask problems instead of solving them. Far better results are obtained by increasing the size of the resource “pie” by introducing parallelism, expanding and pre-allocating resource pools, etc.; the alternative is to simply keep re-slicing it to accommodate competing and oft-changing application requirements.
The bottom-up methodology starts with the iron and works its way up to the applications as follows (in the recommended order of execution):
- Hardware – Memory, I/O, BIOS, Firmware
- Windows – Disk and Memory Management, LANMan Stack, TCP/IP Stack
- Network Infrastructure - Subnets, Devices, Drivers, Device Settings
- SQL Server Application – Cache, Locks, TEMPDB, Memory Management
- BizTalk Server (and User Application) Databases – Filegroups, LOB data
-
BizTalk Server Applications (not covered in this document) – Threading, Throttling
Approaches for each of the covered areas are detailed in this document. It’s not necessary to do everything to do anything, but it’s best to do as much as possible. Each section can stand on its own as a worthy project, including costs and benefits. The advantage of proceeding upwards through the list is minimizing the strong possibility of wasting time “fixing” a problem at a higher level when a simple lower level change would eliminate it. Nothing worse than spending a week troubleshooting an application level performance problem only to discover a simple BIOS patch (that should have been installed two months ago) is the culprit. Or that the anti-virus software profile was changed to scan all files with a particular extension, the one being used by a file splitter in a critical application that is suddenly having performance problems “for no reason”.
Testing
It is important to document the performance of the system both before and after each “section” of tuning. But testing is only effective when it means some basic requirements.
Accuracy and precision are the two most important concepts in testing, are frequently poorly understood, and even when understood are often confused. A precise test is one that is repeatable, that yields approximately the same results for the same input across multiple test runs. Precision testing is very hard to achieve in BizTalk Server (or other messaging-based) systems. Outside applications can have widely varied responses. The local system may have periodic, capacity draining scheduled jobs. Test apparatus often lacks the ability to inject messages at a consistent rate. Even basic network variations (e.g., competing load from other systems or applications) can cause an otherwise precise test to produce unusable results.
Accuracy is the degree to which a test reflects reality. The oft used analogy is that precision is the ability to hit the same spot with a dart over and over; accuracy is whether the spot is the bull's-eye or in the next room. For example, it is very important to test with adequate and representative load. But what defines adequate? A high transaction volume system can’t be tested by sending through a single transaction. But how many transactions are required? While it’s always desirable to tune a system so it passes performance requirements with room to spare (headroom), it’s equally important to ensure that the capital expenditures, potential disruption and uncertainty, and other side-effects of tuning changes justify the results when the requirement has already been met.
The usual method of testing is to create a baseline of the current system, make one or more adjustments, then retest and compare the results. But while performance testing isn’t functional testing, a system should be functionally sound before taking the baseline or generating the post-adjustment results. Processing errors can effectively invalidate the baseline and/or the post-change results, rendering them useless in terms of accuracy.
One other comment about testing, specifically regarding an issue that sometimes appears in BizTalk Server or affiliated systems, there is a significant difference between simultaneous and concurrent “users”. A good example is the phone company. Everyone with a phone is a concurrent user of the phone system simply by being connected to the network but only a small number of connections are simultaneously in use, that is, making a call at the same time. Or, while a web site may have many concurrent users (sessions), only a fraction of them is simultaneously requesting a page, posting back information, etc. For tests to be accurate, they must reflect the true ratio of concurrent to simultaneous users where applicable to the system being evaluated.
Appendix A has details on the testing performed on the recommendations below. It specifically addresses the testing issues raised in this section. There is also an excellent white paper, available for download at the BizTalk Server site, titled “Managing a Successful Performance Lab” (http://go.microsoft.com/fwlink/?LinkId=98734); it contains many helpful suggestions that can also be applied outside the lab.
Base Hardware Platform Tuning
Hardware manufacturers (Dell, IBM, HP, etc.) regularly release updates to BIOS, firmware, and drivers. Not all of these need be immediately installed, but many are critical updates that resolve performance and/or availability issues. Network driver and firmware updates in particular often add tuning parameters that tie directly to higher performance; BIOS and SAN drivers and firmware often incorporate performance improvements as well.
It is critical that hyperthreading be turned off for BizTalk Servers. This is a BIOS setting, usually found in the Processor section of the BIOS setup. Hyperthreading makes the server appear to have more processors/processor cores than it actually does; hyperthread processors typically provide between 20 and 30% of the performance of a physical processor/processor core. BizTalk Server counts the (apparent) number of processors and adjusts its self-tuning algorithms accordingly; the “false” processors cause these adjustments to be skewed and are actually detrimental to performance.
It has been reported (for example http://go.microsoft.com/fwlink/?LinkId=96724) that hyperthreading can also reduce the performance of SQL Server 2005. If hyperthreading is to be used with SQL Server applications, it should be performance tested to ensure it is actually improving performance.
Windows Tuning – (10-20% or Greater Performance Improvement)
Microsoft Windows Server® comes out of the box configured as a file server, and a relatively low-bandwidth one at that. The TCP/IP stack is single threaded, the GUI has at least some of the high-presentation (and high resource consumption) features enabled, etc. Presumably this is the most common usage of the platform, but interestingly enough nearly all of the layered server applications (e.g., BizTalk Server, SQL Server, IIS, and Windows SharePoint® Services) are intended for exactly the opposite paradigm. For those, Windows Server should be configured for network applications, high network throughput and parallelism, and no kernel-hogging “fancy” graphics. Realigning Windows for network operations can yield significant performance results with no other cost than a reboot.
Disk Infrastructure
Somewhere in the world there is guidance advising creation of a three-disk RAID 5 configuration that is then Windows partitioned into separate logical volumes for the SYSTEM drive, PAGEFILE, and DATA drives; it must exist because it is impossible that so many systems could be configured this way by accident. Under no circumstances should this be the configuration for a high-performance server. In fact, the terms high-performance and RAID 5 are oil and water; they just don’t mix. For more details on why RAID 5, or any RAID that doesn’t “start” with 1 (e.g., RAID 1, RAID 1+0, etc.) should be avoided at all costs, consult Battle Against Any Raid Five. This site lists all of the ignoble ways RAID 5 steals system performance, as much as 20% or more. If the urge (aka, “corporate policy”) to use RAID 5 is too strong to resist, use the hardware controller to carve the requisite number of drives from the available storage as separate LUNs rather than using Windows partitions. This way the configuration will violate only two rather than all three of the Windows best practices for system partitions (especially those for the PAGEFILE):
-
Put the PAGEFILE on a separate physical volume.
-
Avoid RAID 5 or software RAID 1 (H/W RAID 1 mirrors, for availability, are acceptable and encouraged for High Availability environments).
-
Avoid Windows partitions.
Many corporations create separate logical drives for the Windows O/S (managed by one group) and application software (managed by one or more other groups). This does not by itself present a significant performance issue, so long as the Windows PAGEFILE is not placed on one of the partitions but rather a separate local (not SAN/NAS) physical spindle or mirror set. Be sure there are sufficient drive letters left to perform the remainder of the SQL Server infrastructure tuning (see below).
The suggested minimum configurations for local disks on various types of servers (these configurations all assume a SAN/NAS for the data disks, where needed):
- BizTalk Server
-
SYSTEM
-
PAGEFILE
-
STREAMING (if used, otherwise omit; may also be combined with MAPPING)
-
MAPPING (if used, otherwise omit; may also be combined with STREAMING)
- SharePoint Portal and/or Windows SharePoint Services/MOSS
- SQL Server/Analysis Services/Reporting Services
- IIS
-
SYSTEM
-
PAGEFILE
-
CONTENT (if serving web pages)
For BizTalk Servers that have applications with a heavy use of streaming XML, the temporary data store should be placed on a separate physical volume/mirror set. The same is true for BizTalk Servers that employ large maps. Both of these mechanisms currently use the GetTempPath functionality (searching, in order, the path specified by the TMP environment variable, the path specified by the TEMP environment variable, the path specified by the USERPROFILE environment variable, and the Windows directory). If a separate volume is used, the TMP environment variable for the Windows account context in which the application host runs must be changed to use that volume.
General Performance
There are a number of basic Windows Server settings that can be set using the Windows GUI. Defragmenting the PAGEFILE and pre-allocating the MFTs (Master File Tables) of each disk requires a third-party tool such as fully-enabled Diskeeper (rather than the Diskeeper Lite that comes with Windows Server). There are also some useful defragmenting tools (e.g., PageFileDefrag) available at the Microsoft SysInternals website.
PAGEFILE
| |
Difficulty Level
|
Action
|
Comment
|
| | | Placed on a separate local (not SAN/NAS) Physical Drive | Do not place on a Windows partition |
| | | Defragmented (no more than two fragments, preferably one) | Requires a third party tool |
| | | Fixed size (not system managed) | Prevents future fragmentation |
UI Performance
| |
Difficulty Level
|
Action
|
Comment
|
| | | All accounts and default set to minimize UI impact | GUI impact is Windows kernel cycles, so minimize it |
Server Options
| |
Difficulty Level
|
Action
|
Comment
|
| | | Configure server to favor background services | This is a server, not a desktop |
| | | Configure server to favor Programs | Unless building a file server, favor programs over system cache |
Network Options
| |
Difficulty Level
|
Action
|
Comment
|
| | | Maximize data throughput for network applications | Unless building a file server |
Disk Optimization
| |
Difficulty Level
|
Action
|
Comment
|
| | | Defragment all disks (local and SAN/NAS) | Use a third party tool if more than a few disks |
| | | Defragment MFTs and pre-allocate to appropriate size | Defragment with Windows defragmenter, pre-allocate with a third-party tool |
Anti-virus Software
Set the Anti-virus software to scan only on file open or creation, not modification.
| |
Difficulty Level
|
Action
|
Comment
|
| | | Disable real-time scanning on SQL Server files (.mdf, .ndf, .ldf, .mdb) | |
| | | Disable real-time scanning on non-executable file types in BizTalk Server file-drops (usually .XML, but can also be .csv, etc.) | |
Intrusion Detection
Network traffic scanning can cause connections to incorrectly fail, resulting in possible dropped messages and poor performance.
| |
Difficulty Level
|
Action
|
Comment
|
| | | Disable network scanning between BizTalk Server and SQL Server | |
| | | Disable network scanning between BizTalk Server and interface servers (e.g., WebsphereMQ) | |
Registry Settings
The following registry settings should be applied to the BizTalk Server, SQL, IIS, SharePoint Services, etc. according to the comments for each. Note the difficulty/risk indicator for each, and ensure the appropriate resource understands the ramifications of each setting accordingly.
HKLM\SYSTEM\CurrentControlSet\Control\ Session Manager\Memory Management
| |
Difficulty Level
|
Value Name
|
Value Name
|
Comment
|
| | | DisablePagingExecutive | 1 | Prevent Windows from paging itself out |
| | | SystemPages | 0xFFFFFFFF | Mitigates heavy kernel usage |
HKLM\System\CurrentControlSet\ Services\LanmanServer\Parameters
| |
Difficulty Level
|
Value Name
|
Value Name
|
Comment
|
| | | IRPStackSize | +10 (Decimal) | Use DWORD 0x20 if not present, DWORD 0x32 if heavy usage of administrative shares or anti-virus S/W |
| | | SizReqBuf | 0x4000 (16384) | |
HKLM\SOFTWARE\Microsoft\ MSMQ\Parameters
| |
Difficulty Level
|
Value Name
|
Value Name
|
Comment
|
| | | IgnoreOSNameValidation | 1 | Set this value on clustered MSMQ servers that are not supporting NLB-balanced queues on multiple hosts. Allows access to an aliased clustered queue by the BizTalk Adapter. For more information, see http://support.microsoft.com/kb/899611. |
HKLM\SYSTEM\CurrentControlSet\ Services\Tcpip\Parameters
| |
Difficulty Level
|
Value Name
|
Value Name
|
Comment
|
| | | DefaultTTL | 0x40 (64) | Reduce the time dead packets stay on the network |
| | | EnablePMTUDiscovery | 1 | Beware of this if server is directly exposed to potential attackers (incoming web connections) |
| | | EnablePMTUBHDetect | 1 | Important if the system is using a SOAP or HTTP adapter and/or initiating web connections to other systems |
| | | TcpMaxDupAcks | 2 | Wait for two duplicate acknowledgements before resending packets with lagging sequence numbers |
| | | Tcp1323Opts | 1 | If experiencing high packet loss/retransmits, set to 3 |
| | | SACKOpts | 1 | Selective Acknowledgements, VERY important for large TCP Window sizes |
| | | TcpWindowSize (1Gb network only) | 0x591C8 (365000) | Default is 17x MSS (Max Segment Size) for 100Mb |
| | | MaxFreeTcbs | 0x5000 (20480) | Increase the number of TCP Control Blocks to eliminate delays when creating connections |
| | | TcpMaxSendFree | 0xFFFF (65535) | Helps improve performance under high network loads |
| | | MaxHashTableSize | 0xFFFF (65535) | Maximize the size of the TCP hash table (stores TCP connection information) |
| | | MaxUserPort | 0xFFFF (65535) | Prevents port exhaustion |
| | | TcpTimedWaitDelay | 0x1E (30) | Release unused ports as quickly as possible |
| | | GlobalMaxTcpWindowSize | Depends on port speed | 0xFFFF (65535) for 100Mb, 0xFFFFFFFF for 1Gb networks |
| | | NumTCBTablePartitions | Depends on processor core count | Four per processor/processor core, do not include hyperthreading |
| | | TcpAckFrequency | Depends on port speed | 5 for 100Mb, 13 for 1Gb - Requires Windows Server 2003 Hotfix 815230 if going to be set to zero - can also be set at the interface level if mixed speeds; only set for connections primarily processing data |
| | | SynAttackProtect | 0 | Set this on all servers of any type that are not directly exposed to the internet |
HKLM\SYSTEM\CurrentControlSet\Services\AFD\Parameters
AFD is the Windows File Descriptor management – setting these values allows Windows to vary the size of the socket file descriptor backlog to accommodate “rushes” of connection attempts; it is also used as a TCP/IP stack hardening technique as it compensates for DOS (Denial Of Service) attacks that attempt to start a large number of false connections.
| |
Difficulty Level
|
Value Name
|
Value Name
|
Comment
|
| | | EnableDynamicBacklog | 1 | |
| | | MinimumDynamicBacklog | 0xc8 (200) | |
| | | MaximumDynamicBacklog | 0x4e20 (20000) | |
| | | DynamicBacklogGrowthDelta | 0x64 (100) | |
HKLM\CurrentControlSet\ Control\FileSystem
| |
Difficulty Level
|
Value Name
|
Value Name
|
Comment
|
| | | ContigFileAllocSize | 0x40 (64) | Adjusts the minimum contiguous file allocation to 64KB; this is generally only for database servers, and should be used with great caution as it can cause servers with heavy small file traffic to quickly fill disks. On the positive side, it significantly reduces fragmentation of data files by forcing larger contiguous allocations. |
HKLM\SYSTEM\CurrentControlSet\ Control\SecurityProviders\SCHANNEL
HKLM\CurrentControlSet\ Services\InetInfo\Parameters
HKLM\CurrentControlSet\Services\ lanmanserver\Parameters
| |
Difficulty Level
|
Value Name
|
Value Name
|
Comment
|
| | | DisableStrictNameChecking | 1 | Set these on all SQL Servers to allow access to SQL Server Analysis Services using Application Names (aliases) |
HKLM\SYSTEM\CurrentControlSet\Control\Lsa
| |
Difficulty Level
|
Value Name
|
Value Name
|
Comment
|
| | | DisableLoopbackCheck | 1 | Set these on all SQL Servers to allow access to SQL Server Analysis Services using Application Names (aliases). |
Example - 100Mb Network SQL Server Registry merge file
Windows Registry Editor Version 5.00
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Lsa]
“DisableLoopbackCheck”=dword:00000001
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Network\Connections\StatMon]
"ShowLanErrors"=dword:00000001
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management]
"DisablePagingExecutive"=dword:00000001
"SystemPages"=dword:ffffffff
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\AFD\Parameters]
"EnableDynamicBacklog"=dword:00000001
"MinimumDynamicBacklog"=dword:000000c8
"MaximumDynamicBacklog"=dword:00004e20
"DynamicBacklogGrowthDelta"=dword:00000064
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\lanmanserver\parameters]
"IRPStackSize"=dword:00000020
"SizReqBuf"=dword:00004000
“DisableStrictNameChecking”=dword:00000001
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters]
"DefaultTTL"=dword:00000040
"EnablePMTUBHDetect"=dword:00000001
"EnablePMTUDiscovery"=dword:00000001
"TcpMaxDupAcks"=dword:00000002
"Tcp1323Opts"=dword:00000001
"SACKOpts"=dword:00000001
"MaxFreeTcbs"=dword:00005000
"TcpMaxSendFree"=dword:0000ffff
"MaxHashTableSize"=dword:0000ffff
"MaxUserPort"=dword:0000ffff
"TcpTimedWaitDelay"=dword:0000001e
"GlobalMaxTcpWindowSize"=dword:0000ffff
"NumTCBTablePartitions"=dword:<4 * number of processors/processor-cores>
"TcpAckFrequency"=dword:00000005
"SynAttackProtect"=dword:00000000
Example - 1Gb Network SQL Server Registry merge file
Windows Registry Editor Version 5.00
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Lsa]
“DisableLoopbackCheck”=dword:00000001
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Network\Connections\StatMon]
"ShowLanErrors"=dword:00000001
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management]
"DisablePagingExecutive"=dword:00000001
"SystemPages"=dword:ffffffff
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\AFD\Parameters]
"EnableDynamicBacklog"=dword:00000001
"MinimumDynamicBacklog"=dword:000000c8
"MaximumDynamicBacklog"=dword:00004e20
"DynamicBacklogGrowthDelta"=dword:00000064
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\lanmanserver\parameters]
"IRPStackSize"=dword:00000020
"SizReqBuf"=dword:00004000
“DisableStrictNameChecking”=dword:00000001
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters]
"DefaultTTL"=dword:00000040
"EnablePMTUBHDetect"=dword:00000001
"EnablePMTUDiscovery"=dword:00000001
"TcpMaxDupAcks"=dword:00000002
"Tcp1323Opts"=dword:00000001
"SACKOpts"=dword:00000001
"MaxFreeTcbs"=dword:00005000
"TcpMaxSendFree"=dword:0000ffff
"MaxHashTableSize"=dword:0000ffff
"MaxUserPort"=dword:0000ffff
"TcpTimedWaitDelay"=dword:0000001e
"GlobalMaxTcpWindowSize"=dword:ffffffff
"NumTCBTablePartitions"=dword: <4 * number of processors/processor-cores>
"TcpAckFrequency"=dword:0000000D
"SynAttackProtect"=dword:00000000
"TcpWindowSize"=dword:000591C8
Example - 100Mb Network BizTalk Server Registry merge file
Windows Registry Editor Version 5.00
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Network\Connections\StatMon]
"ShowLanErrors"=dword:00000001
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\AFD\Parameters]
"EnableDynamicBacklog"=dword:00000001
"MinimumDynamicBacklog"=dword:000000c8
"MaximumDynamicBacklog"=dword:00004e20
"DynamicBacklogGrowthDelta"=dword:00000064
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management]
"DisablePagingExecutive"=dword:00000001
"SystemPages"=dword:ffffffff
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\lanmanserver\parameters]
"IRPStackSize"=dword:00000020
"SizReqBuf"=dword:00004000
"DisableStrictNameChecking"=dword:00000001
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters]
"DefaultTTL"=dword:00000040
"EnablePMTUBHDetect"=dword:00000001
"EnablePMTUDiscovery"=dword:00000001
"TcpMaxDupAcks"=dword:00000002
"Tcp1323Opts"=dword:00000001
"SACKOpts"=dword:00000001
"MaxFreeTcbs"=dword:00005000
"TcpMaxSendFree"=dword:0000ffff
"MaxHashTableSize"=dword:0000ffff
"MaxUserPort"=dword:0000ffff
"TcpTimedWaitDelay"=dword:0000001e
"GlobalMaxTcpWindowSize"=dword:0000ffff
"NumTCBTablePartitions"=dword:<4 * number of processors/processor-cores>
"TcpAckFrequency"=dword:00000005
"SynAttackProtect"=dword:00000000
Example - Registry merge file
Use the following code for registry merge file to apply to any server connecting to SQL Server Analysis Services using an alias where the connection fails with a “Network Name not found” error (or similar).
Windows Registry Editor Version 5.00
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\lanmanserver\Parameters]
“DisableStrictNameChecking”=dword:00000001
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Lsa]
“DisableLoopbackCheck”=dword:00000001
Example - 1Gb Network BizTalk Server Registry merge file
Windows Registry Editor Version 5.00
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Network\Connections\StatMon]
"ShowLanErrors"=dword:00000001
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\AFD\Parameters]
"EnableDynamicBacklog"=dword:00000001
"MinimumDynamicBacklog"=dword:000000c8
"MaximumDynamicBacklog"=dword:00004e20
"DynamicBacklogGrowthDelta"=dword:00000064
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management]
"DisablePagingExecutive"=dword:00000001
"SystemPages"=dword:ffffffff
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\lanmanserver\parameters]
"IRPStackSize"=dword:00000020
"SizReqBuf"=dword:00004000
"DisableStrictNameChecking"=dword:00000001
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters]
"DefaultTTL"=dword:00000040
"EnablePMTUBHDetect"=dword:00000001
"EnablePMTUDiscovery"=dword:00000001
"TcpMaxDupAcks"=dword:00000002
"Tcp1323Opts"=dword:00000001
"SACKOpts"=dword:00000001
"MaxFreeTcbs"=dword:00005000
"TcpMaxSendFree"=dword:0000ffff
"MaxHashTableSize"=dword:0000ffff
"MaxUserPort"=dword:0000ffff
"TcpTimedWaitDelay"=dword:0000001e
"GlobalMaxTcpWindowSize"=dword:0000ffff
"NumTCBTablePartitions"=dword:<4 * number of processors/processor-cores>
"TcpAckFrequency"=dword:0000000D
"SynAttackProtect"=dword:00000000
"TcpWindowSize"=dword:000591C8
IIS Settings
IIS has many tunable parameters, and these are changed/enhanced with each new version of the server. The following is a set of performance and BizTalk Server related IIS parameters and their recommended values for BizTalk Server deployments. Some of the parameters are settable for IIS 5.0 only; these are indicated in the comment section.
Note |
|---|
|
Many of these parameters are very important for secondary systems (those referenced by BizTalk Server), especially when web services are being used. If the BizTalk Servers are hosting web pages (not recommended), high message load can cause web timeouts and other conditions also addressed by modifying the values of these parameters. |
ADSUtil Tool
| |
Difficulty Level
|
Action
|
Comment
|
| | | ASPThreadGateEnabled | IIS 5.0 or earlier only: Allows IIS to self-tune based on current load. Test web content performance with this setting turned on. |
| | | ASPThreadGateLow | IIS 5.0 or earlier only. By default this value is 50%. Higher values may improve performance. Test values above 50 when ASPThreadGateEnabled is turned on. |
| | | AspProcessorThreadMax | For guidance on this setting, see http://support.microsoft.com/kb/238583/EN-US/. Set per instructions to maximize processor usage. Be especially careful with this setting if hosting web content on a BizTalk Server, or the BizTalk application process(es) can become processor starved. |
| | | AspScriptEngineCacheMax | The default is 128, but higher values can improve performance. |
| | | AspScriptFileCacheSize | The goal for web page-based content is 90% cache hits. Monitor the ASP Cache Hit Ratio PerfMon stat to check the hit ratio, and experiment with increasing this value (increasing the cache size) to achieve the highest ratio possible based on the content. |
| | | MaxEndpointConnections | The default for this setting is 100. Set this value to 500 or higher if connect failures or blank screens occur when requesting web content. For more information about this setting, see http://support.microsoft.com/kb/229814/. |
| | | ServerListenBacklog | This setting is used with MaxEndpointConnections; the default value is based on the ServerSize parameter. If IIS connections fail try increasing this value. For more information about this setting, see http://support.microsoft.com/kb/q229814/. |
| | | ServerSize | Be sure this is set to a size appropriate to the load (the default is medium load). Test with a value of 2 (high usage) if appropriate. Note |
|---|
|
The value of this setting affects the default values for a number of other settings (see above). |
|
| | | HcDoDynamicCompression | Enable compression of dynamic web content at the request of the invoking web client; test performance with compression enabled. |
| | | HcDoStaticCompression | Test with compression enabled. Enable compression of static web content at the request of the invoking web client; test performance with compression enabled. |
| | | AppAllowDebugging | Set to False on production systems; debugging is very expenseive in terms of resources and also poses a significant security risk. |
| | | AspBufferingOn | Set this parameter to True Tuning. For more information, see http://go.microsoft.com/fwlink/?LinkID=97969. |
| | | AspQueueConnectionTestTime | Test higher value if connections are dropped prematurely. For more information, see http://go.microsoft.com/fwlink/?LinkID=97970. |
| | | AspRequestQueueMax | Increase this value if “Server Busy” messages are returned for web requests. This will often occur on secondary systems (systems referenced by BizTalk Server) but may also occur on BizTalk Server systems hosting web services or web pages (the latter is not recommended). |
| | | AspDiskTemplateCacheDirectory | By default this is set to %windir%\system32\inetsrv\ASP. Change the value to move compiled content templates to the dedicated IIS content drive (see above). |
| | | AspSessionTimeout | Increase the default value if user sessions time out prematurely (this is not an issue for web service hosting and is only effective if sessions are being used). |
| | | ASPSessionMax | Set this to unlimited on dedicated IIS servers, but use care if hosting sessions on a BizTalk server. |
| | | ASPTrackThreadingModel | Set this to FALSE per Microsoft IIS Best Practices (http://go.microsoft.com/fwlink/?LinkId=98089). If set to TRUE, be sure that all applications bind the FreeThreadedMarshaller class. A TRUE setting also consumes significant Windows system/kernel resources. |
| | | CacheISAPI | Set this to TRUE. BizTalk Server uses ISAPI for the http and SOAP interfaces. It will normally be turned on (the default) but “standard” images occasionally turn it off to conserve memory. |
IIS MMC Snap-in Tool
| |
Difficulty Level
|
Action
|
Comment
|
| | | Application Pool Accounts set to appropriate privilege levels | BizTalk Server creates many application pools depending on the configured services. Be sure the accounts for these have correct privileges, especially the pools associated with BAS. |
| | | Extraneous default pages removed | The IIS looks for Default.aspx, Default.asp, Index.htm, etc. Delete all but the one actually being used (may not be any of these). Note |
|---|
|
Some web sites don’t publish a default web page for security reasons. |
|
| | | Web Connection Limit set appropriately | Base this on the size of the expected load. Excess connections waste resources and are a potential security hole. |
| | | Process startup and shutdown time limits set appropriately to application type | The default shutdown timeout after which an ASP worker process is forcibly terminated is very low. Ensure that worker processes servicing long-running requests (e.g., database transactions) have an adequate shutdown time. |
| | | Number of ASP Worker Processes set to at least two | This provides redundancy in case a worker process crashes (the site/service doesn’t go down). |
| | | Application Pool Queue-Length Limit set appropriately | Set this to a higher value if 503 errors are returned for web page/service requests. |
| | | Keepalives Enabled set to TRUE | Discard dead connections quickly; this also helps mitigate DOS attacks. |
| | | ASP.NET version set to current .NET framework version | Make sure the ASP.NET version is set to 2.x for BizTalk Server 2006, and appropriately for all other services and web pages. |
Example - Script to print current state of important IIS performance parameters
@ECHO OFF
ECHO.
SETLOCAL
IF "%1"=="" GOTO :DefaultPath
SET INetPubPath=%1%
:DefaultPath
SET INetPubPath=%SYSTEMDRIVE%
:CheckPath
IF EXIST "%INetPubPath%\inetpub\adminscripts\ADSUTIL.VBS" GOTO :GetSiteList
ECHO The specified inetpub path is invalid or the ADSUtil.vbs Utility script was not found in the adminscripts subdirectory
ECHO.
GOTO :Done
:GetSiteList
REM IF "%2"==""
ECHO Configured Web Sites
ECHO -----------------------------
CSCRIPT //NoLogo "%INetPubPath%\inetpub\adminscripts\ADSUTIL.vbs" ENUM W3SVC /P
ECHO.
REM CALL :PrintParameter ASPThreadGateEnabled
REM CALL :PrintParameter ASPThreadGateLow
CALL :PrintParameter ASPProcessorThreadMax
CALL :PrintParameter ASPScriptEngineCacheMax
CALL :PrintParameter ASPScriptFileCacheSize
CALL :PrintParameter MaxEndpointConnections
CALL :PrintParameter ServerListenBacklog
CALL :PrintParameter ServerSize
CALL :PrintParameter Filters/Compression/gzip/HcDoDynamicCompression
CALL :PrintParameter Filters/Compression/gzip/HcDoStaticCompression
CALL :PrintParameter AppAllowDebugging
CALL :PrintParameter ASPBufferingOn
CALL :PrintParameter ASPQueueConnectionTestTime
CALL :PrintParameter ASPRequestQueueMax
CALL :PrintParameter ASPDiskTemplateCacheDirectory
CALL :PrintParameter ASPSessionTimeout
CALL :PrintParameter ASPSessionMax
CALL :PrintParameter ASPTrackThreadingModel
CALL :PrintParameter CacheISAPI
:Done
ENDLOCAL
GOTO :EOF
:PrintParameter
ECHO %1%
ECHO -----------------------------
ECHO W3SVC 1> ASPParmTest.tmp
CSCRIPT //NoLogo "%INetPubPath%\inetpub\adminscripts\ADSUTIL.vbs" GET W3SVC\%1% 1>>ASPParmTest.tmp 2>NUL
IF %ERRORLEVEL%==0 TYPE ASPParmTest.tmp
ECHO. 1>ASPParmTest.tmp
ECHO W3SVC/1/ROOT 1>> ASPParmTest.tmp
CSCRIPT //NoLogo "%INetPubPath%\inetpub\adminscripts\ADSUTIL.vbs" GET W3SVC\1\ROOT\%1% 1>>ASPParmTest.tmp 2>NUL
IF %ERRORLEVEL%==0 TYPE ASPParmTest.tmp
DEL /Q /S ASPParmTest.tmp 1>NUL 2>NUL
@ECHO.
GOTO :EOF
Network Infrastructure Tuning (15-35% Performance Improvement)
When asked to provide high throughput networking for BizTalk Server systems, these are some of the common responses of network engineers:
-
“A 1Gb network should be enough bandwidth to handle any load.”
-
“The 100Mb network isn’t close to saturated, so a 1Gb network is unnecessary.”
Both of the above statements are true enough on their face, but answer the wrong (and hopefully unasked) questions. For example, while it’s true that 100Mb networks are rarely saturated, what’s required by BizTalk Server is speed of communications. Imagine two cars on a trip. One is traveling 10 mph and the other 100mph. Even though the road isn’t “saturated”, the second car is going to arrive at its destination 10 times earlier than the first.
Now consider the same two cars, this time driving at the same speed, but on a single lane highway. No matter how fast they go, the second car is always going to arrive at least one car-length behind the first. Add more cars, and the per-car arrival delay continues to increase. But take the same set of cars and put them on a TWO lane road, and the per-car delays are cut in half. On a three lane road, the delay is one-third that of the single lane road. And so forth. No matter how fast the speed limit, the single lane road is a serialization mechanism.
Similarly, a single network connection, no matter how fast, bit-wise serializes network traffic. Even though the bandwidth is sufficient to handle the total quantity of traffic, the per-packet delay increases significantly as the number of packets increases.
High network performance for Windows-based systems requires three things, buffering (preventing network congestion), offloading (removing the processing from the host CPUs) and parallelism (subnets). BizTalk Server, perhaps more than any other Microsoft product, is an abuser of networks because of its highly parallel nature and horizontal scalability. The following sections discuss in detail how to mitigate the network bottlenecks that can severely hamper BizTalk Server performance.
Network Device Settings
Windows Server has built-in network packet offloading, and top-end NICs (Network Interface Cards) include drivers that not only use this capability, but provide many other high-performance adjustments. Unfortunately, many if not all of these features are turned off by default because they consume memory from the paged or non-paged pools of the Windows kernel memory. Some of the registry settings from the previous section allow Windows to allocate kernel memory appropriately and hence enable use of these features to enhance network performance (again, the advantage of bottom-up tuning). What’s more, this performance enhancement is free, save the time to turn the features on and reboot the server. On highly loaded servers, it is not uncommon to see 20-50% network performance improvement simply by enabling these features. The tests performed for this project (see Appendix A) showed results of approximately 25% improvement.
Note |
|---|
|
Adjust the network device drivers to maximize the amount of memory available for packet buffering, both incoming and outgoing. Also maximize buffer counts, especially transmit buffers/coalesce buffers. The default values for these parameters, and whether they are even provided, varies between manufacturers and driver versions.
The goal is to maximize the work done by the network interface card on-board hardware, and to allow the greatest possible buffer space for network operations to mitigate traffic bursts.
It is very important that the network speed, duplex, and flow control have fixed settings which correspond to the settings on the switch to which they are connected. This helps ensure the settings match and avoids periodic “auto-synchronization” that takes the connection off-line. |
| |
Difficulty Level
|
Setting
|
Comment
|
| | | Power Management Disabled | Do not allow the network card to be turned off to save power. |
| | | Fixed Speed/Duplex (do not use AUTO) | See Note preceding table. |
| | | Max Coalesce Buffers | See Note preceding table. |
| | | Max Transmit/Send Buffers | See Note preceding table. |
| | | Max Transmit/Send Descriptors | See Note preceding table. |
| | | Max Receive Buffers | See Note preceding table. |
| | | Max Receive Descriptors | See Note preceding table. |
| | | All offload options ON | See Note preceding table. |
| | | Wake On LAN disabled (unless being used) | See Note preceding table. |
| | | Flow Control Explicitly Set (TX and RX, not AUTO) | See Note preceding table. |
Subnets
Subnetting is perhaps the least used yet most effective performance enhancement for a BizTalk Server system. Typical BizTalk Server deployments often include only one subnet for the entire system. Consider that this single subnet must carry the inbound message traffic, the traffic from BizTalk Server to SQL Server to store each message in the message box, the traffic for BizTalk Server to retrieve the message from the message box for orchestration or send operations, the traffic between the message box and the tracking database (in a multi-server environment), the traffic between the tracking database and the BAM (Business Activity Monitoring) database(s), etc. Not to mention the administration traffic, remote desktop traffic, backup traffic, download traffic (anti-virus definitions, etc.). Remembering that networks are effectively bitwise serialization mechanisms, small wonder that additions of subnets and maximizing subnet speeds can result in orders of magnitude performance improvement.
A recent TAP (Technology Adoption Program) lab for a large financial service customer was having difficulty meeting its target SLA (Service Level Agreement). The lab had many BizTalk Server and SQL Servers, distributing the processing load as prescribed in the various white papers on the subject, yet the individual servers were significantly underutilized. Unfortunately, the configuration was using a single subnet for all of its traffic, and the processing servers simply could not receive and send enough messages to stay busy. By adding subnets, and with no other changes to the configuration, a 300% performance gain was achieved and the SLA was met.
Whenever possible, all BizTalk Server subnets should be 1Gb or higher nominal speed. If only 100Mb capability is available, increasing the number of subnets can compensate for the slower nominal speed. Please note that subnetting is definitely not “free”; there is a cost for network cards, cabling, and possibly switch or other network device space. Plan subnets carefully, and with the exception of the DATA subnet, only request them when expecting larger message traffic loads. The DATA subnet is recommended for all BizTalk Server systems of any significant size. Lab testing for this project shows a close parallel between message traffic into BizTalk Server and resulting BizTalk Server to SQL Server traffic. The DATA subnet, dedicated between BizTalk Server and SQL servers, separates this traffic and allows parallel processing and significant performance improvement. For more information, see Appendix A.
Isolating different types of message traffic on separate subnets has several advantages. Traffic types with high performance requirements can be allocated increased bandwidth using teaming (multiple physical connections acting as a single, load-balanced subnet). Often different traffic types have varying QoS (Quality of Service) and high-availability requirements, which can also be easily addressed with multiple subnets.
The following table lists a number of possible subnets for high-performance BizTalk Server systems in the recommended order of implementation. After the DATA subnet, the next priority subnet can change based on the load characteristic of the system, particularly with respect to the load between system components (e.g., BizTalk Server to a backend server vs. BizTalk Server to an external message source/sink).
| |
Difficulty Level
|
Subnet Name
|
Purpose
|
| | | DATA | Dedicated subnet between BizTalk Server and SQL Servers (1Gb recommended). |
| | | MGMT | Dedicated subnet to all servers to carry management traffic (Remote Desktop, downloads, etc.); this is separate from the remote management traffic carried by manufacturer-specific mechanisms such as HP iLO (Integrated Lights Out) or DELL Remote Management Services. |
| | | MSG | One or more dedicated subnets between BizTalk Server and various messaging servers/sources (MSMQ or WebsphereMQ, HL7, SWIFT, Web Services, etc.). |
| | | BACKEND | One or more dedicated subnets for communication with various “backend” servers; these may connect the backend servers to intermediate communications servers (e.g., WebsphereMQ) or may be direct connections into BizTalk Server. |
| | | BAM | Dedicated subnet to move traffic from BizTalk Server message and tracking databases to the BAM databases (multi-server configuration only). |
| | | PORTAL | Dedicated subnet to manage BAS, BAM, and other portal traffic. |
A final note regarding subnets. If at all possible every BizTalk Server-connected subnet should be extended to each BizTalk server in the configuration. This technique facilitates easy reconfiguration of the system to accommodate additional host instances for greater processing, availability, recovery or other needs without requiring physical reconfiguration.
SQL Server Tuning (40-100% Performance Improvement)
BizTalk Server possibly puts more stress on a SQL Server deployment than any other application ever created. The stress is both memory and I/O based, and is exacerbated by the massively parallel, horizontally scalable architecture of the BizTalk Server product.
Many of SQL Server’s features are designed for “self-tuning”, for example data files “autogrow” by default, memory is dynamically managed by default, the size of the lock table is set automatically by default, etc. BizTalk Server functions in this paradigm but its performance is far from optimal, particularly with multiple BizTalk Server computers.
Pre-allocation and parallelism are the two “Ps” of tuning SQL Server for BizTalk Server support. Pre-allocation means that as many resources as possible are pre-configured to their observed maximum size, including memory, files, locks, etc. Parallelism, especially I/O parallelism, is implemented by multi-threading databases and enhancing system artifacts to take advantage of the highly-parallel SMP servers.
The techniques listed below can be used on any high-performance SQL Server implementation, and should certainly be used for application databases that directly or indirectly support BizTalk Server processes. Because of its highly scalable nature, BizTalk Server commonly overruns the systems with which it interfaces; these systems should be optimized as much as possible to improve the performance of the entire system. The lab work done in support of this document did not include application databases on either local or remote systems because of the imprecision they introduce into the measurement process; in field deployments tuning applied to those databases often yields large performance improvements.
Many of the techniques listed are “free”, that is, they require no additional hardware or software to implement. The exception is the manipulation of database storage, which may require additional disk capacity and can take a significant amount of time to implement well, particularly for application (non-BizTalk Server) databases.
Use of these techniques improved performance in the tests for this project by just over 30%, but many field systems have demonstrated higher returns on investment in this type of tuning, some as much as 100%. A small number of very high-performance systems have required this tuning, especially lock table tuning, to function at all.
Memory Management
Pre-allocating SQL Server memory has many advantages. The overhead of dynamic management, particularly under high stress growth, is avoided. Pre-allocated memory can be significantly less fragmented than memory that grows in small increments. In situations where multiple SQL services are installed (e.g., the SQL Server database engine running alongside Analysis Services, Reporting Services and/or Notification Services) it is essential to limit the amount of memory taken by SQL Server. There are three types of memory allocations important to BizTalk Server with SQL Server systems, locks, cache, and the SQL Server process working set.
Locks
Setting the lock table (rather than allowing it to grow dynamically) often improves performance more than any other memory pre-allocation. BizTalk Server systems commonly generate 500,000 or more lock requests per second. The number of requests increases considerably as additional BizTalk servers are deployed in a system. The default number of locks for SQL Server is 5000, which is adequate and/or easily grown for many types of applications. But the path from 5000 to the allocation required to support BizTalk Server-generated loads is not a pretty one, particularly considering the intensity with which the lock requests are made. BizTalk Server systems often experience excessive lock waits and timeouts as the database engine struggles to keep up with the demand.
To avoid these issues, pre-allocate the lock table. This is not a risk-free strategy. SQL Server treats the pre-allocation as a fixed size rather than a base value from which to grow. As applications and load are added to the BizTalk Server system, the number of lock requests typically grows and the pre-allocation needs to be resized accordingly; this implies monitoring of the SQL Server lock performance counters (see Appendix A).
Lock pre-allocation must be done before determining the cache size configuration, as it reduces available memory. Also, there is a significant difference between allocation in SQL Server 2000 and SQL Server 2005. In SQL Server 2000, the number of locks specified (using either the Enterprise Manager GUI or the sp_configure stored procedure) is the total allocated. In SQL Server 2005, to increase parallelism in the lock table, locks are allocated by NUMA (Non-Uniform Memory Architecture) node. When upgrading from SQL Server 2000 to SQL Server 2005, there is no warning of this change in SSMS (SQL Server Management Studio), the SQL Server 2005 Upgrade Advisor or the SQL Server installer, so it’s very possible that parameters which worked perfectly well in SQL Server 2000 will cause SQL Server 2005 to fail to start. The specified lock count value is allocated for each NUMA node. For example, if 1,000,000 locks are specified on a system with 3 NUMA nodes, 3,000,000 locks are actually allocated. The number of NUMA nodes recognized by SQL Server is determined using a special query.
Example - Determining number of NUMA (Non-Uniform Memory Architecture) nodes on server for SQL Server 2005 lock allocation
SELECT @NUMACount = COUNT(DISTINCT parent_node_id)
FROM sys.dm_os_schedulers
WHERE parent_node_id <> 32 --32 is the DAC scheduler
Divide the number of locks desired by the number of NUMA nodes to determine the allocation value for sp_configure (or to be set using SSMS).
To determine the total number of locks required, leave the lock allocation at zero (the default) and monitor the total number of requests for a large load representative of the expected load. Use the SQL Server Locks object counters in PerfMon to obtain this count. A good rule of thumb for BizTalk Server is to set the allocation at 15-20% above the peak number of lock requests; this leaves room for growth without allocating an excessive number of locks.
Setting the lock allocation is an iterative process. The first preallocation often results in a significant performance improvement, which then increases the number of lock requests requiring recalibration and another, higher preallocation value. Also, the RECONFIGURE command, while normally applying changes to the SQL Server configuration immediately, does not work for changes to the number of locks (even though the command output indicates that reconfiguration was successfully completed). For the change to take effect, the SQL Server service must be restarted.
| |
Difficulty Level
|
Action
|
Purpose
|
| | | Determine NUMA Nodes (SQL Server 2005). | The total number of locks allocated is multiplied by the number of NUMA nodes present in the server. |
| | | Use PerfMon to determine the number of lock requests/second while running a representative load. | The number of requests is an easy way to determine the correct preallocation size. |
| | | Set the number of locks at 15-20% above the observed maximum number of lock requests/second. | This is a “rule-of-thumb” – increase the 15-20% buffer if significant growth is expected. |
| | | Monitor and recalibrate as necessary. | PerfMon alerts or other, similar mechanism should be considered. |
Cache
SQL Server performance depends heavily on adequate cache memory. Inadequate cache can cause excessive checkpoint operations as well as a significant increase in I/O operations as pages are read from disk rather than memory. As with the lock table allocation, the dynamic memory management of SQL Server is adequate for many types of applications but problematic given the heavy load imposed by BizTalk Server. Therefore, the SQL Server memory should be set to as large a value as possible while still accommodating other applications on the server (including anti-virus and other “system” applications) and the Windows operating system. In general, a minimum of 1GB of free memory should be left available on the server. If other applications are later installed on the server, the amount of memory reserved by SQL Server may require adjustment.
Be sure to set the SQL Server memory after establishing the number of locks to be allocated, restarting SQL Server, and examining the amount of memory left available for fixed allocation by SQL Server. It’s always better to undershoot rather than overshoot the allocation, that is, establish a target for the desired amount of memory, allocate an amount 1GB below the target and restart the service (take the SQL Server resource off-line in a clustered environment). Check the working set allocation in the SQL Server error log (see below). If successful, raise the allocation and repeat.
Working Set
SQL Server allocates its working set at startup from the memory available after allocation of the fixed cache and locks (see above), Windows operating system, and other started applications. It’s important to check the SQL Server error log to ensure the working set is successfully allocated. This is vital in SQL Server 2000 installations, and recommended (although not always possible) in SQL Server 2005 installations.
Note |
|---|
|
Be sure at least 1GB of free memory remains on the server after the working set is successfully allocated. |
| |
Difficulty Level
|
Action
|
Purpose
|
| | | Evaluate available memory. | Set SQL Server to use dynamic memory management and measure the memory available after SQL Server startup. |
| | | Set Cache. | Set SQL Server to use a fixed amount of memory that will leave at least 1GB available for other applications, etc. |
| | | Check Working Set successfully allocated in SQL Startup Log. | If the working set cannot be allocated, reduce the fixed memory size. Note |
|---|
|
For SQL Server 2005, it is not always possible for the service to allocate its entire desired working set. |
|
| | | Adjust cache accordingly. | Account for new applications and change the fixed memory size as appropriate. |
TEMPDB
TEMPDB is used by SQL Server 2000 to store intermediate result sets, arbitrate object allocation, and to contain temporary tables. In SQL Server 2005, it is also used for row versioning and the SNAPSHOT isolation levels, in other words, its workload is significantly increased. It therefore is very important to optimize the performance of this critical resource.
One of the most important functions of TEMPDB is its use to arbitrate object management. By default, TEMPDB contains a single file and all SQL Server 2000 object creation is serialized through that file (using the SGAM block). SQL Server 2005 memory management is enhanced over SQL Server 2000, but TEMPDB is still a bottleneck and requires optimization for memory management operations.
The key optimization for TEMPDB is also very easy to accomplish. By simply adding files to the database, one per processor or processor-core (but not hyperthreads), significant parallelism is introduced. In ultra-high performance environments, these additional files can be placed on separate physical drives (LUNs in clustered environments, local drives in non-clustered systems). However, a large benefit is achieved even if all of the files reside on the same drive. In SQL Server 2005 this technique also improves the performance of TEMPDB for its many added roles.
Another optimization, critical to performance of SQL Server 2000 but still important in SQL Server 2005 is to turn on the 1118 trace flag. This flag enables (off) or disables (on) mixed extent objects, that is, objects of different allocation sizes. When set, while there is an increase in the amount of memory consumed by SQL Server, the amount of memory management arbitration is greatly reduced.
| |
Difficulty Level
|
Action
|
Purpose
|
| | | Create additional TEMPDB files, one per processor or processor core (not including hyperthreading). | Minimizes SQL Server memory allocation management contention by allowing more than one processor to access the critical SGAM and PFS blocks simultaneously. |
| | | Turn on the T1118 trace flag (optional for SQL Server 2005, but recommended). | Disable mixed extent memory allocation mechanism, eliminating use of the TEMPDB PFS page for object allocation. |
Other Parameters
There are a few other SQL Server tuning parameters useful in the environment. If a system is going to generate complex reports or other ad-hoc queries, it’s useful to increase the query timeout from the default of 600 seconds. A minimum of 900 seconds is recommended, but the actual value will depend on the nature of the queries involved. Long running queries should generally not be hosted on the BizTalk Message Box or Tracking database server(s), but rather on a separate application server(s). The reason for this is two-fold. First, the BizTalk Message Box and Tracking database servers need to function in near real-time for maximum performance and guaranteed message delivery. Second, the MAXDOP (Maximum Degree of Parallelism) on any server running a BizTalk Message Box database is set to 1 and cannot be changed, thereby constraining the efficiency of reporting or other types of long-running queries.
Similarly, NT Fibers (“Lightweight threading” in SQL Server 2005) can be enabled for application databases, but must not be turned on for BizTalk Server databases. The exception is for application databases that are backed-up using the BizTalk Server backup process; these databases must have Fibers disabled as well.
Lastly, consider activating the SQL Server query governor for application databases subject to ad-hoc queries. This prevents errant or intentional high-resource-consumption queries from effectively disabling a server by “hogging” all of its resources. If there are periods when high-consumption queries are allowed, use a SQL Server Agent schedule job to enable and/or disable the query governor (or modify the cost threshold) as appropriate. And as one might expect, neither the query governor nor high-resource consumption queries should be executed on the BizTalk Message Box or Tracking database servers.
| |
Difficulty Level
|
Action
|
Purpose
|
| | | Increase the query timeout to 900 seconds or greater (depending on the system load). | Set this to a higher value to accommodate low-priority reporting queries; 900 seconds is a good baseline, but a higher value may be required on a case-by-case basis; this should not be required for servers servicing only BizTalk Server operational databases (Message Box, Tracking, and management) but may be required for BAM server(s). |
| | | Turn on NT Fibers (non-BizTalk Server DATABASE Servers Only). | This should only be done on application database servers, servers running reporting services, etc. as it is not supported for BizTalk Servers and will cause the BizTalk Backup Job to fail and the databases to become fragmented. Note |
|---|
|
BizTalk Server database servers include servers being backed up as part of the BizTalk Backup Job. |
|
| | | Turn on the SQL Server query governor. | Determine the cost of the most expensive query normally run on the system and set the query governor to prevent queries more expensive by 25-50% - this prevents expensive, ad hoc queries from “taking over” a critical database server. |
Database Tuning
Database tuning occurs on (at least) two levels, application (e.g., queries) and infrastructure. There are many excellent tools and papers describing the tuning of SQL Server and queries, etc. and that topic is not addressed here. Instead, the focus will be on how the underlying infrastructure, especially DISK infrastructure, can be optimized to provide a high-performance environment.
SQL Server thrives on two types of resources, I/O bandwidth and memory. Memory optimization has been addressed in the previous section. This section will focus on how to maximize performance by tuning the underlying storage infrastructure supporting the SQL Server instance.
In the early days of large scale databases high-availability storage was achieved using RAID 5. This format was appealing because disk drives were relatively expensive on a cost/MB basis, and it offered a means of achieving high-availability with a minimum drive count. Unfortunately, RAID 5 has a number of immutable performance limitations that make it ultimately unsuitable for a high-performance database storage infrastructure.
The biggest issue with RAID 5 is the so-called “write penalty”. This penalty is incurred because any change to an individual drive in the array must be reflected in the drive’s checksum, which is stored on another physical drive in the set. While smart controllers can somewhat mitigate this performance penalty, there is no way to eliminate it. A second issue with RAID 5 involves the process of recovering a drive after a failure. RAID 5 uses a process called rebuilding, where the contents of the drive are reconstructed in a painstakingly slow manner using the checksum and other information stored on the other drives in the set. This process can literally take hours or longer, during which the performance of the entire RAID can be severely degraded (by up to 80% of nominal performance). Finally, depending on how the RAID 5 array is configured, simultaneous failure of two drives (three if a hot-spare is configured) can result in the loss of all data stored on the array. For more details on the perils of using RAID 5 for database applications, see Battle Against Any Raid Five.
A better choice, and the Microsoft recommended best practice for both maximum performance and high-availability is RAID 1+0 (a stripe of mirror sets). The mirror sets provide high-availability, while striping the data across the sets yields high-performance. Normally two-drive mirror sets are sufficient; however, for ultra-high availability a third drive can be added. The probability of all three drives in a single mirror set failing simultaneously is vanishingly small. The very small penalty incurred to write to each drive of the mirror set is virtually eliminated by smart RAID controllers, and the read performance is actually improved over other RAID formats because the controller can choose from any of the drives in the mirror set depending on load and head position. In any case, SQL Server does its own write caching using CHECKPOINT operations and the Write After Log paradigm, so optimized read performance for random disk access is the most important characteristic of a database storage subsystem. RAID 1+0 requires minimal overhead for drive recovery. This process, called “re-silvering”, is orders of magnitude faster than the RAID 5 rebuild because the new drive is merely a copy of the old, an operation executed in hardware by the smart array controller. The performance impact on the remainder of the array during recovery is negligible.
One of the arguments made against RAID 1+0 is that it requires more drives than RAID 5 to achieve the same amount of storage capacity. This is true. But RAID 1+0 can also realize a 30-50% performance improvement over RAID 5 depending on the characteristic of the load, and the additional cost for the hardware is insignificant provided there is sufficient SAN director and cabinet space available. As a compromise, some installations use a hybrid model where slower, infrequently accessed databases (e.g., the SSO database when it’s not used for extensive trading partner credential management) can be placed on the relatively cheaper RAID 5 configuration, while databases with higher performance requirements (e.g., Message Boxes) are deployed to a RAID 1+0 configuration.
Another critical performance enhancement to the storage infrastructure is the use of STORPort rather than SCSIPort drivers. The default for Windows Server 2003, where provided by the vendor, is to install STORPort drivers for all disk devices. However, some organizations have yet to approve this technology and prefer instead to use the legacy equivalent SCSIPort drivers when available. This is very detrimental to performance for two reasons. First, SCSIPort technology is limited to 256 simultaneous I/O operations. While 256 sounds like a large number, in a BizTalk Server system that may be processing hundreds or even thousands of messages per second it is woefully inadequate. Second, SCSIPort drivers allocate one I/O queue per HBA (Host Bus Adapter). This means that all SAN traffic, regardless of the source or number of LUNs allocated, is serialized through one I/O queue (or perhaps two if redundant HBAs are installed). It’s very difficult to fill the pipe of a 4Gb Fiber Channel fabric given these limitations. STORPort was developed by Microsoft to remove these bottlenecks. STORPort provisions an I/O queue for each LUN, and currently services up to 254 simultaneous requests per each. Therefore, the more LUNs employed in the I/O infrastructure, the higher the overall bandwidth and capacity.
Be aware that SAN operations are generally non-trivial. Only a knowledgeable administrator should attempt to create or manage a SAN configuration; it is easy to make mistakes that have negative performance and (perhaps) availability ramifications. The requisite training is generally not difficult, and is well worth the effort if not already available within the organization.
SAN Director Location
| |
Difficulty Level
|
Action
|
Purpose
|
| | | Multiple LUNs | Creates multiple I/O queues to improve utilization of the SAN fabric; when combined with multiple filegroups (which create multiple I/O threads in the database engine…see below) can yield very large performance improvements |
| | | Run the DiskPar utility on each LUN for proper partition alignment | Performance improvement of 20% or more may be realized by correctly aligning the LUN partition with the underlying disk cluster(s). |
Windows Location
| |
Difficulty Level
|
Action
|
Purpose
|
| | | MPIO enabled. | Multi-Path I/O provides both load balancing and multiplied bandwidth access to the SAN fabric (requires the STORPort driver). |
| | | DATA drives NTFS formatted at 16384. | SQL Server pages are 8192 bytes, so the default NTFS block size (4096) reads only ½ a page and effectively doubles the number of I/O operations. Also consider formatting the DATA drives in 64KB blocks, since SQL Server commonly does an eight page read-ahead to improve performance. |
| | | LOG drives NTFS formatted at 4096. | 4094 (4K) is the default Windows block size for a newly formatted drive. |
| | | SAN Drive Defragmented. | Any physical file in more than a few fragments is causing performance degradation, potentially very large degradation. |
| | | SAN Drive MFTs pre-allocated and defragmented. | Use a third-party tool to preallocate and (more importantly) defragment the MFT (Master File Table) of each. |
| | | STORPort Driver Configured for HBAs or RAID Controller (DAS). | The legacy Windows SCSIPort driver has a single queue per HBA (rather than a single queue per LUN as with the STORPort driver), and is limited to 256 simultaneous I/O operations. |
BizTalk Server Databases
There are two main techniques to increasing the performance of the BizTalk Server data store (which includes all of the various BizTalk Server databases), spread the component databases across multiple servers and/or add parallelism to the databases in place.
The first technique is familiar to anyone who has built a large, multi-server BizTalk Server installation. At a minimum, the primary message box should be placed on its own server. Preferably, the tracking database (DTA) should be placed on a separate server (or cluster in a high-availability environment) from the message box (this is very important for BizTalk Server 2004, less so for BizTalk Server 2006 unless Message Body tracking, Rule tracing, Debug tracing, Call Chain tracing, etc. is required). If there is a heavy BAM load, or if SSAS (SQL Server Analysis Services) is shared between BizTalk Server and other applications, SSAS should have its own server. Finally, all remaining databases, including the BizTalk Server management database, adapter databases (HL7 log, FIX log, etc.), TPM (Trading Partner Management, which should be separated from BizTalkMgmtDb if used extensively), SSO, etc. along with any application databases that have reasonably low usage can be provisioned on yet another server. In high availability environments, the servers mentioned above will be Active-Passive SQL Server 2005 Standard clusters (with the exception of the BAM server if high-performance analytics are required, in which case SQL Server Enterprise Edition is indicated). It is helpful to put the BAMPrimaryImport database on the BAM analysis server (if SSAS is installed on a separate server). A major benefit of using multiple servers, isolating the MessageBox(s) on its own server, is the MAXDOP (Maximum Degrees of Parallelism) restriction (see above) applies only to the MessageBox server.
| |
Difficulty Level
|
Action
|
Purpose
|
| | | Primary Message Box on a separate server/cluster. | This is the most important isolation for most systems. |
| | | Set MAXDOP to zero on non-MessageBox database servers. | MAXDOP = 1 is only required on the BizTalk Message Box database server(s), and must not be changed; all other servers hosting other BizTalk Server databases may return this value to 0 if set. |
| | | Tracking Database (DTA) on a separate server. | Less important for BizTalk Server 2006 than BizTalk Server 2004 unless Message Body Tracking, Rule execution tracing, etc. is enabled. |
| | | Management Databases, adapter databases, and low usage application databases on a separate server. | These can be grouped with the Tracking Database depending on the load (see above); they should generally not be grouped with the primary message box. |
| | | TPM (Trading Partner Management) data configured in a separate database from the BizTalk Server Management data (if used extensively). | As trading partner usage grows, the supporting database may need to be separated from the management (and possibly other) databases. |
| | | BAMPrimaryImport on BAM Database Server. | This should only be done on application database servers, servers running reporting services, etc. as it is not supported for BizTalk Servers and will cause the BizTalk Backup Job to fail and the databases to become fragmented. |
The second performance optimization is to spread the tables of the BizTalk Server databases across multiple filegroups. Each filegroup adds an I/O thread which, when coupled with a data file on a separate STORPort-driven SAN LUN (as described above) will greatly improve performance (gains from 100-1000% have been observed, depending on the load characteristics of the system). Not only is the table data moved into separate filegroups, but the LOB (TEXT/IMAGE) data and the non-clustered indexes as well. This maximizes parallelism (not within a single query on a Message Box Database server because BizTalk Server sets MAXDOP - Maximum Degrees of Parallelism – of the SQL Server instance to one) between queries and other database operations. Appendix B details a recommended mapping of BizTalk Server databases including filegroups, table assignments, log file locations, etc. Suggested starting sizes for the files in each group are also provided; it is strongly recommended that AutoGrowth be left on to accommodate different mixes of final sizes and extensions as necessary to support the characteristic of a particular system. There are a number of key concepts expressed in the appendix, including the priority with which the BizTalk Server databases should be distributed (especially in situations where the number of SAN LUNs and/or servers is limited). When using this guidance, it is important to understand the ramifications of moving table artifacts, and to consider carefully how separate data files can be shared when LUNs are scarce. Some performance increase can be achieved simply by adding multiple files on a single device to the same filegroup; SQL Server will stripe the data across the files and can achieve parallelism that way as well.
There is one other database optimization that can, depending on the type of load being processed, improve performance by up to 30%. Enabling the TEXT IN ROW option of two key message box (BizTalkMsgBoxDb) tables, Parts and Spool, allows small messages (by default total message payload of 256 bytes or less) to be stored directly in the row data (rather in a separate LOB table – or data file). In-row data does not require a separate I/O operation for retrieval. Unfortunately, testing demonstrates this is only effective for small messages. Although it is possible to set the option for sizes larger than 512 bytes, performance in testing dropped off significantly above this value because as the data rows become larger fewer of them can be stored per page of cache and/or disk, which increases the number of I/O operations overall. Appendix A has test results on the use of this option.
Note |
|---|
|
Performance in actual systems may vary based on their load characteristics. |
Example - Setting the text-in-row option for key tables in the message box
use BizTalkMsgBoxDb
sp_tableoption N'Parts', 'text in row', ‘512’
| |
Difficulty Level
|
Action
|
Comment
|
| | | Design a plan for allocating additional filegroups. | Use the information in Appendix B to determine which BizTalk Server databases will be optimized. |
| | | Create SAN LUNs to contain the files for the new filegroups. | Create required LUNs and optimize per the instructions above. |
| | | Create the filegroups, assigning the data files for each to the previously created LUNs. | A one-to-one relationship between data files and LUNs yields the best performance, but where LUN count is short sharing will still perform better than a single filegroup (the default). |
| | | Move the tables into the new filegroups. | The guidance in Appendix B has been tested on many systems, and should be the starting point for placement of BizTalk Server tables in filegroups. |
Application Databases
Application (non-BizTalk Server) databases can benefit from most of the above optimizations, as well. They should also be highly optimized using the ITW (Index Tuning Wizard) in SQL Server 2000 or the DTA (Database Tuning Advisor) of SQL Server 2005. Remember that it is common for backend databases and/or the services they support to become the bottlenecks to a highly optimized BizTalk Server system. Don’t forget to tune their infrastructure, too. And avoid putting them on the same SQL Server instance as a BizTalk Message Box.