If you currently have production load arriving at another system (perhaps a system you are about to upgrade or replace), you may be able to replay an actual day’s captured load, or route or duplicate the day’s normal messages to the testing environment.
However, this approach may not allow for the control and repeatability that is often desired in a performance lab. For a more scientific approach, load-generation tools let you produce predictable and repeatable load patterns, measure throughput and latency accurately, and still closely simulate actual production volumes.
The following sections offer some suggestions for establishing such a testing regime and producing load.
Running Automated Tests
Internally, the BizTalk Performance and Stress teams use a homegrown tool called LoadGen for their testing. In order to encourage this type of testing technique and help customers and partners in the field conduct these test runs, this tool was released to the web as a free download. The following sections provide some useful information about setting up and using this test application.
LoadGen Setup
LoadGen can be downloaded from: http://www.microsoft.com/downloads/details.aspx?FamilyId=C2EE632B-41C2-42B4-B865-34077F483C9E&displaylang=en
| Note |
|---|
| This tool should be used in a test environment only, and should not be used against a production environment. This tool is provided "as-is" and is not supported. |
As indicated on the download page, LoadGen requires the following prerequisites, so make sure these are installed on the box before attempting the LoadGen installation:
.NET Framework 2.0
.NET Framework 2.0 Software Development Kit (SDK)
Another important point to bring up is use of this tool with MSMQ. This transport is supported, but the LoadGen installer does not auto-register the MSMQ COM components during installation since the MSMQ runtime service may not be installed on everyone's machine. To use the MSMQ transport with LoadGen, you’ll need to manually register the MSMQTransmitter.dll and ComMsmqMonitor.dll files located in the <InstallDirectory>/Bins folder, from a command line as shown:
> regsvr32 MSMQTransmitter.dll
> regsvr32 ComMsmqMonitor.dll
If you intend to use MSMQ and do not register the components, you will receive a runtime error such as the following:
Cannot Load Transport DLL C:\Program Files\LoadGen\Bins\MSMQTransport.dll for Section MSMQRxQTxn. Exception has been thrown by the target of an invocation.
LoadGen need not be installed on the same box as a BizTalk Host instance. In fact, it is generally a better practice to install and run LoadGen on a separate and dedicated box to externalize its processing impact from BizTalk Server.
LoadGen is not supported on a 64-bit operating system, so make sure the LoadGen client is running on a 32-bit operating system.
LoadGen Basics
Once installed, the command-line application can be run from the <Install Directory>/Bins folder where you will find LoadGenConsole.exe. This application takes as input an XML configuration file which specifies the load profile to be created. The documentation is quite comprehensive in this area, so be sure to read it. There are also samples included which are worth a look.
Within this configuration file there is the notion of "Sections". It is not obvious, but sections run in parallel with each other, which allows you to create intersecting load patterns. Keep in mind that generating load does not come for free. Keep a close eye on your system resources during these tests, and determine whether it is more appropriate to run another LoadGen instance on another machine to achieve the desired load.
LoadGen supports a number of the native BizTalk adapters and has extensibility options to allow for creation of your own adapter harnesses. Samples for using the included adapters and for using custom transports are included in the LoadGen documentation.
In addition, if you need to dynamically change the content of each message sent, you can use the Message Creator feature (scoped within each section) to change each schema instance differently. This could commonly be used to generate unique message identifiers, etc.
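The idea of giving every generated message its own identifier can be sketched in a few lines of Python. This is only an analogue of the concept, not the LoadGen Message Creator API; the template, the `OrderId` element, and the `make_unique_message` helper are all hypothetical names chosen for illustration.

```python
import uuid
import xml.etree.ElementTree as ET

# Hypothetical message template; the element names are illustrative only.
TEMPLATE = "<Order><OrderId>PLACEHOLDER</OrderId><Amount>10.00</Amount></Order>"

def make_unique_message(template: str, id_tag: str = "OrderId") -> str:
    """Return a copy of the template with a fresh GUID in the id element."""
    root = ET.fromstring(template)
    node = root.find(id_tag)
    if node is None:
        raise ValueError(f"template has no <{id_tag}> element")
    node.text = str(uuid.uuid4())
    return ET.tostring(root, encoding="unicode")

# Three messages that differ only in their identifier.
messages = [make_unique_message(TEMPLATE) for _ in range(3)]
```

Unique identifiers of this kind also make it easier to correlate requests and responses later when measuring latency.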
LoadGen Tips and Tricks
The following are some additional tips and tricks that may help with your testing of BizTalk solutions with LoadGen:
- As previously explained, LoadGen is generally better run on separate and dedicated boxes apart from BizTalk Server. This practice will externalize LoadGen’s processing burden and better simulate production load.
- Be sure to include LoadGen machines in your performance monitoring (with PerfMon) as well. These boxes may also be susceptible to resource limitations.
- The following formula may be useful when deciding on how to specify the LoadGen configuration:
Figure 13: LoadGen Formula
Achieving Simulated Testing
Part of creating a sustainable BizTalk solution may involve a number of "housekeeping" tasks normally run on the live production system. These may include your default Archiving and Purging scheme, Backup and Restore procedures, SQL Agent Jobs, production logging, etc. All of the routine processes will need to be in place in a production system, so if the goal of the testing is to mimic a real-world production system, these tasks should also be included as part of the tests.
Using a Test Run Checklist
Before conducting a test run, it is also a best practice to develop a setup checklist to ensure no stone is left unturned and all runs begin with consistent configurations. The following example checklist might be used to prime the environment before each run. The tasks are arranged by the "layers" of the logical architecture from LoadGen test harnesses at the top to senders at the bottom.
Run-time Checklist
Equally important is consistently monitoring the environment during the runs. The following might be a number of manual work items to perform while the load testing is in flight.
BizTalk Servers
- Check machine event logs for processing errors.
- Monitor the BizTalk group with the Admin MMC Console for failures.
- Monitor related PerfMon performance counters.
SQL Servers
- Check machine event logs for processing errors.
- Monitor related PerfMon performance counters.
Monitoring strategies are elaborated in the "Monitoring and Reviewing the Results" section to follow.
Conducting Throughput Testing
Depending on your application’s performance requirements, either throughput-based or latency-based, your testing methods can change dramatically.
The documentation article What and Wayne Clark’s blog entry Understanding BizTalk Server Throughput and Capacity provide approaches to conducting successful throughput testing. The advice is to attempt to find the maximum sustainable throughput (MST) of the current configuration by slowly increasing the message volume until signs indicate that you have an unsustainable system. If properly followed, this plan also helps to identify the points of bottlenecking, and after eliminating each, the pattern is again repeated, finding the next MST value. Eventually, you will push the throughput higher and higher until you arrive at a point where the current topology cannot sufficiently handle more volume without making significant remediating changes.
LoadGen is a good tool for throughput-centric testing. When a run completes, it will output the sending rates and statistics about the documents sent.
Using PerfMon to monitor a destination queue’s incoming rates (e.g. ‘MSMQ Service:Incoming Messages/sec’) on the other end of BizTalk Server, as an example, may be sufficient to see if BizTalk Server is keeping up with the inbound rate sustainably. You might also have luck monitoring the ‘BizTalk:Messaging: Documents Processed/Sec’ counters on each of the BizTalk send host instances in comparison to the sending rates of the load generator. Be warned that receive rates published by BizTalk Server (‘BizTalk:Messaging:Documents Received/sec’) will not be equal to the sending rates of the load generators, as these numbers include time for receive-side processing such as adapter, pipeline, and map execution. While these may be interesting to watch for bottlenecking, they should not be confused with the inbound throughput rates.
However they are determined, watching the outbound rates in comparison to the inbound rates (from a BizTalk Server perspective) will give you some idea of how BizTalk Server is performing. If the rate flowing in equals the rate flowing out, then you have a sustainable system. However, if your outbound rates are lower, it may be an indication that changes are needed somewhere in this messaging tier. Watching for backups at front-end queues, time-outs, or unrecoverable growth of BizTalk Server database tables will also indicate that the load is too high for the current configuration.
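The comparison of inbound and outbound rates, repeated at slowly increasing volumes, might be sketched as follows. This is a minimal illustration under stated assumptions: the rate samples are presumed to have been collected already (for example from PerfMon logs), the 5% `tolerance` is an arbitrary illustrative value, and `is_sustainable` and `find_mst` are hypothetical helper names.

```python
from statistics import mean

def is_sustainable(inbound, outbound, tolerance=0.05):
    """Compare average inbound and outbound rates (msgs/sec) for one run.

    The run counts as sustainable when the average outbound rate falls
    short of the inbound rate by no more than `tolerance` (a fraction).
    """
    avg_in = mean(inbound)
    avg_out = mean(outbound)
    if avg_in == 0:
        return True
    return (avg_in - avg_out) / avg_in <= tolerance

def find_mst(runs, tolerance=0.05):
    """Return the highest target rate whose run was sustainable.

    `runs` maps a target send rate to (inbound_samples, outbound_samples).
    The search stops at the first unsustainable step of the ramp-up.
    """
    mst = None
    for rate in sorted(runs):
        inbound, outbound = runs[rate]
        if not is_sustainable(inbound, outbound, tolerance):
            break
        mst = rate
    return mst

# Made-up sample data: the 150 msgs/sec step falls behind.
runs = {
    50: ([50, 51, 49], [50, 50, 49]),
    100: ([100, 99, 101], [99, 100, 100]),
    150: ([150, 151, 149], [120, 118, 115]),
}
mst = find_mst(runs)
```

In a real lab, each step would run long enough for the rates to settle before being judged, and a failing step would prompt a search for the bottleneck before the ramp-up resumes.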
When in an overdrive condition, the Spool Size (i.e. ‘BizTalk:MessageBox:GeneralCounters:Spool Size’) will grow as messages are placed on the spool pending processing. Note that the Spool Size reported is per Message Box and needs to be summed across all Message Boxes to determine if/when spool-based throttling will occur.
The host queue length size (i.e. ‘BizTalk:MessageBox:HostCounters:Host Queue – Length’) can also be used to give a more granular view of the number of messages being queued up internally, by showing the queue depth for an individual host. This counter can be useful in determining if a specific host is bottlenecked. Assuming unique hosts are used for each transport, this can be helpful in determining potential transport bottlenecks.
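One way to judge whether the spool is growing without recovery is to sum the per-Message Box samples and fit a straight line through the totals. The sketch below assumes the samples for each Message Box were taken at the same times and have already been exported from the performance logs; `total_spool` and `growth_per_sample` are hypothetical helpers, and the sample values are made up.

```python
def total_spool(samples_per_messagebox):
    """Sum per-Message Box spool samples into one series.

    Assumes every Message Box was sampled at the same points in time.
    """
    return [sum(values) for values in zip(*samples_per_messagebox)]

def growth_per_sample(series):
    """Least-squares slope of the series, in messages per sample interval."""
    n = len(series)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(series) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(series))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den

# Made-up spool sizes for two Message Boxes.
box_a = [100, 140, 180, 220, 260]
box_b = [50, 60, 70, 80, 90]
combined = total_spool([box_a, box_b])
slope = growth_per_sample(combined)
```

A slope that stays positive over a long run suggests the load is above what the current configuration can sustain; a slope near zero suggests the spool is holding steady.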
Going any further would be duplicating the volumes of guidance already present in the core product documentation and would be imprudent, so the Performance and Capacity Planning sections of the core product documentation are your next stops on the road to conducting successful throughput testing.
Conducting Latency Testing
Latency testing takes a very different approach from throughput testing. In throughput-centric applications, requirements typically come in the form of "BizTalk Server must be able to process 12 million messages within a 24-hour period sustainably", although this is oversimplified. Latency testing, on the other hand, typically has requirements which resemble the sample given below. Therefore, measuring the inbound and outbound throughput rates is not sufficient to determine if application performance requirements are being met.
| Description | Performance Requirements |
|---|---|
| Protocol/s: HTTP. Throughput Rate: 100 msgs/sec. Avg. Message Size: 10K. Scenario: Requests and responses are asynchronous. | Average Latency: < 300 ms. Required Roundtrip Times: 90% of messages in less than 500 ms; 95% of messages in less than 1 sec; 99.8% of messages in less than 2 secs; 100% of messages within 5 secs. Roundtrip times are measured from the time the request enters BizTalk Server to the time the response leaves BizTalk Server and is sent back to the client minus the time it spent in the back-end server. |
In simple synchronous request-response scenarios, a load generation client that can time-stamp the initiating message and then time-stamp the correlated response message is all that is required. Even then, using high-resolution time stamps is critical. After the run, aggregating all of the results will determine if SLAs have been met for the entire message set.
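Aggregating the results after a run might look like the following sketch, which checks a list of measured roundtrip times against the sample requirements given earlier. The thresholds come from that sample table; the `check_sla` and `fraction_within` helpers are hypothetical, and for simplicity every limit (including the "within 5 secs" one) is treated as a strict "less than".

```python
# Thresholds from the sample requirements: (required fraction, limit in ms).
ROUNDTRIP_SLA = [(0.90, 500), (0.95, 1000), (0.998, 2000), (1.0, 5000)]
AVERAGE_LIMIT_MS = 300

def fraction_within(latencies_ms, limit_ms):
    """Fraction of messages whose roundtrip was strictly under limit_ms."""
    return sum(1 for v in latencies_ms if v < limit_ms) / len(latencies_ms)

def check_sla(latencies_ms):
    """Return a list of readable SLA violations (an empty list means pass)."""
    failures = []
    average = sum(latencies_ms) / len(latencies_ms)
    if average >= AVERAGE_LIMIT_MS:
        failures.append(
            f"average {average:.1f} ms is not under {AVERAGE_LIMIT_MS} ms")
    for required, limit in ROUNDTRIP_SLA:
        actual = fraction_within(latencies_ms, limit)
        if actual < required:
            failures.append(
                f"only {actual:.1%} under {limit} ms, need {required:.1%}")
    return failures

# Made-up results for 1,000 messages that satisfy every requirement.
good_run = [100] * 950 + [600] * 40 + [1500] * 10
violations = check_sla(good_run)
```

Checking the whole distribution rather than only the average matters here, because a run can meet the average latency target while still missing one of the percentile requirements.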
In more complicated scenarios, where asynchronous requests begin at one server and responses are returned to another, more sophisticated methodologies will have to be adopted. Exact time synchronization between these different machines can be a concern. Message correlation can also be a challenge. And one might also wish to snapshot timestamps at different stages in the end-to-end processing to analyze bottlenecks or to isolate BizTalk Server from other components susceptible to slowdown.
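If timestamps are captured at several stages and carried along with each message, the time attributable to BizTalk Server can be separated from the time spent in the back end. The sketch below assumes four hypothetical capture points and a clock offset between machines that has been measured separately; none of these names come from BizTalk Server itself.

```python
from dataclasses import dataclass

@dataclass
class Stamps:
    """Hypothetical timestamps (in ms) carried along with one message."""
    request_in: float     # request enters BizTalk Server
    backend_sent: float   # request handed to the back-end server
    backend_reply: float  # response received from the back-end server
    response_out: float   # response leaves BizTalk Server

def biztalk_roundtrip_ms(s: Stamps) -> float:
    """Roundtrip inside BizTalk Server, minus time spent in the back end."""
    total = s.response_out - s.request_in
    backend = s.backend_reply - s.backend_sent
    return total - backend

def adjust(timestamp_ms: float, clock_offset_ms: float) -> float:
    """Shift a timestamp taken on another machine onto the reference clock.

    `clock_offset_ms` is how far that machine's clock runs ahead of the
    reference clock; measuring it is outside the scope of this sketch.
    """
    return timestamp_ms - clock_offset_ms

# A 300 ms end-to-end roundtrip, of which 200 ms was spent in the back end.
example = Stamps(request_in=0.0, backend_sent=40.0,
                 backend_reply=240.0, response_out=300.0)
biztalk_ms = biztalk_roundtrip_ms(example)
```

Keeping the intermediate timestamps, rather than only the final difference, also makes it possible to see which stage contributes most when a requirement is missed.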
Alaeddin Mohammed and Kevin Lam’s Performance Tuning for Low Latency Messaging white paper offers some strategies for conducting a lab with such rigorous requirements. The paper and the following diagram illustrate how times could be captured and calculated by clients, BizTalk Server pipeline components, and test harnesses, and carried along by the messages.
Figure 14: Possible Latency Measuring Approaches
Other companies may wish to explore options with the Document Tracking and Administration (DTA) or Business Activity Monitoring (BAM) features of BizTalk Server, especially if already using these components in the scenario.
Remember, with latency-mindful solutions, try especially hard to isolate BizTalk Server components in the times being measured. If roundtrip delays are introduced by outside components or resources, you may spend unnecessary and possibly unfruitful effort trying to optimize the BizTalk Server platform for lower numbers.
Alaeddin Mohammed and Kevin Lam’s Performance Tuning for Low Latency Messaging white paper, although developed for BizTalk Server 2004, still remains the definitive source on low latency guidance with regards to BizTalk Server. The Troubleshooting Message Box Latency Issues topic in the BizTalk Server 2006 core documentation is also a must-read.
For further information about optimizing solutions for low latency, be sure to pursue the resources listed in the "Additional Resources" section of this paper.
Monitoring and Reviewing the Results
Some tests will be architected to run for only short durations. Still others will run overnight, for 24 hours, or even longer for stress purposes. Developing a monitoring strategy up front, one which is based on a strong foundation of accuracy, is important for the team to agree upon.
Performance Monitoring
In an attempt to be as scientific as possible, complete performance metrics should be logged and backed up for every run using Microsoft Performance Monitor (PerfMon), or an equivalent. This should include metrics for all BizTalk Server boxes, SQL Server boxes, and other machines that are part of your solution.
Logging
You might create performance logs that sample at something like 15-second intervals, or at longer intervals in proportion to the run duration. Dedicating one separate machine to performance logging for the group is a good idea.
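Once a run is complete, the logged samples can be reduced to a few summary figures per counter. The sketch below assumes the log has been saved or converted to CSV with a timestamp in the first column and one column per counter; the machine name `BTS01` and the sample values are made up, and `summarize_counters` is a hypothetical helper.

```python
import csv
import io

def summarize_counters(csv_text):
    """Return {counter_name: (average, maximum)} for each counter column."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    values = {name: [] for name in header[1:]}
    for row in reader:
        for name, cell in zip(header[1:], row[1:]):
            try:
                values[name].append(float(cell))
            except ValueError:
                continue  # skip blank or non-numeric samples
    return {
        name: (sum(v) / len(v), max(v))
        for name, v in values.items() if v
    }

# Made-up log with one counter, 15-second samples, and one missing sample.
sample_log = r'''"Time","\\BTS01\Processor(_Total)\% Processor Time"
"01/01/2007 10:00:00","40.0"
"01/01/2007 10:00:15"," "
"01/01/2007 10:00:30","60.0"
'''
summary = summarize_counters(sample_log)
```

Storing a summary like this alongside the raw log for every run makes it easier to compare runs after a configuration change.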
For repetitive counters across numerous machines, one tip is to add all counters for one of the BizTalk Server boxes, save the logging configuration as an .HTM file, and then manually edit the file (say, in WordPad) to add the other BizTalk servers that are part of the group. Be careful: the format is finicky, with some fields maintaining fixed widths (see the figure below). Be sure to also update the CounterCount field after making changes. Also be aware that certain hosts may not be present on certain boxes in your topology, so unless you are adding all counters on all hosts (*), this may be an additional configuration step.
Figure 15: Quickly Editing the PerfMon Configuration
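The repetitive part of this task, producing the same counter paths for every machine in the group, can also be scripted. The sketch below only builds the list of full counter paths; it does not reproduce the .HTM file format, and the machine names and the `expand_counters` helper are hypothetical.

```python
def expand_counters(machines, counter_templates):
    """Build a full counter path for every machine and counter template."""
    return ["\\\\" + machine + template
            for machine in machines
            for template in counter_templates]

# Hypothetical machine names; the counter templates omit the machine prefix.
machines = ["BTS01", "BTS02"]
templates = [
    r"\Processor(_Total)\% Processor Time",
    r"\Memory\Available MBytes",
]
paths = expand_counters(machines, templates)
```

Generating the list this way also makes the total number of counters easy to obtain (`len(paths)`), which helps when a count field has to be kept in step with the entries.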
Another tip is to use the same line thicknesses for similar machine types or counters. This will allow for a more readable console.
Live Monitoring
You should also consider using one live console for ad-hoc monitoring during runs. The "View Graph" and "View Report" features of PerfMon are especially useful for in-flight monitoring. Creating multiple consoles, each specializing in some class of monitoring and zeroing in on similar counters (e.g., CPU across all boxes) will also improve readability.
What to Track
The BizTalk Server 2006 Performance Counters topic explains in detail each of the counters exposed by the adapters, the messaging engine, the Message Box, and other components of BizTalk Server, including BAM, BRE, BAS, etc.
For insight into what counters to track specifically, consult the BizTalk Server Performance blog at http://blogs.msdn.com/biztalkperformance/ and, of course, the core documentation. Based on your type of scenario and testing, there are recommendations for particular metrics to be watchful of and values that should raise eyebrows.
Engine throttling is also something to be aware of. More sophisticated engine throttling behavior has been introduced in BizTalk Server 2006 which prevents more unrecoverable situations from occurring by closely monitoring many facets of the BizTalk Server runtime and adjusting processing accordingly. Easily configurable throttling parameters exist on the host level allowing for tuning at a fine level of granularity. The Host Throttling Performance Counters topic explains how to watch for the appearance of host throttling and the significance of these events.
Custom Performance Counters
During performance labs it can often be useful to create custom performance counters against your own databases or even gather additional BizTalk Server database metrics.
As an example, one may wish to regularly measure the depths of the BizTalk Server Parts or PartZeroSum tables as a finer measure of sustainability. To enable logged measurement of these table depths, a simple stored procedure can be written and called periodically by a SQL Agent Job (say, once a minute), as the example below illustrates.
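CREATE PROCEDURE UpdateUserCounters
AS
-- Publishes the row counts of two tables through the SQL Server user-settable counters.
SET NOCOUNT ON
SET TRANSACTION ISOLATION LEVEL READ COMMITTED
-- If this procedure deadlocks with other work, let it be chosen as the victim.
SET DEADLOCK_PRIORITY LOW
DECLARE @Parts int
DECLARE @PartZeroSum int
-- NOLOCK avoids blocking message processing; the counts are therefore approximate.
SELECT @Parts = count(*) FROM Parts WITH (NOLOCK)
exec sp_user_counter1 @Parts
SELECT @PartZeroSum = count(*) FROM PartZeroSum WITH (NOLOCK)
exec sp_user_counter2 @PartZeroSum
GO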
For more on custom SQL user counters see the SQL Server, User Settable Object article in the SQL Server Books Online.
While this may help in performance lab diagnostics, this approach is not generally recommended for use in a production environment.
Other Tips
Restarting host instances at the beginning of each run will ensure performance counters (and hence aggregations) are properly reset. Clearing the PerfMon display will also ensure cached data is discarded.
BizTalk Server Monitoring
Monitoring the health of your BizTalk servers during and after a run is also necessary. Errors may be logged in the Windows® Application event logs, messages may become suspended, etc. Because you will need access to many machines and it is hard to keep track of them all, it is a good idea to plan for this up front. Below are a few suggestions for how to reduce the management burden and multi-machine challenges.
Single Management Console
In multi-machine topologies, management of the many machine statuses can quickly become unruly. Creating a single Microsoft Management Console (MMC) incorporating several plug-ins will make conducting a lab much easier.
For starters, open MMC.exe and add a snap-in for BizTalk Server Administration. This will allow you to monitor the entire BizTalk group and benefit from features like its Group Hub Page, bulk suspend/terminate/resume operations, and bulk host instance restarts.
Thereafter, add an Event Viewer plug-in for each and every BizTalk Server machine in the group. Adding Event Viewers for the SQL Server boxes and other machines in the topology may also be handy. Plug-ins for other applications, such as Internet Information Services (IIS) and Microsoft Message Queuing (MSMQ), might also be a good fit for this MMC.
Figure 16: A Truly One-Stop-Shop MMC
When done, save this console .MSC file to the desktop of a monitoring machine, perhaps the same machine used to perform the performance counter logging. This one-stop-shop console will allow you to remotely monitor all boxes from one location for application errors and identify aberrations.
Single Terminal Services Console
If you don’t plan on spending your lab weeks in the confines of a server room, then using Microsoft Terminal Services (TS) is a common way to remotely administer machines in the operating topology. The Terminal Services team smartly offers its own MMC plug-in, called "Remote Desktops" which allows users to add multiple connections under one snap-in. This makes it very easy to context-switch between machines without having to repetitively open up the TS client.
Figure 17: A Terminal Services MMC
You may even want to incorporate these plug-ins into your single management console mentioned in the previous section.
SQL Server Monitoring
If you are using SQL Server 2005, then monitoring has become greatly simplified. The introduction of SQL Server Management Studio, a graphical and integrated environment for accessing, configuring, managing, administering, and developing all components of SQL Server, can be close to a one-stop-shop for all things SQL Server. The console even allows you to connect to multiple SQL Server instances so that all of your administration for the entire environment can be hosted from one box.
SQL Server Management Studio has a great feature called the Summary Page, which exists for various types of database objects. For databases themselves, the Summary Page’s Reports feature is very useful. It can create disk utilization reports which show, down to the table level, the physical growth of your databases. This can be helpful in identifying bottlenecks in BizTalk Server or custom databases.
Figure 18: SQL Server 2005 Summary Reports
When problems do arise at the data tier, viewing the servers’ application event logs and viewing SQL Server Error Logs can help diagnose a problem.
SQL Server Profiler is another tool that can help trace what the server is doing, even under high load. You can aggregate job duration times, monitor stored procedure executions, etc. Just be advised that tracing has a performance impact on the SQL engine, so use this tool sparingly and only when needed for troubleshooting.
For optimizing SQL Server Performance, the SQL Server 2005 Books Online contain volumes of information on the subject, so be sure to have your DBAs read this information.
Other Resource Monitoring
Since your entire solution consists of more than just BizTalk Server and SQL Server, monitoring of other third-party systems, network resources, disk subsystems, and other components of the environment may also be required. Consult your product documentation for each of these to determine the best monitoring strategies.
Many Microsoft developers are well versed in functional testing, but often neglect thorough nonfunctional testing before running applications in production. Conducting a well-executed performance lab is a critical component of this nonfunctional testing and is intended to uncover any problems with your current design, identify hardware limitations, and suggest stabilizing improvements to your overall architecture.
The results of each successive performance run may support decisions to debug your current test cases, deployment, application code, or test scripts. These conclusions may also support decisions to change your topology, upgrade current hardware, scale up or out, modify platform settings (e.g. .NET CLR runtime properties, IIS application pool configurations, operating system configurations), adjust BizTalk Server defaults (e.g. throttling parameters, batch sizes, polling intervals), tune disk subsystems, optimize network resources, explore other hardware alternatives, or refactor or rewrite code. After each of these changes, the execution cycle may be restarted: conduct further testing, review the results, and remove still more bottlenecks.
For expert guidance on how to monitor environmental resources, detect and remove bottlenecks, modify BizTalk Server settings, and tune the platform for your applications, use the resources in the "Additional Resources" section of this paper. The BizTalk Server product group has realized the importance of providing prescriptive guidance for optimizing your BizTalk applications, but this paper is just the start. The referenced resources will walk you through the “forensics and surgery” of the performance lab and help to ensure that you conclude with a healthy platform that will support your business for years to come.