Developing Well-Performing .NET Compact Framework Applications

.NET Compact Framework 1.0
 

Dan Fox, Jon Box
Quilogy

Contributions by:
Jonathan Wells
Microsoft Corporation

Jamie De Guerre
Microsoft Corporation

December 2003

Applies to:
   Microsoft .NET Compact Framework 1.0
   Microsoft Visual Studio .NET 2003

Download sample.


Summary: Learn how to measure and tune the performance of your Microsoft .NET Compact Framework-based applications. (29 printed pages)

Contents
Introduction
Prerequisite #1 - Measuring Performance
Prerequisite #2 - Performance Statistics
   The Statistics
   Creating the Statistics
   The Report
   Sample Program
   Scenarios
      Number of Objects Allocated, Current Number of Objects Allocated
      Collections
      GC Latency Time
      Bytes Pitched, Number of Methods Pitched
      Native Bytes Jitted, Number of Methods Jitted
Prerequisite #3 - Performance Issues within the Framework
   Data Binding
      Methodology
      Results
      Observations
      Recommendations
   Reading XML with XmlDocument and XmlTextReader
      Methodology
      Results
      Observations
      Recommendations
   Reading Local Data with DataSets and XmlTextReaders
      Methodology
      Results
      Observations
      Recommendations
   Reading Remote Data with DataReaders and DataSets
      Methodology
      Results
      Observations
      Conclusions
   Reading Remote Data with Web Services and SqlClient
      Methodology
      Results
      Observations
      Recommendations
   Reading Files Locally
      Methodology
      Results
      Observations
      Recommendations
   A Form with Many Controls
      Methodology
      Results
      Observations
      Recommendations
   String Handling
      Methodology
      Results
      Observations
      Recommendations
Prerequisite #4 - Performance Considerations
   Principle of Constraint
   Number of Function Calls and Function Size
   Maximize Object Reuse
   Avoid Manual Garbage Collection
   Delayed Initialization and Control Invoker
   Asynchronous Opportunities
Analyze the Application
   Identify the ill-performing areas of the application
   Gather the Performance Statistics
   Measure the Performance of Questionable Areas
   Apply Performance Enhancements

Introduction

Why do we need performance?

In today's world, time moves quickly and businesses demand true productivity from their employees. This rule applies to all workers and is universal to every industry. Therefore, users do not want to wait for their devices to calculate answers or retrieve data for display on the screen.

Consequently, the .NET Compact Framework was created so that .NET developers can easily create applications that allow the work force to take data out of the office, record data in the field, and bring it back to the office. When working with distributed applications and moving data across systems and constrained devices, complexities inherently exist - it comes with the territory. So, .NET Compact Framework developers have to understand the needs of the users, the resources of the device, how the device can work with backend systems, and how the .NET Compact Framework actually performs on the device.

In this paper, performance will be the focus.

This is not a new issue in the practice of automation. From the beginning days of computing, there has been a constant battle between the physical resources of a machine, the performance of an application, and the level of development effort required to create a desired outcome. Let's face it: applications in the Smart Device arena do not have unlimited resources. And even as you read this document, connectivity and the number of data sources in your company continue to grow at an exponential pace.

Some of you reading this have already built your first .NET Compact Framework application. You might have even deployed it. But you may be finding that there are some places in the application that are just plain slow. If so, are you hoping that your client will not notice?

In this new game of mobile application development, the developer is creating an application that carries higher expectations because it runs on a new type of platform. The CEO and CIO had lunch the other day, and now this application is going to save the day. Users anticipate applications running at the speeds of their desktop counterparts, and they will point the finger at you if their expectations are not met. It's time to roll up our sleeves and figure out how to make the application faster. What are the steps to accomplish this?

To start, we need to learn some .NET Compact Framework performance fundamentals. Some of this might be new, and some of it will be principles brought over from the .NET Framework but placed into the context of the .NET Compact Framework. Nevertheless, we have to learn some prerequisites before we can even begin attacking any performance issues.

To do this, we will look at four areas, otherwise known as the prerequisites: measuring execution time, determining the Framework's level of effort, looking at performance issues within specific Framework classes, and finally a set of performance considerations to program with.

After this knowledge foundation is laid, the paper will look at analyzing the application. In this part, the prerequisites will be applied in a simple methodology. With this knowledge, the developer will have the beginning skill set to diagnose application performance.

Let's get started.

Prerequisite #1 - Measuring Performance

When looking at the performance of an application, one of the basic tasks is to measure the duration of a function, whether in the Framework or in an algorithm written by the developer. In some cases, it will also be required to focus on parts of an application that call many functions. Therefore, a developer should know how to determine these durations for later comparisons between designs.

There are numerous time measurement techniques that Windows developers have had access to, thanks to the Win32 API. The basic techniques record the start time of an event, record the end time of the event, and then determine the duration with some type of math operation.

One commonly used technique is to call GetTickCount, a Win32 API function that returns an integer representing the number of milliseconds that have passed since the device was booted. The .NET Framework and the .NET Compact Framework provide Environment.TickCount, which is based on the Win32 GetTickCount function. While simple to use, it has two disadvantages: its size and its accuracy. The size is an issue for devices that have been running for a long period of time, because the counter eventually goes negative, which doesn't bode well for logic that depends on the start time being smaller than the end time. Accuracy is also an issue for measuring execution duration, since the variance of GetTickCount, and therefore of Environment.TickCount, can be up to 500 milliseconds, which makes comparisons inaccurate.

Another option for .NET developers is to implement a similar mechanism using System.DateTime values to mark the start and end times. The DateTime structure includes the Subtract method, which is perfect for calculating duration and returns a System.TimeSpan. That structure has properties that can return the duration in several different time measurements, including seconds and milliseconds. However, it also suffers from an accuracy issue, although not as severe as that of GetTickCount.

The most accurate method is to use a high-resolution timer. While the implementation of timers is OEM specific, QueryPerformanceCounter is guaranteed to be the most accurate timer available. In the case where a high-resolution timer is not provided by the OEM, the QueryPerformanceCounter function falls back to GetTickCount, which is evidenced by QueryPerformanceFrequency reporting a frequency value of 1000.

This mechanism is not provided in the framework and requires calling a couple of Win32 API functions, QueryPerformanceFrequency and QueryPerformanceCounter. Since this technique is processor dependent, the former returns a value that represents the number of counts per second, in other words, the frequency. So when using a counter, a developer would call QueryPerformanceCounter at the beginning and end of the code, subtract the start value from the end value to get the processor-dependent duration, and then divide by the frequency, resulting in a duration in seconds. Since some measurements can be short, the code should allow for conversion to milliseconds.

Since QueryPerformanceCounter is one of the most accurate time measurement mechanisms available to the Windows developer, it is the chosen route for measuring execution time in this whitepaper. Below are examples of a class called Timer located in the AtomicCF namespace. The class includes declarations for the two API functions, which in the case of Windows CE are exported from Coredll.dll. It implements three members: a constructor that retrieves the frequency via QueryPerformanceFrequency, and two methods for starting and stopping the time measurement. At the beginning of the code to be measured, the developer calls the Timer's Start method, which stores the point in time returned from QueryPerformanceCounter. At the end of the code to be measured, the developer calls the Stop method, which returns a long representing the amount of time in milliseconds since Start was called. The class, implemented in two of our favorite languages, is below.

[C#]
using System;
using System.Runtime.InteropServices;

namespace AtomicCF
{
  class Timer
  {
    [DllImport("coredll.dll")]
    extern static int QueryPerformanceCounter(ref long perfCounter);
        
    [DllImport("coredll.dll")]
    extern static int QueryPerformanceFrequency(ref long frequency);

    static private Int64 m_frequency;
    private Int64 m_start;

    // Static constructor to initialize frequency.
    static Timer()
    {
      if (QueryPerformanceFrequency(ref m_frequency) == 0) 
      {
        throw new ApplicationException();
      }
      // Convert to ms.
      m_frequency /= 1000;
    }

    public void Start()
    {
      if (QueryPerformanceCounter(ref m_start) == 0) 
      {
        throw new ApplicationException();
      }
    }

    public Int64 Stop() 
    {
      Int64 stop = 0;
      if (QueryPerformanceCounter(ref stop) == 0) 
      {
        throw new ApplicationException();
      }
      return (stop - m_start)/m_frequency;
    }
  }
}

[VB]
Imports System.Runtime.InteropServices

Namespace AtomicCF

    Public Class Timer
        <DllImport("coredll.dll", EntryPoint:="QueryPerformanceCounter")> _
        Public Shared Function QueryPerformanceCounter(ByRef perfCounter As Long) As Integer
        End Function

        <DllImport("coredll.dll", EntryPoint:="QueryPerformanceFrequency")> _
        Public Shared Function QueryPerformanceFrequency(ByRef frequency As Long) As Integer
        End Function

        Private m_frequency As Int64
        Private m_start As Int64

        Public Sub New()
            If QueryPerformanceFrequency(m_frequency) = 0 Then
                Throw New ApplicationException
            End If
            ' Convert to counts per millisecond (integer division).
            m_frequency \= 1000
        End Sub

        Public Sub Start()
            If QueryPerformanceCounter(m_start) = 0 Then
                Throw New ApplicationException
            End If
        End Sub

        Public Function [Stop]() As Int64
            Dim lStop As Int64 = 0
            If QueryPerformanceCounter(lStop) = 0 Then
                Throw New ApplicationException
            End If
            Return (lStop - m_start) \ m_frequency
        End Function
    End Class

End Namespace

After adding this class to a project, it is very easy to use in an application. There are three steps: create and instantiate a Timer, call the Start method, and call the Stop method, storing the returned long value.

[C#]
AtomicCF.Timer oTimer = new AtomicCF.Timer();
oTimer.Start();
DoSomething();
long lDur = oTimer.Stop();
MessageBox.Show("DoSomething executed in " + lDur + "ms");
[VB]
Dim oTimer As New AtomicCF.Timer
oTimer.Start()
DoSomething()
Dim lDur As Long = oTimer.Stop()
MessageBox.Show("DoSomething executed in " & lDur & "ms")

So, add this class to your tool chest, because it is a handy tool for measuring time very accurately. Later in this paper, when looking at performance issues in Framework classes, this class will be used repeatedly.

Prerequisite #2 - Performance Statistics

There will be times when an application runs slowly even though the code looks fine. On a machine running the desktop Windows O/S, Performance Counters could be interrogated for an indication of the strain created by the application; however, this facility does not exist in the Windows CE world. In times like this, wouldn't it be great if the Framework could give us a clue?

Well, there is a little known gem provided by the .NET Compact Framework. It's a report that has a list of performance statistics generated during a managed program's execution. With this information, a developer can analyze the types of load placed on the Framework during the lifetime of one application, and then look at which parts of the program are generating bad performance.

The following topics are covered in this section: the statistics, how to generate the report, an example of how a .NET Compact Framework-based application can assist in this process, and in conclusion, some scenarios to look for.

The Statistics

Execution Engine Startup Time
Total Program Run Time
Peak Bytes Allocated
Number Of Objects Allocated
Bytes Allocated
Number Of Simple Collections
Bytes Collected By Simple Collection
Bytes In Use After Simple Collection
Time In Simple Collect
Number Of Compact Collections
Bytes Collected By Compact Collections
Bytes In Use After Compact Collection
Time In Compact Collect
Number Of Full Collections
Bytes Collected By Full Collection
Bytes In Use After Full Collection
Time In Full Collection
GC Number Of Application Induced Collections
GC Latency Time
Bytes Jitted
Native Bytes Jitted
Number of Methods Jitted
Bytes Pitched
Number of Methods Pitched
Number of Exceptions
Number of Calls
Number of Virtual Calls
Number Of Virtual Call Cache Hits
Number of PInvoke Calls
Total Bytes In Use After Collection

Creating the Statistics

There are only a couple of steps in the process of creating the report: set a registry value that enables the performance gathering, and run the program in question. At that point, the report is ready to be viewed. Let's look at the steps a little more closely.

To start, the first step is to create and set the registry key on the device. The registry key is:

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\.NETCompactFramework\PerfMonitor

Under this key, there should be a DWORD value named Counters, which you'll need to create the first time. To enable statistics gathering, set the value to 1; to disable the performance statistics, set the value to zero.

There are several ways to create the registry key and set the value. There are plenty of free Windows CE registry editors, and there is even one included in the Microsoft Embedded Tools package that can remotely edit the registry with a familiar REGEDIT-style GUI. Another alternative is to write your own interface, but this requires P/Invoking the registry functions in the Win32 API. In the sample covered below, a Registry class wraps API calls such as RegOpenKeyEx and RegSetValueEx; a minimal sketch of the approach follows.
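
The following C# fragment is a minimal sketch of that approach, not the sample's Registry class verbatim; the class and method names here are hypothetical, and only the error handling needed for illustration is included.

[C#]
using System;
using System.Runtime.InteropServices;

class PerfMonitorKey
{
    const uint HKEY_LOCAL_MACHINE = 0x80000002;
    const int REG_DWORD = 4;

    [DllImport("coredll.dll")]
    static extern int RegCreateKeyEx(uint hKey, string lpSubKey, int reserved,
        string lpClass, int dwOptions, int samDesired, IntPtr lpSecurityAttributes,
        ref uint phkResult, ref int lpdwDisposition);

    [DllImport("coredll.dll")]
    static extern int RegSetValueEx(uint hKey, string lpValueName, int reserved,
        int dwType, ref int lpData, int cbData);

    [DllImport("coredll.dll")]
    static extern int RegCloseKey(uint hKey);

    // Create the PerfMonitor key if needed and set Counters to 1 (on) or 0 (off).
    public static void SetCounters(int value)
    {
        uint hKey = 0;
        int disposition = 0;
        if (RegCreateKeyEx(HKEY_LOCAL_MACHINE,
            @"SOFTWARE\Microsoft\.NETCompactFramework\PerfMonitor",
            0, null, 0, 0, IntPtr.Zero, ref hKey, ref disposition) != 0)
        {
            throw new ApplicationException("Could not open the PerfMonitor key");
        }
        try
        {
            if (RegSetValueEx(hKey, "Counters", 0, REG_DWORD, ref value, 4) != 0)
            {
                throw new ApplicationException("Could not set the Counters value");
            }
        }
        finally
        {
            RegCloseKey(hKey);
        }
    }
}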

The Report

After enabling the report with the Counters value in the registry, run a managed program and terminate the process (on the Pocket PC, this means actually ending the process, not just closing the program); the report will then be generated. The generated file is a text file named mscoree.stat located in the root directory on the device.

It's important not to start any other managed programs while doing this, or you may end up looking at a report of some other program's performance. Another rule to remember is to disable the key after running the program in question so that the results will not be overwritten. One way to manage this is to run a program similar to the sample covered below, which provides a way to work with the registry value and view the report.

The mscoree.stat file looks like:

counter                                          value           n       mean      min      max
Execution Engine Startup Time                   100797146        0        0        0        0
Total Program Run Time                         2084653982        0        0        0        0
Peak Bytes Allocated                               147366        0        0        0        0
Number Of Objects Allocated                           438        0        0        0        0
Bytes Allocated                                     21256      438       48        8      576
Number Of Simple Collections                            0        0        0        0        0
Bytes Collected By Simple Collection                    0        0        0        0        0
Bytes In Use After Simple Collection                    0        0        0        0        0
Time In Simple Collect                                  0        0        0        0        0
Number Of Compact Collections                           0        0        0        0        0
Bytes Collected By Compact Collections                  0        0        0        0        0
Bytes In Use After Compact Collection                   0        0        0        0        0
Time In Compact Collect                                 0        0        0        0        0
Number Of Full Collections                              1        0        0        0        0
Bytes Collected By Full Collection                   3452        1     3452     3452     3452
Bytes In Use After Full Collection                  17504        1    17504    17504    17504
Time In Full Collection                                19        1       19       19       19
GC Number Of Application Induced Collections            0        0        0        0        0
GC Latency Time                                        19        1       19       19       19
Bytes Jitted                                         8963      197       45        1      791
Native Bytes Jitted                                 49374      197      250       36     3135
Number of Methods Jitted                              197        0        0        0        0
Bytes Pitched                                       37291      161      231       36     3135
Number of Methods Pitched                             185        0        0        0        0
Number of Exceptions                                    0        0        0        0        0
Number of Calls                                      1205        0        0        0        0
Number of Virtual Calls                               300        0        0        0        0
Number Of Virtual Call Cache Hits                     264        0        0        0        0
Number of PInvoke Calls                               187        0        0        0        0
Total Bytes In Use After Collection                 94463        1    94463    94463    94463

The MSCOREE.STAT file is a text file. Each line is terminated with a line feed (LF) character, and spaces are used for column formatting. To view the file in Microsoft Excel, open it with the Text Import Wizard: in step 1, select "Fixed width" and press the "Finish" button.

Each statistic has a column for "Value", "N", "Mean", "Min", and "Max". Some statistics in the report will just be a measurement from a single event and therefore will only have a value in the "Value" column. Consider "Execution Engine Startup Time" in the above example.

Other statistics track multiple occurrences and therefore have values in each of the columns. The "Value" column is a summation of the occurrences, "N" indicates the number of occurrences of the named statistic, "Mean" is the average value per occurrence, and "Min" and "Max" are the minimum and maximum values, respectively.

For example, look at the "Bytes Pitched" statistic. Based on the above report, there are 161 occurrences of this event, averaging 231 bytes pitched each, with a maximum of 3135 bytes in a single occurrence. In this case, the maximum value would be of interest, and it would be beneficial to know what is causing it. More information on this is covered in the Scenarios section below.

Sample Program

PerfStatsCS is a C# sample that implements two functions: (1) enabling the gathering of statistics by editing the Counters registry value, and (2) displaying the statistics report. Upon startup, the program gathers the statistics from the MSCOREE.STAT file and displays them on screen. Using this program, a developer can test another application repeatedly and view the results.

PerfStatsCS has a menu that reflects the two main functions of the program. The menu item labeled "PerfStats" has a checkbox that indicates the current value of the Counters registry value; clicking the item toggles the registry value.

The second menu item, "LoadReport", loads the report. This functionality is based on a function, LoadPerfStats, which uses a StreamReader instance to read each line in the MSCOREE.STAT file. The StreamReader class has a handy method, ReadLine, which reads text until a line feed or carriage return-line feed combination is found. Other helper functions extract the statistic's name and the other columns, which are then displayed in a ListBox. The ListBox's SelectedValueChanged event looks at which statistic was selected and then displays its details. A simplified sketch of this parsing follows.
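
The following C# fragment sketches the idea; the 48-character name column is an assumption based on the sample report above, and the actual helper functions in PerfStatsCS may differ.

[C#]
StreamReader sr = new StreamReader("\\mscoree.stat");
string line = sr.ReadLine();                     // skip the header row
while ((line = sr.ReadLine()) != null)
{
    if (line.Length < 49) continue;              // skip short or blank lines
    string name = line.Substring(0, 48).Trim();  // counter name column
    string[] cols = line.Substring(48).Trim().Split(' ');
    // Split leaves empty entries between runs of spaces; filter those
    // out, then add the name and columns to the ListBox.
}
sr.Close();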

Scenarios

Now that we can see all of these statistics, what do we do with them? There will be other documents that will cover this in greater detail. For now, let's look at a few that really affect performance. As you read this, you will see that these are all related to each other.

The performance counters are most useful for understanding the impact of garbage collection and just-in-time compilation on application performance. These statistics can often be used to determine where application optimizations are possible.

Number of Objects Allocated, Current Number of Objects Allocated

A high value for these statistics is not necessarily bad. However, it can indicate a possible strain on the .NET Compact Framework if a lot of objects are alive at any one time. Furthermore, if a lot of objects are instantiated during the lifetime of an application, the .NET Compact Framework will spend a lot of time managing their memory, and at some point the GC will have to remove them, which takes more time the more objects there are to dispose of.

In general, tightly managing object allocation and memory usage is still an important technique with managed code development even though you have the Garbage Collector to help you with the task.

Collections

To get a better understanding of performance, you need a background in the .NET Compact Framework CLR's GC, JIT, and Memory Pools management. Here's a short version.

There are three basic types of collections: Simple, Compact, and Full. The performance statistics for these collection types have a set of four statistics: Number of Collections, Bytes Collected by Collection, Bytes in Use after Collection, and Time in Collection.

The simple collection is a mark-and-sweep collection: memory that is no longer actively referenced is freed.

Over time, as memory is fragmented by allocations and simple collections, it needs to be compacted, or defragmented; hence the compact collection. This collection includes a simple mark and sweep, but additionally moves existing data into one contiguous block in order to free up the heap so that bigger chunks of memory are available for future allocations.

If there comes a time when the application severely needs memory, a full collection occurs. At this point, after a mark and sweep and a compact collection, all JITted code is released, or pitched (see Number of Methods Pitched and Bytes Pitched below). This also occurs on a Pocket PC when an application is moved to the background; when the application returns to the foreground, code is re-JITted as needed.

In an ideal world, there would be no full collections. An application should have more simple and compact collections than full collections, but the ratio will shift toward full collections as memory pressure increases.

GC Latency Time

Representing the total milliseconds of application suspension due to garbage collection, this statistic is directly related to the "Number of Objects Allocated" statistic. GC Latency Time is important because collections take up processing time, thus slowing down the application. If your application appears to hang for no reason and resume later, check this statistic.

Bytes Pitched, Number of Methods Pitched

One of the benefits of the .NET Compact Framework is code pitching, which provides for the removal of JIT-compiled code when memory runs low. While this does help the program continue running, if the pitched code is needed again it must be compiled again, which slows the application.

If there is a lot of pitching, then it is likely that the application is allocating too much memory and possibly keeping objects alive unnecessarily. Examine the application and work to decrease the number of live objects. If this can't be done, it is possible the device is not sufficient for the application (add memory, or check whether the memory is being used up by storage or other applications).

Native Bytes Jitted, Number of Methods Jitted

These statistics show the amount of native code generated in response to the managed code in the application. The numbers can be surprising, because application code gets JITted and then calls framework classes, which also need JITting. High numbers should point to areas in an application whose algorithms should be evaluated; in other words, check whether a more efficient set of calls can perform the same task.

Prerequisite #3 - Performance Issues within the Framework

As mentioned in the introduction, the .NET Compact Framework provides multiple solutions to a programming task. In this section, we want to look at some of these areas and discover the best performing techniques. While it would be easy to just list the winners, there are many tradeoffs in each case beyond the speed at which the task completes. Additionally, a good developer should look at these findings and the analysis process in order to begin understanding how to apply the same discovery techniques to future development efforts and to weigh different implementation choices. Consequently, this topic applies to any development environment.

These performance tests were executed on an iPAQ Model 3640 running Pocket PC 2002 with 32MB of memory. Your results will vary, but the principles will apply. Moreover, proper testing requires that your application be performance tested on multiple devices to verify execution times.

Data Binding

Back in the earlier VB days, binding was introduced to the Windows developer. Although it looked interesting because the developer didn't have to write a lot of code, applications that used binding suffered from its slow performance. Binding is back with .NET development, and that includes the .NET Compact Framework. So how does it perform?

Methodology

Bind-VS-Load is a .NET Compact Framework application that looks at different ways to get data into a control for display in the GUI. The project contains a class called Widget, which has some basic properties. At the beginning of the program, two ArrayList objects are built. The first ArrayList, called SmallArray, contains one hundred elements that each contain a single string. The second ArrayList, called LargeArray, contains one hundred entries, each containing an instance of a class called LargeWidget. The LargeWidget class is made up of private members that include an integer, five strings, and a short. The class has properties for each private member and a ToString override.

The application has four command buttons to execute the tests. The tests represent the two different ways to load data into a ListBox with the two different ArrayLists. The tests with "Bind" in their name use the new data binding mechanism, which requires very little code. The tests with "Load" in their name implement the old-fashioned way: enumerating an array and adding the items to the ListBox individually.

The code to bind an ArrayList is very simple.

listBox1.DisplayMember = "Name";
listBox1.DataSource = SmallArray;

Another point to note is that there are two "Bind Large" tests: the first does not set the DisplayMember property, whereas the second does.

The code to load the ArrayList directly into the ListBox requires only slightly more effort.

for (int pos=0; pos < SmallArray.Count; pos++)
    listBox1.Items.Add( SmallArray[pos] );

In order to see a realistic set of tests, each test is executed four times in a row to show how the times fluctuate. For example, the first run typically takes longer due to the JIT compiler converting the MSIL to platform-specific code for the Pocket PC.

Results

The results below, in milliseconds, include only the work with the control, not the creation of the data. What do you notice?

Test Name       1st Test   2nd Test   3rd Test   4th Test   Average
Load Small      613        580        574        569        584
Load Large      660        612        614        596        620.5
Bind Small      1768       1864       1937       1945       1878.5
Bind Large(1)   1770       1944       1866       1896       1869
Bind Large(2)   4927       5388       5146       5469       5232.5

Observations

  • Loading directly into the ListBox is faster than binding to a data source.
  • Binding to a specific column (DisplayMember) is much slower than defaulting to the ToString method.
  • Load Small and Load Large take roughly the same amount of time, and Bind Small and Bind Large(1) also take about the same amount of time.

Recommendations

For now, it might be better to avoid data binding in many scenarios. In this example, it was definitely very simple to write the loading logic manually. In other cases, such as binding a DataSet to a DataGrid, it may be more involved, so you will need to weigh that work against the performance hit seen here.

Reading XML with XmlDocument and XmlTextReader

XML is a great invention. XML's value starts with the simplicity of adding easy-to-read structure to data in a way that requires no foreknowledge. This principle is the core of XML, and it is hard to find a developer today who does not use it in one form or another. As a matter of fact, most applications today have to read XML.

.NET provides the XmlDocument class, which is extremely easy to use. But this simplicity is where the challenge begins. Since one half of the world is shipping XML data, the other half of the world is parsing it. While it is easy to parse the data, consuming large XML documents is a challenge in any environment, because most XML parsers read the entire file or stream into memory in order to provide a complete tree of nodes. Most machines today can handle a 1K XML document, but what about a one-terabyte XML document? That would put a strain on the majority of machines out there today, so another way to read the data is required, one that does not process the entire file at once.

The industry answer is to move away from a tree model like the Document Object Model (DOM) and toward an event-based mechanism. One popular implementation of the event model is SAX, which stands for "Simple API for XML". By treating the XML data as a stream and reacting to events like the starting tag of an element or an attribute value, a large XML file can be processed without choking the machine. But this has its challenges as well, because it is a "push" model; in other words, the parser pushes events to the application, which results in more complicated logic.

The .NET architects at Microsoft came to the rescue by creating the XmlReader abstract class. This is an event based mechanism that uses a "pull" model, which means the application pulls node information from the parser. For a good simple explanation, look at Comparing XmlReader to SAX Reader in the MSDN documentation. Nevertheless, this model is a better way to handle large XML documents.

The purpose of this comparison is to look at the factors that help a developer decide between the XmlDocument and the XmlTextReader, a concrete implementation of the XmlReader. This discussion will apply to the .NET Framework as well as the .NET Compact Framework.

Methodology

In the XmlDoc-vs-XmlTextReader application, an XmlDocument instance and an XmlTextReader instance read XML files from the device. The test is set up so that we can see how performance is affected by the size of the XML file; therefore, files of 308 bytes, 30K, and 450K are utilized.

XmlTextReader

Dim reader As XmlTextReader
Dim t1Load, t1Read As Long
Dim t1 As AtomicCF.Timer = New AtomicCF.Timer
Dim strFile As String = GetFileName()

t1.Start()
reader = New XmlTextReader( _
    AtomicCF.MiscFunctions.GetAppPath() + "\" + strFile)
t1Load = t1.Stop()

t1.Start()
Do While reader.Read()
    'Do Nothing
Loop
t1Read = t1.Stop()

XmlDocument

Dim doc As New XmlDocument
Dim t1Load, t1Read As Long
Dim t1 As AtomicCF.Timer = New AtomicCF.Timer
Dim strFile As String = GetFileName()

t1.Start()
doc.Load(AtomicCF.MiscFunctions.GetAppPath() + "\" + strFile)
t1Load = t1.Stop()

t1.Start()
TraverseChildren(doc.DocumentElement)
t1Read = t1.Stop()

TraverseChildren

Private Sub TraverseChildren(ByVal parentNode As XmlNode)
    Dim nodePos As Integer
    For nodePos = 0 To parentNode.ChildNodes.Count - 1
        TraverseChildren(parentNode.ChildNodes(nodePos))
    Next
End Sub

Results

Test               Load     Enum     Total
XmlTextReader(1)   14.125   23.375   37.5
XmlDocument(1)     49.875   2.625    52.5
XmlTextReader(2)   14.75    82.625   97.375
XmlDocument(2)     180.25   23.875   204.13
XmlTextReader(3)   23.25    678      701.25
XmlDocument(3)     1579.3   301.63   1880.9
XmlTextReader(4)   14.625   9542.8   9557.4
XmlDocument(4)     23145    14083    37228

Observations

  • Without surprise, the XmlTextReader class outperforms the XmlDocument class in every case.
  • The Load time is where the difference really shows up.
  • The XmlDocument usually wins the Enum time, until the fourth test. That reversal is probably due to the large amount of RAM required to hold the tree.
  • As the XML document size increases, the performance gap between the two classes increases as well: the XmlDocument instance required roughly 1.4, 2.1, 2.7, and 3.9 times the duration of the XmlTextReader, respectively.

Recommendations

Unless the XML file is small (3k or less) or the activity can be hidden easily from the user, the XmlTextReader is generally the better solution. If the data needs to be searched multiple times, then the XmlDocument might be a better choice if the files are in the 30K range or less in size. But for large files, the XmlTextReader is the only logical choice.

Reading Local Data with DataSets and XmlTextReaders

Hopefully by now, most .NET developers have looked over the ADO.NET functionality and have become familiar with the class model. As you probably already know, the DataSet is a database-independent container of relational data and can be used to store changes to the data and later apply those changes to the data source.

One interesting principle of the .NET architecture is how the world of relational data is connected to the world of structured data. The DataSet has methods that work with XML, including ReadXml, which imports an XML file into memory, and WriteXml, which just as easily stores the data in a DataSet to an XML file.
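
As a quick illustration, the round trip looks like the following C# fragment; the file path here is hypothetical.

[C#]
DataSet ds = new DataSet();
ds.ReadXml("\\My Documents\\publishers.xml");   // import the XML file
// ... work with ds.Tables, possibly changing rows ...
ds.WriteXml("\\My Documents\\publishers.xml");  // persist the data back out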

Since the XmlTextReader was the better performer in the previous section, this section will look at how it compares to the DataSet when reading local XML data.

Methodology

DS-vs-XmlTextReader is a C# application that looks at reading an XML file from the local device. In this exploration, the XmlTextReader is compared to the DataSet class in two basic ways: loading the data, and then reading the data.

To look at the effects of different file sizes, the tests are run against a small file (13 lines of text, containing 2 publishers, with a size of 381 bytes) and a larger file (991 lines of text, containing 196 publishers, with a size of 38K). Each publisher is represented by a Table element (two lines for the start and end tags) containing three embedded elements, a PubId, PublisherName, and DateCreated (three more lines of text). Therefore, each publisher element represents five lines of text in the files.

In the first set of tests, the scenario involves an XML file that is loaded and then every node is enumerated. The results in milliseconds are listed below.

XmlTextReader

XmlTextReader reader = null;
long t1Load, t1Read=0;
AtomicCF.Timer t1 = new AtomicCF.Timer();

string strFile;
if (radioButton1.Checked == true)
    strFile="publishers1.xml";
else
    strFile="publishers2.xml";

t1.Start();
reader = new XmlTextReader(AtomicCF.UtilityFunctions.GetAppPath() + 
        @"\" + strFile);
t1Load = t1.Stop();

t1.Start();
while (reader.Read()) 
{
    //do nothing
}
t1Read = t1.Stop();

DataSet

long t1Load, t1Read=0;
DataSet ds = new DataSet();
AtomicCF.Timer t1 = new AtomicCF.Timer();

string strFile;
if (radioButton1.Checked == true)
    strFile="publishers1.xml";
else
    strFile="publishers2.xml";

t1.Start();
ds.ReadXml(AtomicCF.UtilityFunctions.GetAppPath() + 
      @"\" + strFile); 
t1Load = t1.Stop();

t1.Start();
ReadDsTables(ds);
t1Read = t1.Stop();

ReadDsTables

private void ReadDsTables(DataSet ds)
{
  DataTable dt;
  DataRow dr;
  int posTable, posRow, posColumn;
  object[] oArray;
  for(posTable=0; posTable < ds.Tables.Count; posTable++)
  {
    dt = ds.Tables[posTable];
    for(posRow=0; posRow < dt.Rows.Count; posRow++)
    {
      dr=dt.Rows[posRow];
      oArray = dr.ItemArray;
      for(posColumn=0;posColumn < oArray.Length; posColumn++)
      {
        // do something
      }
    }
  }
}

Results

Test                         Load    Enum    Total
XmlTextReader, Small File    19.4    50.29   69.7
DataSet, Small File          104     1       105
XmlTextReader, Larger File   22.4    658.4   681
DataSet, Larger File         3968    54.43   4023

In the second test, the XML file is loaded and then a search for a specific publisher is done. The publisher exists deep into the list for the large file, but is the second of two publishers in the small publishers XML file.

Search for a node     Load   Search   Total
XmlTextReader Small   13.7   36.29    50
DataSet Small         146    1.143    147
XmlTextReader Large   16.6   367.3    384
DataSet Large         3866   26.57    3893

Observations

  • The XmlTextReader is much quicker than the DataSet.ReadXml method. This is obviously due to the DataSet reading the entire file into an object structure.
  • The DataSet is much quicker at retrieving and searching the data due to the file being brought totally into RAM.
  • In all cases, the XmlTextReader performed the overall test faster (the Total column is the sum of the first two columns).

Recommendations

To state a simple rule for which class to use is difficult. It comes down to three issues: data size, the need to update the data, and how often the data will be read and searched.

The primary issue is the size of the data. If the data is large, the XmlTextReader is a good solution. If the data is very large, it may be necessary to store the data elsewhere, such as on a database server, or in a local database such as SQL Server CE.

If the data needs to be updated, a DataSet is a good way to go due to its ability to cache the update changes and send them back to the database. Again, the issue is the size of the original data combined with how much data has to be stored until it is sent back. In a situation where the DataSet is persisted, the startup time, when the DataSet is loaded from a file, will be the penalizing factor.

So, even though the XmlTextReader performs faster for the load and enumeration tests, there are times when the application is better served with a DataSet.

Reading Remote Data with DataReaders and DataSets

In this section, the DataSet will be compared to the DataReader in terms of performance. Since the DataSet was described in the previous section, let's look at the DataReader class. The DataReader is a stream of data from a data source that has to be read like an assembly line; in other words, it is forward-only and read-only.

The .NET Compact Framework includes two data providers: (1) SqlClient, used for reading from remote SQL Server databases, and (2) SqlServerCe, which gives a managed interface to SQL Server CE.

Methodology

In this comparison, the test uses the SqlClient provider which gives an application a way to access a SQL Server database on a network connection. Via 802.11b, the testing device will access the LAN where the database server resides.

The same query (strSQL) and connection string (strCN) will be used for both sets of code, as shown below. The query is "select isbn from products", which returns two hundred seventy rows from a database used in our Atomic .NET course. Another twist in this comparison is the use of two timings. The first is the amount of time to get the data; in the case of the DataReader, this is the amount of time to get the first pieces of data. The second timing is the amount of time to read the entire result set. This is interesting because the DataSet retrieves all of the data during the first timing, while the DataReader receives all of its data during the second timing. For more background on this, see Dan Fox's book Sams Teach Yourself ADO.NET in 21 Days (Sams Publishing).

DataSet (VB.NET)

Dim ds As New DataSet
Dim cn As New SqlConnection(strCN)
Dim da As New SqlDataAdapter(strSQL, cn)
Dim dt As DataTable
Dim X As Integer
Dim oTimer As New AtomicCF.Timer
Dim lRetrieve, lRead As Long
Dim o As Object

oTimer.Start()
da.Fill(ds)
lRetrieve = oTimer.Stop

oTimer.Start()
dt = ds.Tables(0)
For X = 0 To dt.Rows.Count - 1
   o=dt.Rows(X)(0)
Next X
lRead = oTimer.Stop
cn.Close()

DataReader (VB.NET)

Dim dr As SqlDataReader
Dim cn As New SqlConnection(strCN)
Dim cm As New SqlCommand(strSQL, cn)
Dim oT1 As New AtomicCF.Timer
Dim lRetrieve, lRead As Long
Dim o As Object

oT1.Start()
cn.Open()
dr = cm.ExecuteReader()
lRetrieve = oT1.Stop

oT1.Start()
Do While dr.Read
   o = dr.GetValue(0)
Loop
lRead = oT1.Stop

dr.Close()
cn.Close()

Results

These tests were run seven times and the averages are shown below.

Test         Load     Enum     Total
DataReader   83.25    202.25   285.5
DataSet      487.63   53.5     541.13

Observations

  • The DataReader appears to load the data faster than the DataSet. However, this is due to the DataSet receiving the entire result set initially and the DataReader only receiving enough for the first record of the result set.
  • This also is the reason why the DataSet does so well in the second timing (the Enum column in the test table).
  • The DataReader outperforms the DataSet in total time; the DataSet time was almost twice that of the DataReader.

Conclusions

In the section "Reading Local Data with DataSets and XmlTextReaders", an attempt was made to point out considerations when looking at the slower performing DataSet. The same considerations apply here. Therefore, the DataReader is good in a lot of scenarios but sometimes the DataSet is the better choice. The end decision will need to be based on a tradeoff between performance and ease of development and based on the actual scenario in which the application will be used.

Reading Remote Data with Web Services and SqlClient

In today's distributed applications, data can reside in a lot of different places, but those places typically include database servers. Two challenges existed in the past for distributed applications: connectivity and interoperability.

The first issue, connectivity, deals with the network communication issues between the device and the data source, but that barrier has been removed with the maturing of the devices, networking peripherals, and the Internet itself.

The second challenge is the issue of the data residing on a non-friendly platform. While Internet protocols existed to address this, challenges remained. Over time, more advanced solutions were built as application-specific and O/S-based mechanisms like RPC and DCOM. Ultimately, these architectures either forced client machines onto a platform similar to the server's, or the applications suffered from a lack of extensibility due to the effort spent on communication plumbing.

In response to these two distributed challenges, the latest evolution in the computing industry is Web services. Now there exists a way for different platforms to exchange data in a messaging mechanism that can be extended easily and is not stopped by firewalls.

With this addition to the distributed application toolbox, this section will look at the performance of Web services. A challenge in considering this performance area is choosing what to compare it to. TCP/IP (sockets) and HTTP could have been used, but that goes back to the second challenge, the lack of extensibility. (This could be argued if XML is applied to the messages.) Therefore, we will focus on the SqlClient provider so that an easier parallel can be made, though SqlClient does not solve many of the issues that Web services solve: when using SqlClient, the client application needs a direct connection to the server and must be inside the firewall.

Methodology

Consider a situation where a SQL Server database exists on the Internet and an application requires that data. As mentioned in the previous test, the SqlClient namespace provides the ability for a Smart Device to access data from SQL Server databases. Consequently, this testing reuses the earlier test: the Atomic database is used, and the same query pulls back a DataSet of two hundred seventy ISBNs.

For the comparison, a .NET Compact Framework client consumes a Web service that executes the exact same code, filling (no pun intended) the DataSet using the same query. To keep the comparison fair, the Web service resides on the same server as the database. A sketch of the service method appears below.
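
The service method might look like the following C# sketch; the class name is hypothetical, the connection string is elided, and GetDS matches the method called by the client code below.

[C#]
using System.Data;
using System.Data.SqlClient;
using System.Web.Services;

public class AtomicService : WebService
{
    // Same query as the SqlClient test; the connection string is elided.
    private const string strSQL = "select isbn from products";
    private const string strCN = "...";

    [WebMethod]
    public DataSet GetDS()
    {
        SqlDataAdapter da = new SqlDataAdapter(strSQL, new SqlConnection(strCN));
        DataSet ds = new DataSet();
        da.Fill(ds);   // Fill opens and closes the connection itself
        return ds;
    }
}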

In order to improve the performance of the call to the Web service, the oService proxy is created in the form constructor.

[VB.NET]

Dim ds As New DataSet
Dim dt As DataTable
Dim x As Integer
Dim oTimer As New AtomicCF.Timer
Dim lRetrieve, lRead As Long
Dim o As Object

oTimer.Start()
ds = oService.GetDS()
lRetrieve = oTimer.Stop

oTimer.Start()
dt = ds.Tables(0)
For x = 0 To dt.Rows.Count - 1
    o = dt.Rows(x)(0)
Next x
lRead = oTimer.Stop

Results

Just like the previous test where DataSets were compared to DataReaders, this table is an average of seven tests.

Test         Load     Enum    Total
SqlClient    487.63   53.5    541.13
WebService   4128.3   49.25   4177.5

Observations

  • The Load time indicates that the DataAdapter Fill method is a much better performer than the call to the Web service.
  • The Enum column indicates that iterating over the DataSet is basically equivalent in performance.
  • The Total time is reflective of the difference in the Load times.
  • There are several unmentioned factors in this test. These include the network bandwidth and throughput, the performance of the server, the performance of the database, and the layer of IIS and ASP.NET for the Web service.

Recommendations

Depending on where the server exists, this comparison might be unwarranted. If this were a data collection application where the device and the server are both behind the firewall, SqlClient is the preferred way to go, since the Web service takes nearly eight times as long as the SqlClient DataAdapter. However, if the database sits behind a firewall relative to the client, if the data is not stored in a SQL database, or if non-Microsoft devices need to access the data, then Web services are the correct solution.

Reading Files Locally

In some scenarios, an application will need to read a text file stored on the device. The .NET Compact Framework provides several ways to resolve this requirement.

Methodology

In this experiment, two classes will be compared. The StreamReader and the FileStream class will be directed to read two files of 10K and 200K in their entirety from the application directory.

StreamReader (VB.NET)

Dim sr As StreamReader
Dim line As String
sr = New StreamReader(strFileName)
Do
  line = sr.ReadLine()
Loop Until line Is Nothing
sr.Close()

FileStream (VB.NET)

Dim fs As FileStream
Dim temp As UTF8Encoding = New UTF8Encoding(True)
Dim b(1024) As Byte
fs = File.OpenRead(strFileName)
Do While fs.Read(b, 0, b.Length) > 0
    temp.GetString(b, 0, b.Length)
Loop
fs.Close()

Results

Test Average
FileStream (sm) 24.1
StreamReader (sm) 37.6
FileStream (lg) 476.7
StreamReader (lg) 606.8

Observations

  • FileStream is obviously faster in this test. StreamReader takes about 56% more time to read the small file and about 27% more time for the large file.
  • StreamReader is specifically looking for line breaks while FileStream is not, which accounts for some of the extra time.

Recommendations

Depending on what the application needs to do with a section of data, there may be additional parsing that requires additional processing time. Consider a scenario where a file has columns of data and the rows are CR/LF delimited. The StreamReader would work down the line of text looking for the CR/LF, and then the application would do additional parsing to find a specific piece of data. (Did you think String.Substring comes without a price?)

On the other hand, the FileStream reads the data in chunks, and a proactive developer can write a little more logic to use the stream to good advantage. If the needed data is at specific positions in the file, this is certainly the way to go, as it keeps memory usage down; a sketch of this positional approach follows.
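
For example, the following C# sketch reads a single field from a fixed-width file; the 64-byte record size and the offsets are hypothetical.

[C#]
// Read the 10-byte field at offset 20 of the fourth 64-byte record.
FileStream fs = File.OpenRead(strFileName);
byte[] field = new byte[10];
fs.Seek(3 * 64 + 20, SeekOrigin.Begin);   // jump straight to the field
int count = fs.Read(field, 0, field.Length);
string value = new UTF8Encoding().GetString(field, 0, count);
fs.Close();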

FileStream is the better mechanism for speed but will take more logic.

A Form with Many Controls

Visual Studio .NET 2003 provides a visual designer that allows a .NET developer to lay out controls on a form very easily, as opposed to positioning controls manually from code. However, the forms designer does not necessarily have special regard for a device with limited resources. There may be times when an application needs to create a lot of controls on a form; consequently, there will be performance issues in this extreme case. To complicate matters, nested control hierarchies (such as the use of panels) further slow down the start of the application.

There are some improvements to the designer's default generated code that can improve the performance of the application in these types of scenarios.

Methodology

As controls are placed on the designer, Visual Studio .NET is busily generating code in the InitializeComponent procedure. A developer can see everything done in the designer and the Properties window converted to programming instructions by looking at the InitializeComponent section of the code.

BigForm is a .NET Compact Framework-based application written in Visual Basic. It has two panels with 40 buttons each and another 20 buttons that sit directly on the form. No logic exists other than the instantiation of controls, connecting controls to their parents, and setting typical properties such as a button's text and location. The controls come to a grand total of 100 buttons, 2 panels, and a menu object. This somewhat extreme example takes some time to execute.

In the second part of the test, the InitializeComponent is modified to use the following principles:

  • Reduce the number of method and property calls on controls during startup. For example, setting Control.Bounds is a better option than separate calls to Control.Location and Control.Size.
  • Create the form from the top down. In nested control hierarchies, set the parent property of containers (using the above rule) before adding controls to the container. As in the BigForm application, the panels had their parent property set to the form before the 40 controls were connected to the panel. If further containers exist lower in the hierarchy, the same changes should be applied.

Results

Test Average
Default 4421
Hand-optimized 1938

Observations

  • Clearly, the hand-optimized InitializeComponent runs quicker. Since only the second principle was applied, that step alone is important.
  • This optimization is a benefit in extreme situations; on normal-sized forms, the startup time will not show a 50% difference.

Recommendations

If the application has a large number of controls or has a nested hierarchy, then check this technique out. It's easy to apply the rules and the Visual Studio .NET designer does not choke on the changes.

This will not fix situations where the application is waiting for large amounts of data or waiting for responses from remote machines. See the Delayed Initialization and Control Invoker section below for more information on this angle.

String Handling

If you've been around in .NET land, you have no doubt discussed the String class and heard that the concatenation functionality is "sloooooow". In almost every conversation on strings, a knowledgeable .NET developer mentions the word "immutable". What is this and why is it not in my vocabulary?

Immutability means that something has the characteristic of not being able to change. The implication in the context of the .NET String class is that internally the string is a write-once, read-only mechanism. Fortunately, the class provides methods that appear to change the string. At least, that is what we assume.

The String methods actually make a new copy of the sequence of Unicode characters to do their different tasks, such as concatenation. This copying is what causes the slow performance.

To combat this, the .NET designers created another class for building strings called StringBuilder.

Methodology

In this test, there is a defined string constant with the value "This is a test". The two tests start with an empty string and concatenate the constant 100 times; a sketch of both follows.
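
A minimal C# sketch of the two tests, assuming the Timer class from Prerequisite #1, looks like this:

[C#]
const string TestString = "This is a test";
AtomicCF.Timer t = new AtomicCF.Timer();

t.Start();
string s = "";
for (int i = 0; i < 100; i++)
    s += TestString;                  // each += allocates a brand new string
long stringTime = t.Stop();

t.Start();
System.Text.StringBuilder sb = new System.Text.StringBuilder();
for (int i = 0; i < 100; i++)
    sb.Append(TestString);            // appends into an internal buffer
string s2 = sb.ToString();
long builderTime = t.Stop();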

Results

Test            Duration (ms)
String          71.4
StringBuilder   3.5

Observations

It's not too often that you need a Unicode string of 1,400 characters, but it's apparent that the StringBuilder is a much better performer.

Recommendations

If strings are built frequently in a .NET Compact Framework-based application, the creation of new copies of the expanded string not only fragments memory but also causes garbage collections, which further impact performance. Therefore, use the StringBuilder class wherever possible.

Prerequisite #4 - Performance Considerations

In the previous section, we compared the performance of different options for completing similar tasks in the .NET Compact Framework. While these are specific cases to watch out for, there are some fundamentals that a .NET Compact Framework developer should follow. Working against these principles is a sure way to degrade performance; therefore, add these to your mental checklist of good programming practices.

Principle of Constraint

Having learned some of what goes on under the hood with memory management, .NET Compact Framework developers have to continually consider this issue: developing for a Windows CE device today is not the same as developing for a server. This is a simple statement, but put it on a plaque on your desk. Size does matter!

Number of Function Calls and Function Size

Minimize the number of function calls and write chunkier functions. Every call carries overhead, so folding work into fewer, larger functions generally performs better on a constrained device; the sketch below shows one simple application of this idea.
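
As a trivial C# illustration (the values list and GetValues are hypothetical), hoisting a property call out of a loop removes one call per iteration:

[C#]
ArrayList values = GetValues();   // hypothetical list of boxed integers
int total = 0;
int count = values.Count;         // one property call instead of one per pass
for (int i = 0; i < count; i++)
{
    total += (int)values[i];      // work done inline rather than in a helper
}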

Maximize Object Reuse

The number of objects allocated came up several times in the discussion of the Performance Statistics report. Frequently creating objects leads to performance degradation due to the frequent cleanup of objects and the fragmentation of the heap. So, if the application can keep an object around and reuse it, this helps the issues around .NET Compact Framework memory management, as the sketch below illustrates.
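
For example, a single buffer allocated once and reused avoids one allocation per pass; the file list in this C# sketch is hypothetical.

[C#]
byte[] buffer = new byte[1024];            // allocated once, reused below
foreach (string fileName in fileNames)     // hypothetical list of files
{
    FileStream fs = File.OpenRead(fileName);
    while (fs.Read(buffer, 0, buffer.Length) > 0)
    {
        // ... process the chunk; no per-file buffer allocation occurs
    }
    fs.Close();
}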

Avoid Manual Garbage Collection

Don't call GC.Collect. There is no better way to slow down the application. Most of us are not smarter than the framework about when to kick-off collections, so it is better to let the GC do its thing.

Delayed Initialization and Control Invoker

In most Windows applications, developers routinely put a lot of code in the Form_Load event to load the first screen. In the case of .NET Compact Framework applications, this may not be the best thing to do since users will get tired of waiting for an application when it appears to freeze the device.

A simple alternative is to use a Timer control with a small interval. In the Form_Load event, enable the Timer so that the application will get a Tick event. In the Tick event, disable the timer and call the screen initialization logic. This will allow the GUI to initially display to the user.
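
A minimal C# sketch of this technique, assuming a form with a System.Windows.Forms.Timer named timer1 and a hypothetical LoadScreenData routine:

[C#]
private void Form1_Load(object sender, EventArgs e)
{
    timer1.Interval = 10;    // small interval; Tick fires after Load returns
    timer1.Enabled = true;   // let the form paint before the heavy work
}

private void timer1_Tick(object sender, EventArgs e)
{
    timer1.Enabled = false;  // one-shot: disable before doing the work
    LoadScreenData();        // hypothetical screen initialization logic
}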

A second alternative is to use a Control Invoker. In this scheme, another thread does the loading of the data. The issue here is that forms are not thread safe; thus, this second thread which is busily loading data could have an issue when it comes time to put data into the controls owned by the form (and a different thread). "So just use the Form.Invoke!" Well, the .NET Compact Framework version of Invoke does not support the passing of parameters so it takes some extra effort to do this.

Some workarounds create an Invoker class that gives the background thread the ability to queue parameters and allows the form, on the correct thread, to read them. This is a nice mechanism because the loading occurs on a different thread, whereas the first alternative still suspends the application while the work runs.
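
One possible shape for such a class is sketched below in C#; the names are hypothetical, and it assumes the .NET Compact Framework 1.0 restriction that Control.Invoke accepts only a parameterless EventHandler.

[C#]
using System;
using System.Collections;
using System.Windows.Forms;

class ControlInvoker
{
    private Control m_control;
    private EventHandler m_handler;
    private Queue m_queue = new Queue();

    public ControlInvoker(Control control, EventHandler handler)
    {
        m_control = control;
        m_handler = handler;
    }

    // Called from the background thread: queue the data, then ask the
    // control's thread to run the (parameterless) handler.
    public void Post(object data)
    {
        lock (m_queue) { m_queue.Enqueue(data); }
        m_control.Invoke(m_handler);
    }

    // Called from the UI thread, inside the handler, to fetch the data.
    public object Receive()
    {
        lock (m_queue) { return m_queue.Dequeue(); }
    }
}

On the form, the handler simply calls Receive and puts the result into the control; the background thread calls Post for each piece of data it produces.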

Asynchronous Opportunities

As mentioned earlier, the developer has a greater responsibility when employing extra threads. But what if the work is done for you, and all you have to do is call the appropriate method?

The .NET Framework provides asynchronous methods that serve as complements to certain synchronous methods. Although not as widespread, the .NET Compact Framework implements a variety of asynchronous methods, mainly in the System.Net namespace. If your application often waits for networking functions, then this is an opportunity to free up that Form thread and place the data transmission on a different thread. And because these methods are built into the framework, the complexity is lessened.
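
For instance, the HttpWebRequest class offers a BeginGetResponse/EndGetResponse pair; the following C# sketch (with a hypothetical URL) keeps the form thread free while the request is in flight.

[C#]
using System;
using System.Net;

class AsyncFetch
{
    public static void Begin()
    {
        HttpWebRequest req =
            (HttpWebRequest)WebRequest.Create("http://server/data.xml");
        // Returns immediately; the form thread stays responsive.
        req.BeginGetResponse(new AsyncCallback(OnResponse), req);
    }

    private static void OnResponse(IAsyncResult ar)
    {
        // Runs on a thread-pool thread when the response arrives.
        HttpWebRequest req = (HttpWebRequest)ar.AsyncState;
        WebResponse resp = req.EndGetResponse(ar);
        // ... read resp.GetResponseStream(), then marshal the results
        // back to the form's thread (see the Control Invoker above).
        resp.Close();
    }
}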

Analyze the Application

We have learned the necessary prerequisites to get us started analyzing the application. So let's see if we can upgrade the performance. The steps are relatively simple but require methodical effort to ensure quality. But when executed well, success is ours.

Identify the ill-performing areas of the application

There are a lot of ways to start analyzing a poorly performing application. The easiest, however, is to gather client feedback, especially if you do not know the application yet. Second, spend some time experiencing the application personally in order to understand the client's message. The end result of this step is a list of areas with unacceptable performance.

Gather the Performance Statistics

Use the mechanism described in the second prerequisite, Performance Statistics. Knowing this information before analyzing the code can help narrow the focus during the hunt for badly performing code. For example, if the application does not have run-time errors but seems to freeze a lot, the statistics report will either support or rule out collections as one of the reasons.

Additionally, another set of clues can be gained by running previously identified poor performing areas while creating the statistics report. By isolating parts of the application, this technique can shed light into the possible issues of performance in the tested area.

This technique can also prove that a questioned part of the application is not a problem. In that case, look at the parts of the application that run before the blamed part; it is possible that you are looking at an innocent victim instead of the real culprit.

Measure the Performance of Questionable Areas

Although a simplistic idea, measuring the execution time of certain application functions is not just recording a number. The purpose of this step is to confirm that the code in question is causing the performance problem. After applying the mechanism in the first prerequisite, Measuring Performance, consider the following practices:

  • Isolate exactly what is being measured
  • Repeat tests several times and ignore the first time which is affected by JITting
  • Track the results in order for later comparisons and review
  • Ensure comparison of Apples to Apples
  • Use real code when possible
  • Test multiple designs and strategies - Understand the differences or variation
  • Refer to the list in Prerequisite #3, Performance Issues in the Framework, and look for framework usage that has a better performing alternative
  • Consider best practices – refer to the list in Prerequisite #4, Performance Considerations, looking for practices that cause bad performance

Apply Performance Enhancements

Once areas of an application have been identified and proven to be performance culprits, then it's time to "make a change for the better". Find a solution while following the previously discussed principles.

In the haste of creating a solution, the developer will undoubtedly attempt different techniques. Be creative! However, an important task is to compare the execution times of each revision. Furthermore, a smart process should include tracking each attempt and focusing on small code changes.

Additionally, there should be enough detail to move back to previous tests, which includes documenting the source changes. It is possible that code introduced during the problem-solving process slows down the area under scrutiny even further; then it becomes very important to back up to the introduction of the bad code, which could be multiple iterations back. Therefore, be methodical!

After the final revision is implemented, it is wise to start over with the analysis process and confirm the performance gain with the entire application execution. Not only is it good to get the user's approval, but you should also be able to back it up with real performance numbers from your testing.

When all of the bad areas are improved with acceptable times, it's time to celebrate while waiting for that upcoming job promotion.
