
Performance Comparison: Data Access Techniques

 

Priya Dhawan
Microsoft Developer Network

January 2002

Summary: Compares performance of data access techniques in common application scenarios. Applies to Microsoft .NET Framework Beta 2 and Microsoft SQL Server 2000. (23 printed pages)

Contents

Introduction
Test Scenarios
Test Tool and Strategy
Machine Configuration
Performance Test Results
Conclusion

Introduction

Architectural choices for data access affect performance, scalability, maintainability, and usability. This article focuses on the performance aspects of these choices by comparing the relative performance of various data access techniques, including the Microsoft® ADO.NET Command, DataReader, DataSet, and XmlReader, in common application scenarios with a Microsoft SQL Server™ 2000 database. In these comparisons, a reference set of methods is executed against a reference set of Customer, Order, and OrderDetail data under a range of user load conditions.

Code examples showing the techniques compared here are available in a related set of articles discussing ADO.NET data access techniques. These examples cover access to single value, single row, multiple row, and hierarchical data using ADO.NET.

Test Scenarios

Performance of any data operation depends on a number of factors:

The construction of objects used for data access and the population of those objects impose various amounts of overhead. For example, there is more overhead associated with the instantiation and population of the ADO.NET DataSet object than with instantiation of the ADO.NET DataReader or XmlReader objects.

Data access techniques vary in the load they place on the database. For example, the DataSet and DataReader use database connections differently while the application is reading the data. Data access techniques that use stored procedures reduce the load on the database as compared to approaches that use dynamic SQL expressions. Similarly, the transformation of relational data to and from XML uses server resources.

The number of round trips made to the database server is also a factor, especially where locks and transactions span multiple round trips. The amount of data sent over the network is another factor, with data represented as XML being larger than comparable data in other formats.

We compared the ADO.NET data access choices using operations that are common in business applications, such as getting a customer row, getting the orders of a customer, and inserting an order. To make the test as realistic as possible, the database was loaded with over 100,000 rows of Customer accounts, 1 million rows of Orders (10 orders per Customer), and over 5 million rows of Order details (5 details per Order). The data was stored in a SQL Server 2000 database, and the SQL Server .NET data provider was used to connect to SQL Server. Some of the approaches compared here use the XML features of SQL Server 2000.

The methods described next were used to compare the data access techniques.

GetOrderStatus

The GetOrderStatus method accepts an OrderId and returns an integer representing the status of the order.

GetCustomer

The GetCustomer method accepts a CustomerId and returns a single row containing information about the customer.

GetCustomers

The GetCustomers method accepts a CustomerId and a parameter to specify the number of customer rows you want to read. The top n rows with CustomerId greater than the CustomerId passed to the Web Service method are read.

We performed tests with paging through a large set of customer rows with different page sizes: 100, 500, and 1000.

GetOrders

The GetOrders method retrieves a hierarchy of orders and their details from the database. This method accepts an OrderId and a parameter to specify the number of orders you want to read. The top n rows with OrderId greater than the OrderId passed to the Web Service method and all the corresponding details are read.

We performed tests by paging through a large set of orders and their details with different page sizes of 10 orders (and 50 details), 50 orders (and 250 details), and 100 orders (and 500 details).

InsertCustomer

The InsertCustomer method accepts customer data, inserts a customer row into the database, and returns a CustomerId as an integer.

InsertCustomers

The InsertCustomers method accepts data for multiple customers and inserts multiple customer rows into the database.

InsertOrder

The InsertOrder method accepts data representing an order with multiple details and makes the appropriate insertions into the Order and OrderDetails tables in the database. The tests were performed by inserting a single order header and a varying number of details.

Test Tool and Strategy

For the purpose of our tests, we used Application Center Test (ACT), which is designed to stress test Web servers and analyze performance and scalability problems with Web applications, including ASP pages and the components they use. Refer to the ACT documentation for details on how to create and run tests. Using Application Center Test to exercise data access techniques behind a Web server made sense because it provides many useful features for performance testing. It can simulate a large group of users by opening multiple connections to the server and rapidly sending HTTP requests, and it lets us build realistic test scenarios in which the same method is called with a randomized set of parameter values. This matters because real users do not call the same method with the same parameter values over and over again. Application Center Test also records test results that provide the most important information about the performance of the Web application.

Although we would get better throughput and response times if we tested the data access techniques directly rather than behind a Web server, testing in a stateless environment is realistic for common application scenarios. Moreover, because we are comparing the relative performance of these techniques, the overhead of testing in a stateless environment (that is, behind a Web server) is the same in all cases.

All the data access scenarios we discussed earlier were implemented as a .NET Framework assembly. To generate user load against the assembly using Application Center Test, we implemented wrapper .aspx pages; all client requests were sent to these pages, which in turn called the assembly. The methods in the assembly implement a data operation using ADO.NET techniques. They are simple Sub procedures that do not return the data to the .aspx pages. After retrieving data rows from the database, the methods iterate through the rows and assign the column values to local variables. By introducing a delay while reading the data off the ADO.NET objects, we simulate the cost of doing something with the data.
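As a rough illustration, here is a minimal sketch of what one of these assembly methods might look like. The class, method, and procedure names, the connection string, and the column layout are assumptions for illustration; the actual code follows the techniques shown in the related articles mentioned above.

using System.Data;
using System.Data.SqlClient;

public class DataAccessTests
{
    // Hypothetical sketch of a test method: fetch rows with a DataReader,
    // read every column into locals, and return nothing to the caller.
    public void GetCustomers(string connStr, int customerId, int rows)
    {
        using (SqlConnection conn = new SqlConnection(connStr))
        using (SqlCommand cmd = new SqlCommand("GetCustomers", conn))
        {
            cmd.CommandType = CommandType.StoredProcedure;
            cmd.Parameters.Add("@CustomerId", SqlDbType.Int).Value = customerId;
            cmd.Parameters.Add("@Rows", SqlDbType.Int).Value = rows;
            conn.Open();
            using (SqlDataReader dr = cmd.ExecuteReader())
            {
                while (dr.Read())
                {
                    // Assign column values to locals; a short delay here
                    // simulates the cost of doing something with the data.
                    int id = dr.GetInt32(0);
                    string name = dr.GetString(1);
                }
            }
        }
    }
}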

The test scripts were written in Microsoft Visual Basic® Scripting Edition (VBScript). We randomized the requests to get a different Customer or Order, depending on the method being executed from within the test script. For example:

Dim URL
Dim UB, LB
' Set the upper bound for the Orders list
UB = 1000000
' Set the lower bound for the Orders list
LB = 1
' Set the URL
URL = "http://myServer/DataAccessPerf/DataReader.aspx"
' Use the Randomize statement to seed the Rnd function
Randomize
Test.SendRequest(URL & "?OrderId=" & Int((UB - LB + 1) * Rnd + LB))

Machine Configuration

The following tables provide a brief summary of the test bed configuration used to perform the tests.

Table 1. Client Machine Configuration

# of Clients: 1
Machine/CPU: Dell Precision WorkStation 530 MT, 1694 MHz
# of CPUs: 1
Memory: 512 MB
Disk: 16.9 GB
Software:
  • Microsoft Windows® XP
  • Application Center Test

Table 2. Web Server Configuration

# of Servers: 1
Machine/CPU: Compaq Proliant, 400 MHz
# of CPUs: 4
Memory: 640 MB
Disk: 50 GB
Software:
  • Windows 2000 Advanced Server SP 2
  • .NET Framework Beta 2

Table 3. Database Server Configuration

# of Servers: 1
Machine/CPU: American Megatrends Atlantis, 800 MHz
# of CPUs: 2
Memory: 1 GB
Disk: 28 GB
Software:
  • Windows 2000 Advanced Server SP 2
  • SQL Server 2000 Enterprise Edition SP 2

Performance Test Results

Throughput and latency are the key performance indicators. For a given amount of data being returned, throughput is the number of client requests processed within a certain unit of time, typically a second. Because peak throughput may occur at a response time that is unacceptable from a usability standpoint, we tracked latency, measured as the response time reported by Application Center Test for each test run, and capped the testing of a given method once the response time exceeded 1 second.

GetOrderStatus

Here we compare the behavior of various data access techniques when getting a single value from the database.


Figure 1. GetOrderStatus: throughput and latency

Notes

  • All approaches use stored procedures.
  • In the ExecuteScalar approach, the single value is returned using the ExecuteScalar method of the command object.
  • In the Output Parameter approach, the single value is returned as an output parameter of the command object.
  • In the DataReader approach, the DataReader is used to retrieve the single value.
  • The XmlReader approach specifies a SQL query with a FOR XML clause to retrieve the single value as XML in the XmlReader.

As shown in Figure 1, performance of the ExecuteScalar, Output Parameter, and DataReader approaches is very similar across the entire range of user load when a single value is being retrieved.

The ExecuteScalar approach requires less code than the other approaches and may therefore be the best option from a code maintainability perspective.
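As a rough sketch of the first two approaches (the stored procedure and parameter names are assumptions, and the enclosing class is omitted for brevity):

using System.Data;
using System.Data.SqlClient;

static int GetOrderStatusScalar(string connStr, int orderId)
{
    // ExecuteScalar: the stored procedure SELECTs the single status value.
    using (SqlConnection conn = new SqlConnection(connStr))
    using (SqlCommand cmd = new SqlCommand("GetOrderStatus", conn))
    {
        cmd.CommandType = CommandType.StoredProcedure;
        cmd.Parameters.Add("@OrderId", SqlDbType.Int).Value = orderId;
        conn.Open();
        return (int)cmd.ExecuteScalar();
    }
}

static int GetOrderStatusOutParam(string connStr, int orderId)
{
    // Output parameter: the stored procedure writes the status into @Status.
    using (SqlConnection conn = new SqlConnection(connStr))
    using (SqlCommand cmd = new SqlCommand("GetOrderStatusOut", conn))
    {
        cmd.CommandType = CommandType.StoredProcedure;
        cmd.Parameters.Add("@OrderId", SqlDbType.Int).Value = orderId;
        SqlParameter status = cmd.Parameters.Add("@Status", SqlDbType.Int);
        status.Direction = ParameterDirection.Output;
        conn.Open();
        cmd.ExecuteNonQuery();
        return (int)status.Value;
    }
}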

The XmlReader approach produced lower peak throughput than the other approaches, and the stored procedure containing a FOR XML query took longer to execute than the query used by the other approaches.
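For comparison, a sketch of the XmlReader form, assuming a stored procedure whose SELECT ends in a FOR XML clause (the procedure name, attribute name, and XML shape are assumptions):

using System.Data;
using System.Data.SqlClient;
using System.Xml;

static int GetOrderStatusXml(string connStr, int orderId)
{
    // ExecuteXmlReader: the stored procedure uses SELECT ... FOR XML,
    // so the single value comes back as an attribute of an XML element.
    using (SqlConnection conn = new SqlConnection(connStr))
    using (SqlCommand cmd = new SqlCommand("GetOrderStatusXml", conn))
    {
        cmd.CommandType = CommandType.StoredProcedure;
        cmd.Parameters.Add("@OrderId", SqlDbType.Int).Value = orderId;
        conn.Open();
        XmlReader xr = cmd.ExecuteXmlReader();
        xr.Read();                                    // move to the row element
        return int.Parse(xr.GetAttribute("Status"));  // e.g. <Orders Status="1"/>
    }
}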

GetCustomer

Here we compare the behavior of the data access techniques when getting a single row from the database.


Figure 2. GetCustomer: throughput and latency

Notes

  • All approaches use stored procedures.
  • In the Output Parameters approach the single row is returned as output parameters of the command object.
  • In the DataReader approach the DataReader is used to retrieve the single row.
  • The XmlReader approach uses a SQL query with a FOR XML clause to retrieve the single row as XML in the XmlReader.
  • The DataSet approach populates a DataSet with the single row.

As shown in Figure 2, the Output Parameters and DataReader approaches perform very similarly across the entire range of user load and produce better peak throughput than the other two approaches. The XmlReader approach produced slightly better peak throughput and smaller response time than the DataSet.

In the XmlReader approach the SQL query using FOR XML took longer to execute than similar queries for the other approaches.

Overhead associated with creation of the DataSet object is responsible for lower throughput in this scenario.
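A sketch of the DataSet form of this scenario (the procedure and table names are assumptions); the objects created inside Fill are where that overhead comes from:

using System.Data;
using System.Data.SqlClient;

static DataSet GetCustomer(string connStr, int customerId)
{
    // DataSet: the DataAdapter opens the connection, creates a DataTable,
    // populates it with the single row, and closes the connection again.
    using (SqlConnection conn = new SqlConnection(connStr))
    using (SqlDataAdapter da = new SqlDataAdapter("GetCustomer", conn))
    {
        da.SelectCommand.CommandType = CommandType.StoredProcedure;
        da.SelectCommand.Parameters.Add("@CustomerId", SqlDbType.Int).Value = customerId;
        DataSet ds = new DataSet();
        da.Fill(ds, "Customers");
        return ds;
    }
}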

GetCustomers

In this section we compare the performance when fetching multiple rows. We performed tests with a result set of 100 rows, 500 rows, and 1000 rows to see how the amount of data returned impacts performance.


Figure 3. GetCustomers (Customers=100): throughput and latency

Notes

  • All approaches use stored procedures.
  • In the DataReader approach the DataReader is used to retrieve the rows.
  • The XmlReader approach specifies a SQL query with a FOR XML clause to retrieve the rows as XML in the XmlReader.
  • The DataSet approach populates a DataSet with the rows.

As you would expect, getting more rows from the database will lower the requests per second, as now there are more rows that need to be processed and sent.

Figure 3 shows that the DataReader approach nearly doubles throughput over the other two approaches. The DataSet and XmlReader approaches offer similar performance, although the XmlReader maintains a slight lead in throughput over the DataSet.


Figure 4. GetCustomers (Customers=500): throughput and latency

The requests per second achieved by all the approaches dropped further when we increased the number of rows returned by the database to 500.

At 500 customers the DataReader widens its lead in throughput over the other approaches with more than double the throughput. Although still similar in throughput, the DataSet approach has slightly overtaken the XmlReader approach.


Figure 5. GetCustomers (Customers=1000): throughput and latency

With 1000 rows returned the DataReader now produces three times the throughput of the other approaches.

Testing for the DataSet and XmlReader approaches was capped at 50 users because their response times exceeded the threshold of 1 second, as can be seen in the Response Time vs. User Load graph.

GetOrders

In this section we compare performance when fetching a hierarchical result set. We performed tests with result sets of 10 orders (and 50 details), 50 orders (and 250 details), 100 orders (and 500 details), and 200 orders (and 1000 details).


Figure 6. GetOrders (Orders=10, Details=50): throughput and latency

Notes

  • All approaches use stored procedures.
  • In the DataReader approach the DataReader is populated by a stored procedure that returns two result sets—one containing the orders and one containing the details. Because the DataReader is forward-only, order details were not associated with their orders as they were read from the DataReader.
  • The XmlReader approach specifies a SQL query with a FOR XML EXPLICIT clause to retrieve orders and their details as hierarchical XML. Orders and details were associated as they were read from the XmlReader.
  • The DataSet approach uses the same stored procedure as the DataReader approach to populate the DataSet. A parent/child relation is established between the orders and details DataTables and the orders and details were associated as they were read from the DataSet.

We started with a result set of 10 orders and 50 details. As shown in Figure 6, the DataReader reached its peak throughput at a load of 50 concurrent users with a slightly better throughput than the XmlReader, which levels off at 20 concurrent users.

The DataSet produced lower peak throughput with greater latency. In this case, it incurs an overhead of creating two DataTables to store order and detail rows and other sub-objects associated with the DataTables.
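A sketch of how the DataSet approach might wire the two result sets together (the table, column, and relation names are assumptions):

using System.Data;
using System.Data.SqlClient;

static DataSet GetOrders(string connStr, int orderId, int rows)
{
    using (SqlConnection conn = new SqlConnection(connStr))
    using (SqlDataAdapter da = new SqlDataAdapter("GetOrders", conn))
    {
        da.SelectCommand.CommandType = CommandType.StoredProcedure;
        da.SelectCommand.Parameters.Add("@OrderId", SqlDbType.Int).Value = orderId;
        da.SelectCommand.Parameters.Add("@Rows", SqlDbType.Int).Value = rows;

        // The stored procedure returns two result sets; map them to two tables.
        da.TableMappings.Add("Table", "Orders");
        da.TableMappings.Add("Table1", "OrderDetails");

        DataSet ds = new DataSet();
        da.Fill(ds);

        // Parent/child relation used to associate details with their order.
        ds.Relations.Add("OrderDetails",
            ds.Tables["Orders"].Columns["OrderId"],
            ds.Tables["OrderDetails"].Columns["OrderId"]);
        return ds;
    }
}

The details of a given order row can then be read via orderRow.GetChildRows("OrderDetails").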


Figure 7. GetOrders (Orders=50, Details=250): throughput and latency

As clearly shown in the graphs in Figure 7, the DataReader offers the best throughput, although, as just noted, the orders and details are not associated.

Although the XmlReader and DataSet approaches have similar throughput, the FOR XML EXPLICIT query used in the XmlReader approach causes significantly greater CPU utilization on the database server than the DataSet approach, which causes greater CPU utilization on the Web server.


Figure 8. GetOrders (Orders=100, Details=500): throughput and latency

As shown in Figure 8, the DataReader approach widens its throughput difference to double that of the other two approaches. Remember, however, that the DataReader approach leaves more orders and details to be associated elsewhere in an application.

DataReader vs. DataSet

In all of the preceding tests, we saw that DataReader outperformed DataSet. As mentioned earlier, the DataReader offers better performance because it avoids the performance and memory overhead associated with the creation of the DataSet.

The DataReader is a better choice for applications that require optimized read-only and forward-only data access. The sooner you load the data off the DataReader, close the DataReader, and close the database connection, the better performance you get. Because the DataReader holds a database connection that cannot be used for any other purpose while the application reads data, it could limit scalability if the application holds the DataReader long enough for contention to occur. The DataSet only needs to hold a connection while it is being populated. Once it is filled, the connection may be closed and returned to the pool. Delaying reading the data from the DataSet will not cause contention to occur, as the connection might have been returned to the pool already.
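The difference in connection lifetime is easy to see in code. A sketch, with the query text and the per-row delay as placeholders:

using System.Data;
using System.Data.SqlClient;
using System.Threading;

static void ReadWithDataReader(string connStr)
{
    using (SqlConnection conn = new SqlConnection(connStr))
    using (SqlCommand cmd = new SqlCommand("SELECT CustomerId, Name FROM Customers", conn))
    {
        conn.Open();
        using (SqlDataReader dr = cmd.ExecuteReader())
        {
            while (dr.Read())
            {
                string name = dr.GetString(1);
                Thread.Sleep(20); // simulated per-row work: the connection is
                                  // held for the whole loop, inviting contention
            }
        }
    }                             // connection returns to the pool only here
}

static void ReadWithDataSet(string connStr)
{
    DataSet ds = new DataSet();
    using (SqlConnection conn = new SqlConnection(connStr))
    using (SqlDataAdapter da = new SqlDataAdapter("SELECT CustomerId, Name FROM Customers", conn))
    {
        da.Fill(ds, "Customers"); // connection is used only inside Fill
    }                             // and is back in the pool before we read
    foreach (DataRow row in ds.Tables["Customers"].Rows)
    {
        string name = (string)row["Name"];
        Thread.Sleep(20);         // same simulated work, but no connection held
    }
}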

In all of the preceding tests, the maximum connection pool size was 100 (the default), and because the number of simultaneous users never exceeded 100, there was no connection contention as such. We wanted to see how the DataReader and the DataSet would behave if we introduced a small delay while reading the data, which in turn would introduce connection contention.

In the following test, we introduced a delay of 20 milliseconds (ms) while reading the data off these objects, and also set the maximum pool size to 10. This caused contention to occur.


Figure 9. GetCustomers (with contention): throughput and latency

Notes

  • In the DataReader approach, the DataReader is used to retrieve hierarchical data containing orders and their details.
  • The DataSet approach populates a DataSet with orders and their details.

When the number of concurrent users reached 20, contention for database connections started to occur and, hence, the performance of the DataReader started to degrade, as shown in Figure 9. Note that the DataSet outperformed the DataReader beyond 20 concurrent users, and that the response time of the DataReader exceeded that of the DataSet beyond that point.

In the next test we introduced a delay of 50 ms while reading the data off these objects, and set the maximum pool size to 12.


Figure 10. GetCustomers (with contention): throughput and latency

Notes

  • In the DataReader approach, the DataReader is used to retrieve hierarchical data containing orders and their details.
  • The DataSet approach populates a DataSet with orders and their details.

As shown in Figure 10, the performance of the DataReader started to degrade beyond 20 concurrent users, when contention for database connections actually started.

Note that the response time of the DataReader exceeded that of the DataSet beyond 20 concurrent users.

InsertCustomer

Having seen how the data access techniques compare when reading data, we now compare them when writing data. Here we inserted a single customer into the database.


Figure 11. InsertCustomer: throughput and latency

Notes

  • The Input Parameters and InsertCommand approaches use stored procedures.
  • In the Input Parameters approach the input parameters of the command object are populated with new customer's information and the ExecuteNonQuery method is executed to execute the stored procedure.
  • In the AutoGen approach, the Insert command of the DataAdapter is automatically generated based on its Select command when the Update method is called.
  • In the InsertCommand approach, the InsertCommand property of the DataAdapter is specified.

In the AutoGen approach, when the Update method is called on the DataAdapter, the CommandBuilder object associated with the DataAdapter automatically generates the Insert command based on the Select command you specified for the DataAdapter. This explains why it's slower than the other two approaches. The benefit of this approach is that it simplifies and reduces code, as you don't need to specify your own insert, update, and delete commands. Compare the performance of this approach with the InsertCommand approach, where we explicitly specified the InsertCommand of the DataAdapter.

In the Input Parameters approach, the input parameters of the command object are populated with the new customer's information and the ExecuteNonQuery method is called to execute the stored procedure. This approach outperforms the other two approaches, as is clearly shown in Figure 11. The InsertCommand approach incurs the performance and memory overhead of creating the DataSet, while the Input Parameters approach avoids this overhead and achieves better performance.
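A sketch of the Input Parameters approach next to an AutoGen variant (the procedure, table, and column names are assumptions):

using System.Data;
using System.Data.SqlClient;

static int InsertCustomer(string connStr, string name, string email)
{
    // Input Parameters: no DataSet involved; in this sketch the stored
    // procedure returns the new CustomerId in an output parameter.
    using (SqlConnection conn = new SqlConnection(connStr))
    using (SqlCommand cmd = new SqlCommand("InsertCustomer", conn))
    {
        cmd.CommandType = CommandType.StoredProcedure;
        cmd.Parameters.Add("@Name", SqlDbType.NVarChar, 50).Value = name;
        cmd.Parameters.Add("@Email", SqlDbType.NVarChar, 50).Value = email;
        SqlParameter id = cmd.Parameters.Add("@CustomerId", SqlDbType.Int);
        id.Direction = ParameterDirection.Output;
        conn.Open();
        cmd.ExecuteNonQuery();
        return (int)id.Value;
    }
}

static void InsertCustomerAutoGen(string connStr, DataSet ds)
{
    // AutoGen: the CommandBuilder derives the Insert command from the
    // Select command when Update is called, at a run-time cost.
    // Assumes ds contains newly added rows in a "Customers" table.
    using (SqlConnection conn = new SqlConnection(connStr))
    using (SqlDataAdapter da = new SqlDataAdapter("SELECT CustomerId, Name, Email FROM Customers", conn))
    {
        SqlCommandBuilder cb = new SqlCommandBuilder(da);
        da.Update(ds, "Customers");
    }
}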

InsertCustomers

We conducted this set of tests to see how inserting multiple rows of customers affects performance of the ADO.NET objects. We varied the number of customer rows to increase the amount of data being inserted.


Figure 12. InsertCustomers(n=5): throughput and latency

Notes

  • The InsertCommand approach uses stored procedures.
  • In the AutoGen approach the Insert command of the DataAdapter is automatically generated based on its Select command when the Update method is called.
  • In the InsertCommand approach, the InsertCommand property of the DataAdapter is specified.

As expected, the InsertCommand approach—wherein we specified the InsertCommand property of the DataAdapter—is much faster than the AutoGen approach. It offers smaller response times and better throughput across the entire range of user load.


Figure 13. InsertCustomers(n=10): throughput and latency

As shown in Figure 13, the InsertCommand approach still produces greater throughput than the AutoGen approach when more data is inserted, but the difference between the approaches is reduced as the cost of the auto-generated command is amortized across more data.

InsertOrder

This last set of tests compares performance when inserting a hierarchy of an order and its details. We varied the number of detail rows to increase the amount of data being inserted.

Because our Orders table has an auto increment column (OrderId) as its primary key, it was not possible to insert hierarchical rows in the Orders (parent) and OrderDetails (child) tables using an automatically generated Insert command for the DataAdapter associated with the Orders table. This is primarily due to the problem with auto-generated inserts, wherein the primary key Id for an Identity column is not returned to the DataSet. Therefore, the Autogen approach is actually a hybrid, where the InsertCommand for the Orders table is a stored procedure (which returns OrderId) and the Insert command for the DataAdapter associated with the OrderDetails table is auto-generated.

In the OpenXml approach, the InsertCommand property of the DataAdapter associated with the Orders table is set to a stored procedure that accepts an XML representation of an order and its details. After the rows are inserted into the Orders and OrderDetails tables of the DataSet, the XML representation of the data in the DataSet is extracted and passed to a stored procedure, which uses the sp_xml_preparedocument system stored procedure and the OPENXML rowset function to make the appropriate insertions into the Orders and OrderDetails tables in the database.
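A sketch of the client side of the OpenXml approach (the procedure and parameter names are assumptions):

using System.Data;
using System.Data.SqlClient;

static void InsertOrderOpenXml(string connStr, DataSet ds)
{
    // Serialize the order and its details to XML and make one round trip;
    // the stored procedure parses it with sp_xml_preparedocument/OPENXML.
    using (SqlConnection conn = new SqlConnection(connStr))
    using (SqlCommand cmd = new SqlCommand("InsertOrderXml", conn))
    {
        cmd.CommandType = CommandType.StoredProcedure;
        cmd.Parameters.Add("@OrderXml", SqlDbType.NVarChar, 4000).Value = ds.GetXml();
        conn.Open();
        cmd.ExecuteNonQuery();
    }
}

For larger hierarchies the XML is passed as NText rather than NVarChar, as in the 100-detail test below.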


Figure 14. InsertOrder(Order=1, Details=2): throughput and latency

Notes

  • The InsertCommand and OpenXml approaches use stored procedures.
  • The AutoGen approach is a hybrid using a stored procedure for the Order insert and an automatically generated Insert command for OrderDetails based on its Select command.
  • In the InsertCommand approach we specified InsertCommand properties of both of the DataAdapters.
  • In the OpenXml approach, XML representing the Order and OrderDetails is passed as NVarChar to a stored procedure, where it is parsed and the appropriate insertions are made.

As shown in Figure 14, the InsertCommand approach outperforms the hybrid AutoGen approach, which outperforms the OpenXml approach.

The overhead of automatically generating the Insert command at execution time, and the efficiency of stored procedures over dynamic SQL, explain why the AutoGen approach is slower than the InsertCommand approach.

Even though the OpenXml approach requires fewer round trips to the database than do the other approaches, this is not the most important factor affecting performance in this test, when only three insertions are performed.


Figure 15. InsertOrder(Order=1, Details=10): throughput and latency

Next we tested inserting 1 order and its 10 details. As Figure 15 shows, the relative performance of the OpenXml approach gains substantially over the other approaches as the cost of round trips becomes a significant factor. Here the single round trip used by the OpenXml approach replaces the 11 round trips required by the other approaches.


Figure 16. InsertOrder(Order=1, Details=100): throughput and latency

Note

  • In the OpenXml approach, XML representing the Order and OrderDetails is passed as NText to a stored procedure where it is parsed and the appropriate insertions are made.

And, finally, we tested inserting 1 order and 100 details. Here the single round trip used by the OpenXml approach replaces 101 round trips required by the other approaches. As Figure 16 shows, the throughput of the OpenXml approach is now far better than the other approaches. Also note that the AutoGen and InsertCommand throughput is almost the same in this test due to amortization of the auto-generation costs.

Conclusion

Performance and scalability are among the many things to consider when choosing data access techniques. As these comparisons have shown, throughput can often be multiplied by choosing one data access technique over another, yet no single approach performs better in all scenarios. Because overall performance is affected by so many factors, there is no substitute for performance testing using realistic scenarios.
