Comparing XML Performance
.NET 2.0 Beta2, .NET 1.1, and Sun Java 1.5 Platforms
Introduction
XML Mark is an XML parsing benchmark originally created by Sun Microsystems to demonstrate the relative performance of Java XML parsing vs. .NET (C#) XML parsing. The original Sun XML Mark 1.0 benchmark kit can be downloaded from Sun’s website at: http://java.sun.com/developer/codesamples/webservices.html#Performance
Sun’s original accompanying whitepaper can be downloaded at: http://java.sun.com/performance/reference/whitepapers/XML_Test-1_0.pdf
With the release of .NET 2.0 Beta2 on MSDN, we used Sun’s XML Mark benchmark to compare the XML parsing performance of:
- Sun J2SE (JVM 1.5) with Sun’s latest Java Web Services Developer Pack (JWSDP) version 1.5
- .NET 1.1
- .NET 2.0 Beta 2 release (publicly available on MSDN)
Sample code can be downloaded from:
http://www.microsoft.com/downloads/details.aspx?FamilyId=0267A5DD-D188-4295-A1B4-D94CD3541865&displaylang=en
XML Mark 1.1 vs. XML Mark 1.0 Benchmark Kit
Sun concluded in their original tests (using JVM 1.4/JWSDP 1.3 vs. .NET 1.1) that the Java platform significantly outperformed .NET in both the pull/streaming tests (SAX tests) and the XML build tests (DOM tests) included in the XML Mark kit. While Sun’s JVM 1.5 does outperform .NET 1.1 in the benchmark tests, the difference is exaggerated by two significant issues in the original kit, both of which are fixed in our XML Mark 1.1 implementation:
- The Java code included in Sun’s kit has a bug: no serialization (save to disk) occurs in the Java tests even when the run properties are set to serialize the XML test document, whereas the .NET implementation properly saves to disk when the serialization parameter is set.
- We also found that Sun’s C# harness uses the GetElementsByTagName method, which results in a mismatch between the C# and Java versions. Although both harnesses use XmlElement.GetElementsByTagName(), the C# implementation of this method maintains a live list of nodes, and that list is negatively impacted by the DOM test scenarios that include edits. A better method for element selection in the C# harness is XmlElement.SelectNodes(), which matches the Java harness in functionality.
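The difference between a live node list and a fixed selection can be illustrated with a small Java sketch. This is not code from either benchmark kit: the class name, the miniature document, and the copy-into-a-list snapshot are all assumptions made for illustration. It shows only the semantics. A live list, as defined by the W3C DOM, has to reflect every later edit to the tree, while a snapshot is fixed at selection time.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class LiveListSketch {
    // Hypothetical miniature invoice; not taken from the benchmark kit.
    static final String XML =
        "<Invoice><LineItem/><LineItem/><LineItem/></Invoice>";

    /** Returns {live length, snapshot length} after deleting one LineItem. */
    static int[] lengthsAfterDelete() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(XML.getBytes(StandardCharsets.UTF_8)));
            Element root = doc.getDocumentElement();

            // Live list: it must stay consistent with later tree edits.
            NodeList live = root.getElementsByTagName("LineItem");

            // Snapshot: a plain copy of the nodes taken at selection time.
            List<Node> snapshot = new ArrayList<>();
            for (int i = 0; i < live.getLength(); i++) {
                snapshot.add(live.item(i));
            }

            // Structural edit, as in the DOM tests that delete line items.
            root.removeChild(snapshot.get(0));

            return new int[] { live.getLength(), snapshot.size() };
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        int[] n = lengthsAfterDelete();
        System.out.println("live=" + n[0] + " snapshot=" + n[1]);
    }
}
```

After the deletion the live list reports two items and the snapshot still holds three. Keeping a live list consistent across edits is extra work that a one-time selection does not need.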
XML Mark 1.1 fixes both of these issues in the benchmark kit, which otherwise represents a good test harness for XML parse performance in both Java and C#. In this paper, we publish the results of testing .NET 1.1, .NET 2.0 Beta2 and Sun’s J2SE 1.5 platform with this new XML Mark 1.1 benchmark kit. The results show significant performance improvements in XML parsing between .NET 1.1 and .NET 2.0.
As with other .NET benchmarks, we are providing full source code, build commands and install/run instructions for both the Java and C# implementations, and it’s an easy test for any customer to quickly download, install and run on the hardware of their choosing.
XML Mark Benchmark Description
The following description of XML Mark is taken directly from Sun’s original whitepaper:
“XML Test is an XML processing test developed at Sun Microsystems. It is designed to mimic the processing that takes place in the lifecycle of an XML document. Typically that involves the following steps:
- Parse- Scan through the XML document processing elements and attributes and possibly build an in-memory tree (for DOM parsers). Parsing is a pre-requisite for any processing of an XML document.
- Access- Extract the data from the elements and attributes of parts of the document into the application program. For example, given an XML document for an invoice, the application might want to retrieve the prices for each item in the invoice.
- Modify- Change the textual content of elements or attributes and possibly also the structure of the document by inserting or deleting elements. This does not apply to streaming parsers. As an example, an application might wish to update the prices of some of the items in an invoice or may want to insert or delete some items.
- Serialize- Convert the in-memory tree representation to a textual form that is written to a disk file or forwarded to a network stream. This makes sense only for tree building parsers and is necessary in the cases where the XML document has been modified in memory.
XML Test simulates a multi-threaded server program that processes multiple XML documents in parallel. This is very similar to an application server that deploys web services and concurrently processes a number of XML documents that arrive in client requests. Since we wanted to concentrate on XML processing performance, rather than use some sort of web container, we designed a standalone multi-threaded program implemented in both Java and C# that processed XML document files. To avoid the effect of file I/O, the documents are read from and written to memory streams. XML Test measures the throughput of a system processing XML documents. The notion of an XML transaction here corresponds to a complete lifecycle of an XML document. For tree building parsers this requires the four steps of parse, access, modify and serialize while for streaming parsers it just involves parse and access. XML Test reports one metric: Throughput - Average number of XML transactions executed per second.
The XML documents used with XML Test are based on a business invoice document. Each invoice has a fixed length header and summary and a variable number of lineitems. The schematic of the invoice schema is given below in XML schema form.
<complexType name="InvoiceType">
  <complexContent>
    <sequence>
      <element name="Header" type="InvoiceHeaderType"/>
      <element name="LineItem" type="InvoiceLineItemType" minOccurs="1" maxOccurs="unbounded"/>
      <element name="Summary" type="InvoiceSummaryType"/>
    </sequence>
  </complexContent>
</complexType>
Though this schema can be used to generate an invoice document of almost any size, we made use of two particular document sizes in our experiments. Streaming parsers are typically used to process large XML documents, and so to compare SAX and the pull parser, we used an invoice with 1000 lineitems (about 900 KB). Tree-building DOM parsers have to operate under memory constraints since they construct a representation of the whole document in memory. Therefore for the DOM parsers we used a smaller invoice document containing 100 lineitems (about 90 KB).
Based on the above schema the lifecycle of an XML document as applicable to XMLTest can be defined as follows.
- Parse- Build a complete document tree in memory (in the case of DOM). In the case of streaming parsers, scan through the XML document.
- Access- Retrieve the Currency attribute and the PriceAmount for the number of LineItems specified.
- Content Modification- Increase the PriceAmount for all lineitems by 10%. Not implemented for SAX / pull parser.
- Structure Modification- Delete a specified number of lineitems and insert the same number at the end of the set of LineItems. The new lineitems must have LineIDs in the properly increasing sequence. Not implemented for SAX / pull parser.
- Serialize- Convert the entire in-memory tree to a serialized XML form written to a stream. Not implemented for SAX / pull parser.
Test Details
XML Test has been used to characterize these XML parsers:
Parser type      Java    .NET
Streaming        SAX     Pull
Tree Building    DOM     DOM
XML Test can be configured using the following parameters:
- Number of threads- This is tuned to maximize CPU utilization and system throughput.
- StreamUsage- Whether stream parsers are being tested.
- DOMUsage- Whether DOM parsers are being tested.
- Selection- The percentage of lineitems retrieved in the access phase.
- ContentModification- The percentage of lineitems whose textual content is modified.
- Structure Modification- The percentage of lineitems inserted or deleted from the DOM tree.
- RampUp- Time allotted for warmup of the system.
- SteadyState- The interval when transaction throughput is measured.
- RampDown- Time allotted for rampdown, completing transactions in flight.
- XmlFiles- The actual XML documents used by XML Test.
XML Test reads these properties at initialization into an in-memory structure that is then accessed by each thread to initiate a transaction as per the defined mix. To keep things as simple as possible, XML Test is a single-tier system where the test driver that instantiates an XML transaction is part of each worker thread. A new transaction is started as soon as a prior transaction is completed (there is no think time). The number of transactions executed and the response time is accumulated during the SteadyState period and is reported at the end of the run.”
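The measurement model described above (worker threads that run transactions back to back, with only the steady-state interval counted) can be sketched in Java roughly as follows. This is a minimal illustration under stated assumptions, not the kit's driver: the class and method names are hypothetical, durations are in milliseconds so the sketch runs quickly, and response-time accumulation is omitted.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

class ThroughputSketch {
    /**
     * Runs the transaction back to back on `agents` threads and returns how
     * many transactions completed inside the steady-state window.
     */
    static long countSteadyState(int agents, long rampUpMs, long steadyMs,
                                 long rampDownMs, Runnable transaction) {
        // Window boundaries as offsets (in nanoseconds) from the start time.
        final long steadyFrom = TimeUnit.MILLISECONDS.toNanos(rampUpMs);
        final long steadyTo = steadyFrom + TimeUnit.MILLISECONDS.toNanos(steadyMs);
        final long stopAt = steadyTo + TimeUnit.MILLISECONDS.toNanos(rampDownMs);
        final long start = System.nanoTime();
        final AtomicLong counted = new AtomicLong();

        ExecutorService pool = Executors.newFixedThreadPool(agents);
        for (int i = 0; i < agents; i++) {
            pool.execute(() -> {
                // Simplification: workers keep starting transactions until
                // the end of ramp-down; only steady-state ones are counted.
                while (System.nanoTime() - start < stopAt) {
                    transaction.run(); // no think time between transactions
                    long done = System.nanoTime() - start;
                    if (done >= steadyFrom && done < steadyTo) {
                        counted.incrementAndGet();
                    }
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.HOURS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return counted.get();
    }

    /** The single reported metric: transactions per second of steady state. */
    static double throughput(long transactions, long steadyMs) {
        return transactions * 1000.0 / steadyMs;
    }

    public static void main(String[] args) {
        // Stand-in workload; a real run would parse an XML document here.
        Runnable dummy = () -> Math.sqrt(System.nanoTime());
        long n = countSteadyState(4, 200, 1000, 100, dummy);
        System.out.println("throughput (tx/s): " + throughput(n, 1000));
    }
}
```

Counting only transactions that finish inside the steady-state window keeps warm-up effects such as class loading and JIT compilation out of the reported figure.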
The XML Mark 1.1 Benchmark Platform
In the test results published below, we used a Hewlett Packard HP DL585 AMD Opteron machine running 32-bit Windows Server 2003. The machine is configured as follows:
- 2 x 1.8 GHz AMD Opteron Processors
- 3 GB of RAM
- 36 GB SCSI Drive array (10 ms access time)
- Windows Server 2003 Standard Edition
- .NET v1.1.4322 (included with Windows Server 2003)
- .NET v2.0.50215 (public Beta 2 which can be downloaded from MSDN for free)
- Sun J2SE 1.5 Update 2 (JVM version 5.0.20.9)/Sun JWSDP 1.5
Test Results
DOM Build Test Results with No XML Serialization
In these tests, Sun’s sample .INI files for DOM1, DOM2 and DOM3 were run unmodified as follows:
DOM1.INI:
[CONFIG]
Agents = 4
PullParserUsage = 0
SAXUsage = 0
DOMUsage = 100
Selection = 25
Validation = 0
ContentModification = 0
StructureModification = 0
Serialization = 0
#Rampup time for the run in seconds
RampUp = 180
#SteadyState time for the run in seconds
SteadyState = 300
#Rampdown time for the run in seconds
RampDown = 30
#Total number of XML documents for processing
XMLDocCount = 1
#Actual xml file names with complete path
XML_FILE_1 = ..\\..\\docs\\inv100.xml
SchemaFile = ..\\..\\docs\\sm-inv.xsd
OutputDir = ..\\..\\results
OutputFilePrefix = XmlMarkSummary
DOM2.INI:
[CONFIG]
Agents = 4
PullParserUsage = 0
SAXUsage = 0
DOMUsage = 100
Selection = 50
Validation = 0
ContentModification = 0
StructureModification = 0
Serialization = 0
#Rampup time for the run in seconds
RampUp = 180
#SteadyState time for the run in seconds
SteadyState = 300
#Rampdown time for the run in seconds
RampDown = 30
#Total number of XML documents for processing
XMLDocCount = 1
#Actual xml file names with complete path
XML_FILE_1 = ..\\..\\docs\\inv100.xml
SchemaFile = ..\\..\\docs\\sm-inv.xsd
OutputDir = ..\\..\\results
OutputFilePrefix = XmlMarkSummary
DOM3.INI:
[CONFIG]
Agents = 4
PullParserUsage = 0
SAXUsage = 0
DOMUsage = 100
Selection = 100
Validation = 0
ContentModification = 0
StructureModification = 0
Serialization = 0
#Rampup time for the run in seconds
RampUp = 180
#SteadyState time for the run in seconds
SteadyState = 300
#Rampdown time for the run in seconds
RampDown = 30
#Total number of XML documents for processing
XMLDocCount = 1
#Actual xml file names with complete path
XML_FILE_1 = ..\\..\\docs\\inv100.xml
SchemaFile = ..\\..\\docs\\sm-inv.xsd
OutputDir = ..\\..\\results
OutputFilePrefix = XmlMarkSummary
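One plausible way for a Java harness to read files in this format is java.util.Properties, whose escaping rules are consistent with the doubled backslashes in the path entries. Whether the kit actually loads its configuration this way is an assumption; the class and helper names below are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

class IniConfigSketch {
    /** Loads "key = value" lines; lines starting with '#' are comments. */
    static Properties load(String iniText) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(iniText));
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return p;
    }

    /** Reads an integer setting such as Agents, Selection or SteadyState. */
    static int intSetting(Properties p, String key) {
        return Integer.parseInt(p.getProperty(key).trim());
    }

    public static void main(String[] args) {
        // Excerpt of DOM1.INI. Each "\\\\" in Java source is "\\" in the file.
        String ini = String.join("\n",
            "[CONFIG]",
            "Agents = 4",
            "DOMUsage = 100",
            "Selection = 25",
            "#SteadyState time for the run in seconds",
            "SteadyState = 300",
            "XML_FILE_1 = ..\\\\..\\\\docs\\\\inv100.xml");
        Properties p = load(ini);
        System.out.println("Agents      = " + intSetting(p, "Agents"));
        System.out.println("SteadyState = " + intSetting(p, "SteadyState"));
        // The Properties format unescapes "\\" to a single backslash.
        System.out.println("XML_FILE_1  = " + p.getProperty("XML_FILE_1"));
    }
}
```

With this loader the [CONFIG] line is read as a key with an empty value and can be ignored, so the same file could also serve a harness that expects a section header.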
DOM Build Test Results with XML Serialization
In these tests, Sun’s sample .INI files for DOM4, DOM5 and DOM6 were run unmodified as follows:
DOM4.INI:
[CONFIG]
Agents = 4
PullParserUsage = 0
SAXUsage = 0
DOMUsage = 100
Selection = 50
Validation = 0
ContentModification = 25
StructureModification = 0
Serialization = 100
#Rampup time for the run in seconds
RampUp = 180
#SteadyState time for the run in seconds
SteadyState = 300
#Rampdown time for the run in seconds
RampDown = 30
#Total number of XML documents for processing
XMLDocCount = 1
#Actual xml file names with complete path
XML_FILE_1 = ..\\..\\docs\\inv100.xml
SchemaFile = ..\\..\\docs\\sm-inv.xsd
OutputDir = ..\\..\\results
OutputFilePrefix = XmlMarkSummary
DOM5.INI:
[CONFIG]
Agents = 4
PullParserUsage = 0
SAXUsage = 0
DOMUsage = 100
Selection = 50
Validation = 0
ContentModification = 0
StructureModification = 25
Serialization = 100
#Rampup time for the run in seconds
RampUp = 180
#SteadyState time for the run in seconds
SteadyState = 300
#Rampdown time for the run in seconds
RampDown = 30
#Total number of XML documents for processing
XMLDocCount = 1
#Actual xml file names with complete path
XML_FILE_1 = ..\\..\\docs\\inv100.xml
SchemaFile = ..\\..\\docs\\sm-inv.xsd
OutputDir = ..\\..\\results
OutputFilePrefix = XmlMarkSummary
DOM6.INI:
[CONFIG]
Agents = 4
PullParserUsage = 0
SAXUsage = 0
DOMUsage = 100
Selection = 50
Validation = 0
ContentModification = 25
StructureModification = 25
Serialization = 100
#Rampup time for the run in seconds
RampUp = 180
#SteadyState time for the run in seconds
SteadyState = 300
#Rampdown time for the run in seconds
RampDown = 30
#Total number of XML documents for processing
XMLDocCount = 1
#Actual xml file names with complete path
XML_FILE_1 = ..\\..\\docs\\inv100.xml
SchemaFile = ..\\..\\docs\\sm-inv.xsd
OutputDir = ..\\..\\results
OutputFilePrefix = XmlMarkSummary
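The DOM4 to DOM6 configurations exercise the full tree-building lifecycle: parse, access, content modification, structure modification and serialization. A minimal Java sketch of one such transaction is shown below. It is not the kit's code. The miniature invoice, the placement of the Currency attribute on PriceAmount, and the class name are assumptions, and the structure modification is simplified to re-appending one removed line item with the next LineID.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

class DomLifecycleSketch {
    // Hypothetical two-item invoice; the real documents have 100 line items.
    static final String INVOICE =
        "<Invoice>"
      + "<LineItem><LineID>1</LineID><PriceAmount Currency=\"USD\">10.00</PriceAmount></LineItem>"
      + "<LineItem><LineID>2</LineID><PriceAmount Currency=\"USD\">20.00</PriceAmount></LineItem>"
      + "</Invoice>";

    /** Runs parse, access, modify and serialize entirely in memory. */
    static String runTransaction(String xml) {
        try {
            // Parse: build the complete tree from a memory stream.
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Element root = doc.getDocumentElement();

            int lastId = 0;
            for (Node n = root.getFirstChild(); n != null; n = n.getNextSibling()) {
                if (!(n instanceof Element) || !"LineItem".equals(n.getNodeName())) {
                    continue;
                }
                Element item = (Element) n;
                Element price = (Element) item.getElementsByTagName("PriceAmount").item(0);

                // Access: read the Currency attribute and the price.
                if (price.getAttribute("Currency").isEmpty()) {
                    throw new IllegalStateException("missing Currency attribute");
                }
                double amount = Double.parseDouble(price.getTextContent());

                // Content modification: raise the price by 10%.
                price.setTextContent(String.format(Locale.ROOT, "%.2f", amount * 1.10));
                lastId = Integer.parseInt(
                    item.getElementsByTagName("LineID").item(0).getTextContent());
            }

            // Structure modification: delete the first line item and insert
            // one at the end with the next LineID in sequence.
            Element first = (Element) root.getElementsByTagName("LineItem").item(0);
            Element appended = (Element) root.removeChild(first);
            appended.getElementsByTagName("LineID").item(0)
                .setTextContent(String.valueOf(lastId + 1));
            root.appendChild(appended);

            // Serialize: write the whole tree to a memory stream.
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            TransformerFactory.newInstance().newTransformer()
                .transform(new DOMSource(doc), new StreamResult(out));
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(runTransaction(INVOICE));
    }
}
```

In the benchmark, the Selection, ContentModification and StructureModification settings determine what percentage of line items each phase touches. The sketch applies each phase to a fixed set of items for brevity.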
SAX/Pull Test Results
In these tests, Sun’s sample .INI files for SAX1, SAX2 and SAX3 were run unmodified as follows:
SAX1.INI:
[CONFIG]
Agents = 4
PullParserUsage = 100
SAXUsage = 100
DOMUsage = 0
Selection = 25
Validation = 0
ContentModification = 0
StructureModification = 0
Serialization = 0
#Rampup time for the run in seconds
RampUp = 180
#SteadyState time for the run in seconds
SteadyState = 300
#Rampdown time for the run in seconds
RampDown = 30
#Total number of XML documents for processing
XMLDocCount = 1
#Actual xml file names with complete path
XML_FILE_1 = ..\\..\\docs\\inv1000.xml
SchemaFile = ..\\..\\docs\\sm-inv.xsd
OutputDir = ..\\..\\results
OutputFilePrefix = XmlMarkSummary
SAX2.INI:
[CONFIG]
Agents = 4
PullParserUsage = 100
SAXUsage = 100
DOMUsage = 0
Selection = 50
Validation = 0
ContentModification = 0
StructureModification = 0
Serialization = 0
#Rampup time for the run in seconds
RampUp = 180
#SteadyState time for the run in seconds
SteadyState = 300
#Rampdown time for the run in seconds
RampDown = 30
#Total number of XML documents for processing
XMLDocCount = 1
#Actual xml file names with complete path
XML_FILE_1 = ..\\..\\docs\\inv1000.xml
SchemaFile = ..\\..\\docs\\sm-inv.xsd
OutputDir = ..\\..\\results
OutputFilePrefix = XmlMarkSummary
SAX3.INI:
[CONFIG]
Agents = 4
PullParserUsage = 100
SAXUsage = 100
DOMUsage = 0
Selection = 100
Validation = 0
ContentModification = 0
StructureModification = 0
Serialization = 0
#Rampup time for the run in seconds
RampUp = 180
#SteadyState time for the run in seconds
SteadyState = 300
#Rampdown time for the run in seconds
RampDown = 30
#Total number of XML documents for processing
XMLDocCount = 1
#Actual xml file names with complete path
XML_FILE_1 = ..\\..\\docs\\inv1000.xml
SchemaFile = ..\\..\\docs\\sm-inv.xsd
OutputDir = ..\\..\\results
OutputFilePrefix = XmlMarkSummary
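For the streaming tests a transaction consists only of parse and access. On the Java side this might look like the SAX sketch below, which scans the whole document but collects the Currency attribute and PriceAmount text for only a limited number of line items. The element layout, the class name and the maxItems parameter (standing in for the Selection percentage) are assumptions, not the kit's implementation.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class SaxAccessSketch {
    // Hypothetical miniature invoice; the real SAX tests use 1000 line items.
    static final String SAMPLE =
        "<Invoice>"
      + "<LineItem><PriceAmount Currency=\"USD\">10.00</PriceAmount></LineItem>"
      + "<LineItem><PriceAmount Currency=\"USD\">20.00</PriceAmount></LineItem>"
      + "<LineItem><PriceAmount Currency=\"USD\">30.00</PriceAmount></LineItem>"
      + "<LineItem><PriceAmount Currency=\"USD\">40.00</PriceAmount></LineItem>"
      + "</Invoice>";

    /** Streams through the document and returns up to maxItems prices. */
    static List<String> readPrices(String xml, int maxItems) {
        List<String> prices = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private StringBuilder text; // non-null only inside a selected PriceAmount
            private String currency;

            @Override
            public void startElement(String uri, String local, String qName,
                                     Attributes atts) {
                if ("PriceAmount".equals(qName) && prices.size() < maxItems) {
                    currency = atts.getValue("Currency");
                    text = new StringBuilder();
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // Text may arrive in several chunks, so accumulate it.
                if (text != null) {
                    text.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                if ("PriceAmount".equals(qName) && text != null) {
                    prices.add(currency + " " + text.toString().trim());
                    text = null;
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return prices;
    }

    public static void main(String[] args) {
        // Selecting 2 of 4 items corresponds to Selection = 50.
        System.out.println(readPrices(SAMPLE, 2));
    }
}
```

No tree is built, so memory use stays flat regardless of document size. This is why the streaming tests can use the larger 1000-lineitem invoice.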
Conclusions
The XML Mark 1.1 Benchmark Kit demonstrates that major improvements have been made with .NET 2.0 over the .NET 1.1 runtime for XML parsing. These improvements will impact both client and server programs that do any amount of direct XML manipulation. Furthermore, the results indicate that .NET 2.0 performs significantly better than the latest Sun Java 1.5 platform in XML parsing scenarios. We encourage customers to download the XML Mark 1.1 benchmark kit and test the scenarios for themselves.