XML and www.microsoft.com
September 18, 2000
An interview with Mike Moore, the director of development of the fourth-largest Web site in the world.
I met Mike about a year and a half ago, when he and his team first started to play with the early versions of MSXML 2.0. Since then, they've done some amazing things, so I wanted to take this opportunity to sit down with Mike and get his story. If you are into building Web sites that use XML technologies, I think you will find this amazing, exciting, and extremely relevant stuff.
Mike says that XML/XSL now drives everything for which his team is directly responsible. This includes the following:
Download Center: Mostly XML/XSL
For a brief demonstration of how one of these applications works, let's take a look at the product catalog, which is localized into 38 languages. For example,
displays the US English product information for Money 2000. This page loads the following XML documents:
The Active Server Pages (ASP) code aggregates all this XML into one master IXMLDOMDocument object, and then runs the transformNode method to dynamically generate the fully localized HTML.
The XSL that it uses is broken down into pages that correspond to the Overview, Features, and System Requirements links on the left-hand side. For example, the overview page is generated by using http://www.microsoft.com/catalog/overview.xsl
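The aggregate-then-transform flow described above can be sketched in a few lines. This is only an illustration, not the site's ASP code: the two XML snippets, the `master` wrapper element, and the function names are all assumptions, and because Python's standard library has no XSLT processor, a plain rendering function stands in for the `transformNode` step.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical stand-ins for the per-page XML documents; the real files and
# their schemas are not shown in the article.
PRODUCT_XML = "<product id='money2000'><name>Money 2000</name></product>"
STRINGS_XML = "<strings lang='en-us'><string id='overview'>Overview</string></strings>"


def aggregate(*documents):
    """Parse each XML string and attach its root under one master element."""
    master = ET.Element("master")
    for text in documents:
        master.append(ET.fromstring(text))
    return master


def render_overview(master):
    """Stand-in for the XSL step: build an HTML fragment from the master tree."""
    heading = master.findtext("strings/string[@id='overview']", default="")
    name = master.findtext("product/name", default="")
    return f"<h1>{escape(heading)}</h1><p>{escape(name)}</p>"


master = aggregate(PRODUCT_XML, STRINGS_XML)
html_fragment = render_overview(master)
print(html_fragment)
```

Keeping the data documents separate and combining them only at request time is what makes it possible to swap in a different presentation for the same XML.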
The nice thing about this is that it's easy to repurpose the XML. For example, the site for France has put their own look and feel on the same information.
Chris: How did you get started with XML?
Mike: Some new developers joined my team in late 1998, and they gave us a demonstration of an XML dialect they had created for the Knowledge Base using a beta version of MSXML. Seeing MSXML survive processing the entire Knowledge Base gave us great confidence in the stability of the MSXML component. In early 1999, we were working on a new product catalog application and were running into some very real operational limits. We had a back-end system that was generating about 100,000 fully localized HTML pages. Managing this quantity of data across a Web farm of about 30 U.S. machines—and a total of 20 machines outside the U.S. that had slower connections—together with very high churn, was next to impossible.
We were about two weeks away from going into production with this new system when we decided XML was a much better way to do this. Internationalization was a big factor. We did some performance analysis on MSXML and were very impressed with the fact that it ran rock solid for 20 hours with no memory leaks (using our typical 20-KB XML test file), so we decided to completely scrap the old system and start over with XML. One guy sat down and prototyped something in a couple days using XML for the data and ASP script code that loaded the XML by using MSXML and rendered the HTML page.
Chris: How much of a slip did this cause in your schedule?
Mike: That's the amazing thing: Our product manager at the time almost had a heart attack when we changed technical direction and adopted unproven technology weeks prior to ship, but we actually shipped on time. In two weeks, we completely rebuilt the system using XML and ASP.
Chris: How many people did you have working on this?
Mike: One and a half devs, brand new to XML. This is why we decided not to use XSL in the first version; we didn't have the time to learn it. The interesting thing we found was that during this process, more things that we hadn't anticipated were also being moved into XML. The UI itself contained a bunch of localizable strings for menus and buttons and so forth, and so the developers moved all this out of the ASP code and into separate localizable XML files.
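As a rough illustration of moving UI strings out of page code and into localizable XML, the sketch below looks up a string by id from a per-language string table. The element names, ids, and the fallback to U.S. English are assumptions made for the example; the article does not describe the team's actual file layout.

```python
import xml.etree.ElementTree as ET

# Hypothetical string tables, one per language; ids and layout are assumptions.
STRING_FILES = {
    "en-us": "<strings><string id='download'>Download</string>"
             "<string id='search'>Search</string></strings>",
    "fr-fr": "<strings><string id='download'>Télécharger</string></strings>",
}


def load_strings(lang):
    """Parse one language's string table into an id -> text dictionary."""
    root = ET.fromstring(STRING_FILES[lang])
    return {node.get("id"): node.text or "" for node in root.iter("string")}


def ui_string(string_id, lang, fallback="en-us"):
    """Return the string for lang, or the fallback language if it is missing."""
    strings = load_strings(lang)
    if string_id in strings:
        return strings[string_id]
    return load_strings(fallback)[string_id]


print(ui_string("download", "fr-fr"))
print(ui_string("search", "fr-fr"))
```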
Chris: How many XML files did you have in the end?
Mike: About 9,000—reducing the number of files by more than a factor of 10. We learned some interesting things here, also. We were not sure whether to have one file per language or to put all the languages in one file. The latter did not work at all. The files were too big, and so the performance was terrible. We've learned that there is an optimal file size for XML files that are loaded in page-scope, which is up to about 50 KB. (We use larger XML files in application scope.) The nice thing about this is that it also works much better for our localization process. Having separate files for each language means they can be worked on in parallel.
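The page-scope versus application-scope distinction can be sketched as follows, assuming that page scope means parsing the file on every request and application scope means parsing once and sharing the result. The file names, the in-memory `FILES` table, and the `parse_calls` log are hypothetical and exist only to make the difference visible.

```python
import xml.etree.ElementTree as ET

# Hypothetical in-memory "files"; a real site would read these from disk.
FILES = {
    "catalog/en-us/money2000.xml": "<product><name>Money 2000</name></product>",
    "shared/countries.xml": "<countries><country code='us'/><country code='fr'/></countries>",
}

_application_cache = {}  # path -> parsed root, shared across requests
parse_calls = []         # records each parse so the effect is visible


def load_page_scope(path):
    """Parse on every request; reasonable for small per-language files."""
    parse_calls.append(path)
    return ET.fromstring(FILES[path])


def load_application_scope(path):
    """Parse once and reuse; suits larger documents that rarely change."""
    if path not in _application_cache:
        parse_calls.append(path)
        _application_cache[path] = ET.fromstring(FILES[path])
    return _application_cache[path]


for _ in range(3):  # simulate three requests
    load_page_scope("catalog/en-us/money2000.xml")
    load_application_scope("shared/countries.xml")

page_parses = parse_calls.count("catalog/en-us/money2000.xml")
app_parses = parse_calls.count("shared/countries.xml")
print(page_parses, app_parses)
```

A shared tree like this should be treated as read-only, since every request sees the same object.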
Chris: When did you start playing with XSL?
Mike: We started to use XSL on internal systems last June to educate ourselves on how to leverage it, and our newsletter system and registration system shipped using it in January 2000. The product catalog went live with XSL about two months ago and is perceivably faster as a result.
Chris: What are some of the benefits of XML and XSL that you are experiencing?
Mike: It just works. The ability to make lots and lots of granular, incremental updates to the data, with no risk of breaking any code and no additional testing needed, works great for our process. And since we have fewer XML files than HTML files, each update results in much less replication out on our Web farm, which makes this much more manageable and leaves a smaller window of opportunity for the Web farm to be in an inconsistent state.
We also found that ASP pages with code and HTML in them were unsupportable because of the risk of breaking the code. So we sort of made a transition from static pages to ASP applications to completely data-driven ASP. I think this is a process that a lot of other major Web sites are going through.
Chris: What is your experience with XSL performance?
Mike: We've found it to be notably faster than the equivalent ASP/XML DOM code, from two to three times faster. On the catalog, we are seeing about 25 requests per second.
Chris: What are your thoughts on XSL?
Mike: It's good. It has quite a ramp-up time, though. It takes about two weeks to train a new guy in it, and the XSL tools are still lacking. We've learned a lot of things along the way. For example, we are really careful to narrow down the scope of our XPath queries as much as possible.
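To illustrate what narrowing an XPath query looks like, the sketch below compares a descendant search with a path that walks directly to one product. The document structure and feature names are invented for the example, and it uses the limited XPath subset in Python's `xml.etree.ElementTree` rather than MSXML.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog document; element names and content are assumptions.
DOC = """
<catalog>
  <product id="money2000">
    <features><feature>Online banking</feature><feature>Budget planner</feature></features>
    <requirements><item>Windows 95 or later</item></requirements>
  </product>
  <product id="other">
    <features><feature>Something else</feature></features>
  </product>
</catalog>
"""
root = ET.fromstring(DOC)

# Broad: ".//" visits every descendant in the document looking for a match.
broad = [f.text for f in root.findall(".//feature")]

# Narrow: select one product by id, then step straight to its features.
narrow = [f.text for f in root.findall("product[@id='money2000']/features/feature")]

print(broad)
print(narrow)
```

The narrow path gives the processor far fewer nodes to examine, and it also avoids picking up same-named elements from unrelated parts of the document.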
To be honest, I was blown away by the performance of the new XSL-based catalog when it shipped. We could literally see that the pages were a lot snappier. On every ASP request, we are loading from four to six localized XML files. Then we're applying XSL on the fly to dynamically generate a fully localized HTML page, and we're doing about 30 of these transformations per second per machine.
It really exceeded our expectations. The whole code/data/presentation separation thing is a huge win. A lot of people don't realize how big a win this is. We didn't even realize it going in, but now we are finding new opportunities all over the place for re-using the content in ways we would have never thought of. We've even found that some other groups inside Microsoft are already reusing our XML content. They never even talked to us. They just looked at the XML, understood the schema, and ran with it. We had no idea until we stumbled onto it!
This is one of the amazing things about XML. It is self-documenting. The data is the API, and people just consume it.
Chris: Is this causing a problem?
Mike: Well, we just didn't realize when we published XML that we were publishing an API. When we change the XML schema we are going to break a bunch of people's code. So far, we haven't had to do that.
Chris: During the project, how many times did you have to change the XML schema?
Mike: Not many, actually. We added new things as we went along, but the core schema has hardly changed at all in two years. But we make no guarantees. The XML up on that site could change at any time. Use at your own risk.
Chris: What hardware do you use?
Mike: Quad-processor, 500-MHz machines with 2 GB of RAM and 80-GB hard drives, running Windows 2000. By the way, Windows 2000 was another huge performance win for us. It gave us over 200 percent improvement and 20 times better reliability. Internal measurements show that our servers went from 99.8 percent reliability to over 99.99 percent.
Chris: How many machines?
Mike: We have about 30 machines in our main cluster, each handling about 100,000 users per day. You have to understand that we hammer Windows 2000 really hard. We have about 75,000 different ASP pages on www.microsoft.com and a churn of up to 150,000 page changes per day, so the operational maintenance side of this site is huge. We are using stock standard Windows 2000 SP1 with MSXML 2.5. We're evaluating MSXML 3.0 right now.
Chris: Where do you go from here?
Mike: Our main focus is now on Web Services and the .NET Framework. We were finding ourselves building our own custom "Web Services," so we're thrilled to see this in the .NET Framework. We're also playing with the SOAP toolkit.
We are going to move everything to .NET Web Services. We are an early adopter of the .NET Framework and will be going live with the .NET Framework stuff real soon. The Web Services layer is an exciting way to break the tight coupling between the application and the actual data, but still get great caching and performance.
For example, we've had a lot of third-party OEM type people ask for ways to get information about new device drivers. This way, companies like Dell can ping our Web Service behind the scenes and get the information that their customers need, while keeping their users in one consistent, simple Web site. They won't have to point the user off to our site and hope that they don't get lost along the way. So we are going to do this.
Chris: Do you think all this Web Service stuff will increase the load on your servers?
Mike: Yep, probably. That's what we want! We don't know for sure, yet.
Chris: Anything else we haven't covered?
Mike: Well, there are lots of things you haven't asked me about. We're looking into offloading the XSL processing onto the Internet Explorer 5.x clients. We're using the new SQL Server 2000 XML support. This is one of the most compelling things I've seen in a while. We are using XML caching for high scalability of non-user-specific application state. We're getting up to speed on the new Xml classes in the .NET Framework. The XmlNavigator class looks awesome. C# is awesome, with the power of C, ease of development, and great performance.
At this point I ran out of time. The future of XML at microsoft.com is clearly a whole series of articles! Mike is passionate about what he's doing and his team is doing amazing things. It's exciting to see how XML technology is solving real world problems and revolutionizing the Web.
Chris Lovett is a program manager for Microsoft's XML team.