Bungie.net Technical Case Study
Summary: The Bungie.net site is the online companion to the wildly successful Halo 2 video game for Xbox, released in November 2004 by Microsoft. The site also acts as the community hub for all things related to Bungie games. Built with the Microsoft .NET Framework, Bungie.net serves up more than 4 million pages per day, accumulating 300 gigabytes of online game statistics per month from more than 1 million games played daily. Deemed "Most Innovative Design" by IGN Entertainment in 2004, the site provides innovative ways for users to view game statistics and details as well as interact with each other through forums and team Web pages. The release of the Bungie.net site represented a milestone in online console game play. This case study provides insight into this accomplishment.
3. Business Drivers
4. Technical Approach
5. Project Plan and Effort
7. Solution Architecture
8. Development Approach
9. Test Approach
10. Execution and Operations Architecture
11. The Outcome
Bungie.net has been around since 1996, and in that time it has continued to serve its community role, and even served as the gameplay destination for Bungie RPG Myth's dedicated online players. Bungie.net continues to be a place to find news, events, and technical information about Bungie projects past, present, and future.
Although Bungie.net is the go-to place for Halo 2 stats, its primary function is as the community hub for everything Bungie Studios-related. That means it's a place for fans of all Bungie games, past and present, to gather and discuss their shared interests. It's a forum for discussion on what they like and what they don't like about current projects and it's a place to learn everything there is to know about Bungie lore, culture, and legend.
Bungie.net leverages the Microsoft .NET Framework running on Microsoft Windows Server 2003 and Microsoft SQL Server 2000 to serve up over 3 million page views per day and to accumulate over 300 GB of online game statistics per month from the almost 1 million online games played every day. Not only is Bungie.net built to scale, but its design and inventive features have not gone unnoticed: it was rated "Most Innovative Design" by IGN Entertainment (http://bestof.ign.com/2004/xbox/14.html) in 2004. The site also exceeds a 99 percent uptime ratio, even through peak usage periods such as the week of the Halo 2 release. Clearly, the release of the Bungie.net site defines a new milestone in the era of online game play. This case study provides insight into this accomplishment.
Bungie was formed in 1991 by two guys in a garage to publish a game called "Operation: Desert Storm." In 1992 they released "Minotaur: The Labyrinths of Crete." Bungie's future successes became apparent with "Pathways into Darkness" and "Marathon," which were constructed for the Macintosh platform and later ported to the PC platform as "Marathon 2" in 1995. Bungie first moved into the game console space with a game called "Oni" that targeted the Macintosh, PC, and PlayStation 2 platforms. In 1999, Take 2 Interactive, a leading provider of game console games such as Grand Theft Auto, helped to take Bungie to the next level of gaming development by providing Bungie with financing and a substantive distribution channel.
Around 2000 the idea of Halo started to take hold within Bungie as a game concept that would allow players, in a real-time 3-D environment, to control marines and vehicles such as the futuristic jeep-like item that would become known as the Warthog. In 2000 Microsoft acquired Bungie and with it Halo in order to bring the game exclusively to their Xbox platform. On November 15, 2001 Halo was finally released with the Xbox. Halo was a huge success and a Halo sequel was inevitable.
There are obvious differences in game play between Halo and Halo 2; however, the most important difference is the online experience Halo 2 provides by leveraging the features of Xbox Live. This experience is further extended by allowing users not only to view their own and their team's accumulated online play stats, but also to interact with other players outside of the gaming world through discussion forums and groups. In addition, Bungie.net exposes an RSS feed for game stats that online gamers have been using in ways that the Bungie team could not have imagined.
Bungie's primary focus was the release of Halo 2 on the much anticipated and well publicized release date of November 9, 2004. This left all other release objectives, such as the Bungie.net Web site integration with game statistics, as secondary. The Bungie team, however, saw an opportunity to enhance the overall gaming experience of Halo 2 by providing players with a unique view of their gaming world from outside the typical Xbox console. The goal was to have the Web site be an extension of the Halo 2 online experience.
Other objectives included:
- Lead the gaming industry in online console game experience by leveraging Xbox Live services and accumulated data.
- Develop an environment that online players can use to remain engaged and active in Halo 2 game play outside of the Xbox console.
- Develop an integrated community where players can more easily and readily interact with one another to form teams and to share experiences and compete.
- Develop a Web site that meets the functional and scalable needs of the online community without failure. If stability could not be achieved under load, then the online gaming services that are to be provided for Halo 2 would not be developed.
- Do not negatively impact the release of Halo 2 in any way.
- Provide valued services and a significant "cool factor" to online players of Halo 2.
Prior to Halo 2, the Bungie Web site was built with Perl running on a Microsoft Windows platform. However, over time the Bungie team found that this Web solution was not very scalable and would likely not be sufficient for the release of Halo 2. Because of this, one of the first steps Bungie undertook was the redevelopment of the existing site to take advantage of the capabilities of the Microsoft .NET Framework, with the eventual goal of hooking its features into Halo 2.
One of the key factors that prevented the old site from scaling was its highly dynamic approach to content. The existing site was extremely flexible and run-time configurable, providing an extremely user-centric experience. However, Bungie's existing Web site needed to support only about 50,000 hits per day, which was nowhere near the anticipated capacity the site would need after Halo 2 was released.
Bungie decided to take the following architectural approaches with the Bungie.net site:
- Clearly separate static from dynamic content on the site.
- Reduce the amount of dynamic content displayed to users to optimize caching.
- Focus on caching and cache utilization, minimizing the need for unnecessary I/O.
- Provide Xbox Live online game statistics to users of the Web site as quickly as possible, depending on the overall load of the system.
- Create a solution architecture with distinct layers and simply defined services interfaces.
- Design for horizontal scalability, providing the ability to add front-end Web servers as well as back-end statistical processing servers as load demands without modifying code or design.
- Focus on performance over features.
- Factor features to ensure a loosely coupled approach to functionality, allowing features to be independent of one another.
- Maintain clear separation between view and data.
4.1. Technical Challenges
During the week of the Halo 2 launch, the Bungie.net Web site received over 5 million page views per day, and close to 3 million of those were player game statistics requests. This fact alone emphasizes that scalability and performance were the most critical challenges placed on the Bungie site.
When it comes to scalability and performance of Web-based solutions, not many features are as important as caching. Bungie spent a considerable effort in perfecting its approach to caching, which leverages the caching features of the .NET Framework, ASP.NET, and browser-level caching.
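The idea behind time-based caching can be illustrated with a small sketch. This is not Bungie's caching engine, which was built on ASP.NET features; it is a minimal Python illustration in which the class name `TTLCache`, the method `get_or_load`, and the expiry interval are all assumptions made for the example. A value is loaded once from an expensive source and then served from memory until its lifetime runs out.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """Tiny in-memory cache whose entries expire after ttl_seconds (hypothetical helper)."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be exercised without sleeping
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}
        self.hits = 0
        self.misses = 0

    def get_or_load(self, key: Hashable, loader: Callable[[], Any]) -> Any:
        """Return the cached value for key, calling loader only on a miss or after expiry."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            self.hits += 1
            return entry[1]
        self.misses += 1
        value = loader()  # the expensive call, for example a database query
        self._entries[key] = (now + self._ttl, value)
        return value
```

With a short lifetime, many requests for the same player page within that interval would trigger only one trip to the back end, at the cost of showing data that may be slightly stale.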
Other technical constraints included:
- Integration with Xbox Live: Bungie.net needed to integrate with existing Xbox Live Web services in order to request game statistical data as well as integrate with Xbox Live features such as groups and Live Friends.
- Statistics Processing: Over one million online games were played the day after the release of Halo 2. Game details for each game are transferred to Bungie.net, where they are further decomposed, stored, and eventually presented to users of the Bungie.net Web site. This process needed to be extremely scalable, reliable, and recoverable, and it needed to complete in a timely manner (most game statistics are available within 30 seconds to 1 minute of game play).
- Passport Integration: The Bungie.net Web site used Passport authentication and mapped a user's Passport account to their Xbox Live account, which allowed the system to pull and aggregate user-specific game details from Xbox Live Web services.
- Maintainability: The resulting system needed to be extremely manageable in a production environment, easily accommodating reconfiguration at run-time, allowing changes to the physical distribution of servers and services with minimal operational impact. Additionally, the functionality of the Web site needed to allow features to be turned off at run time, such as statistics and forums, through the use of configuration entries.
- Forums Integration: Bungie did not write the forums component of the Bungie.net site, instead relying upon a vendor for this functionality. Integration with the forum, however, needed to be seamless, and the architectural solution of the forums had to be congruent with Bungie.net with regard to scalability and reliability.
- Reliability: The goal of the system was to achieve or exceed 99 percent uptime. Bungie achieved a high level of reliability through its horizontal application architecture and the loose coupling of critical services. For example, the statistical processing engine could completely fail without jeopardizing the reliability of the main site.
- Physical Architecture Abstraction: Bungie did not want its architecture to be bound by the physical architecture of the production solution. Because of this, Bungie chose to leverage a service-oriented architecture and Web services to separate the layers of its application architecture.
- Recoverable: Failure of any aspect of the statistical processing engine could not result in the loss of data. If any aspect of the statistical processing engine failed, upon recovery the system would continue where it left off when the failure occurred.
The Bungie.net site was delivered using a phased approach, beginning with architecturally significant features followed by less significant features and increased scalability revisions. The following list outlines the process Bungie followed to develop Bungie.net:
Research (Began July 2003)
Description: Researching how the Microsoft .NET platform could be leveraged to replace the features of the existing Bungie site.
Duration: 1 month
Effort: 1 Senior Developer
Requirements, Design, and Prototyping
Description: Prototyping the architecturally significant features of the Web site and mapping features from the existing Web solution.
Duration: 2 months
Effort: 1 Senior Developer
Release 1.0 (Completed April 12, 2004)
Description: Focused on the reimplementation of most of the existing Perl based site on the .NET Framework. Some features, however, were left as Perl implementations.
Duration: 6 months
Effort: 3 Senior Developers
1 front-end Developer
2 part-time Content Providers
1 full-time and 1 half-time Artists
Description: Redeveloped the Groups functionality using Microsoft .NET technologies including ASP.NET, ADO.NET, and C#.
Duration: 1 month
Effort: 2 Senior Developers
Release 2.0 (Delivered November 9, 2004)
Description: Include full game statistics features to
Duration: 2 months
Effort: 4 Senior Developers
1 front-end Developer
2 half-time Content Providers
Note: The basic stats features and pipeline of Bungie.net were available during the beta of Halo 2 and were stabilized up to the Halo 2 release on November 9, 2004.
Total project duration: 12 months
3 months of research and design
7 months development of site
2 months for statistics integration
Total Project Effort:
1 front-end Developer
1.5 Graphic Designers
2 part-time Content Developers
The Forum features available on the Bungie.net site were outsourced to a company called Ideal Science (www.idealscience.com).
6.1. Functional Requirements
The following diagram depicts the main functional areas of Bungie.net and its primary subsystems.
Figure 1. The main functional areas of bungie.net
This refers to features on the Web site that are managed using a more traditional approach to Web-based content management. Such content is relatively static and changes are applied to the Web site through specific content management practices.
- News and Top Stories: A weekly look at online game play and news events pertaining to Bungie and Halo 2.
- Screen Shots and Game Information: A collection of information, screen shots, desktop wallpaper, artwork, FAQ, and audio/video from Bungie's succession of games, including Halo 2, Halo for the PC/Mac, and the original Halo for the Xbox.
- Weekly Update Pages: A weekly editorial from Bungie discussing various game-related questions and concerns, as well as a discussion of relevant weekly events with the company.
- Inside Bungie: A collection of articles, job postings, Web cameras, letters to the Webmaster, FAQ, and Team information.
- The Bungie Store: The Bungie Store (www.bungiestore.com) is not part of the Bungie.net Web site, as it was constructed and managed by a separate organization. The Bungie Store sells Bungie- and Halo-related merchandise including clothing, bags, and action figures.
- Passport Authentication: Everyone who visits the Bungie.net Web site can see all the static information on the Bungie.net Web site. Additionally, users can view overall game-play statistics, including high-level statistics such as the number of players currently online, the total number of players, and the number of logged matches. Anonymous users can also see the statistics of other online players and read the twelve major discussion forums. When users register with the Bungie.net site with their Passport, they can link their Xbox Live gamer tag to their Passport and publish their game statistics on the site and gain full access to the forum.
Dynamic content refers to site content that is either published by online users through features such as Forums, or from game statistics that are continuously refreshed from Xbox Live.
- Forums: Bungie.net provides twelve main discussion forums for registered Bungie users to post thoughts, comments, tips, and tricks related to various topics. Users can also post Web polls in their discussion topics and have results automatically gathered and displayed. Forum participation relies upon Passport authentication and completely integrates with the security model of the Bungie.net site.
- Groups: Bungie.net Groups are a way for groups of players to interact with one another through the Web site. Members have the ability to create or join groups and each group has its own News, Forums, Announcements, Guest Book FAQ, Articles, Custom Fields, and Links. Each group has an administrator that can control settings such as member approvals, blacklists, security role management, and management of news items. Group participation requires site membership and a Passport account.
- Messages: The Messages center, accessed from game player details pages, allows players to view messages that have been sent to them from other users of the Web site.
- Live Friends: The Live Friends feature allows you to see the status of Xbox Live friends. Live Friends will list all your registered Xbox Live friends and display their Xbox presence.
- Player Statistics: The game statistics area provides details on the games played by online players registered with Bungie.net. The Player Statistics area is broken into two areas: Overview (providing aggregate summaries of games played, games won, and level per game type); and Games (providing a list of games played, the date of the game, the map type, and results). Player Statistics are available on any registered Xbox Live player who has mapped their Passport account to their Xbox Live gamer tag. Viewers of the statistical information can drill into the stats of other players, specific games, and game types. Figure 2 and figure 3 depict how the statistics are presented to users of the site.
The following screen captures demonstrate this functionality.
Figure 2. Player statistics
Figure 3. Player statistics: games list
- Game Window: The Game Window provides details on individual games that registered site users play while on Xbox Live. The Game Window will provide you with game-specific statistics such as the individual scores of players that participated in that game, the number of kills for each player, the number of times each player killed each other player, medals awarded during that game, as well as the hit percentage (number of shots hit over the number of shots fired). Users can navigate to the Game Window through the Games list on the Player Stats page.
- Online Stats and Leaderboards: Users can view general statistic information about online game play through the Online Stats and the Leaderboard features. Online Stats provide information about online game play accumulated over the last 24 hours. Statistics provided are number of unique players, number of logged matches, and number of current players online. Leaderboards display the top ranking players in each game type.
Figure 4. Leaderboards
Note that the Game Viewer also displays the player emblem that the gamer set up within the game when creating the player profile.
Figure 5. The Game Viewer
- Statistic RSS Feed: Bungie.net provides access to a user's online game statistics through an RSS feed. RSS stands for "Really Simple Syndication" and provides an XML representation of a user's online game statistics. These RSS feeds can be used with standard RSS 2.0 aggregators (such as NewsGator or SharpReader).
Since the RSS feed is in XML format, this data can be consumed by virtually anything that understands XML. For example, Duncan Mackenzie (Festive Turkey in the screenshots above) consumes his statistics through an RSS feed that allows him to summarize his game play on his Web site located at http://www.duncanmackenzie.net/halo2.aspx.
Figure 6. Displaying the RSS feed in a blog
Another interesting use of the RSS feed provided by Bungie.net was created by Joseph Chirilov and Sam Radakovitz from Microsoft. They created an Excel 2003 spreadsheet that consumes the RSS feed as a data source and uses the embedded data to display their online game statistics in a rich format.
Figure 7. Consuming the RSS feed in Excel
The XML and List processing features built into Microsoft Excel allowed these developers to create this solution in 3 hours. Microsoft Excel 2003 has the ability to read the XML directly from the RSS feed. The developers of the spreadsheet used Excel formulas to extract the data contained within the XML provided by the RSS feed. This extracted data was then used to produce the charts, graphs, and pivot tables shown in Figure 7. It is important to note that no code was written, in VBA or .NET, in order to consume the raw RSS feed and to prepare the results.
This Excel 2003 Spreadsheet can be found at: www.Isamrad.com
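Because the feed is RSS 2.0 XML, consuming it from code needs little more than a standard XML parser. The Python sketch below reads the items of a feed with the standard library. The sample feed content and the function name `read_items` are invented for illustration; the actual element contents of the Bungie.net feed are not described in this case study, so only the generic RSS 2.0 `item`, `title`, `pubDate`, and `description` elements are assumed.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Hypothetical sample data in generic RSS 2.0 shape (not real Bungie.net output).
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example game stats feed</title>
    <item>
      <title>Example game 1</title>
      <pubDate>Tue, 09 Nov 2004 20:15:00 GMT</pubDate>
      <description>Example summary for game 1</description>
    </item>
    <item>
      <title>Example game 2</title>
      <pubDate>Wed, 10 Nov 2004 21:30:00 GMT</pubDate>
      <description>Example summary for game 2</description>
    </item>
  </channel>
</rss>
"""


def read_items(feed_xml: str) -> List[Dict[str, str]]:
    """Return one dictionary per RSS <item>, using empty strings for missing fields."""
    root = ET.fromstring(feed_xml)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title", default=""),
            "published": item.findtext("pubDate", default=""),
            "description": item.findtext("description", default=""),
        })
    return items


if __name__ == "__main__":
    for entry in read_items(SAMPLE_FEED):
        print(entry["published"], "-", entry["title"])
```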
6.2. Operational Requirements
Halo 2 was expected to be hugely popular, so the Bungie.net site needed to operate through peak periods, such as those experienced during the launch, as well as on an ongoing basis.
The goal of the Bungie.net Web site and associated game services was to achieve 99 percent uptime. The actual availability of the site has exceeded this metric through effective operational planning coupled with fault-tolerant system architecture.
The Bungie.net system promotes manageability with the following features:
- Ability to disable features of the Bungie.net Web site, such as forums or statistics, through configuration files.
- Ability to integrate with MOM (Microsoft Operations Manager) 2005, allowing for effective monitoring and rule-based alerting features across all primary server instances, including Microsoft Internet Information Server and Microsoft SQL Server 2000 Enterprise.
- Ability to modify implementation of major architectural elements without impacting the rest of the system through the use of Web service interfaces and a service-oriented approach to design and system integration.
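The first item in the list above, switching features off through configuration, can be sketched as follows. This is an illustrative Python example rather than the site's actual .NET configuration mechanism; the section name `features`, the keys `forums` and `statistics`, and the function names are assumptions.

```python
import configparser
from typing import Dict, List

# Hypothetical configuration text; a real deployment would read this from a file.
DEFAULT_CONFIG = """
[features]
forums = on
statistics = off
"""


def load_features(config_text: str) -> Dict[str, bool]:
    """Parse the [features] section into a mapping of feature name to enabled flag."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    if not parser.has_section("features"):
        return {}
    return {name: parser.getboolean("features", name)
            for name in parser.options("features")}


def visible_sections(features: Dict[str, bool]) -> List[str]:
    """List the page sections to render; a feature that is not configured stays off."""
    sections = ["news"]
    if features.get("forums", False):
        sections.append("forums")
    if features.get("statistics", False):
        sections.append("statistics")
    return sections
```

Treating a missing entry as "off" is a design choice made for this sketch: it means a damaged or incomplete configuration degrades the site to its static content instead of exposing a feature that operations intended to disable.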
The scalability of Bungie.net needed to be addressed differently at different times. During beta testing, for example, the load on the Bungie site was known and manageable. Immediately following the launch of Halo 2 the site received over 5 million page requests and over 1 million online games were played.
Scalability was addressed with the following features:
- Microsoft Network Load Balancing, which comes with Windows Server 2003, was employed across many Web servers. This allowed Bungie to cluster front-end servers and balance Web requests across all Web servers participating in the cluster. As Web traffic load increases, additional servers can be added to the cluster without modifying the application. Similarly, if load on the Web site decreases over time, servers that are no longer needed can be removed from the load-balanced cluster without any impact on the overall architecture of the system. Network Load Balancing also improves reliability by providing a degree of fault tolerance between the Web servers.
- The ability to trickle statistical information from Xbox Live based on load characteristics was extremely important to the design of Bungie.net. Statistics from Xbox Live, such as Leaderboards and Friends, are accessed in real time and are not persisted within Bungie.net. Other statistical information from Halo 2, which does not come directly through Xbox Live, is stored within the Bungie.net databases and can generally be accessed by users within 30 seconds.
- In a clustered Web environment, it was important to distinctly manage state and caching information because connection affinity (where subsequent Web requests go to the server that first answered the originating Web request) cannot be guaranteed. In fact, the site was released with connection affinity turned off.
- The ability to handle large amounts of data for features such as Forums and Statistics was vital. One month after the release of Halo 2, Bungie.net housed over 2500 forums, comprising the 12 major discussion forums and group-level forums. Discussions within these forums are represented by over 1 million rows of data in the Forums database, and that number is growing rapidly. Game-level statistics are added at a rate of almost 300 GB per month. Microsoft SQL Server is the foundation for all data storage for Bungie.net, and is used in clustered and non-clustered configurations.
7.1. Conceptual View
The following diagram depicts the Bungie.net site at the conceptual level.
Figure 8. The Bungie.net site at the conceptual level
Functionally, Bungie.net is self-contained with the exception of the integration of Passport authentication and Xbox Live. Xbox Live communication is accomplished over Web services and serves two purposes: first, to retrieve Xbox Live game statistics; and second, through the Bungie.net Xbox Live Web Services, to retrieve information such as Xbox Live Friends lists, clans (groups), and Leaderboard information from the Xbox Live Web services.
7.2. Detailed Architecture
Here are the implementation specifications:
- Load-balanced front-end Web servers provide the main interface to users of Bungie.net. Load balancing is accomplished with Microsoft Network Load Balancing System. During the launch there were seven front-end Web servers needed to support the initial traffic load of around 5 million page views per day.
- Active/Passive Microsoft SQL Server 2000 Cluster used directly from the Web servers to serve up user account and content information. A SAN (Storage Area Network) is used as the shared quorum drive in the SQL Cluster.
- Forum data is stored on a single Microsoft SQL Server 2000 computer.
- Halo 2 statistics are all stored on a single Microsoft SQL Server 2000 computer.
- Bungie.net Xbox WebServices Servers are used to abstract and isolate calls to the Xbox Live service and are called directly by the Bungie.net application engine running on the Web servers.
- For Xbox Live game statistics-processing the following servers are used:
- Halo 2 Stats Collector Server: These servers act as passive staging areas for game stats deposited by the Title Servers. Files are stored into the file system and wait here to be processed by the Processor Servers. Essentially, the Collector servers act as the primary asynchronous queuing mechanism for the entire statistics import process. This server gets its information in the form of XML from Halo 2 Stats Servers that run within a secure environment.
- Halo 2 Stats Processor Servers: Processor Servers pull files from the Collector servers at regular intervals, always pulling the oldest files first to help ensure a FIFO (First In, First Out) processing queue. The Processor Servers perform additional processing of the files so that game statistics can be easily inserted into the Halo 2 Stats Database Server.
- Halo 2 Stats Database Server: Once processing is complete, the Stats Processing Servers update the main Stats Database Server through a Web service call. It is from this location that statistics are requested through a separate set of Web service calls, passed onto the primary Bungie.net application engine and finally presented to users.
- Note: The statistic gathering process is illustrated in Figure 9.
- Microsoft Operations Manager 2005 is used in the production environment to monitor all managed servers.
Figure 9. The statistic gathering process
7.2.1. Significant Subsystems
Bungie.net is a composition of a number of subsystems. The most significant subsystems include:
- The Content Display and Management Subsystem
- The Bungie.net Application Engine
- The Game Stats Engine
- The Caching Subsystem
- RSS Feed
The Content Display and Management Subsystem
One of the challenges of the Bungie.net Web site was to maintain a clear separation between static and dynamic content. Additionally, as with most large Web sites, there was a strong requirement to be able to preview, launch, or roll back content in a consistent and managed way that does not interfere with other aspects of the site.
Text portions of the Web site are stored as local plain text files on the Web servers and integrated with templates managed by the .aspx pages at run time, then cached to speed up subsequent requests.
Images are stored in file directories on each of the Web servers and rely solely upon the built-in Microsoft Internet Information Server 6.0 cache to increase performance. The Web servers provide browsing of the large, hierarchical image gallery by interpreting an XML file that provides navigation and maps full-sized images to their thumbnails. Bungie made a design decision not to dynamically create the thumbnail galleries to reduce server processing and aid in scalability. The creation of the thumbnail navigation XML document and the thumbnail images themselves is managed by a custom script created by Bungie.
The Bungie.net Application Engine
The brains of the Bungie.net Web site lie in the application engine. All user requests are passed to the application where the engine validates the user credentials and determines what the user is allowed to see and do on the site. For example, anonymous users cannot create new post items in forums or see game statistics. The application engine also acts as a dispatcher for data retrieval operations and interacts with custom caching components to help improve scalability. The application engine manages all database access and performs all authentication and authorization processes. The engine is implemented as a DLL and exists on each of the front-end Web servers and is instantiated directly via an ASP.NET code-behind model.
The Game Stats Engine
The Game Stats engine is responsible for retrieving game statistics from Xbox Live and making this data available in a form that can be presented by the Bungie.net application engine and Web servers. The process of retrieving statistics from Xbox Live is detailed in section 7.2.
The Game Stats engine is recoverable: because of the queued architecture of the solution and the Collector servers, failures that occur during the processing of game statistics data can be recovered from with minimal chance of data loss. Game statistic packages are never removed from the Collector servers until they have been successfully processed. Additionally, the Game Stats engine naturally adapts to load by providing a layered Web service interface between processing steps. Additional load is managed at each level by scaling horizontally as necessary.
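The two queueing rules described for the statistics pipeline, oldest files first and no removal until processing has succeeded, can be sketched in a few lines. The Python example below is only an illustration under stated assumptions: it uses a local directory in place of a Collector server, file modification time as the ordering key, and hypothetical function names `oldest_first` and `drain`. None of these details come from the case study.

```python
from pathlib import Path
from typing import Callable, List


def oldest_first(queue_dir: Path) -> List[Path]:
    """Return the queued files ordered by modification time, oldest first."""
    files = [p for p in queue_dir.iterdir() if p.is_file()]
    return sorted(files, key=lambda p: (p.stat().st_mtime, p.name))


def drain(queue_dir: Path, handle: Callable[[str], None]) -> int:
    """Process queued files in FIFO order and return how many were completed.

    A file is deleted only after handle() returns without raising. On the
    first failure the loop stops, so the failed file and everything queued
    behind it stay in place and are picked up, in the same order, next run.
    """
    processed = 0
    for path in oldest_first(queue_dir):
        try:
            handle(path.read_text(encoding="utf-8"))
        except Exception:
            break  # leave the file in the queue for the next attempt
        path.unlink()  # safe to remove: processing succeeded
        processed += 1
    return processed
```

One consequence of this design is worth noting: if a crash happens after `handle` succeeds but before the file is removed, the same file is processed again on restart, so the handler in such a scheme should tolerate duplicates.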
The Caching Subsystem
The Bungie.net team realized that the key to Web site performance and scalability was the effective use of caching. Not only does Bungie.net leverage the built-in caching and compression features of IIS 6.0 for site content text and images, but the Bungie team also developed their own caching engine based on the features of ASP.NET. Outside of Web pages and images, many other aspects of the Bungie.net site are cached, including game statistics, Passport/Gamer Tag linking, and other reference data used to populate list boxes and fields.
"Without output caching our site would die." - Zach Russell, Engineering Lead
Forums
Because of timelines, Bungie decided to completely outsource the Forums subsystem to a company called IdealScience. This posed a slight risk to the project because Bungie would need to guarantee that the forums component was as scalable as the rest of the Bungie.net site. For this reason, Forums remained a separate entity whose functionality can be turned off dynamically yet whose architecture matched that of the main Bungie.net application. The forums component even has its own database to ensure physical segmentation of functionality. The forums component does rely upon the Bungie.net application engine for its instantiation and user profile and authentication information. Originally the team was considering a Web service interface around the Forums component to be consumed by the Bungie.net Web site, but this option was dropped for performance reasons in favor of a .NET Framework-based component which is instantiated directly on each Web server.
The RSS Feed
The RSS feed is an extension to the game statistics area of the Web site. Like the rest of the game statistics features, the RSS feed relies upon a link between the online gamer tag in Xbox Live and Passport, which is accomplished outside the Bungie.net solution. The process of gamer tag referencing is asynchronous, since Bungie.net cannot necessarily rely upon a live connection to Xbox Live. The asynchronous process first goes out to retrieve the gamer tag and updates a cookie with linking information that is later cached for performance. All in all, this leaves some room for inconsistencies of communicating over the internet.
Identity-linking information is the key to all game-level statistics. With this information Bungie.net can retrieve lists of recent games, display the gamer emblem associated with the Halo 2 player profile, and retrieve detailed game information. The RSS feed provides an XML representation of this information limited to a 7-day window. The RSS feed essentially encapsulates a Web service call to the Bungie.net back end that retrieves 10 games' worth of game statistics from the stats database, passing in the unique gamer tag of the requesting user.
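The selection rule described above can be illustrated with a short sketch. How the 7-day window and the 10-game limit combine is not spelled out in this case study, so the Python example below assumes one plausible reading: keep games played within the window, newest first, and return at most 10 of them. The function name `recent_games` and the tuple layout are hypothetical.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

Game = Tuple[datetime, str]  # (time played, game identifier), a made-up layout


def recent_games(games: List[Game], now: datetime,
                 max_games: int = 10, window_days: int = 7) -> List[Game]:
    """Return up to max_games games played within the last window_days, newest first."""
    cutoff = now - timedelta(days=window_days)
    in_window = [g for g in games if cutoff <= g[0] <= now]
    in_window.sort(key=lambda g: g[0], reverse=True)
    return in_window[:max_games]
```

A feed built this way stays small and cheap to cache, since its size is bounded regardless of how many games a player has logged.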
7.2.2. State and Session Management
No custom development of state or session management was performed by the Bungie.net team. All ASP.NET state is managed through the default facilities provided by ASP.NET, which are configurable at run time.
7.2.3. Logging, Exception Handling, and Monitoring
In order to track and trace events, the Bungie.net team developed their own exception processing component that logs all exception information into a Microsoft SQL Server database for future reference.
The exception management component extracts all relevant information of the event, including common properties such as exception type, version of the assembly, and stack trace, as well as application state information such as current user information and functional area where the exception was triggered.
To view the exceptions generated by the exception management component the Bungie.net team wrote their own exception viewing tool. In addition to simply listing all discovered exceptions this tool also provided the ability to correlate exceptions by source objects, assembly versions, server name, and frequency of exception.
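The logging and correlation steps described above can be sketched as follows. This is a hedged illustration in Python, with an in-memory SQLite database standing in for Microsoft SQL Server; the table layout, the `log_exception` and `exceptions_by_frequency` functions, the `APP_VERSION` constant, and the server name are all assumptions, not details from the Bungie.net component.

```python
import sqlite3
import traceback

APP_VERSION = "1.0.0"   # hypothetical stand-in for the assembly version

db = sqlite3.connect(":memory:")   # SQLite stands in for SQL Server here
db.execute(
    """CREATE TABLE exception_log (
           exc_type TEXT, app_version TEXT, server TEXT,
           user_name TEXT, area TEXT, stack TEXT)"""
)


def log_exception(exc, server, user_name, area):
    """Record one exception together with the application state around it."""
    stack = "".join(traceback.format_exception(type(exc), exc, exc.__traceback__))
    db.execute(
        "INSERT INTO exception_log VALUES (?, ?, ?, ?, ?, ?)",
        (type(exc).__name__, APP_VERSION, server, user_name, area, stack),
    )


def exceptions_by_frequency():
    """Group logged exceptions by type and area, most frequent first."""
    return db.execute(
        """SELECT exc_type, area, COUNT(*) AS hits
           FROM exception_log
           GROUP BY exc_type, area
           ORDER BY hits DESC, exc_type"""
    ).fetchall()


# Three sample failures in two functional areas.
for area, user in [("stats", "alice"), ("stats", "bob"), ("forums", "alice")]:
    try:
        if area == "stats":
            raise KeyError("missing game id")
        raise ValueError("bad post")
    except Exception as exc:
        log_exception(exc, "WEB01", user, area)

print(exceptions_by_frequency())
# [('KeyError', 'stats', 2), ('ValueError', 'forums', 1)]
```

Storing each exception as a row makes the correlation views simple aggregate queries; grouping by server name or version would follow the same pattern.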
All serious exceptions are written to the system event log of the servers. Microsoft Operations Manager (MOM) 2005 monitors server status in the production environment, leveraging the alerting and tracing model provided by the product. MOM 2005 also monitors operating system and server events, such as those raised by Microsoft Internet Information Services (IIS) and Microsoft SQL Server.
Microsoft Performance Monitor integration was used sparingly in Bungie.net because of the relative performance impact and was not relied upon to provide ongoing heartbeat information. Instead, Mercury SiteScope is used by the operations team to help gather detailed performance and use data from not only the Web site but from virtually all areas of the production infrastructure.
The Bungie.net team addressed security in a number of different ways.
- Site authentication: The Bungie.net site relies upon Passport Authentication to manage the user authentication process. Every user who wishes to participate with Forums or player statistic features of the Web site must register with the Bungie.net site. Site registration involves the process of registering a user's Passport account with a profile maintained on the Bungie.net site. User profile information includes:
- Geographic location
- Instant Messenger ID
- E-mail address
- Display name
- Real name
- List of Bungie games played
- Signature (used in Forums)
- Avatar (used in Forums)
- Xbox Live settings
Passport integration is required for integration with Xbox Live statistics and general access to personalized areas of the site.
- Security Review: The Bungie team employed the services of an outside organization to perform a security review/analysis on the Bungie.net Web site and associated services.
- Operational Security: The Bungie.net operations team continually watches for security threats during ongoing maintenance and support. The data center that hosts the Bungie.net servers is responsible for monitoring and detecting specific attacks such as Denial of Service and has facilities in place to throttle and block traffic attempting to exploit this common vulnerability.
- Internal Code Review Process: The Bungie.net team met continually to conduct code reviews. During these times only particular threats were targeted. More common threats, such as exploits that leverage buffer overruns, are largely eliminated by the managed code model of the .NET Framework. The design of the Bungie.net Web site also makes it generally difficult to exploit for the following reasons:
- Minimal level of end-user data entry. Virtually all new data comes into the site in the form of profile information, group interactions (such as news), forums, or statistical data. All discrete data entry forms, such as those used during the configuration of user profile information, validate all data prior to any persistence operations.
- Bungie controls the process of Xbox Live interaction. The architecture of the Bungie.net site physically restricts and controls access to the set of Xbox Live services used to obtain statistical information and to query Xbox Live data for overall game statistics and leaderboard information.
- Microsoft SQL Server stored procedures and parameterized queries are used for all data access, which minimizes the likelihood of SQL injection attacks.
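The protection that parameterized queries give against SQL injection, mentioned in the last point above, can be shown with a small contrast. This is a sketch in Python using an in-memory SQLite database rather than SQL Server; the `profiles` table, its sample rows, and both functions are hypothetical.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE profiles (display_name TEXT, location TEXT)")
db.executemany(
    "INSERT INTO profiles VALUES (?, ?)",
    [("PlayerOne", "Seattle"), ("PlayerTwo", "Chicago")],
)


def find_profile(display_name):
    # The ? placeholder sends the value separately from the SQL text,
    # so it is never parsed as SQL.
    return db.execute(
        "SELECT display_name, location FROM profiles WHERE display_name = ?",
        (display_name,),
    ).fetchall()


def find_profile_unsafe(display_name):
    # Anti-pattern shown for contrast only: the input becomes part of the SQL.
    return db.execute(
        "SELECT display_name, location FROM profiles "
        f"WHERE display_name = '{display_name}'"
    ).fetchall()


attack = "x' OR '1'='1"
print(find_profile(attack))               # [] because the input is a literal name
print(len(find_profile_unsafe(attack)))   # 2 because every row matches
```

The same principle applies to stored procedures: the caller supplies typed parameter values, and the query text itself never changes.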
The Bungie.net team used an agile, milestone-driven approach to the construction of Bungie.net, initially focusing their attention on the most significant architectural elements. The team worked in close physical proximity and relied upon constant communication between peers, bringing in additional expertise when required, specifically during the requirements and design phases and later when significant technical challenges were uncovered. The Bungie.net developers also worked extremely closely with the testers to ensure effective communication and fast defect resolution. Having a strong test team was essential to making this site work.
Most of the project was treated like a research project and relied upon continual and gradual refactoring of code and design as the overall requirements and design were realized throughout the .NET Framework learning curve.
Tools used during the construction of Bungie.net include:
- Microsoft Visual Studio .NET 2003
- Microsoft Application Center Test (ACT): Used to perform load testing
- Microsoft SQL Server 2000
- Microsoft Internet Explorer 6.0
- BeyondCompare 2.0 (www.scootersoftware.com): Used to compare files and folders and to aid in folder synchronization and change management
- SQLCompare (www.Red-Gate.com): Used to compare and synchronize Microsoft SQL Server database structures
- Microsoft Visio Professional: Used to model site navigation
- Source Depot: Microsoft-internal source control system
- Microsoft Word 2003: Used to record specifications and requirements
- Mercury SiteScope for load testing
Additional facts regarding the Bungie.net development approach:
- Source Control: The team maintains three distinct Source Depot databases
- Development Depot: This depot is used to house all source code that is considered "in-development."
- Pre-Production Depot: This depot is a staging area for code that must be tested prior to deploying to a production stage.
- Production Depot: This depot contains the exact source code of the Bungie.net site in production.
- Scripts are used to transfer code from one depot to the next (from Pre-Production to Production). These scripts run a DIFF operation on the source and destination depots to create a change list. From this change list the script can determine exactly what files need to be transferred between the source and destination depots. A tool called BeyondCompare was also used extensively to transfer code from the Development to Pre-Production depots.
- The developers made extensive use of source labelling to manage interim and major code releases within source control.
- Managing Builds:
- Code at the developer level was built manually through the Visual Studio.NET IDE.
- All production-level builds are built via an automated build process and work off the Production Depot. After code has been built off of Production code, scripts automatically deploy the compiled solution to test servers for testing.
- Managing Deployments:
- Non-development deployments were performed with scripts making extensive use of the XCopy deployment features of .NET Framework applications in pre-production test environments.
- The modular nature of Bungie.net allowed for deployments that did not cause any downtime of the Bungie.net main site. Prior to deployment, the features being upgraded would be "turned off" through configuration files. Deployment scripts were able to perform deployment activities and then re-enable the features once complete.
- The deployment to the Production environment is slightly more complicated compared with pre-production and testing environments. Once built using the Gold version of the code from the Production Depot, compiled artifacts are copied to a shared directory where they are then copied to the appropriate production servers through file replication.
- Learning the Microsoft .NET Framework:
- The majority of the team had little to no experience developing solutions on the .NET Framework or in C#.
- The team was comprised of developers and it did not take much effort for them to learn the C# syntax as well as the .NET Framework itself.
- A great deal of effort was spent at the beginning of the project learning and understanding how the .NET Framework would be used to eventually design and construct Bungie.net.
- Defect and Requirements Tracking: The team used a Microsoft internal tool called Product Studio to track all defects and system requirements. This tool also facilitated collaboration by supporting task assignments and reports on team progress.
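The depot-to-depot transfer described under Source Control above (diff the two depots, build a change list, copy only what changed) can be sketched in a few lines. This is an illustration in Python that compares plain folders by content hash; it does not use Source Depot commands, and the `change_list` and `transfer` functions and the sample file names are hypothetical. It also ignores deletions, which a real promotion script would need to handle.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def digest(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()


def change_list(source: Path, dest: Path) -> list[str]:
    """Relative paths that are new or different in `source` compared with `dest`."""
    changed = []
    for src_file in sorted(p for p in source.rglob("*") if p.is_file()):
        rel = src_file.relative_to(source)
        dst_file = dest / rel
        if not dst_file.exists() or digest(src_file) != digest(dst_file):
            changed.append(rel.as_posix())
    return changed


def transfer(source: Path, dest: Path) -> list[str]:
    """Copy only the files on the change list and return that list."""
    changes = change_list(source, dest)
    for rel in changes:
        target = dest / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source / rel, target)
    return changes


# Tiny demonstration with two temporary folders standing in for depots.
root = Path(tempfile.mkdtemp())
pre, prod = root / "preprod", root / "prod"
pre.mkdir()
prod.mkdir()
(pre / "stats.cs").write_text("v2")
(pre / "forums.cs").write_text("v1")
(prod / "stats.cs").write_text("v1")
(prod / "forums.cs").write_text("v1")

print(transfer(pre, prod))      # ['stats.cs']
print(change_list(pre, prod))   # []
```

Building the change list first, and copying second, gives the operator a record of exactly what was promoted.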
The Bungie.net testing process can be broken down into the following test cycles:
- Developer Testing: Developers have the ability to run a copy of the Web site on their local development machines and can perform developer-level builds and testing at any point. Developers would regularly check code out of the Development Source Depot Database, make appropriate changes or additions, and then smoke test the functionality on their local machines prior to promoting any changes. Developers used a shared development/test database instead of managing local development databases.
- Functional Testing: When developers completed and tested their functionality and checked changes back into the Development Source Depot, a combination of manual intervention and scripts was used to transfer all new development changes to the Pre-Production Source Depot, where binaries were built and deployed to the test servers. It is here that the testing team performed their functional testing of the solution and recorded defects in Product Studio. Functional testing was usually broken into smart testing, where only the modified features for a particular build were tested, and full testing, where all features of the site were tested and validated.
- Stress Testing: This form of testing was conducted on an entirely different set of test servers that did not participate in the scripted automated build and deploy process. This site was referenced through an alternative URL and tested with tools such as Microsoft Application Center Test and Microsoft Performance Monitor.
The entire team relied heavily on Microsoft's internal project collaboration tool called Product Studio for all defect tracking, triages, task assignments, and overall tracking and reporting. In addition, developers spent time developing their own testing scripts that were used for both unit and functional testing.
The development of Bungie.net was not without its challenges. Midway through the deployment process a memory leak was discovered when considerable load was placed on the production Web site. Significant effort was made to determine where the memory leak was coming from, using tools such as WinDbg with the SOS extension to allow debugging of managed code. Isolated stress testing by the test team easily reproduced the error, and after 2 days of effort by one developer the problem was found. The memory leak was not caused by any bug in the .NET Framework but by a bug in application logic that managed a queue of jobs. The developers used a multithreaded approach to execute jobs that resided in the queue and found that under load conditions the threads would enter a deadlock state. This deadlock state meant that no job items were being removed from the queue, and hence the queue grew and continued to consume managed memory.
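The failure mode described above, a queue that grows without limit once its consumers stop, can be made visible with a bounded queue. The sketch below is in Python and is only an illustration: the case study does not say how the team fixed the bug, and the `submit` and `worker` functions, the queue sizes, and the timeouts are assumptions. A bound does not prevent a deadlock, but it turns silent memory growth into an explicit, loggable failure.

```python
import queue
import threading


def submit(q, job, timeout=1.0):
    """Queue a job; return False instead of blocking forever when the queue is full."""
    try:
        q.put(job, timeout=timeout)
        return True
    except queue.Full:
        return False   # the caller can log or raise an alert here


def worker(q, results):
    while True:
        job = q.get()
        if job is None:            # sentinel value: stop this worker
            return
        results.append(job * 2)    # hypothetical job body


# Healthy case: four workers drain the bounded queue as fast as it fills.
jobs = queue.Queue(maxsize=100)
results = []
threads = [threading.Thread(target=worker, args=(jobs, results)) for _ in range(4)]
for t in threads:
    t.start()
accepted = [submit(jobs, n) for n in range(1000)]
for _ in threads:
    jobs.put(None)                 # one sentinel per worker
for t in threads:
    t.join()

# Stalled case: no worker is consuming, as if every thread were deadlocked.
stalled = queue.Queue(maxsize=3)
outcome = [submit(stalled, n, timeout=0.01) for n in range(5)]
print(all(accepted), len(results))   # True 1000
print(outcome)                       # [True, True, True, False, False]
```

In the stalled case the producer learns within the timeout that jobs are not being consumed, which is the kind of signal monitoring can act on.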
All the servers that run Bungie.net are managed by a separate data center located in California. The operations team spent 2 months preparing a stable and scalable infrastructure environment to house the resulting Bungie.net solution. The data center provides substantial network redundancy and automatic failover capabilities. The data center also monitors overall network load and traffic to ensure optimal performance through peak periods, such as during the release of Halo 2, and ongoing.
The Bungie.net system runs almost exclusively on HP Servers with the following configurations:
- Bungie.net Web Servers:
- Dual Processor with Hyper-threading
- 4 GB of RAM
- Halo 2 Stats Database Server
- Quad Processor with Hyper-threading
- 4 GB of RAM
- Forums Database Server
- Dual Processor with Hyper-threading
- 4 GB of RAM
- Website Database Server
- Active/Passive SQL Server Cluster
- Dual Processor with Hyper-threading
- 4 GB of RAM
- Halo 2 LSP Servers
- Dual Processor with Hyper-threading
- 1 GB of RAM
The Bungie.net operations team based all of their operational and readiness procedures on the Microsoft Operations Framework. Additionally, leading up to the release of Halo 2, the operations team worked closely with the Bungie.net development team to aid in the architecture and design of the server farm, outline dependencies of Bungie.net, to create policies and escalation procedures for the entire team, and to define the initial and ongoing monitoring and deployment strategies for all the servers and software.
As expected, the weeks following the release of Halo 2 proved to be critical, and because of this the operations team staffed a tier-three support team of system engineers around the clock (3 shifts per day, 7 days a week). During the first several weeks this operational team provided 3 operation reports per shift that detailed overall site performance, availability statistics, and game statistics upload summaries. All systems were closely monitored for performance and stability, and in the event of failure or unanticipated load peaks the operational team was ready to react, following their predetermined policies and emergency procedures.
In the weeks that followed the release of Halo 2, load on the servers began to stabilize and the operations team shifted its focus to ongoing support of the site, including occasional software updates and continual load monitoring. To make the most effective use of hardware, the operations team will also be responsible for scaling back the server farm if the load no longer demands the resources. Additionally, the operations team will be involved in automating the game statistics archival process to manage the ongoing load that game statistics place on the storage systems.
"Without the ease of .NET we wouldn't have even started" . . . (Zach Russell, Engineering Lead)
The Bungie.net team achieved all of its goals and exceeded their own expectations of Bungie.net. In summary:
- The Bungie.net site was able to manage the load of the release of Halo 2 without any major problems or downtime.
- The availability of the Web site has exceeded the 99 percent uptime goal.
- The Bungie.net site is extremely modular and service-oriented, allowing for ease of distribution and scale.
- Over a month after the release of Halo 2, Bungie.net continues to log an average of nearly 1,000,000 online games per day.
- The Service-Oriented Architecture approach taken by the team during the design and construction of Bungie.net, combined with the use of XML throughout, has led to a solid and reusable design that has withstood load and provided the flexibility needed to extend and enhance in the future.
- The consumption of the RSS features of Bungie.net continues to be ground-breaking as online gamers use this data in inventive and interesting ways that the Bungie.net team could not have predicted when it conceived of the feature. You can read more about this on Bungie News.
- Archiving of Statistics Data: One of the growing concerns is the amount of data gathered from Xbox Live in the form of game play statistics. At the time of this case study no archival strategy was in place to deal with this. To manage this load, game details will be retained only for games played within the previous 2 weeks. All overall game statistics, player ranking, and game summary information will still be available.
- Scaling back front-end Web servers: During the launch of Halo 2 it was important that the Bungie.net solution could absorb over 5 million page views per day, and for this reason it used 9 front-end Web servers to support the load. At the time this case study was conducted the load had leveled off to almost 3 million page views per day and required only 7 front-end Web servers. This reduction in load will allow the Bungie.net operations team to further scale down the number of front-end Web servers required to serve up content to users.
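The retention rule described in the archiving point above (keep detailed game data for 2 weeks, keep summaries indefinitely) could be expressed as a periodic purge. The sketch below is in Python with an in-memory SQLite database standing in for the stats database; the `game_summary` and `game_detail` tables, their columns, and the `purge_old_details` function are hypothetical, since the case study notes that no archival process existed yet.

```python
import sqlite3
from datetime import datetime, timedelta

RETENTION = timedelta(days=14)   # the 2-week window for detailed game data

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE game_summary (game_id INTEGER PRIMARY KEY, played_at TEXT)")
db.execute("CREATE TABLE game_detail (game_id INTEGER, event TEXT)")


def purge_old_details(now: datetime) -> int:
    """Delete detail rows for games older than the retention window; keep summaries."""
    cutoff = (now - RETENTION).isoformat()
    cur = db.execute(
        """DELETE FROM game_detail
           WHERE game_id IN (SELECT game_id FROM game_summary WHERE played_at < ?)""",
        (cutoff,),
    )
    return cur.rowcount


# Three sample games played 30, 13, and 1 day before a fixed "now".
now = datetime(2005, 1, 1)
for game_id, age_days in [(1, 30), (2, 13), (3, 1)]:
    db.execute(
        "INSERT INTO game_summary VALUES (?, ?)",
        (game_id, (now - timedelta(days=age_days)).isoformat()),
    )
    db.executemany(
        "INSERT INTO game_detail VALUES (?, ?)",
        [(game_id, "kill"), (game_id, "flag capture")],
    )

print(purge_old_details(now))   # 2 (both detail rows of the 30-day-old game)
```

Keeping summaries in a separate table from details is what lets the purge remove the bulk of the data while rankings and game summaries stay available.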
Bungie.net was built entirely on Microsoft .NET Framework technologies and clearly demonstrates a well-architected and highly scalable Web solution. Bungie.net was constructed by only a handful of developers and testers in a relatively short period of time using an agile software development approach that also took into consideration the time and effort required to master the Microsoft .NET Framework development platform. Bungie.net was not only built on best practices; the Web site's operational practices are also based on the Microsoft Operations Framework, which is itself based on ITIL (IT Infrastructure Library). Together, these facts have enabled Bungie not only to break new ground in the online gaming community but also to surpass all expectations of functionality, delivery time, reliability, and scalability.
See you online.