MongoDB is an open source, document-oriented database designed with both scalability and developer agility in mind. In this article, I introduce both MongoDB and Windows Azure, and then explore the technical details and benefits of deploying MongoDB to the Microsoft cloud. I also address the pros and cons of using one of two possible Windows Azure cloud services—Windows Azure cloud service worker roles and Windows Azure virtual machines (VMs) with MongoDB.
Figure 1 Sample document stored in MongoDB
Nor does MongoDB use SQL syntax for queries. Instead, BSON enables developers to map easily to object-oriented languages.
Many types of use cases are viable for MongoDB, including those that have data storage with schema-less or schema-light source data. This type of data is sometimes called unstructured or semi-structured. Some real-world examples of applications that are built on this type of data include log files, health records and manufacturing event data. This flexibility in schema simplifies application development and reduces the impedance mismatch between .NET objects and persisted data that sometimes occurs when using a traditional relational database management system (RDBMS).
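To make the schema-light idea concrete, here is a minimal sketch in plain Python. It does not use a MongoDB driver: dictionaries stand in for BSON documents and a list stands in for a collection. The field names and the `find` helper are hypothetical and exist only for illustration.

```python
# Illustrative only: plain dicts stand in for BSON documents and a list
# stands in for a collection. No MongoDB driver is involved.
events = [
    {"_id": 1, "type": "log", "level": "error", "message": "disk full"},
    {"_id": 2, "type": "health", "patient": "A-17", "vitals": {"pulse": 72}},
    {"_id": 3, "type": "log", "level": "info", "message": "started", "host": "web01"},
]


def find(collection, query):
    """Return documents whose top-level fields equal every key/value in query."""
    return [
        doc for doc in collection
        if all(key in doc and doc[key] == value for key, value in query.items())
    ]


# Documents in the same collection do not need to share the same fields.
log_entries = find(events, {"type": "log"})
print([doc["_id"] for doc in log_entries])
```

Note that the third document carries a `host` field that the first one lacks; nothing has to be migrated or declared up front for that to work.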
MongoDB also scales easily, through sharding and replica sets. In MongoDB, sharding is the partitioning of data among multiple machines while preserving the data in order, which allows for quick and easy scalability. Replica sets are a form of asynchronous master/slave replication that allow for high availability via automatic failover and automatic recovery of member nodes.
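The failover behavior can be sketched as a simple majority rule: a replica set can promote a new primary only while a strict majority of its voting members is reachable. The function below is a simplified model with hypothetical node names, not the actual election protocol, which also weighs member priority and how current each member's data is.

```python
def elect_primary(members):
    """Pick a new primary from the reachable members, or None without a majority.

    `members` maps a hypothetical node name to True (reachable) or False.
    """
    reachable = sorted(name for name, is_up in members.items() if is_up)
    if len(reachable) * 2 <= len(members):
        return None  # no strict majority, so no primary can be elected
    # Placeholder choice: a real election considers priority and data freshness.
    return reachable[0]


# A three-member set survives the loss of one node but not two.
print(elect_primary({"node1": False, "node2": True, "node3": True}))
print(elect_primary({"node1": False, "node2": False, "node3": True}))
```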
MongoDB was built for virtualized and cloud environments. Cloud services, such as Windows Azure, are a natural fit for MongoDB. By coupling the easy-to-scale architecture (for example, through sharding and replica sets) of MongoDB and the elastic cloud capacity of Windows Azure, you can quickly and easily grow your infrastructure as application demands increase.
If you’re new to MongoDB, you can find many resources on both the main MongoDB site and other sites. In particular, you might want to download the free quick reference cards from the 10gen site. These cards include information about setup, commands, queries, indexing and T-SQL-to-MongoDB language mapping. On YouTube, you can also watch the set of five-minute screencasts I made on getting started with MongoDB.
NOTE 10gen is a company that not only develops MongoDB but also provides support and services, such as training for MongoDB. Among other community activities, 10gen supports local MongoDB user groups. 10gen also offers free online courses, available from its learning portal.
If you want to try MongoDB without installing anything, you can use an online interactive tutorial to get a sense of how to work with MongoDB from the command line. Figure 2 shows an example of the interactive Web interface for trying out MongoDB.
Figure 2 Trying out MongoDB from the command line
MongoDB novices can also grab one of the many GUI tools available to get up and running with simple administration quickly and in a familiar fashion. The tool I’ve been using is MongoVUE, which has a look and feel similar to SQL Server Management Studio. There is a free version of MongoVUE, but I prefer the licensed version (which costs $35 for a single user) because I use the included features. I particularly like the simple import from SQL Server that’s included in this version. A complete list of features shown via screenshots is on the MongoVUE Web site. Figure 3 shows the tree view of the server, databases, collections, indices and so on from a MongoDB server.
Figure 3 Tree view of MongoDB server in MongoVUE
When thinking about using MongoDB either on premise or in the cloud, you need to understand the basics of how to implement scaling with MongoDB. A sample configuration that uses both sharding and replica sets will help illustrate.
In MongoDB, within a particular collection of BSON documents, an individual document is limited in size to 16 MB. MongoDB also includes support for storing files larger than the document limit via GridFS, which splits a file into chunks and stores each chunk as a separate document. Files stored through GridFS can also be sharded.
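The chunking idea behind GridFS can be sketched in a few lines of plain Python. This is a simplified model, not the GridFS implementation: the chunk size below is an assumption chosen for the example, and the real driver also stores a metadata document for each file.

```python
MAX_DOCUMENT_BYTES = 16 * 1024 * 1024  # per-document size limit (16 MB)
CHUNK_BYTES = 256 * 1024               # assumed chunk size for this sketch


def split_into_chunks(data, chunk_size=CHUNK_BYTES):
    """Split a bytes payload into numbered chunks, each small enough to store."""
    return [
        {"n": index, "data": data[offset:offset + chunk_size]}
        for index, offset in enumerate(range(0, len(data), chunk_size))
    ]


payload = bytes(20 * 1024 * 1024)  # 20 MB of zeros: too large for one document
chunks = split_into_chunks(payload)

print(len(payload) > MAX_DOCUMENT_BYTES)
print(len(chunks))
```

Because each chunk is an ordinary document, the chunks can be distributed across shards like any other data.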
Figure 4 shows the high-level architecture of a scaled MongoDB solution. Along the top are four shards, each backed by a distinct replica set. Each shard is a subset of the data, split via a shard key, similar to table partitioning in an RDBMS. For example, shard 1 in replica set A contains three copies of data documents with ID values 1–50; shard 2 in replica set B contains three copies of data documents 51–100; and so on. Each replica set consists of one primary node and (in this case) two secondary nodes, which serve as failover nodes in case the primary node goes down. You can configure up to seven voting replica set members.
Figure 4 Scaling MongoDB
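Range-based routing of this kind can be sketched as a lookup against the upper bound of each shard's key range. The ranges for replica sets A and B follow the example above; the ranges for C and D are assumptions that extend the same pattern, and the `route` function is a hypothetical stand-in for the routing that MongoDB performs for you.

```python
from bisect import bisect_left

# Upper bound of each shard's ID range, paired by position with its replica set.
# A and B match the example in the text; C and D are assumed for illustration.
SHARD_UPPER_BOUNDS = [50, 100, 150, 200]
REPLICA_SETS = ["A", "B", "C", "D"]


def route(doc_id):
    """Return the replica set that owns doc_id under range-based sharding."""
    if not 1 <= doc_id <= SHARD_UPPER_BOUNDS[-1]:
        raise ValueError(f"no shard owns ID {doc_id}")
    return REPLICA_SETS[bisect_left(SHARD_UPPER_BOUNDS, doc_id)]


for doc_id in (1, 50, 51, 175):
    print(doc_id, route(doc_id))
```

Because the data stays ordered by shard key, a range query such as "IDs 40 through 60" only needs to touch the shards whose ranges overlap it.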
The 2.2 release (from September 2012) of MongoDB has several substantial updates, including an aggregation framework (which provides a simpler way to write and execute queries), concurrency improvements (which include the ability to take locks at more granular levels than that of the entire database), an updated C# driver and more. You can think of the new aggregation capability as a simpler way to create Map/Reduce jobs. Many new operators, shown in Figure 5, support this capability.
Figure 5 Aggregation operators added to MongoDB 2.2
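An aggregation pipeline is an ordered list of stages, each of which transforms the documents passed to it. The sketch below shows a pipeline in the dictionary form that MongoDB drivers accept, together with a toy evaluator written in plain Python so the example runs without a server. The evaluator is hypothetical and handles only `$match` on equality and `$group` with a `{"$sum": 1}` counter; the sample documents are invented.

```python
from collections import defaultdict

# A two-stage pipeline: keep only log documents, then count them per level.
pipeline = [
    {"$match": {"type": "log"}},
    {"$group": {"_id": "$level", "count": {"$sum": 1}}},
]


def run_pipeline(docs, stages):
    """Toy evaluator: supports equality $match and counting $group only."""
    for stage in stages:
        if "$match" in stage:
            criteria = stage["$match"]
            docs = [d for d in docs
                    if all(d.get(k) == v for k, v in criteria.items())]
        elif "$group" in stage:
            key_field = stage["$group"]["_id"][1:]  # strip the leading "$"
            counts = defaultdict(int)
            for d in docs:
                counts[d.get(key_field)] += 1
            docs = [{"_id": key, "count": n} for key, n in counts.items()]
        else:
            raise NotImplementedError(f"unsupported stage: {stage}")
    return docs


sample = [
    {"type": "log", "level": "error"},
    {"type": "log", "level": "info"},
    {"type": "log", "level": "error"},
    {"type": "health", "level": "info"},
]
print(run_pipeline(sample, pipeline))
```

The equivalent Map/Reduce job would need separate map and reduce functions; the pipeline expresses the same grouping declaratively.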
As of release 2.2, MongoDB also includes a new monitoring service called MMS (for MongoDB Monitoring Service). This service is a cloud-hosted dashboard service that shows you the status of your MongoDB cluster. New monitoring commands include db.currentOp and db.serverStatus.
In addition, MongoDB 2.2 supports tag-aware sharding, which allows for more finely grained control over sharding configurations, such as the ability to geo-tag particular shards hosted in cloud data centers. This release also introduces TTL (time-to-live) collections. Documents in these collections expire, and are removed automatically, after a configurable time value. For more details about the improvements in the MongoDB 2.2 release, visit mongodb.org/display/DOCS/v2.2.
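The expiry rule for a TTL collection can be modeled as a cutoff on a date field. The sketch below is a plain Python simulation, not server code: the `purge_expired` helper and the `createdAt` field name are hypothetical, while `expire_after_seconds` mirrors the `expireAfterSeconds` index option that configures the time value in MongoDB.

```python
from datetime import datetime, timedelta


def purge_expired(docs, field, expire_after_seconds, now):
    """Return only the documents whose date field is newer than the cutoff."""
    cutoff = now - timedelta(seconds=expire_after_seconds)
    return [doc for doc in docs if doc[field] > cutoff]


now = datetime(2012, 9, 1, 12, 0, 0)
sessions = [
    {"_id": 1, "createdAt": now - timedelta(minutes=90)},  # older than one hour
    {"_id": 2, "createdAt": now - timedelta(minutes=10)},  # still fresh
]

remaining = purge_expired(sessions, "createdAt", 3600, now)
print([doc["_id"] for doc in remaining])
```

In MongoDB itself a background task performs this removal periodically, so an expired document can linger briefly before it disappears.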
Before considering how to implement MongoDB on Windows Azure, let’s look at the Windows Azure cloud services that apply to cloud-based scenarios. When implementing MongoDB on the Microsoft cloud, you can consider a couple of types of Windows Azure service architectures and products for your deployment.
Windows Azure VMs are the Microsoft Infrastructure as a Service (IaaS) offering. Similar to the Amazon Elastic Compute Cloud (EC2), Windows Azure VMs give you access to elastic, on-demand virtual servers. You can use preconfigured VMs with either Windows or Linux, configure the VMs based on your own preferences or the specific needs of your application, or build and deploy VMs from scratch. You manage the VMs yourself, including scaling, installing security patches, and ongoing performance monitoring and management. Windows Azure VMs also give you some control over their environments. (This VM service is currently in preview (beta) for Windows Azure.)
After you install MongoDB on a Windows Azure VM, you can develop MongoDB applications in any language MongoDB supports. Supported languages include .NET languages, Java, Node.js, PHP and more. You can deploy developed MongoDB applications to any location, including public clouds such as Windows Azure or Amazon Web Services (AWS), and on any private cloud.
Windows Azure worker roles are the Microsoft Platform as a Service (PaaS) offering. Worker roles provide prebuilt, preconfigured instances of computing power. In contrast to Windows Azure VMs, you don’t have to configure or manage Windows Azure worker roles. Windows Azure handles the deployment details, from provisioning and load balancing to health monitoring for continuous availability. Using worker roles can help developers who prefer not to manage their applications at the infrastructure level. The trade-off is that the level of control users have over their environments is restricted. Windows Azure Web roles are also part of the Microsoft PaaS offering. They are used to host applications such as Web sites. Another consideration when using Windows Azure Cloud Services is the ability to use fault and upgrade domains, which I talk more about in the next section.
Windows Azure storage containers are one of several data storage choices in the Windows Azure suite. They are inexpensive and easy to use for both application data storage and data backups. Windows Azure storage scales automatically and without user involvement. The container limits are 100 TB per storage account, 1 MB per entity and 255 properties per entity.
The Windows Azure services mentioned in this article are just a subset of all possible Windows Azure services. I have only included services of interest for core MongoDB deployments. You might want to use additional cloud services, such as Windows Azure Media Services, in your particular application architecture.
Given that MongoDB runs on either Windows Azure VMs (IaaS) or Windows Azure worker roles and storage containers (PaaS), you need to consider the different capabilities and implementation details of each type of service in determining which product makes the most sense for your application.
After requesting (and being granted) access to the preview functionality for Windows Azure VMs, you can install MongoDB on your particular Windows Azure instances. Alternatively, you can use the recently released MongoDB Installer for Windows Azure to set up a MongoDB replica set quickly and easily on Windows Azure VMs. This tool is available for download at mongodb.org/display/DOCS/MongoDB+on+Azure+VM+–+Windows+Installer.
This installer is built on top of Windows PowerShell and the Windows Azure command line tool. It contains a number of deployment scripts, as shown in Figure 6. The tool is designed to help you get either single-node or multinode MongoDB configurations up and running quickly on Windows Azure.
NOTE The MongoDB Installer for Windows Azure is designed to run on a user’s local machine—not directly on a Windows Azure VM—and then to deploy the output to Windows Azure VMs. The current build of preconfigured Windows Azure VMs with Windows doesn’t allow you to run PowerShell scripts on those VMs, so this installer won’t work if you attempt to run it directly on a preconfigured Windows Azure VM.
When using the MongoDB Installer for Windows Azure, installing and configuring a MongoDB replica set on Windows Azure VMs requires only two steps. To start the process, you should first download the publish settings file from the preceding download link. Next, you should run the installer from the command prompt. Figure 6 shows the files created by unzipping the downloaded deployment wizard.
Figure 6 Deployment files from the MongoDB Installer for Windows Azure deployment wizard
Using a Windows Azure VM, you also have the option to host your MongoDB instance on an OS other than Windows. In the current preview, you can create your own VMs or create a VM instance from one of several preinstalled OS configurations, which include various versions of Windows or Linux, as shown in Figure 7. The Windows Azure portal includes a step-by-step guide to installing MongoDB on a Windows Azure VM running Linux.
Figure 7 Windows Azure VM OS selection options
There are pros and cons to deploying MongoDB on Windows Azure VMs. These advantages and disadvantages are generally consistent with those of deploying solutions on IaaS more generally.
Using MongoDB on Windows Azure VMs has the following advantages:
Here are some disadvantages of using an IaaS implementation:
You can also select an alternate Windows Azure configuration for MongoDB. In this configuration, MongoDB servers run as Windows Azure worker roles and use Windows Azure storage containers for data storage. You can download and use a package (the MongoDB Replica Set Azure wrapper) to quickly set up this configuration. To see a detailed list of setup steps, visit mongodb.org/display/DOCS/MongoDB+on+Azure+Worker+Roles.
To perform the setup using the steps in the document just referenced, you must first download the MongoDB Replica Set Azure wrapper package from GitHub. Then you use the downloaded package to create the MongoDB replica set members. Each replica set member runs as a separate Windows Azure worker role instance. As mentioned earlier, MongoDB data files are stored in a Windows Azure storage instance that is mounted as a cloud drive.
The MongoDB Azure wrapper is delivered as a Microsoft Visual Studio 2010 solution with associated source files. This package contains a Visual Studio solution with the MongoDB binaries, sample code and sample applications. The package is designed to work with 64-bit versions of Windows with Internet Information Services and the June 2012 Windows Azure SDK installed. By default, the MongoDB C# driver is included as part of the package. You can also use other application languages, such as PHP and Node.js; you simply need to download those drivers.
You can also choose to configure Windows Azure upgrade domains manually in this deployment type. Fault and upgrade domains affect application availability and fault tolerance. Fault and upgrade domains are defined as follows:
The disadvantages to this configuration are as follows:
Figure 8 shows an application that leverages both Windows Azure services and MongoDB. In addition to the two routing daemons shown in green as “mongos” (the MongoDB routing service for sharded clusters, as distinct from “mongod,” the daemon that stores data), notice the five worker role instances. One worker role hosts the MongoDB config servers, and each of the other four worker roles hosts a replica set. Each replica set hosts one of the four data shards. Within each replica set are one primary and two secondary instances of mongod. Not shown on the diagram is the fact that each replica set connects to an instance (endpoint) of Windows Azure container storage. In this scenario, each instance of cloud storage would be mounted as a Windows Azure cloud drive.
Figure 8 Scaling MongoDB on Windows Azure
Figure 8 is just one sample configuration. MongoDB and Windows Azure services offer many options around scalability and availability, depending on your budget and application requirements.
Windows Azure provides an attractive cloud platform for hosting MongoDB. Table 1 summarizes the strengths and weaknesses of using Windows Azure worker roles and Windows Azure VMs with MongoDB.
Table 1 Pros and Cons of Windows Azure Worker Roles and Windows Azure VMs
|IaaS – Windows Azure VMs|
|PaaS – Windows Azure Cloud Services Worker Roles|
Given the ease of use, flexibility and scalability of MongoDB, it’s a natural choice for developers. You can quickly implement scalable, reliable and agile database solutions with MongoDB on Windows Azure.
Lynn Langit is a consultant, author and trainer for DevelopMentor on SQL Server 2012 topics. She is also a MongoDB Master. Prior to her current work, she founded and served as lead architect of a development firm that created BI solutions and was a developer evangelist for the Microsoft MSDN team for four years.