MongoDB is an open source, document-oriented database designed with both scalability and developer agility in mind. In this article, I introduce both MongoDB and Azure, and then explore the technical details and benefits of deploying MongoDB to the Microsoft cloud. I also address the pros and cons of using one of two possible Azure cloud services—Azure cloud service worker roles and Azure virtual machines (VMs) with MongoDB.
Figure 1 Sample document stored in MongoDB
MongoDB does not use SQL syntax for queries. Instead, its BSON (binary JSON) document format enables developers to map data easily to objects in object-oriented languages.
Many types of use cases are viable for MongoDB, including those that have data storage with schema-less or schema-light source data. This type of data is sometimes called unstructured or semi-structured. Some real-world examples of applications that are built on this type of data include log files, health records and manufacturing event data. This flexibility in schema simplifies application development and reduces the impedance mismatch between .NET objects and persisted data that sometimes occurs when using a traditional relational database management system (RDBMS).
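To make the object-to-document mapping concrete, here is a minimal sketch in plain Python. It does not use a MongoDB driver; the ManufacturingEvent and Reading classes and their fields are hypothetical, chosen only to show how an object graph can become a nested, JSON-like document and back again, even when some fields are absent.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import List


@dataclass
class Reading:
    sensor: str
    value: float


@dataclass
class ManufacturingEvent:
    # Hypothetical fields, for illustration only.
    machine_id: str
    status: str
    readings: List[Reading] = field(default_factory=list)


def to_document(event: ManufacturingEvent) -> dict:
    """Convert the object graph into a nested, JSON-like document."""
    return asdict(event)


def from_document(doc: dict) -> ManufacturingEvent:
    """Rebuild the object; a missing 'readings' field is tolerated."""
    readings = [Reading(**r) for r in doc.get("readings", [])]
    return ManufacturingEvent(doc["machine_id"], doc["status"], readings)


event = ManufacturingEvent("press-7", "running", [Reading("temp_c", 71.5)])
doc = to_document(event)
print(json.dumps(doc, indent=2))

# Schema-light source data may omit fields entirely; the mapping still works.
sparse = from_document({"machine_id": "press-8", "status": "idle"})
```

Because the document mirrors the shape of the object, no join tables or column mappings are needed, which is the reduction in impedance mismatch described above.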
MongoDB also scales easily, through sharding and replica sets. In MongoDB, sharding is the partitioning of data among multiple machines in an order-preserving manner, which allows for quick and easy scalability. Replica sets are a form of asynchronous master/slave replication that allows for high availability via automatic failover and automatic recovery of member nodes.
MongoDB was built for virtualized and cloud environments. Cloud services, such as Azure, are a natural fit for MongoDB. By coupling the easy-to-scale architecture (for example, through sharding and replica sets) of MongoDB and the elastic cloud capacity of Azure, you can quickly and easily grow your infrastructure as application demands increase.
If you’re new to MongoDB, you can find many resources on both the main MongoDB site and other sites. In particular, you might want to download the quick reference cards (which are free) from the 10gen site. These cards include information about setup, commands, queries, indexing and T-SQL-to-MongoDB language mapping. On YouTube, you can also watch the set of five-minute screencasts I made on getting started with MongoDB.
NOTE 10gen is a company that not only develops MongoDB but also provides support and services, such as training for MongoDB. Among other community activities, 10gen supports local MongoDB user groups. 10gen also offers free online courses, available from its learning portal.
If you want to try MongoDB without installing anything, you can use an online interactive tutorial to get a sense of how to work with MongoDB from the command line. Figure 2 shows an example of the interactive Web interface for trying out MongoDB.
Figure 2 Trying out MongoDB from the command line
MongoDB novices can also grab one of the many GUI tools available to get up and running with simple administration quickly and in a familiar fashion. The tool I’ve been using is MongoVUE, which has a look and feel similar to SQL Server Management Studio. There is a free version of MongoVUE, but I prefer the licensed version (which costs $35 for a single user) because I use the included features. I particularly like the simple import from SQL Server that’s included in this version. A complete list of features shown via screenshots is on the MongoVUE Web site. Figure 3 shows the tree view of the server, databases, collections, indices and so on from a MongoDB server.
Figure 3 Tree view of MongoDB server in MongoVUE
When thinking about using MongoDB either on premises or in the cloud, you need to understand the basics of how to implement scaling with MongoDB. A sample configuration that uses both sharding and replica sets helps illustrate these basics.
In MongoDB, within a particular collection of BSON documents, an individual document is limited in size to 16 MB. MongoDB also includes support for storing files larger than the document limit via GridFS. Files stored through GridFS can also be sharded.
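The idea behind GridFS can be sketched in a few lines of plain Python: a file too large for a single document is split into numbered chunk documents that are later reassembled in order. This is not the driver's implementation. The chunk size below is an assumption for the sketch, the payload is a scaled-down stand-in for a large file, and only the chunk field names (files_id, n, data) follow the GridFS convention.

```python
CHUNK_SIZE = 255 * 1024  # assumed chunk size for this sketch, in bytes


def split_into_chunks(file_id, data: bytes, chunk_size: int = CHUNK_SIZE):
    """Return a list of chunk documents, each carrying one slice of the file."""
    return [
        {"files_id": file_id, "n": n, "data": data[offset:offset + chunk_size]}
        for n, offset in enumerate(range(0, len(data), chunk_size))
    ]


def reassemble(chunks) -> bytes:
    """Join the chunk payloads back together in sequence order."""
    return b"".join(c["data"] for c in sorted(chunks, key=lambda c: c["n"]))


payload = bytes(1_000_000)  # stand-in for a file that exceeds the document limit
chunks = split_into_chunks("video-1", payload)
print(len(chunks), "chunks")
```

Because each chunk is an ordinary document keyed by files_id and n, the chunks can be distributed across shards like any other collection.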
Figure 4 shows the high-level architecture of a scaled MongoDB solution. Along the top are four shards, each containing a distinct replica set. Each shard is a subset of the data, split via a shard key, similar to table partitioning in an RDBMS. For example, shard 1 in replica set A contains three copies of data documents with ID values 1–50; shard 2 in replica set B contains three copies of data documents 51–100; and so on. Each replica set consists of one primary node and (in this case) two secondary nodes, which serve as failover nodes in case the primary node goes down. You can configure up to seven voting replica set members.
Figure 4 Scaling MongoDB
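The routing and failover behavior in this architecture can be modeled with a small Python sketch. It is a toy model, not MongoDB code: the ID ranges for the third and fourth shards are assumed to continue the 1–50 and 51–100 pattern, the node names are hypothetical, and the "election" simply promotes the first secondary.

```python
from bisect import bisect_left

# Upper bound of each shard's ID range. Ranges for C and D are assumed.
SHARD_UPPER_BOUNDS = [50, 100, 150, 200]
SHARD_NAMES = ["A", "B", "C", "D"]

# One primary and two secondaries per replica set (hypothetical node names).
REPLICA_SETS = {
    name: {"primary": f"{name}-node1",
           "secondaries": [f"{name}-node2", f"{name}-node3"]}
    for name in SHARD_NAMES
}


def shard_for(doc_id: int) -> str:
    """Route a document ID to the replica set that owns its key range."""
    if not 1 <= doc_id <= SHARD_UPPER_BOUNDS[-1]:
        raise KeyError(f"no shard owns id {doc_id}")
    return SHARD_NAMES[bisect_left(SHARD_UPPER_BOUNDS, doc_id)]


def fail_over(replica_set: dict) -> dict:
    """Promote the first secondary when the primary is lost (toy election)."""
    new_primary, *rest = replica_set["secondaries"]
    return {"primary": new_primary, "secondaries": rest}


print(shard_for(42), shard_for(51))      # which replica sets own these IDs
print(fail_over(REPLICA_SETS["A"]))      # replica set A after losing its primary
```

Keeping the ranges sorted is what makes the lookup a simple binary search, and it is also why range queries on the shard key can be sent to only the shards that hold the relevant range.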
To learn more about replica sets in MongoDB, go to mongodb.org/display/DOCS/Replica+Sets. To find out more about scaling options in MongoDB, see mongodb.org/display/DOCS/Sharding+Introduction.
The 2.2 release (from September 2012) of MongoDB has several substantial updates, including an aggregation framework (which provides a simpler way to write and execute queries), concurrency improvements (which include the ability to take locks at more granular levels than that of the entire database), an updated C# driver and more. You can think of the new aggregation capability as a simpler way to create Map/Reduce jobs. Many new operators, shown in Figure 5, support this capability.
Figure 5 Aggregation operators added to MongoDB 2.2
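To give a feel for how an aggregation pipeline works, the following Python sketch interprets a tiny subset of it in memory: $match with equality only, $group with $sum, and $sort. The pipeline itself uses MongoDB's stage syntax, but run_pipeline is a hypothetical toy interpreter written for this illustration, and the orders data is invented.

```python
from collections import defaultdict


def run_pipeline(docs, pipeline):
    """Apply each stage in order; every stage maps a list of documents to a new list."""
    for stage in pipeline:
        (op, spec), = stage.items()
        if op == "$match":
            # Equality matching only in this sketch.
            docs = [d for d in docs if all(d.get(k) == v for k, v in spec.items())]
        elif op == "$group":
            key_field = spec["_id"].lstrip("$")
            totals = defaultdict(lambda: defaultdict(int))
            for d in docs:
                for out_field, acc in spec.items():
                    if out_field == "_id":
                        continue
                    totals[d[key_field]][out_field] += d[acc["$sum"].lstrip("$")]
            docs = [{"_id": k, **v} for k, v in totals.items()]
        elif op == "$sort":
            # Sort by the last key first so earlier keys take precedence.
            for sort_field, direction in reversed(list(spec.items())):
                docs = sorted(docs, key=lambda d: d[sort_field], reverse=direction < 0)
        else:
            raise ValueError(f"unsupported stage {op}")
    return docs


orders = [
    {"status": "shipped", "region": "east", "amount": 10},
    {"status": "shipped", "region": "west", "amount": 25},
    {"status": "pending", "region": "east", "amount": 99},
    {"status": "shipped", "region": "east", "amount": 5},
]
pipeline = [
    {"$match": {"status": "shipped"}},
    {"$group": {"_id": "$region", "total": {"$sum": "$amount"}}},
    {"$sort": {"total": -1}},
]
result = run_pipeline(orders, pipeline)
print(result)
```

The same filter, group and sort logic written as a Map/Reduce job would need separate map and reduce functions, which is why the pipeline form is usually easier to read.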
As of release 2.2, MongoDB also includes a new monitoring service called MMS (for MongoDB Monitoring Service). This service is a cloud-hosted dashboard service that shows you the status of your MongoDB cluster. New monitoring commands include db.currentOp and db.serverStatus.
In addition, MongoDB 2.2 supports tag-aware sharding, which allows for more finely grained control over sharding configurations, such as the ability to geo-tag particular shards hosted in cloud data centers. This release also introduces TTL (time-to-live) collections. Documents in these collections expire after a configurable time value. For more details about the improvements in the MongoDB 2.2 release, visit mongodb.org/display/DOCS/v2.2.
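The time-to-live behavior can be illustrated with a short in-memory Python sketch. It is not how the server implements expiry; the expire function, the created_at field name and the session data are assumptions made for this example. Documents whose timestamp is older than the configured window are dropped, and documents without the timestamp field are kept.

```python
from datetime import datetime, timedelta


def expire(docs, field, expire_after_seconds, now):
    """Keep documents that lack `field` or whose timestamp is within the TTL window."""
    cutoff = now - timedelta(seconds=expire_after_seconds)
    return [d for d in docs if field not in d or d[field] > cutoff]


now = datetime(2012, 9, 1, 12, 0, 0)
sessions = [
    {"_id": 1, "created_at": now - timedelta(minutes=5)},
    {"_id": 2, "created_at": now - timedelta(hours=2)},
    {"_id": 3},  # no timestamp, so it is never expired in this sketch
]
live = expire(sessions, "created_at", 3600, now)
print([d["_id"] for d in live])
```

This pattern suits data such as sessions, caches and log entries, where old records lose their value and manual cleanup jobs would otherwise be needed.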
Before considering how to implement MongoDB on Azure, let’s review the Azure cloud services that apply. When implementing MongoDB on the Microsoft cloud, you can consider a couple of types of Azure service architectures and products for your deployment.
Azure VMs are the Microsoft Infrastructure as a Service (IaaS) offering. Similar to the Amazon Elastic Compute Cloud (EC2), Azure VMs give you access to elastic, on-demand virtual servers. You can use preconfigured VMs with either Windows or Linux, configure the VMs based on your own preferences or the specific needs of your application, or build and deploy VMs from scratch. You manage the VMs yourself, including scaling, installing security patches, and ongoing performance monitoring and management. Azure VMs also give you some control over their environments. (This VM service is currently in preview (beta) for Azure.)
After you install MongoDB on an Azure VM, you can develop MongoDB applications in any language MongoDB supports. Supported languages include .NET languages, Java, Node.js, PHP and more. You can deploy developed MongoDB applications to any location, including public clouds such as Azure or Amazon Web Services (AWS), and on any private cloud.
Azure worker roles are the Microsoft Platform as a Service (PaaS) offering. Worker roles provide prebuilt, preconfigured instances of computing power. In contrast to Azure VMs, you don’t have to configure or manage Azure worker roles. Azure handles the deployment details, from provisioning and load balancing to health monitoring for continuous availability. Using worker roles can help developers who prefer not to manage their applications at the infrastructure level. The trade-off is that the level of control users have over their environments is restricted. Azure Web roles are also part of the Microsoft PaaS offering. They are used to host applications such as Web sites. Another consideration when using Azure Cloud Services is the ability to use fault and upgrade domains, which I talk more about in the next section.
Azure storage containers are one of several data storage choices in the Azure suite. They are inexpensive and easy to use for both application data storage and data backups. Azure storage scales automatically and without user involvement. The container limits are 100 TB per storage account, 1 MB per entity and 255 properties per entity.
The Azure services mentioned in this article are just a subset of all possible Azure services. I have only included services of interest for core MongoDB deployments. You might want to use additional cloud services, such as Azure Media Services, in your particular application architecture.
For more information on the full features of Azure, see windowsazure.com/en-us/home/features/overview/. To find out more about Azure or to set up an account, visit windowsazure.com.
Given that MongoDB runs on either Azure VMs (IaaS) or Azure worker roles and storage containers (PaaS), you need to consider the different capabilities and implementation details of each type of service in determining which product makes the most sense for your application.
After requesting (and being granted) access to the preview functionality for Azure VMs, you can install MongoDB on your particular Azure instances. Alternatively, you can use the recently released MongoDB Installer for Azure to set up a MongoDB replica set quickly and easily on Azure VMs. This tool is available for download at mongodb.org/display/DOCS/MongoDB+on+Azure+VM+–+Windows+Installer.
This installer is built on top of Windows PowerShell and the Azure command line tool. It contains a number of deployment scripts, as shown in Figure 6. The tool is designed to help you get either single node or multinode MongoDB configurations up and running quickly on Azure.
NOTE The MongoDB Installer for Azure is designed to run on a user’s local machine—not directly on an Azure VM—and then to deploy the output to Azure VMs. The current build of preconfigured Azure VMs with Windows doesn’t allow you to run PowerShell scripts on those VMs, so this installer won’t work if you attempt to run it directly on a preconfigured Azure VM.
When using the MongoDB Installer for Azure, installing and configuring a MongoDB replica set on Azure VMs requires only two steps. To start the process, you should first download the publish settings file from the preceding download link. Next, you should run the installer from the command prompt. Figure 6 shows the files created by unzipping the downloaded deployment wizard.
Figure 6 Deployment files from the MongoDB Installer for Azure deployment wizard
Using an Azure VM, you also have the option to host your MongoDB instance on an OS other than Windows. In the current preview, you can create your own VMs or create a VM instance from one of several preinstalled OS configurations, which include various versions of Windows or Linux, as shown in Figure 7. The Azure portal includes a step-by-step guide to installing MongoDB on an Azure VM running Linux.
Figure 7 Azure VM OS selection options
There are pros and cons to deploying MongoDB on Azure VMs. These advantages and disadvantages are generally consistent with those of deploying solutions on IaaS more generally.
Using MongoDB on Azure VMs has the following advantages:
Here are some disadvantages of using an IaaS implementation:
You can also select an alternate Azure configuration for MongoDB. In this configuration, MongoDB servers run as Azure worker roles and use Azure storage containers for data storage. You can download and use a package (the MongoDB Replica Set Azure wrapper) to quickly set up this configuration. To see a detailed list of setup steps, visit mongodb.org/display/DOCS/MongoDB+on+Azure+Worker+Roles.
To perform the setup using the steps in the document just referenced, you must first download the MongoDB Replica Set Azure wrapper package from GitHub. Then you use the downloaded package to create the MongoDB replica set members. Each replica set member runs as a separate Azure worker role instance. As mentioned earlier, MongoDB data files are stored in an Azure storage instance that is mounted as a cloud drive.
The MongoDB Azure wrapper is delivered as a Microsoft Visual Studio 2010 solution with associated source files. This package contains a Visual Studio solution with the MongoDB binaries, sample code and sample applications. The package is designed to work with 64-bit versions of Windows with Internet Information Services and the June 2012 Azure SDK installed. By default, the MongoDB C# driver is included as part of the package. You can also use other application languages, such as PHP and Node.js; you simply need to download those drivers.
You can also choose to configure Azure upgrade domains manually in this deployment type. Fault and upgrade domains affect application availability and fault tolerance. Fault and upgrade domains are defined as follows:
The disadvantages to this configuration are as follows:
Figure 8 shows an application that leverages both Azure services and MongoDB. In addition to the two daemons (instances of the MongoDB service, that is, “mongod”) that are shown in green as “mongos,” notice the five worker role instances. One worker role hosts the MongoDB config servers, and each of the other four worker roles hosts a replica set. Each replica set hosts one (of the four total) data shard. Within each replica set are one primary and two secondary instances of mongod. Not shown on the diagram is the fact that each replica set connects to an instance (endpoint) of Azure container storage. In this scenario, each instance of cloud storage would be mounted as an Azure cloud drive.
Figure 8 Scaling MongoDB on Azure
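The topology in this figure can also be written down as data, which makes it easy to check how many pieces a deployment of this shape involves. The Python sketch below only restates the counts given in the description; the role, node and drive names are hypothetical.

```python
SHARD_COUNT = 4
SECONDARIES_PER_SET = 2

topology = {
    "routers": ["mongos-1", "mongos-2"],   # the two daemons shown in green
    "config_role": "config-worker",        # worker role hosting the config servers
    "shards": {
        f"shard{i}": {
            "worker_role": f"replica-worker-{i}",
            "primary": f"shard{i}-primary",
            "secondaries": [f"shard{i}-secondary{j}"
                            for j in range(1, SECONDARIES_PER_SET + 1)],
            "cloud_drive": f"drive-{i}",    # one storage endpoint per replica set
        }
        for i in range(1, SHARD_COUNT + 1)
    },
}

worker_roles = 1 + len(topology["shards"])
data_nodes = sum(1 + len(s["secondaries"]) for s in topology["shards"].values())
cloud_drives = len({s["cloud_drive"] for s in topology["shards"].values()})

print(worker_roles, "worker roles,", data_nodes, "data-bearing mongod instances,",
      cloud_drives, "cloud drives")
```

Changing SHARD_COUNT or SECONDARIES_PER_SET shows how quickly the instance count grows, which is useful when estimating cost for a larger configuration.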
Figure 8 is just one sample configuration. MongoDB and Azure services offer many options around scalability and availability, depending on your budget and application requirements.
Azure provides an attractive cloud platform for hosting MongoDB. Table 1 summarizes the strengths and weaknesses of using Azure worker roles and Azure VMs with MongoDB.
Table 1 Pros and Cons of Azure Worker Roles and Azure VMs
Given the ease of use, flexibility and scalability of MongoDB, it’s a natural choice for developers. You can quickly implement scalable, reliable and agile database solutions with MongoDB on Azure.
Lynn Langit is a consultant and an author and a trainer for DevelopMentor on SQL Server 2012 topics. She is also a MongoDB Master. Prior to her current work, she founded and served as lead architect of a development firm that created BI solutions and was a developer evangelist for the Microsoft MSDN team for four years.
I think I found a type on your document. "In MongoDB, within a particular collection of BSON documents, an individual document is limited in size to 16 TB." When actually is 16MB per BSON in a collection. From documentation "GridFS is a specification for storing and retrieving files that exceed the BSON-document size limit of 16MB." Link here: http://docs.mongodb.org/manual/core/gridfs/
Nice article thanks Lynn. How does MongoLab in the Azure Store fit into the roadmap of all this in your opinion? I'm also interested in how i can write an application once, then have the flexibility to deploy it either on-premise or in the cloud at some point in the future. Is it safe to assume that data created and stored in one scenario can be simply migrated to the other scenario through MongoDB export/import functionality?