November 2012

Volume 27 Number 11

Microsoft Azure - MongoDB on Microsoft Azure for .NET Developers

By Lynn Langit | November 2012

MongoDB is an open source, document-oriented database designed with both scalability and developer agility in mind. In this article, I introduce both MongoDB and Azure, and then explore the technical details and benefits of deploying MongoDB to the Microsoft cloud. I also address the pros and cons of using one of two possible Azure cloud services—Azure cloud service worker roles and Azure virtual machines (VMs) with MongoDB.

About MongoDB

Unlike traditional relational databases, MongoDB doesn’t store data in tables and rows. Rather, it stores BSON documents with dynamic schemas. BSON (short for Binary JSON) is a binary-encoded serialization of JSON (JavaScript Object Notation) documents. These BSON documents are stored in collections, which are named groupings of documents. An example BSON document is shown in Figure 1.

Figure 1 Sample document stored in MongoDB
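
As a rough illustration of dynamic schemas, the sketch below models a collection as a plain Python list of dictionaries. The collection name `patients` and all field names are hypothetical, and the JSON text it prints is only the human-readable form; MongoDB itself stores the binary BSON encoding.

```python
import json

# A hypothetical "patients" collection, modeled as a list of dict documents.
# Documents in the same collection do not have to share the same fields.
patients = [
    {"_id": 1, "name": "Ann", "allergies": ["penicillin"]},
    {"_id": 2, "name": "Bob", "visits": [{"date": "2012-09-14", "reason": "checkup"}]},
]

# Collect the field names used by each document.
fields_per_document = [sorted(doc.keys()) for doc in patients]

# Render one document as JSON text, the readable counterpart of BSON.
as_json = json.dumps(patients[0], sort_keys=True)
print(fields_per_document)
print(as_json)
```

The two documents sit side by side in one collection even though only one has an `allergies` field and only the other has `visits`.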

Nor does MongoDB use SQL syntax for queries. Instead, queries are themselves expressed as JSON-style documents, and BSON documents map easily to objects in object-oriented languages.
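
To give a feel for queries written as documents rather than SQL strings, here is a toy matcher in plain Python. It is an illustration only, not how MongoDB evaluates queries: the `matches` helper and the sample data are hypothetical, and it handles just equality and the `$gt` operator.

```python
def matches(document, query):
    """Return True if the document satisfies every condition in the query."""
    for field, condition in query.items():
        value = document.get(field)
        if isinstance(condition, dict) and "$gt" in condition:
            # Operator form: {"field": {"$gt": limit}}
            if value is None or not value > condition["$gt"]:
                return False
        elif value != condition:
            # Plain form: {"field": expected_value}
            return False
    return True

events = [
    {"machine": "press-1", "temperature": 71},
    {"machine": "press-2", "temperature": 96},
    {"machine": "press-2", "temperature": 64},
]

# Roughly: SELECT * FROM events WHERE machine = 'press-2' AND temperature > 90
query = {"machine": "press-2", "temperature": {"$gt": 90}}
hot = [e for e in events if matches(e, query)]
print(hot)
```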

MongoDB suits many use cases, particularly those whose source data is schema-less or schema-light. This type of data is sometimes called unstructured or semi-structured. Real-world examples of this type of data include log files, health records and manufacturing event data. This flexibility in schema simplifies application development and reduces the impedance mismatch between .NET objects and persisted data that sometimes occurs when using a traditional relational database management system (RDBMS).

MongoDB also scales easily, through sharding and replica sets. In MongoDB, sharding is the partitioning of data among multiple machines in a way that preserves the order of the data, which allows for quick and easy scalability. A replica set is a form of asynchronous master/slave replication that allows for high availability via automatic failover and automatic recovery of member nodes.

MongoDB was built for virtualized and cloud environments. Cloud services, such as Azure, are a natural fit for MongoDB. By coupling the easy-to-scale architecture (for example, through sharding and replica sets) of MongoDB and the elastic cloud capacity of Azure, you can quickly and easily grow your infrastructure as application demands increase.

Getting Started with MongoDB

If you’re new to MongoDB, you can find many resources on both the main MongoDB site and other sites. In particular, you might want to download the quick reference cards (which are free) from the 10gen site. These cards include information about setup, commands, queries, indexing and T-SQL-to-MongoDB language mapping. On YouTube, you can also watch the set of five-minute screencasts I made on getting started with MongoDB.

NOTE 10gen is a company that not only develops MongoDB but also provides support and services, such as training for MongoDB. Among other community activities, 10gen supports local MongoDB user groups. 10gen also offers free online courses, available from its learning portal.

If you want to try MongoDB without installing anything, you can use an online interactive tutorial to get a sense of how to work with MongoDB from the command line. Figure 2 shows an example of the interactive Web interface for trying out MongoDB.

Figure 2 Trying out MongoDB from the command line

MongoDB novices can also grab one of the many GUI tools available to get up and running with simple administration quickly and in a familiar fashion. The tool I’ve been using is MongoVUE, which has a look and feel similar to SQL Server Management Studio. There is a free version of MongoVUE, but I prefer the licensed version (which costs $35 for a single user) because I use the included features. I particularly like the simple import from SQL Server that’s included in this version. A complete list of features shown via screenshots is on the MongoVUE Web site. Figure 3 shows the tree view of the server, databases, collections, indices and so on from a MongoDB server.

 

Figure 3 Tree view of MongoDB server in MongoVUE

MongoDB Scalability

When thinking about using MongoDB either on premises or in the cloud, you need to understand the basics of how to implement scaling with MongoDB. A sample configuration that uses both sharding and replica sets will help illustrate.

In MongoDB, within a particular collection of BSON documents, an individual document is limited in size to 16 MB. MongoDB also supports storing files larger than the document limit via GridFS, which splits a file into chunks. Files stored through GridFS can also be sharded.
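
The idea behind GridFS, splitting a large file into pieces that each fit inside an ordinary document, can be sketched in a few lines of Python. This is a simplification, not the driver's implementation: the `split_into_chunks` helper is hypothetical, and the tiny chunk size is chosen for readability (real chunk sizes are far larger). The `files_id`, `n` and `data` field names follow the layout of GridFS chunk documents.

```python
def split_into_chunks(file_id, payload, chunk_size):
    """Split bytes into numbered chunk documents of at most chunk_size bytes."""
    chunks = []
    for n, start in enumerate(range(0, len(payload), chunk_size)):
        chunks.append({
            "files_id": file_id,   # which file the chunk belongs to
            "n": n,                # position of the chunk within the file
            "data": payload[start:start + chunk_size],
        })
    return chunks

payload = b"0123456789" * 3           # a 30-byte stand-in for a large file
chunks = split_into_chunks("video-1", payload, chunk_size=8)

# Reassemble by concatenating the chunks in order of n.
restored = b"".join(c["data"] for c in sorted(chunks, key=lambda c: c["n"]))
print(len(chunks), restored == payload)
```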

Figure 4 shows the high-level architecture of a scaled MongoDB solution. Along the top are four shards, each of which is hosted by a distinct replica set. Each shard is a subset of the data, split via a shard key, similar to table partitioning in an RDBMS. For example, shard 1 in replica set A contains three copies of data documents with ID values 1–50; shard 2 in replica set B contains three copies of data documents 51–100; and so on. Each replica set consists of one primary node and (in this case) two secondary nodes, which serve as failover nodes in case the primary node goes down. You can configure up to seven voting replica set members.

 

Figure 4 Scaling MongoDB
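
The range-based split in this example can be sketched as a lookup from a shard key value to a shard. This is a hypothetical illustration in Python, not the routing code MongoDB uses: the `route` helper is invented for the sketch, and the ranges and names for the third and fourth shards are assumed by continuing the pattern of the first two.

```python
import bisect

# Inclusive upper bound of the ID range held by each shard.
# 50 and 100 come from the example; 150 and 200 are assumed.
upper_bounds = [50, 100, 150, 200]
shards = ["replica set A", "replica set B", "replica set C", "replica set D"]

def route(doc_id):
    """Return the shard whose ID range contains doc_id."""
    if doc_id < 1 or doc_id > upper_bounds[-1]:
        raise ValueError("no shard holds ID %d" % doc_id)
    # bisect_left finds the first range whose upper bound is >= doc_id.
    return shards[bisect.bisect_left(upper_bounds, doc_id)]

print(route(1), route(50), route(51), route(200))
```

Because the ranges are ordered, a query for a contiguous span of IDs touches only the shards whose ranges overlap that span.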

To learn more about replica sets in MongoDB, go to docs.mongodb.org/manual/administration/replica-sets. To find out more about scaling options in MongoDB, see docs.mongodb.org/manual/core/sharding-introduction.

Recent Updates to MongoDB (2.2 Release)

The 2.2 release (from September 2012) of MongoDB has several substantial updates, including an aggregation framework (which provides a simpler way to write and execute queries), concurrency improvements (which include the ability to take locks at the level of an individual database rather than across the entire server), an updated C# driver and more. You can think of the new aggregation capability as a simpler way to create Map/Reduce jobs. Many new operators, shown in Figure 5, support this capability.

Figure 5 Aggregation operators added to MongoDB 2.2
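
For a flavor of the aggregation framework, the pipeline below is written as the list of stage documents a driver would send, using the `$match`, `$group`, `$sum` and `$sort` operators. The collection contents and field names are hypothetical. Because this sketch does not connect to a server, the equivalent result is computed in plain Python purely to show what the pipeline expresses.

```python
from collections import defaultdict

# Pipeline as it would be passed to an aggregate call (shown as data only).
pipeline = [
    {"$match": {"status": "error"}},
    {"$group": {"_id": "$machine", "count": {"$sum": 1}}},
    {"$sort": {"count": -1}},
]

log_entries = [
    {"machine": "press-1", "status": "error"},
    {"machine": "press-2", "status": "ok"},
    {"machine": "press-2", "status": "error"},
    {"machine": "press-2", "status": "error"},
]

# Plain-Python equivalent of the three stages above.
counts = defaultdict(int)
for entry in log_entries:
    if entry["status"] == "error":          # $match
        counts[entry["machine"]] += 1       # $group with $sum: 1
result = sorted(                            # $sort by count, descending
    ({"_id": machine, "count": n} for machine, n in counts.items()),
    key=lambda row: row["count"],
    reverse=True,
)
print(result)
```

Expressing the same count as a Map/Reduce job would require separate map and reduce functions; the pipeline states it declaratively in three stages.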

As of release 2.2, MongoDB also includes a new monitoring service called MMS (for MongoDB Monitoring Service). This service is a cloud-hosted dashboard service that shows you the status of your MongoDB cluster. New monitoring commands include db.currentOp and db.serverStatus.

In addition, MongoDB 2.2 supports tag-aware sharding, which allows for more finely grained control over sharding configurations, such as the ability to geo-tag particular shards hosted in cloud data centers. This release also introduces TTL (time-to-live) collections, in which documents expire and are removed automatically after a configurable time value. For more details about the improvements in the MongoDB 2.2 release, visit mongodb.org/display/DOCS/v2.2.
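
The TTL behavior can be pictured as a background sweep that removes documents whose date field is older than the configured lifetime. The sketch below simulates that in memory. `expireAfterSeconds` is the name of the index option, while the `sweep` helper, the `createdAt` field name and the sample data are hypothetical.

```python
from datetime import datetime, timedelta

# Index specification as data: expire documents one hour after createdAt.
ttl_index = {"key": "createdAt", "expireAfterSeconds": 3600}

def sweep(documents, index, now):
    """Return only the documents that have not yet outlived the TTL."""
    lifetime = timedelta(seconds=index["expireAfterSeconds"])
    return [d for d in documents if now - d[index["key"]] < lifetime]

now = datetime(2012, 11, 1, 12, 0, 0)
sessions = [
    {"user": "ann", "createdAt": now - timedelta(minutes=5)},
    {"user": "bob", "createdAt": now - timedelta(hours=2)},
]
remaining = sweep(sessions, ttl_index, now)
print([d["user"] for d in remaining])
```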

Azure Services for MongoDB

Before considering how to implement MongoDB on Azure, let’s review the applicable Azure cloud services. When implementing MongoDB on the Microsoft cloud, you can consider a couple of types of Azure service architectures and products for your deployment.

Azure Virtual Machines

Azure VMs are the Microsoft Infrastructure as a Service (IaaS) offering. Similar to the Amazon Elastic Compute Cloud (EC2), Azure VMs give you access to elastic, on-demand virtual servers. You can use preconfigured VMs with either Windows or Linux, configure the VMs based on your own preferences or the specific needs of your application, or build and deploy VMs from scratch. You manage the VMs yourself, including scaling, installing security patches, and ongoing performance monitoring and management. Azure VMs also give you some control over their environments. (This VM service is currently in preview (beta) for Azure.)

After you install MongoDB on an Azure VM, you can develop MongoDB applications in any language MongoDB supports. Supported languages include .NET languages, Java, Node.js, PHP and more. You can deploy developed MongoDB applications to any location, including public clouds such as Azure or Amazon Web Services (AWS), and on any private cloud.

Azure Cloud Services

Azure worker roles are the Microsoft Platform as a Service (PaaS) offering. Worker roles provide prebuilt, preconfigured instances of computing power. In contrast to Azure VMs, you don’t have to configure or manage Azure worker roles. Azure handles the deployment details, from provisioning and load balancing to health monitoring for continuous availability. Using worker roles can help developers who prefer not to manage their applications at the infrastructure level. The trade-off is that the level of control that users have over their environments is restricted. Azure Web roles are also part of the Microsoft PaaS offering. They are used to host applications such as Web sites. Another consideration when using Azure Cloud Services is the ability to use fault and upgrade domains, which I talk more about in the next section.

Azure Storage Drives

Azure storage containers are one of several data storage choices in the Azure suite. They are inexpensive and easy to use for both application data storage and data backups. Azure storage scales automatically and without user involvement. The container limits are 100 TB per storage account, 1 MB per entity and 255 properties per entity.

The Azure services mentioned in this article are just a subset of all possible Azure services. I have only included services of interest for core MongoDB deployments. You might want to use additional cloud services, such as Azure Media Services, in your particular application architecture.

For more information on the full features of Azure, see windowsazure.com/en-us/home/features/overview/. To find out more about Azure or to set up an account, visit windowsazure.com.

Understanding Implementation Details: IaaS or PaaS

Given that MongoDB runs on either Azure VMs (IaaS) or Azure worker roles and storage containers (PaaS), you need to consider the different capabilities and implementation details of each type of service in determining which product makes the most sense for your application.

Using Azure Virtual Machines

After requesting (and being granted) access to the preview functionality for Azure VMs, you can install MongoDB on your particular Azure instances. Alternatively, you can use the recently released MongoDB Installer for Azure to set up a MongoDB replica set quickly and easily on Azure VMs. This tool is available for download at docs.mongodb.org/ecosystem/platforms/windows-azure.

This installer is built on top of Windows PowerShell and the Azure command line tool. It contains a number of deployment scripts, as shown in Figure 6. The tool is designed to help you get either single node or multinode MongoDB configurations up and running quickly on Azure.

NOTE The MongoDB Installer for Azure is designed to run on a user’s local machine—not directly on an Azure VM—and then to deploy the output to Azure VMs. The current build of preconfigured Azure VMs with Windows doesn’t allow you to run PowerShell scripts on those VMs, so this installer won’t work if you attempt to run it directly on a preconfigured Azure VM.

When using the MongoDB Installer for Azure, installing and configuring a MongoDB replica set on Azure VMs requires only two steps. To start the process, you should first download the publish settings file from the preceding download link. Next, you should run the installer from the command prompt. Figure 6 shows the files created by unzipping the downloaded deployment wizard.

Figure 6 Deployment files from the MongoDB Installer for Azure deployment wizard

Using an Azure VM, you also have the option to host your MongoDB instance on an OS other than Windows. In the current preview, you can create your own VMs or create a VM instance from one of several preinstalled OS configurations, which include various versions of Windows or Linux, as shown in Figure 7. The Azure portal includes a step-by-step guide to installing MongoDB on an Azure VM running Linux.

Figure 7 Azure VM OS selection options

There are pros and cons to deploying MongoDB on Azure VMs. These advantages and disadvantages are generally consistent with those of deploying solutions on IaaS more generally.

Using MongoDB on Azure VMs has the following advantages:

  • You have more control over your machine configuration than if you use Azure worker roles. Put another way, on Azure VMs, you can install and configure services, define policies, define and capture machine logs and more.
  • You can use a non-Windows OS (Linux) on your VM.
  • You can control backup and restore windows.
  • You can easily set up this configuration by using the MongoDB Installer for Azure.
  • You can select from several flexible deployment options, such as machine (instance) sizing.

Here are some disadvantages of using an IaaS implementation:

  • This service is still in beta (preview), so you can’t currently use this configuration for production loads.
  • The pricing for the service hasn’t yet been announced.
  • Security management is “roll your own,” which means that users must define and implement their own machine security measures.

Using Azure Cloud Services Worker Roles

You can also select an alternate Azure configuration for MongoDB. In this configuration, MongoDB servers run as Azure worker roles and use Azure storage containers for data storage. You can download and use a package (the MongoDB Replica Set Azure wrapper) to quickly set up this configuration. To see a detailed list of setup steps, visit docs.mongodb.org/ecosystem/platforms/windows-azure.

To perform the setup using the steps in the document just referenced, you must first download the MongoDB Replica Set Azure wrapper package from GitHub. Then you use the downloaded package to create the MongoDB replica set members. Each replica set member runs as a separate Azure worker role instance. As mentioned earlier, MongoDB data files are stored in an Azure storage instance that is mounted as a cloud drive.

The MongoDB Azure wrapper is delivered as a Microsoft Visual Studio 2010 solution that contains the MongoDB binaries, sample code and sample applications, along with the associated source files. The package is designed to work with 64-bit versions of Windows with Internet Information Services and the June 2012 Azure SDK installed. By default, the MongoDB C# driver is included as part of the package. You can also use other application languages, such as PHP and Node.js; you simply need to download those drivers.

You can also choose to configure Azure upgrade domains manually in this deployment type. Fault and upgrade domains affect application availability and fault tolerance. Fault and upgrade domains are defined as follows:

  • A fault domain is a physical unit of failure and is closely related to the physical infrastructure in the Azure data centers. In Azure the rack can be considered a fault domain. You can’t control the allocation of a fault domain, but you can programmatically find out which fault domain a service is running within.
  • An upgrade domain is a configurable, logical unit that determines how a particular service is upgraded. The default number of upgrade domains that are configured for an application is five.

Running MongoDB on Azure worker roles has both the benefits and the drawbacks of using PaaS more generally, as well as some that are specific to MongoDB. Here are the advantages of using this configuration:

  • Microsoft manages VM OS updates and security, so you don’t have to perform these tasks.
  • Microsoft manages VM backup and restore.
  • Microsoft scales VM instances as you need them.
  • Microsoft offers other options, such as affinity groups. Affinity groups give you the option of locating Azure services in the same physical data center location for maximum performance across the application.
  • Microsoft offers configurable upgrade domains so that you have more control over the location from which the system restarts in case of failure.
  • Microsoft manages replication of Azure storage. User data is triple-replicated with immediate failover within the same data center and is also geo-replicated between two data centers hundreds of miles apart.
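
To see why upgrade domains matter for a replica set, consider the toy model below. It assumes that instances are spread across upgrade domains in round-robin order (an assumption made for this sketch, not a documented guarantee) and that an upgrade takes down one domain at a time. Both helper names are hypothetical.

```python
def assign_upgrade_domains(instance_count, domain_count=5):
    """Map each instance index to an upgrade domain, round-robin (assumed)."""
    return {i: i % domain_count for i in range(instance_count)}

def instances_up_during_upgrade(assignment, domain_being_upgraded):
    """List instances that stay online while one domain is upgraded."""
    return [i for i, d in assignment.items() if d != domain_being_upgraded]

# Three replica set members, each running as its own worker role instance.
assignment = assign_upgrade_domains(3)
for domain in sorted(set(assignment.values())):
    online = instances_up_during_upgrade(assignment, domain)
    print("upgrading domain", domain, "-> online members:", online)
```

With three members placed in three different domains, two members stay online during each step of the upgrade, which is a majority of the replica set.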

The disadvantages to this configuration are as follows:

  • You can’t configure the OS, so you have to develop applications that run on the predefined machine configuration.
  • You can’t deploy any OS other than Windows.
  • You can’t share your cluster across deployments, so your application must function within the context of a single deployment.

MongoDB on Azure Scalability: An Example

Figure 8 shows an application that leverages both Azure services and MongoDB. In addition to the two routing daemons (instances of the MongoDB query router, “mongos”) that are shown in green, notice the five worker role instances. One worker role hosts the MongoDB config servers, and each of the other four worker roles hosts a replica set. Each replica set hosts one of the four data shards. Within each replica set are one primary and two secondary instances of mongod (the MongoDB database service). Not shown on the diagram is the fact that each replica set connects to an instance (endpoint) of Azure container storage. In this scenario, each instance of cloud storage would be mounted as an Azure cloud drive.

Figure 8 Scaling MongoDB on Azure
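
The layout in Figure 8 can also be written down as a small data structure, which makes it easy to count roles and processes. The sketch is hypothetical: the key and shard names are invented, and the number of config servers is left out because the text does not state it.

```python
# Hypothetical description of the deployment in Figure 8.
topology = {
    "mongos_instances": 2,
    "config_role": "config-worker",
    "shards": {
        "shard-%d" % n: {"primary": 1, "secondaries": 2}
        for n in range(1, 5)
    },
}

# Each replica set lives in its own worker role, plus one for config servers.
worker_roles = len(topology["shards"]) + 1

# Count the data-bearing mongod processes across all replica sets.
mongod_count = sum(
    members["primary"] + members["secondaries"]
    for members in topology["shards"].values()
)
print(worker_roles, mongod_count)
```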

Figure 8 is just one sample configuration. MongoDB and Azure services offer many options around scalability and availability, depending on your budget and application requirements.

Summing Up and Next Steps

Azure provides an attractive cloud platform for hosting MongoDB. Table 1 summarizes the strengths and weaknesses of using Azure worker roles and Azure VMs with MongoDB.

Table 1 Pros and Cons of Azure Worker Roles and Azure VMs

MongoDB on IaaS – Azure VMs

Pros:
  • Non-Windows VMs (Linux)
  • Backup/restore on demand
  • Developer fully controls configuration

Cons:
  • Still in preview (beta)
  • Developer must manage security
  • Developer must manage updates/patches

MongoDB on PaaS – Azure Cloud Services Worker Roles

Pros:
  • Automated server scaling
  • Automatic OS updates/patches
  • Optimal security
  • Use of fault/update domains

Cons:
  • No individual instance control
  • Cluster can be deployed only across a single deployment

Given the ease of use, flexibility and scalability of MongoDB, it’s a natural choice for developers. You can quickly implement scalable, reliable and agile database solutions with MongoDB on Azure.


Lynn Langit is a consultant and an author and a trainer for DevelopMentor on SQL Server 2012 topics. She is also a MongoDB Master. Prior to her current work, she founded and served as lead architect of a development firm that created BI solutions and was a developer evangelist for the Microsoft MSDN team for four years.