September 2012

Volume 27 Number 09

Forecast: Cloudy - Humongous Microsoft Azure

By Joseph Fultz | September 2012

First Things First

If you’re thinking of trying out MongoDB or considering it as an alternative to Azure SQL Database or Azure Tables, you need to be aware of some issues on the design and planning side, some related to infrastructure and some to development.

Deployment Architecture

Generally, the data back end needs to be available and durable. To achieve this with MongoDB, you use a replica set. Replica sets provide both failover and replication, using an election among the members (with an arbiter to break any tie) to choose the primary node of the set. What this means for your Azure roles is that you’ll need three instances to set up a minimal replica set, plus a storage location you can map to a drive for each of those roles. Be aware that due to differences in virtual machine (VM) sizes, you’ll likely want at least midsize VMs for any significant deployment. Otherwise, memory or CPU could quickly become a bottleneck.
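To make the three-member layout concrete, here's a minimal sketch (in Python, with hypothetical host names standing in for the three role instances) of the configuration document a replica set is initiated with, the same structure you'd hand to the replSetInitiate command from the shell or a driver:

```python
# Minimal replica-set configuration document: three members, one of
# which gets elected primary. The host names are hypothetical
# placeholders for the three Azure role instances.
replica_set_config = {
    "_id": "rs0",  # replica set name
    "members": [
        {"_id": 0, "host": "mongodb-role-0:27017"},
        {"_id": 1, "host": "mongodb-role-1:27017"},
        {"_id": 2, "host": "mongodb-role-2:27017"},
    ],
}

# An odd number of voting members lets a primary election
# always reach a majority, even when one node is down.
print(len(replica_set_config["members"]))  # -> 3
```

The 10gen sample solution mentioned below wires up this configuration for you; the document here just shows why three instances is the floor for a set that can survive the loss of any one node.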

Figure 1 depicts a typical architecture for deploying a minimal MongoDB ReplicaSet that’s not exposed to the public. You could convert it to expose the data store externally, but it’s better to do that via a service layer. One of the problems that MongoDB can help address via its built-in features is designing and deploying a distributed data architecture. MongoDB has a full feature set to support sharding; combine that feature with ReplicaSets and Azure Compute and you have a data store that’s highly scalable, distributed and reliable. To help get you started, 10gen provides a sample solution that sets up a minimal ReplicaSet. You’ll find the information at bit.ly/NZROWJ and you can grab the files from GitHub at bit.ly/L6cqMF.

Azure MongoDB Deployment
Figure 1 Azure MongoDB Deployment

Data Schema

Being a wiz at DB schema design may actually hinder you when designing for a NoSQL approach. The skills required are more like object modeling and integration design for messaging infrastructures. There are two reasons for this:

  1. The data is viewed as a document and sometimes contains nested objects or documents.
  2. There’s minimal support for joins, so you have to balance the storage format of the data against the implications of nesting and the number of calls the client has to make to get a single view.

One of the first activities in moving from a relational mindset to the MongoDB document perspective is redesigning the data schema. Some objects that are separate in a relational model keep that separation. For example, Products and Orders will still be separate schemas in MongoDB, and you’ll still use a foreign key to do lookups between the two. Oversimplifying a bit, the redesign for these two objects in relation to one another is mostly straightforward, as shown in Figure 2.

Direct Schema Translation
Figure 2 Direct Schema Translation

However, it may not be as easy when you work with schemas that aren’t as cleanly separated conceptually, even though they may be easily and obviously separated in a relational model. For example, Customers and CustomerAddresses are entities that might be merged such that Customer will contain a collection of associated addresses (see Figure 3).

Converting Relational Schema to Nested Object Schema
Figure 3 Converting Relational Schema to Nested Object Schema
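The merged shape of Figure 3 can be sketched as a plain document; here's a Python dict standing in for the stored document (the field names are illustrative, not taken from the sample schema):

```python
# One Customer document with its addresses nested as an array,
# replacing the Customer/CustomerAddresses join of the relational model.
customer = {
    "customer_id": "C-1001",
    "name": "Jane Doe",
    "addresses": [
        {"type": "billing",  "street": "1 Main St", "city": "Dallas"},
        {"type": "shipping", "street": "9 Elm Ave", "city": "Austin"},
    ],
}

# A single read returns the customer and every address -- no join needed.
billing = [a for a in customer["addresses"] if a["type"] == "billing"]
print(billing[0]["city"])  # -> Dallas
```

The trade-off is that the addresses now live only inside their customer; anything that used to query CustomerAddresses directly has to go through the parent document.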

You’ll need to take a careful look at your relational model and consider every foreign key relationship and how that will get represented in the entity graph as it’s translated to the NoSQL model.

Data Interaction

Both query behavior and caching behavior are important in a relational system, but it’s query behavior that changes most here. Much as with Azure Tables, it’s easy to drop an object into MongoDB. And unlike Azure Tables, and more like Azure SQL Database, any of the fields can be indexed, which allows for better query performance on single objects. However, the lack of joins (and of query expressiveness generally) turns what could once be a single query with one or more joins returning a chunky result set into multiple calls to the back-end data store to fetch that same data. This can be a little daunting if you want to fetch a collection of objects and then fetch a related collection for each item in the first collection. So, using my relational pubs database, I might write a SQL query something like the following to fetch all author last names and all titles from each author:

Select authors.au_lname, authors.au_id,
  titles.title_id, titles.title
From authors inner join titleauthor
  on authors.au_id = titleauthor.au_id
  inner join titles on
  titles.title_id = titleauthor.title_id
Order By authors.au_lname

In contrast, to get the same data using the C# driver and MongoDB, the code looks like what’s shown in Figure 4.

Figure 4 Joining Two MongoDB Collections

MongoDatabase mongoPubs = _mongoServer.GetDatabase("Pubs");
MongoCollection<BsonDocument> authorsCollection =
  mongoPubs.GetCollection("Authors");
MongoCollection<BsonDocument> titlesCollection =
  mongoPubs.GetCollection("Titles");
MongoCursor<BsonDocument> authors = authorsCollection.FindAll();
BsonArray auIds = new BsonArray();
Dictionary<string, BsonDocument> authorTitles =
  new Dictionary<string, BsonDocument>();
// Build the id list for the "In" comparison and a document
// per author; titles get attached in the second pass
foreach (BsonDocument bsonAuthor in authors)
{
  string auId = bsonAuthor["au_id"].ToString();
  auIds.Add(auId);
  authorTitles.Add(auId,
    new BsonDocument {
      { "au_id", auId },
      { "au_lname", bsonAuthor["au_lname"] },
      { "titles", new BsonDocument() }
    });
}
// Create the query against the Titles collection
// (assumes each title document carries its au_id)
QueryComplete titleByAu_idQuery = Query.In("au_id", auIds);
// Execute query, coalesce authors and titles
foreach (BsonDocument bsonTitle in
  titlesCollection.Find(titleByAu_idQuery))
{
  Debug.WriteLine(bsonTitle.ToJson());
  // Add the title to its author's "titles" subdocument
  BsonDocument authorTitlesDoc =
    authorTitles[bsonTitle["au_id"].ToString()];
  authorTitlesDoc["titles"].AsBsonDocument.Add(
    bsonTitle["title_id"].ToString(), bsonTitle);
}

There are ways you might optimize this through code and structure, but don’t miss the point that while MongoDB is well-suited for direct queries even on nested objects, more complex queries that require cross-entity sets are a good bit more … well, let’s just say more manual. Most of us use LINQ to help bridge the object-to-relational world. The interesting thing with MongoDB is that you’ll want that bridge, but for the opposite reason—you’ll miss the relational functionality.
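For comparison, the same client-side join can be sketched in Python. The two collections are stubbed as in-memory lists here (with a driver such as pymongo, each list would come from a find() call and the id list would feed an $in filter); the two-pass shape of the work is the point:

```python
# Stand-ins for the Authors and Titles collections
# (a find() against each in practice).
authors = [
    {"au_id": "172-32-1176", "au_lname": "White"},
    {"au_id": "213-46-8915", "au_lname": "Green"},
]
titles = [
    {"au_id": "172-32-1176", "title_id": "PS3333", "title": "Prolonged Data Deprivation"},
    {"au_id": "213-46-8915", "title_id": "BU1032", "title": "The Busy Executive's Database Guide"},
    {"au_id": "213-46-8915", "title_id": "BU2075", "title": "You Can Combat Computer Stress!"},
]

# Pass 1: one document per author with an empty spot for titles,
# plus the id list for the second query's "$in" comparison.
author_titles = {a["au_id"]: {**a, "titles": {}} for a in authors}
in_filter = {"au_id": {"$in": list(author_titles)}}  # pymongo-style filter

# Pass 2: a second fetch (find(in_filter) in practice), coalescing
# each title into its author's document.
for t in titles:
    if t["au_id"] in in_filter["au_id"]["$in"]:
        author_titles[t["au_id"]]["titles"][t["title_id"]] = t

print(len(author_titles["213-46-8915"]["titles"]))  # -> 2
```

Two round-trips plus client-side bookkeeping replace the single three-table join of the SQL version; that's the "more manual" part.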

You might also miss referential constraints, especially foreign key constraints. Because you can literally add anything into the MongoDB collection, an item may or may not have the proper data to relate it to other entities. While this might seem like a failing of the platform if you’re a die-hard RDBMS fan, it isn’t. It is, in fact, a departure in philosophy. For NoSQL databases in general, the idea is to move the intelligence in the system out of the data store and let the data store focus on the reading and writing of data. Thus, if you feel the need to explicitly enforce things like foreign key constraints in your MongoDB implementation, you’ll do that through the business or service layer that sits in front of the data store.
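A service-layer check of that kind can be as simple as validating the referenced key before the insert. Here's a hedged Python sketch, with the parent collection stubbed as a set of known ids (in practice you'd check with a find_one or count against the Authors collection):

```python
# Stand-in for the au_id values present in the Authors collection
# (in practice, a lookup against the collection itself).
known_author_ids = {"172-32-1176", "213-46-8915"}

def insert_title(title_doc, author_ids=known_author_ids):
    """Enforce the 'foreign key' in the service layer: reject a title
    whose au_id doesn't reference an existing author document."""
    if title_doc.get("au_id") not in author_ids:
        raise ValueError("unknown au_id: %r" % title_doc.get("au_id"))
    # titles_collection.insert(title_doc) would go here
    return title_doc

insert_title({"au_id": "172-32-1176", "title": "Valid"})       # accepted
try:
    insert_title({"au_id": "999-99-9999", "title": "Orphan"})  # rejected
except ValueError as err:
    print("rejected:", err)
```

Note the moved responsibility: the data store happily takes either document; only the layer in front of it knows the difference.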

The Migration

Once you’ve redesigned the data schemas and considered query behavior and requirements, it’s time to get some data out there in the cloud in order to work with it.

The bad news is that there’s no wizard that lets you point to your Azure SQL Database instance and your MongoDB instance and click Migrate. You’ll need to write some scripts, either in the shell or in code. Fortunately, if code for the MongoDB side of the equation is constructed well, you’ll be able to reuse a good portion of it for normal runtime operation of the solution.

The first step is referencing the MongoDB.Bson and MongoDB.Driver libraries and adding the using statements:

using MongoDB.Bson.IO;
using MongoDB.Bson.Serialization;
using MongoDB.Bson.Serialization.Attributes;
using MongoDB.Bson.Serialization.Conventions;
using MongoDB.Bson.Serialization.IdGenerators;
using MongoDB.Bson.Serialization.Options;
using MongoDB.Bson.Serialization.Serializers;
using MongoDB.Driver.Builders;
using MongoDB.Driver.GridFS;
using MongoDB.Driver.Wrappers;

Objects will then expose some new extension methods that are extremely useful when you’re moving from regular .NET objects to the Bson objects used with MongoDB. As Figure 5 shows, this becomes quite obvious in a function that converts the output rows of a database fetch into BsonDocuments to save into MongoDB.

Figure 5 Migrating Data with LINQ and MongoDB

pubsEntities myPubsEntities = new pubsEntities();
var pubsAuthors = from row in myPubsEntities.authors
  select row;
MongoDatabase mongoPubs = _mongoServer.GetDatabase("Pubs");
mongoPubs.CreateCollection("Authors");
MongoCollection<BsonDocument> authorsCollection =
  mongoPubs.GetCollection("Authors");
BsonDocument bsonAuthor;
foreach (author pubAuthor in pubsAuthors)
{
  bsonAuthor = pubAuthor.ToBsonDocument();
  authorsCollection.Insert(bsonAuthor);
}

The simple example in Figure 5 converts the data directly using the MongoDB extension methods. However, you have to be careful, especially with LINQ, when performing this type of operation. For example, if I attempt the same operation directly for Titles, the depth of the object graph of the Titles table in the entity model will cause the MongoDB driver to produce a stack overflow error. In such a case, the conversion will be a little more verbose in code, as shown in Figure 6.

Figure 6 Converting Values Individually

pubsEntities myPubsEntities = new pubsEntities();
var pubsTitles = from row in myPubsEntities.titles
  select row;
MongoDatabase mongoPubs = _mongoServer.GetDatabase("Pubs");
MongoCollection<BsonDocument> titlesCollection =
  mongoPubs.GetCollection("Titles");
BsonDocument bsonTitle;
foreach (title pubTitle in pubsTitles)
{
  bsonTitle = new BsonDocument{ {"titleId", pubTitle.title_id},
    {"pub_id", pubTitle.pub_id},
    {"publisher", pubTitle.publisher.pub_name},
    {"price", pubTitle.price.ToString()},
    {"title1", pubTitle.title1}};
  titlesCollection.Insert(bsonTitle);
}

To keep the conversion as simple as possible, the best approach is to write the SQL queries to return individual entities that can more easily be added to the appropriate MongoDB collection. For BsonDocuments that have child document collections, it will take a multistep approach to create the parent BsonDocument, add the child BsonDocuments to the parent BsonDocument, and then add the parent to the collection.
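That multistep approach can be sketched generically; Python dicts stand in for BsonDocuments here, and the final step would be a collection insert with the driver (the entity and field names are illustrative):

```python
# Step 1: create the parent document from the parent entity's row.
order = {"order_id": "SO-7", "customer_id": "C-1001", "lines": []}

# Step 2: add each child document to the parent's nested collection.
order_lines = [
    {"product_id": "P-1", "qty": 2},
    {"product_id": "P-4", "qty": 1},
]
for line in order_lines:
    order["lines"].append(line)

# Step 3: only now add the completed parent to the MongoDB collection
# (orders_collection.insert(order) with the driver).
print(len(order["lines"]))  # -> 2
```

Building the parent completely before the insert matters: the document is persisted as a unit, so a half-built parent written early would need a follow-up update for every child.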

The obvious bits you’ll need to convert when moving from an Azure SQL Database to a MongoDB implementation are all of the code that lives in stored procedures, views and triggers. In many cases, the code will be somewhat simpler, because you’ll be dealing with one BsonDocument with children that you persist in its entirety instead of having to work across the relational constraints of multiple tables. Furthermore, instead of writing T-SQL, you get to use your favorite .NET language, with all the support of Visual Studio as the IDE. The code that may not be accounted for initially is what you’ll have to write to handle transactions across documents. In one sense, it’s a pain to have to move all of that Azure SQL Database platform functionality into application code. On the other hand, once you’re done you’ll have an extremely fast and scalable data back end, because it’s focused solely on data shuttling. You also get a highly scalable middle tier by moving all of that logic previously trapped in the RDBMS into a proper middle-tier layer.

One last note of some significance: due to the nature of the data store, the data size will likely increase, because every document holds its field names along with its data. While this may not matter much for most solutions, given the low cost of space in Azure Storage, it’s still something that needs to be accounted for in the design.
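A quick way to see the effect: because each document carries its own field names, the serialized size grows with key length as well as with the data. A rough illustration using JSON (BSON behaves similarly in this respect, storing every key in every document):

```python
import json

row = ("172-32-1176", "White")

# The same row stored with long vs. short field names.
verbose = {"author_identifier": row[0], "author_last_name": row[1]}
terse   = {"au_id": row[0], "au_lname": row[1]}

# Same values, same structure -- the difference is purely key length,
# and every document in the collection pays it.
overhead = len(json.dumps(verbose)) - len(json.dumps(terse))
print(overhead)  # -> 20
```

Twenty bytes is trivial for one row, but multiplied across millions of documents it's the kind of growth the design should budget for.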

Final Thoughts

Once the data is available in MongoDB, working with it will, in many regards, feel familiar.

As of version 1.4 of the C# driver (currently at 1.5.0.4566), LINQ support is greatly improved, so writing the code won’t feel completely unfamiliar. If your project or solution might benefit from a NoSQL data store such as MongoDB, don’t let the syntax frighten you; the adjustment will be minimal. Keep in mind, however, that there are some important differences between a mature, robust RDBMS platform, such as Azure SQL Database, and MongoDB. For example, health and monitoring will require more manual work. Instead of monitoring only some number of Azure SQL Database instances, you’ll have to monitor the host worker roles, the Azure Blob storage that hosts the database files and the log files of MongoDB itself.

NoSQL solutions offer great performance for some database operations, and some useful and interesting features that can really be a boon to a solution development team. If you have a large amount of data and you’re on a limited budget, the MongoDB on Azure option might be a great addition to your solution architecture.


Joseph Fultz is a software architect at Hewlett-Packard Co., working as part of the HP.com Global IT group. Previously he was a software architect for Microsoft, working with its top-tier enterprise and ISV customers to define architecture and design solutions.

Thanks to the following technical expert for reviewing this article: Wen-ming Ye