June 2010

Volume 25 Number 06

The Working Programmer - Going NoSQL with MongoDB, Part 2

By Ted Neward | June 2010

In my previous article, MongoDB’s basics took front and center: getting it installed and running, and inserting and finding data. However, I covered only the basics—the data objects used were simple name/value pairs. That made sense, because MongoDB’s “sweet spot” includes unstructured and relatively simple data structures. But surely this database can store more than just simple name/value pairs.

In this article, we’ll use a slightly different method to investigate MongoDB (or any technology). The procedure, called an exploration test, will help us find a possible bug in the server and, along the way, highlight one of the common issues object-oriented developers will run into when using MongoDB.

In Our Last Episode …

First we’ll make sure we’re all on the same page, and we’ll also cover some slightly new ground. Let’s look at MongoDB in a bit more structured fashion than we did in the previous article (msdn.microsoft.com/magazine/ee310029). Rather than just create a simple application and hack on it, let’s kill two birds with one stone and create exploration tests—code segments that look like unit tests but that explore functionality rather than try to verify it.

Writing exploration tests serves several different purposes when investigating a new technology. One, they help discover whether the technology under investigation is inherently testable (with the assumption that if it’s hard to exploration-test, it’s going to be hard to unit-test—a huge red flag). Two, they serve as a sort of regression suite when a new version of the technology under investigation comes out, because they give a heads-up if old functionality no longer works. And three, since tests should be relatively small and granular, exploration tests inherently make learning a technology easier by creating new “what-if” cases that build on previous cases.

But unlike unit tests, exploration tests aren’t continuously developed alongside the application, so once you consider the technology learned, set the tests aside. Don’t discard them, however—they can also help separate bugs in application code from those in the library or framework. The tests do so by providing a lightweight, application-neutral environment for experimentation without the overhead of the application.

With that in mind, let’s create MongoDB-Explore, a Visual C# test project. Add MongoDB.Driver.dll to the list of assembly references and build to make sure everything is good to go. (Building should pick up the one TestMethod that’s generated as part of the project template. It will pass by default, so everything should be good, which means that if the project fails to build, something’s screwed up in the environment. Checking assumptions is always a good thing.)
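
For reference, the project template’s generated test looks something like the following. This is a rough sketch of what Visual Studio stamps out (the class and method names are whatever the template chooses), shown here only to make clear why the empty test passes by default:

using Microsoft.VisualStudio.TestTools.UnitTesting;

namespace MongoDB_Explore
{
  [TestClass]
  public class UnitTest1
  {
    [TestMethod]
    public void TestMethod1()
    {
      // The template's test body is empty, so it always passes: just enough
      // to prove the project builds and the test runner can discover and
      // execute tests.
    }
  }
}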

As tempting as it would be to jump into writing code right away, though, a problem surfaces pretty quickly: MongoDB needs the external server process (mongod.exe) to be running before client code can connect against it and do anything useful. While it’s tempting to simply say “Fine, fine, let’s start it and get back to writing code,” there’s a corollary problem. It’s an almost sure bet that at some point, 15 weeks later when looking back at this code, some poor developer (you, me or a teammate) will try to run these tests, see them all fail and lose two or three days trying to figure out what’s going on before she thinks to look to see if the server’s running.

Lesson: Try to capture all the dependencies in the tests somehow. The issue will arise again during unit-testing, anyway. At that point we’ll need to start from a clean server, make some changes and then undo them all. That’s easiest to accomplish by simply stopping and starting the server, so solving it now saves time later.

This idea of running something before testing (or after, or both) isn’t a new one, and Microsoft Test and Lab Manager projects can have both per-test and per-test-suite initializers and cleanup methods. These are adorned by the custom attributes ClassInitialize and ClassCleanup for per-test-suite bookkeeping and TestInitialize and TestCleanup for per-test bookkeeping. (See “Working with Unit Tests” for more details.) Thus, a per-test-suite initializer will launch the mongod.exe process, and the per-test-suite cleanup will shut the process down, as shown in Figure 1.

Figure 1 Partial Code for Test Initializer and Cleanup

using System.Diagnostics;
using System.IO;
using Microsoft.VisualStudio.TestTools.UnitTesting;

namespace MongoDB_Explore
{
  [TestClass]
  public class UnitTest1
  {
    private static Process serverProcess;

    [ClassInitialize]
    public static void MyClassInitialize(TestContext testContext)
    {
      // Walk up from the test-results directory to find the
      // mongodb-bin directory that holds mongod.exe
      DirectoryInfo projectRoot =
        new DirectoryInfo(testContext.TestDir).Parent.Parent;
      var mongodbbindir =
        projectRoot.Parent.GetDirectories("mongodb-bin")[0];
      var mongod =
        mongodbbindir.GetFiles("mongod.exe")[0];

      var psi = new ProcessStartInfo
      {
        FileName = mongod.FullName,
        Arguments = "--config mongo.config",
        WorkingDirectory = mongodbbindir.FullName
      };

      serverProcess = Process.Start(psi);
    }

    [ClassCleanup]
    public static void MyClassCleanup()
    {
      // Ask the server to shut down; kill it if it hasn't exited in 5 seconds
      serverProcess.CloseMainWindow();
      serverProcess.WaitForExit(5 * 1000);
      if (!serverProcess.HasExited)
        serverProcess.Kill();
    }
...

The first time this runs, a dialog box will pop up informing the user that the process is starting. Clicking OK will make the dialog go away ... until the next time the test is run. Once that dialog gets too annoying, check the box that says “Never show this dialog box again” to make the message go away for good. If firewall software is running, such as Windows Firewall, a prompt will likely make an appearance here also, because the server wants to open a port to receive client connections. Apply the same treatment and everything should run silently. Put a breakpoint on the first line of the cleanup code to verify the server is running, if desired.
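
If the tests occasionally fire before mongod.exe has finished starting up, one way to guard against that race (not part of the original setup; a sketch that assumes the server listens on MongoDB’s default port, 27017, unless mongo.config says otherwise) is to have the class initializer poll the port until it accepts a connection:

// Hypothetical helper: call this at the end of MyClassInitialize to block
// until mongod.exe accepts TCP connections (or the retries run out).
private static void WaitForServer(int port = 27017, int retries = 20)
{
  for (int i = 0; i < retries; i++)
  {
    try
    {
      using (var client = new System.Net.Sockets.TcpClient())
      {
        client.Connect("localhost", port);
        return;   // the server is up; tests can proceed
      }
    }
    catch (System.Net.Sockets.SocketException)
    {
      System.Threading.Thread.Sleep(500);   // not up yet; wait and retry
    }
  }
  Assert.Fail("mongod.exe never started listening on port " + port);
}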

Once the server is running, tests can start firing—except another problem surfaces: Each test wants to work with its own fresh database, but it’s helpful for the database to have some pre-existing data to make testing certain things (queries, for example) easier. It would be nice if each test could have its own fresh set of pre-existing data. That will be the role of the TestInitialize- and TestCleanup-adorned methods.

But before we get to that, let’s look at this quick TestMethod, which tries to ensure that the server can be found, a connection made, and an object inserted, found and removed, to bring the exploration tests up to speed with what we covered in the previous article (see Figure 2).

Figure 2 TestMethod to Make Sure the Server Can Be Found and a Connection Made

[TestMethod]
public void ConnectInsertAndRemove()
{
  Mongo db = new Mongo();
  db.Connect();

  Document ted = new Document();
  ted["firstname"] = "Ted";
  ted["lastname"] = "Neward";
  ted["age"] = 39;
  ted["birthday"] = new DateTime(1971, 2, 7);
  db["exploretests"]["readwrites"].Insert(ted);
  Assert.IsNotNull(ted["_id"]);

  Document result =
    db["exploretests"]["readwrites"].FindOne(
      new Document().Append("lastname", "Neward"));
  Assert.AreEqual(ted["firstname"], result["firstname"]);
  Assert.AreEqual(ted["lastname"], result["lastname"]);
  Assert.AreEqual(ted["age"], result["age"]);
  Assert.AreEqual(ted["birthday"], result["birthday"]);

  db.Disconnect();
}

If this code runs, it trips an assertion and the test fails. In particular, the last assertion, the one around “birthday,” fires. Apparently, sending a DateTime into the MongoDB database without an explicit time component doesn’t round-trip quite correctly: the value goes in as a date with an associated time of midnight but comes back as a date with an associated time of 8 a.m., which breaks the AreEqual assertion at the end of the test.

This highlights the usefulness of the exploration test—without it (as is the case, for example, with the code from the previous article), this little MongoDB characteristic might have gone unnoticed until weeks or months into the project. Whether this is a bug in the MongoDB server is a value judgment and not something to be explored right now. The point is, the exploration test put the technology under the microscope, helping isolate this “interesting” behavior. That lets developers looking to use the technology make their own decisions as to whether this is a breaking change. Forewarned is forearmed.

Fixing the code so the test passes, by the way, requires the DateTime that comes back from the database to be converted to local time. I brought this up in an online forum, and according to the response from the MongoDB.Driver author, Sam Corder, “All dates going in are converted to UTC but left as UTC coming back out.” So you must either convert the DateTime to UTC via DateTime.ToUniversalTime before storing it, or else convert any DateTime retrieved from the database back to the local time zone via DateTime.ToLocalTime, as the following sample code does:

Assert.AreEqual(ted["birthday"],
  ((DateTime)result["birthday"]).ToLocalTime());
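
The first option, normalizing to UTC before the value ever goes in, would look something like the following. This is a minimal sketch built on the Figure 2 test; because the stored value is already UTC, the UTC value that comes back compares equal without any further conversion:

// Store the birthday already normalized to UTC ...
ted["birthday"] = new DateTime(1971, 2, 7).ToUniversalTime();
db["exploretests"]["readwrites"].Insert(ted);
// ... and the UTC value read back by FindOne then compares equal as-is
Assert.AreEqual(ted["birthday"], result["birthday"]);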

This in itself highlights one of the great advantages of community efforts—typically the principals involved are only an e-mail away.

Adding Complexity

Developers looking to use MongoDB need to understand that, contrary to initial appearances, it isn’t an object database—that is, it can’t handle arbitrarily complex object graphs without help. There are a few conventions that deal with ways to provide that help, but thus far doing so remains on the developer’s shoulders.

For example, consider Figure 3, a simple collection of objects designed to reflect the storage of a number of documents describing a well-known family. So far so good. In fact, while it’s at it, the test really should query the database for those objects inserted, as shown in Figure 4, just to make sure they’re retrievable. And … the test passes. Awesome.

Figure 3 A Simple Object Collection

[TestMethod]
public void StoreAndCountFamily()
{
  Mongo db = new Mongo();
  db.Connect();

  var peter = new Document();
  peter["firstname"] = "Peter";
  peter["lastname"] = "Griffin";

  var lois = new Document();
  lois["firstname"] = "Lois";
  lois["lastname"] = "Griffin";

  var cast = new[] {peter, lois};
  db["exploretests"]["familyguy"].Insert(cast);
  Assert.IsNotNull(peter["_id"]);
  Assert.IsNotNull(lois["_id"]);

  db.Disconnect();
}

Figure 4 Querying the Database for Objects

[TestMethod]
public void StoreAndCountFamily()
{
  Mongo db = new Mongo();
  db.Connect();

  var peter = new Document();
  peter["firstname"] = "Peter";
  peter["lastname"] = "Griffin";

  var lois = new Document();
  lois["firstname"] = "Lois";
  lois["lastname"] = "Griffin";

  var cast = new[] {peter, lois};
  db["exploretests"]["familyguy"].Insert(cast);
  Assert.IsNotNull(peter["_id"]);
  Assert.IsNotNull(lois["_id"]);

  ICursor griffins =
    db["exploretests"]["familyguy"].Find(
      new Document().Append("lastname", "Griffin"));
  int count = 0;
  foreach (var d in griffins.Documents) count++;
  Assert.AreEqual(2, count);

  db.Disconnect();
}

Actually, that might not be entirely true—readers following along at home and typing in the code might find that the test doesn’t pass after all, reporting that the count of objects retrieved doesn’t match the expected 2. This is because, as databases are expected to do, this one retains state across invocations, and because the test code isn’t explicitly removing those objects, they remain from one test run to the next.

This highlights another feature of the document-oriented database: Duplicates are fully expected and allowed. That’s why each document, once inserted, is tagged with an implicit _id attribute holding a unique identifier, which becomes, in effect, the document’s primary key.
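
A quick exploration test, in the same style as the earlier ones, makes the point. This is a sketch (the test name and collection name are made up for illustration) showing that two documents with identical contents both go in, each with its own server-assigned _id:

[TestMethod]
public void DuplicatesAreAllowed()
{
  Mongo db = new Mongo();
  db.Connect();

  var one = new Document();
  one["firstname"] = "Peter";
  one["lastname"] = "Griffin";

  var two = new Document();
  two["firstname"] = "Peter";
  two["lastname"] = "Griffin";

  // Both inserts succeed even though the contents are identical
  db["exploretests"]["dupes"].Insert(new[] { one, two });

  // ... but each document carries its own unique _id
  Assert.AreNotEqual(one["_id"].ToString(), two["_id"].ToString());

  db.Disconnect();
}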

So, if the tests are going to pass, the database needs to be cleared before each test runs. While it’s pretty easy to just delete the files in the directory where MongoDB stores them, again, having this done automatically as part of the test suite is vastly preferable. Each test can do so manually after completion, which could get to be a bit tedious over time. Or the test code can take advantage of the TestInitialize and TestCleanup feature of Microsoft Test and Lab Manager to capture the common code (and why not include the database connect and disconnect logic), as shown in Figure 5.

Figure 5 Taking Advantage of TestInitialize and TestCleanup

private Mongo db;

[TestInitialize]
public void DatabaseConnect()
{
  db = new Mongo();
  db.Connect();
}

[TestCleanup]
public void CleanDatabase()
{
  db["exploretests"].MetaData.DropDatabase();

  db.Disconnect();
  db = null;
}

Though the last line of the CleanDatabase method is unnecessary because the next test will overwrite the field reference with a new Mongo object, sometimes it’s best to make it clear that the reference is no longer good. Caveat emptor. The important thing is that the test-dirtied database is dropped, emptying the files MongoDB uses to store the data and leaving everything fresh and sparkly clean for the next test.
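
Earlier I noted that it would be nice for each test to start with a fresh set of pre-existing data; the same TestInitialize hook is the natural place to seed it. Here’s a sketch (the seed document and the “prepopulated” collection name are made up, and the seed deliberately goes into its own collection so it doesn’t skew the counts in the familyguy tests):

[TestInitialize]
public void DatabaseConnect()
{
  db = new Mongo();
  db.Connect();

  // Hypothetical seed data for query-oriented tests; it lives in its own
  // collection so the familyguy-based assertions elsewhere are unaffected
  var seed = new Document();
  seed["firstname"] = "Stewie";
  seed["lastname"] = "Griffin";
  db["exploretests"]["prepopulated"].Insert(seed);
}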

As things stand, however, the family model is incomplete—the two people referenced are a couple, and given that, they should have a reference to each other as spouses, as shown here:

peter["spouse"] = lois;

  lois["spouse"] = peter;

Running this in the test, however, produces a StackOverflowException—the MongoDB driver serializer doesn’t natively understand the notion of circular references and naively follows the references around ad infinitum. Oops. Not good.

Fixing this requires you to choose one of two options. With one, the spouse field can be populated with the other document’s _id field (once that document has been inserted) and updated, as shown in Figure 6.

Figure 6 Overcoming the Circular References Problem

[TestMethod]
public void StoreAndCountFamily()
{
  var peter = new Document();
  peter["firstname"] = "Peter";
  peter["lastname"] = "Griffin";

  var lois = new Document();
  lois["firstname"] = "Lois";
  lois["lastname"] = "Griffin";

  var cast = new[] {peter, lois};
  var fg = db["exploretests"]["familyguy"];
  fg.Insert(cast);
  Assert.IsNotNull(peter["_id"]);
  Assert.IsNotNull(lois["_id"]);

  peter["spouse"] = lois["_id"];
  fg.Update(peter);
  lois["spouse"] = peter["_id"];
  fg.Update(lois);

  Assert.AreEqual(peter["spouse"], lois["_id"]);
  TestContext.WriteLine("peter: {0}", peter.ToString());
  TestContext.WriteLine("lois: {0}", lois.ToString());
  Assert.AreEqual(
    fg.FindOne(new Document().Append("_id",
      peter["spouse"])).ToString(),
    lois.ToString());

  ICursor griffins =
    fg.Find(new Document().Append("lastname", "Griffin"));
  int count = 0;
  foreach (var d in griffins.Documents) count++;
  Assert.AreEqual(2, count);
}

There’s a drawback to the approach, though: It requires that the documents be inserted into the database and their _id values (which are Oid instances, in MongoDB.Driver parlance) be copied into the spouse fields of each document as appropriate. Then each document is updated again. Although round-trips to the MongoDB database are fast compared with updates against a traditional RDBMS, this approach is still somewhat wasteful.

A second approach is to pre-generate the Oid values for each document, populate the spouse fields, and then send the whole batch to the database, as shown in Figure 7.

Figure 7 A Better Way to Solve the Circular References Problem

[TestMethod]
public void StoreAndCountFamilyWithOid()
{
  var peter = new Document();
  peter["firstname"] = "Peter";
  peter["lastname"] = "Griffin";
  peter["_id"] = Oid.NewOid();

  var lois = new Document();
  lois["firstname"] = "Lois";
  lois["lastname"] = "Griffin";
  lois["_id"] = Oid.NewOid();

  peter["spouse"] = lois["_id"];
  lois["spouse"] = peter["_id"];

  var cast = new[] { peter, lois };
  var fg = db["exploretests"]["familyguy"];
  fg.Insert(cast);

  Assert.AreEqual(peter["spouse"], lois["_id"]);
  Assert.AreEqual(
    fg.FindOne(new Document().Append("_id",
      peter["spouse"])).ToString(),
    lois.ToString());

  Assert.AreEqual(2,
    fg.Count(new Document().Append("lastname", "Griffin")));
}

This approach requires only the Insert method, because now the Oid values are known ahead of time. Note, by the way, that the ToString calls on the assertion test are deliberate—this way, the documents are converted to strings before being compared.

What’s really important to notice about the code in Figure 7, though, is that de-referencing the document referenced via the Oid can be relatively difficult and tedious because the document-oriented style assumes that documents are more or less stand-alone or hierarchical entities, not a graph of objects. (Note that the .NET driver provides DBRef, which provides a slightly richer way of referencing/dereferencing another document, but it’s still not going to make this into an object-graph-friendly system.) Thus, while it’s certainly possible to take a rich object model and store it into a MongoDB database, it’s not recommended. Stick to storing tightly clustered groups of data, using Word or Excel documents as a guiding metaphor. If something can be thought of as a large document or spreadsheet, then it’s probably a good fit for MongoDB or some other document-oriented database.
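
For completeness, the DBRef route mentioned above would look roughly like the following. Treat this strictly as a sketch: it assumes this version of the driver exposes a DBRef constructor taking a collection name and an _id, plus a FollowReference call on the database object to resolve one; check the driver source for the exact names and signatures before relying on them.

// Sketch only: DBRef(collection, id) and FollowReference are assumptions
// about this driver build; verify against the MongoDB.Driver source.
peter["_id"] = Oid.NewOid();
lois["_id"] = Oid.NewOid();
peter["spouse"] = new DBRef("familyguy", lois["_id"]);
lois["spouse"] = new DBRef("familyguy", peter["_id"]);

var fg = db["exploretests"]["familyguy"];
fg.Insert(new[] { peter, lois });

// Resolving the reference is an explicit call, not automatic graph traversal
Document peterSpouse =
  db["exploretests"].FollowReference((DBRef)peter["spouse"]);
Assert.AreEqual("Lois", peterSpouse["firstname"]);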

More to Explore

We’ve finished our initial investigation of MongoDB, but before we wrap up the subject, there are a few more things to explore, including predicate queries, aggregates, LINQ support and some notes on production administration. We’ll tackle those next month. (That article is going to be a pretty busy piece!) In the meantime, explore the MongoDB system, and be sure to drop me an e-mail with suggestions for future columns.


Ted Neward  is a principal with Neward & Associates, an independent firm specializing in enterprise .NET Framework and Java platform systems. He has written more than 100 articles, is a C# MVP, INETA speaker and the author or coauthor of a dozen books, including the forthcoming “Professional F# 2.0” (Wrox). He consults and mentors regularly. Reach him at ted@tedneward.com and read his blog at blogs.tedneward.com.

Thanks to the following technical expert for reviewing this article: Sam Corder