The Working Programmer
Cassandra NoSQL Database, Part 3: Clustering
Last time, I examined Apache Cassandra, the “open source, distributed, decentralized, elastically scalable, highly available, fault-tolerant, tuneably consistent, column-oriented database that bases its distribution design on Amazon Dynamo and its data model on Google Bigtable,” as described in the book, “Cassandra: The Definitive Guide” (O’Reilly Media, 2010). To be more precise, having installed Cassandra (in the first part in this series), I looked at how to program to it from the Microsoft .NET Framework, doing the basic bits of reading and writing data to it. Nothing spectacular.
In fact, part of Cassandra’s “spectacular” is wrapped up in its inherent abilities to cluster well, giving Cassandra easy scale-out. This means it can grow out to “ridiculous” sizes—in most cases with little to no administrative effort—particularly when compared against the work required by most relational databases to store equivalent sizes. As an example, a local tech firm here in Redmond, Wash. (where I live), claimed at a recent startup meetup that it was storing more than 50PB of data in Cassandra.
Even allowing for exaggeration and hyperbole, just one-tenth of that (5PB, or more than 5,000TB) is a pretty hefty chunk of data. To be fair, the Cassandra Web site (cassandra.apache.org) states, “The largest known Cassandra cluster has over 300 terabytes of data in over 400 machines,” which is still pretty hard to do with an out-of-the-box relational setup.
But the key to all that storage is in the cluster, and while getting a cluster of that size set up in production is probably beyond the scope of this article, we can at least start to play with it by getting a multi-node cluster running for development work. It requires a few steps, so I’ll walk through it one step at a time. (By the way, DataStax has an easy install for Cassandra, but as near as I can tell it lacks the ability to set up a multi-node cluster on one box; that’s about its only downside that I can see thus far.)
In the first article in this series (msdn.microsoft.com/magazine/jj553519), I went through the (sometimes agonizing) pain of setting up Cassandra from the .zip file and the command line: Make sure a Java runtime is installed and on the PATH; make sure a JAVA_HOME environment variable is configured; unzip the Cassandra distribution into a directory; and then launch the “cassandra.bat” file from the “bin” directory to get the server up and running.
At the time, it may have seemed really anachronistic to do so, but two positive things come from doing the install that way. First, you get some experience in how to install a server written in Java (and that turns out to be a pretty useful skill to have, given how many of the different NoSQL implementations are written in Java). Second, you’ll need to “trick out” that setup at a pretty low level to get Cassandra running multiple times on a single box.
You see, Cassandra’s notion of scalability comes from a “ring” of servers: multiple instances of the Cassandra service running on several boxes, each one storing a portion of the total data set. Then, when new data is written to the ring, Cassandra “gossips” (that’s the actual technical term for it) between the different nodes in the ring to put the data in the right place within the ring. In a well-administered ring, Cassandra will balance the data between the nodes evenly. Cassandra has a number of different strategies for writing the data out between the nodes, and it’s always possible to write a new custom strategy (assuming you’re comfortable writing Java), but for now I’m going to stick with the defaults to keep things easier.
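To make the idea concrete, here is a toy Python sketch of ring-style placement in the spirit of Cassandra 1.1's default RandomPartitioner (which hashes row keys with MD5 onto a 0..2^127 token space); the node names, token choices and key format are all illustrative, not Cassandra's actual code:

```python
# Toy sketch of ring-style data placement, in the spirit of Cassandra's
# RandomPartitioner. Illustrative only; not Cassandra's real implementation.
import hashlib
from bisect import bisect_left

RING_MAX = 2**127
# Two nodes with tokens that split the ring into two equal halves.
tokens = [0, RING_MAX // 2]
nodes = ["node1", "node2"]

def token_for(row_key: str) -> int:
    """Hash a row key onto the ring (RandomPartitioner uses MD5 similarly)."""
    return int(hashlib.md5(row_key.encode()).hexdigest(), 16) % RING_MAX

def node_for(row_key: str) -> str:
    """A row lands on the node owning the first token >= the row's token,
    wrapping around the ring if necessary."""
    i = bisect_left(tokens, token_for(row_key)) % len(nodes)
    return nodes[i]

# With an even token split, rows distribute roughly 50/50.
counts = {"node1": 0, "node2": 0}
for k in range(10000):
    counts[node_for("row-%d" % k)] += 1
print(counts)
```

Because the hash spreads keys uniformly across the token space, evenly spaced tokens yield an evenly balanced ring, which is exactly what you'll see the two nodes negotiate later in this article.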
One Ring to Rule Them All …
Normally, the easiest way to set up a Cassandra cluster is to have multiple machines, and obviously one way to do that on a single laptop is to set up multiple virtual machine instances all running simultaneously. But that can get unwieldy and amp up the hardware requirements pretty quickly, particularly if you’re one of those developers who does everything off a laptop (like me).
Thus, the second way to get multiple nodes is to have Cassandra run multiple times on the same box, storing data in multiple places and listening on different socket ports. This means diving into Cassandra’s configuration files to set up two (or more) different configuration setups, and launching each.
Assuming a Cassandra 1.1 install (the latest version as of this writing), Cassandra stores all her configuration information in the /conf directory. Within that directory, there are two files in particular I need to edit: log4j-server.properties and cassandra.yaml. I also need to figure out where the nodes’ data and logs are going to go, so I’ll go ahead and just create two subdirectories under the Cassandra install directory. Assuming you installed Cassandra at C:\Prg\apache-cassandra-1.1.0 (as I did), then you want to create two new directories underneath that, one for each node you’re going to create: C:\Prg\apache-cassandra-1.1.0\node1 and \node2.
Within those two directories, copy over the contents of the Cassandra /conf directory, which will bring over those two files you need. You also want to copy over the cassandra.bat file from /bin, because that’s where the third and final change will need to happen, in order to tell Cassandra where the configuration files she needs to run will be.
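Assuming the C:\Prg\apache-cassandra-1.1.0 install location used throughout this article, those steps look something like this at a command prompt (the paths are my choices; adjust for your own install):

```
cd C:\Prg\apache-cassandra-1.1.0
mkdir node1 node2
xcopy conf\*.* node1\
xcopy conf\*.* node2\
copy bin\cassandra.bat node1\
copy bin\cassandra.bat node2\
```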
Isn’t this Java stuff fun?
The first file, log4j-server.properties, is a configuration file for the log4j diagnostic logging open source project. (Java uses “.properties” files much like Windows used “.ini” files back in the day.) Your main interest here is to make sure that each Cassandra node is writing a diagnostic log file to a different place than the other nodes. Personally, I want all data for each node to be within those \node1 and \node2 directories, so I want to find the line inside log4j-server.properties that reads like this:
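In a stock 1.1 distribution, the line in question is the log4j file-appender path, which reads as follows (reconstructed from the default configuration; verify against your own copy):

```properties
log4j.appender.R.File=/var/log/cassandra/system.log
```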
Then I want to change it to read something more Windows-ish and more \node1-ish, like this:
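Something like the following, assuming the install path used throughout this article (the \node2 copy gets the same edit with node2 in the path):

```properties
log4j.appender.R.File=C:/Prg/apache-cassandra-1.1.0/node1/log/system.log
```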
The log directory doesn’t have to exist before Cassandra starts—she’ll create it if it isn’t there. By the way, make sure the slashes are forward slashes here. Just trust me on this one; it’ll work. (Java recognizes them whether they’re forward or backward slashes, but the properties file syntax uses backward slashes as escape sequence characters, sort of like how they work in C# strings.)
Second, you need to crack open the “cassandra.yaml” file to make the next set of changes. The “.yaml” syntax is YAML (“YAML Ain’t Markup Language,” though it originally stood for “Yet Another Markup Language”), and—yes, you guessed it—it’s another .ini-style configuration syntax. Java never standardized on this, so it’s quite common to see several different configuration styles all conjoined together in a single project (as in Cassandra).
Specifically, you need to change a couple of settings in here; these are scattered throughout the file (which, by the way, is littered with tons of comments, so they’re really somewhat self-explanatory if you read through it all):
cluster_name: 'Test Cluster'
data_file_directories:
    - /var/lib/cassandra/data
commitlog_directory: /var/lib/cassandra/commitlog
saved_caches_directory: /var/lib/cassandra/saved_caches
listen_address: localhost
rpc_address: localhost
The “cluster_name” is optional, but it’s not a bad thing to change anyway, maybe to something such as “MyCluster” or “Big Cluster O Fun.” The rest of the settings, however, need to be changed. The “directories” entries need to point to the \node1 and \node2 directories, respectively.
One Ring to Find Them All …
The last two settings need to be changed for different reasons. Cassandra, remember, instinctively wants to run as one service per machine, so she assumes that it’s OK to just bind a TCP/IP socket to “localhost.” But if you have two or more services running on the same box, that’s not going to work. So you need to tell her to bind to addresses that will effectively resolve to the same box, even though they might be different values. Fortunately, you can do this by explicitly putting 127.0.0.1 for node1, 127.0.0.2 for node2 and so forth.
(You might be asking why this works; the answer is beyond the scope of this article, but any good TCP/IP reference should be able to explain it. If you’re not convinced, try “ping 127.0.0.1” and “ping 127.0.0.2” on your box. Both should resolve just fine. If you don’t like specifying those values, you can always assign them names in your “hosts” file in the C:\Windows\System32\drivers\etc directory.)
Part of the reason that Cassandra needs this network configuration worked out is because she’s going to “discover” the ring by first connecting to a “seed” node, which will then tell that instance about the other nodes in the ring. This is all part of the gossip protocol that she uses to convey important information around the ring. If we were setting up the ring to run across different machines, Cassandra would need the “seeds” configuration setting to point to a running node, but in this case—because we’re all running on the same box—the default 127.0.0.1 works out just fine.
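For reference, the seed list lives in cassandra.yaml under the “seed_provider” setting; in a stock 1.1 file the default looks like this (reproduced from the default configuration; verify against your own copy):

```yaml
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "127.0.0.1"
```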
After all the changes, the cassandra.yaml file in \node1 should look like this:
cluster_name: 'Test Cluster'
data_file_directories:
    - C:/Prg/apache-cassandra-1.1.0/node1/data
commitlog_directory: C:/Prg/apache-cassandra-1.1.0/node1/commitlog
saved_caches_directory: C:/Prg/apache-cassandra-1.1.0/node1/saved_caches
listen_address: localhost
rpc_address: localhost

For \node2, the file should look like this:

cluster_name: 'Test Cluster'
data_file_directories:
    - C:/Prg/apache-cassandra-1.1.0/node2/data
commitlog_directory: C:/Prg/apache-cassandra-1.1.0/node2/commitlog
saved_caches_directory: C:/Prg/apache-cassandra-1.1.0/node2/saved_caches
listen_address: 127.0.0.2
rpc_address: 127.0.0.2
Finally, Cassandra needs to be told when she starts up where to find the configuration files, and normally she does this by looking along the Java CLASSPATH (which is vaguely similar to the assembly resolution mechanism in the .NET Framework, but about a half-decade more primitive, to be blunt). She also wants to expose some management and monitoring information to JMX (the Java equivalent to PerfMon or Windows Management Instrumentation) over a TCP/IP port, and both services can’t use the same port. Thus, the final changes have to be to cassandra.bat:
REM Ensure that any user defined CLASSPATH variables are not used on startup
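The line immediately after that comment is the one to change. In the stock cassandra.bat it points at \conf; in the \node1 copy, point it at \node1 instead, so the node1 configuration files are found first (the stock value is reconstructed here; check your own file):

```bat
set CLASSPATH="%CASSANDRA_HOME%\node1"
```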
And for the cassandra.bat in \node2:
REM Ensure that any user defined CLASSPATH variables are not used on startup
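The line right after that comment sets the CLASSPATH; in the \node2 copy, point it at \node2 (the stock file points it at \conf):

```bat
set CLASSPATH="%CASSANDRA_HOME%\node2"
```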
As well as the following line in \node2:
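The JVM options in cassandra.bat include the JMX port, and the \node2 copy needs its own, unused port (8199 here is simply an arbitrary free-port choice):

```bat
 -Dcom.sun.management.jmxremote.port=8199^
```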
In the original, the port will read “7199.”
Like I said, isn’t this Java stuff fun?
… And in the Darkness Bind Them
But once all the configuration stuff gets out of the way, the fun starts. Fire up a command-prompt window (one with JAVA_HOME and CASSANDRA_HOME environment variables pointing to the root of the JDK and Cassandra install directories, remember), and change directory over to the \node1 directory you’ve been tricking out. Fire off “cassandra -f” at the prompt, and watch the diagnostic info scroll by. This is the first instance, and assuming all the configuration settings are good (no typos), you should see the text scroll by and end with “Listening for thrift clients …”
Now, in a second command-prompt window, navigate over to \node2 and do the same thing. This time, as it fires up, you’ll also see some activity happen in a few minutes in the \node1 window—what’s happening there is that after the \node2 instance gets up and running, it connects to the \node1 instance (the “seed”), and the two essentially configure each other to start working in a ring together. In particular, look for the two lines “JOINING: waiting for ring and schema information” and “Node /127.0.0.1 is now part of the cluster” to appear in the \node2 window, and “Node /127.0.0.2 is now part of the cluster” and “InetAddress /127.0.0.2 is now UP” in the \node1 window.
But, if you missed seeing those messages, Cassandra has one more surprise in store for you. In a third command-prompt window, go to the original Cassandra \bin directory and launch “nodetool ring -h 127.0.0.1,” and you should see something like Figure 1.
Figure 1 Two Cassandra Instances, Each Owning 50 Percent of the Data
This is really exciting stuff, because as you can see from the Owns column, the two Cassandra instances have already figured out that each one should own 50 percent of the data, without any additional configuration work on your part. Sweet!
The best part is, if you run the code from the previous article, the data will be spread across the cluster without any additional changes.
It’s a Complement, Not a Replacement
Like some of the other database tools this column has explored (MongoDB and SQLite), Cassandra shouldn’t be considered as a wholesale replacement for a relational database, but as a complementary technology that can be used either for areas where the feature set of a relational database just doesn’t fit well (caching or storing highly unstructured data sets come to mind, for example), or as a hybrid system, in conjunction with a relational database. For example, a company might store a “fixed” set of data elements in a relational database and include as one of the relational columns a Cassandra key, in order to retrieve the remaining, unstructured data. The relational database can then remain structured and relational (obeying most or all of the normal-form rules), but the system overall will still have the flexibility to store additional unanticipated data elements that users always seem to want to add to the system as it ages.
For another example, consider Web page hit data, which would always be keyed off the page itself, yet would easily track into the millions or billions of elements of data. A URL-shortening service (such as bit.ly) would be trivial to do here, because the minimized URL path (the “foobar” part in http://bit.ly/foobar) would be the key, and hit data stats—as well as an optional description and even perhaps a periodic snapshot of the redirected URL—would be made for Cassandra. And so on.
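As a purely illustrative sketch (the row layout and names here are mine, not a real Cassandra schema), the shortener’s data might look like this, with each row keyed by the minimized path and free to carry a different set of columns:

```python
# Toy in-memory picture of the URL-shortener rows described above:
# each row is keyed by the minimized path, and its "columns" can vary
# from row to row, which is the flexibility Cassandra's model offers.
rows = {
    "foobar": {
        "target": "http://example.com/some/very/long/url",
        "hits": 1042,
        "description": "optional; only present on some rows",
    },
    "bazqux": {
        "target": "http://example.com/another",
        "hits": 7,
        # no description column on this row, and that's fine
    },
}

def record_hit(path: str) -> int:
    """Bump and return the hit count for a shortened path."""
    row = rows[path]
    row["hits"] += 1
    return row["hits"]

print(record_hit("foobar"))  # 1043
```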
Cassandra isn’t going to take over the datacenter anytime soon, nor should it. But when used intelligently, it’s a powerful new tool in the toolbox, and developers would be foolish to ignore it. There’s a lot more to explore about Cassandra, but it’s time to let the Trojan prophetess go and move on to other things.
Ted Neward is an architectural consultant with Neudesic LLC. He has written more than 100 articles and authored or coauthored a dozen books, including “Professional F# 2.0” (Wrox, 2010). He’s an F# MVP and noted Java expert, and speaks at both Java and .NET conferences around the world. He consults and mentors regularly—reach him at Ted.Neward@neudesic.com if you’d like him to come work with your team. He blogs at blogs.tedneward.com and can be followed on Twitter at twitter.com/tedneward.
Thanks to the following technical expert for reviewing this article: Kelly Sommers