Azure Insider: Microsoft Azure and Open Source Power Grid Computing
Something like this is already in place: the Search for Extraterrestrial Intelligence (SETI) project. The search for extraterrestrial life uses large-scale grid, or distributed, computing over the Internet. It monitors space for signs of transmissions from alien civilizations by analyzing electromagnetic radiation in the microwave spectrum. It’s a good example of the power of grid computing.
The genesis of this work came out of the participation of Microsoft in one of the world’s largest hackathons, TechCrunch Disrupt 2013. Microsoft took third place out of 280 teams. You can see the entire solution at tcrn.ch/OkIchx.
The challenge at a competition like this is you only have two days to complete a project before the judges come in and shoot you down. Besides dealing with sleep deprivation, you have to leverage as many prebuilt components as possible to complete the project on time. Most, if not all, of the technology used in the competition was based on open source software running in Azure. The open source technologies used included Jade, Express, Socket.io, Bootstrap, jQuery and Node.js.
We relied heavily on the now ubiquitous Web Sockets standard. Web Sockets are part of the HTML5 initiative. They provide a full-duplex, bidirectional connection over which you can transmit messages between client and server. Web Sockets give the server a standardized way to send content to the browser without the client explicitly asking for it.
This let us exchange messages back and forth while keeping the connection open, providing the two-way communication and orchestration that a grid computing system needs. Modern browsers such as Firefox 6, Safari 6, Google Chrome 14, Opera 12.10 and Internet Explorer 10 (and later) support Web Sockets.
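As a rough illustration of that back-and-forth orchestration, here is a minimal sketch of a message envelope that a server and a browser could exchange over an open connection. The function names and the "task" and "result" message types are assumptions made for this example; they are not taken from the project's code.

```javascript
// Hypothetical message envelope for grid orchestration. The type names
// ("task", "result") are illustrative assumptions, not the project's protocol.
function encodeMessage(type, payload) {
  return JSON.stringify({ type: type, payload: payload });
}

// Parse a raw message and route it to the handler registered for its type.
// Returns the handler's return value, or null if no handler is registered.
function dispatchMessage(raw, handlers) {
  var message = JSON.parse(raw);
  if (!Object.prototype.hasOwnProperty.call(handlers, message.type)) {
    return null;
  }
  return handlers[message.type](message.payload);
}

// Server side: push a unit of work to a client without being asked.
var outgoing = encodeMessage('task', { jobId: 7, numbers: [1, 2, 3] });

// Client side: do the work and answer over the same open connection.
var reply = dispatchMessage(outgoing, {
  task: function (task) {
    var sum = task.numbers.reduce(function (a, b) { return a + b; }, 0);
    return encodeMessage('result', { jobId: task.jobId, sum: sum });
  }
});
```

In a real deployment the strings would travel over the Web Socket connection rather than being passed between functions in one process.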
Role of Web Sockets
Web Sockets start working when the client sends a Web Socket handshake request to the server in the form of an HTTP GET request that asks to upgrade the connection. With Web Sockets, what follows the handshake doesn’t conform to the standard HTTP protocol. Text frames are sent in both directions (full duplex), with each frame carrying a payload preceded by a small header. You can split larger messages across multiple data frames.
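To make the handshake and the frame header more concrete, the following sketch uses only the Node.js standard library. It follows the rules in RFC 6455: the server answers the client's Sec-WebSocket-Key with a SHA-1 based accept value, and each text frame starts with a header of 2, 4 or 10 bytes. The function names are our own, and in practice a package such as Socket.io handles all of this for you.

```javascript
var crypto = require('crypto');

// Fixed GUID defined by RFC 6455 for the opening handshake.
var WEBSOCKET_GUID = '258EAFA5-E914-47DA-95CA-C5AB0DC85B11';

// Compute the Sec-WebSocket-Accept value the server returns in response
// to the Sec-WebSocket-Key header sent by the client.
function computeAcceptKey(clientKey) {
  return crypto.createHash('sha1')
    .update(clientKey + WEBSOCKET_GUID)
    .digest('base64');
}

// Encode a string as a single, unmasked text frame (server to client).
// Byte 0 is 0x81: FIN bit set plus opcode 0x1 (text).
function encodeTextFrame(text) {
  var payload = Buffer.from(text, 'utf8');
  var header;
  if (payload.length < 126) {
    // Short payload: the length fits in the second header byte.
    header = Buffer.from([0x81, payload.length]);
  } else if (payload.length < 65536) {
    // Medium payload: marker 126 followed by a 16-bit length.
    header = Buffer.alloc(4);
    header[0] = 0x81;
    header[1] = 126;
    header.writeUInt16BE(payload.length, 2);
  } else {
    // Large payload: marker 127 followed by a 64-bit length,
    // written as the high and low 32-bit halves.
    header = Buffer.alloc(10);
    header[0] = 0x81;
    header[1] = 127;
    header.writeUInt32BE(Math.floor(payload.length / 4294967296), 2);
    header.writeUInt32BE(payload.length % 4294967296, 6);
  }
  return Buffer.concat([header, payload]);
}
```

Frames sent from the browser to the server are additionally masked, which this sketch leaves out.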
Running the entire project yourself is quite easy. You can view a brief video that shows the project in action at 1drv.ms/1d79pjo. Before watching the video, you can grab all the code from GitHub at bit.ly/1mgWWwc. Setting up the project to run is straightforward with Node.js:
- Start by installing Node.js from nodejs.org
- Install Git (git-scm.com) or GitHub (github.com)
- Clone your fork with a Git clone (bit.ly/1cZ1nZh)
- Use the Node.js package manager (NPM) to install the project’s packages by running npm install in the cloned directory
- Run npm start to launch the project
You’ll need to install the various Node.js packages highlighted in this column. You can download the packages using the NPM at npmjs.org. You can also learn how to install them with a right-click in Visual Studio at bit.ly/OBbtEF. To learn more about using Visual Studio with Node.js, check out Bruno’s blog post, “Getting Started with Node.js and Visual Studio” (bit.ly/1gzKkbj).
Focus on App.js
The final solution we created actually has two server-side processes. The first and most obvious server-side process is the one that’s breaking the large computing job into smaller pieces and distributing the work and data to connected client browsers. You’ll find that code in App.js.
There’s a second server-side process that provides a portal experience for managing and viewing the large computing jobs executing on the grid. You’ll find that code in Server.js. It provides a real-time dashboard experience, complete with live updating graphs and numbers through a browser (see Figure 1). Our column will focus on the App.js code.
Figure 1 High-Level Grid Architecture
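The job-splitting role of the first server-side process can be sketched in a few lines. This is a simplified illustration under our own assumptions, not the code in App.js: the function names, the chunk shape and the round-robin assignment are all hypothetical.

```javascript
// Split an array of work items into chunks of at most chunkSize items.
// Each chunk gets an id so results can be matched up later.
function splitJob(items, chunkSize) {
  if (!(chunkSize > 0)) {
    throw new Error('chunkSize must be a positive number');
  }
  var chunks = [];
  for (var i = 0; i < items.length; i += chunkSize) {
    chunks.push({
      chunkId: chunks.length,
      items: items.slice(i, i + chunkSize)
    });
  }
  return chunks;
}

// Hand chunks to connected clients in round-robin order.
// Returns an object mapping each client id to its list of chunks.
function assignChunks(chunks, clientIds) {
  if (clientIds.length === 0) {
    throw new Error('at least one connected client is required');
  }
  var assignments = {};
  clientIds.forEach(function (id) { assignments[id] = []; });
  chunks.forEach(function (chunk, index) {
    var clientId = clientIds[index % clientIds.length];
    assignments[clientId].push(chunk);
  });
  return assignments;
}

// Example: ten work items, four per chunk, two connected browsers.
var chunks = splitJob([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 4);
var plan = assignChunks(chunks, ['browser-a', 'browser-b']);
```

A production grid would also need to reassign chunks when a browser disconnects before returning its result, which this sketch does not attempt.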
Three Node.js packages (along with some supporting packages) greatly simplify implementing this architecture. For example, the popular express package helps with URL routes, request handling and views. It also simplifies things such as parsing payloads, handling cookies and storing sessions.
Taken together, all of the packages referenced in Figure 2 will dramatically reduce the amount of code you have to write. A good Node.js developer understands the language and the built-in capabilities. A great Node.js developer is familiar with the various packages and is skilled at using them efficiently. Do yourself a favor and familiarize yourself with the Node.js Packaged Modules library at npmjs.org.
Figure 2 The Bidirectional Communication of Grid Architecture
Let’s begin by examining the setup code for the Node.js server-side process. The workflow begins when the server opens a port and waits for connections. Both Express and Socket.io let the server listen for incoming browser connections on port 3000:
// Create a Web server, allowing the Express package to
// handle the requests.
var server = http.createServer(app);
// Socket.io injects itself into HTTP server, handling Socket.io
// requests, not handled by Express itself.
var io = socketio.listen(server);
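To show what it means for two handlers to share one HTTP server, here is a minimal sketch that uses only the built-in http module. An ordinary request handler stands in for Express, and an 'upgrade' listener stands in for the socket library. This is not how Socket.io is actually implemented, and createGridServer is a hypothetical name chosen for the example.

```javascript
var http = require('http');

// Build one HTTP server that serves two kinds of traffic.
function createGridServer() {
  // Sockets of clients that asked to upgrade the connection.
  var clients = new Set();

  // Ordinary HTTP requests (the role Express plays in this column).
  var server = http.createServer(function (req, res) {
    res.writeHead(200, { 'Content-Type': 'text/plain' });
    res.end('grid server: ' + clients.size + ' client(s) connected\n');
  });

  // Upgrade requests (the role the socket library plays). This sketch
  // only tracks the raw socket; a real implementation would complete
  // the Web Socket handshake before exchanging frames.
  server.on('upgrade', function (req, socket) {
    clients.add(socket);
    socket.on('close', function () {
      clients.delete(socket);
    });
  });

  return { server: server, clients: clients };
}

// To start listening on the port used in this column:
// createGridServer().server.listen(3000);
```

The listen call is left commented out so the sketch can be loaded without binding a port.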
The Jade Engine
You could adapt all of this for other general-purpose grid computing problems. We think the more important takeaway from this article is the power and flexibility of Node.js. The number of repos on GitHub for Node.js exceeds that for jQuery, a powerful testimony to how Node.js resonates with today’s developers.
We’d like to thank the startup and partner evangelists, whose job it is to help companies and entrepreneurs understand and leverage the Microsoft stack and related technologies, many of which are open sourced. Warren Wilbee, West Region startup manager, seeded the TechCrunch Disrupt team with some of his top players, including Felix Rieseberg, Helen Zeng, Steve Seow, Timothy Strimple and Will Tschumy.
Bruno Terkaly is a developer evangelist for Microsoft. His depth of knowledge comes from years of experience in the field, writing code using a multitude of platforms, languages, frameworks, SDKs, libraries and APIs. He spends time writing code, blogging and giving live presentations on building cloud-based applications, specifically using the Azure platform. You can read his blog at blogs.msdn.com/b/brunoterkaly.
Ricardo Villalobos is a seasoned software architect with more than 15 years of experience designing and creating applications for companies in multiple industries. Holding different technical certifications, as well as a master’s degree in business administration from the University of Dallas, he works as a cloud architect in the DPE Globally Engaged Partners team for Microsoft, helping companies worldwide to implement solutions in Azure. You can read his blog at blog.ricardovillalobos.com.
Terkaly and Villalobos jointly present at large industry conferences. They encourage readers of Azure Insider to contact them for availability. Terkaly can be reached at email@example.com and Villalobos can be reached at Ricardo.Villalobos@microsoft.com.
Thanks to the following Microsoft technical experts for reviewing this article: Gert Drapers, Cort Fritz and Tim Park