June 2015

Volume 30 Number 6


Big Data - MapReduce Without Hadoop Using the ASP.NET Pipeline

By Doug Duerner, Yeon-Chang Wang | June 2015

Have you ever wanted to add the power of MapReduce over Big Data to your smartphone apps or rich data analytics on your tablet or other small device, but thought it would be too difficult?

Have you ever wanted to transform your existing single-node application into a distributed system quickly and easily, without having to re-architect the entire application?

These questions are what prompted us to embark on an adventure to create a RESTful MapReduce component that's extremely easy to set up and use.

Products like Hadoop excel at the challenges of Big Data. We created a solution that sacrifices some of that functionality for simplicity and agility in order to make it easier to develop Big Data applications. This way, you don’t have to be an expert to get a working system up and running in a short time. The simplicity of the mesh versus the complexity of setting up Hadoop, and the agility of our solution versus the “elephantness” of a Hadoop cluster make it a compelling proposition.

In a nutshell, we created a very simple infrastructure that can use MapReduce to either do computationally intensive processing out on the “mesh” nodes or, alternatively, do data collection out on those nodes, with the results being correlated and aggregated into one final result that’s returned to the client.

Background

The IIS Web Server (with its ASP.NET pipeline) has proven to be a highly scalable, enterprise-grade Web server. But these technologies aren’t limited to simply serving up Web pages and hosting Web sites. There’s really no technical reason you can’t use them as a general-purpose pipeline mechanism accessed via HTTP. The ASP.NET pipeline steps execute in sequence (not moving to the next step until the previous step has completed), but each step can execute asynchronously in parallel. The IIS Web Server can be configured to run multiple ASP.NET pipelines (multiple w3wp.exes) servicing HTTP requests.

Using the ASP.NET pipeline as a general-purpose pipeline (that just happens to be accessed via HTTP), instead of serving up Web pages and hosting Web sites, might seem a bit unorthodox, but an ASP.NET pipeline (with asynchronous pipeline steps) is actually quite similar to CPU instruction pipelining in microprocessors (bit.ly/1DifFvO), and the ability to have multiple w3wp.exe worker processes (with an ASP.NET pipeline in each w3wp.exe) is quite similar to superscalar design in microprocessors (bit.ly/1zMr6KD). These similarities, along with proven scalability, are what make using the IIS Web Server and ASP.NET pipeline for anything that needs pipelining functionality a compelling proposition.

There are lots of products that already do RESTful MapReduce (Hadoop, Infinispan, Riak, CouchDB, MongoDB and more), but our research suggests they’re rather difficult to set up or require specialized expertise.

We wanted to simply use our existing Windows IIS servers that are already up and running; use our existing data API methods that have already been written; get the data for our UI screens on demand; and have the whole distributed MapReduce system up and running in minutes (all with limited knowledge of distributed systems or MapReduce system design and architecture). This way, you could quickly and easily transform an existing small-scale application into a larger distributed system with minimal effort or knowledge, on your own servers or in the cloud. Or, if you wanted to add rich data analytics to your existing smartphone app, you could do so with minimal effort.

This RESTful MapReduce component is a bolt-on that doesn’t require the existing application to be rewritten, and it’s a prime candidate when your goal is to merely add basic distributed functionality to an existing application that already has an extensive public data API. It’s possible to quickly and easily emulate distributed computing patterns such as “scatter-gather,” as shown in Figure 1.

Figure 1 Emulate Distributed Computing Patterns Such as “Scatter-Gather”

The sample project that accompanies this article provides the starting point of a simple basic infrastructure that demonstrates this design philosophy and can be expanded going forward. The compelling factor of this design is not that it’s better than the other products, but that it’s easier. It’s simply an easy-to-use design alternative to the large enterprise MapReduce systems in use today. The design is by no means a replacement for the mature enterprise MapReduce products like Hadoop, and we’re not implying that it’s even close to containing all the functionality of the leading products.

MapReduce

In simple terms, MapReduce is a way of aggregating large stores of data. The Map step executes on many distributed processing server nodes. It usually executes a task on each distributed server node to retrieve data from the data nodes, and can optionally transform or pre-process the data while it’s still on the distributed server node. The Reduce step executes on one or more final processing server nodes and consolidates all the results from the Map steps into one final result set using many different combining algorithms.

In the context of a business object API, the Map step executes a business object API method to get data, and the Reduce step combines all the Map step result sets into one final result set (doing a union by primary key, or an aggregation like a sum by group, for example) that’s returned to the client that made the request.

One of the key benefits of MapReduce is it lets you “scale out” instead of “scale up.” In other words, you simply keep adding more normal server nodes rather than purchasing better hardware for the one main server node to scale. Scaling out is generally the cheaper, more flexible choice because it uses regular commodity hardware, while scaling up is typically much more expensive because the cost of the hardware tends to exponentially increase as it becomes more sophisticated.

As an interesting side note, MapReduce excels when it comes to extremely large volumes of data (Internet scale) that are partially structured or unstructured, like log files and binary blobs. In contrast, SQL relational databases excel when you have normalized, structured data with schemas, at least up to the point where the overhead of the relational database can no longer cope with the sheer volume of data.

Figure 2 shows a high-level overview of the MapReduce process and compares a simple SQL relational database query with the corresponding query in a Big Data MapReduce process.

Figure 2 Simple SQL Relational Database Query Versus Same Query with MapReduce

REST

Representational State Transfer (REST) defines a public API running over HTTP that uses a create, read, update, delete (CRUD) paradigm, based respectively on the HTTP verbs Post, Get, Put and Delete, to return a representation of an object from the server to the client that made the request. REST seeks to allow public access to the object itself as an entity, not just functional operations on the object. It isn’t a specification or an RFC; it is simply a design recommendation. You can adhere closely to the pure REST design and require the URL be formatted to treat the object as an entity, like this:

https://server/MapReducePortal/BookSales/Book A

Or you can opt for a more RPC-style design and require the URL to be formatted with the class and method names to execute, as follows:

https://server/MapReducePortal/BookSales/GetTotalBookSalesByBookName/Book A
https://server/MapReducePortal/BookSales/GetTotalBookSalesByBookName?bookName=Book A

RESTful MapReduce

RESTful MapReduce means doing MapReduce operations over HTTP for the API and for the transport mechanism among distributed server nodes.

REST over HTTP for the API and transport has several advantages that make it attractive:

  • The HTTP protocol over port 80 is firewall-friendly.
  • Client applications from almost any platform can easily consume resources without the need for platform-specific dependencies.
  • The HTTP verbs (Get, Post, Put, Delete) are a simple, elegant paradigm for requesting resources.
  • Gzip compression can assist in reducing payload sizes.
  • The HTTP protocol itself has additional advantages, such as built-in caching.

Currently, the sample project uses REST over HTTP for the API and transport and supports only Get and Post, no write operations, and communicates entirely in JSON. It’s similar to some of the well-known methodologies for accessing the Hadoop Distributed File System (HDFS) externally (like Hadoop YARN and Hadoop WebHDFS), but supports only the absolute minimum necessary for the system to operate. We’re not trying to replace Hadoop, or match all of its extensive functionality. We’re merely trying to provide an extremely rudimentary, easy-to-use alternative, at the expense of functionality.
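To make the shape of a request concrete, here's a minimal client-side sketch (the server name, virtual directory, node names and entity path are placeholders, not part of the sample project) that issues one of these RESTful MapReduce Gets from C# and reads back the reduced JSON result; a smartphone or tablet app would do essentially the same thing with its platform's HTTP stack:

using System;
using System.Net.Http;
using System.Threading.Tasks;

class MapReduceClientSketch
{
  static async Task Main()
  {
    using (var client = new HttpClient())
    {
      // One Get fans the work out to Node1..Node3 and returns the combined result as JSON.
      string url = "https://server/MapReducePortal/BookSales" +
                   "?distributednodes=Node1,Node2,Node3&union=BookName&sum=Sales";
      string json = await client.GetStringAsync(url);
      Console.WriteLine(json);
    }
  }
}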

MapReduce Configuration

For the sample project, simply copy the MapReduceModule.dll into the \bin directory of the virtual directory on each IIS server node you want to use as a distributed server node in your MapReduce system, and then put an entry in the modules section of the web.config, like so:

<modules>
  <add name="MapReduceModule" type="MapReduce.MapReduceModule" />
</modules>

You’re done. It’s as easy as that.

If there’s no virtual directory on the IIS server node, create a new virtual directory with a \bin directory, make it an Application and make sure it’s using a Microsoft .NET Framework 4 Application Pool. Increase the w3wp.exe worker process count on the Application Pool that services the MapReducePortal virtual directory to provide more processing pipelines for MapReduce requests. The other advanced configuration options used for tuning the IIS Server have usually already been set by the IT department that manages the server and are beyond the scope of this article, but if that’s not the case, they’re readily available on the Microsoft Web site.

REST Configuration

For the sample project, simply place the PathInfoAttribute on any of your existing business object data API methods and specify the PathInfo string that will be used to map the URL to the method and method arguments. That’s it.

One of the cool features of the sample code is whatever data types the existing business object data API methods are currently returning can stay the same and don’t need to be changed. The infrastructure can handle pretty much any types automatically because it uses a .NET DynamicObject to dynamically represent the data types returned. For example, if the existing method returns a collection of Customer objects, then the DynamicObject represents a Customer data type.
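As a rough illustration of that idea, here's a minimal sketch of a DynamicObject-backed row (this is not the sample project's actual Row class; the name and members are illustrative):

using System.Collections.Generic;
using System.Dynamic;

// A sketch of a DynamicObject-backed row: field names are just dictionary keys,
// so one type can stand in for a Customer, a Book, or any other shape returned
// by an existing API method.
public class DynamicRow : DynamicObject
{
  private readonly Dictionary<string, object> _values = new Dictionary<string, object>();

  // row.BookName style access
  public override bool TryGetMember(GetMemberBinder binder, out object result)
  {
    return _values.TryGetValue(binder.Name, out result);
  }

  public override bool TrySetMember(SetMemberBinder binder, object value)
  {
    _values[binder.Name] = value;
    return true;
  }

  // row["BookName"] style access (the form used by the word-count sample later on)
  public override bool TryGetIndex(GetIndexBinder binder, object[] indexes, out object result)
  {
    return _values.TryGetValue((string)indexes[0], out result);
  }

  public override bool TrySetIndex(SetIndexBinder binder, object[] indexes, object value)
  {
    _values[(string)indexes[0]] = value;
    return true;
  }
}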

The PathInfoAttribute PathInfo string uses the same .NET UriTemplate class that Windows Communication Foundation (WCF) uses and allows you to do all the same fancy things you can do in a WCF Web HTTP REST project or an ASP.NET Web API 2 project, such as argument variable name substitution, wild cards, and so forth. You choose what URL maps to what methods. You have total control and are free to implement your REST API any way you like. You can stick closer to a pure REST API and make your URL segments represent your objects like first-class entities:

https://server/MapReducePortal/BookSales/Book A
[PathInfoAttribute(PathInfo="/BookSales/{bookName}", ReturnItemType="Book")]
public BookSales GetTotalBookSalesByBookName(string bookName)
{
}

Or, if you prefer, you can loosely follow REST and make your URL segments specify the class name and method name you want to execute in the URL segments:

https://server/MapReducePortal/BookSales/GetTotalBookSalesByBookName/Book A
[PathInfoAttribute(PathInfo="/BookSales/GetTotalBookSalesByBookName/{bookName}",
  ReturnItemType="Book")]
public BookSales GetTotalBookSalesByBookName(string bookName)
{
}

It’s entirely up to you.

Compelling Factors

One of the compelling factors of the sample project’s design is the scalability gained by using the ASP.NET pipeline as a MapReduce pipeline to execute the MapReduce process. Because the ASP.NET pipeline operates sequentially, it’s suitable for performing both the Map and Reduce steps. And what’s cool is that though the pipeline is sequential and will not move to the next step until the previous step has completed, each step can still be executed asynchronously. This allows the pipeline to continue to receive and process new MapReduce requests even while the pipeline is blocked waiting for the Map calls to return from the other distributed server nodes.

As Figure 3 shows, each w3wp.exe houses one ASP.NET pipeline acting as a MapReduce pipeline. The w3wp.exe (IIS worker process) is managed by the application pool assigned to the MapReducePortal virtual directory. By default, the application pool has one w3wp.exe processing new incoming requests to the virtual directory, but it can very easily be configured to have as many w3wp.exes as you like. This lets you have multiple MapReduce pipelines on a single standalone server node, all teaming up to process the incoming MapReduce requests to the MapReducePortal virtual directory. The asynchronous nature of the individual ASP.NET pipeline allows many requests to be processed in parallel. The ability to have multiple w3wp.exes facilitating multiple ASP.NET pipelines takes you to the next level.

Figure 3 Increase the IIS Worker Process Count for the App Pool to Have More MapReduce Pipelines Servicing MapReduce Requests Sent to the MapReducePortal Virtual Directory of This IIS Server

The sample project’s design also lets you keep adding as many IIS servers as you like to form a larger and larger “mesh” of server nodes, as shown in Figure 4. As the mesh grows, larger problems can potentially be handled by breaking them into smaller and smaller pieces, and a greater degree of parallelism can potentially be achieved. The asynchronous ASP.NET pipeline, combined with multiple pipelines per server, enables parallelism across a single server’s CPU cores. The mesh of servers provides another level of parallelism across many server machines. It’s a snap to add more IIS servers to the mesh; all you have to do is copy the MapReduceModule.dll to the \bin folder under the virtual directory and add an entry to the web.config file. Because the IIS servers are all simply standalone servers, no additional configuration is required. Products like Hadoop, in contrast, generally require more effort, planning and expertise because the servers must typically be configured as an actual server “cluster.”

Figure 4 Any Server Node Can Initiate the MapReduce Request, and Any Number of Other Distributed Server Nodes Listed in the AJAX URL Can Execute the Map Step Parts of That Request in Parallel

You don’t even need specially built IIS servers. You can simply use any available IIS servers merely by copying the MapReduceModule.dll to any virtual directory that’s already on the server. That’s all it takes. The next AJAX call can now include the new IIS server in the distributednodes list parameter in the URL QueryString.

Another benefit of the server mesh design is that it doesn’t rely on a Master node to function. In products like Hadoop, the Master node manages the server cluster and the location of the data across that server cluster. And it’s the Master node that’s been the source of failure when Hadoop was scaled to its limit in production, rather than the amount of data or the infrastructure.

In this server mesh design, there’s no Master node. Any server node can initiate the MapReduce request, and the data lives on the node that collects it. As Figure 4 shows, any server node can be the requester of data and the provider of data at the same time in parallel. A server can request data from any other nodes in the server mesh that are performing the Map function, and can receive the results, combining them into one final result set in the Reduce step. At the same time, that same server node can also be acting as an edge server node handling a Map step, returning its partial results for a MapReduce request that originated from another server node and is going to be reduced back on that node.

Currently, the clients that are making the requests identify the location of the data (via the distributednodes list in the URL QueryString). Instead, you could modify the design to store this list (or just this node’s nearest neighbor nodes or the nodes hosting data split across multiple nodes) in a database table on each individual node and programmatically add them to the URL at run time. In a sense, this would turn the single Master node notion into a distributed Master node concept, where each node knows where to get its data. It would be as if the Master node were spread across the mesh, allowing it to scale with the mesh.

Because this mesh design uses a range of tried-and-true Microsoft products—Windows Server, IIS Web Server and SQL Server databases—you get the robust redundancy and fault tolerance features (such as Network Load Balancing [NLB] on Windows Server for IIS, AlwaysOn Availability Groups with Automatic Page Repair or Mirroring with Automatic Page Repair for SQL Server) that are already built into these commercial products. Details for these features are readily available on Microsoft Web sites.

The sample project’s design also allows multiple MapReduce requests to be “chained” together to form workflows where the starting input to one MapReduce request is the results from the previous MapReduce request. This is accomplished by changing the MapReduce request to a Post instead of a Get and including the previous MapReduce request results in the body of the Post request. Figure 5 shows an example of the resulting output in the test page.

Figure 5 Displaying Output Resulting from Chaining in a Test Page
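Here's a hedged client-side sketch of what such a chained workflow could look like (the URLs, node names and the TopSellers entity are placeholders, not part of the sample project); the first Get produces a JSON result set, and the second request Posts that JSON as its starting input:

using System;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;

class ChainingSketch
{
  static async Task Main()
  {
    using (var client = new HttpClient())
    {
      // Step 1: an ordinary Get MapReduce request produces a JSON result set.
      string firstResult = await client.GetStringAsync(
        "https://server/MapReducePortal/BookSales?distributednodes=Node1,Node2&union=BookName&sum=Sales");

      // Step 2: chain it by posting that JSON as the starting input of the next request.
      var body = new StringContent(firstResult, Encoding.UTF8, "application/json");
      HttpResponseMessage second = await client.PostAsync(
        "https://server/MapReducePortal/TopSellers?distributednodes=Node1,Node2&sort=Sales", body);

      Console.WriteLine(await second.Content.ReadAsStringAsync());
    }
  }
}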

Sample Project Overview

In essence, the MapReduceModule.dll transforms the ASP.NET pipeline into a MapReduce pipeline. It uses an HttpModule to implement both Map and Reduce functionality. As an interesting side note, some of the combining operations (like union) that are executed during the Reduce step rely on an IEqualityComparer<T>, where T is a DynamicObject that, in a sense, allows you to do the equality comparisons based on a property name as a string value at run time, even though IEqualityComparer<T> requires a concrete type to be defined at compile time. Pretty cool!
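As a rough sketch of that technique (this is not the sample project's comparer; it assumes rows expose their fields through a string indexer, as a DynamicObject-backed row can), the key property's name is captured as a run-time string and only that value is compared:

using System;
using System.Collections.Generic;

// A sketch of an equality comparer that is given the key property's name as a
// run-time string, so a union by "BookName", "AlarmID" and so on can be decided
// per request instead of being baked in at compile time.
public class PropertyNameEqualityComparer : IEqualityComparer<object>
{
  private readonly string _keyName;

  public PropertyNameEqualityComparer(string keyName)
  {
    _keyName = keyName;
  }

  private static object KeyOf(object row, string keyName)
  {
    dynamic d = row;       // rows are assumed to support a string indexer
    return d[keyName];
  }

  public new bool Equals(object x, object y)
  {
    return object.Equals(KeyOf(x, _keyName), KeyOf(y, _keyName));
  }

  public int GetHashCode(object obj)
  {
    object key = KeyOf(obj, _keyName);
    return key == null ? 0 : key.GetHashCode();
  }
}

// Usage: mapResults.Union(otherNodeResults, new PropertyNameEqualityComparer("BookName"));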

Figure 6 is a high-level overview of the design of the MapReduceModule.dll, showing the processing flow as it passes through the MapReduceModule.dll. MapReduceModule is the only DLL required, and it needs to be on every server node that you want to participate in the MapReduce infrastructure. Adding the MapReduceModule.dll to the server is a snap and is accomplished by simply copying the MapReduceModule.dll to the \bin folder under the virtual directory and adding an entry to the web.config file.

Figure 6 High-Level Design Overview of Processing Flow for MapReduceModule.dll

In Figure 6, the IHttpModule uses the first step in the ASP.NET pipeline for the MAP functionality by subscribing to the AddOnBeginRequestProcessingAsync event that’s fired during the Begin Request Processing step in the ASP.NET pipeline. The IHttpModule uses the last step in the ASP.NET pipeline for the REDUCE functionality by subscribing to the AddOnEndRequestProcessingAsync event that’s fired during the End Request Processing step in the ASP.NET pipeline.

In short, you subscribe only to the Begin Request Processing and End Request Processing events in the ASP.NET pipeline. They execute sequentially and don’t move to the next step until the previous step has completed.
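A minimal sketch of that wiring is shown below. It uses the standard HttpApplication.AddOnBeginRequestAsync and AddOnEndRequestAsync registration methods together with EventHandlerTaskAsyncHelper (.NET 4.5 and later); the sample project's exact event names and handler bodies may differ:

using System;
using System.Threading.Tasks;
using System.Web;

public class MapReduceModuleSketch : IHttpModule
{
  public void Init(HttpApplication app)
  {
    // MAP work starts in the first pipeline step...
    var mapStep = new EventHandlerTaskAsyncHelper(OnBeginRequestAsync);
    app.AddOnBeginRequestAsync(mapStep.BeginEventHandler, mapStep.EndEventHandler);

    // ...and the REDUCE work runs in the last pipeline step.
    var reduceStep = new EventHandlerTaskAsyncHelper(OnEndRequestAsync);
    app.AddOnEndRequestAsync(reduceStep.BeginEventHandler, reduceStep.EndEventHandler);
  }

  private async Task OnBeginRequestAsync(object sender, EventArgs e)
  {
    // MAP: query the local node and fan out to the distributed nodes,
    // stashing partial results in HttpContext for the REDUCE step.
    await Task.Yield();  // placeholder for the real map work
  }

  private async Task OnEndRequestAsync(object sender, EventArgs e)
  {
    // REDUCE: combine the partial results according to the QueryString
    // options (union=, sum=, sort=, reduce=) and write JSON to the response.
    await Task.Yield();  // placeholder for the real reduce work
  }

  public void Dispose() { }
}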

During the Begin Request Processing step, the IHttpModule initiates all the MAP requests by querying the local node and by sending an HTTP Web request to each of the distributed server nodes present in the distributednodes list parameter in the URL QueryString. The HTTP Web request sent to each of the distributed server nodes uses the same URL that initiated this request, but with no distributednodes parameter in its URL.
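A hedged sketch of that MAP fan-out (the URL shapes and node names are assumed, and error handling is omitted) might look like this: each distributed node gets the same path and query, minus the distributednodes parameter, and all the calls are awaited together:

using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using System.Threading.Tasks;

static class MapFanOut
{
  public static async Task<IList<string>> QueryNodesAsync(
    string pathAndQueryWithoutNodes, IEnumerable<string> distributedNodes)
  {
    using (var client = new HttpClient())
    {
      // One request per distributed node, all in flight at the same time.
      IEnumerable<Task<string>> calls = distributedNodes.Select(node =>
        client.GetStringAsync("https://" + node + pathAndQueryWithoutNodes));

      return await Task.WhenAll(calls);  // partial result sets, one per node
    }
  }
}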

Out on the distributed server nodes that receive the MAP request, the same two ASP.NET pipeline steps are sequentially executed, but because there is no distributednodes parameter in its URL, the Begin Request Processing and the End Request Processing steps essentially query only that node. The MAP data retrieval method specified with the PathInfoAttribute is executed out on that edge distributed server node in order to get the local data from that node. The data that’s returned in the response stream from each edge distributed server node to the server node that initiated the original request is then stored in the HttpContext using the URL as the key so it can be retrieved later during the final REDUCE step.

On the local server node that initiated the original request, the MAP data retrieval method specified with the PathInfoAttribute is executed in order to get the local data that’s on the local server node that initiated the original request. The data from the local server node is then stored in the HttpContext using the URL as the key so it can be retrieved in the final REDUCE step.

During the End Request Processing step, the IHttpModule executes the REDUCE step by looking in the HttpContext for all the data and the REDUCE parameters that were supplied in the URL QueryString (predefined options like union=, sum= and sort=, or a custom function option like reduce=CustomReduceFunction). Next, it merges/reduces all the data sets from all the nodes into one final result set using the specified REDUCE parameter. Finally, it serializes the final result set to JSON and returns that result set in the response stream to the client that initiated the original AJAX MapReduce request. If no REDUCE parameters are specified, all the raw data from all the nodes is returned. Figure 7 shows an example of the resulting output in the test page.

Figure 7 The Resulting Output in a Test Page
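For the union/sum case, the REDUCE work described above boils down to something like this sketch (the string-keyed dictionary row shape is a stand-in for the sample project's dynamic rows): group every node's rows by the union key, then sum the requested field within each group:

using System;
using System.Collections.Generic;
using System.Linq;

static class ReduceSketch
{
  public static List<Dictionary<string, object>> UnionAndSum(
    IEnumerable<Dictionary<string, object>> rowsFromAllNodes,
    string unionKey, string sumField)
  {
    return rowsFromAllNodes
      .GroupBy(r => r[unionKey])                       // union by the key field
      .Select(g => new Dictionary<string, object>
      {
        { unionKey, g.Key },
        { sumField, g.Sum(r => Convert.ToInt32(r[sumField])) }  // sum the requested field
      })
      .ToList();
  }
}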

Comparing the Sample Project with Hadoop

Figure 8 compares basic MapReduce functionality in Hadoop and the sample project.

Figure 8 A Comparison of Basic MapReduce Functionality

Hadoop: Java MAP job function that counts words
Sample Project: Any method decorated with PathInfoAttribute acts like a MAP job function

Hadoop: Java REDUCE job function that sums the word counts
Sample Project: Reduce parameters in the URL QueryString (such as sum=) act like a REDUCE job function that performs the sum operation

Hadoop: Writable interface (serialization)
Sample Project: [Serializable()] attribute (serialization)

Hadoop: WritableComparable interface (sorting)
Sample Project: IComparer<T> interface (sorting); IEqualityComparer<T> interface (sum, union)

Hadoop: The input to the MAP job is a set of <key,value> pairs, and the REDUCE job output is a set of <key,value> pairs
Sample Project: The arguments of methods marked with PathInfoAttribute are like the input to the MAP job, and the reduce parameters in the URL QueryString perform the reduce operation and serialize the results to JSON, like the REDUCE job output

One common scenario in which MapReduce excels is counting the number of times a specific word appears in millions of documents. Figure 9 shows a comparison of some basic pseudocode that implements the Big Data equivalent of the famous “Hello World” sample program—the “Word Count Sample.” The figure shows the Hadoop Java code implementation and the corresponding C# code that could be used to accomplish the equivalent in the sample project. Keep in mind this code is merely pseudocode and is by no means correct or complete. It’s shown merely to illustrate possible ways to accomplish similar functionality in the two designs. Figure 10 shows the resulting output in the test page.

Figure 9 “Word Count Sample” Pseudocode Comparison

Hadoop MAP

public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output,
  Reporter reporter) throws IOException {
  // 'word' (a reusable Text) and 'one' (an IntWritable fixed at 1) are fields of the enclosing Mapper class
  String line = value.toString();
  StringTokenizer tokenizer = new StringTokenizer(line);
  while (tokenizer.hasMoreTokens()) {
    word.set(tokenizer.nextToken());
    output.collect(word, one);
  }
}

Hadoop REDUCE

public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable>
  output, Reporter reporter) throws IOException {
  int sum = 0;
  while(values.hasNext()) {
    sum += values.next().get();
  }
  output.collect(key, new IntWritable(sum));
}

Sample Project MAP

https://server/.../WordCount/Test.txt?distributednodes=Node1,Node2,Node3&union=Word&sum=Count
[PathInfoAttribute(PathInfo="/WordCount/{fileName}", ReturnItemType="Row")]
public HashSet<Row> GetWordCount(string fileName)
{
  HashSet<Row> rows = new HashSet<Row>();
  byte[] bytes = File.ReadAllBytes(fileName);
  string text = Encoding.ASCII.GetString(bytes);
  string[] words = text.Split(new char[]{ ' ', '\r', '\n' },
    StringSplitOptions.RemoveEmptyEntries);
  foreach(string word in words)
  {
    dynamic row = new Row();
    row["Word"] = word;
    row["Count"] = 1;
    rows.Add(row);  // add the row so the REDUCE step can union and sum the counts
  }
  return rows;
}

Sample Project REDUCE

https://server/.../WordCount/Test.txt?distributednodes=Node1,Node2,Node3&union=Word&sum=Count

Figure 10 The Resulting Output in a Test Page

Figure 11 shows how to accomplish basic MapReduce functionality in the sample project. Notice how the object entity in the URL is mapped to the equivalent of the MAP step function with the PathInfoAttribute, and how the REDUCE parameter options in the URL QueryString, like sum= and reduce=, equate to the equivalent REDUCE step functionality in Hadoop.

Figure 11 Basic MapReduce Functionality in the Sample Project

(The URL path is like the Hadoop MAP; the QueryString reduce parameters are like the Hadoop REDUCE)

https://server/.../BookSales?distributednodes=Node1,Node2,Node3&union=BookName&sum=Sales
[PathInfoAttribute(PathInfo="/BookSales", ReturnItemType="Book")]
public BookSales GetTotalBookSales()
{
}

(The URL path is like the Hadoop MAP; the QueryString reduce parameters are like the Hadoop REDUCE)

https://server/.../Alarms?distributednodes=Node1,Node2,Node3&reduce=UnionIfNotDeleted
[PathInfoAttribute(PathInfo="/Alarms", ReturnItemType="Alarm")]
public Alarms GetAlarms()
{
}
private static HashSet<Alarm> UnionIfNotDeleted(HashSet<Alarm> originalData,
  HashSet<Alarm> newData)
{
}

Additional Examples

Figure 12 shows additional ways to accomplish MapReduce-type functionality and how the RESTful URL maps to the methods. The method’s implementation code is omitted for the sake of brevity. The code could be implemented in many different ways: an algorithm that counts words; a query against BookSales table data in the database at each book store in a chain of book stores; a business object data API method that returns a collection of business object classes; or sensor data collected from distributed locations across the country. It’s all up to your imagination—have fun!

Figure 12 Miscellaneous Ways to Accomplish MapReduce with the Sample Project

Example

https://server/.../BookSales/Book A?distributednodes=Node1,Node2,Node3&union=BookName&sum=Sales
[PathInfoAttribute(PathInfo="/BookSales/{bookName}", ReturnItemType="Book")]
public BookSales GetTotalBookSales(string bookName)
{
}

Example

https://server/.../Alarms?distributednodes=Node1,Node2,Node3&union=AlarmID
[PathInfoAttribute(PathInfo="/Alarms", ReturnItemType="Alarm")]
public Alarms GetAlarms()
{
}

Example

https://server/.../Alarms?distributednodes=Node1,Node2,Node3&reduce=UnionIfNotDeleted
[PathInfoAttribute(PathInfo="/Alarms", ReturnItemType="Alarm")]
public Alarms GetAlarms()
{
}
private static HashSet<Alarm> UnionIfNotDeleted(HashSet<Alarm> originalData,
  HashSet<Alarm> newData)
{
}

Example

https://server/.../SensorMeasurements/2?distributednodes=Node1,Node2,Node3&union=SensorID
[PathInfoAttribute(PathInfo="/SensorMeasurements/{sensorID}",
  ReturnItemType="SensorMeasurement")]
public SensorMeasurements GetSensorMeasurements(int sensorID)
{
}

Example

https://server/.../MP3Songs?distributednodes=Node1,Node2,Node3&union=SongTitle
[PathInfoAttribute(PathInfo="/MP3Songs", ReturnItemType=" MP3Song")]
public MP3Songs GetMP3Songs()
{
}

Wrapping Up

In this article we presented a simple, basic infrastructure for MapReduce functionality that can be accessed RESTfully over HTTP and consumed on a small device such as a smartphone or tablet. We also touched on transforming a single-node application into a basic distributed system.

There are lots of extensive MapReduce infrastructures that do pretty much everything under the sun, but the goal and focus of this article was to make a basic MapReduce mechanism that is extremely easy to set up, and is simple to use.

The simplicity of setting up and expanding our solution gives you the ability to test your idea small (on several laptops) and easily scale out once your idea has been proven (on as many servers as you need).

The sample project allows you to use your existing business object data API methods in the Map step by simply applying an attribute to the method that maps the URL path to that method. It also allows you to control the Reduce step by adding simple commands to the URL QueryString, such as a combining operation (like union) on the data based on a primary key.

By applying the attribute to the data API methods in an existing business object and specifying a union command based on a primary key field in the URL, you get a simple mechanism that can transform parts of a single node application into a basic distributed system with very little effort, providing the ability to have a centralized global view of the entire distributed system in one place. For example, a business data object that normally only retrieves items on that single node can now retrieve items on multiple nodes, merged based on a primary key field in the item. Data for the local offices could all be correlated or aggregated on demand and viewed in one screen at the headquarters.

For small devices, the “heavy lifting” happens on the IIS servers in the mesh and not on the small device. Thus, for example, a smartphone app can enjoy the MapReduce paradigm by making one simple HTTP call, using minimal phone resources.


Doug Duerner is a senior software engineer with more than 15 years designing and implementing large-scale systems with Microsoft technologies. He has worked for several Fortune 500 banking institutions and for a commercial software company that designed and built the large-scale distributed network management system used by the Department of Defense’s Defense Information Systems Agency (DISA) for its “Global Information Grid” and the Department of State (DoS). He is a geek at heart, focusing on all aspects, but enjoys the most complex and challenging technical hurdles, especially those that everyone says “can’t be done.” Duerner can be reached at coding.innovation@gmail.com.

Yeon-Chang Wang is a senior software engineer with more than 15 years designing and implementing large-scale systems with Microsoft technologies. He, too, has worked for a Fortune 500 banking institution and for a commercial software company that designed and built the large-scale distributed network management system used by the Department of Defense’s Defense Information Systems Agency (DISA) for its “Global Information Grid” and the Department of State (DoS). He also designed and implemented a large-scale Driver Certification System for one of the world’s largest chip manufacturers. Wang has a master’s degree in Computer Science. He eats complex problems for dinner and can be reached at yeon_wang@yahoo.com.

Thanks to the following Microsoft technical experts for reviewing this article: Mikael Sitruk and Mark Staveley
Mikael Sitruk is a senior software engineer with more than 17 years of experience designing and implementing large-scale systems with a wide range of technologies. Prior to Microsoft, he worked for a leading telecom software provider and implemented several novel products. He is passionate about distributed systems, Big Data and machine learning. He worked for several years with the Hadoop ecosystem and with NoSQL technologies such as Cassandra and HBase. Mikael can be reached at Mikael.Sitruk@outlook.com.

Mark Staveley is a senior programmer on Azure’s Big Compute team. Prior to moving to Azure, Mark was part of Microsoft Research (responsible for overseeing its Big Data Management and Processing Program) and was also previously part of the Xbox One Compilers and Code Gen team (working on game engine performance and compatibility). Mark holds a BSc from Queen’s University, an MSc from the University of Waikato, and a PhD in Computer Science/Computational Chemistry from Memorial University. Prior to Microsoft, Mark was a researcher with two of Canada’s largest high-performance computing centers.