# An Extensive Examination of Data Structures

**Visual Studio .NET 2003**

Scott Mitchell

4GuysFromRolla.com

March 2004

**Summary:** A graph, like a tree, is a collection of nodes and edges, but has no rules dictating the connection among the nodes. In this fifth part of the article series, we'll learn all about graphs, one of the most versatile data structures. (26 printed pages)

Download the Graphs.msi sample file.

#### Contents

Introduction

Examining the Different Classes of Edges

Creating a C# Graph Class

A Look at Some Common Graph Algorithms

Conclusion

Related Books

## Introduction

Part 1 and Part 2 of this article series focused on linear data structures—the array, the ArrayList, the Queue, the Stack, and the Hashtable. In Part 3, we began our investigation of trees. Recall that trees consist of a set of *nodes*, where all of the nodes share some connection to other nodes. These connections are referred to as *edges*. As we discussed, there are numerous rules as to how these connections can occur. For example, all nodes in a tree except for one—the root—must have precisely one *parent* node, while all nodes can have an arbitrary number of children. These simple rules ensure that, for any tree, the following three statements will hold true:

- Starting from any node, any other node in the tree can be reached. That is, there exists no node that can't be reached through some simple path.
- There are no *cycles*. A cycle exists when, starting from some node $v$, there is some path that travels through some set of nodes $v_1, v_2, \ldots, v_k$ that then arrives back at $v$.
- The number of edges in a tree is precisely one less than the number of nodes.

In Part 3 we focused on *binary trees*, which are a special form of trees. Binary trees are trees whose nodes have at most two children.

In this fifth installment of the article series we're going to examine *graphs*. Graphs are composed of a set of nodes and edges, just like trees, but with graphs there are no rules for the connections between nodes. With graphs, there is no concept of a root node, nor is there a concept of parents and children. Rather, a graph is a collection of interconnected nodes.

**Note** Realize that all trees are graphs. A tree is a special case of a graph: one in which all nodes are reachable from some starting node and that has no cycles.

Figure 1 shows three examples of graphs. Notice that graphs, unlike trees, can have sets of nodes that are disconnected from other sets of nodes. For example, graph (a) has two distinct, unconnected sets of nodes. Graphs can also contain cycles. Graph (b) has several cycles. One cycle is the path from $v_1$ to $v_2$ to $v_4$ and back to $v_1$. Another one is from $v_1$ to $v_2$ to $v_3$ to $v_5$ to $v_4$ and back to $v_1$. (There are also cycles in graph (a).) Graph (c) does not have any cycles: all of its nodes are reachable, and it has exactly one fewer edge than it has nodes. Therefore, it is a tree.

**Figure 1. Three examples of graphs**

Many real-world problems can be modeled using graphs. For example, search engines like Google model the Internet as a graph, where Web pages are the nodes in the graph and the links among Web pages are the edges. Programs like Microsoft MapPoint that can generate driving directions from one city to another use graphs, modeling cities as nodes in a graph and the roads connecting the cities as edges.

## Examining the Different Classes of Edges

Graphs, in their simplest terms, are a collection of nodes and edges, but there are different kinds of edges:

- Directed versus undirected edges
- Weighted versus unweighted edges

When talking about using graphs to model a problem, it is important to indicate the class of graph with which you are working. Is it a graph whose edges are directed and weighted, or one whose edges are undirected and weighted? In the next two sections, we'll discuss the differences between directed and undirected edges and weighted and unweighted edges.

### Directed and Undirected Edges

The edges of a graph provide the connections between one node and another. By default, an edge is assumed to be bidirectional. That is, if there exists an edge between nodes *v* and *u*, it is assumed that one can travel from *v* to *u* and from *u* to *v*. Graphs with bidirectional edges are said to be *undirected graphs* because there is no implicit direction in their edges.

For some problems, though, an edge might imply a one-way connection from one node to another. For example, when modeling the Internet as a graph, a hyperlink from Web page *v* to Web page *u* would imply that the edge from *v* to *u* is unidirectional; that is, one could navigate from *v* to *u*, but not from *u* to *v*. Graphs that use unidirectional edges are said to be *directed graphs*.

When drawing a graph, bidirectional edges are drawn as a straight line, as shown in Figure 1. Unidirectional edges are drawn as an arrow, showing the direction of the edge. Figure 2 shows a directed graph where the nodes are Web pages for a particular Web site and a directed edge from *u* to *v* indicates that there is a hyperlink from Web page *u* to Web page *v*. Notice that when *u* links to *v* and *v* also links to *u*, two arrows are used: one from *v* to *u* and another from *u* to *v*.

**Figure 2. Model of pages making up a website**

### Weighted and Unweighted Edges

Typically graphs are used to model a collection of "things" and their relationship among these "things." For example, the graph in Figure 2 modeled the pages in a website and their hyperlinks. Sometimes, though, it is important to associate some cost with the connection from one node to another.

A map can easily be modeled as a graph, with the cities as nodes and the roads connecting the cities as edges. If we want to determine the shortest distance and route from one city to another, we first need to assign a cost to traveling from one city to another. The logical solution is to give each edge a *weight*, such as the number of miles from one city to the other.

Figure 3 shows a graph that represents several cities in southern California. The cost of any particular path from one city to another is the sum of the costs of the edges along the path. The shortest path, then, would be the path with the least cost. In Figure 3, for example, a trip from San Diego to Santa Barbara is 210 miles if driving through Riverside, then to Barstow, and then to Santa Barbara. The shortest trip, however, is to drive 100 miles to Los Angeles, and then another 30 miles up to Santa Barbara.
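As a small illustration of summing edge weights along a path, the sketch below computes a path's cost. `PathCostDemo` and `PathCost` are hypothetical names, not part of the article's code, and the weight table holds only the two Figure 3 distances quoted above (San Diego to Los Angeles, 100 miles; Los Angeles to Santa Barbara, 30 miles), since the figure's other individual weights aren't given in the text.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of the article's Graph class): the cost of a
// path in a weighted, undirected graph is the sum of its edge weights.
public static class PathCostDemo
{
    public static int PathCost(Dictionary<(string, string), int> weights, params string[] path)
    {
        int total = 0;
        for (int i = 0; i + 1 < path.Length; i++)
        {
            string u = path[i], v = path[i + 1];
            // Undirected edges are stored once, so try both orientations.
            if (weights.TryGetValue((u, v), out int w) || weights.TryGetValue((v, u), out w))
                total += w;
            else
                throw new ArgumentException("No edge between " + u + " and " + v);
        }
        return total;
    }

    public static void Main()
    {
        var weights = new Dictionary<(string, string), int>
        {
            { ("San Diego", "Los Angeles"), 100 },
            { ("Los Angeles", "Santa Barbara"), 30 },
        };
        Console.WriteLine(PathCost(weights, "San Diego", "Los Angeles", "Santa Barbara")); // 130
    }
}
```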

**Figure 3. Graph of California cities with edges valued as miles**

Realize that the directionality and weight of edges are *orthogonal*. That is, a graph can have one of four arrangements of edges:

- Directed, weighted edges
- Directed, unweighted edges
- Undirected, weighted edges
- Undirected, unweighted edges

The graphs in Figure 1 had undirected, unweighted edges. Figure 2 had directed, unweighted edges, and Figure 3 used undirected, weighted edges.

### Sparse Graphs and Dense Graphs

While a graph could have zero or a handful of edges, typically a graph will have more edges than it has nodes. What's the maximum number of edges a graph could have, given $n$ nodes? It depends on whether the graph is directed or undirected. If the graph is directed, then each node could have an edge to every other node. That is, each of the $n$ nodes could have $n - 1$ edges, giving a total of $n(n - 1)$ edges, which is nearly $n^2$.

**Note** For this article, I am assuming nodes are not allowed to have edges to themselves. In general, though, graphs allow for an edge to exist from a node *v* back to node *v*. If self-edges are allowed, the total number of edges for a directed graph would be $n^2$.

If the graph is undirected, then one node, call it $v_1$, could have an edge to each and every other node, or $n - 1$ edges. The next node, call it $v_2$, could have at most $n - 2$ new edges, because the edge between $v_2$ and $v_1$ already exists. The third node, $v_3$, could have at most $n - 3$ new edges, and so forth. Therefore, for $n$ nodes, there would be at most $(n - 1) + (n - 2) + \cdots + 1$ edges. As you might have guessed, this sum comes to $n(n - 1)/2$, or exactly half as many edges as a directed graph.
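The two formulas can be checked by brute force. This sketch (the `EdgeCounts` class is a hypothetical name) counts every possible edge among *n* nodes, excluding self-edges as the note above assumes, and compares the counts with $n(n-1)$ and $n(n-1)/2$.

```csharp
using System;

// Brute-force check of the maximum edge counts for n nodes (no self-edges).
public static class EdgeCounts
{
    // Directed: every ordered pair (u, v) with u != v is a possible edge.
    public static long CountDirected(int n)
    {
        long count = 0;
        for (int u = 0; u < n; u++)
            for (int v = 0; v < n; v++)
                if (u != v) count++;
        return count;
    }

    // Undirected: node i contributes edges only to nodes j > i, because its
    // edges to earlier nodes were already counted by those nodes.
    public static long CountUndirected(int n)
    {
        long count = 0;
        for (int i = 0; i < n; i++)
            for (int j = i + 1; j < n; j++)
                count++;
        return count;
    }

    public static long DirectedFormula(int n) { return (long)n * (n - 1); }
    public static long UndirectedFormula(int n) { return (long)n * (n - 1) / 2; }

    public static void Main()
    {
        for (int n = 1; n <= 5; n++)
            Console.WriteLine(n + " nodes: directed " + CountDirected(n) + ", undirected " + CountUndirected(n));
    }
}
```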

If a graph has significantly fewer than $n^2$ edges, the graph is said to be *sparse*. For example, a graph with $n$ nodes and $n$ edges, or even $2n$ edges, would be said to be sparse. A graph with close to the maximum number of edges is said to be *dense*.

When using graphs in an algorithm, it is important to know the ratio between nodes and edges. As we'll see later on in this article, the asymptotic running time of operations performed on a graph is typically expressed in terms of the number of nodes and edges in the graph.

## Creating a C# Graph Class

While graphs are a very common data structure used in a wide array of different problems, there is no built-in graph data structure in the .NET Framework. Part of the reason is that an efficient implementation of a **Graph** class depends on a number of factors specific to the problem at hand. For example, graphs are typically modeled in one of two ways:

- Adjacency list
- Adjacency matrix

These two techniques differ in how the nodes and edges of the graph are maintained internally by the **Graph** class. Let's examine both of these approaches and weigh the pros and cons of each method.

### Representing a Graph Using an Adjacency List

In Part 3 we created a C# class for binary trees, called **BinaryTree**. Recall that each node in a binary tree was represented by a **Node** class. The `Node` class contained three properties:

- **Value**, which held the value of the node, an **object**
- **Left**, a reference to the `Node`'s left child
- **Right**, a reference to the `Node`'s right child

Clearly the **Node** and **BinaryTree** classes are not sufficient for a graph. First, the **Node** class for a binary tree allows for only two edges: a left and a right child. For a more general graph, though, there could be an arbitrary number of edges emanating from a node. Also, the **BinaryTree** class contains a reference to a single node, the root. But with a graph, there is no single point of reference. Rather, the graph would need to know about *all* of its nodes.

One option, then, is to create a **Node** class that has as one of its properties an array of **Node** instances, which represent the **Node**'s neighbors. Our **Graph** class would also have an array of **Node** instances, with one element for each of the nodes in the graph. Such a representation is called an *adjacency list* because each node maintains a list of adjacent nodes. Figure 4 depicts an adjacency list representation in graphical form.

**Figure 4. Adjacency list representation in graphical form**

Notice that with an undirected graph, an adjacency list representation duplicates the edge information. For example, in adjacency list representation (b) in Figure 4, node *a* has *b* in its adjacency list, and node *b* also has node *a* in its adjacency list.
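To make the duplication concrete, here is a minimal adjacency-list sketch that maps each node's key to a list of its neighbors' keys. `AdjacencyListGraph` is a hypothetical class built on the generic `Dictionary` and `List` types, considerably simpler than the **Graph** class developed later; its `EntryCount` method shows that an undirected graph stores two list entries per edge.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical minimal adjacency list: each node key maps to its neighbors' keys.
public class AdjacencyListGraph
{
    private readonly Dictionary<string, List<string>> adjacency =
        new Dictionary<string, List<string>>();

    public void AddNode(string key) => adjacency.Add(key, new List<string>());

    // A directed edge u -> v is a single entry in u's list.
    public void AddDirectedEdge(string u, string v) => adjacency[u].Add(v);

    // An undirected edge is stored twice: v in u's list and u in v's list.
    public void AddUndirectedEdge(string u, string v)
    {
        adjacency[u].Add(v);
        adjacency[v].Add(u);
    }

    public IReadOnlyList<string> Neighbors(string key) => adjacency[key];

    // Total entries across all lists: E for a directed graph, 2E for an undirected one.
    public int EntryCount()
    {
        int count = 0;
        foreach (List<string> list in adjacency.Values) count += list.Count;
        return count;
    }

    public static void Main()
    {
        var g = new AdjacencyListGraph();
        foreach (string k in new[] { "a", "b", "c" }) g.AddNode(k);
        g.AddUndirectedEdge("a", "b");
        g.AddUndirectedEdge("b", "c");
        Console.WriteLine(string.Join(",", g.Neighbors("b"))); // a,c
        Console.WriteLine(g.EntryCount());                      // 4 (two undirected edges)
    }
}
```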

Each node has precisely as many `Node`s in its adjacency list as it has neighbors. Therefore, an adjacency list is a very space-efficient representation of a graph: you never store more data than needed. Specifically, for a graph with *V* nodes and *E* edges, a graph using an adjacency list representation will require *V* + *E* `Node` instances for a directed graph and *V* + 2*E* `Node` instances for an undirected graph.

While Figure 4 does not show it, adjacency lists can also be used to represent weighted graphs. The only addition is that, for each node *n*, each `Node` instance in *n*'s adjacency list needs to store the cost of the edge from *n*.

The one downside of an adjacency list is that determining if there is an edge from some node *u* to *v* requires that *u*'s adjacency list be searched. For dense graphs, *u* will likely have many `Node`s in its adjacency list. Determining if there is an edge between two nodes, then, takes time linear in the number of *u*'s neighbors, which for a dense graph approaches the total number of nodes. Fortunately, when using graphs we'll likely not need to determine if there exists an edge between two particular nodes. More often than not, we'll want to simply enumerate *all* the edges of a particular node.
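A weighted adjacency list and the linear-scan edge check described above might look like the following sketch. The `WeightedAdjacencyList` class and its `TryGetCost` method are hypothetical; each list entry pairs a neighbor with an edge cost, in the spirit of the **EdgeToNeighbor** class introduced later.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical weighted adjacency list: each entry records a neighbor and the
// cost of the edge leading to it.
public class WeightedAdjacencyList
{
    private readonly Dictionary<string, List<(string Neighbor, int Cost)>> adjacency =
        new Dictionary<string, List<(string Neighbor, int Cost)>>();

    public void AddNode(string key) => adjacency.Add(key, new List<(string Neighbor, int Cost)>());

    public void AddDirectedEdge(string u, string v, int cost) => adjacency[u].Add((v, cost));

    // Checking for an edge u -> v means scanning u's list, so the time grows
    // linearly with the number of u's neighbors.
    public bool TryGetCost(string u, string v, out int cost)
    {
        foreach (var edge in adjacency[u])
        {
            if (edge.Neighbor == v)
            {
                cost = edge.Cost;
                return true;
            }
        }
        cost = 0;
        return false;
    }

    public static void Main()
    {
        var g = new WeightedAdjacencyList();
        g.AddNode("u"); g.AddNode("v"); g.AddNode("w");
        g.AddDirectedEdge("u", "v", 5);
        g.AddDirectedEdge("u", "w", 2);
        Console.WriteLine(g.TryGetCost("u", "w", out int c) ? c.ToString() : "no edge"); // 2
        Console.WriteLine(g.TryGetCost("v", "u", out _) ? "edge" : "no edge");          // no edge
    }
}
```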

### Representing a Graph Using an Adjacency Matrix

An alternative method for representing a graph is to use an *adjacency matrix*. For a graph with *n* nodes, an adjacency matrix is an $n \times n$ two-dimensional array. For weighted graphs the array element (*u*, *v*) would give the cost of the edge between *u* and *v* (or, perhaps, -1 if no such edge existed between *u* and *v*). For an unweighted graph, the array could be an array of Booleans, where a True at array element (*u*, *v*) denotes an edge from *u* to *v* and a False denotes a lack of an edge.

Figure 5 depicts an adjacency matrix representation in graphical form.

**Figure 5. Adjacency matrix representation in graphical form**

Note that undirected graphs display symmetry along the adjacency matrix's diagonal. That is, if there is an edge from *u* to *v* in an undirected graph then there will be two corresponding array entries in the adjacency matrix: (*u*, *v*) and (*v*, *u*).

Since determining if an edge exists between two nodes is simply an array lookup, it can be done in constant time. The downside of adjacency matrices is that they are space inefficient. An adjacency matrix requires an $n^2$-element array, so for sparse graphs much of the adjacency matrix will be empty. Also, for undirected graphs half of the matrix is repeated information.
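For comparison, here is a hypothetical adjacency-matrix sketch for a weighted graph whose nodes are numbered 0 to *n* - 1, using -1 for a missing edge as suggested above. The constructor allocates all $n^2$ cells up front, `AddUndirectedEdge` writes the two mirror-image cells, and `HasEdge` is a single array access.

```csharp
using System;

// Hypothetical adjacency matrix for a weighted graph with nodes 0..n-1.
// As in the text, -1 marks "no edge".
public class AdjacencyMatrixGraph
{
    private const int NoEdge = -1;
    private readonly int[,] matrix;

    public AdjacencyMatrixGraph(int n)
    {
        matrix = new int[n, n];  // n * n cells, however few edges there are
        for (int u = 0; u < n; u++)
            for (int v = 0; v < n; v++)
                matrix[u, v] = NoEdge;
    }

    public void AddDirectedEdge(int u, int v, int cost) => matrix[u, v] = cost;

    // An undirected edge fills two mirror-image cells, which is why the matrix
    // of an undirected graph is symmetric about its diagonal.
    public void AddUndirectedEdge(int u, int v, int cost)
    {
        matrix[u, v] = cost;
        matrix[v, u] = cost;
    }

    // Edge lookup is a single array access: constant time.
    public bool HasEdge(int u, int v) => matrix[u, v] != NoEdge;

    public int Cost(int u, int v) => matrix[u, v];

    public static void Main()
    {
        var g = new AdjacencyMatrixGraph(3);
        g.AddUndirectedEdge(0, 1, 7);
        Console.WriteLine(g.HasEdge(1, 0)); // True
        Console.WriteLine(g.HasEdge(0, 2)); // False
    }
}
```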

While either an adjacency matrix or adjacency list would suffice as an underlying representation of a graph for our **Graph** class, let's move forward using the adjacency list model. I chose this approach primarily because it is a logical extension from the **Node** and **BinaryTree** classes that we've already created together.

### Creating the Node Class

The **Node** class represents a single node in the graph. When working with graphs, the nodes typically represent some entity. Therefore, our **Node** class contains a **Data** property of type **object** that can be used to store any sort of data associated with the node. Furthermore, we'll need some way to easily identify nodes, so let's add a string **Key** property, which serves as a unique identifier for each **Node**.

Since we are using the adjacency list technique to represent the graph, each `Node` instance needs to have a list of its neighbors. If the graph uses weighted edges, the adjacency list also needs to store the weight of each edge. To manage this adjacency list, we'll first need to create an **AdjacencyList** class.

#### The AdjacencyList and EdgeToNeighbor classes

A `Node` contains an **AdjacencyList** instance, which is a collection of edges to the `Node`'s neighbors. Since an **AdjacencyList** stores a collection of edges, we first need to create a class that represents an edge. Let's call this class **EdgeToNeighbor**, since it models an edge that extends to a neighboring node. Since we might want to associate a weight with this edge, **EdgeToNeighbor** needs two properties:

- **Cost**, an integer value indicating the weight of the edge
- **Neighbor**, a `Node` reference

The **AdjacencyList** class, then, is derived from the **System.Collections.CollectionBase** class, and is simply a strongly-typed collection of `EdgeToNeighbor` instances. The code for **EdgeToNeighbor** and **AdjacencyList** is shown below:

```csharp
using System.Collections;

public class EdgeToNeighbor
{
    // private member variables
    private int cost;
    private Node neighbor;

    public EdgeToNeighbor(Node neighbor) : this(neighbor, 0) {}

    public EdgeToNeighbor(Node neighbor, int cost)
    {
        this.cost = cost;
        this.neighbor = neighbor;
    }

    public virtual int Cost
    {
        get { return cost; }
    }

    public virtual Node Neighbor
    {
        get { return neighbor; }
    }
}

public class AdjacencyList : CollectionBase
{
    protected internal virtual void Add(EdgeToNeighbor e)
    {
        base.InnerList.Add(e);
    }

    public virtual EdgeToNeighbor this[int index]
    {
        get { return (EdgeToNeighbor) base.InnerList[index]; }
        set { base.InnerList[index] = value; }
    }
}
```

The **Node** class's **Neighbors** property exposes the `Node`'s internal `AdjacencyList` member variable. Notice that the **AdjacencyList** class's **Add()** method is marked `protected internal` rather than `public`, so that only code in the same assembly (or in classes derived from **AdjacencyList**) can add an edge to a `Node`'s adjacency list. This is done so that the developer using the **Graph** class can only modify the graph's structure through the **Graph** class members and not indirectly through the `Node`'s **Neighbors** property.

#### Adding edges to a node

In addition to its **Key**, **Data**, and **Neighbors** properties, the **Node** class needs to provide a method for adding an edge from the node to one of its neighbors. Recall that with the adjacency list approach, if there exists an undirected edge between nodes *u* and *v*, then *u* will have a reference to *v* in its adjacency list *and* *v* will have a reference to *u* in its adjacency list. `Node`s should only be responsible for maintaining their own adjacency lists, not those of other `Node`s in the graph. As we'll see later, the **Graph** class contains methods to add either directed or undirected edges between two nodes.

To make it easier for the **Graph** class to add an edge between two `Node`s, the **Node** class contains a method for adding a directed edge from itself to some neighbor. This method, **AddDirected()**, takes in a `Node` instance and an optional weight, creates an `EdgeToNeighbor` instance, and adds it to the `Node`'s adjacency list. The following code highlights this process:

```csharp
// Members of the Node class; neighbors is the Node's AdjacencyList.
protected internal virtual void AddDirected(Node n)
{
    AddDirected(new EdgeToNeighbor(n));
}

protected internal virtual void AddDirected(Node n, int cost)
{
    AddDirected(new EdgeToNeighbor(n, cost));
}

protected internal virtual void AddDirected(EdgeToNeighbor e)
{
    neighbors.Add(e);
}
```

### Building the Graph Class

Recall that with the adjacency list technique, the graph maintains a list of its nodes, and each node maintains a list of adjacent nodes. So, in creating the **Graph** class we need to have a list of `Node`s. We could opt to use an ArrayList to maintain this list, but a more efficient approach is to use a Hashtable. A Hashtable is more sensible here because the **Graph** class's methods for adding an edge need to make sure that both of the `Node`s being connected exist in the graph. With an ArrayList we'd have to linearly search through the array to find both `Node` instances; with a Hashtable we can take advantage of an expected constant-time lookup. (For more information on Hashtables and their asymptotic running times, read Part 2 of the article series.)

The **NodeList** class, shown below, contains strongly-typed **Add()** and **Remove()** methods for adding and removing `Node` instances from the graph. It also has a **ContainsKey()** method, which determines if a particular `Node` `Key` value already exists in the graph.

```csharp
public class NodeList : IEnumerable
{
    // private member variables
    private Hashtable data = new Hashtable();

    // methods
    public virtual void Add(Node n)
    {
        data.Add(n.Key, n);
    }

    public virtual void Remove(Node n)
    {
        data.Remove(n.Key);
    }

    public virtual bool ContainsKey(string key)
    {
        return data.ContainsKey(key);
    }

    public virtual void Clear()
    {
        data.Clear();
    }

    // Properties...
    public virtual Node this[string key]
    {
        get { return (Node) data[key]; }
    }

    // ... some methods and properties removed for brevity ...
}
```

The **Graph** class contains a public property, `Nodes`, which is of type `NodeList`. Additionally, the **Graph** class has a number of methods for adding directed or undirected, and weighted or unweighted edges between two existing nodes in the graph. The **AddDirectedEdge()** method takes in two `Node`s and an optional weight, and creates a directed edge from the first `Node` to the second. Similarly, the **AddUndirectedEdge()** method takes in two `Node`s and an optional weight, adding a directed edge from the first to the second `Node`, as well as a directed edge from the second back to the first `Node`.

In addition to its methods for adding edges, the **Graph** class has a **Contains()** method that returns a Boolean indicating if a particular `Node` exists in the graph or not. The germane code for the **Graph** class is shown below:

```csharp
using System;

public class Graph
{
    // private member variables
    private NodeList nodes;

    public Graph()
    {
        this.nodes = new NodeList();
    }

    public virtual Node AddNode(string key, object data)
    {
        // Make sure the key is unique
        if (!nodes.ContainsKey(key))
        {
            Node n = new Node(key, data);
            nodes.Add(n);
            return n;
        }
        else
            throw new ArgumentException("There already exists a node in the graph with key " + key);
    }

    public virtual void AddNode(Node n)
    {
        // Make sure this node is unique
        if (!nodes.ContainsKey(n.Key))
            nodes.Add(n);
        else
            throw new ArgumentException("There already exists a node in the graph with key " + n.Key);
    }

    public virtual void AddDirectedEdge(string uKey, string vKey)
    {
        AddDirectedEdge(uKey, vKey, 0);
    }

    public virtual void AddDirectedEdge(string uKey, string vKey, int cost)
    {
        // get references to uKey and vKey
        if (nodes.ContainsKey(uKey) && nodes.ContainsKey(vKey))
            AddDirectedEdge(nodes[uKey], nodes[vKey], cost);
        else
            throw new ArgumentException("One or both of the nodes supplied were not members of the graph.");
    }

    public virtual void AddDirectedEdge(Node u, Node v)
    {
        AddDirectedEdge(u, v, 0);
    }

    public virtual void AddDirectedEdge(Node u, Node v, int cost)
    {
        // Make sure u and v are Nodes in this graph
        if (nodes.ContainsKey(u.Key) && nodes.ContainsKey(v.Key))
            // add an edge from u -> v
            u.AddDirected(v, cost);
        else
            // one or both of the nodes were not found in the graph
            throw new ArgumentException("One or both of the nodes supplied were not members of the graph.");
    }

    public virtual void AddUndirectedEdge(string uKey, string vKey)
    {
        AddUndirectedEdge(uKey, vKey, 0);
    }

    public virtual void AddUndirectedEdge(string uKey, string vKey, int cost)
    {
        // get references to uKey and vKey
        if (nodes.ContainsKey(uKey) && nodes.ContainsKey(vKey))
            AddUndirectedEdge(nodes[uKey], nodes[vKey], cost);
        else
            throw new ArgumentException("One or both of the nodes supplied were not members of the graph.");
    }

    public virtual void AddUndirectedEdge(Node u, Node v)
    {
        AddUndirectedEdge(u, v, 0);
    }

    public virtual void AddUndirectedEdge(Node u, Node v, int cost)
    {
        // Make sure u and v are Nodes in this graph
        if (nodes.ContainsKey(u.Key) && nodes.ContainsKey(v.Key))
        {
            // Add an edge from u -> v and from v -> u
            u.AddDirected(v, cost);
            v.AddDirected(u, cost);
        }
        else
            // one or both of the nodes were not found in the graph
            throw new ArgumentException("One or both of the nodes supplied were not members of the graph.");
    }

    public virtual bool Contains(Node n)
    {
        return Contains(n.Key);
    }

    public virtual bool Contains(string key)
    {
        return nodes.ContainsKey(key);
    }

    public virtual NodeList Nodes
    {
        get { return this.nodes; }
    }
}
```

Notice that the **AddDirectedEdge()** and **AddUndirectedEdge()** methods check to ensure that the `Node`s passed in exist in the graph. If they do not, an `ArgumentException` is thrown. Also note that these two methods have a number of overloads. You can add an edge between two nodes by passing in either `Node` references or the `Node`s' `Key` values.

### Using the Graph Class

At this point we have created all of the classes needed for our graph data structure. We'll soon turn our attention to some of the more common graph algorithms, such as constructing a minimum spanning tree and finding the shortest path from a single node to all other nodes. But before we do, let's examine how to use the **Graph** class in a C# application.

Once we create an instance of the **Graph** class, the next task is to add the `Node`s to the graph. This involves calling the **AddNode()** method of the **Graph** class for each node to add to the graph. Let's recreate the graph from Figure 2. We'll need to start by adding six nodes. For each of these nodes let's have the `Key` be the Web page's filename. We'll leave the `Data` as `null`, although this might conceivably contain the contents of the file, or a collection of keywords describing the Web page content.

```csharp
Graph web = new Graph();
web.AddNode("Privacy.htm", null);
web.AddNode("People.aspx", null);
web.AddNode("About.htm", null);
web.AddNode("Index.htm", null);
web.AddNode("Products.aspx", null);
web.AddNode("Contact.aspx", null);
```

Next we need to add the edges. Since this is a directed, unweighted graph, we'll use the **AddDirectedEdge(u, v)** method of the **Graph** class to add an edge from *u* to *v*.

```csharp
web.AddDirectedEdge("People.aspx", "Privacy.htm");   // People -> Privacy
web.AddDirectedEdge("Privacy.htm", "Index.htm");     // Privacy -> Index
web.AddDirectedEdge("Privacy.htm", "About.htm");     // Privacy -> About
web.AddDirectedEdge("About.htm", "Privacy.htm");     // About -> Privacy
web.AddDirectedEdge("About.htm", "People.aspx");     // About -> People
web.AddDirectedEdge("About.htm", "Contact.aspx");    // About -> Contact
web.AddDirectedEdge("Index.htm", "About.htm");       // Index -> About
web.AddDirectedEdge("Index.htm", "Contact.aspx");    // Index -> Contact
web.AddDirectedEdge("Index.htm", "Products.aspx");   // Index -> Products
web.AddDirectedEdge("Products.aspx", "Index.htm");   // Products -> Index
web.AddDirectedEdge("Products.aspx", "People.aspx"); // Products -> People
```

After these commands, `web` represents the graph shown in Figure 2. Once we have constructed a graph, we'll want to answer some questions. For example, for the graph we just created, we might want to ask, "What's the least number of links a user must click to reach any Web page when starting from the homepage (`Index.htm`)?" To answer such questions, we can usually fall back on using existing graph algorithms. In the next section we'll examine two common algorithms for weighted graphs:

- Constructing a minimum spanning tree
- Finding the shortest path from one node to all others
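The link-count question above involves an unweighted graph, and a standard way to answer it is breadth-first search (BFS), a technique not covered in this section. The following sketch rests on a few assumptions: it stores the Figure 2 edges (copied from the `AddDirectedEdge()` calls above) in a plain `Dictionary` adjacency list rather than the **Graph** class, and the `LinkDistance`, `FewestLinks`, and `Figure2Web` names are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Breadth-first search on an unweighted, directed graph: returns the fewest
// edges (here, link clicks) from start to every reachable node. Unreachable
// nodes do not appear in the result.
public static class LinkDistance
{
    public static Dictionary<string, int> FewestLinks(
        Dictionary<string, List<string>> adjacency, string start)
    {
        var distance = new Dictionary<string, int> { [start] = 0 };
        var queue = new Queue<string>();
        queue.Enqueue(start);

        while (queue.Count > 0)
        {
            string u = queue.Dequeue();
            foreach (string v in adjacency[u])
            {
                // Nodes leave the queue in order of distance, so the first
                // time v is seen is along a path with the fewest links.
                if (!distance.ContainsKey(v))
                {
                    distance[v] = distance[u] + 1;
                    queue.Enqueue(v);
                }
            }
        }
        return distance;
    }

    // The hyperlinks from the AddDirectedEdge() calls for Figure 2.
    public static Dictionary<string, List<string>> Figure2Web()
    {
        return new Dictionary<string, List<string>>
        {
            ["People.aspx"] = new List<string> { "Privacy.htm" },
            ["Privacy.htm"] = new List<string> { "Index.htm", "About.htm" },
            ["About.htm"] = new List<string> { "Privacy.htm", "People.aspx", "Contact.aspx" },
            ["Index.htm"] = new List<string> { "About.htm", "Contact.aspx", "Products.aspx" },
            ["Products.aspx"] = new List<string> { "Index.htm", "People.aspx" },
            ["Contact.aspx"] = new List<string>(),
        };
    }

    public static void Main()
    {
        foreach (KeyValuePair<string, int> pair in FewestLinks(Figure2Web(), "Index.htm"))
            Console.WriteLine(pair.Key + ": " + pair.Value + " click(s)");
    }
}
```

Starting from `Index.htm`, the search places About.htm, Contact.aspx, and Products.aspx one click away and Privacy.htm and People.aspx two clicks away, so no page is more than two clicks from the homepage.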

## A Look at Some Common Graph Algorithms

Because graphs can be used to model a bevy of real-world problems, countless algorithms have been designed to solve common graph problems. To further our understanding of graphs, let's take a look at two of the most studied applications of graphs.

### The Minimum Spanning Tree Problem

Imagine that you work for the phone company and your task is to provide phone lines to a village with 10 houses, each labeled H1 through H10. Specifically, this involves running cable so that every home is connected: the cabling must reach houses H1, H2, and so forth, up through H10. Due to geographic obstacles like hills, trees, rivers, and so on, it is not feasible to run cable directly between every pair of houses.

Figure 6 shows this problem depicted as a graph. Each node is a house, and the edges are the means by which one house can be wired up to another. The weights of the edges dictate the distance between the homes. Your task is to wire up all ten houses using the least amount of telephone wiring possible.

**Figure 6. Graphical representation of hooking up a 10-home village with phone lines**

For a connected, undirected graph, there exists some subset of the edges that connects all the nodes and does not introduce a cycle. Such a subset of edges forms a tree (it connects all the nodes and is acyclic, so it has exactly one fewer edge than there are nodes), and is called a *spanning tree*. There are typically many spanning trees for a given graph. Figure 7 shows two valid spanning trees from the Figure 6 graph. (The edges forming the spanning tree are bolded.)

**Figure 7. Spanning tree subsets based on Figure 6**

For graphs with weighted edges, different spanning trees have different associated costs, where the cost is the sum of the weights of the edges that comprise the spanning tree. A *minimum spanning tree*, then, is the spanning tree with a minimum cost.

There are two basic approaches to solving the minimum spanning tree problem. One approach is to build up a spanning tree by repeatedly choosing the remaining edge with the minimum weight, so long as adding that edge does not create a cycle among the edges chosen thus far. This approach is shown in Figure 8.

**Figure 8. Minimum spanning tree that uses the edges with the minimum weight**
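This first approach (Kruskal's algorithm, as the note below explains) can be sketched as follows. The union-find array used to detect cycles is an implementation choice not described in the article, the `KruskalSketch` class is hypothetical, and because Figure 6's weights aren't listed in the text, the example uses a small made-up four-node graph.

```csharp
using System;
using System.Collections.Generic;

// Sketch of Kruskal's approach: take edges from lightest to heaviest, keeping
// an edge only if its endpoints are not yet connected (so no cycle forms).
// Connectivity is tracked with a disjoint-set (union-find) array.
public static class KruskalSketch
{
    public static List<(int U, int V, int Weight)> MinimumSpanningTree(
        int nodeCount, List<(int U, int V, int Weight)> edges)
    {
        // parent[i] == i means i represents its own component.
        int[] parent = new int[nodeCount];
        for (int i = 0; i < nodeCount; i++) parent[i] = i;

        int Find(int x)
        {
            while (parent[x] != x)
            {
                parent[x] = parent[parent[x]];  // path halving keeps lookups short
                x = parent[x];
            }
            return x;
        }

        var sorted = new List<(int U, int V, int Weight)>(edges);
        sorted.Sort((a, b) => a.Weight.CompareTo(b.Weight));

        var tree = new List<(int U, int V, int Weight)>();
        foreach (var e in sorted)
        {
            int ru = Find(e.U), rv = Find(e.V);
            if (ru == rv) continue;     // endpoints already connected: would form a cycle
            parent[ru] = rv;            // merge the two components
            tree.Add(e);
            if (tree.Count == nodeCount - 1) break;  // a spanning tree has n - 1 edges
        }
        return tree;
    }

    public static void Main()
    {
        // Made-up four-node graph (Figure 6's weights aren't given in the text).
        var edges = new List<(int U, int V, int Weight)>
        {
            (0, 1, 1), (1, 2, 2), (0, 2, 3), (2, 3, 4), (1, 3, 5),
        };
        int total = 0;
        foreach (var e in MinimumSpanningTree(4, edges))
        {
            Console.WriteLine(e.U + " - " + e.V + " (" + e.Weight + ")");
            total += e.Weight;
        }
        Console.WriteLine("Total weight: " + total);  // Total weight: 7
    }
}
```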

The other approach builds up the spanning tree by dividing the nodes of the graph into two disjoint sets: the nodes currently in the spanning tree and those nodes not yet added. At each iteration, the least weighted edge that connects a node in the spanning tree to a node not yet in the spanning tree is added to the spanning tree. To start off the algorithm, some arbitrary start node must be selected. Figure 9 illustrates this approach in action, using H1 as the starting node. (In Figure 9 those nodes that are in the set of nodes in the spanning tree are shaded light yellow.)

**Figure 9. Prim method of finding the minimum spanning tree**
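A sketch of this second approach (Prim's algorithm) follows, using an adjacency matrix in which -1 means "no edge". The `PrimSketch` class and its helpers are hypothetical, and the example uses a small made-up four-node graph rather than Figure 6. Instead of rescanning every crossing edge each round, it remembers, for each node outside the tree, the cheapest known edge into the tree, which is one common way to implement the idea.

```csharp
using System;
using System.Collections.Generic;

// Sketch of Prim's approach on a connected, undirected graph stored as an
// adjacency matrix (weights[u, v] < 0 means "no edge"). Each round adds the
// cheapest edge joining a tree node to a node not yet in the tree.
public static class PrimSketch
{
    public static List<(int U, int V, int Weight)> MinimumSpanningTree(int[,] weights, int start)
    {
        int n = weights.GetLength(0);
        bool[] inTree = new bool[n];
        int[] bestCost = new int[n];  // cheapest known edge from the tree to each node
        int[] bestFrom = new int[n];  // tree endpoint of that edge (-1 if none yet)
        for (int i = 0; i < n; i++) { bestCost[i] = int.MaxValue; bestFrom[i] = -1; }
        bestCost[start] = 0;

        var tree = new List<(int U, int V, int Weight)>();
        for (int round = 0; round < n; round++)
        {
            // Choose the outside node reachable by the cheapest crossing edge.
            int next = -1;
            for (int v = 0; v < n; v++)
                if (!inTree[v] && bestCost[v] != int.MaxValue &&
                    (next < 0 || bestCost[v] < bestCost[next]))
                    next = v;
            if (next < 0) throw new ArgumentException("The graph is not connected.");

            inTree[next] = true;
            if (bestFrom[next] >= 0) tree.Add((bestFrom[next], next, bestCost[next]));

            // next just joined the tree, so its edges may be cheaper crossings.
            for (int v = 0; v < n; v++)
            {
                int w = weights[next, v];
                if (!inTree[v] && w >= 0 && w < bestCost[v])
                {
                    bestCost[v] = w;
                    bestFrom[v] = next;
                }
            }
        }
        return tree;
    }

    // Builds a symmetric weight matrix with -1 wherever there is no edge.
    public static int[,] FromEdges(int n, params (int U, int V, int Weight)[] edges)
    {
        var m = new int[n, n];
        for (int u = 0; u < n; u++)
            for (int v = 0; v < n; v++)
                m[u, v] = -1;
        foreach (var e in edges)
        {
            m[e.U, e.V] = e.Weight;
            m[e.V, e.U] = e.Weight;
        }
        return m;
    }

    public static void Main()
    {
        // Made-up four-node graph (not Figure 6).
        int[,] weights = FromEdges(4, (0, 1, 1), (1, 2, 2), (0, 2, 3), (2, 3, 4), (1, 3, 5));
        foreach (var e in MinimumSpanningTree(weights, 0))
            Console.WriteLine(e.U + " - " + e.V + " (" + e.Weight + ")");
    }
}
```

For the made-up graph in `Main`, the tree has a total weight of 7 whichever node the search starts from.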

Notice that the techniques illustrated in Figure 8 and Figure 9 arrived at the same minimum spanning tree. If there is only one minimum spanning tree for the graph, then both of these approaches will reach the same conclusion. If, however, there are multiple minimum spanning trees, these two approaches might arrive at different results (both results will be correct, naturally).

**Note** The first approach we examined was published by Joseph Kruskal in 1956. The second technique was published in 1957 by Robert Prim, a researcher at Bell Labs. There is a plethora of information on these two algorithms on the Web, including Java applets showing the algorithms in progress graphically, as well as source code in a variety of languages.

### Computing the Shortest Path from a Single Source

When flying from one city to another, part of the headache is finding a route that requires the fewest connections. No one likes their flight from New York to Los Angeles to go from New York to Chicago, then Chicago to Denver, and finally Denver to Los Angeles. Most people would rather have a direct flight straight from New York to Los Angeles.

Imagine, however, that you are not one of those people. Instead, you are someone who values his money much more than his time, and are most interested in finding the *cheapest* route, regardless of the number of connections. This might mean flying from New York to Miami, then Miami to Dallas, then Dallas to Phoenix, Phoenix to San Diego, and finally San Diego to Los Angeles.

We can solve this problem by modeling the available flights and their costs as a directed, weighted graph. Figure 10 shows such a graph.

**Figure 10. Modeling of available flights based on cost**

We want to know the shortest (that is, cheapest) path from New York to Los Angeles. By inspecting the graph, we can quickly determine that it goes from New York to Chicago to San Francisco and finally down to Los Angeles, but in order to have a computer accomplish this task we need to formulate an algorithm to solve the problem at hand.

Edsger Dijkstra, one of the most noted computer scientists of all time, invented the most commonly used algorithm for finding the shortest path from a source node to all other nodes in a weighted, directed graph whose edge weights are non-negative. This algorithm, dubbed Dijkstra's Algorithm, works by maintaining two tables, each of which has a record for each node. These two tables are:

- A distance table, which keeps an up-to-date "best distance" from the source node to every other node.
- A route table, which, for each node *n*, indicates what node was used to reach *n* to get the best distance.

Initially, the distance table has each record set to some high value (like positive infinity) except for the start node, which has a distance to itself of 0. The route table's rows are all set to `null`. Also, a collection of nodes, *Q*, that need to be examined is maintained; initially, this collection contains all of the nodes in the graph.
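The initial state of the two tables and of *Q* can be sketched in C# as follows. This is a minimal illustration, assuming nodes are identified by string names; `DijkstraState` is a hypothetical class, and `double.PositiveInfinity` stands in for the "high value" in the distance table.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical class capturing the initial state of Dijkstra's Algorithm:
// every node starts at "infinity" except the source (0), every route entry
// starts as null, and Q starts out holding every node.
public class DijkstraState
{
    public Dictionary<string, double> Distance { get; private set; }
    public Dictionary<string, string> Route { get; private set; }
    public HashSet<string> Q { get; private set; }

    public DijkstraState(IEnumerable<string> nodes, string source)
    {
        Distance = new Dictionary<string, double>();
        Route = new Dictionary<string, string>();
        Q = new HashSet<string>();

        foreach (string node in nodes)
        {
            Distance[node] = double.PositiveInfinity; // no path known yet
            Route[node] = null;                       // no predecessor known yet
            Q.Add(node);                              // still needs to be examined
        }

        if (!Q.Contains(source))
        {
            throw new ArgumentException("The source must be one of the nodes.", "source");
        }
        Distance[source] = 0; // the source's distance to itself
    }
}
```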

The algorithm proceeds by selecting (and removing) the node from *Q* that has the lowest value in the distance table. Call this selected node *n*, and let *d* be its value in the distance table. For each of *n*'s edges, a check is made to see if *d* plus the cost to get from *n* to that particular neighbor is less than the value for that neighbor in the distance table. If it is, then we've found a better way to reach that neighbor, and the distance and route tables are updated accordingly.
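Putting the selection and update steps together gives the following sketch. It assumes the graph is supplied as a dictionary mapping each node to its outgoing edges (neighbor name to cost), that every node appears as a key, and that all costs are non-negative, which Dijkstra's Algorithm requires. `ShortestPaths.Dijkstra` is a hypothetical name, and *Q* is a plain set scanned linearly for its minimum, as described above, rather than a priority queue.

```csharp
using System.Collections.Generic;

// A minimal sketch of Dijkstra's Algorithm. Assumes every node (including nodes
// with no outgoing edges) is a key in 'graph' and all edge costs are non-negative.
public static class ShortestPaths
{
    public static void Dijkstra(
        Dictionary<string, Dictionary<string, double>> graph,
        string source,
        out Dictionary<string, double> distance,
        out Dictionary<string, string> route)
    {
        distance = new Dictionary<string, double>();
        route = new Dictionary<string, string>();
        var q = new HashSet<string>();

        // Initialization: infinity everywhere, null routes, Q holds every node.
        foreach (string node in graph.Keys)
        {
            distance[node] = double.PositiveInfinity;
            route[node] = null;
            q.Add(node);
        }
        distance[source] = 0;

        while (q.Count > 0)
        {
            // Select the node n in Q with the lowest distance-table value (linear scan).
            string n = null;
            foreach (string candidate in q)
            {
                if (n == null || distance[candidate] < distance[n])
                {
                    n = candidate;
                }
            }
            q.Remove(n);
            double d = distance[n];

            // For each of n's edges, check whether arriving through n is cheaper.
            foreach (KeyValuePair<string, double> edge in graph[n])
            {
                double throughN = d + edge.Value;
                if (throughN < distance[edge.Key])
                {
                    distance[edge.Key] = throughN; // better distance found...
                    route[edge.Key] = n;           // ...and it arrives via n
                }
            }
        }
    }
}
```

With a small made-up graph in which New York flies to Chicago ($75) and Denver ($100), Chicago flies to Denver ($20) and San Francisco ($25), Denver flies to Los Angeles ($100), and San Francisco flies to Los Angeles ($10), calling `ShortestPaths.Dijkstra(graph, "New York", out var distance, out var route)` leaves `distance["Los Angeles"]` at 110 with `route["Los Angeles"]` set to San Francisco. The linear scan makes the whole run O(*V*²); a binary-heap priority queue is the usual replacement on large, sparse graphs.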

To help clarify this algorithm, let's begin applying it to the graph from Figure 10. Since we want to know the cheapest route from New York to Los Angeles, we use New York as our source node. Our initial distance table, then, contains a value of infinity for each of the other cities and a value of 0 for New York. The route table contains `null` for all entries, and *Q* contains all nodes (see Figure 11).

**Figure 11. Distance table and route table for determining cheapest fare**

We start by extracting from *Q* the city that has the lowest value in the distance table, which is New York. We then examine each of New York's neighbors and check whether the cost to fly from New York to that neighbor is less than the best cost we know of, namely the cost in the distance table. After this first iteration, New York has been removed from *Q*, and the distance and route tables have been updated for Chicago, Denver, Miami, and Dallas.

**Figure 12. Step 2 in the process of determining the cheapest fare**

The next iteration removes the cheapest city remaining in *Q*, Chicago, and checks its neighbors to see if there is a better cost. Specifically, we'll check whether there's a better route for getting to San Francisco or Denver. The cost to reach San Francisco through Chicago ($75 + $25 = $100) is clearly less than infinity, so San Francisco's records are updated. Also, note that it is cheaper to fly to Denver through Chicago than directly from New York ($75 + $20 = $95 < $100), so Denver is updated as well. Figure 13 shows the values of the tables and *Q* after Chicago has been processed.

**Figure 13. Table status after the third leg of the process is finished**
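Each check in this step compares a neighbor's current entry with the cost of arriving through the node just removed from *Q*. Writing *d*(*v*) for a node's distance-table value and *w*(*n*, *v*) for the cost of the edge from *n* to *v* (notation introduced here for convenience), the general update and the two comparisons made while processing Chicago, whose distance is $75, are:

$$d(v) \leftarrow \min\bigl(d(v),\; d(n) + w(n, v)\bigr)$$

$$d(\text{San Francisco}) = \min(\infty,\; 75 + 25) = 100, \qquad d(\text{Denver}) = \min(100,\; 75 + 20) = 95$$

Whenever the minimum comes from the second argument, the route table entry for *v* is set to *n*.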

This process continues until there are no more nodes in *Q*. Figure 14 shows the final values of the tables when *Q* has been exhausted.

**Figure 14. Final results of determining the cheapest fare**

Once *Q* has been exhausted, the distance table contains the lowest cost from New York to each city. To determine the flight path to L.A., start at the L.A. entry in the route table and work back to New York. The route table entry for L.A. is San Francisco, meaning the last leg of the flight to L.A. leaves from San Francisco. The route table entry for San Francisco is Chicago, meaning you'll get to San Francisco through Chicago. Finally, Chicago's route table entry is New York. Putting this together, we see that the flight path is from New York to Chicago to San Francisco to L.A.
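Walking the route table backward is easy to express in code. The sketch below assumes the route table is a dictionary from each city to its predecessor, with `null` for the source; `RouteTableHelper.PathTo` is a hypothetical helper. It does not check reachability, so a caller should first confirm that the destination's distance is finite.

```csharp
using System.Collections.Generic;

// Hypothetical helper: rebuilds a path by following route-table entries
// from the destination back to the source (whose entry is null).
public static class RouteTableHelper
{
    public static List<string> PathTo(Dictionary<string, string> route, string destination)
    {
        var path = new List<string>();
        for (string city = destination; city != null; city = route[city])
        {
            path.Add(city); // collected in reverse: destination first
        }
        path.Reverse();     // flip to source-first order
        return path;
    }
}
```

Using the route table entries described above (L.A. to San Francisco, San Francisco to Chicago, Chicago to New York, New York to `null`), `PathTo(route, "Los Angeles")` yields New York, Chicago, San Francisco, Los Angeles.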

**Note** To see a working implementation of Dijkstra's Algorithm in C#, check out the download for this article, which includes a testing application for the **Graph** class that determines the shortest distance from one city to another using Dijkstra's Algorithm.

## Conclusion

Graphs are a commonly used data structure because they can be used to model many real-world problems. A graph consists of a set of nodes with an arbitrary number of connections, or edges, between the nodes. These edges can be either directed or undirected and weighted or unweighted.

In this article, we examined the basics of graphs and created a **Graph** class. This class was similar to the **BinaryTree** class created in Part 3, the difference being that instead of having references to at most two other nodes, the `Node`s of the **Graph** class could have an arbitrary number of references. This similarity is not surprising because trees are a special case of graphs.

In addition to creating a **Graph** class, we also looked at two common graph algorithms: finding a minimum spanning tree, and computing the shortest path from some source node to all other nodes in a weighted, directed graph. While we did not examine source code implementing these algorithms, there are plenty of source-code examples available on the Internet. Also, the download included with this article contains a testing application for the **Graph** class that uses Dijkstra's Algorithm to compute the shortest route between two cities.

In the next installment, Part 6, we'll look at efficiently maintaining disjoint sets. Disjoint sets are a collection of two or more sets that do not share any elements in common. For example, with Prim's Algorithm for finding the minimum spanning tree, the nodes of the graph can be divided into two disjoint sets—the set of nodes that currently constitute the spanning tree and the set of nodes that are not yet in the spanning tree.

## Related Books

- Introduction to Algorithms by Thomas H. Cormen

**Scott Mitchell**, author of five books and founder of 4GuysFromRolla.com, has been working with Microsoft Web technologies for the past five years. Scott works as an independent consultant, trainer, and writer, and recently completed his Masters degree in Computer Science at the University of California – San Diego. He can be reached at mitchell@4guysfromrolla.com or through his blog at http://ScottOnWriting.NET.