Export (0) Print
Expand All

The Zen of Recursion

SQL Server 2000
This article may contain URLs that were valid when originally published, but now link to sites or pages that no longer exist. To maintain the flow of the article, we've left these URLs in the text, but disabled the links. Focus
The Zen of Recursion
Use SQL Server stored procedures recursively to traverse hierarchies

Chris Behrens

Recursion is the property or quality of self-reference. In programming terms, recursion refers to a routine that repeatedly calls itself until a certain condition is fulfilled. One powerful way that SQL Server database developers can use recursion is to traverse hierarchies—collections of items that have parent-child relationships. People deal with hierarchies all the time: Every day, you have a series of tasks you must prioritize and then perform. Also, most people work within a hierarchy of employees. The classic way to represent a hierarchy within a database is to add a parent key to the object. A parent key is simply the primary key of an object that has a parent relationship to the current object. To see an example, let's define a management structure. Table 1 defines a typical employee, Chris; the record for Chris's boss appears in Table 2. Note that the boss's record has an ISupervisorID of 0 and that Chris's record has an ISupervisorID of 212, the IEmployeeID value in the boss's record. These values specify that the boss record is a parent of Chris's record. The matching values of ISupervisorID in Chris's record and IEmployeeID in the boss's record form a data link—or parent key—between the two records.

Now, as a simple example of stored-procedure recursion in a hierarchy, let's say that you're tasked with generating a list of all employees and their subordinates. You need to write a script to retrieve all the records at the top level (i.e., the employees whose ISupervisorIDs are 0), then grab the child records for each record in that set, and so on in a recursive process. Listing 1 contains pseudocode for such a process, and Table 3 is an example of output that meets your needs.

This approach is logically sound, but when you turn the pseudocode into the SQL code in Listing 2 and run it, you get the execution error that Figure 1 shows. The code worked for a level, then failed because SQL Server declared the cursor named cur_Level with a connection-level scope. This means that during its lifetime, the cursor can be accessed across every query that occurs on the same connection. Because the recursive calls occur on this connection, they inherit the cursor, causing an error whenever there's an attempt to create another cursor with the same name. You have two ways of fixing this problem.

First, you could change the Default to local cursor database option setting (support for local cursors is a feature of SQL Server 7.0). By default, this option is set to GLOBAL to conform to the requirements of earlier SQL Server releases, but you can set the option to LOCAL to bypass this problem if you think you'll be using locally scoped cursors often.

A second option for getting rid of the error is to simply add the LOCAL keyword to the declaration, as in Listing 3. After you add the LOCAL keyword, the code performs perfectly, giving you the results that Figure 2 shows.

Now let's look at a more common and useful example: a threaded discussion list. Recursion is a good tool to use in creating a discussion list because messages are presented in a parent-child format, with base-level messages first, followed by replies to that base-level message.

A threaded discussion list is a typical hierarchy: Messages are either top-level subjects or responses (children) to other messages. The message structure looks like Table 4. Let's say that this message board becomes enormously popular, receiving thousands of hits per second. An effective scaling strategy for a high-use application is forward caching, which means that you pregenerate the content to display later. Inserting the content into a cache table ahead of time conserves the server resources that the board's high-traffic conditions demand. Because performance is so important, forward caching is a better option than simply selecting the content through stored procedures.

When you use ADO to access the output of these procedures, the results are represented as a series of recordsets that are accessible through ADO's NextRecordset method (for more information about NextRecordset, see MSDN online documentation). The process of generating messages for users of the discussion list comprises two steps. First, when the user selects the main page, SQL Server generates the message board from a forward cache table called tblForumCache. Then, when the user creates a new message or responds to one, SQL Server uses the recursive stored procedure to regenerate the cache table.

Besides simply displaying the message, you'll want to display how many replies a particular message has, as well as the date and time that each message was created. The final output should look something like Table 5.

You can improve the user experience with this application. Let's say that for expanding and collapsing these elements, you're going to use a Dynamic HTML (DHTML) function that relies on each element's ancestry (i.e., the hierarchy of parent and child relationships up to the top of the data structure). Here's an example message that has two levels of replies:

-   SQL Server - entered 6/28/00 3:15 PM by cbehrens
   -   Stored Procedures - entered 6/28/00 4:15 PM by mjohnson
      -   Recursion - entered 6/28/00 5:15 PM by hsmith

For each message, you need to generate an ancestry code consisting of the IMessageID values of all the messages that came before. This ancestry code, which the DHTML functions will use, allows for the locating and sorting of messages that belong to each top-level message.

(100) -    SQL Server  - entered 6/28/00 3:15 PM by cbehrens
(100_101) -    Stored Procedures  - entered 6/28/00 4:15 PM by mjohnson
(100_101_102) -    Recursion  - entered 6/28/00 5:15 PM by hsmith

Each line begins with an ancestry code; the message titled "SQL Server" has nothing but its own ID (100) because it's a base-level message. The message titled "Stored Procedures" is a child of the first message, so it carries a code of 100_101: the message ID of its parent plus its own message ID. And, the message titled "Recursion" has a code of 100_101_102—namely, the IDs of each message in its ancestry plus its own message ID.

Table 6 shows the structure of the cache table that you use to pregenerate content, as I discussed earlier. Table 7 lists each of the query elements and their definitions.

The first thing you need to do in this process is to clear the old cache. To do this, simply delete the old data:

DELETE FROM tblForumCache

Next, you need to use a separate procedure to perform the recursion. Let's name this procedure sp_RecurseMessages and give it two parameters: IMessageID and IDepthLevel. To call the procedure, run this command:

EXEC sp_RecurseMessages 0, 0

These parameters tell the procedure to start with a depth level and message ID of 0. You need to specify a 0 value so that the procedure can pass the information back and forth between the levels of recursion. The pseudocode for this procedure is similar to that for the previous procedure, with a few additions, as Listing 4 shows.

The finished recursive code, which Listing 5 shows, opens the cursor initially by selecting the children of the designated parent (initially the parent will be the top-level message because the parent ID is 0). After the procedure enters the record loop, it must update the parent record with an additional reply count that corresponds to the parent message. The code then runs a second procedure to obtain the ancestry of the message and inserts the ancestry values into the cache table.

Next, the stored procedure searches for children of the current message, after incrementing the depth variable to show that the code is proceeding down a level. Then the original procedure calls itself, proceeding through the same steps and perhaps down to the children's children. After the children recordsets are exhausted, the procedure that was called recursively goes up a level, returning control to the calling procedure, and decrements the depth count to reflect that change in level.

Note that the code initializes the @szAncestry variable before passing it to the sp_GetAncestry function. You could optimize the sp_GetAncestry procedure in various ways, such as getting the reply count all at once rather than running a query each time, but this method is a good and simple way to present this stored procedure.

Finally, you need to create the function that actually yields the ancestry. In this case, you don't want a recordset result; a simple string will suffice. To get the string value of the procedure, you declare the @szAncestry variable that you're passing in as an OUTPUT parameter:

CREATE PROCEDURE sp_GetAncestry
@szPath varchar(500) OUTPUT,
@lMessageID int

The rest of the stored procedure, which Listing 6 shows, essentially repeats the simple recursion function. Notice that the procedure again declares the cursor as local to avoid scoping problems. The code then processes down the hierarchy and back up, picking up the message ID with an underscore along the way, to form the ancestry code that the procedure needs.

You can implement this procedure by placing a trigger on UPDATE, INSERT, and DELETE events in the tblMessage table. Then, when a message is updated, inserted, or deleted, the trigger clears the cache and executes the cache-rebuilding function. Although this approach will cause a slight hiccup when the user creates a message, 90 percent of the users, who won't post a message anyway, will enjoy greatly improved performance because of the forward caching.

To optimize this flow even further, you could try the following three tricks:

  • Generate ancestry gradually. You could use a series of stored procedures to build a history string as the procedure traverses the hierarchy, trimming and adding on where necessary. Doing so would save the overhead of the sp_GetAncestry call.
  • Generate ReplyCount all at once. You could run one query to determine how many children a message has, thus avoiding the multiple calls to update the record.
  • Pregenerate the forum content. You could further develop the trigger so that after it creates the cache table, it transforms the table's contents into an HTML file that would serve as the forum. Figure 3 shows a finished forum screen. This approach would save the Web server the overhead of dynamically generating the file, and it would save the database server an open connection and the overhead of querying the cache table.

Recursion can be a powerful tool for dealing with hierarchical data. With the new LOCAL keyword and a little cleverness, you can leverage recursion in SQL Server 7.0 with terrific results.

LISTING 1: Pseudocode for Simple Recursive Procedure

Open cursor for current level:
Parameter: @lEmployeeID
SELECT lEmployeeID AS ID, szEmployeeLastName + ', ' + szEmployeeFirstName, szRank AS Rank FROM tblEmployee 
WHERE lSupervisorID = 
@lEmployeeID ORDER BY dtDateTimeCreated
	SELECT @lEmployeeID AS ID, @Name AS Name, @Rank AS Rank
	Execute this procedure recursively with the current @lEmployeeID
	Scroll next to get the next values
Close our cursor and perform cleanup.

LISTING 2: Simple Recursive Procedure

CREATE PROCEDURE sp_SimpleRecurse
@lEmployeeID int
AS
DECLARE @Name varchar(500)
DECLARE @Rank varchar(500)
DECLARE cur_Level CURSOR FOR 
SELECT lEmployeeID AS ID, szEmployeeLastName + ', ' + szEmployeeFirstName, szRank AS Rank FROM tblEmployee
 WHERE lSupervisorID = @lEmployeeID ORDER BY dtDateTimeCreated
OPEN cur_Level
	FETCH NEXT FROM cur_Level INTO @lEmployeeID, @Name, @Rank
	WHILE @@FETCH_STATUS = 0
	BEGIN
		SELECT @lEmployeeID AS ID, @Name AS Name, @Rank AS Rank
		EXEC sp_SimpleRecurse @lEmployeeID
		FETCH NEXT FROM cur_Level INTO @lEmployeeID, @Name, @Rank
	END
CLOSE cur_Level
DEALLOCATE cur_Level

LISTING 3: Recursive Procedure with LOCAL Keyword Added

DECLARE cur_Level CURSOR LOCAL FOR 
SELECT lEmployeeID AS ID, szEmployeeLastName + ', ' + szEmployeeFirstName, szRank AS Rank FROM tblEmployee
 WHERE lSupervisorID = @lEmployeeID ORDER BY dtDateTimeCreated

LISTING 4: Pseudocode for Second Procedure

OPEN cur_Level LOCAL cursor FOR 
SELECT lMessageID, szSubject FROM tblMessage WHERE lParentID = @lMessageID ORDER BY dtDateTimeCreated
	While still records in recordset
		--Add one to the replycount of our parent message.
		--Get the ancestry of this message, starting with its own id.
		INSERT INTO tblForumCache (...) VALUES (...)
		--Place the information into the cache table.
		SELECT @lDepthLevel = @lDepthLevel + 1
		--Increment the depthlevel by one.
		EXEC sp_RecurseMessages @lMessageID, @lDepthLevel
		--Perform the same operation for the kids.
		SELECT @lDepthLevel = @lDepthLEvel - 1
		--Since we've bubbled up a level, decrement the depthlevel.
	NEXT
	END
CLOSE cur_Level

LISTING 5: Recursive Code for Threaded Discussion List

CREATE PROCEDURE sp_RecurseMessages
@lMessageID int,
@lDepthLevel int
AS
DECLARE @szSubject varchar(500)
DECLARE @lParentID int
DECLARE @szAncestry varchar(500)
SELECT @lParentID = @lMessageID
DECLARE cur_Level CURSOR LOCAL FOR 
SELECT lMessageID, szSubject FROM tblMessage WHERE lParentID = @lMessageID ORDER BY dtDateTimeCreated
OPEN cur_Level
	FETCH NEXT FROM cur_Level INTO @lMessageID, @szSubject	
	WHILE @@FETCH_STATUS = 0
	BEGIN
		UPDATE tblForumCache SET lReplies = lReplies + 1 WHERE lMessageID = @lParentID
		SELECT @szAncestry = '_' + CONVERT(varchar(30), @lMessageID)
		EXEC sp_GetAncestry @szAncestry OUTPUT, @lMessageID
		INSERT INTO tblForumCache (lMessageID, szSubject, lDepthLevel, szID,lReplies) VALUES
		(@lMessageID, @szSubject, @lDepthLevel, @szAncestry, 0)
		SELECT @lDepthLevel = @lDepthLevel + 1
		EXEC sp_RecurseMessages @lMessageID, @lDepthLevel
		SELECT @lDepthLevel = @lDepthLevel - 1
		FETCH NEXT FROM cur_Level INTO @lMessageID, @szSubject
	END
CLOSE cur_Level
DEALLOCATE cur_Level

LISTING 6: Recursive Stored Procedure That Builds Hierarchy Path of Parent-Child Relationships

CREATE PROCEDURE sp_GetAncestry

@szPath varchar(500) OUTPUT,
@lMessageID int

AS

DECLARE cur_Level CURSOR LOCAL FOR 
SELECT lParentID FROM tblMessage WHERE lMessageID = @lMessageID 

OPEN cur_Level
	FETCH NEXT FROM cur_Level INTO @lMessageID
	WHILE @@FETCH_STATUS = 0
	BEGIN
		IF @lMessageID <> 0  AND @lMessageID IS NOT NULL
		BEGIN
		SELECT @szPath = "_" + CONVERT(varchar(20), 	@lMessageID) + @szPath
			EXEC sp_GetAncestry @szPath OUTPUT, @lMessageID
		END
		FETCH NEXT FROM cur_Level INTO @lMessageID
	END
CLOSE cur_Level
DEALLOCATE cur_Level

TABLE 1: Employee Definition
Field TitleField Value
lEmployeeID 110
szEmployeeFirstNameChris
szEmployeeLastNameBehrens
szRankPeon
lSupervisorID212

TABLE 2: Manager Definition
Field TitleField Value
lEmployeeID212
szEmployeeFirstNamePointy Haired
szEmployeeLastNameGuy
szRankManager
lSupervisorID0

TABLE 3: Example of Output of Hierarchical Employee Data
ID Name Rank
212Guy, Pointy HairedManager
110Behrens, ChrisPeon

TABLE 4: Message Structure
ElementDefinition
lMessageIDThe table's primary key
lForumIDThe forum in which the message will appear
lParentIDThe parent message's ID
tContentThe message text
dtDateTimeCreatedThe time the message was created

TABLE 5: Message Thread Output
lMessageIDLForumIDlParentIDszSubjecttContentdtDateTimeCreated
110Test MessageThis is our simple test message.5/24/00 2:04:47 PM
211Son of Test MessageThis is the response to our simple message.5/24/00 2:04:59 PM

TABLE 6: Cache Table Structure
lCacheIDLmessageIDszSubjectlDepthLevelszIDlReplies
11Test Message0_12
22Re: Test Message1_1_22
33This is a test message too0_1_2_30
44Re: Test Message1_1_41

TABLE 7: Description of Cache Table Columns
ElementDefinition
lCacheIDThe table's primary key
lMessageIDIdentifies the message within the main message table
szSubjectDesignates the text to appear in the forum view
lDepthlevelThe message's level within the hierarchy; use to indent the message and give a visual cue to the hierarchy
szIDThe message's ancestry
lRepliesHow many immediate children the message has

Bugs, comments, suggestions | Legal | Privacy | Advertising

Copyright © 2002 Penton Media, Inc. All rights reserved.

Show:
© 2015 Microsoft