The Zen of Recursion
The Zen of Recursion
Use SQL Server stored procedures recursively to traverse hierarchies
Recursion is the property or quality of self-reference. In programming terms, recursion refers to a routine that repeatedly calls itself until a certain condition is fulfilled. One powerful way that SQL Server database developers can use recursion is to traverse hierarchiescollections of items that have parent-child relationships. People deal with hierarchies all the time: Every day, you have a series of tasks you must prioritize and then perform. Also, most people work within a hierarchy of employees. The classic way to represent a hierarchy within a database is to add a parent key to the object. A parent key is simply the primary key of an object that has a parent relationship to the current object. To see an example, let's define a management structure. Table 1 defines a typical employee, Chris; the record for Chris's boss appears in Table 2. Note that the boss's record has an ISupervisorID of 0 and that Chris's record has an ISupervisorID of 212, the IEmployeeID value in the boss's record. These values specify that the boss record is a parent of Chris's record. The matching values of ISupervisorID in Chris's record and IEmployeeID in the boss's record form a data linkor parent keybetween the two records.
Now, as a simple example of stored-procedure recursion in a hierarchy, let's say that you're tasked with generating a list of all employees and their subordinates. You need to write a script to retrieve all the records at the top level (i.e., the employees whose ISupervisorIDs are 0), then grab the child records for each record in that set, and so on in a recursive process. Listing 1 contains pseudocode for such a process, and Table 3 is an example of output that meets your needs.
This approach is logically sound, but when you turn the pseudocode into the SQL code in Listing 2 and run it, you get the execution error that Figure 1 shows. The code worked for a level, then failed because SQL Server declared the cursor named cur_Level with a connection-level scope. This means that during its lifetime, the cursor can be accessed across every query that occurs on the same connection. Because the recursive calls occur on this connection, they inherit the cursor, causing an error whenever there's an attempt to create another cursor with the same name. You have two ways of fixing this problem.
First, you could change the Default to local cursor database option setting (support for local cursors is a feature of SQL Server 7.0). By default, this option is set to GLOBAL to conform to the requirements of earlier SQL Server releases, but you can set the option to LOCAL to bypass this problem if you think you'll be using locally scoped cursors often.
A second option for getting rid of the error is to simply add the LOCAL keyword to the declaration, as in Listing 3. After you add the LOCAL keyword, the code performs perfectly, giving you the results that Figure 2 shows.
Now let's look at a more common and useful example: a threaded discussion list. Recursion is a good tool to use in creating a discussion list because messages are presented in a parent-child format, with base-level messages first, followed by replies to that base-level message.
A threaded discussion list is a typical hierarchy: Messages are either top-level subjects or responses (children) to other messages. The message structure looks like Table 4. Let's say that this message board becomes enormously popular, receiving thousands of hits per second. An effective scaling strategy for a high-use application is forward caching, which means that you pregenerate the content to display later. Inserting the content into a cache table ahead of time conserves the server resources that the board's high-traffic conditions demand. Because performance is so important, forward caching is a better option than simply selecting the content through stored procedures.
When you use ADO to access the output of these procedures, the results are represented as a series of recordsets that are accessible through ADO's NextRecordset method (for more information about NextRecordset, see MSDN online documentation). The process of generating messages for users of the discussion list comprises two steps. First, when the user selects the main page, SQL Server generates the message board from a forward cache table called tblForumCache. Then, when the user creates a new message or responds to one, SQL Server uses the recursive stored procedure to regenerate the cache table.
Besides simply displaying the message, you'll want to display how many replies a particular message has, as well as the date and time that each message was created. The final output should look something like Table 5.
You can improve the user experience with this application. Let's say that for expanding and collapsing these elements, you're going to use a Dynamic HTML (DHTML) function that relies on each element's ancestry (i.e., the hierarchy of parent and child relationships up to the top of the data structure). Here's an example message that has two levels of replies:
- SQL Server - entered 6/28/00 3:15 PM by cbehrens - Stored Procedures - entered 6/28/00 4:15 PM by mjohnson - Recursion - entered 6/28/00 5:15 PM by hsmith
For each message, you need to generate an ancestry code consisting of the IMessageID values of all the messages that came before. This ancestry code, which the DHTML functions will use, allows for the locating and sorting of messages that belong to each top-level message.
(100) - SQL Server - entered 6/28/00 3:15 PM by cbehrens (100_101) - Stored Procedures - entered 6/28/00 4:15 PM by mjohnson (100_101_102) - Recursion - entered 6/28/00 5:15 PM by hsmith
Each line begins with an ancestry code; the message titled "SQL Server" has nothing but its own ID (100) because it's a base-level message. The message titled "Stored Procedures" is a child of the first message, so it carries a code of 100_101: the message ID of its parent plus its own message ID. And, the message titled "Recursion" has a code of 100_101_102namely, the IDs of each message in its ancestry plus its own message ID.
The first thing you need to do in this process is to clear the old cache. To do this, simply delete the old data:
DELETE FROM tblForumCache
Next, you need to use a separate procedure to perform the recursion. Let's name this procedure sp_RecurseMessages and give it two parameters: IMessageID and IDepthLevel. To call the procedure, run this command:
EXEC sp_RecurseMessages 0, 0
These parameters tell the procedure to start with a depth level and message ID of 0. You need to specify a 0 value so that the procedure can pass the information back and forth between the levels of recursion. The pseudocode for this procedure is similar to that for the previous procedure, with a few additions, as Listing 4 shows.
The finished recursive code, which Listing 5 shows, opens the cursor initially by selecting the children of the designated parent (initially the parent will be the top-level message because the parent ID is 0). After the procedure enters the record loop, it must update the parent record with an additional reply count that corresponds to the parent message. The code then runs a second procedure to obtain the ancestry of the message and inserts the ancestry values into the cache table.
Next, the stored procedure searches for children of the current message, after incrementing the depth variable to show that the code is proceeding down a level. Then the original procedure calls itself, proceeding through the same steps and perhaps down to the children's children. After the children recordsets are exhausted, the procedure that was called recursively goes up a level, returning control to the calling procedure, and decrements the depth count to reflect that change in level.
Note that the code initializes the @szAncestry variable before passing it to the sp_GetAncestry function. You could optimize the sp_GetAncestry procedure in various ways, such as getting the reply count all at once rather than running a query each time, but this method is a good and simple way to present this stored procedure.
Finally, you need to create the function that actually yields the ancestry. In this case, you don't want a recordset result; a simple string will suffice. To get the string value of the procedure, you declare the @szAncestry variable that you're passing in as an OUTPUT parameter:
CREATE PROCEDURE sp_GetAncestry @szPath varchar(500) OUTPUT, @lMessageID int
The rest of the stored procedure, which Listing 6 shows, essentially repeats the simple recursion function. Notice that the procedure again declares the cursor as local to avoid scoping problems. The code then processes down the hierarchy and back up, picking up the message ID with an underscore along the way, to form the ancestry code that the procedure needs.
You can implement this procedure by placing a trigger on UPDATE, INSERT, and DELETE events in the tblMessage table. Then, when a message is updated, inserted, or deleted, the trigger clears the cache and executes the cache-rebuilding function. Although this approach will cause a slight hiccup when the user creates a message, 90 percent of the users, who won't post a message anyway, will enjoy greatly improved performance because of the forward caching.
To optimize this flow even further, you could try the following three tricks:
- Generate ancestry gradually. You could use a series of stored procedures to build a history string as the procedure traverses the hierarchy, trimming and adding on where necessary. Doing so would save the overhead of the sp_GetAncestry call.
- Generate ReplyCount all at once. You could run one query to determine how many children a message has, thus avoiding the multiple calls to update the record.
- Pregenerate the forum content. You could further develop the trigger so that after it creates the cache table, it transforms the table's contents into an HTML file that would serve as the forum. Figure 3 shows a finished forum screen. This approach would save the Web server the overhead of dynamically generating the file, and it would save the database server an open connection and the overhead of querying the cache table.
Recursion can be a powerful tool for dealing with hierarchical data. With the new LOCAL keyword and a little cleverness, you can leverage recursion in SQL Server 7.0 with terrific results.
LISTING 1: Pseudocode for Simple Recursive Procedure
Open cursor for current level: Parameter: @lEmployeeID SELECT lEmployeeID AS ID, szEmployeeLastName + ', ' + szEmployeeFirstName, szRank AS Rank FROM tblEmployee WHERE lSupervisorID = @lEmployeeID ORDER BY dtDateTimeCreated SELECT @lEmployeeID AS ID, @Name AS Name, @Rank AS Rank Execute this procedure recursively with the current @lEmployeeID Scroll next to get the next values Close our cursor and perform cleanup.
LISTING 2: Simple Recursive Procedure
CREATE PROCEDURE sp_SimpleRecurse @lEmployeeID int AS DECLARE @Name varchar(500) DECLARE @Rank varchar(500) DECLARE cur_Level CURSOR FOR SELECT lEmployeeID AS ID, szEmployeeLastName + ', ' + szEmployeeFirstName, szRank AS Rank FROM tblEmployee WHERE lSupervisorID = @lEmployeeID ORDER BY dtDateTimeCreated OPEN cur_Level FETCH NEXT FROM cur_Level INTO @lEmployeeID, @Name, @Rank WHILE @@FETCH_STATUS = 0 BEGIN SELECT @lEmployeeID AS ID, @Name AS Name, @Rank AS Rank EXEC sp_SimpleRecurse @lEmployeeID FETCH NEXT FROM cur_Level INTO @lEmployeeID, @Name, @Rank END CLOSE cur_Level DEALLOCATE cur_Level
LISTING 3: Recursive Procedure with LOCAL Keyword Added
DECLARE cur_Level CURSOR LOCAL FOR SELECT lEmployeeID AS ID, szEmployeeLastName + ', ' + szEmployeeFirstName, szRank AS Rank FROM tblEmployee WHERE lSupervisorID = @lEmployeeID ORDER BY dtDateTimeCreated
LISTING 4: Pseudocode for Second Procedure
OPEN cur_Level LOCAL cursor FOR SELECT lMessageID, szSubject FROM tblMessage WHERE lParentID = @lMessageID ORDER BY dtDateTimeCreated While still records in recordset --Add one to the replycount of our parent message. --Get the ancestry of this message, starting with its own id. INSERT INTO tblForumCache (...) VALUES (...) --Place the information into the cache table. SELECT @lDepthLevel = @lDepthLevel + 1 --Increment the depthlevel by one. EXEC sp_RecurseMessages @lMessageID, @lDepthLevel --Perform the same operation for the kids. SELECT @lDepthLevel = @lDepthLEvel - 1 --Since we've bubbled up a level, decrement the depthlevel. NEXT END CLOSE cur_Level
LISTING 5: Recursive Code for Threaded Discussion List
CREATE PROCEDURE sp_RecurseMessages @lMessageID int, @lDepthLevel int AS DECLARE @szSubject varchar(500) DECLARE @lParentID int DECLARE @szAncestry varchar(500) SELECT @lParentID = @lMessageID DECLARE cur_Level CURSOR LOCAL FOR SELECT lMessageID, szSubject FROM tblMessage WHERE lParentID = @lMessageID ORDER BY dtDateTimeCreated OPEN cur_Level FETCH NEXT FROM cur_Level INTO @lMessageID, @szSubject WHILE @@FETCH_STATUS = 0 BEGIN UPDATE tblForumCache SET lReplies = lReplies + 1 WHERE lMessageID = @lParentID SELECT @szAncestry = '_' + CONVERT(varchar(30), @lMessageID) EXEC sp_GetAncestry @szAncestry OUTPUT, @lMessageID INSERT INTO tblForumCache (lMessageID, szSubject, lDepthLevel, szID,lReplies) VALUES (@lMessageID, @szSubject, @lDepthLevel, @szAncestry, 0) SELECT @lDepthLevel = @lDepthLevel + 1 EXEC sp_RecurseMessages @lMessageID, @lDepthLevel SELECT @lDepthLevel = @lDepthLevel - 1 FETCH NEXT FROM cur_Level INTO @lMessageID, @szSubject END CLOSE cur_Level DEALLOCATE cur_Level
LISTING 6: Recursive Stored Procedure That Builds Hierarchy Path of Parent-Child Relationships
CREATE PROCEDURE sp_GetAncestry @szPath varchar(500) OUTPUT, @lMessageID int AS DECLARE cur_Level CURSOR LOCAL FOR SELECT lParentID FROM tblMessage WHERE lMessageID = @lMessageID OPEN cur_Level FETCH NEXT FROM cur_Level INTO @lMessageID WHILE @@FETCH_STATUS = 0 BEGIN IF @lMessageID <> 0 AND @lMessageID IS NOT NULL BEGIN SELECT @szPath = "_" + CONVERT(varchar(20), @lMessageID) + @szPath EXEC sp_GetAncestry @szPath OUTPUT, @lMessageID END FETCH NEXT FROM cur_Level INTO @lMessageID END CLOSE cur_Level DEALLOCATE cur_Level
|TABLE 1: Employee Definition|
|Field Title||Field Value|
|TABLE 2: Manager Definition|
|Field Title||Field Value|
|TABLE 3: Example of Output of Hierarchical Employee Data|
|212||Guy, Pointy Haired||Manager|
|TABLE 4: Message Structure|
|lMessageID||The table's primary key|
|lForumID||The forum in which the message will appear|
|lParentID||The parent message's ID|
|tContent||The message text|
|dtDateTimeCreated||The time the message was created|
|TABLE 5: Message Thread Output|
|1||1||0||Test Message||This is our simple test message.||5/24/00 2:04:47 PM|
|2||1||1||Son of Test Message||This is the response to our simple message.||5/24/00 2:04:59 PM|
|TABLE 6: Cache Table Structure|
|2||2||Re: Test Message||1||_1_2||2|
|3||3||This is a test message too||0||_1_2_3||0|
|4||4||Re: Test Message||1||_1_4||1|
|TABLE 7: Description of Cache Table Columns|
|lCacheID||The table's primary key|
|lMessageID||Identifies the message within the main message table|
|szSubject||Designates the text to appear in the forum view|
|lDepthlevel||The message's level within the hierarchy; use to indent the message and give a visual cue to the hierarchy|
|szID||The message's ancestry|
|lReplies||How many immediate children the message has|
Copyright © 2002 Penton Media, Inc. All rights reserved.