Recalculation in Microsoft Excel 2002

This content is no longer actively maintained. It is provided as is, for anyone who may still be using these technologies, with no warranties or claims of accuracy with regard to the most recent product version or service release.

 

Loreen La Penna
Microsoft Corporation

October 2001

Applies to:
     Microsoft® Excel 2002

Summary: This article explores how Microsoft Excel 2002 recalculates large, complex models that span multiple sheets even faster than previous versions of Excel, and how you can optimize your worksheets for fastest recalculation. (12 printed pages)

Contents

Automatic Calculation
Manual Calculation
How Excel 2002 Recalculates
Optimization
Recalculation in Excel 2000 and Earlier Versions
Further Optimization

Automatic Calculation

By default, all versions of Microsoft® Excel perform what is known as a minimum recalculation. This means that Excel recalculates only those cells that need to be recalculated, which saves time when updating large or complex worksheet models.

At its simplest, this means that if a cell value changes, only those cells that refer to or depend on that cell are recalculated. So, if you have the formula =A1 in cell A2, that means that A1 supports cell A2. When the value in cell A1 changes, cell A2 (which depends on cell A1) is recalculated.

In complicated spreadsheet models, many cells may depend on each other, so a series of determinations must be made as to which are recalculated and in what order.

Manual Calculation

You can choose to recalculate only when you specify by setting calculation to manual (on the Tools menu, click Options, and then click the Calculation tab). This means Excel will not recalculate automatically when a cell value changes; rather, you must specifically choose to recalculate by pressing F9 (which recalculates only cells in the workbook that need to be recalculated), SHIFT+F9 (which calculates only the active sheet), or CTL+ALT+F9 (which does a full recalculation, regardless of whether cells need to be recalculated). Manual calculation is a good option when you have large, complicated worksheet models that take a long time to recalculate, and you want to be able to make changes to data quickly and save the recalculation process until after you've made all your changes.

How Excel 2002 Recalculates

As mentioned above, when many cells depend on each other, a series of determinations must be made as to which are recalculated and in what order. To do this, Excel first queues up cells that have changed to determine which cells need to be calculated, and then marks those cells with a recalculation flag. The cells that are marked for recalculation are then recalculated according to an internal list of linked cells, the order of which changes based on the order of dependencies. Once the order of calculation has been established, subsequent recalculations are faster because are arranged in an optimal order.

Example

Consider the following formulas and values on Sheet1 and Sheet2 in Book1:

Aa140058.odc_xlrecalc01(en-us,office.10).gif

Figure 1. Sample formulas and values

The resulting values in each cell would look like this:

Aa140058.odc_xlrecalc02(en-us,office.10).gif

Figure 2. Resulting values

By looking at the Sheet1, you can see that the value in cell A2 supports the value in D1 on Sheet2. The value in D1 on Sheet2, in turn, supports the values in B1 and C1 on Sheet1. And, those values support the value in A1 on the same sheet. Finally, the value in A1 on Sheet1 supports the value in C1 on Sheet2.

Excel keeps an internal list of all supported cells in the workbook. The list for this workbook would initially look like this:

Sheet1!A1 > Sheet1!B1 > Sheet1!C1 > Sheet2!C1 > Sheet2!D1

Notice that A2 on Sheet1 is not in this list. This is because Sheet1!A2 contains a static value rather than a formula.

So what happens if the value in Sheet1!A2 is changed from 3 to a different value, and calculation is set to automatic, or you press F9? First, Excel queues up cells to determine which need to be recalculated, and marks the appropriate cells for recalculation. To do this, Excel looks at the first cell that has changed and asks three basic questions:

  • Does the cell support other cells? If it does, those cells also need to be added to the queue.
  • Is this cell in a range that supports another cell? Again, Excel may need to add other cells to the queue.
  • Does the cell contain a formula? If it does, it should be marked for calculation.

The process Excel would go through to mark appropriate cells for recalculation for the above example is discussed in the following section.

Queue and Mark Appropriate Cells for Recalculation

  1. Cells that change are queued internally. In this example, cell A2 on Sheet1 is the only cell that changes. So the queue has just a single member in the first position: Sheet1!A2.
  2. Excel looks at the first item in the queue to see if it has any supporting cells. In this example, Sheet1!A2 supports Sheet2!D1, so Sheet2!D1 is added to the back of the queue, which now looks like this:
    Sheet1!A2 > Sheet2!D1
  3. Next, Excel checks to see if Sheet1!A2 is in a range that supports other cells. Since it is not, and since it does not contain a formula, it is removed from the queue without being marked for recalculation. So the queue now looks like this:
    Sheet2!D1
  4. Excel checks to see if Sheet2!D1 to see if it supports any single cells. It supports cells B1 and C1 on Sheet1, so those cells are added to the queue:
    Sheet2!D1 > Sheet1!B1 > Sheet1!C1
  5. Excel then checks to see if Sheet2!D1 is in a range that supports other cells. It isn't. Because it contains a formula, it is marked for recalculation and then removed from the queue, which now looks like this:
    Sheet1!B1 > Sheet1!C1
  6. Now Excel looks at Sheet1!B1 to see if it supports any single cells. It doesn't. Excel checks to see if Sheet1!B1 is in a range that supports another cell. Because it is in the B1:C1 range, which the formula in cell Sheet1!A1 refers to, Sheet1!A1 is added to the queue:
    Sheet1!B1 > Sheet1!C1 > Sheet1 > A1
  7. Because the contents of Sheet1!B1 is a formula, it is marked for recalculation and removed from the queue, which now holds Sheet1!C1 and Sheet1!A1:
    Sheet1!C1 > Sheet1 > A1
  8. Excel looks at the next cell on the queue, Sheet1!C1, to see if it supports single cells or is in a range that supports another cell. In this case, Sheet1!C1 supports Sheet1!B1, but since Sheet1!B1 has already been marked for recalculation, it is not queued again. Sheet1!C1 is also in the B1:C1 range, which is in the formula that cell Sheet1!A1 refers to, but A1 doesn't get added to the queue because it's already there. Because Sheeet1!C1 is a formula, it is marked for recalculation and removed from the queue, which leaves just Sheet1!A1 in the queue:
    Sheet1!A1
  9. Excel looks at Sheet1!A1, to see if it supports any single cells. In this example, it supports Sheet2!C1. So that cell is added to the queue:
    Sheet1!A1 > Sheet2!C1
  10. Excel checks to see if Sheet1!A1 is in a range that supports single cells. The answer is no. So, cell Sheet!A1 is marked for recalculation (because it contains a formula) and removed from the queue, leaving just a single member:
    Sheet2!C1
  11. Excel checks to see if Sheet2!C1 supports any single cells. It does not. Is it in a range that supports single cells? Again, the answer is no. So Sheet2!C1 is marked for recalculation (because it contains a formula) and removed from the queue, which leaves an empty queue and cells A1, B1, and C1 on Sheet1 and cells C1 and D1 on Sheet2 marked for recalculation.

The following flowchart illustrates this process:

Aa140058.odc_xlrecalc03(en-us,office.10).gif

Figure 3. Flowchart for queuing and marking appropriate cells for recalculation

Performing the Recalculation

Now that Excel has determined which cells need to be recalculated and marked them for recalculation, it can perform the recalculation, which involves another process of queuing to make sure cells are recalculated in the right order.

Excel starts at the beginning of its internal list of linked cells, which if you remember, looks like this (bold signifies that the cell is marked for recalculation):

Sheet1!A1 > Sheet1!B1 > Sheet1!C1 > Sheet2!C1 > Sheet2!D1

  1. Excel checks to see if Sheet1!A1 is marked for recalculation. In the example above, it is marked (because it was marked in the queuing process outlined above). As Excel tries to calculate A1 (which contains the formula =SUM(B1:C1)), it is determined that B1 and C1 are also marked for recalculation, which means that their values might change. Therefore, A1 cannot be recalculated successfully until B1 and C1 are recalculated. So, B1 and C1 are moved ahead of cell A1 is in the list, and A1 is not yet calculated. So now the list of linked cells looks like this:
    Sheet1!B1 > Sheet1!C1 > Sheet1!A1 > Sheet2!C1 > Sheet2!D1
  2. Excel then tries to calculate cell B1, which contains the formula =C1+Sheet2!D1. Because Sheet2!D1 is marked for recalculation and has not yet been calculated, B1 is skipped and Sheet1!C1 and Sheet2!D1 is moved to the top of the list:
    Sheet1!C1 > Sheet2!D1 > Sheet1!B1 > Sheet1!A1 > Sheet2!C1
  3. Excel then tries to calculate Sheet1!C1. However, it depends on Sheet2!D1. So, Sheet2!D1 is moved to the top of the list:
    Sheet1!D1 > Sheet1!C1 > Sheet1!B1 > Sheet1!A1 > Sheet2!C1
  4. Excel now tries to recalculate Sheet2!D1. Because it depends on Sheet1!A2, which doesn't need recalculation, Sheet2!D1 is recalculated and its recalculation mark removed. So now the list looks like this:
    Sheet1!D1 > Sheet1!C1 > Sheet1!B1 > Sheet1!A1 > Sheet2!C1
  5. Excel looks at Sheet1!C1 and tries to recalculate it. Its formula relies on Sheet2!D1, which has just been successfully recalculated. So, Sheet1!C1 can now be recalculated, and its recalculation mark removed. The list now looks like this:
    Sheet1!D1 > Sheet1!C1 -> Sheet1!B1 -> Sheet1!A1 > Sheet2!C1
  6. Excel tries to calculate B1 again. Because B1 depends on Sheet1!C1 and Sheet2!D1, both of which have been recalculated, Sheet1!B1 can now be recalculated, and its mark removed, leaving the list like this:
    Sheet1!D1 > Sheet1!C1 > Sheet1!B1 > Sheet1!A1 > Sheet2!C1
  7. Now that Sheet1!B1 and Sheet1!C1 have recalculated successfully, Sheet1!A1 can be recalculated, leaving just a single member yet to be recalculated in the list:
    Sheet1!D1 > Sheet1!C1 > Sheet1!B1 > Sheet1!A1 > Sheet2!C1
  8. Sheet2!C1 is supported by Sheet1!A1, which has now been successfully recalculated, so it can be recalculated, also:
    Sheet1!D1 > Sheet1!C1 > Sheet1!B1 > Sheet1!A1 > Sheet2!C1

Excel is done, and the internal list of linked cells is in an optimal order for recalculation, making it fully optimized for future calculations. The list will remain in this order as long as you don't change any of the formulas, but rather just change static data (such as the value in Sheet1!A2 in this example), Excel remembers the order in which the cells need to be recalculated so that they don't have to be ordered again, thus making calculation faster.

The following flowchart illustrates this process:

Aa140058.odc_xlrecalc04(en-us,office.10).gif

Figure 4. Flowchart for performing the recalculation

Optimization

Excel 2002 is designed so that you do not have to do much to ensure that calculation takes place as fast as possible. The exception is when you are using user-defined functions. User Defined Functions (UDFs) are custom functions created using Visual Basic® for Applications (VBA).

  • One way to optimize UDFs is to prevent repeated calls to the UDF by entering them last in order in an on-sheet formula. For example, enter =A1+UDF in a worksheet cell rather than =UDF+A1. This way, if A1 is marked for recalculation, but hasn't recalculated yet, the UDF will be called only after A1 has recalculated, avoiding an unnecessary call.
  • Make sure that the code in the UDF that accesses cells values are as near the top of the code as possible. For example, you might have a UDF that has initial code that takes a long time to run (for instance, it might make a call to an external dynamic link library (DLL) which does some calculation-intensive work), and then code below that which accesses cell values. If the cell values aren't recalculated before you start running code in the UDF, you'll twice pay the penalty of running through the long, slow code at the top of your UDF: first, when the UDF is called, and the not-yet-recalculated cells that you're accessing are moved back to the top of the internal list of linked cells so they be recalculated, and second, after the cells to which the UDF refers have been recalculated.

Recalculation in Excel 2000 and Earlier Versions

Versions of Excel prior to 2002 recalculate in the same way as described above, with one important difference: Excel keeps an internal list of linked cells for each sheet in the workbook rather than just one list of linked cells for the whole workbook. This means that, in a cross-sheet model, Excel must deal with queuing and reordering several lists rather than just one, as well as the ordering of the sheets themselves.

Initially, Excel starts recalculating worksheets in alphabetical order by name. However, as in the process above, the order of calculation depends on the order of references in cells. In a complicated workbook model, Excel might start recalculating a formula on one sheet, only to find that the formula refers to a cell on another sheet that has to be recalculated first, and so on.

Example

Using the example above, the internal lists of linked cells for each sheet initially look like this (bold indicates that the cell is marked for recalculation):

Sheet1 > A1 > B1 > C1

Sheet2 > C1 > D1

As in the previous example, let's take a look at what happens if you change the value in cell A2 on Sheet1 in versions of Excel prior to 2002.

Through the queuing process described previously, all cells with formulas are marked for recalculation.

  1. Excel then looks at the first sheet in the workbook in alphabetical order, which is Sheet1. Excel attempts to recalculate the first cell on the sheet, which is A1. However, the supporting cells for A1 (which are B1 and C1) have not yet been recalculated, so Excel does not recalculate A1. However, it can move B1 and C1 to the top of the list, which now looks like this:
    Sheet1 > B1 > C1 > A1
  2. Excel looks at the next cell on Sheet1, which is B1. It also has supporting cells (C1 and Sheet2!D1) that have not yet been calculated, so that, too, is skipped. Excel looks at C1 and does not recalculate it, either, because it also relies on Sheet2!D1.
    Excel is finished looking at cells on Sheet1 for the time being; the sheet is marked with a flag that indicates that recalculation is still needed on the sheet.
  3. Excel then moves to the next sheet is alphabetical order: Sheet2. Its internal list looks like this:
    Sheet2 > C1 > D1
  4. Cell C1 on Sheet2 can't recalculate because it relies on Sheet1!A1, which has not yet been calculated. So Excel looks at the next cell in the list, which is D1. It can recalculate now because it relies on Sheet1!A2, which doesn't contain a formula or need recalculation. So, D1 is recalculated, and the mark for recalculation is removed from that cell. The internal list of linked cells for Sheet2 now contains only one cell that needs to be recalculated, which is C1:
    Sheet2 > C1 > D1
    Excel is finished with Sheet2 for the time being, and marks it with two flags: one that indicates that some recalculation has been performed, and one that indicates that some recalculation still needs to be done.
  5. Excel returns to Sheet1, where it looks at the first cell in its list, which is B1. B1 relies on C1 and Sheet2!D1. Although Sheet2!D1 has been recalculated, C1 has not. So, B1 is not recalculated and C1 is moved to the top of the list:
    Sheet1 > C1 > B1 > A1
  6. Excel looks at the next cell in the list, which is A1. A1 relies on B1 and C1, neither of which is recalculated yet. So A1 is also not recalculated.
    Excel is again finished with Sheet1 for the time being.
  7. Excel moves back to Sheet2 and looks at cell C1, which is still marked for recalculation. However, C1 relies on Sheet1!A1, which has not yet been recalculated. So, Excel does not recalculate C1, and leaves the flag on Sheet2 that indicates that recalculation still has to be done.
  8. Excel moves back to Sheet1, and attempts to recalculate the first cell in the list, which is C1. C1 relies on Sheet2!D1 which has been recalculated, so C1 can now recalculate. Its recalculation mark is removed, and the updated internal list looks like this:
    Sheet1 > C1 > B1 > A1
  9. Excel can now calculate B1 because both of the cells it relies on (C1 and Sheet2!D1) have been recalculated. Its recalculation mark is removed, and the only cell left to recalculate on Sheet1 is A1:
    Sheet1 > C1 > B1 > A1
  10. Because A1 relies on B1 and C1, both of which have been calculated, A1 can now be calculated. All of the cells on Sheet1 have now been calculated, so Excel removes the flag that says recalculation is needed:
    Sheet1 > C1 > B1 > A1
  11. Excel moves back to Sheet2 and looks at C1, which is the only cell left on that sheet marked for recalculation. Because C1 relies on Sheet1!A1, which has just been calculated, C1 can now be recalculated. Excel removes its recalculation mark, and removes the mark on Sheet2 indicating that it needs recalculation:
    Sheet2 > C1 > D1
  12. Excel moves back to Sheet1, but it is longer marked for recalculation; so Excel moves back to Sheet2, but it, too, is no longer marked for recalculation. Excel is done recalculating Book1.

The following flowchart illustrates this process:

Aa140058.odc_xlrecalc05(en-us,office.10).gif

Figure 5. Flowchart for recalculation in Excel 2000 and earlier versions

Further Optimization

In review, here are two additional steps you can take to optimize your workbook for the most efficient recalculation:

  • Minimize the number of links between worksheets and workbooks. Raw data, or data in supporting cells, should be contained in sheets that are first in alphabetical order within a workbook. Because Excel recalculates sheets based on alphabetical order, the sheets that have the most dependent cells on them should come alpha first, and sheets with the formulas that depend on them alpha last. In other words, sheets with cells that support cells on other sheets should come first, while sheets that don't support other sheets should be last.
  • Optimize any User Defined Functions (UDFs) contained in your workbook.