Documenting and Evaluating Results When Troubleshooting Team Foundation Server
This topic provides guidance on documenting changes that were made to a server that is running Team Foundation Server and on evaluating the results of those changes.
You can increase the value of information collected during troubleshooting by keeping accurate and complete records of all work done. You can use your records to reduce redundant effort and to avoid future problems by taking preventive action.
Create a configuration management database to record the history of changes, such as installed software and hardware, updated drivers, replaced hardware, and changed system settings. Periodically verify, update, and back up this data to prevent permanent loss. To maximize use of your database, note details such as:
Changes that were made.
Times and dates changes were made.
Reasons the changes were made.
Users who made the changes.
Positive and negative effects the changes had on system stability or performance.
Information provided by technical support.
When you plan this database, consider the need to balance scope and detail as you decide which items or attributes to track.
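As an illustration only, the details listed above could be captured in a simple record structure such as the following Python sketch. The class name, field names, and sample values are hypothetical and are not part of Team Foundation Server or any specific configuration management product; adjust them to the scope and level of detail your organization decides to track.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import List


@dataclass
class ChangeRecord:
    """One entry in a change history (hypothetical schema)."""
    description: str                 # change that was made
    made_at: datetime                # time and date the change was made
    reason: str                      # reason the change was made
    made_by: str                     # user who made the change
    effects: List[str] = field(default_factory=list)  # positive and negative effects
    support_notes: str = ""          # information provided by technical support

    def to_json(self) -> str:
        """Serialize the record so it can be stored and backed up as text."""
        data = asdict(self)
        data["made_at"] = self.made_at.isoformat()
        return json.dumps(data, sort_keys=True)


# Sample values are invented for illustration.
record = ChangeRecord(
    description="Updated network adapter driver on the application tier",
    made_at=datetime(2024, 1, 15, 22, 30, tzinfo=timezone.utc),
    reason="Intermittent connection timeouts",
    made_by="jsmith",
)
record.effects.append("Timeouts no longer reproduced after restart")
print(record.to_json())
```

Storing each record as a line of text keeps the history easy to verify, update, and back up, whichever storage system you choose.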
Update baseline information after you install new hardware or software so that you can compare past and current behavior or performance levels. If previous baseline information is not available, use System Information, Device Manager, the Performance tool, or industry-standard benchmarks to generate data.
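A comparison of current measurements against a baseline can be sketched as follows. This is a minimal Python example that assumes you have already exported two snapshots of numeric metrics; the function name, metric names, and values are hypothetical and do not come from any particular tool.

```python
def compare_to_baseline(baseline, current):
    """Return the percent change for each metric found in both snapshots.

    Metrics with a baseline value of zero are skipped because a
    percent change is undefined for them.
    """
    changes = {}
    for name, old_value in baseline.items():
        if name in current and old_value != 0:
            changes[name] = (current[name] - old_value) / old_value * 100.0
    return changes


# Invented sample data: one snapshot before a change and one after it.
baseline = {"avg_cpu_percent": 40.0, "avg_response_ms": 200.0}
current = {"avg_cpu_percent": 50.0, "avg_response_ms": 150.0}

for name, percent in compare_to_baseline(baseline, current).items():
    print(f"{name}: {percent:+.1f}%")
# Prints:
# avg_cpu_percent: +25.0%
# avg_response_ms: -25.0%
```

Whether an increase is an improvement depends on the metric, so interpret each result alongside the notes recorded for the change.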
Baselines combined with records kept over time enable you to organize experience gained, evaluate maintenance efforts, and judge troubleshooting effectiveness. Analysis of this data can form the basis of a troubleshooting manual or lead to changes in control policy for your organization.
A post-troubleshooting review, or post-mortem, can help you identify troubleshooting areas that need improvement. Some questions you might consider during this self-evaluation period include the following:
What changes improved the situation?
What changes made the problem worse?
Was system performance restored to expected levels?
What work was redundant or unnecessary?
How effectively were technical support resources used?
What tools or information that were not used might have helped?
What unresolved issues require additional root-cause analysis?
An action plan is a set of relevant troubleshooting objectives and strategies that fits within your organization's configuration and management strategies. After you identify the problem and find a potential solution or workaround that you have tested on one or more computers, you should create an action plan. Coordinate your plan with supervisors and staff members in the affected areas to keep them informed in advance and to verify that the schedule does not conflict with important activity. Include provisions for troubleshooting during non-peak work hours or dividing work into stages over a period of several days. Evaluate your plan, and as you discover weaknesses, update it to increase its effectiveness and efficiency.
As the number of users grows, so does the potential loss of productivity caused by a disruption. Your plan must account for dependencies and allow for last-minute changes. Include contingency plans for unforeseen circumstances.