1.3 Overview

As distributed applications become increasingly complex, so does the problem of diagnosing errors within them. To diagnose an error in a distributed application, a user first isolates the problem to a particular component. Each component often produces a trace log that records incoming messages, outgoing messages, and information about its internal state. By analyzing the trace log of each component, a user can reconstruct the sequence of messages that led to the error. The .NET Tracing Protocol facilitates this process by helping to correlate message flows across components.

The .NET Tracing Protocol provides two main functions. First, it enables users to map outgoing messages to incoming messages between components in a distributed application. It does this by assigning each message a unique identifier, named the CorrelationId. The client component records this identifier in its trace log before it sends the message, and the server component records it in its trace log after it receives the message. The identifier then serves as an index into the client and server trace logs, so that the two sides of each message exchange can be matched. Because matching relies on a shared identifier rather than on timestamps, it also avoids problems caused by clock skew between components in the distributed application.
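The matching step described above can be illustrated with a small sketch. It assumes a hypothetical trace-log record type (`TraceEntry`, with `component`, `correlation_id`, and `event` fields) and a hypothetical helper `match_exchanges`; neither is defined by this protocol, and CorrelationId values are shown as plain strings purely for illustration. The server log is indexed by CorrelationId, and each client "send" entry is joined to the server "receive" entry that carries the same identifier.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class TraceEntry:
    # Hypothetical trace-log record; field names are illustrative only.
    component: str       # which component wrote the entry, e.g. "client"
    correlation_id: str  # per-message unique identifier (CorrelationId)
    event: str           # "send" (client side) or "receive" (server side)


def match_exchanges(
    client_log: List[TraceEntry], server_log: List[TraceEntry]
) -> List[Tuple[TraceEntry, TraceEntry]]:
    """Pair each client 'send' entry with the server 'receive' entry sharing its CorrelationId."""
    # Index the server log by CorrelationId. Since the identifier is unique per
    # message, each key maps to at most one 'receive' entry.
    received: Dict[str, TraceEntry] = {
        e.correlation_id: e for e in server_log if e.event == "receive"
    }
    # Walk the client log in its own order; no timestamps are compared, so
    # clock skew between the two machines cannot cause a mismatch.
    return [
        (e, received[e.correlation_id])
        for e in client_log
        if e.event == "send" and e.correlation_id in received
    ]
```

For example, a client log containing sends for `m1` and `m2` and a server log containing receives for `m2` and `m1` (in the opposite order) still yields the pairs `(m1, m1)` and `(m2, m2)`, because the join is keyed on the identifier rather than on log position or time.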

The second function of the .NET Tracing Protocol is to provide a way to group related messages. It does this by generating a second message identifier, named the ActivityId. Unlike the CorrelationId, the ActivityId is not unique to each message; instead, the same ActivityId is propagated across related messages. For example, a client sends a request to a server with "ActivityId A" in the message. The .NET Tracing Protocol specifies that the server echoes "ActivityId A" in its response. Subsequent related requests from the client continue to use the same "ActivityId A". Because all of the related messages carry the same ActivityId, users can infer causal relationships between them. This information can also be used to determine the set of messages that led up to an error and the set of messages that resulted from it. This process is specified in section 3.1.5.
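The grouping described above can be sketched as follows. This is a minimal illustration, not the procedure in section 3.1.5: the record type `TraceEntry`, its `is_error` flag, and the helpers `group_by_activity` and `split_around_error` are all hypothetical. Entries from several component logs are bucketed by ActivityId; one activity's entries are then split into those before and after its first error. The split assumes the entries are already in causal order (for example, reconstructed by pairing sends and receives via CorrelationId), since the ActivityId by itself groups messages but does not order them.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class TraceEntry:
    # Hypothetical trace-log record; field names are illustrative only.
    component: str
    activity_id: str      # shared by related messages (ActivityId)
    correlation_id: str   # unique per message (CorrelationId)
    is_error: bool = False


def group_by_activity(entries: List[TraceEntry]) -> Dict[str, List[TraceEntry]]:
    """Bucket entries from any number of component logs by ActivityId, preserving input order."""
    groups: Dict[str, List[TraceEntry]] = defaultdict(list)
    for entry in entries:
        groups[entry.activity_id].append(entry)
    return dict(groups)


def split_around_error(
    activity: List[TraceEntry],
) -> Tuple[List[TraceEntry], List[TraceEntry]]:
    """Split one activity's entries into those before and after its first error.

    Assumes `activity` is already in causal order; if no entry is an error,
    every entry is returned in the first list and the second list is empty.
    """
    for i, entry in enumerate(activity):
        if entry.is_error:
            return activity[:i], activity[i + 1:]
    return activity, []
```

Grouping first and splitting second keeps the two concerns separate: the ActivityId decides which messages belong together, while ordering within the group comes from elsewhere.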