Activity Tracing and Propagation for End-To-End Trace Correlation

This topic describes how you can use Activity Tracing and Propagation to assist debugging.

Overview of Activity Tracing

Windows Communication Foundation (WCF) provides instrumentation features for application troubleshooting without the use of a debugger. There are four components to WCF Diagnostics: Eventing, Tracing, which includes Activity Tracing, and Message Logging.

Diagnostics functionality Target Users Purpose

Eventing

IT Professional

View critical errors that have occurred in the system

Tracing

IT Professional and Developers

View both positive and negative events in the system, at various levels of verbosity, in order to diagnose an issue or verify proper behavior.

Message Logging

IT Professional and Developers

View logs of actual messages that go through the wire or/and are dispatched to user code, in order to diagnose an issue or verify proper behavior

Activity Tracing (a subset of the Tracing feature)

IT Professional and Developers

Benefit from trace correlation to locate the root cause of an error.

This topic focuses on Activity Tracing, which provides users trace correlation to help them diagnose the root cause of an error. WCF provides three correlation mechanisms: activities, propagation, and transfers.

Activities

Activities are processing units that help the user narrow down the scope of a failure. Errors that occur in the same activity are directly related. For example, an operation fails because message decryption has failed. The traces for both the operation and message decryption failure appear in the same activity, showing direct correlation between the decryption error and the request error. WCF provides pre-defined activities for processing applications. You can also define activities programmatically to group user traces.

To emit activity traces at run time, use the ActivityTracing setting for the System.ServiceModel or other custom trace sources, as demonstrated by the following configuration code.

<source name="System.ServiceModel" switchValue="Verbose,ActivityTracing">

To understand more about the configuration element and attributes being used, see the Configuring Tracing topic.

You can enable activity tracing on any trace source, including WCF or user-defined ones.

It is possible for an activity to be defined in user code (using System.Diagnostics) before any WCF processing begins. Such an activity is called an Ambient activity.

You can view the activities and their utility in the Service Trace Viewer Tool. This tool takes the traces and sorts them based on activity. When ActivityTracing is enabled, you can also see trace transfers. A trace transfer indicates how different activities are related to each other. You can see that a particular activity caused another to start. For example, a message request started a security handshake to get a Secure Conversation Token. For more information, and to see a graphical view of the Service Trace Viewer tool, see Service Trace Viewer Tool and Using Service Trace Viewer for Viewing Correlated Traces and Troubleshooting.

Propagation

Propagation provides the user with direct correlation of error traces for the same unit of processing across application endpoints, e.g., a request. Errors emitted at different endpoints for the same unit of processing (for example, a request) are grouped in the same activity, even across application domains. This is done through propagation of the activity ID in the message headers. Therefore, if a client times out because of an internal error in the server, both errors appear in the same activity for direct correlation.

To do this, use the ActivityTracing setting as demonstrated in the previous example. In addition, set the propagateActivity attribute for the System.ServiceModel trace source.

<source name="System.ServiceModel" switchValue="Verbose,ActivityTracing" propagateActivity=”true” >

Activity propagation is a configurable capability that causes WCF to add a header to outbound messages, which includes the activity ID on the TLS. By including this on subsequent traces on the server side, we can correlate client and server activities.

Transfers

Transfers between activities represent causal relationships between events in the related activities within endpoints. Two activities are related with transfers when control flows between these activities, e.g., a method call crossing activity boundaries. In WCF, when bytes are incoming on the service, the Listen At activity is transferred to the Receive Bytes activity where the message object is created.

To emit transfer traces, use the ActivityTracing setting on the trace source as demonstrated in the previous example.

See End-To-End Tracing Scenarios for a list of end-to-end tracing scenarios, and their respective activity and tracing design.

E2E Tracing Model

This section provides definitions for activities and their boundaries, activity correlation across endpoints through propagation, and within endpoints through transfers, and activity profiling. It also provides a summary of all activity trace types and shows how to put them together to form a valid sequence of traces.

Activities

Definition and Scope

An activity has a local and a global identifier. It is defined at design time and denotes a logical unit of work. Emitted traces with the same activity identifier are directly related, they are part of the same activity. Because an activity can cross endpoint boundaries (e.g., a request), two scopes for an activity are defined.

  • Global scope, per application. In this scope, the activity is identified by its 128-bit globally unique activity identifier, the gAId. The gAid is what is propagated across endpoints.

  • Local scope, per trace source and process. In this scope, the activity is identified by its gAId, along with the trace source name emitting the activity traces and the process Id. This triplet constitutes the local activity id, lAId. The lAId is used to define the (local) boundaries of an activity.

An activity is identified locally to a trace source by the triplet lAId={ProcessId, TraceSourceName, gAId} and globally in the application by its gAId.

Schema

Traces can be emitted using any schema, and across Microsoft platforms, a common schema referred to as the “e2e” (for “End to End”) schema is used. This schema includes a 128 bit identifier (Activity Identifier) which is unique within the trace source and process, the trace source name, and process id. The Activity Identifier (AID) allows correlation of emitted traces. This is because emitted traces with the same AID are directly related, since they are part of the same activity. In managed code, XmlWriterTraceListener emits traces in the E2E schema.

Developers can set the AID that is emitted with a trace by setting the ActivityId property with a Guid on Thread Local Storage (TLS). The following example demonstrates this.

// set the current Activity ID to a new GUID.
CorrelationManager.ActivityId = Guid.NewGuid();

Setting the gAId in TLS will be evident when traces are emitted using a trace source, as shown by the following example.

TraceSource traceSource = new TraceSource("myTraceSource");
traceSource.TraceEvent(TraceEventType.Warning, eventId, "Information");

The trace emitted will contain the gAId currently in TLS, the trace source name passed as a parameter to the trace source’s constructor, and the current process’s ID.

Correlating Activities in Service Trace Viewer

The Service Trace Viewer tool provides two views of activities as follow.

  • List view, where the gAId is used to directly correlate traces across trace sources and processes. Traces from different trace sources, e.g., client and service, but with the same gAId are grouped in the same activity. Therefore, an error occurring on the service which then causes an error on the client will both show up in the same activity view in the tool.

  • Graph view, where activities are grouped by processes. In this view, a client and service with the same gAId but different lAId (because they are in two different processes) will have their traces in different activities. To correlate activities with the same gAId in different processes, the tool shows message flows across the related activities.

Activity Lifetime

  • Start: Indicates the beginning of an activity. A “Start” trace provides a record of beginning a new processing milestone. It always contains a new AID for a given trace source in a given process, except when the activity ID is propagated across endpoints that use the same trace source name (e.g., System.ServiceModel) and run in the same process. Examples of starting a new activity include creating a new thread for processing, or entering a new public method.

  • Stop: Indicates the end of an activity. A “Stop” trace provides a record of ending an existing processing milestone. It always contains a new AID for a given trace source in a given process, except when the activity ID is propagated across endpoints that use the same trace source name (e.g., System.ServiceModel) and run in the same process. Examples of stopping an activity include terminating a processing thread, or exiting a method whose beginning was denoted with a “Start” trace.

  • Suspend: Indicates suspension of processing of an activity. A “Suspend” trace contains an existing AID whose processing is expected to resume at a later time. No traces are emitted with this AID between the suspend and resume events from the current trace source. Examples include pausing an activity when calling into an external library function, or when waiting on a resource such as an I/O completion port.

  • Resume: Indicates the resumption of processing of an activity. A “Resume” trace contains an existing AID whose last emitted trace from the current trace source was a “Suspend” trace. Examples include returning from a call to an external library function, or when signaled to resume processing by a resource such as an I/O completion port.

  • Transfer: Because some activities are caused by others, or relate to others, activities can be related to other activities through “Transfer” traces. A transfer records the directed relationship of one activity to another

Guidelines for Using Activity Tracing

The following is a guideline of using ActivityTracing traces (Start, Stop, Suspend, Resume, and Transfer).

  • Tracing is a directed cyclic graph, not a tree. You can return control to an activity which spawned an activity, as indicated in the previous example.

  • An activity denotes a processing boundary which can be meaningful to the administrator of the system or for supportability.

  • Each WCF method, both on the client and server, is bounded by beginning a new activity, then (after work is done) ending the new activity and returning to the ambient activity.

  • Long running (ongoing) activities such as listening for connections or waiting for messages are represented by corresponding start/stop markers.

  • Activities triggered by the receipt or processing of a message are represented by trace boundaries.

  • Activities represent activities, not objects. An activity should be interpreted as “this was happening when . . . (meaningful trace emission occurred)."

Using Start and Stop to Mark Activity Boundaries

In strictest terms, evidence of an activity starts the first time the lAId is used in an emitted trace, and ends the last time the lAId is used in an emitted trace.

The life of an activity starts the first time the lAId is used in an emitted trace, and ends at the last time the lAId is used in an emitted trace. A pre-defined set of trace types are provided by System.Diagnostics, which uses two trace types, Start and Stop, to explicitly mark activity boundaries.

Start - The Beginning of An Activity

A “Start” trace provides a record of beginning a new processing milestone. It always contains a new lAId for a given trace source in a given process, except when the activity ID is propagated across endpoints that use the same trace source name (e.g., System.ServiceModel) and run in the same process. Or when The same gAId is propagated across the same endpoints more than once (e.g., the WCF Set Up Security activity). It is recommended to do the latter only if the requests are synchronous and sequential. In such cases, you will see a Start (and Stop) traces with the same gAId for each endpoint. See the “Propagation” section for more details on propagation. WCF set the name of the activity through the Start trace.

Examples of starting a new activity include creating a new thread for processing, or entering a new public method.

Stop - The End of An Activity

A “Stop” trace provides a record of ending an existing processing milestone. It always contains a lAId that will not appear again for a given trace source in a given process, except when the activity ID is propagated across endpoints that use the same trace source name (e.g., System.ServiceModel) and run in the same process. Or when the same gAId is propagated across the same endpoints more than once (the WCF Set Up Security activity). It is recommended to do the latter only if the requests are synchronous and sequential. In such cases, you will see a Start (and Stop) traces with the same gAId for each endpoint. See the “Propagation” section for more details on propagation.

  • Examples of stopping an activity include terminating a processing thread, or exiting a method whose beginning was denoted with a “Start” trace.

If the gAId is propagated across endpoints for correlation, there will be a Start and Stop trace with that gAId from each trace source.

Using Start and Stop Traces

Start and Stop traces are not critical for correlation. However, they can help in increasing performance, profiling, and activity scope validation.

Using these types, the tools can optimize navigating the trace logs to find the immediately related events of the same activity, or events in related activities if the tool follows transfer traces. For example, the tools will stop parsing the logs for a given activity when they see a Start/Stop trace.

The tools use the lAId as the activity identifier, but can use the gAId to identify related activities across loaded files.

These trace types can also be used for profiling. Resources consumed between the start and stop markers represent the activity’s inclusive time including contained logical activities. Subtracting the time intervals between the Suspend and Resume traces provides the actual activity time.

The Stop trace is particularly useful for validating the scope of the implemented activities. If some processing traces appear after the Stop trace instead of inside a given activity, this can suggests code defect.

Start and Stop traces are emitted if the ActivityTracing attribute is set for the trace source.

Using Propagation to Correlate Activities Across Endpoints

An activity provides direct correlation for the traces that relate to the same unit of processing locally to a trace source. In WCF, to relate traces that are part of the same logical unit of processing across endpoints, e.g., a request, the gAId is propagated. When correlated, traces from activities with the same gAId in distinct endpoints can be merged in the same activity.

Activity M’s gAId is propagated to activity N if all of the following conditions apply.

  • N is created because of M

  • M’s gAId is known to N

  • N's gAId is equal to M’s gAId.

The gAId is propagated through the ActivityId message header, as illustrated in the following XML schema.

<xsd:element name=”ActivityId” type=”integer” minOccurs=”0”>
  <xsd:attribute name=”CorrelationId” type=”integer” minOccurs=”0”/>
</xsd:element>

The following is an example of the message header.

<MessageLogTraceRecord>
  <s:Envelope xmlns:s="https://www.w3.org/2003/05/soap-envelope"   
                      xmlns:a="https://www.w3.org/2005/08/addressing">
    <s:Header>
      <a:Action s:mustUnderstand="1">https://Microsoft.ServiceModel.Samples/ICalculator/Subtract
      </a:Action>
      <a:MessageID>urn:uuid:f0091eae-d339-4c7e-9408-ece34602f1ce
      </a:MessageID>
      <ActivityId CorrelationId="f94c6af1-7d5d-4295-b693-4670a8a0ce34" 
                          
               xmlns="https://schemas.microsoft.com/2004/09/ServiceModel/Diagnostics">
        17f59a29-b435-4a15-bf7b-642ffc40eac8
      </ActivityId>
      <a:ReplyTo>
          <a:Address>https://www.w3.org/2005/08/addressing/anonymous
          </a:Address>
      </a:ReplyTo>
      <a:To s:mustUnderstand="1">net.tcp://localhost/servicemodelsamples/service</a:To>
   </s:Header>
   <s:Body>
     <Subtract xmlns="https://Microsoft.ServiceModel.Samples">
       <n1>145</n1>
       <n2>76.54</n2>
     </Subtract>
   </s:Body>
  </s:Envelope>
</MessageLogTraceRecord>

Propagation and Activity Boundaries

In System.ServiceModel, the activity id is propagated across endpoints if the user sets propagateActivity=true in configuration on both sides/

When the activity ID is propagated across endpoints, the message receiver emits a Start and Stop traces with that (propagated) activity ID. Therefore, there is a Start and Stop trace with that gAId from each trace source. If the endpoints are in the same process and use the same trace source name, multiple Start and Stop with the same lAId (same gAId, same trace source, same process) are created.

Synchronization

You can enable synchronization of endpoint traces across machines by setting the ActivityId’s Correlation attribute in the message header

To synchronize events across endpoints that run on different machines, a CorrelationId is added to the ActivityId header that is propagated in messages. Tools can use this ID to synchronize events across machines with clock discrepancy. Specifically, the Service Trace Viewer tool uses this ID for showing message flows between endpoints.

Using Transfer to Correlate Activities Within Endpoints

A transfer trace is emitted from activity M to activity N when there is a flow of control between M and N. For example, N performs some work for M because of a method call crossing the activities’ boundaries.

N may already exist or has been created. N is spawned by M when N is a new activity that performs some work for M.

A transfer from M to N may not be followed by a transfer back from N to M. This is because M can spawn some work in N and do not track when N completes that work. In fact, M can terminate before N completes its task. This happens in the “Open ServiceHost” activity (M) that spawns Listener activities (N) and then terminates.

A transfer back from N to M means that N completed the work related to M.

N can continue performing other processing unrelated to M, e.g., an existing authenticator activity (N) that keeps receiving login requests (M) from different login activities.

Transfers between activities denote causal relationships between events in the related activities. Activities and transfers permit the user to probabilistically locate the root cause of an error. For example, if we transfer back and forth between activities M and N respectively in components M and N, and a crash happens in N right after the transfer back to M, we can draw the conclusion that it is likely due to N’s passing data back to M.

A nesting relationship does not necessarily exist between activities M and N. This can happen due to two reasons. First, when activity M does not monitor the actual processing performed in N although M initiated n. Second, when N already exists.

The following lists two transfer examples.

  • When you create a service host, the constructor gains control from the calling code, or the calling code transfers to the constructor. When the constructor has finished executing, it returns control to the calling code, or the constructor transfers back to the calling code. This is the case of a nested relationship.

  • When a listener starts processing transport data, it creates a new thread and hands to the Receive Bytes activity the appropriate context for processing, i.e., passing control and data. When that thread has finished processing the request, the Receive Bytes activity passes nothing back to the listener. In this case, we have a transfer in but no transfer out of the new thread activity. The two activities are related but not nested.

Activity Transfer Sequence

A well-formed activity transfer sequence includes the following steps.

  1. Begin a new activity, which consists of selecting a new gAId.

  2. Emit a transfer trace to that new gAId from the current activity ID

  3. Set the new ID in TLS

  4. Emit a start trace to indicate the beginning of the new activity by.

  5. Return to the original activity consists of the following:

  6. Emit a transfer trace to the original gAId

  7. Emit a Stop trace to indicate the end of the new activity

  8. Set TLS to the old gAId.

The following code example demonstrates how to do this. This sample assumes a blocking call is made when transferring to the new activity, and includes suspend/resume traces.

// 0. Create a trace source
TraceSource ts = new TraceSource(“myTS”);
// 1. remember existing (“ambient”) activity for clean up
Guid oldGuid = Trace.CorrelationManager.ActivityId;
// this will be our new activity
Guid newGuid = Guid.NewGuid(); 
// 2. call transfer, indicating that we are switching to the new AID
ts.TraceTransfer(667, "Transferring.", newGuid);
// 3. Suspend the current activity.
ts.TraceEvent(TraceEventType.Suspend, 667, "Suspend: Activity " + i-1);
// 4. set the new AID in TLS
Trace.CorrelationManager.ActivityId = newGuid;
// 5. Emit the start trace
ts.TraceEvent(TraceEventType.Start, 667, "Boundary: Activity " + i);
// trace something
ts.TraceEvent(TraceEventType.Information, 667, "Hello from activity " + i);
// Perform Work
// some work.
// Return
ts.TraceEvent(TraceEventType.Information, 667, "Work complete on activity " + i); 
// 6. Emit the transfer returning to the original activity
ts.TraceTransfer(667, "Transferring Back.", oldGuid);
// 7. Emit the End trace
ts.TraceEvent(TraceEventType.Stop, 667, "Boundary: Activity " + i);
// 8. Change the tls variable to the original AID
Trace.CorrelationManager.ActivityId = oldGuid;  
// 9. Resume the old activity
ts.TraceEvent(TraceEventType.Resume, 667, "Resume: Activity " + i-1);

Trace Type Summary

See Source Levels for details of the various trace levels: Critical, Error, Warning, Information, and Verbose, as well as description of the ActivityTracing flag, which toggles the output of trace boundary and activity transfer events. See

TraceEventType for the types of traces which can be emitted from System.Diagnostics. The following table lists the most important ones.

Trace Type Description

Critical

Fatal error or application crash.

Error

Recoverable error.

Warning

Informational message. 

Information

Non-critical problem. 

Verbose

Debugging trace. 

Start

Starting of a logical unit of processing. 

Suspend

Suspension of a logical unit of processing. 

Resume

Resumption of a logical unit of processing. 

Stop

Stopping of a logical unit of processing. 

Transfer

Changing of correlation identity. 

An activity is defined as a combination of the trace types above.

The following is a regular expression that defines an ideal activity in a local (trace source) scope,

R = Start ( (Critical | Error | Warning | Information | Verbose | Transfer)* (Transfer Suspend Transfer Resume)* )* Stop

This means that an activity must satisfy the following conditions.

  • It must start and stop respectively by a Start and Stop traces

  • It must have a Transfer trace immediately preceding a Suspend or Resume trace

  • It must not have any traces between the Suspend and Resume traces if such traces exist

  • It can have any and as many of critical/Error/Warning/Information/Verbose/Transfer traces as long as the previous conditions are observed

The following is a regular expression that defines an ideal activity in the global scope,

R+ 

with R being the regular expression for an activity in the local scope. This translates to,

[R+ = Start ( (Critical | Error | Warning | Information | Verbose | Transfer)* (Transfer Suspend Transfer Resume)* )* Stop]+

See Also

Concepts

Configuring Tracing
Using Service Trace Viewer for Viewing Correlated Traces and Troubleshooting
End-To-End Tracing Scenarios

Other Resources

Service Trace Viewer Tool