Export (0) Print
Expand All

Using Event Windows

In applications that process real-time events, a common requirement is to perform some set-based computation (aggregation) or other operations over subsets of events that fall within some period of time. In StreamInsight, these subsets of events are defined through windows. This topic describes windows and how they are defined, identifies the types of windows that are supported in StreamInsight, and explains how you can use windows with various operators.

A window contains event data along a timeline and enables you to perform various operations against the events within that window. For example, you may want to sum the values of payload fields in a given window as shown in the following illustration.

EventWindowConcept

The previous illustration shows how a hopping window is applied to an event stream, and how an aggregate is applied to the window stream. The shape of the events that carry the aggregation results depends on the window output policy – here they are represented by point events at the window end.

The windowing operation turns the event stream into a window stream (IQWindowedStreamable<T>), which can then serve as the basis for a set-based operation. Each window along the timeline represents a set of events. The type of window that you use determines how events are collocated: windows can be time-based or count-based. Each window type is represented by a windowing operator.

The set-based operation turns a stream of windows back into a stream of events (IQStreamable<T>). Such set-based operations are divided into the following two groups:

  • Aggregations that yield a scalar result for a set of input events.

  • Operations that yield zero or more output events for a set of input events.

Examples of the first group are sum, avg, count, and user-defined aggregates. One or more such aggregations can be applied to a windowed stream, such that one result event corresponds to each input window, with the scalar aggregation results as fields in the resulting event payload. For example, you may want to total the values of one or more payload fields in a window and, based on those values, perform additional processing or create another event stream that contains that aggregated data.

Examples of the second group are TopK and user-defined operators. They are defined over a windowed stream and yield zero or more multiple events per window as a result of their computation. For example, you may want to use the TopK operator to take the top five events from each snapshot window defined for a specific input stream and generate a new event stream for additional processing.

When events from a windowed stream are passed to a set-based operator, as well as when they are output from a set-based operator back into the stream, their timestamps may be transformed. These transformations are called input policy and output policy respectively. These policies affect how the events appear in windows and how the result of the set-based operation is streamed out.

StreamInsight supports the following window types:

As shown in the following illustration, a window specification is made up of three parts:

  • The window definition (timespans for a hopping window, a count for count-based window, no parameter for a snapshot window)

  • A temporal transformation of the input (input policy)

  • A temporal transformation of the output (output policy)

Event streams in user-defined aggregates

The illustration conceptually describes the transformations of a stream as it goes through a set-based operation on top of a window.

  1. At point A, a stream of events is input to the window operator.

  2. At point B, the window operator produces a stream of windows. Each window contains a set of events. The lifetimes of these events may have been altered according to the Input policy. The stream of events is input to a set-based operator such as an aggregation, or a user-defined operator.

  3. At point C, the set-based operator processes each window and produces a stream of events as output.

    • For aggregates, one event is created for each set (or zero if the window is empty). Because the aggregation only specifies a scalar value, the lifetime of the output event is set to the window timespan by default. This applies to built-in aggregations as well as the result of user-defined aggregates.

    • For user-defined operators and TopK, zero or more events are produced. Time-sensitive UDOs also specify the output event lifetimes. For time-insensitive UDOs and TopK operators, the lifetime of the output event is set to the window timespan by default.

  4. At point D, an output policy can be applied to the output events. This allows the query author to modify the temporal properties of the events and override the default lifetime values produced by the set-based operator.

Programmatically, the three white boxes in the illustration are manifested as parameters to the window operators.

Window Policies

Window operators create streams of windows, which are the required input for any set-based operation. Apart from the definition of the window itself (in terms of time or count), the query author can influence how 1) the windowing operation affects the lifetimes of the events that are contained in the window when they are passed to the set-based operation and 2) how the lifetimes of the operation's result events are to be adjusted.

Both policies are specified by the query author as part of the window operator in order to control or override the default timestamps of the aggregation or UDO on top of the window.

Input Policies

StreamInsight supports the single input policy of clipping both the start time and end time of events in the window to the window start time and end time. This means that any (time-sensitive) set-based operation will only see event timestamps inside the window, even though the original events might have overlapped outside the window before the input policy was applied.

The specification of the input policy is optional. For convenience, the class WindowInputPolicy provides a static property that returns a corresponding instance (WindowInputPolicy.ClipToWindow).

Output Policies

StreamInsight supports the following output policies:

  • Snapshot windows: The end times of the resulting events will be clipped to the window end time.

  • Hopping windows: The resulting events are point events aligned with the window end time.

  • Count windows: The resulting event is turned into a point event at the window end.

Note Note

In StreamInsight 2.0 and earlier, for CepStream streams, a separate output policy class or classes exist for each window type, including an additional policy for hopping windows. The output policy classes each provide a static property that returns a corresponding instance:

  • SnapshotWindowOutputPolicy.Clip (default if not specified) - The end times of the resulting events will be clipped to the window end time

  • HoppingWindowOutputPolicy.ClipToWindowEnd - The end times of the resulting events will be clipped to the window end time

  • HoppingWindowOutputPolicy.PointAlignToWindowEnd (default if not specified) - The resulting events are point events aligned with the window end time

  • CountWindowOutputPolicy.PointAlignToWindowEnd (default if not specified) - The resulting event is turned into a point event at the window end

Here is a summary of all available windows and their effect on the result of the set-based operation:

Snapshot window:

Output policy: Clip to window end

Output lifetimes:

Clip to window end

Built-in aggregates

window size

TopK

window size

Time-insensitive UDA

window size

Time-insensitive UDO

window size

Time-sensitive UDA/UDO

n/a for snapshot windows

Hopping window:

Output policy: Point at window end (or Clip to window end - CepStream only)

Output lifetimes:

Point at window end

Clip to window end (CepStream only)

Built-in aggregates

point at window end

window size

TopK

point at window end

window size

UDA

point at window end

window size

UDO

Point(s) at window end

window size

Time-sensitive UDO

point(s) at window end

returned lifetimes, clipped to window end

Count window:

Output policy: Point at window end

Output lifetimes:

Point at window end

Built-in aggregates

n/a for count windows

TopK

n/a for count windows

UDA

point at window end

UDO

point(s) at window end

Time-sensitive UDO

point(s) at window end

Community Additions

ADD
Show:
© 2014 Microsoft