Using Event Windows
In applications that process real-time events, a common requirement is to perform some set-based computation (aggregation) or other operations over subsets of events that fall within some period of time. In StreamInsight, these subsets of events are defined through windows. This topic describes windows and how they are defined, identifies the types of windows that are supported in StreamInsight, and explains how you can use windows with various operators.
Types of windows
A window contains event data along a timeline and enables you to perform various operations against the events within that window. For example, you may want to sum the values of payload fields in a given window as shown in the following illustration.
The previous illustration shows how a hopping window is applied to an event stream, and how an aggregate is applied to the window stream. The shape of the events that carry the aggregation results depends on the window output policy – here they are represented by point events at the window end.
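The behavior described above can be approximated with a small simulation. This is a minimal sketch in plain Python, not StreamInsight code: the `Event` class, the function names, and the integer timestamps are all hypothetical, and the sketch assumes that windows are aligned to the first event's start time and that each result is reported at the window end time.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    start: int   # inclusive start time (hypothetical integer ticks)
    end: int     # exclusive end time
    value: int   # payload field to aggregate

def hopping_windows(first_start, last_end, size, hop):
    """Yield (window_start, window_end) pairs until the data is covered."""
    start = first_start
    while start < last_end:
        yield (start, start + size)
        start += hop

def sum_per_window(events, size, hop):
    """Return one (window_end, sum) point result per non-empty window."""
    if not events:
        return []
    first = min(e.start for e in events)
    last = max(e.end for e in events)
    results = []
    for w_start, w_end in hopping_windows(first, last, size, hop):
        # An event belongs to every window its lifetime overlaps.
        members = [e for e in events if e.start < w_end and e.end > w_start]
        if members:
            results.append((w_end, sum(e.value for e in members)))
    return results

events = [Event(0, 1, 5), Event(3, 4, 2), Event(6, 7, 4), Event(12, 13, 1)]
print(sum_per_window(events, size=10, hop=5))
# [(10, 11), (15, 5), (20, 1)]
```

Because the hop (5) is smaller than the window size (10), consecutive windows overlap, and the events at times 6 and 12 each contribute to two results.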
The windowing operation turns the event stream into a window stream (IQWindowedStreamable<T>), which can then serve as the basis for a set-based operation. Each window along the timeline represents a set of events. The type of window that you use determines how events are collocated: windows can be time-based or count-based. Each window type is represented by a windowing operator.
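To make the difference between time-based and count-based grouping concrete, the following sketch groups the same start times in both ways. It is plain Python with hypothetical function names. Two assumptions are not stated in the text above: the time-based example uses back-to-back windows (a hop equal to the window size), and the count-based example assumes that a window covers N consecutive distinct event start times.

```python
def fixed_time_windows(start_times, size):
    """Time-based: group start times into back-to-back windows of fixed duration."""
    groups = {}
    for t in start_times:
        w_start = (t // size) * size
        groups.setdefault((w_start, w_start + size), []).append(t)
    return sorted(groups.items())

def count_windows(start_times, n):
    """Count-based (assumed semantics): every run of n consecutive distinct start times."""
    distinct = sorted(set(start_times))
    return [distinct[i:i + n] for i in range(len(distinct) - n + 1)]

starts = [1, 2, 3, 11, 12, 25]

print(fixed_time_windows(starts, 10))
# [((0, 10), [1, 2, 3]), ((10, 20), [11, 12]), ((20, 30), [25])]

print(count_windows(starts, 3))
# [[1, 2, 3], [2, 3, 11], [3, 11, 12], [11, 12, 25]]
```

The time-based windows have a constant duration but a varying number of events, whereas the count-based windows hold a constant number of start times but span varying durations.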
The set-based operation turns a stream of windows back into a stream of events (IQStreamable<T>). Such set-based operations are divided into the following two groups:
Aggregations that yield a scalar result for a set of input events.
Operations that yield zero or more output events for a set of input events.
Examples of the first group are sum, avg, count, and user-defined aggregates. One or more such aggregations can be applied to a windowed stream, such that one result event corresponds to each input window, with the scalar aggregation results as fields in the resulting event payload. For example, you may want to total the values of one or more payload fields in a window and, based on those values, perform additional processing or create another event stream that contains that aggregated data.
Examples of the second group are TopK and user-defined operators. They are defined over a windowed stream and yield zero or more events per window as a result of their computation. For example, you may want to use the TopK operator to take the top five events from each snapshot window defined for a specific input stream and generate a new event stream for additional processing.
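The following sketch illustrates an operation of the second group: a top-k selection that can return several events per window. It is plain Python with hypothetical names, not the StreamInsight TopK operator. It assumes that a snapshot window is a time span between two consecutive event start or end times, during which the set of active events does not change. The example uses k = 2 to keep the output short.

```python
import heapq

def snapshot_windows(events):
    """events: list of (start, end, value) tuples.
    Returns [((w_start, w_end), members), ...] for every non-empty window."""
    # Every distinct start or end time is a window boundary.
    edges = sorted({t for start, end, _ in events for t in (start, end)})
    windows = []
    for w_start, w_end in zip(edges, edges[1:]):
        members = [ev for ev in events if ev[0] < w_end and ev[1] > w_start]
        if members:
            windows.append(((w_start, w_end), members))
    return windows

def top_k_per_window(events, k):
    """Keep the k events with the largest value in each snapshot window."""
    return [
        (window, heapq.nlargest(k, members, key=lambda ev: ev[2]))
        for window, members in snapshot_windows(events)
    ]

events = [(0, 10, 7), (2, 6, 9), (4, 8, 3)]
for window, top in top_k_per_window(events, k=2):
    print(window, top)
# (0, 2) [(0, 10, 7)]
# (2, 4) [(2, 6, 9), (0, 10, 7)]
# (4, 6) [(2, 6, 9), (0, 10, 7)]
# (6, 8) [(0, 10, 7), (4, 8, 3)]
# (8, 10) [(0, 10, 7)]
```

Unlike an aggregate, which yields a single scalar per window, the number of results here varies with the window contents.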
When events from a windowed stream are passed to a set-based operator, and again when the results are output from the set-based operator back into the stream, their timestamps may be transformed. These transformations are called the input policy and the output policy, respectively. These policies affect how the events appear in windows and how the result of the set-based operation is streamed out.
StreamInsight supports the following window types: hopping windows, snapshot windows, and count windows.
As shown in the following illustration, a window specification is made up of three parts:
The window definition (timespans for a hopping window, a count for a count-based window, no parameter for a snapshot window)
A temporal transformation of the input (input policy)
A temporal transformation of the output (output policy)
The illustration conceptually describes the transformations of a stream as it goes through a set-based operation on top of a window.
At point A, a stream of events is input to the window operator.
At point B, the window operator produces a stream of windows. Each window contains a set of events. The lifetimes of these events may have been altered according to the input policy. The stream of windows is input to a set-based operator such as an aggregation or a user-defined operator.
At point C, the set-based operator processes each window and produces a stream of events as output.
For aggregates, one event is created for each set (or zero if the window is empty). Because the aggregation only specifies a scalar value, the lifetime of the output event is set to the window timespan by default. This applies to built-in aggregations as well as the result of user-defined aggregates.
For user-defined operators and TopK, zero or more events are produced. Time-sensitive UDOs also specify the output event lifetimes. For time-insensitive UDOs and TopK operators, the lifetime of the output event is set to the window timespan by default.
At point D, an output policy can be applied to the output events. This allows the query author to modify the temporal properties of the events and override the default lifetime values produced by the set-based operator.
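The four points can be traced in a short simulation. This is a sketch under stated assumptions, written in plain Python with hypothetical function names rather than StreamInsight operators: events are `(start, end, value)` tuples, the windows are supplied as a fixed list, and a point event is represented as a `(timestamp, value)` pair.

```python
def window_operator(events, window):
    """A -> B: collect the events that overlap the window and clip
    their lifetimes to it (the input policy)."""
    w_start, w_end = window
    return [(max(s, w_start), min(e, w_end), v)
            for s, e, v in events if s < w_end and e > w_start]

def sum_aggregate(window, members):
    """C: one result per non-empty window; its lifetime defaults
    to the window timespan."""
    if not members:
        return []
    w_start, w_end = window
    return [(w_start, w_end, sum(v for _, _, v in members))]

def point_at_window_end(window, results):
    """D: output policy that replaces each lifetime with a point at the window end."""
    _, w_end = window
    return [(w_end, v) for _, _, v in results]

def run(events, windows, output_policy=None):
    out = []
    for window in windows:
        members = window_operator(events, window)
        results = sum_aggregate(window, members)
        if output_policy is not None:
            results = output_policy(window, results)
        out.extend(results)
    return out

events = [(0, 4, 1), (3, 12, 2), (11, 14, 4)]
windows = [(0, 10), (10, 20), (20, 30)]

print(run(events, windows))
# [(0, 10, 3), (10, 20, 6)]
print(run(events, windows, output_policy=point_at_window_end))
# [(10, 3), (20, 6)]
```

The empty window (20, 30) produces no result, and the output policy changes only the temporal shape of the results, not their values.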
Programmatically, the three white boxes in the illustration are manifested as parameters to the window operators.
Window operators create streams of windows, which are the required input for any set-based operation. Apart from the definition of the window itself (in terms of time or count), the query author can influence two things: how the windowing operation affects the lifetimes of the events in the window when they are passed to the set-based operation, and how the lifetimes of the operation's result events are adjusted.
Both policies are specified by the query author as part of the window operator in order to control or override the default timestamps of the aggregation or UDO on top of the window.
StreamInsight supports the single input policy of clipping both the start time and end time of events in the window to the window start time and end time. This means that any (time-sensitive) set-based operation will only see event timestamps inside the window, even though the original events might have overlapped outside the window before the input policy was applied.
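The effect of clipping on a time-sensitive operation can be shown with a small example. The sketch below is plain Python; the function names and the time-weighted average are hypothetical illustrations, not part of StreamInsight. Events are `(start, end, value)` tuples with integer timestamps.

```python
def clip_to_window(events, window):
    """Clip the start and end time of each overlapping event to the window."""
    w_start, w_end = window
    clipped = []
    for start, end, value in events:
        if start < w_end and end > w_start:
            clipped.append((max(start, w_start), min(end, w_end), value))
    return clipped

def time_weighted_average(events):
    """A time-sensitive computation: each value is weighted by its lifetime."""
    total_time = sum(end - start for start, end, _ in events)
    weighted = sum((end - start) * value for start, end, value in events)
    return weighted / total_time

events = [(0, 15, 10), (8, 12, 40)]
window = (5, 10)

clipped = clip_to_window(events, window)
print(clipped)
# [(5, 10, 10), (8, 10, 40)]

print(round(time_weighted_average(clipped), 2))   # lifetimes inside the window
# 18.57
print(round(time_weighted_average(events), 2))    # original, unclipped lifetimes
# 16.32
```

Both events extend beyond the window, so the result computed from the clipped lifetimes differs from the one computed from the original lifetimes.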
The specification of the input policy is optional. For convenience, the class WindowInputPolicy provides a static property that returns a corresponding instance (WindowInputPolicy.ClipToWindow).
StreamInsight supports the following output policies:
Snapshot windows: The end times of the resulting events will be clipped to the window end time.
Hopping windows: The resulting events are point events aligned with the window end time.
Count windows: The resulting event is turned into a point event at the window end.
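The two kinds of output adjustment named in this list can be contrasted on the same result set. The following is a minimal sketch in plain Python with hypothetical function names. It assumes result events are `(start, end, payload)` tuples, for example as returned by a time-sensitive user-defined operator, and it represents a point event as a `(timestamp, payload)` pair.

```python
def clip_to_window_end(window, results):
    """Clip the end time of each result event to the window end time."""
    _, w_end = window
    return [(start, min(end, w_end), payload) for start, end, payload in results]

def point_at_window_end(window, results):
    """Turn each result event into a point event at the window end."""
    _, w_end = window
    return [(w_end, payload) for _, _, payload in results]

window = (10, 20)
results = [(12, 18, "a"), (15, 26, "b")]

print(clip_to_window_end(window, results))
# [(12, 18, 'a'), (15, 20, 'b')]
print(point_at_window_end(window, results))
# [(20, 'a'), (20, 'b')]
```

Clipping preserves the returned lifetimes where they already end inside the window, whereas the point policy discards them entirely.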
The following tables summarize, for each window type and output policy, the lifetime of the events that result from each set-based operation:
Snapshot windows
Output policy: Clip to window end
|Set-based operation|Clip to window end|
|Built-in aggregates|window size|
|Time-insensitive UDA|window size|
|Time-insensitive UDO|window size|
|Time-sensitive UDA/UDO|n/a for snapshot windows|
Hopping windows
Output policy: Point at window end (or Clip to window end - CepStream only)
|Set-based operation|Point at window end|Clip to window end (CepStream only)|
|Built-in aggregates|point at window end|window size|
|TopK|point at window end|window size|
|UDA|point at window end|window size|
|UDO|point(s) at window end|window size|
|Time-sensitive UDO|point(s) at window end|returned lifetimes, clipped to window end|
Count windows
Output policy: Point at window end
|Set-based operation|Point at window end|
|Built-in aggregates|n/a for count windows|
|TopK|n/a for count windows|
|UDA|point at window end|
|UDO|point(s) at window end|
|Time-sensitive UDO|point(s) at window end|