StreamInsight Server Concepts

This topic describes the manner in which data is represented, operated on, brought into, and transferred out of the StreamInsight server. It is designed to give you familiarity with the basic concepts associated with complex event processing in Microsoft StreamInsight. The topic begins by describing data structures and then describes the StreamInsight server components that act on or process the data.

All data in StreamInsight is organized into streams. Each stream describes a potentially unending collection of data that changes over time. For instance, a stock ticker stream provides the price of different stocks in an exchange as they change over time, and a temperature sensor stream provides temperature values reported by the sensor over time.

Consider a power-monitoring scenario in which the goal is to monitor a collection of power meters that measure the power consumption of various devices. Periodically, these power meters transmit data that includes their power consumption in tenths of a watt and an associated time stamp of the reading. The following table shows sample readings from three power meters and assumes that each power meter emits a power reading every second.

Time                   MeterID   Consumption
2009-07-27 10:27:23    1         100
2009-07-27 10:27:24    1         200
2009-07-27 10:27:51    2         300
2009-07-27 10:28:52    2         100
2009-07-27 10:27:23    3         200

Because this information can be represented as values that change over time, the data can be represented in a stream. Given the data in this stream, a query against this stream could return the meter with the highest or lowest consumption values over a given time period, or the query could return a top-10 list of meters with the highest power consumption over time.
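As a rough illustration of such a query, the following Python sketch finds the highest reading within a time period, using the rows of the table above. It is a conceptual model only and not the StreamInsight API (StreamInsight queries are written in LINQ with C#); the names READINGS and peak_meter are hypothetical.

```python
from datetime import datetime

# Readings copied from the table above:
# (time, meter_id, consumption in tenths of a watt).
READINGS = [
    (datetime(2009, 7, 27, 10, 27, 23), 1, 100),
    (datetime(2009, 7, 27, 10, 27, 24), 1, 200),
    (datetime(2009, 7, 27, 10, 27, 51), 2, 300),
    (datetime(2009, 7, 27, 10, 28, 52), 2, 100),
    (datetime(2009, 7, 27, 10, 27, 23), 3, 200),
]


def peak_meter(readings, start, end):
    """Return (meter_id, consumption) of the highest reading with start <= time < end."""
    in_range = [(meter, value) for (time, meter, value) in readings if start <= time < end]
    if not in_range:
        return None
    return max(in_range, key=lambda pair: pair[1])


result = peak_meter(
    READINGS,
    datetime(2009, 7, 27, 10, 27, 0),
    datetime(2009, 7, 27, 10, 28, 0),
)
print(result)  # (2, 300)
```

Unlike this batch computation over a finished list, a stream query keeps producing results as new readings arrive.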

The underlying data represented in the stream is packaged into events. An event is the basic unit of data processed by the StreamInsight server. Each event consists of the following parts:

  • Header. An event header contains metadata that defines the event kind and one or more timestamps that define the time interval for the event. The timestamps are application-based and supplied by the data source rather than a system time supplied by the StreamInsight server. Note that the timestamps use the DateTimeOffset data type, which has time zone awareness and is based on a 24-hour clock. The StreamInsight server normalizes all times to UTC datetime and verifies on input that the UTC flag is set on the timestamp fields.

  • Payload. A .NET data structure that holds the data associated with the event. The fields defined in the payload are user-defined. Their types are based on the .NET type system.
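The two parts of an event can be pictured with a small Python sketch. This is only a conceptual model, not the StreamInsight event classes: the names Header, Event, to_utc, and make_insert are hypothetical, and converting to UTC inside make_insert is an assumption chosen to illustrate the normalization described above.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class Header:
    kind: str                       # "INSERT" or "CTI"
    start: datetime                 # application time, normalized to UTC
    end: Optional[datetime] = None  # not used by CTI events


@dataclass(frozen=True)
class Event:
    header: Header
    payload: dict = field(default_factory=dict)  # user-defined fields


def to_utc(ts):
    """Normalize an offset-aware timestamp to UTC; reject naive timestamps."""
    if ts.tzinfo is None:
        raise ValueError("timestamp needs an explicit offset")
    return ts.astimezone(timezone.utc)


def make_insert(start, end, **payload):
    """Build an INSERT event whose timestamps come from the data source."""
    return Event(Header("INSERT", to_utc(start), to_utc(end)), payload)


# A reading stamped by a source that runs two hours ahead of UTC.
local = datetime(2009, 7, 27, 12, 27, 23, tzinfo=timezone(timedelta(hours=2)))
reading = make_insert(local, local + timedelta(seconds=1), MeterID=1, Consumption=100)
print(reading.header.start.isoformat())  # 2009-07-27T10:27:23+00:00
```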

Events in the stream whose application timestamps correspond to their order of arrival into the query are said to be “in order”. When this is not the case, events are said to arrive “out of order”. The StreamInsight server guarantees that if events arrive out of order, the output of the query is the same as if the events arrived in order, unless the query writer explicitly specifies otherwise. Within a stream, typical event arrival patterns are:

  • A steady rate, such as records from files or tables.

  • An intermittent and random rate, such as data from a retail barcode scanner.

  • An intermittent rate with sudden bursts, such as Web clicks or weather telemetry.

Event Header

The header of an event defines the event kind as well as the temporal properties of the event.

Event Kind

The event kind indicates whether the event is a new event in the stream or declares the completeness of the existing events in the stream. StreamInsight supports two event kinds: INSERT and CTI (current time increment).

The INSERT event kind adds an event with its payload into the event stream. In addition to the payload, the header of the INSERT event identifies the start and end time for the event. The following diagram shows the layout of an INSERT event kind.

Header                          Payload
Event kind ::= INSERT           Field 1 … Field n as CLR types
StartTime ::= DateTimeOffset
EndTime ::= DateTimeOffset

The CTI event kind is a special punctuation event that indicates the completeness of the existing events in the stream. The CTI event structure consists of a single field that provides a current timestamp. A CTI event serves two purposes:

  1. First, it enables a query to accept and process events whose application timestamps do not correspond to their order of arrival into the query. When a CTI event is issued, it indicates to the StreamInsight server that no subsequent incoming INSERT events will revise the event history before the CTI timestamp. That is, after a CTI event has been issued, no INSERT event can have a start time earlier than the timestamp of the CTI event. This indication of "completeness" of a stream of events enables the StreamInsight server to release the results of windowing or other aggregating operators that have accumulated state, thus ensuring that events flow efficiently through the system.

  2. The second purpose of CTI events is to maintain the low latency of the query. The more frequently CTIs are issued, the more frequently the query releases its results.

Important

Without the presence of CTI events in the input stream, no output will be generated from the query.

For more information, see Advancing Application Time.

The following diagram shows the layout of a CTI event kind.

Header
Event kind ::= CTI
StartTime ::= DateTimeOffset
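The role of CTI events can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the server's implementation: the class OrderingBuffer is hypothetical, timestamps are plain integers, and raising an error for an INSERT that precedes the last CTI is only one possible way to handle such an event.

```python
class OrderingBuffer:
    """Hold INSERT events until a CTI declares their time range complete."""

    def __init__(self):
        self.current_cti = None
        self.pending = []

    def insert(self, start, payload):
        # After a CTI, no INSERT may start before the CTI timestamp.
        if self.current_cti is not None and start < self.current_cti:
            raise ValueError("INSERT start time precedes the last CTI")
        self.pending.append((start, payload))

    def cti(self, timestamp):
        """Release, in timestamp order, every pending event that starts before the CTI."""
        self.current_cti = timestamp
        ready = sorted(
            (e for e in self.pending if e[0] < timestamp), key=lambda e: e[0]
        )
        self.pending = [e for e in self.pending if e[0] >= timestamp]
        return ready


buf = OrderingBuffer()
buf.insert(3, "c")   # events arrive out of order
buf.insert(1, "a")
buf.insert(2, "b")
print(buf.cti(3))    # [(1, 'a'), (2, 'b')]
```

Until cti is called, nothing leaves the buffer, which mirrors the note above that a query produces no output without CTI events.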

Event Models

The event model defines the event shape based on its temporal characteristics. StreamInsight supports three event models: interval, point, and edge. Interval events can be seen as the most generic type, of which edge and point are special cases.

Interval

The interval event model represents an event whose payload is valid for a given period of time. The interval event model requires that both the start and end time of the event be provided in the event metadata. Interval events are valid only for this specific time interval. It is important to note that start times are inclusive, whereas end times are exclusive regarding the validity of the event's payload.

The following diagram shows the layout of an interval event model.

Metadata                        Payload
Event kind ::= INSERT           Field 1 … Field n as CLR types
StartTime ::= DateTimeOffset
EndTime ::= DateTimeOffset

Examples of interval events include the width of an electronic pulse, the duration of (validity of) an auction bid, or a stock ticker activity in which the bid price for the stock is valid for a specific time period. In the power monitoring example described above, the power meter event stream may be represented with the following interval events.

Event Kind   Start                      End                        Payload (Consumption)
INSERT       2009-07-15 09:13:33.317    2009-07-15 09:14:09.270    100
INSERT       2009-07-15 09:14:09.270    2009-07-15 09:14:22.255    200
INSERT       2009-07-15 09:14:22.255    2009-07-15 09:15:04.987    100
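The inclusive start and exclusive end can be checked against the interval events above. The following Python sketch is illustrative only; INTERVALS and valid_payloads are hypothetical names, and naive datetime values stand in for DateTimeOffset.

```python
from datetime import datetime

# (start, end, consumption) taken from the interval events above.
INTERVALS = [
    (datetime(2009, 7, 15, 9, 13, 33, 317000), datetime(2009, 7, 15, 9, 14, 9, 270000), 100),
    (datetime(2009, 7, 15, 9, 14, 9, 270000), datetime(2009, 7, 15, 9, 14, 22, 255000), 200),
    (datetime(2009, 7, 15, 9, 14, 22, 255000), datetime(2009, 7, 15, 9, 15, 4, 987000), 100),
]


def valid_payloads(intervals, instant):
    """Return payloads whose interval contains instant: start inclusive, end exclusive."""
    return [payload for (start, end, payload) in intervals if start <= instant < end]


# 09:14:09.270 is the exclusive end of the first event and the
# inclusive start of the second, so only the second event is valid.
print(valid_payloads(INTERVALS, datetime(2009, 7, 15, 9, 14, 9, 270000)))  # [200]
```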

Point

A point event model represents an event occurrence as of a single point in time. The point event model requires only the start time for the event. The StreamInsight server infers the valid end time by adding a tick (the smallest unit of time in the underlying time data type) to the start time to set the valid time interval for the event. Considering that event end times are exclusive, point events are valid only for the single instant of their start time.

The following diagram shows the layout of a point event model.

Metadata                        Payload
Event kind ::= INSERT           Field 1 … Field n as CLR types
StartTime ::= DateTimeOffset

Examples of point events include a meter reading, the arrival of an email, a user Web click, a stock tick, or an entry into the Windows Event Log. In the power monitoring example described above, the power meter event stream may be represented with the following point events. Note that the end time is calculated as the start time plus 1 tick (t).

Event Kind   Start                      End                            Payload (Consumption)
INSERT       2009-07-15 09:13:33.317    2009-07-15 09:13:33.317 + t    100
INSERT       2009-07-15 09:14:09.270    2009-07-15 09:14:09.270 + t    200
INSERT       2009-07-15 09:14:22.255    2009-07-15 09:14:22.255 + t    100
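The inferred end time of a point event can be sketched as follows. To keep the arithmetic exact, this Python example counts time in integer ticks (in .NET, one tick of a DateTimeOffset is 100 nanoseconds); the names point_to_interval and is_valid_at are hypothetical.

```python
TICK = 1  # one tick, the smallest unit of the time type


def point_to_interval(start_ticks):
    """Give a point event the lifetime [start, start + 1 tick)."""
    return (start_ticks, start_ticks + TICK)


def is_valid_at(interval, instant):
    """Start is inclusive, end is exclusive."""
    start, end = interval
    return start <= instant < end


lifetime = point_to_interval(1000)
print(lifetime)                     # (1000, 1001)
print(is_valid_at(lifetime, 1000))  # True
print(is_valid_at(lifetime, 1001))  # False
```

Because the end is exclusive, the only instant at which the event is valid is its start time.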

Edge

An edge event model represents an event occurrence whose payload is valid for a given interval of time. However, only the start time is known when the event arrives at the StreamInsight server, so the end time is initially set to the maximum time in the future. When the end time becomes known later, the event is updated. The edge event model contains two properties: an occurrence time and an edge type. Together, these properties define either the start or end point of the edge event.

The following diagram shows the layout of an edge event model.

Metadata                        Payload
Event kind ::= INSERT           Field 1 … Field n as CLR types
Edge time ::= DateTimeOffset
Edge type ::= START | END

Examples of edge events are Windows processes, trace events from Event Tracing for Windows (ETW), a Web user session, or quantization of an analog signal. The valid time interval for the payload of an edge event extends from the timestamp of the Start event to the timestamp of the End event. In the following table, notice that the event with a payload value of 'c' does not have a known end time at this point in time.

Event Kind   Edge Type   Start Time   End Time                   Payload
INSERT       Start       t0           DateTimeOffset.MaxValue    a
INSERT       End         t0           t1                         a
INSERT       Start       t1           DateTimeOffset.MaxValue    b
INSERT       End         t1           t3                         b
INSERT       Start       t3           DateTimeOffset.MaxValue    c
… and so on
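The way START and END edges combine into lifetimes can be modeled with a short Python sketch. It is an illustration under assumptions, not the server's algorithm: apply_edges is a hypothetical name, an END edge is matched to its START edge by start time and payload as the rows above suggest, math.inf stands in for DateTimeOffset.MaxValue, and t0, t1, and t3 are replaced by 0, 1, and 3.

```python
import math

OPEN = math.inf  # stands in for DateTimeOffset.MaxValue

# (edge_type, start_time, end_time, payload), mirroring the edge events above.
EDGES = [
    ("Start", 0, OPEN, "a"),
    ("End", 0, 1, "a"),
    ("Start", 1, OPEN, "b"),
    ("End", 1, 3, "b"),
    ("Start", 3, OPEN, "c"),
]


def apply_edges(edges):
    """Fold START and END edges into (start, end, payload) lifetimes."""
    lifetimes = {}
    for edge_type, start, end, payload in edges:
        key = (start, payload)
        if edge_type == "Start":
            lifetimes[key] = OPEN        # end time not yet known
        elif edge_type == "End":
            if key not in lifetimes:
                raise ValueError("END edge without a matching START edge")
            lifetimes[key] = end         # end time is now known
        else:
            raise ValueError("unknown edge type: " + edge_type)
    return [(start, end, payload) for (start, payload), end in lifetimes.items()]


print(apply_edges(EDGES))  # [(0, 1, 'a'), (1, 3, 'b'), (3, inf, 'c')]
```

The event with payload 'c' keeps the open end time because its END edge has not arrived yet.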

The following illustration shows the quantization of an analog signal using edge events based on the start and end times defined in the table above. Such a continuous signal implies that for every new value, both an END edge and a START edge must be submitted. The edges labeled in the illustration belong to the event that lasts from time t1 to t3.

[Illustration: EdgeEvent]

Performance Considerations Related to the Choice of Event Model

It is important to choose the right event model for your problem. For instance, if you have events that last for a period of time, and your application has the ability to determine both the start and end times of the event, it is better to use interval events to model this. If you have a scenario where you do not know the end time of an event at event arrival, you could consider modeling the event as a point event, alter its lifetime to extend for a period of time, and then use the Clip operation to modify the lifetime when that event’s end is recognized. The other alternative to consider is to model these events as edge events.

While edge events are a very convenient event model, there are a couple of performance implications you should be aware of. Processing edge events works best when these events arrive fully ordered – i.e. all start edges are ordered on start time and end edges on end time and the combined sequence of events is also ordered on time. For example, if you have a sequence of edge events as follows:

Event Kind   Edge Type   Start Time   End Time                   Payload
INSERT       Start       1            DateTimeOffset.MaxValue    a
INSERT       End         1            10                         a
INSERT       Start       3            DateTimeOffset.MaxValue    b
INSERT       End         3            6                          b
INSERT       Start       5            DateTimeOffset.MaxValue    c
INSERT       End         5            20                         c

This sequence is unordered on timestamps (1, 10, 3, 6, 5, 20). If the edge events were instead fully ordered, as in (1, 3, 5, 6, 10, 20), query processing would incur less overhead. Ordering the events before processing them is easy to achieve: split the problem into two queries. The first query can be an empty query that receives edge events as input, fully orders them, and outputs these ordered edge events. The second query can take this output as its input and perform the main logic. Note that these should be defined as two separate queries, and then joined together using dynamic query composition. For more information, see Composing Queries at Runtime.
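The difference between the two orderings can be made concrete with a small Python sketch. It sorts a finished list, whereas a real ordering query works on a live stream, so treat it only as an illustration; edge_time and is_fully_ordered are hypothetical names.

```python
# (edge_type, start_time, end_time, payload) from the sequence above;
# None marks an end time that is not yet known.
EDGES = [
    ("Start", 1, None, "a"), ("End", 1, 10, "a"),
    ("Start", 3, None, "b"), ("End", 3, 6, "b"),
    ("Start", 5, None, "c"), ("End", 5, 20, "c"),
]


def edge_time(edge):
    """A START edge is stamped with its start time, an END edge with its end time."""
    edge_type, start, end, _payload = edge
    return start if edge_type == "Start" else end


def is_fully_ordered(edges):
    times = [edge_time(e) for e in edges]
    return all(a <= b for a, b in zip(times, times[1:]))


ordered = sorted(EDGES, key=edge_time)

print([edge_time(e) for e in EDGES])    # [1, 10, 3, 6, 5, 20]
print([edge_time(e) for e in ordered])  # [1, 3, 5, 6, 10, 20]
print(is_fully_ordered(EDGES), is_fully_ordered(ordered))  # False True
```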

Event Payload

The payload of an event is a .NET data structure that contains the data associated with the event. The fields in the payload are user-defined and their types are based on the .NET type system. Most CLR scalar and elementary types are supported for payload fields. Nested types are not supported.

Adapters translate and deliver incoming and outgoing event streams to and from the StreamInsight server. StreamInsight provides a highly flexible adapter SDK that enables you to build adapters for your domain-specific event sources and output devices (sinks). Adapters are implemented in the C# programming language and stored as assemblies. The adapter classes are created as templates during design time, registered in the StreamInsight server, and instantiated in the server during run time as adapter instances.

Input Adapters

An input adapter instance accepts incoming event streams from external sources such as databases, files, ticker feeds, network ports, sensors, and so on. The input adapter reads the incoming events in the format in which they are supplied and translates this data into the event format that is consumable by the StreamInsight server.

You create an input adapter to handle the specific event sources for your data source. If the event source produces a single event type only, the adapter can be typed. That is, it can be implemented to emit events of one particular event type. With a typed adapter, all instances of the adapter produce the same fixed payload format in which the number of fields and their types are known in advance. Examples of such events are ticker feed data or sensor data emitted by a specific device.

If your event source emits different types under different circumstances, that is, the events might contain different payload formats or the payload format might not be known in advance, implement an untyped adapter. With an untyped (generic) adapter, the event payload format is provided to the adapter as part of a configuration specification at query bind time. Examples of such sources include CSV files that contain a varying number of fields where the type of data stored in the file is not known until query instantiation time, or an adapter for SQL Server tables where the events produced depend on the schema of the table.

It is important to note that, at runtime, a single adapter instance, whether typed or untyped, always emits events of one specific type. Untyped adapters provide a flexible implementation to accept the specification of event type at query bind time, rather than defining the event type at the time the adapter is implemented.
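The contrast between typed and untyped adapters can be sketched in Python. Real StreamInsight adapters are written in C# against the adapter SDK; the classes TypedMeterAdapter and UntypedCsvAdapter below are hypothetical and only show where the payload shape is decided.

```python
import csv


class TypedMeterAdapter:
    """Typed: the payload shape (MeterID, Consumption) is fixed in the code."""

    def read(self, lines):
        for row in csv.reader(lines):
            yield {"MeterID": int(row[0]), "Consumption": int(row[1])}


class UntypedCsvAdapter:
    """Untyped: field names and converters arrive as configuration at bind time."""

    def __init__(self, schema):
        self.schema = list(schema)  # [(field_name, converter), ...]

    def read(self, lines):
        for row in csv.reader(lines):
            if len(row) != len(self.schema):
                raise ValueError("row does not match the configured schema")
            yield {name: convert(value) for (name, convert), value in zip(self.schema, row)}


typed = TypedMeterAdapter()
print(list(typed.read(["1,100", "2,300"])))
# [{'MeterID': 1, 'Consumption': 100}, {'MeterID': 2, 'Consumption': 300}]

# The same untyped class serves a different source once it is given a schema.
untyped = UntypedCsvAdapter([("Sensor", str), ("Celsius", float)])
print(list(untyped.read(["roof,21.5"])))
# [{'Sensor': 'roof', 'Celsius': 21.5}]
```

Once configured, each UntypedCsvAdapter instance still emits a single payload shape, in line with the note above about adapter instances at runtime.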

Output Adapters

You create an output adapter to receive the events processed by the StreamInsight server, translate the events into a format expected by the output device (sink), and emit the data to that device. Designing and creating an output adapter is similar to designing and creating an input adapter. Typed output adapters are designed against a specific event payload, whereas untyped output adapters are supplied with the event type only at runtime when the query is instantiated.

For more information, see Creating Input and Output Adapters. The core adapter API provides the maximum flexibility to implement against any event source or event sink. In addition, StreamInsight supports event sources and sinks at a higher level of abstraction that implement the IObservable or IEnumerable interfaces. For more information, see Using Observable and Enumerable Event Sources and Event Sinks (StreamInsight).

With StreamInsight, event processing is organized into queries based on query logic that you define. These queries take a potentially infinite feed of time-sensitive input data (either logged or real time), perform some computation on the data, and output the result in an appropriate manner.

Query Templates

A query template is the fundamental unit of query composition. It is the structure that defines the business logic required to continuously analyze and process events submitted to the StreamInsight server from the input adapter and generate an event stream that is consumed by the output adapter. For example, you may want to evaluate incoming power consumption events for maximum or minimum values over a given time period that exceed certain thresholds that you establish.

Query templates can be written to perform specific units of work and then composed into more complex query templates. Query templates are written in LINQ combined with the C# language. LINQ is a language platform that enables you to express declarative computation over sets in a manner that is fully integrated into the host language. This gives you the power to combine declarative processing of events with the flexibility of procedural programming in the same development platform, without the concern of impedance mismatch between these two programming paradigms.

The StreamInsight server provides the following functionality to write expressive queries and analytics:

  • Calculations to introduce additional event properties

    Use cases such as unit conversions require you to perform calculations on top of the events that you receive. Using the projection operation in the StreamInsight server, you can add additional fields to the payload and perform calculations over the fields in the input event. For more information, see Projection.

  • Filtering events

    In use cases such as alert notifications, you may want to check whether a certain payload field exceeds the operating thresholds for the piece of equipment that you are monitoring. In general, only a subset of events that satisfy certain characteristics is relevant for these use cases. Events that do not have these characteristics do not need to be processed and can be discarded. The filter operation allows you to express Boolean predicates over the event payload and discard events that do not satisfy the predicates. For more information, see Filtering.

  • Grouping events

    Consider an event stream that gives you temperature readings from all of your temperature sensors. If all the events are provided through a single event stream, you may want to partition the incoming events based on the sensor location or the sensor ID. The StreamInsight server provides a grouping operation that allows you to partition the incoming stream based on event properties such as location or ID and then apply other operations or complete query fragments to each group separately. For more information, see Group and Apply.

  • Windows over time

    Grouping events over time is a powerful concept that enables many scenarios. For instance, you may want to check the number of failures that occur during a fixed period of time and raise an alarm if they exceed a threshold. Hopping and sliding windows allow you to define windows over your event streams to perform this kind of analysis. For more information, see Using Event Windows.

  • Aggregation

    When you do not care about each single event, you might want to look into aggregate values such as averages, sums, or counts instead. The StreamInsight server provides built-in aggregations for sum, count, min, max, and average that typically operate on time windows. For more information, see Aggregations.

  • Identifying TOP N candidates

    A special kind of aggregation operation is needed in use cases where you want to identify the candidate events that rank highest according to a specific metric in an event stream. The TopK operation allows you to identify those events based on an order that you establish over the event fields in the stream. For more information, see TopK.

  • Matching events from different streams

    A common use case is the need to reason about events received from multiple streams. For example, because event sources provide timestamps in their event data, you may want to make sure that you only match events in one stream with an event in the second stream if they are closely related in time. In addition, you may have additional constraints on which events to match, and when to match them. The StreamInsight server provides a powerful join operation that performs both tasks: first, it matches events from the two sources if their times overlap and second, it executes the join predicate specified on the payload fields. The result of such a match contains both the payloads from the first and the second event. For more information, see Joins.

  • Combining events from different streams in one

    Multiple data sources may provide events of the same type that you may want to feed into the same query. The union operation provided by the StreamInsight server allows you to multiplex several input streams into a single output stream. For more information, see Unions.

  • User-defined extensions

    The built-in query functionality of the StreamInsight server may not be sufficient in all cases. To allow for domain-specific extensions, queries in the StreamInsight server can invoke user-defined functionality that is provided through .NET assemblies. In addition to user-defined functions, you can define and implement custom aggregations or query operators. For more information, see User-Defined Functions (Stream Insight) and User-defined Aggregates and Operators.
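Several of the operations in the list above can be approximated in plain Python to show what each one computes. This is a conceptual sketch, not StreamInsight LINQ: every function name is hypothetical, the event values are made up for illustration, and the window shown is the simplest case of a hopping window (the hop equals the window size).

```python
from collections import defaultdict

# Point-like events as (time_in_seconds, payload); values are illustrative.
EVENTS = [
    (0, {"MeterID": 1, "Consumption": 100}),
    (1, {"MeterID": 2, "Consumption": 300}),
    (4, {"MeterID": 1, "Consumption": 200}),
    (6, {"MeterID": 2, "Consumption": 100}),
    (7, {"MeterID": 3, "Consumption": 250}),
]


def project(events):
    """Projection: add a Watts field computed from tenths of a watt."""
    return [(t, {**p, "Watts": p["Consumption"] / 10}) for t, p in events]


def filter_events(events, predicate):
    """Filtering: keep only events whose payload satisfies the predicate."""
    return [(t, p) for t, p in events if predicate(p)]


def group_by(events, key):
    """Grouping: partition the stream on a payload field."""
    groups = defaultdict(list)
    for t, p in events:
        groups[p[key]].append((t, p))
    return dict(groups)


def windows_of(events, size):
    """Windows: assign each event to the window [k * size, (k + 1) * size)."""
    windows = defaultdict(list)
    for t, p in events:
        windows[(t // size) * size].append(p)
    return dict(windows)


def aggregate(windows, field, fn):
    """Aggregation: apply fn (sum, max, ...) to one field per window."""
    return {start: fn(p[field] for p in payloads) for start, payloads in windows.items()}


def top_k(events, field, k):
    """TopK: the k events that rank highest on the given field."""
    return sorted(events, key=lambda e: e[1][field], reverse=True)[:k]


print(project(EVENTS)[0][1]["Watts"])                            # 10.0
print(aggregate(windows_of(EVENTS, 5), "Consumption", sum))      # {0: 600, 5: 350}
print([t for t, _ in top_k(EVENTS, "Consumption", 2)])           # [1, 7]
```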

For more information, see Writing Query Templates in LINQ. For detailed guidance on writing LINQ queries for StreamInsight, see A Hitchhiker’s Guide to StreamInsight Queries.
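The join and union operations described in the list above depend on event lifetimes, which the following Python sketch tries to make explicit. It is an approximation under assumptions: the function names, the sample streams, and the field SensorMeter are hypothetical, the joined event is given the overlap of the two lifetimes, and the union output is sorted by start time purely for readability.

```python
def overlaps(a, b):
    """True if two [start, end) lifetimes share at least one instant."""
    return a[0] < b[1] and b[0] < a[1]


def temporal_join(left, right, predicate):
    """Match events whose lifetimes overlap and whose payloads satisfy the predicate."""
    joined = []
    for l in left:
        for r in right:
            if overlaps(l, r) and predicate(l[2], r[2]):
                # The output carries both payloads.
                joined.append((max(l[0], r[0]), min(l[1], r[1]), {**l[2], **r[2]}))
    return joined


def union(*streams):
    """Multiplex several streams of the same event type into one."""
    return sorted((e for s in streams for e in s), key=lambda e: e[0])


# Interval events as (start, end, payload); values are illustrative.
POWER = [
    (0, 5, {"MeterID": 1, "Consumption": 100}),
    (5, 9, {"MeterID": 1, "Consumption": 200}),
]
TEMPERATURE = [(3, 7, {"SensorMeter": 1, "Celsius": 40})]

matches = temporal_join(POWER, TEMPERATURE, lambda p, t: p["MeterID"] == t["SensorMeter"])
print([(start, end) for start, end, _ in matches])   # [(3, 5), (5, 7)]

print(union([(0, 1, "a"), (4, 5, "c")], [(2, 3, "b")]))
# [(0, 1, 'a'), (2, 3, 'b'), (4, 5, 'c')]
```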

Query Instances

Binding a query template with specific input and output adapters registers a query instance in the StreamInsight server. Bound queries can be started, stopped, and managed in the StreamInsight server. Once data has been brought into the StreamInsight server via input adapters, computation may be continuously performed over the data. In other words, as individual events arrive in the server, these events are processed by standing queries, which emit output events in response to the arrival of input events. The following illustration shows the StreamInsight queries and adapters at runtime. The StreamInsight server consumes and processes the event when the instance of the input adapter is bound to an instance of a query. The processed data is then pushed to the instance of the output adapter that is bound to the same query instance.

[Illustration: CEP query and adapter ecosystem]
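The relationship between a query template, its adapters, and a query instance can be modeled in a few lines of Python. This is a conceptual sketch with hypothetical classes (QueryTemplate, QueryInstance) and not the StreamInsight object model; it also processes a finite batch, whereas a standing query runs continuously as events arrive.

```python
class QueryTemplate:
    """Reusable query logic: a function from input events to output events."""

    def __init__(self, logic):
        self.logic = logic

    def bind(self, input_adapter, output_adapter):
        """Binding the template to adapters yields a query instance."""
        return QueryInstance(self.logic, input_adapter, output_adapter)


class QueryInstance:
    def __init__(self, logic, input_adapter, output_adapter):
        self.logic = logic
        self.input_adapter = input_adapter    # callable returning events
        self.output_adapter = output_adapter  # callable receiving one event
        self.running = False

    def start(self):
        self.running = True

    def stop(self):
        self.running = False

    def process(self):
        if not self.running:
            raise RuntimeError("query instance is not started")
        for event in self.logic(self.input_adapter()):
            self.output_adapter(event)


def over_threshold(events):
    """Query logic: pass only readings above 150 tenths of a watt."""
    return (e for e in events if e["Consumption"] > 150)


template = QueryTemplate(over_threshold)
sink_a, sink_b = [], []

# One template, two instances, each bound to its own adapters.
query_a = template.bind(
    lambda: [{"MeterID": 1, "Consumption": 100}, {"MeterID": 2, "Consumption": 300}],
    sink_a.append,
)
query_b = template.bind(lambda: [{"MeterID": 3, "Consumption": 200}], sink_b.append)

query_a.start()
query_a.process()
print(sink_a)  # [{'MeterID': 2, 'Consumption': 300}]
```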

© 2014 Microsoft