Direct Manipulation
You can use the Direct Manipulation APIs to process touch input on a region or object, generate output used by a composition engine when applying transforms to the related rendering surface, and optimize response and reduce latency through off-thread input processing, optional off-thread input hit testing, and predictive output based on the rendering time of the compositor.
By default, any application that uses Direct Manipulation to process touch interactions gets the fluid Windows 8 animations and interaction feedback behaviors that conform to all recommendations described in the Guidelines for common user interactions.
Developer audience
The Direct Manipulation API is intended for experienced developers who know C/C++, have a solid understanding of the Component Object Model (COM), and are familiar with Windows programming concepts.
Run-time requirements
Direct Manipulation was introduced in Windows 8. It is included in both 32-bit and 64-bit versions.
Overview
Straightforward and consistent: With Direct Manipulation, you have a straightforward and consistent way to handle and process complex touch input. Each touch contact is filtered from within the stream of pointer input messages and tracked until the contact is lifted. The pointer messages for a touch contact are passed to an internal Interaction Context object that performs recognition on the manipulation natively without the need for you to implement your own Interaction Context objects. Your application is notified of all interactions involving the tracked contacts through a callback infrastructure.
Responsive: To avoid unpredictable performance, minimize latency, and optimize responsiveness, Direct Manipulation processing occurs on a delegate thread rather than the UI thread. As a result, user interactions are processed, transforms calculated, and manipulations applied to UI elements in parallel to UI updates.
Flexible: The interfaces provided with Direct Manipulation provide comprehensive support for input handling, interaction recognition, feedback notifications, and UI updates. Your implementation can work with a native compositor for screen updates that runs outside the UI thread or, for low latency, the application process itself. You can also provide your own compositor and use Direct Manipulation to handle the complex processing of touch input only. You can also use the many interfaces implemented by Direct Manipulation itself, which incorporate system services such as DirectComposition.
Basic concepts
Here we discuss some of the fundamental concepts behind the design of Direct Manipulation.
Manipulations and gestures
There are two types of interactions Direct Manipulation is designed for: manipulations and gestures.
In Windows 8, Direct Manipulation defines a manipulation as the scrolling or zooming of some part of the application UI. These manipulations are performed by dragging one or more fingers across the screen to pan, or by pinching or stretching two or more fingers to zoom out or in, respectively. You can also support these manipulations through the mouse and keyboard (rolling the wheel button or pressing the arrow keys).
Note: The semantics of a manipulation do not change based on the content or application. On-screen content is manipulated in strict accordance with how the touch contacts, mouse wheel, or keyboard input are moved.
A gesture is defined within Direct Manipulation as an interaction that causes some non-trivial reaction from a UI element. For example, a press-and-hold gesture is typically used for placing the caret in a text input field or to initiate a context menu. The semantics of a gesture can change based on the UI element. Tapping a button triggers the click event whereas tapping a property tab brings the tab to the fore.
In some cases, an interaction might be either a manipulation or a gesture depending on the configuration of the UI. For example, a vertical swipe on a Windows 8 Start screen item selects the item (a gesture) while a horizontal swipe or slide scrolls the screen (a manipulation). This behavior is referred to as cross-slide (for more information, see Guidelines for cross-slide).
Direct Manipulation and the HWND
Direct Manipulation manages interactions on UI elements through references to their Win32 HWNDs. The pointer input messages for each element are consumed by Direct Manipulation, which makes asynchronous callbacks to the Direct Manipulation Component Object Model (COM) objects implemented in your application. These objects process the input data from the user interactions and respond to the gesture or manipulation accordingly.
You can coordinate interactions between different instances of Direct Manipulation objects associated with different HWNDs using interfaces provided by Direct Manipulation, such as IDirectManipulationContactHandler.
Viewports, contents, and contact assignment
Direct Manipulation uses viewports, contents, and contacts to describe the interactive elements of the UI.
A viewport is a region within a window that you declare able to receive and process input from user interactions. This might be an HTML div element (scrolling), a pannable list (the Windows 8 start screen), or the pop-up menu for a select control. You can specify any region as a Direct Manipulation viewport if it supports the required interactions.
Content represents the element that gets transformed in response to the interaction handling. The content is what moves or scales as the user swipes or pinches. There are two types of content:
- Primary content is created at the same time as the viewport. This is the only content where feedback behavior can be further customized with snap points. The primary content is the single, intrinsic element within a viewport. Primary content cannot be added or removed from a viewport.
- Secondary content is created separately from the viewport. Secondary content can be added and removed from a viewport, which can have an unlimited amount of secondary content. All secondary content transforms are derived from those supported by the primary content, but specific rules can be applied based on the intended purpose of the element, which is identified by its CLSID during creation.
A contact represents a touch point identified by the pointerId provided in the WM_POINTERDOWN message. The contact is active from when it is first detected until it goes out of detection range. Your app notifies Direct Manipulation of the contacts it wants handled and the viewports that should react to those contacts. Keyboard and mouse input have special pointerId values so they can be handled appropriately by Direct Manipulation.
At a minimum, Direct Manipulation requires that you identify the viewports and content of interest and assign contacts as they come into existence.
Hit testing and viewport hierarchy
You must decide which viewports are affected by touch input. You do this through hit testing: taking the screen location of the input and determining which viewport rectangle the touch contact hits.
If the touch contact hits more than one viewport, you specify the order in which a viewport handles the input. This order forms a hit testing hierarchy for handling user interactions within your app. In addition, chaining and parent promotion calculate outcomes based on this hierarchy.
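The hit test described above can be modeled in a few lines. This is a minimal sketch in portable C++, not the Direct Manipulation API: `Rect`, `Viewport`, and `hitTest` are hypothetical names, and viewports are assumed to be plain rectangles in client coordinates that the caller lists in the order they should handle input (innermost first).

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model types; none of these are Direct Manipulation types.
struct Rect {
    int left, top, right, bottom;  // right and bottom are exclusive

    bool contains(int x, int y) const {
        assert(left <= right && top <= bottom);  // reject inverted rectangles
        return x >= left && x < right && y >= top && y < bottom;
    }
};

struct Viewport {
    std::string name;
    Rect bounds;  // assumed to be expressed in client coordinates
};

// Returns the name of every viewport the point hits, preserving the
// caller-supplied order. The result is the hit testing hierarchy for
// that contact: the first entry is the first candidate to handle input.
std::vector<std::string> hitTest(const std::vector<Viewport>& ordered,
                                 int x, int y) {
    std::vector<std::string> hits;
    for (const Viewport& v : ordered) {
        if (v.bounds.contains(x, y)) {
            hits.push_back(v.name);
        }
    }
    return hits;
}
```

For a list viewport nested inside a page viewport, a contact inside the list yields both names with the list first, while a contact outside the list yields only the page.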
Targeting the correct viewport
One or more contacts can perform a manipulation or gesture, and any number of viewports can be the target of the interaction. Each viewport can be configured to support specific interactions, and active contacts can be assigned to any viewport, as required.
Based on these settings, Direct Manipulation identifies which viewport handles the input; typically, this is the first viewport in the hit testing hierarchy. However, if multiple contacts are active, Direct Manipulation selects the first ancestor or parent viewport in the hit testing hierarchy that is a common target for all contacts. Chaining and parent promotion are used to manage these requirements and assign the manipulation or gesture to the appropriate viewport.
Chaining is enabled on a viewport to handle cases where the user continues a manipulation after the content of a viewport reaches a content area boundary. The closest ancestor viewport in the hit testing hierarchy that also supports the manipulation handles the continuing input as if chained to the original viewport. If the direction of the manipulation is reversed such that the ancestor viewport returns to the point where chaining was triggered and the user continues the manipulation, the viewport is unchained and the manipulation resumes on the original viewport.
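The chaining and unchaining behavior can be illustrated with a one-dimensional model. The sketch below is an assumption-laden simplification, not how Direct Manipulation is implemented: `Scroller` and `ChainedPair` are hypothetical types, there is exactly one ancestor, and that ancestor is assumed to support the same manipulation.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical 1-D scrollable viewport; not a Direct Manipulation type.
struct Scroller {
    double offset;     // current scroll position
    double maxOffset;  // content boundary; valid range is [0, maxOffset]

    // Moves as far as the boundaries allow and returns the unused remainder.
    double scrollBy(double delta) {
        assert(maxOffset >= 0.0);
        double target = std::max(0.0, std::min(maxOffset, offset + delta));
        double unused = delta - (target - offset);
        offset = target;
        return unused;
    }
};

struct ChainedPair {
    Scroller child;
    Scroller parent;
    bool chained = false;
    int chainDirection = 0;      // +1 or -1: direction that triggered chaining
    double parentAtChain = 0.0;  // parent offset when chaining was triggered

    void drag(double delta) {
        if (chained) {
            double backToChainPoint = parentAtChain - parent.offset;
            bool reversing = delta * chainDirection < 0.0;
            bool reachesChainPoint =
                reversing && std::fabs(delta) >= std::fabs(backToChainPoint);
            if (!reachesChainPoint) {
                parent.scrollBy(delta);  // still chained: the parent handles input
                return;
            }
            // The parent is back where chaining began: unchain and hand the
            // rest of the movement to the original viewport.
            parent.offset = parentAtChain;
            chained = false;
            delta -= backToChainPoint;
        }
        double unused = child.scrollBy(delta);
        if (unused != 0.0) {  // the child hit a content boundary
            chained = true;
            chainDirection = unused > 0.0 ? 1 : -1;
            parentAtChain = parent.offset;
            parent.scrollBy(unused);
        }
    }
};
```

With a child at offset 90 of 100, a drag of 40 moves the child to its boundary and passes the remaining 30 to the parent. A reverse drag of 50 first returns the parent to the chain point, then moves the child back by 20.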
Parent promotion is enabled when two or more contacts are assigned to the hit testing hierarchies of different viewports. In this case, the contacts are assigned to the first ancestor viewport common to the hit testing hierarchies of all contacts that support the interaction being performed. A viewport handling any of these contacts individually has its state set to DIRECTMANIPULATION_SUSPENDED unless it is the common ancestor.
Unlike chaining, parent promotion cannot be reversed. The ancestor viewport continues to process the interaction input until all contacts are lifted (the initial viewports do not regain their contacts). This is the default behavior and cannot be changed.
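Selecting the viewport for parent promotion amounts to finding the first common ancestor that supports the interaction. The following is a rough sketch under stated assumptions: the viewport hierarchy is modeled as a vector of hypothetical `ViewportNode` entries linked by parent index, and `promotionTarget` is an illustrative name rather than an API call.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model of one viewport in the hit testing hierarchy.
struct ViewportNode {
    int parent;                // index of the parent viewport, or -1 for the root
    bool supportsInteraction;  // configured for the interaction being performed
};

// True if `candidate` is `viewport` itself or one of its ancestors.
bool isInAncestry(const std::vector<ViewportNode>& tree,
                  int viewport, int candidate) {
    for (int v = viewport; v != -1; v = tree[v].parent) {
        if (v == candidate) return true;
    }
    return false;
}

// Walks up from the first contact's viewport and returns the first viewport
// that supports the interaction and appears in every contact's ancestry.
// Returns -1 if no such viewport exists.
int promotionTarget(const std::vector<ViewportNode>& tree,
                    const std::vector<int>& contactTargets) {
    assert(!contactTargets.empty());
    for (int candidate = contactTargets.front(); candidate != -1;
         candidate = tree[candidate].parent) {
        if (!tree[candidate].supportsInteraction) continue;
        bool shared = true;
        for (int target : contactTargets) {
            if (!isInAncestry(tree, target, candidate)) {
                shared = false;
                break;
            }
        }
        if (shared) return candidate;
    }
    return -1;
}
```

In this model, two contacts on sibling viewports resolve to their shared parent. If that parent does not support the interaction, the search continues up to the next ancestor that does.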
Input processing
Direct Manipulation supports two models for processing input:
| Model | Description |
|---|---|
| Automatic/Independent | Window messages are automatically intercepted by Direct Manipulation on the delegate thread and handled without running application code (independent of the application). |
| Manual/Dependent | Window messages are received by the window procedure running in the UI thread, which then calls Direct Manipulation to process the message (dependent on the application). |
Note: Mouse and keyboard input are always processed in dependent mode.
Window messages are processed by Direct Manipulation in the following order:
- The window message reaches the delegate thread first, where Direct Manipulation hit tests against all running viewports (currently undergoing interaction or in inertia, explained later). Direct Manipulation automatically assigns the contact to the appropriate viewports identified by hit testing. Because Direct Manipulation takes action according to implicit user intent (flicking an already moving piece of content moves the same content), this is referred to as implicit contact hit testing and assignment.
- The window message then gets passed to the window procedure of an optional, application-owned window. This window performs hit testing on a non-UI thread (RegisterHitTestTarget), if implemented. This provides an opportunity for the application to parallelize the hit test and contact assignment process from normal window message processing through the window procedure in the UI thread (including any hooks and other special processing prior to the window procedure).
- If the above phase does not respond before a given time threshold is passed, the message is forwarded to the UI thread and the normal message processing pipeline.
Due to this workflow, the application has two chances to assign a contact:
- During hit testing on the delegate thread before the window message reaches the UI thread (optional).
- In the window procedure running on the UI thread.
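The processing order above can be summarized as a small decision function. This is only a sketch of the described workflow: `Message`, `AssignedBy`, and `route` are hypothetical names, the threshold is a parameter because the text does not state its value, and treating a response that arrives exactly at the threshold as on time is an assumption.

```cpp
#include <cassert>

// Which stage of the pipeline ends up assigning the contact (hypothetical).
enum class AssignedBy { ImplicitHitTest, HitTestTarget, UiThread };

// Hypothetical summary of the facts that decide the route of one message.
struct Message {
    bool hitsRunningViewport;   // lands on a viewport in interaction or inertia
    bool hitTestTargetPresent;  // the app registered an off-thread hit test window
    int hitTestResponseMs;      // time that window needs to respond
};

AssignedBy route(const Message& m, int thresholdMs) {
    assert(thresholdMs >= 0);
    // 1. Implicit hit testing against running viewports on the delegate thread.
    if (m.hitsRunningViewport) {
        return AssignedBy::ImplicitHitTest;
    }
    // 2. Optional application-owned hit test window on a non-UI thread.
    if (m.hitTestTargetPresent && m.hitTestResponseMs <= thresholdMs) {
        return AssignedBy::HitTestTarget;
    }
    // 3. Otherwise the message reaches the window procedure on the UI thread.
    return AssignedBy::UiThread;
}
```

The last two outcomes correspond to the two chances the application has to assign a contact. The first outcome requires no application code.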
Composition engine
A compositor enables an application to offload content transformations caused by user interactions from the UI thread. To achieve a high level of responsiveness, Direct Manipulation can work with a compositor on the delegate thread to process input in parallel to the application logic in the UI thread.
This compositor can also control the timing of updates to inertia animations to coincide with updates to composition frames.
The application can provide a custom compositor if it implements all required Direct Manipulation interfaces. This compositor can also implement functionality to notify Direct Manipulation of any latency in frame composition. Direct Manipulation can use predictive logic to compensate for this delay.
Viewport, content, and output transforms
There are three main coordinate systems employed by Direct Manipulation:
| Coordinate system | Description |
|---|---|
| Client coordinate system | The origin is at the top-left corner of the client rectangle. |
| Viewport coordinate system | The origin is at the top-left corner of the viewport rectangle. |
| Content coordinate system | The origin is at the top-left corner of the content rectangle. |
2-D transforms describe the spatial relationship between the client, viewport, and content coordinates. These transforms can convert a coordinate value between coordinate systems. There are transforms that always apply, while other transforms only apply when a compositor is used.
- The viewport transform is between the client and viewport coordinate systems.
- The content transform is between the viewport and content coordinate systems.
- The output transform is between the viewport and content coordinate systems, which is applied by the compositor.
The viewport and content rectangles contribute to their respective transforms because the left and top coordinates of the rectangles identify an offset from the coordinate systems of their respective parents.
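To make the relationship between the coordinate systems concrete, here is a small sketch that composes two transforms. It assumes each transform is a uniform scale followed by a translation and that transforms map from the child coordinate system to its parent. `Point` and `Transform2D` are hypothetical types, not the matrix representation Direct Manipulation uses.

```cpp
#include <cassert>

struct Point {
    double x, y;
};

// Uniform scale followed by translation: p' = scale * p + (dx, dy).
struct Transform2D {
    double scale, dx, dy;

    Point apply(Point p) const {
        return {scale * p.x + dx, scale * p.y + dy};
    }

    // Maps parent coordinates back to child coordinates.
    Transform2D inverse() const {
        assert(scale != 0.0);
        return {1.0 / scale, -dx / scale, -dy / scale};
    }

    // Returns the transform equivalent to applying `first`, then this one.
    Transform2D after(const Transform2D& first) const {
        return {scale * first.scale,
                scale * first.dx + dx,
                scale * first.dy + dy};
    }
};
```

For example, if the viewport sits at (100, 50) in the client area and the content is zoomed to 2x and scrolled 40 units horizontally, then `viewportToClient = {1, 100, 50}` and `contentToViewport = {2, -40, 0}`. Composing them with `viewportToClient.after(contentToViewport)` maps content point (10, 5) to client point (80, 60), and the inverse maps a client-space touch location back into content coordinates.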
Inertia, snapping, and boundaries
Inertia is the gradual deceleration of a manipulation after all contacts have been lifted (similar to sliding to a stop on a slippery surface). When the user swipes or slides the content, it does not immediately stop after the contact is lifted. Instead, the content continues with its current heading and velocity, slowing gradually to a stop.
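The effect can be approximated with a simple per-frame decay. The sketch below is illustrative only: the text does not specify the deceleration curve Direct Manipulation uses, so the geometric decay, the `simulateInertia` name, and the units (pixels per frame) are all assumptions.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Returns the content position after each frame of inertia.
// `velocity` is in pixels per frame at the moment the contact is lifted;
// `decayPerFrame` (between 0 and 1) is an assumed deceleration factor;
// inertia ends once the speed drops below `stopSpeed`.
std::vector<double> simulateInertia(double position, double velocity,
                                    double decayPerFrame, double stopSpeed) {
    assert(decayPerFrame > 0.0 && decayPerFrame < 1.0);
    assert(stopSpeed > 0.0);
    std::vector<double> frames;
    while (std::fabs(velocity) >= stopSpeed) {
        position += velocity;       // keep the current heading
        velocity *= decayPerFrame;  // slow down gradually
        frames.push_back(position);
    }
    return frames;
}
```

Starting at position 0 with a velocity of 16, a decay of 0.5, and a stop speed of 1, the content advances by 16, 8, 4, 2, and 1 pixels over five frames and comes to rest at 31.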
Snapping causes content to move to a predetermined location at the end of inertia (when specific conditions are met). You can specify where and when primary content should snap.
There are two ways to specify where snapping occurs in primary content:
- Snap points, which indicate a distance relative to a fixed point in the content at which to stop the manipulation and ensure a specific subset of content is displayed in the viewport.
- Snap intervals, where you provide a starting point and an initial distance used to identify a snap location at multiples of that distance from the starting point, until the end of the content.
Snapping provides the following options:
- Mandatory, where the content must come to rest at a snap location.
- Optional, where the content must come to rest on a snap location only if it is within a distance threshold at the end of inertia.
- Single, where the content must come to rest on the next snap location in the direction of travel.
- Multiple, where the content can move past multiple snap locations before coming to rest.
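Snap intervals and the mandatory and optional snap types can be sketched as follows. This is a simplified one-dimensional model with hypothetical names (`SnapIntervals`, `SnapType`, `nearestSnap`, `restPosition`); it does not model the single and multiple options, and the exact rules Direct Manipulation applies may differ.

```cpp
#include <cassert>
#include <cmath>

enum class SnapType { Mandatory, Optional };

// Snap locations at start, start + interval, start + 2 * interval, ...
// up to (and not beyond) contentEnd.
struct SnapIntervals {
    double start;
    double interval;
    double contentEnd;
};

// Returns the snap location closest to `position`.
double nearestSnap(const SnapIntervals& s, double position) {
    assert(s.interval > 0.0 && s.contentEnd >= s.start);
    double steps = std::round((position - s.start) / s.interval);
    double lastStep = std::floor((s.contentEnd - s.start) / s.interval);
    steps = std::fmax(0.0, std::fmin(steps, lastStep));
    return s.start + steps * s.interval;
}

// Returns where the content comes to rest, given where inertia would end.
double restPosition(const SnapIntervals& s, SnapType type,
                    double inertiaEnd, double threshold) {
    double snap = nearestSnap(s, inertiaEnd);
    if (type == SnapType::Mandatory) {
        return snap;  // must rest on a snap location
    }
    // Optional: snap only if close enough at the end of inertia.
    return std::fabs(snap - inertiaEnd) <= threshold ? snap : inertiaEnd;
}
```

With snap locations every 100 units starting at 0, inertia ending at 260 rests at 300 under mandatory snapping. Under optional snapping with a threshold of 20, the content stays at 260, because the nearest snap location is 40 units away.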
The content moves inside the viewport. The boundaries where it can move are defined by the rectangles of the viewport and content. Panning momentum during deceleration causes a slight bounce-back effect if either a snap point or a content area boundary is reached.
Nearby snap points, content alignment, and chaining can all affect where the content comes to rest.
Build date: 10/27/2012