Touch behavior

[This documentation is preliminary and is subject to change.]

Touch interfaces can be found today across myriad devices, ranging from mobile phones to slates to kiosks to horizontal and vertical displays of 30’’. Each of these direct touch interfaces has its own requirements, restrictions, and affordances. But to end users, what matters most is the smooth, responsive, and natural experience of interacting with a device using touch. That touch experience is at the forefront of the user’s satisfaction with the device.

Windows 8.1 continues the emphasis on optimizing for touch. Touch, as an input method, is a first-class citizen on Windows, just as the keyboard and the mouse are today. For all ARM form factors, touch is the primary and preferred input method. Achieving a smooth, responsive, and natural touch experience requires a combination of good hardware and good software. Hardware and software vendors, including producers of touch hardware, displays, touch input stacks, applications, and so on, need to continue to work well together. For the success of Windows and the Windows ecosystem, optimal touch user experience needs to be guaranteed. This document focuses on areas dependent on hardware vendors to achieve optimal touch user experience.

Touch interaction principles

Natural, intuitive, and confident

Beginning with Windows 8, touch is based on manipulations, which support a more natural and intuitive touch user experience, with basic gestures supplementing the experience. The manipulations model continues in Windows 8.1, and no new gestures are defined.

Touch interactions

Windows 8 defined seven primary touch interactions for users across all form factors. These remain unchanged in this version of Windows. Users can perform these interactions on any form factor (slate, convertible notebook, all-in-one (AIO), and so on) in all applications and expect a consistent touch user experience.

For example, users should not have to change the way they swipe, such as the speed and swipe distance, to select objects when moving between a slate and an AIO. Both devices should consistently recognize the same finger movement as the “swipe” interaction.

You can learn more about the touch language and the seven interactions from these BUILD conference sessions:


A manipulation is a real-time, direct handling of an object. A manipulation differs from a gesture in that the manipulation corresponds in a direct and one-to-one manner with how the user expects an object would react in the physical world. For example, using your fingers to pan the contents of a website up or down directly maps to using your fingers to slide a piece of paper up and down on a table.


A touch gesture is a quick movement of one or more fingers on a screen that the computer interprets as a command or shortcut. Gestures often don’t directly map to any real-world conventions, which creates an additional cognitive load for the user, who must learn and execute different gestures. This is why the gesture set defined in Windows 8 is not being expanded in Windows 8.1. Common touch gestures include “tap” to open and “press and hold” to invoke a contextual menu.

Touch performance

Touch performance must be fast and fluid in a way that is perceivable and obvious. Touch is direct, so performance issues are felt more directly and viscerally compared to other input devices like a mouse.

The user can distinguish “responsive” from “non-responsive” touch devices easily. This does not require an expert in the touch hardware domain. For example, when an object is dragged around the screen, the user expects the object to stay under the finger to feel natural, fast, and fluid. The larger the distance between the finger and object, the more the user feels the experience is atypical and has to adjust to the device. This video created by the Microsoft Applied Sciences Group demonstrates the visible touch latency.

Application promise

Consistent experience across form factors: a user can download any application from the Windows Store and it runs great on the user’s machine. There is no application that runs great on one device but not on another. This means developers can target all touch devices running Windows 8 and this version of Windows without worrying about touch quality varying by form factor.

For example, all Windows 8 touch devices must support a minimum of 5 simultaneous touches. All touch points must meet the requirements of 25 ms hardware latency for the initial touch-down and 15 ms hardware latency for subsequent contacts. Game developers can therefore design features that rely on fast, responsive support for 5 simultaneous touch points across all Windows 8 touch devices.

Touch user experience guidelines for touch targeting

There are two distinct types of touch targets: unambiguous and ambiguous. Users should not have any trouble touching the unambiguous targets.

Slate-like form factors have more of a parallax effect, making it difficult for users to touch targets. When the device is placed on a flat surface, the user’s viewing angle is slanted compared to the normal standing display. You can reduce the parallax effect by reducing the space between the surface of the cover glass and the display.


Unambiguous targets meet these guidelines:

  • Elements that meet the minimum touch user experience guidelines for size (9 mm x 9 mm).
  • Elements that don’t meet the minimum touch user experience guidelines for size, but are separated by at least 9 mm of space measured from center to center of each element.


Ambiguous targets are elements that don’t meet the minimum touch user experience guidelines for size and are not separated by at least 9 mm of space measured from center to center of each element.
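
The two definitions above can be sketched as a small check. This is only an illustration: the `Target` type, the function name, and the choice to measure spacing against every neighboring element are assumptions, not part of any Windows API.

```python
from dataclasses import dataclass
from math import hypot

MIN_SIZE_MM = 9.0      # minimum target size from the guidelines (9 mm x 9 mm)
MIN_SPACING_MM = 9.0   # minimum center-to-center spacing from the guidelines


@dataclass
class Target:
    """Hypothetical description of a UI element, in millimeters."""
    cx: float      # center x
    cy: float      # center y
    width: float
    height: float


def is_unambiguous(target, others):
    """Return True if the target meets the size or the spacing guideline."""
    if target.width >= MIN_SIZE_MM and target.height >= MIN_SIZE_MM:
        return True
    # Undersized: every neighbor must be at least 9 mm away, center to center.
    return all(
        hypot(target.cx - o.cx, target.cy - o.cy) >= MIN_SPACING_MM
        for o in others
    )
```

A 5 mm x 5 mm element is therefore unambiguous only when no neighbor's center lies within 9 mm of its own center.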

Finger size

The size of the contact touching the screen varies depending on the size of the finger, posture of the finger contacting the screen, and how hard the finger is pressed against the screen. We collected large samples in the lab to measure the “contact area“ size.

  • The average size of the contact area using an index finger was 9.4 mm x 8.8 mm (bounding box width x height). One of the smallest observed in the lab was 5 mm x 6 mm, and the largest was 17 mm x 20 mm.
  • For “normal tap“ gestures, the average size observed in the lab was 10.8 mm x 10.5 mm.

Combining this information with various other data, including non-index finger usage such as the side of a thumb hitting the space bar on the software keyboard, the sizes to use are:

  • Average – 9 mm x 9 mm
  • Smallest – 7.5 mm x 7.5 mm
  • Largest – 30 mm x 30 mm


The core gestures described in this section differ from manipulations in that they can all be considered static. Static gestures refer to gestures that don’t require drags of the finger(s) across the screen and are detected based on touch input, where the user’s finger is down and not moving (within a threshold of distance called the drag threshold).


Tap action

A tap occurs when a finger is placed on the screen, remains within the well-defined drag threshold and radius, and is removed. Two successive taps constitute a “double tap” gesture. Both of these gestures can trigger a command or shortcut if defined by the foreground control or the application.

When a touch up (TU) occurs after x ms (the value is predefined by Windows), this becomes a hold. When a TU location is farther than y mm (the value is predefined by Windows), this becomes a drag.

Drag threshold: Finger down moves <= Y mm radius from the initial (x,y) of down. (The distance is defined in sub-millimeters.)
Time: No time limit

Figures: Drag time, Drag radius
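
As a rough sketch of how the tap, hold, and drag rules above relate, the function below classifies one contact from its touch-down and touch-up samples. The parameters `hold_ms` and `drag_mm` stand in for the x and y values that Windows predefines; their real values are not given here, and the function name is hypothetical. For brevity the sketch compares only the touch-up position with the touch-down position, rather than tracking the whole path.

```python
from math import hypot


def classify_static_gesture(down, up, hold_ms, drag_mm):
    """Classify one contact as "tap", "hold", or "drag".

    down, up: (x_mm, y_mm, t_ms) tuples for finger down and finger up.
    hold_ms, drag_mm: placeholder thresholds (assumed, not Windows values).
    """
    dx, dy = up[0] - down[0], up[1] - down[1]
    if hypot(dx, dy) > drag_mm:
        return "drag"   # touch up landed outside the drag threshold radius
    if up[2] - down[2] > hold_ms:
        return "hold"   # finger stayed put, but touch up came late
    return "tap"
```

Movement is tested first because, as described above, a tap is bounded by finger movement rather than by time.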

Targeting and contact geometry

Windows uses contact geometry data for the targeting. When the touch hardware reports accurate and reliable contact geometry data, this improves the user’s targeting experience.

Each digitizer reports the point that it deems to have been intended by the user. When this point does not fall over a valid target, Windows will look at any reported geometry data to identify the most probable target.
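
The fallback described above can be illustrated with a simple heuristic. This is not the algorithm Windows uses, which is not described here; it is a sketch that assumes the contact geometry is reported as a bounding box and that the "most probable" target is the one the box overlaps most. All names are hypothetical.

```python
def overlap_area(a, b):
    """Area shared by two (left, top, right, bottom) rectangles, in mm^2."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0.0


def pick_target(point, contact_box, targets):
    """Return the name of the chosen target, or None.

    point: (x, y) reported by the digitizer.
    contact_box: bounding box of the contact area.
    targets: dict mapping a target name to its rectangle.
    """
    x, y = point
    for name, (left, top, right, bottom) in targets.items():
        if left <= x <= right and top <= y <= bottom:
            return name   # the reported point already hits a valid target
    # Otherwise fall back to the contact geometry (assumed heuristic).
    best = max(targets, key=lambda n: overlap_area(contact_box, targets[n]),
               default=None)
    if best is None or overlap_area(contact_box, targets[best]) == 0.0:
        return None
    return best
```

With accurate geometry, a touch that lands in the gap between two hyperlinks can still resolve to the link that most of the finger covers.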

User experience

Accuracy and responsiveness across the screen are two important factors for a good user experience. Users have a specific target they want to tap, and that target can be small, such as a hyperlink in a browser that today is designed for the precision of a mouse. Touch devices that correctly recognize taps make the user feel confident while interacting with the device.

Users can use multiple fingers at the same time in different parts of the UI. The position of the finger during the tap varies by application as well as by the environment in which the device is used. In some scenarios, like using a software keyboard, the user uses the side of the thumb to touch the space bar and uses the tips of the fingers for other keys. In a paint application, the user tends to apply a larger contact area. When the PC is placed on a table, users tend to tap more strongly, with a larger contact area, than when the PC is placed on the lap while sitting on a couch.


Common usage scenarios for tap gestures include:

  • Text selection – tapping on a word
  • Active hyperlinks in the browser
  • HTML controls in the browser
  • Navigating the system and launching apps

Key hardware accountability

  • Contact geometry
  • Pixel accuracy for all areas
  • Sampling rate, reporting rate for all fingers
  • Hardware latency
  • Noise suppression, no phantom contacts

Double tap

A double tap is the equivalent of the left double click of a mouse. A single finger taps the screen twice within a well-defined period of time and with a specific distance between taps.

Double tap is a key interaction in all desktop scenarios.

Drag threshold

Tap1: Finger down moves <= Y mm radius from initial landing position (x,y) (black circle)

Tap2: Finger down within X mm of Tap1 up (red circle) and moves <= Z mm (blue-dotted circle). The distance is defined in sub-millimeters.


Tap1: Up1 is <= M ms after Down1

Tap2: Down2 begins within M ms after the end of Tap1 (F1_Up) and Up2 begins within M ms of Down2.
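
The distance and timing rules above can be combined into one check. The sketch below is an illustration only: `x_mm`, `y_mm`, `z_mm`, and `m_ms` are placeholders for the X, Y, Z, and M values that Windows defines, and the `Tap` type and function names are hypothetical. Each tap is reduced to its down and up samples.

```python
from dataclasses import dataclass
from math import hypot


@dataclass
class Tap:
    down: tuple  # (x_mm, y_mm, t_ms) at finger down
    up: tuple    # (x_mm, y_mm, t_ms) at finger up


def _dist(a, b):
    return hypot(a[0] - b[0], a[1] - b[1])


def is_double_tap(tap1, tap2, x_mm, y_mm, z_mm, m_ms):
    """Apply the Tap1/Tap2 distance and timing rules with placeholder limits."""
    tap1_ok = (_dist(tap1.down, tap1.up) <= y_mm            # Tap1 stays within Y
               and tap1.up[2] - tap1.down[2] <= m_ms)       # Up1 within M of Down1
    tap2_ok = (_dist(tap2.down, tap1.up) <= x_mm            # Tap2 lands near Tap1 up
               and _dist(tap2.down, tap2.up) <= z_mm        # Tap2 stays within Z
               and 0 <= tap2.down[2] - tap1.up[2] <= m_ms   # Down2 within M of Up1
               and tap2.up[2] - tap2.down[2] <= m_ms)       # Up2 within M of Down2
    return tap1_ok and tap2_ok
```

If either the spacing or the timing rule fails, the two contacts are simply treated as separate taps.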


Double tap threshold and radius

User experience

Accuracy and responsiveness across the screen are two important factors for a good user experience. The user needs to be able to target an object on the screen originally designed for a mouse. The expected experience is the same as for a tap.


Common usage scenarios for double tap gestures include the following:

  • Desktop – double tap on a folder icon to open.
  • Desktop – double tap on a shortcut to start the app.
  • Internet Explorer – double tap to zoom in.

Press and hold

A hold gesture is used to learn about the UI or invoke the context menu. This gesture replaces “press and tap” as the “right-click“ gesture from the previous versions of Windows. In Windows 8, this gesture brings up a UI to discover and learn about the object under the finger.

Hold is fundamentally on the same path of detection as a “long” tap. Because a tap is not bounded by time, but rather by the user’s finger movement, a consistent experience should be afforded for both taps and holds. Hold is considered a “marker“ or “flag“ in the detection path for tap to allow apps and controls to have the ability to gain consistent hold timing throughout the operating system.

For objects that accept taps (such as URLs), the hold event can be used to provide the users a visual cue to aid their decision to perform a tap or to drag away to cancel. For objects that only accept taps (such as buttons), the hold event is no different than a tap (provided the finger is still within the drag threshold), and the button can simply ignore the hold flag.

User experience

Accuracy and responsiveness across the screen are two important factors for a good user experience. Users have a specific target that they want to hold, and that target can be small, such as a hyperlink in the browser which today is designed for the precision of a mouse. The UI to respond to this gesture needs to appear without delay, because users are already waiting by holding down a finger. The user feels confident with visual confirmation that the object or the operating system reacts to this gesture precisely with the timing defined by the Windows gesture recognition engine.


Common usage scenarios for press and hold gestures include the following:

  • Picker – press and hold on a file
  • Press and hold on a tile on the Start screen to show a tooltip with the full name of the app
  • Desktop – press and hold on the open space in the desktop
  • Desktop – press and hold on a file in File Explorer


Swipe to select

This is very similar to “slide” except the distance between finger down and up is very short. Users perform this gesture very quickly on tiles such as application tiles in the Start menu, photos in the photo picker, and files in the file picker.

Use a single finger to swipe the object vertically to select.

User experience

Accuracy and responsiveness are key factors for a good select experience by swipe.

For example, when a user is selecting a tile, this should not be mistaken for a tap that launches the app. Such a mistake is extremely annoying for the user. In most cases, there isn’t a simple interaction to cancel the tap and go back to the original screen where the user was selecting objects.

The user expects the UI to respond as a finger swipes the screen. The UI response indicates to the user that the interaction is recognized successfully. This also helps users move to the next item to swipe and select with minimum delay between swipes. Users expect very fast flicks, which touch the screen for only a very short distance and time, to work for selection. A “flick” is a short, quick motion meant to quickly pan or select content. Often, on modern touch stack-ups, this motion results in a very small amount of meaningful data reaching the operating system, and this data typically conflicts with jitter. The digitizer must follow the guidance under the Performance considerations section of this document to provide clean velocity data that enables fast and fluid flicks.


  • Start menu – selecting a tile for advanced options such as unpin, uninstall, making the tile smaller
  • Photo picker

Swipe from edge

Starting with Windows 8 and continuing with this version of Windows, key UIs hide under the edge of the screen. A user can bring up or hide the UI with a simple swipe action. Not always showing the Windows UI on the screen maximizes the display area that applications can use. The left and right edges are used by Windows, and the top and bottom edges are used by the application.

To bring up Windows UI from the edge, the user places one finger on the bezel and drags toward the center of the screen. The user must be able to perform this action comfortably at any position along the edge; the location can vary depending on the posture of the user holding the device. This is like “dragging” an object hidden under the bezel into the display area, that is, directly manipulating an object. The difference from a typical drag is that the user does not target an object to start dragging. Instead, the user simply places a finger at any part of the edge and drags toward the center of the display. You may perceive this as a single-finger swipe that starts on the bezel and moves toward the center.

Edge drag and the resulting action:

From right to center

The Windows UI called the “charms bar” shows up.

From left to center

This action is called “back and snap“.

Short drag:

- Back: Similar to Windows button + Tab, toggles through the running applications.

Long drag and hold:

- Snap: Turns into arrange mode – user can drag the application and drop to the side bar or main area to snap.

From top/bottom to center

This is assigned to application user experience.

Short drag from top or bottom:

- Application can show its menu on top, or bottom, or both.

Long drag from top and hold:

- Snap: Turns into arrange mode – user can drag the application and drop to the side bar or main area to snap.


User experience

The device’s form factor can impact the user experience around device edges. For example, the physical transition from the bezel to the screen is different when the bezel height is not the same as the screen.

When a user drags a finger from a bezel, the user feels as if she is dragging out the menu bar from underneath the bezel. The user can confidently perform this on all four edges at any location. The user does not have to press the finger harder against the screen or try to target the very edge of the screen to drag out the menu bar. When a user places a finger on the bezel, the menu bar is stuck to the user’s finger and, as the finger is dragged onto the screen, the menu bar is dragged out with it.

The observed average finger moving speed is 160 mm/second and goes as fast as 730 mm/second. The 95th percentile (that is, 95% are slower than this) is 400 mm/second. The user can perform the edge UI drag interaction at any speed, from slow to fast.

Swiping backward is also very important. The user can toggle through the running applications by swiping back to the edge and swiping in again to bring in the next application. The application should not mistakenly drop and snap to the screen.


  • Windows UI
  • Application UI

Swipe from edge (application)

Dragging can start or end at the edge of the screen. It can even start from outside of the screen if the user places a finger on the bezel, and it can also end on the bezel.

User experience

The device’s form factor can impact the user experience around device edges. For example, the physical transition from the bezel to the screen is different when the bezel height is not the same as the screen.

Everything described in the Drag section applies to this section.


  • Full screen application (paint application, game)
  • Desktop – dragging in the tablet input panel

Slide (panning)

Figures: Slide panning start, Slide panning end

Panning occurs when a user slides the finger – the user’s finger(s) is placed on the screen and moved beyond a distance threshold. The user’s finger may continue to move in any direction for any duration. The panning ends when the user lifts the finger(s).

A single finger pan moves an object. One or more fingers moving in the same direction either directly manipulate an object or pan scrollable content. During panning, the distance between fingers remains constant within a threshold. If a second finger is placed on the screen and the distance between the two fingers goes beyond the threshold, the panning stops and zooming begins. If the fingers converge, the manipulation is a zoom out; if they diverge, the manipulation is a zoom in. If the distance changes while dragging, it becomes a compound manipulation (see the next section).
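
The pan versus zoom decision described above depends only on how the distance between the fingers changes. A minimal sketch follows, assuming two contacts and a hypothetical `threshold_mm` parameter; the actual threshold used by Windows is not stated here.

```python
from math import hypot


def classify_two_finger_move(start, end, threshold_mm):
    """Return "pan", "zoom in", or "zoom out".

    start, end: ((x1, y1), (x2, y2)) finger positions in mm.
    threshold_mm: allowed change in finger spacing (assumed value).
    """
    def spread(pair):
        (x1, y1), (x2, y2) = pair
        return hypot(x2 - x1, y2 - y1)

    delta = spread(end) - spread(start)
    if abs(delta) <= threshold_mm:
        return "pan"                                  # spacing held constant
    return "zoom in" if delta > 0 else "zoom out"     # diverge vs. converge
```

Both fingers can travel a long way across the screen and still count as a pan, as long as their spacing stays within the threshold.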

The pannable objects in Windows have inertia. Users can drag objects with inertia (velocity). Inertia occurs when the user is manipulating content (for example, panning a webpage) and the finger slides off the display, creating a sense of velocity and thus inertia. The object stops moving gradually, calculated from the inertia. The object stops moving when it reaches the wall defined by the application, such as end of the list or when it is out of momentum. The ability to create such a believable slow-down effect is predicated on having accurate tracking information. When the information is inaccurate, we break the illusion and the feeling that users are manipulating physical objects.

Users can repeatedly apply inertia on top of the panning object and have the object keep panning with velocity.
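
To make the inertia behavior concrete, here is a toy model, not the Windows inertia processor: the object keeps the lift-off velocity, loses a fixed fraction of its speed every frame, and stops either when it runs out of momentum or when it reaches the wall set by the application. The decay factor, frame time, and minimum speed are illustrative assumptions, and the sketch handles only motion toward increasing coordinates.

```python
def inertia_positions(start_mm, velocity_mm_s, wall_mm, friction=0.95,
                      frame_s=1 / 60, min_speed_mm_s=5.0):
    """Positions of a panned object after lift-off, one entry per frame.

    friction, frame_s, and min_speed_mm_s are illustrative values only.
    """
    positions = []
    pos, vel = start_mm, velocity_mm_s
    while abs(vel) >= min_speed_mm_s:
        pos += vel * frame_s
        if pos >= wall_mm:            # reached the boundary set by the app
            positions.append(wall_mm)
            break
        positions.append(pos)
        vel *= friction               # lose a fixed fraction of speed per frame
    return positions
```

Because every later position depends on the lift-off velocity, a noisy velocity estimate from the digitizer changes the whole slow-down curve, which is why accurate tracking information matters here.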

Compound manipulation

Panning can be done as part of other manipulations. For instance, users can pan and zoom at the same time.

Compound interactions include:

  • Pan and converge
  • Pan and converge and diverge
  • Pan and converge and rotate
  • Pan and converge and diverge and rotate
  • Pan and diverge
  • Pan and diverge and rotate
  • Rotate and diverge
  • Rotate and converge
  • Rotate and converge and diverge
  • Converge and diverge


User experience

Responsiveness is the most important factor for a good panning experience. Other factors, such as pixel accuracy in identifying where a finger touched the screen and where it was lifted, are less important.

When a user places a finger on the screen and moves an object, the object stays under the finger. When the finger stops, the object stops at the same time and location, with no lag that the user can see. When the finger is moved with velocity and lifted, the object continues to move in the direction the finger was moving. The user feels as if the object was tossed on a smooth physical surface in the real world and the object’s speed decreases as if due to friction.

  • The user does not need to worry about how many fingers are used.
  • The user typically does not worry about the exact location where finger(s) touches the screen.
  • The user notices objects start to “slip away” by seeing the object following after the finger movements. Ideally the distance of the object from the contact point should be less than the diameter of the in-contact surface. See Performance Considerations for details.
  • The user can start panning at any time.
  • It feels natural to the user to apply the same manipulation again to an object that is already moving, and the object continues to move with the applied velocity.
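
The “slip away” guideline in the list above can be turned into simple arithmetic: the object trails the finger by roughly finger speed multiplied by latency. The sketch below assumes a single end-to-end latency figure, which is a hypothetical input rather than a value given in this document, and uses the 9 mm average contact size from the Finger size section as the default diameter.

```python
def lag_distance_mm(finger_speed_mm_s, total_latency_ms):
    """Approximate distance the object trails behind the finger."""
    return finger_speed_mm_s * total_latency_ms / 1000.0


def stays_under_finger(finger_speed_mm_s, total_latency_ms,
                       contact_diameter_mm=9.0):
    """True if the trailing distance is smaller than the contact diameter."""
    lag = lag_distance_mm(finger_speed_mm_s, total_latency_ms)
    return lag < contact_diameter_mm
```

For instance, with an assumed 50 ms total latency, a finger moving at 160 mm/second trails by 8 mm, while one moving at 400 mm/second trails by 20 mm, which is well beyond a 9 mm contact.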


  • Webpage browsing/navigation – Internet Explorer.
  • Photo manipulation – Windows Live Photo Gallery.
  • Used heavily for system browsing and navigation.

Slide (drag)

User experience

Pixel accuracy for the entire screen area is one of the most important factors of the drag experience. The user targets an exact point (location) on the screen to touch and start dragging the object. The user also targets an exact point on the screen to lift the finger to stop dragging the object.

Contact tracking is another important factor. When the user drags an object from one location to another, the user can pause the movement but the finger stays touching the screen. With poor contact tracking, the object can drop out from the finger and require the user to retarget the object to drag.

  • The user can target an object to touch anywhere in the entire screen.
  • While the user drags an object with various speeds, the object does not jitter. The user sees the object move smoothly.
  • The user can drag an object from one location to another in any part of the screen with one single drag, as long as the finger remains on the screen. While the user drags the finger, the dragging speed can change and pause. The object under the finger should not jitter at any time.
  • The user notices objects start to “slip away” by seeing the object follow after the finger movements. Ideally the distance between the object and the contact point should be less than the diameter of the in-contact surface.


  • Start menu – rearranging tiles
  • Desktop – rearranging icons
  • Desktop – window resizing

Pinch (converge, diverge)

Figures: Pinch action, Stretch action

Converge and diverge manipulations require the use of at least two fingers. The key component is the delta and vector between each contact. This manipulation does not require both contacts to be moving, just that the delta is either increasing or decreasing along opposing vectors.

The following graphic shows two- and three-finger diverge and converge.

Converge and diverge


The converge gesture happens when two or more fingers placed on an object are dragged closer together (that is, toward each other). For example, a user places the thumb and forefinger on the top and bottom edges of a picture and pinches the fingers together.


The diverge gesture happens when two or more fingers placed on an object are dragged apart.

User experience

Responsiveness is very important for this experience. The object changes its scale in real time and simultaneously with the finger movement. The user does not notice any visual delay between the fingers and the object.

  • The user doesn't need to worry how many fingers are used or if two hands are used.
  • In direct touch interactions, users notice that objects start to “slip away” by seeing the object following after the finger movements. Ideally, the distance between the object and the contact point is less than the diameter of the in-contact surface.


  • Photo manipulation – Windows Live Photo Gallery
  • Webpage browsing/navigation – Internet Explorer
  • System browsing and navigation


Rotate action

The rotate gesture happens when two or more fingers on an item are dragged in clockwise or counter-clockwise directions along an arc.

The rotate manipulation requires at least two fingers to perform. The action is fundamentally two contacts moving in opposite directions along an arc, as shown in the second rotation action. However, it’s not necessary that both fingers are moving. A “pivot“ rotation is when one finger remains stationary and the other finger(s) move along an arc around it, as shown in the first rotation action.
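
Both rotation types reduce to the same measurement: the change in the angle of the line joining two contacts. The function below is a minimal sketch with a hypothetical name, not the Windows implementation. The sign of the result depends on the coordinate system; with screen coordinates where y grows downward, a positive value appears clockwise.

```python
from math import atan2, degrees


def rotation_degrees(start, end):
    """Signed change in the angle of the line joining two contacts.

    start, end: ((x1, y1), (x2, y2)). Result is normalized to (-180, 180].
    """
    def angle(pair):
        (x1, y1), (x2, y2) = pair
        return atan2(y2 - y1, x2 - x1)

    delta = degrees(angle(end) - angle(start))
    while delta > 180:
        delta -= 360
    while delta <= -180:
        delta += 360
    return delta
```

A pivot rotation, where the first contact stays still, and a rotation where both contacts move in opposite directions give the same result whenever the joining line turns by the same amount.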

Rotation types

User experience

The object changes its orientation in real time and simultaneously with the finger movement. The user doesn't notice any visual delay between the movement of the fingers and the object.

  • The user can perform rotation with other direct manipulation such as converge and diverge. For example, the user can zoom into the photo and rotate the photo simultaneously.
  • In direct touch interactions, users notice objects start to “slip away” by seeing the object following after the finger movements. Ideally, the distance between the object and the contact point is less than the diameter of the in-contact surface. See General Performance Considerations for details.


  • Photo Viewer
  • Map application

Select and pan

In the new Start menu, a user can select a tile (grab for rearrangement) and pan the menu to find a location for the selected tile to drop using a second contact or several contacts. This is a multi-finger gesture.

Another example may be a photo gallery application. The user selects one photo by dragging it out from a list of displayed pictures, and while holding on to the photo, uses another hand (finger) to pan the list to find the location for rearrangement.

User experience

The object selected stays under the finger without jittering while the whole list (menu) is panned by another hand (finger). The panning performance is not degraded.

Touch keyboard

Text entry is an integral part of the Windows experience. Even though the touch experience lends itself more toward content consumption, there are still times when text entry is required. Typing in a URL, posting on Facebook, tweeting, searching for artists, or filling out the credit card number in an online store are all common tasks.

On a physical keyboard, the average typing speed is 39 – 40 words per minute (wpm). Administrators can type up to 70 – 80 wpm. Professionals can type up to 90 to 120 wpm.

User experience

Tapping on a software keyboard key needs to be very responsive without any delay to ensure a good user experience. Users will tap rapidly. To type upper-case letters, users will hold the shift key with one finger and tap on the alphabet key rapidly then lift the finger off the shift key. The next tap can come while the previous finger is still down or just being lifted.

  • All the keys tapped appear correctly in the order tapped.
  • The user can hover the finger over the screen (not touching the screen) when moving from one key to another, and keys under the hovering fingers are not tapped.

A common bad experience occurs when two quick taps are mistaken for a short drag, particularly as touch resolution goes down or latency goes up. Taps are then sometimes changed to moves, which makes for a bad typing experience. Rapid taps should not be recognized as drags.

Text selection

Text selection action

Today’s Windows users have grown accustomed to editing and manipulating text using keyboard and mouse input. This works well because the keyboard and mouse are highly granular methods of input; keys are discrete data points that are either depressed or not depressed, and mouse clicks can easily be mapped to the most granular screen measurement—a single pixel. Touch, on the other hand, has an area of effect—even with a perfect digitizer, it’s very hard for a touch user to pinpoint a specific (x,y) location on the screen, which is the kind of granularity needed when selecting text and placing the text cursor.

User experience

Users need to be able to target small elements such as a single word in the webpage very accurately. The target may be ambiguous (elements that don't meet the minimum guidelines for size and aren't separated by at least 9 mm of space measured from center to center of each element) and may require more information from the touch digitizer than simple x,y coordinates to target. For users to reliably target small objects, the touch digitizer must generate accurate contact geometry data (the contact area of the finger touching the screen).

Good contact tracking is also very important. The user can drag the finger in all directions to select and manipulate content. When contact tracking is broken, the user will lose the selection while dragging the finger that is manipulating the selection.


  • Highlighting a word – look up its definition, change fonts
  • Selecting a word in an edit field – correcting mistyped text
  • Highlighting a line – copy to email, blog, tweet
  • Cursor placement – placing at the middle of a word to correct spelling

Windows Narrator

Windows Narrator is a text-to-speech utility that assists users with vision impairments. Narrator reads what is displayed on the screen—the contents in the active window, menu options, or text that has been typed. To assist navigation by touch, Narrator has a unique gesture recognition system, relying on good multi-touch functionality.

User experience

Gesture recognition rate is critical to the success of this scenario. Visually-impaired users should not have to fall back to mouse or keyboard. They must be able to navigate on a slate confidently using touch.

In performing gestures like a 4-finger swipe, the user doesn't need to worry how close 4 fingers are placed to each other. Some users may put fingers very close to each other, and some may have some space in between. In all cases, users can place their fingers in comfortable positions to perform gestures that are recognized correctly.


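Narrator tap gestures, by number of fingers:

1 finger

  • Single tap: (read what is under the finger)
  • Double tap: Primary action (activate a button)
  • Triple tap: Secondary action (select an item)

2 fingers

  • Single tap: Element-specific commands
  • Double tap: Search and select
  • Triple tap: N/A

3 fingers

  • Single tap: Public speaking
  • Double tap: Bring up a list of the available Narrator commands
  • Triple tap: N/A

4 fingers

  • Single tap: Keyboard (open, close)
  • Double tap: N/A
  • Triple tap: N/A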



Swipe is a gesture users can perform to command Narrator. When a swipe is performed, instead of directly manipulating the contents under the finger, the user is commanding Narrator to perform an assigned task.

Narrator swipe commands by number of fingers, one entry per swipe direction:

  • 3 fingers: Read from here, Shift-Tab, Tab, N/A
  • 4 fingers: Scroll (Up), Scroll (Right), Scroll (Left), Scroll (Down)



  • Windows Narrator

Performance considerations

A high-performance digitizer must follow the considerations below to deliver a fast and fluid touch experience.

Finger contacts screen (down)

Windows is focusing on making the touch experience cleaner and smoother. To help, partners can ensure that they provide realistic velocity modeling in their touch reports; that is, the digitizer must report the realistic velocity of the user’s finger between any two consecutive reports for any given contact.

Finger begins to move (disambiguation)

As the contact begins to move, we must make a trade-off between holding steady in the presence of noise or involuntary movement and reacting quickly to an intended movement.

We assume the existence of a “disambiguation zone” within which movement is discarded and the contact is reported as stationary. If this area is too large, the user will perceive a lag, or jump, in following her motion across the screen, and any program that attempts to smooth velocity will be disadvantaged by the sudden change in position of the reported contact.
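
The effect of the disambiguation zone can be seen in a small filter. This is a sketch under stated assumptions, not firmware guidance: the zone is modeled as a circle of hypothetical radius `zone_radius_mm` around the first sample, and once the contact leaves it, raw positions are passed through unchanged.

```python
from math import hypot


def apply_disambiguation_zone(samples, zone_radius_mm):
    """Hold the reported position steady until the contact leaves the zone.

    samples: list of (x_mm, y_mm) raw positions for one contact.
    """
    if not samples:
        return []
    origin = samples[0]
    moving = False
    reported = []
    for x, y in samples:
        if not moving and hypot(x - origin[0], y - origin[1]) > zone_radius_mm:
            moving = True             # contact is now deemed to be in motion
        reported.append((x, y) if moving else origin)
    return reported
```

The first report after the contact leaves the zone jumps by at least the zone radius, so a larger zone produces a larger visible jump and a larger spike in any velocity computed from the reports.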

Finger continues to move (pan or flick)

Once the contact is deemed to be in motion, it is very important to maintain a realistic representation of its progress across the screen. If the contact is moving at a constant speed of 200 mm/s, then we expect the digitizer to report a change in position per report of approximately 200 mm/s multiplied by the report interval in seconds. For example, given a constant report interval of 10 ms, we would expect the digitizer to represent a change in position of 2.0 mm per report.
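The arithmetic in the example above can be written out as a short sketch. The function names and the `tolerance_mm` default are illustrative assumptions; no tolerance value is given in this documentation.

```python
def expected_delta_mm(speed_mm_per_s, report_interval_ms):
    """Displacement one report should show for a contact at constant speed."""
    return speed_mm_per_s * report_interval_ms / 1000.0


def delta_is_realistic(delta_mm, speed_mm_per_s, report_interval_ms,
                       tolerance_mm=0.2):
    """True if a reported displacement is within tolerance of the expected one.

    tolerance_mm is a hypothetical value, not taken from any requirement.
    """
    expected = expected_delta_mm(speed_mm_per_s, report_interval_ms)
    return abs(delta_mm - expected) <= tolerance_mm
```

For the case in the text, `expected_delta_mm(200, 10)` gives 2.0 mm per report.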

In the case of a flick, motion is often accelerating in one direction, and in this situation we expect that the digitizer will represent this as a uniformly increasing (or decreasing) change in position between consecutive reports. When this is not the case, the user will see a misrepresentation of her intended speed in an application, as the program will need to overcome what appears to be stoppage or backtracking.
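One way to look for apparent stoppage or backtracking in a captured trace is sketched below. It assumes the positions have already been projected onto the flick direction (one value per report); the function name and the trace format are hypothetical.

```python
def find_motion_anomalies(positions_mm):
    """Return indices of reports whose displacement stalls or reverses.

    positions_mm: 1-D positions along the flick direction, one per report.
    """
    deltas = [b - a for a, b in zip(positions_mm, positions_mm[1:])]
    if not deltas:
        return []
    # Overall direction of travel decides what counts as "backwards".
    direction = 1 if sum(deltas) >= 0 else -1
    anomalies = []
    for i, d in enumerate(deltas, start=1):
        if d * direction <= 0:  # zero = stoppage, opposite sign = backtracking
            anomalies.append(i)
    return anomalies
```

A trace such as `[0, 1, 3, 3, 2, 8]` flags reports 3 and 4: the first repeats the previous position and the second moves backwards.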

Finger leaves screen (up)

It is vitally important to report the removal of a contact as quickly as possible and optimally within 15 ms, as this latency will manifest itself to the user in the form of improper speed representation in touch applications. An ideal stream of reports for a moving contact would appear as a sequence of TIP-down reports with position data reflecting the true rate of motion, followed by a single report with the TIP flag removed, and the same position as the final TIP-down report. Refer to Windows Pointer Device Data Delivery Protocol for state transition details.
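The shape of the ideal report stream described above can be checked with a small sketch. The `(tip, x, y)` tuple layout is an assumption made for illustration; real reports follow the HID report descriptor of the device.

```python
def is_ideal_liftoff(reports):
    """Check one contact's ordered reports, given as (tip, x, y) tuples.

    Ideal shape: TIP-down reports, then exactly one report with TIP cleared
    at the same position as the final TIP-down report.
    """
    if len(reports) < 2:
        return False
    *downs, last = reports
    if not all(tip for tip, _, _ in downs):
        return False
    tip, x, y = last
    return (not tip) and (x, y) == downs[-1][1:]
```

A stream whose final report moves the contact, or that repeats the TIP-cleared report, does not match the ideal shape and returns False.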

Palm detection

The digitizer shall represent a palm as one contact with the confidence bit set to zero from the very first report. Once it’s set, the confidence bit should not be changed during the lifetime of the contact. For details on the confidence bit, refer to Windows Pointer Device Data Delivery Protocol.
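The rule that the confidence bit stays fixed for the lifetime of a contact can be checked over a captured trace, as in the sketch below. It assumes a trace of `(contact_id, confidence)` pairs in which each contact ID covers a single contact lifetime; both the trace format and the function name are hypothetical.

```python
def contacts_with_confidence_change(reports):
    """Return the contact IDs whose confidence bit changed after first report.

    reports: iterable of (contact_id, confidence) pairs in report order.
    Assumes each contact ID appears for one contact lifetime only.
    """
    first_seen = {}
    violators = set()
    for contact_id, confidence in reports:
        if contact_id not in first_seen:
            first_seen[contact_id] = confidence
        elif first_seen[contact_id] != confidence:
            violators.add(contact_id)
    return violators
```

An empty result means every contact kept the confidence value it had in its first report.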

There is a known trade-off between low latency (with only occasional accidental touch as a downside) and palm/accidental-touch rejection (at the cost of higher latency or sensitivity). Fast reporting of contacts takes priority over palm detection.

Technical details

This section describes key technical areas on the touch hardware and firmware that can impact Windows touch user experience.

Touch report rate

The touch report rate is defined as the sustained rate at which the firmware/driver reports touch contacts to the host. The requirement for this value is based on higher-level, scenario-specific recognition: a high enough report rate enables the host to catch or distinguish the motion details of moving fingers, for the purpose of recognition and differentiation as described in the previous sections.

Failure cases:

  • Manipulation – a low rate can cause tracking to lose motion details of the physical finger moves.
  • Gesture – a low rate could lead to incorrect interpretation of the gestures.

Touch scanning rate (sampling rate)

The touch scanning rate is defined as the rate of scanning through the entire frame of the sensor grid for the raw sensor signal. The requirement for this value is mainly for correct tracking of the touch IDs when the fingers are moving fast.

  • The touch report events are a subset of the touch scanning events, in the sense that multiple scanning samples can map to a single touch report, and as such the touch report rate should never be higher than the touch scanning rate.

Failure cases:

  • Software keyboard (SKBD) – a low scanning rate can cause difficulty in distinguishing a 2-finger tap from a fast single-finger move.
  • Multi-touch – a low scanning rate can cause contact ID switching/hopping between multiple contacts.

Initial touch latency

This is defined as the delay between the first physical contact and the first reported touch event. Sometimes the firmware delays reporting the first touch contact by up to a few frames in order to gain better accuracy by using certain kinds of temporal filtering. This causes fast motion panning to have a delayed start, and the end result is a far smaller range than that of the physical finger motion, leading to a “sluggish” panning experience.

Responsiveness is critical to the end user touch experience. Immediately report the first touch registered even when it is less accurate, and indicate so in the report. The subsequent reports can still use the previously buffered samples for filtering to gain accurate reports. This way the buffering delay can be eliminated altogether.

Note  This scheme requires a gestural recognition engine to be aware of the potential inaccuracy of the initial touch.
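The scheme above can be sketched as follows: the first sample is reported immediately and marked as less accurate, and later reports are filtered over the samples buffered so far. The moving-average filter, the class name, and the `accurate` flag are assumptions for illustration; the flag is not a field defined by this documentation.

```python
from collections import deque


class ImmediateFirstReportFilter:
    """Never withholds a report; filters only once history is available."""

    def __init__(self, window=3):
        # Keeps the most recent raw samples, including the very first one.
        self.samples = deque(maxlen=window)

    def process(self, x, y):
        """Return (x, y, accurate) for one raw sample."""
        self.samples.append((x, y))
        n = len(self.samples)
        if n == 1:
            # First touch: reported at once and flagged as less accurate.
            return (x, y, False)
        avg_x = sum(p[0] for p in self.samples) / n
        avg_y = sum(p[1] for p in self.samples) / n
        return (avg_x, avg_y, True)
```

The design point is that filtering starts from the second report onward, so no frames are spent waiting before the first report reaches the host.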

Failure cases:

  • Panning – A large initial touch latency can cause “sluggishness” in the panning experience, as the range of the panning is much reduced.

Sustained report latency

This is defined as the total time duration from the touch contact on the touch screen (finger down) to the visual response on screen. It can be caused by a number of factors compounded together (hardware, firmware, driver, input stack, apps, graphics subsystem, and so on). This latency manifests itself as a moving lag of touch points on screen when the physical touch moves. For the purpose of this report, the focus should be a minimal latency in milliseconds.

Failure cases:

  • Manipulation – A sustained latency that is not small enough causes a visible tracking delay, which makes manipulation scenarios feel unsatisfactory.

Contact tracking between fast tapping and flicker

This issue is caused by two consecutive taps happening next to each other at different locations, and the “contact up” on the first tap is immediately followed by a “contact down” at a different location in the next scanning frame.

A higher scanning rate can reduce the chance of this occurring.

Failure cases:

  • Software keyboard – a user typing fast on a soft keyboard with alternating key strokes (“ABABABAB…”) can cause two successive individual key strokes to be interpreted as a touch move event.
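Why a higher scanning rate helps can be illustrated with a simplified timing model. The sketch assumes each scan is instantaneous and that scans are evenly spaced, which is an idealization; the function name is hypothetical.

```python
def gap_always_observed(gap_ms, scan_rate_hz):
    """True if a no-contact gap of gap_ms is guaranteed to contain a scan.

    Simplified model: scans are instantaneous and evenly spaced, so any gap
    at least one scan period long must include at least one scan instant.
    """
    scan_period_ms = 1000.0 / scan_rate_hz
    return gap_ms >= scan_period_ms
```

Under this model, an 8 ms gap between the first tap lifting and the second tap landing can fall entirely between two scans at 100 Hz (10 ms period), but is always seen at 200 Hz (5 ms period).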

Contact tracking with merge

The contact tracking model must be robust enough to handle the situation when multiple contacts merge into a single component in the raw sensor image. To this end, a typical connected component analysis fails to work, and a level-based approach may not bring a robust solution, either.

The reverse of splitting from a single contact component to multiple contacts should also be taken care of along with the merge case, and a properly treated merge/split pair would avoid the issue of contact ID switching.

Failure cases:

  • Failure to do so may cause contact ID switching when the two fingers physically touch together, which could happen while performing a converge gesture.

Contact position stability

Contact position stability requires filtering to smooth out noise while a contact is held stationary.
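One common smoothing choice is an exponential moving average, sketched below. This documentation does not prescribe a particular filter, so the technique, the class name, and the `alpha` value are all assumptions.

```python
class StationarySmoother:
    """Exponential moving average over reported positions."""

    def __init__(self, alpha=0.5):
        self.alpha = alpha  # hypothetical weight given to the newest sample
        self.state = None

    def update(self, x, y):
        """Feed one raw position and return the smoothed position."""
        if self.state is None:
            self.state = (x, y)
        else:
            a = self.alpha
            self.state = (a * x + (1 - a) * self.state[0],
                          a * y + (1 - a) * self.state[1])
        return self.state
```

A smaller `alpha` suppresses more jitter but also adds lag once the contact starts to move, so the value has to be weighed against the latency guidance elsewhere in this section.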

Failure cases:

  • Manipulation such as pinching and rotation – unsmoothed targeting can lead to confusion of certain gesture detection. For example, pinching could be confused with rotation.

Balance of sensitivity and ghost reports

  • The balance is defined through the ROC curve on the false positives (ghost reports) and the false negatives (missed reports).
  • The balance point is determined by a combination of a judicious selection of the signal thresholds and an accurate baseline removing process.
  • There may be multiple levels of thresholds for events such as finger touch down, up, and non-finger detection. Furthermore, the thresholds may be region dependent if such information can be properly fed back to the firmware.
  • The baseline removing process should promptly update the baseline to minimize the perceived noise level for the purpose of contact detection.
  • Partners should follow the state transition guidelines.
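The idea of separate thresholds for touch down and touch up can be sketched as simple hysteresis on a per-contact signal level. The signal units, the function name, and the example threshold values are hypothetical.

```python
def detect_touch_states(signal, down_threshold, up_threshold):
    """Return a touching/not-touching flag for each signal sample.

    A contact starts when the signal reaches down_threshold and ends only
    when it falls below the lower up_threshold.
    """
    touching = False
    states = []
    for value in signal:
        if not touching and value >= down_threshold:
            touching = True
        elif touching and value < up_threshold:
            touching = False
        states.append(touching)
    return states
```

With a down threshold of 10 and an up threshold of 5, a signal that dips to 7 during a drag keeps the contact alive, whereas a single threshold of 10 would have dropped it.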

Failure Cases – when the activation threshold is set too low:

  • Ghost points – unwanted triggering of links due to sensitivity being too high.
  • Software keyboard – unwanted keys tapped.

Failure Cases – when the activation threshold is set too high:

  • Loss of contacts – dragging of a contact can be lost due to sensitivity being too low at certain areas.
  • Change of contact ID – dragging of a contact changes the contact ID, causing the app to interpret a different gesture.
  • Software keyboard – false double taps.

Non-finger detection

  • A non-finger is defined as a component that is likely not a finger due to its size and shape, as well as the distribution of its sensor pixel values.
  • A non-finger can be detected through certain approaches. A simple approach is to use a certain set of heuristic tests such as area test, shape test, fractional dimension test, and so on.
  • Non-finger detection can also be used to report the touch as palm with the palm indicator set.
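A heuristic test of the kind mentioned above might combine an area test with a shape test, as in this sketch. The limits `max_area_mm2` and `max_aspect` are invented for illustration and are not values from any requirement.

```python
def looks_like_finger(width_mm, height_mm, max_area_mm2=150.0, max_aspect=2.5):
    """Area test plus shape (aspect ratio) test on a contact's bounding box.

    Both limits are hypothetical tuning values.
    """
    if min(width_mm, height_mm) <= 0:
        return False
    area = width_mm * height_mm
    aspect = max(width_mm, height_mm) / min(width_mm, height_mm)
    return area <= max_area_mm2 and aspect <= max_aspect
```

A contact that fails either test could then be reported as a palm rather than dropped, in line with the last point above.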

Failure cases:

  • Massive interference with all scenarios – causes non-fingers to be interpreted as finger touches.
  • Baseline removal – failure to identify a non-finger can lose the chance to promptly update the baseline, resulting in suboptimal balance between the sensitivity and ghost reports.

Baseline removal

Identify the baseline properly and promptly, within a time short enough that the update does not draw the user's attention to the baseline change, and remove it immediately.

Related issues:

  • Suboptimal balance of sensitivity and ghost reports.

Selective suspend

All USB connected Windows Touch Screen controllers shall support selective suspend and report this capability via a Microsoft OS descriptor. For additional details see Microsoft OS Descriptors.

A device may elect to reduce scan rate in this mode to reduce overall power consumption while still adhering to the contact down latency requirement for this mode.

To ensure no data is lost while the device is resuming from Selective suspend, the touch device manufacturer has two options:

Option 1 (Ideal solution): Buffering

Once the device has detected contact activity, it shall signal remote wake. From that event, the device shall buffer at least 100 ms worth of contact reports to ensure that little to no input is lost while the USB host controller is resuming. This option works in Windows 8.1 as well as in Windows 8.
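The buffering option can be sketched as a small queue that holds reports until the host has resumed. The class and callback names are hypothetical, and the sizing helper simply converts the 100 ms figure above into a report count for a given report interval.

```python
import math
from collections import deque


def reports_to_buffer(report_interval_ms, buffer_ms=100):
    """Number of reports needed to cover buffer_ms of contact activity."""
    return math.ceil(buffer_ms / report_interval_ms)


class ResumeBuffer:
    """Queues reports while the host resumes, then flushes them in order."""

    def __init__(self, report_interval_ms):
        # Sized for 100 ms; if resume takes longer, the oldest reports drop.
        self.pending = deque(maxlen=reports_to_buffer(report_interval_ms))
        self.host_ready = False

    def on_report(self, report, send):
        if self.host_ready:
            send(report)
        else:
            self.pending.append(report)

    def on_host_resumed(self, send):
        self.host_ready = True
        while self.pending:
            send(self.pending.popleft())
```

At a 10 ms report interval this works out to 10 buffered reports; at 8 ms it rounds up to 13.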

Option 2 (Secondary solution): Registry key

This solution should only be used if Option 1 is not feasible (for example, on older touchscreens).

In the case that buffering is not feasible, HID USB touch controllers should not be selectively suspended by the Host after 5 seconds of IDLE time; instead, they should be suspended by the Host only on screen off. To ensure this happens, the following registry values must be set by the OEM for a given device instance:

HKEY_LOCAL_MACHINE\System\CurrentControlSet\Enum\USB\<VID>\<Device Instance>\Device Parameters

<VID> refers to the vendor ID (or vendor ID/product ID combination) of the HID USB touch digitizer, and <Device Instance> refers to the instance of the device to which these settings should apply.

If the following values do not exist, they must be added:

Value name | Type | Value


In order for the registry keys to be preserved across OS upgrades, they should be set via a third-party device INF. This device INF should only set the registry values, and the third party should not install a driver via the INF. Note that in order to meet the Device.Digitizer.Touch.HIDCompliantFirmware HCK requirement while implementing this INF, the touch digitizer has to report using the inbox Win32k.sys driver and conform to the HID specification as defined in Windows Pointer Device Data Delivery Protocol. If the INF installs any kind of bridge driver, the device will fail HCK tests and Touch Hardware Quality Assurance Certification.

It should also be noted that there is no Sleep state for USB touch controllers as they are not wakeable devices and should therefore be as close to OFF as possible when the system has entered S3 or CS. We recommend USB controllers not be used on CS capable systems.

Touch suspend

Idle state

The idle state is defined as the device operating mode in which no contacts have occurred within a host-defined period and the device has therefore been suspended. This is referred to as USB selective suspend.

If the device is using USB to connect the touch digitizer, the USB hub for the digitizer must not be the same as the USB hub for the storage. Storage devices consume a lot of bandwidth so they are not appropriate for input buses.

If the device is using I2C, the device is required to meet the Device.Digitizer.Touch.ResponseLatency requirement.

Power adapters

Different power adapters can generate different noise levels that impact touch performance. Each unique power adapter configuration should be considered as a unique touch device. Thus, if the same touch device is shipped with two different power modules (for example, 2 pins or 3 pins for different regions), each should be considered as a unique touch device.

Bezel guidance

Tablet system

Tablet systems include slates and convertible notebooks. These are required to have a flush bezel. Flush bezels are also called “Edge to Edge” glass.

Tablet system bezel

The illustration below shows a flush bezel in a cutaway, edge-on view. If there is a non-active border area between the display area and the bezel area, the non-active border area is to be considered part of the bezel area for this illustration and for the bezel guidance for flush bezels. For details on flush bezels, refer to Device.Digitizer.Touch.Bezel.

Flush bezel

For a tablet system, the main purpose of the bezel and non-active border area is to allow enough space for users to grip the device to reduce accidental touches. Flush bezel guidance is provided below under the heading Flush Bezel.

Non-tablet system

Non-Tablet Systems include All-In-Ones and Clamshells and these can have non-flush bezels.

Non-tablet system bezel

The illustration below shows a bezel that is not flush. In this case, the bezel is higher than the display area.

Non-flush bezel

Non-tablet systems are not required to have a flush bezel, but if the bezel is not flush and introduces a raised edge, a 20 mm border area must be implemented to allow for uninterrupted edge access. For details on the requirements for non-flush bezel devices, refer to Device.Digitizer.Touch.Bezel and System.Client.Tablet.BezelWidth.

If a non-tablet system has a flush bezel, the main purpose of the bezel and non-active border area is to allow enough space for a responsive swipe from the edge.

Flush bezel

The bezel size cannot be bigger than 26 mm. The recommended bezel size is 17.5 mm. This is to allow enough space for users to comfortably hold the tablet in a variety of postures without accidentally activating on-screen UI.

If your form factor is a Small Tablet and it demands a smaller bezel size, the bezel can be as small as 7 mm provided that your device meets the accidental touch input requirement as defined in System.Client.Tablet.BezelWidth. Refer to the Display section for the full definition of Small Tablet.

If your form factor is an All-In-One or Clamshell and it demands your bezel size to be smaller, it can be as small as 7 mm.

If there is a non-active border, the allowable maximum and minimum applies to the combination of the bezel area and non-active border area.
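The size limits above can be gathered into a small check, sketched below. It only compares the combined bezel and non-active border width against the 26 mm maximum and the 7 mm floor; it does not verify the form-factor conditions under which 7 mm is allowed, and the function name and return strings are hypothetical.

```python
MAX_BEZEL_MM = 26.0          # maximum stated above
RECOMMENDED_BEZEL_MM = 17.5  # recommended size stated above
MIN_BEZEL_MM = 7.0           # smallest size allowed in the special cases above


def check_bezel(bezel_mm, non_active_border_mm=0.0):
    """Classify the combined bezel plus non-active border width."""
    total = bezel_mm + non_active_border_mm
    if total > MAX_BEZEL_MM:
        return "too large"
    if total < MIN_BEZEL_MM:
        return "too small"
    return "ok"
```

For example, a 20 mm bezel with an 8 mm non-active border totals 28 mm and is reported as too large, even though the bezel alone is within the limit.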

Contact geometry

Contact geometry data is used throughout the operating system to disambiguate touch targets and improve users’ ability to reliably hit on-screen UI. Applications make use of geometry in painting scenarios and they can use the contact geometry to implement palm rejection or palm detection scenarios in their application layer.

The digitizer must report contact geometry. T (x,y), C (x,y), h’, and w’ are the required variables that represent contact geometry.

HID requirements

The table summarizes the HID requirement to report contact geometry. The details are available in Windows Pointer Device Data Delivery Protocol.

Variable | Description
Tx, Ty | Intended target coordinate.
Cx, Cy | Center-of-mass X/Y coordinate of the finger contact area. At the same time, this is the center of the rectangle bounding the finger contact area. Can be equal to T.
h’ | Height of the oriented (pink) rectangle bounding the finger contact area. Reported in physical units (mm). This has an accuracy requirement of +/- 1 mm. Do not report zero.
w’ | Width of the oriented (pink) rectangle bounding the finger contact area. Reported in physical units (mm). This has an accuracy requirement of +/- 1 mm. Do not report zero.
Angle | Counterclockwise angle of the oriented (pink) rectangle against the X axis. Range [0, 360). Optional variable. If omitted, the rectangle is assumed to be axis-aligned.


This figure shows the parameters to represent a contact.

Contact parameters
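The contact geometry variables can be gathered into a small record with the stated constraints checked, as in the sketch below. The field names and the `problems` method are hypothetical, and the sketch says nothing about how the values are encoded in HID reports.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ContactGeometry:
    tx: float          # intended target X
    ty: float          # intended target Y
    cx: float          # center of the contact area, X
    cy: float          # center of the contact area, Y
    height_mm: float   # h', must not be zero
    width_mm: float    # w', must not be zero
    angle: Optional[float] = None  # None means axis-aligned

    def problems(self):
        """Return a list of constraint violations (empty if none)."""
        issues = []
        if self.height_mm <= 0:
            issues.append("height must not be zero")
        if self.width_mm <= 0:
            issues.append("width must not be zero")
        if self.angle is not None and not (0 <= self.angle < 360):
            issues.append("angle must be in [0, 360)")
        return issues
```

A contact with `angle=None` is treated as axis-aligned, matching the optional nature of the angle variable.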

Cover glass considerations

For tablet systems which include Slates and Convertible Notebooks, Microsoft recommends that the cover glass application be one of the following:

  • A one glass solution (OGS)
  • A discrete cover glass application in which the glass serves as a protective display cover situated on top of the touch sensing layer, but not as a physical carrier or substrate for the touch sensing layer itself.

The following glass cover design characteristics are highly recommended:

  • The cover glass should be chemically strengthened to exhibit non-frangible behavior where the glass does not energetically fragment into a large number of small pieces when impacted with sufficient penetration force.
  • Particular care should be taken to strengthen the edge area of the glass and holes and/or slots should be used in the machined glass cover parts.
  • The cover glass should be scratch and smudge/fingerprint resistant.
  • The glass bond should be sufficient to prevent pooling (ripple effects around the contact) on touch.
  • Latency of touch reports should be reduced to provide an optimal Windows touch user experience.
  • It is important to be able to slide a finger smoothly across the glass.


To provide better servicing and compatibility with future developments in touch technologies, touch controller firmware must be updatable in the field. Touch controllers should comply with the HID/I2C and/or HID/USB protocol.

Device field firmware update (FFU) capability is an HCK requirement as it is an essential mechanism for allowing devices to receive critical bug fixes and potential feature additions. Touch devices are required to be HID compliant which means that device specific fixes cannot be undertaken via a driver update and hence OEMs will rely on targeted device firmware updates.

The best-in-class Windows systems utilize UEFI to perform device firmware update via a UEFI capsule. There are significant advantages in this mechanism in that system firmware, ACPI, bus controller firmware, and device firmware dependencies can all be addressed via a single package which is applied atomically. The role of the input device, be it a touch, pen, or precision touchpad device, is to expose an interface for pushing the update payload to the device. This interface is then leveraged by the UEFI implementation.

It is highly recommended that the interface be based on a vendor specific top-level HID collection specifically used for host transfers of firmware payloads to the device. This collection would define an output report optimally sized for payload transfers to the device (see below).

Irrespective of interface implementation, it is essential that device firmware/bootloaders are constructed to recover from power loss during firmware update. If a firmware update does not complete successfully, the device should revert to the previous version of firmware on power cycle.
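One way to get the recovery behavior described above is a dual-slot layout, where the new image is written to an inactive slot and only becomes active after it has been fully written and verified. The sketch below simulates that idea; the dual-slot approach itself, the class name, and the `power_lost_after` parameter are assumptions, not a design prescribed by this documentation.

```python
class DualSlotFirmware:
    """Simulates a device that keeps the old image until the new one is verified."""

    def __init__(self, current_image):
        self.slots = [current_image, None]
        self.active = 0  # index of the slot the device boots from

    def update(self, new_image, power_lost_after=None):
        """Write new_image to the inactive slot.

        power_lost_after: simulate power loss after this many bytes.
        """
        target = 1 - self.active
        if power_lost_after is None:
            written = new_image
        else:
            written = new_image[:power_lost_after]
        self.slots[target] = written
        if written == new_image:   # stands in for a checksum or signature check
            self.active = target   # commit only after a complete, verified write
            return True
        return False

    def boot_image(self):
        """Image the device would run after a power cycle."""
        return self.slots[self.active]
```

Because the active slot is switched only as the final step, an interrupted update leaves the device booting the previous firmware.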

HID report model


The following tables summarize the hardware features of a multi-touch solution.

Feature | Requirement or recommendation
Power Draw (7” – 15.6”) | The maximum power draw for a digitizer/controller combination. Active: The state where the touch controller is fully powered and functioning per device requirements. Idle: The state transitioned to from 'Active' when the touch controller has not received input for a specified period of time. Off: The state where the touch controller is powered down. For details, refer to power-handling-windows8-touch-controllers.docx.
Communication Bus | I2C and USB only. Connected Standby systems must support I2C. If I2C, must be independent bus for digitizer and required to meet Device.Digitizer.Touch.ResponseLatency requirement. If USB, Digitizer’s USB Hub must be separate from USB Hub for Storage and must meet selective suspend requirement as described in the documentation earlier.
Contact Geometry | Digitizer must report contact geometry as described earlier in this documentation.
Palm Detection | Digitizer must report Palm as one contact with contact geometry and confidence bit.
Velocity Data | Digitizer must report velocity data as described in the Performance considerations section of this documentation.

Required to meet all of Device.Digitizer.Base and Device.Digitizer.Touch requirements. Refer to WHCR.


Related topics

Building great touch systems
Touch firmware development: Talking HID to Windows
Building great Windows 8 systems
Make Great Windows Store Apps that are Touch Optimized using HTML5
Build advanced touch apps
Designing Windows Store apps that are touch-optimized
Make Great Touch Apps using XAML
Building and delivering a great Windows Store app for your device
Windows Pointer Device Data Delivery Protocol
Windows Touch Drivers
Windows Hardware Certification
Windows 8 Hardware Certification Requirements and Policies
Touch in Windows 8