Dr. Charles B. Kreitzberg and Ambrose Little
Usability testing is easy to do, and it yields a great deal of information. If you have never done it, you will be amazed at how valuable a tool it can be in clarifying what works and what doesn’t. Watching people interact with your designs and listening to them describe their mental processes when they run into problems gives you a lot of clues about the fit between the UI and users’ mental models.
Watching usability tests will not only improve your UIs but sharpen your UI skills as well. In his classic book The Trouble with Computers (MIT Press, 1995), Thomas K. Landauer recounts an experiment by Jakob Nielsen in which developers (computer science students) were asked which of two UIs created for a product they would recommend to management. One UI was known to be highly usable as a result of extensive testing, while the other was an early version with a lot of usability problems. The developers were divided evenly, essentially choosing at random. A second group of developers observed usability testing with four participants (two on each UI), and eight out of 10 were able to pick the better UI. Landauer concludes, “Even without usability expertise, someone who has tested just four users, two on each version, can usually make the right choice” (page 315).
An alternative to usability testing is “inspection” or “heuristic review.” Inspection methods are analogous to code walkthroughs or reviews. They are economical, fast, and effective. The problem with inspection methods is that the reviewers need to be able to recognize usability problems when they encounter them, and observing usability testing is a good way to build these skills. The classic book on inspection methods is Nielsen and Mack’s Usability Inspection Methods (Wiley, 1994). A team-based approach to inspection that uses scenarios to guide the process is described in Lucy Lockwood and Larry Constantine’s article, “Usability by Inspection” (foruse.com/articles/inspections2003.pdf). Inspection methods can be useful tools throughout the development process, but they do not carry the same credibility as usability testing.
So usability testing is good for your UI and good for your professional development. How do you do it?
The core process is straightforward: you place users in front of the UI and ask them to perform specific tasks. Then you observe their interaction with the UI and determine where problems exist. The simplicity of the paradigm is apparent from Figure 1.
As long as you follow the basic paradigm, there are a lot of variations you can introduce. I’ve run multiday tests in laboratories with one-way mirrors and complex video equipment, as well as 15-minute minitests in which we enlisted someone in the hallway and ducked into a conference room with a paper prototype of the UI. I’ve also conducted many usability tests over the Web where I’ve never met the participants face to face.
Figure 1 Usability Tests Are Simple
You don’t need a fully functioning UI to get good information. You can test with technology as simple as paper and pencil sketches of the screens. With these paper prototypes, you can have the user point to controls and flip the pages to simulate navigation. You can create bitmaps with hot spots, or you can create UI prototypes with simulated databases and transactions.
Some developers might be uncomfortable with the need to recruit participants and facilitate the testing sessions. In addition, you generally shouldn’t test designs when you have a stake in the outcome because of the risk of subtly influencing the users and introducing bias. Finally, usability testing can add another set of tasks to juggle.
If these are problems for you, there are many consultants available to help or to undertake the complete job. If you have a budget, you might want to consider getting outside help. Don’t let the logistics or concerns about your skill set keep you from using this important design tool.
One of the most frequent questions I’m asked is how many participants are needed. The number, of course, depends on circumstances. Conventional wisdom suggests (and my experience strongly supports) that for run-of-the-mill tests during development, you need six participants to get useful results. Statistical theory suggests a larger group would be more appropriate. It’s true that a group of six might miss some issues, but in test after test, I find that after four users I’ve identified most of the issues, and additional participants rarely provide much new information. Often I’m working with mockups that have limited functionality or even screen drawings (paper prototypes) that have no interaction capability, and limited functionality also limits how much testing you can do. So typically, I’ll recruit eight participants for one-hour sessions, expecting that a couple will fail to show. This lets me get a test done in a day. When I’m conducting a major test (for example, design for a corporate Web site) and I need to report quantitative results back to management, I double the number of participants and conduct the test over two days.
What is critical is that the participants be asked to complete specific tasks. I frequently encounter people who tell me that they tested the usability of their software by showing it to users and asking them how they liked it. Sorry, but usability tests are not opinion polls. It’s useful to ask for the users’ impressions of the product, usually at the end of the test, but the test itself must be task based if you want to get useful results.
A similar error is showing users a Web site and asking them to explore it. Again, there can be a lot of value in this type of task, and I’ll often use it at the beginning of a test to learn what participants find compelling, but it’s still important to construct specific tasks that exercise the elements of the UI that you want to test. It’s usually impossible to test every UI element in a product, so you have to decide which tasks are the most important.
Of course, the specific tasks that you construct will differ depending on the site or application you are designing. Early-stage testing typically focuses on the high-level navigation and information architecture.
In the early stages of a project, and when a difficult UI problem occurs, you might also use the testing to decide which of two designs is better. I often use a “repeated measures” design, where I ask the participant to complete similar tasks on two competing UIs. The value of repeated measures is that by using the same participant with both UIs, you keep human variation to a minimum (and you have to recruit only half as many participants). The downside is that in a repeated measures design there might be practice effects. That is, working on the first UI might affect the participant’s performance on the second. To deal with practice effects, you should always counterbalance the order in which the participants are introduced to the two UIs. In fact, you might want to consider counterbalancing any time you think that the experience of testing one task might affect performance on a subsequent task. Deborah Mayhew has a good discussion of some of the issues around task definition in her article, “Usability Testing: You Get What You Pay For” (taskz.com/ucd_usability_testing_indepth.php).
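The diminishing returns described above are often modeled, outside this article, with Nielsen and Landauer’s problem-discovery formula: the expected share of problems found by n participants is 1 - (1 - p)^n, where p is the chance that any one participant exposes a given problem. The sketch below assumes p = 0.31, a frequently cited average; it is an illustration of the model, not the author’s method, and real values of p vary widely by product, task, and participant pool.

```python
def proportion_found(n_participants: int, p: float = 0.31) -> float:
    """Expected share of usability problems found by n participants.

    Assumes every problem has the same, independent probability p of being
    exposed by any single participant (the Nielsen-Landauer simplification).
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability between 0 and 1")
    return 1.0 - (1.0 - p) ** n_participants


if __name__ == "__main__":
    for n in (1, 2, 4, 6, 8, 12):
        print(f"{n:2d} participants: {proportion_found(n):.0%} of problems")
```

Under this model and the assumed p, four participants would be expected to surface about 77 percent of the problems and six about 89 percent; a smaller p pushes those numbers down and argues for a larger group.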
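Counterbalancing amounts to alternating which UI each participant sees first, so that any practice effect is spread evenly across both designs. A minimal sketch, with hypothetical participant IDs and UI labels:

```python
from itertools import cycle


def counterbalance(participants: list[str], ui_a: str = "Design A",
                   ui_b: str = "Design B") -> dict[str, tuple[str, str]]:
    """Assign each participant an order of the two UIs, alternating A-B and B-A.

    With an even number of participants, each order is used equally often,
    so a practice effect favors neither design.
    """
    orders = cycle([(ui_a, ui_b), (ui_b, ui_a)])
    # zip stops at the end of the participant list, so cycle() never runs away.
    return {person: order for person, order in zip(participants, orders)}


if __name__ == "__main__":
    schedule = counterbalance(["P1", "P2", "P3", "P4", "P5", "P6"])
    for person, (first, second) in schedule.items():
        print(f"{person}: {first} first, then {second}")
```

The same alternation can be applied to task order when one task might influence performance on the next.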
In a typical usability test, the participant works with a facilitator. The facilitator is responsible for conducting the test and helping the participant stay on track without offering too much guidance. When possible, the facilitator should be a neutral party, not the designer, developer, or a business stakeholder. The facilitator must provide enough instruction to keep the test moving but be careful not to bias the results.
For consistency, the facilitator should work from a script that lays out how the user will go through the tasks. It’s not necessary to write out the entire script word for word; the scripts I use are typically written as bullet points. In the script, I include the test setup and the instructions to be communicated to the participant so that each participant starts the test with the same context. I note each task in the script along with the instructions I will give to the participant. I also include notes on key questions I want to ask. Finally, I include any post-test questions I want to ask. Creating a script ensures that everyone participating has a smooth, comparable experience. Asking the same questions of each participant makes note taking easier and simplifies the process of aggregating responses for the final report.
In the tests I conduct, I ask the participant to think aloud. The goal is to gain insight into the participant’s cognitive processes as they relate to task performance. Thinking aloud is not natural for most people, so the facilitator must gently ask questions and probe issues without introducing bias.
At the beginning of a test, I tell the participant that we are testing the software, not them, and that any problems they experience are areas where we need to consider changes. Of course, people are still easily embarrassed by errors that they might make, so I reinforce throughout the test that the problems are with the UI.
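Because every participant works from the same script, observations can be keyed by task and tallied for the report. A sketch, assuming a hypothetical note format of (participant, task, observation) tuples rather than any particular tool’s export:

```python
from collections import defaultdict


def issues_by_task(notes: list[tuple[str, str, str]]) -> dict[str, dict[str, set[str]]]:
    """Group observations as task -> issue -> set of participants who hit it.

    A set is used so that a participant who hits the same issue twice
    is counted once.
    """
    summary: dict[str, dict[str, set[str]]] = defaultdict(lambda: defaultdict(set))
    for participant, task, issue in notes:
        summary[task][issue].add(participant)
    return summary


if __name__ == "__main__":
    notes = [
        ("P1", "Find order status", "Missed the Orders link"),
        ("P2", "Find order status", "Missed the Orders link"),
        ("P2", "Update address", "Unsure whether changes were saved"),
        ("P3", "Find order status", "Missed the Orders link"),
    ]
    for task, issues in issues_by_task(notes).items():
        print(task)
        # Most widespread issues first.
        for issue, who in sorted(issues.items(), key=lambda kv: -len(kv[1])):
            print(f"  {len(who)} participant(s): {issue}")
```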
When a participant encounters a problem, I gently probe to understand what the participant was thinking and where the confusion occurred. But I’m careful not to distract the participant from the task list. And I am always careful not to react to anything the participant says or does.
As the participant goes through the tasks that you’ve set up, you should be watching in an unobtrusive way. There are software products available, such as TechSmith’s Morae (techsmith.com), that enable you to capture screens and mouse movements along with a video of the user’s face. What’s nice about a product like Morae is that it can distribute the video to other networked computers so that stakeholders can observe the test remotely. Remote observers can place bookmarks in the video to flag potentially important or interesting interactions for later review.
It is valuable to create a video of the test that captures the screen and the participant’s facial expressions, which are often revealing. If you are using software like Morae, the video will be captured automatically.
An alternative is a clever, low-cost design suggested by Bob Hurst of Southwest Airlines. His setup uses a laptop connected to a second monitor with a single video camera capturing both screen activity and facial expressions. You can see it at usabilityprofessionals.org/upa_publications/upa_voice/volumes/2005/february/low_cost_usability.html.
When I am testing a Web application or Web site, I might use remote usability testing. It’s fast and easy because the participant can work from his or her own computer. There is specialized software available for remote usability testing, but I often just use Web conferencing software to share the user’s desktop. I can see the mouse movements and capture the session by recording the Web conference. What you don’t get with remote usability testing is the ability to view (and record) facial expressions. I find that I’m using remote usability testing a lot because of its low cost and convenience, but it can be difficult for an inexperienced facilitator. It’s often better to learn the techniques face to face.
Whether you are testing in person or remotely, if you can record video in some fashion, it’s generally useful to do so. Sometimes it’s worthwhile to go back and confirm what a participant actually said. It’s also useful to make brief video extracts into a highlights tape that can be shown to other developers or to management to help them understand what the test revealed. A well-designed highlights tape can go a long way toward helping you make your points.
There are a number of quantitative measures that you can record: success vs. failure, number of errors, time to complete tasks, and such. You can also ask users to rate their opinions on a numerical scale.
For me, much of the value of usability testing is qualitative: observing the interactions and probing the participants’ thinking. After the tasks are completed, I initiate a discussion with the participants and try to learn as much as possible about their subjective experience, what they liked and disliked, and what would have made the product more compelling, useful, or easier to use.
When I’m facilitating a usability test, I focus closely on the participant. That makes it difficult to take notes, so I usually work with a note taker who is responsible for documenting actions and comments that will find their way into the report and recommendations. I set up a video feed for the note taker so that no one is in the room except the participant and me, which avoids distracting the participant or creating anxiety.
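If observers bookmark moments during a session, those bookmarks are a natural starting point for a highlights tape. As a sketch, assuming bookmarks are plain timestamps in seconds (not any specific tool’s format) and that the padding before and after each one is a hypothetical choice, you can turn them into clip ranges and merge clips that overlap:

```python
def highlight_clips(bookmarks: list[float], before: float = 10.0,
                    after: float = 20.0) -> list[tuple[float, float]]:
    """Turn bookmark timestamps into merged (start, end) clip ranges in seconds."""
    clips: list[tuple[float, float]] = []
    for t in sorted(bookmarks):
        start, end = max(0.0, t - before), t + after
        if clips and start <= clips[-1][1]:
            # Overlaps the previous clip: extend it instead of starting a new one.
            clips[-1] = (clips[-1][0], max(clips[-1][1], end))
        else:
            clips.append((start, end))
    return clips


if __name__ == "__main__":
    print(highlight_clips([5, 30, 400]))
```

Merging keeps a burst of bookmarks around one confusing moment from producing several choppy, overlapping clips.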
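The quantitative measures listed above are simple to compute once each attempt is recorded consistently. A minimal sketch, assuming a hypothetical TaskResult record per participant and task; reporting time on task over successful attempts only is one common convention, not the only one:

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class TaskResult:
    participant: str
    task: str
    succeeded: bool
    errors: int
    seconds: float


def summarize(results: list[TaskResult], task: str) -> dict[str, float]:
    """Success rate, mean errors, and mean time on task for one task."""
    rows = [r for r in results if r.task == task]
    if not rows:
        raise ValueError(f"no results for task {task!r}")
    successes = [r for r in rows if r.succeeded]
    return {
        "success_rate": len(successes) / len(rows),
        "mean_errors": mean(r.errors for r in rows),
        # Time is averaged over successful attempts only; NaN if nobody succeeded.
        "mean_seconds_successful": (
            mean(r.seconds for r in successes) if successes else float("nan")
        ),
    }


if __name__ == "__main__":
    results = [
        TaskResult("P1", "Checkout", True, 0, 95.0),
        TaskResult("P2", "Checkout", True, 2, 140.0),
        TaskResult("P3", "Checkout", False, 4, 300.0),
        TaskResult("P4", "Checkout", True, 1, 110.0),
    ]
    print(summarize(results, "Checkout"))
```

With samples as small as the ones discussed earlier, these numbers are best read as indicators that support the qualitative findings rather than as precise estimates.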
How much reporting I do depends on how I am sharing the results. For some companies, it’s important to produce extensive written reports. In most cases, I present results using PowerPoint. A report will typically cover the following:
If you are ready to try usability testing, here is a list of the steps to help you get started. Edit them as needed to suit your own situation:
There are several great introductions to usability testing that are worth a read. Natalie Downe has posted a nice summary, “Fast and Simple Usability Testing,” which can be found at 24ways.org/2006/fast-and-simple-usability-testing. A more formal approach was presented by the University of Texas at Austin: utexas.edu/learn/usability/index.html. Jared Spool has produced a list of “Seven Common Usability Testing Mistakes” at uie.com/articles/usability_testing_mistakes/.
Try usability testing. You will be amazed at how this simple process can improve the quality of your products.
DR. CHARLES KREITZBERG is CEO of Cognetics Corp. (cognetics.com), which offers usability consulting and user experience design services. His passion is creating intuitive interfaces that engage and delight users while supporting the product’s business goals. Charles lives in Central New Jersey, where he moonlights as a performing musician.
Charlie mentioned “Usability by Inspection” by Lockwood and Constantine. I saw Mr. Constantine present this method at SD Best Practices East a while back, and it really resonated with me as a way for teams that can’t or won’t invest in actual usability testing to get partway toward finding usability issues.
Charlie mentions that one of the difficulties with most inspection/review methods is that they rely on the expertise of the person doing the inspection. The nifty thing about Constantine’s method is that it seems to lie in between that sort of expert review and usability testing. It does this by using a collaborative approach in which people not familiar with the software work through specific scenarios. As the “user” works through the scenarios, the other members of the inspection team help identify and record apparent usability issues.
Constantine himself says in the referenced paper that his method is not a substitute for usability testing but a complementary tool. He also compares it to other inspection methods, noting relative pros and cons, but it seems to me that if you can’t, for whatever reason, do real usability testing, this approach is a great way to start identifying many issues that you would not otherwise discover via standard development testing.
While, as Charlie notes, you can get valuable insights by just testing paper prototypes or mock-up click-throughs with hot spots, you inevitably come to a point when you want or need to start building higher-fidelity prototypes, perhaps even the “real thing.” As you increase fidelity, the cost of making changes increases, of course. That’s why it’s important to test early and often, with the hope of cutting out any major design flaws before investing too much.
At the same time, to really refine and optimize the experiences you’re designing for, you need to continue usability testing as you increase fidelity. In fact, the higher the fidelity, the closer you get to the real end experience, and that’s what people will be dealing with in your final product, not sketches on paper. So just as the information you observe in usability testing becomes more accurate and focused, your participants become more engaged and better able to give accurate, focused feedback. This has been borne out in Agile theory and practice for a long time. Even though usability testing and user experience were not necessarily top of mind, we’ve found (and it makes common sense) that folks provide better feedback once they have something real to engage with.
So what does this mean for a project team that buys into the idea of improving experiences through usability testing? It means that you need to plan not only for working it in logistically as part of your project plan, but also for technically designing your solution to minimize the cost of changes to the interaction layer.
If you don’t think about changes and plan for them, you may not leave enough time for the testing itself or for incorporating the resulting feedback, or you may make things too costly to change even if you do leave time for it. Any of these outcomes makes it that much easier to fall back into leaving out usability testing (likely the status quo already) or, worse, to burn resources on it without being able to make the needed changes because somebody’s “something important” is on the line.
For planning an iterative product schedule, you should think about leaving time at the end of each iteration—not just for stakeholder validation but for usability validation as well. You then need to incorporate feedback into your iteration/sprint planning for the next iteration/sprint.
I think “interaction layer” is a better term than “user interface layer” because it changes the thinking from the common “this thing we slap on top of our internal architecture to give the user a view into our (beautiful) system” to “what people interact with as part of their larger lived experience.” In other words, I think it pushes us more into thinking about our systems as a (small) part of people’s lives, rather than people being a small part of what we’re concerned with.
In terms of technical design, I think planning for usability testing calls for a more nuanced approach to coupling within the UI itself. It’s rote best practice these days to talk about separating layers based on higher-level concerns (for example, UI/presentation, business/domain, data/persistence), but too often the UI layer becomes a boilerplate implementation of the Work With pattern (in Quince), surfacing the domain or database model more or less directly. Generally speaking, that’s a very suboptimal approach to interaction layer design, but specifically in the context of designing for usability testing, it lets us fall into a simplistic and tightly coupled approach to development within the interaction layer itself.
Thankfully, in recent years, there has been growing enlightenment in this area, and we now see broader usage of design patterns that address, in a somewhat tangential way, coupling in the UI. My favorite among these so far is the Model-View-ViewModel (MVVM) pattern, and chiefly the View Model (VM) part of that, because in theory it better enables designing the interaction layer in ways that make sense to people while still providing for a more rationalized domain model.
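To make the view model’s role concrete, here is a minimal, framework-free sketch. Real MVVM implementations usually rely on a UI framework’s data binding, which is omitted here, and the Customer model and its properties are hypothetical. The point is that the view model reshapes domain data into what one screen needs, so a usability finding about how names or dates should be presented touches only the view model and view, never the domain model.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Customer:
    """Model: shaped by the domain, not by any particular screen."""
    given_name: str
    family_name: str
    last_order: Optional[date] = None


class CustomerSummaryViewModel:
    """ViewModel: exposes exactly what one view needs, in the view's terms."""

    def __init__(self, customer: Customer) -> None:
        self._customer = customer

    @property
    def display_name(self) -> str:
        # If testing shows people expect "Family, Given", only this line changes.
        return f"{self._customer.given_name} {self._customer.family_name}"

    @property
    def last_order_text(self) -> str:
        if self._customer.last_order is None:
            return "No orders yet"
        return f"Last order: {self._customer.last_order.isoformat()}"


def render_summary(vm: CustomerSummaryViewModel) -> str:
    """View: a stand-in for real UI markup; it knows only the view model."""
    return f"{vm.display_name}\n{vm.last_order_text}"


if __name__ == "__main__":
    ada = Customer("Ada", "Lovelace", date(2024, 3, 5))
    print(render_summary(CustomerSummaryViewModel(ada)))
```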
The funny thing is that we came to the MVVM pattern out of our concern to isolate the underlying beloved domain models from pesky UI changes, but I’d suggest that a better way to think about it is that it enables us to more adroitly design for better and better experiences, which in the business world usually translates to increased customer and/or employee happiness, satisfaction, and loyalty; operational efficiency; and ultimately, a healthier bottom line.
The MVVM pattern goes some way toward facilitating changes to the interaction layer. The worst case is that you toss out the V and VM entirely for a particular set of interactions. More commonly, though (if you’ve done usability testing on lower-fidelity prototypes), you would need only to tweak those or swap out, say, one implementation of a UX design pattern for another.
Thinking in terms of UX design patterns can help as well, not just in making better design choices initially, which is their most obvious benefit, but in being able to optimize designs based on testing. Using these patterns can help you better identify how to chunk your design, better identify natural boundaries (between patterns that need to play together), and thereby establish the right formalized boundaries in code (à la controls, pages, views, windows, dialogs, and so on). Then, within the context of particular patterns, you can more easily substitute alternative implementations that better suit your context, which would be increasingly informed by usability testing.
This doesn’t mean that usability testing changes will come at no cost. But several practices can help reduce the cost of those changes: planning time for usability testing and for incorporating its results; designing for loose coupling, not only between higher-level application development concerns but specifically within the interaction layer, by separating core functionality, structure and styling, and behavior; and using patterns to inform your design. Lower costs make it more feasible to incorporate usability testing feedback even as you build out the real thing, which ultimately results in solutions that more closely fit the context of your problem domain. This means happier users and better business. And world peace.
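One way to keep a UX design pattern swappable is to code against the role the pattern plays rather than a specific implementation. In this hypothetical sketch, two implementations of a “pick one item from a list” pattern (a dropdown and a type-ahead search) share one interface, so a usability finding can change which one is used without touching the callers. The class names, the text-based render method, and the list-length threshold are illustrative assumptions, not part of any framework.

```python
from typing import Protocol


class SingleSelect(Protocol):
    """The role a 'pick one item' pattern plays, independent of presentation."""

    def render(self, options: list[str]) -> str: ...


class Dropdown:
    def render(self, options: list[str]) -> str:
        return "[v] " + " | ".join(options)


class TypeAheadSearch:
    def __init__(self, max_suggestions: int = 3) -> None:
        self.max_suggestions = max_suggestions

    def render(self, options: list[str]) -> str:
        # Shows only the first few suggestions, as a type-ahead box would.
        return "[search...] " + ", ".join(options[: self.max_suggestions])


def picker_for(option_count: int) -> SingleSelect:
    # Hypothetical threshold; usability testing would inform where it belongs.
    return TypeAheadSearch() if option_count > 15 else Dropdown()


def country_picker(countries: list[str]) -> str:
    """Caller depends only on the SingleSelect role, not a concrete widget."""
    return picker_for(len(countries)).render(countries)
```

If testing showed that people struggle with type-ahead for mid-sized lists, the fix would be a change to picker_for alone.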