Exploratory Software Testing
James Whittaker is a Partner Development Manager at Bing, Microsoft. He is a former engineering director at Google who was responsible for testing Chrome, maps, and Google web apps. James is one of the best-known names in testing the world over and has authored several bestsellers on software testing. He also hosts a popular blog on MSDN.
This topic is an excerpt from the book Exploratory Software Testing: Tips, tricks, tours and techniques to guide test design.
In exploratory testing, testers may interact with the application in whatever way they want and use the information the application provides to react, change course, and generally explore the application’s functionality without restraint. It may seem ad hoc to some, but in the hands of a skilled and experienced exploratory tester, this technique can prove powerful. Advocates argue that exploratory testing allows the full power of the human brain to be brought to bear on finding bugs and verifying functionality without preconceived restrictions.
The drawback to exploratory testing is that testers risk wasting a great deal of time wandering around an application looking for things to test and trying to find bugs. The lack of preparation, structure, and guidance can lead to many unproductive hours and retesting the same functionality over and over. One can easily see that completely ad hoc testing is clearly not the best way to go about testing. Testers who learn about inputs, software environments, and the other things that can be varied during a test pass will be better equipped to explore their application with purpose and intent. This knowledge will help them test better and smarter and maximize their chances of uncovering serious design and implementation flaws.
To gain an understanding of how an application works, what its interface looks like, and what functionality it implements: Such a goal is often adopted by testers new to a project or those who want to identify test entry points, identify specific testing challenges, and write test plans. This is also the goal used by experienced testers as they explore an application to understand the depth of its testing needs and to find new unexplored functionality.
To force the software to exhibit its capabilities: The idea is to make the software work hard and to ask it hard questions that put it through its paces. This may or may not find bugs, but it will certainly provide evidence that the software performs the function for which it was designed and that it satisfies its requirements.
To find bugs: Exploring the edges of the application and hitting potential soft spots is a specialty of exploratory testing. The goal is purposeful, rather than aimless, exploration to identify untested and historically buggy functionality. Exploratory testers should not simply stumble across bugs, they should zero in on them with purpose and intent.
Software testing is complicated by an overload of variation possibilities from inputs and code paths to state, stored data, and the operational environment. Indeed, whether one chooses to address this variation in advance of any testing by writing test plans or by an exploratory approach that allows planning and testing to be interleaved, it is an impossible task. No matter how you ultimately do testing, it’s simply too complex to do it completely.
However, exploratory techniques have the advantage that they encourage testers to plan as they test and to use information gathered during testing to affect the actual way testing is performed. This is a key advantage over plan-first methods. Imagine trying to predict the winner of the Super Bowl or Premier League before the season begins. This is difficult to do before you see how the teams are playing, how they are handling the competition, and whether key players can avoid injury. The information that comes in as the season unfolds holds the key to predicting the outcome with any amount of accuracy. The same is true of software testing, and exploratory testing embraces this by attempting to plan, test, and replan in small ongoing increments guided by full knowledge of all past and current information about how the software is performing and the clues it yields during testing.
Exploratory testing is especially suited to modern web application development using agile methods. Development cycles are short, leaving little time for formal script writing and maintenance. Features often evolve quickly, so minimizing dependent artifacts (like pre-prepared test cases) is a desirable attribute. If the test case has a good chance of becoming irrelevant, why write it in the first place? Are you not setting yourself up for spending more time maintaining test cases than actually doing testing?
(For examples of the agile tools for exploratory testing in Visual Studio and TFS, see Exploratory testing using Microsoft Test Manager, Testing Windows Metro style apps Running on a Device Using the Exploratory Test Window, and Running Tests.)
It isn’t necessary to view exploratory testing as a strict alternative to script-based manual testing. In fact, the two can co-exist quite nicely. Having formal scripts can provide a structure to frame exploration, and exploratory methods can add an element of variation to scripts that can amplify their effectiveness. The best way that I have found to combine the two techniques is to start with formal scripts and use exploratory techniques to inject variation into them. This way, a single script may end up being translated into any number of actual exploratory test cases.
Traditional script-based testing usually involves a starting point of user stories or documented end-to-end scenarios that we expect our eventual users to perform. These scenarios can come from user research, data from prior versions of the application, and so forth, and are used as scripts to test the software. The added element of exploratory testing to traditional scenario testing widens the scope of the script to inject variation, investigation, and optional user paths.
Scenario-based exploration will cover cases that simple scenario testing will not and more accurately mimics real users, who often stray from the main scenario: After all, the product allows many possible variations. We should not only expect that they get used, we should test that they will work.
The idea behind scenario-based exploratory testing is to use existing scenarios much as real explorers use a map to guide themselves through a wilderness or other unfamiliar terrain. Scenarios, like maps, are a general guide about what to do during testing, which inputs to select, and which code paths to traverse, but they are not absolutes. Maps may describe the location of your destination but offer multiple ways to get there. Likewise, the exploratory tester is offered alternate routes and even encouraged to consider a wide range of possible paths when executing a scenario. In fact, that’s the exact purpose of this form of exploratory testing: to test the functionality described by the scenario, adding as much variation as possible. Our “map” isn’t intended to identify the shortest route, it’s intended to find many routes. The more we can test, the better; this leads to more confidence that the software will perform the scenario robustly when it is in the hands of users who can and will deviate from our expectations.
In general, a useful scenario will do one or more of the following:
Tell a user story
Describe a requirement
Demonstrate how a feature works
Demonstrate an integration scenario
Describe setup and installation
Describe cautions and things that could go wrong
Exploratory testers should work hard to ensure they gather as many scenarios as possible from all of these categories. It is then our task to follow the scenarios and inject variation as we see fit. It is how we choose to inject this variation that makes this task exploratory in nature and that is the subject we turn to next.
(For an example of using exploratory testing using the agile tools in Visual Studio and TFS, see How to: Start an Exploratory Test Session in Microsoft Test Manager.)
Suppose you are visiting a large city like London, England, for the very first time. It’s a big, busy, confusing place for new tourists, with lots of things to see and do. Indeed, even the richest, most time-unconstrained tourist would have a hard time seeing everything a city like London has to offer. The same can be said of well-equipped testers trying to explore complex software; all the funding in the world won’t guarantee completeness.
Tourism benefits from a mix of structure and freedom, and so does exploratory testing. There are many touring metaphors that will help us add structure to our exploration and get us through our applications faster and more thoroughly than freestyle testing alone. Many of these tours fit into a larger testing strategy and can even be combined with traditional scenario-based testing that will determine exactly how the tour is organized.
Any discussion of test planning needs to begin with decomposition of the software into smaller pieces that are more manageable. But testing features independently may preclude finding bugs that surface only when features interact with each other. Fortunately, the tourist metaphor insists on no such decomposition. Instead, it suggests decomposition based on intent rather than on any inherent structure of the application under test. Like a tourist who approaches her vacation with the intent to see as much as possible in as short a period of time as possible, so the tester will also organize her tours. An actual tourist will select a mix of landmarks to see and sites to visit, and a tester will also choose to mix and match features of the software with the intent to do something specific. This intent often requires any number of application features and functions to be combined in ways that they would not be if we operated under a strict feature testing model.
Guidebooks for tourists identify the best hotels, the best bargains, and the top attractions, without going into too much detail or overwhelming a tourist with too many options. The analogous artifact for exploratory testing is the user manual, whether it is printed or implemented as online help (in which case, I often call this the F1 tour to denote the shortcut to most help systems). For this tour, we will follow the user manual’s advice just like the wary traveler, by never deviating from its lead.
Every location that covets tourists must have some good reasons for them to come. For Las Vegas, it’s the casinos and the strip, and for Egypt it’s the pyramids. For exploratory testers finding the money features leads directly to the sales force. Sales folk spend a great deal of time giving demos of applications and are a fantastic source of information for the Money tour. To execute the tour, simply run through the demos yourself and look for problems. As the product code is modified for bug fixes and new features, it may be that the demo breaks and you’ve not only found a great bug, but you’ve saved your sales force from some pretty serious embarrassment.
As a boy growing up in the fields, meadows, and woods of Kentucky, I learned to use a compass by watching my older brother. The process was simple. Use the compass to locate a landmark (a tree, rock, cliff face, and so forth) in the direction you want to go, make your way to that landmark, and then locate the next landmark, and so on and so forth. As long as the landmarks were all in the same direction, you could get yourself through a patch of dense Kentucky woods.
The Landmark tour for exploratory testers is similar in that we will choose landmarks and perform the same landmark hopping through the software that we would through a forest. Choose a set of landmarks, decide on an ordering for them, and then explore the application going from landmark to landmark until you’ve visited all of them in your list. Keep track of which landmarks you’ve used and create a landmark coverage map to track your progress.
I was once on a walking tour of London in which the guide was a gentleman in his fifties who claimed at the outset to have lived in London all his life. A fellow tourist happened to be a scholar who was knowledgeable in English history and was constantly asking hard questions of the guide. He didn’t mean to be a jerk, but he was curious, and that combined with his knowledge ended up being a dangerous combination … at least to the guide. When applied to exploratory testing, this tour takes on the approach of asking the software hard questions. How do we make the software work as hard as possible? Which features will stretch it to its limits? What inputs and data will cause it to perform the most processing? Which inputs might fool its error-checking routines? Which inputs and internal data will stress its capability to produce any specific output?
FedEx is an icon in the package-delivery world. They pick up packages, move them around their various distribution centers, and send them to their final destination. For this tour, instead of packages moving around the planet through the FedEx system, think of data moving through the software. During this tour, a tester must concentrate on this data. Try to identify inputs that are stored and “follow” them around the software. For example, when an address is entered into a shopping site, where does it gets displayed? What features consume it? If it is used as a billing address, make sure you exercise that feature. If it is used as a shipping address, make sure you use that feature. If can be updated, update it. Does it ever get printed or purged or processed? Try to find every feature that touches the data so that, just as FedEx handles their packages, you are involved in every stage of the data’s life cycle.
Those who collect curbside garbage often know neighborhoods better than even residents and police because they go street by street, house by house, and become familiar with every bump in the road. However, because they are in a hurry, they don’t stay in one place very long. For software, this is like a methodical spot check. We can decide to spot check the interface where we go screen by screen, dialog by dialog (favoring, like the garbage collector, the shortest route), and not stopping to test in detail, but checking the obvious things (perhaps like the Supermodel tour). We could also use this tour to go feature by feature, module by module, or any other landmark that makes sense for our specific application.
Every city worth visiting has bad neighborhoods and areas that a tourist is well advised to avoid. Software also has bad neighborhoods—those sections of the code populated by bugs. Clearly, we do not know in advance which features are likely to represent bad neighborhoods. But as bugs are found and reported, we can connect certain features with bug counts and can track where bugs are occurring on our product. Because bugs tend to congregate, revisiting buggy sections of the product is a tour worth taking. Indeed, once a buggy section of code is identified, it is recommended to take a Garbage Collector’s tour through nearby features to verify that the fixes didn’t introduce any new bugs.
Museums that display antiquities are a favorite of tourists. Antiquities within a code base deserve the same kind of attention from testers. In this case, software’s antiquities are legacy code. Older code files that undergo revision or that are put into a new environment tend to be failure prone. With the original developers long gone and documentation often poor, legacy code is hard to modify, hard to review, and evades the unit testing net of developers (who usually write such tests only for new code). During this tour, testers should identify older code and executable artifacts and ensure they receive a fair share of testing attention.
In many peoples’ eye, a good tour is one in which you visit popular places. The opposite of these tours would be one in which you visited places no one else was likely to go. In exploratory testing terms, these are the least likely features to be used and the ones that are the least attractive to users. If your organization tracks feature usage, this tour will direct you to test the ones at the bottom of the list. If your organization tracks code coverage, this tour implores you to find ways to test the code yet to be covered.
Also known as the Clubbing tour, this one is for those folks who stay out late and hit the nightspots. The key here is all night. Exploratory testers on the All-Nighter tour will keep their application running without closing it. They will open files and not close them. Often, they don’t even bother saving them so as to avoid any potential resetting effect that might occur at save time. They connect to remote resources and never disconnect. And while all these resources are in constant use, they may even run tests using other tours to keep the software working and moving data around. If they do this long enough, they may find bugs that other testers will not find because the software is denied that clean reset that occurs when it is restarted.
For this tour, I want you to think superficially. Whatever you do, don’t go beyond skin deep. This tour is not about function or substance; it’s about looks and first impressions. During the Supermodel tour, the focus is not on functionality or real interaction. It’s only on the interface. Take the tour and watch the interface elements. Do they look good? Do they render properly, and is the performance good? As you make changes, does the GUI refresh properly? Does it do so correctly or are there unsightly artifacts left on the screen? If the software is using color in a way to convey some meaning, is this done consistently? Are the GUI panels internally consistent with buttons and controls where you would expect them to be? Does the interface violate any conventions or standards?
There’s always one person on a group tour who just doesn’t participate. He stands in the back with his arms folded. He’s bored, unenergetic, and makes one wonder exactly why he bothered paying for the tour in the first place. A Coach Potato tour means doing as little actual work as possible. This means accepting all default values (values prepopulated by the application), leaving input fields blank, filling in as little form data as possible, never clicking on an advertisement, paging through screens without clicking any buttons or entering any data, and so forth. If there is any choice to go one way in the application or another, the coach potato always takes the path of least resistance.
OCD testers will enter the same input over and over. They will perform the same action over and over. They will repeat, redo, copy, paste, borrow, and then do all that some more. Mostly, the name of the game is repetition. Order an item on a shopping site and then order it again to check if a multiple purchase discount applies. Enter some data on a screen, then return immediately to enter it again. These are actions developers often don’t program error cases for. They can wreak significant havoc.
Developers are often thinking about a user doing things in a specific order and using the software with purpose. But users make mistakes and have to backtrack, and they often don’t understand what specific path the developer had in mind for them, and they take their own. This can cause a usage scheme carefully laid by developers to fall by the wayside quickly.
Testing is complex, but effective use of exploratory techniques can help tame that complexity and contribute to the production of high-quality software.