User Testing
Kinds of User Tests
Here are three common kinds of user tests.
You’ll be doing formative evaluations with the prototypes you build in this class. The purpose of formative evaluation is to find usability problems in order to fix them in the next design iteration. Formative evaluation doesn’t need a full working implementation but can be done on a variety of prototypes. This kind of user test is usually done in an environment that’s under your control, like an office or a usability lab. You also choose the tasks given to users, which are generally realistic (drawn from task analysis, which is based on observation) but nevertheless fake. The results of the formative evaluation are largely qualitative observations, usually a list of usability problems.
A key problem with formative evaluation is that you have to control too much. Running a test in a lab environment on tasks of your invention may not tell you enough about how well your interface will work in a real context on real tasks. A field study can answer these questions, by actually deploying a working implementation to real users, and then going out to the users’ real environment and observing how they use it. We won’t say much about field studies in this class.
A third kind of user test is a controlled experiment, whose goal is to test a quantifiable hypothesis about one or more interfaces. Controlled experiments happen under carefully controlled conditions using carefully-designed tasks—often more carefully chosen than formative evaluation tasks. Hypotheses can only be tested by quantitative measurements of usability, like time elapsed, number of errors, or subjective ratings. We’ll talk about controlled experiments in a future reading.
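To make the idea of quantitative measurement concrete, here is a minimal sketch (not part of the original reading) of how task-completion times from a controlled experiment might be summarized. The interface names and all numbers are invented for illustration:

```python
# Hypothetical data: task-completion times (seconds) measured for two
# interface designs in a controlled experiment. All numbers are invented.
from statistics import mean, stdev

times = {
    "design_a": [48.2, 51.7, 45.9, 60.3, 49.8, 55.1],
    "design_b": [38.4, 41.0, 36.7, 44.2, 39.9, 42.5],
}

for name, samples in times.items():
    print(f"{name}: mean={mean(samples):.1f}s  sd={stdev(samples):.1f}s  n={len(samples)}")

# A real analysis would test the hypothesis (e.g., "design B is faster")
# with an appropriate statistical test, not just eyeball the means.
```

Note that a difference in means alone doesn't confirm a hypothesis; a future reading covers the statistics of controlled experiments.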
Ethics of User Testing
Let’s start by talking about an issue that’s relevant to all kinds of user testing: ethics. Human subjects have been horribly abused in the name of science over the past century. Two of the most egregious cases were the Nazi medical experiments of World War II and the Tuskegee syphilis study, in which the US Public Health Service observed untreated syphilis in Black men for decades, withholding treatment even after penicillin became available.
These cases have led to several reforms. The Nazi-era experiments led to the Nuremberg Code, an international agreement on the rights of human subjects.
Basic Principles (Belmont Report)
The Tuskegee study drove the US government to take steps to ensure that all federally funded institutions follow ethical practices in their use of human subjects. The result was the Belmont Report, which describes three principles: respect for persons, beneficence, and justice.
Institutional Review Boards
In particular, every experiment involving human subjects must be reviewed and approved by an ethics committee, usually called an institutional review board (IRB). The review board is only required to review “research,” however, which is defined as work leading to generalizable knowledge (suitable for publication in a scientific conference or journal). The user testing you’re doing in this class would be characterized instead as practice.
But even though the IRB doesn’t want your paperwork for projects that aren’t research, you should still follow its ethical guidelines.
Pressures on a User
Experiments involving medical treatments or electric shocks are one thing. But what’s so dangerous about a computer interface?
Hopefully, nothing—most user testing has a minimal physical or psychological risk to the user. But user testing does put psychological pressure on the user. The user sits in the spotlight, asked to perform unfamiliar tasks on an unfamiliar (and possibly bad!) interface, in front of an audience of strangers (at least one experimenter, possibly a roomful of observers, and possibly a video camera). It’s natural to feel some performance anxiety, or stage fright. “Am I doing it right? Do these people think I’m dumb for not getting it?” A user may regard the test as a psychology test, or more to the point, an IQ test. They may be worried about getting a bad score. Their self-esteem may suffer, particularly if they blame problems they have on themselves, rather than on the user interface.
A programmer with an ironclad ego may scoff at such concerns, but these pressures are real. Jared Spool, a usability consultant, tells a story about the time he saw a user cry during a user test. It came about from an accumulation of mistakes on the part of the experimenters:
When she started struggling with the first task, everybody in the room realized how stupid the task was, and burst out laughing—at their own stupidity, not hers. But she thought they were laughing at her, and she burst into tears. (Story from Carolyn Snyder, Paper Prototyping)
Treat the User With Respect
The basic rule for user testing ethics is respect for the user as an intelligent person with free will and feelings. The sections that follow describe ways to show that respect before, during, and after a test.
Before a Test
Let’s look at what you should do before, during, and after a user test to ensure that you’re treating users with respect.
Long before your first user shows up, you should pilot-test your entire test: all questionnaires, briefings, tutorials, and tasks. Pilot testing means you get a few people (usually your colleagues) to act as users in a full-dress rehearsal of the user test. Pilot testing is essential for simplifying and working the bugs out of your test materials and procedures. It gives you a chance to eliminate wasted time, streamline parts of the test, fix confusing briefings or training materials, and discover impossible or pointless tasks. It also gives you a chance to practice your role as an experimenter. Never skip this step.
When a user shows up, you should brief them first, introducing the purpose of the application and the purpose of the test. To make the user comfortable, you should also reassure them, in some form, that you’re testing the interface rather than them, that any difficulties they run into are the interface’s fault and will help you improve it, and that they’re free to take a break or stop at any time.
You should also inform the user and get their consent if the test will be audiotaped, videotaped, or watched by hidden observers. Any observers actually present in the room should be introduced to the user.
At the end of the briefing, you should ask “Do you have any questions I can answer before we begin?” Try to answer any questions the user has. Sometimes a user will ask a question that may bias the experiment: for example, “what does that button do?” You should explain why you can’t answer that question, and promise to answer it after the test is over.
During the Test
During the test, arrange the testing environment to make the user comfortable. Keep the atmosphere calm, relaxed, and free of distractions. If the testing session is long, give the user bathroom, water, or coffee breaks, or just a chance to stand up and stretch.
Don’t act disappointed when the user runs into difficulty, because the user will feel it as disappointment in their performance, not in the user interface.
Don’t overwhelm the user with work. Give them only one task at a time. Ideally, the first task should be an easy warmup task, to give the user an early success experience. That will bolster their courage (and yours) to get them through the harder tasks that will discover more usability problems.
Answer the user’s questions as long as they don’t bias the test.
Keep the user in control. If they get tired of a task, let them give up on it and go on to another. If they want to quit the test, pay them and let them go.
After the Test
After the test is over, thank the user for their help and tell them how they’ve helped. At this point it’s easy to be open with information, so answer any questions you had to defer during the test.
Later, if you disseminate data from the user test, don’t publish it in a way that allows users to be individually identified. Certainly, avoid using their names.
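One common way to keep published data from identifying individual users is to replace names with opaque participant IDs. Here is a minimal sketch (not from the original reading; all names, fields, and numbers are invented):

```python
# Hypothetical sketch: replace participant names with opaque IDs
# (P1, P2, ...) before sharing user-test data outside the team.
import itertools

def anonymize(records):
    """Map each distinct participant name to P1, P2, ... in order of appearance."""
    ids = {}
    counter = itertools.count(1)
    out = []
    for rec in records:
        name = rec["name"]
        if name not in ids:
            ids[name] = f"P{next(counter)}"
        out.append({**rec, "name": ids[name]})
    return out

raw = [
    {"name": "Alice", "task": 1, "time_s": 48.2},
    {"name": "Bob",   "task": 1, "time_s": 55.0},
    {"name": "Alice", "task": 2, "time_s": 31.4},
]
print(anonymize(raw))
```

Keep the name-to-ID mapping itself private (or discard it), since publishing it would defeat the purpose.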
If you collected video or audio records of the user test, don’t show them outside your development group without explicit written permission from the user.
Formative Evaluation
OK, we’ve seen some ethical rules that apply to running any kind of user test. Now let’s look in particular at how to do a formative evaluation.
The basic steps are to choose representative tasks and representative users, brief the users, watch them attempt the tasks with your prototype, and record the critical incidents you observe.
Roles in Formative Evaluation
There are three roles in a formative evaluation test: a user, a facilitator, and some observers.
User’s Role
The user’s primary role is to perform the tasks using the interface. While the user is actually doing this, however, they should also be trying to think aloud: verbalizing what they’re thinking as they use the interface. Encourage the user to say things like “OK, now I’m looking for the place to set the font size, usually it’s on the toolbar, nope, hmm, maybe the Format menu…” Thinking aloud gives you (the observer) a window into their thought processes, so you can understand what they’re trying to do and what they expect.
Unfortunately, thinking aloud feels strange for most people. It can alter the user’s behavior, making the user more deliberate and careful, and sometimes disrupting their concentration. Conversely, when a task gets hard and the user gets absorbed in it, they may go mute, forgetting to think aloud. One of the facilitator’s roles is to prod the user into thinking aloud.
One solution to the problems of think-aloud is constructive interaction, in which two users work on the tasks together (using a single computer). Two users are more likely to converse naturally with each other, explaining how they think it works and what they’re thinking about trying. Constructive interaction requires twice as many users, however, and may be adversely affected by social dynamics (e.g., a pushy user who hogs the keyboard). But it’s nearly as commonly used in industry as single-user testing.
Facilitator’s Role
The facilitator (also called the experimenter) is the leader of the user test. The facilitator does the briefing, gives tasks to the user, and generally serves as the voice of the development team throughout the test. (Other developers may be observing the test, but should generally keep their mouths shut.)
One of the facilitator’s key jobs is to coax the user to think aloud, usually by asking general questions.
The facilitator may also move the session along. If the user is totally stuck on a task, the facilitator may progressively provide more help, e.g. “Do you see anything that might help you?”, and then “What do you think that button does?” Only do this if you’ve already recorded the usability problem, the user seems unlikely to get out of the tar pit on their own, and they need to get unstuck in order to reach another part of the task that you want to test. Keep in mind that once you explain something, you lose the chance to find out what the user would have done by themselves.
Observer’s Role
While the user is thinking aloud, and the facilitator is coaching the think-aloud, any observers in the room should be doing the opposite: keeping quiet. Don’t offer any help, don’t attempt to explain the interface. Just sit on your hands, bite your tongue, and watch. You’re trying to get a glimpse of how a typical user will interact with the interface. Since a typical user won’t have the system’s designer sitting next to them, you have to minimize your effect on the situation. It may be very hard for you to sit and watch someone struggle with a task, when the solution seems so obvious to you, but that’s how you learn the usability problems in your interface.
Keep yourself busy by taking a lot of notes. What should you take notes about? As much as you can, but focus particularly on critical incidents, which are moments that strongly affect usability, either in task performance (efficiency or error rate) or in the user’s satisfaction. Most critical incidents are negative. Pressing the wrong button is a critical incident. So is repeatedly trying the same feature to accomplish a task. Users may draw attention to the critical incidents with their think-aloud, with comments like “why did it do that?” or “@%!@#$!” Critical incidents can also be positive, of course. You should note down these pleasant surprises too.
Critical incidents give you a list of potential usability problems that you should focus on in the next round of iterative design.
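One practical trick for note-taking is to timestamp each critical incident relative to the start of the task, so you can find it later in an audio or video recording. As a minimal sketch (not part of the original reading; the class and task names are invented), an observer's notes might be structured like this:

```python
# Hypothetical timestamped note log an observer might keep during a
# formative evaluation session. All names here are illustrative.
import time
from dataclasses import dataclass, field


@dataclass
class Incident:
    elapsed_s: float   # seconds since the task started
    severity: str      # e.g. "negative" or "positive"
    note: str          # what happened, in the observer's words


@dataclass
class SessionLog:
    task: str
    start: float = field(default_factory=time.monotonic)
    incidents: list = field(default_factory=list)

    def log(self, note: str, severity: str = "negative") -> Incident:
        inc = Incident(time.monotonic() - self.start, severity, note)
        self.incidents.append(inc)
        return inc

    def summary(self) -> str:
        lines = [f"Task: {self.task} ({len(self.incidents)} incidents)"]
        for inc in self.incidents:
            lines.append(f"  [{inc.elapsed_s:6.1f}s] {inc.severity}: {inc.note}")
        return "\n".join(lines)


log = SessionLog(task="Change the font size of the heading")
log.log("Looked for font size on toolbar, not there")
log.log("Found it under Format menu", severity="positive")
print(log.summary())
```

The structured summary makes it easy to collect incidents across users and prioritize them for the next design iteration.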
Recording Observations
Here are various ways you can record observations from a user test. Paper notes are usually best, although it may be hard to keep up. Having multiple observers taking notes helps.
Audio and video recording are good for capturing the user’s think-aloud, facial expressions, and body language. Video is also helpful when you want to put observers in a separate room, watching on a closed-circuit TV. Putting the observers in a separate room has some advantages: the user feels fewer eyes on them (although the video camera is another eye that can make users more self-conscious, since it’s making a permanent record), the observers can’t misbehave, and a big TV screen means more observers can watch. On the other hand, when the observers are in a separate room, they may not pay close attention to the test. It’s happened that as soon as the user finds a usability problem, the observers start talking about how to fix that problem—and ignore the rest of the test. Having observers in the same room as the test forces them to keep quiet and pay attention.
Video is also useful for retrospective testing—using the videotape to debrief the user immediately after a test. It’s easy to fast forward through the tape, stop at critical incidents, and ask the user what they were thinking, to make up for gaps in think-aloud.
The problem with audio and video recordings is that they generate too much data to review afterward. A few pages of notes are much easier to scan for usability problems.
Screen capture software offers a cheap and easy way to record a user test, producing a digital movie. It’s less obtrusive and easier to set up than a video camera, and some packages can also record an audio stream to capture the user’s think-aloud.
This material is a derivative of MIT's 6.813/6.831 reading material, used under CC BY-SA 4.0. Collaboratively authored with contributions from: Elena Glassman, Philip Guo, Daniel Jackson, David Karger, Juho Kim, Uichin Lee, Rob Miller, Stephanie Mueller, Clayton Sims, and Haoqi Zhang. This work is licensed under CC BY-SA 4.0.