Deciding how to measure usability.
Estimating task times.
- Try out each task.
- Estimate the time it would likely take a user to complete. (Educated
guess for the first round of testing.)
- Consider time to complete for an experienced person.
- Consider the problems a typical participant might experience.
- The estimated time can be a range instead of a single value.
- With the second round of testing and subsequent rounds, you will have
established minimum and maximum times to complete each task.
- Mean (averages) from the first test round become the benchmark
to gauge future performance.
- If the times go up, the changes you made did not work.
- If the times go down, the changes you made helped.
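The benchmark comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed procedure: the times are invented, and the names `round_one`, `round_two`, and `summarize` are hypothetical.

```python
from statistics import mean

# Hypothetical completion times (in seconds) for one task.
round_one = [48, 62, 55, 71, 64]   # first test round: sets the benchmark
round_two = [41, 50, 47, 58, 54]   # second round, after changes to the site


def summarize(times):
    """Return the minimum, maximum, and mean time for one test round."""
    return {"min": min(times), "max": max(times), "mean": mean(times)}


benchmark = summarize(round_one)
latest = summarize(round_two)

# A negative change means participants got faster after the changes.
change = latest["mean"] - benchmark["mean"]
print(f"benchmark mean: {benchmark['mean']} s, latest mean: {latest['mean']} s")
print(f"change in mean time: {change} s")
```

A drop in the mean only suggests that the changes helped. Whether the drop is larger than chance would produce is a separate question for a significance test.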
General methodology in a formal usability test:
- Do a test on the original (control) site to establish benchmarks for task completion.
- Run the test.
- Make changes based on data and observations.
- Run the same test again with a different group of
participants (treatment group).
- If there is a statistically significant increase in usability from the control group to the treatment group, you can be reasonably sure (assuming that the test design is sound) that the changes you made, not chance, are responsible for the improvement.
- Repeat.
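These notes do not name a particular significance test. As one possible illustration, the sketch below runs an exact permutation test on the difference in mean task time between the control and treatment groups. The data are invented and the function name `permutation_test` is hypothetical. Enumerating every split is only practical for the small groups typical of usability tests.

```python
from itertools import combinations
from statistics import mean


def permutation_test(control, treatment):
    """Exact two-sided permutation test on the difference in group means.

    Returns the fraction of all possible regroupings of the pooled data
    whose mean difference is at least as large as the observed one.
    """
    pooled = list(control) + list(treatment)
    n_control = len(control)
    n_treatment = len(pooled) - n_control
    total = sum(pooled)
    observed = abs(mean(control) - mean(treatment))

    extreme = 0
    count = 0
    for indices in combinations(range(len(pooled)), n_control):
        group_a_sum = sum(pooled[i] for i in indices)
        diff = abs(group_a_sum / n_control - (total - group_a_sum) / n_treatment)
        # Small tolerance guards against floating-point rounding.
        if diff >= observed - 1e-12:
            extreme += 1
        count += 1
    return extreme / count


# Hypothetical task times in seconds; lower is better.
control = [95, 110, 102, 120, 98, 105]    # original site
treatment = [80, 85, 78, 90, 88, 83]      # revised site, different participants

p_value = permutation_test(control, treatment)
print(f"p = {p_value:.4f}")
```

A small p-value (conventionally below 0.05) says that a difference this large would rarely arise from random assignment of participants to groups alone. It does not compensate for a flawed test design.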
Typical web site usability measures.
- Performance (quantitative) measures.
- Require careful observation but not subjective judgments.
- Examples:
- Counting how much time people take to do a task. (Time on task is the total time needed to carry out a task, whether the time is spent pondering instructions or waiting for web pages to load.)
- Counting errors.
- Counting how many times participants make the same mistake.
- Counting how much time participants take navigating menus.
- Percent of tasks completed.
- Ratio of successes to failures.
- Time spent on errors.
- Subjective measures (via post-test questionnaire and/or
comments made during the test, using a think-aloud protocol).
- Can be quantitative or qualitative.
- Examples:
- You can use a Likert scale and ask people to rate how easy or difficult the site is to use, then average the responses of all participants.
- Observations of spontaneous comments of frustration.
- Observations of spontaneous comments of confusion.
- Observations of spontaneous comments of satisfaction.
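Several of the measures listed above can be computed from a simple observation log. The sketch below assumes one record per participant for a single task. The field names (`completed`, `seconds`, `errors`, `error_seconds`, `ease`) and all values are hypothetical, and `ease` stands for a 1-to-5 Likert rating from a post-test questionnaire.

```python
from statistics import mean

# Hypothetical observation log: one record per participant for one task.
sessions = [
    {"participant": "P1", "completed": True,  "seconds": 95,  "errors": 1, "error_seconds": 12, "ease": 4},
    {"participant": "P2", "completed": True,  "seconds": 120, "errors": 3, "error_seconds": 40, "ease": 2},
    {"participant": "P3", "completed": False, "seconds": 180, "errors": 5, "error_seconds": 90, "ease": 1},
    {"participant": "P4", "completed": True,  "seconds": 85,  "errors": 0, "error_seconds": 0,  "ease": 5},
]

# Performance measures: plain counts and sums of what was observed.
successes = sum(1 for s in sessions if s["completed"])
failures = len(sessions) - successes
percent_completed = 100 * successes / len(sessions)
total_errors = sum(s["errors"] for s in sessions)
mean_time_on_task = mean(s["seconds"] for s in sessions)
share_of_time_on_errors = (
    sum(s["error_seconds"] for s in sessions) / sum(s["seconds"] for s in sessions)
)

# Subjective measure: average of the Likert ease-of-use ratings.
mean_ease = mean(s["ease"] for s in sessions)

print(f"tasks completed: {percent_completed:.0f}%")
print(f"successes to failures: {successes}:{failures}")
print(f"total errors: {total_errors}")
print(f"mean time on task: {mean_time_on_task} s")
print(f"share of time spent on errors: {share_of_time_on_errors:.0%}")
print(f"mean ease rating: {mean_ease}")
```

Spontaneous comments of frustration, confusion, or satisfaction could be tallied the same way once each comment has been assigned to a category.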
Types of data that can be measured.
- Nominal data can only be named. It is derived from responses in the form of words and phrases. You can count the number of responses that fit a particular descriptor.
- Ordinal data is data that is rank ordered. For instance, you might give participants a list of items and ask them to rate the importance of each item.
- Interval data falls on an interval scale, in which adjacent points on the scale are an equal distance apart. Interval data has no absolute zero (e.g., a Celsius or Fahrenheit thermometer uses an interval scale).
- Ratio data is the most precise sort of data because it has an absolute zero and, like interval data, its units are equally sized (e.g., time, number of clicks, number of errors).
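The practical difference between interval and ratio data is whether ratios mean anything. The short sketch below illustrates this with invented values. The Kelvin conversion is an addition for illustration only, used because the Kelvin scale has an absolute zero.

```python
def celsius_to_kelvin(celsius):
    """Convert a Celsius reading to Kelvin, a scale with an absolute zero."""
    return celsius + 273.15


# Interval scale: differences are meaningful, ratios are not.
warm_c, cool_c = 20.0, 10.0
difference = warm_c - cool_c      # 10 degrees apart: a meaningful statement
naive_ratio = warm_c / cool_c     # 2.0, yet 20 degrees C is not "twice as hot"
kelvin_ratio = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cool_c)

# Ratio scale: zero seconds means no time at all, so ratios are meaningful.
slow_s, fast_s = 120.0, 60.0
time_ratio = slow_s / fast_s      # the slower participant took twice as long

print(f"Celsius ratio: {naive_ratio}, Kelvin ratio: {kelvin_ratio:.3f}")
print(f"time ratio: {time_ratio}")
```

This is why statements such as "participants took twice as long" are valid for task times, clicks, and error counts, but not for measurements on an interval scale.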
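Measures of central tendency.
- The mean is the arithmetic average: the sum of all values divided by the number of values.
- The median is the value that occurs in the middle of the data when it is arranged in order. With an even number of values, it is the average of the two middle values.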
- The mode is the most commonly occurring value in the data set.
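The three measures can be computed with Python's standard `statistics` module. The task times below are invented. One participant is deliberately much slower than the rest to show how an outlier pulls the mean away from the median.

```python
from statistics import mean, median, mode

# Hypothetical task times in seconds; the last participant struggled.
task_times = [42, 55, 55, 61, 70, 180]

average = mean(task_times)    # sum of the values divided by their count
middle = median(task_times)   # average of the two middle values (55 and 61)
most_common = mode(task_times)

print(f"mean:   {average:.1f} s")
print(f"median: {middle} s")
print(f"mode:   {most_common} s")
```

Here the mean is about 77.2 seconds while the median is 58 seconds, so reporting both gives a fuller picture when a few participants take far longer than the others.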
Matching measures to goals and concerns.
- Performance measures should be directly tied to the set goals and concerns.