Applied Computing AC303/AC507 Notes 7

Human-Computer Interaction & Usability Engineering

Evaluation And Evaluation Techniques

 

Evaluation

What Is Evaluation?
Evaluating is finding or judging the quality of a system, taking into consideration the system hardware, the functionality of the design and the user interface design. This involves gathering information about the usability or potential usability of the system.

Why Evaluate?
As user interface designers, we evaluate the interface to improve it, to compare it and to ensure that it is usable in real-life settings. We also evaluate to make sure that it is adequate for the purpose it was designed for and that it meets the required standards.

When To Evaluate?
Evaluation can occur at many different points in the design and development cycle. There are two kinds of evaluation - formative and summative.
Formative evaluation goes on throughout development and is used to monitor and change the design before the end. This type of evaluation is highly specific.
Summative evaluation is more general and is done at the end of stages of development. It records, describes and summarises each stage.

Who Does Evaluation?
The evaluation process can be carried out by the user, the designer or an outsider. It can be expensive to get the user involved in the evaluation process, but the designer can only go so far. It is unlikely that he or she will be a typical user. An outsider brought in to do an evaluation can be a domain expert (someone who knows a lot about the subject) or a professional evaluator (someone who knows how to carry out an evaluation from a technical point of view).

Measuring

What Do We Measure?
When evaluating interface design, we need to measure things about the user environment and the user. In the environment, we can measure the presence and availability of information.

Examples:
Colleagues, relevant professionals, manuals, on-line help, other pieces of software.

We can also measure the speed of response, the size of the screen, ergonomic factors and compliance with guidelines.
When it comes to the user, we can measure things like their knowledge, attitudes and values, as well as their long-term and short-term goals. We can also take useful measurements of how they use the system.

Examples:
In computing, we can measure keystrokes, commands used, actions done, how long it takes to perform tasks, how quickly each action is learned (and retained) and how many errors occur.

We can also get an estimate from the user of how they think they performed.
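
As a rough illustration of the usage measurements listed above, the sketch below summarises a logged session into task time, keystrokes, commands and errors. The Event record, its "key", "command" and "error" categories and the sample log are assumptions made up for this example, not part of any particular logging tool.

```python
from dataclasses import dataclass


@dataclass
class Event:
    time_s: float  # seconds since the session started
    kind: str      # "key", "command" or "error" (hypothetical categories)


def summarise(events):
    """Return task time, keystroke count, command count and error count."""
    if not events:
        return {"task_time_s": 0.0, "keystrokes": 0, "commands": 0, "errors": 0}
    times = [e.time_s for e in events]
    return {
        # Task time is taken as the span between first and last logged event.
        "task_time_s": max(times) - min(times),
        "keystrokes": sum(e.kind == "key" for e in events),
        "commands": sum(e.kind == "command" for e in events),
        "errors": sum(e.kind == "error" for e in events),
    }


# Invented sample log, for illustration only.
log = [
    Event(0.0, "key"),
    Event(1.5, "key"),
    Event(2.0, "command"),
    Event(4.0, "error"),
    Event(6.5, "command"),
]
summary = summarise(log)
print(summary)
```

The same summary could be produced for each user and each task, which gives a set of figures that can be compared across sessions.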

What Questions Do We Ask?
Typical evaluation questions to gain measurements are:
· How easy is the system to learn and remember?
· How well do users accomplish task goals when using the system?
· How satisfied are users with the operation of the system?
· How well does the system measure up against current knowledge about design and performance?

How Do We Measure?
There are 3 ways we can take measurements. We can get a device to do it, we can bring in an observer or we can get the users to do it themselves.

Who Do We Do Measurements On?
We can measure users who rarely use the system, intermittent users and expert users. Within these categories the users may be new or experienced.

What Type Of Measurements?
There are two types of measurements that can be obtained - qualitative and quantitative. Qualitative measurement concerns opinions and the quality of a product, and is expressed in words. This kind of measuring takes place early on in the development cycle and helps the designer get at what’s really going on between the user and the machine. It also helps identify the relevant quantitative measures.

Quantitative measuring provides data in a consistent way. This allows performances to be compared and statistical analysis to take place. Gathering quantitative data requires getting the user to tell you what you want to know. You can then manipulate conditions and measure performance changes directly.
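
A minimal sketch of how quantitative data allows performances to be compared: task completion times gathered under two conditions are reduced to a mean and a standard deviation each. The two sets of times and the names design A and design B are invented for illustration.

```python
from statistics import mean, stdev

# Hypothetical task completion times in seconds, for illustration only.
times_design_a = [41.0, 38.5, 45.0, 40.5, 35.0]
times_design_b = [33.0, 36.5, 31.0, 34.5, 30.0]


def describe(times):
    """Summarise one condition: sample size, mean and sample standard deviation."""
    return {"n": len(times), "mean": mean(times), "stdev": stdev(times)}


summary_a = describe(times_design_a)
summary_b = describe(times_design_b)

# A positive difference means tasks took longer with design A.
difference = summary_a["mean"] - summary_b["mean"]
print(summary_a, summary_b, difference)
```

Whether a difference of this size is meaningful would still need a proper statistical test and enough participants; the sketch only shows the descriptive step.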

Methods For Collecting Data - Observation

Collecting data by observing people using the system can be done in a lab or in the field, i.e. the place where the system will actually be used. Observing users in their normal environment is an unobtrusive way of seeing how the system is operated in a “real life” setting, and these observations can be recorded. However, it is important that the observer tries to blend into the background so that the user is as unaware as possible of being monitored.

Strengths Of Observation Method
As well as being unobtrusive (hopefully) the great strength of the observation method for collecting data is that extraneous factors can be independently recorded in a realistic setting.

Examples:
Factors such as: what the environment is like; how many interruptions the user has to put up with; what the workplace atmosphere is like (noisy, stressful or calm); what other systems are in use; what other distractions there are; and what the management style is like.

Limitations Of The Observation Method
It may be difficult for the observer to be as unobtrusive as we would like. Users may feel they are being spied on. A further limitation of this method is that there are unspoken things that are factors for the user but which are missed.

Examples:
This method doesn’t record the intentions or feelings of the user. If there is a sense of satisfaction with all or part of the system, this is also missed.

This method is good for recording what is happening, but not why.

Methods For Collecting Data - Think-Aloud Protocol

Think-aloud protocol involves getting the user to make spoken observations while using the system. Again, this can be done in the lab or the field and can be used to obtain a much wider range of information than observation alone.

Strengths Of Think-Aloud Protocol
This method of collecting data is the closest we can get to measuring what’s going on in the user’s head. It enables us to record many things which could otherwise be missed.

Examples:
We can record: the user’s intentions; the nature of the puzzles encountered; why errors occurred; why wrong selections were made; and how much the user has understood.

This method is immediate and good for finding problems. It is particularly effective when used in the early stages and with new users.

Limitations Of Think-Aloud Protocol
Using the think-aloud protocol method is unavoidably obtrusive because the user has to speak out and so is constantly aware that performance is being monitored. Therefore, this method is most likely to change the behaviour of the user (known as the Hawthorne Effect). Another limitation is that users often find it difficult to put thoughts into words while trying to solve a difficult problem. Finally, as with observation, this method doesn’t give answers to pre-determined questions.

Example Recording Sheet
· What does the subject notice?
· What is the subject thinking now?
· What is the subject trying to do (goal)?
· How is the subject trying to do it (plan)?
· Is this achieved?
· Problems encountered
· Suggestions made by the subject

Methods For Collecting Data - Feature Checklist

Creating a features checklist involves listing every possible function that is available within the system which is being evaluated. The users then tick off each feature as and when they use it. This lets us see which functions are being used. And, if they’re not being used, are they difficult to find or just not needed?

Strengths Of Feature Checklist Method
This method produces a quantitative estimate of features used. This enables us to check the knowledge of the user and how needed various features are.
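
The quantitative estimate can be as simple as the share of users who ticked each feature. The sketch below assumes each user's checklist comes back as a list of ticked feature names; the feature names and user data are hypothetical.

```python
from collections import Counter

# Hypothetical feature list and returned checklists, for illustration only.
features = ["open", "save", "print", "export", "macro"]
checklists = {
    "user1": ["open", "save", "print"],
    "user2": ["open", "save"],
    "user3": ["open", "save", "export"],
}


def feature_usage(features, checklists):
    """Return the fraction of users who ticked each feature."""
    n = len(checklists)
    if n == 0:
        return {f: 0.0 for f in features}
    ticks = Counter()
    for ticked in checklists.values():
        ticks.update(set(ticked))  # count each feature once per user
    return {f: ticks[f] / n for f in features}


usage = feature_usage(features, checklists)
unused = [f for f, share in usage.items() if share == 0]
print(usage)
print("Never ticked:", unused)
```

Features that nobody ticked are the ones to follow up on: the checklist alone cannot say whether they are hard to find or simply not needed.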

Limitations Of Feature Checklist Method
Unfortunately, creating and using a feature checklist can be long and cumbersome. Users can find it very boring (and obtrusive) to keep the checklist up to date.

Methods For Collecting Data - Structured Diary

Another way of collecting data is to get users to keep a structured diary. This involves getting them to describe, in writing, any problems encountered.

Strengths Of Structured Diary Method
This method helps uncover long term usage problems. It is also an inexpensive way to collect highly specific information.

Limitations Of Structured Diary Method
Unlike some other methods, you must have a working system to use structured diaries. Also, it may be difficult to get users to complete these and training may be required.

Methods For Collecting Data - Questionnaires

To collect data from a large number of people, it is helpful to send out questionnaires for people to fill in anonymously and in their own time. Questionnaires are made up of structured and uniform questions designed to be as easy as possible to answer. Questionnaires are used to gather information which is both objective and subjective. Objective information is factual and subjective information is based on opinions, beliefs, feelings, etc.

Examples:
Objective - How many children do you have?
Subjective - How interesting did you find the game?

Strengths Of Questionnaire Method
The good thing about questionnaires is that they are a non-labour-intensive way of gathering “coarse” data from large numbers of people. Direct questions can be asked and the responses pre-categorised. The data gathered can then be statistically analysed.
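
As a sketch of what pre-categorised responses make possible, the example below tallies answers to the subjective question "How interesting did you find the game?" against a fixed five-point scale. The scale labels and the responses are assumptions invented for illustration.

```python
from collections import Counter

# Hypothetical answer categories, fixed before the questionnaire is sent out.
SCALE = ["very dull", "dull", "neutral", "interesting", "very interesting"]

# Invented responses, for illustration only.
responses = [
    "interesting", "neutral", "very interesting",
    "interesting", "dull", "interesting",
]


def tally(responses, scale):
    """Return {category: (count, percentage)} for every category on the scale."""
    counts = Counter(responses)
    total = len(responses)
    table = {}
    for category in scale:
        share = round(100 * counts[category] / total, 1) if total else 0.0
        table[category] = (counts[category], share)
    return table


table = tally(responses, SCALE)
for category, (count, share) in table.items():
    print(f"{category:>16}: {count} ({share}%)")
```

Because every answer falls into a known category, the counts from different groups of respondents can be compared directly.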

Limitations Of Questionnaire Method
When producing questionnaires, it is very easy to ask wrong or bad questions. Also, the data provided is “coarse” and there is no interaction for elaboration purposes. It is also difficult to enthuse people into completing and returning questionnaires. It is therefore common to have a low response rate, which can mean that a bias is introduced.

Methods For Collecting Data - Interviewing

Interviewing is carried out for the purpose of information gathering in an environment where the interviewer controls the situation, sets the purpose of the interview and controls the pacing. An interview may involve just 2 people (interviewer and interviewee), or an entire group of people.

Generally speaking, an interview will be successful if the interviewee is clear about the aim of the interview and has access to the right information to answer the questions. Also, the interviewer should remain neutral (though interested) throughout the interview and have a debriefing session with the interviewee at the end.

Strengths Of Interviewing Method
Carrying out interviews is a good way to collect high quality data. The interviewer can ask direct questions and can probe further where required. This interactive approach allows the interviewer to gain more information and uncover any hidden agendas.

Limitations Of Interviewing Method
This method is expensive, time consuming and requires highly skilled individuals to be involved. Also, there is a need to ensure standardisation amongst interviewers. As with questionnaires, there is the problem that a response bias may emerge.

Methods For Collecting Data - Experiments

Experiments are used to establish the general principles of interface design. These are carried out in the lab setting and allow specific aspects of the interface to be measured. However, to be of any value, experiments must be properly prepared.

Examples:
Decide where and when to conduct the experiment and who will be included. Decide what to test, what to control for and how to test and log the data. Make sure the instructions are right and described at the right level of detail. Make sure the time of the tasks is correct and the data logging tools are adequate. Ensure that all the data you want is collected.

Strengths Of Experiments Method
Experiments allow us to extract concise and convincing results. We can use this method to measure very specific aspects of a system and we can generalise over many users. It is also possible to isolate out any individual differences.
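
One common way to isolate individual differences, assumed here rather than taken from these notes, is to have every participant try both versions of the interface and to analyse the per-participant differences. The sketch below uses invented times for hypothetical designs A and B.

```python
from statistics import mean

# participant -> (time with design A, time with design B), in seconds.
# Invented figures, for illustration only.
times = {
    "p1": (50.0, 44.0),
    "p2": (30.0, 25.0),
    "p3": (42.0, 35.0),
    "p4": (36.0, 30.0),
}


def paired_differences(times):
    """Return each participant's A-minus-B difference.

    Subtracting within a participant removes that person's overall speed,
    so fast and slow users can be pooled.
    """
    return {p: a - b for p, (a, b) in times.items()}


diffs = paired_differences(times)
mean_diff = mean(list(diffs.values()))
print(diffs)
print("Mean difference (A - B):", mean_diff)
```

Although the participants' raw times vary widely, the differences are all of similar size, which is the pattern this kind of design is meant to expose.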

Limitations Of Experiments Method
This method is very expensive to do and it is difficult to isolate relevant variables. As mentioned before, the value of the results can be questionable, and the ecological validity of such experiments is also doubted.
