The study of emotions is becoming increasingly important in the practice of user-centered design. The analysis of emotional reactions to a product goes beyond traditional measures of usability: it emphasizes what the user feels over the course of their experience with the product.
Certain software applications can analyze a person's facial expressions to identify some of their emotions. Could these applications add value to usability testing? Yu Centrik recently had the opportunity to try the FaceReader emotion-analysis application during tests of a mobile application for measuring the sodium content of meals. We present here the results of this experiment, along with recommendations for integrating FaceReader into the usability evaluation process for interactive applications.
We asked 10 people between 25 and 50 years old to use a free application that calculates the amount of sodium in food. Each participant had to select their favourite meal and then determine how much sodium they ingested when eating it. Participants were guided by a facilitator and were invited to think aloud, verbalizing everything they were thinking as they executed the task. A camera filmed their facial expressions for the analysis software.
The video recordings were analyzed by the FaceReader application, which detects different facial expressions and categorizes them based on six basic emotions (joy, sadness, surprise, fear, anger, disgust). A calibration profile for each participant had previously been created in FaceReader. This calibration allowed the application to adapt to each individual's facial anatomy and to minimize analysis errors.
For the purposes of our test, given the large variation in facial expressiveness between participants, an expression was counted as an emotion only when it produced a peak in the software's output graph more than four times the level of background noise the software detected. The results obtained with the software were compared to our own manual evaluation (identification and categorization) of the expressions observed in the videos. The goal of this comparison was to discover additional emotion patterns, beyond the six basic emotions, that could be used in future usability tests.
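The thresholding rule above can be sketched as a small, self-contained example. The function name, intensity values, and noise estimate are illustrative assumptions, not FaceReader's actual output or API.

```python
# Sketch of the peak-detection rule described in the text: an expression
# counts as an emotion only when its intensity peak exceeds four times
# the background-noise level. All values here are illustrative.

NOISE_FACTOR = 4  # peak must exceed 4x the background noise

def detect_emotion_peaks(intensity, background_noise, factor=NOISE_FACTOR):
    """Return the indices where the intensity trace rises above the
    noise threshold, i.e. candidate emotion events."""
    threshold = factor * background_noise
    return [i for i, value in enumerate(intensity) if value > threshold]

# Hypothetical intensity trace (arbitrary units) with one clear peak.
trace = [0.02, 0.03, 0.02, 0.25, 0.30, 0.04, 0.02]
peaks = detect_emotion_peaks(trace, background_noise=0.03)  # indices 3 and 4
```

A relative threshold of this kind adapts to each participant's expressiveness, which is why a per-participant noise estimate matters more than any absolute cutoff.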
Of the 277 emotion events recorded across all of the tests, we judged that 30 (10.8%) were correctly identified and categorized by FaceReader, while 25 (9.0%) were mis-categorized. In addition, three events (1.1%) were false negatives (emotions we observed that the software did not detect) and 219 (79.1%) were false positives (non-emotions incorrectly flagged as emotions).
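The breakdown can be verified from the raw counts; the percentages reported above are simply each count divided by the 277 recorded events, rounded to one decimal place:

```python
# Recomputing the study's reported breakdown from the counts in the text.
counts = {
    "correctly identified and categorized": 30,
    "mis-categorized": 25,
    "false negatives": 3,
    "false positives": 219,
}
total = sum(counts.values())  # 277 events in total

# Percentages rounded to one decimal, as reported in the article.
percentages = {label: round(100 * n / total, 1) for label, n in counts.items()}
```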
Our evaluation did not identify any emotion patterns that would have been satisfactory for effective use of the FaceReader tool.
The high rate of false positives can be attributed to two factors: the limited quality of FaceReader's calibration, and its difficulty analyzing videos in which the participant does not look constantly toward the camera or is speaking with someone.
The calibration supported by FaceReader is limited: the software analyzes a series of images or a video clip and automatically creates a profile for each participant. This calibration sometimes introduces a bias for certain participants: a person who habitually keeps their mouth open, for example, can be interpreted by the software as constantly smiling. For a more accurate analysis of facial expressions, it would be important to calibrate the software by manually identifying each participant's baseline expressions.
In addition, certain movements or orientations of the head introduce false positives. Lowering the eyes to look at the keyboard, or leaning in to read text, is often recorded as an emotion. Moving a hand in front of the face, or simply speaking, can also bias the results.
The number of mis-categorizations may be explained by the context of the study, which did not lend itself to strong emotions. FaceReader succeeded in detecting that an emotion existed in most cases where one was expressed, but generally assigned it to the wrong category. We had access to all of the visual markers used by the software (eyebrow raised, eyebrow lowered, eyes open, etc.). It would have been interesting to be able to define other emotions as new combinations of these markers, avoiding the laborious process of reading the raw data.
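Defining new emotional states from marker combinations could look something like the following sketch. The state and marker names here are invented for illustration and do not correspond to FaceReader's actual marker identifiers or export format.

```python
# Hypothetical custom states defined as combinations of visual markers.
# Both the state names and the marker names are assumptions for the
# sake of illustration, not FaceReader identifiers.
CUSTOM_STATES = {
    "confusion": {"brow_lowered", "eyes_narrowed"},
    "engagement": {"brow_raised", "eyes_widened"},
}

def match_states(active_markers, definitions=CUSTOM_STATES):
    """Return the custom states whose required markers are all active
    in the current video frame."""
    return [state for state, required in definitions.items()
            if required <= set(active_markers)]

# Example frame: brows lowered and eyes narrowed, mouth also open.
states = match_states({"brow_lowered", "eyes_narrowed", "mouth_open"})
```

A rule layer of this kind, sitting on top of the raw marker output, would let evaluators track usability-relevant states such as confusion without re-reading the raw data by hand.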
We tested the FaceReader emotion-analysis application in the context of a typical usability test: execution of tasks on a software application, guidance by a facilitator, and discussion with the facilitator afterwards. We found that in this context the software has difficulty detecting participants' emotions precisely. Furthermore, certain emotions that matter for usability evaluation, such as confusion, are not tracked at all.
We recommend using FaceReader in a context where participants are not exposed to any distractions, so as to minimize the detection of events that are not relevant to the test. A video game or other strongly immersive experience would probably be an ideal subject of study. Removing the facilitator would also eliminate the distraction caused by the facilitator's interaction with the participant.
It would also be worthwhile to extend FaceReader to allow manual calibration, in particular to be able to tell the software to ignore certain expressions or to treat certain facial movements as additional emotions.
In a similar vein, it would be interesting for usability testing to research new facial-expression markers corresponding to states other than the six basic emotions, such as stress, confusion, high mental workload, or strong engagement (flow).