A Similar Basis for Judging Confidence in Monkeys and Humans

A variety of animals have been shown to make confidence judgments about their own knowledge or performance, but the mechanism underlying these metacognitive decisions is still debated. Much of the work on animal metacognitive abilities has aimed to rule out alternative, non-introspective mechanisms such as associative learning, behavioral cue association, or environmental cue association. However, the human metacognition literature has shown that even humans often do not use true introspection or directly access their own memory to make metacognitive judgments; they sometimes use heuristic strategies based on perceptual salience. Often these heuristic strategies are inaccurate and cause metacognitive errors. Here we offer a new route to testing animal metacognitive abilities by comparing the fragility of human and animal metacognition. We show that monkeys’ confidence judgments, like those of humans, are at least partly based on salient perceptual features of the stimuli and susceptible to faulty heuristics.

The study of animal metacognition has often been approached as a test of self-reflective cognition, or introspection (Basile, Schroeder, Brown, Templer, & Hampton, 2015; Hampton, 2001; Smith, Shields, & Washburn, 2003). These tests measure whether animals can make judgments about their own performance, or confidence judgments. Studies have tested whether animals can opt out of difficult trials, seek information when ignorant, and make bets on their own performance (Beran & Smith, 2011; Call & Carpenter, 2001; Hampton, 2001; Hampton, Zivin, & Murray, 2004; Kornell, Son, & Terrace, 2007; Rosati & Santos, 2016; Smith et al., 1995). The first task developed to investigate animal metacognitive abilities was the "opt-out task." This task was originally tested in dolphins (Smith et al., 1995), but has since been adapted to test a variety of non-human animals with varying success, including macaques and capuchins (Beran, Smith, Coutinho, Couchman, & Boomer, 2009; Shields, Smith, & Washburn, 1997). In the opt-out task, subjects are first trained to make a perceptual discrimination. In the dolphin study, the animal had to judge whether a pitch was above or below a frequency threshold by pressing one of two paddles. If it chose correctly, it received a food reward, and if it chose incorrectly, it received a time out. Once the animal was trained on this primary task, an additional paddle was added which allowed it to skip the trial and move on to a new trial, similar to an "I don't know" response. These opt-out trials are used as a way of assessing the animal's metacognitive ability. The dolphin chose to skip hard trials, the ones near the discrimination boundary, suggesting that it knew when it was unsure of an answer. This dolphin, numerous macaques, and humans have all been shown to pass these types of tasks (Lyons & Ghetti, 2011; Shields et al., 1997; Smith et al., 1995).
Other tasks have since been designed to separate the confidence choice from the primary discrimination task, in an effort to induce reflective uncertainty (Hampton, 2009; Kornell et al., 2007). One such task is the gambling task (Ferrigno, Kornell, & Cantlon, 2017; Kornell et al., 2007; Morgan, Kornell, Kornblum, & Terrace, 2014). In this task monkeys are first trained on a primary task, either a perceptual task such as a line length discrimination (e.g., touching the largest of visually presented lines), or a memory task such as a match-to-sample (MTS) task. Once trained on the primary task, a betting screen is added. On this betting screen, the subject has the option of a large bet which either gives or takes away three tokens in a token bank based on the subject's accuracy on the primary task, and a low bet option which gives one token regardless of accuracy on the primary task. When the subject has gained enough tokens in "the bank" it receives a food reward. This betting screen is presented either after responding to the primary task (e.g., matching on the MTS task), but before receiving any feedback (retrospective judgment) or immediately after seeing the sample before making the primary task response (prospective judgment). Monkeys' performance on this task has shown that they tend to use the high bet more often when they correctly answered the primary task.
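The payoff structure of the betting screen can be made concrete with a small sketch. The token values (three tokens for a high bet, one token for a low bet) come from the description above; the function names are hypothetical, and the sketch ignores details the text does not specify, such as the size of the token bank or what happens when the bank is empty.

```python
# Illustrative sketch of the betting payoffs described in the text.
# Function names are hypothetical; bank size and floor are not modeled.
HIGH_BET_STAKE = 3   # tokens gained (correct) or lost (incorrect) on a high bet
LOW_BET_PAYOFF = 1   # tokens gained on a low bet, regardless of accuracy


def token_change(bet: str, correct: bool) -> int:
    """Return the change in the token bank for one trial."""
    if bet == "high":
        return HIGH_BET_STAKE if correct else -HIGH_BET_STAKE
    if bet == "low":
        return LOW_BET_PAYOFF
    raise ValueError(f"unknown bet: {bet!r}")


def expected_high_bet(p_correct: float) -> float:
    """Expected token change of a high bet given the probability of being correct."""
    return p_correct * HIGH_BET_STAKE - (1.0 - p_correct) * HIGH_BET_STAKE


def break_even_probability() -> float:
    """Probability of being correct at which high and low bets pay equally.

    Solves stake * p - stake * (1 - p) = low payoff for p.
    """
    return (LOW_BET_PAYOFF + HIGH_BET_STAKE) / (2 * HIGH_BET_STAKE)
```

Under these assumptions a high bet only pays more than a low bet, on average, when the subject is correct on more than two thirds of such trials, which is why betting high selectively on trials likely to be correct increases the token rate.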
These studies all show remarkable capabilities of non-human animals to make judgments about their own confidence, or information states. However, metacognition is not all-or-nothing (Hampton, 2009; Kornell, 2014; Smith, Couchman, & Beran, 2012). There are many different ways of evaluating one's own cognitive state, from the most basic, using associative learning cues based on prior rewards, to relying on internal cues beyond mere associative strategies, to explicitly and analytically judging one's own memory or cognitive abilities (Hampton, 2009; Kornell, 2014). Questions about the evolutionary origins of metacognitive abilities require understanding not only whether subjects have metacognitive abilities but also how these judgments are made.

Proposed Mechanisms of Animal Metacognitive Judgments
There has been much debate about what mechanisms underlie non-human animals' metacognitive judgments. Often the goal of this work is to rule out alternative, non-introspective accounts to test whether non-human animals are capable of introspection (Basile et al., 2015; Hampton, 2001; Smith et al., 2003). One of the most significant debates has been whether animals make their metacognitive decisions using associative learning (Beran, Perdue, & Smith, 2014; Carruthers, 2008; Couchman, Coutinho, Beran, & Smith, 2010; Le Pelley, 2012; Smith, Beran, Couchman, & Coutinho, 2008). A proposed mechanism for associative learning is response competition. In the "opt-out" style metacognition tasks, subjects see both the primary responses and the opt-out button on the same screen. In these tasks the animal is required to make some type of decision about a stimulus (e.g., does the dot array contain more or fewer than N dots). It is possible that subjects use the opt-out button not to signify that they are unsure, but rather as an intermediate or middle response (when the number of dots is near the category boundary). This would lead to a greater number of overall rewards (i.e., when an animal is likely to get a trial wrong and receive a time out, it will receive a food reward more quickly if it chooses to skip the trial and move to an easier trial). This direct link between reward rate and the "opt-out" button allows for the possibility that associative learning may be driving the use of the "opt-out" button (although see Beran et al., 2009, for evidence against this account).
However, not all tasks have the opt-out button on the primary response screen, and thus this specific associative response competition strategy cannot account for animals' metacognitive responses across studies (e.g., prospective or retrospective betting paradigms; Morgan et al., 2014). Other associative accounts have nonetheless been proposed for these tasks (Le Pelley, 2012). Since reward rate increases with more accurate metacognitive judgments, it is possible that subjects associate a particular response pattern with reward and are relying on stimulus-response associations rather than accessing internal uncertainty. Closely controlled studies requiring animals to make prospective judgments (Hampton, 2001; Morgan et al., 2014), showing immediate transfer to novel tasks (Kornell et al., 2007), and dissociating responses and rewards have shown that associative strategies are unlikely to account for all of animals' responses, but alternative associative explanations of animal performance remain in play (Couchman et al., 2010; Le Pelley, 2012; Smith, Couchman, & Beran, 2012).
Another associative account of animal metacognitive performance is that animals use environmental cue association or behavioral cue association to make their confidence judgments. For example, an animal could use a cue like the length of delay between the sample and the response screen (environmental cue association) or its own response time on the primary task (behavioral cue association) as a cue to its own accuracy without monitoring its internal states at all (Hampton, 2009; Smith, Zakrzewski, & Church, 2016). Although many well-designed and closely controlled studies have tried to rule out these alternative, non-introspective strategies, it does not follow that the animals must be introspective if they still make accurate confidence judgments (Kornell, 2014). In fact, much work has found that even humans' confidence judgments do not need to be introspective (Koriat, 1997; Kornell, Rhodes, Castel, & Tauber, 2011). Humans' confidence judgments have been shown to be based on heuristic cues which might not be considered sufficiently "metacognitive" in the animal literature, such as response time (Zakay & Tuvia, 1998) or stimulus features (Rhodes & Castel, 2008, 2009).

Accounts of Human Metacognitive Decisions
In the human metacognition literature, the focus has not been on attempting to rule out associative strategies for metacognition (this is an undue burden in the animal literature) but rather the varied mechanisms underlying metacognitive judgments. There are two main accounts of humans' metacognitive decisions: Direct Access Accounts and Inferential Accounts.
Direct Access Account. The direct access account posits that humans have direct access to their own memory traces and use this information directly to make metacognitive decisions (King, Zechmeister, & Shaughnessy, 1980). This is similar to some versions of the introspection account of animal metacognition (Basile et al., 2015; Hampton, 2001; Smith et al., 2003). Although this account is quite intuitive, it cannot explain a variety of phenomena in the human metacognition literature, such as the effects of past performance on confidence (Martí, Mollica, Piantadosi, & Kidd, 2018) or factors like fluency, which differentially affect accuracy and confidence (Ferrigno et al., 2017; Rhodes & Castel, 2008, 2009).
Inferential Accounts. In contrast to the direct access account, inferential accounts suggest that humans use cues or heuristics to make judgments about their own memory rather than directly accessing the contents or traces of their memory (Koriat, 1997). These heuristics typically work because the cues they rely on often correlate with actual memory performance in real-world situations. These cues can come in a variety of forms, such as visual or auditory ease of perceiving or processing, retrieval ease (how easily an answer comes to mind), familiarity of the stimuli, or many other "fluency" cues (see Alter & Oppenheimer, 2009, for review).
Although these cues are usually quite accurate, they can lead to metacognitive errors. Researchers have identified a number of "metacognitive illusions." For example, one study presented human subjects with a list of sequentially presented words to remember (Rhodes & Castel, 2008). For each word, subjects rated their confidence that they would remember the word on a later recall task. Some of the words were presented in an easy-to-see large font and others were presented in a harder-to-see smaller font. Font size affected subjects' confidence even though it did not affect their recall accuracy. That is, subjects did not actually remember the large-font words better, but they thought they would. Similar studies have shown that auditory fluency (Rhodes & Castel, 2009), retrieval fluency, or how easy it is to bring an answer to mind (Benjamin, Bjork, & Schwartz, 1998), and priming on related concepts (Schwartz & Metcalfe, 1992) all affect confidence choices in humans. Metacognitive illusions show that humans use heuristic cues like visibility and loudness to determine the strength of their memory rather than directly accessing their memory traces.

Metacognitive Illusion in Monkeys
To test whether monkeys make metacognitive decisions in a similar way to humans, we adapted a metacognitive illusion for use with monkeys (Ferrigno et al., 2017). The monkeys were first trained on a prospective and retrospective betting paradigm (see Figure 1). In this task the subjects' primary task was a match-to-sample task using dark grey line drawings. Depending on the session, the monkeys were required to make either prospective betting judgments (a bet about whether they would get the answer correct on the next screen) or retrospective judgments (a bet on whether they had answered correctly). As in previous studies, we found that the monkeys made more high bets when they got the answer correct, and more low bets when they were incorrect.
Figure 1. Trial protocol for retrospective and prospective conditions. The trial starts with a start button, followed by the sample image. In the retrospective condition subjects are then shown a screen with four images: one target and three distractors. After choosing one of these the subject is taken to the betting screen. In the prospective condition, the betting screen is presented directly after the sample, and the subject is shown the target and distractors after making a bet.
Next, we added a fluency manipulation, or the "metacognitive illusion." To do this we manipulated the contrast of the sample and the distractor images in the match-to-sample task. Half of the trials were low fluency, or low contrast (light grey on a white background), and the other half were high fluency, or high contrast (black on a white background). We found that the monkeys were more likely to make high bets on high fluency trials than on low fluency trials. In contrast, there was no effect of fluency on accuracy. Additionally, the effects of fluency on risk held even when controlling for accuracy. These results are qualitatively similar to the metacognitive illusions in humans (see Figure 2). This work is important for two reasons. First, it provides strong evidence that animals' metacognitive judgments are not just based on associative learning through reinforcement and punishment. The metacognitive illusion actually made the animals worse at the task (less accurate betting), and they received fewer overall rewards when basing their judgments on this noninformative, experimentally manipulated cue. According to the associative learning account, animals should be maximizing their rewards, not using cues which decrease their total rewards. Second, this work suggests that, like humans, monkeys use heuristics to make their metacognitive judgments, and that these heuristics are at least partially based on similar cues, like visual fluency.
Figure 2. [...] Rhodes and Castel (2008). B. The effects of a fluency illusion (image contrast) in monkeys. Similar to humans, fluency affected the proportion of high bets, but not accuracy on the match-to-sample task. Adapted from Ferrigno et al., 2017. Error bars represent the standard error of the mean.
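One simple way to see what "controlling for accuracy" means here is to compare the proportion of high bets across fluency conditions within correct trials only (or within incorrect trials only). The sketch below is illustrative: the `Trial` record, the function name, and the toy trials are hypothetical and are not data from the study, and the original analysis may have used a different method.

```python
from typing import List, NamedTuple


class Trial(NamedTuple):
    high_fluency: bool  # True for high-contrast trials, False for low-contrast
    correct: bool       # accuracy on the match-to-sample response
    high_bet: bool      # True if the subject chose the high bet


def high_bet_rate(trials: List[Trial], high_fluency: bool, correct: bool) -> float:
    """Proportion of high bets among trials with the given fluency and accuracy."""
    subset = [t for t in trials
              if t.high_fluency == high_fluency and t.correct == correct]
    if not subset:
        raise ValueError("no trials in this cell")
    return sum(t.high_bet for t in subset) / len(subset)


# Toy values for illustration only; these are NOT results from the study.
toy_trials = [
    Trial(True, True, True), Trial(True, True, True),
    Trial(True, True, False), Trial(True, True, True),
    Trial(False, True, True), Trial(False, True, False),
    Trial(False, True, False), Trial(False, True, True),
]

rate_high_fluency = high_bet_rate(toy_trials, high_fluency=True, correct=True)
rate_low_fluency = high_bet_rate(toy_trials, high_fluency=False, correct=True)
```

Because both rates are computed on correct trials only, a difference between them cannot be attributed to a difference in accuracy between the two fluency conditions.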

Effects of Additional Visual Features
Although this work showed that the manipulated visual feature of contrast affected the monkeys' confidence (but not their accuracy), the effect size was small. It is possible that monkeys used visual features of the stimuli beyond contrast to make their confidence judgments. To examine what other features of the stimuli might be affecting the monkeys' confidence judgments, we tested whether three features affected confidence judgments in monkeys: image complexity (calculated using the Sobel operator in MATLAB, which calculates the number of edges in an image), image uniqueness (how different a given sample was from all the other stimuli), and average similarity between the sample and the distractors (the latter two both calculated using the pixel sum of squared differences). Throughout these analyses incorrect trials are excluded. Thus, any effects seen on monkeys' confidence judgments cannot be explained by a risk-accuracy correlation.
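The three image measures can be sketched roughly as follows. This is a minimal pure-Python illustration, not the MATLAB pipeline used in the study: the grayscale images are assumed to be equal-sized nested lists of pixel intensities, the edge threshold is a hypothetical parameter, and the function names are invented for this example. Note that the sum of squared differences is a dissimilarity score, so a larger mean value against the whole stimulus set indicates a more unique image, and a larger mean value against the distractors indicates lower sample-distractor similarity.

```python
from typing import List

Image = List[List[float]]  # grayscale image as rows of pixel intensities

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def edge_count(img: Image, threshold: float = 1.0) -> int:
    """Count interior pixels whose Sobel gradient magnitude exceeds a threshold.

    The threshold is an assumption of this sketch, not a value from the study.
    """
    rows, cols = len(img), len(img[0])
    count = 0
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = gy = 0.0
            for dr in range(3):
                for dc in range(3):
                    px = img[r + dr - 1][c + dc - 1]
                    gx += SOBEL_X[dr][dc] * px
                    gy += SOBEL_Y[dr][dc] * px
            if (gx * gx + gy * gy) ** 0.5 > threshold:
                count += 1
    return count


def ssd(a: Image, b: Image) -> float:
    """Pixel sum of squared differences between two equal-sized images."""
    return sum((pa - pb) ** 2
               for row_a, row_b in zip(a, b)
               for pa, pb in zip(row_a, row_b))


def mean_ssd(sample: Image, others: List[Image]) -> float:
    """Average SSD between a sample and a set of other images.

    Applied to all other stimuli this gives a uniqueness score; applied to the
    distractors on a trial it gives an (inverse) similarity score.
    """
    return sum(ssd(sample, other) for other in others) / len(others)


# Tiny example: a flat image and an image with one vertical step edge.
flat = [[0.0] * 4 for _ in range(4)]
step = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]
```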
To test whether image complexity predicted confidence, we conducted a logistic regression using complexity as a predictor of risk. We found an effect of complexity on risk such that, as complexity increased, the monkeys were more likely to bet 'high confidence' in both the retrospective and prospective trials, even when controlling for response time (Retrospective: β = 0.15, p < .001; Prospective: β = 0.17, p < .001; see Figure 3). Next, we conducted a logistic regression with uniqueness as a predictor of risk, controlling for response time. We found that risk was positively related to image uniqueness (Retrospective: β = 0.18, p < .001; Prospective: β = 0.20, p < .001), indicating that the monkeys made higher bets as the sample image became more unique compared to all of the other stimuli in the database. Lastly, we tested whether visual features of the distractor array affected judgments. Given that similarity between a sample and distractors can only be judged once the subject has seen the distractors, we included only the retrospective trials when calculating these results. Our results show that confidence decreased as similarity between a sample image and distractors increased, even when controlling for response time (β = -0.16, p < .001).
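The form of these regressions can be illustrated with a short sketch: a logistic regression predicting the bet (1 = high, 0 = low) from a visual feature with response time as a covariate. Everything here is an assumption for illustration. The fitting routine is plain gradient ascent on the log-likelihood rather than whatever statistical software the study used, the variable names are hypothetical, and the toy values are made up and are not data from the study.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(rows: Sequence[Sequence[float]], y: Sequence[int],
                 lr: float = 0.1, n_iter: int = 2000) -> List[float]:
    """Fit logistic regression by gradient ascent on the mean log-likelihood.

    Returns [intercept, coefficient for each column of `rows`].
    """
    n, n_features = len(rows), len(rows[0])
    w = [0.0] * (n_features + 1)
    for _ in range(n_iter):
        grad = [0.0] * (n_features + 1)
        for x, target in zip(rows, y):
            z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))
            err = target - sigmoid(z)
            grad[0] += err
            for j, xj in enumerate(x):
                grad[j + 1] += err * xj
        w = [wj + lr * g / n for wj, g in zip(w, grad)]
    return w


# Toy values for illustration only (standardized complexity, standardized
# response time, and bet); these are NOT data from the study.
complexity = [-2, -1, 0, 1, 2, -2, -1, 0, 1, 2]
response_time = [1, -1, 1, -1, 1, -1, 1, -1, 1, -1]
high_bet = [0, 0, 0, 1, 1, 0, 1, 0, 1, 1]

predictors = list(zip(complexity, response_time))
weights = fit_logistic(predictors, high_bet)
intercept, beta_complexity, beta_rt = weights
```

In this setup a positive `beta_complexity` means that high bets become more likely as complexity increases with response time held fixed, which is the sense in which the reported effects hold "when controlling for response time."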
These results show that additional visual features of stimuli affect monkeys' confidence judgments. Although these effects of visual features (controlling for accuracy) may seem counterintuitive, the features do in fact correlate with accuracy (Complexity: β = 0.28, p < .001; Uniqueness: β = 0.34, p < .001; Similarity: β = 0.28, p < .001). This shows that even though the effects of these visual features are seen beyond the effects of accuracy, in most cases the heuristic of using these visual features would work quite well for predicting the monkeys' own accuracy.
Figure 3. Effect of additional visual features on confidence. A-C. In the retrospective condition, complexity and uniqueness were positively correlated with the proportion of high-risk bets, such that as the images were more complex, or more unique (compared to all other images in the image bank), subjects were more likely to make high bets. Average similarity between the sample and distractors was negatively correlated, such that as the sample and distractor images became more similar, subjects were less likely to make high bets. D, E. In the prospective condition complexity and uniqueness were positively correlated with the proportion of high-risk bets. For each regression only correct trials are included to control for any effects of accuracy. The blue lines are fits from logistic regressions and the gray regions are the 95% confidence intervals. The data were binned into 10 groups per variable for visualization purposes. The error bars represent the SE of the mean.
Lastly, it is important to note that these measures are only rough measures of a small subset of the visual features which could be used to make confidence judgments. There are a variety of additional visual fluency cues that have been shown to affect humans' confidence judgments (see Alter & Oppenheimer, 2009). Furthermore, studies have also shown that humans use many other types of (nonvisual) cues, such as conditions during testing (e.g., stimulus duration; Busey, Tunnicliff, Loftus, & Loftus, 2000), how easily an answer comes to mind (Benjamin et al., 1998), or success on previous recent trials (Martí et al., 2018). Together, the data suggest that humans and monkeys extract proxies for salience from the environment that they use as heuristics to judge the likely fidelity of their own memory and cognition.

Conclusion
Human and animal metacognitive parameters can be revealed by looking at the types of errors they make and the metacognitive illusions they suffer. In the current study, monkeys used visual contrast, complexity, and uniqueness to predict their memory for the stimuli. Subjects reported high confidence when those cues were high. Most of the time, heuristic cues to metacognition are successful. Brighter, louder, higher contrast, visually unique, and more complex stimuli perceptually pop out in sparse environments and will likely be remembered. However, this is not a perfect relation: under conditions with many interfering and salient stimuli the rule breaks down and subjects do not remember more salient stimuli better, yet they continue to think that they will.
Although this work shows that animals sometimes use cues like saliency of stimulus features to make confidence judgments, this does not mean that animals are incapable of accessing their own internal states directly. For example, we know that humans use saliency to make confidence judgments, yet it is also clear that humans are capable of direct self-monitoring, e.g., asking "what was I thinking?" (Kornell, 2014). Thus, monkeys' reliance on saliency cues does not provide evidence that monkeys are incapable of self-directed monitoring or of directly accessing their own memory states.
Lastly, a major finding of this work is the dissociation of the animals' metacognitive responses from rewards. The association between rewards and uncertainty responses has led to problems interpreting results because it leaves open the possibility of simpler associative strategies (Smith et al., 2008). Many studies have tried to rule out these alternative strategies by eliminating the reward contingencies in uncertainty responses through methods like deferred feedback or immediate transfer (Kornell et al., 2007). Here we take an alternative approach and show that rewards and uncertainty responses can be dissociated in another way. The animals were shown to base their judgments on visual features, like experimentally manipulated image contrast, which decreased metacognitive accuracy and therefore led to a decrease in overall rewards. This violates the basic foundations of both associative models of animal metacognitive decisions and behavioristic models of animal behavior more broadly (Jozefowiez, Staddon, & Cerutti, 2009; Le Pelley, 2012; Staddon, 2001). These associative models are based on the idea that "when confronted with a stimulus, the subject emits the behavior which is associated with the higher payoff" (Jozefowiez et al., 2009, p. 30). Our results instead show that monkeys base their metacognitive confidence choices on features which do not always lead to a greater number of rewards. Although the reliance on visual cues of salience to make metacognitive judgments is not an explicit (or direct) mechanism of metacognition, it is "introspective" in that the goal is to provide information to the animal about the status of internal processes so that parameters of internal functioning can be factored into behavior, even when it does not lead to a direct increase in rewards.