Mechanisms Underlying Cognitive Bias in Nonhuman Primates

Recent research in nonhuman animals highlights the exciting possibility that performance on cognitive bias tasks might indirectly measure an individual’s subjective, affective state. Subjects first learn to perform a conditional discrimination task with two differentially reinforced responses, and then intermediate, unreinforced stimuli are introduced. Differences in affective state have been related to changes in the response to these ambiguous stimuli in a variety of species. However, some research suggests that other learning effects may be influencing performance. In the current study, rhesus and capuchin monkeys were trained on a 40-step psychophysical discrimination task in which opposing responses made at opposite ends of the discrimination spectrum resulted in one or four pellets. Once at criterion, intermediate levels were introduced. With continued exposure – and no manipulation of affective state – subjects shifted from an ambiguous classification of intermediate stimuli towards classifying the majority of these probe stimuli as the less positive response option. When the reinforcement contingencies were switched, the biased responding on the task also shifted significantly. These findings suggest that other mechanisms, such as hormonal changes and/or contrast effects, may also underlie biased responding. As this field develops, it is critical that the mechanisms underlying cognitive bias in nonhuman animals be thoroughly investigated.

One of the goals of providing enrichment to captive animals is to improve and optimize animal welfare and well-being (Maple & Perdue, 2013). A wide variety of enrichment techniques, including cognitive enrichment, have been developed in zoos, laboratories, and other animal facilities. One of the biggest remaining challenges is finding reliable and accurate ways of measuring an animal's response to such enrichment, as much of this experience is subjective and hard to measure objectively. A growing field has begun to investigate the relationship between emotion and cognition. Research indicates that each can influence the other in a causal manner (Mathews & MacLeod, 2002), and this relationship might offer insights into an animal's subjective response to various aspects of the environment, including enrichment. The term "cognitive bias" recently has been applied to describe the influence of an individual's emotional state, sometimes referred to as affect, on cognitive processes. In humans, cognitive biases have been studied in areas such as judgment making, attention, and memory (e.g., Mathews & MacLeod, 1994; Paul, Harding, & Mendl, 2005). In general, individuals in a positive affective state, or a "good mood," tend to interpret ambiguous information in a more positive manner, be more attentive to positive information when presented with both positive and negative stimuli, and better remember information that has a more positive valence. The opposite is true for individuals in a negative affective state, including long-term states such as anxiety or depression. For example, anxious individuals exhibit an attention bias towards threatening stimuli over non-threatening stimuli (MacLeod, Mathews, & Tata, 1986), and individuals will make more optimistic probability judgments when in a positive state (Nygren et al., 1996). This area of research has provided the basis for an interesting line of investigation into potential similar cognitive biases in nonhuman animals (hereafter, animals).
Not only does cognitive bias research in animals seemingly provide support for an evolutionary foundation of these biases in humans, but it also may allow for an indirect measure of an animal's subjective emotional experience. Emotional experience is commonly described by two dimensions: 1) valence, which relates to how positive or negative an experience feels to an individual, and 2) arousal, or physiological responsiveness or alertness. In animals, emotional state is primarily assessed using arousal-based measures such as cortisol levels (e.g., Paul et al., 2005). On the other hand, estimates of valence rely almost exclusively on linguistic reports in humans and are not usually measured in animals. However, valence is a critical component to understanding animal emotion because two very different experiences could elicit similar states of arousal (e.g., mating or being attacked), yet have very different valence or subjective meaning to the individual (Mendl, Burman, Parker, & Paul, 2009; Paul et al., 2005). Cognitive bias tasks provide a unique way to potentially tap into this critical aspect of emotional experience, and, thus, many studies have attempted to assess such biases in a wide variety of species (dogs (Canis lupus familiaris): Mendl et al., 2010; goats Burman, Parker, Paul, & Mendl, 2008a, b; Enkel et al., 2010; Harding, Paul, & Mendl, 2004; sheep (Ovis aries): Doyle, Fisher, Hinch, Boissy, & Lee, 2010; Doyle, Vidal, Hinch, Fisher, Boissy, & Lee, 2010; starlings (Sturnus vulgaris): Brilot, Asher, & Bateson, 2010; Matheson, Asher, & Bateson, 2008).
In their seminal study, Harding et al. (2004) trained rats on a go/no-go task in which one tone (i.e., positive cue) signaled the delivery of food following a lever press, but another tone (i.e., negative cue) signaled an aversive event if subjects pressed the lever. Then, subjects were housed in either predictable or unpredictable (i.e., positive or negative) conditions for nine days, and three novel intermediate "probe" tones that fell between the positive and negative cues were introduced. Subjects housed in unpredictable conditions made fewer positive responses to the ambiguous tones and were slower to respond than individuals housed in predictable conditions, suggesting reduced anticipation of positive events for the individuals housed in negative conditions. This study introduced the exciting possibility that behavioral responses on a cognitive bias task might yield important information about an animal's subjective experience.
However, given this go/no-go approach, it is possible that an animal in a negative state (i.e., housed in unpredictable conditions) might simply show a generalized decrease in responding that appears to reflect a pessimistic interpretation of the ambiguous cues, or reduced anticipation of a positive event (Brilot et al., 2010). Therefore, more recent tasks have required two distinct responses rather than the presence or absence of a single response (e.g., Brilot et al., 2010; Enkel et al., 2010; Matheson et al., 2008). In these tasks, animals are typically trained to respond in distinct ways to two cues, one of which results in a positive outcome and one that results in a negative (or less positive) outcome. Then the subject is presented with intermediate cues (falling on a spectrum between the two originally trained cues). The response indicates whether the subject interprets these ambiguous cues as positive or negative. If a subject perceives the ambiguous cue as indicative of a positive outcome, it should respond in the way that it was initially trained to the positive cue, and vice versa. Once baseline performance is established, positive (e.g., enrichment) or negative (e.g., unpredictable housing conditions) interventions are used to induce changes in the subject's affective state, and subsequent performance on a judgment task is measured. Brilot and colleagues (2010) presented starlings with a task in which background color indicated whether a positive or negative (or relatively less positive) outcome could be obtained by searching under one of two stimuli (lids with symbols covering petri dishes). In the presence of a dark background, subjects should have uncovered the dish covered by a particular stimulus (S+) to receive a large reward (three mealworms). In the presence of a light background, subjects should have uncovered the dish covered by the other stimulus (S-) to receive a smaller reward (one mealworm). Then, intermediate background colors were introduced and responses to either the S+ or S- were recorded. The authors then manipulated housing condition (enriched condition included "natural wood branches; water for bathing; and a tray filled with bark for natural probing opportunities", p. 725) in an effort to examine potential changes in the anticipation of positive or negative events (Brilot et al., 2010). In contrast to predictions, there was no effect of housing condition (enriched versus unenriched) on performance. Interestingly, the authors also reported that some of the birds rapidly learned that the ambiguous stimuli were not associated with reinforcement, thus rendering the "ambiguous" stimuli unambiguous indicators that responding would not be reinforced. This learning led to a decrease or complete absence of responding in the probe trials across sessions.
The findings of the Brilot et al. (2010) study present an important alternative explanation for many of the findings of existing cognitive bias studies. Specifically, subjects may be rapidly learning the meaning of ambiguous stimuli, and reductions in rates of responding or latency to respond may reflect this learning, rather than affective state-induced cognitive bias. Given the finding that mild stress can improve memory and cognition (see Mendl, 1999 for a review), animals in minimally stressful states (i.e., unenriched conditions) may simply be faster to learn these contingencies than control animals or ones in a positive state. In a re-examination of data from the Bateson and Matheson (2007) study, Brilot et al. (2010) found that the birds moving from enriched to unenriched housing conditions did show a decrease in lid-flipping behavior, consistent with a cognitive bias interpretation. Critically, however, birds that were shifted from unenriched to enriched conditions also showed a decrease in lid-flipping behavior, suggesting that a similar learning effect to that reported in Brilot et al. (2010) may have contributed substantially to the outcome of that study.
In line with Brilot et al.'s (2010) rapid learning findings, Doyle, Vidal, et al. (2010) investigated the influence of repeated testing on a go/no-go judgment bias task. Sheep first learned that approaching a bucket in one location would result in food, while visiting a bucket in the opposite corner of the room would result in visual exposure to a dog (an aversive stimulus). Then, buckets were placed in ambiguous locations between the two trained areas and the response (approach or no approach) was recorded. The authors did not manipulate affective state, but rather allowed repeated exposure to the ambiguous locations to assess the potential impact of learning on a judgment task. Over time, subjects were less likely to respond to the ambiguous locations. This finding could be interpreted as increasing pessimism over time; however, the authors made no direct or intentional manipulation of environmental or housing conditions that would suggest a change in affective state. Rather, the authors interpreted their findings as the result of rapid learning of the ambiguous stimuli and raised some questions about the validity of judgment bias tasks with repeated probe stimuli (Doyle, Vidal, et al., 2010). Of course, it is possible that factors outside of experimenter manipulation may have an influence on affective state even in a controlled setting. Nonetheless, given that much of the literature has shifted away from go/no-go tasks in favor of conditional discrimination tasks in which two distinct responses are required, it is critical to examine how these learning effects might manifest in these cognitive bias tasks.
A conditional discrimination judgment task was developed for monkeys using a computerized format that might offer some insight into the issue of repeated exposure to stimuli. Rhesus macaques (Macaca mulatta) and capuchin monkeys (Cebus apella) in a laboratory setting were trained on a 40-step psychophysical discrimination task in which an appropriate response to one end of the spectrum (level 1 for half of the monkeys and level 40 for the other half) yielded one pellet, and the appropriate response to the other end of the spectrum (level 40 or level 1, respectively) yielded four pellets. Once monkeys reached a performance criterion on the primary discrimination (all level 1 versus level 40 trials), monkeys experienced the 38 intermediate, unreinforced levels. Affective state was not manipulated, and changes in the classification of ambiguous stimuli across time were measured. If changes in affective state were necessary to drive changes in cognitive bias, there would be no expectation that the classification of ambiguous stimuli would change with repeated exposure. On the other hand, changes in classification without any intentional manipulation of affect would introduce the possibility that other mechanisms may influence performance on cognitive bias tasks.

Subjects
Seven rhesus monkeys (Macaca mulatta) and four capuchin monkeys (Cebus apella) housed at the Language Research Center (LRC) at Georgia State University participated in this experiment. These monkeys had not had any previous experience with cognitive bias tasks, but had encountered unreinforced stimuli in other tasks. Capuchin monkeys were group housed but separated for testing. Rhesus monkeys were individually housed with constant visual and auditory access to other monkeys. All monkeys were fed manufactured chow and various fruits and vegetables daily between 1600 and 1800 hrs. This study complied with protocols approved by the Georgia State University IACUC. All animals had training and experience with the joystick-testing system. All procedures were performed in full accordance with the USDA Animal Welfare Act and conformed to the "Guidelines for the use of laboratory animals."

Materials
The monkeys were tested using the LRC's Computerized Test System comprising a personal computer, digital joystick, color monitor, and pellet dispenser (Evans, Beran, Chan, Klein, & Menzel, 2008; Richardson, Washburn, Hopkins, Savage-Rumbaugh, & Rumbaugh, 1990). Monkeys manipulated the joystick to produce isomorphic movements of a computer-graphic cursor on the screen. Contacting appropriate computer-generated stimuli with the cursor brought them a predetermined number of 45 mg (capuchins) or 94 mg (macaques) banana-flavored chow pellets (Bio-Serv, Frenchtown, NJ) using a pellet dispenser interfaced to the computer through a digital I/O board (PDISO8A; Keithley Instruments, Cleveland, OH). All monkeys had previously participated in multiple psychological experiments involving this computerized test system. The software for the procedure was written in Microsoft Visual Basic 6.0.

Procedure
Monkeys initially were trained on a conditional discrimination task. An ellipse was presented on the top half of the computer screen (Figure 1A). The height of the vertical axis of the ellipse varied from 8.8 mm to 44.1 mm across trials to create 40 discrete stimulus levels, each of which was 0.88 mm taller than the previous level. Initially, subjects were presented with only two stimuli: level 1 and level 40 of the continuum. There were also two visually distinct rectangular response icons available on the bottom half of the screen for each trial, and a cursor appeared in between these response locations. One of the response icons corresponded to the Level 1 stimulus and the other response icon corresponded to the Level 40 stimulus. If the Level 1 stimulus appeared at the top of the screen, moving the cursor to the "small" icon was reinforced with the delivery of food pellets, and if a Level 40 stimulus appeared, responding to the "large" icon was reinforced. The small and large icons remained in the same spatial location on the screen throughout testing. Critically, the number of pellets delivered for a correct response to these two icons differed. For half of the subjects (macaques: Han, Lou, Luke, & Murphy; capuchins: Griffin & Wren), a correct "small" response yielded one pellet and a correct "large" response yielded four pellets. For the remaining subjects (macaques: Obi, Gale, & Chewie; capuchins: Drella & Lily), a correct "small" response yielded four pellets, and a correct "large" response yielded one pellet. Incorrect responses resulted in a 10-second timeout period during which the screen remained blank. Monkeys remained in the initial training phase until they made 24 correct responses in their most recent 30 consecutive trials (80%). In the subsequent testing phases, monkeys had to reach this criterion with the anchor stimuli (Level 1 and Level 40) at the onset of each session before probe stimuli were introduced.
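The stimulus continuum and the training criterion described above can be sketched in a few lines of Python. This is only an illustrative sketch and not the original Visual Basic software: the function names are hypothetical, equal 0.88 mm steps are assumed, and the criterion is assumed to require a full window of 30 trials. Note that with equal 0.88 mm steps starting at 8.8 mm, level 40 comes out near 43.1 mm rather than the reported 44.1 mm, so the exact spacing used in the task should be treated as uncertain.

```python
from collections import deque

N_LEVELS = 40
MIN_HEIGHT_MM = 8.8   # height of level 1, as reported
STEP_MM = 0.88        # per-level increment, as reported (assumed constant)


def ellipse_height_mm(level: int) -> float:
    """Vertical-axis height for a stimulus level (1..40), assuming equal steps."""
    if not 1 <= level <= N_LEVELS:
        raise ValueError("level must be between 1 and 40")
    return MIN_HEIGHT_MM + (level - 1) * STEP_MM


def trials_to_criterion(outcomes, window=30, required=24):
    """Return the 1-based trial on which criterion is first met, else None.

    Criterion: at least `required` correct responses within the most recent
    `window` trials. Assumption: a full window must have been completed.
    """
    recent = deque(maxlen=window)  # sliding window of recent outcomes
    for trial_number, correct in enumerate(outcomes, start=1):
        recent.append(bool(correct))
        if len(recent) == window and sum(recent) >= required:
            return trial_number
    return None
```

For example, a monkey that misses its first six trials and is then correct on every trial would meet the 24-of-30 criterion on trial 30 under these assumptions.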

Initial discrimination condition.
After monkeys reached criterion in these test sessions, 30% of trials were unreinforced probe trials randomly selected from levels 2-39 (see Figure 1B). Responses to probe trials were never reinforced or punished and resulted in a 2 s intertrial interval (ITI) before the next trial began. The remaining 70% of trials were anchor stimuli. Incorrect classification of anchor trials resulted in a 10-second timeout period, and correct responses were reinforced with either one or four pellets (as in the training phase). Sessions consisted of 500 trials, and each monkey completed ten sessions of the initial discrimination in which all 500 trials were completed.
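A test session of this kind could be generated as in the following Python sketch. It is a hypothetical reconstruction, not the software used in the study: it assumes each trial is independently drawn as a probe with probability 0.30 (the text does not say whether the 30% was exact or probabilistic), that the two anchors are equally likely, and that the "small" = 1 pellet, "large" = 4 pellets assignment applies. All names are illustrative.

```python
import random

ANCHOR_LEVELS = (1, 40)
PROBE_LEVELS = tuple(range(2, 40))  # intermediate levels 2..39


def make_session(n_trials=500, probe_prob=0.30, seed=None):
    """Build a list of trials; each trial is a dict with a level and a probe flag."""
    rng = random.Random(seed)
    trials = []
    for _ in range(n_trials):
        if rng.random() < probe_prob:
            trials.append({"level": rng.choice(PROBE_LEVELS), "probe": True})
        else:
            trials.append({"level": rng.choice(ANCHOR_LEVELS), "probe": False})
    return trials


def outcome(trial, response, pellets_small=1, pellets_large=4):
    """Return (pellets, timeout_seconds) for a 'small' or 'large' response.

    Probe trials are never reinforced or punished. Anchor trials pay the
    assigned number of pellets when correct and give a 10 s timeout otherwise.
    """
    if trial["probe"]:
        return (0, 0)
    correct = "small" if trial["level"] == 1 else "large"
    if response != correct:
        return (0, 10)
    return (pellets_small if correct == "small" else pellets_large, 0)
```

Swapping the `pellets_small` and `pellets_large` arguments gives the payout assignment used for the other half of the subjects.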
Reversal condition.
To determine the monkeys' sensitivity to the reinforcement contingencies of the task, a second condition was conducted to see if the bias shifted when the payout structure was reversed. After completing the initial discrimination test, the reinforcement contingencies were reversed for each monkey ("small" = 1 pellet; "large" = 4 pellets → "small" = 4 pellets; "large" = 1 pellet, and vice versa). The same responses were correct in the initial and reversal conditions (i.e., touch small for small, large for large), and the only change in the reversal was that the magnitude of reinforcement changed for each response. Monkeys again had to reach criterion on the anchor stimuli (24 of 30 trials correct) before the same probe stimuli were introduced. Sessions consisted of 500 trials, and monkeys completed a total of 15 sessions of the reversal discrimination.

Data Analysis
To visualize performance on the initial trials, the first two sessions (typically 500 trials each) were collapsed into bins showing 4 levels per bin. Two sessions were selected in order to provide insight into the very early classification patterns, but this yielded a relatively small number of responses for many of the levels (e.g., some levels might have only one response). Thus, the data were collapsed into these 4-level bins to gain a better graphical depiction of how initial responses were distributed. This is an important aspect of the study because it shows whether there was a bias in the initial classification responses or whether they tended to be randomly distributed.
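The binning step can be illustrated with a short Python sketch. It assumes, as the simplest reading of the description, that the 40 levels are split into ten consecutive bins (levels 1-4, 5-8, ..., 37-40) and that each response is coded by whether the "large" icon was chosen; the function names and data layout are hypothetical.

```python
def bin_index(level, levels_per_bin=4):
    """Zero-based bin for a stimulus level (levels 1-4 -> 0, ..., 37-40 -> 9)."""
    return (level - 1) // levels_per_bin


def binned_proportions(responses, levels_per_bin=4, n_levels=40):
    """Proportion of 'large' responses per bin.

    `responses` is an iterable of (level, chose_large) pairs.
    Bins with no responses are reported as None rather than 0.
    """
    n_bins = n_levels // levels_per_bin
    counts = [0] * n_bins
    large_counts = [0] * n_bins
    for level, chose_large in responses:
        b = bin_index(level, levels_per_bin)
        counts[b] += 1
        if chose_large:
            large_counts[b] += 1
    return [lc / c if c else None for lc, c in zip(large_counts, counts)]
```

Pooling four adjacent levels in this way trades stimulus resolution for more responses per plotted point, which is the motivation given above.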
Then the proportion of responses to the higher and lower paying response option was recorded for all stimulus levels and collapsed across the last 2000 trials for each condition. For half of the subjects, these data were transposed for analytical purposes so that all responses were on the same scale and reward magnitude was equated across subjects. Specifically, for graphical and analytical purposes, classifying a level 1 as the small response icon yielded one pellet, and classifying a level 40 as a large yielded four pellets, and vice versa for the reversal. In other words, following transposition of the data, a level 1 corresponds to the smaller reward for all subjects and the level 40 corresponds to the larger reward for all subjects. In actuality, these contingency arrangements were counterbalanced across subjects. For each subject, a best fit regression line was used to determine the theoretical "crossover point" or "point of subjective equality" at which the responses were evenly allocated between the high and low payout response options in the initial and reversal conditions. Given that regression lines were used to estimate the crossover point, it was possible that the value could exceed the range of stimulus levels. If this occurred, the minimum or maximum level (0 or 40) was assigned for that individual. Both species exhibited the same pattern of responding, so data were collapsed for all analyses, although performance is shown for each species to indicate this similarity. A one-sample t-test was used to compare the crossover point on the initial discrimination to the objective middle point: level 20.5. A paired samples t-test was used to compare crossover points in the initial discrimination and the reversal condition.
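The crossover estimate can be sketched as follows. The text does not specify the form of the regression, so this sketch assumes an ordinary least-squares straight line fitted to the proportion of high-payout responses at each level, solved for the level at which that proportion equals 0.5 and then limited to the stated bounds of 0 and 40. The function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def crossover_point(levels, prop_high, lower=0.0, upper=40.0):
    """Level at which the fitted line predicts 50% high-payout responses.

    Estimates falling outside [lower, upper] are assigned the nearest bound,
    following the rule described in the text.
    """
    slope, intercept = fit_line(levels, prop_high)
    if slope == 0:
        raise ValueError("flat regression line: no crossover point exists")
    estimate = (0.5 - intercept) / slope
    return min(max(estimate, lower), upper)
```

Under this sketch, a subject whose responses rise evenly from all "small" at level 1 to all "large" at level 40 has a crossover of 20.5, the objective midpoint; a crossover above 20.5 means that more than half of the continuum is being classified as the option associated with level 1.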

Early Performance
Initial patterns of performance are shown in Figures 2 and 3. Subjects quickly learned the initial discrimination. When first encountering the neutral probe stimuli, most monkeys showed a relatively unbiased pattern of responding, as demonstrated by a fairly even distribution of responses in the middle levels of the continuum.

Initial Discrimination Condition
Next, the mature performance, as reflected in the last 2000 trials, was examined. The crossover point differed significantly from the objective middle point of the stimulus continuum, t(10) = 10.244, p < 0.001. As shown in Figure 4A (capuchin monkeys) and Figure 5A (rhesus monkeys), the responses were biased towards the lower payout option. In other words, subjects classified the majority of the intermediate, non-anchor, probe stimuli as the "small" payout option even though half of those stimuli were objectively large stimuli.

Reversal Condition
In the reversal condition, only the reinforcement contingencies were reversed, not the actual classification response. Thus, the same exact response was still objectively correct, but the payout for the two correct anchor responses differed, and this led to a reversal of the biases to again favor responses to the lower payout option (see Figure 4B and Figure 5B). Crossover points shifted significantly in the reversal condition compared to the initial discrimination, t(10) = 4.682, p = 0.001 (initial discrimination: mean = 35.14; reversal discrimination: mean = 14.75).
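For readers who wish to run the same kind of comparison on their own crossover estimates, the two test statistics can be computed as in the sketch below. This is an illustration only: it uses the Python standard library, returns the t statistic and degrees of freedom without a p-value, and the function names are hypothetical. Library routines such as `scipy.stats.ttest_1samp` and `scipy.stats.ttest_rel` also return p-values.

```python
import math
import statistics


def one_sample_t(values, mu):
    """t statistic and degrees of freedom for H0: population mean equals mu."""
    n = len(values)
    standard_error = statistics.stdev(values) / math.sqrt(n)
    return (statistics.mean(values) - mu) / standard_error, n - 1


def paired_t(first, second):
    """Paired-samples t: a one-sample test of the pairwise differences against 0."""
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    differences = [a - b for a, b in zip(first, second)]
    return one_sample_t(differences, 0.0)
```

With 11 subjects, either test has 10 degrees of freedom, which matches the t(10) values reported here. The one-sample test would take each subject's initial crossover point and mu = 20.5; the paired test would take each subject's initial and reversal crossover points.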

Discussion
In a differentially reinforced judgment task, monkeys showed biased responding after continued exposure to the task. At first, subjects were mostly neutral in their classification of ambiguous stimuli, as visually depicted in Figures 2 and 3. However, at the end of the task, independent of any direct manipulation of affect, performance was significantly biased. Intermediate stimuli that were initially untrained and ambiguous were classified as the less positive response at the end of testing. This pattern was consistent across individuals of both species tested. Following a reversal of the reinforcement contingencies, the crossover point again shifted significantly, suggesting that the bias very possibly resulted from changes in the payout structure rather than changes in affective state. There was no explicit or known manipulation of affective state that would account for the shift in classification in the traditional cognitive bias paradigm, nor is this finding in line with an explanation that these animals are generally experiencing negative trait affect, given that the initial classifications appeared to be mostly neutral. It is important to note that these results are not conclusive, but they introduce the possibility that other mechanisms might be at play in this task, and further consideration is needed in future research.
As enrichment technology improves and cognitive enrichment techniques advance our opportunities for providing a stimulating and challenging environment to captive animals (e.g., Perdue, 2016; Tarou, Kuhar, Adcock, Bloomsmith, & Maple, 2004; Yamanashi & Hayashi, 2011), the techniques for assessing various forms of enrichment are equally important. Cognitive bias tasks offer one potential avenue for assessing these efforts, but the present findings highlight the need to critically assess some of the interpretations of data from the rapidly growing body of cognitive bias research. The impetus of this field has been the possibility that cognitive bias tasks could indirectly measure affective state, an otherwise challenging construct to measure in nonhuman animals. The fundamental assumption in this field is that altered affective state is the mechanism through which cognitive biases emerge in nonhuman animals, an assumption based on this well-studied relationship in humans (Mathews & MacLeod, 1994). At this time, a careful assessment of the mechanisms underlying performance on cognitive bias tasks in animals is critically important, and alternative explanations should be carefully considered. The current findings bring to light the possibility that, in this type of task, there may be additional influences on performance.
Specifically, many studies reporting "pessimistic" cognitive biases relied on the performance of individuals on a cognitive bias task after experiencing a negative event or change in the environment (e.g., Bateson et al., 2011; Burman et al., 2008b; Doyle et al., 2011). Animals in more negative conditions shifted to classifying ambiguous stimuli as the more negative response option compared to controls, which potentially reflected an underlying negative affective state. However, in line with several previous studies (Brilot et al., 2010; Doyle, Vidal, et al., 2010), the present results indicate that these biases may emerge over time as subjects learn that the so-called "ambiguous stimuli" do not yield reinforcement, and not necessarily as a result of a negative shift in affective state.
It has been suggested that animals in a mildly stressful state might experience enhanced learning of these contingencies (Brilot et al., 2010), and this logic could potentially be extended to related findings in this area. Several studies have reported a "positive cognitive bias" (e.g., Brydges, Leach, Nicol, Wright, & Bateson, 2011; Doyle, Fisher, et al., 2010; Matheson et al., 2008) in which positive changes in the environment are related to a more "optimistic" interpretation of ambiguous stimuli. Further, while much of the research has focused on manipulations to state affect (i.e., temporarily induced positive or negative states), there is also research suggesting that an individual's trait affect might influence performance on cognitive bias tasks. Animals that exhibit more stereotypic behavior show more negatively biased responding than those that exhibit less or none (e.g., Brilot et al., 2010; Mendl et al., 2010; Pomerantz, Terkel, Suomi, & Paukner, 2012). However, these very results could potentially be interpreted in the same manner suggested by Brilot and colleagues (2010). Mildly stressed animals, either due to an experimenter-induced manipulation of state affect or intrinsically higher levels of trait stress, may learn the task contingencies more rapidly and appear more "pessimistic." Conversely, animals in a positive state may not have the benefit of the learning facilitation induced by mildly stressful situations, and appear relatively more "optimistic." This possibility remains highly speculative at this point, and more research is needed to address it fully. Recent studies of cognitive bias have begun to measure cortisol levels (Pomerantz et al., 2012; Sanger, Doyle, Hinch, & Lee, 2011), a physiological indicator of stress, which may provide insight into a direct biological mechanism for the enhanced learning of these contingencies. Additional insight into these issues might arise from an investigation of reaction times to various stimuli as a task similar to this one progresses.
The present results additionally indicated that at the end of the task, all subjects showed a bias towards classifying "ambiguous" stimuli as the more negative payout response. Previous work has shown that in a go/no-go task subjects will stop responding to the ambiguous stimuli (i.e., treat the ambiguous stimuli like negative stimuli), and the present results indicate that in a task requiring two discrete responses, the trend is to label the ambiguous stimuli as the less positive response option. Although the computerized task in the current study involved many more trials than the standard manual tasks, these results clearly demonstrated that, regardless of how rapidly an individual developed the bias, it was consistently in the same direction.
The directional bias observed in the present study may have emerged as a result of contrast effects. Broadly speaking, contrast effects are shifts in judgment that result from simultaneous or successive exposure to stimuli of greater or lesser value along the same dimension. Incentive contrast refers to one such dimension: specifically, when shifts in the outcome or incentive for a response result in altered task performance. A variety of incentive factors may influence performance on tasks, including shifts in the magnitude, quality, and delay to reward or schedule of reinforcement (Dachowski & Brazier, 1991). The finding that performance deteriorates when the incentive shifts to a less positive outcome (e.g., a downshift from 32% saccharin to 4% for the same response) compared to controls is referred to as successive negative contrast (Mustaca, Bentosela, & Papini, 2000). This type of effect may drive the directional bias observed in all of our monkeys. Specifically, expecting to get four pellets for classifying a perceptually "large" stimulus as the large response option and receiving nothing (in the unreinforced probe trials) is a greater contrast than expecting to get one pellet and receiving none. One might expect more suppressed responding to the option suffering the larger negative contrast, and this shift would inherently bias an animal towards the lower payout response.
More work will be necessary to explore the influence of contrast effects on cognitive bias tasks and possible physiological mechanisms that may alter learning of task contingencies or sensitivity to contrast effects. For example, Bentosela, Ruetti, Muzio, Mustaca, and Papini (2006) found that administering corticosterone after a downshift in saccharin concentration facilitated the negative contrast effect in rats, highlighting a potential mechanism through which cognitive bias might emerge. Another important step will be to manipulate levels of arousal through non-affective stimulation (e.g., an ACTH challenge, see Wasser et al., 2000) and explore the possibility that arousal, independently from valence, might influence performance on cognitive bias tasks. Implementing reversal designs (ABAB) rather than pre-post designs (AB) may also help to disentangle learning effects from cognitive bias.
Another possibility is that these findings do reflect elements of cognitive bias in a traditional sense. Experiencing the ambiguous and unreinforced stimuli might in and of itself induce a negative affective state in the animals that underlies the observed pattern of responding. This possibility, along with the previously described alternatives, should be intentionally and directly teased apart in future research as the field progresses. This is an exciting, growing area of research that has great potential to tap into the subjective experiences of nonhuman animals and explore the evolutionary history of the relationship between cognition and affect. The consistency of observed bias, its stability across diverse taxa, and the range of methodologies effectively used suggest that the effect is robust. However, the underlying mechanism, which is the driving interest behind much of this work, may be more complex or multifaceted than previously thought and should be thoroughly explored as this field develops.

Figure 1. A depiction of the trial set up as displayed on the computer. Panel A shows the two anchor stimuli trials (levels 1 and 40) and Panel B shows 3 possible levels of the unreinforced probe stimuli.