Chimpanzees Rarely Settle on Consistent Patterns of Play in the Hawk Dove, Assurance, and Prisoner’s Dilemma Games, in a Token Exchange Task

Games derived from experimental economics can be used to directly compare decision-making behavior across primate species, including humans. For example, the use of coordination games, such as the Assurance game, has shown that a variety of primate species can coordinate; however, the mechanism by which they do so appears to differ across species. Recently, these games have been extended to explore anti-coordination and cooperation in monkeys, with evidence that they play the Nash equilibria in sequential games in these other contexts. In the current paper, we use the same methods to explore chimpanzees’ behavior in the Assurance Game; an anti-coordination game, the Hawk Dove game; and a cooperation game with a temptation to defect, the Prisoner’s Dilemma game. We predicted that they would consistently play the Nash equilibria, as do the monkeys, and that, as in previous work, the subjects’ level of experience with cognitive experiments would impact performance. Surprisingly, few of our pairs consistently played the same outcome (i.e., no statistically significant preferences), although those who did showed evidence consistent with Nash equilibria play, the same pattern seen more consistently in the monkeys. We consider reasons for their inconsistent performance; for instance, perhaps it was due to lack of interest in a task that rewarded them almost every trial no matter what option they chose, although this does not explain why they were inconsistent when the monkeys were not. A second goal of our study was to ascertain the effects of exogenous oxytocin in their decision making in one population. In line with recent work showing complex effects of oxytocin on social behavior, we found no effect on subjects’ outcomes. We consider possible explanations for this as well.

Perhaps not surprisingly, given their tendency to cooperate in field observations and/or experimental contexts, all three of these species choose to play the payoff-dominant NE after sufficient exposure in iterated games, although intriguing differences have emerged among species.
One set of studies comparing these species and humans, using nearly identical procedures, highlighted the fact that different mechanisms are likely underlying their decisions (Brosnan et al., 2011; Parrish et al., 2014; reviewed in Brosnan, Beran, Parrish, Price, & Wilson, 2013). Capuchin monkeys chose to play the payoff-dominant NE only when they could see their partner's choices as they made them (i.e., a sequential game), and could not maintain this pattern, even once it was established, if choices were subsequently hidden until both subjects had made their selection (i.e., a simultaneous game). This indicates that capuchins may be solving the task by matching their partners, which could be a sufficiently beneficial strategy for a species that lives in small, highly cohesive social groups. Rhesus monkeys and chimpanzees, on the other hand, played the payoff-dominant NE even when choices were hidden, although a substantial number of chimpanzees failed to settle on any consistent strategy. Further testing with rhesus monkeys and humans, in which they played against a simulation that was preprogrammed to specific strategies, indicated that rhesus monkeys showed a bias toward choosing the Stag option (which resulted in the highest payout most of the time), whereas humans were equally successful, but did so by matching their choices to those of their partner (i.e., probability matching; Parrish et al., 2014).
Two other interesting points also emerged from this work, both of which indicate that chimpanzees may do best when they can see their partner's decision as it is made. First, Bullinger, Wyman and colleagues (2011) found evidence of a "leader-follower" dynamic, in which the second player's decision was influenced by the first player's actions. In a second study, Duguid, Wyman, Bullinger, Herfurth-Majstorovic, and Tomasello (2014) explicitly found that coordination declined when it was more difficult to see the partner. Indeed, when given a choice between coordinating with a partner and simply working on their own, chimpanzees preferred the latter, providing further evidence that coordination can be difficult.
Emerging work explores two other decision situations, anti-coordination (as seen in the Hawk Dove Game; HDG) and cooperation when there is a temptation to defect (as seen in the Prisoner's Dilemma game; PD). The former is an anti-coordination game in which players must choose between two options, Hawk and Dove. The payoff matrix (Table 1) is such that choosing Hawk results in the highest individual payout, but only if the partner plays Dove. If both subjects choose Hawk, neither subject is rewarded. A Dove choice results in a low outcome no matter what, but it is lower if the partner plays Hawk. There are two asymmetric NE, the uncoordinated Hawk/Dove and Dove/Hawk outcomes, and the payoff dominant outcome is to alternate between the NE, such that each partner takes turns receiving the best payoff. Smith, Leverett, & Brosnan, unpublished data). This was contrary to our predictions for two reasons; first, we had predicted that the capuchins would have difficulty finding and persisting on the NE in the Hawk Dove Game if they used matching to solve such decision tasks (the asymmetric NE requires anti-matching). Second, in this case, unlike in the Assurance Game, rhesus did not play the NE consistently when they could not see their partner's choices. Somewhat to our surprise, however, a substantial proportion of humans (approximately half) found the payoff dominant alternating Nash solution even with no instruction.
The Prisoner's Dilemma (PD; Table 1) differs from the others in that the best payoff depends on whether the game is one-shot (that is, subjects make a single choice) or repeated. The NE is to defect, because cooperating risks a zero outcome if the partner defects, and in a one-shot game, it is always in the partner's best interests to defect, no matter what choice the subject makes. Nonetheless, mutual cooperation results in a better average payoff than mutual defection, and in repeated play (called the iterated Prisoner's Dilemma) it is therefore better for both players to establish a cooperative relationship. This repeated version of the Prisoner's Dilemma is arguably the more relevant for long-lived organisms that spend their lives in the same group, such that their interactions are enmeshed in repeated interactions in relationships with known individuals. Indeed, when humans are presented with iterated versions of the Prisoner's Dilemma, they often diverge from game theory predictions, cooperating more than expected (Hayashi, Ostrom, Walker, & Yamagishi, 1999). Experiments with non-human animals generally find that these species defect (blue jays: Clements & Stephens, 1995; pigeons: Green, Price, & Hamburger, 1995), although recent evidence indicates that cooperation may occur. In one recent study on macaques, despite the fact that they mutually defected more often than they mutually cooperated, cooperation was more likely after trials in which they both cooperated (Haroush & Williams, 2015).
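The equilibrium logic described for the Hawk Dove Game and the Prisoner's Dilemma can be checked mechanically. The following is a minimal Python sketch that enumerates pure-strategy Nash equilibria in a two-option game. The payoff values are illustrative assumptions chosen only to match the verbal descriptions above (zero for Hawk/Hawk, zero for cooperating against a defector, a maximum of four rewards); they are not taken from Table 1, and the names `HAWK_DOVE`, `PRISONERS_DILEMMA`, and `pure_nash_equilibria` are hypothetical.

```python
from itertools import product

# Payoffs are (first player, second player).
# ASSUMED values for illustration only; not copied from Table 1.
HAWK_DOVE = {
    ("Hawk", "Hawk"): (0, 0),
    ("Hawk", "Dove"): (4, 1),
    ("Dove", "Hawk"): (1, 4),
    ("Dove", "Dove"): (2, 2),
}
PRISONERS_DILEMMA = {
    ("Cooperate", "Cooperate"): (3, 3),
    ("Cooperate", "Defect"): (0, 4),
    ("Defect", "Cooperate"): (4, 0),
    ("Defect", "Defect"): (1, 1),
}


def pure_nash_equilibria(game):
    """Return outcomes where neither player gains by switching alone."""
    actions = sorted({first for first, _ in game})
    equilibria = []
    for a, b in product(actions, repeat=2):
        # First player cannot do better given the second player's choice b.
        first_ok = all(game[(a, b)][0] >= game[(alt, b)][0] for alt in actions)
        # Second player cannot do better given the first player's choice a.
        second_ok = all(game[(a, b)][1] >= game[(a, alt)][1] for alt in actions)
        if first_ok and second_ok:
            equilibria.append((a, b))
    return equilibria


print(pure_nash_equilibria(HAWK_DOVE))
print(pure_nash_equilibria(PRISONERS_DILEMMA))
```

Under these assumed payoffs, the Hawk Dove Game has the two asymmetric equilibria (one partner plays Hawk, the other Dove), whereas the one-shot Prisoner's Dilemma has mutual defection as its only equilibrium even though mutual cooperation pays both players more.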
A key advantage of economic games is that we can directly compare the subjects' choices across different games because the payoff matrix is anchored between an individual best-possible payoff of four rewards and a worst-possible payoff of zero rewards. Though the NE and payoff- and risk-dominant outcomes vary among games, this nonetheless allows a direct comparison of different types of social decisions when both the actual payoff and decision procedure are carefully controlled. If subjects coordinate on the NE, it suggests that they recognize that their choices are influenced by their partner's choices. If subjects do not coordinate on the NE, it suggests that selfish interests/the temptation to defect override the interest in working towards an outcome that benefits both partners. If they do not develop a consistent pattern (NE or otherwise) of play, it indicates that they either do not understand the game or are not interested in the task.
In all of the studies to date, subjects faced one game structure at a time. However, this is not how primates experience decisions in real life. Chimpanzees and other social species are enmeshed in a social context in which their interactions differ among different individuals, or even within interactions with the same individual in different contexts that involve different choices (as in the snowdrift game: Sánchez-Amaro, Duguid, Call, & Tomasello, 2016). One major question is whether subjects maintain their preference for their pattern of play when games alternate, rather than occurring in blocks of sessions, mimicking the real life movement between different contexts. To explore this, with the chimpanzees at the National Center for Chimpanzee Care in Bastrop, Texas, U.S.A. (Bastrop) we included ten sessions in which session blocks alternated between the Hawk Dove Game and the Assurance Game (on different days) to see whether they continued to play the NE when the context shifted between sessions.
Thus, the focus of our study was to extend previous work on coordination in chimpanzees to include these three games, the Assurance Game, Hawk Dove Game, and the Prisoner's Dilemma, in order to better understand 1) how the chimpanzees' ability to anti-coordinate and cooperate related to their ability to coordinate and 2) how the chimpanzees' choices related to those of other species that have been tested (capuchin monkeys and rhesus monkeys). We tested two different populations of chimpanzees living in similar environments, to determine whether there were population differences based on the levels of experience of the animals. Specifically, chimpanzees at Yerkes National Primate Research Center in Lawrenceville, Georgia, U.S.A. (Yerkes) have more experience with cognitive and behavioral testing than do the Bastrop chimpanzees. Indeed, prior research has shown that chimpanzees with extensive experience with cognitive and behavioral testing (e.g., at the Language Research Center of Georgia State University and Wolfgang Köhler Primate Research Center in Leipzig, Germany) showed evidence of a strategy in the Assurance Game, finding the payoff-dominant outcome quickly, in both token tasks and other procedures (Bullinger, Wyman, et al., 2011; Duguid et al., 2014). We therefore thought it was important to test the new games on multiple populations with different levels of experience.
Finally, it is not clear what mechanisms help individuals make decisions in these tasks, and what leads to different decisions. It has been proposed that, among other functions, the neuropeptide oxytocin (OT) influences cooperation and competition. In humans, exogenously administered oxytocin (typically through a nasal spray) correlates with behavioral measures of cooperation and trust (Declerck, Boone, & Kiyonari, 2010; De Dreu et al., 2010; Kosfeld, Heinrichs, Zak, Fischbacher, & Fehr, 2005; Theodoridou, Rowe, and unrelated chimpanzees (Crockford et al., 2013), and in capuchins, oxytocin is released endogenously after grooming and fur rubbing (Benítez, Sosnowski, Tomeo, & Brosnan, 2018). Because economic games require social attention, including some degree of coordination (and sometimes cooperation) with a partner to maximize rewards, especially when there is a temptation to defect, they are a good framework for testing whether the effects of exogenous OT promote prosocial behavior. To provide additional information on this question and to supplement the literature on the response to OT in various non-human taxa, we tested the effect of exogenous OT on the acquisition and expression of strategies in our games in the Yerkes population of chimpanzees.
For our study, we maximized the chances of finding NE behavior by using the procedure in which subjects could see their partners' choices as they made them (i.e., a sequential task); this is the only procedure that generated NE play in capuchin and rhesus monkeys in these games. Furthermore, partner visibility is more ecologically valid, and therefore, this setup may be better at eliciting species-typical strengths. Overall, we predicted that chimpanzees would show the same strategies seen earlier in the Assurance Game (matching or the payoff-dominant Stag-Stag NE), and that they, like rhesus monkeys and capuchins, would play the NE in the Hawk Dove Game. We also predicted that they would maintain their preferences for NE play when alternating between the Hawk Dove and Assurance Games. Because data on other species are contradictory, we did not have a directional prediction for their choices in the Prisoner's Dilemma. Finally, given the contradictory evidence on the effect of oxytocin on social behavior, we did not have a directional prediction for its effect, although we note that earlier work did not find an effect of OT on game play in either monkey species (Smith et al., unpublished data).

Ethics Statement
All work was approved by the IACUCs of The University of Texas MD Anderson Cancer Center (00000894-RN01) and the Yerkes National Primate Research Center (YER-2001074-041718GA). Both centers are fully accredited by AAALAC-I. All work met the standards of care for research with primates of the USA and the American Society of Primatologists.

Subjects
We tested 36 socially-living chimpanzees with tolerant social partners from their groups (in four male/male dyads, seven male/female dyads, and seven female/female dyads); we considered partners to be tolerant if they simultaneously ate food in close proximity to each other, without stealing from one another. Subjects were housed at two facilities: the National Center for Chimpanzee Care (Bastrop) and the Yerkes National Primate Research Center (Yerkes). Chimpanzees at both facilities were tested in indoor dens that adjoined the outdoor enclosures and were part of their normal housing area (indoor runs ranged in size from 6 ft deep by 15 ft wide to approximately 8 ft 8 in deep by 9 ft wide). Individuals voluntarily participated and separated from their group for testing purposes in their inside enclosures for a period of no longer than 60 min and were rewarded using positive reinforcement techniques for doing so. If a subject stood by the door or acted as if it wanted to go outside, testing ended and was repeated on another day if the subject chose to participate. No chimpanzees at either facility were ever deprived of food or water, including during testing sessions. All rewards for testing were in addition to the animals' daily diets. Unfortunately, procedures at the two facilities ended up differing on several dimensions, due to both constraints at the facilities and management needs that reduced our ability to test our subjects at Yerkes. As a result, we do not feel it is appropriate to combine the studies into a single analysis, and we report the results for each facility separately.
Bastrop.-Ten pairs of subjects (three male/male, two male/female, five female/female dyads) voluntarily participated, as indicated by choosing to come in to their indoor enclosures and engage with the experimenter. The tested dyad was temporarily separated from the group; groupmates had access to other indoor and outdoor areas of the living space but were unable to directly interact with the tested dyad. Each session consisted of 60 trials. Each pair completed ten sessions of Hawk Dove Game, followed by ten alternating sessions of Assurance and Hawk Dove games (alternating across sessions on different days, rather than within sessions, due to the difficulty of replacing tokens in buckets each trial), and finally ten sessions of the Prisoner's Dilemma (one pair did not complete the Prisoner's Dilemma). Due to facility constraints, we did not run the oxytocin condition at Bastrop.
Yerkes.-Eight pairs of subjects (one male/male, five male/female, two female/female dyads) voluntarily participated in their indoor enclosures. Again, voluntary participation was indicated by subjects coming in to their indoor enclosures and engaging with the experimenter, and the tested dyad was temporarily separated from the group. Each session consisted of 75 trials. All pairs were tested on the Assurance Game first, followed by the Prisoner's Dilemma. We were unable to complete the rest of the games because our subjects were moved to new social groups for management reasons.
All subjects at Yerkes received oxytocin and saline controls for each test. Each pair participated in an initial 10 sessions, 5 experimental sessions with OT and 5 control sessions with a saline placebo, for each economic game. Sessions were blocked and counterbalanced: half of all pairs received OT first and half received the placebo first. Those pairs that failed to show a consistent response pattern after 10 sessions participated in an additional 5 sessions of their initial condition (either OT or saline) to see whether additional exposure would influence subsequent choices.

General Procedure

Experimental Procedure
In order to make sure that our results were comparable, we utilized the procedure that we had developed previously to test how the payoffs influence social decision-making without changing other important features of the task. Each game is a repeated dichotomous choice task; subjects make a choice between two different tokens, each of which represents one of the choices in the game (i.e., Hawk or Dove, Stag or Hare, Cooperate or Defect). Rewards are based on the subject and the partner's choices. We use no pre-training or pre-testing in order to study how the subjects' choices emerge as they gain an understanding of the payoffs of the task. These methods are identical to those used in our previous work (Brosnan et al., 2011;Smith et al., unpublished data) so that we can directly compare our results across species.
Pairs were mutually exclusive and were formed from members of the same social group. No subject was ever used in more than one pair. Because all participation was voluntary, subjects were paired with partners with whom they chose to voluntarily associate for testing, meaning that these were presumably pairs that had at least reasonably good relationships (in a previous study, only approximately half of all possible pairings routinely separated together; Brosnan et al., 2015).
For each game, each of the two strategies was assigned a differently colored PVC token. Each combination of token choices was assigned a specific payout, in grapes, based on the game structure (Table 1). Tokens were presented to the chimpanzees in buckets that were affixed to the interior of each enclosure. Each bucket was a standard 2-gallon size (23.5 cm height, 28 cm diameter) cut in half vertically, affixed so that the cut side of the bucket was flush against the interior of the enclosure mesh. This allowed chimpanzees to easily access tokens and the experimenter to replenish the supply and maintain even numbers of each of the colored tokens following each trial. Buckets were placed approximately 1 m apart so that each chimpanzee had their own bucket. The buckets were filled with 10 tokens of each of the two different colors. Tokens were 5 cm-long, 1.8 cm diameter PVC pipes colored with nontoxic paint.
Exchanges counted only if the tokens were returned directly to the experimenter's hand. To facilitate this (and avoid off-target exchanges that would be difficult to interpret), a flat Lexan surface measuring 45 cm in length and 15 cm in width was affixed horizontally to the exterior of the mesh. At Yerkes, the Lexan surface was constructed to be 15 cm from the front of the cage mesh such that if subjects attempted to exchange tokens at an inappropriate time, the tokens were likely to drop to the ground and would not count as an exchange. At Bastrop, the Lexan surface was suspended at a slight angle such that tokens exchanged at an inappropriate time would roll back into the chimpanzees' enclosure and would not count as an exchange.
Each trial began with the experimenter announcing the trial number (to facilitate coding from video at a later time, and also to indicate to subjects that a new trial was beginning), then extending one hand to each subject, palm up. Generally, the subjects sat attentively next to one another and the apparatus for the majority of the duration of each session; they could therefore see their partner's choices, and potentially use social cues to influence them. There were no constraints on the order in which the subjects made their choices. Both subjects chose one token from their respective buckets and exchanged it through the enclosure mesh to the experimenter. (If subjects attempted to return two tokens simultaneously, the experimenter returned them to the bucket and requested a new exchange; if subjects attempted to return two tokens sequentially, the experimenter accepted the first and rewarded the chimpanzee appropriately, but ignored the second). The experimenter placed each token vertically (so it did not roll) on a platform in front of each subject on the respective side of the center of the Lexan surface so that each subject could see their own and their partner's choice. The experimenter then retrieved the appropriate number of rewards for each subject from a container. Grapes were then distributed sequentially, one at a time, to make it clear how many each subject received, and simultaneously to both subjects, with one subject receiving grapes from each hand, and with the experimenter counting out loud as each grape was dispensed. If subjects received a different number of rewards, the experimenter held the empty hand up to her shoulder on the side with the subject who had received all of their rewards, while continuing to reward the partner with the other hand. This way, both subjects could easily determine both how many rewards they received, and how many they received relative to their partner. 
The next trial began as soon as both subjects had consumed all of their rewards.

Oxytocin or Saline Administration
The oxytocin conditions were done only at Yerkes, and included all subjects tested at Yerkes. Each dyad voluntarily separated from their group, stayed in the same enclosure together and each received 60 IU OT (experimental condition, which resulted in approximately 3.5 ml of solution) or ~6 ml sterile saline (control condition) administered via intranasal nebulizer. Chimpanzees were trained using only positive reinforcement techniques to hold their nose flush against the mesh while a tube was placed under their nostrils and vapor was administered (all training was done with sterile saline). The first step of training consisted of shaping them to maintain that position and desensitizing them to the presence of the nebulizer (which emitted a hum during operation). We then shaped them using fruit juice rewards and successively longer periods to allow us to place the tube emitting the vapor while they held their position by the mesh. During testing, chimpanzees held for four consecutive intervals of 2.5 min, with 15-30 s breaks between intervals, which allowed for the full administration of OT (if OT vapor was still being emitted after the standard dosing time, they continued to dose until all OT was gone). During control sessions, the chimpanzees performed the same behavior, but received only sterile saline via the nebulizer. Two experimenters worked in tandem to dose both chimpanzees simultaneously. In situations in which there were distractions or the nebulizer malfunctioned, extending the nebulizing period beyond 20 min, the test was canceled and repeated at a later date.
Following the administration of OT or saline, the pairs remained indoors together and separated from their group for an additional 30 minutes, before participating in up to 75 trials of an economic game. Such a waiting period has been standard in the literature, due to the possibility that OT administration triggers a "feed-forward" release of endogenous OT (Chang et al., 2012; Gossen et al., 2012; MacDonald et al., 2011; Porges & Carter, 2011; Scheele et al., 2013; Striepens et al., 2013). Although our results show no effect of oxytocin, making it difficult to determine whether the manipulation was successful, we feel that it is nonetheless important to report our full procedure and ensure that this information is available in the literature. Exposure to OT or saline was in blocks of five sessions, counterbalanced with half of dyads receiving OT first, and half of dyads receiving saline first.

Statistical Analyses
All sessions were video recorded. Data were coded from video for the identity of the subject that exchanged first and for which type of token each subject exchanged for each trial. We analyzed the last five sessions that each pair participated in for each game for several reasons. First, at Yerkes, sessions were blocked, five with saline and five with oxytocin, so using blocks of five sessions avoided including both conditions in the same analysis. Second, subjects at Yerkes received more than 10 sessions on some occasions, making a direct comparison with the Bastrop chimpanzees inappropriate. Finally, as discussed earlier, our subjects learned the outcomes as part of the study, and we were interested in whether they developed consistent responses after they were familiar with the payoff matrices. Note that data from all sessions for each pair are presented in the supplementary data files for both Bastrop and Yerkes subjects.
Our alpha level was p < .05 and all reported p-values are two-tailed. We tested each pair's outcomes against chance using χ² tests and Yates' continuity correction, as expected cell frequencies are n > 40 (Siegel, 1956). Because we are testing whether pairs coordinate and how, dyadic outcomes are the appropriate metric here, rather than individual choices (all pairs were mutually exclusive, which avoids the problem of pseudoreplication). We treated each pair's entire test as a data point in the χ² tests; a single measure per pair avoids the problem of multiple comparisons, which would yield a higher rate of false positives. In situations in which there were too many cells with zero values (or the expected values of cells were < 5), we used Fisher's Exact tests. If neither χ² nor Fisher's tests were possible due to a high number of cells with zero values, we considered a strategy significant if a pair selected a single outcome on 75% or more of the trials (with four possible outcomes, 'chance' is 25%, making 75% a very conservative criterion). This criterion was used to analyze previous studies of the Assurance, Hawk Dove and Prisoner's Dilemma games (Brosnan et al., 2012; Smith et al., unpublished data).
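As a rough illustration of testing a pair's four dyadic outcomes against chance, the sketch below computes a χ² goodness-of-fit statistic against equal expected counts (25% per outcome) and applies the 75% fallback criterion. This is a simplified sketch under stated assumptions, not the analysis run in SPSS: it omits Yates' continuity correction, because the text does not specify how the correction was applied to a four-outcome table, and the function names and example counts are hypothetical.

```python
import math


def chi_square_vs_chance(counts):
    """Goodness-of-fit statistic against equal expected counts (chance)."""
    n = sum(counts)
    expected = n / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)


def chi2_sf_df3(x):
    """Upper-tail p-value of a chi-square statistic with 3 degrees of freedom.

    Four outcome categories give df = 3; this closed form holds for df = 3 only.
    """
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


def meets_75_percent_criterion(counts):
    """True if a single outcome accounts for at least 75% of trials."""
    return max(counts) / sum(counts) >= 0.75


# HYPOTHETICAL counts for one pair over 300 trials, in the order
# Hawk/Hawk, Hawk/Dove, Dove/Hawk, Dove/Dove.
counts = [20, 230, 30, 20]
statistic = chi_square_vs_chance(counts)
print(statistic, chi2_sf_df3(statistic), meets_75_percent_criterion(counts))
```

A pair that spread its choices evenly across the four outcomes would yield a statistic of zero and would fail the 75% criterion, which is the pattern described here as no consistent preference.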
Analyses on the final five sessions were conducted in SPSS. In addition to comparing overall outcomes, we examined individual behavior for nonrandom choices with binomial tests, and we used a non-parametric runs test to determine whether a series of binary choices was random. To determine whether the choice of the subject who played first influenced the second player's choice in each game, we ran generalized linear mixed effects analyses in R (R Core Team, 2017) using the lme4 package (Bates, Maechler, Bolker, & Walker, 2015).
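The two individual-level checks named above can be sketched in a few lines of Python. The example below is an assumption-laden illustration rather than the SPSS procedure: it uses an exact two-sided binomial test against a 50% chance level and the normal approximation to the Wald-Wolfowitz runs test, and the function names are hypothetical.

```python
import math


def binomial_two_sided_p(successes, n, p=0.5):
    """Exact two-sided binomial p-value.

    Sums the probability of every outcome that is no more likely
    than the observed one.
    """
    pmf = [math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    observed = pmf[successes]
    return min(1.0, sum(q for q in pmf if q <= observed * (1 + 1e-9)))


def runs_test_z(choices):
    """Wald-Wolfowitz runs test z-score for a sequence of two options.

    Large positive z means too many runs (alternation); large negative z
    means too few runs (long streaks of the same choice).
    """
    n1 = sum(1 for c in choices if c == choices[0])
    n2 = len(choices) - n1
    if n2 == 0:
        raise ValueError("sequence must contain both options")
    runs = 1 + sum(1 for a, b in zip(choices, choices[1:]) if a != b)
    n = n1 + n2
    expected_runs = 2 * n1 * n2 / n + 1
    variance = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n**2 * (n - 1))
    return (runs - expected_runs) / math.sqrt(variance)


# HYPOTHETICAL choice sequences for one subject.
print(binomial_two_sided_p(10, 10))   # always chose the same token
print(runs_test_z("HDHDHDHDHD"))      # strict alternation
print(runs_test_z("HHHHHDDDDD"))      # one long streak of each option
```

The binomial test asks whether a subject prefers one token overall, whereas the runs test asks whether the order of choices is patterned, so a subject could alternate perfectly (no overall preference) and still be flagged as nonrandom by the runs test.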
We used a random intercepts model to predict second mover behavior (Choice 2; DV) with Choice 1 and Session as our fixed effects, along with their interaction term, and we included Subjects and the Dyad as random effects. To obtain p-values, we ran likelihood ratio tests of the full models with fixed effects against a null model without the effects.
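The final step of a likelihood ratio test, turning the difference in log-likelihoods between the full and null models into a p-value, can be sketched as follows. This Python example covers only that last step and assumes the models have already been fit elsewhere (the paper's models were fit in R with lme4); the function names and the example log-likelihoods are hypothetical. The closed-form tail probability used here is valid for odd degrees of freedom only.

```python
import math


def chi2_sf_odd_df(x, df):
    """Upper-tail p-value for a chi-square statistic with odd df."""
    if df < 1 or df % 2 == 0:
        raise ValueError("this closed form covers odd df only")
    chi = math.sqrt(x)
    term, total = chi, 0.0
    # Accumulate chi + chi^3/3 + chi^5/(3*5) + ... with (df - 1) / 2 terms.
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    normal_tail = math.erfc(chi / math.sqrt(2))
    return normal_tail + math.sqrt(2 / math.pi) * math.exp(-x / 2) * total


def likelihood_ratio_test(loglik_full, loglik_null, df):
    """Return (statistic, p) where statistic = 2 * (ll_full - ll_null)."""
    statistic = 2 * (loglik_full - loglik_null)
    return statistic, chi2_sf_odd_df(statistic, df)


# HYPOTHETICAL log-likelihoods; df is the number of extra fixed-effect terms.
print(likelihood_ratio_test(-100.0, -108.44, df=9))
```

With nine degrees of freedom, which would be consistent with Session entered as a five-level factor alongside Choice 1 and their interaction (an inference, not something stated in the text), a statistic of 16.88 gives p of about .0506 and a statistic of 6.90 gives p of about .65.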

Results
Due to differences in procedures (number of trials per session, number of sessions per game, which games were run, and the use of oxytocin only at Yerkes), we report the data for the two facilities separately rather than in a single model. First, we present data from Bastrop, where chimpanzee dyads participated in ten sessions of Hawk Dove Game, then ten alternating sessions of Assurance and Hawk Dove games, and finally ten sessions of Prisoner's Dilemma. Then we present data from Yerkes, where chimpanzee dyads participated in varying numbers of five-session blocks of Assurance and Prisoner's Dilemma games. Oxytocin was administered in alternating blocks of sessions in a counterbalanced fashion to all Yerkes dyads. In order to more directly compare results, we analyzed data from the last five sessions that any dyad from either facility completed. Table 3). Though every pair played the payoff maximizing NE (Stag-Stag) within every session, no pair persisted in playing it (all p > .05). Considering individual data, one subject (CHU) developed a consistent preference for choosing Hare (binomials < 0.05). Overall, across all pairs, neither subjects' first choices nor Session had a significant impact on their partner's second choice (χ² = 6.90, df = 9, p = .65).
Two pairs' choices significantly deviated from chance in the alternating sessions of Hawk Dove Game, with one pair apparently avoiding Dove/Dove and the other pair preferentially playing Dove/Hawk (Figure 3, Table 4). Considering individual data, one subject (GAY) developed a consistent preference for choosing Dove, and another subject (CHU) developed a consistent preference for choosing Hawk (binomials < 0.05). Across all pairs, the impact of subjects' first choices on their partner's second choice, although not significant, was very close (χ² = 16.88, df = 9, p = .0506). However, the effect, if any, was modest; if the first player chose Dove, the odds of the second player matching with Dove were 1.06, and the odds of mismatching with Hawk were 0.95. These results reflect at best a slight tendency to choose the lower risk Dove-Dove strategy.

Yerkes
Oxytocin Results.-We compared the last five Experimental (OT) and last five Control (Saline) sessions each pair participated in with paired t-tests for each paired outcome, to determine whether the administration of oxytocin affected the proportion of choices made in the games.
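For readers unfamiliar with the comparison, the following minimal Python sketch computes a paired t statistic for one outcome across matched OT and saline sessions. It assumes that sessions are paired by their position within each five-session block, which the text does not state explicitly; the proportions are hypothetical values invented for illustration, and `paired_t` is a hypothetical name.

```python
import math
from statistics import mean, stdev


def paired_t(condition_a, condition_b):
    """Paired t statistic and degrees of freedom for matched observations."""
    diffs = [a - b for a, b in zip(condition_a, condition_b)]
    n = len(diffs)
    standard_error = stdev(diffs) / math.sqrt(n)
    return mean(diffs) / standard_error, n - 1


# HYPOTHETICAL proportions of one outcome (e.g., Stag/Stag) per session.
ot_sessions = [0.40, 0.35, 0.45, 0.50, 0.30]
saline_sessions = [0.30, 0.30, 0.35, 0.45, 0.30]

t_value, df = paired_t(ot_sessions, saline_sessions)
# With df = 4, the two-tailed critical value at alpha = .05 is 2.776.
print(t_value, df, abs(t_value) > 2.776)
```

Because each comparison rests on only five paired sessions, a single unusual session can shift the statistic considerably, which is one reason to look for a consistent direction of change across pairs before attributing a difference to OT.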
In the Assurance Game, three of the six pairs showed significant differences between their last five sessions of Experimental versus Control (Table 6). In the Prisoner's Dilemma, both subjects within a single pair switched their preferences (Table 6); one from Defect to Cooperate (TRA), and the other from Cooperate to Defect (JUL).
We did not observe a consistent pattern to the changes (i.e., not all subjects changed to Stag or Cooperate, etc.) in either game, suggesting that these differences are either due to order effects or random changes in play, but are not meaningful with respect to OT. Therefore, we ran the remaining analyses focusing on the last five sessions each pair participated in, irrespective of whether it was Experimental (OT) or Control (Saline).
Assurance Game.-No pair developed a significant preference for any of the four possible outcomes in the Assurance Game (Figure 5, Table 7). However, four pairs showed marginal, but not significant, preferences (p < 0.10). In three cases, one subject in the pair (ABB, ATH, KAT) developed consistent preferences for choosing Stag (binomials < 0.05). However, in only one of these cases was there any indication that the pair was converging on the NE (ATH-CHI; see Table 7). Additionally, for three of the four pairs with marginal preferences (AMS-ABB, ATH-CHI, LUC-FAY), the last five sessions they participated in were in the Control condition, further supporting evidence of an order effect, rather than an effect of oxytocin on their behavior.
Model comparison indicated that our full model was better able to predict second mover behavior than a null model (χ² = 19.52, df = 9, p = .02). Closer inspection revealed that this was the result of a significant interaction between Session and first mover behavior (Choice 1). Looking only at the last five sessions, after the first mover had chosen Stag, the second mover chose Stag significantly more in the last session as compared to the first session of this 5-session block (β = 0.60, z = 2.35, p = .02), indicating an increasing preference for matching Stag play across these last five sessions.
The Prisoner's Dilemma.-One pair's choices in the Prisoner's Dilemma deviated from chance (REB-EVE, Figure 6, Table 8). However, a qualitative assessment of the data does not indicate a strong preference for one option over the other, and neither subject had a consistent preference for choosing Cooperate or Defect (binomials > 0.05). Another pair (LUC-FAY) showed marginal, but not significant, preferences (p < 0.10) and again, neither subject had a consistent preference for choosing Cooperate or Defect (binomials > 0.05).
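The individual-level "binomials" reported here test whether a subject chose one token more often than expected by chance. A minimal sketch of such a test follows, assuming an exact two-sided binomial test against a chance level of 0.5; the text does not state which variant was used, so both the null value and the two-sided form are assumptions, and the function name and the example counts are hypothetical.

```python
from math import comb


def binomial_two_sided_p(successes, trials, p_null=0.5):
    """Exact two-sided binomial p-value (illustrative helper).

    Sums the probabilities of all outcomes that are no more likely
    than the observed count under the null hypothesis.
    """
    if not 0 <= successes <= trials:
        raise ValueError("successes must lie between 0 and trials")
    pmf = [
        comb(trials, k) * p_null**k * (1 - p_null) ** (trials - k)
        for k in range(trials + 1)
    ]
    observed = pmf[successes]
    # Small relative tolerance so that ties are not lost to rounding.
    extreme = sum(p for p in pmf if p <= observed * (1 + 1e-9))
    return min(1.0, extreme)


# Hypothetical counts, not taken from the study: 28 Stag choices in 40 trials.
p_value = binomial_two_sided_p(28, 40)
print(f"two-sided p = {p_value:.4f}")
```

A subject would be classed as having a consistent preference when this value falls below 0.05.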
Model comparison for the Prisoner's Dilemma also indicated that our model better predicted second mover behavior than a null model (χ² = 30.61, df = 9, p < .001). Overall across pairs, first choice behavior influenced second choice behavior (β = 0.83, z = 3.459, p < .001), with a tendency to match. Closer inspection, however, revealed that there was also a significant interaction between session and first mover behavior (Choice 1). Looking again only at the last five sessions, analysis revealed a significant decrease in selection of Defect by the second mover after the first mover already chose Defect in each of the last three sessions as compared to the first session in the last block of five (β's > 0.68, z's > 2.07, p's < .04). These results indicate an increase in avoidance of the mutually destructive Defect/Defect outcome in the later sessions compared to earlier ones.
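The p-values attached to the individual coefficients can be checked in the same spirit. The sketch below assumes that each reported z is a Wald statistic evaluated against a standard normal distribution with a two-sided test, which is the usual default for generalized linear mixed models but is not stated explicitly in the text; the function name is hypothetical.

```python
import math


def two_sided_p_from_z(z):
    """Two-sided p-value for a standard normal test statistic (illustrative)."""
    return math.erfc(abs(z) / math.sqrt(2))


# z values reported in the text for the second-mover models.
for label, z in [
    ("Assurance Game, Session x Choice 1", 2.35),
    ("Prisoner's Dilemma, Choice 1", 3.459),
    ("Prisoner's Dilemma, smallest interaction z", 2.07),
]:
    print(f"{label}: z = {z}, p = {two_sided_p_from_z(z):.4f}")
```

Under this assumption, the three statistics give values consistent with the reported p = .02, p < .001, and p's < .04, respectively.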

Discussion
One of the challenges of studying behavior from a comparative perspective is developing experimental procedures that allow for direct comparisons across species and contexts. Previous work has successfully used games derived from experimental economics to do this, finding both continuities and differences across primates. In this paper, we extend previous work on coordination in chimpanzees to explore anti-coordination, using the Hawk Dove game, and cooperation with a temptation to defect, using the Prisoner's Dilemma game. Despite the fact that previous work on both of these games in monkeys shows that they persist in playing the NE in at least some contexts, in the current study, most pairs of chimpanzees did not. Additionally, we found no effect of exogenous administration of oxytocin on whether they developed a consistent pattern of responses, or their overall responses. Below, we discuss possible reasons for the differences between this study and previous work.
Across all of our games and conditions, there were only four instances in which a pair showed a significant pattern to their responses. Of these four significant responses, one pair showed a strong preference for one of the asymmetric NE in the Hawk Dove Game, and another showed a preference for the mutual defection NE in the Prisoner's Dilemma. Thus, when chimpanzees do show consistent patterns of responses, half of the time they are, as with monkeys, the NE (assuming that there was an equal chance of each of the four outcomes across all three games, it means a 25% rate of finding the NE by chance; note that the monkeys found the NE at a much higher rate). Perhaps more compelling, at Yerkes, there was an overall influence of first choice behavior on second choices across sessions that suggested convergence towards the NE: in the Assurance Game, if the first player chose Stag, the second player over time increased their proportion of choosing Stag, and in the Prisoner's Dilemma, if the first player chose Defect, the second player over time decreased their proportion of choosing Defect.
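To make the equilibria referred to above concrete, the sketch below enumerates the pure-strategy NE of three 2 x 2 games by checking, for every outcome, whether either player could gain by switching unilaterally. The payoff values are hypothetical and chosen only to reproduce the ordinal structure of each game; they are not the token values used in the study. The label "Hare" for the non-Stag option is likewise an assumption, and the games are treated as simultaneous-move games for simplicity.

```python
def pure_nash_equilibria(payoffs):
    """Return all pure-strategy Nash equilibria of a two-player game.

    payoffs maps (choice_1, choice_2) -> (payoff_1, payoff_2).
    """
    choices_1 = list(dict.fromkeys(c1 for c1, _ in payoffs))
    choices_2 = list(dict.fromkeys(c2 for _, c2 in payoffs))
    equilibria = []
    for c1 in choices_1:
        for c2 in choices_2:
            u1, u2 = payoffs[(c1, c2)]
            # Neither player can do strictly better by deviating alone.
            best_1 = all(payoffs[(alt, c2)][0] <= u1 for alt in choices_1)
            best_2 = all(payoffs[(c1, alt)][1] <= u2 for alt in choices_2)
            if best_1 and best_2:
                equilibria.append((c1, c2))
    return equilibria


# Hypothetical payoffs (illustration only, not the study's values).
ASSURANCE = {
    ("Stag", "Stag"): (4, 4), ("Stag", "Hare"): (0, 1),
    ("Hare", "Stag"): (1, 0), ("Hare", "Hare"): (1, 1),
}
HAWK_DOVE = {
    ("Hawk", "Hawk"): (0, 0), ("Hawk", "Dove"): (4, 1),
    ("Dove", "Hawk"): (1, 4), ("Dove", "Dove"): (2, 2),
}
PRISONERS_DILEMMA = {
    ("Cooperate", "Cooperate"): (3, 3), ("Cooperate", "Defect"): (0, 4),
    ("Defect", "Cooperate"): (4, 0), ("Defect", "Defect"): (1, 1),
}

for name, game in [("Assurance", ASSURANCE), ("Hawk Dove", HAWK_DOVE),
                   ("Prisoner's Dilemma", PRISONERS_DILEMMA)]:
    print(name, pure_nash_equilibria(game))
```

With these illustrative payoffs, the Assurance Game has two symmetric equilibria (of which Stag/Stag is payoff dominant), the Hawk Dove game has the two asymmetric equilibria, and the Prisoner's Dilemma has mutual defection as its only equilibrium.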
In most cases, however, the significant pattern seemed to be due to one member of the pair showing a strong preference for one or the other token and their partner playing randomly. Unfortunately, it is hard to interpret what these results mean in the absence of a significant pattern at the pair level. It could be that the individual with the preference understood the task and was attempting to maximize their outcomes, but it is at least as likely (and we think more so) that they either did not understand the task, and so reverted to some other reason for preferring a token, or were receiving enough rewards that they did not put in the effort to choose the maximizing option. Note that side biases were not possible in our task, due to the way we offered the choices, but in other studies, primates have been argued to show side biases when they were unable to understand the task (reported in Jensen, Hare, Call, & Tomasello, 2006; Vonk et al., 2008), so it is possible that these preferences were the functional equivalent of side biases in our task.
Nonetheless, there was some evidence that they were learning about the task and adjusting their behavior over the course of the study. As we mentioned above, while the Yerkes pairs did not settle on a strategy over the final five sessions of testing in the Assurance Game, nor was there an overall effect of first choice on second choice behavior, there was a significant interaction between the second subject's choice and session. Specifically, there was a significant increase in the selection of Stag as the second choice after the first choice was Stag in the fifth session compared to the first. In other words, in later sessions, subjects were more likely to match Stag than in earlier sessions. We hypothesize that this is a subtle indication of subjects learning the task.
This potential learning becomes clearer when looking at the Prisoner's Dilemma results. There was an overall effect of first choice on second choice behavior in the Prisoner's Dilemma, as well as significant interactions between second subject's choice and session. In the first two sessions (of the last five sessions that we compared), subjects matched on Defect over 60% of the time, then reduced to below 50% matching in the last three sessions.
Specifically, in comparing the first of these sessions with sessions three, four, and five, there was a significant reduction in the selection of Defect after the first mover chose Defect. It is not clear whether in the first session the chimpanzees were engaged in a matching strategy, or whether they had a preference for choosing Defect. Nonetheless, they overcame this across sessions by decreasing their tendency to match, but only for the Defect choice, which avoided the mutually destructive outcome. Note, however, that this move away from Defect by the second player also increases the inequality of outcomes if the first player continues to play Defect. This highlights the difficulty of moving away from mutual defection in the Prisoner's Dilemma task.
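The session-by-session matching rates described above are conditional proportions: of the trials in which the first mover chose a given option, the share in which the second mover chose the same option. The sketch below shows one way such a summary could be computed; the record layout, the function name, and the toy data are all hypothetical and do not come from the study.

```python
from collections import defaultdict


def conditional_match_rate(trials, first_choice):
    """Per-session proportion of second-mover matches, given the first mover's choice.

    trials is an iterable of (session, first_mover_choice, second_mover_choice).
    Sessions in which the first mover never made first_choice are omitted.
    """
    counts = defaultdict(lambda: [0, 0])  # session -> [matches, opportunities]
    for session, first, second in trials:
        if first == first_choice:
            counts[session][1] += 1
            if second == first_choice:
                counts[session][0] += 1
    return {
        session: matches / opportunities
        for session, (matches, opportunities) in sorted(counts.items())
    }


# Toy records for illustration only.
toy_trials = [
    (1, "Defect", "Defect"), (1, "Defect", "Defect"),
    (1, "Defect", "Cooperate"), (1, "Cooperate", "Cooperate"),
    (2, "Defect", "Cooperate"), (2, "Defect", "Defect"),
    (2, "Cooperate", "Defect"),
]
print(conditional_match_rate(toy_trials, "Defect"))
```

A decline in this proportion across sessions for Defect, without a corresponding decline for Cooperate, is the pattern the interaction term in the model captures.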
Finally, we note that most of our evidence of NE play or learning came from the Yerkes chimpanzees, who had more extensive experience with cognitive testing, rather than the Bastrop chimpanzees. Indeed, we had hypothesized that this would be the case, based on our previous finding that chimpanzees with additional experience in cognitive testing were more likely to find the NE outcomes (Brosnan et al., 2011). Though this evidence is not conclusive, this continues to be a hypothesis worth further investigation.
It is surprising that the chimpanzees did not persist in playing the NE in each game, given that capuchins and rhesus macaques do so, and that at least some chimpanzees do so as well (Brosnan et al., 2011). Moreover, in other methodologies, chimpanzees coordinate well (Bullinger, Wyman et al., 2011; Duguid et al., 2014; Sánchez-Amaro et al., 2016).
Although we cannot conclusively explain this result, we speculate on a few reasons why this may have been the case. First, previous work (Brosnan et al., 2011) indicates that chimpanzees show a strong effect of experience, which, as we indicated, may also have been the case in our study. However, neither population has nearly as much experience as the "experienced" chimpanzees in previous work, so it may be that our current chimpanzees simply lacked the experience to either understand or be motivated by the task (we cannot disentangle the two; however, as we discuss below, we suspect that motivation may have been a substantial issue).
Second, whereas the monkeys tested in previous work had approximately the same number of trials in each of these games (i.e., Hawk Dove, Prisoner's Dilemma and alternating economic games) as our chimpanzees, most had substantially more experience overall with experimental economic games from previous work. Most capuchins (see details on exceptions, below) had experienced 2037-4362 trials, and all of the rhesus had experienced 472-1876 trials, in the Assurance Game (Brosnan et al., 2011) prior to their exposure to the Hawk Dove Game and Prisoner's Dilemma. On the other hand, these chimpanzees had never seen this task before in any form. One concern is that our subjects may not have had sufficient opportunity for learning within the game. However, their trial numbers were well within the range of the number of trials in which subjects in other studies have learned the contingencies: the Bastrop chimpanzees participated in 1800 trials (600 per game), and the Yerkes chimpanzees participated in 698-2625 trials (varying numbers of sessions per game). In previous work, chimpanzees at the LRC learned the contingencies in 400 trials (Brosnan et al., 2011) and capuchins and squirrel monkeys learned in approximately 300 trials per game (Smith et al., unpublished data; Vale, Williams, Schapiro, Lambeth, & Brosnan, 2019).
Thus, the monkeys had more experience with this format of testing, which presumably helped them to learn the task more rapidly, but not more experience with the specific game. One thing we do note, however, is that in our recent work, we were able to test two pairs (composed of four unique individuals) of capuchin monkeys for which none of the four subjects had previous experience with economic games. These four naïve subjects found the NE as quickly as (or in some cases, more quickly than) pairs that did have previous experience (Smith et al., in revision), so previous experience with economic games seems unlikely to be the only explanation.
Related to this, whereas capuchin monkeys play the payoff dominant NE in the Assurance Game in either an exchange-based manual task, as we used here, or a computerized task (Brosnan et al., 2011), pairs are much more likely to play the NE in the computerized task. Indeed, in more recent work, they played the asymmetric NE in a computerized version of the Hawk Dove Game, but not in an exchange-based manual version. We speculated in that paper that this was because of two key differences between the computerized and manual tasks. First, the manual task takes longer, so we were able to complete only 40 trials per session in that study, whereas in the computerized task, we could do 120 trials in the same time period. Second, there was a shorter delay between choice and reward in the computerized task as compared to the manual task, because humans' reaction times are not as fast as computers' reaction times. Both of these, of course, would enhance learning, and as we argued, we think that it was these factors that allowed them to learn more effectively. In addition, in manual tasks, experimenters are present, which means that there are simply more things for the subjects to focus on as potential cues, making it more difficult for them to isolate what is the relevant cue, and therefore learn it (e.g., Prétôt, Bshary, & Brosnan, 2016). Finally, in the manual task, subjects often choose a token before the trial has even started (the buckets cannot be removed between trials). Once they have a token in hand, it may be difficult for them to put it down and pick up the more relevant one (that is, inhibit simply returning whatever token is already in their hand), even if the subject knows what would be the better option (we argue for a similar challenge in another task looking at efficiency in token trading; Brosnan & Beran, 2009).
It may be easier to redirect a joystick, especially because, on the computer, subjects cannot make a choice until it appears, after the trial has started, so there is no chance to "pre-commit" to an option that ends up being a poor choice. Unfortunately, none of the chimpanzees we tested were computer trained, so we are unable to test the hypothesis that subjects would do better in a computerized format. We hope that others who have that option will do so.
Although the manual version of this task may be challenging, it also likely depends on the format of the task. Chimpanzees do better at maintaining NE play in alternate methodologies based on a barpull apparatus (Bullinger, Wyman et al., 2011; Duguid et al., 2014; Sánchez-Amaro et al., 2016). Perhaps not surprisingly, other work has found that barpulls are very intuitive and, for cooperation tasks, are much more likely to yield cooperation than other tasks with less kinesthetic feedback (Brosnan & de Waal, 2002). The take-home message is that it is critical to test subjects on multiple modalities, even for the same question, to ensure that we fully understand what the limits of their abilities really are (de Waal, 2016) and better understand how context influences the behavior in question. This will lead to a more complete understanding of decision-making behavior.
In addition, and certainly not mutually exclusively, subjects may not have been motivated to learn the various outcomes or play one strategy over another. The task is easy and subjects received some reward on virtually every trial; therefore, the consequence for not making a strategic decision was merely the potential for receipt of fewer rewards (they are also never food deprived and receive daily fruits and vegetables, so they are neither hungry nor lacking in preferred foods). Perhaps the chimpanzees simply learned rapidly that they got some rewards no matter what. Of course, this has been the case in all prior studies as well, and capuchins, rhesus macaques, and some populations of chimpanzees are nonetheless motivated to strategically play payoff-maximizing choices. However, this is not without precedent; other work indicates that subjects can do worse on easier tasks. For example, capuchin monkeys do better on metacognition tasks (i.e., avoiding the most difficult trials) when there are six options to be discriminated among, and the chances of guessing correctly are low, compared to when there are only two options, even though in both cases there are very difficult discriminations that should be avoided (Beran, Perdue, Church, & Smith, 2016; Beran, Perdue, & Smith, 2014). The authors argued that in the task with fewer stimuli, the monkeys are too often rewarded by chance and so lack incentive to pay attention and to monitor how sure they are that they can choose correctly. Similarly, another recent paper shows that, counterintuitively, marmosets and squirrel monkeys are much better at a memory task when there are nine options as opposed to two (Schubiger, Kissling, & Burkart, 2016). As with the above argument, the authors argue that in the two-option task, the subjects got rewarded 50% of the time due to chance, and so there was no motivation to pay attention, whereas in the nine-option task there was only an 11.1% chance of picking the right option unless they paid attention.
We cannot directly test this, as changing the number of options fundamentally changes the game, and the comparison with other species, but future work could address the possibility that motivation plays a role in their attentiveness by using fewer trials.
Finally, we did not find an effect of exogenously administered OT on pairs' performance. Some subjects showed changes in their individual preferences between the last five experimental and last five control sessions, but for five of those six subjects, the final five sessions of the Assurance Game with oxytocin were also their last exposure to the game, so there was a confound between experimental condition and order. As a result, we do not believe that these differences can be ascribed to the effects of the OT. In addition, work on capuchin monkeys, who do show consistent patterns of play, also shows no effect of exogenous OT (Smith et al., unpublished data). Unfortunately, the lack of consistent choices means that we cannot determine whether OT affects chimpanzee behavior in this experimental game context. We do note, however, a few potential issues related to our study.
First, although prior studies indicate the efficacy of using a nebulizer to administer oxytocin (Chang, Barter, Ebitz, Watson, & Platt, 2012; Dal Monte, Noble, Turchi, Cummins, & Averbeck, 2014; Modi, Connor-Stroud, Landgraf, Young, & Parr, 2014; Simpson et al., 2014), we cannot know exactly how much of the dose the subjects actually received because they were not restrained. We do note, however, that using a similar procedure, capuchin monkeys show an increase in peripheral OT that is of the same magnitude as that following grooming or fur rubbing (a social behavior specific to capuchins), and in the same timeframe, indicating that for at least some primates, the procedure is effective (Benítez et al., 2018).
Additionally, we followed other studies by including a 30-minute wait period between OT administration and the economic games to allow for OT uptake (Cavanaugh, Huffman, Harnisch, & French, 2015; Chang et al., 2012; Gossen et al., 2012; MacDonald et al., 2011; Porges & Carter, 2011; Scheele et al., 2013; Smith, Ågmo, Birnie, & French, 2010; Striepens et al., 2013). Whether this is appropriate has recently been questioned, as the time course for OT showing up in the bloodstream and cerebrospinal fluid, and returning to baseline levels, differs by species (domestic dogs: Romero, Nagasawa, Mogi, Hasegawa, & Kikusui, 2014; rhesus macaques: Modi et al., 2014). As a result, it is not clear what the ideal time course for testing is, but if chimpanzees respond like other primates, we would have captured an increase in peripheral OT during our data collection. We do note that, in our study with capuchins, the spike following endogenous release and exogenous administration followed a similar time course (Benítez et al., 2018).
Finally, this all assumes that exogenous OT has the predicted effect on social behavior. Indeed, more recent reviews have identified problems with biologically validating peripheral OT measurement and administration, critiqued the robustness of evidence linking OT and prosocial behavior, and cautioned against underpowered studies (Leng & Ludwig, 2016; Nave, Camerer, & McCullough, 2015; Walum, Waldman, & Young, 2016). These are all concerns for the current study, regardless of outcome. Since the Bartz et al. (2011) review, there is an increasing body of evidence that oxytocin does not unilaterally promote prosocial behavior in primates (Brosnan et al., 2015; Mustoe et al., 2015; 2016; Parr, Modi, Siebert, & Young, 2013; Proctor et al., 2016; but see Chang et al., 2012; Winslow & Insel, 1991). Of course, even if oxytocin does influence (pro)social behavior, it may be that economic games are not the appropriate paradigm in which to search for those effects.
Overall, we found that chimpanzees rarely develop consistent patterns of play in these three games. This is particularly surprising given the frequency with which capuchin and rhesus monkeys develop consistent preferences for the NE, and that chimpanzees coordinate outcomes in tasks using other procedures (Bullinger, Wyman et al., 2011; Duguid et al., 2014; Sánchez-Amaro et al., 2016). We propose that this difference is largely due to a combination of difficulties that the chimpanzees had with our experimental task and a lack of motivation. In particular, we think that the smaller number of trials per session and the relatively longer delay between choice and reward may make it more challenging to learn the payoff structure in our manual task relative to a computer task, despite the fact that other species, and indeed, other chimpanzees, have developed consistent preferences for the NE in previous manual tasks (Brosnan et al., 2011). Unfortunately, we are unable to test this hypothesis but hope that others with access to computer-trained apes will do so. Finally, the fact that rewards are received on almost every trial may have reduced motivation to pay attention (e.g., Beran et al., 2014; 2016; Schubiger et al., 2016). We hope future studies address both of these constraints to allow us to better understand the similarities and differences in decision-making across primates.
Graphical representation of the Bastrop players' responses in the last five sessions of the Assurance Game. Numbers within the circles represent the session number. X and Y-axis reflect the proportion of Stag chosen within a session by Player 1 and Player 2, respectively.
Graphical representation of the Yerkes players' responses in the last five sessions of the Assurance Game. Numbers within the shapes represent the session number; circles are oxytocin sessions, triangles are saline sessions. X and Y-axis reflect the proportion of Stag chosen within a session by Player 1 and Player 2, respectively.
Graphical representation of the Yerkes players' responses in the last five sessions of the Prisoner's Dilemma Game. Numbers within the shapes represent the session number; circles represent oxytocin sessions, triangles represent saline sessions. X and Y-axis reflect the proportion of Cooperate chosen within a session by Player 1 and Player 2, respectively.