Testing the Relationship Between Looking Time and Choice Preference in Long-tailed Macaques

Visual bias in social cognition studies is often interpreted as indicating preference, yet it is difficult to establish whether such bias translates to social preference. Moreover, visual bias is often framed in terms of surprise or recognition. It is thus important to examine whether an interpretation of preference is warranted in looking time studies. Here, using touchscreen training, we examined (1) looking time to non-social images in an image viewing task, and (2) preference for non-social images in a paired choice task, in captive long-tailed macaques, Macaca fascicularis. In a touchscreen test phase, we examined (3) looking time to social images in a viewing task, and (4) preference for social images in a paired choice task. Finally, we examined (5) looking time to social images in a nontest environment. For social content, the monkeys did not exhibit clear preferences for any category (conspecific/heterospecific, ingroup/outgroup, kin/non-kin, young/old) in the explicit choice paradigm, nor did they differentiate between images in the viewing tasks, thus hampering our interpretation of the data. Post-hoc analysis of the training data, however, revealed a visual bias towards images of food and objects over landscapes in the viewing task. Similarly, across choice-task training sessions, food and object images were chosen more frequently than landscapes. This suggests that the monkeys' gaze may indeed indicate preference, but this only became apparent for non-social stimuli. Why these monkeys showed no biases in the social domain remains enigmatic. To better answer questions about attention to social stimuli, we encourage future research to examine behavioral measures alongside looking time.

Assessing allocation of social attention is important in understanding what variables play a role in social decision making. For example, in rhesus macaques (Macaca mulatta), male facial skin coloration has been proposed as a sexually selected trait that indicates fitness. In line with this theory, female rhesus macaques exhibit longer looking times to images of male faces with redder coloration (Waitt et al., 2003), and, consistent with this, in a free-ranging colony, females are more likely to solicit males with dark red over pale faces (Dubuc et al., 2014). Looking time paradigms, thus placed in the context of social behavior, can be informative about social decision-making, such as mate choice.
A limitation to looking time paradigms, however, is their interpretation (Winters et al., 2015). An initial issue is the use of the term "visual preference" to describe prolonged attention to a stimulus. The use of this term is problematic as it often ambiguously suggests that a visual preference could translate to a social preference, an implication that is unsubstantiated without additional behavioral data. Winters et al. (2015) suggest the use of "visual bias" as a less misleading term, which simply implies discrimination between two or more stimuli. A secondary problem of interpreting looking time is in understanding why there is a visual bias, an issue that has also been raised in the field of infant cognition. As with nonhuman subjects, looking time paradigms have provided a window into how infants perceive and discriminate stimuli. However, Aslin (2007) noted the distinct lack of clear linking hypotheses for such discrimination studies; that is, what should looking time tell us about the underlying cognitive process in question? Framing this in primate terms, a number of studies have examined how study subjects respond to images of conspecifics versus heterospecifics. Sumatran orangutans (Pongo abelii), humans (Homo sapiens), brown capuchins (Sapajus apella), rhesus, Tonkean, Japanese, bonnet and pig-tailed macaques (Macaca mulatta, M. tonkeana, M. fuscata, M. radiata, M. nemestrina) and red-fronted lemurs (Eulemur rufifrons) look longer at images of conspecifics versus heterospecifics (Adams & Macdonald, 2018; Dufour et al., 2006; Fujita, 1987; Rakotonirina et al., 2018). Attending to conspecific stimuli could be advantageous as they contain socially relevant cues (Rakotonirina et al., 2018). Yet contrastingly, studies in rhesus and stump-tailed macaques (M. arctoides) report visual bias for heterospecific over conspecific images (Fujita, 1987; Méary et al., 2014).
In this case, it is difficult to assert that subjects exhibit a preference for a certain stimulus type, as the underlying motivation for visual bias is unclear. That is, do subjects look longer due to socially relevant cues, or because they see something novel that captures their interest? Such responses are perhaps better framed in terms of interest rather than preference, at least until more is understood about what drives a visual bias within this context.
There is an additional, paradoxical, problem in the looking time literature: attention has been interpreted to indicate anticipation or expectation. Anticipatory gaze was used by Krupenye et al. (2016) in examining apes' understanding of false belief. A scenario depicted an observer looking for a displaced object. Test participants looked more often towards the place where the observer last saw the object than where the object was last placed, suggesting that they anticipated where the observer would look for the object. A similar interpretation was made in assessing captive chimpanzees' responses to visual depictions of infanticide. Despite having experienced infanticide, chimpanzees still looked longer at infanticide scenes over conflict scenes, suggesting that infanticide violated their expectations of regular social behavior (to care for an infant; von Rohr et al., 2015). Similarly, male long-tailed macaques looked longer at images containing incongruent information about group dominance hierarchy, such as a dominant individual being submissive to a subordinate (Overduin-de Vries et al., 2016). In contrast, chimpanzees did not look longer at scenes depicting unexpected social outcomes, instead attending primarily to negatively valenced emotional content (Wilson et al., in prep). The point here is that depending on the paradigm in which a stimulus is framed, the interpretation of attention varies from one of preference, recognition, or interest to one of expectation or violation of expectation. The problem is that, regardless of the paradigm presented, underlying characteristics that motivate attention still vary (Winters et al., 2015), an issue that seems to be ubiquitous across species, including humans. 
Moreover, reasons for why one stimulus might motivate a longer looking time than another stimulus could vary within a study, for example, gaze allocation might depend on a combination of or alternation between factors such as interest, recognition, preference, and surprise. The potential multi-faceted nature of looking time is therefore worthy of more detailed consideration.
Given the extent to which gaze responses are used in cognitive research, it is important to address underlying motivations of visual bias and thereby clarify interpretations regarding looking time.
Returning to the concept of preference, there is some evidence that visual bias may in fact play a role in explicit choices. For example, when human subjects are manipulated to look at one stimulus for longer, they are more likely to select that stimulus than a counterpart presented for a shorter period (Shimojo et al., 2003). Similarly, in a forced-choice paradigm, women with a preference for masculine faces exhibited stronger gaze patterns to preferred stimuli (Lyons et al., 2016), suggesting that visual bias may reflect preferences. The limitation here is that gaze and preference were assessed in the same task, and thus could not be decoupled. In the current study, we tested the idea that visual bias indicates choice preference by comparing looking time in a viewing task to explicit choice in a separate preference task, in a group of captive long-tailed macaques.
We first trained the monkeys on a novel viewing task, where they could control the time the image was presented, and a separate choice task, where they had to choose between two images. For training, we used images of food, landscapes, and novel objects, with a separate set of images for each task to maintain novelty. To assess preferences during training on the choice task, we predicted that if the monkeys perceived the images as meaningful representations (Fagot, 2000; Fagot et al., 2010), they would choose images of food over landscapes or objects, and images of objects over landscapes, as both food and novel objects (e.g., tools, keys, glasses) motivate their interest (see Appendix for more details). We carried out unplanned analyses of the training data; however, because different image sets were used between tasks, we could not examine the relationship between looking time and choice.
To test the monkeys' looking time and preferences for social images, we presented them with a different viewing task where images were presented for a fixed time, as well as a choice task that used the same paradigm as for the non-social stimuli. We presented four types of social image pairings: kin vs. nonkin, young vs. old, ingroup vs. outgroup, and conspecifics vs. heterospecifics. The same images were used in both tasks. We expected longer looking times for outgroup over ingroup (Gothard et al., 2004; Schell et al., 2011), younger over older monkeys (Almeling et al., 2016) and non-kin over kin (Pfefferle et al., 2014). We made no directional predictions for species, given previous conflicting findings (Adams & Macdonald, 2018; Dufour et al., 2006; Fujita, 1987; Méary et al., 2014; Rakotonirina et al., 2018). We expected that if monkeys' looking time reflected preference, then we would see a relationship between looking time and choice across image categories. Additionally, we conducted a separate "free view" task in the monkeys' home enclosure, allowing us to validate our viewing task results without reward motivation and in a larger sample. For this task we used only the age category images, and thus our predictions were consistent with those for the previous viewing task.
We collected data in three experimental phases, on five different tasks. Experiment 1 consisted of training monkeys on two different touchscreen tasks (viewing task 1 and choice task 1) using non-social images. Experiment 2 consisted of testing monkeys' responses to social images on two different touchscreen tasks (viewing task 2 and choice task 2). Experiment 3 consisted of collecting data on the free view task, where monkeys viewed images presented in their home enclosure. In all three viewing tasks, images were presented individually, and we measured looking time to each image, defined by the monkeys clearly orienting their head and eyes towards the images. The choice tasks presented the monkeys with pairs of images. In choice task 1, they had to learn to select one image to progress to the next trial and, subsequently, the reward. We then tested their social preferences on this paradigm in choice task 2, for which we measured the type and side of the image chosen. All tasks except the free view task took place in test cubicles where monkeys interacted with a touch screen. The free view task took place in the monkeys' enclosure (see Table A1). Details of each task are provided below.

Participants
We collected data from a group of 36 long-tailed macaques (Macaca fascicularis) housed at the German Primate Center, Göttingen. The group had access to both an indoor (49 m²) and outdoor (141 m²) enclosure, with ad libitum access to food, water, and enrichment. Details of participating monkeys can be found in Table A1.

Testing Apparatus
We used the XBI (eXperimental Behavioural Instrument) touchscreen (Calapai et al., 2016), with a 15 in. Elo touchmonitor, a MacBook Air (macOS Sierra Version 10.12.6), an HD Wi-Fi camera with wide angle lens, and two pumps and a drinking tube for automated fluid reward (see Figure 1). Experiments were run using MWorks (version 0.7). Individuals participated voluntarily in all touch screen testing, which took place in a separated testing area divided into six cubicles (height: 2.6 m; width: 2.25 m; depth: 1.25 m). Monkeys were rewarded for participation during testing with diluted grape juice (200 ml juice : 100 ml water; 0.25 ml per reward). Testing hours were Monday to Friday, 10:00-12:00 and 14:00-18:00.

Figure 1
Tasks and Apparatus Note. Panels A, B: View of the testing enclosure and XBI. Panels C, D: Viewing task 1 on the Elo touch screen (C) and on the XBI (D). Panel E: Choice task; the monkey selects the target, then chooses one of two images presented. Panel F: Viewing task 2; the monkey touches the target for a reward and then is presented with an image. Panel G: Free view task: (left) viewing apparatus; (middle) setup: (a) experimenter, (b) test monkey, (c) image held in frame, (d) camcorder; (right) participating monkey.

Touchscreen Training
For the touch screen tasks, monkeys were trained and tested individually. All monkeys received prior training on the touch screen before commencing any task. When they exhibited excess caution towards the screen, for example by retreating when a stimulus appeared on-screen, we habituated them by pairing them with a partner who was comfortable interacting with the touch screen display.

Video Coding
Looking time (in viewing task 1, viewing task 2, and the free view task), the side of the screen the monkey touched first, and the image they touched first (choice tasks 1 and 2) were coded from videos. Image chosen was recorded by MWorks, but sometimes the monkeys' first touch to an image did not activate it, so we checked for this when coding the videos. We used the free behavioral coding software SolomonCoder (version 17.03.22). For each task (excluding choice task 1; see Appendix for details), a subset of videos was coded by two coders to assess interrater reliability. At least one of the coders was always naïve to the hypotheses.

General Analytical Procedure
Analyses were conducted in RStudio (version 1.0.153). We ran a series of mixed effects models using the lme4 package. For all models, we ran bootstrapped 95% confidence intervals. Where we adjusted models with a Bonferroni correction, we report the correspondingly adjusted confidence intervals. For the viewing tasks, our analyses excluded images the monkeys did not look at. For viewing task 1, choice tasks 1 and 2, and visual bias to image category in viewing task 2, we set the number of adaptive Gauss-Hermite quadrature points (nAGQ) to zero to aid model convergence. nAGQ determines how accurately the likelihood is approximated when integrating over the random effects; when fitting GLMMs with the lme4 package in R, it defaults to one (Bolker, 2019; Zhang et al., 2011). Setting nAGQ to zero makes parameter estimation less exact but faster (Bolker, 2019), since computational time decreases with fewer quadrature points (Kim et al., 2013). Each model is described in detail below.
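To illustrate the interval adjustment described above, the sketch below (in Python rather than R, with invented looking times; a minimal illustration, not our analysis code) computes a percentile bootstrap confidence interval for a difference in mean looking times at a Bonferroni-adjusted level of alpha = .05/3, as would apply to three pairwise comparisons.

```python
import numpy as np

def bootstrap_ci(a, b, n_boot=10_000, alpha=0.05, n_tests=3, seed=0):
    """Percentile bootstrap CI for mean(a) - mean(b), Bonferroni-adjusted."""
    rng = np.random.default_rng(seed)
    diffs = np.empty(n_boot)
    for i in range(n_boot):
        # Resample each group with replacement and record the mean difference
        diffs[i] = (rng.choice(a, size=len(a)).mean()
                    - rng.choice(b, size=len(b)).mean())
    adj = alpha / n_tests  # Bonferroni: divide alpha by number of comparisons
    return np.quantile(diffs, [adj / 2, 1 - adj / 2])

# Hypothetical looking times (s) to food vs. landscape images
food = np.array([4.2, 5.1, 3.8, 6.0, 4.9, 5.5, 4.4, 5.2])
landscape = np.array([2.1, 1.8, 2.6, 3.0, 2.2, 1.9, 2.8, 2.4])
lo, hi = bootstrap_ci(food, landscape)
```

Because the adjusted alpha is smaller, the resulting interval is wider than an unadjusted 95% interval, making the comparison more conservative.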

Experiment 1 Viewing Task 1
We examined looking time to images in the first training session only, when the images were novel (see Appendix for details about further sessions).

Participants
We report data from 13 monkeys trained on this task (6M; 7F; M age = 5.91 years, SD = 5.47), eight of which received their initial training on a different apparatus (see Table A1 and Appendix for details).

Stimuli
Sixty images of non-social content were used. Images were divided into three categories (landscape, object, and food), with equal pairings between each category. Images were scaled in GIMP to 4000 x 3000 pixels at 72 ppi. For more details see the Appendix.

Design and Procedure
An image appeared on screen (19.63 cm x 19.63 cm) with a target beneath (grey square: 3.93 cm x 3.93 cm; see Figure 1). The monkeys could view the image for 60 s or, by pressing the target, change the image sooner. The purpose of this was to give the monkeys control over how long they viewed each image. Each session consisted of 20 images. Reward was given at random intervals, so monkeys were not reinforced to touch the target but learned to change the picture by trial and error, thereby removing food-based incentives for image viewing. Order of image presentation was randomized. See the Appendix for additional details.

Interrater Reliability
We calculated interrater reliability for looking time to each image for seven monkeys, using intraclass correlation coefficients (Koo & Li, 2016; Shrout & Fleiss, 1979). We estimated the reliability of mean values across k coders, ICC(3,k). For viewing task 1 we report an ICC(3,k) of .72, indicating good reliability (Koo & Li, 2016).
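For reference, ICC(3,k) can be computed from a subjects × coders matrix via a two-way mixed-model ANOVA decomposition (Shrout & Fleiss, 1979). The sketch below, with hypothetical ratings from two coders, is an illustrative Python implementation, not the code used in this study.

```python
import numpy as np

def icc3k(ratings):
    """ICC(3,k): two-way mixed, consistency, average of k coders.

    ratings: (n_subjects, k_coders) array.
    ICC(3,k) = (MS_rows - MS_error) / MS_rows (Shrout & Fleiss, 1979).
    """
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    grand = x.mean()
    # Sums of squares for subjects (rows), coders (columns), and residual
    ss_rows = k * ((x.mean(axis=1) - grand) ** 2).sum()
    ss_cols = n * ((x.mean(axis=0) - grand) ** 2).sum()
    ss_err = ((x - grand) ** 2).sum() - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / ms_rows

# Hypothetical looking times (s) coded by two raters for five images
ratings = [[3.0, 3.2], [5.1, 4.8], [1.0, 1.3], [6.2, 6.0], [2.4, 2.5]]
icc = icc3k(ratings)
```

Because ICC(3,k) measures consistency, a constant offset between coders (e.g., one coder always scoring 0.5 s higher) does not reduce the coefficient.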

Models
To examine differences in initial looking time to each non-social image category, we ran three pair-wise generalized linear mixed models, fitted with a Gamma distribution and adjusted with a Bonferroni correction. The models contrasted (1) food versus landscape, (2) landscape versus object, and (3) food versus object. Looking time was the dependent variable, image category was a two-level fixed effect, and ID was a random effect.

Results
Looking time to images of food and objects was significantly longer than for landscapes, even after adjusting for multiple comparisons. There was no significant difference in looking time between objects and food (see Table 1 and Figure 2, left).

Stimuli
Stimuli are as described for viewing task 1, except we used separate training sets for each task to ensure that images were novel and therefore promoted interest in the task.

Design and Procedure
We presented monkeys with pairs of images from which they could make a choice. Each trial started with a central target (green circle, 3.93 cm x 3.93 cm) on the screen, which provided reward when touched. Next, two images were presented on either side of the screen (7.85 cm x 7.85 cm). Order and side of image presentation were randomized. Participants had to select one image, though they were not rewarded for this: by providing juice only at the start of each trial (upon touching the target circle), reward was decoupled from image viewing. Image pairs were presented for up to 300 s, after which a new trial would start. When an image was selected, the other image would disappear and the selected image would increase in size (15.7 cm x 15.7 cm; see Figure 1) and remain for six seconds, before the next trial started.
Each session consisted of 20 trials. Monkeys received a minimum of six training sessions, and reached criterion when they selected an image on every trial, within 20 s, over two consecutive sessions.

Results
We examined image choice across training sessions for seven monkeys that passed training, using a generalized linear mixed Poisson model. We excluded any incomplete training sessions (n = 21 for five monkeys). Number of instances of each category chosen per monkey/session was the dependent variable, image category was the fixed effect and ID was the random effect. In a full model, we compared food and object choices over landscape. Consistent with looking time, monkeys chose food and object images more than landscape. A subset model that compared only food and object choices revealed that monkeys chose food images significantly more than object images (see Table 2 and Figure 2, right). Due to some procedural adjustments during training (see Appendix for details), we ran additional models to determine whether these changes had any overall effect on choice preferences. Results were consistent with our full model, reflecting preferences for food and object over landscape in both early and later trials (see Figure A1 and Appendix for details). To check for side bias in image selection, we ran a model with side chosen (left or right of screen) as the binary dependent variable, image category chosen as the fixed effect, and ID as a random effect. The side on which an image was chosen was not related to the image category chosen (food versus landscape: OR = 0.94, SE = 0.12, 95% CI [0.73, 1.18]; object versus landscape: OR = 1.03, SE = 0.12, 95% CI [0.80, 1.29]).
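As a simpler illustration of a group-level side-bias check (not the mixed-model analysis reported above, which also accounts for individual variation), an exact binomial test can ask whether, say, left-side choices depart from chance; the counts below are hypothetical.

```python
from scipy.stats import binomtest

# Hypothetical counts: left-side choices out of all choices, pooled over monkeys
left, total = 74, 140
result = binomtest(left, total, p=0.5)  # H0: no side bias (p = .5)
biased = result.pvalue < 0.05
```

A mixed model is still preferable for real data, since pooling counts across monkeys ignores the non-independence of repeated choices by the same individual.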

Discussion
As the images used for viewing task 1 and choice task 1 were, for the purposes of training, different, we could not directly examine the relationship between looking time and choice for these two tasks. The results do, however, indicate that preferences for images of food and objects in choice task 1 were consistent with longer looking times to novel food and object images in viewing task 1. This suggests that there may be a relationship between looking time and category preference for non-social images.

Experiment 2 Viewing task 2
Viewing task 2 was developed as an alternative testing procedure to viewing task 1. Although six monkeys reached criterion on viewing task 1, testing with this method was not feasible (see Appendix for details). Using data from viewing task 2, we examined the relationship between looking time to social images and selection of social images in choice task 2. Choice task 2 and viewing task 2 were counterbalanced in order of presentation.

Participants
Eleven monkeys were tested, but one monkey did not continue to engage with the apparatus and thus failed to complete the test, so data were analyzed from ten monkeys (6M, 4F; M age = 5.07, SD = 3.96), seven of which took part in viewing task 1.

Test Stimuli
Test stimuli were the same images of social content as used for choice task 2.

Design and Procedure
The monkeys started each trial by pressing a target (purple diamond; 3.93 cm x 3.93 cm) to receive a reward. They were then presented with one image (15.7 cm x 15.7 cm; Figure 1) for six seconds, in randomized order. As the monkeys were familiar with the target-reward system, no training was required. Each monkey received one test session consisting of 36-46 trials, which varied due to the number of kin images available per subject (between 18 and 23 image pairs; M = 20.71 pairs, SD = 1.89).

Results
To examine visual bias towards social images, we ran separate generalized linear mixed models for each image category: kin, age, group, and species. Looking time was the dependent variable, type of image viewed was a two-level fixed effect, and ID the random effect. Looking time did not differ significantly between image types within any of the four social image categories (see Table 3).
We also examined whether looking time to images decreased across trials. We ran a linear mixed model with log transformed looking time as the dependent variable, trial number as a fixed effect and ID as a random effect. Looking time did not decrease significantly across trials (b = 0.003, SE = 0.003, 95% CI [-0.004, 0.01]).
To account for some variation in image content, we additionally conducted sensitivity analyses, finding no effect of variation in image content on looking time (see Appendix).

Participants
Seven monkeys that reached criterion in choice task 1 were tested on this task (5M; 2F; M age = 5.89 years, SD = 4.46). All seven monkeys also took part in viewing task 2.

Test Stimuli
Monkeys were presented with images of kin vs. nonkin (one to six pairs), ingroup vs. outgroup (four pairs), conspecifics vs. heterospecifics (four pairs) and young vs. old (eight to nine pairs). Images were sorted into pairs and matched by sex, approximate age and rank, and gaze direction. Images were cropped and scaled in GIMP to dimensions of 3000 x 3000 pixels at 72 ppi.

Design and Procedure
Each monkey received one test session. Session length consisted of 18 to 23 trials which varied due to number of kin images available per subject (M = 20.71; SD = 1.89).

Interrater Reliability
From six videos, reliability of side chosen and first touch were assessed by checking for discrepancies between coders. Three discrepancies for side chosen were found, only one of which indicated an error by the primary coder. This was corrected in data analysis. Two discrepancies were found for first touch, neither of which were errors by the primary coder.

Results
There were no clear group-level preferences for image choice. To examine whether side chosen predicted image choice, we ran four generalized linear mixed models, one per image category, with side chosen as the dependent binomial variable, image type chosen as the fixed effect and ID as the random effect. The side of the screen on which an image was chosen was not related to the type of image chosen. To examine whether looking time in viewing task 2 predicted image choice in the choice task, we aggregated looking time to each image group per monkey, and paired this with the number of instances images in each group were chosen. We ran a generalized linear mixed model for each image category, with a Gamma distribution, looking time as the dependent variable, instances images were chosen as the fixed effect, and ID as a random effect. Image choice did not predict looking time to images in viewing task 2 across image categories (see Table 3, grey rows).
To examine whether there was any effect of test order (i.e., the order in which monkeys received the choice task 2 or viewing task 2) on response in the viewing task, we ran a generalized linear mixed model with a Gamma distribution, with looking time as the dependent variable, order as a fixed effect and ID as a random effect. Seeing the images in the choice task first did not have any effect on looking time to the images when receiving the viewing task second (b = 0.43, SE = 0.28, 95% CI [-0.47, 1.24]).

Experiment 3 Free View Task
This task was conducted in the monkeys' enclosure, to verify that our touch screen results from earlier experiments were not simply an artifact of the testing environment. That is, we carried out the test without providing food reward as a motivator and were also able to test a larger sample by including monkeys that do not usually come into the testing cubicles. We followed the methods of Schell et al. (2011) and Almeling et al. (2016).

Participants
We tested 29 monkeys (10M, 19F; M age = 7.22 years, SD = 6.82), 14 of which had participated in at least one of the touchscreen tasks, and ten of which had participated in viewing task 2.

Test Stimuli and Apparatus
Following the results of Almeling et al. (2016), we chose two images from the age category of the prior touch screen studies, depicting the same female at two different ages. We selected images of a female who was not a regular test subject, as we were not sure whether we would be able to test all non-regular test subjects. We also wanted to re-test our regular test subjects in their home enclosure to examine whether their responses to photos were consistent with their responses in the testing cubicles. Each image was cropped to a circular frame with a 17 cm diameter, printed on A4 paper with a white background and set inside a wooden frame (Figure 1). Each subject saw both conditions. Condition order (i.e., young vs. old) was semi-randomly counterbalanced.

Design and Procedure
During testing, which image to present was determined by an assistant from a pre-determined order of images to be shown to each subject. The experimenter was blind to the conditions. Subjects were tested alone in one area of the enclosure while sitting near the mesh. The experimenter would approach within 1 m, holding the frame at eye level to the monkey, with a handheld camera (Panasonic HC-X909 and HC-X929, full HD with wide angle lens) above the frame. The experimenter wore sunglasses and kept their head tilted down to avoid eye contact with the monkey. The image was covered with a white sheet prior to testing. The experimenter would remove this at the start of the experiment and the assistant would start a timer. The image was kept in place for 60 s, or less if the monkey left or another monkey approached. Monkeys were not rewarded in this task.
In four instances we repeated a condition (three times for the "old" image condition, one time for the "young" image condition): twice because the experiment was disturbed by another monkey in the first three seconds, and twice because the test subject was distracted and left almost immediately.

Interrater Reliability
Viewing time and task length (due to viewing time variation) for 29 videos were coded, with an ICC(3,k) of .93 and .99 respectively, indicating excellent reliability (Koo & Li, 2016).

Results
In the four cases where sessions were repeated, the initial sessions were removed from further analysis. First, we examined whether taking part in the touch screen test had an effect on looking time in the free view task. As fewer monkeys had seen the images before (N = 10) than had not (N = 19), we used a linear mixed model with log-transformed looking time as the dependent variable and ID as a random effect. We made a binary variable that split monkeys into those that had or had not previously seen the pair of images used for the free view task and examined it as a fixed effect in the model. Monkeys that had seen the images before had significantly reduced looking times to images in the free view task. Finally, we examined whether looking time differed between image types for naïve monkeys only (those that had not taken part in viewing task 2; N = 19), with a paired t-test. Despite differences in looking time between naïve and experienced subjects, when we examined differences in looking time to each image for naïve monkeys only, there was still no significant difference (t(18) = -0.29, p = .77). These findings for looking time match those from the touch screen studies.
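The naïve-subject comparison above is a standard paired t-test on each monkey's looking times to the two images; a minimal Python equivalent (with invented looking times, not our data) would be:

```python
from scipy.stats import ttest_rel

# Hypothetical looking times (s) of the same monkeys to the two images
young = [12.1, 8.4, 15.0, 9.7, 11.2, 13.5, 7.9, 10.3]
old = [11.8, 9.1, 14.2, 10.5, 10.9, 12.8, 8.6, 10.0]
t, p = ttest_rel(young, old)  # pairs each monkey's two scores
different = p < 0.05  # no significant difference expected for these values
```

The paired design is what makes the test appropriate here: each monkey serves as its own control, so between-individual differences in overall looking time are removed from the comparison.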

Looking Time Differences in the Free View Task Between Monkeys That Did and Did Not See Images in the Touch Screen Test.
Note. Central horizontal bars indicate median values; upper and lower horizontal bars indicate the interquartile range (IQR); vertical error bars represent data points up to 1.5 x the IQR.

General Discussion
We examined whether looking time is an indication of preference, by measuring image choice explicitly in relation to looking time responses. The results provided mixed evidence for a relationship between looking time and preference. On the one hand, there was no relationship between tasks that assessed (1) looking time to, and (2) choices between, images of social content. This lack of relationship is likely explained by the absence of variance in responses: there were no patterns in either task indicating visual bias or preference for any of the social categories. On the other hand, the training data revealed patterns of initial visual bias, and overall preference, for images of objects, and especially food, over landscapes. Differences in image choice between food and landscape held when accounting for adjustments in the training procedures for the choice task. The results also revealed a choice preference for food over object images; however, this was not reflected in the looking time data. Since food and landscape images represent the highest and lowest preferred categories respectively, this suggests that links to visual bias may only be found for image categories with large preference differences. Overall, these results suggest that visual bias to images of food over landscape, and to a lesser extent images of objects over landscape, does reflect preferences when monkeys are presented with an explicit choice task.
The fact that looking time in viewing task 2 did not decrease overall across the testing sessions, and that naïve subjects in their home environment responded similarly to experienced monkeys, suggests that these findings are not a result of the methodology used, such as presenting the images on a screen or in succession. The free view task, however, did reveal order effects, in that monkeys looked longer at whichever image they viewed first. We also found that monkeys looked at images longer when they were novel, suggesting that previous experience influences gaze behavior. This latter finding is not likely explained by differences in exposure to the researchers, since testing always took place adjacent to the home enclosure, and observational data collection as well as target-training were carried out in the home enclosure, giving all monkeys equal exposure to the researchers.
Why the monkeys showed no category-level biases towards the social stimuli, in contrast to earlier studies (Almeling et al., 2016; Gothard et al., 2004; Pfefferle et al., 2014; Schell et al., 2011), remains unclear, but may be due to differences found at the individual rather than the species level. Due to our small sample size in the touchscreen studies, we were unable to explore whether this was the case.
Although our results tentatively suggest that visual bias in a viewing task could indicate preference in an explicit choice task, it is necessary to understand why attending to one stimulus more than another could be meaningful to the study individual or species. This is important if gaze data are to be interpreted from an ecologically valid perspective (D'Eath, 1998; Morton et al., 2016). Indeed, as Aslin (2007) points out, one should also consider that a lack of bias towards one stimulus type does not mean that subjects do not discriminate between stimuli. When using looking time as an experimental measure, we encourage the use of clear linking hypotheses between visual bias and the underlying cognitive mechanisms. This should include giving particular consideration to the social context being assessed, since this is likely to drive gaze allocation.
To address the issue of ecological validity, one approach would be to examine visual bias in parallel with other responses. For example, red-fronted lemurs spend more time looking at, and more time sniffing, images of conspecifics compared with heterospecifics (Rakotonirina et al., 2018). A number of studies have focused only on behavioral responses to measure image discrimination. In goats (Capra hircus), ears are positioned forward for longer in response to conspecific negative over positive facial expressions (Bellegarde et al., 2017), which could be an adaptive response to potential threat (Bethell et al., 2012). Goats were also found to direct first interactions towards positive over negative human facial expressions (Nawroth et al., 2018). Discus fish (Symphysodon aequifasciatus) perform partner-specific displays to lateral images of their mate, but not to images of unfamiliar conspecifics, suggesting that they use facial color patterns to differentiate between conspecifics (Satoh et al., 2016). Similarly, female jumping spiders (Maevia inclemens) were found to exhibit sexual receptivity more often towards videos of normal male morphs than digitally altered morphs (Clark & Uetz, 1993). Approach behavior has been used to measure responses to facial stimuli in both brown capuchins (Morton et al., 2016) and horses (Equus caballus; Wathan et al., 2016). These measures may be more informative for answering questions about stimulus recognition, avoidance, or preference, and in future research should be considered either alongside or instead of looking time paradigms.
An additional avenue of research that now shows promise for a number of species, such as primates (Hopper et al., 2020) and dogs (Somppi et al., 2012), is the use of eye tracking. Eye tracking can be conducted non-invasively and is thus increasingly applicable to a wider range of research settings, such as zoos (Hopper et al., 2020). The benefit of this approach is that it allows one to examine gaze responses to social stimuli in detail, such as how species explore facial images (Kano et al., 2012; Wilson et al., 2020). This could be particularly useful in cases such as the current study, where we found no visual bias amongst categories of different social stimuli. An eye tracking approach could instead reveal differences in scanning patterns between such stimuli and could be more informative in highlighting attentional differences at the individual level.
Regarding our own methodology, there are several limitations to address. First, in viewing task 1, the reward randomization did not, as we had hoped, dissociate the food incentive from viewing the images. However, this did not detract from initial attention to the images. Second, we made several procedural adjustments during training for the choice task. Sensitivity analyses, however, revealed that these changes had at best a marginal effect on overall choices. An additional limitation is that, although our training data suggest a relationship between looking time and choice, the use of different training stimuli meant we were unable to test this directly (since these analyses were unplanned). Moreover, training on the choice task took place after training on the initial viewing task, so we could not counterbalance task order, although the fact that different image sets were used in each task means that this did not affect image novelty. Finally, in this study we focused on assessing responses to static images. Further investigation of looking time in response to dynamic stimuli would be beneficial in understanding the relationship between gaze and preference.

Viewing Task 1
Apparatus for Training. For viewing task 1, initial training for ten monkeys took place on an Elo touch screen before they were transferred to training on the XBI touch screen. As we used the first training session to examine looking time to novel non-social images, data from eight of these monkeys were taken from sessions with the Elo touch screen (two were excluded because top-positioned cameras, which were required for coding image type, were missing from their first session). This earlier training used an Elo 17 in. SXGA TFT touchmonitor connected to an external MacBook Pro running OS X El Capitan, version 10.11.6 (see Figure 1). In this set-up, cameras from the side and above filmed the monkeys during training, and monkeys were rewarded with cut raisins. The experimental software, MWorks, and the task remained the same across both touch screen set-ups, with the exception that on the XBI the task background was changed to white; because the XBI set-up was dimly lit, a white background made the monkey easier to see.
Training Stimuli. Landscape images were chosen from pre-existing footage taken by VW. Images were taken of the monkeys' food items, such as cut fruit and vegetables, and of novel household objects, including stationery, ornaments, and kitchen utensils, which the monkeys were unlikely to have seen before. Novel objects were used to promote interest in the images.

Task Limitations. Viewing task 1 was developed as a pilot method prior to the other tasks, to allow the monkeys to learn, through training with non-social images, to control how long they viewed each image, similar to Fujita (1987) and Tanaka (2003). We attempted to test subjects on this task but were unsuccessful, as the reward randomization did not work. In initial training sessions, monkeys took time to view the images and press the target. Over time, however, they learned to press the target repetitively and thus lost attention to the images, which meant that in later training sessions it was not possible to code looking time to most of the images. Consequently, for this task we only present looking time from each monkey's initial training session, when images were novel and the monkeys were clearly attending to them. This approach parallels viewing task 2, where monkeys only saw each image once.

Choice Task 1
Video Coding. To assess the monkeys' progress during training, all videos were coded by VW following each training session. This was done by playing the videos in VLC and noting choices, including the (non-social) image category chosen and the side of the image chosen, in Microsoft Excel. We used these data to analyze the monkeys' choices across sessions. As the coder had to be familiar with the images in the training data in order to code the category correctly, obtaining a second coder was not possible.
Procedure Adjustments. Due to the monkeys' short attention span, this was a challenging task for them to learn. We thus adjusted some variables during training to make the task easier. The trial-start target was added after initial sessions (3 to 4 sessions each) with four monkeys, in order to separate the reward from the images; in the initial sessions, the reward was provided when the image pair appeared on screen, which distracted the monkeys from choosing an image. The image pairs were initially presented for 20 s. If the monkeys did not select one of the two images within this time, a new trial would begin. However, this often led to the monkeys avoiding the stimuli and waiting for the next trial to begin automatically. To prevent this, we increased the presentation time of the pair of pictures to a maximum of 300 s (implemented after an average of 5.88 sessions for eight of the participants and from the beginning for the other five). To encourage the monkeys to respond to the pairs presented, we also added a "red screen" time-out of 3 s if they did not respond within the time limit (implemented after an average of 18.13 sessions for eight monkeys and not implemented for the other five). Once an image was selected, the enlarged image was initially set to remain for 10 s, but as the monkeys lost interest during this time, after three to four initial sessions in four subjects we decreased it to 6 s.

Sensitivity Analyses for Procedure Adjustments. Of the 2,220 trials analyzed, there were 200 trials in which no start target was displayed; 540 trials in which the image pair was displayed for 20 s instead of 300 s; 160 trials in which the enlarged image was displayed for 10 s instead of 6 s; and 420 trials in which the red screen was used.
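As a rough sense of scale, the trial counts above can be expressed as proportions of the 2,220 analyzed trials. The following is a simple arithmetic sketch using only the counts reported in the text:

```python
# Proportion of the 2,220 analyzed choice-task trials run under each
# pre-adjustment condition (counts as reported in the text above).
TOTAL_TRIALS = 2220

adjustments = {
    "no start target": 200,
    "20 s (not 300 s) pair display": 540,
    "10 s (not 6 s) enlarged image": 160,
    "red-screen time-out used": 420,
}

def pct(n, total=TOTAL_TRIALS):
    """Percentage of all analyzed trials, rounded to one decimal place."""
    return round(100 * n / total, 1)

for label, n in adjustments.items():
    print(f"{label}: {pct(n)}% of trials")
# no start target: 9.0%; shorter pair display: 24.3%;
# longer enlarged image: 7.2%; red screen: 18.9%
```

So each individual adjustment affected at most about a quarter of the analyzed trials, which motivates the pre- versus post-adjustment sensitivity models described in the text.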
We ran two additional models to determine whether the procedural adjustments during training had any effect on the results. In the first model, we examined image choice in a subset of data across seven monkeys using a Poisson model. This model focused on the monkeys' initial training sessions (mean number of sessions = 4.57, SD = 5.13), before any changes to the procedure were implemented. Food and object images were chosen significantly more often than landscape images in these initial sessions (food: OR = 1.68, SE = 0.11, p < .001, 95% CI [1.34, 2.13]; object: OR = 1.59, SE = 0.11, p < .001, 95% CI [1.28, 2.04]; see Figure A1). The second model examined image choice across later trials, after the changes were implemented (mean number of sessions = 8.29, SD = 3.25). Results showed a pattern consistent with the earlier trials (food: OR = 1.64, SE = 0.08, p < .001, 95% CI [1.40, 1.93]; object: OR = 1.40, SE = 0.08, p < .001, 95% CI [1.21, 1.64]; see Figure A1). These findings are consistent with the full model assessing responses across all trials.
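For readers wanting to reproduce the back-transformation, effect sizes like those reported above can be recovered from model coefficients estimated on the log scale. This is a minimal sketch assuming simple Wald-type intervals (exp(b ± 1.96 × SE)); the intervals reported in the text may instead be profile- or simulation-based, so small discrepancies from the published values are expected:

```python
import math

def odds_ratio_ci(b, se, z=1.96):
    """Back-transform a log-scale coefficient b with standard error se
    into an effect-size ratio and a Wald-type 95% confidence interval."""
    return (math.exp(b), math.exp(b - z * se), math.exp(b + z * se))

# Hypothetical coefficient roughly matching the food-vs-landscape effect
# reported above (OR = 1.68, SE = 0.11), i.e. b = log(1.68).
or_, lo, hi = odds_ratio_ci(math.log(1.68), 0.11)
print(f"OR = {or_:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```

A coefficient of b = 0 back-transforms to a ratio of 1 (no difference between categories), which is why confidence intervals excluding 1 on the ratio scale correspond to significant effects.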

Test Stimuli
For the kin and age categories, as well as for in-group images, photos of familiar conspecifics were used. For outgroup conspecifics, we used images from a group of unfamiliar long-tailed macaques at the German Primate Center, to ensure all images were of equally unfamiliar individuals. For the other species, we used images of Barbary macaques taken at the monkey park La Forêt des Singes, Rocamadour, France.
When matching images into pairs, some images were mirror-flipped to match the gaze direction of the paired image (N = 4 for images of familiar conspecifics). The species and group categories contained only images of females with averted gaze. The age and kin categories contained both male and female images, with both direct and averted gaze. Where necessary, images were also adjusted for brightness, saturation, and color balance, to minimize differences in the lighting conditions under which the images were taken. Images in the age condition were matched by individual; that is, we used images of current conspecifics, who were at either juvenile (1 to 3 years old) or subadult (3 to 4.5 years old) age, and matched these with images of the same individuals when they were infants (< 1 year). There was one exception in the young group: one individual was 23 months old. This image was matched with a subadult image, and the age difference between these images was no smaller than the age difference for the other pairs. All images in the older age group were taken an average of 2.03 years after the infant images, and depicted individuals with a mean age of 2.83 years (SD = 0.90).

Sensitivity Analyses
Viewing Task 2. For viewing task 2, in four cases, images of familiar conspecifics were mirror-flipped for testing to match the gaze direction of the paired image. To check whether this had any effect on looking time, we removed the cases with mirror-flipped images and re-examined the data; this made no difference to looking time between image groups (Figure A2).
In the young age group, one individual was considerably older than the others photographed. We examined whether removing this image from the model had any effect on the difference in response to younger versus older images. Removing images of the older individual from the young age category (eight cases) reduced the mean looking time for the young group from 1.68 s to 1.63 s, and had no significant effect on the model (b = 0.07, SE = 0.20, 95% CI [-0.34, 0.52]). We also ran an additional analysis on a subset of the data for the age category, to account for differences between image pairs in the age gap between groups. Three image pairs had a mean between-group age difference of 40.3 months; the other seven pairs had a mean between-group age difference of 18.6 months. We re-ran the generalized linear mixed model for the age category from the main analysis, examining looking time for just this subset of images. Initial examination of the mean looking times revealed that, for the seven image pairs with the smaller age gap between groups, mean looking time to the young group (M = 1.72 s) was almost the same as to the older age group (M = 1.74 s). For the three image pairs with the larger age gap, mean looking time was higher for the young group (M = 1.68 s) than for the old group (M = 1.61 s). The model revealed no significant difference in looking time between older and younger images for the three image pairs with the larger age gap (b = 0.05, SE = 0.21, 95% CI [-0.71, 0.78]; see Figure A3).
Finally, using linear mixed models, we assessed whether the sex of the photographed monkeys, or the gaze direction of the images, affected looking time, particularly as direct gaze may be perceived as threatening (de Waal, 1976).

Choice Task 2. In addition to examining the relationship between image choice and looking time, we examined whether first touch (instances where monkeys lightly touched the image but did not touch it firmly enough to activate a change) was related to looking time. We ran a generalized linear mixed model for each image category with first image touched as the dependent binomial variable, log-transformed viewing time and side of image chosen as fixed effects, and ID as a random effect. Consistent with the results for which image was chosen in each trial, first touch had no relationship to looking time for any of the categories (Table A2).

Note. Numbers depict the number of training sessions prior to testing. VT = viewing task. Bold indicates counterbalancing between viewing task 2 and the choice task, i.e., which task the monkey completed first. Y = completed test; F = failed to start or complete test. Bold italics = viewed adult condition first. † Excluded from training data analysis, due to lack of a downward-angled camera in their first session, from whose footage the coder could extract image type. * Tested in error. 1 Age calculated to the nearest year, determined from when touchscreen testing began. For monkeys that took part only in the free view task, age is calculated from when the free view task began.

Note. Number of instances images were chosen from each category (top) and number of instances when monkeys first touched an image but did not select it (i.e., image size did not increase).