The Legislative, Ethical, and Conceptual Importance of Replicability in Farm Animal Welfare Science

In this commentary, we discuss three replicability issues that are specifically relevant to research on farm animal welfare: (1) Legislative action, and its potential economic consequences, should be based on robust and replicable research to benefit animals kept in an industrial setting. (2) From an ethical perspective, the use of relatively few additional animals in replication studies in farm animal welfare science can be justified in comparison with the high numbers of animals of the same species that are routinely kept by the millions for production purposes. (3) Conceptually, studies in farm animal welfare science can take advantage of heterogeneous farm settings that can serve as the population for sampling and, accordingly, follow a framework that improves external validity and thus increases replicability. In light of these issues, animal welfare scientists should more strongly embrace measures to increase replicability in their research.

A considerable proportion of published results will turn out to be false (Ioannidis, 2005). This might be due to questionable research practices, such as p-hacking (changing statistical methods until the p-value falls below the threshold for significance) and HARKing (Hypothesizing After the Results are Known; Head et al., 2015), or due to the chance aspect of the p-value and the dichotomy of defining a result as a "true" effect once a certain threshold is reached (e.g., p < .05; Cumming, 2012; Wasserstein et al., 2019). Nonetheless, a failure to replicate effects is an integral part of the scientific process, in which hypotheses are tested and re-tested and some are accordingly identified as reliable and robust, while others are not (Earp & Trafimow, 2015). To better understand the presence or absence of specific effects, it is key not only to replicate a wide range of experiments, but also to identify field-specific research characteristics in order to assess the importance, necessity, and limitations of replications.
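To illustrate the chance aspect of the p-value, the following minimal Python sketch (our own illustration, not an analysis from the cited studies) simulates many exact replications of a study in which a true effect does exist. All values are assumptions chosen only for illustration: a standardized effect of d = 0.5, n = 30 animals per group, normally distributed outcomes with a known SD of 1, and a two-sided z-test; the function names are hypothetical.

```python
import random
from statistics import NormalDist, fmean


def simulate_significant_fraction(d=0.5, n=30, alpha=0.05, n_studies=2000, seed=1):
    """Fraction of exact replications of a real effect that reach p < alpha.

    Hypothetical illustration: each simulated study compares two groups of n
    animals whose outcomes are normal with SD 1 and differ by a true effect d.
    A two-sided z-test (SD assumed known) is used for simplicity.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se = (2 / n) ** 0.5  # standard error of the mean difference when SD = 1
    hits = 0
    for _ in range(n_studies):
        control = [rng.gauss(0.0, 1.0) for _ in range(n)]
        treated = [rng.gauss(d, 1.0) for _ in range(n)]
        z = (fmean(treated) - fmean(control)) / se
        if abs(z) > z_crit:
            hits += 1
    return hits / n_studies


def analytic_power(d=0.5, n=30, alpha=0.05):
    """Normal-approximation power of the same two-sided z-test."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)
    shift = d / (2 / n) ** 0.5
    return (1 - nd.cdf(z_crit - shift)) + nd.cdf(-z_crit - shift)
```

Under these assumptions the power is only about 0.49, so roughly half of all exact replications of a genuinely present effect would be declared "failed" at p < .05, purely by chance; calling `simulate_significant_fraction(d=0.0)` instead shows the roughly 5% of false positives expected when no effect exists.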
In research conducted on animals, we often encounter a trade-off between the need to replicate studies and concerns about the welfare of the animals involved (Prescott & Lidster, 2017). This is particularly relevant when experiments involve sub-optimal housing conditions (e.g., when animals are kept alone or in barren housing) and/or harmful treatments (e.g., inducing pain or fear). One main area of applied ethology that faces this trade-off is the welfare of animals under human care, with a focus on farm animals. In farm animal welfare science, three issues are specifically relevant when considering replicability: the legislative desirability of replications, their ethical acceptability, and the intrinsic heterogeneity of study sites. Here we argue that: (1) animal welfare legislation and its potential economic consequences for animal keepers such as farmers create a particularly high demand for solid replication, (2) the ethical aspect of using animals for experimentation should be considered in the context that animals of the same species are routinely kept by the millions in an industrial setting, and (3) testing animals at different facilities (e.g., farms) may pose some conceptual problems but, in effect, mainly benefits replicability by providing heterogeneous experimental contexts.

The Legislative and Economic Aspects of Replicability in Farm Animal Welfare Science
Farm animal welfare scientists assess how farmed animals cope with their environment (Gygax & Hillmann, 2018), which yields scientific knowledge on what animals want and like (Dawkins, 2004). This knowledge is meant to be used to improve the housing conditions of animals kept under human care. The final step is political, though, and can be viewed as a societal compromise between economic and ethical factors. Knowledge on the effects of specific interventions or treatments has the potential to lead to new recommendations on animal husbandry that directly influence how animals are kept or managed. This knowledge can then be implemented either in new (compulsory) legislation or in voluntary product labels. In both cases, convincing policymakers and practitioners can be difficult because of the potentially substantial economic costs that arise when husbandry systems need to be adapted. These costs usually need to be covered by farmers, who must balance a (potential) improvement in their animals' welfare against their own economic constraints. If findings from applied research are to be translated into legislation and thus implemented in industrial settings, we need to be able to assure policymakers that the detected effects of treatments are reliable, robust, and meaningful. To keep the number of false-negative findings low, farm animal welfare scientists should ensure that their studies are high-powered (e.g., through sample size and design decisions; Lazic, 2018; Rouder & Haaf, 2018). Moreover, once legislation is in place, opportunities for further legislative change will be limited and slow to arise. The most important aspect here is likely to be the consequences from the animals' point of view: any welfare-improving effect should be as certain as possible, optimally substantiated by a meta-analysis, because otherwise animals might not profit from any implemented measures.
However, under special circumstances (e.g., if there are deadlines for the implementation of interventions), the required certainty could be lowered (precautionary principle) if an opportunity for implementation would otherwise be missed. Replicability of applied research findings is therefore central, given the slow pace of legislative change and its potentially high cost for animal keepers. Replicability is hence likely to be of greater practical relevance in applied ethology (and animal welfare science) than in fundamental animal behavior research. Behavioral research in translational medicine faces a similar replicability issue because of the extremely high cost of developing novel medications.

The Ethical Aspect of Replicability in Farm Animal Welfare Science
As in most other areas of animal behavior research, experiments in farm animal welfare science may confront test subjects with sub-optimal housing and/or management conditions (with regard to the animals' needs and motivations). Unlike in other areas, however, animal welfare scientists explicitly aim to determine how these housing conditions affect the welfare of the tested animals and how these conditions can be improved. Often, the (control) treatments reflect the (minimal) requirements of current legislation or best practice. Frequently, this means that these (control) treatments imply welfare conditions that are already known to be sub-optimal or, at best, minimal. We can therefore ask whether it is ethical to conduct replications that expose farm animals to a minimal welfare standard. Additionally, it must be taken into account in farm animal welfare science that millions (and, depending on the species, even billions) of farm animals are simultaneously exposed to the same minimal conditions in farming practice (FAO, 2017). From a utilitarian perspective, replicating experiments that have shown promising results regarding specific welfare parameters should therefore be of high priority, to assess whether these effects are robust and meaningful. To reduce the number of animals in an experiment, the smallest effect size of interest (i.e., the point at which welfare is increased to a relevant degree) can, in principle, be defined in animal welfare science (although this may be difficult in practice). In these cases (and if the random variation is known), the minimum sample size needed to detect this effect size can be chosen on the basis of an adequate power analysis. Thus, exposing a relatively limited number of subjects to sub-optimal welfare conditions in replication studies can lead to potential welfare improvements for a large number of farm animals once legislation has acknowledged the benefits of this very effect (see point 1).
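The power-analysis step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not a prescribed procedure: it assumes a two-group comparison of a normally distributed welfare indicator, a smallest effect size of interest expressed as a standardized difference (Cohen's d, i.e., with the random variation known), and the normal approximation n = 2((z₁₋α/₂ + z_power)/d)². The function name `n_per_group` is hypothetical.

```python
import math
from statistics import NormalDist


def n_per_group(sesoi_d, alpha=0.05, power=0.80):
    """Approximate number of animals per group for a two-sided, two-sample test.

    Normal approximation: n = 2 * ((z_{1-alpha/2} + z_{power}) / d) ** 2,
    where d is the smallest effect size of interest in SD units.
    An exact t-test calculation gives slightly larger values for small samples.
    """
    nd = NormalDist()
    z_alpha = nd.inv_cdf(1 - alpha / 2)  # critical value of the two-sided test
    z_beta = nd.inv_cdf(power)           # quantile matching the desired power
    return math.ceil(2 * ((z_alpha + z_beta) / sesoi_d) ** 2)
```

With α = .05 and 80% power, `n_per_group(0.5)` returns 63 animals per group, whereas `n_per_group(0.25)` returns 252: halving the smallest effect size of interest roughly quadruples the number of animals required. Defining this effect size carefully therefore directly limits how many animals are exposed to sub-optimal conditions in a replication.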

The Conceptual Aspect of Replicability in Farm Animal Welfare Science
Experimental settings in farm animal welfare science often do not allow for identical direct replications, because research is often conducted on farms, which provide a less controlled environment than laboratories. In addition, institutions have limited resources to create identical experimental laboratory settings, because large and expensive facilities are needed to study relatively large farm animals. Although identical direct replications might thus not be possible, researchers can still aim to use closely related procedures and to sample from sufficiently similar populations of animals in similar settings to assess the reliability of the original experiment (Machery, 2019). But is noise in a system a problem, or rather a potential benefit? Farm animal welfare studies cover a broad range of study types (as do studies in other fields), ranging from highly experimental studies on a single farm (potentially with a single group of animals) to studies conducted on multiple farms, either in an epidemiological approach (e.g., comparing lame and non-lame cows in a quasi-natural experiment) or by repeating a (simple) experiment on several farms. In highly experimental studies on single groups, the high level of noise in farm settings may be a problem because idiosyncrasies of a specific farm may mask a potential effect. Furthermore, variables that are specific to particular sites may lead to false-positive findings, or to findings that do not generalize beyond the original study. In contrast, multi-farm studies are quite common in applied ethology, because it is feasible to recruit several (production) farms and because statistical models for dealing with such hierarchically nested datasets have become much easier to apply over the last two decades (e.g., from Pinheiro & Bates, 2000, to Bürkner, 2017).
Conducting the same type of observation or experiment on several farms may increase external validity, similar to multi-lab pre-clinical studies (see Voelkl et al., 2018, 2020). However, a multi-lab study with its own team of researchers at each lab may provide greater external validity than a single team working on several farms, because the latter lacks heterogeneity among the experimenters conducting the experiments, including their potential biases. The single-team, multi-farm approach may therefore be more closely comparable to the "systematic heterogenization of study samples": the idea that effects detected in heterogeneous contexts are more externally valid than those found in a more homogeneous testing context, and that such heterogeneous contexts can be actively produced by the researcher. Voelkl et al. (2018, 2020) suggested this as an easy step towards improved replicability that can be applied even within a single lab. The heterogenization created by a multi-farm approach has the potential to drastically improve the replicability of detected effects (Voelkl et al., 2018) and to improve generalizability (Yarkoni, 2019) and the severity of theoretical tests (Baribault et al., 2018; Mayo, 2018).
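Why heterogenization across farms can improve replicability may be easier to see in a small simulation. The following Python sketch is our own hypothetical illustration, not the method of Voelkl et al.: it assumes that each farm has its own true treatment effect (a treatment-by-farm interaction drawn from a normal distribution) and compares two designs using the same 80 animals, one farm with 40 animals per group versus eight farms with 5 animals per group each. It does not model experimenter effects or fit the hierarchical models cited above; all parameter values and function names are assumptions.

```python
import random
from statistics import fmean, stdev


def study_estimate(rng, n_farms, n_per_group, mean_effect, farm_effect_sd, residual_sd):
    """Treatment-effect estimate from one hypothetical study.

    Each farm has its own true effect (mean_effect plus a farm-specific
    deviation with SD farm_effect_sd); animals vary around their group mean
    with SD residual_sd. The estimate averages the per-farm group differences.
    """
    diffs = []
    for _ in range(n_farms):
        farm_effect = rng.gauss(mean_effect, farm_effect_sd)
        baseline = rng.gauss(0.0, 1.0)  # farm-level baseline; cancels in the difference
        control = [rng.gauss(baseline, residual_sd) for _ in range(n_per_group)]
        treated = [rng.gauss(baseline + farm_effect, residual_sd) for _ in range(n_per_group)]
        diffs.append(fmean(treated) - fmean(control))
    return fmean(diffs)


def spread_of_replicates(n_farms, n_per_group, n_studies=2000, seed=7,
                         mean_effect=1.0, farm_effect_sd=0.8, residual_sd=1.0):
    """Mean and SD of effect estimates across many independent replications."""
    rng = random.Random(seed)
    estimates = [
        study_estimate(rng, n_farms, n_per_group, mean_effect, farm_effect_sd, residual_sd)
        for _ in range(n_studies)
    ]
    return fmean(estimates), stdev(estimates)
```

Comparing `spread_of_replicates(1, 40)` with `spread_of_replicates(8, 5)` shows that both designs are unbiased on average, but under these assumptions the single-farm estimates scatter far more between replications (analytically, SD ≈ √(0.64 + 0.05) ≈ 0.83 versus √(0.64/8 + 0.05) ≈ 0.36). The single-farm study would also report a nominal standard error of only √(2/40) ≈ 0.22, so its results look precise yet often fail to replicate on other farms.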

Conclusions
Due to sampling variance and the risk of false-positive and false-negative decisions, replications of studies are an essential part of scientific progress. We outline here that replicability has a special standing in research on farm animal welfare because of the practical legislative consequences of its scientific claims. From a utilitarian perspective, the use of additional animals in replication studies in farm animal welfare science can be further justified, because it means exposing relatively few animals to conditions to which millions are exposed in an industrial setting, all of whom might profit from the research conducted. Finally, we highlight that studies in farm animal welfare science can easily follow a conceptual framework that improves external validity and thus increases replicability. In light of the importance of basing welfare decisions on secure findings and the fact that a large number of animals can profit from these findings, animal welfare scientists should strive to replicate their findings or, at least, implement the heterogenization of study populations to increase the replicability of the investigated effects.