Introduction
According to a wide variety of moral traditions, it is plausible that we should avoid causing gratuitous suffering to sentient beings, whether intentionally or out of recklessness (Birch 2024). It matters, therefore, that we determine which beings are sentient (i.e. have positively and negatively valenced phenomenally conscious states, such as pleasure or pain; Fischer 2021). When deciding how to act, we should give some thought to possible impacts on the welfare of these beings.
How can we determine which beings are sentient? Self-report is the most straightforward evidence of conscious, subjective states like pain; however, this marker is unavailable for the vast majority of non-human animals, generating significant uncertainty about their sentience. Yet, despite the uncertainty generated by a lack of self-report, decisions must be made about whether, when, and how to rear, use, and manage these animals. We thus need a consistent, empirically supportable way to determine when the evidence for sentience is strong enough (even if not dispositive, i.e. decisively positive or negative) that it would be reckless to disregard the possibility that some of our actions cause gratuitous suffering in these animals.
To this end, many have proposed objective, proxy-based approaches that often rely on an argument-by-analogy with the human case (Varner 2012), assessing the presence/absence of a variety of neurobiological and behavioural proxies for subjective experience to build the case for or against the plausibility of sentience in a particular group of animals (the ‘marker methodology’; Allen & Trestman 2020). These frameworks can be applied consistently across taxa and are based on objective and observable phenomena accessible to empirical discovery. While the evidence accumulated using these frameworks does not eliminate uncertainty about sentience, it can change our estimation of the plausibility of sentience in any particular animal group (though see Andrews 2024).
While many such frameworks have been used to assess the plausibility of non-human animal sentience (e.g. Smith & Boyd 1991; Sneddon et al. 2014), they have rarely coupled the evidence accumulated with direct precautionary recommendations. Recently, however, a framework has emerged (Birch et al. 2021; Crump et al. 2022; Gibbons et al. 2022a) for assessing when the evidence for sentience is strong enough to warrant precautionary measures. According to this framework, first proposed by Birch et al. (2021), precautionary measures are warranted for the members of a group of animals when we have high or very high confidence that they satisfy at least five of eight criteria (discussed below), analogues of which were originally proposed to determine when precautionary measures are warranted for groups of vertebrates (Bateson 1991; Smith & Boyd 1991; Sneddon et al. 2014). In particular, satisfying at least five of eight criteria is supposed to support the conclusion that “these animals should be regarded as sentient (or capable of pain) in the context of animal welfare legislation” (Gibbons et al. 2022a; p. 203). These recommendations, while typically reliant upon evidence collected on a few model species, are often applied in practice at the taxonomic level of ‘Class’ (e.g. Mammalia, Cephalopoda) for reasons of historical precedent, consistency, and legislative simplicity (Birch et al. 2021).
This precautionary framework thus couples evidence-based standards with recommendations for regulative consideration. A framework that couples objective proxies with concrete recommendations for precautionary action may be especially important when considering animals for which there is: (1) significant phylogenetic distance from humans; (2) no precedent for consideration; (3) strong empathetic bias (or even disgust); (4) active scientific debate regarding the question of sentience; and/or (5) societal or economic reasons not to acknowledge sentience. One such case would be the insects (for some of the recent scientific debate, see: Elwood 2011, 2016; Sneddon et al. 2014; Adamo 2016, 2019; Barron & Klein 2016; Klein & Barron 2016; Tiffin 2016; Burrell 2017; Fischer & Larson 2019; Gibbons & Sarlak 2020; Overgaard 2021; Gibbons et al. 2022a,b; Key et al. 2021; Barrett & Fischer 2024). Using the Birch et al. (2021) framework, two orders of adult insects (Blattodea [cockroaches and termites], and Diptera [flies and mosquitoes]) were found to meet six-of-eight criteria to a high or very high level of confidence, representing strong evidence for sentience (Gibbons et al. 2022a).
Following Gibbons et al. (2022a), let us call the framework ‘the Birch et al. framework’ (given its origins in Birch et al. 2021). Further, let us refer to the specific threshold in the Birch et al. framework as ‘the five-of-eight threshold.’ We have two aims in this discussion paper. First, there has not been much defence of the five-of-eight threshold. So, we present a new argument for it that is based on a historical case study. Second, because precautionary measures have costs (financial, temporal, opportunity, etc.), as has been discussed in the insect case (Adamo 2019; Freelance 2019), it is important to reduce false positives inasmuch as is compatible with the precautionary aims of the framework. So, we suggest some ways that the Birch et al. framework could be refined to mitigate the risk of such errors. However, as we will argue, while such refinements may change which precautionary measures are warranted, they are unlikely to affect whether some measures are warranted.
In the following section, we introduce the Birch et al. framework in more detail. In ‘The evidence for sentience in birds circa 1969’, we explore a past decision to implement precautionary measures despite uncertainty about animal sentience. In ‘Refining the framework’, we then discuss some refinements to the Birch et al. framework. The final section concludes with future research directions.
The Birch et al. framework
We begin by introducing the Birch et al. framework — which, again, is designed to help us assess when the evidence for sentience in some taxon is strong enough (even if far from dispositive) to warrant precautionary measures. At present, the Birch et al. framework is the only serious attempt to couple evidence with any kind of precautionary recommendations.
How does it work? In brief, the Birch et al. framework includes four neurobiological and four behavioural criteria that pertain to the probability of non-human animal sentience. The neurobiological traits are thought to be indicators of having the required ‘hardware’ for producing phenomenally conscious valenced experiences; the behavioural traits are thought to be indicators that the taxon faced the kinds of selective pressures that may explain why sentience evolved. These eight criteria include:
1. Nociceptors: The animal possesses receptors located in neurons that respond specifically to noxious stimuli.
2. Integrative brain regions: The animal possesses brain regions capable of integrating information from various sensory sources.
3. Integrated nociception: Neural pathways within the animal link nociceptors to integrative brain regions.
4. Analgesia: Behavioural responses to noxious stimuli are modulated by chemical compounds affecting the nervous system in either or both of the following ways:
   a. Endogenous: The animal has an internal neurotransmitter system that modulates their response to noxious stimuli, aligning with the experience of pain or distress.
   b. Exogenous: Substances such as local anaesthetics, analgesics (e.g. opioids), anxiolytics (e.g. benzodiazepines), or antidepressants modify the animal’s response to noxious stimuli, suggesting alleviation of the experience of pain or distress.
5. Motivational trade-offs: The animal engages in dynamic decision-making, weighing the adverse impacts of noxious or threatening stimuli against the value of potential rewards. This process reflects flexibility in centralised, integrative information processing involving an evaluative common currency.
6. Flexible self-protection: The animal exhibits flexible self-protective behaviours, including wound tending, guarding, grooming, and rubbing, generally directed toward the site of an injury. These actions indicate a representation of the bodily location exposed to noxious stimuli.
7. Associative learning: The animal demonstrates associative learning by forming connections between noxious stimuli and neutral cues. They acquire new ways to avoid such stimuli through reinforcement, extending beyond habituation or sensitisation. Some forms, like instrumental conditioning, rapid reversal learning, or trace conditioning (all representing unlimited associative learning), are tentatively linked to sentience in humans.
8. Analgesia preference: The animal expresses a preference for analgesics or anaesthetics when injured, demonstrated through:
   a. Self-administration: The animal learns to self-administer putative analgesics or anaesthetics when injured.
   b. Conditioned place preference: The animal favours a specific location when injured, where analgesics or anaesthetics can be accessed.
   c. Prioritisation: When injured, the animal prioritises obtaining these compounds over other needs.
The Birch et al. framework builds on a historical precedent: namely, Smith and Boyd’s (1991) vertebrate-focused framework. However, the Birch et al. framework updates its predecessor so that it can be used for all animals, not just vertebrates. It does this by focusing on functional attributes rather than specific structures. For instance, the Birch et al. framework allows for any chemical compounds that affect the nervous system to modulate responses to noxious stimuli (criterion four), whereas Smith and Boyd (1991) specified that opioids must be present to play that role. While opioids may perform this function in some invertebrates too (Brown 2022), there is no obvious reason to penalise other invertebrates that accomplish the same function via other chemical structures.
This decision to update the framework to focus on functional significance instead of specific structures reflects a similar change in the scientific debate on sentience broadly. For instance, it was once more common to hypothesise that the neocortex is necessary for sentience, implying that mammals are the only animals that are plausibly sentient (the ‘no cortex, no cry’ hypothesis; Dinets 2016). However, brain structures have been found in other vertebrates that arguably perform the functions of the neocortex (e.g. the pallium in birds; Pessoa et al. 2019), causing that hypothesis to fall out of favour (Butler & Cotterill 2006). Hence, the Birch et al. framework includes integrative brain regions (criterion two) rather than any specific structure.
In both this example and the preceding one, the relevant traits (modulating responses to noxious stimuli and being able to integrate information from various sensory sources) are recognised as probably ‘multiply realisable’, in the sense that similar functional traits can be realised through different mechanisms (Michel 2019). This is true of many functional traits: for instance, eyes may have evolved up to forty times in animal evolutionary history and, despite their many structural differences, generate the same broad functional ability: namely, some form of sight (Schwab 2018).
These updates are valuable because they allow us to compare the strength of the evidence for any two groups of animals, irrespective of structural differences or phylogenetic distance. This allows us to make taxonomically consistent determinations about when moral caution is warranted based on the strength of the same pieces of evidence in each group.
While those responsible for the Birch et al. framework have quite nuanced views about the relative evidential value of each criterion (see the introduction to Gibbons et al. 2022a), the original version was meant to be a simple instrument for making policy recommendations. So, the Birch et al. framework takes a checklist approach, treating all the criteria equally: the greater the number of satisfied criteria, the more likely it is that the organism being evaluated is sentient (a point to which we return below) and the greater the case for precautionary measures. In particular, the framework includes the five-of-eight threshold: having high or very high confidence in any five of the eight criteria counts as “strong evidence of sentience” for the purpose of assessing whether precautionary measures are warranted, where “strong evidence of sentience” is supposed to support the conclusion that “these animals should be regarded as sentient (or capable of pain) in the context of animal welfare legislation” (Birch et al. 2021).
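The checklist logic just described can be made concrete with a short sketch. This is only an illustration, not the authors' own implementation: the five-level confidence scale, the identifier names, and the example ratings for an imaginary taxon are all assumptions introduced here. Only the eight criteria, the "high or very high" cut-off, and the five-of-eight threshold come from the text.

```python
from enum import IntEnum


class Confidence(IntEnum):
    """Ordered confidence ratings (the exact scale is an assumption)."""
    VERY_LOW = 0
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    VERY_HIGH = 4


# The eight criteria, in the order listed in the text.
CRITERIA = (
    "nociceptors",
    "integrative_brain_regions",
    "integrated_nociception",
    "analgesia",
    "motivational_tradeoffs",
    "flexible_self_protection",
    "associative_learning",
    "analgesia_preference",
)

THRESHOLD = 5  # the five-of-eight threshold


def satisfied_criteria(ratings):
    """Return the criteria rated at high or very high confidence."""
    missing = set(CRITERIA) - set(ratings)
    if missing:
        raise ValueError(f"missing ratings for: {sorted(missing)}")
    return [c for c in CRITERIA if ratings[c] >= Confidence.HIGH]


def meets_threshold(ratings, threshold=THRESHOLD):
    """All criteria count equally: only the number satisfied matters."""
    return len(satisfied_criteria(ratings)) >= threshold


# Hypothetical ratings for an imaginary taxon, for illustration only.
example_ratings = {
    "nociceptors": Confidence.VERY_HIGH,
    "integrative_brain_regions": Confidence.HIGH,
    "integrated_nociception": Confidence.HIGH,
    "analgesia": Confidence.HIGH,
    "motivational_tradeoffs": Confidence.MEDIUM,
    "flexible_self_protection": Confidence.LOW,
    "associative_learning": Confidence.VERY_HIGH,
    "analgesia_preference": Confidence.VERY_LOW,
}

print(len(satisfied_criteria(example_ratings)))  # 5
print(meets_threshold(example_ratings))          # True
```

Note that the sketch deliberately ignores which criteria are satisfied; that egalitarian feature is exactly what the weighting amendment discussed later would change.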
We should stress that it is easy to misinterpret the phrase “strong evidence for sentience.” Crucially, “strong evidence for sentience” does not mean something like “eliminates all other possible explanations of the data.” No proponent of the Birch et al. framework has claimed that satisfying five of the eight criteria guarantees sentience, or even that it raises the probability of sentience over some specific value (e.g. 0.5). In part, this is because of the so-called “SPUD challenge” (Dung 2022; Mason & Lavery 2022), which considers the properties and abilities of spines detached from brains (S), plants and protozoa (P), unconscious or non-conscious humans (U), and decerebrate mammals and birds (D). Most agree that it is safe to say that these entities are not conscious; so, if they have several of the eight criteria, then we know that it is possible to have some of these neurological features and behavioural capacities without being conscious.
And, indeed, many of the criteria are satisfied by some ‘SPUD’ entity: unconscious humans have nociceptors, integrative brain regions, integrated nociception, and respond to analgesia; decerebrate rats display flexible self-protection; and decerebrate cats display some basic forms of associative learning (Mason & Lavery 2022). But the ‘SPUD’ test is designed to help evaluate dispositive evidence for sentience and thus runs the risk of false negatives while evidence is being collected. Again, then, “strong evidence for sentience” in the Birch et al. framework is meant to be understood relative to the aim of assessing whether any precautionary measures are warranted, and not in a dispositive sense. The framework intentionally employs a standard of evidence that allows for the possibility of false positives, as we thereby reduce the risk of unintentionally causing gratuitous suffering through false negatives.
However, the risk of false positives may cause some to be concerned that the five-of-eight threshold is too low to justify the level of precaution that the Birch et al. framework proposes: namely, inclusion in animal welfare legislation. In response to this concern, we make a historical argument: we had even less evidence for sentience in other animals when we included them in welfare legislation; so, we should judge that this higher standard is adequate for at least certain insects to receive some kind of protection.
The evidence for sentience in birds circa 1969
History suggests that the five-of-eight standard is sufficiently demanding to justify welfare protections in some countries. To see why, consider the case of protections for avian species in the United States. In 1970, the US extended the Animal Welfare Act (AWA) from just a handful of species to all warm-blooded animals, including birds (though, for political reasons, excluding birds used in research and agriculture settings). This established a review process for animal research, inspections of rearing facilities by veterinarians, and some minimal federal reporting requirements (e.g. the number of animals used, efforts to replace them, and the categories of distress they might experience).
At that time, however, evidence supporting avian sentience was scant, at least relative to the Birch et al. framework. On the neurobiological front, for instance, researchers had identified integrative brain regions and some connecting pathways between them. While there was evidence of responsiveness to analgesics (Blough 1957; Schneider 1961; Phillips 1964), evidence of nociceptors remained uncertain even into the 1970s (i.e. mixed results: Reille 1968; Kreithen & Keeton 1974) and early physiological recordings of ‘putative’ nociceptors (i.e. ones that appeared to be responsively similar to those in mammals) were not made until after the act’s passage (Dorward 1970).
On the behavioural front, there was some evidence of associative learning involving negative reinforcement and avoidance conditioning in 1970 (Ferster 1960; Cumming & Berryman 1961; Ratner 1961; Rachlin & Hineline 1967; Macphail 1968; Smith & Keller 1970). Researchers had also observed fear-like responses (Phillips 1964). However, many standard framework criteria (such as flexible self-protective behaviours, motivational trade-offs, and the valuation of anaesthetics and analgesics in response to injury) had not been tested. Evidence of more complex forms of associative learning, such as trace conditioning, only emerged after deliberations about the AWA amendment were complete (Jenkins 1970). Importantly, this evidence was collected from a narrow range of bird species (mostly pigeons). In this case, accumulating the available evidence as of 1969 across all bird species at the level of the class (Aves), and not even the lower taxonomic level of the order, would have resulted in adult birds meeting three of eight criteria to a high/very high degree of confidence (with less confidence in the evidence for another two criteria).
Still, protections were extended to birds despite the modest state of evidence at that time. And, strikingly, the evidence for some adult insects being sentient is much better now than it was for adult birds in 1970. If evidence were accumulated at the level of the class Insecta (to best match our example in the Aves), insects would fulfil seven of eight criteria to a high/very high degree of confidence. Even at the level of the order, adult insects of the Blattodea and Diptera fulfil six of eight criteria under the Birch et al. framework with a high or very high degree of confidence (Gibbons et al. 2022a), including the fulfilment of all four neurobiological criteria and associative learning in both groups, as well as evidence for motivational trade-offs (Diptera) and flexible self-protective behaviours (Blattodea). And, unlike the evidence for Aves coming from just a handful of studies, over two hundred and fifty papers were found on the evidence for these criteria in the six orders of insects reviewed in Gibbons et al. (2022a).
If we commend past decisions to enact precautionary measures based on the evidence available at that time, then we can consider the quality of that evidence to calibrate our judgments about what counts as adequate reason for precautionary measures. Since the commendable decision to extend protections to birds was based on very little evidence, it is hard to object to a higher standard for precautionary measures now — i.e. the five-of-eight threshold.
Someone might object that empirical research is less relevant in the case of birds than insects. This person might argue: “Because avian sentience is much more likely antecedently, we are commending that past decision just based on their prior probability of avian sentience, not because we judge that the scant published evidence available was sufficient to warrant precautionary measures.”
Fifty years ago, though, it was much more reasonable to think that birds are mere automata than it is now, weakening the point about past decision-makers’ prior probability for avian sentience (Rollin 1989). Behaviourism was still highly influential in psychology. Religious views that reject common descent were more prevalent. It was much more widely held that traits not possessed by birds, such as the neocortex and language, were required for consciousness. Some of the key works arguing for sentience in birds, like Butler et al. (2005) and Cabanac et al. (2009), were decades away from being published. Moreover, assumptions about the lack of intelligence of small bird brains were (and still are) common enough to result in a colloquial insult (‘birdbrain’).
Again, then, if past individuals knew very little about birds’ capacity for sentience and we judge, nevertheless, that they were right to think that precautionary measures were warranted, then it is reasonable to take precautionary measures for other animals now based on substantially better evidence than was then available for birds. That is, if we think precautionary measures are warranted on a weaker basis, then they are warranted on a stronger one. It is plausible, therefore, that the five-of-eight threshold is a reasonable basis for precautionary measures.
Refining the framework
While we have defended the five-of-eight threshold, we readily acknowledge that precautionary measures have costs: it can be expensive (in many senses of ‘expensive’) to implement and enforce them. When those costs are worth paying, it is because there is no feasible way to prevent an excessive rate of false negatives (which is the precautionary objective) without some corresponding rate of false positives. (Here, the terms ‘feasible’ and ‘excessive’ are sensitive to the moral stakes — i.e. the probability of causing gratuitous suffering, the severity of that suffering were it to occur, and the burdens of risk-mitigating courses of action). However, if we can prevent an excessive rate of false negatives with fewer false positives, then we should, as otherwise the costs of precautionary measures would not be justified. So, we have reason to try to refine the Birch et al. framework so that it still prevents an excessive rate of false negatives but without as many false positives.
The ideal refinements will be those that reduce the rate of false positives without any impact on the rate of false negatives. While there may not be any realistic examples of such refinements, it remains that some are likely to have disproportionate effects on one rate or another. It is easy, for instance, to think of an example that would have a disproportionate impact in the wrong direction — i.e. mostly increasing false negatives for some smaller reduction in false positives. Consider adding a negative condition to the Birch et al. framework. At present, the framework is based entirely on positive markers of sentience — markers that raise the probability that an animal is sentient. In principle, the framework could also include negative markers — ones that, if not met, lower the probability that an animal is sentient. Negative conditions will rule some animals out and not rule any animals in; all else being equal, then, we should expect them to increase false negatives more than they decrease false positives, at least on average.
By contrast, consider distinguishing between the absence of evidence and evidence of absence. The Birch et al. framework treats these cases equally when tallying up the number of criteria that are fulfilled by a taxon: 0 points are added for the absence of evidence and 0 points are subtracted even when there is evidence that a trait is absent (though, of course, this evidence would be persuasive that a ‘0’ score was appropriate for that criterion). However, if evidence of the absence of one of these traits in a taxon is evidence that those animals are not sentient, then, on average, it would be more likely to reduce false positives than increase false negatives. To help prevent false positives, then, evidence that a trait is absent should be considered when justifying the strength of the case for precaution. So, a refined version of the Birch et al. framework probably should not include a negative condition and should find some way to distinguish between the absence of evidence and evidence of absence. (Not incidentally, this addresses Andrews’ [2024] concern that the marker method can only serve as a positive test).
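One way to picture this refinement is to record three evidential states per criterion instead of two, so that untested criteria and criteria with negative findings are no longer collapsed into the same ‘0’. The sketch below is a minimal illustration under that assumption: the state names, the `summarise` helper, and the example statuses are hypothetical, and the text leaves open how negative evidence should ultimately bear on the decision, so the sketch only reports it separately.

```python
from collections import Counter
from enum import Enum


class Evidence(Enum):
    """Three evidential states for a single criterion (names are assumptions)."""
    SUPPORTED = "supported"
    NOT_TESTED = "absence of evidence"
    TESTED_ABSENT = "evidence of absence"


def summarise(statuses):
    """Count criteria in each state, keeping the two kinds of 'zero' apart."""
    counts = Counter(statuses.values())
    return {state: counts.get(state, 0) for state in Evidence}


# Hypothetical statuses for an imaginary taxon, for illustration only.
example_statuses = {
    "criterion_1": Evidence.SUPPORTED,
    "criterion_2": Evidence.SUPPORTED,
    "criterion_3": Evidence.SUPPORTED,
    "criterion_4": Evidence.SUPPORTED,
    "criterion_5": Evidence.SUPPORTED,
    "criterion_6": Evidence.NOT_TESTED,
    "criterion_7": Evidence.NOT_TESTED,
    "criterion_8": Evidence.TESTED_ABSENT,
}

summary = summarise(example_statuses)
print(summary[Evidence.SUPPORTED])      # 5
print(summary[Evidence.NOT_TESTED])     # 2
print(summary[Evidence.TESTED_ABSENT])  # 1
```

Under the current tally, the last three criteria would be indistinguishable; here a reader can see that only one of them reflects an actual negative finding.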
We now briefly consider two other possible amendments to the Birch et al. framework.
One possible amendment involves weighting the criteria and the quality of the evidence, as opposed to the current egalitarian model. Some criteria may have more evidential value than others; there may also be dependencies between the criteria that create the risk of double-counting if those relationships are not considered (Irvine 2022). Likewise, it is critical to consider how much we can trust published negative evidence, some of which may be driven by failures to develop ecologically relevant study designs.
There are simple strategies for addressing these issues, such as assigning different scores to the individual criteria and then assigning actual scores instead of confidence ratings based on the quality of the evidence for their being satisfied. For instance, perhaps analgesia preference matters more than associative learning, such that the former should be scored out of two points whereas the latter should be scored out of one. Then, weak evidence might provide 25% of the points for a given criterion while robust evidence would provide 100%. While the exact scores and percentages may be somewhat arbitrary, this is not a problem: it is the general relationships between the scores and percentages that matter. These numbers permit more transparent disagreements about the relative importance of criteria and the relative quality of different pieces of evidence. They can also allow people to perform sensitivity tests, which can clarify whether disagreements are action-relevant (i.e. whether a given disagreement matters for whether precautionary measures are warranted). And they can distinguish the value of simple evidence for a criterion (e.g. basic associative learning, criterion 7) from more complex evidence (e.g. reversal learning and trace conditioning), while still allowing both kinds of evidence to provide some support for sentience in the framework. While there are versions of this amendment that might increase the false negative rate objectionably, not all versions would have this limitation. For instance, if a particular criterion seems to provide especially weak evidence, then giving it relatively less weight may well avoid more false positives than false negatives.
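The weighting idea and the sensitivity test can be illustrated with a small sketch. The two-point and one-point maxima and the 25% and 100% fractions are taken from the example in the text; everything else (the function names, the "none" quality level, the alternative equal weighting, and the point cut-offs) is a hypothetical choice made only for illustration.

```python
# Maximum points per criterion, following the example in the text.
MAX_POINTS = {"analgesia_preference": 2.0, "associative_learning": 1.0}

# An alternative, equal weighting used for the sensitivity test (assumption).
EQUAL_POINTS = {"analgesia_preference": 1.0, "associative_learning": 1.0}

# Fraction of a criterion's points earned by evidence of a given quality.
# "weak" and "robust" come from the text; "none" is an added assumption.
QUALITY_FRACTION = {"none": 0.0, "weak": 0.25, "robust": 1.0}


def weighted_score(evidence, max_points):
    """Sum, over criteria, of maximum points times the quality fraction."""
    return sum(
        max_points[criterion] * QUALITY_FRACTION[quality]
        for criterion, quality in evidence.items()
    )


def decision_is_stable(evidence, weightings, cutoff):
    """Sensitivity test: do all weightings agree on whether the cutoff is met?"""
    decisions = {weighted_score(evidence, w) >= cutoff for w in weightings}
    return len(decisions) == 1


# Hypothetical evidence for an imaginary taxon, for illustration only.
evidence = {"analgesia_preference": "weak", "associative_learning": "robust"}

print(weighted_score(evidence, MAX_POINTS))    # 1.5
print(weighted_score(evidence, EQUAL_POINTS))  # 1.25
print(decision_is_stable(evidence, [MAX_POINTS, EQUAL_POINTS], cutoff=1.0))  # True
print(decision_is_stable(evidence, [MAX_POINTS, EQUAL_POINTS], cutoff=1.4))  # False
```

With the lower hypothetical cut-off, the disagreement about weights is not action-relevant, since both weightings give the same verdict. With the higher one it is, which is the kind of result a sensitivity test is meant to surface.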
A second possible amendment is more complex: namely, the Birch et al. framework could adopt a more sophisticated evolutionary approach to trait analysis. Right now, the approach has been to accumulate all evidence for a criterion within an order; then, there is an order-level evaluation of the confidence appropriate for that order fulfilling that criterion, which then applies to all species in that order. However, these order-level evaluations are often based on evidence from just a single species, or a handful of well-studied species, that may not be good representatives of the majority of species within the order (e.g. eusocial, reasonably large bees and wasps represented the majority of data in the Hymenoptera, but the majority of species in that order are solitary, miniaturised parasitic wasps). Thus, order-level evaluations may represent an especially high tolerance for false positives for some criteria.
On the other hand, these order-level evaluations may unduly penalise orders that are poorly studied, resulting in low ratings due to an absence of evidence. This represents a high tolerance for false negatives due to a lack of order-level evidence, particularly when there is no a priori reason to suspect most sentience-relevant traits would be gained or lost at the arbitrary level of the taxonomic order. A more evolutionarily informed approach would allow some evidence for a trait found in other orders to be used as evidence of the likelihood of that trait in species from the unstudied order.
A more phylogenetically informed approach would involve considering what we know about the expected origin(s) of a trait, the phylogenetic distribution of the trait, the complexity of the trait, the correlation between the trait and other traits, the (ir)reversibility of the trait, the divergence time of the orders/species in question for making the inference, and more. From these data, we might then be able to infer the level of taxonomic precision we actually need to have some level of confidence in any particular taxon meeting any particular criterion.
This approach may help to reduce both false negatives and false positives; let us consider two highly simplified examples. First, consider the lack of order-level data on nociceptors in Mecoptera and Siphonaptera, the scorpionflies and the fleas. These orders are, together, the closest relatives to Diptera (last common ancestor, ~260 mya; Misof et al. 2014) and would currently get a ‘very low confidence’ rating for criterion one. However, it is well-known that the genetic architecture underlying nociception is ancient and highly conserved (Peng et al. 2015), with homologous ion channels in humans and fruit flies. Further, no studied species of insect, to date, completely lacks the genes for nociceptive ion channels. While there is significant variation in which nociceptive ion channels may be present, their copy numbers (Goldberg et al. 2024), and their precise function (Wang et al. 2009), the presence/absence of nociceptive ion channels is itself not a highly variable trait. It might thus be most parsimonious to infer that Mecoptera and Siphonaptera likely have the genes for at least some nociceptive ion channels, despite missing order-level data; this would represent some evidence towards their fulfilment of criterion one.
Conversely, some traits may be gained or lost at a sub-order level, resulting in false positives when order-level analyses are used. Here, we might consider the distribution of eusociality as an analogy, despite its plausible irrelevance to sentience. Within just the subfamily of Halictinae bees, there are three known independent origins of eusociality, and as many as twelve loss or ‘reversal’ events of species that returned to a solitary life (Danforth Reference Danforth2002). Thus, we may not want to infer that all Hymenoptera, or even all Halictinae, are eusocial based on the observance of eusociality in a few species, as it would result in many false positives. We would instead infer that some sub-order level of evidence gathering would be needed for the trait of eusociality to be inferred in any particular species.
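To make the contrast between these two examples concrete, the following is a minimal sketch of a maximum-parsimony (Fitch-style) inference for a binary trait, in which an unsampled taxon is treated as compatible with either state. It is an illustration only, not the method of any study cited here: the tree topology, the tip names (A, B, C, D, X), and the trait codings are all hypothetical, and a real analysis would typically use dated phylogenies and likelihood-based or Bayesian models rather than this simplified procedure.

```python
from typing import Dict, Set, Tuple, Union

# A toy rooted binary tree: tips are strings, internal nodes are 2-tuples.
Tree = Union[str, Tuple["Tree", "Tree"]]


def infer_unsampled(tree: Tree, observed: Dict[str, int],
                    states=(0, 1)) -> Dict[str, Set[int]]:
    """Return, for each tip missing from `observed`, the trait states it
    can take under a simplified Fitch parsimony reconstruction."""

    def annotate(node):
        # Bottom-up pass: candidate states are the intersection of the
        # children's candidates if non-empty, otherwise their union.
        if isinstance(node, str):
            # Unsampled tips are compatible with every state.
            cand = {observed[node]} if node in observed else set(states)
            return cand, node, ()
        kids = tuple(annotate(child) for child in node)
        left, right = kids[0][0], kids[1][0]
        return (left & right) or (left | right), None, kids

    result: Dict[str, Set[int]] = {}

    def assign(annotated, parent_state):
        # Top-down pass: keep the parent's state when it is a candidate;
        # otherwise explore every candidate state of this node.
        cand, name, kids = annotated
        choices = {parent_state} if parent_state in cand else cand
        for state in choices:
            if name is not None:
                result.setdefault(name, set()).add(state)
            for kid in kids:
                assign(kid, state)

    assign(annotate(tree), None)  # the root has no parent state
    return {tip: s for tip, s in result.items() if tip not in observed}


# Hypothetical topology; X is the taxon with no data.
toy_tree = (("A", "B"), ("X", ("C", "D")))

# Conserved trait: present (1) in every sampled relative.
conserved = infer_unsampled(toy_tree, {"A": 1, "B": 1, "C": 1, "D": 1})

# Labile trait: repeatedly gained or lost among the sampled relatives.
labile = infer_unsampled(toy_tree, {"A": 1, "B": 0, "C": 1, "D": 0})

print(conserved)  # {'X': {1}}
print(labile)     # {'X': {0, 1}}
```

In this toy case, the conserved trait is inferred to be present in the unsampled taxon, whereas the labile trait cannot be resolved without data at a finer taxonomic level, mirroring the nociceptive ion channel and eusociality examples respectively.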
Methods for making accurate phylogenetic inferences are much more complex than the two quick examples we have just provided and are themselves subject to a host of philosophical complexities that are beyond the scope of this paper (Haber & Velasco Reference Haber, Velasco, Zalta and Nodelman2024). However, we hope that these simple examples have demonstrated how a more evolutionarily informed approach could avoid the Birch et al. framework’s current reliance on order-level analysis, with the potential to reduce both false positives and false negatives.
Suppose we refine the Birch et al. framework as suggested above: distinguishing between evidence of absence and absence of evidence, weighting the criteria and quality of the evidence, and adopting a more evolutionarily informed approach. How will that impact what it recommends about whether precautionary measures are warranted for some orders of insects?
While we cannot explore this issue in detail here, we can make some general remarks. Our first refinement would result in few changes to the conclusions in Gibbons et al. (Reference Gibbons, Crump, Barrett, Sarlak, Birch and Chittka2022a). There is only one study so far, on Hymenoptera for criterion 8 (analgesia preference), for which distinguishing between the absence of evidence and negative evidence would make any difference; thus, this refinement is not likely to change outcomes at this time. As for refinements two and three, the specific impacts would depend strongly on their details. Yet, no matter how these refinements are made, it is plausible that while they might change which precautionary measures are warranted, they are unlikely to change that some precautionary measures are warranted. That is, while the refinements might change whether certain insects should be regarded as sentient in the context of animal welfare legislation, they are unlikely to change the conclusion that society has moral reasons to be cautious in its treatment of at least some insects, even if that caution is not enforced by law.
Put differently, given the case we have made for the five-of-eight threshold, the bar for moral precaution is probably quite low. So, if precautionary legal protection for birds was warranted despite the relatively modest evidence for their sentience in 1969, then precautionary moral consideration is warranted for many insects given better evidence for their sentience at the time of this writing — even if we revise the Birch et al. framework to reduce the risk of false positives. After all, legal protections are binding on everyone in the relevant jurisdiction and enforceable through legal sanctions, regardless of what people think about any animal’s sentience or the moral importance of protecting animal welfare. By contrast, most moral norms are not enforceable through legal sanctions (at least, not just because they are moral norms), and thus require less evidence to motivate them. So, if more demanding norms (legal protection) were warranted based on weaker evidence, then, minimally, less demanding norms (moral consideration) are warranted based on better evidence, purely on grounds of moral consistency.
The Birch et al. framework, or something much like it, will continue to be needed until the evidence for sentience in the relevant groups of animals is conclusively positive or negative. And, given the many uncertainties surrounding sentience, and the practical challenges of sentience research, this may mean that such frameworks will always be needed. Any precautionary framework will be designed to tolerate false positives to mitigate the risk of unintentionally causing gratuitous animal suffering. That is a feature, not a bug, and should not be eliminated. Still, it is compatible with the aim of precaution to take reasonable steps to reduce the risk of false positives. It is beyond the scope of this paper to provide a refined version of the Birch et al. framework, but our suggested refinements open the door for future work to improve the framework’s utility.
Conclusion
The Birch et al. framework is a tool for assessing when the evidence for sentience is strong enough to warrant precautionary measures. It is motivated by the widely held moral principle that we ought to err on the side of caution, recognising that it is often right to take precautionary measures instead of running the risk of causing gratuitous suffering (O’Riordan & Cameron Reference O’Riordan and Cameron1994; Bradshaw Reference Bradshaw1998). This fits with the substantial surge in social and political interest in animal welfare, reflecting a growing societal commitment to improving our treatment of non-human beings (Bayvel & Cross Reference Bayvel and Cross2010; Ohl & van der Staay Reference Ohl and van der Staay2012), even when evidence for sentience is limited.
In general, the threshold for precautionary measures is sensitive to the costs of those measures. When the costs of taking precautionary measures are trivial, the trigger for caution can be minimal too. In cases where the costs of precaution are high, a higher evidential standard becomes necessary to strike a balance between ethical responsibility and practical feasibility.
Regarding insects, however, it is important to recognise that ostensibly burdensome levels of precaution — e.g. those associated with moral consideration for insects — may not always be burdensome on reflection. For instance, while there are basic legal protections for chickens, they are still farmed in extraordinary numbers, stocked at very high densities, and processed in ways that are widely seen as involving significant welfare compromises. In this light, it becomes increasingly challenging to argue that moral consideration for insects would be unduly burdensome and necessitate exceptionally high evidential standards to activate it. The sheer scale, and importance to human welfare, of the industries that use or affect insects underscores the ethical imperative of ensuring that the evidential threshold for moral consideration remains attainable and practical. Our precautionary standard should reflect the potential gravity of the situation while accounting for the limitations of empirical evidence (Birch Reference Birch2024).
When we appreciate this point, it becomes easier to see that scepticism about insect sentience — represented by Adamo (Reference Adamo2019) or Key et al. (Reference Key, Zalucki and Brown2021) — is perfectly compatible with taking precautionary measures (as Adamo herself reports doing in her lab; Love Reference Love2025). We can think that the evidence for insect sentience is quite weak while still being concerned about unintentionally causing gratuitous animal suffering. Indeed, anything else would involve imposing a double standard on insects, at least insofar as we are prepared to commend historical cases that took very modest evidence as warranting precautionary measures for vertebrates. If we judge that modest evidence was adequate in the past to warrant strong (legal) precautionary measures, then we should judge that better evidence is adequate now to warrant weaker (moral) precautionary measures — especially given that insects are far more numerous.
Competing interest
MB and BF report a relationship with the Insect Welfare Research Society that includes: Board of Directors. BF reports a relationship with Rethink Priorities that includes: employment. MB, JG and AS report a relationship with Rethink Priorities that includes: consulting/advisory.