In medicine, anecdotes or hunches aren’t considered “real” evidence

Introduction

I have been thinking about data, truth and ‘evidence’, and what constitutes any of them. It’s actually really hard. Maybe this all happened because you didn’t forward that email to 10 people? We know what happened with the hydroxychloroquine debacle early in the pandemic: we wanted to believe it helped us, even though it could harm us. Confession: I even ordered some. Every single one of these statements that had “no evidence” is currently considered true, or at least pretty plausible:

In an extremely nit-picky sense, these headlines were accurate at the time. Commentators were simply describing the then-current state of knowledge. In medicine, anecdotes or hunches aren’t considered “real” evidence. So, if there hasn’t been a study showing something, then there’s “no evidence”. In early 2020, there hadn’t yet been a study proving that COVID could be airborne, so there was “no evidence” for it.

On the other hand, here is a recent headline: 

Here’s another: 

I don’t think the scientists and journalists involved in these stories meant to shrug and say that no study has ever been done, so we can’t be sure either way. I think they meant to express strong confidence that these things are false. You can see the problem. Science communicators are using the same term - “no evidence” - to mean two very different things: (1) this thing is super plausible, and honestly very likely true, but we haven’t checked yet, so we can’t be sure; and (2) we have hard-and-fast evidence that this is false, stop repeating this easily debunked lie.

This is utterly corrosive to anybody trusting science journalism. Imagine you are John Q. The Public. You read “no evidence of human-to-human transmission of coronavirus”, and then a month later it turns out such transmission is common. You read “no evidence linking COVID to indoor dining”, and a month later your governor has to shut down indoor dining because of all the COVID it causes. You read “no hard evidence new COVID strain is more transmissible”, and a month later everything is in panic mode because it was more transmissible after all. And then you read “no evidence that 45,000 people died of vaccine-related complications”. Doesn’t sound very reassuring, does it?

Unfortunately, I don’t think this is just a matter of scientists and journalists using the wrong words sometimes. I think they are fundamentally confused about this. In traditional science, you start with a “null hypothesis” along the lines of “this thing doesn’t happen and nothing about it is interesting”. Then you do your study, and if it gets surprising results, you might end up “rejecting the null hypothesis” and concluding that the interesting thing is true; otherwise, you have “no evidence” for anything except the null. This is a perfectly fine statistical hack, but it doesn’t work in real life. In real life, there is no such thing as a state of “no evidence” and it’s impossible to even give the phrase a consistent meaning.
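To make that statistical point concrete, here is a small simulation (a sketch with entirely hypothetical numbers, not taken from any real study): a modest but real effect, studied with a small sample, will usually fail to reject the null, so the honest headline would be “no evidence”, even though the effect exists.

```python
# A minimal sketch (hypothetical numbers): "no evidence" often just means
# "an underpowered study failed to reject the null", not "the effect is absent".
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

true_effect = 0.3      # a real, modest effect (in standard-deviation units)
n_per_arm = 20         # a small study
n_simulations = 2000

rejections = 0
for _ in range(n_simulations):
    control = rng.normal(0.0, 1.0, n_per_arm)
    treated = rng.normal(true_effect, 1.0, n_per_arm)
    _, p = stats.ttest_ind(treated, control)
    rejections += p < 0.05

print(f"Power at n={n_per_arm} per arm: {rejections / n_simulations:.0%}")
# Only roughly 15% of such studies reject the null, so most of the time the
# honest summary is "no evidence" even though the effect is really there.
```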

Causation

Causation is really important in medicine and science, in fact in anything, though it means different things to different people (lawyers versus doctors, for example). It’s so much more informative than mere correlation, which makes me think of stock-price performance following (or preceding) US (or UK) elections, or of whether the weather influences football scores, or indeed the other way round. Causality (also referred to as causation, or cause and effect) is the influence by which one event, process, state or object (a cause) contributes to the production of another event, process, state or object (an effect), where the cause is partly responsible for the effect, and the effect is partly dependent on the cause. In general, a process has many causes, which are also said to be causal factors for it, and all lie in its past. An effect can in turn be a cause of, or causal factor for, many other effects, which all lie in its future. Some writers have held that causality is metaphysically prior to notions of time and space.

Causality is an abstraction that indicates how the world progresses, so basic a concept that it is more apt as an explanation of other concepts of progression than as something to be explained by others more basic. The concept is like those of agency and efficacy. For this reason, a leap of intuition may be needed to grasp it. Accordingly, causality is implicit in the logic and structure of ordinary language.

In English studies of Aristotelian philosophy, the word "cause" is used as a specialized technical term, the translation of the term αἰτία, by which Aristotle meant "explanation" or "answer to a 'why' question". Aristotle categorized the four types of answers as material, formal, efficient, and final "causes". In this case, the "cause" is the explanans for the explanandum, and failure to recognize that different kinds of "cause" are being considered can lead to futile debate. Of Aristotle's four explanatory modes, the one nearest to the concerns of the present article is the "efficient" one.

David Hume, as part of his opposition to rationalism, argued that pure reason alone cannot prove the reality of efficient causality; instead, he appealed to custom and mental habit, observing that all human knowledge derives solely from experience. The topic of causality remains a staple in contemporary philosophy. At a more pragmatic level, the Bradford Hill criteria for causality in medicine (strength of association, consistency, specificity, temporality, biological gradient, plausibility, coherence, experiment and analogy) still hold true:

Clearly this can be taken further:

It’s still used every day:

What can studies tell us?

They can tell us some things but not everything. Is there "no evidence" that using a parachute helps prevent injuries when jumping out of planes? This was the conclusion of a sweet paper in the BMJ, which pointed out that as far as they could tell, nobody had ever done a study proving parachutes helped. Their point was that "evidence" isn't the same thing as "peer-reviewed journal articles". So maybe we should stop demanding journal articles, and accept informal evidence as valid?

https://www.bmj.com/content/327/7429/1459?

Is there "no evidence" for alien abductions? There are hundreds of people who say they've been abducted by aliens! By legal standards, hundreds of eyewitnesses is great evidence! If a hundred people say that Bob stabbed them, Bob is a serial stabber - or, even if you thought all hundred witnesses were lying, you certainly wouldn't say the prosecution had “no evidence”! When we say "no evidence" here, we mean "no really strong evidence from scientists, worthy of a peer-reviewed journal article". But this is the opposite problem as with the parachutes - here we should stop accepting informal evidence, and demand more scientific rigor.

Is there "no evidence" homeopathy works? No, here’s a peer-reviewed study showing that it does (am not focusing on the retraction note at the bottom):

https://pubmed.ncbi.nlm.nih.gov/30202036/

Don't like it? I have 89 more studies showing it that I can send you, if you can’t sleep. But a strong theoretical understanding of how water, chemicals, immunology, etc. operate suggests homeopathy can't possibly work, so I assume all those pro-homeopathy studies are methodologically flawed and useless, the same way somewhere between 16% and 89% of other medical studies are flawed and useless. Here we should reject journal articles because they disagree with informal evidence:

https://en.wikipedia.org/wiki/Replication_crisis#In_medicine

Is there "no evidence" that King Henry VIII had a spleen? Certainly, nobody has published a peer-reviewed article weighing in on the matter. And probably nobody ever dissected him, or gave him an abdominal exam, or collected any informal evidence. Empirically, this issue is just a complete blank, an empty void in our map of the world. Here we should ignore the absence of journal articles and the absence of informal evidence, and just assume it's true because obviously it’s true.

I challenge anyone to come up with a definition of "no evidence" that wouldn't be misleading in at least one of the above examples. If you can't do it, I think that's because the folk concept of "no evidence" doesn't match how real truth-seeking works. Real truth-seeking is Bayesian. You start with a prior for how unlikely something is. Then you update the prior as you gather evidence. If you gather a lot of strong evidence, maybe you update the prior to somewhere very far away from where you started, like that some really implausible thing is nevertheless true. Or that some dogma you held unquestioningly is in fact false. If you gather only a little evidence, you mostly stay where you started.
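Since the argument is essentially about updating, here is a toy sketch of Bayesian updating in odds form (the numbers are mine and purely illustrative): the same piece of informal evidence moves a plausible claim a long way and an implausible one almost nowhere, which is roughly how the parachute and alien-abduction examples above come apart.

```python
# A toy sketch of Bayesian updating in odds form (all numbers hypothetical,
# just to illustrate the mechanics described above).

def update(prior_prob: float, likelihood_ratio: float) -> float:
    """Return the posterior probability after one piece of evidence.

    likelihood_ratio = P(evidence | claim true) / P(evidence | claim false)
    """
    prior_odds = prior_prob / (1.0 - prior_prob)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)

# The same strength of informal evidence (say, testimony that is 10x more
# likely if the claim is true) moves a plausible claim to near-certainty...
print(update(prior_prob=0.5, likelihood_ratio=10))    # ~0.91: "parachutes help"
# ...but barely dents a claim with a very low prior.
print(update(prior_prob=1e-6, likelihood_ratio=10))   # ~1e-5: "alien abductions"
```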

I'm not saying this process is easy or even that I'm very good at it. I'm just saying that once you understand the process, it no longer makes sense to say "no evidence" as a synonym for “false”. Okay, but then what? “No Evidence That Snake Oil Works” is the bread and butter of science journalism. How do you express that concept without falling into the “no evidence” trap?

I think you have to go back to the basics of journalism: what story are you trying to cover? If the story is that nobody has ever investigated snake oil, and you have no strong opinion on it, and for some reason that’s newsworthy, use the words “either way”: “No Evidence Either Way About Whether Snake Oil Works”.

If the story is that all the world’s top doctors and scientists believe snake oil doesn’t work, then say so. “Scientists: Snake Oil Doesn’t Work”. This doesn’t have the same faux objectivity as “No Evidence Snake Oil Works”. It centers the belief in fallible scientists, as opposed to the much more convincing claim that there is literally not a single piece of evidence anywhere in the world that anyone could use in favor of snake oil. Maybe it would sound less authoritative. Breaking an addiction to false certainty is as hard as breaking any other addiction. But the first step is admitting you have a problem.

But I think the most virtuous way to write this is to actually investigate. If it’s worth writing a story about why there’s no evidence for something, probably it’s because some people believe there is evidence. What evidence do they believe in? Why is it wrong? How do you know?

Some people thought masks helped slow the spread of COVID. You can type out "no evidence" and hit "send tweet". But what if you try to engage the argument? Why do people believe masks could slow spread? Well, because it seems intuitively obvious that if something is spread by droplets shooting out of your mouth, preventing droplets from shooting out of your mouth would slow the spread. Does that seem like basically sound logic? If so, are you sure your job as a science communicator requires you to tell people not to believe that? How do you know they're not smarter than you are? There's no evidence that they aren't.

Reproducibility and science

Surely good science should be reproducible? Isn’t that one way we know it’s OK? One can, after all, always make data seem better than it is:

Last week we heard that an 8-year attempt to replicate influential preclinical cancer research papers has released its final, and disquieting, results. It was really remarkable what they attempted to do: they’d pick experiments in major journals and ask the authors how to do them, where they purchased reagents and so on. But fewer than half of the experiments assessed stood up to scrutiny, reports the Reproducibility Project: Cancer Biology (RPCB) team in eLife. The project, one of the most robust reproducibility studies performed so far, documented how hurdles including vague research protocols and uncooperative authors delayed the initiative by five years and halved its scope. Other analyses have reported low replication rates in drug discovery, neuroscience and psychology.

The RPCB, a partnership between the Center for Open Science and Science Exchange, a marketplace for research services in Palo Alto, California, launched in 2013. Funded by the philanthropic investment fund Arnold Ventures, headquartered in Houston, the collaborators set out to systematically reproduce experiments in 53 high-profile papers published in journals including Nature, Science and Cell. The project focused on preclinical cancer research because early hints at low reproducibility rates came from this space; animal studies, in particular, seemed difficult to reproduce. By selecting high-impact papers, the team focused on the research that most shapes the field:

https://www.nature.com/articles/d41586-021-03691-0

The RPCB started publishing its findings in 2017, and these hinted at the messy results to come. The researchers now summarize their overall findings in two papers published this month. The first of these papers catalogues the hurdles the researchers encountered. For every experiment they set their sights on, for example, they needed to contact the authors for advice on experimental design because the original papers lacked data and details. They deemed 26% of authors “extremely helpful”, sometimes spending months tracking down answers and sharing reagents. But 32% were “not at all helpful”, often ignoring queries altogether. This lack of cooperation, alongside the need to modify or overhaul protocols once experiments were under way, took a toll. On average, the team needed 197 weeks to replicate a study. And as costs added up to $53,000 per experiment, about twice what the team had initially allocated, the project’s budget couldn’t cover its original ambition.

The second study delves into the overall results of these experiments in detail. By one analysis, only 46% of the attempted replications confirmed the original findings. And, on average, the researchers observed effect sizes that were 85% smaller than originally reported. The experiments with the biggest effect sizes were those most likely to be replicated. Animal experiments fared worst, mainly because in vivo experiments tend to yield smaller effect sizes than do in vitro experiments.

Not everyone is convinced that the study has merit. Pushback came especially from researchers whose findings were not successfully replicated. Replication is extremely hard. You can never do it exactly the same. Does it matter if you shake a tube up and down instead of side to side? How do you account for different baseline readings? Figuring out when and how to stay true to an experimental protocol is part of the emerging science of replication. Failure to replicate alone is not necessarily cause for concern. Some preliminary findings are distractions, but contradictory follow-up results can lead to deeper scientific insights. The RPCB was not set up to call out or invalidate specific studies; replication, like science, is about the total body of evidence. Rather, the goal was to capture a snapshot of the drivers and the magnitude of the reproducibility crisis, with an eye towards system-level solutions.

The real problem is the time, money and effort that are wasted in finding the signals amid the noise, says Tim Errington, the RPCB’s project leader and director of research at the Center for Open Science. “How well are we using our resources? And how are we learning new knowledge? This is the place to keep pushing, across disciplines.” There is no shortage of proposed fixes: for example, in vitro and animal studies can benefit from blinding, bigger sample sizes, greater statistical rigour and pre-registration of study plans. Papers should make fewer claims and provide more proof, researchers suggest. Data sharing and reporting requirements need to be baked into scientific processes.

There is also currently little support for the researchers who show that something doesn’t work, or who focus on the causes of variability between labs. The final data overall are disconcerting and look like this:

Here’s a good Twitter thread focusing on one paper:

https://twitter.com/elife/status/952949812189716480?lang=es

Sample sizes

Maybe, then, it’s just about doing more of it, and making studies bigger, to find out the truth? Increasing the sample size of a survey is often thought to increase the accuracy of the results. However, an analysis of big surveys on the uptake of COVID-19 vaccines, published this week in Nature, shows that larger sample sizes do not in fact protect against bias. Although ‘big’ surveys can, under certain conditions, be useful for tracking changes in a population measure over time and across space, their estimates of population variables can be considerably biased.
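A quick simulation makes the point (all numbers are hypothetical and not taken from the paper): if the people who respond differ even slightly from the population, a bigger sample simply gives a more precise estimate of the wrong number.

```python
# A quick sketch (hypothetical numbers) of the "big data paradox": when
# respondents are slightly unrepresentative, a bigger sample just narrows the
# confidence interval around a biased estimate.
import numpy as np

rng = np.random.default_rng(1)

true_uptake = 0.67             # assumed true vaccination rate in the population
p_respond_vaccinated = 0.012   # vaccinated people respond slightly more often
p_respond_unvaccinated = 0.008

for n_population in (100_000, 1_000_000, 10_000_000):
    vaccinated = rng.random(n_population) < true_uptake
    p_respond = np.where(vaccinated, p_respond_vaccinated, p_respond_unvaccinated)
    responded = rng.random(n_population) < p_respond
    sample = vaccinated[responded]
    estimate = sample.mean()
    stderr = np.sqrt(estimate * (1 - estimate) / sample.size)
    print(f"n={sample.size:>8}: estimate={estimate:.3f} +/- {1.96 * stderr:.3f} "
          f"(truth {true_uptake})")
# The naive 95% interval shrinks towards ~0.75, not towards 0.67:
# precision without representativeness.
```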

Early in the COVID-19 pandemic, many nations lacked essential epidemiological data, even those with well-developed public-health monitoring infrastructures. There was a scarcity of timely information on regional increases in SARS-CoV-2 infections, on adherence to physical-distancing measures and on the social and economic effects of the pandemic. The state-sponsored data collections that existed at the time were often too slow to meet the demands generated by the pandemic.

As a result, some private companies jumped in to offer data; for example, Google provided anonymized, aggregated data on people’s mobility (go.nature.com/3htjccv), and Facebook provided anonymized and aggregated data about the development of connections between different geographical regions (go.nature.com/3lwknax). The UK-based health-science company ZOE built the ZOE COVID Study app in collaboration with academic partners at King’s College London (go.nature.com/3i7ypxj). The app surveyed participants who downloaded it, to identify infection hotspots and track the effect of mitigation measures. And when vaccination programmes were rolled out, it was used to record COVID-19 vaccine side effects. In addition, various private-sector surveys, many of which were archived by the US-based Societal Experts Action Network (go.nature.com/3rcmkwh), produced data on changes in the public’s response to the pandemic.

The US Census Bureau, in partnership with various federal agencies, and the Delphi group at Carnegie Mellon University, in partnership with Facebook, designed and performed massive surveys to forecast the spread of COVID-19 and measure its effects; questions about vaccination were added in early 2021. With more than 3 million and 25 million responses collected, respectively (as of November 2021; see go.nature.com/3dg0qvy and go.nature.com/3y2r1bk), these are now probably the largest US surveys relating to the pandemic. However, using a subset of responses, Bradley and colleagues demonstrate that the US Census Bureau–federal agencies survey (dubbed the Census Household Pulse survey) and the Delphi–Facebook survey overestimated the vaccination uptake compared with the benchmark data from the CDC (Figure below):

Big surveys can give biased estimates of population variables. Bradley et al. in Nature this week compared estimates of the uptake of SARS-CoV-2 vaccines among US adults, as reported by large surveys, with numbers of administered vaccine doses reported by the US Centers for Disease Control and Prevention (CDC) on 26 May 2021. Results from a survey carried out by the US Census Bureau in partnership with various federal agencies (Census Household Pulse), and another survey by the Delphi group at Carnegie Mellon University in Pittsburgh, Pennsylvania, in partnership with Facebook (Delphi–Facebook), overestimated vaccine uptake, but were useful in tracking the increase in vaccination over time in the first half of 2021. Bradley and colleagues explain how design choices in these surveys could account for the bias in the surveys’ absolute estimates of vaccine uptake.

The authors conclude that having more data does not necessarily lead to better estimates. They discuss how design choices in survey-data collection can lead to error, in this case, the overestimation of vaccination uptake. Their findings are a reminder to researchers that statistical precision does not equate to unbiased population estimates.

They focus on three elements that can contribute to the size of the error — that is, the difference between estimates from big surveys and actual population values. These elements are data quantity (the fraction of a population that is captured in the sample), problem difficulty (how much variation in the outcome of interest there is in the population) and data quality. The quality is very difficult to assess, because there is usually no independently verified ‘ground truth’ or ‘gold standard’ with which to compare survey data. In this case, the CDC’s reports of the numbers of vaccines administered provide benchmark data with which the estimates reported in the surveys could be compared. Under the strong assumption that these reports are indeed the gold standard and reflect the correct vaccination rates, the survey estimates can be compared with these official numbers (which the CDC frequently updates; state-level estimates updated more recently than those used by Bradley et al. can be found at go.nature.com/3dtrdit). Using this approach, they evaluated estimates from several surveys and found that they did not match the CDC’s reported rates of vaccination uptake.
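As I understand it, these three elements multiply together in a single identity (this is the “data defect” decomposition from Meng’s earlier work, which this analysis builds on; I am paraphrasing rather than quoting the paper):

$$
\underbrace{\bar{Y}_n - \bar{Y}_N}_{\text{error}} \;=\; \underbrace{\hat{\rho}_{R,Y}}_{\text{data quality}} \;\times\; \underbrace{\sqrt{\frac{N-n}{n}}}_{\text{data quantity}} \;\times\; \underbrace{\sigma_Y}_{\text{problem difficulty}}
$$

where $\bar{Y}_n$ is the survey estimate, $\bar{Y}_N$ the true population mean, $\hat{\rho}_{R,Y}$ the correlation between responding and the outcome (the “data defect correlation”), $n/N$ the fraction of the population captured, and $\sigma_Y$ the spread of the outcome in the population. Making $n$ huge only shrinks the middle term; it does nothing about $\hat{\rho}_{R,Y}$.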

However, what the metric used by Bradley and colleagues does not enable us to answer — at least, not quantitatively — is the cause of the differences in data quality. To address this issue, the authors used a conceptual framework from survey methodology called the total survey error (TSE) framework, which can help to optimize survey-data quality in three key ways.

First, the TSE framework seeks to ensure that the population of interest and the members included in the ‘frame’ from which the sample is drawn are aligned. Facebook’s active user base is an example of a population that is not aligned with the entire population of the United States. Therefore, if Facebook users have different vaccination habits from those who do not use Facebook, estimates from a survey of Facebook users will be biased. Second, the framework aims to minimize the extent to which those who are sampled and respond differ from the sample members who do not respond. For example, some people who don’t trust the government might be less likely to respond to a government survey. Third, the accordance between the survey measure and the construct of interest should be maximized, and the respondents need to answer in the way intended. For example, questions about vaccination are at risk of being answered positively if respondents feel that they need to present themselves in a favourable light.

For certain inferential tasks, surveys with deficiencies can be useful. The usefulness of a data set can be evaluated only in the context of a specific research question. For example, data from samples that are known to be biased have provided useful information for monitoring inflation rates, as exemplified by the Billion Prices Project (go.nature.com/3i6qock) — which, for years, used prices of online goods and services to estimate alternative indices of inflation. The project was able to do this because, even though not all goods and services were online, online and offline price changes tracked each other. Similarly, the data produced by the US Census Bureau and its partner agencies, and by the Delphi–Facebook partnership, can help to create early-warning systems when administrative data are lacking, as well as help to track cases and evaluate the effectiveness of measures designed to mitigate the spread of SARS-CoV-2 infections, if the errors of these surveys stay constant over time.

Large sample sizes can also reveal relationships between variables, such as reasons for vaccine hesitancy in subgroups of the population, and changes in these reasons over time, unless these relationships for survey respondents differ from those for people who do not respond. Samples collected at a high frequency over time and across relatively small geographical areas, such as some of the samples discussed by Bradley and colleagues, can also be used to evaluate the need for and effectiveness of policy interventions, such as mask-wearing mandates, lockdowns and school-based measures to limit COVID-19 spread.

The world is moving towards making decisions on the basis of data, as reflected, for example, in the US Foundations for Evidence-Based Policymaking Act of 2018 and the European Data Strategy (go.nature.com/3cp1f7o). In response to these changes, we will probably see more data from all kinds of sources, not just surveys. Strong hopes rest on having more available administrative data, such as those from the CDC, that can in some instances replace survey data and, in others, improve survey estimates.

However, as with survey data, we will need robust frameworks and metrics to assess the quality of the data provided by governments, academic institutions and the private sector, and to guide us in using such data. The work by Bradley and colleagues reminds us that, alongside the studies themselves, research is needed on how best to use data, and on their quality and relevance to the question being asked.

So what do we do?

Let’s think about how to move forward with truth, causality and evidence. How do we know something is real, or even ‘for real’, rather than just an illusion? The most basic illusion I know of is the Wine Illusion: dye a white wine red, and lots of people will say it tastes like red wine. The raw experience - the taste of the wine itself - is that of a white wine. But the context is that you're drinking a red liquid. Result: it tastes like a red wine:

The placebo effect is almost equally simple. You're in pain, so your doctor gives you a “painkiller” (unbeknownst to you, it’s really a sugar pill). The raw experience is the nerve sending out just as many pain impulses as before. The context is that you've just taken a pill which a doctor assures you will make you feel better. Result: you feel less pain:

These diagrams cram a lot into the grey box in the middle representing a “weighting algorithm”. Sometimes the algorithm will place almost all its weight on raw experience, and the end result will be raw experience only slightly modulated by context. Other times it will place almost all its weight on context and the end result will barely depend on experience at all. Still other times it will weight them 50-50. The factors at play here are very complicated and I’m hoping you can still find this helpful even when I treat the grey box as, well, a black box.
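As a sketch of what I mean by the grey box (my own toy illustration with made-up numbers; the weighting itself is exactly the part I’m treating as a black box), perception can be modelled as a weighted blend of context and raw experience:

```python
# A minimal sketch of the grey "weighting algorithm" box: perception as a
# weighted blend of context (the prior) and raw experience. The weight itself
# is the black-box part; all numbers are hypothetical.

def perceive(prior: float, raw_experience: float, weight_on_prior: float) -> float:
    """Blend a prior expectation with raw sensory evidence (both on a 0-1 scale)."""
    return weight_on_prior * prior + (1.0 - weight_on_prior) * raw_experience

# Wine illusion: the liquid tastes white-ish (0.2 on a white-to-red scale),
# but the context says "red" (0.9). With most of the weight on context,
# the percept lands close to "red wine".
print(perceive(prior=0.9, raw_experience=0.2, weight_on_prior=0.8))  # ≈ 0.76

# Placebo: the nerves report the same pain as before (0.8), but the context
# "a doctor gave me a painkiller" predicts relief (0.2).
print(perceive(prior=0.2, raw_experience=0.8, weight_on_prior=0.6))  # ≈ 0.44
```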

The cognitive version of this experience is normal Bayesian reasoning. Suppose you live in an ordinary London suburb and your friend says she saw a fox on the way to work. You believe her; your raw experience (a friend saying a thing) and your context (foxes are plentiful in your area) add up to more-likely-than-not. But suppose your friend says she saw a polar bear on the way to work. Now you're doubtful; the raw experience (a friend saying a thing) is the same, but the context (i.e. the very low prior on polar bears in London) makes it implausible:

Normal Bayesian reasoning slides gradually into confirmation bias. Suppose you are a zealous Democrat. Your friend makes a plausible-sounding argument for a Democratic position. You believe it; your raw experience (an argument that sounds convincing) and your context (the Democrats are great) add up to more-likely-than-not true. But suppose your friend makes a plausible-sounding argument for a Republican position. Now you're doubtful; the raw experience (a friend making an argument with certain inherent plausibility) is the same, but the context (i.e. your very low prior on the Republicans being right about something) makes it unlikely.

Still, this ought to work eventually. Your friend just has to give you a good enough argument. Each argument will do a little damage to your prior against Republican beliefs. If she can come up with enough good evidence, you have to eventually accept reality, right? But in fact many political zealots never accept reality. It's not just that they're inherently sceptical of what the other party says. It's that even when something is proven beyond a shadow of a doubt, they still won't believe it. This is where we need to bring in the idea of trapped priors.

Trapped priors: the basic cognitive version

Phobias are a very simple case of trapped priors. They can be more technically defined as a failure of habituation, the fancy word for "learning a previously scary thing isn't scary anymore". There are lots of habituation studies on rats. You ring a bell, then give the rats an electric shock. After you do this enough times, they're scared of the bell, they run and cower as soon as they hear it. Then you switch to ringing the bell and not giving an electric shock. At the beginning, the rats are still scared of the bell. But after a while, they realize the bell can't hurt them anymore. They adjust to treating it just like any other noise; they lose their fear, they habituate.

The same thing happens to humans. Maybe a big dog growled at you when you were really young, and for a while you were scared of dogs. But then you met lots of friendly cute puppies, you realized that most dogs aren't scary, and you came to some reasonable conclusion like "big growly dogs are scary but cute puppies aren't." The same with research and science. We get used to hearing certain things and presume they’re true.

Some people never manage to do this. They get cynophobia, pathological fear of dogs. In its original technical use, a phobia is an intense fear that doesn't habituate. No matter how many times you get exposed to dogs without anything bad happening, you stay afraid. Why?

In the old days, psychologists would treat phobia by flooding patients with the phobic object. Got cynophobia? We'll stick you in a room with a giant Rottweiler, lock the door, and by the time you come out maybe you won't be afraid of dogs anymore. Sound barbaric? Maybe so, but more importantly, it didn't really work. You could spend all day in the room with the Rottweiler, the Rottweiler could fall asleep or lick your face or do something else that should have been sufficient to convince you it wasn't scary, and by the time you got out you'd be even more afraid of dogs than when you went in.

Nowadays we're a little more careful. If you've got cynophobia, we'll start by making you look at pictures of dogs - if you're a severe enough case, even the pictures will make you a little nervous. Once you've looked at a zillion pictures, gotten so habituated to looking at pictures that they don't faze you at all, we'll put you in a big room with a cute puppy in a cage. You don't have to go near the puppy, you don't have to touch the puppy, just sit in the room without freaking out. Once you've done that a zillion times and lost all fear, we'll move you to something slightly doggier and scarier, then something slightly doggier and scarier than that, and so on, until you're locked in the room with the Rottweiler.

It makes sense that once you're exposed to dogs a million times and it goes fine and everything's okay, you lose your fear of dogs, that's normal habituation. But now we're back to the original question, how come flooding doesn't work? Forgetting the barbarism, how come we can't just start with the Rottweiler?

The common-sense answer is that you only habituate when an experience with a dog ends up being safe and okay. But being in the room with the Rottweiler is terrifying. It's not a safe okay experience. Even if the Rottweiler itself is perfectly nice and just sits calmly wagging its tail, your experience of being locked in the room is close to peak horror. Probably your intellect realizes that the bad experience isn't the Rottweiler's fault. But your lizard brain has developed a stronger association than before between dogs and unpleasant experiences. After all, you just spent time with a dog and it was a really unpleasant experience. Your fear of dogs increases.

How does this feel from the inside? Less-self-aware patients will find their prior colouring every aspect of their interaction with the dog. Joyfully pouncing over to get a headpat gets interpreted as a vicious lunge; a whine at not being played with gets interpreted as a murderous growl, and so on. This sort of patient will leave the room saying 'the dog came this close to attacking me, I knew all dogs were dangerous!' More self-aware patients will say something like "I know deep down that dogs aren't going to hurt me, I just know that whenever I'm with a dog I'm going to have a panic attack and hate it and be miserable the whole time". Then they'll go into the room, have a panic attack, be miserable, and the link between dogs and misery will be even more cemented in their mind.

The more technical version of this same story is that habituation requires a perception of safety, but (like every other perception) this one depends on a combination of raw evidence and context. The raw evidence (the Rottweiler sat calmly wagging its tail) looks promising. But the context is a very strong prior that dogs are terrifying. If the prior is strong enough, it overwhelms the real experience. Result: the Rottweiler was terrifying. Any update you make on the situation will be in favour of dogs being terrifying, not against it:

This is the trapped prior. It's trapped because it can never update, no matter what evidence you get. You can have a million good experiences with dogs in a row, and each one will just etch your fear of dogs deeper into your system. Your prior fear of dogs determines your present experience, which in turn becomes the deranged prior for future encounters:

Trapped priors: the more complicated emotional version

The section above describes a simple cognitive case for trapped priors. It doesn't bring in the idea of emotion at all - an emotionless threat-assessment computer program could have the same problem if it used the same kind of Bayesian reasoning people do. But people find themselves more likely to be biased when they have strong emotions. Why?

Van den Bergh et al. suggest that when experience is too intolerable, your brain will decrease bandwidth on the "raw experience" channel to protect you from the traumatic emotions. This is why some trauma victims' descriptions of their traumas are often oddly short, un-detailed, and to-the-point. This protects the victim from having to experience the scary stimuli and negative emotions in all their gory details. But it also ensures that context (and not the raw experience itself) will play the dominant role in determining their perception of an event:

https://orbilu.uni.lu/bitstream/10993/30368/1/Van_den_Bergh_et_al_Symptom_Perception_NBBR_2017.pdf

You can't update on the evidence that the dog was friendly because your raw experience channel has become razor-thin; your experience is based almost entirely on your priors about what dogs should be like:

In earlier diagrams, I should have made it clear that a lot depended on the grey box choosing to weigh the prior more heavily than experience. In this diagram, less depends on this decision; the box is getting almost no input from experience, so no matter what its weighting function, its final result will mostly be based on the prior. In most reasonable weighting functions, even a strong prior on scary dogs plus any evidence of a friendly dog should be able to make the perception slightly less scary than the prior, and iterated over a long enough chain this should update the prior towards dog friendliness. I don’t know why this doesn’t happen in real life, beyond a general sense that whatever weighting function we use isn’t perfectly Bayesian and doesn’t fit in the class I would call “reasonable”. I realize this is a weakness of this model and something that needs further study.
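To illustrate the failure mode I have in mind, here is a toy version of the mechanism (my own sketch with made-up numbers, not a validated cognitive model): if fear narrows the experience channel enough, the prior only ever “sees” its own reflection and stops updating.

```python
# A toy version of the trapped-prior mechanism (my sketch, hypothetical numbers).
# Key move: the prior updates towards the *perception*, and the perception is
# mostly prior once fear squeezes the raw-experience channel shut.

def simulate(prior: float, n_exposures: int, fear_narrows_channel: bool) -> float:
    """Return the final scariness prior (0 = safe, 1 = terrifying) after repeated
    exposure to a genuinely friendly dog (raw experience = 0.1)."""
    raw_experience = 0.1
    learning_rate = 0.3
    for _ in range(n_exposures):
        if fear_narrows_channel:
            # The more frightening the prior, the less bandwidth experience gets;
            # above a prior of about 0.8 the channel closes completely.
            weight_on_prior = min(1.0, 0.4 + 0.75 * prior)
        else:
            weight_on_prior = 0.5
        perception = weight_on_prior * prior + (1 - weight_on_prior) * raw_experience
        prior += learning_rate * (perception - prior)   # update on the percept
    return prior

print(simulate(prior=0.9, n_exposures=100, fear_narrows_channel=False))  # ~0.1: habituation
print(simulate(prior=0.5, n_exposures=100, fear_narrows_channel=True))   # ~0.1: mild fear still habituates
print(simulate(prior=0.9, n_exposures=100, fear_narrows_channel=True))   # stays at 0.9: trapped
```

The point of the toy is only that a small change in how the weighting behaves at high fear turns ordinary habituation into a prior that can never move.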

From phobia to bias

I think this is a fruitful way to think of cognitive biases in general. If I'm a Republican, I might have a prior that Democrats are often wrong or lying or otherwise untrustworthy. In itself, that's fine and normal. It's a model shaped by my past experiences, the same as my prior against someone’s claim to have seen a polar bear. But if enough evidence showed up, bear tracks, clumps of fur, photographs, I should eventually overcome my prior and admit that the bear people had a point. Somehow in politics that rarely seems to happen.

For example, more scientifically literate people are more likely to have partisan positions on science (e.g. they agree with their own party's position on scientifically contentious issues, even when outsiders view it as science-denialist). If they were merely biased, they should start out wrong, but each new fact they learn about science should make them update a little toward the correct position. That's not what we see. Rather, they start out wrong, and each new fact they learn, each unit of effort they put into becoming more scientifically educated, just makes them wronger. That's not what you see in normal Bayesian updating. It's a sign of a trapped prior.

Political scientists have traced out some of the steps of how this happens, and it looks a lot like the dog example: zealots' priors determine what information they pay attention to, and then distort their judgment of that information.

So, for example, in 1979 some psychologists asked people to read pairs of studies about capital punishment (a controversial issue at the time), then asked them to rate the methodologies on a scale from -8 to 8. Conservatives rated the pro-punishment study at about +2 and the anti-execution study at about -2; liberals gave an only slightly smaller difference in the opposite direction. Of course, the psychologists had designed the studies to be about equally good, and even switched the conclusion of each study from subject to subject to average out any remaining real difference in study quality. At the end of reading the two studies, both the liberal and conservative groups reported believing that the evidence had confirmed their position, and described themselves as more certain than before that they were right. The more information they got on the details of the studies, the stronger their belief:

http://fbaum.unc.edu/teaching/articles/jpsp-1979-Lord-Ross-Lepper.pdf

This pattern, increasing evidence just making you more certain of your pre-existing belief, regardless of what it is, is pathognomonic of a trapped prior. These people are doomed.

In the 2016 election, Ted Cruz said he was against Hillary Clinton's "New York values". This sounded innocent - sure, people from the Heartland think big cities have a screwed-up moral compass. But various news sources argued it was actually Cruz's way of signalling support for anti-Semitism (because New York = Jews). Since then, almost anything any candidate from any party says has been accused of being a dog-whistle for something terrible - for example, apparently Joe Biden's comments about Black Lives Matter were dog-whistling his support for rioters burning down American cities.

Maybe this kind of thing is real sometimes. But think about how it interacts with a trapped prior. Whenever the party you don't like says something seemingly reasonable, you can interpret it in context as them wanting something horrible. Whenever they want a seemingly desirable thing, you secretly know it means they want a horrible moral atrocity. If a Republican talks about "law and order", it doesn't mean they're concerned about the victims of violent crime, it means they want to lock up as many black people as possible to strike a blow for white supremacy. When a Democrat talks about "gay rights", it doesn't mean letting people marry the people they love, it means destroying the family so they can replace it with state control over your children. I've had arguments with people who believe that no pro-life conservative really cares about fetuses, they just want to punish women for having sex by denying them control over their bodies. And I've had arguments with people who believe that no pro-lockdown liberal really cares about COVID deaths, they just like the government being able to force people to wear masks as a sign of submission. Once you're at the point where all these things sound plausible, you are doomed. You can get a piece of evidence as neutral as "there's a deadly pandemic, so those people think you should wear a mask" and convert it into "they're trying to create an authoritarian dictatorship". And if someone calls you on it, you'll just tell them they need to look at it in context.

Reiterating the cognitive vs. emotional distinction: conclusions

The basic idea of a trapped prior is purely epistemic. It can happen (in theory) even in someone who doesn't feel emotions at all. If you gather sufficient evidence that there are no polar bears near you, and your algorithm for combining prior with new experience is just a little off, then you can end up rejecting all apparent evidence of polar bears as fake and trapping your anti-polar-bear prior. This happens without any emotional component.

Where does the emotional component come in? I think Van den Bergh argues that when something is so scary or hated that it's aversive to have to perceive it directly, your mind decreases bandwidth on the raw experience channel relative to the prior channel so that you avoid the negative stimulus. This makes the above failure mode much more likely. Trapped priors are a cognitive phenomenon, but emotions create the perfect storm of conditions for them to happen.

Along with the cognitive and emotional sources of bias, there's a third source: self-serving bias. People are more likely to believe ideas that would benefit them if true; for example, rich people are more likely to believe low taxes on the rich would help the economy; minimum-wage workers are more likely to believe that raising the minimum wage would be good for everyone. Although I don't have any formal evidence for this, I suspect that these are honest beliefs; the rich people aren't just pretending to believe that in order to trick you into voting for it. I don't consider the idea of bias as trapped priors to account for this third type of bias at all; it might relate in some way that I don't understand, or it may happen through a totally different process.

If this model is true, is there any hope? If you want to get out of a trapped prior, the most promising source of hope is the psychotherapeutic tradition of treating phobias and PTSD. These people tend to recommend very gradual exposure to the phobic stimulus, sometimes with special gimmicks to prevent you from getting scared or help you "process" the information (there's no consensus as to whether the eye movements in EMDR operate through some complicated neurological pathway, work as placebo, or just distract you from the fear). A lot of times the "processing" involves trying to remember the stimulus multimodally, in as much detail as possible, for example drawing your trauma, or acting it out.

Generally, as with many uses of “no evidence”, what such headlines meant was that one particular study of a complicated question had failed to reject the null hypothesis. Recently we saw this:

Evidence in general means information, facts or data supporting (or contradicting) a claim, assumption or hypothesis, like the use of 'evidence' in legal settings. In fact, anything might count as evidence if it's judged to be valid, reliable and relevant. A person's assumptions or beliefs about the relationship between observations and a hypothesis will affect whether that person takes the observations as evidence. These assumptions or beliefs will also affect how a person utilizes the observations as evidence. For example, the Earth's apparent lack of motion may be taken as evidence for a geocentric cosmology. However, after sufficient evidence is presented for heliocentric cosmology and the apparent lack of motion is explained, the initial observation is strongly discounted as evidence.

A more formal method to characterize the effect of background beliefs is Bayesian inference. One starts from an initial probability (a prior), and then updates that probability using Bayes’ theorem after observing evidence. As a result, two independent observers of the same event will rationally arrive at different conclusions if their priors (previous observations that are also relevant to the conclusion) differ. However, if they are allowed to communicate with each other, they will end in agreement.
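For reference, Bayes’ theorem in its standard form, which is all the “updating” above amounts to:

$$
P(H \mid E) \;=\; \frac{P(E \mid H)\,P(H)}{P(E \mid H)\,P(H) + P(E \mid \neg H)\,P(\neg H)}
$$

where $P(H)$ is the prior probability of the hypothesis, $P(E \mid H)$ is how likely the evidence is if the hypothesis is true, and $P(H \mid E)$ is the posterior once the evidence is in. Two observers with different priors $P(H)$ can rationally reach different posteriors from exactly the same evidence.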

Philosophers, such as Popper, have provided influential theories of the scientific method within which scientific evidence plays a central role. In summary, Popper proposes that a scientist creatively develops a theory that may be falsified by testing the theory against evidence or known facts. Popper's theory presents an asymmetry in that evidence can prove a theory wrong, by establishing facts that are inconsistent with the theory. In contrast, evidence cannot prove a theory correct because other evidence, yet to be discovered, may exist that is inconsistent with the theory.

There have been a variety of 20th century philosophical approaches to decide whether an observation may be considered evidence; many of these focused on the relationship between the evidence and the hypothesis. But, while the phrase "scientific proof" is often used in the popular media, many scientists have argued that there is really no such thing. For example, Karl Popper wrote that "In the empirical sciences, which alone can furnish us with information about the world we live in, proofs do not occur, if we mean by 'proof' an argument which establishes once and for ever the truth of a theory." Albert Einstein said: “The scientific theorist is not to be envied. For Nature, or more precisely experiment, is an inexorable and not very friendly judge of his work. It never says "Yes" to a theory. In the most favourable cases it says "Maybe", and in the great majority of cases simply "No". If an experiment agrees with a theory it means for the latter "Maybe", and if it does not agree it means "No". Probably every theory will someday experience its "No"—most theories, soon after conception.”

Evidence, what constitutes data, and our prior thinking all contribute to rapidly changing truths, and to what seems reasonable to us:


Justin Stebbing
Managing Director
