There is one argument today that seems able to trump all others. It is not really an argument, though we tend to treat it as one, and it is one that is often considered definitive: “A study has found …”
This phrase seems to many people to carry an almost religious gravity, lending whatever finding is cited a sort of oracular status. Just as someone in the Middle Ages might have believed anything uttered by someone wearing a mitre, so many people today will believe almost anything uttered by someone with a report in hand.
“A study has found …” serves the same function today as “Thus saith the Lord …” might have served five hundred years ago.
But there are two reasons we should treat any citation of a scientific study with skepticism. The first has to do with the inherently tentative nature of science and scientific reasoning: such reasoning can render a conclusion probable at best. I dealt with this problem in a previous article, “Science’s Useful Fallacy.”
The second has to do with the state of current scientific research. In his book Rigor Mortis: How Sloppy Science Creates Worthless Cures, Crushes Hope, and Wastes Billions, NPR science writer Richard Harris explains the many factors that produce the bad science now circulating through the research literature.
How big is the problem? “Each year,” he says,
about a million biomedical studies are published in the scientific literature. And many of them are simply wrong. Set aside the voice-of-God prose, the fancy statistics, and the peer review process, which is supposed to weed out the weak and errant. Lots of this stuff just doesn’t stand up to scrutiny…. Sometimes the scientist has unconsciously willed the data to tell a story that’s not in fact true. Occasionally there is outright fraud. But a large share of what gets published is wrong.
Harris focuses on biomedicine, but the problem is endemic throughout the research community. Sloppy practices, cut corners, ignorance of proper procedures, mischievous incentives, cheating, and sheer apathy abound.
The most significant aspect of the problem is called the “Replication Crisis”: the inability of researchers to reproduce the results of original studies. Replication has been called the “gold standard” of scientific research: until a study has been re-performed and its original results reproduced, it should not be considered reliable. Replication matters because a result obtained only once may be an artifact of chance, error, or bias rather than a genuine finding.
When C. Glenn Begley was preparing to leave his job as a researcher with Amgen, he decided to scour the research literature for promising new drugs and found fifty-three studies that appeared groundbreaking. Company scientists repeated the experiments to see if they could reproduce the same results. Of the fifty-three studies, they could reproduce the results of only six. The German drug company Bayer conducted a similar survey and was able to replicate only 25% of the studies.
And often when someone does try to replicate an experiment, there are problems getting the raw data from the original research, or the original researcher is uninterested in seeing the study replicated, sometimes even resistant to the idea and hostile to those who try. Begley also points out that many experiments are so poorly designed that even a successful replication would mean nothing.
Even worse, faulty studies often continue to be cited long after they have failed replication. Steve Goodman, a biostatistician and epidemiologist formerly at Johns Hopkins University and a founder of the Meta-Research Innovation Center at Stanford (METRICS), has documented numerous examples of this.
“Years after two of the largest and most expensive medical studies ever undertaken had debunked the claim that vitamin E reduces heart disease,” says Harris of Goodman’s findings, “half of all articles on the subject still cited the original study favorably.”
“These results get entrenched,” said Goodman. “You cannot get rid of them.”
Begley said one of the studies he couldn’t reproduce has been cited more than two thousand times by other researchers, who have built on it, or at least referred to it, without ever validating the underlying result.
Goodman’s METRICS co-director John Ioannidis, a professor of medicine and statistics at Stanford University, is well known for his contention that most published research is badly done. His now-famous 2005 paper, “Why Most Published Research Findings Are False,” is one of the most widely read and downloaded scientific papers on the internet. Ioannidis argues that simply by examining how scientific research is designed and executed, you can tell that most research conclusions are false positives.
In their paper “Reproducibility in Science,” Begley and Ioannidis assert that 75-90% of preclinical medical research is irreproducible and that 85% of biomedical research is “wasted at-large” because of faulty research practices.
The replication problem in the soft sciences is even worse. Harris points to the work of University of Virginia psychology professor Brian Nosek, who opened the Center for Open Science to combat the problem of shoddy research in his field. Over a period of several years, he and his colleagues attempted to replicate one hundred studies in his field. The result was an August 28, 2015, New York Times headline: “Psychology’s Fears Confirmed: Rechecked Studies Don’t Hold Up.” Says Harris, “Two-thirds of the reproduced results were so weak that they didn’t reach statistical significance.”
There is the problem of studies that cannot be replicated, and then there is the problem of studies that no one has even bothered to try to replicate, which is the vast majority of them. This is a particular problem in fields like education.
In a meta-study published in 2014 in Educational Researcher, Matthew C. Makel of Duke University and Jonathan A. Plucker of the University of Connecticut conducted a wide-ranging analysis of educational research to determine how many of the education studies published in the one hundred most prominent education journals had been replicated.
Of the 164,589 studies published in these education journals, only 221 were replications, an overall replication rate of 0.13%. Of the studies that were replicated, only 67.4% replicated successfully, and 48.2% of those replications, nearly half, were conducted by the same people who did the original study, a questionable research practice in itself.
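Taken at face value, the meta-study’s figures can be checked with a few lines of arithmetic (a minimal sketch using only the numbers cited above):

```python
# Sanity check of the Makel and Plucker figures:
# 221 replications out of 164,589 published education studies.
total_studies = 164_589
replications = 221

# Overall replication rate as a percentage.
replication_rate = replications / total_studies * 100
print(f"Replication rate: {replication_rate:.2f}%")  # about 0.13%

# Of those 221 replications, 67.4% succeeded and 48.2%
# were run by the original study's own authors.
successful = replications * 0.674
same_team = replications * 0.482
print(f"Successful replications: roughly {successful:.0f}")
print(f"Replications by the original team: roughly {same_team:.0f}")
```

The calculation confirms the 0.13% figure and shows that the pool of independently successful replications is vanishingly small relative to the literature as a whole.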
In other words, far less than 1% of the education studies in the one hundred most prominent education journals have met the bedrock standard of replication. The odds are therefore overwhelming, nearly 99.9%, that any education study someone quotes has never been independently verified and so cannot be relied upon as definitive.
The problem, of course, is that those who are now questioning the quality of scientific research are not popular with researchers themselves, since their questions dampen the excitement produced by the latest headline-grabbing finding. These questions also seldom draw the attention of the popular media, which benefits from publishing the latest study that promises to cure cancer or halt the aging process.
No one mentions these things when they say, “A study has found …” They should.