Mathematics and scientific fraud

By David H Bailey, on November 3rd, 2011

From time to time, the scientific community is rocked by cases of scientific fraud. Needless to say, such incidents do not help instill confidence in a public that is already predisposed to be skeptical of inconvenient scientific findings, including biological evolution and global warming.

One notable case of fraud came to light in 2002, when Bell Labs researcher Hendrik Schön, once described as a “rising star” in the field of nanoelectronics, was accused of fraud by a review panel consisting of several prominent scientists, including physicist Malcolm Beasley of Stanford University [Samuel2002]. Most of the 25 papers in question were published in prestigious journals such as Science, Nature and Applied Physics Letters.

A 2008 Science article began:

The only two peer-reviewed scientific papers showing that electromagnetic fields (EMFs) from cell phones can cause DNA breakage are at the center of a misconduct controversy at the Medical University of Vienna (MUV). Critics had argued that the data looked too good to be real, and in May a university investigation agreed, concluding that data in both studies had been fabricated and that the papers should be retracted [Vogel2008].

In November 2011, Dutch psychologist Diederik Stapel was accused of publishing “several dozen” articles with falsified data. Stapel’s papers were certainly provocative. One claimed that disordered environments such as littered streets make people more prone to stereotyping and discrimination [Stapel2011]. After being challenged in an “editorial expression of concern” in Science, Stapel confessed that the allegations were largely correct [Carey2011].

How could such frauds have happened? For one thing, scientific investigation is premised on open enquiry, and treating every new result as a potential fraud would be both antithetical to that premise and destructive. In general, outright false findings such as the cell phone case are easier to uncover than “prettifying,” which in some cases comes from enthusiastic assistants “cleaning” the data to strengthen the case. It is said that other monks knew Brother Mendel liked to see pretty plots of peas and would weed out strays of the wrong colour.

Jonathan Schooler of the University of California, Santa Barbara, says that “the big problem is that the culture is such that researchers spin their work in a way that tells a prettier story than what they really found.” This culture is certainly cultivated by media reports in which every advance must be a breakthrough. In Stapel’s case, he was able to operate for so long because he was “lord of the data,” the only person who saw the data. What’s more, he did not make this data available for other researchers, a practice that Jelte Wicherts of the University of Amsterdam termed “a violation of ethics rules established in the field” [Carey2011].

It is worth examining the role of mathematics in general, and statistics in particular, in the disclosure of these frauds. In the case of Stapel’s work, researchers found “anomalies in this material, including suspiciously large experimental effects and a lack of ‘outliers’ in the data” [Aldous2011]. A lack of outliers and unlikely distributions are tell-tale signs of poorly constructed artificial data (see http://en.wikipedia.org/wiki/Benford’s_law).
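As a rough illustration of the kind of distributional check mentioned above, the following Python sketch compares the leading digits of a data set against the frequencies predicted by Benford's law, using a chi-square statistic. It is not the method used in the Stapel investigation, and the function names are invented for this example. Benford's law only applies to data spanning several orders of magnitude, so a large statistic is a reason to look more closely, not proof of fabrication.

```python
import math
from collections import Counter

def benford_expected(n):
    """Expected leading-digit counts under Benford's law for n values."""
    return {d: n * math.log10(1 + 1 / d) for d in range(1, 10)}

def leading_digit(value):
    """Leading decimal digit of a nonzero integer."""
    return int(str(abs(value))[0])

def benford_chi_square(values):
    """Chi-square statistic (8 degrees of freedom) for the leading digits
    of an iterable of nonzero integers, measured against Benford's law."""
    digits = [leading_digit(v) for v in values if v != 0]
    counts = Counter(digits)
    expected = benford_expected(len(digits))
    return sum((counts.get(d, 0) - expected[d]) ** 2 / expected[d]
               for d in range(1, 10))

# Powers of 2 are a standard example of a sequence that follows Benford's law.
natural_like = [2 ** k for k in range(1000)]
# Three-digit numbers taken uniformly have a flat leading-digit distribution.
flat = list(range(100, 1000))

CRITICAL_5_PERCENT = 15.507  # chi-square critical value, 8 degrees of freedom
print(benford_chi_square(natural_like) < CRITICAL_5_PERCENT)
print(benford_chi_square(flat) > CRITICAL_5_PERCENT)
```

The flat data set is flagged because each leading digit appears equally often, whereas Benford's law predicts that about 30 percent of values start with 1.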

Even setting aside outright fraud, statistical sloppiness pervades some fields. This is especially true in clinical medical research and in the social sciences, where many researchers are poorly trained quantitatively. In a 2011 analysis, Jelte Wicherts and Marjan Bakker of the University of Amsterdam found that about half of the 281 psychology journal papers they examined contained some statistical error, and that about 15 percent had at least one error that would have changed the reported finding, “almost always in opposition to the authors’ hypothesis” [Carey2011].

There is even at least one instance of statistical methods being used to detect a problem in a mathematical result. In 1872, Augustus De Morgan noted (in a posthumously published collection) that the digit 7 occurred too few times in the expansion of pi to 606 digits published in 1853 by Shanks (Shanks later extended his calculation to 707 digits, but these were not yet available; and he did not correct the earlier digits). De Morgan wrote, “It is 45 to 1 against the number of 7s being as distant from the probable value (say 61) as 44 on one side or 78 on the other. There must be a reason why the number 7 is thus deprived of its fair share in the structure.” Indeed, in 1945 Ferguson, in one of the first machine-assisted calculations, found that Shanks’ expansion was in error after the first 527 digits, evidently because he omitted two terms in the expansion.
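De Morgan's figure can be checked directly. The sketch below assumes that each of the 606 digits is independently a 7 with probability 1/10, so the count of 7s is binomial with mean 60.6 (De Morgan's "say 61"), and it sums the exact probability of a count of 44 or fewer, or 78 or more. The function names are chosen for this example, and the modelling assumption is ours, not a reconstruction of De Morgan's own arithmetic.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def two_sided_tail(n, p, low, high):
    """P(X <= low or X >= high) for X ~ Binomial(n, p), with low < high."""
    lower = sum(binomial_pmf(k, n, p) for k in range(0, low + 1))
    upper = sum(binomial_pmf(k, n, p) for k in range(high, n + 1))
    return lower + upper

# Count of 7s among 606 digits, each a 7 with probability 1/10.
p_tail = two_sided_tail(606, 0.1, 44, 78)
print(f"P(count <= 44 or count >= 78) = {p_tail:.4f}")
print(f"odds against: about {(1 - p_tail) / p_tail:.0f} to 1")
```

Running this gives a figure to set beside De Morgan's "45 to 1." Note that the calculation treats the digit 7 as if it had been singled out before looking at the data.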

However, as George Marsaglia recently noted, De Morgan’s analysis was somewhat faulty. Rather than singling out 7s, he should have assessed the chances that the least-frequent digit would appear 44 or fewer times among 606 digits. As it turns out, there is a roughly 10% chance of this occurring, so that the statistical case for error is not as compelling as De Morgan thought [Marsaglia2005].
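The corrected question can be explored with a short simulation. The following sketch is a Monte Carlo estimate, not Marsaglia's own calculation: it draws 606 uniformly random digits many times and records how often the least-frequent digit, whichever one it happens to be, appears 44 or fewer times. The function name, trial count and seed are arbitrary choices for this example.

```python
import random

def least_frequent_digit_tail(n_digits=606, threshold=44, trials=5000, seed=0):
    """Estimate P(min over digits 0-9 of its count <= threshold)
    for n_digits independent, uniformly random decimal digits."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    hits = 0
    for _ in range(trials):
        digits = rng.choices(range(10), k=n_digits)
        # Count of the rarest digit; a digit that never appears counts as 0.
        rarest = min(digits.count(d) for d in range(10))
        if rarest <= threshold:
            hits += 1
    return hits / trials

estimate = least_frequent_digit_tail()
print(f"Estimated P(least-frequent digit appears <= 44 times) = {estimate:.3f}")
```

Because any of the ten digits may turn out to be the rare one, this probability is several times larger than the single-digit tail probability, which is the substance of Marsaglia's objection.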

Let us emphasize that such scientific fraud is the exception, not the rule. Our cursory search of Science’s archive showed about half a dozen headline cases in the past ten years. Business, politics or law would not fare as well.

In any event, it is clear that: (a) more care needs to be taken in using statistical methods in scientific and mathematical research; and (b) statistical methods can and should, to a greater extent, be used to detect fraud and manipulation of data (deliberate or not). Perhaps the considerable attention drawn to the recent incidents will lead to more rigorous analyses, and more circumspect behavior by scientists. We shall see.

This was also posted (co-authored with Jonathan M. Borwein) at Experimental math blog.

References

[Aldous2011] Peter Aldous, “Psychologist admits faking data in dozens of studies,” New Scientist, 2 Nov 2011.
[Carey2011] Benedict Carey, “Fraud Case Seen as a Red Flag for Psychology Research,” New York Times, 2 Nov 2011.
[Marsaglia2005] George Marsaglia, “On the Randomness of Pi and Other Decimal Expansions.”
[Samuel2002] Eugenie Samuel, “Rising star of electronics found to have fabricated his ground-breaking results,” New Scientist, 5 Oct 2002.
[Stapel2011] Diederik A. Stapel and Siegwart Lindenberg, “Coping with Chaos: How Disordered Contexts Promote Stereotyping and Discrimination,” Science, 8 Apr 2011.
[Vogel2008] Gretchen Vogel, “Fraud Charges Cast Doubt on Claims of DNA Damage From Cell Phone Fields,” Science, 29 Aug 2008, vol. 321, no. 5893, pp. 1144-1145.
