
Does probability refute evolution?

David H. Bailey
Updated 1 January 2020 (c) 2020


Both traditional creationists and intelligent design writers have invoked probability arguments in criticisms of biological evolution. They argue that certain features of biology are so fantastically improbable that they could never have been produced by a purely natural, "random" process, even assuming the billions of years of history asserted by geologists and astronomers. They often equate the hypothesis of evolution to the absurd suggestion that monkeys randomly typing at a typewriter could compose a selection from the works of Shakespeare, or that an explosion in an aerospace equipment yard could produce a working 747 airliner [Dembski1998; Foster1991; Hoyle1981; Lennox2009]. More recent studies of this genre, in an attempt to promote an "intelligent design" worldview, argue that functional biology operates on an exceedingly small subset of the space of all possible DNA sequences, and that any changes to the "computer program" of biology are, like changes to human computer programs, almost certain to make the organism non-functional [Axe2017; Marks2017].

One creationist-intelligent design argument goes like this: the human alpha-globin molecule, a component of hemoglobin that performs a key oxygen transfer function, is a protein chain based on a sequence of 141 amino acids. There are 20 different amino acids common in living systems, so the number of potential chains of length 141 is 20^141, which is roughly 10^183 (i.e., a one followed by 183 zeroes). These writers argue that this figure is so enormous that even after billions of years of random molecular trials, no human alpha-globin protein molecule would ever appear "at random," and thus the hypothesis that human alpha-globin arose by an evolutionary process is decisively refuted [Foster1991, pg. 79-83; Hoyle1981, pg. 1-20; Lennox2009, pg. 163-173].

The treacherous world of probability and statistics

Although the public at large seldom appreciates it, researchers know well that arguments based on probability and statistics are fraught with potential fallacies and errors, and that even "expert" researchers can fool themselves with invalid reasoning. For these reasons, rigorous courses in probability and statistics are now required of students in virtually all fields of science, and in numerous other disciplines as well. Attorneys need to be at least moderately well-versed in probability and statistics arguments and in how they can go awry in courtroom arguments [Saini2009]. In the finance world, statistical overfitting and other errors of probability and statistics are now thought to be a leading reason why many strategies and investment funds that look great on paper fail miserably in real-world usage [Bailey2014].

To illustrate the difficulties with probability arguments, mathematics teachers often ask their class (let's say it has 30 students) whether they think it is likely that two or more persons in the class have exactly the same birthday. Most students say that it is highly unlikely, reasoning that the chance that two people share a particular birthday is 1/365, and so 30 times this amount is only 30/365. But this argument is fallacious, since what matters is whether any pair of students matches, and in a class of 30 students there are 435 pairs. When the probability calculation is done correctly for the case of 30 students [it is equal to 1 - (364/365 x 363/365 x ... x 336/365)], one obtains 70.6%. In general, if there are 23 or more students in the class, then the chance that two or more have the same birthday is greater than 50%.
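
The birthday figures quoted here are easy to check numerically. The following is a minimal Python sketch, assuming 365 equally likely birthdays and ignoring leap years; the function names are illustrative and not taken from any cited source.

```python
from math import comb


def shared_birthday_probability(n, days=365):
    """Chance that at least two of n people share a birthday."""
    p_all_distinct = 1.0
    for k in range(n):
        # The (k+1)-th person must avoid the k birthdays already taken.
        p_all_distinct *= (days - k) / days
    return 1.0 - p_all_distinct


def smallest_class_size(threshold=0.5, days=365):
    """Smallest class size whose shared-birthday chance exceeds threshold."""
    n = 1
    while shared_birthday_probability(n, days) <= threshold:
        n += 1
    return n


if __name__ == "__main__":
    print("pairs among 30 students:", comb(30, 2))
    print("P(shared birthday, n=30): %.3f" % shared_birthday_probability(30))
    print("smallest n with P > 50%:", smallest_class_size())
```

The naive estimate of 30/365 counts only matches with one fixed person, whereas the product above accounts for every pair at once.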

For numerous other examples of how seemingly improbable "coincidences" can happen, see [Hand2014].

Fallacies in the creationist probability arguments

One major fallacy in the alpha-globin argument mentioned above, common to many others of this genre, is that it ignores the fact that a large class of alpha-globin molecules can perform the essential oxygen transfer function, so that computing the probability of a single specific sequence gives a misleadingly remote figure. Indeed, most of the 141 amino acids in alpha-globin can be changed without altering the key oxygen transfer function, as can be seen by noting the great variety in alpha-globin molecules across the animal kingdom (see DNA). When one revises the calculation above, based on only 25 locations essential for the oxygen transport function (which is a generous over-estimate), one obtains 20^25, or roughly 10^33, fundamentally different chains, an enormous figure but incomparably smaller than 10^183.
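
To make the comparison concrete, the two sequence-space sizes can be computed on a log scale. This is only a sketch of the arithmetic in the text; the helper name log10_sequence_count is an assumption introduced here for illustration.

```python
import math


def log10_sequence_count(alphabet_size, positions):
    """log10 of the number of chains with the given number of free positions."""
    return positions * math.log10(alphabet_size)


if __name__ == "__main__":
    full_chain = log10_sequence_count(20, 141)  # every residue treated as fixed
    essential = log10_sequence_count(20, 25)    # only 25 residues treated as fixed
    print("20^141 is about 10^%.1f" % full_chain)
    print("20^25 is about 10^%.1f" % essential)
    print("the two differ by a factor of about 10^%.0f" % (full_chain - essential))
```

Working with logarithms avoids printing 184-digit integers and makes the gap of roughly 150 orders of magnitude between the two estimates explicit.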

A calculation such as this can be refined further, taking into account other features of alpha-globin and its related biochemistry. Some of these calculations produce probability values even more extreme than the above. But do any of these calculations really matter? The main problem is that all such calculations, whether done accurately or not, suffer from the fatal fallacy of presuming that a structure such as human alpha-globin arose by a single all-at-once random trial event. But generating a molecule "at random" in a single shot is decidedly not the scientific hypothesis in question -- this is a creationist theory, not a scientific theory. Instead, available evidence from hundreds of published studies on the topic has demonstrated that alpha-globin arose as the end product of a long sequence of intermediate steps, each of which was biologically useful in an earlier context. See, for example, the survey article [Hardison2001], which cites 144 papers on the topic of hemoglobin evolution (note: this reference dates from 2001 -- many more papers have been published since then).

In short, the creationist-intelligent design argument claiming that scientists assert an all-at-once "at random" creation of various biomolecules, and then asserting that this is probabilistically impossible, is a classic "straw man" fallacy. Scientists do not believe this, so this line of argumentation is completely invalid. In other words, it does not matter how good or how bad the mathematics used in the analysis is, if the underlying model is a fundamentally invalid description of the phenomenon in question. Any simplistic probability calculation of evolution that does not take into account the step-by-step process by which the structure came to be is almost certainly fallacious and can easily mislead [Musgrave1998; Rosenhouse2018].

What's more, such calculations completely ignore the atomic-level biochemical processes involved, which often exhibit strong affinities for certain types of highly ordered structures. For example, molecular self-assembly occurs in DNA molecule duplication every time a cell divides. If we were to compute the chances of the formation of a human DNA molecule during meiosis, using a simple-minded probability calculation similar to that mentioned above, the result would be something on the order of one in 10^1,000,000,000, which is far, far beyond the possibility of "random" assemblage. Yet this process occurs many times every day in the human body and in every other plant and animal species.

Is evolution a "random" process?

It is also important to keep in mind that the process of natural biological evolution is not really a "random" process. Evolution certainly has some "random" aspects, notably mutations and genetic events during reproduction. But the all-important process of natural selection, acting under the pressure of an extremely competitive landscape, often involving thousands of other individuals of the same species and other species as well, together with numerous complicated environmental pressures such as climate change, is anything but random. This strongly directional nature of natural selection, which is the essence of evolution, by itself invalidates most of these probability calculations.

Hemoglobin and chlorophyll

With regard to hemoglobin in particular, it has long been noted that heme, the key oxygen-carrying component of hemoglobin, is remarkably similar to chlorophyll, the molecule behind photosynthesis. The principal difference is that heme has a central iron atom, whereas chlorophyll has a central magnesium atom; otherwise they are virtually identical. This similarity can hardly be a coincidence, and in fact researchers concluded long ago, based on several lines of evidence, that these two biomolecules must have shared a common lineage (meaning, of course, that organisms which incorporate these biomolecules must have shared a common lineage) [Hendry1980].

[Figure: diagram of the heme and chlorophyll molecules]


Some of the difficulties with creationist probability arguments can be illustrated by considering snowflakes. Bentley and Humphrey's book Snow Crystals includes over 2000 high-resolution black-and-white photos of real snowflakes, each with intricate yet highly regular patterns that are almost perfectly six-way symmetric [Bentley1962]. A good online source with numerous high-resolution photographs has been compiled by Kenneth Libbrecht [Libbrecht2012]. Four of Bentley's photos are shown below. By employing a reckoning based on six-way symmetry, one can calculate the chances that one of these structures can form "at random" as roughly one part in 10^2500. This probability figure is even more extreme than some that have appeared in the creationist-intelligent design literature. So is this proof that each individual snowflake has been designed by a supernatural intelligent entity? Obviously not.

The fallacy here, once again, is presuming an all-at-once random assembly of molecules. Instead, snowflakes, like biological organisms, are formed as the product of a long series of steps acting under well-known physical laws, and the outcomes of such processes very sensitively depend on the starting conditions and numerous environmental parameters. It is thus folly to presume that one can correctly reckon the chances of a given outcome by means of superficial probability calculations that ignore the processes by which they formed.

[Figure: four of Bentley's snowflake photographs]

Can English text be generated "at random"?

As mentioned above, some critics have equated the notion of natural evolution to the absurd suggestion that some monkeys typing randomly at a keyboard could generate a passage of Shakespeare. Others have argued that an evolutionary process could not possibly create a working "computer program," since any changes almost certainly would produce a non-functional result. But these too are fallacious arguments.

For example, a 2009 study by the present author exhibited results of a computer program simulating natural evolution, which "evolved" segments of English text very much akin to actual passages from Charles Dickens. In many instances, a class of college students were unable to distinguish the computer-generated text segments from real text segments taken from Dickens' Great Expectations. See English-text for details.

Computer programs produced by evolutionary processes

That the information theory arguments against evolution cannot be valid is evident from the rise of computer programs that mimic the process of biological evolution to produce novel solutions to engineering problems, in many cases superior to the best human efforts. This approach has been termed "genetic algorithms" or "evolutionary computing." As a single example, in 2017 Google researchers generated 1000 image recognition algorithms, each of which was trained using state-of-the-art deep neural networks to recognize a selected set of images. They then used an array of 250 computers, each running two algorithms, to identify an image. Only the algorithm that scored higher proceeded to the next iteration, where it was changed somewhat, mimicking mutations in natural evolution. Google researchers found that their scheme could achieve accuracies as high as 94.6% [Gershgorn2017].
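
The pairwise "winner proceeds, then is mutated" loop described above can be sketched in a few lines of Python. This is a toy illustration only, not the Google system: the genome is assumed to be a list of bits, the fitness function (the number of 1 bits) is a stand-in for image-recognition accuracy, and every name and parameter below is hypothetical.

```python
import random


def fitness(genome):
    """Toy score: the number of 1 bits (stand-in for recognition accuracy)."""
    return sum(genome)


def mutate(genome, rng, rate=0.02):
    """Return a copy of genome with each bit flipped with probability rate."""
    return [bit ^ 1 if rng.random() < rate else bit for bit in genome]


def evolve(pop_size=50, length=64, rounds=5000, seed=1):
    """Run pairwise tournaments; return (best score at start, best at end)."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    start_best = max(fitness(g) for g in pop)
    for _ in range(rounds):
        i, j = rng.sample(range(pop_size), 2)   # two competitors
        if fitness(pop[i]) < fitness(pop[j]):
            i, j = j, i                         # i is now the winner
        pop[j] = mutate(pop[i], rng)            # loser replaced by mutated winner
    return start_best, max(fitness(g) for g in pop)


if __name__ == "__main__":
    start, end = evolve()
    print("best score at start:", start, " best score at end:", end)
```

Because the winner of each tournament is kept unchanged, the best score in the population can never decrease; random mutation supplies variation while selection supplies direction.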

Closely related are advances in artificial intelligence, in which a set of computer programs "compete" to produce a superior program. One notable example is the 2016 defeat of the world's top Go player by a computer program named AlphaGo, developed by DeepMind (a subsidiary of Alphabet, Google's parent company), in an event that surprised observers who had not expected this for decades, if ever. Then in 2017, DeepMind announced even more remarkable results: their researchers had started from scratch, programming a computer with only the rules of Go, together with a "deep learning" algorithm, and then had the program play games against itself. Within a few days it had advanced to the point that it defeated the earlier champion-beating AlphaGo program 100 games to zero. After one month, the program's rating was as far above the world champion as the world champion was above a typical amateur [Greenmeier2017].

Improbable structures and features in nature

Numerous examples from the natural world can be cited to demonstrate the futility of trying to argue against evolution using probability -- nature can and often does produce highly improbable structures and features, by the well-understood evolutionary processes of mutation, shuffling of genes and natural selection:
  1. Lenski's 2012 E. coli experiment: In January 2012, a research team led by Richard Lenski at Michigan State University demonstrated that colonies of viruses can evolve a new trait in as little as 15 days. The researchers studied a virus, known as "lambda," which infects only the bacterium E. coli. They engineered a strain of E. coli that had almost none of the molecules that this virus normally attaches to, then released them into the virus colony. In 24 of 96 separate experimental lines, the viruses evolved a strain that enabled them to attach to E. coli, using a new molecule that they had never before been observed to utilize. All of the successful runs utilized essentially the same set of four distinct mutations. Justin Meyer, a member of the research team, noted that the chances of all four mutations arising "at random" in a given experimental line (based on a superficial probability argument) are roughly one in 10^27 (one thousand trillion trillion) [Zimmer2012]. Note also that the chances for this to happen in 24 out of 96 experimental lines are roughly one in 10^626.
  2. Synthesis of RNA nucleotides and other biomolecules: Many scientists hypothesize that RNA (a molecule similar to DNA) was involved in the origin of life (see Origin). As recently as 1999, the appearance of RNA nucleotides on the primitive Earth was widely thought to be a "near miracle" by researchers in the field [Joyce1999]. Nonetheless, in May 2009 a team led by John Sutherland of the University of Manchester discovered a particular combination of chemicals, very likely to have been plentiful on the early Earth, that synthesized the RNA nucleotides cytosine and uracil, which are known as the pyrimidines [Wade2009]. More recently, in 2016 a team led by Thomas Carell in Munich, Germany succeeded in synthesizing the remaining two, adenine and guanine, which are known as the purines [Service2016]. Finally, in 2018, Carell's team demonstrated a single process that created all four nucleotides [Service2018]. In short, the natural production of the four RNA nucleotides, once thought to be "impossible," is now fairly well understood.
  3. Hawaiian crickets: In the 1990s, a population of crickets in Hawaii (a species introduced to the islands over 100 years ago) became victims of dive-bombing flies that targeted male crickets who were chirping to attract mates, then implanted their larvae in them. Recently, when researchers visited a region in Kauai that previously was the home to many of these chirping crickets, it was now completely silent, and they feared the crickets were now extinct in the area. Fortunately, nighttime searches found that in fact there were lots of crickets there, but very few of the males now chirped. Further study found that in just five years, or roughly 20 generations, a rather improbable mutation had arisen that inhibited the males from chirping, and this genetic trait had now spread to almost the entire population [Zuk2013, pg. 81-82].
  4. Tibetan high-altitude genes: In 2010, researchers analyzing DNA found that natives of the Tibetan highlands possess 30 unique genes that permit them to live well at very high altitudes: the genes foster more efficient metabolism, prevent the overproduction of red blood cells, and generate higher levels of substances that transmit oxygen to tissue. Given that the Tibetans separated from the Han Chinese only about 3,000 years ago, this is thought to be one of the fastest documented cases of evolution in humans [Wade2010b].
Many other examples could be listed -- see Novelty and Origin.
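
The "24 out of 96 lines" figure quoted in the lambda-virus example can be reproduced with a binomial calculation. The sketch below assumes each experimental line independently succeeds with probability 10^-27, which is exactly the kind of superficial model the text warns against; the function name is illustrative.

```python
from math import comb, log10


def log10_binomial_probability(successes, trials, log10_p):
    """log10 of C(trials, successes) * p^successes, for very small p.

    The factor (1 - p)^(trials - successes) is treated as 1, which is
    accurate to many decimal places when p is around 10^-27.
    """
    return log10(comb(trials, successes)) + successes * log10_p


if __name__ == "__main__":
    result = log10_binomial_probability(24, 96, -27)
    print("naive chance of 24 successes in 96 lines: about 10^%.0f" % result)
```

That an outcome with naive odds of about one in 10^626 was actually observed in the laboratory shows that the naive model, not the experiment, is at fault.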

Dembski's information theory arguments

Intelligent design writer William Dembski invokes both probability and information theory (the mathematical theory of information content in data) in his arguments against Darwinism [e.g., Dembski2002; Marks2017]. However, mathematicians who have examined Dembski's works have identified major flaws in his reasoning [Elsberry2011]. For a detailed discussion of Dembski's theories, see Information theory.

Does creationism provide a reasonable alternative?

Does a creationist worldview, in particular the hypothesis of independent creation of each species with no common biological ancestry, provide a reasonable alternative in terms of probability?

Here it is instructive to consider transposons or "jumping genes," namely sections of DNA that have been "copied" from one part of an organism's genome and "pasted" seemingly at random in other locations. The human genome, for example, has over four million individual transposons in over 800 families [Mills2007]. In most cases transposons do no harm, because they "land" in an unused section of DNA, but because they are inherited they serve as excellent markers for genetic studies. Indeed, transposons have been used to classify a large number of vertebrate species into a family tree, with a result that is virtually identical to what biologists had earlier reckoned based only on physical features and biological functions [Rogers2011, pg. 25-31, 86-92]. As just one example, consider the following table, where columns labeled ABCDE denote five blocks of transposons, and x and o denote that the block is present or absent in the genome [Rogers2011, pg. 89].

						Transposon blocks
			Species		A	B	C	D	E
        /---------	Human		o	x	x	x	x
       /----------	Bonobo		x	x	x	x	x
      / \---------	Chimp		x	x	x	x	x
     /------------	Gorilla		o	o	x	x	x
-----|------------	Orangutan	o	o	o	x	x
     \------------	Gibbon		o	o	o	o	o
It is clear from these data that our closest primate relatives are chimpanzees and bonobos. As another example, here is a classification of four cetaceans (ocean mammals) based on transposon data [Rogers2011, pg. 27]:
						Transposon blocks
		Species			A  B  C  D  E  F  G  H  I  J  K  L  M  N  O  P
    /------	Bottlenose dolphin	x  x  x  x  x  x  x  x  x  x  x  x  x  x  x  x
   /\------	Narwhal whale		x  x  x  x  x  x  x  x  x  x  x  x  x  x  x  x
---|-------	Sperm whale		x  x  x  x  x  o  o  o  o  o  o  o  o  o  o  o
   \-------	Humpback whale		x  x  o  o  o  o  o  o  o  o  o  o  o  o  o  o
Other examples could be listed, encompassing an even broader range of species [Rogers2011, pg. 25-31, 86-92].

Needless to say, these data, which all but scream "descent from common ancestors," are highly problematic for creationists and others who hold that the individual species were separately created without common biological ancestry. Transposons typically are several thousand DNA base pair letters long, but, since there are often some disagreements from species to species, let us be very conservative and say only 1000 base pair letters long. Then for two species to share even one transposon starting at the same spot, presumably only due to random mutations since creation, the probability (according to the creationist hypothesis) is one in 4^1000 or roughly one in 10^600. For 16 such common transposons, the chances are one in 4^16000 or roughly one in 10^9600. What's more, as mentioned above, an individual species typically has at least several hundred thousand such transposons. Including even part of these in the reckoning would hugely multiply these odds.
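
The exponents in this paragraph follow from a one-line computation. Here is a small Python sketch, assuming (as the text does) four equally likely bases and 1000 matching letters per shared transposon; the function name and its default arguments are introduced only for illustration.

```python
import math


def log10_odds_against_shared(n_transposons, letters_each=1000, bases=4):
    """log10 of bases^(n_transposons * letters_each)."""
    return n_transposons * letters_each * math.log10(bases)


if __name__ == "__main__":
    for n in (1, 16):
        exponent = log10_odds_against_shared(n)
        print("%2d shared transposon(s): one in about 10^%.0f" % (n, exponent))
```

The rounded figures of 10^600 and 10^9600 in the text correspond to the slightly larger exact exponents printed by this sketch.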

But this is not all, because we have not yet considered the fact that in each diagram above, or in other tables of real biological transposon data, there is a clear hierarchical relationship. This is by no means assured, and in fact is quite improbable -- for almost all tables of "random" data, there is no hierarchical pattern, and no way to rearrange the rows to be in a hierarchical pattern. For example, in a computer run programmed by the present author, each column of the above cetacean table was pseudorandomly shuffled (thus maintaining the same number of x and o in each column), and the program checked whether the rows of the resulting table could be rearranged to be in a hierarchical order. There were no successes in 10,000,000 trials. As a second experiment, a 4 x 16 table of pseudorandom data (with a 50-50 chance of x or o) was generated, and then the program attempted to rearrange the rows to be in a hierarchical pattern as before. There were only three successes in 10,000,000 trials.
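
A column-shuffling experiment of this kind can be sketched as follows. This is not the author's program, and the text does not spell out its exact test for "hierarchical"; the sketch assumes one common criterion, namely that the sets of species carrying any two transposon blocks must be either nested or disjoint. Under a different criterion the success counts could differ, so the function names and the criterion itself should be read as assumptions.

```python
import random
from itertools import combinations

# Cetacean table from the text: one string per species, columns A..P.
CETACEANS = [
    "xxxxxxxxxxxxxxxx",  # Bottlenose dolphin
    "xxxxxxxxxxxxxxxx",  # Narwhal
    "xxxxxooooooooooo",  # Sperm whale
    "xxoooooooooooooo",  # Humpback whale
]


def column_sets(table):
    """For each column, the set of row indices marked 'x'."""
    return [frozenset(r for r, row in enumerate(table) if row[c] == "x")
            for c in range(len(table[0]))]


def is_hierarchical(table):
    """Assumed criterion: every pair of columns is nested or disjoint."""
    distinct = set(column_sets(table))
    return all(a <= b or b <= a or a.isdisjoint(b)
               for a, b in combinations(distinct, 2))


def shuffle_columns(table, rng):
    """Shuffle each column independently, keeping its count of x and o."""
    columns = [list(col) for col in zip(*table)]
    for col in columns:
        rng.shuffle(col)
    return ["".join(row) for row in zip(*columns)]


def count_hierarchical(table, trials, seed=0):
    """Number of shuffled tables (out of trials) that pass the test."""
    rng = random.Random(seed)
    return sum(is_hierarchical(shuffle_columns(table, rng))
               for _ in range(trials))


if __name__ == "__main__":
    print("real table hierarchical:", is_hierarchical(CETACEANS))
    print("hierarchical shuffles in 20000 trials:",
          count_hierarchical(CETACEANS, 20000))
```

The trial count here is kept far below the 10,000,000 used in the text so that the sketch runs in seconds; raising the trials argument brings it closer to the original experiment.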

Like the calculations mentioned earlier, these calculations are simplified and informal; more careful reckonings can be done, and one can vary the underlying assumptions. But, again, do the fine details of the calculations really matter? One way or the other, it is clear that the creationist hypothesis of separate creation does not resolve any probability paradoxes; instead it enormously magnifies them. The only other possibility, from a strict creationist worldview, is to posit that a supreme being separately created species with hundreds of thousands of transposons already in place, essentially just as we see them today.

But this merely replaces a scientific disaster (the utter failure of the creationist model to explain the vast phylogenetic patterns in transposon data) with a theological disaster (why did a truth-loving supreme being fill the genomes of the entire biological kingdom with vast amounts of misleading DNA evidence, all pointing unambiguously to an evolutionary descent from common ancestors, if that is not the conclusion we are to draw?). Indeed, with regard to the discomfort some have about evolution, the creationist alternative of separate creation is arguably far worse, both scientifically and theologically.


It is true that there are some perplexing features of evolution from the point of view of probability. Even at the molecular level, structures are seen that appear to be exceedingly improbable. What is the origin of these structures? How did they evolve to the forms we see today? In spite of many published papers on these topics, researchers in the field would be the first to acknowledge that there is still much that is not yet fully understood.

However, arguments based on probability, statistics or information theory that have appeared in the creationist-intelligent design literature do not help unravel these questions, because these arguments have serious fallacies:

  1. They presume that a given biomolecule came into existence "at random" via an all-at-once chance assemblage of atoms. But this is not the scientific hypothesis of how they formed. Instead, numerous published studies, covering many biomolecules, indicate that these biomolecules were the result of a long series of intermediate steps over the eons, each useful in a previous biological context. Thus all such "straw man" arguments are fatally flawed from the beginning.
  2. They apply faulty mathematical reasoning, such as by ignoring the fact that a very wide range of biomolecules could perform a similar function to the given biomolecule. Thus the odds they provide against the formation of the given biomolecule are greatly exaggerated.
  3. They ignore the fact that biological evolution is fundamentally not a purely "random" process -- mutations may be random, but natural selection is far from random.
  4. They ignore reams of evidence from the natural world that evolution can and often does produce highly improbable structures and features.
  5. Some writers attempt to invoke advanced mathematical concepts (e.g., information theory), but derive highly questionable results and misapply these results in ways that render the conclusions invalid in an evolutionary biology context.
  6. The creationist hypothesis of separate creation for each species does not resolve any probability paradoxes; instead it enormously magnifies them.
It is ironic that to the extent that such probability-based arguments have any validity at all, it is precisely the creationist hypothesis of separate, all-at-once complete formation that is falsified.

Perhaps at some time in the distant future, a super-powerful computer will be able to simulate with convincing fidelity the multi-billion-year biological history of the Earth, in the same way that scientists today attempt to simulate (in a much more modest scope) the Earth's weather and climate. Then, after thousands of such simulations have been performed, with different starting conditions, we might obtain some meaningful statistics on the chances involved in the origin of life, or in the formation of some class of biological structures such as hemoglobin, or in the rise of intelligent creatures such as ourselves.

Until that time, probability calculations that appear in creationist-intelligent design literature and elsewhere should be viewed with great skepticism, to say the least. As mathematician Jason Rosenhouse writes [Rosenhouse2018],

When biologists ascribe to evolution the ability to craft information-rich genomes, they are neither speculating nor guessing. The basic components of evolutionary theory are empirical facts. Genes really do mutate, sometimes leading to new functionalities. The process of gene duplication with subsequent divergence leads to the creation of information by any reasonable definition of the terms. Selection can string small variations together into directional change. On a small scale, this has all been observed. And if small increases in information are an empirical reality on human timescales, then what abstract principle of mathematics is going to rule out much larger increases on geological scales?

Then here come the ID [intelligent design] folks, full of swagger and bravado. They say the accumulated empirical evidence must yield before their back-of-the-envelope probability calculations and abstract mathematical modeling. Evolution should be abandoned in favor of the new theory of intelligent design. This theory states, in its entirety, that an intelligent agent of unspecified motives and abilities did something at some point in natural history. Not very useful.

In a larger context, one has to question whether highly technical issues such as calculations of probabilities have any place in a discussion of religion. Why attempt to "prove" God with probability, particularly when there are very serious questions as to whether such reasoning is valid? One is reminded of a passage in the New Testament: "For if the trumpet gives an uncertain sound, who shall prepare himself for the battle?" [1 Cor. 14:8]. It makes far more sense to leave such matters to peer-reviewed scientific research.

See DNA, English text, Information theory, Origin and Deceiver for additional related discussion.


[See Bibliography].