What does DNA reveal about evolution?

By David H Bailey, on November 29th, 2022

Updated 7 August 2023

Introduction

In the past few years, modern genome sequencing and computer technology have placed an enormous volume of DNA data only a mouse-click away from researchers worldwide. The first complete human genome sequence was finished in 2000, after a ten-year effort that cost approximately $2.7 billion. But now genomes can be sequenced at a cost of less than $300, and the price will likely fall to only $100 by 2023 [Vance2014; Fikes2017; Coldewey2022]. Medical applications are the principal target of genome sequencing, but this same technology has also enabled biologists to study the genomes of thousands of other biological species, including many common (and not-so-common) plants and animals, thus permitting evolution to be studied at the most basic level.

Here are just a few of the recent results of these studies.

Amino acid data

One example of DNA-type data is the table below, which compares the 146-unit amino acid sequences of beta globin (a component of hemoglobin) among various species of animals. Amino acids are coded directly by triplets of DNA letters, and thus the study of amino acid sequences is very close to the study of DNA sequences themselves. Note that human beta globin is identical to that of chimpanzees, differs in only one location from that of gorillas, yet is increasingly distinct from that of red foxes, polar bears, horses, rats, chickens and salmon. Anyone with an Internet connection can generate similar data using online tools and databases:

Percent Agreement between Beta Globin of Various Species
Species	Human	Chimp	Gorilla	Red fox	Dog	Polar bear	Horse	Rat	Chicken	Salmon
Human	100.	100.	99.3	91.1	89.7	89.7	83.6	81.5	69.2	49.7
Chimp	100.	100.	99.3	91.1	89.7	89.7	83.6	81.5	69.2	49.7
Gorilla	99.3	99.3	100.	91.8	90.4	90.4	82.9	80.8	68.5	49.0
Red fox	91.1	91.1	91.8	100.	98.6	95.2	80.8	80.1	72.6	49.7
Dog	89.7	89.7	90.4	98.6	100.	94.5	80.1	79.5	71.2	49.0
Polar bear	89.7	89.7	90.4	95.2	94.5	100.	80.8	82.9	71.9	48.3
Horse	83.6	83.6	82.9	80.8	80.1	80.8	100.	76.0	67.8	46.3
Rat	81.5	81.5	80.8	80.1	79.5	82.9	76.0	100.	65.8	49.7
Chicken	69.2	69.2	68.5	72.6	71.2	71.9	67.8	65.8	100.	54.4
Salmon	49.7	49.7	49.0	49.7	49.0	48.3	46.3	49.7	54.4	100.
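
Each entry in the table is a percent identity: the share of aligned positions at which two sequences carry the same amino acid. A minimal Python sketch of that calculation follows. The function name percent_identity is illustrative, the sequences are placeholders rather than real beta globin data, and the sketch assumes the two sequences have already been aligned to the same length.

```python
def percent_identity(seq_a, seq_b):
    """Percentage of aligned positions at which two equal-length sequences agree."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


# Placeholder sequences (NOT real beta globin): 146 units, differing at one position.
reference = "A" * 146
one_change = "A" * 145 + "V"

# A single difference out of 146 positions gives 145/146, which rounds to 99.3,
# the human-gorilla figure in the table.
print(round(percent_identity(reference, one_change), 1))  # prints 99.3
```

Real comparisons would start from sequences downloaded from a public database and aligned with a standard alignment tool, since insertions and deletions can shift positions.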

Mutations

The picture is the same if we consider the pattern of mutations between closely related species. For example, the gene that, when mutated, results in cystic fibrosis in humans is nearly identical to the corresponding gene in chimpanzees, but is progressively less similar to the corresponding gene in orangutans, baboons, marmosets, lemurs, mice, chickens and puffer fish [NAS2008, pg. 30]. As yet another example, cytochrome c, which is essential for cell respiration, differs in only one location out of 104 between humans and rhesus monkeys. Comparing humans and horses, there are 12 differences; comparing rhesus monkeys with horses, there are 11 differences. Evidently the single difference between humans and rhesus monkeys occurred after our hominid ancestors split from the lineage that led to present-day monkeys [Ayala2007, pg. 128-129].

One particularly interesting example that has recently been uncovered is the “GULO” gene, which is an essential part of the machinery that makes Vitamin C in most animals. Humans lack a functioning copy of this gene; our copy is a highly mutated fragment, classified as a relic gene or pseudogene. Scurvy, that scourge of British seamen, Mormon pioneers crossing the Great Plains, and millions in poor regions worldwide even today, results when humans don’t get enough Vitamin C. Interestingly, although the GULO pseudogene is highly mutated and utterly useless, humans and chimpanzees have almost identical copies of it: the human and chimp versions are 98% identical. Evidently a common ancestor of humans and chimps adopted a diet rich in fruits and vegetables, and thus a chance mutation that disabled Vitamin C production was no longer a fatal one and was passed on to posterity [Fairbanks2007, pg. 53-55; Coyne2009, pg. 67-69].

Transposons

Transposons or “jumping genes” are sections of DNA that have been “copied” from one part of an organism’s genome and “pasted” seemingly at random in other locations. The human genome, for example, has over four million individual transposons in over 800 families [Mills2007]. In most cases transposons do no harm, because they “land” in an unused section of DNA, but because they are inherited they serve as excellent markers for genetic studies. Indeed, transposons have been used to classify a large number of vertebrate species into a family tree, with a result that is virtually identical to what biologists had earlier reckoned based only on physical features and biological functions [Rogers2011, pg. 25-31, 86-92]. As just one example, consider the following table, where columns labeled ABCDE denote five blocks of transposons, and x and o denote that the block is present or absent in the genome [Rogers2011, pg. 89].

                                Transposon blocks
                    Species     A   B   C   D   E
        /--------   Human       o   x   x   x   x
       /---------   Bonobo      x   x   x   x   x
      / \--------   Chimp       x   x   x   x   x
     /-----------   Gorilla     o   o   x   x   x
-----|-----------   Orangutan   o   o   o   x   x
     \-----------   Gibbon      o   o   o   o   o
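
One simple way to read this table is to count how many transposon blocks each species shares with humans. The Python sketch below does only that; the dictionary PRIMATES transcribes the rows above, and the function names are illustrative. Counting shared blocks is a deliberately crude summary and is not the method used in [Rogers2011].

```python
# Rows of the primate table: "x" = block present, "o" = block absent (columns A-E).
PRIMATES = {
    "Human":     "oxxxx",
    "Bonobo":    "xxxxx",
    "Chimp":     "xxxxx",
    "Gorilla":   "ooxxx",
    "Orangutan": "oooxx",
    "Gibbon":    "ooooo",
}


def shared_blocks(row_a, row_b):
    """Number of blocks that are present ("x") in both rows."""
    return sum(a == "x" and b == "x" for a, b in zip(row_a, row_b))


def rank_by_shared_blocks(table, focus):
    """Other species, sorted by how many blocks they share with `focus`."""
    scores = [
        (name, shared_blocks(table[focus], row))
        for name, row in table.items()
        if name != focus
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)


for name, count in rank_by_shared_blocks(PRIMATES, "Human"):
    print(f"{name:10s} shares {count} block(s) with Human")
```

Bonobo and chimp each share four blocks with humans, gorilla three, orangutan two and gibbon none, which matches the branching order drawn at the left of the table.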

It is clear from these data that our closest primate relatives are chimpanzees and bonobos. As another example, here is a classification of four cetaceans (ocean mammals) based on transposon data [Rogers2011, pg. 27]:

                                 Transposon blocks
             Species             A B C D E F G H I J K L M N O P
    /------  Bottlenose dolphin  x x x x x x x x x x x x x x x x
   /\------  Narwhal whale       x x x x x x x x x x x x x x x x
---|-------  Sperm whale         x x x x x o o o o o o o o o o o
   \-------  Humpback whale      x x o o o o o o o o o o o o o o

Other examples could be listed, encompassing an even broader range of species [Rogers2011, pg. 25-31, 86-92].

Needless to say, these data, which all but scream “descent from common ancestors,” are highly problematic for creationists and others who hold that the individual species were separately created without common biological ancestry. Transposons typically are several thousand DNA base pair letters long, but, since there are often some disagreements from species to species, let us be very conservative and say only 1000 base pair letters long. Then for two species to share even one transposon starting at the same spot, presumably only due to random mutations since creation, the probability (according to the creationist hypothesis) is one in 4¹⁰⁰⁰ or roughly one in 10⁶⁰⁰. For 16 such common transposons, the chances are one in 4¹⁶⁰⁰⁰ or roughly one in 10⁹⁶⁰⁰. What’s more, as mentioned above, an individual species typically has at least several hundred thousand such transposons. Including even part of these in the reckoning would hugely multiply these odds.
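
The powers of ten quoted above can be checked with logarithms, since 4 raised to the power n equals 10 raised to the power n times log10(4). The short Python sketch below does this check; the function name is illustrative, and it verifies only the arithmetic, not the modeling assumptions behind the estimate.

```python
import math


def log10_of_power(base, exponent):
    """Return log10(base ** exponent) without forming the huge number itself."""
    return exponent * math.log10(base)


# One shared 1000-letter transposon: 4**1000 is about 10**602.
print(round(log10_of_power(4, 1000)))    # prints 602

# Sixteen such transposons: 4**16000 is about 10**9633.
print(round(log10_of_power(4, 16000)))   # prints 9633
```

The exponents 600 and 9600 used in the text are these values rounded down, so the stated odds are slightly conservative.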

But this is not all, because we have not yet considered the fact that in each diagram above, or in other tables of real biological transposon data, there is a clear hierarchical relationship. This is by no means assured, and in fact is quite improbable: for almost all tables of “random” data, there is no hierarchical pattern, and no way to rearrange the rows to be in a hierarchical pattern. For example, in a computer run programmed by the present author, each column of the above cetacean table was pseudorandomly shuffled (thus maintaining the same number of x and o in each column), and the program checked whether the rows of the resulting table could be rearranged to be in a hierarchical order. There were no successes in 10,000,000 trials. As a second experiment, a 4 x 16 table of pseudorandom data (with a 50-50 chance of x or o) was generated, and then the program attempted to rearrange the rows to be in a hierarchical pattern as before. There were only three successes in 10,000,000 trials.
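
The author's program is not reproduced here, so the following Python sketch is only one possible reading of these two experiments. It assumes a specific definition: a table is hierarchical when, for every pair of columns, the sets of species carrying those two blocks are either nested or disjoint. Under that assumption the test does not depend on row order, so no explicit rearrangement step is needed. All names are illustrative, and the default trial count is far smaller than the 10,000,000 used above, so the counts it prints are not expected to reproduce the figures in the text.

```python
import random
from itertools import combinations

# Rows of the cetacean table: "x" = block present, "o" = absent (columns A-P).
CETACEAN = [
    "xxxxxxxxxxxxxxxx",   # Bottlenose dolphin
    "xxxxxxxxxxxxxxxx",   # Narwhal
    "xxxxx" + "o" * 11,   # Sperm whale
    "xx" + "o" * 14,      # Humpback whale
]


def carrier_sets(rows):
    """For each column, the set of row indices in which the block is present."""
    return [
        frozenset(i for i, row in enumerate(rows) if row[j] == "x")
        for j in range(len(rows[0]))
    ]


def is_hierarchical(rows):
    """Assumed criterion: every pair of carrier sets is nested or disjoint."""
    return all(
        a <= b or b <= a or not (a & b)
        for a, b in combinations(carrier_sets(rows), 2)
    )


def shuffle_columns(rows, rng):
    """Shuffle each column independently, keeping its number of x and o."""
    columns = [list(col) for col in zip(*rows)]
    for col in columns:
        rng.shuffle(col)
    return ["".join(cells) for cells in zip(*columns)]


def random_table(n_rows, n_cols, rng):
    """Table in which every cell is x or o with equal probability."""
    return ["".join(rng.choice("xo") for _ in range(n_cols)) for _ in range(n_rows)]


def count_shuffled_successes(rows, trials, seed=0):
    """How many column-shuffled versions of `rows` are hierarchical."""
    rng = random.Random(seed)
    return sum(is_hierarchical(shuffle_columns(rows, rng)) for _ in range(trials))


def count_random_successes(n_rows, n_cols, trials, seed=0):
    """How many purely random tables of the given shape are hierarchical."""
    rng = random.Random(seed)
    return sum(is_hierarchical(random_table(n_rows, n_cols, rng)) for _ in range(trials))


if __name__ == "__main__":
    print("original table hierarchical:", is_hierarchical(CETACEAN))
    print("shuffled successes:", count_shuffled_successes(CETACEAN, 10_000))
    print("random 4 x 16 successes:", count_random_successes(4, 16, 10_000))
```

The nested-or-disjoint test is a standard way to decide whether presence/absence data fit a single rooted tree, but it may be stricter or looser than the check the author actually ran.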

Chromosome fusion in humans

DNA evidence has also dramatically confirmed some earlier conjectures. For example, scientists noted long ago that humans have only 23 pairs of chromosomes, whereas other great apes — chimpanzees, bonobos, gorillas and orangutans — have 24. Thus they were led to conjecture that two of the human chromosomes have fused since the split between ancestral human and ape lineages. This hypothesis gained credence in 1982, when scientists found that chromosomes from humans, chimpanzees, gorillas and orangutans are highly similar and can be aligned with one another, with human chromosome #2 corresponding to the slightly overlapped union of ape chromosomes 2A and 2B. The final confirmation came in 1991 from a detailed analysis of human DNA, which found two complementary telomeres (repeated sequences of a certain DNA string that appear at the end of a chromosome) spanning the exact spot of union [Fairbanks2007, pg. 20-27; Fairbanks2012, pg. 135-139]:

                  Fusion site
                       |
... TTAGGGG TTAGGG TTAG CTAA CCCTAA CCCTAA ...
... AATCCCC AATCCC AATC GATT GGGATT GGGATT ...
                       |

(Note that the second row is almost exactly a reversal of the first, pivoted about the fusion site.)
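
The structure of the fusion site can be checked directly from the two rows shown. The Python sketch below confirms that the second row is the base-by-base complement of the first, and that the sequence to the right of the fusion site, read on the opposite strand, consists of the same TTAGGG repeats that appear on the left. TTAGGG is the standard human telomere repeat; the variable and function names are illustrative, and the fusion position is taken from the spacing of the diagram.

```python
# Map each DNA letter to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(seq):
    """Base-by-base complement, read in the same direction."""
    return seq.translate(COMPLEMENT)


def reverse_complement(seq):
    """The same stretch of DNA as read along the opposite strand."""
    return complement(seq)[::-1]


# The two rows from the diagram, with the spacing removed.
TOP = "TTAGGGG TTAGGG TTAG CTAA CCCTAA CCCTAA".replace(" ", "")
BOTTOM = "AATCCCC AATCCC AATC GATT GGGATT GGGATT".replace(" ", "")

# The fusion site falls after the third group of the first row.
FUSION = len("TTAGGGGTTAGGGTTAG")
left, right = TOP[:FUSION], TOP[FUSION:]

print(complement(TOP) == BOTTOM)    # prints True
print(left)                         # prints TTAGGGGTTAGGGTTAG
print(reverse_complement(right))    # prints TTAGGGTTAGGGTTAG
```

Apart from one extra G on the left, the two halves are the same telomere repeat pointing in opposite directions, which is what two chromosome ends joined head to head would look like.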

As it turns out, there are numerous other examples of chromosome rearrangements in the human genome. Inversions (where a segment of DNA consisting of hundreds of thousands or millions of base pairs is inverted) have occurred within several of our chromosomes, and can be employed as a means of understanding our species’ evolutionary history. For example, there are nine inversions that distinguish human and chimpanzee chromosomes, each of which must have arisen in either the human or chimpanzee ancestral lineage after these two lineages diverged. Comparisons with other primates show that the inversions that have been identified in human chromosomes 1 and 18 occurred exclusively in the human lineage, while those in chromosomes 4, 5, 9, 12, 15, 16 and 17 happened exclusively in the chimpanzee lineage [Fairbanks2012, pg. 139]. Numerous other such examples are given in Fairbanks’s book Evolving: The Human Effect and Why It Matters [Fairbanks2012].

The genetic code and evolution

Although the “genetic code,” namely the system of assigning 3-letter DNA sequences to one of the 20 amino acids employed in biology, is universal over almost all the biological kingdom, there are a few exceptions. For example, mitochondria, the little “islands” within a cell that generate most of the cell’s chemical energy, employ a slightly different version. In total, scientists have by now identified 34 different codes in the biological kingdom. Yet, as biologist Kenneth Miller observes, these variant genetic codes are all neatly arranged in a hierarchical pattern, like variant dialects of English, which pattern is compelling evidence for their common ancestry [Miller2001]. An overview of the genetic code, and of how biologists are now exploiting their knowledge of the code for a wave of new developments in medicine and pharmaceuticals, is given in a recent Nautilus article by science writer Carl Zimmer [Zimmer2013].

DNA data and the human-chimpanzee split

Researchers are also combining analyses of DNA sequences with paleontological (fossil) data, resulting in more precise determinations of various branches in the tree of life. For example, a study published in November 2010 that combined both paleontological and molecular data established that the divergence of humans and chimpanzees very likely took place at least eight million years in the past instead of five to six million years, as generally believed until recently [SD2010d; Wilkinson2010]. In November 2012, these assertions gained greater empirical support from new studies that observed a rate of 36 new mutations per human generation (half the earlier estimate), which was obtained from whole-genome DNA sequencing of 78 children and their parents. As a result of such analyses, the current consensus is that the human-chimpanzee split occurred between 7 and 13 million years ago [Brahic2012]. [Diagram of the human-chimpanzee divergence timeline, courtesy of New Scientist.]

DNA and the human-Neanderthal-Denisovan family tree

In December 2013, a team of researchers led by Svante Paabo of the Max Planck Institute for Evolutionary Anthropology in Leipzig, Germany announced that they had extracted the entire genome of a 130,000-year-old Neanderthal from a single toe bone found in a cave in Siberia. Sarah Tishkoff of the University of Pennsylvania commented on this development by saying, “Twenty years ago, I would have thought this would never be possible.” This data, together with other recent studies, has established that humans, Denisovans (see next paragraph) and Neanderthals represent three different branches on a common tree that diverged roughly 600,000 years ago, but that there has been significant genetic sharing through interbreeding since then [Zimmer2013c].

The Denisovan branch of the prehuman tree was first revealed in March 2010, when Paabo and his team announced the results of analyzing mitochondrial DNA (mtDNA) extracted from a finger and a tooth found in the Denisova region of Siberia. As explained in a Scientific American report, this species co-existed with humans until as recently as 50,000 years ago, yet is roughly twice as distant (measured in terms of the time since a common ancestor) from modern humans as Neanderthals are [Wong2010a]. A follow-up study published in December 2010, based on an entire genome sequence of the specimen, found that not only do the Denisovans represent a “sister” species to Neanderthals, but that in fact this race of prehumans evidently interbred with Southeast Asian humans, since the genomes of modern-day New Guinea natives contain 4.8% Denisovan DNA [Zimmer2010a]. What’s more, an additional study published in August 2011 noted that interbreeding with Neanderthals and Denisovans actually boosted human immunity to viruses [McGrath2011].

In a surprising new development, announced in December 2013, researchers at the Max Planck Institute retrieved DNA from an ancient hominin fossil 400,000 years old found in Spain. It is easily the oldest specimen ever to have its DNA analyzed. These researchers had expected the specimen to be a forerunner of Neanderthals, but its DNA more closely resembles that of the Denisovan lineage, mentioned above. This raises the possibility that the Spanish specimen might belong to yet another branch of ancient prehumans, or even be a remnant of Homo erectus, which originated roughly 1.8 million years ago but was thought to be extinct more recently. Either way, researchers are puzzled and excited by the new discovery [Callaway2013b; Zimmer2013b].

Along this line, researchers have utilized DNA analysis to determine that the human colonization of Asia proceeded in two separate waves, the first of which, approximately 70,000 years ago, continued to Australia and formed the basis for the Aboriginal population in modern-day Australia. The second wave, approximately 30,000 years ago, proceeded more northerly and ended in Malaysia. As part of their analysis, these researchers sequenced the genome of an Aboriginal Australian man, using a 100-year-old lock of hair stored in London’s Natural History Museum [Marshall2011b].

The family tree of spiny-finned fish

Recently an international team of researchers presented research on the evolution and proliferation, over the past 150 million years, of spiny-rayed fish (also known as spiny-finned fish, but more formally as acanthomorphs), which constitute nearly one-third of all currently living vertebrate species on Earth. According to the authors, this group of fish species has long been “the last frontier” in reconstructing the family tree of modern vertebrates. The team’s research was based on DNA-genetic analysis, but they also correlated their results with known events in paleontological (fossil) history, such as the Cretaceous-Paleogene mass extinction event 66 million years ago. The authors summarized their findings in a lovely graphic, which may be viewed at Fossils, or directly at the PNAS site. For details on the research, see [Frazer2013; Near2013].

“Missing link” discovered between archaea and eukaryotes

In May 2015, researchers at Uppsala University in Sweden announced that they had discovered a new species of archaea (microbes of an ancient branch of the biological kingdom) in sediment at the bottom of the Arctic Ocean. DNA analysis of these archaea found that they are more closely related to eukaryotes (the branch of the biological kingdom that includes all animals, plants, and fungi) than any other species of archaea, or anything else for that matter. Thus it may well be that this new species is a long-sought “missing link” between archaea and eukaryotes [Zimmer2015a].

DNA and phylogenetics

Another research arena that is exploding with activity is phylogenetics — analyzing DNA of groups of existing species to reconstruct their “family tree.” The latest techniques employ advanced statistical methods (e.g., “maximum likelihood analysis”), running on powerful computer systems. Soon much of evolutionary history will be deducible purely from this type of automatic computer-based analysis. Already, significant results have been obtained in this area. In May 2010, a researcher announced, on the basis of a very carefully performed statistical analysis, that the hypothesis of a “universal common ancestor” (a conjecture, dating back to Charles Darwin, that all life arose from a single common ancestral organism) has been resoundingly confirmed. The author, Prof. Douglas L. Theobald of Brandeis University, found that the universal common ancestor hypothesis is at least 10²⁸⁶⁰ times more likely to have produced the modern-day protein sequences that we observe in living organisms, compared to the next most probable scenario that involves multiple original ancestors [Harmon2010; Theobald2010].

It should be emphasized that DNA-based phylogenetics, like any other experimental discipline, does not produce infallible results. When different genes are analyzed, sometimes slightly different “trees” are produced. Even analyses of simple organisms such as yeast species sometimes produce slightly different results. Thus an active area of research at the present time is to apply advanced statistical methods to DNA data to identify sets of genes that are the most reliable to be used in phylogenetic determinations [Singer2013]. However, these discrepancies tend to be rather minor and do not in any way cast doubt on the overall methodology. No one familiar with this data questions that DNA sequences provide a reliable genetic record of evolution.

Along this line, a research study found that there are some 50 trillion trillion trillion (i.e., 5 x 10³⁷) DNA base pairs (“letters”) among all the biological organisms presently on planet Earth. So scientists have plenty of data yet to study! [Nuwer2015].

DNA in forensics

It is also worth pointing out that DNA evidence is widely used in criminal forensics. Biologist Sean Carroll mentions the case of Kevin Green, who in 1979 had been convicted of the attempted murder of his wife Dianna Green and the actual murder of her unborn child, who died in the beating. But in 1996, forensic researchers at the California Department of Justice, after analyzing 17-year-old samples collected at the crime scene, concluded that the DNA matched not Green but a different man, who was then in prison on another charge. Green was thus released from prison [Carroll2006, pg. 13-14].

Indeed, DNA evidence, after decades of refinement of the underlying experimental procedures and methods of analysis, fully deserves its gold-standard reputation in the criminal forensic field. It is now commonplace for society to condemn accused persons to a lifetime in prison, and, in other cases, free convicted killers, all on the basis of DNA evidence. It is, quite literally, a matter of life and death. And yet the underlying principles and techniques widely used in DNA analysis in forensics are essentially the same as those employed in biological phylogenetics, and are just as reliable.

Summary

The explosion of genome sequences and DNA data banks in recent years has provided an enormous storehouse of data for biologists. Analyses of these data have dramatically confirmed the central tenets of evolution, including the common ancestry of all biological organisms, all arranged convincingly in a phylogenetic family tree, in most cases exactly as had been previously reckoned based solely on similarities of physical forms and biological functions. As anthropologist Alan R. Rogers recently noted, “Phylogenetic pattern is everywhere in nature. It makes sense only if all living things evolved from a single ancestor” [Rogers2011, pg. 31].

Biologist Sean Carroll adds that DNA evidence “clinches the case for biological evolution as the basis for life’s diversity, beyond any reasonable doubt” [Carroll2006, pg. 17]. Similarly, geneticist Daniel J. Fairbanks emphasizes that [Fairbanks2007, pg. 170]:

[The] obvious hierarchical arrangement of life, and the literally millions of ancestral relics in our DNA — all undeniably attest to our common evolutionary origin with the rest of life. If someone can believe that all living organisms share the same creator, why not consider that all living organisms share a common genetic heritage?
