Academic plagiarism and evolution

By David H Bailey, on January 6th, 2013

Introduction

Plagiarism, namely copying the text or ideas of others without explicit citation or permission, is considered a serious breach of ethics in the academic world. Even copying with permission is a breach of ethics in many environments, such as classroom instruction, where original, independent work is required of each individual. Nowadays many leading scientific journals and conferences employ sophisticated plagiarism-detecting software, which can often detect overlap of even a few consecutive words of text with previously published papers. Individual teachers and researchers often use Google searches and the like to ascertain whether written material is original.
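The idea of detecting "overlap of a few consecutive words" can be illustrated with a minimal sketch. It assumes a plain comparison of runs of consecutive words; real detection software is considerably more sophisticated, and the function names and sample sentences below are hypothetical.

```python
import re


def word_ngrams(text, n=4):
    """Return the set of runs of n consecutive words in text (lowercased)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def shared_ngrams(text_a, text_b, n=4):
    """Return the n-word runs that appear in both texts."""
    return word_ngrams(text_a, n) & word_ngrams(text_b, n)


# Hypothetical sample sentences, for illustration only.
paper_a = "The mitochondria is widely known as the powerhouse of the cell."
paper_b = "Many textbooks say it is widely known as the powerhouse of every cell."

overlap = shared_ngrams(paper_a, paper_b)
print(sorted(" ".join(gram) for gram in overlap))
```

Here the two sentences share four distinct four-word runs, all drawn from the phrase "is widely known as the powerhouse of." A longer run length n gives fewer chance matches but also misses lightly reworded copying.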

Biologist Kenneth Miller recounts how he once detected an instance of plagiarism in one of his biology classes. He found two papers that were curiously similar, even though there had been attempts to disguise the fact, such as rearranged paragraphs. But the clincher for the case was that each student had misspelled the same six words in exactly the same way. When confronted with this fact, the students recognized that the evidence was overwhelmingly against them, and they surrendered to the school’s disciplinary system [Miller2008, pg. 100].

“Thought experiment” of plagiarism

It is worth considering the following “thought experiment” of a more elaborate form of academic plagiarism that might arise in a classroom setting.

Suppose that a professor, who has given an assignment for a term paper, notes that six submitted papers have remarkable similarities, with extensive sections of identical or nearly identical material. Suppose further that the professor notices five instances of a particularly odd type of error, where a section of text that is otherwise nonsensical in context has been inserted into the text (say by accidental cut-and-paste), and that several of these error blocks appear in more than one of the term papers. Let the six students be denoted 1 through 6, the five error blocks be A through E, and let x or o denote whether this error is observed in the student’s paper. Suppose these error blocks can be organized as follows:

						Error blocks
			Student		A	B	C	D	E
        /---------	1		o	x	x	x	x
       /----------	2		x	x	x	x	x
      / \---------	3		x	x	x	x	x
     /------------	4		o	o	x	x	x
-----|------------	5		o	o	o	x	x
     \------------	6		o	o	o	o	o

It is clear that there is a hierarchical structure to these error blocks, as indicated in the diagram on the left: student #6’s paper appears closest to the “original,” with student #5 copying from #6, student #4 copying from #5, student #1 copying from #4, one of students #2 and #3 copying from #1, and the other of that pair copying from the first. Another possibility is that these students copied from additional manuscripts, not seen here, which nonetheless also fell into this hierarchical pattern. But it is abundantly clear that extensive copying has occurred, and that the general who-copied-from-whom pattern shown above holds.

Few would argue with these conclusions, in part because otherwise it is exceedingly unlikely that any of these unusual errors would occur even once in a manuscript, let alone independently in different manuscripts, and it is even less likely that there would be large sections of identical material in all six papers.
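The hierarchical reading of the error table can be checked mechanically. The following is a minimal sketch, assuming the table above is transcribed as a set of error blocks per student; the function names are hypothetical, and the ordering it prints is only the simplest chain consistent with the data, not proof of who copied from whom.

```python
# Error table from the thought experiment: student -> error blocks present.
errors = {
    1: set("BCDE"),
    2: set("ABCDE"),
    3: set("ABCDE"),
    4: set("CDE"),
    5: set("DE"),
    6: set(),
}


def is_nested(table):
    """True if the error sets, ordered by size, each contain the previous one."""
    ordered = sorted(table.values(), key=len)
    return all(small <= large for small, large in zip(ordered, ordered[1:]))


def copying_order(table):
    """List students from fewest errors (nearest the original) to most."""
    return sorted(table, key=lambda student: (len(table[student]), student))


print(is_nested(errors))      # True
print(copying_order(errors))  # [6, 5, 4, 1, 2, 3]
```

If the errors had arisen independently, there would be no reason for each student's set of errors to contain the previous one; a table such as {1: {"A"}, 2: {"B"}} fails the check.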

Transposons and the primate lineage

“Transposons” or “jumping genes” are a highly unusual type of mutation where a section of DNA has been randomly copied from one part of an organism’s genome to another. Most of the time, these inserted genes do no damage, because they “land” in relatively unimportant sections of DNA. But they do provide an excellent means to classify species into their phylogenetic (“family tree”) relationship. This is because it is exceedingly unlikely that the same random insertion of an entire gene would occur at the same spot in the genomes of two or more different species, unless, of course, each inherited this curious feature from a common ancestor. Transposon data has been used in this way to classify species into a “family tree,” with a result that is virtually identical to what biologists had earlier reckoned based only on physical features and biological functions [Rogers2011, pg. 25-30].

Here is an example of how transposon data can be used to determine the phylogenetic relationships (i.e., “family tree”) of various primates including humans. As is well known, the DNA of various primate species is almost identical, with well over 90% of DNA common between these species. Several transposons have been identified in human DNA and also in the DNA of gibbons, orangutans, gorillas, chimps and bonobos. In the table below, the columns labeled A through E denote five transposons, and x and o respectively denote that the transposon is present or absent in the genome of the given species:

						Transposon blocks
			Species		A	B	C	D	E
        /---------	Human		o	x	x	x	x
       /----------	Bonobo		x	x	x	x	x
      / \---------	Chimp		x	x	x	x	x
     /------------	Gorilla		o	o	x	x	x
-----|------------	Orangutan	o	o	o	x	x
     \------------	Gibbon		o	o	o	o	o

Needless to say, these data are precisely the same as the table above of hypothetical errors in student manuscripts. And just as it was abundantly clear from the above table that massive copying had occurred, with some students copying from others (or from other common sources), it is abundantly clear from this table that our closest primate relatives are chimpanzees and bonobos, with gorillas, orangutans and gibbons somewhat more distant, yet still closely related, and all six species deriving from a common ancestor. Note, for instance, that the chimp-bonobo-human lineage acquired one transposon after splitting from gorillas, and then the bonobo-chimp lineage acquired one after splitting from humans [Rogers2011, pg. 89; Salem2003].
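The claim that these data fit a single family tree can be tested with a short sketch. It assumes a simplified criterion: each transposon marks the group of species carrying it, and a single tree requires every pair of such groups to be either nested or disjoint. This is an illustrative stand-in, not the analysis used in the cited sources, and the names in the code are hypothetical.

```python
from itertools import combinations

# Presence (x) / absence (o) table from the text, one row per species.
species = ["Human", "Bonobo", "Chimp", "Gorilla", "Orangutan", "Gibbon"]
rows = ["oxxxx", "xxxxx", "xxxxx", "ooxxx", "oooxx", "ooooo"]
transposons = "ABCDE"

# For each transposon, collect the group of species that carry it.
carriers = {
    t: {sp for sp, row in zip(species, rows) if row[i] == "x"}
    for i, t in enumerate(transposons)
}


def tree_compatible(groups):
    """True if every pair of groups is nested or disjoint."""
    return all(
        a <= b or b <= a or not (a & b)
        for a, b in combinations(groups.values(), 2)
    )


for t in transposons:
    print(t, sorted(carriers[t]))
print(tree_compatible(carriers))
```

Transposon A marks only bonobos and chimps, B adds humans, C adds gorillas, and D and E add orangutans, so each group sits inside the next and the check succeeds. A transposon shared only by, say, humans and gibbons would overlap these groups without nesting and would fail it.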

Mutations in the Vitamin C gene

One other interesting example of this type is the “GULO” gene, which is an essential part of the machinery that makes Vitamin C in most animals. Humans lack a functioning copy of this gene — our copy is a highly mutated fragment, classified as a relic gene or pseudogene. Scurvy, that scourge of British sailors on the high seas and of Mormon pioneers crossing the plains, occurs in humans when they do not get enough Vitamin C in their diet. Interestingly, although the GULO pseudogene is highly mutated and utterly useless, humans and chimpanzees have almost identical copies of it — human and chimp versions are 98% identical. Evidently a common ancestor of humans and chimps adopted a diet rich in fruits and vegetables, and thus a chance mutation that disabled Vitamin C production was no longer a fatal one and was passed on to posterity [Fairbanks2007, pg. 53-55].

Equally compelling evidence can be seen by comparing other genes and biological proteins. For example, human beta globin, a component of hemoglobin in blood, is identical to that of chimpanzees, differs in only one location from that of gorillas, yet is increasingly distinct from that in red foxes, polar bears, horses, rats, chickens and salmon (see DNA). Anyone with an Internet connection can examine these databases first-hand to study the relative closeness of various species [Evolution2009].
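Statements such as "98% identical" or "differs in only one location" come from position-by-position comparison of aligned sequences. A minimal sketch of that arithmetic follows; the two sequences are made-up toy strings, not real GULO or beta globin data, and the function names are hypothetical. Real comparisons first require an alignment step, which is omitted here.

```python
def percent_identity(seq_a, seq_b):
    """Percentage of positions at which two aligned sequences agree."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


def differing_positions(seq_a, seq_b):
    """1-based positions at which two aligned sequences differ."""
    return [i for i, (a, b) in enumerate(zip(seq_a, seq_b), start=1) if a != b]


# Made-up 20-letter sequences for illustration only (not real gene data).
reference = "ACGTTGCAACGTTGCAACGT"
variant = "ACGTTGCAACGATGCAACGT"

print(percent_identity(reference, variant))     # 95.0
print(differing_positions(reference, variant))  # [12]
```

Ranking species by this kind of score against a reference sequence is one simple way to see the "increasingly distinct" pattern described above.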

DNA evidence for evolution

Evidence such as that mentioned above strongly supports key assertions of biological evolution, namely the common ancestry of related organisms (and indeed of all living things) and the reality of mutations. So why are so many reluctant to accept these conclusions?

Consider, for example, the use of DNA evidence in forensics. Biologist Sean Carroll mentions the case of Kevin Green, who in 1979 had been convicted of the attempted murder of his wife Dianna Green and the murder of her unborn child, who died in the beating. But in 1996, forensic researchers at the California Department of Justice, after analyzing 17-year-old samples collected at the crime scene, concluded that the DNA matched not Green’s but that of a different man who was then in prison on another charge. Green was thus released from prison [Carroll2006, pg. 13-14].

Indeed, DNA evidence, after decades of refinement of the underlying experimental procedures and methods of analysis, fully deserves its gold-standard reputation in the criminal forensic field. It is now commonplace for society to condemn accused persons to a lifetime in prison, and, in other cases, free convicted killers, all on the basis of DNA evidence. It is, quite literally, a matter of life and death.

And yet the underlying principles and techniques widely used in DNA analysis in forensics are essentially the same as those employed in biological phylogenetics, and are just as reliable. As Carroll has noted, DNA evidence “clinches the case for biological evolution as the basis for life’s diversity, beyond any reasonable doubt” [Carroll2006, pg. 17].

SMR blog