
Can an evolutionary process generate English text?

David H. Bailey
1 Jan 2017 (c) 2017


One central issue in the debate over Darwinian evolution is the question of evolutionary novelty -- can evolution produce truly novel features? Creationist and intelligent design writers have insisted that whereas minor changes may occur within an established "kind," nothing fundamentally new can come through "random" or "undirected" evolution [Dembski2002]. On the other hand, the consensus of scientific researchers who have investigated this question is that evolution can and does generate truly novel features. Additional details on the issue of evolutionary novelty are given in Novelty.

Mathematical and computer models of evolution

Mathematical models have been constructed since the early 20th century to study the process of evolution. In one of the latest studies of this sort, University of Pennsylvania researchers Herbert Wilf (a mathematician) and Warren Ewens (a biologist) employed a sophisticated mathematical model to study the length of time required for sufficient numbers of mutations to take hold in an organism. The authors note that critics of evolution often assume that a huge number of mutations must occur all at once, which, probabilistically speaking, is exceedingly unlikely. But, as the authors explain [Wilf2010]:
[A] more appropriate model is the following. After guessing each of the letters, we are told which (if any) of the guessed letters are correct, and then those letters are retained. The second round of guessing is applied only for the incorrect letters that remain after this first round, and so forth. This procedure mimics the "in parallel" evolutionary process. The question concerns the statistics of the number of rounds needed to guess all of the letters of the word successfully.

The authors conclude that when one takes account of natural selection in a reasonable way, there has been ample time for evolution as we observe it to have taken place [Wilf2010].
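The guessing procedure quoted above can be simulated in a few lines. The following Python sketch is only an illustration, not the authors' model: the word length of 50, the alphabet of 26 letters, the number of trials and the function names are all assumptions chosen for the example. It counts the rounds needed when correct letters are retained and prints that figure next to K^L, the expected number of attempts if all L letters had to be guessed correctly at once.

```python
import random

def rounds_to_guess(word_length, alphabet_size, rng):
    """Count guessing rounds until every letter is correct, retaining hits."""
    remaining = word_length  # positions that are still wrong
    rounds = 0
    while remaining > 0:
        rounds += 1
        # Each still-wrong position is guessed independently. Value 0 stands
        # for "the correct letter", so a guess fails when the draw is nonzero.
        remaining = sum(1 for _ in range(remaining)
                        if rng.randrange(alphabet_size) != 0)
    return rounds

def mean_rounds(word_length, alphabet_size, trials=200, seed=0):
    """Average rounds_to_guess over several independent trials."""
    rng = random.Random(seed)
    total = sum(rounds_to_guess(word_length, alphabet_size, rng)
                for _ in range(trials))
    return total / trials

if __name__ == "__main__":
    L, K = 50, 26  # illustrative values, not taken from [Wilf2010]
    print("mean rounds when correct letters are retained:", mean_rounds(L, K))
    print("expected attempts when guessing all at once  :", K ** L)
```

The point of the comparison is the difference in scale: the retained-letter count grows slowly with the word length, whereas the all-at-once figure grows exponentially.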

Along this line, computer programs mimicking the process of evolution have been utilized to construct computer algorithms and engineering designs that in many cases are superior to the best-known human efforts. Applications of this methodology, known as "genetic algorithms" or "evolutionary computation," have been found in aerospace, chemistry, electrical engineering, financial analysis, materials engineering, robotics, and other fields [Marczyk2004].

Other studies employ computer simulations to study the nature of biological evolution itself. One recent study utilized "digital organisms" -- i.e., computer programs that can mutate, compete, evolve and replicate. Numerous features of natural evolution were seen in these studies, including mutations that were temporarily deleterious, but which served as "stepping-stones" to the evolution of more complex features [Lenski2003; Isaak2007, pg. 64].

Evolutionary processes and English text

Some writers have drawn the analogy to English text. For example, David Foster, in a book skeptical of evolution, discusses and then refutes an argument he attributed to Thomas Huxley, namely that a few monkeys typing randomly for millions of millions of years would type all the books in the British Museum. Foster asserts that even a single line of 50 characters could not be produced in this way, since there are at least 8.5 x 10^49 alphabetic strings of length 50 (based on an alphabet of 26 characters and some other assumptions), so that generating a specific given string of length 50 "at random" is unlikely even over the multi-billion-year history of the earth [Foster1991, pg. 57; Lennox2009, pg. 163-163].

In response to Foster, biologist Gert Kortof points out that Huxley could not possibly have told this story in 1860, because typewriters were not commercially available until 1874. Furthermore, it was not known at the time that genetic information is contained in a string of symbols (DNA), so it is highly questionable that this argument would have been used at all in the 1800s [Wilf2010]. In addition, as both Gert Kortof and Peter Olofsson have noted, this type of argument fails to define precisely what should truly be counted as "surprising." To correctly assess the odds of such an occurrence, one should calculate not the probability of some single event (every one of which may have the same probability), but instead the probability of all events in a class of similar events [Wilf2010; Olofsson2008].

In response to arguments of the type mentioned above, Oxford biologist Richard Dawkins has described a simple computer program he wrote to generate the Shakespearean sentence "Methinks it is like a weasel," starting from a randomly generated character string [Dawkins1986, pg. 43-50]. The program achieved this in 41 evolution-like iterations. At each iteration, every "sentence" in Dawkins' population was scored by how many of its letters agreed with his target phrase at the appropriate positions. Selective "breeding" slowly improved the score of the best sentence until there were no errors.
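A program of this general kind can be sketched as follows. This is not Dawkins' code, which is not reproduced here; the brood size of 100, the mutation rate of 0.05, the upper-case alphabet and the rule that the parent competes with its offspring are all assumptions made for the illustration, so the number of generations it needs will generally differ from the 41 reported above.

```python
import random
import string

TARGET = "METHINKS IT IS LIKE A WEASEL"
ALPHABET = string.ascii_uppercase + " "  # 26 letters plus the space

def score(candidate, target=TARGET):
    """Number of positions at which candidate agrees with the target."""
    return sum(a == b for a, b in zip(candidate, target))

def mutate(parent, rate, rng):
    """Copy parent, replacing each character with probability `rate`."""
    return "".join(rng.choice(ALPHABET) if rng.random() < rate else ch
                   for ch in parent)

def weasel(offspring=100, rate=0.05, seed=0, max_generations=10_000):
    """Breed from the best-scoring string until it equals TARGET."""
    rng = random.Random(seed)
    best = "".join(rng.choice(ALPHABET) for _ in TARGET)  # random start
    generation = 0
    while best != TARGET and generation < max_generations:
        generation += 1
        brood = [mutate(best, rate, rng) for _ in range(offspring)]
        # Assumption: the parent stays in the pool, so the best score
        # can never decrease from one generation to the next.
        best = max(brood + [best], key=score)
    return generation, best

if __name__ == "__main__":
    generations, sentence = weasel()
    print(generations, sentence)
```

Changing `offspring` or `rate` changes how quickly the target is reached, which is one way to see how strongly the outcome depends on the selection scheme rather than on chance alone.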

While this is an interesting exercise, it has significant flaws, some of which Dawkins himself acknowledged. To begin with, his experiment involved only a single "species," whereas in the biological kingdom the branching tree of evolution develops in many thousands of directions simultaneously. Secondly, Dawkins' process was defined by a single pre-specified target, whereas biological evolution is governed instead by a complicated "fitness landscape" involving hundreds of interacting factors such as climate, competing organisms in the same ecological niche, food supply, predators and diseases. Finally, Dawkins' experiment progressed to a fixed future goal, whereas real biological evolution does not operate with any future goal in mind -- each step must bestow some advantage. Nonetheless, Dawkins' demonstration is intriguing.

Can an evolutionary program generate Dickens-like text?

In another such study, conducted by the present author, a computer program modeling natural biological evolution was used to explore whether an evolutionary process could generate readable English text. The program started by generating a set of 1024 text strings, each 64 characters long, filled with random gibberish, such as the following:

o ao ,fludoy aocueu feidh,iaemehaiheyh daneny shpesaems y nhte
nrtnnbaa.nn hymeo t fiilunnw nt t,ntehg eu y' t h l dieosea ii
mbdsoee lueleciro ,ynaeenetg itln h srw l,pn uf svee,ee a'l sl
snd etke snoymnra lhs gdnu,nmrs e trlhueafpraa.c.ys f yjser g

Through successive evolution-like iterations, the computer program then "evolves" a set of English-like segments. In this experiment, the "fitness landscape" was the text of the novel Great Expectations, written by Charles Dickens. In other words, at each step of the evolutionary process, each of the 1024 "organisms" (text strings) in the current "population" was rated in fitness by how closely it matched text patterns in Great Expectations. High-scoring text segments were permitted to "mate" with other high-scoring segments, and the resulting segments, after applying certain random mutations, constituted the population of "organisms" at the next stage of the process. Full details are presented in [Bailey2009].
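The general shape of such a process (score, select, mate, mutate) can be sketched in Python. This is a hypothetical illustration and not the program described in [Bailey2009]: the trigram-matching fitness measure, the single-point crossover, the mutation rate, the retention of the best string, and the short stand-in corpus used in place of the novel are all assumptions. The defaults are also smaller than the 1024 strings of 64 characters used in the study, so that the example runs quickly.

```python
import random

# Hypothetical stand-in corpus; the study scored against the full novel.
CORPUS = ("the quick brown fox jumps over the lazy dog and then the dog "
          "runs after the fox over the hill and into the wood ")
ALPHABET = "abcdefghijklmnopqrstuvwxyz ,.'"
N = 3  # n-gram length used for scoring (an assumption)

def ngrams(text, n=N):
    """Set of all length-n substrings of text."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}

CORPUS_NGRAMS = ngrams(CORPUS)

def fitness(s):
    """Count the positions in s whose n-gram also occurs in the corpus."""
    return sum(s[i:i + N] in CORPUS_NGRAMS for i in range(len(s) - N + 1))

def crossover(a, b, rng):
    """Single-point 'mating': head of a joined to tail of b."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]

def mutate(s, rate, rng):
    """Replace each character with a random one with probability `rate`."""
    return "".join(rng.choice(ALPHABET) if rng.random() < rate else ch
                   for ch in s)

def evolve(pop_size=64, length=32, generations=60, rate=0.02, seed=0):
    """Return the best string found and the best score per generation."""
    rng = random.Random(seed)
    pop = ["".join(rng.choice(ALPHABET) for _ in range(length))
           for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        history.append(fitness(pop[0]))
        parents = pop[:pop_size // 2]  # only high scorers may mate
        children = [mutate(crossover(rng.choice(parents),
                                     rng.choice(parents), rng), rate, rng)
                    for _ in range(pop_size - 1)]
        pop = [pop[0]] + children  # carry the best string forward unchanged
    pop.sort(key=fitness, reverse=True)
    history.append(fitness(pop[0]))
    return pop[0], history

if __name__ == "__main__":
    best, history = evolve()
    print(repr(best), history[0], "->", history[-1])
```

Calling `evolve(pop_size=1024, length=64)` matches the population dimensions given in the text, although with so small a corpus the output will resemble the stand-in sentence rather than Dickens.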

To evaluate the success of this project, the author prepared a "quiz" consisting of a set of 20 text segments, ten of which were generated by the computer program described above, and ten of which were actual text segments taken from Great Expectations. The objective of the quiz was to identify which of the 20 segments were genuine Dickens text and which were computer-generated. This quiz was then administered to a set of students at a major university in the western U.S., all of whom affirmed that they were at least moderately familiar with Dickens literature. The reader is invited to try his or her luck on this quiz:

  1. up at it for an instant. but he was down on the rank wet grass,
  2. or do any such job, i was favoured with the employment. in order,
  3. at the fire as she took up her work again, and said she would be
  4. the monster was even careless as to the word that i had him so.
  5. as to go with him to his father's house on a visit, that i might
  6. fitted it to nothing and get the ashes between me to the last.
  7. as no relation into another that it is the same room - a little
  8. a separation to be made for the desolater, like the man he was.
  9. we said that as you put it in your pocket very glad to get it, you
  10. that he had treated him to a little bee, he was to call the
  11. if he had for a time such an interest here and contented me.
  12. great iron coat-tails, as he had done, and then ran to that.
  13. he saw me going to ask him anything, he looked at me with his glass
  14. on my objecting to this retreat, he took us into another room with
  15. been born on there, or that i had the greatest indignature.
  16. the chimney as though it could not bear to go out into such a night
  17. later to settle to anything i had hesitated as to the sound.
  18. the greatest slight and injury that could be done to the many far
  19. of it on the hearth close to the fear that she had done rather
  20. out of my thoughts for a few moments together since the hiding had

Looking collectively at the 66 sets of responses that the author received for this quiz, the "majority vote" is correct for most of the 20 items, but it is wrong for items #8, 9, 11, 13 and 20, and in two other cases (#1 and #15) the margin of the "vote" is slim. All of the computer-generated items had at least 18 incorrect responses out of 66. Items #8 and #9 proved especially troublesome to these students, with only 17 and 18 correct responses, respectively (#8 is computer-generated; #9 is from Dickens' Great Expectations).

It is interesting to note that the computer program generated many valid words not found in Great Expectations. Here are a few of the many examples:

administer, agitate, attraction, conspire, contentions, credited, deceived, discriminate, distances, enhance, formations, generation, inconvenient, intentionally, liberated, mission, possibilities, powered, releases, searches, spheres, termination, weathers

Full details of this study are given in [Bailey2009].


Conclusion

Many computational simulations of evolution have been performed. What's more, the overall methodology of utilizing an evolution-like scheme to search for an optimal point in a "landscape" of millions of possibilities has been applied to many problems in science and engineering, often with very successful results.

As one example of this approach, a detailed computational simulation has shown that English text segments reminiscent of Dickens literature can be generated. Some of the better resulting text segments are sufficiently good to fool human judges in an informal test -- college students correctly distinguished true Dickens from computer-generated segments only about 61% of the time (on average), only slightly better than the 50% that one would expect at random.

Thus the general realm of computer-based simulation has provided additional evidence that evolution, as it is currently understood in biology, is a truly creative process.


[See Bibliography].