Is the Voynich Manuscript structured like written language?

This week has seen a bumper crop of news stories about a new piece of research in PLOS ONE by Marcelo Montemurro and Damian Zanette, both physicists who specialize in complex systems. The paper in question is not about physics, however: it argues, on the basis of an information-theoretic analysis of the structure of its words, that the mysterious Voynich Manuscript (VM) has language-like structure. If correct, this result is certainly not a ‘decipherment’, but it would be counter-evidence to certain versions of the theory that the VM is a medieval hoax that is undecipherable because it is pseudo-writing, meant to have the appearance of language but having no decipherable content in any natural language.

Now, I am not a specialist in information theory, and I’m not truly a specialist on the Voynich Manuscript (although I have played one on TV), but I am a linguist and I do research on writing systems and allied representation systems like written numerals. And several things bother me about this paper. The first is that, as Gordon Rugg (the most significant modern proponent of a ‘hoax’ theory) has pointed out in a comment on the new paper, no one is seriously claiming that the VM is pure ‘noise’. It clearly is structured, and the mere fact that the VM has some structure, even one that resembles language in some ways, does not entail that it is likely to have a genuine linguistic structure, much less a decipherable one. Rugg’s own (plausible) theory involves the use of a medieval ciphering system to rapidly produce language-like but meaningless text as part of a hoax, and Montemurro and Zanette have not evaluated this theory at all, as far as I can see, other than to dismiss it.

Furthermore, the only systems to which the VM is compared are two written languages in alphabetic scripts (English and Latin), one written language with a non-alphabetic script (Chinese), one computer language (Fortran), and one natural sequence (yeast DNA). But there is a wide variety of nonlinguistic, quasilinguistic, and paralinguistic phenomena aside from these, and they haven’t compared the VM to any of them. Montemurro and Zanette show conclusively that the VM has much more ‘information’ (structure) than the yeast DNA, which we would anticipate, but they do not do a good job of accounting for the different types of encoded information, and structured non-information, which might be comparable to the VM. What is the information structure of known codes and ciphers (both broken ones and undeciphered ones)? What is the information structure of semasiographic systems like the glyphic system at Teotihuacan? What is the information structure of the linguistic productions of psychiatric patients who suffer from graphomania? What is the information structure of pseudo-writing like the Codex Seraphinianus, which we know (since it’s a modern piece of conceptual art) carries no message? None of these comparisons would be conclusive but all of them would be informative. Right now the range of systems to which Montemurro and Zanette have compared the Voynich is simply too limited to be useful.
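To make concrete what a comparison of ‘information structure’ might involve, here is a minimal sketch in Python. It is not Montemurro and Zanette’s code, and it only loosely follows the general idea of asking how unevenly words are spread across parts of a text compared with a shuffled copy of the same text. The function names, the choice of equal-sized parts, and the number of shuffles are all my own assumptions for illustration.

```python
import math
import random
from collections import defaultdict


def part_entropy(counts):
    """Shannon entropy (bits) of one word's occurrence counts across parts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def mean_word_entropy(tokens, n_parts):
    """Frequency-weighted average, over words, of the entropy of each word's
    distribution across n_parts consecutive chunks of the text."""
    size = math.ceil(len(tokens) / n_parts)
    per_word = defaultdict(lambda: [0] * n_parts)
    for i, tok in enumerate(tokens):
        per_word[tok][i // size] += 1
    n = len(tokens)
    return sum(sum(c) / n * part_entropy(c) for c in per_word.values())


def section_information(tokens, n_parts=4, n_shuffles=20, seed=0):
    """Bits per word by which the real word order is 'clumpier' than chance:
    mean entropy of shuffled copies minus entropy of the observed text."""
    rng = random.Random(seed)
    observed = mean_word_entropy(tokens, n_parts)
    shuffled_total = 0.0
    for _ in range(n_shuffles):
        shuffled = tokens[:]
        rng.shuffle(shuffled)
        shuffled_total += mean_word_entropy(shuffled, n_parts)
    return shuffled_total / n_shuffles - observed
```

Running `section_information` on a tokenized English text, a ciphertext, a transcription of pseudo-writing, and a VM transcription would give numbers that could be set side by side. A high score shows only that words cluster by section, which is exactly why the choice of comparison systems matters: a hoaxing procedure that changes its settings from page to page could plausibly produce clustering too.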

Montemurro and Zanette are also seemingly unaware of parallel efforts to use the information structure of undeciphered scripts to evaluate their language-like nature. Two of the most significant are the effort to show that Iron Age Pictish graphic symbols from Scotland constituted a phonetic script (Lee, Jonathan and Ziman 2010) and the efforts to show that the Indus script of Harappan-period India and Pakistan either does (Rao et al. 2009) or does not (Farmer, Sproat and Witzel 2004) resemble linguistically-based writing systems. These theories have attracted a reasonable degree of attention from linguists, and Richard Sproat, in particular, has done a lot of work trying to address the non-linguists’ methodological and conceptual approaches, some of which has been covered in extraordinary detail at the Language Log. There’s a much longer discussion to be had there, but suffice it to say that most linguists are skeptical of studies undertaken without any linguistic expertise and assistance. Again, without taking a position on any of these controversies, it strikes me as irresponsible literature-searching that the Montemurro and Zanette study is so fundamentally unaware of similar efforts in major publications such as Science and the Proceedings of the Royal Society. If you’re going to use physics to study written language, even if you’re going to ignore every single linguist who’s written on these subjects, maybe you should at least be aware of high-impact articles written in the last ten years by physicists using very similar methods to your own.

For the record, I think that any information-based effort that does not involve linguists at a serious level is likely to make invalid assumptions and thus be highly prone to producing nice-looking gibberish. For example, the Montemurro/Zanette theory seems to grant that the VM probably does not encode information alphabetically like English, and then suggests instead that it recalls “scripts where -as in the cases of Chinese and hierographical Ancient Egyptian- the graphical form of words directly derives from their meaning.” (Montemurro and Zanette 2013: 4). Let’s assume we are prepared to set aside their use of the term hierographical, which is a bizarre nineteenth-century anachronism that was vaguely popular for a time prior to the decipherment of the Rosetta Stone, but which has never, in any European language, been a preferred term. More significantly, it is a gross, entirely improper characterization of Chinese and Egyptian to argue that the “form of words directly derives from their meaning”. Both scripts have massive phonographic components alongside signs that represent morphemes, words, and semantic categories, as every expert on writing systems has known for thirty years or more; certainly the work of John DeFrancis shows this eminently clearly. Even lumping the Egyptian hieroglyphic and Chinese scripts together in a single category ignores the massive differences between them. So in essence, Montemurro and Zanette seem to be suggesting that the VM has properties similar to no writing system ever known to have been used on earth, because they do not seem to know what sorts of writing systems they are comparing things to.

In short, I’m afraid what we have here is another case of non-specialists inappropriately applying the methods of one field to some genuinely complex linguistic problems, in order to evaluate a text whose decipherers (a group riddled with charlatans and cranks) have offered us everything except an actual decipherment.



  5. The Voynich manuscript is written and encrypted in the Czech language. Instructions for decryption are written on multiple pages of the manuscript. The instructions are written in the Czech language.

    The manuscript describes Czech history.

    (Josej Zlatoděj, Prof., Czech Republic)

  6. Mnemonic text, I think. The illustrations seem to describe female menstruation, and medicine and recipes concerning female health. (From the Salerno Trotula?) Bathing was considered very important for women. The astrological section may have to do with finding the proper date (21 April) for Easter, which was a problem around that time between the Gregorian and Julian calendars. 9 could be a notae for cum, cun, com, con, and 8 seems to be an s.

    It was normal in those days to use a system to memorize numerical data and calculations by means of symbols and words, which could be invented and applied randomly.

    This could be a simple scholar’s book as used in Holland around the time when people still actually memorized all contents, which makes it very complex for us to reverse it to its original meaning, which could be complex calculations. Finger counting and counting from numeral discs were also often used.

    I hope this short info after some googling could add to the search direction and/or meaning of the MS.

    I may post more in the future after comparing some recipes from those days in Dutch to the ones in the MS.

  8. My suggestion for decoding the Voynich Manuscript rests on the fact that each of its individual pages encodes some different information. Encryption is not only a matter of written form. There is a whole spectrum of gnosis which, because of the limited capabilities of writing (e.g. runic letters, whose oldest inscriptions are from the second and third centuries AD, before that Egyptian hieratic writing, etc.), was also encoded in a different form, for example by means of signs and symbols: see semiotics, from the Greek “semasticos” (significant), “semasia” (meaning), “semeion” (a sign), “sema” (a sign, an image, a signal). The Voynich manuscript is encoded in this manner: in my view it is not a classic written cipher, only a symbolic rebus, an ideogram. Below, to better illustrate the historical continuum in brief, is a summary of my earlier descriptions of each manuscript illustration (from 1R to 19R).

    1R – Big Bang and Collapse – the cyclical nature of the universe.

    1V – Approximately 4.5–5 billion years ago – the formation of the Earth’s crust.

    2R – About 3.5 billion years ago – the first organisms.

    2V – About a billion years ago – the first single-celled organisms (eukaryotes).

    3R – Approximately 900–700 million years ago – the first multi-cellular organisms.

    3V – Approximately 700–600 million years ago – the first invertebrates.

    4R – 500 million years ago – the first vertebrates.

    4V – 400 million years ago – vertebrates came out of the water.

    5R – 220 million years ago – the beginning of the reign of the dinosaurs.

    5V – 65 million years ago – extinction of the dinosaurs, evolution of mammals.

    6R – About 65–30 million years ago – carnivores.

    6V – About 30–7 million years ago – the formation of plants and animals.

    7R – About 12 million years ago – the first hominids.

    7V – About 7–5 million years ago – the appearance of man.

    8R – About 100 thousand years ago – the emergence of modern man.

    8V – Approximately 15–12 thousand years ago – human migration across the Bering “bridge”.

    9R – Approximately 11.5 thousand years ago – the end of the last ice age.

    9V – About 10 thousand years ago – hunter-gatherers, the birth of agriculture.

    10R – Around 4000 BC – development of urban communities in Mesopotamia.

    10V – Around 3000 BC – the beginnings of the civilization of ancient Egypt.

    11R – The turn of the second and first millennium BC – Judaism, Jerusalem.

    11V – Turn of the era – Christianity, Rome.

    12R – None. In my view – Ancient Greece.

    12V – None. In my view – the Empire of Alexander the Great.

    13R – The Roman Empire.

    13V – Persian Empire.

    14R – Huns. Mongol Empire.

    14V – Byzantine Empire.

    15R – The State of the Franks.

    15V – The spread of Islam.

    16R – Vikings.

    16V – Slavs.

    17R – The Crusades.

    17V – The Hundred Years War.

    18R – Ottoman Empire.

    18V – War of the Roses.

    19R – The Order of the Teutonic Knights.
