This week has seen a bumper crop of news stories about a new piece of research in PLOS ONE by Marcelo Montemurro and Damian Zanette, both physicists who specialize in complex systems. The paper in question is not about physics, however. It argues, based on an information-theoretic analysis of the structure of its words, that the mysterious Voynich Manuscript (VM) has properties suggesting a language-like structure. This is certainly not a ‘decipherment’, but if correct, the result would be counter-evidence to certain versions of the theory that the VM is a medieval hoax: pseudo-writing meant to have the appearance of language but having no decipherable content in any natural language.
Now, I am not a specialist in information theory, and I’m not truly a specialist on the Voynich Manuscript (although I have played one on TV), but I am a linguist and I do research on writing systems and allied representation systems like written numerals. And several things bother me about this paper. The first is that, as Gordon Rugg (the most significant modern proponent of a ‘hoax’ theory) has pointed out in a comment on the new paper, no one is seriously claiming that the VM is pure ‘noise’ – it clearly is structured. But the mere fact that the VM has some structure, even structure that resembles language in some ways, does not entail that it is likely to have a genuine linguistic structure, much less a decipherable one. Rugg’s own (plausible) theory involves the use of a medieval ciphering system to rapidly produce language-like but meaningless text as part of a hoax, and Montemurro and Zanette have not evaluated this theory at all, as far as I can see, other than to dismiss it.
Furthermore, the only systems to which the VM is compared are two written languages in alphabetic scripts (English and Latin), one written language with a non-alphabetic script (Chinese), one computer language (Fortran), and one natural sequence (yeast DNA). But there is a wide variety of nonlinguistic, quasilinguistic, and paralinguistic phenomena aside from these, and the authors haven’t compared the VM to any of them. Montemurro and Zanette show conclusively that the VM has much more ‘information’ (structure) than the yeast DNA, which we would anticipate, but they do not do a good job of accounting for the different types of encoded information, and structured non-information, which might be comparable to the VM. What is the information structure of known codes and ciphers (both broken ones and undeciphered ones)? What is the information structure of semasiographic systems like the glyphic system at Teotihuacan? What is the information structure of the linguistic productions of psychiatric patients who suffer from graphomania? What is the information structure of pseudo-writing like the Codex Seraphinianus, which we know (since it’s a modern piece of conceptual art) carries no message? None of these comparisons would be conclusive, but all of them would be informative. Right now the range of systems to which Montemurro and Zanette have compared the Voynich is simply too limited to be useful.
Montemurro and Zanette are also seemingly unaware of parallel efforts to use the information structure of undeciphered scripts to evaluate their language-like nature. Two of the most significant are the attempt to show that Iron Age Pictish graphic symbols from Scotland constituted a phonetic script (Lee, Jonathan and Ziman 2010) and the attempts to show that the Indus script of Harappan-period India and Pakistan either does (Rao et al. 2009) or does not (Farmer, Sproat and Witzel 2004) resemble linguistically based writing systems. These theories have attracted a reasonable degree of attention from linguists, and Richard Sproat, in particular, has done a lot of work trying to address the non-linguists’ methodological and conceptual approaches, some of which has been covered in extraordinary detail at Language Log. There’s a much longer discussion to be had there, but suffice it to say that most linguists are skeptical of studies undertaken without any linguistic expertise or assistance. Again, without taking a position on any of these controversies, it strikes me as an irresponsible failure of literature-searching that the Montemurro and Zanette study is so fundamentally unaware of similar efforts in major publications such as Science and the Proceedings of the Royal Society. If you’re going to use physics to study written language, even if you’re going to ignore every single linguist who’s written on these subjects, maybe you should at least be aware of high-impact articles written in the last ten years by physicists using very similar methods to your own.
For the record, I think that any information-based effort that does not involve linguists at a serious level is likely to make invalid assumptions and thus be highly prone to producing nice-looking gibberish. For example, the Montemurro/Zanette theory seems to grant that the VM probably does not encode information alphabetically like English, and then suggests instead that it recalls “scripts where -as in the cases of Chinese and hierographical Ancient Egyptian- the graphical form of words directly derives from their meaning.” (Montemurro and Zanette 2013: 4). Let’s assume we are prepared to set aside their use of the term ‘hierographical’, which is a bizarre nineteenth-century anachronism that was vaguely popular for a time prior to the decipherment of Egyptian hieroglyphs, but which has never, in any European language, been a preferred term. More significantly, it is a gross, entirely improper characterization of Chinese and Egyptian to argue that the “form of words directly derives from their meaning”. Both scripts have massive phonographic components, alongside signs that represent morphemes, words, and semantic categories, as every expert on writing systems has known for thirty years or more – certainly the work of John DeFrancis shows this eminently clearly. Even lumping the Egyptian hieroglyphic and Chinese scripts together in a single category ignores the massive differences between them. So in essence, Montemurro and Zanette seem to be suggesting that the VM has properties similar to no writing system ever known to have been used on earth, because they do not seem to know what sorts of writing systems they are comparing it to.
In short, I’m afraid what we have here is another case of non-specialists inappropriately applying the methods of one field to some genuinely complex linguistic problems, in order to evaluate a text whose decipherers (a group riddled with charlatans and cranks) have offered us everything except an actual decipherment.