For those of you who may not know, I run a sister site to this blog, The Phrontistery, which in one form or another has been around since 1996, and which features an online dictionary of rare words, glossaries on various topics, and other language-related resources. While the site has been more or less dormant for a few years – mostly I’ve just been keeping the place tidy without adding any new content – I’ve had a slow(ish) summer and so took the opportunity to get things up and running smoothly there again, with a bunch of new content and a new site layout. Over the years I’ve given a lot of thought to somehow combining the two sites, e.g., by moving Glossographia over there or something, but I’ve never had the energy to figure out how difficult that would be. Let me know if you think that would be a terrible (or great) idea, in which case I don’t have to think about it any more.
Archive for the ‘Linguistics’ Category
Posted by schrisomalis on August 14, 2013
Posted by schrisomalis on August 2, 2013
Stephen Figurative Chrisomalis. Has a nice ring to it, don’t you think?
In seriousness, an offhand remark I made to my wife this morning, that “Compliance is my middle name,” led me on a very interesting search for the origins of the figurative use of middle name to refer to a paramount or notable characteristic of a person. The entry for middle name in the OED has been relatively recently updated, and includes numerous instances of this figurative use going back to 1905, where the New York Journal has “For retiring you’re—well, that’s your middle name.” and other quotations going up to the present. I did a little further searching around and was able to find an earlier one going back to 1902, in the Manitoba Free Press, quoting a correspondent from Dawson, Yukon Territory (and you’ve got to know that I love it when I antedate something and it turns out to be Canadian):
This isn’t quite enough to associate it firmly with the Klondike Gold Rush, but it’s a possibility. I was able to find some others from the early 20th century (mostly American, but no others from Canada) and then onward from there. But I also was intrigued when I looked at the Google Ngram for ‘is my middle name’:
That spike peaking right around 1920 is really interesting; equally interesting is that thereafter, it drops back down to relatively modest levels until the 1960s, and then takes off again, reaching its historical peak in the late 1980s and keeping right on going. Now, it’s clear that the initial rise starts well before World War I, so this isn’t something directly associated with soldiers’ slang or the general mixing of dialects during and after the war, but looking at the Google Books results around 1920, this really seems to have been a fad at the time – most of the uses of “is [pronoun] middle name” are non-literal. But by the time that Agatha Christie wrote, in The Murder of Roger Ackroyd (1926: 144), “‘Modesty is certainly not his middle name.’ ‘I wish you wouldn’t be so horribly American, James.’”, the fad was already on the wane – although it never disappeared entirely, for the next several decades it was quite rare.
I was a little surprised to see that there was no immediate bump related to the country anthem Sixteen Tons, first recorded in 1946, and whose rendition by Tennessee Ernie Ford in 1955 reached #1 on the Billboard charts for six weeks, but its line, “Fightin’ and trouble are my middle name” is in a middle verse and perhaps had little linguistic effect (although it was re-recorded many times throughout the 60s and beyond). The post-1965 bump could equally have been inspired by Bobby Vinton’s 1963 single, “Trouble is My Middle Name”, although it peaked only at #33, and, if I may say so, is not really very good. Regardless of the specific impetus, once it took off, it became strongly idiomatic, and today the phrase has become so well-known that it is covered in TV Tropes and elsewhere. I’m confident that my readers will regale me with their favourite examples.
Posted by schrisomalis on July 21, 2013
As you may know, I work at Wayne State University in Detroit, Michigan. Detroit’s been in the news a lot lately, regarding its bankruptcy and a whole lot of other things that, if I were to start talking about them at any length, would just send me off into a rage. This would not be pretty.
So instead, let’s talk about the language of Detroit. Detroit has two main English language varieties: first, the local variety of African American Vernacular English (AAVE), which has been studied by John Rickford, Geneva Smitherman, Roger Shuy, and others; and second, the variety of English that has undergone the Northern Cities Vowel Shift. Here, Penny Eckert’s work is of the greatest significance, building on the work of Bill Labov, but with a specific focus on southeastern Michigan. Most black Detroiters speak the first variety, and most white, locally-born Detroiters speak the second, with some exceptions. Today I’m going to focus just on the second variety, but I’m not going to talk about the dialect as a whole. Instead, I want to talk about variation, and some innovations I’ve noticed, in the speech of Detroit-area residents in the pronunciation of a single word – the place-name Detroit itself.
Now, if you’re like me, and like most North Americans, you probably pronounce Detroit something like /dɪˈtɹɔɪt/, or, for those unfamiliar with IPA, di-TROIT: two syllables, with second-syllable stress, rhyming closely with adroit. You can hear a clear example of this pronunciation, for instance, in this commercial for the Detroit Zoo. Other examples could be found pretty readily, so I won’t belabour the point. This is the standard pronunciation of Detroit and the baseline for today’s discussion.
There is a second variant heard locally, which has primary stress on the first syllable rather than the second, and in which the formerly unstressed first vowel becomes /i/: in other words, /ˈdiˌtɹɔɪt/ (DEE-troit). The second syllable then has secondary stress (it can’t be entirely unstressed or the vowel would have to reduce). You hear this pronunciation sporadically and without any particular association with any class or ethnic group, but it’s less common, and we’ll leave it alone.
Lately, however, I’ve been hearing a third pronunciation in a lot of commercials, local news, and the like, in which white Detroit-area residents pronounce their home city as /dɪˈtɹʌɪt/ or even /dɪˈtɹəɪt/, with the first part of the OI vowel unrounded and fronted almost to a schwa. For the non-phonologically-inclined, what seems to be happening is that di-TROIT starts to sound more like di-TRITE. Here’s a good example from a Youtube video from Detroit Real Estate Investing. If you’re not convinced, try loading up both this video and the one from the Detroit Zoo, and running them one after the other to compare a couple of times. Still not convinced? Try this video for America’s Best Value Inn, for another example. Still not convinced? Try this clip from a video made by a local man, with two very clear examples right in a row.
As far as I’ve heard, this variant is only used by people from the Detroit area who have the Northern Cities shift – i.e., it’s not used by people from Milwaukee or Cleveland or Buffalo. Having spent a lot of time poking around the sound files on Penny Eckert’s website, I don’t think it’s part of the standard analysis of the Northern Cities shift or of the specific changes found in the Detroit area. I haven’t noticed this with any other words that contain the diphthong oi. I wondered whether it might be typical of other words ending in oit or in oi followed by other voiceless stops (e.g. p, t, k). But the problem is that very few words in English end in oit (really just exploit, which is moderately uncommon, and quoit and adroit, both of which are very rare) or oi followed by any voiceless stop consonant (we could add voip and hoick but that’s about it). So we don’t have a lot of other words to compare it to, without going and doing some sociolinguistic research. (This is a hint to any of my future students who may be reading this post.)
I can’t find any publication discussing this phenomenon – I’m not a phonologist and would love to hear from someone who could link this up to the Northern Cities shift more broadly. I don’t have any explanation for it, but it’s widespread enough that it deserves some attention.
But wait – we’re not done yet! The reason I wondered about the role of voiceless stops is that while I work in Detroit, I live across the international border in Windsor, Ontario, which is essentially Detroit’s Canadian suburb, and I am a native speaker of Canadian English. Most speakers of Canadian English, including myself, have what’s called the Canadian raising, in which the /aɪ/ diphthong of right and ripe, and the /aʊ/ diphthong of about and house, are raised to /ʌɪ/ or /ʌʊ/ before voiceless consonants and especially voiceless stops – which is why some Americans think that Canadians say aboot or aboat. We don’t, but it might sound that way. And because I, like most Windsorites, have a pretty strong Canadian raising, I pronounce right as /ɹəɪt/ or /ɹʌɪt/, starting with a mid-vowel. Notice that this diphthong is exactly the same as the diphthong in the innovative pronunciation of Detroit. In other words, the ‘oi’ of Detroit for some Detroiters is the same vowel sound as Canadians have in right or trite. The two start out as completely different diphthongs – /aɪ/ and /ɔɪ/ – but end up in the same place.
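For readers who like to see a sound change written down as a rule, here is a minimal Python sketch of Canadian raising as described above. It is a deliberate simplification: it works on plain strings of IPA symbols, it treats only the consonants listed in the hypothetical `VOICELESS` set as triggers, and it ignores stress and syllable boundaries, which a real phonological analysis would need to consider. The function name `canadian_raising` is invented for illustration.

```python
import re

# Voiceless consonants that trigger raising in this simplified sketch
# (an assumption for illustration, not a complete inventory).
VOICELESS = "ptkfsθʃ"

# Match the "a" of /aɪ/ or /aʊ/ only when the diphthong is immediately
# followed by one of the voiceless consonants above.
_RAISING = re.compile(rf"a(?=[ɪʊ][{VOICELESS}])")


def canadian_raising(ipa: str) -> str:
    """Raise /aɪ/ to /ʌɪ/ and /aʊ/ to /ʌʊ/ before a voiceless consonant."""
    return _RAISING.sub("ʌ", ipa)


for word in ("ɹaɪt", "ɹaɪd", "haʊs", "əbaʊt", "tɹaɪt"):
    print(word, "->", canadian_raising(word))
```

Applied to trite, the rule yields the same final syllable as the innovative Detroit pronunciation /dɪˈtɹʌɪt/, which is the coincidence noted above; ride, with its voiced final consonant, is left alone.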
To make things even more complicated, there is a fourth variant of Detroit used only, as far as I can tell, by older speakers of Canadian English. This is a three-syllable version, /dɪˈtɹɔɪ.ɪt/, or di-TROY-it, to rhyme (non-ironically, I promise!) with destroy it. Most of the users of this variant are Ontario-born native speakers of Canadian English born in the 1950s or earlier. It can be heard, most famously, in the song, The Wreck of the Edmund Fitzgerald by folk singer Gordon Lightfoot, as heard here. It is also typical of the Canadian hockey commentator / blowhard / redneck Don Cherry. I have certainly heard it in Windsor although I suspect it is more common in central and eastern Ontario than here in southwestern Ontario. To this day, and despite all evidence to the contrary over the five years I’ve worked here, my mother (who is of a similar generation, born and raised east of Toronto) refuses to quite believe me that the two-syllable pronunciation is even acceptable or possible. I don’t have any explanation for the emergence of this variant either, but it’s obviously been around for many decades. I’d also love to know whether it’s found widely in any younger Canadian English speakers.
So, in summary, there are four distinct variants of the pronunciation of Detroit, all of which you might hear in the broader Detroit area on any given day:
- /dɪˈtɹɔɪt/ (di-TROIT), used by locals and most other English speakers
- /ˈdiˌtɹɔɪt/ (DEE-troit), used by locals sporadically
- /dɪˈtɹʌɪt/ or /dɪˈtɹəɪt/ (di-TRITE), used by locals who have a strong NCVS
- /dɪˈtɹɔɪ.ɪt/ (di-TROY-it), used by (some) older Canadians, including some in Windsor.
Posted by schrisomalis on July 14, 2013
Google Ngram Viewer is a great tool, especially for rough-and-ready searching and visualization of linguistic trends, and as a teaching tool to introduce students to lots of interesting questions we can ask about language variation and patterning. I use it all the time. The default search parameters are for 1800 – 2000, and the Culturomics project notes that, “the best data is the data for English between 1800 and 2000. Before 1800, there aren’t enough books to reliably quantify many of the queries that first come to mind; after 2000, the corpus composition undergoes subtle changes around the time of the inception of the Google Books project.” Elsewhere, the Culturomics FAQ notes that, “Before 2000, most of the books in Google Books come from library holdings. But when the Google Books project started back in 2004, Google started receiving lots of books from publishers. This dramatically affects the composition of the corpus in recent years and is why our paper doesn’t use any data from after 2000.”
OK, so we’ve been warned that the data from before 2000 is very different than the data from after 2000, and especially that 2004 marked a significant change in the corpus. Caveat lector, or whatever you will. But I want to know: In what ways have these ‘subtle changes’ changed the Google N-gram corpus, and therefore, what biases in word frequencies do scholars of language need to account for?
Lately, I’ve had some interest in post-2000 changes in word frequencies for my Lexiculture class project for the fall, and so I’ve been looking at N-gram data going up to 2008 (the last date you can search). I have found some very weird declines in words that probably aren’t actually declining in relative frequency:
It seems notable that all of these words start to decline shortly after 2000, with a particularly steep decline right around 2004-05. All of these words, I would argue, should be stable or increasing in frequency: these are words associated with modern technology and social life. Conversely, many timeless words (e.g., table, lamp, daughter) are flat or rising after 2000. It’s possible that intuitions about what should be happening to words can be wrong. But why are they all wrong in the same direction, and why do they all decline at the same time?
- One possibility is that the data from 2000 onward aren’t complete yet. There could be some books published over the past few years that haven’t been integrated into Google Books and thus don’t end up in the Ngram viewer. But in any case, n-grams measure a word’s frequency relative to all words published in that year, so the fact that the collection isn’t complete should not affect relative word frequencies at all.
- It’s possible that Google Books has systematically missed archiving books oriented towards technology, but why would that be the case? In fact, if tech-savvy publishers are more likely to submit their works to Google Books (which I think is plausible) than your average publisher, the effect should be to increase these words’ frequency.
- It’s possible that, in the absence of the controlled digitization of books from libraries that characterized the early period of Google Books digitization, and the work done to manage metadata in creating the N-Gram Viewer’s early dataset, massive error has crept into the database. But again, why would this affect particularly modern words negatively, while not affecting words whose frequencies have not changed?
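The arithmetic behind the first possibility above can be sketched in a few lines of Python. All counts here are invented for illustration, and the sketch assumes that the books still missing from the collection are no different in kind from the ones already scanned. The helper `relative_frequency` is a hypothetical name, not anything taken from the Ngram Viewer itself.

```python
def relative_frequency(word_count: int, total_tokens: int) -> float:
    """Occurrences of a word divided by all tokens published that year."""
    return word_count / total_tokens


# Hypothetical counts for one publication year, invented for illustration.
full_word, full_total = 2_000, 100_000_000

# Suppose only a quarter of that year's books have been scanned so far,
# and the unscanned books resemble the scanned ones.
scanned_share = 0.25
partial_word = int(full_word * scanned_share)
partial_total = int(full_total * scanned_share)

# Both the numerator and the denominator shrink by the same factor,
# so the relative frequency is unchanged.
print(relative_frequency(full_word, full_total))
print(relative_frequency(partial_word, partial_total))
```

An incomplete collection only distorts relative frequencies if what is missing (or what is added) differs systematically from the rest.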
I think I have a better answer. I think that the N-gram Viewer may be skewed, not because anything significant is being missed, but because something significant is being added. There is a growing tendency for cheap electronic reprints of public domain books to come out and be immediately included in Google Books, with the publication date listed as the date of its electronic reprinting. If Levi Leonard Conant’s book The Number Concept (1896) is scanned and reprinted by Echo Books in 2007, the Google Books metadata doesn’t recognize it as an 1896 book at all. It’s digitized and scanned twice, once (correctly) as an 1896 book and again as a 2007 book. In fact, because it’s in the public domain, I could make my own e-book version for sale as a 2013 book and have it listed again. And while that’s not likely to have a huge effect, imagine every reprint of A Tale of Two Cities or Wuthering Heights that has flooded the market since the invention of e-books, stimulated by and reinforced by projects like Google Books.
Now, I suppose there is a case to be made that the 2007 reprint of Conant is, in some way, a 2007 book. After all, reprints have never been excluded from Google Books and there are plenty of pre-electronic 20th century reprints of Wuthering Heights in the corpus. But each of those earlier reprints represents a costly decision by a publisher that a particular book is important enough and will be read widely enough to warrant its republication. From a ‘culturomics’ perspective, there’s a case to be made that each such reprint really constitutes a cultural ‘signal’ in the year of its reprinting, and from a linguistic perspective, we presume that lots of readers will read the words, no matter if they are obsolescent at the time. But as the cost of producing reprints as e-books (or print-on-demand) declines, the ‘culturomic’ value of these books also declines, because publishers no longer need to be concerned about whether many (or even any) people buy these books. The author is long dead, so there are no royalties, and there are no or minimal up-front publishing costs. So Google Books is now being flooded with material that may be largely unread and does not reflect the linguistic or cultural values of the time. Its primary effect, for the N-gram viewer, is to skew relative word frequencies in a way that makes 2013 resemble 1913 more than it actually does. That’s a conservative bias, for those following along at home.
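To make the direction of that bias concrete, here is a rough Python sketch that treats a year’s corpus as a blend of genuinely new books and reprints of old ones. Every number in it is invented for illustration (the rates are hypothetical occurrences per million words), and `mixed_frequency` is a made-up helper; nothing here is measured from Google Books.

```python
def mixed_frequency(new_rate: float, reprint_rate: float, reprint_share: float) -> float:
    """Rate of a word in a year's corpus when `reprint_share` of that
    year's tokens come from reprints and the rest from new books."""
    return (1 - reprint_share) * new_rate + reprint_share * reprint_rate


# Hypothetical rates per million words: (rate in new books, rate in reprints).
MODERN = (50.0, 0.0)    # a technology term that old books never use
OBSOLETE = (1.0, 40.0)  # a word common a century ago, rare today

for share in (0.0, 0.1, 0.3):
    modern = mixed_frequency(*MODERN, share)
    obsolete = mixed_frequency(*OBSOLETE, share)
    print(f"reprint share {share:.0%}: modern {modern:.1f}, obsolete {obsolete:.1f}")
```

As the reprint share grows, the modern word’s apparent rate falls even though its use in new books has not changed, and the obsolete word’s apparent rate rises. The size of the effect depends entirely on the reprint share, which is exactly the quantity nobody outside Google can easily check.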
We can then derive a couple of corollaries to check if this theory is correct:
- There are likely to be some words that, while still increasing in frequency, do not increase in frequency quite as much as their actual use should indicate. These are words that have shot up out of nowhere over the past few years, and are continuing to accelerate, but their N-gram shows a tapering off. We see a great example of this in a word like transgender, where we see, right around 2004, a clear decline in the acceleration of its frequency, counter to expectations.
- If some word frequencies are artificially depressed, some other word frequencies must be artificially inflated. But which ones? There are likely to be other words that were very common in the 19th and early 20th centuries (the period where most of these reprints are going to come from), but have been on the decline for a long time and are now quite rare, that show an apparent ‘rejuvenation’ after 2004. Again, we find such a word: negro (uncapitalized), which is virtually non-existent in contemporary written English but was at its peak in the period from 1880-1920, and which shows a clear ‘bump’ after 2004 which can’t possibly be real. You can even see this to a lesser degree with a word like honesty, which (for reasons perhaps best left unanalyzed) had been in decline throughout the 20th century but experiences a bump, again, right around 2004.
In summary, because the Google Books corpus today is derived largely from publisher submissions, and because there is a major signal coming from reprints of public domain books published before 1922, n-grams from 2004 onward (and, to a lesser degree, from 2000-2004) are skewed to make modern words appear more infrequent than they actually are, and obsolescent words more common than they are. The moral is not that Google is evil or conservative or that culturomics is stupid or that the N-gram Viewer is fatally flawed. I do think, nonetheless, that we ought to be aware that the specific kinds of unintentional skewing that are being produced are ones that tend, in a conservative direction, to replicate the linguistic and cultural values of a century ago. This problem is not going away, absent a systematic effort to eliminate reprints from some future N-gram dataset, and it may even be getting worse as electronic reprints become more and more common. Stick to the pre-2000 data, though, just like they advise, and you’ll be in good shape.
Thanks to Julia Pope for her consultation and assistance on aspects of Google Books metadata and cataloging practices.
Posted by schrisomalis on July 1, 2013
Since I am, both by vocation and avocation, a word guy, it’s pretty rare for me to learn new English words. Since I am, in particular, a number words guy, it is especially rare for me to learn new English numerical words (my personal all-time favourites are tolfraedic and zenzizenzizenzic, for the record). So imagine my surprise upon reading the latest post from the fantastic Shady Characters blog on punctuation to encounter the word bithorpe, and then after some searching, its cousin quadrathorpe, both of which were new to me.
You won’t find either of these in any dictionary, but you will find them in dark corners of the Internet. You will, however, find octothorpe (also spelled octalthorpe and octothorp), a word that emerged from the folks at Bell Labs in the late 60s / early 70s to refer to the sign #, known to most as pound or number sign or hash(tag). No one is really clear on its etymology, as there are a number of unconvincing competing theories, but it’s reasonably clear that the ‘octo’ is supposed to represent the eight points on the ends of the four lines. And thus, by jocular extension, a quadrathorpe is an equals sign (half an octothorpe) and a bithorpe is a hyphen, with four and two endpoints respectively.
Hoping to procrastinate from other, more important things, I spent some time this afternoon poking around on the origin of these strange terms, and the earliest I could find is this Usenet post from the group misc.misc from April 1989 (i.e., several years before most of us even had email and two years before Al Gore created the internet). Since this list was composed from the results of a survey, someone obviously coined them (in jest) before that time, but probably not much before. This list appears to have spawned many copies (some exact, others less so), almost all of which reproduce the rhetorical (possibly unanswerable) parenthetical question, “So what’s a monothorpe?”
Posted by schrisomalis on June 29, 2013
Well, it only took about 20 minutes for Dan Milton to solve the mystery of the Egyptian stamp: it has four distinct numerical notation systems on it: Western (Hindu-Arabic) numerals, Arabic numerals, Roman numerals, and most prominently but obscurely, the ‘Eye of Horus’ which served, in some instances, as fractional values in the Egyptian hieroglyphs:
At the time I posted it this morning, it was the only postage stamp I knew of to contain four numerical notation systems. (As Frédéric Grosshans quickly noted, however, a few of the stamps of the Indian state of Hyderabad from the late 19th century contain Western, Arabic, Devanagari, and Telugu numerals, and also meet that criterion, although all four of those systems are closely related to one another, whereas the Roman numerals and the Egyptian fractional numerals are not closely related to the Western or Arabic systems.) So that’s kind of neat. I have a little collection of stamps with weird numerical systems (like Ethiopic or Brahmi), multiple numeral systems (like the above), unorthodox Roman numerals (Pot 1999), etc., and am looking to expand it, since it is a fairly delimited set and, as a pretty odd basis for a collection, isn’t going to break the bank. In case I have any fans who are looking for a cheap present for me. Just sayin’ …
We in the West tend to take for granted, today, that really there is only one numerical system worthy of attention, the Western or Hindu-Arabic system, which is normatively universal and standardized throughout the world. We also tend to feel the same way about, for instance, the Gregorian calendar. That’s a little sad but not that surprising. But we also take it for granted that, in general, throughout history, each speech community has only one set of number words, one script, and one associated numerical notation system. Of course, a moment’s reflection shows us this isn’t true: virtually any academic book still has its prefatory material paginated in Roman numerals, not to mention that we use Roman numerals for enumerating things we consider important or prestigious, like kings, popes, Super Bowls, and ophthalmological congresses. And this is not to mention other systems like binary, hexadecimal, or the fascinating colour-based system for indicating the resistance value of resistors. I’ve complained elsewhere that we put too much emphasis on comparing one system’s structure negatively against another, but to turn it around, we should ask what positive social, cognitive, or technical values are served by having multiple systems available for use.
We need to be more aware that the simultaneous use of multiple scripts, and multiple numeral systems, in a given society is not particularly anomalous. In Numerical Notation (Chrisomalis 2010), I structured the book system by system, rather than society by society, which helps to outline the structure and history of each individual representational tradition, and to organize them into phylogenies or families. But one of the potential pitfalls of this approach is that it de-emphasizes the coexistence of systems and their use by the same individuals at the same time, under-stressing how these are actually used and how often they overlap. Just as sociolinguists have increasingly recognized the value of register choice within speech communities, we ought to think about script choice (Sebba 2009) in the same way. With numerals, we also have the choice to not use number symbols at all but instead to write them out lexically, which then raises further questions (is it two thousand thirteen or twenty thirteen?) – many languages have parallel numeral systems (Ahlers 2012; Bender and Beller 2007). We need to get over the idea that it is natural or good or even typical for a society to have a single language with a single script and a single numerical system, because in fact that’s the exception rather than the norm.
The stamp above is a quadrilingual text (French, Arabic, Latin, Egyptian) in three scripts (Roman, Arabic, hieroglyphic) and four numerical notations (Western, Arabic, Roman, Egyptian). We should think about the difficulty of composing and designing such a linguistically complex text – it really is impressive in its own right. We should also reflect on the social context in which the language of a colonizer (French), the language of the populace (Arabic), and two consciously archaic languages (Latin and Egyptian), and their corresponding notations, evoke a complex history in a single text. Once we start to become aware of the frequency of multiple languages, scripts, and numeral systems within a single social context, we have taken an important step towards analyzing social and linguistic variation in these traditions.
Ahlers, Jocelyn C. 2012. “Two Eights Make Sixteen Beads: Historical and Contemporary Ethnography in Language Revitalization.” International Journal of American Linguistics no. 78 (4):533-555.
Bender, Andrea, and Sieghard Beller. 2007. “Counting in Tongan: The traditional number systems and their cognitive implications.” Journal of Cognition and Culture no. 7 (3-4):3-4.
Chrisomalis, Stephen. 2010. Numerical Notation: A Comparative History. New York: Cambridge University Press.
Pot, Hessel. 1999. “Roman numerals.” The Mathematical Intelligencer no. 21 (3):80.
Sebba, Mark. 2009. “Sociolinguistic approaches to writing systems research.” Writing Systems Research no. 1 (1):35-49.
Posted by schrisomalis on June 28, 2013
To the best of my knowledge, this Egyptian postage stamp, along with the two other denominations in the same 1937 series (5 mills and 15 mills), are unique in a very specific way. My puzzle to you is: what makes these stamps so special?
Place your guess by commenting below (one guess per person). If you are the respondent with the correct answer, your ‘prize’ is that you may ask me any question relating to the themes of this blog and I will write a separate post on that subject. Happy hunting!
Edit: Well, that didn’t take long. In just over 20 minutes, Dan Milton successfully determined the answer. In case you still want to figure it out on your own, I won’t post the answer here in the main post, but you can find it in the comments if you’re stumped. I will follow up with some analysis later.
Posted by schrisomalis on June 24, 2013
This week has seen a bumper crop of news stories about a new piece of research in PLOS ONE by Marcelo Montemurro and Damian Zanette, who are both physicists who specialize in complex systems. The paper in question is not about physics, however, but argues that the mysterious Voynich Manuscript has properties that suggest that it has language-like structure, based on an information-theoretic analysis of the structure of its words. If correct, while this is certainly not a ‘decipherment’, this result would be counter-evidence to certain versions of the theory that the VM is a medieval hoax that is undecipherable because it is pseudo-writing, meant to have the appearance of language but having no decipherable content in any natural language.
Now, I am not a specialist in information theory, and I’m not truly a specialist on the Voynich Manuscript (although I have played one on TV), but I am a linguist and I do research on writing systems and allied representation systems like written numerals. And several things bother me about this paper. The first is that, as Gordon Rugg (the most significant modern proponent of a ‘hoax’ theory) has pointed out in a comment on the new paper, no one is seriously claiming that the VM is pure ‘noise’ – it clearly is structured, and simply because the VM has some structure, even one that resembles language in some ways, does not entail that it is likely to have a genuine linguistic structure, much less a decipherable one. Rugg’s own (plausible) theory involves the use of a medieval ciphering system to rapidly produce language-like but meaningless text as part of a hoax, and Montemurro and Zanette have not evaluated this theory at all, as far as I can see, other than to dismiss it.
Furthermore, the only systems to which the VM is compared are two written languages in alphabetic scripts (English and Latin), one written language with a non-alphabetic script (Chinese), one computer language (Fortran), and one natural sequence (yeast DNA). But there are a wide variety of nonlinguistic, quasilinguistic, and paralinguistic phenomena aside from these, and they haven’t compared the VM to any of them. Montemurro and Zanette show conclusively that the VM has much more ‘information’ (structure) than the yeast DNA, which we would anticipate, but do not do a good job of accounting for the different types of encoded information, and structured non-information, which might be comparable to the VM. What is the information structure of known codes and ciphers (both broken ones and undeciphered ones)? What is the information structure of semasiographic systems like the glyphic system at Teotihuacan? What is the information structure of the linguistic productions of psychiatric patients who suffer from graphomania? What is the information structure of pseudo-writing like the Codex Seraphinianus which we know (since it’s a modern piece of conceptual art) carries no message? None of these comparisons would be conclusive but all of them would be informative. Right now the range of systems to which Montemurro and Zanette have compared the Voynich is simply too limited to be useful.
Montemurro and Zanette are also seemingly unaware of parallel efforts to use the information structure of undeciphered scripts to evaluate their language-like nature. Two of the most significant such efforts are the effort to show that Iron Age Pictish graphic symbols from Scotland constituted a phonetic script (Lee, Jonathan and Ziman 2010) and efforts to show that the Indus script of Harappan-period India and Pakistan either does (Rao et al 2009) or does not (Farmer, Sproat and Witzel 2004) resemble linguistically-based writing systems. These theories have attracted a reasonable degree of attention from linguists, and Richard Sproat, in particular, has done a lot of work trying to address the non-linguists’ methodological and conceptual approaches, some of which has been covered in extraordinary detail at the Language Log. There’s a much longer discussion to be had there, but suffice it to say that most linguists are skeptical of studies undertaken without any linguistic expertise and assistance. Again, without taking a position on any of these controversies, it strikes me as irresponsible literature-searching that the Montemurro and Zanette study is so fundamentally unaware of similar efforts in major publications such as Science and the Proceedings of the Royal Society. If you’re going to use physics to study written language, even if you’re going to ignore every single linguist who’s written on these subjects, maybe you should at least be aware of high-impact articles written in the last ten years by physicists using very similar methods to your own.
For the record, I think that any information-based effort that does not involve linguists at a serious level is likely to make invalid assumptions and thus be highly prone to producing nice-looking gibberish. For example, the Montemurro/Zanette theory seems to grant that the VM probably does not encode information alphabetically like English, and then suggests instead that it recalls “scripts where -as in the cases of Chinese and hierographical Ancient Egyptian- the graphical form of words directly derives from their meaning.” (Montemurro and Zanette 2013: 4). Let’s assume we are prepared to set aside their use of the term hierographical, which is a bizarre nineteenth-century anachronism that was vaguely popular for a time prior to the decipherment of the Rosetta Stone, but which has never, in any European language, been a preferred term. More significantly, it is a gross, entirely improper characterization of Chinese and Egyptian to argue that the “form of words directly derives from their meaning”. Both scripts have massive phonographic components with some representation of morphemes, words, and semantic categories with signs, as every expert on writing systems has known for thirty years or more – certainly the work of John DeFrancis shows this eminently clearly. Even lumping the Egyptian hieroglyphic and Chinese scripts together in a single category ignores the massive differences between them. So in essence, Montemurro and Zanette seem to be suggesting that the VM has properties similar to no writing system ever known to have been used on earth, because they do not seem to know what sorts of writing systems they are comparing things to.
In short, I’m afraid what we have here is another case of non-specialists applying the methods of one field inappropriately to some actually complex linguistics problems to evaluate a text whose decipherers (a group riddled with charlatans and cranks) have offered us everything except an actual decipherment.
Posted by schrisomalis on June 20, 2013
Thanks to all those, either in the comments or elsewhere, who helped with additional suggestions for my Lexiculture project for my undergraduate course this fall. I now have over 50 words on my long-list for the students to choose from, which should be enough, but more ideas are, of course, welcome, especially if I decide I want to assign this project multiple years.
Posted by schrisomalis on June 19, 2013
Combining two of my favourite passions, mathematics and linguistics, in a fascinating social analysis of prescriptivism, national identity, and scientific vocabulary, is this video from the Numberphile Youtube channel, entitled, ‘Is it Math or Maths?’
Numberphile regularly features short, popular videos about interesting mathematical stuff, mostly at a layperson’s level. This video features Dr Lynne Murphy, who teaches lexical semantics at the University of Sussex, and blogs about American/British English differences at Separated by a Common Language.
For the record, as a Canadian, I say ‘math’ but I also say ‘zed’, because that’s the way we roll.