Conservative skewing in Google N-gram frequencies

The Google Ngram Viewer is a great tool, especially for rough-and-ready searching and visualization of linguistic trends, and as a teaching tool to introduce students to lots of interesting questions we can ask about language variation and patterning. I use it all the time. The default search range is 1800–2000, and the Culturomics project notes that “the best data is the data for English between 1800 and 2000. Before 1800, there aren’t enough books to reliably quantify many of the queries that first come to mind; after 2000, the corpus composition undergoes subtle changes around the time of the inception of the Google Books project.” Elsewhere, the Culturomics FAQ notes that “Before 2000, most of the books in Google Books come from library holdings. But when the Google Books project started back in 2004, Google started receiving lots of books from publishers. This dramatically affects the composition of the corpus in recent years and is why our paper doesn’t use any data from after 2000.”

OK, so we’ve been warned that the data from before 2000 is very different from the data from after 2000, and especially that 2004 marked a significant change in the corpus. Caveat lector, or whatever you will. But I want to know: in what ways have these ‘subtle changes’ altered the Google N-gram corpus, and therefore, what biases in word frequencies do scholars of language need to account for?

Lately I’ve become interested in post-2000 changes in word frequencies for my Lexiculture class project this fall, so I’ve been looking at N-gram data going up to 2008 (the last year you can search). I have found some very weird declines in words that probably aren’t actually declining in relative frequency:

It seems notable that all of these words start to decline shortly after 2000, with a particularly steep decline around 2004–05. All of these words, I would argue, should be stable or increasing in frequency: they are words associated with modern technology and social life. By contrast, many timeless words (e.g., table, lamp, daughter) are flat or rising after 2000. It’s possible that our intuitions about what should be happening to words are wrong. But why are they all wrong in the same direction, and why do these words all decline at the same time?

– One possibility is that the data from 2000 onward aren’t complete yet. There could be some books published over the past few years that haven’t yet been integrated into Google Books and thus don’t end up in the Ngram Viewer. But n-grams measure a word’s frequency relative to all words published in that year, so an incomplete collection should not, by itself, affect relative word frequencies, unless the missing books differ systematically from the ones that are included.

– It’s possible that Google Books has systematically missed archiving books oriented towards technology, but why would that be the case? In fact, if tech-savvy publishers are more likely than the average publisher to submit their works to Google Books (which I think is plausible), the effect should be to increase these words’ frequencies.

– It’s possible that, without the controlled digitization of library books that characterized the early period of Google Books, and the metadata work that went into building the N-gram Viewer’s early dataset, massive errors have crept into the database. But again, why would this depress particularly modern words, while leaving other words’ frequencies unaffected?
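The first possibility above rests on a simple ratio. As a worked sketch, with notation introduced here only for illustration: let c_w(t) be the number of occurrences of word w in books dated to year t, and N(t) the total number of words in those books. If a fraction p of that year’s books is missing, and the missing books resemble the included ones (an assumption, not something the corpus guarantees), both counts shrink by roughly the same factor and the ratio survives:

```latex
f_w(t) = \frac{c_w(t)}{N(t)}, \qquad
\hat{f}_w(t) \approx \frac{(1-p)\,c_w(t)}{(1-p)\,N(t)} = f_w(t)
```

The cancellation fails only when the missing (or added) books are systematically different from the rest, which is what the other two possibilities are probing.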

I think I have a better answer. I think that the N-gram Viewer may be skewed, not because anything significant is being missed, but because something significant is being added. There is a growing tendency for cheap electronic reprints of public domain books to come out and be immediately included in Google Books, with the publication date listed as the date of their electronic reprinting. If Levi Leonard Conant’s book The Number Concept (1896) is scanned and reprinted by Echo Books in 2007, the Google Books metadata doesn’t recognize the reprint as an 1896 book at all. The same text is digitized twice, once (correctly) as an 1896 book and again as a 2007 book. In fact, because it’s in the public domain, I could make my own e-book version for sale as a 2013 book and have it listed again. And while any single reprint is not likely to have a huge effect, imagine every reprint of A Tale of Two Cities or Wuthering Heights that has flooded the market since the invention of e-books, stimulated and reinforced by projects like Google Books.

Now, I suppose there is a case to be made that the 2007 reprint of Conant is, in some way, a 2007 book. After all, reprints have never been excluded from Google Books, and there are plenty of pre-electronic 20th-century reprints of Wuthering Heights in the corpus. But each of those earlier reprints represents a costly decision by a publisher that a particular book is important enough, and will be read widely enough, to warrant republication. From a ‘culturomics’ perspective, there’s a case to be made that these reprints really do constitute a cultural ‘signal’ in the year of their reprinting, and from a linguistic perspective, we presume that lots of readers will read the words, even if they are obsolescent at the time. But as the cost of producing reprints as e-books (or print-on-demand) declines, the ‘culturomic’ value of these books also declines, because publishers no longer need to be concerned about whether many (or even any) people buy them. The author is long dead, so there are no royalties, and there are minimal or no up-front publishing costs. So Google Books is now being flooded with material that may be largely unread and does not reflect the linguistic or cultural values of the time. Its primary effect, for the N-gram Viewer, is to skew relative word frequencies in a way that makes 2013 resemble 1913 more than it actually does. That’s a conservative bias, for those following along at home.
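A minimal sketch of the mechanism, using invented toy counts rather than real Google data: if R tokens of reprinted older text are filed under a year alongside N tokens of genuinely new text, a word’s apparent frequency becomes (c_w + r_w)/(N + R), a weighted average of its frequency in new text and its frequency in the old reprints. The function names and numbers below are hypothetical, chosen only to show the direction of the effect.

```python
from fractions import Fraction


def relative_frequency(count: int, total: int) -> Fraction:
    """Share of all tokens in one year that are a given word (exact fraction)."""
    return Fraction(count, total)


def mixed_frequency(new_count: int, new_total: int,
                    reprint_count: int, reprint_total: int) -> Fraction:
    """Apparent frequency once reprinted older text is filed under the same year."""
    return Fraction(new_count + reprint_count, new_total + reprint_total)


# Toy corpus sizes (invented for illustration, not Google data):
# 1,000,000 tokens of genuinely new text, plus 500,000 tokens of
# public-domain reprints dated to the same year.
NEW_TOTAL = 1_000_000
REPRINT_TOTAL = 500_000

# A "modern" word: 100 uses in new text, none in the old reprints.
modern_before = relative_frequency(100, NEW_TOTAL)
modern_after = mixed_frequency(100, NEW_TOTAL, 0, REPRINT_TOTAL)

# An "obsolescent" word: 1 use in new text, 200 in the old reprints.
old_before = relative_frequency(1, NEW_TOTAL)
old_after = mixed_frequency(1, NEW_TOTAL, 200, REPRINT_TOTAL)

for label, before, after in [("modern", modern_before, modern_after),
                             ("obsolescent", old_before, old_after)]:
    print(f"{label}: {float(before):.2e} -> {float(after):.2e} per token")
```

With these toy numbers, reprints make up a third of the year’s tokens, so the modern word falls to two-thirds of its true share even though nobody used it less, while the obsolescent word is inflated by reprint text alone. How large the real effect is depends entirely on the actual share of reprints in a given year, which this sketch does not estimate.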

We can then derive a couple of corollaries to check if this theory is correct:

– There are likely to be some words that, while still increasing in frequency, do not increase in frequency quite as much as their actual use should indicate.  These are words that have shot up out of nowhere over the past few years, and are continuing to accelerate, but their N-gram shows a tapering off.  We see a great example of this in a word like transgender, where we see, right around 2004, a clear decline in the acceleration of its frequency, counter to expectations.

– If some word frequencies are artificially depressed, others must be artificially inflated. But which ones? There are likely to be words that were very common in the 19th and early 20th centuries (the period from which most of these reprints are likely to come), that have been in decline for a long time and are now quite rare, but that show an apparent ‘rejuvenation’ after 2004. Again, we find such a word: negro (uncapitalized), which is virtually non-existent in contemporary written English but peaked in the period 1880–1920, and which shows a clear ‘bump’ after 2004 that can’t possibly be real. You can even see this to a lesser degree with a word like honesty, which (for reasons perhaps best left unanalyzed) had been in decline throughout the 20th century but experiences a bump, again, right around 2004.

In summary, because the Google Books corpus today is derived largely from publisher submissions, and because there is a major signal coming from reprints of public domain books published before 1922, n-grams from 2004 onward (and, to a lesser degree, from 2000–2004) are skewed to make modern words appear less frequent than they actually are, and obsolescent words more common than they are. The moral is not that Google is evil or conservative, or that culturomics is stupid, or that the N-gram Viewer is fatally flawed. I do think, nonetheless, that we ought to be aware that the specific kinds of unintentional skewing being produced tend, in a conservative direction, to replicate the linguistic and cultural values of a century ago. This problem is not going away, absent a systematic effort to eliminate reprints from some future N-gram dataset, and it may even be getting worse as electronic reprints become more and more common. Stick to the pre-2000 data, though, just as they advise, and you’ll be in good shape.

Thanks to Julia Pope for her consultation and assistance on aspects of Google Books metadata and cataloging practices.


Michael Sandel, MOOCs, and the public good: On the justice of Justice


Michael Sandel is one of the world’s most prominent political philosophers. He has taken part in some of the most profound scholarly debates about justice for over thirty years, engaging with such prominent figures as John Rawls and Robert Nozick. I’ve been reading Sandel since I was an undergraduate twenty years ago, and am extremely sympathetic to his argument against a purely rights-based conception of justice. His latest book, What Money Can’t Buy, stresses the importance of non-market values in assessing justice, and argues that commodification fundamentally changes the nature of the thing being exchanged (this video expresses his position well). While I’m not exactly a communitarian, neither, precisely, is he, and I suspect that we agree on an awful lot.

Since 1980, Sandel has been teaching a large course at Harvard entitled Justice. It’s a fantastic course, and one of the most popular undergraduate courses there. It is, like many courses in political philosophy, lecture interspersed with Socratic discussion, and is pretty traditional in that respect. In 2009, he made the entirety of one set of lectures for Justice (12 hours total) available online and on PBS. So basically, you can go to the Justice Harvard website, or just to YouTube, and watch the lectures all the way through, for free. This is, in my view, a wonderful use of Sandel’s time and energy and a real contribution to public intellectual life.

edX is a non-profit provider of massive open online courses (MOOCs), run by MIT and Harvard. It is not a for-profit venture like Coursera or Udacity, and it does not directly provide credit-bearing courses at any institution. It releases course materials such as video lectures online, free to anyone, and also provides free (for now) ‘certificates of mastery’ to those who complete some online multiple-choice tests. So a high school junior who’s keen on politics or philosophy, for instance, can register and get a huge head start on college-level courses. Anyone can follow along for their own personal enrichment, and any instructor can choose to use these videos as part of course content.

In March 2013, edX started to offer ER22x (JusticeX) online for free. Basically, this consists of the entirety of the previously released video lectures, as well as a set of five multiple-choice quizzes (5 questions), a 25-question multiple-choice exam, a discussion board, poll questions, and a couple of other features. edX also forms agreements licensing its content to other nonprofit institutions (like universities) for use in credit-bearing courses; in other words, it allows other schools to integrate ER22x or other MOOCs into existing curricula. So edX recoups its costs, and other schools get access to this high-quality content, or so the theory goes.

In May 2013, a controversy arose when San Jose State University administrators announced that they had signed a deal with edX to license JusticeX for SJSU philosophy students. Administrators then asked the philosophy department at SJSU to take part in a program in which the equivalent course would use content from ER22x. It was stressed at the time that no individual faculty member would be compelled to use JusticeX in part or in whole – faculty would retain academic freedom to that degree. At the same time, it was clear that the goal of this contract was cost-saving: SJSU is putting a lot of money into this and will only see a return if it can teach larger numbers of students than previously, with fewer faculty resources, in this and other courses. So the idea that this was simply going to be a free choice for faculty to use or not use this resource is not quite right either.

In response, the SJSU philosophy faculty did two things. First, they declined to participate in the use of ER22x in their teaching. Second, they wrote an impassioned letter addressed directly to Michael Sandel, and more broadly to the public, explaining their principled refusal on several grounds. It’s clear and well written and deserves close attention. Sandel responded with a short but unequivocal letter indicating his support for the SJSU philosophers’ right to use his materials, or not, as they saw fit, and recognizing the potential for MOOCs to damage higher education, but also defending his own right to distribute his material through edX. The chair of the SJSU philosophy department has now indicated that ER22x materials will apparently be used by the English department, although how that could work is entirely unclear to me and, I suspect, to anyone else.

An Intervention

I teach at an institution that in some ways (but not others) is like San Jose State: a public urban university with lots of minority students in a cash-strapped state. On that basis, how this debate proceeds is obviously a matter of interest to me. I also think that, on general principles of fairness and concern for the public good, anyone can reasonably take an interest in how MOOCs like ER22x are integrated into higher education, and particularly public higher education.

It occurred to me, upon hearing about the SJSU / Justice controversy, and reading the SJSU philosophy department’s response, that this is indeed a question of justice, one that can be addressed directly within the terms raised by moral and political philosophy, and specifically the main issues addressed by Sandel himself in his Justice lectures.  While I am not a philosopher (of any stripe), I have certainly read in that field to some degree.  But I also recognize that I am not an undergraduate, and that I have never taken any online course (massive or otherwise), so I am writing, to some degree, from a position of ignorance.

Thus, in the interest of not spouting off mindlessly on that about which I do not know, and because I have a longstanding interest in the subject matter, and because (to be honest) I found myself with a light schedule this past month, I decided that I would go back to school (minimally, at least), register for ER22x, watch the videos, and take the quizzes and the exam. I have now done that.

Now what I’d like to do is to discuss ER22x, and the SJSU controversy, in light of some approaches that we might take to the key questions of justice involved, following the structure of Sandel’s Justice lectures themselves, using the sequential framework which he uses with his own students.  Ultimately I’m going to contend that there are several independent grounds for thinking that it is a bad idea for colleges to make these sorts of deals.

Preliminary Thoughts

First, even if you watch all 12 hours of Justice online, you are not getting a full course’s worth of lecture material. Many university-level courses are three credit-hours, which means (supposedly) three hours of class time per week over a thirteen-week term, or 39 hours on paper. Let’s put it at 35 hours, which is roughly what most of us (including me) actually teach in an average course. So we note, from a purely arithmetical standpoint, that 12 is not 35. At Harvard, there are tutorial sessions, taught by graduate students, to supplement the lectures, and there is also a discussion forum/blog. Any course using the licensed material is going to need to do something else, so no one is talking about just having students watch videos and take multiple-choice tests for credit.

Second, other than the 12 hours of videos, the additional material offered by edX is sparse, to say the least. Minimally monitored discussion boards, canned ‘poll questions’, short multiple-choice exams, and weekly ‘forum digests’ constitute the value that edX has added to the free, openly available videos that have been around since 2009. It’s not clear how that sort of material would ever be used by faculty at SJSU or anywhere else. At best, it duplicates things that any university already has a license for (like Blackboard), or can easily and freely reproduce. So basically what this contract does is turn something that is already freely available into a market commodity.

Third, and let me be very clear: Michael Sandel is a fantastic teacher and thinker whose presentation in Justice is outstanding.  I suppose that is part of the argument – that in some way, this brings one of America’s most popular lecturers, who also happens to be an exceptionally clear-headed scholar, to a massive audience.  It’s clear and well-presented, working towards a general argument presented in the final lecture of the series.  Sandel engages seriously with his students without interrupting them, but also keeps careful control of what (by the nature of the subject) could get out of hand.  The class is huge (something like 1000 in the room!) but he still learns quite a few names over the term.  He’s funny without devolving into a madcap style.  I learned a lot from it, because Sandel puts together the contrasts between thinkers in some novel ways.

Fourth, I am taking it as a given that Michael Sandel is not directly receiving any financial benefit when edX sells his content to other institutions. I don’t have any direct knowledge of the matter, but I take him at his word that he doesn’t want to put other faculty out of a job, and obviously if he had such a stake in the enterprise, he would be inconsistent in making that claim. So let’s assume that edX is charging SJSU and, as a nonprofit, is recouping its costs, but that no one at Harvard or edX is making a direct financial profit from this deal.

So I am starting from these premises and asking you to accept them as facts in evidence.  I am going to now proceed to make the case that there are serious grounds on which to suspect that this is, or has a high potential to be, a deeply problematic scenario.

An Argument from Utility

Sandel starts his course with the famous ‘trolley problem’ and a fascinating discussion of utilitarianism, following Bentham. Now, I’m not an especially good utilitarian, but I acknowledge (along with virtually everyone) that questions of utility do, indeed, come into play in matters of justice.  So let’s just ask, at a very basic level, who benefits and who is harmed, and how can we weigh these against one another?  This list is not meant to be exhaustive.

–    SJSU students might benefit from having the lectures of Michael Sandel as the basis for their coursework in philosophy;
–    edX clearly benefits from the licensing of the material (they are nonprofit but do need to recoup costs);
–    Sandel himself might benefit from greater exposure (which could ultimately lead to lucrative speaking engagements, etc.);
–    SJSU faculty might benefit from not having to construct their own lectures;
–    SJSU administrators might save money for the institution because they can have larger enrollments.

–    SJSU philosophy students might be harmed by not having ‘their own’ teachers – by the lack of engagement with the person delivering the course content;
–    SJSU faculty might be harmed if their academic freedom were compromised; they might be replaced by graduate students or adjuncts or ‘facilitators’, or they might be denied pay increases or other privileges;
–    SJSU as an institution pays edX and these costs, if not recouped in some way, would force cuts elsewhere, which would also harm other faculty and students;
–    Other departments at SJSU might be harmed if their enrollments were to drop due to the use of ER22x as a credit-bearing course;
–    Sandel’s reputation could suffer if he is perceived (rightly or wrongly) as a crass commercialist cashing in on his fame, or as a narcissist;
–    The discipline of philosophy might be harmed if Sandel’s lectures became normative throughout many institutions, due to reduced intellectual diversity.

We could go on for a very long time trying to weigh these benefits and harms, and I think that reasonable people could disagree on their relative weights; in any event, I do not think we know, fully, what the consequences are or would be. We (including Sandel himself) certainly could not have predicted them in detail when ER22x was created. Right now, all that has happened is that a single department was invited to use this licensed material, and it said no. We certainly do need to think about potential consequences, and we are right to be concerned about the harms, but the details matter so much that it is very difficult to proceed from this point using utility alone. We need more than some sort of hypothetical cost-benefit analysis to decide whether this is a morally just course of action.

An Argument from Liberty

Now, if I’m not a good utilitarian, I’m particularly not a good libertarian, in that I see liberty principally as a means to good ends decided by some other criteria, and thus not universally valuable. In his Justice lectures, and in his own scholarship, Sandel pays close attention to the pure libertarian arguments of people like Nozick, and the more nuanced, rights-based approaches of people like Rawls. Any positive argument for the use of edX is clearly going to need to invoke market principles and individual choice, so I need to pay it some attention. In the case of the JusticeX controversy, I think this is, if not a morally wrong market-based decision, a bone-headed, idiotic decision by SJSU.

The simplest reason that the decision is stupid is that, by anyone’s reckoning, the principal value of this deal lies in Sandel’s lectures. And these are all available for free, for students, for faculty, or for anyone else. They have millions of views on YouTube. And given that edX really isn’t providing much more than the lectures, it is a rather foolish decision for SJSU to pay to license this content for its own use, when it doesn’t have to do that at all. Any professor at SJSU could have decided, at any time, to assign the Justice lectures as required viewing, without anyone having to pay anything. Because they did not choose to do so, we can conclude (from this perspective) that they decided it was not a good idea.

I am confident that the terms of employment of SJSU faculty grant them this academic freedom without any significant qualification, and so, given that this contract was entered into freely, it is illegitimate for SJSU to compel faculty, against this agreement, to use any particular materials.  SJSU can of course spend its money however it wants, including to pay edX, but having done so without consulting its own philosophers, or asking them why they chose not to use the material in the first place, they can hardly be surprised when their purchase is viewed as foolhardy and ill-advised, and they can hardly be disappointed that their philosophers continue to choose not to use the edX material that they had been choosing not to use all along.

Of course, in actuality SJSU has not compelled anyone to do anything (yet?), and has couched its decision in terms of providing a new resource (in the same way that they might subscribe to a journal, or purchase access to a research database).  Where SJSU faculty see a difficulty is in the informal processes of power that exist within institutions, the potential for informal coercion either now or in the future, and the broader risk to what they see as a great social good in the current framework of learning in higher education.  Libertarian political philosophy doesn’t have much to say about those sorts of issues, so let’s turn to them now from different perspectives.

An Argument from Telos

In lecture 20, ‘Aristotle: Freedom vs. fit’, Sandel begins to outline a conception of justice that focuses on the good rather than the right. Ultimately he’s going to use that in favour of a conception that integrates rights and virtue – this is his central contribution to the field. He notes that debates over justice frequently cannot be resolved solely by reference to some generic abstract right, but also require some conception of what is good. Following Aristotle (and others), he insists that we think seriously about the purpose or goal (Greek telos) of things and institutions as part of our discussion of what is right. I agree.

He illustrates this with the example of the golfer Casey Martin, who due to a disability could not walk the course and asked the PGA for permission to use a golf cart. After the PGA denied his request, he took it to court, and eventually won at the Supreme Court. Sandel uses this example to illustrate that deciding what was right in this case required the court to make a judgement about what is essential to the game of golf, and what is extrinsic or irrelevant. Is walking the course essential to the game? If so, then part of what determines the distribution of honours in the game must be one’s ability to walk the course. If not, then not. But not answering the question isn’t really an option.

In a similar vein, we could ask: what is the purpose or essential nature of a professor?  If lecturing is the critical element in question, one could argue that any set of lectures will do, and that Sandel’s lectures are better than most, and certainly have demonstrated appeal, so why not adopt them?  Under this logic, SJSU professors who decide not to adopt these lectures are merely being self-interested and protecting their own jobs to the detriment of their students.  But if, in contrast, the role of professor is centrally about engaging students with one’s own views, and professing those views, then replacing a professor’s lectures with canned material is improper.  And similarly, if a student’s role is centrally passive, then it doesn’t much matter if the Socratic responses are given by SJSU students or Harvard students (from 2007).  But if actual engagement with the ideas is centrally important, then SJSU students are getting a very different learning experience than their Harvard peers.

Most faculty who I talk to insist, rightly, that their research makes their teaching better and vice versa.  Both the SJSU letter and Sandel’s response stress the importance of engagement with students, and this is, I believe, entirely correct.  Now, we can imagine a course that uses some Justice content in the same way that one might use, say, a textbook or a blog post as required reading, but in which students still meet three hours a week with full-time faculty, engage with different sorts of philosophical ideas, and write papers.  This is the sort of use which Sandel seems to find acceptable in his response, and I agree.  Although ER22x is not in my field, I’d strongly consider a course where I play the foil to some prominent figure’s video lectures in anthropology or linguistics, working as hard as I can to present opposing views, much as team-taught courses often rely on debate between instructors.  But the only scenario in which I see administrators having a real interest in me doing so is one in which they simultaneously are seeking to limit my engagement with students, e.g. by increasing class sizes or decreasing other resources.  I welcome the possibility that I may be wrong.

Now of course, one might argue that the university (which hires and pays faculty) and students (who pay tuition) are the only ones who can decide the nature and purpose of a professor, or of higher education in general. Such a thinker would also have to agree that if what students demand is cheap canned content, tested through multiple-choice exams, that provides a credential, and universities want to supply it because it’s cheap, there are no grounds to question those choices. We see a hint of this in this case already; this article on edX’s business model (referring to a different course) notes that “The early results were encouraging. The semester before the edX pilot, 60 percent of students passed the San Jose State course; 91 percent passed the edX-infused version.” Students are happy, the institution is happy. But is happiness the relevant consideration? I can guarantee you that I can raise my pass rate for any course very easily, by watering down content or testing less rigorously. My student evaluations will also go up, which could benefit me financially, since my raises are based in part on excellence in teaching. And my (contractually guaranteed) academic freedom gives me the right to do so. Even so, I suspect that no one is going to argue that this would be the right decision.

On similar grounds, I argue that SJSU administrators have an obligation to ensure that the content they are purchasing from edX actually improves learning regardless of their cost savings or student interest.  That they have not done so – for instance, by closely working with philosophy faculty to see what needs they have –  suggests that they are improperly thinking only about factors that are not essential to the teaching of philosophy at their institution.  I’m not saying that they can’t think about costs, but the only way that this scenario would actually save money is by taking away a central and essential part of the mission of the teaching of philosophy at their institution.

An Argument from Sandel

As I mentioned at the beginning of this essay, Michael Sandel’s special intervention in contemporary moral and political philosophy is to emphasize the difficulties created by over-commodification, and to seek limits on where markets should and should not intervene. He argues (and here I am not talking principally about Justice, but about his more recent scholarship) that we not only have a market economy, but are turning into a market society, where even asking whether something should be turned into a commodity is seen as an improper or unaskable question. It is taken as natural that market transactions are the only proper way to proceed. He questions this using a series of key examples of entities and relationships that should not be commodified.

I contend that the professor-student relationship is exactly one of those relationships. The reason that many faculty object to the ‘customer is always right’ model of education is that it produces an expectation that a professor is providing a specific product in exchange for money. It suggests that if a student doesn’t learn, or isn’t happy with what they have learned, or doesn’t receive the ‘correct’ product (grade) for their perceived effort, then there is a problem with the teacher or the course. From time to time I have at least tried a different analogy, which is that a professor is like a personal trainer (insofar as the product depends largely on the customer’s effort), but even then, when I say things like that, I still fall into the same trap of treating the relationship as principally a commercial one. This is exactly the sort of logic that Sandel’s work rejects, and he is right to do so.

Similarly, we would rightly reject a world where you could pay your professors extra for extra teaching time. Slip me a twenty, and I’ll meet you after class for some extra tutoring. Of course professors get paid for the work they do, which includes teaching, and students pay tuition based on the courses they take. But we keep professors’ work at a remove from the commercial transaction out of the recognition that some greater good is served by doing so. From a consequentialist view, directly paying professors for their time deprives poor students of access to extra tutoring and thus produces an unfair playing field.

It also, I think Sandel would argue, devalues the professor’s intellectual work of teaching by treating it as work for hire. This is not some sort of highfalutin appeal to the wonderful nature of professors.  Some professors, quite frankly, stink.  Some of them recycle old lectures.  Some of them harass their students.  And, of course, we all get paid.  The idea that the student has no entitlements whatsoever is even more nonsensical than the consumer model presented above.  But, in a fundamental way, turning to purchased course content devalues that content, and also the work that actual professors do each and every day.  It produces an environment where professors are constantly at the beck and call of incentives – financial ones, and also the sorts of pressures that adding this content creates within an institutional framework.

Sandel also notes (again, rightly, in my view) that because we are not social atoms, but carry around with us a set of experiences, obligations, duties, identities, and other qualities, there are real risks entailed by social isolation from those whose experiences differ from ours.  In particular, he argues that growing isolation and inequality between social classes threaten American democracy because they make meaningful exchanges between, for instance, rich and poor increasingly difficult.  But this is exactly what SJSU philosophers fear will happen in the future: elite universities will provide high-quality instruction to a select few, while the vast majority of students (right now 80% of all students attend public colleges and universities) increasingly learn from canned content with glorified teaching assistants leading discussion groups, and with the resulting credentials being viewed as radically different in nature.  I grant that this is not a vision that has come to fruition yet, but anyone who works within public higher education is clearly aware of these sorts of pressures to save money, to claim that sacrifices are necessary and then to offload those sacrifices onto students.  It is not an irrational fear, and it is one that the SJSU faculty are right to be concerned about when their institution claims, without evidence, that this purchased commercial material is going to be good for their students.

If I were to meet Michael Sandel on the street, I would ask him, “Aren’t you concerned that in monetizing your course, by licensing free content to edX so that they can sell it to institutions, you are crowding out other values, by the very terms of your own scholarship?  Aren’t you changing the character of Justice by turning it into a monetized product to be sold to other institutions?”  It’s entirely possible that he was unaware, when he permitted the creation of ER22x, of these sorts of potential abuses of edX, but now that they are clear, all of us need to be unequivocal.  For Sandel to publicly and freely release his lectures on the JusticeHarvard website in 2009 was an extraordinarily virtuous act that corresponds with his vision of the social good.  For him to then turn around and release it in a format that, while free, allows Harvard / edX to sell it to cash-strapped institutions, is inconsistent with that vision.


I don’t think you need to be convinced by all of the above arguments in order to find that the situation in which SJSU philosophy professors find themselves is deeply problematic.  I am not claiming, nor do I think it is sustainable to claim, that all online course content is unjust, or that all MOOCs are inherently bad.  I do think that, insofar as the edX business model is to sell otherwise free content to other colleges so that they can put pressure on faculty to dilute their content as a cost-saving measure, it is a serious problem, and it is not one that can be dismissed using the rhetoric of choice.  Understanding exactly why it is a problem, and why so many faculty find these sorts of bargains to be untenable even when they are couched in the language of choice, is a challenge that we, as faculty, should expect to confront as such efforts become more pervasive.  I am glad that a set of lectures like Justice are available as a public good to help us in making that case.

Controversy at the AAAS: to ABD or not to ABD?

Leslie Berlowitz, who is the head of the American Academy of Arts and Sciences, is stepping aside temporarily after (apparently) claiming on some federal grant applications that she had a doctorate when in fact she did not.  An inquiry will follow.  The AAAS is not a household name, but within academia it’s a big deal – selection as a fellow of the AAAS is restricted to a small fraction of scholars in a particular field, and the presidential salary is nearly $600,000 annually, plus perks.  Ms. Berlowitz was a doctoral student at NYU from 1967 to 1978, so she was presumably relatively close to completion and was an ABD (all but dissertation) student.  But that was more than 30 years ago.  Apparently the claim being made is that some anonymous staffer wrongly put the information on some grant applications, which is plausible but suggests a serious administrative failure.

I don’t know the details of this case beyond what’s been reported in the news, though, so we obviously need to let the inquiry take its course.  I will say that I’ve seen similar situations with senior PhD students describing themselves on their CV or cards or grant applications as John Doe, PhD (ABD) or John Doe, PhD(c) (c for ‘candidate’), or even, on a CV or biosketch, listing a PhD dated to the current year, in cases where they expect to defend in that year.  I don’t like any of these practices, for the simple reason that they are potentially ambiguous or deceptive even when there is no ill intent.  Practices vary so much across disciplines and countries that misreadings are easy: I, for one, had never heard of PhD(c) until a couple of years ago, and if asked might have thought it stood for ‘clinical’ or something else.  I don’t think it’s an ethical lapse, but it could lead someone else to wrongly think that you do have the PhD, and this is never to your advantage, and potentially to your detriment.  I understand how, after six or eight or eleven years of work, it’s tempting to want to claim *something* on your business card beyond an MA or whatever other degree you earned, but putting those three letters after your name means something (out there in the world) that you don’t want to falsely claim, even innocuously.  It is fair, of course, to put something like ‘defense expected August 2013’ under your dissertation title on your CV.  I have no idea whether anything like this happened in the Berlowitz case, or something more pernicious, or something more innocuous.  But let’s all just exercise some common sense and use ‘PhD’ only when it’s earned.

Paleography at KCL

Over the last week there has been a groundswell of action in opposition to the decision to eliminate the paleography program at King’s College London, most significantly the position of the Chair of Paleography, Professor David Ganz, which is the only such position in the UK and perhaps in the English-speaking world. Paleography, the science of manuscripts and handwriting, lacks the direct economic and political impact of other fields but has enormous influence on work throughout the historical disciplines. My new book relied significantly on Professor Ganz’s co-edition/translation of Bischoff’s Latin Paleography. More broadly, the notion that any scholar’s research should be narrowly dictated by budgetary considerations – that evaluations of scholarly merit ought to be conducted on the grounds of immediate financial impact – is anathema to the principles of academic freedom.

A Facebook group and an online petition have already been organized to oppose this misguided bureaucratic decision. I encourage any of you who may be concerned about the impact of this decision to become involved through these or other means. A parallel effort has been organized opposing the firing of several KCL philosophers.

Citation anxiety

I am always very careful to indicate, in guidelines for essays and papers, that I don’t care what bibliographic or citation format my students use. APA, MLA, AAA, NWA … I always say that as long as they pick one format and use it consistently, they’ll be just fine. I have a soft spot for Chicago style (author-date) but I certainly don’t ask anyone to use it. Yet every term, I get at least one student who speaks to me or emails me in concern about bibliographic or citation format. Even after I insist that I have no preference, they just can’t quite be convinced that I won’t deduct grades for failure to conform with an arbitrary set of guidelines, including things like whether to capitalize every word of book titles, or whether to put parentheses around dates. They can’t quite believe me, either, when I tell them that many journals and presses use minute variations of the major styles, so that whatever I do as an author will eventually require professional attention.

Everywhere I’ve taught, I’ve seen this phenomenon, again and again. I also see, again and again, students who are apparently indifferent to serious writing or analytical problems but still get stuck on fine points of some style guide. What gives? Is it really the case that most professors are such sticklers for formatting issues that it is rational for students to be so concerned? Maybe, but I’m not convinced. Alternatively, maybe citation style is something that seems more objective than other, more significant aspects of paper-writing. When you’re unsure of other issues, or know you have problems with them, hanging on to the one thing that you know you can get just right is a security blanket. Whatever else may be wrong with your paper, at least you got the citations right. I don’t know about this either, though – if it were really true, wouldn’t more students actually use a single style correctly and consistently, even after inquiring?

So, colleagues and students, what do you think? Is citation anxiety ubiquitous? If so, is it reasonable? And what can be done about it?

A feisty embuggerance

When I grade my students’ paper proposals, I make a point of doing a brief Google Scholar search for each student’s proposal, which a) helps me evaluate how thorough they have been, and b) helps me help them find additional material (I then give them the sources I found, but also the keywords I used to find them). One of my students in my introductory linguistic anthropology course this term is doing a paper on linguistic aspects of laughter and humor. During my search, I encountered the following citation (direct from Google Scholar to you):

Embuggerance, E., and H. Feisty. 2008. The linguistics of laughter. English Today 1, no. 04: 47-47.

After I stopped laughing, I set to figuring out what was going on.

1) I quickly discarded the theory that an unlikely duo of scholars actually had this pair of names – although that would have been too awesome for words. In fact, no other article listed in Google Scholar has an author named ‘Embuggerance’ (although there are a couple of other Feistys).

2) I also considered the possibility that this was one of the many metadata errors in Google Scholar; for instance, there are thousands of articles whose purported authors are named Citations or Introduction or Methods, due to errors where it interprets headings like “IV. Methods” as a name “Dr. I.V. Methods”. But this seemed unlikely in the extreme in this case.

3) This left the possibility that these were pseudonyms adopted by particularly amusing authors as part of a parody article.
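Option 2 describes a specific kind of misreading. Google Scholar’s actual process isn’t described anywhere here, so what follows is only a hypothetical Python sketch of a naive heuristic that would produce it: a heading made of one to three capital letters, a period, and a capitalized word is taken to be initials plus a surname. The regex and the name `misread_heading` are illustrative inventions, not Google’s code.

```python
import re

# Hypothetical heuristic (NOT Google Scholar's real code): one to three capital
# letters, a period, then a capitalized word is treated as "initials + surname".
NAME_LIKE = re.compile(r"^([A-Z]{1,3})\.\s+([A-Z][a-z]+)$")


def misread_heading(heading: str):
    """Return the 'author' a naive parser would invent from a heading, or None."""
    match = NAME_LIKE.match(heading.strip())
    if match is None:
        return None
    letters, surname = match.groups()
    initials = "".join(letter + "." for letter in letters)  # "IV" -> "I.V."
    return f"Dr. {initials} {surname}"


for heading in ["IV. Methods", "II. Results", "Introduction"]:
    print(heading, "->", misread_heading(heading))
```

To a rule this crude, a section number written in Roman numerals is indistinguishable from a pair of initials, which is why headings such as ‘IV. Methods’ turn into authors.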

In this case the article is in fact a book review (which I could tell because it’s all on one page), so I didn’t recommend it to the student, but I did request it for my own edification. Lo and behold, it arrived today as a PDF.

‘The linguistics of laughter’ is a review of The Language of Humour by Walter Nash. It’s perfectly ordinary and non-satirical, and it does not contain the words Embuggerance or Feisty. But next to it is another book review, entitled ‘Concise and human’, which contains the following passage:

Silverlight’s concise and human reports cover a surprising range of curious items, from Acid Rain through Bottom Line, Catch 22, Dinner/Supper, Embuggerance, Escalate, Feisty, Holistic, Krasis, Ms, Naff, Quorate, Shambles and Viable to Yomping.

The four words Embuggerance, Escalate, Feisty, and Holistic appear on a single line, and the fact that the Google Scholar metadata gives the ‘authors’ the initials E. and H. (Dr. E. Embuggerance and Dr. H. Feisty), matching Escalate and Holistic, seals the deal. This is the source, and so something like option 2 above is correct. But this is really weird. Not only do the pseudo-authors appear in the middle of a running sentence (not in a heading), but the sentence is in the wrong review – a review that itself is found (mostly correctly) in Google Scholar!
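The E. and H. initials can be reproduced by a rule that reads a comma-separated line as alternating ‘Surname, Forename’ pairs and keeps only each forename’s initial. The Python sketch below is hypothetical: it assumes the PDF line read ‘Embuggerance, Escalate, Feisty, Holistic,’ and the functions `parse_author_line` and `format_citation` are illustrative, not anything Google Scholar is known to use.

```python
# Hypothetical illustration only: NOT Google Scholar's actual algorithm.


def parse_author_line(line: str) -> list[tuple[str, str]]:
    """Naively read a comma-separated line as alternating 'Surname, Forename' pairs."""
    parts = [p.strip() for p in line.split(",") if p.strip()]
    authors = []
    # Walk the tokens two at a time: surname first, then forename.
    for surname, forename in zip(parts[0::2], parts[1::2]):
        authors.append((surname, forename[0] + "."))  # keep only the initial
    return authors


def format_citation(authors, year, title, venue):
    """First author as 'Surname, I.', later authors as 'I. Surname'."""
    first_surname, first_initial = authors[0]
    names = [f"{first_surname}, {first_initial}"]
    names += [f"{initial} {surname}" for surname, initial in authors[1:]]
    if len(names) > 1:
        author_str = ", ".join(names[:-1]) + ", and " + names[-1]
    else:
        author_str = names[0]
    return f"{author_str}. {year}. {title}. {venue}"


line = "Embuggerance, Escalate, Feisty, Holistic,"
authors = parse_author_line(line)
print(authors)  # [('Embuggerance', 'E.'), ('Feisty', 'H.')]
print(format_citation(authors, 2008, "The linguistics of laughter", "English Today"))
# Embuggerance, E., and H. Feisty. 2008. The linguistics of laughter. English Today
```

Run on that one line, this naive pairing yields exactly the two ‘authors’ in the Google Scholar citation, which is consistent with (though it does not prove) a surname-forename misparse of the review text.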

To make matters even worse, at the end of the reviews section the phrase ‘Reviews by Tom McArthur’ appears – an attribution which is found in the metadata for ‘Concise and human’ but not for ‘The linguistics of laughter’. And, as if this were not bad enough, even though both reviews are listed as being from 2008, the PDF clearly shows them as being from 1985. If I were a gambling man, I’d wager that 2008 is the year when the metadata was added and/or the file was scanned.

Now, mostly this is just a humorous anecdote; I don’t mean this as an indictment of Google Scholar, which I consider to be the most useful way for most scholars to find academic literature, and which I use virtually every day. But one has to wonder at the process (automated or otherwise) that leads to this comedy of errors. A great deal of virtual ink has been spilled over at Language Log on the metadata problems with Google Books / Google Scholar and their implications for linguistic research, for tenure cases that rest on faulty citation records, and other potential problems. Until there is a way for these sorts of errors to be corrected by end users, we may all be well and truly embuggered.

Ig Nobel 2009

The annual Ig Nobel awards “for achievements that first make people laugh, then make them think” were given out last night, and once again, anthropology has been well-represented. Catherine Bertenshaw Douglas and Peter Rowlinson won the award for veterinary medicine for their demonstration that cows that are humanized by giving them names produce more milk than those that remain, uh, anonymous. Although they are veterinary scientists, their work appears in the interdisciplinary anthropological journal Anthrozoös. Meanwhile, the Ig Nobel for physics went to the biological anthropologists Katherine Whitcome, Liza Shapiro and Daniel Lieberman for their work (which appeared in Nature a couple of years ago) explaining why pregnant women don’t tip over. This is extremely important, as it bears directly on the evolutionary costs and benefits of bipedalism, among other issues.

The full list of this year’s winners is posted on the Ig Nobel website.

Bertenshaw, Catherine, and Peter Rowlinson. 2009. Exploring Stock Managers’ Perceptions of the Human-Animal Relationship on Dairy Farms and an Association with Milk Production. Anthrozoös, vol. 22, no. 1, pp. 59-69.
Whitcome, Katherine, Liza J. Shapiro, and Daniel E. Lieberman. 2007. Fetal Load and the Evolution of Lumbar Lordosis in Bipedal Hominins. Nature, vol. 450, pp. 1075-1078.