Monday, April 20, 2009

Why the Gourmet Report is a Failure

(by guest blogger Manuel Vargas)

I’m a long-time fan of the Gourmet Report.* Nevertheless, I’ve recently started to wonder whether the Report fails to measure faculty quality, even when it is construed in roughly reputational terms, that is, in terms of concrete judgments of faculty quality as seen by the mainstream of research-active elements in the Anglophone portion of the profession.

(Before you start to roll your eyes let me note I’m still a fan of the report, and despite the problem I’m about to note, I think it is like democratic government— deeply problematic, but better than any of the alternatives. Moreover, it isn’t like my department or my work is at stake in anything the report does— I’m in a department with no graduate program and my career, such as it is, is beyond the point at which the reputation of the institution that awarded me a Ph.D. is of much consequence to it. So there.)

Here’s why I suspect that the Report is a failure at measuring faculty quality: we are bad judges of our own estimates of quality. That is, I suspect that we are unreliable reporters about the work that we regard as best, in something like a stable, all-things-considered sense. (I certainly think students are unreliable judges of what teaching they learn the most from, and I suspect something analogous is true of philosophers.**) I suspect the quality of my quality assessment is a function of lots of different things— what I’ve read recently, what first springs to mind when I see their name, whether I had reason to attend very closely to something of theirs, what I’ve forgotten about their work, and if so, whether I disagreed vehemently or lightly with it, and so on.

Even bracketing framing effects, though, I suspect that my explicit deliberative judgments of quality fail to perfectly track my actual positive regard of quality for philosophers and their work in some complex ways. Here’s one way my judgments might fail to track my actual regard: X’s work was underappreciated by me simply because the ideas sat in the back of my mind, and later played a role in my own judgments about what would work and what wouldn’t, but I never picked up on the fact that it was X’s arguments about Y that did that for me.

Here’s another way that might happen: I could be aware of X’s work, and think well of that person’s work, but underrate its importance to my own thoughts in the following way: I might not realize how much of that person’s work I cite and respond to in a way that takes it seriously. That is, I could think that work is of very high quality (perhaps worth more of my time than any other work on the subject matter!) but unless I counted up citations or counted up the number of times I focus on responding to that figure, I might simply fail to realize how significant that person’s work really is for me, and so I might fail to accurately assess the quality of work. (Of course: I might also overinflate importance for a related reason—I spent a lot of time criticizing someone’s work because it is easy, but that makes their name loom larger in my mind than my actual regard for it.)

Here’s another way “under-regarding” might happen: I could be subject to implicit bias effects of a peculiar sort. That is, I could unconsciously downgrade (or upgrade) my global assessment of quality on the basis of perceived race/class/gender/age etc., even if, when asked, I sincerely disavow that these things have anything to do with it. On this picture, the relevant test might be closer to something like: what would I think of this work if I had never known anything about the author? A: We’ll never know.

(Relatedly, implicit bias might work in a more targeted way, only affecting my overall assessments of worth, and not my assessments of a particular argument, or even a specific paper even when conscious of race/class/gender/age/etc.)

Here’s another way that might happen: I could be less good than I think at blocking halo effects of various sorts. So, knowing that X is at Wonderful Institution Y may inflate my estimate of that person’s work unconsciously. Or, my agreement with X on matter M may lead me to think better of X than someone else when filling out a survey, because we share the same beleaguered position on some matter. Or, knowing X has published many times in some journal I think well of might lead me to cast doubt on my own assessments of the quality of the work.

Suppose you thought people in general are subject to these effects. Are philosophers vulnerable to such effects? I think yes, but I’ve been repeatedly told that philosophers are special, and alone among humans immune to these sorts of effects because of our marginally greater reflectiveness. So, I must be wrong.

Still, there is some evidence that at-a-time global self-assessments are subject to priming and framing effects. There is some literature on the way in which people are good at monitoring their own discriminatory behavior only when they have reason to think it will be observed (so, for example, you probably aren’t very good at monitoring your discrimination against groups whose salience is not raised for you: think age, disability, non-black/non-white racial groups, etc.). There is also the fast-growing literature on implicit bias and the way it operates. And, there is a large body of work in cognitive science and psychology casting doubt on the accuracy and efficacy of conscious, deliberative judgments with respect to evaluative matters (something that Leiter himself, writing with Joshua Knobe, wrote about in the context of Nietzschean moral psychology!).

I don’t know how to correct for any of this, given the Report’s aim of measuring faculty quality in terms of conscious, explicit, global judgments of quality. Keeping track of citation impact corrects for at least one of the possible misalignments I mentioned, but not all of them. And anyway, citation impact rankings are subject to their own difficulties as well. (Although I think it would be a useful supplement to the Report to track this data, too.)

In sum, although I think the Gourmet Report probably fails to accurately report our actual estimations of faculty quality, it nevertheless is likely the best thing we’ve got going for judging philosophical reputation of departments and their specializations, as seen by the mainstream of research-active elements in the Anglophone portion of the profession.

*Indeed, I may be one of the longest of the long-time fans of the PGR: somehow I stumbled across an early version of it, back when Mosaic was my browser of choice, using email required some degree of sophistication with UNIX commands, and the Report appeared to be something produced on a typewriter. Anyhow, the Report was a big help when thinking about graduate schools and a nice supplement to local advice about where I should consider applying. In several cases the Report highlighted departments that individual advisors had never mentioned, but when I asked them (because it was listed on the Report), the response was invariably something like “Oh yeah, so-and-so is there; that place would be pretty good, too.” I think the report has improved in numerous ways since those early days, and I think that it continues to be excellent at its ostensible function as one of several tools for those thinking about graduate school in philosophy. Indeed, it is out of a sense of its ongoing utility for graduate students that I’m happy to serve as one of the folks providing specialty rankings in philosophy of action.

** Regarding student unreliability, the matter is complicated. But see Mayer et al. “Increased Interestingness of Extraneous Details in a Multimedia Science Presentation Leads to Decreased Learning” Journal of Experimental Psychology: Applied (2008) Vol. 14, No. 4, 329–339. And think about research on what teaching evaluations track. One might worry that too often teaching evals track those things irrelevant to learning, or even, if the Mayer et al. data prove correct, impediments to learning!


jonathan weinberg said...

Eric, for a number of those problems that you cite, one might well think that averaging across a couple-dozen philosophers might do well for evening out a lot of those kinks. In particular, anything that involved an "implicit bias of a _peculiar_ sort" might well be expected to come out in the wash. More worrying, though, are the effects that will likely be found, uncompensated, across a wide sample of the judges, e.g., halo effects.

Manuel Vargas said...

Hi Jonathan- (I was the poster, not Eric!):

Yeah, I think some of that stuff will probably wash out with enough philosophers involved in rankings, though in some of the smaller area rankings I'm less convinced. If everyone has read that great article by X in the past 3 months and the great article by Y was some time ago, and Y's next great article isn't out yet, I wouldn't be surprised if there was some slight skewing towards X. (But maybe not?) And I suspect that there are some other general effects here: people at less visible places will not be thought of as easily; and in general, I think there is probably some separation between rankings by the generation of the raters, with people tending to favor folks from their own generation more than they are favored by folks from different generations. But your main point, that some of those effects will wash out with averaging across raters, is surely right.

And, of course, I'm with you that those effects that persist uncompensated across a wide sample of judges are more worrisome for ratings.

Anonymous said...


I think you raise many very interesting points about the Philosophical Gourmet. I think issues of “bias” are why the Report is a “failure.” While I do not disagree with your conception of bias and how it affects the rankings—it seems to me that there are even more egregious, general, and straightforward types of bias that make the Report suspect as an “objective” indicator of True quality.

One of your main points is that racial/class/gender/age bias is probably unconsciously at work in those who are judging and ranking programs. I grant this point and think you are correct.

Another one of your arguments is an interesting piece of moral psychology involving an agent’s explicit deliberative judgments of quality failing to perfectly track the agent’s actual positive regard of “quality for philosophers and their work in some complex way.” While I’m not 100% convinced by your example of how the process might run, I’ll grant that point as well.

Yet, granting these points (which are fairly recondite arguments stemming from skepticism about the reliability of those who are making the judgments about X, Y, Z, etc.)—I think that the Gourmet Report suffers from more obvious problems concerning “bias” and “prejudice.”

The Report takes for granted (--and this would be an egregious form of bias--) that a certain group of philosophers (A, B, C, etc.) are philosophical hotshots and that their work is of great importance. Let us call this group (who are the judges that do the ranking) the “in-group.” It is kind of like junior high. The in-group has the privilege of admitting and excluding people from being a part of the in-group.

It is important to note, here, Leiter’s methodology: He asks in-group members to rank each other; but he does not ask the philosophical population at large for their input (i.e., he does not ask the out-group what they think). The same people appear as the judges, year after year. And they vote for each other, excluding non-ingroup philosophers. This is allegedly the result of “quality.” But in reality it is just the result of “in-group” thinking, popularity, following majority opinion, etc. This feature of the Gourmet Report seems to me so egregiously outrageous, but nobody ever questions or even calls attention to these facts.

Second, the Gourmet Report takes for granted (--again, this is a major form of bias--) that certain branches of philosophy, topics, issues, and problems are “the real” meat and potatoes of true/good/rigorous philosophy. What programs are always touted as the best? Answer: Those programs that concentrate in philosophy of logic, language, science, math, etc. These are the usual suspects. Is this not bias? If USF had a graduate program, USF would hardly even get recognized (if at all) unless USF started making a bunch of hires in logic, philosophy of language, etc. (And USF better hire members of the “in-group.”) In other words: USF better shell out big bucks to get a few members of the in-crowd. Branches of philosophy (issues, problems, concentrations) that fall outside the scope of the interests of the in-crowd are mocked, disregarded, unvalued, etc. without intelligence or good reason. Have you ever noticed that some of the highly ranked programs are extraordinarily narrow in faculty competence? Has this ever made you wonder why?

In short, my point is that the Gourmet Report is little more than a popularity report. It tells the public what the “in-crowd” thinks of each other. It marginalizes scores of programs that have no (or almost no) vote / voice in the proceedings. The in-group dictates what areas of philosophy are valuable. Again, think of junior high: Which brands of shoes are cool? What are the “cool kids” doing? The same type of marginalization happens in the Gourmet Report. So, it is a good tool for keeping up with in-crowd thinking. And it is a good indicator of which in-group members are moving around. (Secret: The rankings change as in-group members move from one department to another. That is the biggest reason for a school’s “climbing” in the rankings.)

Nobody seems to care much about the bias implicit in the Gourmet Report because it is written by the in-crowd for the in-crowd. However, it has a prejudicing effect in that the rest of the philosophical world just assumes that its claims are true and unproblematic. After all, professors A, B, and C are hotshots. Questioning their authority is not allowed.

So, I agree with most of your assessment—I just think your complex explanation of the deliberation of in-crowd members misses the most obvious points in which prejudice enters into the game.

Nick Baiamonte, UCR Alumni